Hi,
Is there a drag-and-drop (code-free) GUI available for RDD functions, i.e. a
GUI that generates code from drag-and-drop actions?
http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
Thanks for brainstorming.
I'm going to cut branch-2.2 tomorrow morning.
On Thu, Apr 13, 2017 at 11:02 AM, Michael Armbrust wrote:
> Yeah, I was delaying until 2.1.1 was out and some of the hive questions
> were resolved. I'll make progress on that by the end of the week. Let's
> aim for 2.2
I think this is Java 8 vs. Java 7: if you look at the previous build, you see
a lot of the same missing classes, but tagged as "warning" rather than
"error". All in all, I think it makes sense to stick to JDK 7 for the
legacy builds, which have previously been built with it.
If there is consensus on
Also, Q-tree is implemented in Algebird, and it is not hard to get it going in
Spark. It is another probabilistic data structure that is useful for this.
On Apr 17, 2017 11:27, "Jason White" wrote:
> Have you looked at t-digests?
>
> Calculating percentiles (including medians)
The DataFrame API includes an approximate quantile implementation. If you
ask for quantile 0.5, you will get the approximate median.
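
As a rough sketch of the suggestion above, the PySpark helper below asks
DataFrame.approxQuantile for quantile 0.5. The helper name approx_median, the
column name "value", and the default relative error of 0.01 are illustrative
assumptions, not something from this thread.

```python
from typing import Any


def approx_median(df: Any, column: str, relative_error: float = 0.01) -> float:
    """Return the approximate median of a numeric column.

    `df` is expected to be a pyspark.sql.DataFrame. approxQuantile takes a
    column name, a list of probabilities, and a relative error; it returns
    one value per requested probability.
    """
    # Quantile 0.5 is the median. A smaller relative_error is more precise
    # but more expensive; 0.0 requests the exact quantile.
    return df.approxQuantile(column, [0.5], relative_error)[0]


# Hypothetical usage, assuming a running SparkSession named `spark`:
#   df = spark.range(1, 1001).withColumnRenamed("id", "value")
#   approx_median(df, "value")
```

The relative error is the tuning knob here: it trades precision for cost, so
a loose bound is usually enough for exploratory work.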
On Sun, Apr 16, 2017 at 9:24 PM svjk24 wrote:
> Hello,
> Is there any interest in an efficient distributed computation of the
> median algorithm?
Have you looked at t-digests?
Calculating percentiles (including medians) is something that is inherently
difficult/inefficient to do in a distributed system. T-digests provide a
useful probabilistic structure to allow you to compute any percentile with a
known (and tunable) margin of error.
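
To illustrate why a mergeable summary helps in the distributed case, here is a
minimal sketch in plain Python. It is not a t-digest: it swaps in a much
simpler fixed-width histogram, and it assumes the value range [lo, hi] is
known in advance. All function names are hypothetical. Each partition builds
its own small summary, the summaries are merged by adding counts, and the
quantile is estimated from the merged summary with an error of roughly one
bin width.

```python
from typing import Iterable, List


def build_histogram(values: Iterable[float], lo: float, hi: float, bins: int) -> List[int]:
    """Summarize one partition as fixed-width bin counts over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Assumes lo <= v <= hi; v == hi is clamped into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Merging two summaries is just element-wise addition of counts."""
    return [x + y for x, y in zip(a, b)]


def estimate_quantile(counts: List[int], lo: float, hi: float, q: float) -> float:
    """Estimate quantile q by interpolating inside the bin that crosses q."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty summary")
    width = (hi - lo) / len(counts)
    target = q * total
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            fraction = (target - cumulative) / c
            return lo + (i + fraction) * width
        cumulative += c
    return hi


# Two "partitions" summarized independently, then merged.
part_a = list(range(1, 51))
part_b = list(range(51, 101))
merged = merge(build_histogram(part_a, 0, 100, 100),
               build_histogram(part_b, 0, 100, 100))
median_estimate = estimate_quantile(merged, 0, 100, 0.5)
```

A t-digest follows the same build, merge, and query pattern, but it adapts its
resolution to the data rather than relying on a fixed, known range.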