RDD functions using GUI

2017-04-17 Thread Ke Yang (Conan)
Hi, is there a drag-and-drop (code-free) GUI available for RDD functions, i.e. a GUI that generates code from drag-and-drop actions? http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds Thanks for brainstorming.

Re: 2.2 branch

2017-04-17 Thread Michael Armbrust
I'm going to cut branch-2.2 tomorrow morning. On Thu, Apr 13, 2017 at 11:02 AM, Michael Armbrust wrote: > Yeah, I was delaying until 2.1.1 was out and some of the Hive questions > were resolved. I'll make progress on that by the end of the week. Let's > aim for 2.2

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-17 Thread Holden Karau
I think this is Java 8 vs. Java 7: if you look at the previous build, you see a lot of the same missing classes, but tagged as "warning" rather than "error". I think all in all it makes sense to stick to JDK7 for the legacy builds, which have been built with it previously. If there is consensus on

Re: distributed computation of median

2017-04-17 Thread Koert Kuipers
Also, q-tree is implemented in Algebird, and it is not hard to get it going in Spark. That is another probabilistic data structure that is useful for this. On Apr 17, 2017 11:27, "Jason White" wrote: > Have you looked at t-digests? > > Calculating percentiles (including medians)

Re: distributed computation of median

2017-04-17 Thread Reynold Xin
The DataFrame API includes an approximate quantile implementation. If you ask for quantile 0.5, you will get the approximate median. On Sun, Apr 16, 2017 at 9:24 PM svjk24 wrote: > Hello, > Is there any interest in an efficient distributed computation of the > median algorithm?
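
A minimal PySpark sketch of the suggestion above, assuming a DataFrame with a numeric column. `DataFrame.approxQuantile(col, probabilities, relativeError)` is the existing API call; the helper name `approximate_median`, the `demo` function, the column name `value`, and the default relative error of 0.01 are illustrative assumptions, not something stated in the thread.

```python
def approximate_median(df, column, relative_error=0.01):
    """Return the approximate median of a numeric DataFrame column.

    relative_error is the rank error tolerated by approxQuantile;
    0.0 requests an exact quantile, which can be much more expensive.
    """
    # approxQuantile returns one value per requested probability.
    quantiles = df.approxQuantile(column, [0.5], relative_error)
    # Guard against an empty result list before indexing.
    return quantiles[0] if quantiles else None


def demo():
    """Hypothetical usage; needs a local PySpark installation to run."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("median-demo")
        .getOrCreate()
    )
    # Illustrative data: the integers 1..1000 in a column named "value".
    df = spark.range(1, 1001).withColumnRenamed("id", "value")
    print(approximate_median(df, "value"))
    spark.stop()
```

A smaller `relative_error` tightens the rank guarantee at the cost of more memory and computation, so the value is worth tuning per workload.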

Re: distributed computation of median

2017-04-17 Thread Jason White
Have you looked at t-digests? Calculating percentiles (including medians) is something that is inherently difficult/inefficient to do in a distributed system. T-digests provide a useful probabilistic structure to allow you to compute any percentile with a known (and tunable) margin of error.
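
To illustrate why a small mergeable summary helps here, the following is a rough sketch, not a t-digest: it swaps in a much simpler fixed-width histogram that shares the one property that matters for distribution, namely that each partition builds a compact summary and the summaries can be merged. It assumes the value range is known in advance, which a real t-digest does not require. The class name `HistogramSketch` and all parameters are hypothetical.

```python
from functools import reduce


class HistogramSketch:
    """Fixed-width histogram over [lo, hi); values outside are clamped."""

    def __init__(self, lo, hi, bins=1000):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.total = 0

    def add(self, x):
        # Map the value to its bin, clamping out-of-range values to the edges.
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.bins - 1)
        self.counts[i] += 1
        self.total += 1

    def merge(self, other):
        # Summaries built with the same range and bin count add bin by bin.
        merged = HistogramSketch(self.lo, self.hi, self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        merged.total = self.total + other.total
        return merged

    def quantile(self, q):
        # Walk the cumulative counts until the target rank is reached and
        # report the midpoint of that bin.
        if self.total == 0:
            return None
        target = q * self.total
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if count > 0 and running >= target:
                return self.lo + (i + 0.5) * self.width
        return self.hi


def summarize(partition, lo, hi, bins):
    """Build one sketch per partition, as each worker would."""
    sketch = HistogramSketch(lo, hi, bins)
    for x in partition:
        sketch.add(x)
    return sketch


# Simulate four partitions of the integers 1..1000.
data = list(range(1, 1001))
partitions = [data[i::4] for i in range(4)]
sketches = [summarize(p, 0, 1000, 100) for p in partitions]
combined = reduce(lambda a, b: a.merge(b), sketches)
print(combined.quantile(0.5))
```

For in-range values, the estimate lands within one bin width of the true quantile, so the bin count plays the role of the tunable margin of error. A t-digest instead adapts its resolution to the data and is typically more accurate near the tails.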