RE: Percentile example

2015-02-17 Thread SiMaYunRui
Thanks, Imran, for the very detailed explanations and options. I think T-Digest is what I want for now. From: iras...@cloudera.com Date: Tue, 17 Feb 2015 08:39:48 -0600 Subject: Re: Percentile example To: myl...@hotmail.com CC: user@spark.apache.org (trying to repost to the list w/out URLs

RE: Percentile example

2015-02-17 Thread SiMaYunRui
Thanks, Kohler, that's a very interesting approach. I have never used Spark SQL and am not sure whether my cluster is configured well for it, but I will definitely give it a try. From: c.koh...@elsevier.com To: myl...@hotmail.com; user@spark.apache.org Subject: Re: Percentile example Date: Tue, 17 Feb 2015

Re: Percentile example

2015-02-17 Thread Imran Rashid
(trying to repost to the list w/out URLs -- rejected as spam earlier)
Hi,
Using take() is not a good idea; as you have noted, it will pull a lot of data down to the driver, so it's not scalable. Here are some more scalable alternatives:
1. Approximate solutions
1a. Sample the data. Just sample
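
To make the sampling idea in 1a concrete, here is a minimal sketch in plain Python rather than Spark, so it can be run anywhere. The helper names (`nearest_rank`, `approx_percentile`), the 5% sampling fraction, and the nearest-rank definition of a percentile are all assumptions made for illustration; the message above does not specify them. In PySpark the sample would presumably come from `RDD.sample(withReplacement, fraction, seed)`, and only the much smaller sample would need to be sorted or collected.

```python
import math
import random


def nearest_rank(sorted_values, p):
    """Return the p-th percentile (0 < p <= 100) of an already sorted list,
    using the nearest-rank definition (an assumption, not from the thread)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def approx_percentile(values, p, fraction, seed=0):
    """Estimate the p-th percentile from a random sample of `values`.

    Each element is kept independently with probability `fraction`,
    which mimics sampling without replacement on a distributed data set.
    """
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < fraction]
    if not sample:
        raise ValueError("sample is empty; increase fraction")
    sample.sort()  # only the small sample is sorted, not the full data
    return nearest_rank(sample, p)


# Hypothetical data: the integers 1..100000, already in order.
data = list(range(1, 100001))
exact = nearest_rank(data, 95)
estimate = approx_percentile(data, 95, fraction=0.05)
print(exact, estimate)
```

The estimate will not match the exact value, but with a few thousand sampled points it should land close to it; a larger fraction trades more work for a tighter estimate.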

Re: Percentile example

2015-02-17 Thread Kohler, Curt E (ELS-STL)
user@spark.apache.org Subject: Percentile example Hello, I am new to Spark and trying to figure out how to compute a percentile over a big data set. I googled this topic but did not find any very useful code examples or explanations. Seems

Percentile example

2015-02-15 Thread SiMaYunRui
Hello, I am new to Spark and trying to figure out how to compute a percentile over a big data set. I googled this topic but did not find any very useful code examples or explanations. It seems that I can use the sortByKey transformation to get my data set in order, but I am not quite sure how I can
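
The sort-then-pick idea in the question can be sketched as follows, in plain Python so that it runs without a cluster. The function name `percentile_by_rank`, the sample numbers, and the nearest-rank definition are hypothetical choices for illustration, not something stated in the thread. On an RDD, the analogous steps would presumably be sorting the data, pairing each element with its position (for instance with `zipWithIndex`), and fetching the element at the computed position instead of pulling everything to the driver.

```python
import math


def percentile_by_rank(values, p):
    """Exact p-th percentile (0 < p <= 100) by sorting and indexing.

    Uses the nearest-rank definition: the value at position
    ceil(p * n / 100) in the sorted data, counting from 1.
    """
    ordered = sorted(values)
    # Position -> value mapping, standing in for an indexed, sorted data set.
    indexed = {i: v for i, v in enumerate(ordered)}
    n = len(indexed)
    if n == 0:
        raise ValueError("need at least one value")
    rank = max(1, math.ceil(p * n / 100))
    return indexed[rank - 1]


# Hypothetical measurements, deliberately unsorted.
latencies = [12, 7, 3, 25, 18, 9, 30, 15, 21, 5]
p90 = percentile_by_rank(latencies, 90)
median = percentile_by_rank(latencies, 50)
print(p90, median)
```

Only one element is looked up after the sort, so the cost is dominated by the sort itself.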