This email is good. Just one note -- a lot of people are swamped right before Spark Summit, so you might not get prompt responses this week.
On Wed, Jun 10, 2015 at 2:53 PM, Grega Kešpret <gr...@celtra.com> wrote:

> I have some time to work on it now. What's a good way to continue the
> discussions before coding it?
>
> This e-mail list, JIRA or something else?
>
> On Mon, Apr 6, 2015 at 12:59 AM, Reynold Xin <r...@databricks.com> wrote:
>
>> I think those are great to have. I would put them in the DataFrame API
>> though, since this is applying to structured data. Many of the advanced
>> functions on the PairRDDFunctions should really go into the DataFrame API
>> now we have it.
>>
>> One thing that would be great to understand is what state-of-the-art
>> alternatives are out there. I did a quick google scholar search using the
>> keyword "approximate quantile" and found some older papers. Just the
>> first few I found:
>>
>> http://www.softnet.tuc.gr/~minos/Papers/sigmod05.pdf by bell labs
>>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1&type=pdf
>> by Bruce Lindsay, IBM
>>
>> http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf
>>
>> On Mon, Apr 6, 2015 at 12:50 AM, Grega Kešpret <gr...@celtra.com> wrote:
>>
>>> Hi!
>>>
>>> I'd like to get community's opinion on implementing a generic quantile
>>> approximation algorithm for Spark that is O(n) and requires limited memory.
>>> I would find it useful and I haven't found any existing implementation. The
>>> plan was basically to wrap t-digest
>>> <https://github.com/tdunning/t-digest>, implement the
>>> serialization/deserialization boilerplate and provide
>>>
>>> def cdf(x: Double): Double
>>> def quantile(q: Double): Double
>>>
>>> on RDD[Double] and RDD[(K, Double)].
>>>
>>> Let me know what you think. Any other ideas/suggestions also welcome!
>>>
>>> Best,
>>> Grega
>>> --
>>> Grega Kešpret
>>> Senior Software Engineer, Analytics
>>>
>>> Skype: gregakespret
>>> celtra.com <http://www.celtra.com/> | @celtramobile
>>> <http://www.twitter.com/celtramobile>
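
For readers following the thread, the general shape of the proposal (a small summary built per partition, merged across partitions, then queried through cdf and quantile) can be sketched without Spark or t-digest. The code below is only a sketch under stated assumptions: a fixed-bin histogram over a known value range stands in for t-digest, and every name in it (HistogramSummary, add, merge, empty, fromPartitions) is hypothetical rather than part of Spark or t-digest. Plain Scala collections stand in for RDD partitions.

```scala
// Hypothetical stand-in for t-digest: a fixed-bin histogram over a known range [lo, hi].
// It uses O(bins) memory, makes one pass over the data, and can be merged across partitions.
final case class HistogramSummary(lo: Double, hi: Double, counts: Vector[Long]) {
  require(hi > lo && counts.nonEmpty, "need hi > lo and at least one bin")
  private val width: Double = (hi - lo) / counts.length

  def total: Long = counts.sum

  // Per-element update; values outside [lo, hi] are clamped into the edge bins.
  def add(x: Double): HistogramSummary = {
    val raw = ((x - lo) / width).toInt
    val i = math.min(counts.length - 1, math.max(0, raw))
    copy(counts = counts.updated(i, counts(i) + 1))
  }

  // Combine two summaries built over the same range and bin count.
  def merge(other: HistogramSummary): HistogramSummary = {
    require(lo == other.lo && hi == other.hi && counts.length == other.counts.length)
    copy(counts = counts.zip(other.counts).map { case (a, b) => a + b })
  }

  // Approximate fraction of values <= x, interpolating linearly inside a bin.
  def cdf(x: Double): Double =
    if (total == 0) Double.NaN
    else if (x <= lo) 0.0
    else if (x >= hi) 1.0
    else {
      val pos = (x - lo) / width
      val full = math.min(pos.toInt, counts.length - 1)
      (counts.take(full).sum + (pos - full) * counts(full)) / total
    }

  // Approximate value at quantile q in [0, 1], interpolating linearly inside a bin.
  def quantile(q: Double): Double = {
    require(q >= 0.0 && q <= 1.0, "q must be in [0, 1]")
    if (total == 0) Double.NaN
    else {
      val target = q * total
      var cum = 0.0
      var i = 0
      while (i < counts.length - 1 && cum + counts(i) < target) {
        cum += counts(i)
        i += 1
      }
      val frac = if (counts(i) == 0) 0.0 else math.min(1.0, (target - cum) / counts(i))
      lo + (i + frac) * width
    }
  }
}

object HistogramSummary {
  def empty(lo: Double, hi: Double, bins: Int): HistogramSummary =
    HistogramSummary(lo, hi, Vector.fill(bins)(0L))

  // Fold each "partition" into its own summary, then merge the partial summaries.
  def fromPartitions(parts: Seq[Seq[Double]], lo: Double, hi: Double, bins: Int): HistogramSummary =
    parts
      .map(p => p.foldLeft(empty(lo, hi, bins))((s, x) => s.add(x)))
      .foldLeft(empty(lo, hi, bins))((a, b) => a.merge(b))
}

object QuantileSketchDemo {
  def main(args: Array[String]): Unit = {
    // Four "partitions" of 250 values each, covering 0.0 to 999.0.
    val parts = (0 until 1000).map(_.toDouble).grouped(250).toSeq
    val summary = HistogramSummary.fromPartitions(parts, 0.0, 1000.0, 10)
    println(summary.quantile(0.5)) // prints 500.0
    println(summary.cdf(250.0))    // prints 0.25
  }
}
```

On a real RDD[Double], the same add and merge pair would be the two functions passed to RDD.aggregate, and on RDD[(K, Double)] to aggregateByKey. A t-digest-backed version would swap the histogram for a digest, which does not need the value range up front and is more accurate in the tails.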