Hi Sandy,

You could take a look at the Q-Tree data structure provided by Twitter's Algebird <https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala>. Q-Trees are combined through an Algebird Semigroup, and because that combine operation is associative, partial summaries can be merged in any grouping, which makes the structure well suited to streaming computations.
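
To make the associativity point concrete, here is a minimal Python sketch. It is not Algebird's QTree and does not use its API: it substitutes a much cruder fixed-width bucket histogram as a stand-in, and the names `summarize`, `merge`, `quantile_bounds` and `BUCKET_WIDTH` are hypothetical, chosen only for illustration. The idea it shows is the same one, though: each batch is reduced to a small summary, and summaries are combined with an associative operation, so it does not matter how the stream is split up.

```python
from collections import Counter
from functools import reduce

# Hypothetical resolution for this sketch: values are grouped into buckets of width 10.
BUCKET_WIDTH = 10


def summarize(values):
    """Build a summary (bucket index -> count) for one batch of numbers."""
    return Counter(int(v // BUCKET_WIDTH) for v in values)


def merge(a, b):
    """Associative, commutative combine: add the counts bucket by bucket."""
    return a + b


def quantile_bounds(summary, p):
    """Return (lower, upper) bounds of the bucket that contains the p-quantile."""
    total = sum(summary.values())
    if total == 0:
        raise ValueError("empty summary")
    rank = p * total
    seen = 0
    for bucket in sorted(summary):
        seen += summary[bucket]
        if seen >= rank:
            return (bucket * BUCKET_WIDTH, (bucket + 1) * BUCKET_WIDTH)
    # Only reachable through floating-point edge cases when p is very close to 1.
    last = max(summary)
    return (last * BUCKET_WIDTH, (last + 1) * BUCKET_WIDTH)


# Three batches arriving over time; each is summarized independently.
batches = [[1, 5, 12], [33, 47, 58], [61, 75, 99, 100]]
merged = reduce(merge, (summarize(b) for b in batches))
print(quantile_bounds(merged, 0.5))  # bounds of the bucket holding the median
```

The answer is only a pair of bounds rather than an exact value, which is also how an approximate quantile structure reports results. The real Q-Tree adapts its resolution to the data instead of using a fixed bucket width.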
-Ryan

On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi All,
>
> We're working on a Spark application that could make use of computing
> quantiles in a streaming fashion. Something in the vein of what DataFu has
> for Pig:
> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
>
> Does anything like this exist in the Spark ecosystem? If not, would there
> be a good place to contribute this if we write it?
>
> thanks,
> Sandy