Hi Sandy,
You could take a look at the Q-Tree data structure provided by Twitter's
Algebird:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala
Algebird exposes it through a Semigroup, whose combine operation is
associative, so partial summaries can be merged in any grouping. That makes
it well suited to streaming computations.
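
To illustrate why associativity matters here, below is a minimal sketch in
plain Python. It is not Algebird's QTree and does not use its API: it is a
simplified stand-in that counts values in fixed-width dyadic buckets. The
names summarize, merge, quantile_bounds and the LEVEL constant are all
hypothetical, chosen only for this example.

```python
from collections import Counter
from functools import reduce
import math

# Hypothetical resolution: each bucket has width 2**LEVEL (here 1/16).
LEVEL = -4


def summarize(x):
    """Summary of a single value: one count in the bucket containing x."""
    return Counter({math.floor(x / 2.0 ** LEVEL): 1})


def merge(a, b):
    """Combine two summaries by adding bucket counts.

    Addition of counts is associative and commutative, so summaries built
    on different partitions or batches can be merged in any grouping.
    """
    return a + b


def quantile_bounds(summary, p):
    """Return (lower, upper) bounds of the bucket holding the p-quantile."""
    total = sum(summary.values())
    rank = p * total
    seen = 0
    for bucket in sorted(summary):
        seen += summary[bucket]
        if seen >= rank:
            width = 2.0 ** LEVEL
            return bucket * width, (bucket + 1) * width
    raise ValueError("empty summary")


if __name__ == "__main__":
    values = [0.5, 1.5, 2.5, 3.5, 4.5]
    # Summarize two "batches" separately, then merge the partial results.
    first = reduce(merge, map(summarize, values[:2]))
    second = reduce(merge, map(summarize, values[2:]))
    print(quantile_bounds(merge(first, second), 0.5))
```

In a streaming job, each batch would produce its own summary and the
running summary would be updated with merge. The real Q-Tree also prunes
and coarsens its nodes to bound memory, which this sketch does not attempt.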

-Ryan


On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Hi All,
>
> We're working on a Spark application that could make use of computing
> quantiles in a streaming fashion.  Something in the vein of what DataFu has
> for Pig:
>
> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
>
> Does anything like this exist in the Spark ecosystem?  If not, would there
> be a good place to contribute this if we write it?
>
> thanks,
> Sandy
>
