Hi,
You can use https://twitter.github.io/algebird/ which provides
implementations of interesting Monoids and ways to combine them into tuples
(or products) of Monoids. Of course, you are not bound to the Algebird
library, but it might be helpful to bootstrap.
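For reference, the core idea Algebird packages up can be sketched in a few lines of plain Scala. This is a minimal sketch with hypothetical names; Algebird's actual `Monoid` type class and its tuple instances are richer:

```scala
// Minimal Monoid type class (hypothetical names; Algebird's is richer).
trait Monoid[A] {
  def zero: A
  def plus(x: A, y: A): A
}

object Monoid {
  val sum: Monoid[Long] = new Monoid[Long] {
    def zero = 0L
    def plus(x: Long, y: Long) = x + y
  }
  val max: Monoid[Long] = new Monoid[Long] {
    def zero = Long.MinValue
    def plus(x: Long, y: Long) = x max y
  }
  // Two monoids combine into a monoid on pairs -- the "tuple monoid".
  def pair[A, B](ma: Monoid[A], mb: Monoid[B]): Monoid[(A, B)] =
    new Monoid[(A, B)] {
      def zero = (ma.zero, mb.zero)
      def plus(x: (A, B), y: (A, B)) =
        (ma.plus(x._1, y._1), mb.plus(x._2, y._2))
    }
}

// One fold computes sum and max together in a single pass over the data.
val m = Monoid.pair(Monoid.sum, Monoid.max)
val data = List(3L, 1L, 4L, 1L, 5L)
val (total, biggest) = data.map(x => (x, x)).foldLeft(m.zero)(m.plus)
// total == 14, biggest == 5
```

The same pairing trick extends to any arity, which is why a single pass can carry an arbitrary tuple of aggregations.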
On Mon, Dec 18, 2017 at 7:18
It seems interesting; however, Scalding seems to need to be used outside of
Spark?
On Mon, Dec 18, 2017 at 5:15 PM, Anastasios Zouzias wrote:
Hi Julien,
I am not sure whether my answer applies to the streaming part of your question.
However, in batch processing, if you want to perform multiple aggregations
over an RDD in a single pass, a common approach is to use multiple
aggregators (a.k.a. tuple monoids); see below an example from
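The referenced example is elided above, so here is my own sketch of the tuple-aggregator idea (a reconstruction, not the original snippet): `RDD.aggregate` with a tuple accumulator computes count, sum, and max in one pass over the data.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: a single pass over the RDD computes count, sum, and max
// together via a tuple accumulator. App name and data are illustrative.
val spark = SparkSession.builder().appName("tuple-agg").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq(3L, 1L, 4L, 1L, 5L))

val (count, sum, max) = rdd.aggregate((0L, 0L, Long.MinValue))(
  // seqOp: fold one element into the partition-local accumulator
  { case ((c, s, m), x) => (c + 1, s + x, m max x) },
  // combOp: merge per-partition accumulators
  { case ((c1, s1, m1), (c2, s2, m2)) => (c1 + c2, s1 + s2, m1 max m2) }
)
// count = 5, sum = 14, max = 5
```

Each component of the tuple is itself a monoid-style fold, so adding a fourth or fifth aggregation only widens the tuple; the data is still traversed once.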
I've been looking at several solutions, but I can't find an efficient way
to compute many window functions (optimized computation or efficient
parallelism).
Am I the only one interested in this?
Regards,
Julien
On Fri, Dec 15, 2017 at 9:34 PM, Julien CHAMP
Maybe I should consider something like Impala?
On Fri, Dec 15, 2017 at 11:32 AM, Julien CHAMP wrote:
Hi Spark Community members!
I want to run several (from 1 to 10) aggregate functions using window
functions on something like 100 columns.
Instead of doing several passes over the data to compute each aggregate
function, is there a way to do this efficiently?
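One common pattern (a sketch with hypothetical column names and input path, not a claim about your exact job): express all the aggregates over the same `Window` specification in a single `select`, so Spark evaluates one window per distinct window spec rather than one pass per function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Sketch only: "events.parquet", "id", "ts", and "value" are hypothetical.
val spark = SparkSession.builder().appName("multi-window").getOrCreate()
val df = spark.read.parquet("events.parquet")

// One window specification shared by all the aggregate functions.
val w = Window
  .partitionBy("id")
  .orderBy("ts")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

val out = df.select(
  col("id"), col("ts"),
  sum("value").over(w).as("running_sum"),
  avg("value").over(w).as("running_avg"),
  max("value").over(w).as("running_max"),
  min("value").over(w).as("running_min")
)
```

Aggregates that share a window spec are grouped together in the physical plan, so keeping your 1-to-10 functions on a common `partitionBy`/`orderBy`/frame where possible avoids redundant shuffles and sorts.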
Currently it seems that doing
val