Re: Several Aggregations on a window function

2017-12-18 Thread Anastasios Zouzias
Hi, you can use https://twitter.github.io/algebird/, which provides implementations of useful Monoids and ways to combine them into tuples (or products) of Monoids. Of course, you are not bound to the algebird library, but it might help you bootstrap. On Mon, Dec 18, 2017 at 7:18

Re: Several Aggregations on a window function

2017-12-18 Thread Julien CHAMP
It seems interesting; however, scalding seems to require being used outside of Spark? On Mon, 18 Dec 2017 at 17:15, Anastasios Zouzias wrote: > Hi Julien, > > I am not sure if my answer applies to the streaming part of your question. > However, in batch processing, if you

Re: Several Aggregations on a window function

2017-12-18 Thread Anastasios Zouzias
Hi Julien, I am not sure if my answer applies to the streaming part of your question. However, in batch processing, if you want to perform multiple aggregations over an RDD in a single pass, a common approach is to use multiple aggregators (a.k.a. tuple monoids); see below an example from
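
The tuple-monoid idea mentioned in this message can be sketched without Spark or algebird. The following plain-Python illustration is an assumption about what such an approach might look like; it is not the example the message refers to, and the names `zero`, `lift`, `combine`, and `aggregate` are hypothetical. Each value is lifted into a tuple of partial aggregates (count, sum, min, max), and tuples are merged component-wise, so one pass over the data yields every aggregate.

```python
from functools import reduce

# Identity element of the product monoid: (count, sum, min, max).
zero = (0, 0.0, float("inf"), float("-inf"))

def lift(x):
    """Turn one value into a tuple of partial aggregates."""
    return (1, float(x), x, x)

def combine(a, b):
    """Merge two partial-aggregate tuples component-wise (associative)."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

def aggregate(values):
    """Compute count, sum, min and max in a single pass over values."""
    return reduce(combine, map(lift, values), zero)

# All four aggregates come out of one traversal of the input.
count, total, lowest, highest = aggregate([3, 1, 4, 1, 5])
```

Because `combine` is associative and `zero` is its identity, partial results computed on separate partitions can be merged in any grouping, which is why this pattern should fit an operation such as Spark's `RDD.aggregate`.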

Re: Several Aggregations on a window function

2017-12-18 Thread Julien CHAMP
I've looked at several solutions, but I can't find an efficient way to compute many window functions (optimized computation or efficient parallelism). Am I the only one interested in this? Regards, Julien On Fri, 15 Dec 2017 at 21:34, Julien CHAMP

Re: Several Aggregations on a window function

2017-12-15 Thread Julien CHAMP
Maybe I should consider something like Impala? On Fri, 15 Dec 2017 at 11:32, Julien CHAMP wrote: > Hi Spark Community members! > > I want to compute several (from 1 to 10) aggregate functions using window > functions on something like 100 columns. > > Instead of doing

Several Aggregations on a window function

2017-12-15 Thread Julien CHAMP
Hi Spark Community members! I want to compute several (from 1 to 10) aggregate functions using window functions on something like 100 columns. Instead of making several passes over the data to compute each aggregate function, is there a way to do this efficiently? Currently it seems that doing val
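
To make the goal of the question concrete, here is a minimal plain-Python sketch of computing several aggregates for every column over a sliding window while traversing the rows only once. It is an illustration under stated assumptions, not Spark code and not the approach from the original message: the function name `windowed_aggregates`, the choice of sum/min/max, and the "current row plus preceding rows" window shape are all hypothetical.

```python
from collections import deque

def windowed_aggregates(rows, size):
    """For each row, return (sum, min, max) per column over a window
    made of the current row and up to size - 1 preceding rows."""
    window = deque(maxlen=size)  # oldest row drops out automatically
    results = []
    for row in rows:  # single traversal of the input rows
        window.append(row)
        columns = zip(*window)  # transpose the window into columns
        # Every aggregate of every column is computed from the same window.
        results.append([(sum(c), min(c), max(c)) for c in columns])
    return results

# Two columns, window of the current row and one preceding row.
rows = [(1, 10), (2, 20), (3, 5)]
per_row = windowed_aggregates(rows, size=2)
```

The sketch recomputes each window from scratch, so its cost grows with the window size; it only shows that all aggregates can share one window and one pass instead of one pass per aggregate function.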