Hi Julien,

I am not sure whether my answer applies to the streaming part of your question.
However, in batch processing, if you want to perform multiple aggregations
over an RDD in a single pass, a common approach is to compose multiple
aggregators (a.k.a. tuple monoids). See this example from Algebird:

https://github.com/twitter/scalding/wiki/Aggregation-using-Algebird-Aggregators#composing-aggregators
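
To make the idea concrete without pulling in Algebird, here is a minimal
hand-rolled sketch of a tuple monoid. The names Stats, zero, add, merge and
summarize are hypothetical, chosen only for illustration, and the local
foldLeft merely stands in for a distributed pass. It covers plain
aggregations only, not the sliding-window case.

```scala
// Hand-rolled "tuple monoid": min, max, sum and count are carried together,
// so a single traversal of the data yields all of them.
final case class Stats(min: Double, max: Double, sum: Double, count: Long) {
  // Mean derived from the carried sum and count; NaN when nothing was seen.
  def mean: Double = if (count == 0) Double.NaN else sum / count
}

object Stats {
  // Identity element: merging it with any Stats returns that Stats unchanged.
  val zero: Stats =
    Stats(Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0L)

  // Fold one value into a partial result (the seqOp role in RDD.aggregate).
  def add(s: Stats, x: Double): Stats =
    Stats(math.min(s.min, x), math.max(s.max, x), s.sum + x, s.count + 1)

  // Merge two partial results (the combOp role in RDD.aggregate).
  def merge(a: Stats, b: Stats): Stats =
    Stats(math.min(a.min, b.min), math.max(a.max, b.max),
      a.sum + b.sum, a.count + b.count)

  // Local stand-in for a distributed pass: one foldLeft over the values.
  def summarize(xs: Seq[Double]): Stats = xs.foldLeft(zero)(add)
}
```

On an RDD[Double], the same three pieces should plug into
rdd.aggregate(Stats.zero)(Stats.add, Stats.merge), or into aggregateByKey
for per-key results. Algebird's composed aggregators package the same
pattern in a reusable form.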

Best,
Anastasios

On Mon, Dec 18, 2017 at 10:38 AM, Julien CHAMP <jch...@tellmeplus.com>
wrote:

> I've looked at several solutions, but I can't find an efficient way to
> compute many window functions (either optimized computation or efficient
> parallelism).
> Am I the only one interested in this?
>
>
> Regards,
>
> Julien
>
> On Fri, Dec 15, 2017 at 9:34 PM, Julien CHAMP <jch...@tellmeplus.com>
> wrote:
>
>> Maybe I should consider something like Impala?
>>
>> On Fri, Dec 15, 2017 at 11:32 AM, Julien CHAMP <jch...@tellmeplus.com>
>> wrote:
>>
>>> Hi Spark community members!
>>>
>>> I want to compute several (from 1 to 10) aggregate functions using window
>>> functions on something like 100 columns.
>>>
>>> Instead of doing several passes over the data to compute each aggregate
>>> function, is there a way to do this efficiently?
>>>
>>>
>>>
>>> Currently it seems that doing
>>>
>>>
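>>> import org.apache.spark.sql.expressions.Window
>>> import org.apache.spark.sql.functions.{avg, max, min}
>>>
>>> val tw =
>>>   Window
>>>     .partitionBy("id")
>>>     .orderBy("date")
>>>     .rangeBetween(-8035200000L, 0)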
>>>
>>> and then
>>>
>>> x
>>>    .withColumn("agg1", max("col").over(tw))
>>>    .withColumn("agg2", min("col").over(tw))
>>>    .withColumn("aggX", avg("col").over(tw))
>>>
>>>
>>> is not really efficient :/
>>> It seems that it iterates over the whole column for each aggregation. Am
>>> I right?
>>>
>>> Is there a way to compute all the required operations on a column in
>>> a single pass?
>>> Even better, to compute all the required operations on ALL columns in
>>> a single pass?
>>>
>>> Thx for your Future[Answers]
>>>
>>> Julien
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Julien CHAMP — Data Scientist
>>>
>>>
>>> *Web : **www.tellmeplus.com* <http://tellmeplus.com/> — *Email : 
>>> **jch...@tellmeplus.com
>>> <jch...@tellmeplus.com>*
>>>
>>> *Phone ** : **06 89 35 01 89 <0689350189> * — *LinkedIn* :  *here*
>>> <https://www.linkedin.com/in/julienchamp>
>>>
>>> TellMePlus S.A — Predictive Objects
>>>
>>> *Paris* : 7 rue des Pommerots, 78400 Chatou
>>> <https://maps.google.com/?q=7+rue+des+Pommerots,+78400+Chatou&entry=gmail&source=g>
>>> *Montpellier* : 51 impasse des églantiers, 34980 St Clément de Rivière
>>> <https://maps.google.com/?q=51+impasse+des+%C3%A9glantiers,+34980+St+Cl%C3%A9ment+de+Rivi%C3%A8re&entry=gmail&source=g>
>>>
>
> This email may contain confidential and/or privileged information for the
> intended recipient. If you are not the intended recipient, please contact
> the sender and delete all copies.
>



-- 
-- Anastasios Zouzias
<a...@zurich.ibm.com>
