Hi Julien,

I am not sure whether my answer applies to the streaming part of your question. In batch processing, however, if you want to perform multiple aggregations over an RDD in a single pass, a common approach is to compose multiple aggregators (a.k.a. tuple monoids). See this example from Algebird:
https://github.com/twitter/scalding/wiki/Aggregation-using-Algebird-Aggregators#composing-aggregators

Best,
Anastasios

On Mon, Dec 18, 2017 at 10:38 AM, Julien CHAMP <jch...@tellmeplus.com> wrote:

> I've been looking at several solutions, but I can't find an efficient way
> to compute many window functions (optimized computation or efficient
> parallelism).
> Am I the only one interested in this?
>
> Regards,
>
> Julien
>
> On Fri, Dec 15, 2017 at 21:34, Julien CHAMP <jch...@tellmeplus.com> wrote:
>
>> Maybe I should consider something like Impala?
>>
>> On Fri, Dec 15, 2017 at 11:32, Julien CHAMP <jch...@tellmeplus.com> wrote:
>>
>>> Hi Spark Community members!
>>>
>>> I want to compute several (from 1 to 10) aggregate functions using window
>>> functions on something like 100 columns.
>>>
>>> Instead of making several passes over the data to compute each aggregate
>>> function, is there a way to do this efficiently?
>>>
>>> Currently it seems that doing
>>>
>>>     val tw =
>>>       Window
>>>         .orderBy("date")
>>>         .partitionBy("id")
>>>         .rangeBetween(-8035200000L, 0)
>>>
>>> and then
>>>
>>>     x
>>>       .withColumn("agg1", max("col").over(tw))
>>>       .withColumn("agg2", min("col").over(tw))
>>>       .withColumn("aggX", avg("col").over(tw))
>>>
>>> is not really efficient :/
>>> It seems that it iterates over the whole column for each aggregation. Am
>>> I right?
>>>
>>> Is there a way to compute all the required operations on a column with
>>> a single pass?
>>> Even better, to compute all the required operations on ALL columns with
>>> a single pass?
>>>
>>> Thx for your Future[Answers]
>>>
>>> Julien
>>>
>>> --
>>> Julien CHAMP - Data Scientist
>>>
>>> Web: www.tellmeplus.com <http://tellmeplus.com/> - Email: jch...@tellmeplus.com
>>> Phone: 06 89 35 01 89 - LinkedIn: here <https://www.linkedin.com/in/julienchamp>
>>>
>>> TellMePlus S.A - Predictive Objects
>>>
>>> Paris: 7 rue des Pommerots, 78400 Chatou
>>> <https://maps.google.com/?q=7+rue+des+Pommerots,+78400+Chatou&entry=gmail&source=g>
>>> Montpellier: 51 impasse des églantiers, 34980 St Clément de Rivière
>>> <https://maps.google.com/?q=51+impasse+des+%C3%A9glantiers,+34980+St+Cl%C3%A9ment+de+Rivi%C3%A8re&entry=gmail&source=g>
>
> This email may contain confidential and/or privileged information for the
> intended recipient. If you are not the intended recipient, please contact
> the sender and delete all copies.

--
Anastasios Zouzias
<a...@zurich.ibm.com>
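
P.S. To make the "tuple monoid" idea from the reply at the top of this thread more concrete, here is a minimal plain-Scala sketch. It uses neither Algebird nor Spark; the names Stats, zero, add, merge and SinglePassDemo are made up for illustration. It only shows how several aggregates (min, max, sum, count) can be carried in one accumulator so that a single traversal produces all of them. It does not address sliding windows.

```scala
// Hypothetical helper types, not part of Algebird or Spark.
// One accumulator carries several aggregates at once.
final case class Stats(min: Double, max: Double, sum: Double, count: Long) {
  // Mean derived from the carried sum and count.
  def mean: Double = if (count == 0L) Double.NaN else sum / count
}

object Stats {
  // Identity element: merging it with any Stats returns that Stats unchanged.
  val zero: Stats =
    Stats(Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0L)

  // Fold a single value into the running aggregates.
  def add(acc: Stats, x: Double): Stats =
    Stats(math.min(acc.min, x), math.max(acc.max, x), acc.sum + x, acc.count + 1L)

  // Combine two partial results, e.g. computed on two separate chunks of data.
  def merge(a: Stats, b: Stats): Stats =
    Stats(math.min(a.min, b.min), math.max(a.max, b.max), a.sum + b.sum, a.count + b.count)
}

object SinglePassDemo {
  val values: Seq[Double] = Seq(3.0, 1.0, 4.0, 1.0, 5.0)

  // One traversal yields min, max, sum and count together.
  val stats: Stats = values.foldLeft(Stats.zero)(Stats.add)

  // Aggregating two chunks independently and merging gives the same result.
  val (left, right) = values.splitAt(2)
  val merged: Stats = Stats.merge(
    left.foldLeft(Stats.zero)(Stats.add),
    right.foldLeft(Stats.zero)(Stats.add)
  )

  def main(args: Array[String]): Unit = {
    println(s"single pass: $stats, mean = ${stats.mean}")
    println(s"merged chunks: $merged")
  }
}
```

On an RDD[Double], I would expect the same two functions to plug into rdd.aggregate(Stats.zero)(Stats.add, Stats.merge), though I have not tested that against your data. Algebird's composed aggregators give you this pattern without writing the accumulator by hand.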