Hi All,

I already solved this using DataFrames and Spark SQL, but I was wondering how to solve it with a Scala RDD. I just got the answer and need to check my results against Spark SQL. Thanks, all, for your time.
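For reference, here is a minimal per-key sketch of both calculations for the problem quoted below. It is a sketch under assumptions, not the exact code this thread converged on; the path balances.txt and the object name MovingAverageRDD are placeholders. It groups each account's rows with groupByKey, sorts them by date (yyyyMMdd strings sort chronologically), then uses scanLeft for the running balance from the SQL below and sliding(2) for a two-row moving average, keeping the account key in the output:

import org.apache.spark.{SparkConf, SparkContext}

object MovingAverageRDD {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("moving-average").setMaster("local[*]"))

    // Placeholder path; records look like: -987~20150728~100
    val parsed = sc.textFile("balances.txt")
      .map(_.split("~"))
      .map(f => (f(0), (f(1), f(2).toDouble)))      // (account, (date, amount))

    // Bring each account's rows together and sort them by date.
    // groupByKey materialises one account's history on a single executor,
    // which is acceptable while each key has only a handful of rows.
    val byAccount = parsed.groupByKey().mapValues(_.toSeq.sortBy(_._1))

    // Running balance per account, i.e. SUM(balance) OVER (ORDER BY date
    // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    val runningBalance = byAccount.flatMapValues { rows =>
      val totals = rows.map(_._2).scanLeft(0.0)(_ + _).tail
      rows.zip(totals).map { case ((date, amount), total) => (date, amount, total) }
    }

    // Two-row moving average per account, with the key kept in the output.
    val movingAverage = byAccount.flatMapValues { rows =>
      rows.sliding(2).map(w => (w.last._1, w.map(_._2).sum / w.size))
    }

    runningBalance.foreach(println)   // e.g. (-987,(20150728,100.0,100.0))
    movingAverage.foreach(println)    // e.g. (-987,(20150729,75.0))

    sc.stop()
  }
}

For long per-key histories, org.apache.spark.mllib.rdd.RDDFunctions provides an RDD-level sliding() that avoids collecting a key's rows in one place, though it slides over the RDD's overall order and does not respect key boundaries on its own.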
I am trying to solve a moving average using a Scala RDD grouped by key.

Input:

-987~20150728~100
-987~20150729~50
-987~20150730~-100
-987~20150804~200
-987~20150807~-300
-987~20150916~100

val test = sc.textFile(file)
  .keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(1), x(2)))
  .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
  .collect()            // pull to the driver; sliding below is a local Scala operation
  .sliding(2, 1)
  .map(x => (x(0)._1, x(1)._2, x.foldLeft(0.0)(_ + _._2 / x.size), x.foldLeft(0.0)(_ + _._2)))
  .foreach(println)

Output ((account key, date), balance_of_account, daily_average, sum_based_on_window):

((-987,20150728),50.0,75.0,150.0)
((-987,20150729),-100.0,-25.0,-50.0)
((-987,20150730),200.0,50.0,100.0)
((-987,20150804),-300.0,-50.0,-100.0)
((-987,20150807),100.0,-100.0,-200.0)

The book below is written for Hadoop MapReduce and has a solution for the moving average, but it is in Java:
https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch06.html

SQL:

SELECT DATE, balance,
       SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) daily_balance
FROM table

Thanks
Sri

On Sun, Jul 31, 2016 at 11:54 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Check also this:
> https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 31 July 2016 at 19:49, sri hari kali charan Tummala
> <kali.tumm...@gmail.com> wrote:
>
>> Tuple
>>
>> [Lscala.Tuple2;@65e4cb84
>>
>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> What's the result type of sliding(2,1)?
>>>
>>> Regards,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>> <kali.tumm...@gmail.com> wrote:
>>> > Tried this, no luck. What is a non-empty iterator here?
>>> >
>>> > Output:
>>> >
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> >
>>> > sc.textFile(file).keyBy(x => x.split("\\~")(0))
>>> >   .map(x => x._2.split("\\~"))
>>> >   .map(x => (x(0), x(2)))
>>> >   .map { case (key, value) => (key, value.toArray.toSeq.sliding(2, 1).map(x => x.sum / x.size)) }
>>> >   .foreach(println)
>>> >
>>> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
>>> > <kali.tumm...@gmail.com> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I managed to write it using the sliding function, but can I get the key as well in my output?
>>> >> sc.textFile(file).keyBy(x => x.split("\\~")(0))
>>> >>   .map(x => x._2.split("\\~"))
>>> >>   .map(x => x(2).toDouble)
>>> >>   .toArray()
>>> >>   .sliding(2, 1)
>>> >>   .map(x => (x, x.size))
>>> >>   .foreach(println)
>>> >>
>>> >> At the moment my output is:
>>> >>
>>> >> 75.0
>>> >> -25.0
>>> >> 50.0
>>> >> -50.0
>>> >> -100.0
>>> >>
>>> >> I want it with the key. How do I get the moving-average output based on the key?
>>> >>
>>> >> 987,75.0
>>> >> 987,-25
>>> >> 987,50.0
>>> >>
>>> >> Thanks
>>> >> Sri
>>> >>
>>> >> On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala
>>> >> <kali.tumm...@gmail.com> wrote:
>>> >>>
>>> >>> For knowledge, I am just wondering how to write it in Scala as a Spark RDD.
>>> >>>
>>> >>> Thanks
>>> >>> Sri
>>> >>>
>>> >>> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> >>>>
>>> >>>> Why?
>>> >>>>
>>> >>>> Regards,
>>> >>>> Jacek Laskowski
>>> >>>> ----
>>> >>>> https://medium.com/@jaceklaskowski/
>>> >>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>>
>>> >>>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>>> >>>> <kali.tumm...@gmail.com> wrote:
>>> >>>> > Hi All,
>>> >>>> >
>>> >>>> > I managed to write the business requirement in Spark SQL and Hive. I am
>>> >>>> > still learning Scala; how would the SQL below be written using a Spark
>>> >>>> > RDD, not Spark DataFrames?
>>> >>>> >
>>> >>>> > SELECT DATE, balance,
>>> >>>> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
>>> >>>> > CURRENT ROW) daily_balance
>>> >>>> > FROM table
>>> >>>> >
>>> >>>> > --
>>> >>>> > View this message in context:
>>> >>>> > http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
>>> >>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> >>>> >
>>> >>>> > ---------------------------------------------------------------------
>>> >>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> >>>
>>> >>> --
>>> >>> Thanks & Regards
>>> >>> Sri Tummala
>>> >>
>>> >> --
>>> >> Thanks & Regards
>>> >> Sri Tummala
>>> >
>>> > --
>>> > Thanks & Regards
>>> > Sri Tummala
>>
>> --
>> Thanks & Regards
>> Sri Tummala

--
Thanks & Regards
Sri Tummala
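A note on the (-987,non-empty iterator) output earlier in the thread: sliding(2,1) on a Scala collection returns a lazy Iterator, and in the Scala versions Spark used at the time an Iterator's toString is simply "non-empty iterator"; the windows have to be materialised (for example with toList) before printing. Likewise, [Lscala.Tuple2;@65e4cb84 is just the default toString of an Array of Tuple2. A small standalone sketch, independent of Spark:

object SlidingIterator extends App {
  val amounts = Seq(100.0, 50.0, -100.0)

  // sliding returns a lazy Iterator, so println shows "non-empty iterator"
  // rather than the windows themselves.
  println(amounts.sliding(2, 1))

  // Materialise the iterator to see the actual moving averages.
  val averages = amounts.sliding(2, 1).map(w => w.sum / w.size).toList
  println(averages)   // List(75.0, -25.0)
}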