Hi All,

I had already solved this with DataFrames and Spark SQL and was wondering how to
do it with a Scala RDD. I have just got an answer and only need to check my
results against Spark SQL. Thanks all for your time.

I am trying to compute a moving average with a Scala RDD, grouped by key.


Input:
-987~20150728~100
-987~20150729~50
-987~20150730~-100
-987~20150804~200
-987~20150807~-300
-987~20150916~100


val test = sc.textFile(file)
  .keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(1), x(2)))
  .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
  .collect()                                      // bring everything to the driver so sliding can be used
  .sliding(2, 1)                                  // window of 2 rows, step 1
  .map(w => (w(0)._1,                             // (account, date) of the first row in the window
             w(1)._2,                             // balance (amount of the second row)
             w.foldLeft(0.0)(_ + _._2 / w.size),  // window average
             w.foldLeft(0.0)(_ + _._2)))          // window sum
  .foreach(println)

Output:

(accountkey, date), balance_of_account, daily_average, sum_based_on_window

((-987,20150728),50.0,75.0,150.0)
((-987,20150729),-100.0,-25.0,-50.0)
((-987,20150730),200.0,50.0,100.0)
((-987,20150804),-300.0,-50.0,-100.0)
((-987,20150807),100.0,-100.0,-200.0)
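
For the record, here is a per-key sketch of the same idea that keeps the work on
the executors instead of collecting the whole RDD to the driver. It is only a
sketch: it assumes the column order from the sample input (account~date~amount),
a window size of 2 to match the output above, and that all rows for one account
fit in memory on a single executor.

val perKey = sc.textFile(file)
  .map(_.split("\\~"))
  .map(cols => (cols(0), (cols(1), cols(2).toDouble)))  // (account, (date, amount))
  .groupByKey()                                          // all rows of one account end up together
  .flatMapValues { rows =>
    rows.toSeq
      .sortBy(_._1)                                      // order by date within the account
      .sliding(2, 1)                                      // same window of 2, step 1, but per key
      .map { w =>
        val sum = w.map(_._2).sum
        (w.last._1, sum / w.size, sum)                   // (date, moving average, window sum)
      }
  }

perKey.foreach(println)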


The book below was written for Hadoop MapReduce; it has a solution for the
moving average problem, but it is in Java.

https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch06.html


SQL:


SELECT date, balance,
       SUM(balance) OVER (ORDER BY date
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS daily_balance
FROM table
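
For comparison, this is roughly what the same window looks like with the
DataFrame API. Again only a sketch: it assumes the data has been loaded into a
DataFrame called df with columns account, date and balance, and the partitionBy
on account is an addition so the running total restarts per account.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val w = Window
  .partitionBy("account")          // one running total per account
  .orderBy("date")
  .rowsBetween(Long.MinValue, 0)   // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

df.withColumn("daily_balance", sum("balance").over(w))
  .show()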

Thanks
Sri



On Sun, Jul 31, 2016 at 11:54 AM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> Check also this
> <https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 31 July 2016 at 19:49, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Tuple
>>
>> [Lscala.Tuple2;@65e4cb84
>>
>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> What's the result type of sliding(2,1)?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>> <kali.tumm...@gmail.com> wrote:
>>> > tried this no luck, wht is non-empty iterator here ?
>>> >
>>> > OP:-
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> >
>>> >
>>> > sc.textFile(file).keyBy(x => x.split("\\~") (0))
>>> >   .map(x => x._2.split("\\~"))
>>> >   .map(x => (x(0),x(2)))
>>> >     .map { case (key,value) =>
>>> (key,value.toArray.toSeq.sliding(2,1).map(x
>>> > => x.sum/x.size))}.foreach(println)
>>> >
>>> >
>>> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
>>> > <kali.tumm...@gmail.com> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I managed to write using sliding function but can it get key as well
>>> in my
>>> >> output ?
>>> >>
>>> >> sc.textFile(file).keyBy(x => x.split("\\~") (0))
>>> >>       .map(x => x._2.split("\\~"))
>>> >>       .map(x => (x(2).toDouble)).toArray().sliding(2,1).map(x =>
>>> >> (x,x.size)).foreach(println)
>>> >>
>>> >>
>>> >> at the moment my output:-
>>> >>
>>> >> 75.0
>>> >> -25.0
>>> >> 50.0
>>> >> -50.0
>>> >> -100.0
>>> >>
>>> >> I want with key how to get moving average output based on key ?
>>> >>
>>> >>
>>> >> 987,75.0
>>> >> 987,-25
>>> >> 987,50.0
>>> >>
>>> >> Thanks
>>> >> Sri
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala
>>> >> <kali.tumm...@gmail.com> wrote:
>>> >>>
>>> >>> for knowledge just wondering how to write it up in scala or spark
>>> RDD.
>>> >>>
>>> >>> Thanks
>>> >>> Sri
>>> >>>
>>> >>> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski <ja...@japila.pl>
>>> >>> wrote:
>>> >>>>
>>> >>>> Why?
>>> >>>>
>>> >>>> Pozdrawiam,
>>> >>>> Jacek Laskowski
>>> >>>> ----
>>> >>>> https://medium.com/@jaceklaskowski/
>>> >>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>> >>>>
>>> >>>>
>>> >>>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>>> >>>> <kali.tumm...@gmail.com> wrote:
>>> >>>> > Hi All,
>>> >>>> >
>>> >>>> > I managed to write business requirement in spark-sql and hive I am
>>> >>>> > still
>>> >>>> > learning scala how this below sql be written using spark RDD not
>>> spark
>>> >>>> > data
>>> >>>> > frames.
>>> >>>> >
>>> >>>> > SELECT DATE,balance,
>>> >>>> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING
>>> AND
>>> >>>> > CURRENT ROW) daily_balance
>>> >>>> > FROM  table
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> > --
>>> >>>> > View this message in context:
>>> >>>> >
>>> http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
>>> >>>> > Sent from the Apache Spark User List mailing list archive at
>>> >>>> > Nabble.com.
>>> >>>> >
>>> >>>> >
>>> ---------------------------------------------------------------------
>>> >>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> >>>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Thanks & Regards
>>> >>> Sri Tummala
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Thanks & Regards
>>> >> Sri Tummala
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Thanks & Regards
>>> > Sri Tummala
>>> >
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>> Sri Tummala
>>
>>
>


-- 
Thanks & Regards
Sri Tummala
