// 2-row sliding windows over the collected rows: (key, date, window average, window sum).
val test = sc.textFile(file)
  .keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(1), x(2)))
  .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
  .collect()                // toArray is deprecated; collect the pairs to the driver
  .sliding(2, 1)            // windows of 2 rows, step 1
  .map(x => (x(0)._1, x(1)._2, x.foldLeft(0.0)(_ + _._2 / x.size), x.foldLeft(0.0)(_ + _._2)))
  .foreach(println)
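
The sliding above runs over the whole collected array, so windows can cross
account boundaries. Below is a minimal sketch (my own, untested against the
data above) that keeps the window per account; the name `perKey` and the
sort-by-date step are assumptions, not part of the original code:

// Group rows by account, then slide within each account's date-sorted rows.
val perKey = sc.textFile(file)
  .map(_.split("\\~"))
  .map(r => (r(0), (r(1), r(2).toDouble)))      // (account, (date, amount))
  .groupByKey()                                 // gather each account's rows together
  .flatMap { case (account, rows) =>
    rows.toSeq.sortBy(_._1)                     // yyyymmdd strings sort correctly
      .sliding(2, 1)                            // 2-row windows, step 1
      .map { w =>
        val sum = w.map(_._2).sum
        (account, w.last._1, sum / w.size, sum) // (key, date, window avg, window sum)
      }
  }
perKey.collect().foreach(println)

Note that groupByKey pulls all of a key's rows onto one executor, which is
fine for data this small but worth knowing at scale.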


On Sun, Jul 31, 2016 at 12:15 PM, sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi All,
>
> I already solved it using DataFrames and Spark SQL; I was wondering how to
> solve it with a Scala RDD. I just got the answer and need to check my
> results against Spark SQL. Thanks all for your time.
>
> I am trying to compute a moving average using a Scala RDD, grouped by key.
>
>
> Input:
> -987~20150728~100
> -987~20150729~50
> -987~20150730~-100
> -987~20150804~200
> -987~20150807~-300
> -987~20150916~100
>
>
> val test = sc.textFile(file).keyBy(x => x.split("\\~")(0))
>   .map(x => x._2.split("\\~"))
>   .map(x => (x(0), x(1), x(2)))
>   .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
>   .mapValues(x => x)
>   .toArray.sliding(2, 1)
>   .map(x => (x(0)._1, x(1)._2, x.foldLeft(0.0)(_ + _._2 / x.size), x.foldLeft(0.0)(_ + _._2)))
>   .foreach(println)
>
> Output:
>
> account_key, date, balance_of_account, daily_average, sum_based_on_window
>
> ((-987,20150728),50.0,75.0,150.0)
> ((-987,20150729),-100.0,-25.0,-50.0)
> ((-987,20150730),200.0,50.0,100.0)
> ((-987,20150804),-300.0,-50.0,-100.0)
> ((-987,20150807),100.0,-100.0,-200.0)
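>
> For example, the first window pairs 20150728 (100.0) with 20150729 (50.0):
> window sum = 100.0 + 50.0 = 150.0 and average = 150.0 / 2 = 75.0; the
> printed 50.0 is the second row's amount.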
>
>
> The book below is written for Hadoop MapReduce; it has a solution for the
> moving average, but it's in Java.
>
>
> https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch06.html
>
>
> SQL:
>
>
> SELECT DATE, balance,
>        SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING
>                           AND CURRENT ROW) daily_balance
> FROM table
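>
> For reference, a sketch of what the DataFrame version might look like
> (assuming Spark 1.4+ window functions; `df` and the column names account,
> date and balance are assumptions, and PARTITION BY account is added so
> balances do not mix across accounts):
>
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.sum
>
> // UNBOUNDED PRECEDING .. CURRENT ROW, per account, ordered by date
> val w = Window.partitionBy("account").orderBy("date")
>   .rowsBetween(Long.MinValue, 0)
>
> df.withColumn("daily_balance", sum("balance").over(w)).show()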
>
> Thanks
> Sri
>
>
>
> On Sun, Jul 31, 2016 at 11:54 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Check also this:
>> https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 31 July 2016 at 19:49, sri hari kali charan Tummala <
>> kali.tumm...@gmail.com> wrote:
>>
>>> It is an Array of Tuple2, which prints as:
>>>
>>> [Lscala.Tuple2;@65e4cb84
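>>>
>>> A quick illustrative REPL check of what sliding itself returns (my own
>>> example, not from the job above):
>>>
>>> scala> Array((1, 2.0), (3, 4.0), (5, 6.0)).sliding(2, 1)
>>> res0: Iterator[Array[(Int, Double)]] = non-empty iterator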
>>>
>>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski <ja...@japila.pl>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What's the result type of sliding(2,1)?
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>>
>>>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>>> <kali.tumm...@gmail.com> wrote:
>>>> > Tried this, no luck. What is the "non-empty iterator" here?
>>>> >
>>>> > Output:
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> >
>>>> >
>>>> > sc.textFile(file).keyBy(x => x.split("\\~")(0))
>>>> >   .map(x => x._2.split("\\~"))
>>>> >   .map(x => (x(0), x(2)))
>>>> >   .map { case (key, value) =>
>>>> >     // sliding(2, 1) returns an Iterator, and an unforced Iterator
>>>> >     // prints as "non-empty iterator"; force it (e.g. .toList) to see values
>>>> >     (key, value.toArray.toSeq.sliding(2, 1).map(x => x.sum / x.size))
>>>> >   }.foreach(println)
>>>> >
>>>> >
>>>> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
>>>> > <kali.tumm...@gmail.com> wrote:
>>>> >>
>>>> >> Hi All,
>>>> >>
>>>> >> I managed to write it using the sliding function, but can I get the
>>>> >> key as well in my output?
>>>> >>
>>>> >> sc.textFile(file).keyBy(x => x.split("\\~")(0))
>>>> >>   .map(x => x._2.split("\\~"))
>>>> >>   .map(x => x(2).toDouble)      // note: this drops the key created by keyBy
>>>> >>   .toArray().sliding(2, 1)
>>>> >>   .map(x => (x, x.size)).foreach(println)
>>>> >>
>>>> >>
>>>> >> At the moment, my output is:
>>>> >>
>>>> >> 75.0
>>>> >> -25.0
>>>> >> 50.0
>>>> >> -50.0
>>>> >> -100.0
>>>> >>
>>>> >> I want it with the key. How do I get the moving-average output per key?
>>>> >>
>>>> >>
>>>> >> 987,75.0
>>>> >> 987,-25
>>>> >> 987,50.0
>>>> >>
>>>> >> Thanks
>>>> >> Sri
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala
>>>> >> <kali.tumm...@gmail.com> wrote:
>>>> >>>
>>>> >>> For knowledge's sake, I am just wondering how to write it in Scala
>>>> >>> or a Spark RDD.
>>>> >>>
>>>> >>> Thanks
>>>> >>> Sri
>>>> >>>
>>>> >>> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski <ja...@japila.pl>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Why?
>>>> >>>>
>>>> >>>> Pozdrawiam,
>>>> >>>> Jacek Laskowski
>>>> >>>> ----
>>>> >>>> https://medium.com/@jaceklaskowski/
>>>> >>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> >>>> Follow me at https://twitter.com/jaceklaskowski
>>>> >>>>
>>>> >>>>
>>>> >>>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>>>> >>>> <kali.tumm...@gmail.com> wrote:
>>>> >>>> > Hi All,
>>>> >>>> >
>>>> >>>> > I managed to write the business requirement in Spark SQL and Hive.
>>>> >>>> > I am still learning Scala: how could the SQL below be written using
>>>> >>>> > a Spark RDD, not Spark DataFrames?
>>>> >>>> >
>>>> >>>> > SELECT DATE, balance,
>>>> >>>> >        SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED
>>>> >>>> >                           PRECEDING AND CURRENT ROW) daily_balance
>>>> >>>> > FROM table
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Thanks & Regards
>>>> >>> Sri Tummala
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Thanks & Regards
>>>> >> Sri Tummala
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Thanks & Regards
>>>> > Sri Tummala
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Sri Tummala
>>>
>>>
>>
>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


-- 
Thanks & Regards
Sri Tummala
