Re: Transformation not happening for reduceByKey or GroupByKey
Hi All,

Please find below the fix for users who are following this mail chain, with the respective non-working and working snippets:

*reduceByKey: Non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf()
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array() is empty

*reduceByKey: Working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array((0,3),(1,5),(2,4))

Regards,
Satish Chandra

On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j wrote:

> Hi All,
> Currently using DSE 4.7 and Spark 1.2.2.
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 7:30 PM, java8964 wrote:
>
>> What version of Spark are you using, or does it come with DSE 4.7?
>>
>> We just cannot reproduce this in Spark.
>>
>> yzhang@localhost>$ more test.spark
>> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs.reduceByKey((x,y) => x + y).collect
>> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Scala version 2.10.4
>> Spark context available as sc.
>> SQL context available as sqlContext.
>> Loading test.spark...
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
>> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Yong
>>
>> Date: Fri, 21 Aug 2015 19:24:09 +0530
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: jsatishchan...@gmail.com
>> To: abhis...@tetrationanalytics.com
>> CC: user@spark.apache.org
>>
>> Hi Abhishek,
>>
>> I have even tried that, but rdd2 is empty.
>>
>> Regards,
>> Satish
>>
>> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:
>>
>> You had:
>>
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>>
>> Maybe try:
>>
>> > rdd2 = RDD.reduceByKey((x,y) => x+y)
>> > rdd2.take(3)
>>
>> -Abhishek-
>>
>> On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:
>>
>> > Hi All,
>> > I have data in an RDD as mentioned below:
>> >
>> > RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>> >
>> > I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>> >
>> > Code:
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>> >
>> > Result in console:
>> > RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>> > res: Array[(Int,Int)] = Array()
>> >
>> > Command as mentioned:
>> >
>> > dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>> >
>> > Please let me know what is missing in my code, as my resultant Array is empty.
>> >
>> > Regards,
>> > Satish
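A note for readers applying the fix above: "spark.driver.allowMultipleContexts" only suppresses Spark's guard against creating a second SparkContext in the same JVM. Since spark-shell (and "dse spark", which wraps it) already provides a context bound to "sc" before loading a script passed with -i, a likely cleaner alternative is for the script to reuse the context already in scope rather than construct its own. A minimal sketch of that approach, assuming the script is loaded with spark-shell -i (the val names are illustrative, not from the original thread):

// test.spark: a script intended for `spark-shell -i test.spark`.
// The shell has already created a SparkContext and bound it to `sc`,
// so the script simply uses the context that is in scope.
val dataRDD = sc.makeRDD(Seq((0, 1), (0, 2), (1, 2), (1, 3), (2, 4)))

// reduceByKey is a transformation: nothing executes until an action
// such as collect forces evaluation.
val summed = dataRDD.reduceByKey(_ + _)
println(summed.collect.mkString(", "))  // expected: (0,3), (1,5), (2,4)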
Re: Transformation not happening for reduceByKey or GroupByKey
Hi All,

Currently using DSE 4.7 and Spark 1.2.2.

Regards,
Satish

On Fri, Aug 21, 2015 at 7:30 PM, java8964 wrote:

> What version of Spark are you using, or does it come with DSE 4.7?
>
> We just cannot reproduce this in Spark.
>
> yzhang@localhost>$ more test.spark
> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs.reduceByKey((x,y) => x + y).collect
> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>       /_/
>
> Using Scala version 2.10.4
> Spark context available as sc.
> SQL context available as sqlContext.
> Loading test.spark...
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Yong
>
> Date: Fri, 21 Aug 2015 19:24:09 +0530
> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
> From: jsatishchan...@gmail.com
> To: abhis...@tetrationanalytics.com
> CC: user@spark.apache.org
>
> Hi Abhishek,
>
> I have even tried that, but rdd2 is empty.
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:
>
> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:
>
> > Hi All,
> > I have data in an RDD as mentioned below:
> >
> > RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
> >
> > I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> > res: Array[(Int,Int)] = Array()
> >
> > Command as mentioned:
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
> >
> > Please let me know what is missing in my code, as my resultant Array is empty.
> >
> > Regards,
> > Satish
RE: Transformation not happening for reduceByKey or GroupByKey
What version of Spark are you using, or does it come with DSE 4.7?

We just cannot reproduce this in Spark.

yzhang@localhost>$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect
yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Yong

Date: Fri, 21 Aug 2015 19:24:09 +0530
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: jsatishchan...@gmail.com
To: abhis...@tetrationanalytics.com
CC: user@spark.apache.org

Hi Abhishek,

I have even tried that, but rdd2 is empty.

Regards,
Satish

On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh wrote:

You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-

On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:

> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>
> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command as mentioned:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
Re: Transformation not happening for reduceByKey or GroupByKey
Hi Abhishek,

I have even tried that, but rdd2 is empty.

Regards,
Satish

On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:

> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>
> On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:
>
> > Hi All,
> > I have data in an RDD as mentioned below:
> >
> > RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
> >
> > I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
> >
> > Code:
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
> >
> > Result in console:
> > RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> > res: Array[(Int,Int)] = Array()
> >
> > Command as mentioned:
> >
> > dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
> >
> > Please let me know what is missing in my code, as my resultant Array is empty.
> >
> > Regards,
> > Satish
Re: Transformation not happening for reduceByKey or GroupByKey
You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-

On Aug 20, 2015, at 3:05 AM, satish chandra j wrote:

> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>
> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command as mentioned:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
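The reasoning behind this suggestion: RDDs are immutable, and reduceByKey returns a new RDD rather than modifying the one it is called on, so invoking it without capturing the result leaves the original RDD untouched. A minimal sketch of the difference as it would behave in a plain Spark shell (variable names are illustrative; output order after the shuffle may vary):

val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// The reduced RDD is discarded here; `pairs` itself is unchanged,
// so take(3) still returns the raw, unreduced pairs.
pairs.reduceByKey(_ + _)
pairs.take(3)                  // Array((0,1), (0,2), (1,20))

// Capture the new RDD returned by the transformation instead.
val rdd2 = pairs.reduceByKey(_ + _)
rdd2.take(3)                   // Array((0,3), (1,50), (2,40))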
Re: Transformation not happening for reduceByKey or GroupByKey
Hi All,

Any inputs on the actual problem statement?

Regards,
Satish

On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang wrote:

> Yong, thanks for your reply.
>
> I tried spark-shell -i <script file>, and it works fine for me. Not sure what the difference is from:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> On Fri, Aug 21, 2015 at 7:01 PM, java8964 wrote:
>
>> I believe "spark-shell -i scriptFile" is there. We also use it, at least in Spark 1.3.1.
>>
>> "dse spark" just wraps the "spark-shell" command; underneath, it is just invoking "spark-shell".
>>
>> I don't know too much about the original problem, though.
>>
>> Yong
>>
>> Date: Fri, 21 Aug 2015 18:19:49 +0800
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: zjf...@gmail.com
>> To: jsatishchan...@gmail.com
>> CC: robin.e...@xense.co.uk; user@spark.apache.org
>>
>> Hi Satish,
>>
>> I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.
>>
>> On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <jsatishchan...@gmail.com> wrote:
>>
>> Hi Robin,
>> Yes, it is DSE, but the issue is related to Spark only.
>>
>> Regards,
>> Satish Chandra
>>
>> On Fri, Aug 21, 2015 at 3:06 PM, Robin East wrote:
>>
>> Not sure, never used dse - it’s part of DataStax Enterprise, right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j wrote:
>>
>> Hi Robin,
>> Yes, the piece of code mentioned below works fine in the Spark shell, but the same code, when placed in a script file and executed with -i <script file>, creates an empty RDD:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>
>> I understand I am missing something here, due to which my final RDD does not have the required output.
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j wrote:
>>>
>>> Hi All,
>>> I have data in an RDD as mentioned below:
>>>
>>> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>>>
>>> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>>> res: Array[(Int,Int)] = Array()
>>>
>>> Command as mentioned:
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>>
>>> Please let me know what is missing in my code, as my resultant Array is empty.
>>>
>>> Regards,
>>> Satish
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang
RE: Transformation not happening for reduceByKey or GroupByKey
I believe "spark-shell -i scriptFile" is there. We also use it, at least in Spark 1.3.1. "dse spark" will just wrap "spark-shell" command, underline it is just invoking "spark-shell". I don't know too much about the original problem though. Yong Date: Fri, 21 Aug 2015 18:19:49 +0800 Subject: Re: Transformation not happening for reduceByKey or GroupByKey From: zjf...@gmail.com To: jsatishchan...@gmail.com CC: robin.e...@xense.co.uk; user@spark.apache.org Hi Satish, I don't see where spark support "-i", so suspect it is provided by DSE. In that case, it might be bug of DSE. On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j wrote: HI Robin,Yes, it is DSE but issue is related to Spark only Regards,Satish Chandra On Fri, Aug 21, 2015 at 3:06 PM, Robin East wrote: Not sure, never used dse - it’s part of DataStax Enterprise right? On 21 Aug 2015, at 10:07, satish chandra j wrote: HI Robin,Yes, below mentioned piece or code works fine in Spark Shell but the same when place in Script File and executed with -i it creating an empty RDD scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at :28 scala> pairs.reduceByKey((x,y) => x + y).collectres43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40)) Command: dse spark --master local --jars postgresql-9.4-1201.jar -i I understand, I am missing something here due to which my final RDD does not have as required output Regards,Satish Chandra On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote: This works for me: scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at :28 scala> pairs.reduceByKey((x,y) => x + y).collectres43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40)) On 20 Aug 2015, at 11:05, satish chandra j wrote: HI All,I have data in RDD as mentioned below: RDD : Array[(Int),(Int)] = Array((0,1), (0,2),(1,20),(1,30),(2,40)) I am expecting output as Array((0,3),(1,50),(2,40)) just a sum function on Values for each key Code:RDD.reduceByKey((x,y) => x+y)RDD.take(3) Result in console: RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey at :73res:Array[(Int,Int)] = Array() Command as mentioned dse spark --master local --jars postgresql-9.4-1201.jar -i Please let me know what is missing in my code, as my resultant Array is empty Regards,Satish -- Best Regards Jeff Zhang
Re: Transformation not happening for reduceByKey or GroupByKey
Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.

On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j wrote:

> Hi Robin,
> Yes, it is DSE, but the issue is related to Spark only.
>
> Regards,
> Satish Chandra
>
> On Fri, Aug 21, 2015 at 3:06 PM, Robin East wrote:
>
>> Not sure, never used dse - it’s part of DataStax Enterprise, right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j wrote:
>>
>> Hi Robin,
>> Yes, the piece of code mentioned below works fine in the Spark shell, but the same code, when placed in a script file and executed with -i <script file>, creates an empty RDD:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>
>> I understand I am missing something here, due to which my final RDD does not have the required output.
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j wrote:
>>>
>>> Hi All,
>>> I have data in an RDD as mentioned below:
>>>
>>> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>>>
>>> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>>> res: Array[(Int,Int)] = Array()
>>>
>>> Command as mentioned:
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>>
>>> Please let me know what is missing in my code, as my resultant Array is empty.
>>>
>>> Regards,
>>> Satish

--
Best Regards

Jeff Zhang
Re: Transformation not happening for reduceByKey or GroupByKey
Hi Robin,

Yes, it is DSE, but the issue is related to Spark only.

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise, right?
>
> On 21 Aug 2015, at 10:07, satish chandra j wrote:
>
> Hi Robin,
> Yes, the piece of code mentioned below works fine in the Spark shell, but the same code, when placed in a script file and executed with -i <script file>, creates an empty RDD:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> I understand I am missing something here, due to which my final RDD does not have the required output.
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j wrote:
>>
>> Hi All,
>> I have data in an RDD as mentioned below:
>>
>> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>>
>> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>> res: Array[(Int,Int)] = Array()
>>
>> Command as mentioned:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>
>> Please let me know what is missing in my code, as my resultant Array is empty.
>>
>> Regards,
>> Satish
Re: Transformation not happening for reduceByKey or GroupByKey
Yes, DSE 4.7.

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise, right?
>
> On 21 Aug 2015, at 10:07, satish chandra j wrote:
>
> Hi Robin,
> Yes, the piece of code mentioned below works fine in the Spark shell, but the same code, when placed in a script file and executed with -i <script file>, creates an empty RDD:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Command:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> I understand I am missing something here, due to which my final RDD does not have the required output.
>
> Regards,
> Satish Chandra
>
> On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote:
>
>> This works for me:
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> On 20 Aug 2015, at 11:05, satish chandra j wrote:
>>
>> Hi All,
>> I have data in an RDD as mentioned below:
>>
>> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>>
>> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>>
>> Code:
>> RDD.reduceByKey((x,y) => x+y)
>> RDD.take(3)
>>
>> Result in console:
>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>> res: Array[(Int,Int)] = Array()
>>
>> Command as mentioned:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>>
>> Please let me know what is missing in my code, as my resultant Array is empty.
>>
>> Regards,
>> Satish
Re: Transformation not happening for reduceByKey or GroupByKey
Hi Robin,

Yes, the piece of code mentioned below works fine in the Spark shell, but the same code, when placed in a script file and executed with -i <script file>, creates an empty RDD:

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>

I understand I am missing something here, due to which my final RDD does not have the required output.

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East wrote:

> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> On 20 Aug 2015, at 11:05, satish chandra j wrote:
>
> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>
> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command as mentioned:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
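One way to narrow down a works-in-shell-but-not-via-script discrepancy like this is to have the script report which context it is actually bound to. A hedged debugging sketch, using only standard SparkContext accessors (nothing DSE-specific is assumed; the script file name is whatever is passed to -i):

// Placed at the top of the script passed via -i: if a second
// SparkContext was created somewhere, these lines show which
// context the rest of the script is actually using.
println("master  = " + sc.master)
println("appName = " + sc.appName)

val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
println(pairs.reduceByKey(_ + _).collect.mkString(", "))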
Re: Transformation not happening for reduceByKey or GroupByKey
Hi All,

Could anybody let me know what it is that I am missing here? It should work, as it is a basic transformation. Please let me know if any additional information is required.

Regards,
Satish

On Thu, Aug 20, 2015 at 3:35 PM, satish chandra j wrote:

> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>
> I am expecting Array((0,3),(1,50),(2,40)) as output, just a sum function over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command as mentioned:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
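Since the thread's subject also mentions groupByKey, here is a sketch of the same aggregation written that way, for comparison; reduceByKey is generally preferred because it combines values on the map side before the shuffle, whereas groupByKey ships every value across the network. This is an illustrative alternative, not code from the thread:

val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// groupByKey collects all values per key into an Iterable,
// then mapValues sums each group.
val summed = pairs.groupByKey().mapValues(_.sum)
summed.collect   // Array((0,3), (1,50), (2,40)), order may vary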