Hi All,
Please find below the fix, for users who are following the mail chain on this issue, along with the respective solution:
*reduceByKey: Non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf()
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array() is empty

*reduceByKey: Working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.driver.allowMultipleContexts","true")
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array((0,3),(1,5),(2,4))

Regards,
Satish Chandra

On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j <jsatishchan...@gmail.com> wrote:

> HI All,
> Currently using DSE 4.7 and Spark 1.2.2 version.
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 7:30 PM, java8964 <java8...@hotmail.com> wrote:
>
>> What version of Spark are you using, or which one comes with DSE 4.7?
>>
>> We just cannot reproduce it in Spark.
>>
>> yzhang@localhost>$ more test.spark
>> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs.reduceByKey((x,y) => x + y).collect
>> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Scala version 2.10.4
>> Spark context available as sc.
>> SQL context available as sqlContext.
>> Loading test.spark...
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
>> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 19:24:09 +0530
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: jsatishchan...@gmail.com
>> To: abhis...@tetrationanalytics.com
>> CC: user@spark.apache.org
>>
>> HI Abhishek,
>>
>> I have even tried that, but rdd2 is empty.
>>
>> Regards,
>> Satish
>>
>> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:
>>
>> You had:
>>
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>>
>> Maybe try:
>>
>> > rdd2 = RDD.reduceByKey((x,y) => x+y)
>> > rdd2.take(3)
>>
>> -Abhishek-
>>
>> On Aug 20, 2015, at 3:05 AM, satish chandra j <jsatishchan...@gmail.com> wrote:
>>
>> > HI All,
>> > I have data in an RDD as mentioned below:
>> >
>> > RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>> >
>> > I am expecting the output Array((0,3),(1,50),(2,40)), i.e. just a sum over the values for each key.
>> >
>> > Code:
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>> >
>> > Result in console:
>> > RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>> > res: Array[(Int,Int)] = Array()
>> >
>> > Command as mentioned:
>> >
>> > dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>> >
>> > Please let me know what is missing in my code, as my resultant Array is empty.
>> >
>> > Regards,
>> > Satish
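
For anyone who wants to try the working pattern as a standalone application rather than a shell script, here is a minimal sketch. It assumes Spark 1.2/1.3-era APIs; the object name, app name and local[*] master are illustrative and not part of the original scripts. Note that spark.driver.allowMultipleContexts is set on the SparkConf before the SparkContext is constructed, which is the same workaround as above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // needed on Spark 1.2.x for pair-RDD operations

object ReduceByKeyExample {
  def main(args: Array[String]): Unit = {
    // Workaround from the thread: allow a second context when one (e.g. the shell's) already exists
    val conf = new SparkConf()
      .setAppName("ReduceByKeyExample")
      .setMaster("local[*]")
      .set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)

    val dataRDD = sc.makeRDD(Seq((0, 1), (0, 2), (1, 2), (1, 3), (2, 4)))

    // Sum the values for each key, then bring the result back to the driver
    val summed = dataRDD.reduceByKey(_ + _).collect()
    summed.foreach(println)   // expected: (0,3), (1,5), (2,4)

    sc.stop()
  }
}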