Hi All,
Please find below the fix, for users who are following the mail chain on this issue, along with the respective solution:
*reduceByKey: Non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf()
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array() is empty

*reduceByKey: Working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.driver.allowMultipleContexts","true")
val sc = new SparkContext(conf)
val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array((0,3),(1,5),(2,4))

Regards,
Satish Chandra

On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j <jsatishchan...@gmail.com> wrote:

> HI All,
> Currently using DSE 4.7 and Spark 1.2.2 version.
>
> Regards,
> Satish
>
> On Fri, Aug 21, 2015 at 7:30 PM, java8964 <java8...@hotmail.com> wrote:
>
>> What version of Spark are you using, or which one comes with DSE 4.7?
>>
>> We just cannot reproduce it in Spark.
>>
>> yzhang@localhost>$ more test.spark
>> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs.reduceByKey((x,y) => x + y).collect
>> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Scala version 2.10.4
>> Spark context available as sc.
>> SQL context available as sqlContext.
>> Loading test.spark...
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
>> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Yong
>>
>> ------------------------------
>> Date: Fri, 21 Aug 2015 19:24:09 +0530
>> Subject: Re: Transformation not happening for reduceByKey or GroupByKey
>> From: jsatishchan...@gmail.com
>> To: abhis...@tetrationanalytics.com
>> CC: user@spark.apache.org
>>
>> HI Abhishek,
>>
>> I have even tried that, but rdd2 is empty.
>>
>> Regards,
>> Satish
>>
>> On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:
>>
>> You had:
>>
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>>
>> Maybe try:
>>
>> > rdd2 = RDD.reduceByKey((x,y) => x+y)
>> > rdd2.take(3)
>>
>> -Abhishek-
>>
>> On Aug 20, 2015, at 3:05 AM, satish chandra j <jsatishchan...@gmail.com> wrote:
>>
>> > HI All,
>> > I have data in an RDD as mentioned below:
>> >
>> > RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))
>> >
>> > I am expecting the output Array((0,3),(1,50),(2,40)), i.e. just a sum over the values for each key.
>> >
>> > Code:
>> > RDD.reduceByKey((x,y) => x+y)
>> > RDD.take(3)
>> >
>> > Result in console:
>> > RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>> > res: Array[(Int,Int)] = Array()
>> >
>> > Command as mentioned:
>> >
>> > dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>> >
>> > Please let me know what is missing in my code, as my resultant Array is empty.
>> >
>> > Regards,
>> > Satish
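
For anyone who wants to try the working pattern as a standalone application rather than a shell script, here is a minimal sketch. It assumes Spark 1.2/1.3-era APIs; the object name, app name and local[*] master are illustrative and not part of the original scripts. Note that spark.driver.allowMultipleContexts is set on the SparkConf before the SparkContext is constructed, which is the same workaround as above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // needed on Spark 1.2.x for pair-RDD operations

object ReduceByKeyExample {
  def main(args: Array[String]): Unit = {
    // Workaround from the thread: allow a second context when one (e.g. the shell's) already exists
    val conf = new SparkConf()
      .setAppName("ReduceByKeyExample")
      .setMaster("local[*]")
      .set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(conf)

    val dataRDD = sc.makeRDD(Seq((0, 1), (0, 2), (1, 2), (1, 3), (2, 4)))

    // Sum the values for each key, then bring the result back to the driver
    val summed = dataRDD.reduceByKey(_ + _).collect()
    summed.foreach(println)   // expected: (0,3), (1,5), (2,4)

    sc.stop()
  }
}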