Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-24 Thread satish chandra j
Hi All,

Please find below the fix for users who are following the mail chain of this
issue:

*reduceByKey: non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf()
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array() is empty

*reduceByKey: working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

// set() is a SparkConf method; chaining it on SparkContext does not compile
val conf = new SparkConf().set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_+_).collect

Result: Array((0,3),(1,5),(2,4))
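
A note on why this setting matters: "dse spark" (like "spark-shell") already
creates a SparkContext named sc before a -i script runs, so a script that
constructs its own context ends up with two contexts in one JVM, which
Spark 1.x only allows when spark.driver.allowMultipleContexts is set. A minimal
sketch of an alternative that avoids the second context altogether, assuming
the script is only ever loaded through dse spark / spark-shell with -i (the
file name is illustrative):

// sum.scala -- load via: dse spark -i sum.scala
// Reuses the shell-provided SparkContext `sc` instead of constructing a new one.
val dataRDD = sc.makeRDD(Seq((0,1), (0,2), (1,2), (1,3), (2,4)))
val summed = dataRDD.reduceByKey(_ + _)    // lazy transformation
println(summed.collect.mkString(", "))     // action; expected: (0,3), (1,5), (2,4)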

Regards,
Satish Chandra


On Sat, Aug 22, 2015 at 11:27 AM, satish chandra j  wrote:

> Hi All,
> Currently using DSE 4.7 with Spark 1.2.2.
>
> Regards,
> Satish
>


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-22 Thread satish chandra j
Hi All,
Currently using DSE 4.7 with Spark 1.2.2.

Regards,
Satish

On Fri, Aug 21, 2015 at 7:30 PM, java8964  wrote:

> What version of Spark are you using, or does it come with DSE 4.7?
>
> We just cannot reproduce it in Spark.
>
> yzhang@localhost>$ more test.spark
> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs.reduceByKey((x,y) => x + y).collect
> yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
>       /_/
>
> Using Scala version 2.10.4
> Spark context available as sc.
> SQL context available as sqlContext.
> Loading test.spark...
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at
> makeRDD at <console>:21
> 15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether
> UseCompressedOops is set; assuming yes
> res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> Yong


RE: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread java8964
What version of Spark are you using, or does it come with DSE 4.7?

We just cannot reproduce it in Spark.

yzhang@localhost>$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect
yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Yong

Date: Fri, 21 Aug 2015 19:24:09 +0530
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: jsatishchan...@gmail.com
To: abhis...@tetrationanalytics.com
CC: user@spark.apache.org

Hi Abhishek,

I have even tried that, but rdd2 is empty.

Regards,
Satish

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Abhishek,

I have even tried that, but rdd2 is empty.

Regards,
Satish

On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <
abhis...@tetrationanalytics.com> wrote:

> You had:
>
> > RDD.reduceByKey((x,y) => x+y)
> > RDD.take(3)
>
> Maybe try:
>
> > val rdd2 = RDD.reduceByKey((x,y) => x+y)
> > rdd2.take(3)
>
> -Abhishek-
>


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Abhishek R. Singh
You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> val rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-
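
The underlying point: RDD transformations never modify the RDD they are called
on; reduceByKey returns a new RDD, which must be captured (or chained) before
invoking an action. A minimal self-contained sketch of the suggestion above,
assuming a spark-shell session where sc is already available:

val pairs = sc.makeRDD(Seq((0,1), (0,2), (1,20), (1,30), (2,40)))
pairs.reduceByKey((x, y) => x + y)             // returned RDD discarded; `pairs` is unchanged
val rdd2 = pairs.reduceByKey((x, y) => x + y)  // capture the new RDD instead
rdd2.take(3)                                   // expected: Array((0,3), (1,50), (2,40))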




Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi All,
Any inputs on the actual problem statement?

Regards,
Satish


On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang  wrote:

> Yong, thanks for your reply.
>
> I tried spark-shell -i <script file>, and it works fine for me. Not sure what
> is different with
> dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>
>
> --
> Best Regards
>
> Jeff Zhang
>


RE: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread java8964
I believe "spark-shell -i scriptFile" is there. We also use it, at least in 
Spark 1.3.1.
"dse spark" will just wrap "spark-shell" command, underline it is just invoking 
"spark-shell".
I don't know too much about the original problem though.
Yong
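
Since "dse spark" wraps "spark-shell", the same script file should be loadable
through either front end. A sketch of the two equivalent invocations (the file
name is illustrative; the real one was elided in the archive):

$ spark-shell --master local -i test.spark
$ dse spark --master local -i test.spark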
Date: Fri, 21 Aug 2015 18:19:49 +0800
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: zjf...@gmail.com
To: jsatishchan...@gmail.com
CC: robin.e...@xense.co.uk; user@spark.apache.org

Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE. In
that case, it might be a bug in DSE.



Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Jeff Zhang
Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE. In
that case, it might be a bug in DSE.



On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j 
wrote:

> Hi Robin,
> Yes, it is DSE, but the issue is related to Spark only.
>
> Regards,
> Satish Chandra
>


-- 
Best Regards

Jeff Zhang


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Robin,
Yes, it is DSE, but the issue is related to Spark only.

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East  wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise right?
>


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Yes, DSE 4.7

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East  wrote:

> Not sure, never used dse - it’s part of DataStax Enterprise right?
>


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Robin,
Yes, the below-mentioned piece of code works fine in the Spark shell, but the
same code, when placed in a script file and executed with -i <script file>,
creates an empty RDD.

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
makeRDD at <console>:28


scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>

I understand I am missing something here, due to which my final RDD does not
have the required output.
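
The script file referred to here (its actual name is elided in the archive)
would presumably contain just the two lines shown above; a sketch of its
contents:

// hypothetical contents of the script passed via -i
val pairs = sc.makeRDD(Seq((0,1), (0,2), (1,20), (1,30), (2,40)))
pairs.reduceByKey((x, y) => x + y).collect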

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East  wrote:

> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
> at makeRDD at <console>:28
>
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-20 Thread satish chandra j
Hi All,
Could anybody let me know what it is that I am missing here? It should work,
as it is a basic transformation.

Please let me know if any additional information is required.

Regards,
Satish



Transformation not happening for reduceByKey or GroupByKey

2015-08-20 Thread satish chandra j
Hi All,
I have data in an RDD as mentioned below:

RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))

I am expecting output as Array((0,3),(1,50),(2,40)), just a sum of the values
for each key.

Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)

Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at
<console>:73
res: Array[(Int,Int)] = Array()

Command as mentioned:

dse spark --master local --jars postgresql-9.4-1201.jar -i <script file>

Please let me know what is missing in my code, as my resultant Array is
empty.



Regards,
Satish
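
A closing note for readers of the archive: reduceByKey is a lazy
transformation that only records the shuffle to perform and returns a new RDD,
while collect and take are the actions that trigger execution. Independent of
the -i issue resolved later in the thread, the snippet above calls take(3) on
the original RDD rather than on the result of reduceByKey, so the reduced
values are never requested. The subject line also mentions GroupByKey; a
sketch of both formulations, assuming a spark-shell session:

val pairs = sc.makeRDD(Seq((0,1), (0,2), (1,20), (1,30), (2,40)))

// reduceByKey combines values per key map-side before shuffling (preferred)
val viaReduce = pairs.reduceByKey((x, y) => x + y)
viaReduce.collect                        // expected: Array((0,3), (1,50), (2,40))

// groupByKey ships every value across the shuffle, then sums per key
val viaGroup = pairs.groupByKey().mapValues(_.sum)
viaGroup.collect                         // expected: Array((0,3), (1,50), (2,40))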