Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-24 Thread satish chandra j
Hi All,

For users following the mail chain on this issue, please find the issue and the respective fix summarized below:

*reduceByKey: Non-working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf()
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_ + _).collect

Result: Array() is empty

*reduceByKey: Working snippet*

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)

val dataRDD = sc.makeRDD(Seq((0,1),(0,2),(1,2),(1,3),(2,4)))
dataRDD.reduceByKey(_ + _).collect

Result: Array((0,3),(1,5),(2,4))
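
The snippets above keep the original approach of constructing a SparkContext inside the script. A minimal sketch of an alternative, assuming the script is loaded with -i into spark-shell or dse spark (which, per the transcripts in this thread, already provide a context named sc), is to reuse that context instead of creating a second one, so spark.driver.allowMultipleContexts is not needed at all:

// Hedged sketch: reuse the shell-provided SparkContext (sc) rather than
// constructing a new one inside a script run with -i.
val dataRDD = sc.makeRDD(Seq((0, 1), (0, 2), (1, 2), (1, 3), (2, 4)))

// reduceByKey is a lazy transformation: it returns a new RDD, and nothing
// runs until an action such as collect is invoked.
val summed = dataRDD.reduceByKey(_ + _)
println(summed.collect.mkString(", "))  // expected: (0,3), (1,5), (2,4)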

Regards,
Satish Chandra



Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-22 Thread satish chandra j
Hi All,
Currently using DSE 4.7 and Spark version 1.2.2.

Regards,
Satish


RE: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread java8964
I believe spark-shell -i scriptFile is there. We also use it, at least in
Spark 1.3.1.
dse spark just wraps the spark-shell command; underneath, it is simply invoking
spark-shell.
I don't know too much about the original problem though.
Yong
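
For reference, these are the two invocations being compared; since dse spark just wraps spark-shell, both should behave the same way for a given script (the script file name here is only a placeholder):

spark-shell --master local -i ScriptFile.scala
dse spark --master local -i ScriptFile.scala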

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi All,
Any inputs on the actual problem statement?

Regards,
Satish


On Fri, Aug 21, 2015 at 5:57 PM, Jeff Zhang zjf...@gmail.com wrote:

 Yong, thanks for your reply.

 I tried spark-shell -i script-file and it works fine for me. Not sure what is
 different with:
 dse spark --master local --jars postgresql-9.4-1201.jar -i ScriptFile

 --
 Best Regards

 Jeff Zhang



Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Abhishek R. Singh
You had:

 RDD.reduceByKey((x,y) => x+y)
 RDD.take(3)

Maybe try:

 rdd2 = RDD.reduceByKey((x,y) => x+y)
 rdd2.take(3)
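
A slightly fuller sketch of the same suggestion, assuming the sample data from the original post (variable names here are illustrative):

// Transformations are lazy and return new RDDs; the original RDD is not
// modified in place, so the reduced result must be captured in a new val.
val data = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
val rdd2 = data.reduceByKey((x, y) => x + y)
rdd2.take(3).foreach(println)  // expected: (0,3), (1,50), (2,40)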

-Abhishek-




Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Abhishek,

I have even tried that, but rdd2 is empty.

Regards,
Satish





RE: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread java8964
What version of Spark are you using, or does it come with DSE 4.7?
We just cannot reproduce it in Spark:

yzhang@localhost$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect

yzhang@localhost$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
Yong


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Yes, DSE 4.7

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East robin.e...@xense.co.uk wrote:

 Not sure, never used dse - it’s part of DataStax Enterprise right?




Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Robin,
Yes, it is DSE, but the issue is related to Spark only.

Regards,
Satish Chandra




Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Jeff Zhang
Hi Satish,

I don't see where Spark supports -i, so I suspect it is provided by DSE. In
that case, it might be a bug in DSE.
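
For anyone who wants to check this locally, a small probe, assuming a stock Apache Spark install with spark-shell on the PATH (the probe file name is made up):

# Show the options the spark-shell wrapper documents itself.
spark-shell --help

# One-line probe script; if -i is honored, the message prints on startup.
echo 'println("-i was honored, sc = " + sc)' > /tmp/probe.scala
spark-shell --master local -i /tmp/probe.scala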




-- 
Best Regards

Jeff Zhang


Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread satish chandra j
Hi Robin,
Yes, the piece of code mentioned below works fine in the Spark shell, but when
the same code is placed in a script file and executed with -i <file name>, it
creates an empty RDD.

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:

dse spark --master local --jars postgresql-9.4-1201.jar -i ScriptFile

I understand I am missing something here, due to which my final RDD does not
have the required output.

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East robin.e...@xense.co.uk wrote:

 This works for me:

 scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
 pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

 scala> pairs.reduceByKey((x,y) => x + y).collect
 res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
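
One way to narrow this down is to force the action and print from inside the script itself, so the result is visible in the output of the -i run. A minimal sketch, assuming the same sample data:

// If this prints an empty list under -i but the correct pairs in the
// interactive shell, the problem is in how the script run sets up its
// context, not in reduceByKey itself.
val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))
val reduced = pairs.reduceByKey((x, y) => x + y)
println(reduced.collect.mkString(", "))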





Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-20 Thread satish chandra j
Hi All,
Could anybody let me know what it is that I am missing here? It should work,
as it is a basic transformation.

Please let me know if any additional information is required.

Regards,
Satish

On Thu, Aug 20, 2015 at 3:35 PM, satish chandra j jsatishchan...@gmail.com
wrote:

 Hi All,
 I have data in an RDD as mentioned below:

 RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))

 I am expecting output as Array((0,3),(1,50),(2,40)), just a sum over the
 values for each key.

 Code:
 RDD.reduceByKey((x,y) => x+y)
 RDD.take(3)

 Result in console:
 RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
 res: Array[(Int,Int)] = Array()

 Command as mentioned:

 dse spark --master local --jars postgresql-9.4-1201.jar -i ScriptFile

 Please let me know what is missing in my code, as my resultant Array is
 empty.



 Regards,
 Satish