Re: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi,
Thanks Nan Zhu. I tried to implement your suggestion on the following scenario. I
have an RDD of, say, 24 elements. When I partitioned it into two groups of 12
elements each, the order of the elements within the partitions was lost; elements
were partitioned randomly. I need to preserve the order, so that the first 12
elements go to the 1st partition and the next 12 elements to the 2nd
partition.
Could you please help me maintain the order of the original sequence even after
partitioning? Any solution?
Before Partition (RDD):
64
29186
16059
9143
6439
6155
9187
18416
25565
30420
33952
38302
43712
47092
48803
52687
56286
57471
63429
70715
75995
81878
80974
71288
48556
After Partition: in group 1 with 12 elements:
64
29186
18416
30420
33952
38302
43712
47092
56286
81878
80974
71288
48556





RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread Adrian Mocanu
I think you should sort each RDD.

-Original Message-
From: yh18190 [mailto:yh18...@gmail.com] 
Sent: March-28-14 4:44 PM
To: u...@spark.incubator.apache.org
Subject: Re: Splitting RDD and Grouping together to perform computation

Hi,
Thanks Nan Zhu. I tried to implement your suggestion on the following scenario. I
have an RDD of, say, 24 elements. When I partitioned it into two groups of 12
elements each, the order of the elements within the partitions was lost; elements
were partitioned randomly. I need to preserve the order, so that the first 12
elements go to the 1st partition and the next 12 elements to the 2nd partition.
Could you please help me maintain the order of the original sequence even after
partitioning? Any solution?
Before Partition (RDD):
64
29186
16059
9143
6439
6155
9187
18416
25565
30420
33952
38302
43712
47092
48803
52687
56286
57471
63429
70715
75995
81878
80974
71288
48556
After Partition: in group 1 with 12 elements: 64, 29186,
18416
30420
33952
38302
43712
47092
56286
81878
80974
71288
48556





RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread Adrian Mocanu
I say you need to remap so you have a key for each tuple that you can sort on.
Then call rdd.sortByKey(true), like this: mystream.transform(rdd =>
rdd.sortByKey(true))
For this function to be available you need to import
org.apache.spark.rdd.OrderedRDDFunctions
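
A minimal sketch of this approach (my illustration, not code from the thread; it
assumes RDD.zipWithIndex is available, i.e. Spark 1.0+, and uses each element's
original position as the sort key):

import org.apache.spark.SparkContext._  // brings sortByKey into scope on pair RDDs
import org.apache.spark.rdd.RDD

// Sketch: key each element by its original position, sort on that key,
// then drop the key again. zipWithIndex is an assumption for where the
// keys come from; the thread leaves that open.
def sortedByOriginalOrder(rdd: RDD[Int]): RDD[Int] =
  rdd.zipWithIndex                   // (value, originalIndex)
     .map { case (v, i) => (i, v) }  // make the index the key
     .sortByKey(true)                // ascending => original order
     .map(_._2)                      // strip the temporary key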

-Original Message-
From: yh18190 [mailto:yh18...@gmail.com] 
Sent: March-28-14 5:02 PM
To: u...@spark.incubator.apache.org
Subject: RE: Splitting RDD and Grouping together to perform computation


Hi,
Here is my code for the given scenario. Could you please let me know where to
sort? I mean, on what basis do we have to sort, so that the partitions maintain
the order of the original sequence?

import scala.collection.mutable.ListBuffer

val res2 = reduced_hccg.map(_._2) // which gives an RDD of numbers
res2.foreach(println)
val result = res2.mapPartitions(p => {
  val l = p.toList

  val approx = new ListBuffer[Int]
  val detail = new ListBuffer[Double]
  for (i <- 0 until l.length - 1 by 2) {
    println(l(i), l(i + 1))
    approx += (l(i), l(i + 1)) // appends both values to approx
  }
  approx.toList.iterator // note: this value is discarded...
  detail.toList.iterator // ...only this last expression is returned
})
result.foreach(println)





Re: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread Syed A. Hashmi
From the gist of it, it seems like you need to override the default
partitioner to control how your data is distributed among partitions. Take
a look at the different partitioners available (Default, Range, Hash); if none
of these gets you the desired result, you might want to provide your own.
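
A minimal sketch of such a custom partitioner (my illustration, not from the
thread; it assumes the elements have already been keyed by their original
position, e.g. via zipWithIndex, and the class name is hypothetical):

import org.apache.spark.Partitioner

// Sketch: send contiguous index ranges to the same partition, so keys
// 0..chunkSize-1 land in partition 0, the next chunk in partition 1, etc.
class ChunkPartitioner(numParts: Int, totalElems: Long) extends Partitioner {
  private val chunkSize =
    math.max(1L, math.ceil(totalElems.toDouble / numParts).toLong)
  override def numPartitions: Int = numParts
  override def getPartition(key: Any): Int = key match {
    case i: Long => (i / chunkSize).toInt
    case _       => 0
  }
}

// Usage (assuming rdd: RDD[Int]):
//   rdd.zipWithIndex.map(_.swap)
//      .partitionBy(new ChunkPartitioner(2, rdd.count()))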


On Fri, Mar 28, 2014 at 2:08 PM, Adrian Mocanu amoc...@verticalscope.com wrote:

 I say you need to remap so you have a key for each tuple that you can sort
 on.
 Then call rdd.sortByKey(true), like this: mystream.transform(rdd =>
 rdd.sortByKey(true))
 For this function to be available you need to import
 org.apache.spark.rdd.OrderedRDDFunctions

 -Original Message-
 From: yh18190 [mailto:yh18...@gmail.com]
 Sent: March-28-14 5:02 PM
 To: u...@spark.incubator.apache.org
 Subject: RE: Splitting RDD and Grouping together to perform computation


 Hi,
 Here is my code for the given scenario. Could you please let me know where to
 sort? I mean, on what basis do we have to sort, so that the partitions
 maintain the order of the original sequence?

 import scala.collection.mutable.ListBuffer

 val res2 = reduced_hccg.map(_._2) // which gives an RDD of numbers
 res2.foreach(println)
 val result = res2.mapPartitions(p => {
   val l = p.toList

   val approx = new ListBuffer[Int]
   val detail = new ListBuffer[Double]
   for (i <- 0 until l.length - 1 by 2) {
     println(l(i), l(i + 1))
     approx += (l(i), l(i + 1)) // appends both values to approx
   }
   approx.toList.iterator // note: this value is discarded...
   detail.toList.iterator // ...only this last expression is returned
 })
 result.foreach(println)






RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread yh18190
Hi Adrian,

Thanks for the suggestion. Could you please modify the relevant part of my code?
I apologise for the inconvenience; I am new to Spark, so I couldn't apply the
change appropriately myself. I would be thankful to you.





RE: Splitting RDD and Grouping together to perform computation

2014-03-28 Thread Adrian Mocanu
Not sure how to change your code, because you'd need to generate the keys where
you get the data. Sorry about that.
I can tell you where to put the code to remap and sort, though.

import org.apache.spark.rdd.OrderedRDDFunctions
import scala.collection.mutable.ListBuffer

val res2 = reduced_hccg.map(_._2)
  .map(x => (newkey, x)).sortByKey(true) // and if you want to remove the key
                                         // used for sorting: .map(x => x._2)

res2.foreach(println)
val result = res2.mapPartitions(p => {
  val l = p.toList

  val approx = new ListBuffer[Int]
  val detail = new ListBuffer[Double]
  for (i <- 0 until l.length - 1 by 2) {
    println(l(i), l(i + 1))
    approx += (l(i), l(i + 1))
  }
  approx.toList.iterator // note: only the last expression (detail) is returned
  detail.toList.iterator
})
result.foreach(println)
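
One concrete way to generate such keys, as a hedged sketch (using zipWithIndex,
Spark 1.0+, is my assumption; as noted above, the right key really depends on
where the data comes from):

// Sketch: use each element's original position as the sort key.
val keyed  = reduced_hccg.map(_._2).zipWithIndex.map(_.swap) // (index, value)
val sorted = keyed.sortByKey(true).map(_._2)                 // original order restored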

-Original Message-
From: yh18190 [mailto:yh18...@gmail.com] 
Sent: March-28-14 5:17 PM
To: u...@spark.incubator.apache.org
Subject: RE: Splitting RDD and Grouping together to perform computation

Hi Adrian,

Thanks for the suggestion. Could you please modify the relevant part of my code?
I apologise for the inconvenience; I am new to Spark, so I couldn't apply the
change appropriately myself. I would be thankful to you.





Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
Hi, I have a large data set of numbers (an RDD) and want to perform a
computation only on groups of two values at a time. For example, if
1,2,3,4,5,6,7,... is an RDD, can I group the RDD into
(1,2),(3,4),(5,6),...? and perform the respective computations in an
efficient manner? As we don't have a way to index elements directly using a
for loop etc. ((i, i+1), ...), is there a way to resolve this problem? Please
suggest one; I would be really thankful to you.
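
For reference, a minimal sketch of one way to form such pairs (my addition, not
from the thread; it assumes RDD.zipWithIndex, Spark 1.0+, and keys elements by a
global pair id so the pairs stay correct even across partition boundaries):

// Sketch: pair consecutive elements by a global pair id.
val nums = sc.parallelize(1 to 8)
val pairs = nums.zipWithIndex                        // (value, globalIndex)
  .map { case (v, i) => (i / 2, (i, v)) }            // key = pair id
  .groupByKey()
  .flatMap { case (_, members) =>
    val vs = members.toSeq.sortBy(_._1).map(_._2)    // keep order inside the pair
    if (vs.size == 2) Some((vs(0), vs(1))) else None // drop a trailing odd element
  }
// pairs.collect() would give Array((1,2), (3,4), (5,6), (7,8)), in some order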




Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread yh18190
We need someone who can walk us through a short code snippet on the given
example, so that we get a clear-cut idea of indexing into RDDs.
Guys, please help us.





Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
partition your input into even-sized partitions,

then use mapPartitions to operate on each Iterator[Int].

maybe there are more efficient ways…

Best,  

--  
Nan Zhu



On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote:

 Hi, I have a large data set of numbers (an RDD) and want to perform a
 computation only on groups of two values at a time. For example, if
 1,2,3,4,5,6,7,... is an RDD, can I group the RDD into (1,2),(3,4),(5,6),...?
 and perform the respective computations in an efficient manner? As we don't
 have a way to index elements directly using a for loop etc. ((i, i+1), ...),
 is there a way to resolve this problem? Please suggest one; I would be really
 thankful to you.



Re: Splitting RDD and Grouping together to perform computation

2014-03-24 Thread Nan Zhu
I didn't group the integers, but processed them in groups of two.

First, partition the input:

scala> val a = sc.parallelize(List(1, 2, 3, 4), 2)
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at
<console>:12


Then process each partition, handling its elements in groups of 2:

scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> a.mapPartitions(p => { val l = p.toList
     |   val ret = new ListBuffer[Int]
     |   for (i <- 0 until l.length by 2) {
     |     ret += l(i) + l(i + 1) // sum each consecutive pair
     |   }
     |   ret.toList.iterator
     | })
res7: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at mapPartitions at
<console>:16



scala> res7.collect

res10: Array[Int] = Array(3, 7)
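
If a partition could end up with an odd number of elements, a guarded variant
along these lines might be safer (my addition, not from the thread):

// Sketch: pair up elements within a partition, tolerating an odd count.
a.mapPartitions { p =>
  p.grouped(2).map {
    case Seq(x, y) => x + y // a full pair: combine the two values
    case Seq(x)    => x     // leftover element when the partition size is odd
  }
}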

Best,

--  
Nan Zhu



On Monday, March 24, 2014 at 8:40 PM, Nan Zhu wrote:

 partition your input into even-sized partitions,

 then use mapPartitions to operate on each Iterator[Int].

 maybe there are more efficient ways…
  
 Best,  
  
 --  
 Nan Zhu
  
  
  
 On Monday, March 24, 2014 at 7:59 PM, yh18190 wrote:
  
  Hi, I have a large data set of numbers (an RDD) and want to perform a
  computation only on groups of two values at a time. For example, if
  1,2,3,4,5,6,7,... is an RDD, can I group the RDD into (1,2),(3,4),(5,6),...?
  and perform the respective computations in an efficient manner? As we don't
  have a way to index elements directly using a for loop etc. ((i, i+1), ...),
  is there a way to resolve this problem? Please suggest one; I would be
  really thankful to you.