Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
On Saturday, December 5, 2015 3:00 PM, DB Tsai wrote: "This is tricky. You need to shuffle the ending and beginning elements using mapPartitionsWithIndex." Does this mean that I need to shuffle all the elements in different partitions into one partition, then

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
On Monday, December 7, 2015 10:37 AM, DB Tsai wrote: "Only the beginning and ending parts of the data. The rest in the partition can be compared without a shuffle." Could you help write a bit of pseudo-code for it? It seems that there is no shuffle-related API, or

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread DB Tsai
Only the beginning and ending parts of the data need to be shuffled; the rest of each partition can be compared without a shuffle. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Dec 6, 2015 at 6:27 PM, Zhiliang Zhu
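A minimal plain-Java sketch of this two-step idea follows. It is not Spark code: each inner List is assumed to stand in for one RDD partition, and the names AdjacentCompare, sequential and distributed are hypothetical. One substitution to flag: instead of shuffling the edge elements, this version gathers only the last element of each non-empty partition and hands each partition the last element of the nearest non-empty partition before it. In Spark that would presumably be a small collect() followed by a broadcast and a second mapPartitionsWithIndex pass. All other pairs are compared inside their own partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java model, not Spark code: each inner List stands in for one RDD partition.
class AdjacentCompare {

    // Reference result: the single-machine loop over all elements.
    static <T, R> List<R> sequential(List<T> all, BiFunction<T, T, R> cmp) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i + 1 < all.size(); i++) {
            out.add(cmp.apply(all.get(i), all.get(i + 1)));
        }
        return out;
    }

    static <T, R> List<R> distributed(List<List<T>> partitions, BiFunction<T, T, R> cmp) {
        // Step 1: one small summary per partition. For partition i, remember the last
        // element of the nearest non-empty partition before it, if there is one.
        List<T> prevLast = new ArrayList<>();
        List<Boolean> hasPrev = new ArrayList<>();
        T carry = null;
        boolean haveCarry = false;
        for (List<T> partition : partitions) {
            prevLast.add(carry);
            hasPrev.add(haveCarry);
            if (!partition.isEmpty()) {
                carry = partition.get(partition.size() - 1);
                haveCarry = true;
            }
        }

        // Step 2: each partition compares the carried-in element with its own first
        // element, then its interior neighbours, which never leave the partition.
        List<R> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            List<T> partition = partitions.get(i);
            if (partition.isEmpty()) {
                continue;
            }
            if (hasPrev.get(i)) {
                out.add(cmp.apply(prevLast.get(i), partition.get(0)));
            }
            for (int j = 0; j + 1 < partition.size(); j++) {
                out.add(cmp.apply(partition.get(j), partition.get(j + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts =
            List.of(List.of(1, 4, 9), List.<Integer>of(), List.of(16, 25), List.of(36));
        List<Integer> diffs = distributed(parts, (a, b) -> b - a);
        System.out.println(diffs); // [3, 5, 7, 9, 11]
    }
}
```

The result matches the single-machine loop over 1, 4, 9, 16, 25, 36, including across the empty second partition. Only one element per partition crosses a partition boundary.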

Re: the way to compare any two adjacent elements in one rdd

2015-12-05 Thread Zhiliang Zhu
For this, mapPartitionsWithIndex should also work well as a filter. Here is some code copied from Stack Overflow, which removes the first line of a CSV file: JavaRDD rawInputRdd = sparkContext.textFile(dataFile); Function2 removeHeader= new Function2
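The header-removal idea can be sketched in plain Java as below. It assumes the input is a single CSV file, so the header is the first line of partition 0. The class name RemoveHeader and the helper apply are hypothetical, and the inner Lists stand in for partitions. dropHeader has the same shape as the Function2<Integer, Iterator<T>, Iterator<R>> that JavaRDD.mapPartitionsWithIndex(f, preservesPartitioning) takes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java model, not Spark code: each inner List stands in for one partition of the
// RDD returned by textFile, and the CSV header is assumed to be the first line of partition 0.
class RemoveHeader {

    // Same shape as the function given to mapPartitionsWithIndex:
    // (partition index, iterator over the partition) -> iterator over the kept lines.
    static Iterator<String> dropHeader(int index, Iterator<String> lines) {
        if (index == 0 && lines.hasNext()) {
            lines.next(); // skip the header; every other partition passes through untouched
        }
        return lines;
    }

    // Applies dropHeader to every partition in index order and concatenates the output.
    static List<String> apply(List<List<String>> partitions) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            Iterator<String> it = dropHeader(i, partitions.get(i).iterator());
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("id,name", "1,a"), List.of("2,b"));
        System.out.println(apply(parts)); // [1,a, 2,b]
    }
}
```

If several files are read at once, each file has its own header, so checking only partition 0 would no longer be enough.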

the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi All, I would like to compare every two adjacent elements in a given RDD, just as in this single-machine code: int a[N] = {...}; for (int i = 0; i < N - 1; ++i) { compareFun(a[i], a[i+1]); } ... mapPartitions may work in some situations; however, it cannot compare elements in different
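To make the limitation concrete, here is a minimal plain-Java sketch, not Spark code. Each inner List is assumed to stand in for one RDD partition, and the names WithinPartition, compareAdjacent and compareAdjacentPerPartition are hypothetical. Running the quoted loop inside each partition, as a mapPartitions function would, covers every pair except those that straddle a partition boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java model, not Spark code: each inner List stands in for one RDD partition.
class WithinPartition {

    // Compares every adjacent pair inside one partition, like the single-machine loop
    // for (i = 0; i < N - 1; ++i) compareFun(a[i], a[i+1]).
    static <T, R> List<R> compareAdjacent(List<T> partition, BiFunction<T, T, R> cmp) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i + 1 < partition.size(); i++) {
            out.add(cmp.apply(partition.get(i), partition.get(i + 1)));
        }
        return out;
    }

    // What a mapPartitions-style pass yields: per-partition results, concatenated in order.
    // Pairs that straddle a partition boundary are never compared.
    static <T, R> List<R> compareAdjacentPerPartition(List<List<T>> partitions,
                                                      BiFunction<T, T, R> cmp) {
        List<R> out = new ArrayList<>();
        for (List<T> partition : partitions) {
            out.addAll(compareAdjacent(partition, cmp));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = List.of(List.of(1, 4, 9), List.of(16, 25));
        // Differences between neighbours; the boundary pair (9, 16) is missing.
        List<Integer> diffs = compareAdjacentPerPartition(parts, (a, b) -> b - a);
        System.out.println(diffs); // [3, 5, 9]
    }
}
```

With the partitions [1, 4, 9] and [16, 25], the pair (9, 16) is never compared. Recovering such pairs is the cross-partition problem discussed in the replies.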

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi DB Tsai, Thanks very much for your kind reply! Sorry, one more issue: from my tests, it seems that filter can only return a JavaRDD with the same element type as its input, not a JavaRDD of some other type. Is that right? That makes general filtering of an RDD less convenient. mapPartitions could work to some extent, but if some partition will be left and
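A mapPartitions function receives an iterator and may return elements of a different type, so filtering and converting can happen in one pass. The plain-Java sketch below models that, with inner Lists standing in for partitions. parseValid and apply are hypothetical names; parseValid has the shape of a mapPartitions function that drops lines that are not integers and turns String into Integer. The Java interface differs by version: in Spark 1.x FlatMapFunction.call returns an Iterable, and since Spark 2.0 it returns an Iterator. For simple cases, filter followed by map gives the same result, since map may change the element type.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java model, not Spark code: each inner List stands in for one RDD partition.
class FilterAndConvert {

    // Same shape as a mapPartitions function: an iterator of Strings in, an iterator of
    // Integers out. Lines that are not integers are dropped, so filtering and the type
    // change happen in one pass.
    static Iterator<Integer> parseValid(Iterator<String> lines) {
        List<Integer> out = new ArrayList<>(); // buffers one partition; fine for a sketch
        while (lines.hasNext()) {
            String s = lines.next().trim();
            try {
                out.add(Integer.parseInt(s));
            } catch (NumberFormatException e) {
                // not an integer: filtered out
            }
        }
        return out.iterator();
    }

    // Applies parseValid to every partition in order and concatenates the output.
    static List<Integer> apply(List<List<String>> partitions) {
        List<Integer> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            Iterator<Integer> it = parseValid(partition.iterator());
            while (it.hasNext()) {
                result.add(it.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> parts =
            List.of(List.of("1", " 2 ", "x"), List.<String>of(), List.of("-3", "4.5"));
        System.out.println(apply(parts)); // [1, 2, -3]
    }
}
```

An empty partition simply yields an empty iterator, so partitions left with no elements after filtering need no special handling.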

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread DB Tsai
This is tricky. You need to shuffle the ending and beginning elements of each partition using mapPartitionsWithIndex. On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu
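One reading of "shuffle the ending and beginning elements" is sketched below in plain Java. Partitions are modelled as inner Lists, and the names BoundaryShuffle, Tagged, emitEdges and compareBoundaries are hypothetical. Each partition emits only its first element, tagged for the boundary on its left, and its last element, tagged for the boundary on its right; grouping by boundary then pairs them up. In Spark the map side would be the mapPartitionsWithIndex function, and the grouping could be done on a pair RDD (for example mapToPair followed by groupByKey). That moves at most two elements per partition, not the whole data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Plain-Java model, not Spark code: each inner List stands in for one RDD partition.
// Boundary b is the gap between partition b and partition b + 1.
class BoundaryShuffle {

    // An edge element tagged with its boundary and its side of that boundary.
    static final class Tagged<T> {
        final int boundary;
        final boolean isLeft; // true: last element of partition b; false: first of b + 1
        final T value;

        Tagged(int boundary, boolean isLeft, T value) {
            this.boundary = boundary;
            this.isLeft = isLeft;
            this.value = value;
        }
    }

    // Map side, the part a mapPartitionsWithIndex function would do: only the first and
    // last elements leave partition `index`; everything else stays where it is.
    static <T> List<Tagged<T>> emitEdges(int index, int numPartitions, List<T> partition) {
        List<Tagged<T>> out = new ArrayList<>();
        if (partition.isEmpty()) {
            return out;
        }
        if (index > 0) {
            out.add(new Tagged<>(index - 1, false, partition.get(0)));
        }
        if (index < numPartitions - 1) {
            out.add(new Tagged<>(index, true, partition.get(partition.size() - 1)));
        }
        return out;
    }

    // Shuffle side: group the tagged elements by boundary, then compare left with right.
    // Assumes no partition is empty; a boundary missing either side is skipped.
    static <T, R> List<R> compareBoundaries(List<List<T>> partitions, BiFunction<T, T, R> cmp) {
        int n = partitions.size();
        Map<Integer, List<Tagged<T>>> byBoundary = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            for (Tagged<T> t : emitEdges(i, n, partitions.get(i))) {
                byBoundary.computeIfAbsent(t.boundary, k -> new ArrayList<>()).add(t);
            }
        }
        List<R> out = new ArrayList<>();
        for (List<Tagged<T>> group : byBoundary.values()) {
            T left = null;
            T right = null;
            boolean hasLeft = false;
            boolean hasRight = false;
            for (Tagged<T> t : group) {
                if (t.isLeft) {
                    left = t.value;
                    hasLeft = true;
                } else {
                    right = t.value;
                    hasRight = true;
                }
            }
            if (hasLeft && hasRight) {
                out.add(cmp.apply(left, right));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = List.of(List.of(1, 4, 9), List.of(16), List.of(25, 36));
        List<Integer> diffs = compareBoundaries(parts, (a, b) -> b - a);
        System.out.println(diffs); // [7, 9]
    }
}
```

The boundary results complement the within-partition comparisons. If some partitions can be empty, the pair spanning an empty partition is lost under this scheme, so those would have to be filtered out or repartitioned first.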