For this, mapPartitionsWithIndex would also work properly as a filter.
Here is code copied from Stack Overflow that removes the first line (the
header) of a CSV file:
import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;

JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);

// Drop the first element of partition 0, i.e. the CSV header line.
Function2<Integer, Iterator<String>, Iterator<String>> removeHeader =
    new Function2<Integer, Iterator<String>, Iterator<String>>() {
        @Override
        public Iterator<String> call(Integer index, Iterator<String> iterator)
                throws Exception {
            if (index == 0 && iterator.hasNext()) {
                iterator.next(); // skip the header
            }
            return iterator;
        }
    };
JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);

For my usage (comparing any two adjacent elements, or filtering), the index
parameter is useless, so is it OK to view the iterator as coming from one
logical iterator/partition?

On Saturday, December 5, 2015 3:52 PM, Zhiliang Zhu
<zchl.j...@yahoo.com.INVALID> wrote:
 

Hi DB Tsai,

Thanks very much for your kind reply!

Sorry to raise one more issue: as tested, it seems that filter can only return
JavaRDD<Boolean> and not a general JavaRDD<T>, is that right? If so, it is not
very convenient to do a general filter on an RDD. mapPartitions could work to
some extent, but if some partition is left with no elements after filtering by
mapPartitions, there will be problems.

Best wishes!
Zhiliang
 


    On Saturday, December 5, 2015 3:00 PM, DB Tsai <dbt...@dbtsai.com> wrote:
 

This is tricky. You need to shuffle the ending and beginning elements
using mapPartitionsWithIndex.
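
To make the boundary handling concrete, here is a minimal Spark-free sketch in
plain Java, where each inner list stands in for one partition. It is only an
illustration under stated assumptions, not a tested Spark job: the class name
AdjacentCompare, the helper names, and the body of compareFun are all
hypothetical. In Spark, step 1 would roughly correspond to a small
mapPartitionsWithIndex pass whose result is collected on the driver, and step 2
to a second mapPartitionsWithIndex pass that looks up the head of the next
non-empty partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Spark-free sketch: each inner list stands in for one RDD partition.
class AdjacentCompare {

    // Hypothetical stand-in for compareFun from the original question.
    static String compareFun(int left, int right) {
        return left + (left <= right ? "<=" : ">") + right;
    }

    // Step 1: record the first element of every partition (null if empty).
    static List<Integer> firstElements(List<List<Integer>> partitions) {
        List<Integer> heads = new ArrayList<>();
        for (List<Integer> p : partitions) {
            heads.add(p.isEmpty() ? null : p.get(0));
        }
        return heads;
    }

    // Step 2: compare neighbours inside each partition, then compare the
    // partition's last element with the head of the next non-empty partition.
    static List<String> compareAdjacent(List<List<Integer>> partitions) {
        List<Integer> heads = firstElements(partitions);
        List<String> results = new ArrayList<>();
        for (int index = 0; index < partitions.size(); index++) {
            List<Integer> p = partitions.get(index);
            if (p.isEmpty()) {
                continue; // an empty partition contributes no comparisons
            }
            for (int i = 0; i + 1 < p.size(); i++) {
                results.add(compareFun(p.get(i), p.get(i + 1)));
            }
            // Skip over empty partitions to find the next head, if any.
            for (int next = index + 1; next < heads.size(); next++) {
                if (heads.get(next) != null) {
                    results.add(compareFun(p.get(p.size() - 1), heads.get(next)));
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 4),
            new ArrayList<Integer>(),
            Arrays.asList(3, 5, 2));
        // Five elements give four adjacent comparisons.
        System.out.println(compareAdjacent(partitions)); // [1<=4, 4>3, 3<=5, 5>2]
    }
}
```

Note that the empty middle partition is skipped, so 4 is still compared with 3
across the partition boundary.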

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D


On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote:
> Hi All,
>
> I would like to compare any two adjacent elements in one given rdd, just as
> the single machine code part:
>
> int a[N] = {...};
> for (int i=0; i < N - 1; ++i) {
>    compareFun(a[i], a[i+1]);
> }
> ...
>
> mapPartitions may work for some situations, however, it could not compare
> elements in different  partitions.
> foreach also seems not work.
>
> Thanks,
> Zhiliang
>
>




   

  
