You can use mapPartitionsWithIndex and check the partition index (0 is the 
first partition) to decide whether to skip the first line.
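
A minimal sketch of that idea, written as a plain Python function in the shape that PySpark's mapPartitionsWithIndex expects (a callable taking the partition index and an iterator over that partition's elements). The names drop_header and num_lines are hypothetical, and the list of lists below is only a stand-in for an RDD's partitions so the example runs without Spark. It assumes all header lines sit in partition 0, which is not guaranteed when the input is made up of several files.

```python
from itertools import islice


def drop_header(num_lines=1):
    """Return a function f(index, iterator) that skips the first
    `num_lines` elements of partition 0 and leaves the rest untouched.

    Hypothetical helper: assumes every header line is in partition 0.
    """
    def _drop(index, lines):
        if index == 0:
            # Lazily skip the header lines at the start of the first partition.
            return islice(lines, num_lines, None)
        # Every other partition is passed through unchanged.
        return lines
    return _drop


# Stand-in for an RDD with two partitions (no Spark needed to run this).
partitions = [["name,age", "ann,31"], ["bob,27", "cy,45"]]

skip = drop_header(1)
result = [row
          for index, part in enumerate(partitions)
          for row in skip(index, iter(part))]
print(result)
```

With a real RDD the same function could presumably be passed as rdd.mapPartitionsWithIndex(drop_header(1)). Returning a closure keeps the number of header lines configurable, which also covers the drop(int) case raised in the original question.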

Matei

On Apr 14, 2014, at 8:50 AM, Ethan Jewett <esjew...@gmail.com> wrote:

> We have similar needs but IIRC, I came to the conclusion that this would only 
> work on ordered RDDs, and then you would still have to figure out which 
> partition is the first one. I ended up deciding it would be best to just drop 
> the header lines from a Scala iterator before creating an RDD based on it. 
> Not sure if this was the "right" thing to do, but would that work for you?
> 
> Regards,
> Ethan
> 
> 
> On Mon, Apr 14, 2014 at 10:24 AM, Philip Ogren <philip.og...@oracle.com> 
> wrote:
> Has there been any thought to adding a tail() method to RDD?  It would be 
> really handy to skip over the first item in an RDD when it contains header 
> information.  Even better would be a drop(int) function that would allow you 
> to skip over several lines of header information.  Our attempts to do 
> something equivalent with a filter() call seem a bit contorted.  Any thoughts?
> 
> Thanks,
> Philip
> 
