Has there been any thought to adding a tail() method to RDD? It would
be really handy to skip over the first item in an RDD when it contains
header information. Even better would be a drop(int) function that
would allow you to skip over several lines of header information. Our
attempts to
We have similar needs but IIRC, I came to the conclusion that this would
only work on ordered RDDs, and then you would still have to figure out
which partition is the first one. I ended up deciding it would be best to
just drop the header lines from a Scala iterator before creating an RDD
based on
You can use mapPartitionsWithIndex and look at the partition index (0 will be
the first partition) to decide whether to skip the first line.
Matei
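
For illustration, here is a minimal sketch of the mapPartitionsWithIndex approach in PySpark. The names drop_from_first_partition and drop are hypothetical helpers, not part of the Spark API. The sketch assumes the RDD preserves the input order (for example, one read from a single text file) and that all n header lines fall inside partition 0. If the header could spill past the first partition, this would not be enough.

```python
from itertools import islice


def drop_from_first_partition(n):
    """Build a function for RDD.mapPartitionsWithIndex that skips
    the first n records of partition 0 (hypothetical helper)."""
    def skip(index, records):
        # Partition 0 is assumed to hold the start of the input,
        # so only its first n records are dropped.
        if index == 0:
            return islice(records, n, None)
        # Every other partition is passed through unchanged.
        return records
    return skip


def drop(rdd, n):
    """Hypothetical drop(n): return an RDD without the first n records.

    Assumes rdd is a pyspark RDD and that the n records to skip
    all live in partition 0.
    """
    return rdd.mapPartitionsWithIndex(drop_from_first_partition(n))
```

With an existing SparkContext sc, something like drop(sc.textFile(path), 1) would then behave like the tail() asked about above, under the same assumptions.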
On Apr 14, 2014, at 8:50 AM, Ethan Jewett esjew...@gmail.com wrote:
We have similar needs but IIRC, I came to the conclusion that this would only