Re: textFile() ordering and header rows

2015-02-22 Thread Nicholas Chammas
I guess on a technicality the docs just say "first item in this RDD", not
"first line in the source text file". AFAIK there is no way apart from
filtering to remove header lines.
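
For what it's worth, here is a minimal sketch of the filtering approach, not a definitive recipe. The helper name make_header_filter, the sample rows, and the path "people.csv" are all made up for illustration. The predicate is plain Python, so it can be tried locally on a list; the commented lines show how it would plug into first() and filter() on an RDD, assuming an existing SparkContext named sc.

```python
def make_header_filter(header):
    """Return a predicate that keeps every line not equal to `header`."""
    def keep(line):
        return line != header
    return keep


# Local stand-in for the lines of a text file (hypothetical sample data).
sample = ["id,name", "1,alice", "2,bob"]

# Treat the first item as the header, as the quick-start example does.
header = sample[0]
data = list(filter(make_header_filter(header), sample))
print(data)

# With Spark, assuming a SparkContext `sc` and a hypothetical input path:
#   lines = sc.textFile("people.csv")
#   header = lines.first()
#   data = lines.filter(make_header_filter(header))
```

Note that this drops every line identical to the header, not only the first one, so a data row that happens to match the header text would be removed too.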

As long as first() always returns the same value for a given RDD, I think
it's fine, no?

Nick


On Sun Feb 22 2015 at 9:09:01 PM Michael Malak wrote:

> Since RDDs are generally unordered, aren't things like textFile().first()
> not guaranteed to return the first row (such as looking for a header row)?
> If so, doesn't that make the example in
> http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


textFile() ordering and header rows

2015-02-22 Thread Michael Malak
Since RDDs are generally unordered, aren't things like textFile().first() not 
guaranteed to return the first row (such as looking for a header row)? If so, 
doesn't that make the example in 
http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
