I guess, on a technicality, the docs just say "first item in this RDD", not "first line in the source text file". AFAIK there is no way to remove header lines other than filtering them out <http://stackoverflow.com/a/24734612/877069>.
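For illustration, here is a minimal plain-Python model (no Spark needed) of the two usual ways to drop a header. It assumes the file's lines arrive as an ordered list of partitions, with partition 0 holding the start of the file, which is how `textFile()` splits a file in practice. The helper names are hypothetical. In PySpark the rough equivalents would be `header = rdd.first()` followed by `rdd.filter(lambda l: l != header)`, or `rdd.mapPartitionsWithIndex(lambda i, it: islice(it, 1, None) if i == 0 else it)`.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical model: an "RDD" of text lines as a list of partitions,
# each partition an ordered list of lines (partition 0 = start of file).
Partitions = List[List[str]]


def first(parts: Partitions) -> str:
    """Mimic RDD.first(): scan partitions in order, return the first element found."""
    for part in parts:
        if part:
            return part[0]
    raise ValueError("empty dataset")


def drop_header_by_filter(parts: Partitions) -> Partitions:
    """Approach 1: take the header with first(), then filter out every equal line.
    Caveat: a data line identical to the header is removed too."""
    header = first(parts)
    return [[line for line in part if line != header] for part in parts]


def drop_header_by_partition(parts: Partitions) -> Partitions:
    """Approach 2: skip only the first line of partition 0
    (mirrors mapPartitionsWithIndex with idx == 0)."""
    def f(idx: int, it: Iterator[str]) -> Iterable[str]:
        return islice(it, 1, None) if idx == 0 else it
    return [list(f(i, iter(part))) for i, part in enumerate(parts)]


if __name__ == "__main__":
    parts = [["name,age", "alice,30"], ["bob,25", "name,age"]]
    print(drop_header_by_filter(parts))     # [['alice,30'], ['bob,25']]
    print(drop_header_by_partition(parts))  # [['alice,30'], ['bob,25', 'name,age']]
```

The filter approach is simpler but also drops any later line that happens to equal the header; the per-partition approach removes exactly one line, but only if partition 0 really starts with the header.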
As long as first() always returns the same value for a given RDD, I think it's fine, no?

Nick

On Sun Feb 22 2015 at 9:09:01 PM Michael Malak <michaelma...@yahoo.com.invalid> wrote:

> Since RDDs are generally unordered, aren't things like textFile().first()
> not guaranteed to return the first row (such as looking for a header row)?
> If so, doesn't that make the example in
> http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?