Re: textFile() ordering and header rows

2015-02-22 Thread Nicholas Chammas
I guess that, on a technicality, the docs just say "first item in this RDD", not "first line in the source text file". AFAIK there is no way to remove header lines apart from filtering: http://stackoverflow.com/a/24734612/877069. As long as first() always returns the same value for a given RDD, I think it's
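
For illustration, the filtering approach mentioned above could look roughly like the sketch below. It is not taken from the linked answer. The helper name is_not_header and the sample data are hypothetical, and a plain Python list stands in for the RDD so the snippet runs without a Spark installation. The comments show where rdd.first() and rdd.filter() would presumably be used instead.

```python
def is_not_header(header):
    """Build a predicate that rejects any line identical to the header."""
    def predicate(line):
        return line != header
    return predicate


# Hypothetical sample data; a plain list stands in for sc.textFile(path).
lines = ["name,age", "alice,30", "bob,25"]

# With Spark this would be: header = rdd.first()
# This assumes first() really does return the header row for this RDD.
header = lines[0]

# With Spark this would be: rows = rdd.filter(is_not_header(header))
rows = list(filter(is_not_header(header), lines))
```

One caveat with this sketch: any data line that happens to be identical to the header is dropped as well.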

textFile() ordering and header rows

2015-02-22 Thread Michael Malak
Since RDDs are generally unordered, aren't things like textFile().first() not guaranteed to return the first row (such as looking for a header row)? If so, doesn't that make the example in http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?