On a technicality, I guess the docs only say "first item in this RDD", not
"first line in the source text file". AFAIK there is no way to remove header
lines other than filtering:
http://stackoverflow.com/a/24734612/877069
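
For what it's worth, the filtering approach can be sketched roughly as
follows. This is a minimal PySpark-style sketch and is not copied from the
linked answer: drop_header is a hypothetical helper name, and it assumes
`lines` is an RDD of strings whose first() really does return the header,
which is exactly the assumption being questioned in this thread.

```python
def drop_header(lines):
    """Return `lines` without any record equal to its first record.

    `lines` is assumed to be an RDD of strings, e.g. the result of
    sc.textFile(...) on an existing SparkContext `sc`.
    """
    # Assumption: first() returns the header line of the source file.
    header = lines.first()
    # filter() returns a new RDD; the original RDD is left unchanged.
    # Caveat: a data row identical to the header is dropped as well.
    return lines.filter(lambda line: line != header)
```

Usage would look something like drop_header(sc.textFile("data.csv")), where
"data.csv" is only a placeholder path.
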
As long as first() always returns the same value for a given RDD, I think
it's
Since RDDs are generally unordered, isn't something like textFile().first()
not guaranteed to return the first row (for example, when looking for a
header row)? If so, doesn't that make the example in
http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?