Personally I'd find the method useful -- I've often had a .csv file with a header row that I want to drop, so I filter it out, which touches all partitions anyway. I don't have any comments on the implementation quite yet, though.

On Mon, Jul 21, 2014 at 8:24 AM, Erik Erlandson <e...@redhat.com> wrote:
> A few weeks ago I submitted a PR for supporting rdd.drop(n), under
> SPARK-2315:
> https://issues.apache.org/jira/browse/SPARK-2315
>
> Supporting the drop method would make some operations convenient, however
> it forces computation of >= 1 partition of the parent RDD, and so it would
> behave like a "partial action" that returns an RDD as the result.
>
> I wrote up a discussion of these trade-offs here:
>
> http://erikerlandson.github.io/blog/2014/07/20/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
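
To make the trade-off concrete, here is a rough sketch in plain Python that models an RDD as a list of partitions (each a list of lines). It is not Spark's API and not the implementation in the PR; the names `drop`, `filter_header`, and the "scanned" counter are hypothetical, made up only for illustration. It compares how many partitions each approach has to look at: the filter workaround reads every partition, while a drop(n) only needs the leading partitions until n elements have been skipped.

```python
from typing import List, Tuple

# Toy model: an "RDD" is just an ordered list of partitions.
Partitions = List[List[str]]


def filter_header(partitions: Partitions, header: str) -> Tuple[Partitions, int]:
    """Drop the header by filtering every line; scans all partitions."""
    out = [[line for line in part if line != header] for part in partitions]
    return out, len(partitions)


def drop(partitions: Partitions, n: int) -> Tuple[Partitions, int]:
    """Drop the first n elements; scans only the partitions needed to skip n."""
    remaining = n
    scanned = 0
    out: Partitions = []
    for part in partitions:
        if remaining > 0:
            # This partition must be computed to learn how many
            # elements it contributes toward the n to be skipped.
            scanned += 1
            out.append(part[remaining:])
            remaining -= min(remaining, len(part))
        else:
            # Partitions past the drop boundary pass through untouched.
            out.append(part)
    return out, scanned


parts = [["id,name", "1,a"], ["2,b", "3,c"], ["4,d"]]
filtered, filter_scans = filter_header(parts, "id,name")
dropped, drop_scans = drop(parts, 1)
```

In this toy model both calls produce the same data, but `filter_scans` is 3 and `drop_scans` is 1. In real Spark the scan inside `drop` would happen eagerly when the new RDD is defined, which is the "partial action" behavior described in the quoted message.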