I have an RDD that is potentially too large to hold in memory with collect. I want a single task to write its contents as a file to HDFS. Time is not a big issue, but memory is. I do the following, converting my RDD (scans) to a local Iterator. This works, but hasNext shows up as a separate task and takes on the order of 20 seconds for a medium-sized job. *Is toLocalIterator a bad function to call in this case, and is there a better one?*

    public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
        writer.appendHeader(out, getApplication());
        Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
        while (scanIterator.hasNext()) {
            IScoredScan scan = scanIterator.next();
            writer.appendScan(out, getApplication(), scan);
        }
        writer.appendFooter(out, getApplication());
    }
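
As far as I understand it, toLocalIterator brings one partition at a time to the driver and runs a separate job for each partition. That would explain why hasNext shows up as its own task whenever the loop crosses a partition boundary. Below is a Spark-free sketch of that behavior, only to illustrate what I think is happening. The class name `PartitionIteratorSketch`, the method `localIterator`, and the `fetchLog` parameter are hypothetical and not part of the Spark API; each inner list stands in for one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Spark-free stand-in (hypothetical): each inner list plays the role of one RDD partition.
class PartitionIteratorSketch {

    // Lazily walks the partitions, "fetching" the next one only when it is needed.
    // fetchLog records the index of each partition at the moment it is fetched.
    static <T> Iterator<T> localIterator(final List<List<T>> partitions,
                                         final List<Integer> fetchLog) {
        return new Iterator<T>() {
            private int nextPartition = 0;
            private Iterator<T> current = null;

            @Override
            public boolean hasNext() {
                // Crossing a partition boundary is where the expensive fetch happens.
                while ((current == null || !current.hasNext())
                        && nextPartition < partitions.size()) {
                    fetchLog.add(nextPartition);
                    current = partitions.get(nextPartition++).iterator();
                }
                return current != null && current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("scan1", "scan2"),
                Arrays.asList("scan3"),
                Arrays.asList("scan4", "scan5"));
        List<Integer> fetchLog = new ArrayList<>();
        Iterator<String> it = localIterator(partitions, fetchLog);
        while (it.hasNext()) {
            System.out.println(it.next()
                    + " (partitions fetched so far: " + fetchLog.size() + ")");
        }
    }
}
```

If this picture is right, only one partition is held on the driver at a time, which is what I want for memory, but every boundary costs a round trip to the cluster.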