I have an RDD which is potentially too large to store in memory with
collect. I want a single task to write the contents as a file to HDFS. Time
is not a big issue, but memory is.
I use the following code, converting my RDD (scans) to a local Iterator. This
works, but hasNext shows up as a separate task and takes on the order of 20
seconds for a medium-sized job.
Is *toLocalIterator* a bad function to call in this case, and is there a
better one?

public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
    writer.appendHeader(out, getApplication());
    Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
    while (scanIterator.hasNext()) {
        IScoredScan scan = scanIterator.next();
        writer.appendScan(out, getApplication(), scan);
    }
    writer.appendFooter(out, getApplication());
}
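
For comparison, one alternative worth considering, if the header and footer
can be produced separately, is to format each scan on the executors and let
Spark write the file directly to HDFS, so the data never has to pass through
the driver at all. A rough sketch only; formatScan and the output path are
placeholders, not part of the original code:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class ScanExport {

    public static void writeScores(JavaRDD<IScoredScan> scans, String hdfsPath) {
        // Format every scan on the executors.
        JavaRDD<String> lines = scans.map(new Function<IScoredScan, String>() {
            @Override
            public String call(IScoredScan scan) {
                return formatScan(scan); // hypothetical per-scan formatter
            }
        });
        // coalesce(1) forces a single partition, so saveAsTextFile produces a
        // single part file; memory use stays on the executors, not the driver.
        lines.coalesce(1).saveAsTextFile(hdfsPath);
    }

    private static String formatScan(IScoredScan scan) {
        return scan.toString(); // placeholder formatting
    }
}

This avoids the per-element round trips of toLocalIterator, but it does not
append a header or footer, so it only helps if those can be written in a
separate step or prepended afterwards.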
