Hi,

This might be an opportunity to give toLocalIterator a huge speed boost.

The toLocalIterator method fetches the partitions to the driver one by one. This is great. What is not so great is that any computation required for a not-yet-fetched partition is not kicked off until that partition is requested. Effectively, only one partition is being computed at a time, which leaves resources idle and makes the wait longer.

Is this observation correct?

Is it possible to compute all partitions concurrently while retaining the download-one-partition-at-a-time behavior?
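
To make the question concrete, here is a minimal, Spark-free sketch in plain Python. It is only a model of the two behaviors, not Spark code: compute_partition, lazy_local_iterator and eager_local_iterator are hypothetical names I made up for illustration, and the sleep stands in for whatever work a partition needs.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def compute_partition(index):
    """Hypothetical stand-in for the work needed to materialize one partition."""
    time.sleep(0.05)  # pretend this is an expensive computation
    return [index * 10 + i for i in range(3)]


def lazy_local_iterator(num_partitions):
    """Models the behavior I observe: a partition is only computed
    once the consumer has finished with the previous one."""
    for index in range(num_partitions):
        yield from compute_partition(index)


def eager_local_iterator(num_partitions, max_workers=4):
    """Models the behavior I am asking about: all partitions start
    computing up front, but are still handed over one at a time, in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Kick off every partition's computation immediately.
        futures = [pool.submit(compute_partition, i) for i in range(num_partitions)]
        # Hand results to the consumer partition by partition, in order.
        for future in futures:
            yield from future.result()


if __name__ == "__main__":
    for name, make_iterator in [("lazy", lazy_local_iterator),
                                ("eager", eager_local_iterator)]:
        start = time.perf_counter()
        rows = list(make_iterator(4))
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(rows)} rows in {elapsed:.2f}s")
```

Both iterators yield the same rows in the same order, and the consumer still receives one partition at a time. The difference is that in the second variant the remaining partitions are already being computed while the first one is consumed. In this model the finished-but-unfetched results simply wait with the workers; where they would wait in Spark is part of my question.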

Kind regards,
    Erik.

--
Erik van Oosten
http://www.day-to-day-stuff.blogspot.com/

