Hi,
This might be an opportunity to give toLocalIterator a huge speed boost.
The toLocalIterator method fetches the partitions to the driver one by one.
This is great. What is not so great is that any computation required
for the yet-to-be-fetched partitions is not kicked off until that
partition is fetched. Effectively, only one partition is computed at a
time, which leaves resources idle and increases the total wait time.
Is this observation correct?
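To make the observation concrete, here is a small simulation of the behavior
described above. It is only a model, not Spark code: compute_partition and
lazy_local_iterator are hypothetical names, and a sleep stands in for the
work an executor would do. Each partition is computed only when the consumer
asks for it, so the partitions are processed strictly one after another.

```python
import time
from typing import Callable, Iterator, List


def compute_partition(index: int) -> List[int]:
    """Hypothetical stand-in for the work needed to produce one partition."""
    time.sleep(0.1)  # simulate an expensive computation
    return [index * 10 + i for i in range(3)]


def lazy_local_iterator(
    num_partitions: int, compute: Callable[[int], List[int]]
) -> Iterator[int]:
    """Yields rows partition by partition; partition i is only computed
    once the consumer asks for a row after draining partition i - 1."""
    for index in range(num_partitions):
        partition = compute(index)  # nothing else is being computed meanwhile
        yield from partition


if __name__ == "__main__":
    start = time.perf_counter()
    rows = list(lazy_local_iterator(4, compute_partition))
    # Fully serial: the elapsed time is roughly 4 x the cost of one partition.
    print(rows, f"{time.perf_counter() - start:.2f}s")
```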
Is it possible to compute all partitions concurrently while retaining
the download-one-partition-at-a-time behavior?
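Roughly what I have in mind, again as a plain Python sketch rather than
anything Spark actually provides: a thread pool stands in for the cluster,
and prefetching_local_iterator, compute_partition and the window parameter
are all hypothetical. Up to `window` partitions are computed concurrently,
but they are still handed to the consumer strictly in order, one partition
at a time. Bounding the window is an assumption on my part, meant to keep the
number of finished-but-not-yet-consumed partitions (and so driver memory)
limited; setting window to the number of partitions computes everything up
front.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def compute_partition(index: int) -> List[int]:
    """Hypothetical stand-in for the work needed to produce one partition."""
    time.sleep(0.1)  # simulate an expensive computation
    return [index * 10 + i for i in range(3)]


def prefetching_local_iterator(
    num_partitions: int,
    compute: Callable[[int], List[int]],
    window: int = 4,
) -> Iterator[int]:
    """Keeps up to `window` partitions in flight, but yields rows in
    partition order, one partition at a time."""
    # Note: closing the generator early still waits for in-flight work,
    # because leaving the `with` block shuts the pool down with wait=True.
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = deque()
        next_index = 0
        # Start computing the first `window` partitions right away.
        while next_index < num_partitions and len(pending) < window:
            pending.append(pool.submit(compute, next_index))
            next_index += 1
        while pending:
            # Blocks only until *this* partition is ready.
            partition = pending.popleft().result()
            # Top up the window before handing rows to the consumer.
            if next_index < num_partitions:
                pending.append(pool.submit(compute, next_index))
                next_index += 1
            yield from partition


if __name__ == "__main__":
    start = time.perf_counter()
    rows = list(prefetching_local_iterator(4, compute_partition, window=4))
    # Same rows, same order; the four partitions are computed in parallel.
    print(rows, f"{time.perf_counter() - start:.2f}s")
```

With four partitions and a window of four, the simulated wait drops to
roughly the cost of a single partition, while the consumer still sees the
same rows in the same order.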
Kind regards,
Erik.
--
Erik van Oosten
http://www.day-to-day-stuff.blogspot.com/