Re: sc.parallelise to work more like a producer/consumer?

2015-07-30 Thread Kostas Kougios
There is a workaround: sc.parallelize(items, items.size / 2). The second argument is the number of partitions, so this creates partitions of 2 items each, and each executor picks up a batch of 2 items at a time, simulating a producer-consumer. With items.size / 4, each batch holds 4 items.
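A minimal sketch of the idea, assuming a local Spark setup. The object name SmallBatches, the helper numSlicesFor, the local[4] master, and the 1000-item list are all hypothetical and only there for illustration; the helper rounds up so that a list whose size is not a multiple of the batch size still gets partitions of at most batchSize items.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SmallBatches {
  // Hypothetical helper: number of partitions needed so that each
  // partition holds at most batchSize items (rounded up, never below 1).
  def numSlicesFor(itemCount: Int, batchSize: Int): Int =
    math.max(1, math.ceil(itemCount.toDouble / batchSize).toInt)

  def main(args: Array[String]): Unit = {
    // Assumption: local master with 4 cores, just to try this out.
    val conf = new SparkConf().setAppName("small-batches").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val items = (1 to 1000).toVector // stand-in for the real work items
      val batchSize = 2

      // Many small partitions: each partition becomes one task, and a
      // free executor core picks up the next pending task when it is done.
      val rdd = sc.parallelize(items, numSlicesFor(items.size, batchSize))

      // Inspect how many items ended up in each partition.
      val sizes = rdd.mapPartitions(it => Iterator(it.size)).collect()
      println(s"partitions=${sizes.length}, largest=${sizes.max}")
    } finally {
      sc.stop()
    }
  }
}
```

Smaller batches balance the load better but add per-task scheduling overhead, so the batch size is a trade-off to tune for the actual workload.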

sc.parallelise to work more like a producer/consumer?

2015-07-28 Thread Kostas Kougios
Hi, I am using sc.parallelize(...32k items) several times for 1 job. Each executor takes a different amount of time to process its items, so some executors finish quickly and stay idle until the others catch up. Only after all executors complete the first 32k batch, the next batch