Re: Can spark provide an option to start reduce stage early?

2015-02-03 Thread Kay Ousterhout
There's a JIRA tracking this here:
https://issues.apache.org/jira/browse/SPARK-2387

On Mon, Feb 2, 2015 at 9:48 PM, Xuelin Cao xuelincao2...@gmail.com wrote:

 In Hadoop MR, there is an option, *mapred.reduce.slowstart.completed.maps*,
 which can be used to start the reduce stage once X% of the mappers have
 completed. This allows the data shuffle to run in parallel with the map
 phase.

 In a large multi-tenant cluster, this option is usually turned off. But in
 some cases, turning it on could accelerate some high-priority jobs.

 Will Spark provide a similar option?



Can spark provide an option to start reduce stage early?

2015-02-02 Thread Xuelin Cao
In Hadoop MR, there is an option, *mapred.reduce.slowstart.completed.maps*,
which can be used to start the reduce stage once X% of the mappers have
completed. This allows the data shuffle to run in parallel with the map
phase.
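
To make the threshold concrete, here is a minimal sketch of how such a
slow-start fraction could be turned into a scheduling decision. It is only an
illustration: the function names are hypothetical, and the rounding-up rule is
an assumption about how the fraction is applied, not code taken from Hadoop or
Spark.

```python
import math


def maps_required_before_reducers(total_maps: int, slowstart: float) -> int:
    """Return how many map tasks must finish before reducers may be scheduled.

    `slowstart` plays the role of mapred.reduce.slowstart.completed.maps:
    a fraction between 0.0 and 1.0. Rounding up is an assumption.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(total_maps * slowstart)


def reducers_may_start(completed_maps: int, total_maps: int, slowstart: float) -> bool:
    """True once enough map tasks have completed to cross the threshold."""
    return completed_maps >= maps_required_before_reducers(total_maps, slowstart)


if __name__ == "__main__":
    # With a fraction of 0.5 and 200 maps, reducers wait for 100 maps.
    print(maps_required_before_reducers(200, 0.5))
    # With a fraction of 1.0, reducers wait for every map to finish,
    # which is the "turned off" case described in this thread.
    print(reducers_may_start(199, 200, 1.0))
```

Under these assumptions, a fraction of 1.0 means no overlap between the map
and shuffle phases, while a smaller fraction lets reducers begin fetching map
output earlier at the cost of holding reduce slots for longer.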

In a large multi-tenant cluster, this option is usually turned off. But in
some cases, turning it on could accelerate some high-priority jobs.

Will Spark provide a similar option?