Re: ShuffleManager and Speculative Execution

2023-12-21 Thread Mich Talebzadeh
Interesting point.

As I understand it, the key point is that the shuffle machinery ensures only
one map output is consumed by the reduce task, even when multiple attempts
succeed, so it is not a random selection. At the reduce stage, only one copy
of each map output needs to be read. As for which copy: Spark keeps the
output of the attempt that completes (and is registered) first; the output
from the other speculative attempts is ignored. This makes sense, as the
reduce stage can proceed with the earliest available data, minimizing the
impact of speculative execution on job completion time, which is another
important factor.
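
The "first completed attempt wins" behavior described above can be sketched
as a tiny model. To be clear, this is a hedged illustration only: the real
bookkeeping lives inside Spark (e.g. in MapOutputTracker and the scheduler),
and the class and method names below are invented for this sketch:

```python
# Hypothetical, simplified model of "first committed attempt wins".
# This is NOT Spark's actual implementation; names are invented.
class FirstWinsTracker:
    def __init__(self):
        self._committed = {}  # map partition id -> winning attempt id

    def register_map_output(self, partition, attempt_id):
        """Record the first successful attempt for a partition.

        Returns True if this attempt's output was registered, False if an
        earlier attempt already committed and this one is ignored.
        """
        if partition in self._committed:
            return False
        self._committed[partition] = attempt_id
        return True

    def output_for(self, partition):
        return self._committed.get(partition)


tracker = FirstWinsTracker()
# The original attempt finishes first; the speculative copy finishes later.
print(tracker.register_map_output(0, "attempt-0"))       # True
print(tracker.register_map_output(0, "attempt-0-spec"))  # False: ignored
print(tracker.output_for(0))                             # attempt-0
```

For completeness, speculative execution itself is opt-in, controlled by
configuration properties such as spark.speculation, spark.speculation.quantile
and spark.speculation.multiplier.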

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 21 Dec 2023 at 17:51, Enrico Minack wrote:

> Hi Spark devs,
>
> I have a question around ShuffleManager: With speculative execution, one
> map output file is being created multiple times (by multiple task
> attempts). If both attempts succeed, which is to be read by the reduce
> task in the next stage? Is any map output as good as any other?
>
> Thanks for clarification,
> Enrico
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


ShuffleManager and Speculative Execution

2023-12-21 Thread Enrico Minack

Hi Spark devs,

I have a question around ShuffleManager: With speculative execution, one 
map output file is being created multiple times (by multiple task 
attempts). If both attempts succeed, which is to be read by the reduce 
task in the next stage? Is any map output as good as any other?


Thanks for clarification,
Enrico

