Hi,

I don't think the assumption for T2 is correct. For example, if there is a fetch failure (which needs recomputation in an earlier stage) for the first task in a stage, that will cause the stage to be retried, and most of the actual work will happen with `stage-attempt-number` > 0.

I believe the correct way is to do this calculation at the task level: look at which tasks have run multiple times and calculate based on that. But even then, there are things like persist that mean the same task might do different amounts of work, so it might not be clear which CPU time should be counted for the "not wasted" run. A rough sketch of what I mean is below (the listener name, the use of executorRunTime as the time measure, and keeping the longest run per (stageId, partition) as the "not wasted" one are just illustrative choices, not a complete solution):
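```scala
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Rough sketch: group run times by (stageId, partition index) and treat every
// run beyond the one we keep (here: the longest) as wasted. As noted above,
// with persist the re-runs may do different amounts of work, so this is only
// an approximation.
class TaskLevelWasteListener extends SparkListener {
  private val runsPerTask = mutable.Map.empty[(Int, Int), List[Long]]

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val key = (taskEnd.stageId, taskEnd.taskInfo.index)
    // taskMetrics can be null for some failed tasks
    val runTime = Option(taskEnd.taskMetrics).map(_.executorRunTime).getOrElse(0L)
    runsPerTask.synchronized {
      runsPerTask(key) = runTime :: runsPerTask.getOrElse(key, Nil)
    }
  }

  // Total time minus one kept run per task (here: the longest run).
  def wastedTime: Long = runsPerTask.synchronized {
    runsPerTask.values.map(runs => runs.sum - runs.max).sum
  }

  def totalTime: Long = runsPerTask.synchronized {
    runsPerTask.values.map(_.sum).sum
  }
}
```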

/ Emil

On 14/10/2022 14:54, Faiz Halde wrote:
Hello,

We run our Spark workloads on spot instances and would like to quantify the impact of spot interruptions on them. We are proposing the following metric, but would like your opinions on it.

We are leveraging Spark's event listener and performing the following calculation:

T = task

T1 = sum(T.execution-time) for all T where T.status=failed and T.stage-attempt-number = 0

T2 = sum(T.execution-time) for all T where T.stage-attempt-number > 0

Tall = sum(T.execution-time)

Retry% = (T1 + T2) / Tall

The assumptions are:

T1 – if a stage is executing for the first time, then only the tasks that failed were wasted work.

T2 – every task executed for a stage with stage-attempt-number > 0 is a retry, since the stage was already executed previously.
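For reference, a minimal sketch of such a listener (the class name and the use of executorRunTime as the "execution time" are illustrative assumptions):

```scala
import org.apache.spark.Success
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Illustrative listener for the proposed Retry% metric; "execution time" is
// taken to be TaskMetrics.executorRunTime here, which is an assumption.
class RetryMetricListener extends SparkListener {
  private var t1   = 0L // time of failed tasks in a stage's first attempt
  private var t2   = 0L // time of all tasks in later stage attempts
  private var tAll = 0L // time of all tasks

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
    // taskMetrics can be null for some failed tasks
    val runTime = Option(taskEnd.taskMetrics).map(_.executorRunTime).getOrElse(0L)
    tAll += runTime
    val failed = taskEnd.reason != Success
    if (taskEnd.stageAttemptId == 0 && failed) t1 += runTime
    if (taskEnd.stageAttemptId > 0) t2 += runTime
  }

  def retryPercent: Double = synchronized {
    if (tAll == 0) 0.0 else (t1 + t2).toDouble / tAll
  }
}

// Registered on the driver, e.g.:
// spark.sparkContext.addSparkListener(new RetryMetricListener)
```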

