Re: [DISCUSS] Airflow Scheduling Delay Metric Definition

Mehta, Shubham Wed, 08 Jun 2022 14:35:49 -0700

Ping,

I’m very interested in this as well. A good metric can help us benchmark and 
identify potential improvements in scheduler performance.
To understand the proposal better, can you please share where and how 
you intend to use “Scheduling delay”? Is it meant for benchmarking or stress 
testing only? Do you plan to expose it to users in the Airflow UI?

Thanks
Shubham

From: Ping Zhang <pin...@umich.edu>
Reply-To: "dev@airflow.apache.org" <dev@airflow.apache.org>
Date: Wednesday, June 8, 2022 at 11:58 AM
To: "dev@airflow.apache.org" <dev@airflow.apache.org>, "vik...@astronomer.io" 
<vik...@astronomer.io>
Subject: RE: [EXTERNAL][DISCUSS] Airflow Scheduling Delay Metric Definition

Hi Vikram,

Thanks for pointing out 'task latency':

"we define task latency as the time it takes for a task to begin executing once 
its dependencies have been met."

It would be great if you could elaborate on what "begin executing" means and 
how you calculate when "its dependencies have been met".

If 'begin executing' means that the state of the ti becomes running, then the 
'Scheduling Delay' metric focuses on the overhead introduced by the scheduler.

In our prod and stress-test environments, we use the `task_instance_audit` 
table (a new row is created whenever there is a state change in the 
task_instance table) to compute the time at which a ti should be scheduled.
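
As a rough illustration of how state-change rows like these could be turned 
into a delay number, here is a minimal Python sketch. The row shape (`AuditRow` 
with `ti_key`, `state`, `changed_at`) and the helper `transition_delay` are 
hypothetical; the real `task_instance_audit` schema and the exact way the 
"should be scheduled" time is derived are not given in this thread.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Optional


class AuditRow(NamedTuple):
    # Hypothetical shape of one audit row: one row per state change of a ti.
    ti_key: str           # assumed identifier for a task instance
    state: str            # e.g. "scheduled", "queued", "running"
    changed_at: datetime  # assumed timestamp of the state change


def transition_delay(
    rows: Iterable[AuditRow], ti_key: str, from_state: str, to_state: str
) -> Optional[timedelta]:
    """Time between the first `from_state` row and the first later
    `to_state` row for one ti, or None if either row is missing."""
    ordered = sorted(
        (r for r in rows if r.ti_key == ti_key), key=lambda r: r.changed_at
    )
    start = next((r.changed_at for r in ordered if r.state == from_state), None)
    if start is None:
        return None
    end = next(
        (r.changed_at for r in ordered
         if r.state == to_state and r.changed_at >= start),
        None,
    )
    return None if end is None else end - start
```

For example, `transition_delay(rows, "my_dag.my_task", "scheduled", "running")` 
would give the time a ti spent between those two states, under the assumptions 
above.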

Thanks,

Ping

On Wed, Jun 8, 2022 at 11:25 AM Vikram Koka <vik...@astronomer.io.invalid> 
wrote:
Ping,

I am quite interested in this topic and am trying to understand the difference 
between the "scheduling delay" metric you describe and the "task latency" 
(aka "task lag") metric which we have been using before.

As you may recall, we have been using two specific metrics to benchmark 
Scheduler performance since Airflow 2.0: "task latency" and "task throughput".
These were described in the 2.0 Scheduler blog post 
<https://www.astronomer.io/blog/airflow-2-scheduler/>.
Specifically, in that post we defined task latency as the time it takes for a 
task to begin executing once its dependencies are all met.
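
One possible reading of this definition, written out as a small Python sketch: 
treat "dependencies are all met" as the moment the last upstream task finishes, 
and "begin executing" as the task's start time. Both readings are assumptions 
made for illustration (the function `task_latency` and its parameters are 
hypothetical, not Airflow APIs), and they ignore other constraints such as 
pools or concurrency limits.

```python
from datetime import datetime, timedelta
from typing import Sequence


def task_latency(
    upstream_end_times: Sequence[datetime], start_time: datetime
) -> timedelta:
    """Latency under the assumption that dependencies are met when the
    last upstream task ends and execution begins at `start_time`."""
    if not upstream_end_times:
        raise ValueError("need at least one upstream end time")
    # Assumed meaning of "dependencies met": the latest upstream finish.
    deps_met_at = max(upstream_end_times)
    return start_time - deps_met_at
```

A task with no upstream tasks would need a different reference point (for 
instance, when its DAG run was created), which this sketch deliberately leaves 
out.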

Thanks,
Vikram

On Wed, Jun 8, 2022 at 10:25 AM Ping Zhang 
<pin...@umich.edu> wrote:
Hi Airflow Community,

Airflow is a scheduling platform for data pipelines; however, there is no good 
metric to measure scheduling delay in either production or stress-test 
environments. This makes it hard to catch regressions in the scheduler 
during the stress-test stage.

I would like to propose a definition for an Airflow scheduling delay metric. 
Here is the detailed design of the metric and its implementation:
https://docs.google.com/document/d/1NhO26kgWkIZJEe50M60yh_jgROaU84dRJ5qGFqbkNbU/edit?usp=sharing

Please take a look; any feedback is welcome.

Thanks,

Ping
