Hi Lionel,

Great questions. Most of the answers come down to historical reasons.

Getting run_type from run_id: this should only be used for back-compat -- the run_type column didn't use to exist (it was only added about 6-9 months ago, from my rough memory), but going forward the "prefix" on run_id has no meaning anymore; run_type is all that matters.
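
As a rough illustration of that back-compat path, the sketch below prefers a stored run_type and only falls back to parsing the run_id prefix for legacy rows. The `RunType` enum, the `__` separator and `resolve_run_type` are hypothetical names chosen for this example, not Airflow's actual code.

```python
import enum
from typing import Optional


class RunType(str, enum.Enum):
    # Hypothetical stand-in for the values stored in the run_type column.
    BACKFILL = "backfill"
    SCHEDULED = "scheduled"
    MANUAL = "manual"


def resolve_run_type(run_type: Optional[str], run_id: str) -> RunType:
    """Prefer the run_type column; parse the run_id prefix only for legacy rows."""
    if run_type is not None:
        return RunType(run_type)
    # Legacy fallback: assumes old run_ids look like "<type>__<timestamp>".
    for candidate in RunType:
        if run_id.startswith(f"{candidate.value}__"):
            return candidate
    # Anything without a recognised prefix is treated as manual in this sketch.
    return RunType.MANUAL
```

With this shape, a run_id that a user picked freely never influences the run type as long as the column is populated.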

run_id vs execution_date: I have plans (and I'm slowly working towards this) to make execution_date /not/ unique on the dag_run. For example, let's say you have two (or n) models you want to try out to see which performs better. To really compare them you need them to operate on the same data, so ideally that means the same execution_date.

run_id is just meant to be that -- an identifier. Its exact value holds _no_ meaning to Airflow anymore, and we are free to have it take whatever value makes most sense to a user.
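
A minimal sketch of the idea, assuming the planned change where execution_date is no longer unique: runs are keyed by (dag_id, run_id), so two runs can share one execution_date. The `Run` dataclass, `index_runs` and the `model_a`/`model_b` identifiers are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Run:
    dag_id: str
    run_id: str  # opaque identifier; its value carries no meaning
    execution_date: datetime


def index_runs(runs):
    """Index runs by (dag_id, run_id) and reject duplicate identifiers."""
    by_key = {}
    for run in runs:
        key = (run.dag_id, run.run_id)
        if key in by_key:
            raise ValueError(f"duplicate run_id: {key}")
        by_key[key] = run
    return by_key


# Two model runs over the same data interval, distinguished only by run_id.
same_date = datetime(2021, 8, 18, tzinfo=timezone.utc)
runs = index_runs([
    Run("train_model", "model_a", same_date),
    Run("train_model", "model_b", same_date),
])
```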

As to your suggestions:
1. Yes, clearer docs would always be good.
2. Yes, we should avoid doing this. Do we still do this anywhere?
3. As per your PR, I think making the behaviour configurable makes sense -- some Airflow installs operate "across" more than one TZ, so having them all be UTC might be a good option there.
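
To make point 3 concrete, here is a small sketch of a configurable display zone that defaults to UTC. `format_run_timestamp` and its `display_tz` parameter are hypothetical names, not an existing Airflow setting, and the fixed +08:00 offset simply mirrors the example in the original message.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def format_run_timestamp(moment: datetime, display_tz: tzinfo = timezone.utc) -> str:
    """Render an aware datetime in the configured zone (UTC by default)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(display_tz).strftime("%Y-%m-%d %H:%M:%S")


# The run is stored in UTC; only the rendered label changes with the setting.
triggered = datetime(2020, 8, 18, 2, 10, 0, tzinfo=timezone.utc)
utc_label = format_run_timestamp(triggered)  # "2020-08-18 02:10:00"
local_label = format_run_timestamp(triggered, timezone(timedelta(hours=8)))  # "2020-08-18 10:10:00"
```

Installs that span several time zones would keep the UTC default, while a single-zone install could opt into local time.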

Thanks,
Ash


On Wed, Aug 18 2021 at 10:33:41 +0800, Lionel Zhao <[email protected]> wrote:
Hi guys,

When I use Airflow, I find that the dag `run_id` shown on the page is in UTC, while my time zone is +8:00. This makes it quite hard for me to tell exactly which run is which.

For example, I trigger a dag run at ‘2020-08-18 10:10:00’ but the dag `run_id` is `2020-08-18 02:10:00`.

So I created a PR here: https://github.com/apache/airflow/pull/17502 to localize the dag `run_id`. The PR is still WIP.

But I think we should have a discussion about the `run_id`. I became quite confused about the `run_id` definition when I read the source.

There are 2 points:

First, most of the time we use the `execution_date` to query the dag_runs, and there is also a UNIQUE_KEY(dag_id + execution_date), so why do we still need another key to query? In fact, the `execution_date` could already serve as the `run_id`, and we would not need a separate `run_id`. If the purpose of the `run_id` is to tell the user exactly when the task ran, it is in UTC, which is very hard for users to use.

Second, I saw that in some places we get the run_type from the `run_id`, but we have not set a clear rule for the `run_id`. This is a risk for the future, because it is a hidden rule of the dag `run_id`.

My suggestions:

1. We should clarify the definition of the `run_id` and set a clear rule for it.

2. Avoid getting the `run_type` from the `run_id` and only use the `run_type` in the dag_run.

3. Change the `run_id` to local time so the user can easily see the exact run time.

This is just to open a wider discussion; let me know what you think.

Thanks a lot





From,

Lionel Zhao