Hi Ash,
Thanks for the response.

About suggestion 2:

> 2. Yes, we should avoid doing this. Do we still do this anywhere?

Actually, I haven't fully understood the `airflow` source code yet, so I'm not sure. Maybe we can add a check to make sure we no longer derive the `run_type` from the `run_id` anywhere.

About suggestion 3:

> 3. Localize the `run_id`

I think we should consider all the use cases for the project. If an installation spans more than one `TZ`, the `run_id` should use UTC. But if all users work in a single `TZ`, I think we should give them an option to localize the `run_id`. So in the PR, I added a config option that lets users choose.

On 2021/08/18 21:43:44, Ash Berlin-Taylor <[email protected]> wrote:
> Hi Lionel,
>
> Great questions, most of them are for historic reasons.
>
> Getting run_type from run_id: should only be used for back-compat --
> the run_type column didn't use to exist (it was only added about 6-9
> months ago from my rough memory) but going forward the "prefix" on
> run_id has no meaning anymore, run_type is all that matters.
>
> run_id vs execution_date: I have plans (and I'm slowly working towards
> this) to make execution_date /not/ unique on the dag_run. For example,
> let's say you have two (or n) models you want to try out and see which
> performs better. To really compare them you need them to operate on the
> same data, so ideally that means the same execution_date.
>
> run_id is just meant to be that -- an identifier. Its exact value
> holds _no_ meaning to Airflow anymore, and we are free to have it take
> whatever value makes most sense to a user.
>
> As to your suggestions:
> 1. Yes, clearer docs would always be good
> 2. Yes, we should avoid doing this. Do we still do this anywhere?
> 3. As per your PR, I think making the behaviour configurable makes
> sense -- as some Airflow installs operate "across" more than one TZ, so
> having them all be UTC might be a good option there.
>
> Thanks,
> Ash
>
> On Wed, Aug 18 2021 at 10:33:41 +0800, Lionel Zhao
> <[email protected]> wrote:
> > Hi guys,
> >
> > When I use Airflow, I found that the DAG `run_id` shown on the page is
> > in UTC, while my time zone is +08:00. This makes it quite hard to tell
> > exactly which run is which.
> >
> > For example, I triggered a DAG run at '2020-08-18 10:10:00', but the
> > DAG `run_id` is `2020-08-18 02:10:00`.
> >
> > So I created a PR to localize the DAG `run_id`:
> > https://github.com/apache/airflow/pull/17502 (it is still a WIP).
> >
> > But I think we should also discuss the `run_id` itself. The `run_id`
> > definition confused me quite a bit when I checked the source code.
> >
> > There are 2 points:
> >
> > 1. Most of the time we use the `execution_date` to query dag_runs,
> > and there is also a UNIQUE KEY (dag_id + execution_date), so why do
> > we need another key to query by? In fact, the `execution_date` could
> > already serve as the `run_id`, and we wouldn't need a separate
> > `run_id`. If the `run_id` is meant to tell the user when the task
> > actually ran, it is in UTC, which is very hard for users to use.
> >
> > 2. I saw that in some places we get the run_type from the `run_id`,
> > but we haven't set a clear rule for the `run_id`. This will be a risk
> > in the future because it is a hidden rule of the DAG `run_id`.
> >
> > My suggestions:
> >
> > 1. Clarify the definition of the `run_id` and make a clear rule
> > for it.
> >
> > 2. Avoid getting the `run_type` from the `run_id` and only use the
> > `run_type` in the dag_run.
> >
> > 3. Change the `run_id` to local time so users can easily tell the
> > exact run time.
> >
> > This is just for a wider discussion; let me know what you think.
> >
> > Thanks a lot
> >
> > From,
> > Lionel Zhao
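To make suggestion 3 concrete, here is a minimal sketch of the configurable behaviour discussed in this thread. It assumes generated run_ids take the shape `<run_type>__<ISO timestamp>` (e.g. `manual__...`); `build_run_id` and its `localize` flag are hypothetical stand-ins for the config option in the PR, not Airflow's actual API. The run_type is passed in explicitly rather than parsed back out of the run_id, in line with suggestion 2.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def build_run_id(run_type: str, logical_date: datetime,
                 localize: bool = False, local_tz: tzinfo = timezone.utc) -> str:
    """Build a run_id of the form '<run_type>__<ISO timestamp>'.

    Hypothetical helper: `localize` stands in for the proposed config option.
    When False (the default), the timestamp is rendered in UTC, which suits
    installs spanning several time zones; when True, it is rendered in
    `local_tz`. Only the identifier text changes; the datetime passed in
    is not modified.
    """
    if logical_date.tzinfo is None:
        # A naive datetime is ambiguous, so refuse to guess its zone.
        raise ValueError("logical_date must be timezone-aware")
    target_tz = local_tz if localize else timezone.utc
    return f"{run_type}__{logical_date.astimezone(target_tz).isoformat()}"


# The example from the thread: a run triggered at 10:10 in UTC+8.
utc_plus_8 = timezone(timedelta(hours=8))
triggered = datetime(2020, 8, 18, 10, 10, tzinfo=utc_plus_8)

print(build_run_id("manual", triggered))
# manual__2020-08-18T02:10:00+00:00
print(build_run_id("manual", triggered, localize=True, local_tz=utc_plus_8))
# manual__2020-08-18T10:10:00+08:00
```

Keeping UTC as the default matches Ash's point that installs operating across several time zones are better served by UTC, while a single-zone install can opt in to the localized form.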
