Hi folks, To follow up the earlier discussion, I created a PR https://github.com/apache/airflow/pull/37778, in which I implemented the idea I had in mind in the very beginning:
- Add custom TI Dep(s), which can be used by the Scheduling mechanism so that the TI Scheduling decision will consider these custom TI Dep(s). - The design is that: the DagRun will still always be created as normal, then its “task_instance_scheduling_decisions()” method will consider the custom TI Dep(s) and we can achieve more flexible scheduling at TI level. I understand the folks have concern about the potential impact on performance. The design I propose here should be explicit enough, so that the user/admin who is trying to use this feature will most likely be sure about what they are doing. Any feedback/input would be greatly appreciated, and we can continue the discussion either here or in the PR. Many thanks! Regards, XD > On Feb 5, 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote: > > I had my answer drafted before Ash sent it so I will copy it here. But I > very much agree with Ash's assessment. > > I did not realise we could do that with task_policy but if we can, then I > would go for it - having it undocumented feature for power users. Making it > a feature is a big task that would require a lot of "surrounding" stuff to > make it safe to use. > > Here is what I wrote before Ash answered: > > The ti-deps we have are already composable and foreseen for extensions, so > generally speaking, yes it would be a possibility to add it (as XD > mentioned as plugins for security - similar to timetables). > > Yes, with sufficient performance guide rails (metrics and at least some > guidelines on how to develop it) it could be an interesting (although > pretty niche IMHO) addition. Specifically helping/diagnosing issues of > someone who has custom dependencies with some performance/coding problems > will be very difficult. > > I think if we are to learn from Custom Timetables - as far as I understand > (but maybe I am wrong) there are very few users who use them, one of the > reasons is that they are often difficult to get right and the docs we have > do not explain how to develop and test them, just showing some basic > examples. > > I'd say for me an important part of such change is if the change comes up > with a number of predefined, reusable (but non-standard) ti_dependencies > showing how to develop and implement such custom deps and likely a bit more > than just showing a "plain" example. Also with a set of performance > guidelines on how to develop new ones (what to avoid), showing examples > and some tooling to test them. > > Coming with those real-life, useful (but disable-by-default) implemented > TI_Deps would be a really important part - showing the actual need we have > for it now.. > > I think **just** allowing to inject a ti_dep into current composable deps > is quite a bit too little to open it as "ready to use feature". > > J. > > > > On Mon, Feb 5, 2024 at 3:08 PM Ash Berlin-Taylor <a...@apache.org> wrote: > >> This is an absolute minefield of performance issues as this is the very >> centre of the hot path for the scheduler, so I would advise against any >> casual use of this. >> >> That said: I think it’s possible today. Set `deps` on your operator class >> like >> https://github.com/apache/airflow/blob/main/airflow/models/baseoperator.py#L1128-L1135 >> (I don’t know if it can be set per operator instance or only per operator >> class.) So you can set this from the task_policy. >> >> I would like us no_ to mention this in the docs as I think it’s the 1% of >> 1% of users who can do this without blowing their own foot off. Not until >> we have some good examples of how to do this well from outside of core. >> >> (Not to mention the mentioned caveats about this being “user/operator” >> code execution and all that entails for any work on a multi-tenant >> scheduler in the future) >> >> -ash >> >>> On 5 Feb 2024, at 04:09, Amogh Desai <amoghdesai....@gmail.com> wrote: >>> >>> I already like the idea. >>> >>> It can indeed offer better flexibility and control over the scheduling >>> process beyond the default checks >>> >>> My main concern with the approach is in line with what Jens mentioned, >>> performance but I like yout mitigation through a metric. I'd suggest >>> that the metric should have a threshold value, beyond which it is not at >>> all advised to use the current custom TI. >>> >>> My other concern would be upgrades. These dependency classes could become >>> outdated with version upgrades or airflow. Users should >>> be willing to rewrite / update their custom code based on the airflow >>> version they upgrade to. >>> >>> Thanks & Regards, >>> Amogh Desai >>> >>> >>> >>> >>> On Sun, Feb 4, 2024 at 5:58 AM Xiaodong (XD) DENG >> <xd.d...@apple.com.invalid> >>> wrote: >>> >>>> Hi Jens, >>>> >>>> Many thanks for the feedback. >>>> >>>> I fully agree on the potential performance impact if the custom >> dependency >>>> is not written in the ideal way, and thanks for bringing it up. The deps >>>> are indeed called very frequently, so attention should be paid to ensure >>>> the custom dependency is not dragging the leg. >>>> >>>> This is actually somehow like writing DAG files in Airflow, attention >>>> should be paid otherwise the DAG processor performance would be >> impacted by >>>> the “bad” DAG(s). >>>> >>>> Other than highlighting this to users in the documentation, maybe we >>>> should also consider adding a metric to measure the custom dependency >>>> performance (just like the DAG parsing speed), and give warning in logs >> if >>>> it’s not performant. >>>> >>>> Regarding AIP, I personally don’t find it necessary as this is not going >>>> to be a too big change or significantly touching the architecture. But I >>>> would definitely welcome more inputs here in this email thread. >>>> >>>> Thanks again! >>>> >>>> >>>> XD >>>> >>>>> On Feb 3, 2024, at 00:55, Scheffler Jens (XC-AS/EAE-ADA-T) < >>>> jens.scheff...@de.bosch.com.INVALID> wrote: >>>>> >>>>> I like the idea. >>>>> >>>>> I‘d propose to write-up an AIP so that details of the concept can be >>>> discussed before raising a PR if you feel details might need to be >> aligned. >>>>> >>>>> My concern primarily would be that if somebody uses this and uses >>>> external data or parameters from DB that this could bring-down >> performance >>>> or reduce availability for scheduling because the code might be called >>>> often. Bit this should be just documented and sit in users/admin >>>> responsibility. >>>>> >>>>> Sent from Outlook for iOS<https://aka.ms/o0ukef> >>>>> ________________________________ >>>>> From: Pierre Jeambrun <pierrejb...@gmail.com> >>>>> Sent: Friday, February 2, 2024 11:26:47 PM >>>>> To: dev@airflow.apache.org <dev@airflow.apache.org> >>>>> Subject: Re: Idea for Discussion: custom TI dependencies >>>>> >>>>> I thought that it would allow users to code their own rules and provide >>>>> them to the tasks somehow. If this is restricted to admin or deployment >>>>> managers through plugins or other mechanism, I guess there is no issue >>>> here. >>>>> >>>>> Thanks for answering my concerns. >>>>> >>>>> On Fri 2 Feb 2024 at 22:53, Constance Martineau >>>>> <consta...@astronomer.io.invalid> wrote: >>>>> >>>>>> Not missing anything. It is mainly used for deferrable operators, but >>>> feels >>>>>> like an acceptable place to run user-defined/non-Airflow code portion >> of >>>>>> this, and carry out the custom dependency checks. Assuming of course >>>> that >>>>>> these checks are meant to check external conditions and will never >> need >>>>>> access to Airflow's DB. Once the condition is met, then the scheduler >>>> could >>>>>> carry on scheduling the task. >>>>>> >>>>>> I like the idea, and it was just a thought on how to get around the >>>>>> security concern Pierre brought up. >>>>>> >>>>>> On Fri, Feb 2, 2024 at 4:35 PM Xiaodong (XD) DENG >>>>>> <xd.d...@apple.com.invalid> >>>>>> wrote: >>>>>> >>>>>>> Thanks both for your inputs! >>>>>>> >>>>>>> >>>>>>> Hi Pierre, >>>>>>> >>>>>>> I think the key difference here is: by doing this, we are not >> allowing >>>>>>> Airflow “users” to run their code in scheduler. We are only allowing >>>>>>> Airflow “Admins” to deploy a plugin to run in scheduler, just the >> same >>>> as >>>>>>> dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook. >>>>>>> >>>>>>> So I do not think this violates our current preference in terms of >>>>>>> security. >>>>>>> >>>>>>> >>>>>>> Hi Constance, >>>>>>> >>>>>>> I thought the trigger is mainly for deferrable operator cases? It’s >>>> quite >>>>>>> different scenario from what I’m trying to cover here IMHO. >>>>>>> Did I miss anything? Please let me know. >>>>>>> >>>>>>> >>>>>>> Thanks again! Looking forward to more questions/comments! >>>>>>> >>>>>>> >>>>>>> XD >>>>>>> >>>>>>> >>>>>>>> On Feb 2, 2024, at 13:29, Constance Martineau < >>>> consta...@astronomer.io >>>>>> .INVALID> >>>>>>> wrote: >>>>>>>> >>>>>>>> Naive question: Instead of running the code on the scheduler - could >>>>>> the >>>>>>>> condition check be delegated to the triggerer? >>>>>>>> >>>>>>>> On Fri, Feb 2, 2024 at 2:33 PM Pierre Jeambrun < >> pierrejb...@gmail.com >>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> But maybe it’s time to reconsider that :), curious to see what >> others >>>>>>>>> think. >>>>>>>>> >>>>>>>>> On Fri 2 Feb 2024 at 20:30, Pierre Jeambrun <pierrejb...@gmail.com >>> >>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I like the idea and I understand that it might help in some use >>>>>> cases. >>>>>>>>>> >>>>>>>>>> The first concern that I have is that it would allow user code to >>>> run >>>>>>> in >>>>>>>>>> the scheduler, if I understand correctly. This would have big >>>>>>>>> implications >>>>>>>>>> in terms of security and how our security model works. (For >> instance >>>>>>> the >>>>>>>>>> scheduler is a trusted component and has direct access to the DB, >>>>>>> AIP-44 >>>>>>>>>> assumption) >>>>>>>>>> >>>>>>>>>> If I remember correctly this is a route that we specifically tried >>>> to >>>>>>>>> stay >>>>>>>>>> away from. >>>>>>>>>> >>>>>>>>>> On Fri 2 Feb 2024 at 20:03, Xiaodong (XD) DENG >>>>>>> <xd.d...@apple.com.invalid >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi folks, >>>>>>>>>>> >>>>>>>>>>> I’m writing to share my thought regarding the possibility of >>>>>>> supporting >>>>>>>>>>> “custom TI dependencies”. >>>>>>>>>>> >>>>>>>>>>> Currently we maintain the dependency check rules under >>>>>>>>>>> “airflow.ti_deps.deps". They cover the dependency checks like if >>>>>> there >>>>>>>>> are >>>>>>>>>>> available pool slot/if the concurrency allows/TI trigger rules/if >>>>>> the >>>>>>>>> state >>>>>>>>>>> is valid, etc., and play essential role in the scheduling >> process. >>>>>>>>>>> >>>>>>>>>>> One idea was brought up in our team's internal discussion: why >>>>>>> shouldn’t >>>>>>>>>>> we support custom TI dependencies? >>>>>>>>>>> >>>>>>>>>>> In details: just like the cluster policies >>>>>>>>>>> >>>>>>> >> (dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook), >>>>>>>>> if >>>>>>>>>>> we support users add their own dependency checks as custom >> classes >>>>>>> (and >>>>>>>>>>> also put under airflow_local_settings.py), it will allow users to >>>>>> have >>>>>>>>> much >>>>>>>>>>> higher flexibility in the TI scheduling. These custom TI deps >>>> should >>>>>>> be >>>>>>>>>>> added as additions to the existing default deps (not replacing or >>>>>>>>> removing >>>>>>>>>>> any of them). >>>>>>>>>>> >>>>>>>>>>> For example: similar to check for pool availability/concurrency, >>>> the >>>>>>> job >>>>>>>>>>> may need to check for user’s infra-specific conditions, like if a >>>>>> GPU >>>>>>> is >>>>>>>>>>> available right now (instead of competing with other jobs >>>> randomly), >>>>>>> or >>>>>>>>> if >>>>>>>>>>> an external system API is ready to be called (otherwise wait a >> bit >>>>>> ). >>>>>>>>> And a >>>>>>>>>>> lot more other possibilities. >>>>>>>>>>> >>>>>>>>>>> Why cluster policies won’t help here? >> task_instance_mutation_hook >>>>>> is >>>>>>>>>>> executed in a “worker”, not in the DAG file processor, just >> before >>>>>> the >>>>>>>>> TI >>>>>>>>>>> is executed. What we are trying to gain some control here, >> though, >>>>>> is >>>>>>> in >>>>>>>>>>> the scheduling process (based on custom rules, to decide if the >> TI >>>>>>> state >>>>>>>>>>> should be updated so it can be scheduled for execution). >>>>>>>>>>> >>>>>>>>>>> I would love to know how community finds this idea, before we >> start >>>>>> to >>>>>>>>>>> implement anything. Any quesiton/suggestion would be greatly >>>>>>>>> appreciated. >>>>>>>>>>> Many thanks! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> XD >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>>>> >>>>>>> >>>>>> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>> >>>> >> >>