Hi folks, 

To follow up the earlier discussion, I created a PR 
https://github.com/apache/airflow/pull/37778, in which I implemented the idea I 
had in mind in the very beginning:

- Add custom TI Dep(s), which can be used by the Scheduling mechanism so that 
the TI Scheduling decision will consider these custom TI Dep(s). 
- The design is that: the DagRun will still always be created as normal, then 
its “task_instance_scheduling_decisions()” method will consider the custom TI 
Dep(s) and we can achieve more flexible scheduling at TI level.

I understand the folks have concern about the potential impact on performance. 
The design I propose here should be explicit enough, so that the user/admin who 
is trying to use this feature will most likely be sure about what they are 
doing.

Any feedback/input would be greatly appreciated, and we can continue the 
discussion either here or in the PR.
Many thanks!


Regards,
XD


> On Feb 5, 2024, at 06:29, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> I had my answer drafted before Ash sent it so I will copy it here. But I
> very much agree with Ash's assessment.
> 
> I did not realise we could do that with task_policy but if we can, then I
> would go for it - having it undocumented feature for power users. Making it
> a feature is a big task that would require a lot of "surrounding" stuff to
> make it safe to use.
> 
> Here is what I wrote before Ash answered:
> 
> The ti-deps we have are already composable and foreseen for extensions, so
> generally speaking, yes it would be a possibility to add it (as XD
> mentioned as plugins for security - similar to timetables).
> 
> Yes, with sufficient performance guide rails (metrics and at least some
> guidelines on how to develop it) it could be an interesting (although
> pretty niche IMHO) addition. Specifically helping/diagnosing issues of
> someone who has custom dependencies with some performance/coding problems
> will be very difficult.
> 
> I think if we are to learn from Custom Timetables - as far as I understand
> (but maybe I am wrong) there are very few users who use them, one of the
> reasons is that they are often difficult to get right and the docs we have
> do not explain how to develop and test them, just showing some basic
> examples.
> 
> I'd say for me an important part of such change is if the change comes up
> with a number of  predefined, reusable (but non-standard) ti_dependencies
> showing how to develop and implement such custom deps and likely a bit more
> than just showing a "plain" example. Also with a set of performance
> guidelines on how to develop new ones (what to avoid), showing  examples
> and some tooling to test them.
> 
> Coming with those real-life, useful (but disable-by-default) implemented
> TI_Deps would be a really important part - showing the actual need we have
> for it now..
> 
> I think **just** allowing to inject a ti_dep into current composable deps
> is quite a bit too little to open it as "ready to use feature".
> 
> J.
> 
> 
> 
> On Mon, Feb 5, 2024 at 3:08 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> 
>> This is an absolute minefield of performance issues as this is the very
>> centre of the hot path for the scheduler, so I would advise against any
>> casual use of this.
>> 
>> That said: I think it’s possible today. Set `deps` on your operator class
>> like
>> https://github.com/apache/airflow/blob/main/airflow/models/baseoperator.py#L1128-L1135
>> (I don’t know if it can be set per operator instance or only per operator
>> class.) So you can set this from the task_policy.
>> 
>> I would like us no_ to mention this in the docs as I think it’s the 1% of
>> 1% of users who can do this without blowing their own foot off. Not until
>> we have some good examples of how to do this well from outside of core.
>> 
>> (Not to mention the mentioned caveats about this being “user/operator”
>> code execution and all that entails for any work on a multi-tenant
>> scheduler in the future)
>> 
>> -ash
>> 
>>> On 5 Feb 2024, at 04:09, Amogh Desai <amoghdesai....@gmail.com> wrote:
>>> 
>>> I already like the idea.
>>> 
>>> It can indeed offer better flexibility and control over the scheduling
>>> process beyond the default checks
>>> 
>>> My main concern with the approach is in line with what Jens mentioned,
>>> performance but I like yout mitigation through a metric. I'd suggest
>>> that the metric should have a threshold value, beyond which it is not at
>>> all advised to use the current custom TI.
>>> 
>>> My other concern would be upgrades. These dependency classes could become
>>> outdated with version upgrades or airflow. Users should
>>> be willing to rewrite / update their custom code based on the airflow
>>> version they upgrade to.
>>> 
>>> Thanks & Regards,
>>> Amogh Desai
>>> 
>>> 
>>> 
>>> 
>>> On Sun, Feb 4, 2024 at 5:58 AM Xiaodong (XD) DENG
>> <xd.d...@apple.com.invalid>
>>> wrote:
>>> 
>>>> Hi Jens,
>>>> 
>>>> Many thanks for the feedback.
>>>> 
>>>> I fully agree on the potential performance impact if the custom
>> dependency
>>>> is not written in the ideal way, and thanks for bringing it up. The deps
>>>> are indeed called very frequently, so attention should be paid to ensure
>>>> the custom dependency is not dragging the leg.
>>>> 
>>>> This is actually somehow like writing DAG files in Airflow, attention
>>>> should be paid otherwise the DAG processor performance would be
>> impacted by
>>>> the “bad” DAG(s).
>>>> 
>>>> Other than highlighting this to users in the documentation, maybe we
>>>> should also consider adding a metric to measure the custom dependency
>>>> performance (just like the DAG parsing speed), and give warning in logs
>> if
>>>> it’s not performant.
>>>> 
>>>> Regarding AIP, I personally don’t find it necessary as this is not going
>>>> to be a too big change or significantly touching the architecture. But I
>>>> would definitely welcome more inputs here in this email thread.
>>>> 
>>>> Thanks again!
>>>> 
>>>> 
>>>> XD
>>>> 
>>>>> On Feb 3, 2024, at 00:55, Scheffler Jens (XC-AS/EAE-ADA-T) <
>>>> jens.scheff...@de.bosch.com.INVALID> wrote:
>>>>> 
>>>>> I like the idea.
>>>>> 
>>>>> I‘d propose to write-up an AIP so that details of the concept can be
>>>> discussed before raising a PR if you feel details might need to be
>> aligned.
>>>>> 
>>>>> My concern primarily would be that if somebody uses this and uses
>>>> external data or parameters from DB that this could bring-down
>> performance
>>>> or reduce availability for scheduling because the code might be called
>>>> often. Bit this should be just documented and sit in users/admin
>>>> responsibility.
>>>>> 
>>>>> Sent from Outlook for iOS<https://aka.ms/o0ukef>
>>>>> ________________________________
>>>>> From: Pierre Jeambrun <pierrejb...@gmail.com>
>>>>> Sent: Friday, February 2, 2024 11:26:47 PM
>>>>> To: dev@airflow.apache.org <dev@airflow.apache.org>
>>>>> Subject: Re: Idea for Discussion: custom TI dependencies
>>>>> 
>>>>> I thought that it would allow users to code their own rules and provide
>>>>> them to the tasks somehow. If this is restricted to admin or deployment
>>>>> managers through plugins or other mechanism, I guess there is no issue
>>>> here.
>>>>> 
>>>>> Thanks for answering my concerns.
>>>>> 
>>>>> On Fri 2 Feb 2024 at 22:53, Constance Martineau
>>>>> <consta...@astronomer.io.invalid> wrote:
>>>>> 
>>>>>> Not missing anything. It is mainly used for deferrable operators, but
>>>> feels
>>>>>> like an acceptable place to run user-defined/non-Airflow code portion
>> of
>>>>>> this, and carry out the custom dependency checks. Assuming of course
>>>> that
>>>>>> these checks are meant to check external conditions and will never
>> need
>>>>>> access to Airflow's DB. Once the condition is met, then the scheduler
>>>> could
>>>>>> carry on scheduling the task.
>>>>>> 
>>>>>> I like the idea, and it was just a thought on how to get around the
>>>>>> security concern Pierre brought up.
>>>>>> 
>>>>>> On Fri, Feb 2, 2024 at 4:35 PM Xiaodong (XD) DENG
>>>>>> <xd.d...@apple.com.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks both for your inputs!
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Pierre,
>>>>>>> 
>>>>>>> I think the key difference here is: by doing this, we are not
>> allowing
>>>>>>> Airflow “users” to run their code in scheduler. We are only allowing
>>>>>>> Airflow “Admins” to deploy a plugin to run in scheduler, just the
>> same
>>>> as
>>>>>>> dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook.
>>>>>>> 
>>>>>>> So I do not think this violates our current preference in terms of
>>>>>>> security.
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Constance,
>>>>>>> 
>>>>>>> I thought the trigger is mainly for deferrable operator cases? It’s
>>>> quite
>>>>>>> different scenario from what I’m trying to cover here IMHO.
>>>>>>> Did I miss anything? Please let me know.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks again! Looking forward to more questions/comments!
>>>>>>> 
>>>>>>> 
>>>>>>> XD
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 2, 2024, at 13:29, Constance Martineau <
>>>> consta...@astronomer.io
>>>>>> .INVALID>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Naive question: Instead of running the code on the scheduler - could
>>>>>> the
>>>>>>>> condition check be delegated to the triggerer?
>>>>>>>> 
>>>>>>>> On Fri, Feb 2, 2024 at 2:33 PM Pierre Jeambrun <
>> pierrejb...@gmail.com
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> But maybe it’s time to reconsider that :), curious to see what
>> others
>>>>>>>>> think.
>>>>>>>>> 
>>>>>>>>> On Fri 2 Feb 2024 at 20:30, Pierre Jeambrun <pierrejb...@gmail.com
>>> 
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I like the idea and I understand that it might help in some use
>>>>>> cases.
>>>>>>>>>> 
>>>>>>>>>> The first concern that I have is that it would allow user code to
>>>> run
>>>>>>> in
>>>>>>>>>> the scheduler, if I understand correctly. This would have big
>>>>>>>>> implications
>>>>>>>>>> in terms of security and how our security model works. (For
>> instance
>>>>>>> the
>>>>>>>>>> scheduler is a trusted component and has direct access to the DB,
>>>>>>> AIP-44
>>>>>>>>>> assumption)
>>>>>>>>>> 
>>>>>>>>>> If I remember correctly this is a route that we specifically tried
>>>> to
>>>>>>>>> stay
>>>>>>>>>> away from.
>>>>>>>>>> 
>>>>>>>>>> On Fri 2 Feb 2024 at 20:03, Xiaodong (XD) DENG
>>>>>>> <xd.d...@apple.com.invalid
>>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi folks,
>>>>>>>>>>> 
>>>>>>>>>>> I’m writing to share my thought regarding the possibility of
>>>>>>> supporting
>>>>>>>>>>> “custom TI dependencies”.
>>>>>>>>>>> 
>>>>>>>>>>> Currently we maintain the dependency check rules under
>>>>>>>>>>> “airflow.ti_deps.deps". They cover the dependency checks like if
>>>>>> there
>>>>>>>>> are
>>>>>>>>>>> available pool slot/if the concurrency allows/TI trigger rules/if
>>>>>> the
>>>>>>>>> state
>>>>>>>>>>> is valid, etc., and play essential role in the scheduling
>> process.
>>>>>>>>>>> 
>>>>>>>>>>> One idea was brought up in our team's internal discussion: why
>>>>>>> shouldn’t
>>>>>>>>>>> we support custom TI dependencies?
>>>>>>>>>>> 
>>>>>>>>>>> In details: just like the cluster policies
>>>>>>>>>>> 
>>>>>>> 
>> (dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook),
>>>>>>>>> if
>>>>>>>>>>> we support users add their own dependency checks as custom
>> classes
>>>>>>> (and
>>>>>>>>>>> also put under airflow_local_settings.py), it will allow users to
>>>>>> have
>>>>>>>>> much
>>>>>>>>>>> higher flexibility in the TI scheduling. These custom TI deps
>>>> should
>>>>>>> be
>>>>>>>>>>> added as additions to the existing default deps (not replacing or
>>>>>>>>> removing
>>>>>>>>>>> any of them).
>>>>>>>>>>> 
>>>>>>>>>>> For example: similar to check for pool availability/concurrency,
>>>> the
>>>>>>> job
>>>>>>>>>>> may need to check for user’s infra-specific conditions, like if a
>>>>>> GPU
>>>>>>> is
>>>>>>>>>>> available right now (instead of competing with other jobs
>>>> randomly),
>>>>>>> or
>>>>>>>>> if
>>>>>>>>>>> an external system API is ready to be called (otherwise wait a
>> bit
>>>>>> ).
>>>>>>>>> And a
>>>>>>>>>>> lot more other possibilities.
>>>>>>>>>>> 
>>>>>>>>>>> Why cluster policies won’t help here?
>> task_instance_mutation_hook
>>>>>> is
>>>>>>>>>>> executed in a “worker”, not in the DAG file processor, just
>> before
>>>>>> the
>>>>>>>>> TI
>>>>>>>>>>> is executed. What we are trying to gain some control here,
>> though,
>>>>>> is
>>>>>>> in
>>>>>>>>>>> the scheduling process (based on custom rules, to decide if the
>> TI
>>>>>>> state
>>>>>>>>>>> should be updated so it can be scheduled for execution).
>>>>>>>>>>> 
>>>>>>>>>>> I would love to know how community finds this idea, before we
>> start
>>>>>> to
>>>>>>>>>>> implement anything. Any quesiton/suggestion would be greatly
>>>>>>>>> appreciated.
>>>>>>>>>>> Many thanks!
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> XD
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>>> 
>>>> 
>> 
>> 

Reply via email to