Naive question: instead of running the code in the scheduler, could the condition check be delegated to the triggerer?
On Fri, Feb 2, 2024 at 2:33 PM Pierre Jeambrun <pierrejb...@gmail.com> wrote:

> But maybe it's time to reconsider that :), curious to see what others
> think.
>
> On Fri 2 Feb 2024 at 20:30, Pierre Jeambrun <pierrejb...@gmail.com> wrote:
>
> > I like the idea and I understand that it might help in some use cases.
> >
> > The first concern I have is that it would allow user code to run in
> > the scheduler, if I understand correctly. This would have big
> > implications for security and for how our security model works. (For
> > instance, the scheduler is a trusted component and has direct access
> > to the DB, per the AIP-44 assumption.)
> >
> > If I remember correctly, this is a route we specifically tried to stay
> > away from.
> >
> > On Fri 2 Feb 2024 at 20:03, Xiaodong (XD) DENG <xd.d...@apple.com.invalid>
> > wrote:
> >
> >> Hi folks,
> >>
> >> I'm writing to share my thoughts regarding the possibility of
> >> supporting "custom TI dependencies".
> >>
> >> Currently we maintain the dependency check rules under
> >> "airflow.ti_deps.deps". They cover checks such as whether pool slots
> >> are available, whether concurrency limits allow scheduling, TI trigger
> >> rules, whether the state is valid, etc., and play an essential role in
> >> the scheduling process.
> >>
> >> One idea was brought up in our team's internal discussion: why
> >> shouldn't we support custom TI dependencies?
> >>
> >> In detail: just like the cluster policies
> >> (dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook),
> >> if we let users add their own dependency checks as custom classes
> >> (also placed under airflow_local_settings.py), users would gain much
> >> higher flexibility in TI scheduling. These custom TI deps would be
> >> added on top of the existing default deps (not replacing or removing
> >> any of them).
> >>
> >> For example: similar to the checks for pool availability/concurrency,
> >> a job may need to check the user's infra-specific conditions, such as
> >> whether a GPU is available right now (instead of competing with other
> >> jobs randomly), or whether an external system's API is ready to be
> >> called (otherwise, wait a bit). And there are many other
> >> possibilities.
> >>
> >> Why won't cluster policies help here? task_instance_mutation_hook is
> >> executed in a "worker", not in the DAG file processor, just before the
> >> TI is executed. What we are trying to gain control over here, though,
> >> is the scheduling process (based on custom rules, deciding whether the
> >> TI state should be updated so it can be scheduled for execution).
> >>
> >> I would love to know how the community finds this idea before we start
> >> to implement anything. Any question/suggestion would be greatly
> >> appreciated. Many thanks!
> >>
> >>
> >> XD
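For concreteness, here is a minimal sketch of what such a custom TI dep
could look like, modeled on the built-in deps under airflow.ti_deps.deps.
Only the BaseTIDep interface (_get_dep_statuses yielding passing/failing
statuses) exists today; the GpuAvailableDep class, the gpu_slots_free()
helper, and the extra_ti_deps registration setting are all hypothetical
illustrations of what the proposal would need to add.

    from airflow.ti_deps.deps.base_ti_dep import BaseTIDep


    def gpu_slots_free() -> int:
        """Placeholder for an infra-specific check, e.g. a cluster API call."""
        return 1


    class GpuAvailableDep(BaseTIDep):
        """Hypothetical custom dep: pass only when a GPU slot is free."""

        NAME = "GPU Available"
        IGNORABLE = True  # like the pool/concurrency deps, skippable on demand

        def _get_dep_statuses(self, ti, session, dep_context):
            # The scheduler would evaluate this alongside the default deps;
            # a failing status keeps the TI from being queued, so it simply
            # waits instead of competing for a GPU it cannot get.
            if gpu_slots_free() > 0:
                yield self._passing_status(reason="A GPU slot is available.")
            else:
                yield self._failing_status(reason="No GPU slot free; waiting.")


    # In airflow_local_settings.py -- the setting name ``extra_ti_deps`` is
    # invented for illustration, mirroring how cluster policies are picked
    # up today. Such a registration mechanism does not exist yet.
    extra_ti_deps = [GpuAvailableDep()]

Note that because _get_dep_statuses would run inside the scheduler (or,
per the question above, potentially the triggerer), any slow or untrusted
user code here is exactly the security/performance concern raised earlier
in the thread.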