potiuk commented on PR #29821:
URL: https://github.com/apache/airflow/pull/29821#issuecomment-1802925162

   @gdavoidan -  what I proposed and I believe @hussein-awala agrees with is 
that you define which fields of BaseOperator you exclude from the check. This 
is what my proposal is about. Find out which of those fields CAN be templated 
and remove them from the check. Basically allow-list of fields you want to 
allow templating. Rather than allow all by default. There is no need "specify" 
them as extra field - just hard-code them.
   
   What @hussein-awala proposes is that you attempt to test it and figure out 
which fields can be templated by templating then one by one and trying to 
serialize the DAG. because not all of them CAN be serialized and you need to 
serialize DAG to DB to execute it. See description ogf the original error 
https://github.com/apache/airflow/issues/29819 whcih explains the reason for 
doing it. Try to do what the author of the oriignal issue and see it for 
yourself when you remoge the protection. You don't need to believe anyone's 
word just try it and read the issues.
   
   > This sounds like a workaround for a workaround :) For some reason, you've 
prohibited the rendering of all the BaseOperator fields (I'm still not 
convinced of the need for this change), and now you're asking me to exclude 
some of the fields from that rule (the rule I don't even agree and the rule you 
can't really justify)...
   
   No. You simply didn't dig deep enough. Look at the issue and try to 
understand what happened there. This is solution to a real problem that 
overcautiously excluded all the fields. What we are kindly asking you to do is 
to find out which fields out of those can be templated.
   
   Just looking at your examples neither `pool` nor `doc_md` should be allowed 
to be tempalted. The`pool` field is evaluated by the scheduler when scheduler 
scheduler tasks for execution, and then when celery worker picks up the task it 
only selects the tasks belonging to it's queue. And it happens long before 
template rendering happens - template rendering happpen in the worker of 
cellery AFTER the worker already used `queue` field to pick tasks. Similary 
`doc_md` should not be templated, because it is used by the UI to display task 
details NOT task instance details - and in order to do do it, cannot render the 
field, because  again - rendering happens at execution time by the task, not 
when the UI decides to display description of the task. Similarly 
`execution_timeout` cannot be rendered, because it is serialized to the DB by 
DAG file processor, WAY BEFORE the task gets executed, and there it is stored 
as INT not string (which is the original issue that this PR solved). And it mak
 es completely no sense to render `execution_timeout` because again, it is used 
BEFORE the task starts and rendering is happening in the worker - because 
execution_timout is applied at the monitoring level of the task in teh process 
that starts the actual "local task process" where rendering happen. So 
basically (if you look at the code) SIGARM is started and alarm on timeout is 
set before JINJA rendering happens. You likely would not like to do:
   
   ```
   signal.setitimer(signal.ITIMER_REAL, '{{ renerd_timer }}')
   ```
   
   That simply would not work.
   
   So assumption when that PR got merged that most (if not all)  of the fields 
of BaseOperators are like that and all of them should be excluded by default. 
   
   You seem to find one that might be excluded. Maybe more can be as well. What 
we are kindly asking you to review all of those, possibly test (as 
@hussein-awala proposed) and come with PR where you specifically exclude those 
fields that CAN be allowlisted and you tested that they work.
   
   Is it too much of an ask?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to