coufon commented on issue #5594: [AIRFLOW-4924] Loading DAGs asynchronously in 
Airflow webserver
URL: https://github.com/apache/airflow/pull/5594#issuecomment-512600274
 
 
   Thanks, Ash, for the review; the comments are helpful. Using a stringified DAG
does lose information in the UI, and I would like to iterate further on how to
minimize this loss. To your questions:
   
   > This limitation needs to be addressed; it would make DAGs using custom
operators appear wrong in the UI, wouldn't it?
   
   Yes, I agree we should improve this. We left it out of the first version
because we considered the UI going down to be the bigger issue, but we should
address it.
   
   My initial idea for supporting this is to have the DAG-collecting subprocess
record a list of module file paths, adding a path whenever it detects a
non-Airflow module. The webserver main process then first loads all modules in
the list and only afterwards unpickles the DAGs.
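   A minimal sketch of this idea, assuming the subprocess can tell user modules
apart by file location (anything outside Airflow, the standard library, and
installed packages). The names `collect_user_module_paths` and
`preload_user_modules` are hypothetical, not part of Airflow or this PR. Because
pickle resolves classes by module name, the sketch records each module's name
together with its path so the webserver can register it under the same name.

```python
import importlib.util
import os
import sys
import sysconfig


def _is_under(path, directory):
    """True if `path` lies inside `directory` (both already resolved)."""
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def collect_user_module_paths(excluded_top_level=("airflow", "__main__")):
    """Hypothetical: run in the DAG-collecting subprocess after DAG files load.

    Returns {module_name: file_path} for loaded modules that look user-defined.
    The location-based heuristic is an assumption, not Airflow behaviour.
    """
    paths = sysconfig.get_paths()
    system_dirs = {os.path.realpath(paths[key])
                   for key in ("stdlib", "platstdlib", "purelib", "platlib")}
    found = {}
    for name, module in list(sys.modules.items()):
        if name.split(".")[0] in excluded_top_level:
            continue
        file_path = getattr(module, "__file__", None)
        if not isinstance(file_path, str) or not file_path.endswith(".py"):
            continue  # built-in, frozen, or compiled extension module
        real = os.path.realpath(file_path)
        if any(_is_under(real, d) for d in system_dirs):
            continue  # standard library or installed package
        found[name] = real
    return found


def preload_user_modules(module_paths):
    """Hypothetical: run in the webserver main process before unpickling DAGs.

    Registers each module under its original name so pickled references such
    as `my_operators.MyOperator` resolve during unpickling.
    """
    for name, file_path in module_paths.items():
        if name in sys.modules:
            continue  # already imported, nothing to do
        spec = importlib.util.spec_from_file_location(name, file_path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        try:
            spec.loader.exec_module(module)
        except Exception:
            del sys.modules[name]  # don't leave a half-initialised module behind
            raise
```

   Operators defined directly in DAG files would show up in this list as well,
which is exactly the overhead discussed next.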
   
   This may be abused if users define operators in each DAG file: we would be
back to the case where the webserver process has to load every DAG file. But as
long as user-defined operators live in shared modules, the overhead should be
small. How does that sound?
   
   > Additionally I would like to see more detail about what the stringified 
form is - I don't see any explicit tests covering this.
   
   In the stringified form, every field that cannot be pickled or unpickled is
replaced. There are normally three cases:
   
   (1) Local functions and lambda functions:
   These cannot be pickled, so they are replaced by a string of their source
code. The source code still displays normally in the UI; however, a template
that uses the function cannot be rendered, and the UI shows an error when the
user clicks the template page.
   
   (2) Custom (user-defined) operators:
   These cannot be unpickled, so they are replaced by BaseOperator and are
displayed as BaseOperator in the UI.
   
   (3) Objects of other user-defined classes used as DAG or task fields:
   These cannot be unpickled either, so they are replaced by their string
representation, e.g., '<__main__.A object at 0x7f4121780828>'.
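   A minimal sketch of the three replacements using plain `pickle`. The names
`stringify_field`, `FallbackUnpickler`, `loads_with_fallback`, the
`user_modules` parameter, and the local `BaseOperator` placeholder are
hypothetical and only illustrate the idea; they are not the PR's code or
Airflow's API. Cases (1) and (3) are handled on the pickling side in the
subprocess; case (2) is handled on the webserver side by overriding
`pickle.Unpickler.find_class`.

```python
import inspect
import io
import pickle


class BaseOperator:
    """Hypothetical stand-in for Airflow's BaseOperator, used only in this sketch."""

    def __repr__(self):
        return f"<BaseOperator task_id={getattr(self, 'task_id', None)!r}>"


def stringify_field(value, user_modules=("__main__",)):
    """Pickling side: return a picklable replacement for one DAG/task field.

    `user_modules` (an assumption) names modules the webserver cannot import.
    Containers are not walked recursively in this sketch.
    """
    if inspect.isfunction(value):
        try:
            pickle.dumps(value)
            return value  # module-level function: pickled by reference
        except Exception:  # lambdas and local functions fail to pickle
            try:
                return inspect.getsource(value)  # case (1): keep the source text
            except (OSError, TypeError):
                return repr(value)  # source file not available
    if type(value).__module__ in user_modules:
        return repr(value)  # case (3): e.g. '<__main__.A object at 0x...>'
    return value


class FallbackUnpickler(pickle.Unpickler):
    """Webserver side: classes that cannot be imported become BaseOperator (case (2))."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            return BaseOperator


def loads_with_fallback(data):
    """Unpickle `data`, substituting BaseOperator for unresolvable classes."""
    return FallbackUnpickler(io.BytesIO(data)).load()
```

   Since this `find_class` maps every unresolvable class to the placeholder, a
real implementation would need to restrict the fallback to operator classes;
the sketch relies on case (3) fields having already been turned into strings.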