nicolas-gaillard opened a new issue, #26936:
URL: https://github.com/apache/airflow/issues/26936

   ### Apache Airflow version
   
   2.4.1
   
   ### What happened
   
   Hi everyone,
   
   I am currently on Airflow 2.3.0 and migrating to 2.4.1, and I have run into 
an issue with the parsing of DAGs that affects the UI.
   
   In order to reuse code, we encapsulate DAGs in Python classes. Sometimes one 
DAG class inherits from another to modify a behavior while preserving the 
original shape of the DAG, as shown in the example below:
   
   ```python
   # airflow/app/dags/dummyA/dag.py
   from datetime import datetime
   from airflow.decorators import dag, task
   
   class BaseDag:
       START_DATE = datetime(2022, 1, 1)
   
       def __init__(self, message: str):
           self.message = message
   
       def dag_wrapper(self, dag_id: str):
           @dag(dag_id=dag_id, start_date=self.START_DATE, catchup=False)
           def _base_dag():
   
               @task(task_id="print_message")
               def print_message(message: str):
                   print(message)
   
               print_message(self.message)
   
           return _base_dag()
   
   BaseDag("my message").dag_wrapper("BaseDag")
   ```
   
   ```python
   # airflow/app/dags/dummyB/dag.py
   from app.dags.dummyA.dag import BaseDag
   
   class ChildDag(BaseDag):
   
       def __init__(self, message: str):
           self.message = f"custom {message}"
   
   ChildDag("my message").dag_wrapper("ChildDag")
   ```
   
   We use an extremely basic configuration of Airflow with a containerized 
Postgres database, a container for the webserver and one for the scheduler 
(which uses the LocalExecutor).
   
   During `airflow db init`, I get the following error:
   ```
   ERROR [airflow.models.dagbag.DagBag] Exception bagging dag: BaseDag
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 484, in _bag_dag
       raise AirflowDagDuplicatedIdException(
   airflow.exceptions.AirflowDagDuplicatedIdException: Ignoring DAG BaseDag from /usr/local/airflow/app/dags/dummyB/dag.py - also found in /usr/local/airflow/app/dags/dummyA/dag.py
   ERROR [airflow.models.dagbag.DagBag] Failed to bag_dag: /usr/local/airflow/app/dags/dummyB/dag.py
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 425, in _process_modules
       self.bag_dag(dag=dag, root_dag=dag)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 452, in bag_dag
       self._bag_dag(dag=dag, root_dag=root_dag, recursive=True)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dagbag.py", line 484, in _bag_dag
       raise AirflowDagDuplicatedIdException(
   airflow.exceptions.AirflowDagDuplicatedIdException: Ignoring DAG BaseDag from /usr/local/airflow/app/dags/dummyB/dag.py - also found in /usr/local/airflow/app/dags/dummyA/dag.py
   ```
   
   This error does not impact the functioning of Airflow or my DAGs, but when I 
go to the interface and look at the BaseDag code, the ChildDag code is 
displayed instead (the code shown for the ChildDag itself is correct):
   
![image](https://user-images.githubusercontent.com/14204307/194590602-4057df15-6852-42c3-9c0c-b047b8c704b2.png)
   
   This behavior is confirmed by checking the database (dag table):
   
![image](https://user-images.githubusercontent.com/14204307/194590643-1d9c3424-01dd-448e-989b-62e74e03028f.png)
   
   If I run the DAG, the correct code is executed and the correct code is then 
displayed, but if I reload the UI, the code of the child class is displayed 
again.
   
   What is surprising is that when I display the `fileloc` attribute of these 
two DAGs, both show the file path of the `BaseDag`.
   
   (If I skip `airflow db init`, I observe the same behavior in the 
interface.)
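
A possible explanation, offered as an assumption and not a confirmed diagnosis: `dummyB/dag.py` imports `app.dags.dummyA.dag`, and importing a Python module executes its top-level statements, including the call `BaseDag("my message").dag_wrapper("BaseDag")`. A DAG with the ID `BaseDag` would then also be created while `dummyB/dag.py` is being parsed. The sketch below shows only this import side effect, without Airflow. The `CREATED` list and the module names `sketch_dummy_a` and `sketch_dummy_b` are hypothetical stand-ins and are not Airflow APIs.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for dummyA/dag.py. CREATED is a hypothetical record of every
# dag_id that gets built; it is not how Airflow tracks DAGs.
MODULE_A = '''
CREATED = []

class BaseDag:
    def dag_wrapper(self, dag_id):
        CREATED.append(dag_id)
        return dag_id

# Module-level call: executed whenever this module is first imported.
BaseDag().dag_wrapper("BaseDag")
'''

# Stand-in for dummyB/dag.py, which imports the base class from module A.
MODULE_B = '''
from sketch_dummy_a import BaseDag, CREATED

class ChildDag(BaseDag):
    pass

ChildDag().dag_wrapper("ChildDag")
'''


def ids_created_by_importing_b():
    """Import only module B and return every dag_id created as a result."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sketch_dummy_a.py").write_text(MODULE_A)
        Path(tmp, "sketch_dummy_b.py").write_text(MODULE_B)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module_b = importlib.import_module("sketch_dummy_b")
            return list(module_b.CREATED)
        finally:
            # Clean up so the sketch can be run repeatedly.
            sys.path.remove(tmp)
            sys.modules.pop("sketch_dummy_a", None)
            sys.modules.pop("sketch_dummy_b", None)


print(ids_created_by_importing_b())
```

Importing only the second module still creates the `BaseDag` ID, because the first module's top-level call runs during the import. Whether this is what triggers the `AirflowDagDuplicatedIdException` above would need to be confirmed against the Airflow 2.4 DAG parsing code.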
   
   
   
   ### What you think should happen instead
   
   The `BaseDag` code should be displayed (instead of the `ChildDag` one).
   
   ### How to reproduce
   
   Run an Airflow 2.4.1 instance with these two DAGs and you should see the 
wrong code in the UI (to display the error logs, you can just run 
`airflow db init`).
   
   ### Operating System
   
   Docker's image `apache/airflow:2.4.1-python3.8` (Debian GNU/Linux 11 
(bullseye))
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-common-sql==1.2.0
   apache-airflow-providers-docker==3.2.0
   apache-airflow-providers-odbc==3.1.2
   apache-airflow-providers-postgres==5.2.2
   ```
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Image used: `apache/airflow:2.4.1-python3.8` (Python 3.8)
   
   * a Postgres container (postgres:14.4)
   * an Airflow init container (`airflow db init; airflow db upgrade; airflow users create`)
   * a scheduler (`LocalExecutor`)
   * a webserver
   
   (This is a simplified version of the official docker-compose.)
   
   ### Anything else
   
   This problem occurs every time. It appeared when I upgraded from Airflow 
2.3.4 to 2.4.1; no other libraries were changed.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

