[ 
https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129818#comment-17129818
 ] 

ASF GitHub Bot commented on AIRFLOW-3973:
-----------------------------------------

eeshugerman commented on pull request #9182:
URL: https://github.com/apache/airflow/pull/9182#issuecomment-641595881


   :+1: I see, thanks, I was wondering how that works.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3973
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Elliott Shugerman
>            Assignee: Elliott Shugerman
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> h2. Notes:
>  * This does not occur if the database is already initialized. If it is, run 
> `resetdb` instead to observe the bug.
>  * This does not occur with the default SQLite database.
> h2. Example
> {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
> Traceback (most recent call last):
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
>     cursor, statement, parameters, context
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: relation "variable" does not exist
> LINE 2: FROM variable}}
> h2. Explanation
> The first thing {{airflow initdb}} does is run the Alembic migrations. All 
> migrations are run in one transaction. Most tables, including the 
> {{variable}} table, are defined in the initial migration. A [later 
> migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
>  imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
> calls its {{collect_dags}} method, which scans the DAGs directory and 
> attempts to load all DAGs it finds. When it loads a DAG that uses a 
> {{Variable}}, it will query the database to see if that {{Variable}} is 
> defined in the {{variable}} table. It's not clear to me how exactly the 
> connection for that query is created, but I think it is apparent that it does 
> _not_ use the same transaction that is used to run the migrations. Since the 
> migrations are not yet complete, and all migrations are run in one 
> transaction, the migration that creates the {{variable}} table has not yet 
> been committed, and therefore the table does not exist to any other 
> connection/transaction. This raises {{ProgrammingError}}, which is caught and 
> logged by {{collect_dags}}.
>  
> h2. Proposed Solution
> Run each Alembic migration in its own transaction. I will shortly open a pull 
> request that accomplishes this.
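
The visibility problem described in the Explanation, and the effect of the proposed fix, can be illustrated with a small sketch. This is not Airflow or Alembic code, and it does not use Postgres: it uses Python's standard-library `sqlite3` module purely as a stand-in so that it runs without a server, and the helper names (`open_conn`, `table_visible`, `run_demo`) are hypothetical. It shows only the general rule that a table created in an uncommitted transaction is not visible to a second connection. It says nothing about how Airflow behaves on SQLite.

```python
import os
import sqlite3
import tempfile


def open_conn(path):
    # isolation_level=None turns off the sqlite3 module's implicit
    # transaction handling, so every BEGIN/COMMIT below is explicit.
    return sqlite3.connect(path, isolation_level=None)


def table_visible(conn, name):
    """Return True if table `name` can be queried on this connection."""
    try:
        conn.execute(f"SELECT * FROM {name}").fetchall()
        return True
    except sqlite3.OperationalError:
        # SQLite reports "no such table"; the Postgres analogue in the
        # issue is: relation "variable" does not exist.
        return False


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Case 1: every "migration" shares one transaction.
        single_path = os.path.join(tmp, "single.db")
        migrator, reader = open_conn(single_path), open_conn(single_path)
        try:
            migrator.execute("BEGIN")
            migrator.execute("CREATE TABLE variable (key TEXT, val TEXT)")
            # A later migration step causes a query on a *different*
            # connection while the transaction is still open.
            seen_single = table_visible(reader, "variable")
            migrator.execute("COMMIT")
        finally:
            migrator.close()
            reader.close()

        # Case 2: each "migration" commits before the next one starts.
        per_path = os.path.join(tmp, "per_migration.db")
        migrator, reader = open_conn(per_path), open_conn(per_path)
        try:
            migrator.execute("BEGIN")
            migrator.execute("CREATE TABLE variable (key TEXT, val TEXT)")
            migrator.execute("COMMIT")
            migrator.execute("BEGIN")  # the later migration begins here
            seen_per_migration = table_visible(reader, "variable")
            migrator.execute("COMMIT")
        finally:
            migrator.close()
            reader.close()

    return seen_single, seen_per_migration


if __name__ == "__main__":
    single, per_migration = run_demo()
    print("single transaction, table visible to other connection:", single)
    print("per-migration transactions, table visible:", per_migration)
```

In the first case the second connection cannot see `variable` because the creating transaction has not committed. In the second case it can. For reference, Alembic's `context.configure()` accepts a `transaction_per_migration` flag that requests this per-migration behavior; whether the pull request uses that flag is not stated in this message.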



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
