[ https://issues.apache.org/jira/browse/AIRFLOW-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005764#comment-17005764 ]
t oo commented on AIRFLOW-4464:
-------------------------------

Ran into this. The error was raised at
https://github.com/apache/airflow/blob/1.10.6/airflow/models/dagrun.py#L392-L399

IntegrityError: (MySQLdb._exceptions.IntegrityError) (1062, "Duplicate entry
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 157, in _run_file_processor
    pickle_dags)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1591, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1276, in _process_dags
    self._process_task_instances(dag, tis_out)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 761, in _process_task_instances
    run.verify_integrity(session=session)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/models/dagrun.py", line 399, in verify_integrity
    session.commit()

> Fix case-insensitive id columns in mysql
> ----------------------------------------
>
>                 Key: AIRFLOW-4464
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4464
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: database
>            Reporter: Joshua Carp
>            Assignee: Joshua Carp
>            Priority: Minor
>              Labels: mysql
>
> By default, string comparisons in mysql are case-insensitive, so the task ids
> "foo" and "FOO" are treated as identical.
> This means that a dag with those
> task ids will fail to schedule with a sqlalchemy `IntegrityError` using
> mysql, but not postgres or sqlite. This situation probably doesn't happen
> often, and users probably shouldn't use task ids that are identical except
> for case, but I think we should improve the behavior here. A few options:
>
> * Configure sqlalchemy to use a binary collation for string id columns under
> mysql so that string comparisons are case-sensitive.
> * Require dag and task ids to be unique regardless of case. This would be a
> breaking change.
> * Document that mysql users should configure mysql to use binary collations
> for string types by default. This would still show users a 500 if the
> database isn't configured correctly.
>
> I'll submit a pull request with a failing unit test to describe the issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
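
For anyone hitting the same error, a rough way to check a dag for the problem described in the issue is to look for task ids that differ only by case. The sketch below is plain Python and does not use any Airflow API; `find_case_collisions` is a hypothetical helper name, and `str.lower()` is only an approximation of how a case-insensitive mysql collation compares strings.

```python
from collections import defaultdict


def find_case_collisions(task_ids):
    """Return groups of ids that become identical once case is ignored.

    Hypothetical helper, not part of Airflow. Lowercasing only approximates
    a case-insensitive mysql collation.
    """
    groups = defaultdict(list)
    for task_id in task_ids:
        # Ids that share a lowercase form would collide in a
        # case-insensitive unique index.
        groups[task_id.lower()].append(task_id)
    # Keep only the groups that contain more than one id.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    print(find_case_collisions(["foo", "FOO", "bar"]))
```

Running a check like this over the task ids of a dag before deploying would flag the "foo" / "FOO" case from the issue description; it corresponds loosely to the second option listed there (ids unique regardless of case), not to the collation fix.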