[ https://issues.apache.org/jira/browse/AIRFLOW-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005764#comment-17005764 ]

t oo commented on AIRFLOW-4464:
-------------------------------

Ran into this. The error was raised at
https://github.com/apache/airflow/blob/1.10.6/airflow/models/dagrun.py#L392-L399
with:

IntegrityError: (MySQLdb._exceptions.IntegrityError) (1062, "Duplicate entry



  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 157, in _run_file_processor
    pickle_dags)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1591, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1276, in _process_dags
    self._process_task_instances(dag, tis_out)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 761, in _process_task_instances
    run.verify_integrity(session=session)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/models/dagrun.py", line 399, in verify_integrity
    session.commit()
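
If the duplicate entry comes from task ids that differ only by case (as the issue description below suggests), a quick check over a DAG's task ids might help confirm it. This is only a rough sketch: `find_case_collisions` is a hypothetical helper, not part of Airflow, and `str.lower()` only approximates MySQL's case-insensitive collations (it ignores accent and trailing-space handling).

```python
from collections import defaultdict


def find_case_collisions(task_ids):
    """Return groups of ids that differ only by case.

    Hypothetical helper: keys are the lowercased id, values are the
    original ids (in input order) that collapse onto that key.
    """
    groups = defaultdict(list)
    for task_id in task_ids:
        groups[task_id.lower()].append(task_id)
    # Keep only groups with at least two distinct spellings.
    return {key: ids for key, ids in groups.items() if len(set(ids)) > 1}


if __name__ == "__main__":
    # Example ids are made up for illustration.
    print(find_case_collisions(["foo", "FOO", "bar"]))
```

The ids could be taken from whatever list of task ids is at hand for the DAG; an empty result would suggest the duplicate has another cause.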

> Fix case-insensitive id columns in mysql
> ----------------------------------------
>
>                 Key: AIRFLOW-4464
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4464
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: database
>            Reporter: Joshua Carp
>            Assignee: Joshua Carp
>            Priority: Minor
>              Labels: mysql
>
> By default, string comparisons in mysql are case-insensitive, so the task ids 
> "foo" and "FOO" are treated as identical. This means that a dag with those 
> task ids will fail to schedule with a sqlalchemy `IntegrityError` using 
> mysql, but not postgres or sqlite. This situation probably doesn't happen 
> often, and users probably shouldn't use task ids that are identical except 
> for case, but I think we should improve the behavior here. A few options:
>  
>  * Configure sqlalchemy to use a binary collation for string id columns under 
> mysql so that string comparisons are case-sensitive.
>  * Require dag and task ids to be unique regardless of case. This would be a 
> breaking change.
>  * Document that mysql users should configure mysql to use binary collations 
> for string types by default. This would still show users a 500 if the 
> database isn't configured correctly.
>  
> I'll submit a pull request with a failing unit test to describe the issue.
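
The collation behavior described above can be illustrated without a MySQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in: SQLite's `NOCASE` collation plays the role of a case-insensitive MySQL default, and `BINARY` plays the role of a binary collation. The table name `task_instance_demo` and the function `insert_ids` are made up for this example and are not Airflow's schema or API.

```python
import sqlite3

# Only these two built-in SQLite collations are accepted, since the
# collation name is interpolated into the CREATE TABLE statement.
ALLOWED_COLLATIONS = ("NOCASE", "BINARY")


def insert_ids(collation, task_ids):
    """Insert task_ids into a table whose primary key uses `collation`.

    Returns True if every row was inserted, False if the database
    rejected a row with an IntegrityError.
    """
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError("unsupported collation: %r" % (collation,))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE task_instance_demo "
            "(task_id TEXT COLLATE {} PRIMARY KEY)".format(collation)
        )
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO task_instance_demo (task_id) VALUES (?)",
                    [(task_id,) for task_id in task_ids],
                )
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


if __name__ == "__main__":
    print("NOCASE:", insert_ids("NOCASE", ["foo", "FOO"]))
    print("BINARY:", insert_ids("BINARY", ["foo", "FOO"]))
```

With the case-insensitive collation, "foo" and "FOO" collide on the primary key, which mirrors the failure mode described for MySQL; with the binary collation both rows are accepted, which is the effect the first option aims for.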



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
