Joel Croteau created AIRFLOW-5281:
-------------------------------------

             Summary: GCP transfer operators do not detect previous successful runs if interrupted
                 Key: AIRFLOW-5281
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5281
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib, gcp, operators
    Affects Versions: 1.10.3
            Reporter: Joel Croteau


Operators built on the GCS/BigQuery transfer service work by creating a transfer job, periodically polling the transfer service for that job's status, and reporting success or failure when the job completes. This causes problems if the task instance is terminated by the cluster, e.g. for lack of resources: each retry creates a brand-new transfer job, and if the previous job actually succeeded, the new job will usually fail because files already exist at the destination. The task then fails overall despite having actually succeeded in transferring what it needed to transfer.

I have noticed this in particular with {{S3ToGoogleCloudStorageTransferOperator}} and {{GoogleCloudStorageToBigQueryOperator}}, but I imagine it exists in other operators as well. I know the documentation for at least some of these operators includes a big warning that they are not idempotent and that multiple runs will create multiple transfer jobs, but that warning doesn't actually fix anything. What they should do is record the job (e.g. in a Variable or an XCom) when it is created, and have retries check for an existing job before starting a new one, along the lines of the sketch below. I've also noticed the same thing using {{PostgresOperator}} to execute an {{UNLOAD}} query on Redshift (I didn't use {{RedshiftToS3Transfer}} because that operator has zero customization options, and I wanted to actually control which columns I exported).
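For illustration, here is a rough sketch of the idea, not a drop-in patch: persist the created job's name in an Airflow Variable keyed by the task instance, and have retries look it up before creating a new job. {{create_transfer_job}} and {{wait_for_transfer_job}} are hypothetical stand-ins for whatever client calls the real operators make; only the bookkeeping around the Variable is the point.

{code:python}
from airflow.models import BaseOperator, Variable
from airflow.utils.decorators import apply_defaults


class IdempotentTransferOperator(BaseOperator):
    """Sketch of a transfer operator whose retries resume an existing job."""

    @apply_defaults
    def __init__(self, transfer_spec, *args, **kwargs):
        super(IdempotentTransferOperator, self).__init__(*args, **kwargs)
        self.transfer_spec = transfer_spec

    def execute(self, context):
        # One key per (dag, task, execution date), so each task instance
        # resumes its own job across retries.
        var_key = 'transfer_job__%s' % context['task_instance_key_str']
        job_name = Variable.get(var_key, default_var=None)
        if job_name is None:
            # Hypothetical helper wrapping the transfer-service client call
            # the real operator makes.
            job_name = create_transfer_job(self.transfer_spec)
            # Persist the job name *before* polling, so a task instance
            # killed mid-poll can pick the same job back up on retry.
            Variable.set(var_key, job_name)
        # Poll the existing (or freshly created) job to completion.
        status = wait_for_transfer_job(job_name)  # hypothetical helper
        if status != 'SUCCESS':
            raise RuntimeError('Transfer job %s ended with status %s'
                               % (job_name, status))
{code}

A Variable is used here rather than an XCom only because it straightforwardly survives into the next try of the same task instance; either store would serve the purpose, as long as retries consult it before creating a new job.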


