wilsonhooi86 commented on issue #59075: URL: https://github.com/apache/airflow/issues/59075#issuecomment-3617210293
Hi @jroachgolf84, thank you for your comments. To explain better, our current scenario is:

1. GlueJobOperator task 1 runs `glue_job_name_1`. In the AWS Glue UI, we can see `glue_job_name_1` running.
2. GlueJobOperator then hits the internal error below. The Airflow task fails, but `glue_job_name_1` keeps running in AWS Glue:

```
botocore.errorfactory.ThrottlingException: An error occurred (ThrottlingException) when calling the FilterLogEvents operation (reached max retries: 4): Rate exceeded
```

   This error is caused by an AWS hard rate limit that we hit when too many Glue jobs run concurrently.

3. GlueJobOperator task 1 retries and runs `glue_job_name_1` again, so AWS Glue starts a new run of `glue_job_name_1`.
4. In AWS Glue, there are now two runs of `glue_job_name_1` in progress.

This produces a duplicate Glue job run, which can cause complications because we only want one run of a given job name at a time. However, GlueJobOperator does not know that the first run is still in progress.

**Proposed solution:**

We are hoping for an option on GlueJobOperator to check the state of the previous Glue job run, e.g. `check_previous_job_id_run: True`, with the following behavior:

- On the next retry, GlueJobOperator retrieves the `previous_glue_job_id` from XCom.
- If that run's state is still in progress, GlueJobOperator should not create a new Glue `job_run_id`.
- If that run's state is failed/completed, it should create a new Glue `job_run_id`.

Or if you have other alternatives, we are happy to hear them as well =)
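To illustrate, here is a minimal sketch of the decision logic we have in mind. This is purely hypothetical (not an existing GlueJobOperator option); the helper names `should_start_new_run` and `resolve_job_run_id` are ours, though `get_job_run` and the `JobRunState` values are the real AWS Glue API:

```python
# In-progress run states per the AWS Glue JobRunState enum.
IN_PROGRESS_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def should_start_new_run(job_run_state: str) -> bool:
    """Return True if the previous run has finished and a new run is safe."""
    return job_run_state not in IN_PROGRESS_STATES


def resolve_job_run_id(glue_client, job_name, previous_run_id):
    """Return the previous run ID to re-attach to, or None to start a new run.

    `glue_client` would be boto3.client("glue") in practice; `previous_run_id`
    would be pulled from XCom (pushed by the earlier task try).
    """
    if previous_run_id is None:
        return None  # first try: nothing to check, start a new run
    response = glue_client.get_job_run(JobName=job_name, RunId=previous_run_id)
    state = response["JobRun"]["JobRunState"]
    if should_start_new_run(state):
        return None  # previous run failed/succeeded/etc.: start fresh
    return previous_run_id  # still running: monitor it instead of duplicating
```

The operator could then wait on the returned run ID instead of calling `start_job_run` again when the previous run is still alive.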
