wilsonhooi86 commented on issue #59075:
URL: https://github.com/apache/airflow/issues/59075#issuecomment-3617210293

   Hi @jroachgolf84 , thank you for your comments. To explain better:
   
   Our current Scenario: 
   
   1. GlueJobOperator task 1 runs `glue_job_name_1`. In the AWS Glue UI, we can see `glue_job_name_1` running.
   2. GlueJobOperator then suddenly hits the internal error below. The Airflow task fails, but `glue_job_name_1` keeps running in AWS Glue:
   
   ```
   botocore.errorfactory.ThrottlingException: An error occurred 
(ThrottlingException) when calling the FilterLogEvents operation (reached max 
retries: 4): Rate exceeded
   ```
   This error is due to an AWS API rate limit on the `FilterLogEvents` call, which is hit when we have too many concurrent Glue jobs running.
   
   3. GlueJobOperator task 1 retries and runs `glue_job_name_1` again; AWS Glue starts a new run of `glue_job_name_1`.
   4. In AWS Glue, there are now 2 runs of `glue_job_name_1` in progress.
   
   This results in duplicate Glue job runs, which can cause complications because we need at most one run of this job at a time. However, GlueJobOperator does not know that the first run is still in progress.
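
   Separately, as a partial mitigation for the throttling itself, the boto3 client doing the log polling could be given a larger retry budget via botocore's adaptive retry mode. This is a generic boto3 sketch; whether and how GlueJobOperator's hook exposes this configuration is a separate question:

   ```python
   from botocore.config import Config

   # Generic botocore retry configuration: raise the retry budget and use
   # client-side rate limiting ("adaptive" mode) so throttled API calls such
   # as FilterLogEvents back off instead of failing after a few attempts.
   throttle_tolerant_config = Config(
       retries={"max_attempts": 10, "mode": "adaptive"}
   )

   # This config could then be passed when creating the client, e.g.
   # boto3.client("logs", config=throttle_tolerant_config).
   ```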
   
   **Proposed solution:**
   We are hoping for an option on GlueJobOperator to check the state of the previous Glue job run, with the following behavior:
   
   Example: `check_previous_job_id_run: True`
   
   - During the next retry, GlueJobOperator retrieves the `previous_glue_job_id` from XCom.
   - If that run's state is in progress, GlueJobOperator should not create a new Glue `job_run_id`.
   - If that run's state is failed/completed, GlueJobOperator should create a new Glue `job_run_id`.
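
   To illustrate the idea, here is a minimal sketch of the check we have in mind. The function name and the set of "in-flight" states are our assumptions, not existing GlueJobOperator API; the two client calls (`get_job_run`, `start_job_run`) are standard boto3 Glue operations:

   ```python
   def get_or_start_job_run(glue, job_name, previous_run_id=None):
       """Start a new Glue job run unless the previous run is still in flight.

       `glue` is a boto3 Glue client (or compatible object). Returns the run
       id to monitor: either the still-running previous run (retrieved e.g.
       from XCom) or a freshly started one.
       """
       # Hypothetical set of states we treat as "still in progress".
       IN_FLIGHT = {"STARTING", "RUNNING", "STOPPING", "WAITING"}
       if previous_run_id:
           run = glue.get_job_run(JobName=job_name, RunId=previous_run_id)
           if run["JobRun"]["JobRunState"] in IN_FLIGHT:
               # Re-attach to the existing run instead of launching a duplicate.
               return previous_run_id
       # Previous run finished (or none recorded): safe to start a new run.
       return glue.start_job_run(JobName=job_name)["JobRunId"]
   ```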
   
   Or if you have other alternatives, we are happy to hear as well =)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
