GitHub user simonjobs created a discussion: GlueJobOperator in deferred mode 
does not include final status details

Hello,

Starting this discussion with the intention to share our use case, current 
issues and to get any ideas or inspiration on how to best proceed.

### Background
We are currently using MWAA 2.10.3 to orchestrate, among other things, Glue 
jobs. For this we use the GlueJobOperator to trigger runs of already defined 
jobs with minimal arguments provided.

``` 
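from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# AWS_DEFAULT_REGION is assumed to be defined elsewhere in the DAG file.
task = GlueJobOperator(
    task_id="task-id",
    job_name="job-name",
    region_name=AWS_DEFAULT_REGION,
    deferrable=True,
    retries=3,
)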
```

A key detail is that we are using `deferrable=True`. The main reason is that 
we have long-running jobs and sensors, and we do not want to reserve workers 
for them over long periods.

### Issue
We are using `on_failure_callback` with a custom function that extracts the 
error message from the context of a failed task and posts it as a card to our 
Teams channel. The callback reads the exception with 
`exception = context.get('exception')`.
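
A minimal sketch of such a callback, to make the setup concrete. The names `build_failure_card` and `notify_teams_on_failure` and the card fields are hypothetical, and the actual post to Teams is left out; only the `context.get('exception')` lookup comes from our real code.

```python
from typing import Any, Mapping


def build_failure_card(context: Mapping[str, Any]) -> dict:
    """Build a minimal notification payload from an Airflow task context."""
    exception = context.get("exception")
    ti = context.get("task_instance")
    # Fall back to "unknown" so the card can still be built without a task instance.
    dag_id = getattr(ti, "dag_id", "unknown")
    task_id = getattr(ti, "task_id", "unknown")
    return {
        "title": f"Task failed: {dag_id}.{task_id}",
        # The text is whatever the exception in the context says, nothing more.
        "text": str(exception) if exception is not None else "No exception in context",
    }


def notify_teams_on_failure(context: Mapping[str, Any]) -> None:
    """Hypothetical on_failure_callback; print stands in for the Teams webhook call."""
    print(build_failure_card(context))
```

It would be attached with `on_failure_callback=notify_teams_on_failure` on the task or in `default_args`.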

When a Glue job fails while the task is deferred, the task only picks up that 
the job state is failed, and our callback simply extracts "Trigger failure".

This is an issue because we want our Teams error notifications to show the 
high-level cause of failure immediately. Currently we need to look it up 
either in the Glue logs directly or via the Airflow logs.

### Possible solutions
We have considered the following solutions or workarounds:

- verbose=True
While this should include all detailed logs in our Airflow tasks, we are not 
confident that it will actually solve our issue, as the status check on the 
final attempt will still fail. We are also hesitant to enable it, as it would 
duplicate our existing logs 1:1.

- Wrap GlueJobOperator and execute_complete function
This could possibly be a good solution for modifying the behaviour of that 
final status check. However, we are hesitant to wrap the original operator, as 
that would complicate future MWAA version upgrades for us.

- Enhance custom callback to include additional get_job_run call based on 
job_run_id from context
This is currently our preferred approach, with the caveat that the final error 
message of the Glue job will not be included in the task logs, although it 
will be included in our error notification in Teams.
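
A rough sketch of the enhanced-callback option, under stated assumptions. `fetch_glue_error` and `describe_failure` are hypothetical helper names. The Glue client is passed in so the logic can be exercised without AWS access. How the job run id is obtained from the context is deliberately left open here, since that depends on what the operator exposes in the provider version in use.

```python
from typing import Any, Mapping, Optional


def fetch_glue_error(glue_client: Any, job_name: str, run_id: str) -> Optional[str]:
    """Return 'STATE: message' for a Glue job run, or None if the lookup fails."""
    try:
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    except Exception:
        # A failed lookup should never break the notification itself.
        return None
    job_run = response.get("JobRun", {})
    state = job_run.get("JobRunState", "UNKNOWN")
    message = job_run.get("ErrorMessage")
    return f"{state}: {message}" if message else state


def describe_failure(
    context: Mapping[str, Any],
    glue_client: Any,
    job_name: str,
    run_id: Optional[str],
) -> str:
    """Prefer the Glue error details; fall back to the exception in the context."""
    details = fetch_glue_error(glue_client, job_name, run_id) if run_id else None
    return details or str(context.get("exception"))
```

In a real callback the client would be created with `boto3.client("glue", region_name=...)`, and the result of `describe_failure` would replace the plain exception text in the Teams card.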

### Summary
Happy to receive any thoughts or input on the described issue. Let me know if 
I have left out any essential detail.

I am also interested to know whether adding this type of behaviour to 
GlueJobOperator would be encouraged, or whether leaving it out was a conscious 
decision.

GitHub link: https://github.com/apache/airflow/discussions/63706
