dominikhei opened a new pull request, #52770: URL: https://github.com/apache/airflow/pull/52770
--- closes: #32799 I opened this as a draft since I’m unsure about the pattern used and made some compromises —> would appreciate feedback. @hussein-awala , I like your idea of retrying on certain error codes (e.g., 408, 5xx). However, requests to Tableau’s get_by_id endpoint don’t expose status codes, and many exceptions don’t either. Error messages have them in plain text like: ``` 409093: Resource Conflict Job for 'Main Transactions Dashboard' is already queued. Not queuing a duplicate. ``` Parsing strings for status codes feels fragile. We could catch exceptions that have `response.status_code` and retry on likely transient codes, but many don’t. *My suggestion*: add an optional retries_on_failure parameter to the sensor. It retries a set number of times on any failure, then raises an AirflowException. This keeps logic simple. The only other sensor that allows for configuring retries by itself is the [BashSensor](https://github.com/apache/airflow/blob/main/providers/standard/src/airflow/providers/standard/sensors/bash.py#L51). Handling job cancellations on failure, as described in the issue, seems better suited for `on_failure_callback`. Additionally the TableauOperator comes with a param `blocking_refresh` which could lead to the same problem, I wanted to address the sensor first and get some feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org