dominikhei opened a new pull request, #52770:
URL: https://github.com/apache/airflow/pull/52770

   
   ---
   closes: #32799
   I opened this as a draft since I’m unsure about the pattern used and made 
some compromises —> would appreciate feedback.
   
   @hussein-awala , I like your idea of retrying on certain error codes (e.g., 
408, 5xx). However, requests to Tableau’s get_by_id endpoint don’t expose 
status codes, and many exceptions don’t either. Error messages have them in 
plain text like:
   ```
   409093: Resource Conflict
   Job for 'Main Transactions Dashboard' is already queued. Not queuing a 
duplicate.
   ```
   Parsing strings for status codes feels fragile. We could catch exceptions 
that have `response.status_code` and retry on likely transient codes, but many 
don’t.
   *My suggestion*: add an optional retries_on_failure parameter to the sensor. 
It retries a set number of times on any failure, then raises an 
AirflowException. This keeps logic simple. The only other sensor that allows 
for configuring retries by itself is the 
[BashSensor](https://github.com/apache/airflow/blob/main/providers/standard/src/airflow/providers/standard/sensors/bash.py#L51).
   
   Handling job cancellations on failure, as described in the issue, seems 
better suited for `on_failure_callback`. 
   Additionally the TableauOperator comes with a param `blocking_refresh` which 
could lead to the same problem, I wanted to address the sensor first and get 
some feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to