[ 
https://issues.apache.org/jira/browse/FLINK-39225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065254#comment-18065254
 ] 

陈哲凯 commented on FLINK-39225:
-----------------------------

Hi, I'd like to take this one. I'll start by looking at the parent issue
FLINK-38857 to get familiar with the Triton module design first.

> Add retry with default value fallback for triton inference failures
> -------------------------------------------------------------------
>
>                 Key: FLINK-39225
>                 URL: https://issues.apache.org/jira/browse/FLINK-39225
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>            Reporter: featzhang
>            Priority: Major
>
> Adds a retry mechanism with a default-value fallback for Triton model
> inference failures, enabling robust error handling and downstream filtering.
> h2. Brief change log
> h3. 1. New Configuration Options (TritonOptions.java)
>  * {{max-retries}}: Maximum retry attempts (default: 0)
>  * {{retry-backoff}}: Initial backoff duration with exponential strategy (default: 100ms)
>  * {{default-value}}: Fallback value when all retries fail
> h3. 2. Retry Logic (TritonInferenceModelFunction.java)
>  * Implements exponential backoff retry strategy
>  * Retries on network errors and 5xx server errors (503, 504)
>  * Fails immediately on 4xx client errors (configuration issues)
>  * Detailed logging for each retry attempt
> h3. 3. Default Value Fallback
>  * Returns configured default value after exhausting all retries
>  * Supports all output types: STRING, numeric, ARRAY
>  * Enables downstream view-based routing for success/failure cases
>  * Backward compatible: throws exceptions if no default value configured
> h3. 4. AbstractTritonModelFunction.java
>  * Added fields and getters for retry configuration
> h2. Use Cases
> *Scenario*: After N consecutive failures, return a default value that
> downstream can use to route records to success/failure paths.
> *Example Configuration*:
> CREATE MODEL my_triton_model
> WITH (
>   'provider' = 'triton',
>   'endpoint' = 'http://triton:8000/v2/models',
>   'model-name' = 'my-model',
>   'max-retries' = '3',        -- Retry up to 3 times
>   'retry-backoff' = '100ms',  -- 100ms, 200ms, 400ms backoff
>   'default-value' = 'FAILED'  -- Return 'FAILED' on all failures
> );
>  
> *Downstream Processing*:
> -- Route based on prediction result
> INSERT INTO success_table
> SELECT * FROM predictions WHERE result != 'FAILED';
> INSERT INTO failure_table
> SELECT * FROM predictions WHERE result = 'FAILED';
>  
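
For orientation, the retry behavior described in the quoted change log could be sketched roughly as below. This is only an illustration, not the code from TritonInferenceModelFunction.java: the class RetryWithFallback, the exception InferenceException, and the convention that status -1 stands for a network error are all hypothetical. The sketch also assumes every 5xx status is retryable (the description names 503 and 504), and that a null default value means "no default configured".

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical sketch of retry with exponential backoff and default-value fallback. */
public class RetryWithFallback {

    /** Hypothetical failure type; status -1 is used here to mean a network error. */
    public static class InferenceException extends Exception {
        public final int status;

        public InferenceException(int status) {
            super("inference failed, status=" + status);
            this.status = status;
        }
    }

    /** Assumption: network errors and all 5xx responses are retryable, 4xx is not. */
    public static boolean isRetryable(InferenceException e) {
        return e.status == -1 || (e.status >= 500 && e.status < 600);
    }

    /** Delay before the given retry (1-based): initial, 2x initial, 4x initial, ... */
    public static Duration backoff(Duration initial, int retry) {
        return initial.multipliedBy(1L << (retry - 1));
    }

    /**
     * Runs the request once plus up to maxRetries retries. After the last
     * retryable failure it returns defaultValue, or rethrows if defaultValue is null.
     */
    public static <T> T call(
            Callable<T> request, int maxRetries, Duration initialBackoff, T defaultValue)
            throws Exception {
        int retries = Math.max(0, maxRetries);
        InferenceException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (attempt > 0) {
                // Sleep before each retry, doubling the delay every time.
                Thread.sleep(backoff(initialBackoff, attempt).toMillis());
            }
            try {
                return request.call();
            } catch (InferenceException e) {
                if (!isRetryable(e)) {
                    throw e; // 4xx: fail immediately, retrying would not help
                }
                last = e;
            }
        }
        if (defaultValue != null) {
            return defaultValue; // all attempts failed, fall back
        }
        throw last; // backward-compatible path: no default value configured
    }
}
```

With max-retries = 3 and retry-backoff = 100ms this produces the 100ms, 200ms, 400ms delays from the example configuration, for four attempts in total.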



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
