[ 
https://issues.apache.org/jira/browse/FLINK-36404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorenzo Nicora updated FLINK-36404:
-----------------------------------
    Description: 
*Issue*
A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not 
cause the httpclient IOReactor to fail: it is effectively swallowed, which 
prevents the job from failing.
Also related: exceptions from the IOReactor eventually cause the response 
callback's {{failed}} method to be called. Allowing the user to set 
DISCARD_AND_CONTINUE on generic exceptions thrown by the client may therefore 
hide rethrown exceptions. Moreover, there is no real benefit in not failing on 
generic, unhandled exceptions from the client.
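The following minimal, self-contained Java sketch makes the "may hide rethrown exceptions" point concrete by modelling the pre-fix behaviour. It does not use the real connector or httpclient classes. {{PrometheusSinkWriteException}} is a local stand-in, the {{OnErrorBehavior}} enum and its {{FAIL}} value are hypothetical (only DISCARD_AND_CONTINUE comes from this issue), and {{failed}} only models the callback's failure path. With DISCARD_AND_CONTINUE configured for generic exceptions, a write exception that the reactor reports back wrapped in a generic exception is dropped instead of failing the job.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowedFailureSketch {

    // Local stand-in for the connector's exception type (not the real class).
    static class PrometheusSinkWriteException extends RuntimeException {
        PrometheusSinkWriteException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical error-handling options; only DISCARD_AND_CONTINUE is named in the issue.
    enum OnErrorBehavior { FAIL, DISCARD_AND_CONTINUE }

    // Collects discarded failures so the outcome can be inspected.
    static final List<Exception> discarded = new ArrayList<>();

    // Models the callback's failure path before the fix: a generic client
    // exception is routed through a user-configurable behaviour.
    static void failed(Exception ex, OnErrorBehavior onGenericError) {
        if (onGenericError == OnErrorBehavior.DISCARD_AND_CONTINUE) {
            discarded.add(ex); // the failure stops here; nothing fails the job
            return;
        }
        throw new PrometheusSinkWriteException("Unhandled exception from the HTTP client", ex);
    }

    public static void main(String[] args) {
        // A write exception that the reactor reports back wrapped in a generic one.
        Exception fromReactor = new RuntimeException("I/O reactor terminated",
                new PrometheusSinkWriteException("Non-retryable response", null));
        failed(fromReactor, OnErrorBehavior.DISCARD_AND_CONTINUE);
        System.out.println("Discarded failures: " + discarded.size());
    }
}
```

In this model the original {{PrometheusSinkWriteException}} is still present as the cause, but because the generic path was configured to discard, nothing propagates it.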

*Solution*
1. Intercept {{PrometheusSinkWriteException}} further up the httpclient stack 
by adding to the client an {{IOSessionListener}} that rethrows those 
exceptions, causing the reactor to actually fail and, consequently, the 
operator to fail as well.
2. Remove the ability to configure the error-handling behaviour for generic 
exceptions thrown by the httpclient. The job should always fail.
3. When the httpclient IOReactor fails, a long chain of exceptions is logged. 
To keep the actual root cause evident, the response callback should log it at 
ERROR level as soon as the exception happens.
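The three solution steps can be sketched together as follows. This is again a self-contained model under stated assumptions, not the connector's implementation. {{SessionExceptionListener}} is a hypothetical single-method stand-in for the exception hook of httpclient's {{IOSessionListener}}. {{failed}} models the response callback with no configurable behaviour for generic exceptions. java.util.logging's SEVERE level plays the role of ERROR.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FailFastSketch {

    private static final Logger LOG = Logger.getLogger(FailFastSketch.class.getName());

    // Local stand-in for the connector's exception type (not the real class).
    static class PrometheusSinkWriteException extends RuntimeException {
        PrometheusSinkWriteException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical single-method stand-in for the exception hook of a session listener.
    interface SessionExceptionListener {
        void exception(Exception ex);
    }

    // Step 1: rethrow write exceptions so they escape the reactor instead of
    // being swallowed; any other exception is left to the client's usual handling.
    static final SessionExceptionListener RETHROWING_LISTENER = ex -> {
        if (ex instanceof PrometheusSinkWriteException) {
            throw (PrometheusSinkWriteException) ex;
        }
    };

    // Steps 2 and 3: log the root cause at SEVERE (standing in for ERROR) as soon as it
    // happens, then always fail; there is no configurable behaviour for generic exceptions.
    static void failed(Exception ex) {
        LOG.log(Level.SEVERE, "Request to Prometheus failed", ex);
        throw new PrometheusSinkWriteException("Unhandled exception from the HTTP client", ex);
    }

    public static void main(String[] args) {
        try {
            RETHROWING_LISTENER.exception(
                    new PrometheusSinkWriteException("Non-retryable response", null));
        } catch (PrometheusSinkWriteException e) {
            System.out.println("Listener rethrew: " + e.getMessage());
        }
        try {
            failed(new IOException("Connection reset"));
        } catch (PrometheusSinkWriteException e) {
            System.out.println("Callback failed, cause: " + e.getCause().getMessage());
        }
    }
}
```

The generic-exception path takes no behaviour parameter, so a user has nothing to set to DISCARD_AND_CONTINUE. Logging before throwing keeps the root cause near the top of the log, even if the reactor later logs a longer exception chain.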

  was:
*Issue*
A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not 
cause the httpclient IOReactor to fail: it is effectively swallowed, which 
prevents the job from failing.

Also, related:

*Solution*
Intercept {{PrometheusSinkWriteException}} further up the httpclient stack by 
adding to the client an {{IOSessionListener}} that rethrows those exceptions, 
causing the reactor to actually fail and, consequently, the operator to fail 
as well.

Note: the httpclient IOReactor failing produces a number of exceptions. To 
keep the actual root cause evident, the response callback should log it at 
ERROR level when the exception happens.


> PrometheusSinkWriteException thrown by the response callback may not cause 
> job to fail
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-36404
>                 URL: https://issues.apache.org/jira/browse/FLINK-36404
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Prometheus
>            Reporter: Lorenzo Nicora
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
