[ https://issues.apache.org/jira/browse/FLINK-36404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lorenzo Nicora updated FLINK-36404:
-----------------------------------
    Description: 
*Issue*

A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not cause the httpclient IOReactor to fail. The exception is swallowed, which prevents the job from failing.

Related: exceptions from the IOReactor eventually cause the response callback's {{failed}} method to be called. Allowing the user to set DISCARD_AND_CONTINUE for generic exceptions thrown by the client may hide rethrown exceptions. There is also no good reason not to fail on a generic, unhandled exception from the client.

*Solution*

1. Intercept {{PrometheusSinkWriteException}} further up the httpclient stack by adding to the client an {{IOSessionListener}} that rethrows those exceptions, causing the reactor to actually fail and, consequently, the operator to fail as well.
2. Remove the ability to configure the error handling behaviour for generic exceptions thrown by the httpclient. The job should always fail.
3. When the httpclient IOReactor fails, a long chain of exceptions is logged. To keep the actual root cause evident, the response callback should log at ERROR level when the exception happens.

  was:
*Issue*

A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not cause the httpclient IOReactor to fail. The exception is swallowed, which prevents the job from failing.

Also, related:

*Solution*

Intercept {{PrometheusSinkWriteException}} further up the httpclient stack by adding to the client an {{IOSessionListener}} that rethrows those exceptions, causing the reactor to actually fail and, consequently, the operator to fail as well.

Note: the httpclient IOReactor failing causes a number of exceptions.
To keep the actual root cause evident, the response callback should log at ERROR level when the exception happens.

> PrometheusSinkWriteException thrown by the response callback may not cause job to fail
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-36404
>                 URL: https://issues.apache.org/jira/browse/FLINK-36404
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Prometheus
>            Reporter: Lorenzo Nicora
>            Priority: Critical
>
> *Issue*
> A {{PrometheusSinkWriteException}} thrown by {{HttpResponseCallback}} does not
> cause the httpclient IOReactor to fail. The exception is swallowed, which
> prevents the job from failing.
> Related: exceptions from the IOReactor eventually cause the response
> callback's {{failed}} method to be called. Allowing the user to set
> DISCARD_AND_CONTINUE for generic exceptions thrown by the client may hide
> rethrown exceptions. There is also no good reason not to fail on a generic,
> unhandled exception from the client.
> *Solution*
> 1. Intercept {{PrometheusSinkWriteException}} further up the httpclient stack
> by adding to the client an {{IOSessionListener}} that rethrows those
> exceptions, causing the reactor to actually fail and, consequently, the
> operator to fail as well.
> 2. Remove the ability to configure the error handling behaviour for generic
> exceptions thrown by the httpclient. The job should always fail.
> 3. When the httpclient IOReactor fails, a long chain of exceptions is logged.
> To keep the actual root cause evident, the response callback should log at
> ERROR level when the exception happens.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
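
For illustration only, the difference between a swallowed and a rethrown exception (point 1 of the Solution) can be sketched with plain Java stand-ins. Nothing below is the connector's or the httpclient's actual code: {{SessionExceptionListener}}, {{SwallowingListener}}, {{RethrowingListener}} and {{ToyReactor}} are hypothetical names, the listener interface is a reduced stand-in for the {{IOSessionListener}} mentioned above, and the local {{PrometheusSinkWriteException}} only mimics the connector's exception type.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the connector's exception type. */
class PrometheusSinkWriteException extends RuntimeException {
    PrometheusSinkWriteException(String message) {
        super(message);
    }
}

/** Hypothetical, reduced stand-in for a session listener: only the exception hook. */
interface SessionExceptionListener {
    void exception(Exception e);
}

/** Ignores every exception, modelling the "swallowed" behaviour described in the issue. */
class SwallowingListener implements SessionExceptionListener {
    @Override
    public void exception(Exception e) {
        // Intentionally empty: the exception is lost here.
    }
}

/** Rethrows only the sink's own exception so that the surrounding loop fails. */
class RethrowingListener implements SessionExceptionListener {
    @Override
    public void exception(Exception e) {
        if (e instanceof PrometheusSinkWriteException) {
            throw (PrometheusSinkWriteException) e;
        }
    }
}

/** Toy event loop (an assumption, not the real IOReactor): catches callback exceptions and notifies the listener. */
final class ToyReactor {
    private final SessionExceptionListener listener;

    ToyReactor(SessionExceptionListener listener) {
        this.listener = listener;
    }

    /** Runs all callbacks and returns how many were run; propagates whatever the listener rethrows. */
    int run(List<Runnable> callbacks) {
        int processed = 0;
        for (Runnable callback : callbacks) {
            processed++;
            try {
                callback.run();
            } catch (Exception e) {
                listener.exception(e); // a rethrow here escapes the loop
            }
        }
        return processed;
    }
}

final class RethrowSketch {
    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new PrometheusSinkWriteException("remote write failed");
        };
        Runnable succeeding = () -> { };
        List<Runnable> callbacks = Arrays.asList(failing, succeeding);

        // With the swallowing listener the loop keeps going despite the failure.
        int processed = new ToyReactor(new SwallowingListener()).run(callbacks);
        System.out.println("swallowing listener, callbacks run: " + processed);

        // With the rethrowing listener the failure reaches the caller.
        try {
            new ToyReactor(new RethrowingListener()).run(callbacks);
        } catch (PrometheusSinkWriteException e) {
            System.out.println("rethrowing listener, loop failed: " + e.getMessage());
        }
    }
}
```

In this sketch the rethrowing listener deliberately lets every other exception type pass through untouched, mirroring the proposal to intercept only {{PrometheusSinkWriteException}}.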