[ 
https://issues.apache.org/jira/browse/FLINK-35815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Liang Teoh reassigned FLINK-35815:
---------------------------------------

    Assignee: Aleksandr Pilipenko

> KinesisProxySyncV2 doesn't always retry throttling exceptions. 
> ---------------------------------------------------------------
>
>                 Key: FLINK-35815
>                 URL: https://issues.apache.org/jira/browse/FLINK-35815
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kinesis
>    Affects Versions: aws-connector-4.2.0, aws-connector-4.3.0, 1.19.1
>            Reporter: Krzysztof Dziolak
>            Assignee: Aleksandr Pilipenko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: aws-connector-4.4.0
>
>
> *Problem:*
> We have observed that throttled DescribeStreamSummary calls to Kinesis are 
> not always retried.
> {code:java}
> org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute 
> application.
> ...
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not execute application.
> ...
> Caused by: org.apache.flink.util.FlinkRuntimeException: Could not execute 
> application.
> ... 
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> main method caused an error: Error registering stream:  <edited>
> ...
> Caused by: 
> org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil$FlinkKinesisStreamConsumerRegistrarException:
>  Error registering stream: <edited>
> ...
> Caused by: 
> org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.LimitExceededException:
>  Rate exceeded for stream  <edited>. (Service: Kinesis, Status Code: 400, 
> Request ID: efe4c2a9-3c3b-9c1c-b0ed-d9b05db93be2, Extended Request ID: 
> pSG6kwQXgPWD2S7YoPT4RKf+g8QbRBaxc0grhNz6juEoti/uGUQTzyqsfmFCLSHoM+u1ydHvqxzsv/0ICUid6aTAQdndy2EO)
> ...
>     at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.lambda$describeStreamSummary$0(KinesisProxySyncV2.java:91)
>     at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.invokeWithRetryAndBackoff(KinesisProxySyncV2.java:175)
>     at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxySyncV2.describeStreamSummary(KinesisProxySyncV2.java:90)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.fanout.StreamConsumerRegistrar.registerStreamConsumer(StreamConsumerRegistrar.java:92)
>     at 
> org.apache.flink.streaming.connectors.kinesis.util.StreamConsumerRegistrarUtil.registerStreamConsumers(StreamConsumerRegistrarUtil.java:121)
> ... {code}
> The same problem occurs with both the LAZY and EAGER registration strategies.
>  
> *Why does it get stuck?*
> *[https://github.com/apache/flink-connector-aws/blob/c716ca439b2c8e6d4b5905a03c867c418e031688/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AwsV2Util.java#L77]*
> The `isRecoverableException` check inspects only the cause of the exception 
> when deciding whether a call can be retried; it does not inspect the 
> exception itself. In this particular case, LimitExceededException is thrown 
> unwrapped, so it appears the check treats it as non-recoverable and the call 
> is not retried.
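
The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the connector's code and not the actual fix: `RecoverableCheckSketch`, `isRecoverableCauseOnly` and `isRecoverableWholeChain` are hypothetical names, and the nested `LimitExceededException` is a local stand-in for the shaded AWS SDK class so that the example runs without any dependencies. It only shows why a cause-only check misses an unwrapped throttling exception, and one possible way to cover both cases.

```java
public class RecoverableCheckSketch {

    /** Hypothetical stand-in for the AWS SDK's LimitExceededException. */
    static class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    /** Cause-only check, as the issue describes: the exception itself is never inspected. */
    static boolean isRecoverableCauseOnly(Throwable e) {
        return e.getCause() instanceof LimitExceededException;
    }

    /** Assumed alternative: inspect the exception itself and then every cause in the chain. */
    static boolean isRecoverableWholeChain(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof LimitExceededException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Thrown without a wrapper, as in the stack trace above.
        Throwable unwrapped = new LimitExceededException("Rate exceeded for stream");
        // The same exception wrapped in another exception.
        Throwable wrapped = new RuntimeException(unwrapped);

        System.out.println(isRecoverableCauseOnly(unwrapped));  // false: no retry
        System.out.println(isRecoverableCauseOnly(wrapped));    // true
        System.out.println(isRecoverableWholeChain(unwrapped)); // true
        System.out.println(isRecoverableWholeChain(wrapped));   // true
    }
}
```

Walking the whole chain is one design choice among several; checking only the exception and its direct cause would also cover the case reported here.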



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
