[ https://issues.apache.org/jira/browse/FLINK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166925#comment-16166925 ]
Bowen Li commented on FLINK-7622:
---------------------------------

[~tzulitai] I think FLINK-7508 indirectly solves this issue, since FLINK-7508 greatly increased the throughput of the Kinesis producer.

Here's an example: https://imgur.com/a/u198h

These are metrics from our prod Flink job: UserRecords Put (UserRecords put into the KPL queue) and user_records_pending (outstanding UserRecords in the queue waiting to be sent to AWS). With the per_request threading model, a small number (~0.5 million) of UserRecords can cause a huge number (~15k) of pending records because throughput is so low. After switching to the pooled threading model, the number of outstanding UserRecords dropped significantly (~0), even though the number of UserRecords put into the queue grew 16x (~8 million at peak).

Take a closer look at our user_records_pending metric for the past two weeks at https://imgur.com/a/2YxIm: the number of outstanding records is consistently under 150 (impressive, right?).

Thus, I believe propagating back pressure upstream for FlinkKinesisProducer is no longer necessary.

> Respect local KPL queue size in FlinkKinesisProducer when adding records to KPL client
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-7622
>                 URL: https://issues.apache.org/jira/browse/FLINK-7622
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>            Reporter: Tzu-Li (Gordon) Tai
>
> This issue was brought to discussion by [~sthm] offline.
> Currently, records are added to the Kinesis KPL producer client without checking the number of outstanding records within the local KPL queue. This effectively ignores backpressure when producing to Kinesis through KPL, and can therefore exhaust system resources.
> We should respect {{producer.getOutstandingRecordsCount()}} as a measure of backpressure, and propagate backpressure upstream by blocking further sink invocations when the outstanding record count exceeds some threshold. The recommended threshold [1] seems to be 10,000.
>
> [1] https://aws.amazon.com/blogs/big-data/implementing-efficient-and-reliable-producers-with-the-amazon-kinesis-producer-library/
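
For reference, the blocking behaviour proposed in the issue description could look roughly like the sketch below. This is not the FlinkKinesisProducer implementation: the class name `OutstandingRecordsGate`, the polling approach, and the poll interval are all assumptions made for illustration. The only KPL call it relies on is `getOutstandingRecordsCount()`, passed in as an `IntSupplier` so the sketch runs without the KPL on the classpath.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical helper (not part of Flink or the KPL): blocks the calling
 * thread while the number of outstanding records exceeds a threshold.
 */
final class OutstandingRecordsGate {
    // Supplies the current backlog, e.g. producer::getOutstandingRecordsCount.
    private final IntSupplier outstandingRecords;
    private final int threshold;
    private final long pollIntervalMillis;

    OutstandingRecordsGate(IntSupplier outstandingRecords, int threshold, long pollIntervalMillis) {
        if (threshold <= 0 || pollIntervalMillis < 0) {
            throw new IllegalArgumentException("threshold must be > 0 and pollIntervalMillis >= 0");
        }
        this.outstandingRecords = outstandingRecords;
        this.threshold = threshold;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /**
     * Waits until the backlog is at or below the threshold.
     * Returns how many times the caller had to sleep before being let through.
     */
    int awaitCapacity() {
        int waits = 0;
        while (outstandingRecords.getAsInt() > threshold) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for KPL queue to drain", e);
            }
            waits++;
        }
        return waits;
    }
}
```

In a sink, the idea would be to call `gate.awaitCapacity()` at the start of each invocation, before handing the record to the KPL client. Because the call blocks the sink thread, Flink's own backpressure mechanism would slow the upstream operators. The threshold of 10,000 mentioned in [1] would be passed as `threshold`; a suitable poll interval is left open here.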