[ https://issues.apache.org/jira/browse/FLINK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166925#comment-16166925 ]

Bowen Li commented on FLINK-7622:
---------------------------------

[~tzulitai] I think FLINK-7508 indirectly solves this issue, since it greatly 
increased the throughput of the Kinesis producer. Here's an example:

https://imgur.com/a/u198h shows the metrics of our prod Flink job: UserRecords 
Put (i.e. UserRecords put into the KPL queue) and user_records_pending (i.e. 
outstanding UserRecords in the queue waiting to be sent to AWS).

When using the per_request threading model, a small number (~0.5 million) of 
UserRecords put could cause a huge number (~15k) of records pending, because the 
throughput was so low. After switching to the pooled threading model, you can see 
the number of outstanding UserRecords has dropped significantly (to ~0) even 
though the number of UserRecords put into the queue grew 16x (~8 million at 
peak). Taking a closer look at our user_records_pending metric for the past two 
weeks at https://imgur.com/a/2YxIm, the number of outstanding records is 
consistently under 150 (impressive, right?). 

Thus, I believe propagating backpressure upstream in FlinkKinesisProducer is no 
longer necessary.
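
For reference, a minimal sketch of how the pooled threading model can be enabled 
through the producer Properties (which FLINK-7508 forwards to 
KinesisProducerConfiguration.fromProperties). The region, stream name, and 
ThreadPoolSize value below are illustrative, and exact import paths can differ 
between Flink versions:

{code:java}
import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class PooledKplProducerExample {

    public static FlinkKinesisProducer<String> createProducer() {
        Properties producerConfig = new Properties();
        producerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-west-2");

        // KPL settings, forwarded to KinesisProducerConfiguration.fromProperties(...).
        // POOLED reuses a fixed thread pool instead of spawning one thread per request,
        // which is what raised the producer throughput described above.
        producerConfig.setProperty("ThreadingModel", "POOLED");
        producerConfig.setProperty("ThreadPoolSize", "10");

        FlinkKinesisProducer<String> producer =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
        producer.setDefaultStream("example-stream");
        producer.setDefaultPartition("0");
        return producer;
    }
}
{code}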

> Respect local KPL queue size in FlinkKinesisProducer when adding records to 
> KPL client
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-7622
>                 URL: https://issues.apache.org/jira/browse/FLINK-7622
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>            Reporter: Tzu-Li (Gordon) Tai
>
> This issue was brought to discussion by [~sthm] offline.
> Currently, records are added to the Kinesis KPL producer client without 
> checking the number of outstanding records in the local KPL queue. This 
> effectively ignores backpressure when producing to Kinesis through KPL, and 
> can therefore exhaust system resources.
> We should respect {{producer.getOutstandingRecordsCount()}} as a measure of 
> backpressure, and propagate backpressure upstream by blocking further sink 
> invocations when some threshold of outstanding record count is exceeded. The 
> recommended threshold [1] seems to be 10,000.
> [1] 
> https://aws.amazon.com/blogs/big-data/implementing-efficient-and-reliable-producers-with-the-amazon-kinesis-producer-library/
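
For illustration only, a minimal sketch of the blocking check proposed above, 
with a hypothetical enforceQueueLimit() helper that a sink's invoke() could call 
before adding a record. The threshold and polling wait are illustrative, not the 
connector's actual implementation:

{code:java}
import com.amazonaws.services.kinesis.producer.KinesisProducer;

public class OutstandingRecordsLimiter {

    // Recommended threshold of outstanding UserRecords from [1].
    private static final int MAX_OUTSTANDING_RECORDS = 10_000;

    /**
     * Blocks until the local KPL queue drains below the threshold, so that
     * backpressure propagates to upstream operators instead of letting the
     * queue (and heap) grow without bound.
     */
    public static void enforceQueueLimit(KinesisProducer producer) throws InterruptedException {
        while (producer.getOutstandingRecordsCount() > MAX_OUTSTANDING_RECORDS) {
            Thread.sleep(100); // simple polling wait; a real fix could flush or use callbacks
        }
    }
}
{code}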


