[ https://issues.apache.org/jira/browse/FLINK-32127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723798#comment-17723798 ]
Gyula Fora commented on FLINK-32127:
------------------------------------

cc [~mxm]

> Source busy time is inaccurate in many cases
> --------------------------------------------
>
>                 Key: FLINK-32127
>                 URL: https://issues.apache.org/jira/browse/FLINK-32127
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Zhanghao Chen
>            Priority: Major
>
> We found that source busy time is inaccurate in many cases. The reason is
> that sources are usually multi-threaded (Kafka and RocketMQ, for example):
> a fetcher thread fetches data from the data source, and a consumer thread
> deserializes the data, with a blocking queue in between. A source is
> considered
> # *idle* if the consumer is blocked fetching data from the queue
> # *backpressured* if the consumer is blocked writing data to downstream
> operators
> # *busy* otherwise
> However, this means that if the bottleneck is on the fetcher side, the
> consumer is often blocked fetching data from the queue, so the source
> idle time is high even though the source is in fact busy and consumes a
> lot of CPU. In some of our jobs, the source max busy time is only ~600 ms
> while the source has actually reached its limit.
> The bottleneck can be on the fetcher side when, for example, Kafka enables
> zstd compression: decompression in the Kafka consumer client, on the
> fetcher thread, can be much heavier than data deserialization on the
> consumer thread.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
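
To make the effect described in the issue concrete, here is a minimal Python sketch of the three-state classification and of what the consumer thread would report over one second when the fetcher is the bottleneck. It is a toy steady-state model, not Flink code: the function names `classify` and `one_second_view` are hypothetical, the per-record costs (1.0 ms to fetch and decompress, 0.6 ms to deserialize) are made-up illustrative numbers, and backpressure is assumed to be absent.

```python
def classify(blocked_on_queue: bool, blocked_on_output: bool) -> str:
    """Classify the consumer thread's state using the rules from the issue."""
    if blocked_on_queue:
        return "idle"
    if blocked_on_output:
        return "backpressured"
    return "busy"


def one_second_view(fetch_ms_per_record: float, deser_ms_per_record: float) -> dict:
    """Model one second of a two-thread source in steady state.

    Assumption: the fetcher and the consumer run in parallel, there is no
    backpressure, and throughput is limited by the slower of the two stages.
    """
    window_ms = 1000.0
    bottleneck_ms = max(fetch_ms_per_record, deser_ms_per_record)
    records = window_ms / bottleneck_ms
    # The consumer thread only counts as busy while it deserializes.
    reported_busy_ms = records * deser_ms_per_record
    return {
        "reported_busy_ms": reported_busy_ms,
        "reported_idle_ms": window_ms - reported_busy_ms,
        # Time the fetcher thread actually spends working in the same second.
        "fetcher_busy_ms": records * fetch_ms_per_record,
    }


# Illustrative numbers only: fetching is slower than deserializing.
view = one_second_view(fetch_ms_per_record=1.0, deser_ms_per_record=0.6)
print(view)
```

Under these assumed costs the consumer thread reports roughly 600 ms of busy time and 400 ms of idle time, while the fetcher thread is working for the whole second. The source cannot go any faster, but the metric suggests it has headroom.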