[ https://issues.apache.org/jira/browse/IGNITE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Shishkov updated IGNITE-18209:
-----------------------------------

Description:
Currently, there is a bottleneck in the synchronized method {{KafkaToIgniteMetadataUpdater#updateMetadata}}:
# {{KafkaToIgniteCdcStreamer}} contains multiple {{KafkaToIgniteCdcStreamerApplier}} instances which share a _single_ {{KafkaToIgniteMetadataUpdater}}.
# Each applier handles its corresponding partitions sequentially.
# {{META_UPDATE_MARKER}} is sent twice to each partition of the event topic: first on type mapping updates, then on binary type updates.
# When the first {{KafkaToIgniteCdcStreamerApplier}} meets a {{META_UPDATE_MARKER}}, it calls {{KafkaToIgniteMetadataUpdater#updateMetadata}}, which in turn calls {{KafkaConsumer#poll}}.
# {{KafkaConsumer#poll}} returns immediately [1] when there is data in the metadata topic. If there are only a few binary types and mappings to update, a single {{KafkaToIgniteCdcStreamerApplier}} thread will consume all entries from the metadata topic.
# All other {{KafkaToIgniteCdcStreamerApplier}} threads will then call {{KafkaConsumer#poll}} on the now-empty metadata topic and remain blocked until new data becomes available or the request timeout expires [1].
# Because access to {{KafkaToIgniteMetadataUpdater#updateMetadata}} is {{synchronized}}, the threads of all {{KafkaToIgniteCdcStreamerApplier}} instances form a sequence of calls. Each call blocks the remaining applier threads for the {{kafkaReqTimeout}} period (if the metadata topic remains empty). A sketch of this behavior is shown after this list.
# The last call in this chain, i.e. the update for the last partition, happens after {{(partitionsCount x 2 - 1) x kafkaReqTimeout}}. For example, with the default timeout and 16 partitions, _the last partition will be consumed after approximately 1.5 minutes_. Increasing the number of applier threads does not help.
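Below is a minimal Java sketch of the bottleneck, assuming a simplified shape of the updater. The class, field and method names are illustrative only, not the actual extension code:

{code:java}
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/** Illustrative sketch only: names and structure are simplified. */
class MetadataUpdaterSketch {
    private final KafkaConsumer<Void, byte[]> metaConsumer;

    /** Poll timeout (the kafkaReqTimeout from the description). */
    private final Duration kafkaReqTimeout;

    MetadataUpdaterSketch(Properties consumerProps, Duration kafkaReqTimeout) {
        this.metaConsumer = new KafkaConsumer<>(consumerProps);
        this.kafkaReqTimeout = kafkaReqTimeout;
    }

    /**
     * Every applier thread that meets META_UPDATE_MARKER ends up here.
     * The method is synchronized, so applier threads line up. If an
     * earlier caller already drained the metadata topic, each later
     * poll() blocks for the full timeout before returning empty, and
     * those waits add up across the queued callers.
     */
    synchronized void updateMetadata() {
        // Returns immediately if data is available, otherwise blocks
        // until new data arrives or kafkaReqTimeout expires.
        ConsumerRecords<Void, byte[]> recs = metaConsumer.poll(kafkaReqTimeout);

        recs.forEach(rec -> applyBinaryMetadata(rec.value()));
    }

    private void applyBinaryMetadata(byte[] bytes) {
        // Deserialize and register the binary type or mapping (omitted).
    }
}
{code}

On the arithmetic: since the marker is sent twice to each of the {{partitionsCount}} partitions, the worst case chains {{partitionsCount x 2 - 1}} empty polls behind the one successful poll. The 1.5-minute figure above is consistent with a 3-second timeout: (16 x 2 - 1) x 3 s = 93 s, i.e. approximately 1.5 minutes.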
> Reduce binary metadata synchronization time for CDC through Kafka
> -----------------------------------------------------------------
>
>                 Key: IGNITE-18209
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18209
>             Project: Ignite
>          Issue Type: Improvement
>          Components: extensions
>            Reporter: Ilya Shishkov
>            Priority: Minor
>              Labels: IEP-59, ise