Hi devs, I’d like to start a discussion about enabling the dynamic partition discovery feature by default in Kafka source. Dynamic partition discovery [1] is a useful feature in Kafka source especially under the scenario when the consuming Kafka topic scales out, or the source subscribes to multiple Kafka topics with a pattern. Users don’t have to restart the Flink job to consume messages in the new partition with this feature enabled. Currently, dynamic partition discovery is disabled by default and users have to explicitly specify the interval of discovery in order to turn it on.
# Breaking changes For Kafka table source: - “scan.topic-partition-discovery.interval” will be set to 30 seconds by default. - As we need to provide a way for users to disable the feature, “scan.topic-partition-discovery.interval” = “0” will be used to turn off the discovery. Before this proposal, “0” means to enable partition discovery with interval = 0, which is a bit senseless in practice. Unfortunately we can't use negative values as the type of this option is Duration. For KafkaSource (DataStream API) - Dynamic partition discovery in Kafka source will be enabled by default, with discovery interval set to 30 seconds. - To align with table source, only a positive value for option “ partition.discovery.interval.ms” could be used to specify the discovery interval. Both negative and zero will be interpreted as disabling the feature. # Overhead of partition discovery Partition discovery is made on KafkaSourceEnumerator, which asynchronously fetches topic metadata from Kafka cluster and checks if there’s any new topic and partition. This shouldn’t introduce performance issues on the Flink side. On the Kafka broker side, partition discovery makes MetadataRequest to Kafka broker for fetching topic infos. Considering Kafka broker has its metadata cache and the default request frequency is relatively low (per 30 seconds), this is not a heavy operation and the performance of the broker won’t be affected a lot. It'll also be great to get some inputs from Kafka experts. Looking forward to your feedback! [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/#dynamic-partition-discovery Best regards, Qingsheng