Hi devs,

I’d like to start a discussion about enabling the dynamic partition
discovery feature by default in Kafka source. Dynamic partition discovery
[1] is a useful feature in Kafka source especially under the scenario when
the consuming Kafka topic scales out, or the source subscribes to multiple
Kafka topics with a pattern. Users don’t have to restart the Flink job to
consume messages in the new partition with this feature enabled. Currently,
dynamic partition discovery is disabled by default and users have to
explicitly specify the interval of discovery in order to turn it on.

# Breaking changes

For Kafka table source:

- “scan.topic-partition-discovery.interval” will be set to 30 seconds by
default.
- As we need to provide a way for users to disable the feature,
“scan.topic-partition-discovery.interval” = “0” will be used to turn off
the discovery. Before this proposal, “0” means to enable partition
discovery with interval = 0, which is a bit senseless in practice.
Unfortunately we can't use negative values as the type of this option is
Duration.

For KafkaSource (DataStream API)

- Dynamic partition discovery in Kafka source will be enabled by default,
with discovery interval set to 30 seconds.
- To align with table source, only a positive value for option “
partition.discovery.interval.ms” could be used to specify the discovery
interval. Both negative and zero will be interpreted as disabling the
feature.

# Overhead of partition discovery

Partition discovery is made on KafkaSourceEnumerator, which asynchronously
fetches topic metadata from Kafka cluster and checks if there’s any new
topic and partition. This shouldn’t introduce performance issues on the
Flink side.

On the Kafka broker side, partition discovery makes MetadataRequest to
Kafka broker for fetching topic infos. Considering Kafka broker has its
metadata cache and the default request frequency is relatively low (per 30
seconds), this is not a heavy operation and the performance of the broker
won’t be affected a lot. It'll also be great to get some inputs from Kafka
experts.

Looking forward to your feedback!

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/#dynamic-partition-discovery

Best regards,
Qingsheng

Reply via email to