scwhittle commented on code in PR #36935:
URL: https://github.com/apache/beam/pull/36935#discussion_r2571677163
##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##########
@@ -108,6 +109,12 @@
*
* <h4>Splitting</h4>
*
+ * <p>Consumer groups must not consume from the same {@link TopicPartition} simultaneously. Doing so
+ * may arbitrarily overwrite a consumer group's committed offset for a {@link TopicPartition}.
Review Comment:
So I think this could work, but would we rather support processing
distinct portions of the same partition by caching more than one consumer? I
think we'd have to change the code to acquire and release a consumer more
explicitly.
Though batch isn't particularly common (given we haven't hit this before),
this could also be useful for streaming when processing large backlogs, if
we made the initial splits per partition something like [s, (s+f)/2),
[(s+f)/2, f), [f, streaming tail), where s is the start position and f is
the current position.
This assumes that consumers are independent and that splitting this way
won't disrupt caches or something on the server and kill performance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]