GitHub user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1924#discussion_r106685548
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -268,12 +268,12 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
             
             /**
              * The maximum number of records a poll will return.
    -         * Will only work with Kafka 0.10.0 and above.
    +         * This is limited by maxUncommittedOffsets, since it doesn't make sense to allow larger polls than the spout is allowed to emit.
    +         * Please note that when there are retriable tuples on a partition, maxPollRecords is an upper bound for how far the spout will read past the last committed offset on that partition.
    +         * It is recommended that users set maxUncommittedOffsets and maxPollRecords to be equal.
    --- End diff --
    
    @hmcl I agree that it is not ideal, but there's an issue with letting 
maxUncommittedOffsets be larger than maxPollRecords.
    
    From an earlier response:
    "If maxPollRecords is less than maxUncommittedOffsets, there's a risk of 
the spout getting stuck on some tuples for a while when it is retrying tuples.
    Say there are 10 retriable tuples following the last committed offset, and 
maxUncommittedOffsets is 10. If maxPollRecords is 5 and the first 5 retriable 
tuples are reemitted in the first batch, the next 5 tuples can't be emitted 
until (some of) the first 5 are acked. This is because the spout seeks the 
consumer back to the last committed offset whenever there are failed tuples, 
so each poll returns the same first 5 tuples, which the spout sees have 
already been emitted and skips. This repeats until the last committed offset 
moves. If other partitions have tuples available, those tuples may get 
emitted, but the 'blocked' partition won't progress until some tuples are 
acked on it."
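
    To make the scenario concrete, here is a rough standalone simulation. It 
does not use the real KafkaSpout or KafkaConsumer; the class and method names 
are made up for illustration, and it assumes the spout seeks back to the last 
committed offset before every poll and that nothing is acked in between.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the poll loop described above; not the actual spout code. */
final class StuckRetrySimulation {

    /**
     * Runs a number of poll rounds on a single partition. In each round the
     * simulated spout seeks to committedOffset, polls at most maxPollRecords
     * records, and emits only the offsets that are not already in flight.
     * No tuples are acked, so committedOffset never moves.
     *
     * @return the offsets emitted in each round
     */
    static List<List<Long>> simulate(long committedOffset, long logEndOffset,
                                     int maxPollRecords, int rounds) {
        Set<Long> inFlight = new HashSet<>();
        List<List<Long>> emittedPerRound = new ArrayList<>();
        for (int round = 0; round < rounds; round++) {
            List<Long> emitted = new ArrayList<>();
            // Seek back to the last committed offset, then poll a bounded batch.
            long pollEnd = Math.min(committedOffset + maxPollRecords, logEndOffset);
            for (long offset = committedOffset; offset < pollEnd; offset++) {
                // Offsets that were already emitted are skipped.
                if (inFlight.add(offset)) {
                    emitted.add(offset);
                }
            }
            emittedPerRound.add(emitted);
        }
        return emittedPerRound;
    }

    public static void main(String[] args) {
        // 10 retriable tuples at offsets 0..9, maxPollRecords = 5:
        // prints [[0, 1, 2, 3, 4], [], []]
        System.out.println(simulate(0, 10, 5, 3));
        // Same tuples with maxPollRecords = 10:
        // prints [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [], []]
        System.out.println(simulate(0, 10, 10, 3));
    }
}
```

    With maxPollRecords at 5, offsets 5 to 9 are never reached in later 
rounds, which is the "stuck" behavior. With maxPollRecords equal to 
maxUncommittedOffsets, all 10 go out in the first round.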
    
    How about we fix this by making doSeekRetriableTopicPartitions seek each 
partition with failed tuples to its lowest retriable offset, instead of to the 
last committed/committable offset? Seeking to the last committed offset seems 
likely to interact badly with maxUncommittedOffsets and maxPollRecords.
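
    A minimal sketch of what I mean by "lowest retriable offset per 
partition". This is not the actual doSeekRetriableTopicPartitions code: 
FailedRecord and the String partition key are hypothetical stand-ins for the 
spout's message ids and Kafka's TopicPartition, chosen so the example runs 
without any Kafka dependency.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; only illustrates how the seek targets could be chosen. */
final class LowestRetriableOffsets {

    /** Stand-in for a failed tuple that is ready for retry: partition name plus offset. */
    static final class FailedRecord {
        final String partition;
        final long offset;

        FailedRecord(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    /**
     * Returns, for every partition that has at least one retriable record,
     * the smallest retriable offset on that partition.
     */
    static Map<String, Long> lowestRetriableOffsetPerPartition(List<FailedRecord> retriable) {
        Map<String, Long> seekTargets = new HashMap<>();
        for (FailedRecord record : retriable) {
            // Keep the minimum offset seen so far for this partition.
            seekTargets.merge(record.partition, record.offset, Long::min);
        }
        return seekTargets;
    }

    public static void main(String[] args) {
        List<FailedRecord> retriable = Arrays.asList(
                new FailedRecord("p0", 7),
                new FailedRecord("p0", 5),
                new FailedRecord("p0", 9),
                new FailedRecord("p1", 3));
        Map<String, Long> seekTargets = lowestRetriableOffsetPerPartition(retriable);
        // In the spout, each entry would presumably become a consumer seek
        // to that offset on that partition (assumption, not shown here).
        System.out.println("p0 -> " + seekTargets.get("p0")); // prints p0 -> 5
        System.out.println("p1 -> " + seekTargets.get("p1")); // prints p1 -> 3
    }
}
```

    In the 10-tuple example, once offsets 0 to 4 are re-emitted they are no 
longer retriable, so the lowest retriable offset becomes 5 and the next poll 
would start there instead of at the last committed offset.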

