Github user jianbzhou commented on the pull request:
https://github.com/apache/storm/pull/1131#issuecomment-221922582
@hmcl, currently if user give firstPollOffsetStrategy=UNCOMMITTED_LATEST or
LATEST, the spout will not work, because if a kafka consumer re-balance
happened, the offset will be seeked to the end, and there will be lots of
messages not consumed/emitted/acked&failed, so will never find the next
continuous offset to commit, so the log will keep showing that "Non continuous
offset found"......
I have a questions here - if a spout read and emit one message, I assume
storm will ensure the message will be acked or failed without exception, right?
because if it is possible that one emitted message failed to get acked or
failed message under some strange situations, it means we cannot find the
continuous message to commit, which will directly break the spout. Could you
please help confirm if my assumption is correct?
If my assumption is not correct - which means one emitted message may not
be able to get acked or failed message back, then we must change the spout(need
a timeout setting if failed to find next continuous message to commit) -
currently the spout will always find the next continuous message to commit, it
will try forever...
due to the spout will always find the next continuous message to commit, we
need to be cautious for below method:
private boolean poll() {
return !waitingToEmit() && emitted.size() <
kafkaSpoutConfig.getMaxUncommittedOffsets();
}
if the MaxUncommittedOffsets is too small, the spout will frequently stop
polling from kafka, if a rebalance happened and seek back to the failed
message, at this moment if the spout stop polling, will also cause the spout
failed to find the next committed message. Currently we set this value to
200000 and seems working fine for now.
Looking forward to hearing from you! thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---