GitHub user jianbzhou commented on the pull request:

    https://github.com/apache/storm/pull/1131#issuecomment-222308004
  
    @hmcl, 1) The uneven workload distribution was not caused by the spout; it
was a Kafka cluster setup issue and has now been resolved. 2) For the other two
issues, I dug into the logs (sent to you via email). It seems that every time a
rebalance happens, the spout seeks to an offset larger than the committed offset
for that partition. As I understand it, this means the messages between the
committed offset and the new position are never consumed/emitted, so they can
never be acked, and the log keeps showing "Non continuous offset found".
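The symptom can be illustrated with a small model of contiguous offset commits for a single partition. This is a hypothetical sketch, not the storm-kafka-client code: the class `ContiguousOffsetTracker` and its methods are invented for illustration, and it assumes every offset in the partition is a real record (no compaction gaps). It only shows why acked offsets sitting behind a missing offset can never be committed.

```java
import java.util.TreeSet;

// Hypothetical, simplified model of contiguous offset commits for ONE partition.
// Not the storm-kafka-client implementation; it assumes every offset in the
// partition is a real record (no compaction or transaction markers).
class ContiguousOffsetTracker {
    // Next offset to commit (Kafka convention: the committed offset is the
    // position of the next record to consume).
    private long nextCommitOffset;
    // Acked offsets that cannot be committed yet because an earlier one is missing.
    private final TreeSet<Long> acked = new TreeSet<>();

    ContiguousOffsetTracker(long committedOffset) {
        this.nextCommitOffset = committedOffset;
    }

    void ack(long offset) {
        // Offsets below the commit point are already committed; ignore them.
        if (offset >= nextCommitOffset) {
            acked.add(offset);
        }
    }

    // Advances over the contiguous run of acked offsets and returns the new commit offset.
    long commit() {
        while (!acked.isEmpty() && acked.first() == nextCommitOffset) {
            acked.pollFirst();
            nextCommitOffset++;
        }
        return nextCommitOffset;
    }

    // True when acked offsets are stuck behind an offset that was never acked.
    boolean hasGap() {
        return !acked.isEmpty() && acked.first() > nextCommitOffset;
    }
}
```

For example, with a committed offset of 100, if the consumer seeks to 105 after a rebalance and offsets 105 to 109 are acked, `commit()` still returns 100 and `hasGap()` is true, because offsets 100 to 104 were never emitted. A single tuple that is never acked or failed blocks the commit in exactly the same way.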
    The user's spout settings are: firstPollOffsetStrategy=UNCOMMITTED_LATEST,
pollTimeoutMs=2000, offsetCommitPeriodMs=10000, maxRetries=2147483647
(Integer.MAX_VALUE, i.e. effectively unlimited retries).
    I know firstPollOffsetStrategy cannot be EARLIEST or LATEST here, but as far
as I can tell, UNCOMMITTED_LATEST should not cause this issue either.
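The reasoning about the strategies can be made concrete with a sketch of how a first-poll strategy might resolve a partition's start offset. It assumes the semantics described in the storm-kafka-client documentation: EARLIEST and LATEST ignore any committed offset, while the UNCOMMITTED_* strategies resume from the committed offset whenever one exists and only fall back to the beginning or end of the partition when none is found. The class `FirstPollResolver` and its signature are hypothetical, and the model ignores details such as whether a strategy applies only on the very first poll after deployment.

```java
import java.util.OptionalLong;

// Mirrors the strategy names used in the comment; the resolution logic below is
// a hypothetical model of their documented semantics, not the spout's code.
enum FirstPollOffsetStrategy { EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST }

final class FirstPollResolver {
    private FirstPollResolver() {}

    // committed: offset committed for this partition, if any.
    // beginning / end: earliest offset and log-end offset of the partition.
    static long startOffset(FirstPollOffsetStrategy strategy, OptionalLong committed,
                            long beginning, long end) {
        switch (strategy) {
            case EARLIEST:
                return beginning;               // ignores any committed offset
            case LATEST:
                return end;                     // ignores any committed offset
            case UNCOMMITTED_EARLIEST:
                return committed.isPresent() ? committed.getAsLong() : beginning;
            case UNCOMMITTED_LATEST:
                return committed.isPresent() ? committed.getAsLong() : end;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

Under this model, UNCOMMITTED_EARLIEST and UNCOMMITTED_LATEST behave identically whenever a committed offset is found for the partition; they differ only when no committed offset is found, in which case UNCOMMITTED_LATEST jumps to the end of the partition.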
    I asked the user to try UNCOMMITTED_EARLIEST, and according to the logs the
issue has NOT happened again so far, though it may still happen later.
    Looking at the code, I cannot understand why this behavior happens. Could
you help?
    
    Also, in our previous testing we found that once a worker died and a
rebalance happened, one spout (not in the dead worker) had some messages that
were never acked or failed back. This also caused "Non continuous offset found"
to show up many times in the log, which means no offsets get committed to
Kafka. The only way to recover was to restart the Storm topology.
    
    We emit messages like this: kafkaSpoutStreams.emit(collector, tuple,
msgId); Could you please confirm: **does Storm guarantee that every message
emitted by the spout will eventually be acked or failed back, without
exception?** If not, the spout will never find a continuous offset range to
commit, and we must fix this urgently, since we plan to release the change early
next month. Please advise. Thanks!
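The mechanism behind this question is the spout-side message timeout: a tuple emitted with a message ID that is not fully acked within the timeout is failed back to the spout, which lets the spout retry that offset and close the gap. The sketch below is a simplified, hypothetical model of that timeout only. `PendingTupleTracker` is an invented name, and Storm's real acker tracks tuple trees with XOR checksums and rotating timeout buckets rather than a plain map.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the spout-side message timeout. It shows how a tuple whose
// ack is lost (for example, because a downstream worker died) ends up being failed
// once the timeout elapses. Not Storm's acker implementation.
class PendingTupleTracker {
    private final long timeoutMs;
    // msgId -> time the tuple was emitted, in emit order.
    private final Map<Object, Long> pending = new LinkedHashMap<>();

    PendingTupleTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void emitted(Object msgId, long nowMs) {
        pending.put(msgId, nowMs);
    }

    // Returns true if the ack arrived while the tuple was still pending.
    boolean acked(Object msgId) {
        return pending.remove(msgId) != null;
    }

    // Removes and returns every msgId whose ack did not arrive within the timeout;
    // the spout would then call fail(msgId) on each of these.
    List<Object> expire(long nowMs) {
        List<Object> timedOut = new ArrayList<>();
        Iterator<Map.Entry<Object, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> entry = it.next();
            if (nowMs - entry.getValue() >= timeoutMs) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }
}
```

In a real topology the corresponding setting is topology.message.timeout.secs (`Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS`, 30 seconds by default). This guarantee only holds when acking is enabled (acker executors greater than zero) and the tuple was emitted with a message ID, as in the emit call above; if acking is disabled, tuples are treated as acked immediately.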

