[ https://issues.apache.org/jira/browse/BOOKKEEPER-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454504#comment-13454504 ]
Aniruddha commented on BOOKKEEPER-400:
--------------------------------------

I haven't had a chance to look at https://issues.apache.org/jira/browse/BOOKKEEPER-208 yet, but we should definitely mark the change as in flight before making any change to the ensemble. Good catch on the bug introduced by https://issues.apache.org/jira/browse/BOOKKEEPER-337.

Over the long term, these classes should be made thread-safe. For example, handleBookieFailure also accesses lastAddConfirmed, but this access is not synchronized. Moreover, only a certain subset of ledger handle operations needs to be executed in the same thread, not all of them.

> Ledger entry not found in any of the bookies in the ensemble responsible for
> that entry.
> ----------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-400
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-400
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>            Reporter: Aniruddha
>         Attachments: clean.log.gz
>
>
> Detailed discussion at
> http://mail-archives.apache.org/mod_mbox/zookeeper-bookkeeper-dev/201209.mbox/%3cCAOLhyDQzrmeOHmTxzPikeAqJ7pZUn0=vHfd=gc1srmtuye5...@mail.gmail.com%3e
> We had an internal discussion about this. From BOOKKEEPER-337, it seems that
> handleBookieFailure could be invoked in parallel by a thread other than the one
> that calls LedgerHandle#sendAddSuccessCallbacks. The values updated by
> handleBookieFailure might not be visible to the thread running
> sendAddSuccessCallbacks because the fields are not volatile, and this might
> have caused our bad state.
> BK-337 synchronizes access to metadata.addEnsemble() and we believe this
> would make this scenario very improbable.
> A long term fix might be to make LedgerMetadata immutable since it is rarely
> updated.
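
For illustration only, here is a minimal sketch of the long-term idea from the issue description: keep the metadata immutable and publish each new version through a single atomic reference, so that a reader thread always sees a complete, up-to-date snapshot. All class and method names below (ImmutableMetadataSketch, LedgerHandleSketch, withEnsemble, ensembleFor) are hypothetical and are not BookKeeper's actual API. Bookies are represented as plain strings, and the ensemble map is assumed to be keyed by the first entry id that each ensemble serves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for LedgerMetadata: every field is final and never mutated. */
final class ImmutableMetadataSketch {
    // Assumption: maps the first entry id of a segment to the ensemble serving it.
    private final SortedMap<Long, List<String>> ensembles;

    ImmutableMetadataSketch(SortedMap<Long, List<String>> ensembles) {
        // Defensive deep copy so later changes by the caller cannot leak in.
        TreeMap<Long, List<String>> copy = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : ensembles.entrySet()) {
            copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        this.ensembles = Collections.unmodifiableSortedMap(copy);
    }

    /** Returns a new snapshot with one more ensemble; this object is left unchanged. */
    ImmutableMetadataSketch withEnsemble(long startEntryId, List<String> bookies) {
        TreeMap<Long, List<String>> next = new TreeMap<>(ensembles);
        next.put(startEntryId, bookies);
        return new ImmutableMetadataSketch(next);
    }

    /** Ensemble responsible for entryId, or null if no ensemble starts at or before it. */
    List<String> ensembleFor(long entryId) {
        // headMap(entryId + 1) holds every ensemble that starts at or before entryId.
        SortedMap<Long, List<String>> head = ensembles.headMap(entryId + 1);
        return head.isEmpty() ? null : head.get(head.lastKey());
    }
}

/** Hypothetical stand-in for the ledger handle. */
final class LedgerHandleSketch {
    // AtomicReference gives volatile read/write semantics for the snapshot pointer.
    private final AtomicReference<ImmutableMetadataSketch> metadata;

    LedgerHandleSketch(List<String> initialEnsemble) {
        TreeMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, initialEnsemble);
        metadata = new AtomicReference<>(new ImmutableMetadataSketch(initial));
    }

    /** May run on any thread: atomically swaps in a new snapshot. */
    void handleBookieFailure(long firstUnackedEntry, List<String> newEnsemble) {
        metadata.updateAndGet(m -> m.withEnsemble(firstUnackedEntry, newEnsemble));
    }

    /** May run on any thread: reads whichever snapshot was published most recently. */
    List<String> ensembleFor(long entryId) {
        return metadata.get().ensembleFor(entryId);
    }
}
```

Because readers only ever dereference one reference to an immutable object, they cannot observe a half-applied ensemble change. This sketch covers visibility of the metadata only; it does not address the ordering of other handle state such as lastAddConfirmed.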