[ https://issues.apache.org/jira/browse/BOOKKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195462#comment-14195462 ]
Hadoop QA commented on BOOKKEEPER-795:
--------------------------------------

Testing JIRA BOOKKEEPER-795

Patch [0001-Made-ledger-metadata-immutable.patch|https://issues.apache.org/jira/secure/attachment/12678958/0001-Made-ledger-metadata-immutable.patch] downloaded at Tue Nov 4 00:01:05 UTC 2014

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 120
.    {color:green}+1{color} the patch does adds/modifies 20 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1 FINDBUGS{color}
.    {color:red}-1{color} the patch seems to introduce 2 new Findbugs warning(s) in module(s) [bookkeeper-server]
{color:green}+1 TESTS{color}
.    Tests run: 930
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/bookkeeper-trunk-precommit-build/806/

> Race condition causes writes to hang if ledger is fenced
> --------------------------------------------------------
>
>                 Key: BOOKKEEPER-795
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-795
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.4.0
>
>         Attachments: 0001-Demonstrate-race-condition.patch,
> 0001-Made-ledger-metadata-immutable.patch,
> TEST-org.apache.bookkeeper.client.LedgerCloseTest.xml
>
>
> If a ledger is fenced while the writer is still writing to it, some of the
> writes will never complete.
> I've attached the log of this happening along with a test case that will
> trigger the behaviour.
> What appears to be happening is that when the fence occurs, the first write
> after the fence gets an unrecoverable error, so it tries to close the ledger.
> Closing the ledger sets the closed flag on the ledger metadata and tries to
> write it. That write fails because the metadata in zookeeper was modified by
> the fencing operation, so the close op fails and resets the closed status.
> For a moment the ledger looks open again, and a write operation gets through.
> That write then fails with a fencing error, so we try to close the ledger,
> but the other close operation has since closed the ledger in our metadata,
> so nothing happens, and the write hangs forever.
> There are a number of issues here, but foremost, the ledger metadata that the
> handle is using should only ever represent what is actually in zookeeper.
> Having various parts of the code flipping bits just explodes the state space.
> The LedgerMetadata object itself should be immutable, and should only be
> modified, as a local variable, using a builder, before writing to zookeeper.
> Only when the zookeeper operation succeeds should we update the reference
> which LedgerHandle has access to.
> There's also a problem in how we handle pendingaddops when we close. Really
> it shouldn't be possible for a write op to get through after a closure, but
> we should be defensive here and error out anything that has gotten through,
> adding a prominent log message to alert us that a case that shouldn't
> happen is happening.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
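
For readers following the issue description above, here is a minimal Java sketch of the proposed pattern: metadata is an immutable snapshot, changes are prepared through a builder on a local copy, and the handle's reference is swapped only after the versioned write succeeds. Everything in it is an assumption made for illustration. `ImmutableMetadataSketch`, `Metadata`, `Builder`, `FakeStore`, `Handle`, and `tryClose` are hypothetical names, not BookKeeper's actual classes or the contents of the attached patch, and `FakeStore` only imitates a version-checked ZooKeeper write in memory.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these types are BookKeeper's real classes.
final class ImmutableMetadataSketch {

    enum State { OPEN, IN_RECOVERY, CLOSED }

    /** Immutable snapshot of what the store held at a given version. */
    static final class Metadata {
        final State state;
        final long lastEntryId;
        final int version;

        private Metadata(State state, long lastEntryId, int version) {
            this.state = state;
            this.lastEntryId = lastEntryId;
            this.version = version;
        }

        static Metadata initial() {
            return new Metadata(State.OPEN, -1L, 0);
        }

        Builder toBuilder() {
            return new Builder(this);
        }
    }

    /** Mutable only as a local variable while a change is being prepared. */
    static final class Builder {
        private State state;
        private long lastEntryId;
        private final int baseVersion;

        Builder(Metadata base) {
            this.state = base.state;
            this.lastEntryId = base.lastEntryId;
            this.baseVersion = base.version;
        }

        Builder inRecovery() {
            this.state = State.IN_RECOVERY;
            return this;
        }

        Builder closedAt(long lastEntryId) {
            this.state = State.CLOSED;
            this.lastEntryId = lastEntryId;
            return this;
        }

        Metadata build() {
            return new Metadata(state, lastEntryId, baseVersion);
        }
    }

    /** In-memory stand-in for the metadata node: writes are version-checked. */
    static final class FakeStore {
        private Metadata stored = Metadata.initial();

        synchronized Metadata read() {
            return stored;
        }

        /** Returns the newly stored snapshot, or null on a version conflict. */
        synchronized Metadata writeIfVersionMatches(Metadata proposed) {
            if (proposed.version != stored.version) {
                return null;
            }
            stored = new Metadata(proposed.state, proposed.lastEntryId, stored.version + 1);
            return stored;
        }
    }

    /** The handle's reference always points at a snapshot the store accepted. */
    static final class Handle {
        private final FakeStore store;
        private final AtomicReference<Metadata> current;

        Handle(FakeStore store) {
            this.store = store;
            this.current = new AtomicReference<>(store.read());
        }

        Metadata metadata() {
            return current.get();
        }

        boolean tryClose(long lastEntryId) {
            Metadata proposed = current.get().toBuilder().closedAt(lastEntryId).build();
            Metadata written = store.writeIfVersionMatches(proposed);
            if (written == null) {
                return false; // write rejected: the local view is left untouched
            }
            current.set(written); // swap the reference only after the write succeeded
            return true;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Handle writer = new Handle(store);

        // Another client fences the ledger, bumping the stored version.
        store.writeIfVersionMatches(store.read().toBuilder().inRecovery().build());

        System.out.println("close succeeded: " + writer.tryClose(41L));
        System.out.println("writer still sees: " + writer.metadata().state);
    }
}
```

In this sketch a rejected close never flips a flag on shared state, so there is no window in which the ledger briefly looks closed and then open again. A real implementation would still need to re-read the metadata after a conflict and decide how to fail outstanding writes, which this example does not attempt.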