[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195514#comment-14195514
 ] 

Hadoop QA commented on BOOKKEEPER-795:
--------------------------------------

Testing JIRA BOOKKEEPER-795


Patch [0001-Made-ledger-metadata-immutable.patch|https://issues.apache.org/jira/secure/attachment/12678958/0001-Made-ledger-metadata-immutable.patch] downloaded at Tue Nov  4 00:37:32 UTC 2014

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 120
.    {color:green}+1{color} the patch adds/modifies 20 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1 FINDBUGS{color}
.    {color:red}-1{color} the patch seems to introduce 2 new Findbugs warning(s) in module(s) [bookkeeper-server]
{color:green}+1 TESTS{color}
.    Tests run: 930
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/807/

> Race condition causes writes to hang if ledger is fenced
> --------------------------------------------------------
>
>                 Key: BOOKKEEPER-795
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-795
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.4.0
>
>         Attachments: 0001-Demonstrate-race-condition.patch, 
> 0001-Made-ledger-metadata-immutable.patch, 
> TEST-org.apache.bookkeeper.client.LedgerCloseTest.xml
>
>
> If a ledger is fenced while a writer is still writing to it, some of the
> writes never complete.
> I've attached the log of this happening along with a test case that will 
> trigger the behaviour.
> What appears to be happening is this: when the fence occurs, the first write
> after the fence gets an unrecoverable error and so tries to close the ledger.
> Closing the ledger sets the closed flag on the ledger metadata and tries to
> write the metadata to zookeeper. That write fails, because the fencing
> operation has already modified the metadata in zookeeper, so the close op
> fails and resets the closed flag. In that brief window a write operation gets
> through. It then fails with a fencing error, so we try to close the ledger
> again, but by now the other close operation has closed the ledger in our local
> metadata. The second close therefore does nothing, and the write hangs
> forever.
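The interleaving above can be replayed step by step in a small, single-threaded model. This is a hypothetical sketch, not BookKeeper code: `RaceReplaySketch` and its methods are invented names. It also assumes that a close takes ownership of the writes pending when it starts and errors only those once its metadata write succeeds, which is one reading of the report that reproduces the hang.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded replay of the reported interleaving; not BookKeeper code.
// Assumption: a close takes ownership of the writes pending when it starts and
// errors only those once its metadata write finally succeeds.
class RaceReplaySketch {
    boolean closed = false;                     // bit flipped in place on the shared metadata
    final List<String> pending = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    private final List<String> ownedByClose = new ArrayList<>();

    // A write is admitted whenever the local flag says the ledger is open.
    void submit(String write) {
        if (closed) {
            failed.add(write);
        } else {
            pending.add(write);
        }
    }

    // Error handler for a write rejected by a fenced bookie: try to close.
    void onFencedError() {
        if (closed) {
            return;                             // "nothing happens": pending writes are left alone
        }
        closed = true;                          // set before the metadata write is confirmed
        ownedByClose.addAll(pending);
        pending.clear();
    }

    // The close's metadata write conflicts with the fencing update, so the bit is reset.
    void closeWriteConflicted() {
        closed = false;                         // window in which new writes get through
    }

    // A retry of the same close succeeds and errors only the writes it owns.
    void closeRetrySucceeded() {
        closed = true;
        failed.addAll(ownedByClose);
        ownedByClose.clear();
    }
}
```

The replay runs in this order: w1 fails after the fence, the close's metadata write conflicts, w2 slips through, the close retry succeeds, and w2 then hits a fencing error. At the end, w1 has failed, while w2 is stuck in `pending` with nothing left to complete it.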
> There are a number of issues here, but foremost, the ledger metadata that the
> handle is using should only ever represent what is actually in zookeeper.
> Having various parts of the code flip bits on it just explodes the state space.
> The LedgerMetadata object itself should be immutable. Changes should be made
> to a local copy, using a builder, before writing to zookeeper, and only when
> the zookeeper operation succeeds should we update the reference that
> LedgerHandle has access to.
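Here is a minimal sketch of the proposed pattern. It assumes a simplified metadata object with only a closed flag, a last entry id and a version. `Metadata`, `Builder`, `FakeMetadataStore` and `Handle` are hypothetical names rather than BookKeeper's actual classes. The fake store's version check stands in for a versioned ZooKeeper `setData` call.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the proposed design; these are not BookKeeper's classes.
class ImmutableMetadataSketch {

    // Immutable snapshot: every field is final and there are no setters.
    static final class Metadata {
        final boolean closed;
        final long lastEntryId;
        final int version;                      // store version this snapshot corresponds to

        private Metadata(boolean closed, long lastEntryId, int version) {
            this.closed = closed;
            this.lastEntryId = lastEntryId;
            this.version = version;
        }

        Builder toBuilder() {
            return new Builder(this);
        }
    }

    // Builder used as a local variable to prepare a change before it is written.
    static final class Builder {
        private boolean closed;
        private long lastEntryId;
        private final int version;

        Builder(Metadata base) {
            closed = base.closed;
            lastEntryId = base.lastEntryId;
            version = base.version;
        }

        Builder closedAt(long lastEntryId) {
            this.closed = true;
            this.lastEntryId = lastEntryId;
            return this;
        }

        Metadata build() {
            return new Metadata(closed, lastEntryId, version);
        }
    }

    // Stand-in for ZooKeeper: a write succeeds only if the expected version matches.
    static final class FakeMetadataStore {
        private Metadata stored = new Metadata(false, -1L, 0);

        synchronized Metadata read() {
            return stored;
        }

        // Returns the newly stored snapshot, or null on a version mismatch.
        synchronized Metadata write(Metadata proposed) {
            if (proposed.version != stored.version) {
                return null;
            }
            stored = new Metadata(proposed.closed, proposed.lastEntryId, stored.version + 1);
            return stored;
        }

        // Simulates another client (e.g. a fencing operation) modifying the metadata.
        synchronized void modifyExternally() {
            stored = new Metadata(stored.closed, stored.lastEntryId, stored.version + 1);
        }
    }

    static final class Handle {
        final FakeMetadataStore store;
        final AtomicReference<Metadata> metadata;

        Handle(FakeMetadataStore store) {
            this.store = store;
            this.metadata = new AtomicReference<>(store.read());
        }

        // The handle's reference changes only after the store accepts the write.
        boolean tryClose(long lastEntryId) {
            Metadata current = metadata.get();
            Metadata proposed = current.toBuilder().closedAt(lastEntryId).build();
            Metadata written = store.write(proposed);
            if (written == null) {
                return false;                   // the handle still sees 'current'; nothing to undo
            }
            metadata.set(written);
            return true;
        }
    }
}
```

Suppose another client bumps the stored version before `tryClose` runs. The write is rejected and the handle keeps its old snapshot. Because there is no flag to reset, no window opens in which new writes can slip through.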
> There's also a problem in how we handle pending add ops when we close. It
> really shouldn't be possible for a write op to get through after a close, but
> we should be defensive here and error out anything that has gotten through,
> adding a big old log message to alert us that this case, which shouldn't
> happen, is happening.
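The defensive handling of pending add ops on close might look like the following sketch. It rests on a few assumptions. `PendingAddsSketch`, the `LEDGER_CLOSED` code and the callback shape are hypothetical. For simplicity, every op still queued when `close()` runs is treated as one that should not have got through.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.IntConsumer;
import java.util.logging.Logger;

// Hypothetical sketch; names and result codes are illustrative, not BookKeeper's API.
class PendingAddsSketch {
    static final int LEDGER_CLOSED = -1;        // made-up error code for this sketch

    private static final Logger LOG = Logger.getLogger(PendingAddsSketch.class.getName());

    private static final class PendingAdd {
        final long entryId;
        final IntConsumer callback;

        PendingAdd(long entryId, IntConsumer callback) {
            this.entryId = entryId;
            this.callback = callback;
        }
    }

    private final Queue<PendingAdd> pending = new ArrayDeque<>();
    private boolean closed = false;
    private long nextEntryId = 0;

    // Adds arriving after close are rejected immediately instead of being queued.
    void asyncAdd(IntConsumer callback) {
        synchronized (this) {
            if (!closed) {
                pending.add(new PendingAdd(nextEntryId++, callback));
                return;
            }
        }
        callback.accept(LEDGER_CLOSED);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // On close, fail every op still queued and log loudly, since this should not happen.
    void close() {
        List<PendingAdd> leftovers;
        synchronized (this) {
            closed = true;
            leftovers = new ArrayList<>(pending);
            pending.clear();
        }
        // Callbacks run outside the lock so user code cannot deadlock against the handle.
        for (PendingAdd op : leftovers) {
            LOG.severe("Add for entry " + op.entryId + " was still pending at close; failing it");
            op.callback.accept(LEDGER_CLOSED);
        }
    }
}
```

With this shape, every add ends with exactly one callback invocation, whether it was queued before the close or arrived after it. No write is left waiting indefinitely.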



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
