[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658270#comment-16658270 ]
stack commented on HBASE-21354:
-------------------------------

Makes sense [~allan163]. Nice find, sir. Here's hoping this addresses the weird issue I've seen when there is lots of chaos, where I cannot clean up a Procedure because another holds a lock but the 'other' no longer exists.

nit: Logs like this one, which add no context (here, the name of the file being recovered), can be useless:

LOG.debug("Starting WAL Procedure Store lease recovery");

Great test.

> Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-21354
>                 URL: https://issues.apache.org/jira/browse/HBASE-21354
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21354.branch-2.0.001.patch, HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious 'Corrupted procedure' errors, and of procedures disappearing after master restarts (this happens during ITBLL).
> During a master restart, we load procedures from the logs and build the 'holdingCleanupTracker' according to each log's tracker. We may mark a procedure in the oldest log as deleted if any one log doesn't contain the procedure. This is inappropriate, since a log will not contain any info about the procedure if the procedure was not updated while that log was active. We can delete the procedure only if it is not in the global tracker, which has the whole picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) || trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case (testProcedureShouldNotCleanOnLoad) in the patch shows clearly how the corruption happens.
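
To make the failure mode in the description concrete, here is a minimal, self-contained sketch. It is not the HBase implementation: `TrackerCleanupSketch`, `LogTracker`, `ids`, `holdingWithPerLogRule` and `holdingWithGlobalRule` are hypothetical names, and each tracker is reduced to two plain sets. The second rule is only one reading of "delete the procedure only if it is not in the global tracker"; keeping the "modified in a newer log" check is an assumption carried over from the quoted condition, and the actual patch may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model for illustration; none of these types are HBase classes.
final class TrackerCleanupSketch {

  /** Per-log tracker: the procedure ids this log knows about, and the subset it updated. */
  static final class LogTracker {
    final Set<Long> tracked;
    final Set<Long> modified;

    LogTracker(Set<Long> tracked, Set<Long> modified) {
      this.tracked = tracked;
      this.modified = modified;
    }

    boolean contains(long procId) { return tracked.contains(procId); }
    boolean isModified(long procId) { return modified.contains(procId); }
  }

  /** Small helper to build a sorted set of procedure ids. */
  static Set<Long> ids(long... values) {
    Set<Long> out = new TreeSet<>();
    for (long v : values) {
      out.add(v);
    }
    return out;
  }

  /** Rule from the quoted snippet: drop a procedure if a newer log lacks it or modified it. */
  static Set<Long> holdingWithPerLogRule(LogTracker oldest, List<LogTracker> newerLogs) {
    Set<Long> holding = new TreeSet<>(oldest.tracked);
    for (LogTracker log : newerLogs) {
      holding.removeIf(procId -> !log.contains(procId) || log.isModified(procId));
    }
    return holding;
  }

  /** Assumed fix: absence is judged against the global tracker, not a single log. */
  static Set<Long> holdingWithGlobalRule(
      LogTracker oldest, List<LogTracker> newerLogs, Set<Long> globalLive) {
    Set<Long> holding = new TreeSet<>(oldest.tracked);
    holding.removeIf(procId -> !globalLive.contains(procId));
    for (LogTracker log : newerLogs) {
      // A newer log that rewrote the procedure supersedes the oldest log's copy.
      holding.removeIf(procId -> log.isModified(procId));
    }
    return holding;
  }

  public static void main(String[] args) {
    // The oldest log wrote procedures 1 and 2; the newer log only updated procedure 2.
    LogTracker oldest = new LogTracker(ids(1, 2), ids(1, 2));
    LogTracker newer = new LogTracker(ids(2), ids(2));
    List<LogTracker> newerLogs = Arrays.asList(newer);

    // Prints "per-log rule keeps: []": procedure 1 is dropped although its only
    // state lives in the oldest log.
    System.out.println("per-log rule keeps: " + holdingWithPerLogRule(oldest, newerLogs));
    // Prints "global rule keeps: [1]": the oldest log is still needed for procedure 1.
    System.out.println("global rule keeps: " + holdingWithGlobalRule(oldest, newerLogs, ids(1, 2)));
  }
}
```

In this toy model an empty holding set means the oldest log looks safe to clean up. Under the per-log rule that happens even though procedure 1 is still live and was never rewritten elsewhere, which matches the "disappeared procedure" symptom described above.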