[ https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293380#comment-16293380 ]
Appy commented on HBASE-19457:
------------------------------

We discussed a few things; here's the summary:
- We have procs spawning subprocs, but we're not sure if there's an example where this tree's depth is > 2. If yes, we can change truncate proc to just delete proc + create proc.

bq. As a step in truncate before we create the new? Wonder why this needs it and CreateTable doesnt (I think you ask this above).

Both have an ADD_TO_META step where they add regions to meta. But when we fail after that:
- in case of truncate proc, there's a table row in meta with state null --> gets assumed as enabled --> AM starts interfering
- in case of create proc, there's no table row at all --> AM ignores those new regions

New stuff:
Stack recently committed HBASE-18946, which fixes issues around balancer and assigning. After it went in, we see more greens for TestTruncateTableProcedure in the flaky dashboard.
A word on that: when AM interfered on recovery (see "...recovery: TableStateManager treats table with null state as ENABLED. AM treats regions with null state as offline. Combined result - AM starts assigning the new " in description), it started Assign procs. But they got stuck for some reason (which I didn't care to debug as part of this test fix since it's unrelated). His patch makes that case better. But the real fix here should be to correctly handle state in TTP so that AM doesn't interfere.

We'll keep an eye on the dashboard, see the new failures, and then decide the verdict on this patch.
In the meantime, opened these new jiras to discuss the other questions: HBASE-19529, HBASE-19530

> Debugging flaky TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: HBASE-19457.master.001.patch, patch1, test-output.txt
>
>
> Trying to explain the bug in a more general way where understanding of ProcedureV2 is not required.
> Truncating table operation:
> ....
> delete region states from meta
> delete table state from meta
> ....
> add new regions to meta with state null.
> ....crash
> ....recovery: TableStateManager treats table with null state as ENABLED. AM treats regions with null state as offline. Combined result - AM starts assigning the new regions from incomplete truncate operation.
> Fix: Mark table as disabled instead of deleting its state.
> ----
> *patch1*
> Just added some logging to help with debugging:
> - 60s was too short a timeout, so increased it
> - Added some useful log statements
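
To make the recovery scenario above concrete, here is a minimal toy model in Java. It is only a sketch and does not use any real HBase classes: `TruncateRecoverySketch`, `putTableRow` and `amWouldAssign` are hypothetical names, and the map is a stand-in for the table rows in meta. It assumes only what is stated above: a table row with null state is treated as ENABLED, and a table with no row at all is ignored by AM.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these names exist in HBase.
class TruncateRecoverySketch {
    enum TableState { ENABLED, DISABLED }

    // Stand-in for table rows in meta.
    // Key present with null value = table row exists but its state is null.
    // Key absent                  = no table row at all.
    static final Map<String, TableState> metaTableRows = new HashMap<>();

    static void putTableRow(String table, TableState state) {
        metaTableRows.put(table, state);
    }

    // Models whether AM would start assigning the table's new regions on recovery.
    static boolean amWouldAssign(String table) {
        if (!metaTableRows.containsKey(table)) {
            return false; // create proc case: no table row, AM ignores the regions
        }
        TableState state = metaTableRows.get(table);
        // Null state is assumed to mean ENABLED, which is the root of the bug.
        return state == null || state == TableState.ENABLED;
    }

    public static void main(String[] args) {
        // Truncate crashed after ADD_TO_META with the table state deleted.
        putTableRow("truncated_table", null);
        System.out.println("truncate, state deleted:  " + amWouldAssign("truncated_table"));

        // Create crashed after ADD_TO_META: there is no table row yet.
        System.out.println("create, no table row:     " + amWouldAssign("new_table"));

        // Proposed fix: mark the table DISABLED instead of deleting its state.
        putTableRow("truncated_table", TableState.DISABLED);
        System.out.println("truncate, state DISABLED: " + amWouldAssign("truncated_table"));
    }
}
```

In this model only the first case reports that AM would interfere, which matches the difference between truncate and create described in the comment and the effect of the proposed fix.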