[ 
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293380#comment-16293380
 ] 

Appy commented on HBASE-19457:
------------------------------

We discussed a few things; here's the summary:
- We have procs spawning subprocs, but we're not sure if there's an example 
where this tree's depth is > 2. If yes, we can change truncate proc to just 
delete proc + create proc.

bq. As a step in truncate before we create the new? Wonder why this needs it 
and CreateTable doesnt (I think you ask this above).
Both have an ADD_TO_META step where they add regions to meta. But when we fail 
after that:
- in case of truncate proc, there's a table row in meta with state null --> it 
gets assumed as enabled --> AM starts interfering
- in case of create proc, there's no table row at all --> AM ignores those new 
regions

New stuff:
Stack recently committed HBASE-18946, which fixes issues around the balancer 
and assignment. After it went in, we see more greens for 
TestTruncateTableProcedure on the flaky dashboard.
A word on that:
When AM interfered on recovery (see "...recovery: TableStateManager treats 
table with null state as ENABLED. AM treats regions with null state as offline. 
Combined result - AM starts assigning the new " in description), it started 
Assign procs. But they got stuck for some reason (which I didn't debug as part 
of this test fix since it's unrelated). His patch makes that case better.
But the real fix here should be to correctly handle state in TTP so that AM 
doesn't interfere.
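
To make the three cases concrete, here is a toy model of the recovery decision 
described above. It is only a sketch: Meta, afterCrash and amWouldAssign are 
hypothetical names made up for illustration, not HBase classes or APIs, and the 
real TableStateManager/AM logic is more involved. It assumes only what is stated 
in this thread: a table row with null state is treated as ENABLED, a missing 
table row means the regions are ignored, and the fix proposed in the description 
leaves the table row in place marked DISABLED.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are real HBase APIs.
public class TruncateRecoverySketch {
    enum TableState { ENABLED, DISABLED }

    // Stand-in for hbase:meta after a crash following ADD_TO_META.
    static final class Meta {
        // table name -> table state; the value may be null (row exists, no state)
        final Map<String, TableState> tableRows = new HashMap<>();
        // region name -> owning table; all regions here have null region state
        final Map<String, String> regionToTable = new HashMap<>();
    }

    // Builds the meta contents left behind by a crash after ADD_TO_META.
    // hasTableRow=false models create proc; true models truncate proc.
    static Meta afterCrash(boolean hasTableRow, TableState state) {
        Meta meta = new Meta();
        meta.regionToTable.put("r1", "t");
        if (hasTableRow) {
            meta.tableRows.put("t", state); // HashMap permits a null value
        }
        return meta;
    }

    // Assumed recovery rule from the thread: null table state counts as ENABLED,
    // and a region with null state is offline, so it gets picked up for assignment.
    static boolean amWouldAssign(Meta meta, String region) {
        String table = meta.regionToTable.get(region);
        if (table == null || !meta.tableRows.containsKey(table)) {
            return false; // no table row at all: AM ignores the region
        }
        TableState state = meta.tableRows.get(table);
        return state == null || state == TableState.ENABLED;
    }

    public static void main(String[] args) {
        // truncate proc today: table row present, state null
        System.out.println(amWouldAssign(afterCrash(true, null), "r1"));
        // create proc: no table row
        System.out.println(amWouldAssign(afterCrash(false, null), "r1"));
        // truncate proc with the proposed fix: table row marked DISABLED
        System.out.println(amWouldAssign(afterCrash(true, TableState.DISABLED), "r1"));
    }
}
```

Under these assumptions, only the first case (truncate proc with a null table 
state) results in AM trying to assign the new regions.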

We'll keep an eye on the dashboard, look at any new failures, and then decide 
on this patch.

In the meantime, I opened new JIRAs to discuss the other questions: 
HBASE-19529 and HBASE-19530.

> Debugging flaky 
> TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: HBASE-19457.master.001.patch, patch1, test-output.txt
>
>
> Trying to explain the bug in a more general way where understanding of 
> ProcedureV2 is not required.
> Truncating table operation:
> ....
> delete region states from meta
> delete table state from meta
> ....
> add new regions to meta with state null.
> ....crash
> ....recovery: TableStateManager treats table with null state as ENABLED. AM 
> treats regions with null state as offline. Combined result - AM starts 
> assigning the new regions from incomplete truncate operation.
> Fix: Mark table as disabled instead of deleting its state.
> ----
> *patch1*
> Just added some logging to help with debugging:
> - 60s was too short, increased timeout
> - Added some useful log statements



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
