[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.
[ https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635045#comment-16635045 ]

stack commented on HBASE-21192:
---

h2. Master startup cannot progress, in holding-pattern until region onlined

If the cluster comes up and the logs report lines like the one below:

{code}
2018-10-01 22:07:42,792 WARN org.apache.hadoop.hbase.master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=CLOSING, ts=1538456302300, server=ve1017.halxg.cloudera.com,22101,1538449648131}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
{code}

... there is no procedure to assign meta. To inject one, use the hbck2 tool:

{code}
HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 assigns 1588230740
{code}

(1588230740 is the hard-coded encoded region name for hbase:meta; hbck2 takes encoded region names.)

You'll probably have to assign hbase:namespace too if you had to assign meta. Look out for the encoded name of the namespace region. It'll be in a line like this:

{code}
2018-10-01 22:09:49,681 WARN org.apache.hadoop.hbase.master.HMaster: hbase:namespace,,1526694055629.37cc206fe9c4bc1c0a46a34c5f523d16. is NOT online; state={37cc206fe9c4bc1c0a46a34c5f523d16 state=OPEN, ts=1538456987236, server=ve1233.halxg.cloudera.com,22101,1538441741767}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
{code}

37cc206fe9c4bc1c0a46a34c5f523d16 is the encoded name of the namespace table region.

(This will be cleaned up more. Just dropping a note here for the moment so I don't forget when doing the writeup.)

> Add HOW-TO repair damaged AMv2.
> ---
>
> Key: HBASE-21192
> URL: https://issues.apache.org/jira/browse/HBASE-21192
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
>
> Need a page or two on how to do various fixups. Will include doc on how to
> identify particular circumstance, how to run a repair, as well as caveats
> (e.g. if no log recovery, then region may be missing edits).
> Add pointer to log messages, especially those that explicitly ask for
> operator intervention; e.g. Master#inMeta.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
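For the "is NOT online" warnings quoted in the comment above, the encoded region name can be pulled out of the master log mechanically rather than by eye. The following is a minimal sketch, not part of hbck2: the function name `offline_encoded_names` is hypothetical, and it assumes every such warning carries the `state={<encoded name> state=...` fragment seen in the two sample lines.

```shell
# Hypothetical helper: print the encoded region name from each
# "is NOT online" warning in a master log, one per line, de-duplicated.
# Assumes the name sits in the "state={<encoded> state=..." fragment.
offline_encoded_names() {
  grep 'is NOT online' "$1" \
    | sed -n 's/.*state={\([0-9a-f]*\) state=.*/\1/p' \
    | sort -u
}
```

Running `offline_encoded_names master.log` should list the names to hand to `assigns`; it is worth confirming each one against the log line before injecting a procedure.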
[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.
[ https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622829#comment-16622829 ]

stack commented on HBASE-21192:
---

There are others... that we should remove. abort_procedure!
[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.
[ https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619831#comment-16619831 ]

stack commented on HBASE-21192:
---

To list outstanding locks and procedures:

{code}
$ echo "list_locks"| hbase shell &> /tmp/locks.txt
$ echo "list_procedures"| hbase shell &> /tmp/procedures.txt
{code}
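Once the two listings are saved, a quick way to see whether a given region shows up in either is a fixed-string search. This is only a sketch: `region_mentions` is a hypothetical helper, and it assumes the encoded region name appears verbatim in the `list_locks` and `list_procedures` output, which is worth checking against your own files.

```shell
# Hypothetical helper: for each saved listing, print how many lines
# mention the given encoded region name.
# Assumes the encoded name appears verbatim in the shell output.
region_mentions() {
  region="$1"; shift
  for f in "$@"; do
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
    n=$(grep -c -F -- "$region" "$f" || true)
    echo "$f: $n"
  done
}
```

For example, `region_mentions dbdb56242f17610c46ea044f7a42895b /tmp/locks.txt /tmp/procedures.txt` prints one count per file; a zero count is a hint, not proof, that nothing is holding the region.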
[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.
[ https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619791#comment-16619791 ]

stack commented on HBASE-21192:
---

h2. STUCK Region Redux

The hbck2 tool can now do bulk assigning (HBASE-21156). On a cluster with 60k regions stuck in the OPENING state (no locks held -- the OPENING state came about because all MasterProcWALs had been removed from under a running cluster), I did the following:

{code}
# First get list of all the STUCK and OPENING regions
$ grep STUCK master.log|grep OPENING|sed -e "s/^.*region=//"|sort -u > /tmp/stuck.txt
# Split the file with 60k STUCK regions into files of 1k regions each.
$ split -l 1000 /tmp/stuck.txt STUCK
# Feed each file to the hbck2 tool... call assigns and pass list of 1k encoded region names.
$ for i in `ls STUCK*`; do ls $i; HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 assigns `cat $i|tr "\n" " "`; done
{code}
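The split-and-loop steps above can also be sketched with `xargs`, which does the batching without intermediate files. This is an alternative under stated assumptions, not what was run on the cluster: the `assign_in_batches` function and the `HBCK2` variable are hypothetical names, and `HBASE_CLASSPATH_PREFIX` is assumed to be exported in the environment beforehand, pointing at the hbck2 jar.

```shell
# HBCK2 holds the command prefix; the default mirrors the invocation
# quoted in the comment. Set HBCK2=echo to preview the batches without
# touching the cluster.
: "${HBCK2:=hbase org.apache.hbase.HBCK2}"

# Hypothetical helper.
# $1: file of encoded region names, one per line
# $2: batch size (defaults to 1000, as in the comment)
assign_in_batches() {
  # $HBCK2 is left unquoted on purpose so it splits into command + args.
  xargs -n "${2:-1000}" $HBCK2 assigns < "$1"
}
```

A dry run such as `HBCK2=echo; assign_in_batches /tmp/stuck.txt` prints each `assigns` command line, which makes it easy to check batch sizes before running for real.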
[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.
[ https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612821#comment-16612821 ]

stack commented on HBASE-21192:
---

For example, if this is in the log:

{code}
2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=va1001.halxg.cloudera.com,22101,1536173230599, table=IntegrationTestBigLinkedList_20180626110336, region=dbdb56242f17610c46ea044f7a42895b
2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=ve1229.halxg.cloudera.com,22101,1536173229844, table=IntegrationTestBigLinkedList_20180803113809, region=4d1618634dae662acb06f5e3b55223c9
{code}

... then as long as there is no lock on the region, you should be able to just do an assign of the region to unSTUCK it. If there are many, make a file of them all and cat it into the hbase shell, as in:

{{$ cat /tmp/a.txt |hbase shell}}

where /tmp/a.txt has stuff like:

{code}
assign 'fb9e0a6e864e36894c48da74074de65d'
assign '494e64585e49a22dad2f35383e7b9bb9'
assign 'e1fa1d4c3dcd59d6a0a61a5c63f4fda5'
assign '4b1fa4fd3bc52d1a6a94db1c4c13ab2b'
assign '86c5348e84e200fdf2f8633c9ac188b5'
assign 'ab60573f41a978de566a8a7097cf8ccc'
assign '085e05caefffcfb17356d4326e99c523'
assign '6ab89f20867d6a97fdb2a61fa82be4cc'
assign '4feb719da3cb53374d7b9162c0849c90'
assign '38d66170d5004c22ed61b184b8209f74'
assign 'c9807aef53ef14f14c9fc1de6ad942c5'
assign 'dbdb56242f17610c46ea044f7a42895b'
assign '4d1618634dae662acb06f5e3b55223c9'
assign '95035cf88e92179c5673c49d3eceaf7d'
{code}
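A file like /tmp/a.txt can be generated from the master log instead of typed by hand. The sketch below is an assumption-laden convenience, not an hbase tool: `stuck_to_assign_cmds` is a hypothetical name, and it relies on `region=<encoded name>` being the last field of each STUCK line, as in the two sample lines above.

```shell
# Hypothetical helper: turn "STUCK Region-In-Transition" log lines into
# hbase shell assign commands, one per distinct region.
# Assumes region=<encoded name> is the last field on each such line.
stuck_to_assign_cmds() {
  grep 'STUCK Region-In-Transition' "$1" \
    | sed -e 's/^.*region=//' \
    | sort -u \
    | sed -e "s/.*/assign '&'/"
}
```

Something like `stuck_to_assign_cmds master.log > /tmp/a.txt` would then produce the input for `cat /tmp/a.txt |hbase shell`; review the file and check for locks on the regions first.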