[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635045#comment-16635045
 ] 

stack commented on HBASE-21192:
---

h2. Master startup cannot progress, in holding-pattern until region onlined

If the cluster comes up and the Master log reports lines like the one below:

{code}
2018-10-01 22:07:42,792 WARN org.apache.hadoop.hbase.master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=CLOSING, ts=1538456302300, server=ve1017.halxg.cloudera.com,22101,1538449648131}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
{code}

... then there is no procedure in place to assign hbase:meta, and the Master stays in its holding pattern.

To inject one, use the hbck2 tool:

{code}
 HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 assigns 1588230740
{code}

(1588230740 is the hard-coded encoded region name for hbase:meta; hbck2 takes
encoded region names.)

If you had to assign meta, you'll probably have to assign the hbase:namespace
region too. Look in the log for the encoded name of the namespace region... it'll
be in a line like this:

{code}
2018-10-01 22:09:49,681 WARN org.apache.hadoop.hbase.master.HMaster: hbase:namespace,,1526694055629.37cc206fe9c4bc1c0a46a34c5f523d16. is NOT online; state={37cc206fe9c4bc1c0a46a34c5f523d16 state=OPEN, ts=1538456987236, server=ve1233.halxg.cloudera.com,22101,1538441741767}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
{code}

Here 37cc206fe9c4bc1c0a46a34c5f523d16 is the encoded name of the namespace table
region; pass it to the hbck2 assigns command the same way as for meta.
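
A minimal sketch for pulling the encoded region names out of these "is NOT online" lines rather than copying them by hand. The function name {{not_online_regions}} is made up for illustration, and the sketch assumes each log entry occupies a single line in the log file and that the {{state={<encoded name> state=...}}} layout matches the samples in this comment:

```shell
# Print the unique encoded region names found in "is NOT online" log lines.
# $1 is the path to a master log file (an assumption; adjust to your setup).
not_online_regions() {
  grep 'is NOT online' "$1" \
    | sed -e 's/^.*state={\([0-9a-f]*\) state=.*$/\1/' \
    | sort -u
}
```

For example, {{not_online_regions master.log}} prints one encoded name per line, ready to hand to the hbck2 assigns command.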

(This stuff will be cleaned up more... just dropping a note here for the moment
so I don't forget when doing the writeup...)


> Add HOW-TO repair damaged AMv2.
> ---
>
> Key: HBASE-21192
> URL: https://issues.apache.org/jira/browse/HBASE-21192
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Need a page or two on how to do various fixups. Will include doc on how to 
> identify particular circumstance, how to run a repair, as well as caveats 
> (e.g. if no log recovery, then region may be missing edits).
> Add pointer to log messages, especially those that explicitly ask for 
> operator intervention; e.g. Master#inMeta.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-09-20 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622829#comment-16622829
 ] 

stack commented on HBASE-21192:
---

There are others... that we should remove. abort_procedure!



[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-09-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619831#comment-16619831
 ] 

stack commented on HBASE-21192:
---

To list outstanding locks and procedures:

{code}
$ echo "list_locks"| hbase shell &> /tmp/locks.txt
$ echo "list_procedures"| hbase shell &> /tmp/procedures.txt
{code}



[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-09-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619791#comment-16619791
 ] 

stack commented on HBASE-21192:
---

h2. STUCK Region Redux

The hbck2 tool can now do bulk assigning (HBASE-21156). On a cluster with 60k 
regions stuck in the OPENING state (no locks held -- the OPENING state came 
about because all MasterProcWALs had been removed from under a running 
cluster), I did the following:
{code}
 # First get list of all the STUCK and OPENING regions
 $ grep STUCK master.log|grep OPENING|sed -e "s/^.*region=//"|sort -u > /tmp/stuck.txt
 # Split the file with 60k STUCK regions into files of 1k regions each.
 $ split -l 1000 /tmp/stuck.txt STUCK
 # Feed each file to the hbck2 tool... call assigns and pass list of 1k encoded region names.
 $ for i in STUCK*; do echo "$i"; HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 assigns $(tr "\n" " " < "$i"); done
{code}
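
An alternative sketch of the batching step that lets {{xargs}} do the splitting, so no intermediate STUCK* files are needed. This is not what I ran above, just another way to get the same batches of 1k; the helper name {{assign_in_batches}} is hypothetical, and it assumes /tmp/stuck.txt holds one encoded region name per line:

```shell
# Run the given command once per batch of up to 1000 names read from file $1.
# Everything after the first argument is the command to run; the names from
# the file are appended to it, 1000 at a time, by xargs.
assign_in_batches() {
  file=$1
  shift
  xargs -n 1000 "$@" < "$file"
}

# Dry run (prints each batch instead of calling hbck2):
#   assign_in_batches /tmp/stuck.txt echo hbase org.apache.hbase.HBCK2 assigns
# Real run, after exporting HBASE_CLASSPATH_PREFIX as in the loop above:
#   assign_in_batches /tmp/stuck.txt hbase org.apache.hbase.HBCK2 assigns
```

Do the dry run with {{echo}} first to check the batches look right before letting it loose on the cluster.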



[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-09-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612821#comment-16612821
 ] 

stack commented on HBASE-21192:
---

For example, if this is in the log:

{code}
2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=va1001.halxg.cloudera.com,22101,1536173230599, table=IntegrationTestBigLinkedList_20180626110336, region=dbdb56242f17610c46ea044f7a42895b
2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=ve1229.halxg.cloudera.com,22101,1536173229844, table=IntegrationTestBigLinkedList_20180803113809, region=4d1618634dae662acb06f5e3b55223c9
{code}

... then as long as there is no lock on the region, you should be able to
unSTUCK it by just assigning the region. If there are many, put them all in a
file and pipe it into the hbase shell, as in {{$ cat /tmp/a.txt |hbase shell}},
where /tmp/a.txt has lines like:

{code}
assign 'fb9e0a6e864e36894c48da74074de65d'
assign '494e64585e49a22dad2f35383e7b9bb9'
assign 'e1fa1d4c3dcd59d6a0a61a5c63f4fda5'
assign '4b1fa4fd3bc52d1a6a94db1c4c13ab2b'
assign '86c5348e84e200fdf2f8633c9ac188b5'
assign 'ab60573f41a978de566a8a7097cf8ccc'
assign '085e05caefffcfb17356d4326e99c523'
assign '6ab89f20867d6a97fdb2a61fa82be4cc'
assign '4feb719da3cb53374d7b9162c0849c90'
assign '38d66170d5004c22ed61b184b8209f74'
assign 'c9807aef53ef14f14c9fc1de6ad942c5'
assign 'dbdb56242f17610c46ea044f7a42895b'
assign '4d1618634dae662acb06f5e3b55223c9'
assign '95035cf88e92179c5673c49d3eceaf7d'
{code}
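
A rough sketch of how a file like /tmp/a.txt could be generated from the master log rather than typed up by hand. The function name {{stuck_to_assigns}} is made up for illustration; it assumes each STUCK entry sits on one line in the log file and ends with {{region=<encoded name>}} as in the sample above:

```shell
# Emit one "assign '<encoded name>'" shell command per unique STUCK region
# that is in the OPENING state. $1 is the path to a master log file.
stuck_to_assigns() {
  grep STUCK "$1" \
    | grep OPENING \
    | sed -e 's/^.*region=//' \
    | sort -u \
    | sed -e "s/.*/assign '&'/"
}
```

Redirect the output, e.g. {{stuck_to_assigns master.log > /tmp/a.txt}}, look the file over, then cat it into the hbase shell as described.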


