[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-09-05 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923691#comment-16923691
 ] 

stack commented on HBASE-21745:
---

Let what is in this issue make up HBCK2 1.0.0. I made HBASE-22977 for HBCK2 
2.0.0 brainstorming.

> Make HBCK2 be able to fix issues other than region assignment
> -
>
> Key: HBASE-21745
> URL: https://issues.apache.org/jira/browse/HBASE-21745
> Project: HBase
>  Issue Type: Umbrella
>  Components: hbase-operator-tools, hbck2
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
>  * -Rebuild meta from region metadata in the filesystem, aka offline meta 
> rebuild.-
>  * -Fix assignment errors (undeployed regions, double assignments (yes, 
> should not be possible), etc)- (See 
> https://issues.apache.org/jira/browse/HBASE-21745?focusedCommentId=16888302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16888302)
>  * Fix region holes, overlaps, and other errors in the region chain
>  * Fix failed split and merge transactions that have failed to roll back due 
> to some bug (related to previous)
>  *  -Enumerate store files to determine file level corruption and sideline 
> corrupt files-
>  * -Fix hfile link problems (dangling / broken)-



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-27 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917231#comment-16917231
 ] 

stack commented on HBASE-21745:
---

[~zghaobac] Yes. The BulkLoad tool is the right place to start. Let me look at it. 



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-12 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905687#comment-16905687
 ] 

Guanghao Zhang commented on HBASE-21745:


bq. Ok. Let's see how we do. I think adoption service is what we want in the 
end. Maybe we get there incrementally.
Agreed. We can add an adoption service in the future.

bq. For anything else, an overlap found in HDFS... we'll need to add something 
... to the master? Otherwise it's HBCK2 figuring where the hfiles should go, 
placing them there, then reopen of the touched regions.
Could we fix it with the bulkload tool?
Step 1. Make sure there are no holes for this table. If there are holes, fix 
the meta holes first.
Step 2. Get the start/end keys of the table's regions. Read the hfiles in the 
orphan region dirs and generate new hfiles.
Step 3. Bulkload the newly generated hfiles.
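
The core of Step 2 is deciding which existing region each row from an orphan hfile belongs to. A minimal sketch of that routing, assuming a hole-free region chain (Step 1 done) and plain byte-string row keys; `route_rows_to_regions` and its inputs are hypothetical names for illustration, not HBase or HBCK2 APIs:

```python
from bisect import bisect_right


def route_rows_to_regions(region_start_keys, rows):
    """Group row keys by the region whose [start, end) range contains them.

    region_start_keys: sorted start keys of a hole-free region chain; the
    first key must be b"" (the table's first region). Hypothetical input.
    rows: row keys read from the orphan region's hfiles.
    """
    if not region_start_keys or region_start_keys[0] != b"":
        raise ValueError("region chain must start at the empty key")
    groups = {start: [] for start in region_start_keys}
    for row in rows:
        # The owning region is the last one whose start key is <= row.
        idx = bisect_right(region_start_keys, row) - 1
        groups[region_start_keys[idx]].append(row)
    return groups
```

Each group would then be written out as a new hfile for its region and handed to the bulkload tool (Step 3); that part is left out here because it depends on the real HBase APIs.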

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905312#comment-16905312
 ] 

stack commented on HBASE-21745:
---

bq. We found some orphan regions on our test cluster. The regions dir is empty. 
And no region holes in meta. For this case, the orphan region dir should be 
removed directly.

Yes.

If we had an adoption service, we'd give the master these dirs and it would 
just delete them.

bq. I thought we can do this at HBCK tool (client-side). But should not add 
'adopt' API to hbck Service now (master-side).

Ok. Let's see how we do. I think an adoption service is what we want in the 
end. Maybe we get there incrementally.

So, for empty dirs, the answer is easy... just delete. If there is a region in 
HDFS and a corresponding hole in meta, the tool that [~wchevreuil] added to 
HBCK2 should find the region in HDFS and then hoist it up into hbase:meta, 
filling the hole.

For anything else, an overlap found in HDFS...  we'll need to add something ... 
to the master? Otherwise it's HBCK2 figuring where the hfiles should go, placing 
them there, then reopening the touched regions. 
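
The three cases in this comment can be written down as a small decision function. This is only a sketch of the reasoning in the thread; the function name, its parameters, and the action labels are hypothetical and do not correspond to any HBCK2 command:

```python
def plan_for_orphan(hfile_count, fills_meta_hole):
    """Suggest an action for one orphan region dir found in HDFS.

    hfile_count: number of hfiles under the orphan dir (assumed known).
    fills_meta_hole: True if the dir's key range matches a hole in hbase:meta.
    """
    if hfile_count == 0:
        # Empty dir: nothing to lose, so just delete it.
        return "delete"
    if fills_meta_hole:
        # Region exists in HDFS and a hole in meta corresponds to it.
        return "add-to-meta"
    # Anything else (e.g. an overlap): the hfiles have to be re-homed.
    return "needs-adoption"
```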



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-11 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904783#comment-16904783
 ] 

Guanghao Zhang commented on HBASE-21745:


bq. I do not follow what you are saying above. I was fine till I got to "and 
there are overlap regions on meta."
We found some orphan regions on our test cluster. The region dirs are empty, 
and there are no region holes in meta. In this case, the orphan region dirs 
should be removed directly.

bq. I was thinking we'd add an 'adopt' API to hbck Service. You'd pass it one 
or more 'orphan' directories. The Master would read the directory and figure 
where to put the hfiles doing the right thing. The Master would be running an 
'adoption service'.
It is not easy for an 'adoption service' to handle all cases... For now, the 
HBCK2 tool only provides basic functions, and users can combine them into more 
powerful ones. I think we can do this in the HBCK tool (client-side), and 
should not add an 'adopt' API to the hbck Service (master-side) for now.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-09 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904153#comment-16904153
 ] 

stack commented on HBASE-21745:
---

[~zghaobac]

bq. For orphan regions on filesystem, use HBCK2 to fix it directly, ie. remove 
the region directory? stack

I was thinking we'd add an 'adopt' API to hbck Service. You'd pass it one or 
more 'orphan' directories. The Master would read the directory and figure where 
to put the hfiles doing the right thing. The Master would be running an 
'adoption service'. What do you think [~zghaobac]?

bq. For orphan regions on filesystem, the region dir can be removed directly 
only when the region dir is empty and there are overlap regions on meta. 

I do not follow what you are saying above. I was fine till I got to "and 
there are overlap regions on meta."

bq. But for the region holes, it should add a new regioninfo to meta and assign 
this region again. This need the user to analyze how to fix it.

For region holes, we have a 'fix' in HBASE-22771 (soon to be exposed by 
HBASE-22825). It will create the region in meta. It does not assign it though. 
Yeah, the operator would do this. Should we add an assign after the hole is fixed?
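
To make the "create in meta, then assign" ordering concrete, here is a rough sketch that finds interior holes in a region chain and emits the two steps per hole. It is an illustration only: the chain representation, the function names, and the step labels are assumptions, and it ignores holes at the very start or end of the table.

```python
def find_interior_holes(chain):
    """Return (start, end) key ranges not covered by any region.

    chain: list of (start_key, end_key) byte-string pairs sorted by start key;
    an end key of b"" means "end of table". Hypothetical representation.
    """
    holes = []
    if not chain:
        return holes
    furthest_end = chain[0][1]
    for start, end in chain[1:]:
        if furthest_end == b"":
            break  # an earlier region already runs to the end of the table
        if start > furthest_end:
            holes.append((furthest_end, start))
        if end == b"" or end > furthest_end:
            furthest_end = end
    return holes


def plan_hole_fix(chain):
    """For each hole, first create the region in meta, then assign it."""
    steps = []
    for region in find_interior_holes(chain):
        steps.append(("create-in-meta", region))
        steps.append(("assign", region))
    return steps
```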



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-09 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903641#comment-16903641
 ] 

Guanghao Zhang commented on HBASE-21745:


For orphan regions on the filesystem, the region dir can be removed directly 
only when the region dir is empty and there are overlap regions on meta. But 
for region holes, a new regioninfo should be added to meta and the region 
assigned again. This needs the user to analyze how to fix it.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-08-08 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903524#comment-16903524
 ] 

Guanghao Zhang commented on HBASE-21745:


For orphan regions on filesystem, use HBCK2 to fix it directly, ie. remove the 
region directory? [~stack]



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896638#comment-16896638
 ] 

stack commented on HBASE-21745:
---

HBASE-22771 is about adding a fix meta method to Hbck Service. It can be called 
by HBCK2 (and maybe later by the shell).  In another issue, I will create the 
HBCK2 client-side version of what is added here, for installs shipped before 
this issue landed.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895796#comment-16895796
 ] 

stack commented on HBASE-21745:
---

Back after [~zghaobac] landed his new 'HBCK Report' in HBase UI and after 
adding reporting by CatalogJanitor on various issues w/ hbase:meta. UI is 
coming along so now to outstanding fixHoles, fixOverlaps, and orphans.

On fixHoles:

 * Need to add to Hbck Interface a fixMeta method. It'll plug holes and 
overlaps found in hbase:meta by the new CatalogJanitor additions in 
HBASE-22723. If available on the server, HBCK2 would call it (later, could add 
an hbck shell command and it would call it instead -- could be part of a 
general plan of gradually moving hbck2 functionality over into server). 
Otherwise, if not available, HBCK2 would run its own attempt at fixing 
client-side, where it would scan hbase:meta to find problems and then work on 
trying to fix anything it found -- a copy of the code we've added server-side.
 * But before fixing holes, operator should call addMissingRegionsToMeta -- the 
[~wchevreuil] addition over in HBASE-22567. It does better than hole fixing, 
looking for candidate regions in fs with which it can plug holes. Could have 
the hole-fixer in HBCK2 call it first always (or maybe not, if the idea is that 
HBCK2 is made up of plumbing tools and is not trying to be porcelain).
 * Fixing overlaps is done by merging. The merge tool probably needs expansion. 
For example, it currently takes region names only. Would be handy if could pass 
rows in table so operator could ask to merge all regions between two marker rows.
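
Since overlaps are fixed by merging, the first step is to group the regions whose key ranges overlap. A minimal sketch of that grouping over a list of (start, end) key pairs; this is not the CatalogJanitor or HBCK2 code, and `overlap_groups` and the chain representation are hypothetical:

```python
def overlap_groups(chain):
    """Group regions with overlapping key ranges; each group is a merge candidate.

    chain: list of (start_key, end_key) byte-string pairs; an end key of b""
    means "end of table". Hypothetical representation of hbase:meta rows.
    """
    groups = []
    current = []
    current_end = None
    for start, end in sorted(chain):
        if current and (current_end == b"" or start < current_end):
            # This region starts before the running group ends: it overlaps.
            current.append((start, end))
            if current_end != b"" and (end == b"" or end > current_end):
                current_end = end
        else:
            if len(current) > 1:
                groups.append(current)
            current = [(start, end)]
            current_end = end
    if len(current) > 1:
        groups.append(current)
    return groups
```

Each returned group could then be passed to a merge; regions that merely touch (one's end key equals the next one's start key) are not grouped.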

Outstanding then is adopting orphans in hdfs. 



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-25 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893192#comment-16893192
 ] 

stack commented on HBASE-21745:
---

Porting comments by [~busbey] from HBASE-21966 here because they are general 
and will sit better here than in the issue they were pulled from.

Some notes from a discussion [~elserj] and I ended up having after a customer 
escalation:

{quote}We liked the idea of making very focused operations, rather than 
sweeping "-repair" options, e.g. "create a row in hbase:meta for this 
regiondir", "create an empty regiondir", "create a regiondir from this 
hbase:meta row". With tools like this, you could imagine a CLI tool stepping 
through the Regions of a table presenting a three-way merge like tool: for 
every region in a table, ask the user if they want to keep the state in meta 
or the state in HDFS. We like this idea because it keeps HBase devs out of the 
role of figuring out what the correct thing to do is.

A couple of related concerns:

 * Such a tool would not aim to be sufficient for the bar some folks have 
expressed for HBck2 to move the "stable" pointer to HBase 2. Essentially some 
ability to rebuild meta from external info ala HBASE-21665 or HBASE-18840.
 * Need to make sure these tools are compose-able. It's all well-and-good to 
say that human-insight needs to be applied to know what HBCK2 commands to run, 
but we also don't want cluster recoveries to take 10's of hours. Something 
easily scriptable is important. To continue the three-way-merge analogy, merge 
strategies ala ours/theirs in git.{quote}
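
The "ours/theirs" idea can be sketched as a scriptable reconcile step: for every region, keep the state from one side whenever meta and HDFS disagree. This is a toy model under stated assumptions; `reconcile`, the dict inputs, and the strategy names "meta" and "hdfs" are made up for illustration:

```python
def reconcile(meta, hdfs, strategy):
    """Pick a region state per region name using a fixed merge strategy.

    meta, hdfs: dicts mapping region name -> state as seen on each side
    (a missing name means the region is absent on that side).
    strategy: "meta" keeps the hbase:meta view on conflict, "hdfs" keeps the
    filesystem view. A resolved value of None means "region should not exist".
    """
    if strategy not in ("meta", "hdfs"):
        raise ValueError("strategy must be 'meta' or 'hdfs'")
    resolved = {}
    for name in sorted(set(meta) | set(hdfs)):
        in_meta, in_hdfs = meta.get(name), hdfs.get(name)
        if in_meta == in_hdfs:
            resolved[name] = in_meta  # both sides agree, nothing to decide
        else:
            resolved[name] = in_meta if strategy == "meta" else in_hdfs
    return resolved
```

An interactive tool would ask the operator per region instead of taking a fixed `strategy`; the fixed strategy is what makes the step scriptable.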



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-25 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893176#comment-16893176
 ] 

stack commented on HBASE-21745:
---

WIP doc on hbck2/hbck1 messings around this issue.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-25 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893168#comment-16893168
 ] 

stack commented on HBASE-21745:
---

The original hbck2 issue.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-25 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893163#comment-16893163
 ] 

stack commented on HBASE-21745:
---

HBASE-21447 is about adding hole-fixing to hbck2. Related.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-25 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893091#comment-16893091
 ] 

stack commented on HBASE-21745:
---

HBASE-22723 is about reporting holes, overlaps, etc. found during 
catalogjanitor scans.  [~zghaobac] over in HBASE-22709 adds a chore that runs 
in Master that finds 'problematic regions' -- i.e. old hbck1 findings such as 
orphans in HDFS, regions on RS not in hbase:meta, etc. -- and then shows them 
on a new 'hbck report' page added to the UI (HBASE-22527 had added listing 
problematic regions on the head of the main master UI but HBASE-22709 moves the 
info to a dedicated page). After HBASE-22709 and HBASE-22723 go in, will add to 
the new 'hbck report' the findings from HBASE-22723 so also shows in UI.

TODO:
 * HOWTO fix it doc added inline into 'hbck report' page
 * shell command to regenerate the 'hbck report' so shows fresher info.

For now, fix-it is hbck2 tool. Later, we might consider moving fixing into the 
shell to a new hbck command.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-22 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890658#comment-16890658
 ] 

stack commented on HBASE-21745:
---

HBASE-22723 is about implementing first idea above, having CatalogJanitor 
report on holes and overlaps.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889132#comment-16889132
 ] 

stack commented on HBASE-21745:
---

Linking a good one by [~wchevreuil]



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-19 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889062#comment-16889062
 ] 

stack commented on HBASE-21745:
---

A few thoughts on remaining items:

 * Fix region holes, overlaps, and other errors in the region chain
 * Fix failed split and merge transactions that have failed to roll back due to 
some bug (related to previous)

There are holes and overlaps in hbase:meta and then there are holes and 
overlaps in the filesystem (hdfs). In the past, hbck1 would fix 'holes and 
overlaps' in hdfs then hbase:meta would be consulted and adjusted to pick 
up the hdfs changes. Let's not do it this way for hbck2 (caveat: HBASE-22567, 
which finds hbase:meta holes and, if there is a matching hdfs region, hoists it up into 
hbase:meta). In hbck2, perhaps the Master itself can see 'holes' and 'overlaps' 
in hbase:meta. Master already runs a periodic process to 'check' hbase:meta 
called CatalogJanitor. It could minimally report holes and overlaps (as well as 
unknown servers, etc.). I was going to have a look at doing this. CJ could 
report to the UI its findings (after the [~zghaobac] new tendency)
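
A minimal sketch of the kind of hole/overlap report discussed above, in Python for brevity. It assumes each region is reduced to a (start key, end key) pair where an empty key means "open-ended"; the name `find_holes_and_overlaps` is hypothetical and this is not CatalogJanitor's actual code.

```python
from typing import List, Tuple

# (start_key, end_key); b"" as start means "table start", as end means "table end".
Region = Tuple[bytes, bytes]


def find_holes_and_overlaps(regions: List[Region]):
    """Walk regions in start-key order and collect gaps and overlaps in the chain."""
    holes, overlaps = [], []
    ordered = sorted(regions, key=lambda r: r[0])
    if not ordered:
        return holes, overlaps
    # The first region should begin at the start of the table.
    if ordered[0][0] != b"":
        holes.append((b"", ordered[0][0]))
    prev_start, prev_end = ordered[0]
    for start, end in ordered[1:]:
        if prev_end == b"" or start < prev_end:
            # This region begins before the furthest-reaching earlier region ends.
            overlaps.append(((prev_start, prev_end), (start, end)))
        elif start > prev_end:
            # Nothing covers the keys between prev_end and start.
            holes.append((prev_end, start))
        # Remember whichever region reaches furthest into the key space.
        if prev_end != b"" and (end == b"" or end > prev_end):
            prev_start, prev_end = start, end
    # The last region should run to the end of the table.
    if prev_end != b"":
        holes.append((prev_end, b""))
    return holes, overlaps
```

A reporter built this way only lists findings; deciding how to repair them is left to the operator or to a separate fix step.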

What about leftover directories in hdfs? Orphans and broken regions or broken 
tables? In hdfs, hbck1 used to have the notion of 'adoption' where a new region 
was created in a target table and the 'orphan' region's content was copied into 
the new location. Thereafter, there'd be machinations to get the new region up 
into hbase:meta. What if we ran an 'adoption service' in the Master where hbck2 
would pass the Master a list of directories and tell the Master to 'adopt' the 
content whether files or dropped regions, overlapping dirs, or even tables? The 
Master's hbase:meta would have to be healthy first so new data had a home to go 
to.

On fix split and merge transactions, this category of issues we should roll up 
into the general master fix described above where something like CJ recognizes 
any problem (it already does a bunch of the heavy-lifting for split/merges). 
The 'HBASE-21965 Fix failed split and merge transactions that have failed to 
roll back' "fix" 
above has actually been undone for now in favor of "HBASE-22709 Add a web ui to 
show the failed splited/merged regions" whose intent is listing in UI 
split/merges with recipes for fix.

And then perhaps a release of hbase-operator-tools?





[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888302#comment-16888302
 ] 

stack commented on HBASE-21745:
---

On...

bq. Fix assignment errors (undeployed regions, double assignments (yes, should 
not be possible), etc)

HBASE-22527 adds display on master UI of 'problematic regions' which are one of 
the following:

 * Master thought this region opened, but no regionserver reported it.
 * Master thought this region opened on Server1, but regionserver reported 
Server2
 * More than one regionserver reported having opened this region

All above should be fixable with HBCK2 currently; what combination depends on 
the particular problem. For example, HBASE-22527 has case #1 above, where meta 
had a region assigned to a server no longer a member of the cluster (for 
whatever reason...). A recipe in HBASE-22527 shows one fix (I think there is a 
more compact solution but in the heat of the moment... whatever works). For #2 
and #3, Master used to tell disagreeing regionserver to kill itself because it 
was in disagreement with the Master's view of the world (but I think this 
killing was later undone).
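
The three 'problematic region' cases listed above can be sketched as a comparison between the Master's view and what regionservers report. This is an illustrative Python sketch only; the function name, the labels, and the dictionary shapes are assumptions, not the HBASE-22527 implementation.

```python
from typing import Dict, Set


def problematic_regions(
    master_view: Dict[str, str],          # region -> server the Master thinks hosts it
    rs_reports: Dict[str, Set[str]],      # server -> regions that server reports as open
) -> Dict[str, str]:
    """Label each region whose Master-side location disagrees with regionserver reports."""
    # Invert the reports: region -> set of servers claiming to have it open.
    reporters: Dict[str, Set[str]] = {}
    for server, regions in rs_reports.items():
        for region in regions:
            reporters.setdefault(region, set()).add(server)

    problems: Dict[str, str] = {}
    for region, expected_server in master_view.items():
        servers = reporters.get(region, set())
        if not servers:
            problems[region] = "NOT_REPORTED"        # case 1: nobody reports it
        elif len(servers) > 1:
            problems[region] = "MULTIPLY_REPORTED"   # case 3: more than one reporter
        elif expected_server not in servers:
            problems[region] = "WRONG_SERVER"        # case 2: reported by another server
    return problems
```

Which hbck2 commands then apply depends on the label, as noted above.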

I think 'Fix assignment errors (undeployed regions, double assignments (yes, 
should not be possible), etc)' is covered. Let me strike it out in the list above.


> Make HBCK2 be able to fix issues other than region assignment
> -
>
> Key: HBASE-21745
> URL: https://issues.apache.org/jira/browse/HBASE-21745
> Project: HBase
>  Issue Type: Umbrella
>  Components: hbase-operator-tools, hbck2
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
>  * -Rebuild meta from region metadata in the filesystem, aka offline meta 
> rebuild.-
>  * Fix assignment errors (undeployed regions, double assignments (yes, should 
> not be possible), etc)
>  * Fix region holes, overlaps, and other errors in the region chain
>  * Fix failed split and merge transactions that have failed to roll back due 
> to some bug (related to previous)
>  *  -Enumerate store files to determine file level corruption and sideline 
> corrupt files-
>  * -Fix hfile link problems (dangling / broken)-





[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888182#comment-16888182
 ] 

stack commented on HBASE-21745:
---

bq. Taking a look at the refguide, we should at least mention healthy meta == 
healthy upgrade. I can add a note...
Done by HBASE-22685.

Putting a quote from [~zghaobac] that I like here so it gets more of an 
'airing'. It's about hbck/hbck2, etc. (Paraphrased because I'm trying to commit 
it as comment against the Hbck Interface).

{code}
+// The Hbck interface should only support methods which are about "fixing". The "checking" work
+// can be done elsewhere -- Canary or by Master: E.g. we can add a chore thread in master to
+// check for wrong circumstance and show badness in Master web ui. -- paraphrase of
+// Guanghao Zhang from https://issues.apache.org/jira/browse/HBASE-22673?focusedCommentId=16882591=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16882591
{code}



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884234#comment-16884234
 ] 

stack commented on HBASE-21745:
---

HBASE-22688 adds reporting on and fixing of corrupt hfiles, broken references and 
links. It also writes the hbase.version file if missing.

> Make HBCK2 be able to fix issues other than region assignment
> -
>
> Key: HBASE-21745
> URL: https://issues.apache.org/jira/browse/HBASE-21745
> Project: HBase
>  Issue Type: Umbrella
>  Components: hbase-operator-tools, hbck2
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
>
> This is what [~apurtell] posted on mailing-list, HBCK2 should support
> {quote}
>- Rebuild meta from region metadata in the filesystem, aka offline meta
>rebuild.
>- Fix assignment errors (undeployed regions, double assignments (yes,
>should not be possible), etc)
>- Fix region holes, overlaps, and other errors in the region chain
>- Fix failed split and merge transactions that have failed to roll back
>due to some bug (related to previous)
>- Enumerate store files to determine file level corruption and sideline
>corrupt files
>- Fix hfile link problems (dangling / broken)
> {quote}





[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884006#comment-16884006
 ] 

stack commented on HBASE-21745:
---

HBASE-22680 adds offline rebuild of meta for hbase2 to hbck2. Over in 
HBASE-22567, I've added a note on how I think the OfflineMetaRepair update and 
the work in HBASE-22567 complement each other.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-07-10 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882561#comment-16882561
 ] 

stack commented on HBASE-21745:
---

Just to state that over the last few days, I have been playing with moving the 
hbck1 tooling out to hbase-operator-tools under hbase-hbck and then altering it 
so it works against the hbase2 context.  This is so as to realize at least the 
below from the original list that opens this JIRA:

 * Rebuild meta from region metadata in the filesystem, aka offline meta 
rebuild.
 * Enumerate store files to determine file level corruption and sideline 
corrupt files
 * Fix hfile link problems (dangling / broken)

So far the project has been 'interesting': e.g. we'll have to be able to 
distinguish hbase2 from hbase3 now hbase3 has namespace integrated into 
hbase:meta; NOT having namespace integrated complicates the offline meta 
rebuild for hbase2... Hope to have something to share soon.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-06-11 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861248#comment-16861248
 ] 

Wellington Chevreuil commented on HBASE-21745:
--

Created HBASE-22567 for converting the mentioned tool to an hbck2 command. I have 
linked a first PR there. 

On another topic, since we are adding more commands to hbck2, maybe it's time 
to start working on a refactoring that applies a _command-like_ design and moves 
command-specific logic out of the current _HBCK2_ main class. 
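
To illustrate what a _command-like_ design could look like, here is a small Python sketch (Python only for brevity; hbck2 itself is Java). Each command registers a handler and the main entry point only dispatches. The `report` command and all function names are hypothetical; `assigns` is used as an example name only and the handler does no real work.

```python
from typing import Callable, Dict, List

# Registry mapping a command name to the function that implements it.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {}


def command(name: str):
    """Decorator that registers a handler under the given command name."""
    def register(handler: Callable[[List[str]], str]):
        COMMANDS[name] = handler
        return handler
    return register


@command("assigns")
def assigns(args: List[str]) -> str:
    # Placeholder: a real handler would ask the Master to assign these regions.
    return "would assign: " + " ".join(args)


@command("report")  # hypothetical command, for illustration only
def report(args: List[str]) -> str:
    return "would report on: " + " ".join(args)


def dispatch(argv: List[str]) -> str:
    """The main class shrinks to argument checking plus a registry lookup."""
    if not argv or argv[0] not in COMMANDS:
        raise ValueError("unknown command; known: " + ", ".join(sorted(COMMANDS)))
    return COMMANDS[argv[0]](argv[1:])
```

Adding a command then means adding one handler, with no change to the dispatch code.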



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-06-05 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856417#comment-16856417
 ] 

Toshihiro Suzuki commented on HBASE-21745:
--

> I can convert it into another hbck2 command.
+1



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-06-03 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854915#comment-16854915
 ] 

Wellington Chevreuil commented on HBASE-21745:
--

Thanks for the notes [~stack]! Yeah, it seems these meta issues are mostly 
induced by poor practices, but they still punch us in the face on supportability.

As for the tool, it's very simple, yet better than having to rebuild the whole 
hbase dir and bulkload previous files. It indeed requires refinements, if we 
want to port it to hbck2.

Briefly detailing, it takes a list of tables in the format of an hbase shell 
meta scan result filtering by table:state cell. It can be easily obtained as:  
_echo "scan 'hbase:meta'" | hbase shell | grep "column=table:state"._ It does 
not assign regions; it only checks for regions with _regioninfo_ in hdfs that are 
missing in meta, then _puts_ those in meta, in CLOSED state. At the end, 
currently, it just prints an _assigns_ command with all the re-inserted 
regions. It's then up to an operator to run this resulting command with hbck2, 
but if we want to make it more automated, we can surely submit APs. For the 
META reinsertion phase, maybe we can disable the given table to ensure no key 
range collision would happen, say in case an already existing region splits?
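
The flow described above (find regions present in hdfs but missing in meta, re-insert them as CLOSED, then print an _assigns_ command) can be sketched as follows. This is an illustrative Python sketch working on plain lists of encoded region names; `plan_reinsert` is a hypothetical name and nothing here touches hdfs or hbase:meta.

```python
from typing import List, Tuple


def plan_reinsert(
    hdfs_regions: List[str],   # encoded region names that have a regioninfo in hdfs
    meta_regions: List[str],   # encoded region names currently present in meta
) -> Tuple[List[Tuple[str, str]], str]:
    """Return the meta puts to make and the assigns command to print afterwards."""
    # Regions known to the filesystem but absent from meta.
    missing = sorted(set(hdfs_regions) - set(meta_regions))
    # Each missing region goes back into meta in CLOSED state; nothing is assigned here.
    puts = [(region, "CLOSED") for region in missing]
    # The operator runs this command with hbck2 once the puts are done.
    assigns_command = "assigns " + " ".join(missing) if missing else ""
    return puts, assigns_command
```

Keeping the assignment as a separate, printed step leaves the operator in control of when regions come online.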



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-06-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854813#comment-16854813
 ] 

stack commented on HBASE-21745:
---

Nice writeup [~wchevreuil].  Taking a look at the refguide, we should at least 
mention healthy meta == healthy upgrade. I can add a note...  

Looking at the tool, I like how it is basic. Could do w/ a few comments explaining 
what it's up to (smile). It is reading a text file or what format? Might want to 
ensure no assigning is going on concurrently.  It is editing the meta directly? 
If so, that will not go well out in the field I'd say in anything but the most 
controlled of circumstance. Queue a procedure instead?

Good on you W.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-06-03 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854501#comment-16854501
 ] 

Wellington Chevreuil commented on HBASE-21745:
--

{quote}Rebuild meta from region metadata in the filesystem, aka offline meta 
rebuild.
{quote}
We have seen a few cases requiring this among our customer base lately. Some 
trends observed:
 - Happens after upgrading from 1.x to 2.1.0;
 - Cluster apparently wasn't in a healthy state prior to upgrading (some 
regions were already in an inconsistent state, or wrong permissions were defined on 
hbase table folders);
 - OfflineMetaRepair/hbck1 forcibly used;

Obviously, it appears this situation mostly happens due to poor admin actions, but 
it may still be valid to provide a convenient fix option for this (it has also 
been mentioned by some folks on a recent discussion mail thread about hbck2 
directions). 

To aid our support team in such cases, I had recently written a [simpler 
client|https://github.com/wchevreuil/support-tools/blob/master/src/main/java/com/cloudera/support/hbase/RegionMetaBuilder.java]
 than OfflineMetaRepair, for re-inserting regions with metadata in hdfs back 
into meta. If folks think it's relevant, I can convert it into another hbck2 
command.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-02-26 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778944#comment-16778944
 ] 

Jingyun Tian commented on HBASE-21745:
--

{quote} - Fix region holes, overlaps, and other errors in the region chain
 - Fix failed split and merge transactions that have failed to roll back
due to some bug (related to previous){quote}
I think HBCK2 should be able to fix these two problems. They may be caused by a 
master restart while some procedures are running, together with lost procedure logs. Although 
this should rarely happen, we should be able to fix these.

Let's add subtasks for fixing them.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-02-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771384#comment-16771384
 ] 

stack commented on HBASE-21745:
---

None yet. Will come back here soon



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-02-18 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771038#comment-16771038
 ] 

Duo Zhang commented on HBASE-21745:
---

Ping [~stack] sir, any progress here? It seems lots of users want HBCK2 to have 
the ability to fix more issues other than assignment so they can upgrade to 
2.x...

Thanks.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-01-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751933#comment-16751933
 ] 

stack commented on HBASE-21745:
---

Doing a review of hbck1 to see what still applies and what to bring over so it works 
against the new guise.



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-01-22 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749011#comment-16749011
 ] 

stack commented on HBASE-21745:
---

Thanks for link/reminder [~pankaj2461]



[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-01-22 Thread Pankaj Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749005#comment-16749005
 ] 

Pankaj Kumar commented on HBASE-21745:
--

{quote}Has anyone ever had use for this facility in recent memory?
{quote}
Pardon [~stack] Sir, we have a use case where we rebuild the meta region for old 
version data migration or when meta gets corrupted. A similar discussion happened 
in HBASE-21665. 






[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-01-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748425#comment-16748425
 ] 

Duo Zhang commented on HBASE-21745:
---

[~stack] In general, sir, HBCK2 is for fixing a broken cluster where we have 
bugs. For AMv1, we know that the design itself has problems, so we see lots of 
broken states in real production; AMv2 aims to solve this, and now it seems to 
work pretty well. But anyway, we could have bugs in the code, so the fact that 
we haven't seen it yet does not mean it will not happen in the future...

FWIW, I think we should have the ability to fix region holes, and failed 
split/merge, etc.

Thanks.






[jira] [Commented] (HBASE-21745) Make HBCK2 be able to fix issues other than region assignment

2019-01-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748404#comment-16748404
 ] 

stack commented on HBASE-21745:
---

bq. Rebuild meta from region metadata in the filesystem, aka offline meta 
rebuild.

Has anyone ever had use for this facility in recent memory? This was always a 
hack dependent on possibly stale schema info echo'd to the FS. It would be 
blind to splits and merges. I'd actually like to remove the echoing of schema 
to the FS -- we'd have to if we want to change the layout in the FS so we can 
scale.

But we could do a version for hbase2 that gave some comfort in the unlikely 
case we lost a whole hbase:meta.

bq. Fix assignment errors (undeployed regions, double assignments (yes, should 
not be possible), etc)

I've not seen double assign in hbase2. On assign, between the shell and hbck2, 
there is a variety of tooling -- some of it not available in hbase1, such as 
bulk assign/unassign. I think we are covered here.

Perhaps a check on the wholesomeness of hbase:meta? hbck2 has done work in the 
Canary to serve this verification function. Let me take another look and see if 
we need to hoist up stuff from hbck1.

bq. Fix region holes, overlaps, and other errors in the region chain

Haven't seen this in amv2. Doesn't seem to happen.
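
For readers unfamiliar with the terms, here is a minimal sketch of what a hole or an overlap in the region chain means. It is illustrative only and is not HBCK2 or hbck1 code: the Region tuple and check_region_chain function are hypothetical names, and it assumes each region is described just by a start key (inclusive) and an end key (exclusive), with an empty key standing for the start or end of the table.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    # Hypothetical, simplified stand-in for a region's boundaries.
    name: str
    start: bytes  # inclusive start key; b"" means start of table
    end: bytes    # exclusive end key; b"" means end of table


def check_region_chain(regions: List[Region]) -> List[Tuple[str, str, str]]:
    """Return (kind, left, right) for every hole or overlap in the chain."""
    problems: List[Tuple[str, str, str]] = []
    ordered = sorted(regions, key=lambda r: r.start)
    if not ordered:
        return problems
    # The first region must begin at the start of the table.
    if ordered[0].start != b"":
        problems.append(("hole", "<table start>", ordered[0].name))
    prev = ordered[0]
    for region in ordered[1:]:
        if prev.end == b"" or region.start < prev.end:
            # Next region starts before the previous one ends.
            problems.append(("overlap", prev.name, region.name))
        elif region.start > prev.end:
            # Keys between prev.end and region.start belong to no region.
            problems.append(("hole", prev.name, region.name))
        # Continue from whichever region reaches furthest into the key space.
        if prev.end != b"" and (region.end == b"" or region.end > prev.end):
            prev = region
    # The last region must run to the end of the table.
    if prev.end != b"":
        problems.append(("hole", prev.name, "<table end>"))
    return problems


if __name__ == "__main__":
    chain = [
        Region("r1", b"", b"b"),
        Region("r2", b"b", b"d"),
        Region("r3", b"e", b""),  # keys in [d, e) are not covered
    ]
    for kind, left, right in check_region_chain(chain):
        print(kind, "between", left, "and", right)
```

A real fixer would have to decide what to do about each finding (for example, create a region to plug a hole or merge overlapping regions); the sketch only detects the condition.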

bq. Fix failed split and merge transactions that have failed to roll back due 
to some bug (related to previous)

Yeah. Haven't seen this either.

bq. Enumerate store files to determine file level corruption and sideline 
corrupt files

This seems to be an hfile tool variant. Let me look into adding it.

bq. Fix hfile link problems (dangling / broken)

Ditto.

Will be back.



