[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761881#comment-16761881
 ] 

Sean Busbey commented on HBASE-21843:
-------------------------------------

I don't think it's a good idea to revert HBASE-20856. The principles of that 
issue are still sound.

We already have logic somewhere for the AsyncDFS based WAL that falls back to 
the default if something goes wrong with the needed HDFS hooks. Can we do 
something similar for the region grouping provider and make the check something 
like "did you ask for a provider for meta"?
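
A rough sketch of the kind of check I mean. Everything in it is hypothetical: 
the class, the WalProvider interface, the "meta" provider id and the choose() 
method are illustrative names only, not the actual HBase classes or API.

```java
public class MetaAwareWalProviderChooser {

    /** Hypothetical stand-in for a WAL provider; not the real HBase interface. */
    public interface WalProvider {
        String name();
    }

    // Assumed marker for "the caller asked for the meta WAL provider".
    public static final String META_PROVIDER_ID = "meta";

    /**
     * Returns the default provider when the request is for meta, so the meta
     * WAL keeps its usual file name pattern; otherwise returns the region
     * grouping provider.
     */
    public static WalProvider choose(String providerId,
                                     WalProvider grouping,
                                     WalProvider defaultProvider) {
        if (META_PROVIDER_ID.equals(providerId)) {
            return defaultProvider;
        }
        return grouping;
    }
}
```

The point is only that the decision is made per request, so region grouping 
stays in place for user regions while meta goes through the default path.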

> RegionGroupingProvider breaks the meta wal file name pattern which may cause 
> data loss for meta region
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21843
>                 URL: https://issues.apache.org/jira/browse/HBASE-21843
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 3.0.0, 2.1.0, 2.2.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>              Labels: data-loss
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
>         Attachments: HBASE-21843.master.001.patch, HBASE-21843.patch
>
>
> A bit unusual, but I managed to hit this twice lately, in both distributed and 
> local standalone mode, on VMs. Somehow, after a VM pause/resume, I got into 
> a situation where regions in meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created for this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> I could get this sorted by adding an extra check on loadMeta: if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
