[ 
https://issues.apache.org/jira/browse/HBASE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gavin updated HBASE-17963:
--------------------------
    Comment: was deleted

(was: A comment with security level 'jira-users' was removed.)

> RegionServers lose file locality on unplanned restart
> -----------------------------------------------------
>
>                 Key: HBASE-17963
>                 URL: https://issues.apache.org/jira/browse/HBASE-17963
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>         Environment: Evident with HDP 2.4.3 running HBase 1.1.2
>            Reporter: Bjorn Olsen
>            Priority: Major
>
> When an HBase cluster crashes, HFile locality is lost. 
> Crashes can happen for a variety of reasons, and when they do, recovering 
> quickly (both data and database performance) is critical. 
> On cluster restore, region servers do not load their previous set of regions, 
> which means all HFiles must be moved around until locality is achieved again. 
> Performance is poor while file locality is not close to 100%. 
> A major compaction must then be run to rewrite the HFiles locally, which 
> further impacts performance and takes longer the more data was in HBase at 
> the time of the crash.
> There is a graceful_stop script which is useful for planned outages - you can 
> first unload the regions from the region server, restart it, and then reload 
> the regions to the same server. No HFiles need to be moved and file locality 
> is quickly restored.
> However, with an unplanned outage, no record is kept of where the regions 
> were. After a crash, HBase assigns regions to region servers randomly, so 
> HFile locality is very low. We then need to move all the HFiles around until 
> file locality is restored.
> This is fine for a small number of regions and small HFiles but becomes 
> problematic when you have a large number of regions or large HFiles.
> This JIRA is a request to improve this behavior for unplanned outages by 
> trying to restore each server's previous region assignments after a cluster 
> restart. For example, HBase could record the region assignments at regular 
> intervals, and use the latest record as an initial guideline when regions are 
> reassigned. 
> Locality might still not be 100% immediately - but presumably better than 0%. 
> It would be necessary to first disable the load balancer (if enabled) while 
> this restore is happening and enable it afterward.
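
The proposal above (record assignments periodically, then prefer the recorded server when regions are reassigned) could be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase code: the function names `snapshot_assignments`, `plan_restore`, and `retained_fraction` are hypothetical, regions and servers are plain strings, and the round-robin fallback is one possible choice for regions whose previous server did not come back. Disabling and re-enabling the load balancer around the restore is left out.

```python
def snapshot_assignments(assignments):
    """Return a copy of the region -> server map (hypothetical periodic snapshot)."""
    return dict(assignments)


def plan_restore(snapshot, live_servers, regions):
    """Build a region -> server plan that prefers each region's recorded server.

    Regions whose recorded server is missing or not live are spread
    round-robin over the live servers (an assumption of this sketch).
    """
    if not live_servers:
        raise ValueError("no live region servers to assign to")
    servers = sorted(live_servers)
    plan = {}
    fallback = 0
    for region in sorted(regions):
        previous = snapshot.get(region)
        if previous in live_servers:
            # The previous host is back, so its local HFile blocks can be reused.
            plan[region] = previous
        else:
            plan[region] = servers[fallback % len(servers)]
            fallback += 1
    return plan


def retained_fraction(snapshot, plan):
    """Fraction of planned regions that return to their recorded server."""
    if not plan:
        return 0.0
    kept = sum(1 for region, server in plan.items() if snapshot.get(region) == server)
    return kept / len(plan)
```

For example, with a snapshot of three regions on `rs1`, `rs2`, and `rs3`, and only `rs1` and `rs2` alive after the restart, the two regions recorded on live servers go back to them and the remaining regions are spread over the survivors. The retained fraction is only a rough proxy for HFile locality, since it counts regions rather than bytes.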



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
