[ https://issues.apache.org/jira/browse/HBASE-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-22538:
-----------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 1.4.11
                   1.3.6
                   1.5.0
           Status: Resolved  (was: Patch Available)

> Prevent graceful_stop.sh from shutting down RS too early before finishing unloading regions
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22538
>                 URL: https://issues.apache.org/jira/browse/HBASE-22538
>             Project: HBase
>          Issue Type: Bug
>          Components: shell
>    Affects Versions: 1.4.9
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Minor
>             Fix For: 1.5.0, 1.3.6, 1.4.11
>
>         Attachments: HBASE-22538.branch-1.4.001.patch, HBASE-22538.branch-1.4.002.patch
>
>
> We can stop or restart region servers gracefully using the graceful_stop.sh command.
> This command should guarantee that all regions are moved out before a region server is shut down.
> However, I sometimes saw many requests fail while restarting a region server with this command in our production clusters (v1.2.5).
> Affected clients got many RegionServerStoppedExceptions and exhausted their retry count.
> I found that it took 0.03 sec to move a region, which is too fast. Moving (unloading) the regions on the region server had not finished, and the regions had not even been closed yet, when the region server got the shutdown signal.
> Because a region server that was still serving regions (not yet closed) was stopped, clients got many exceptions (RegionServerStoppedException).
> But region_mover should wait until a region is served by another region server (meta changed):
> https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L153
> I figured out why this early shutdown happened.
> a) Our clusters use upper-case hostnames.
> b) The region server builds its ServerName with a lower-case hostname, and this name is sent to the master.
> https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L542
> c) When meta is updated, the server name keeps its own case.
> https://github.com/apache/hbase/blob/branch-1.2/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L1527
> d) region_mover.rb simply compares b) and c), so the comparison is always false.
> https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L91
> https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L52
> I think region_mover should compare the server names from the master and from meta in the same case (lower).
> With the patch, I confirmed that region_mover waited until all regions had finished moving and only then triggered the region server shutdown. (I also observed only RegionMovedException before the shutdown log, and no exceptions after shutdown started.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
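
The mismatch described in steps a) to d) can be illustrated with a minimal Ruby sketch. This is not the code from region_mover.rb or from the attached patches: the helper name `same_server?` and the sample server-name strings are hypothetical, and the sketch only assumes that both names are plain strings that differ in hostname case alone.

```ruby
# Hypothetical helper (not taken from region_mover.rb): treats two
# server-name strings as equal when they differ only in letter case.
def same_server?(name_from_master, name_from_meta)
  return false if name_from_master.nil? || name_from_meta.nil?
  name_from_master.downcase == name_from_meta.downcase
end

# Illustrative values only: one name with a lower-cased hostname,
# one that kept an upper-case hostname.
master_name = 'rs1.example.com,16020,1559000000000'
meta_name   = 'RS1.EXAMPLE.COM,16020,1559000000000'

puts master_name == meta_name              # case-sensitive comparison
puts same_server?(master_name, meta_name)  # comparison in the same (lower) case
```

With a case-sensitive comparison the two names never match, which corresponds to step d) above; lower-casing both sides before comparing is one way to apply the "same case (lower)" suggestion.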