[ https://issues.apache.org/jira/browse/HBASE-26596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yutong Xiao reassigned HBASE-26596:
-----------------------------------

    Assignee: Yutong Xiao

> region_mover should gracefully ignore null response from
> RSGroupAdmin#getRSGroupOfServer
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-26596
>                 URL: https://issues.apache.org/jira/browse/HBASE-26596
>             Project: HBase
>          Issue Type: Bug
>          Components: mover, rsgroup
>    Affects Versions: 1.7.1
>            Reporter: Viraj Jasani
>            Assignee: Yutong Xiao
>            Priority: Major
>
> If a regionserver still has a non-daemon thread running after its own
> shutdown, that thread can prevent a clean JVM exit and leave the
> regionserver stuck in a zombie state. We recently provided a workaround
> for this in HBASE-26468: the regionserver exit hook waits 30s for all
> non-daemon threads to stop before terminating the JVM abnormally.
> However, if a regionserver is stuck in such a state, region_mover unload
> fails with:
> {code:java}
> NoMethodError: undefined method `getName` for nil:NilClass
>   getSameRSGroupServers at /bin/region_mover.rb:503
>   __ensure__ at /bin/region_mover.rb:313
>   unloadRegions at /bin/region_mover.rb:310
>   (root) at /bin/region_mover.rb:572
> {code}
> This happens when the cluster has RSGroup enabled and the given server is
> already stopped: RSGroupAdmin#getRSGroupOfServer returns null because the
> server is no longer running and so is not part of any RSGroup.
> region_mover should tolerate this null response and exit gracefully from
> the unloadRegions() call.
>
> We should also check whether the fix is applicable to branch-2 and above.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
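
For illustration only, here is a minimal sketch of the kind of nil guard the issue asks for. It is not the actual patch and does not reproduce the real `region_mover.rb` code: `RSGroupInfo`, `StubRSGroupAdmin`, and `same_rsgroup_servers` are hypothetical stand-ins, and the only behaviour taken from the report is that `getRSGroupOfServer` returns nil for a server that belongs to no RSGroup.

```ruby
# Hypothetical stand-in for an RSGroup descriptor; not the real HBase class.
RSGroupInfo = Struct.new(:name, :servers) do
  def getName
    name
  end
end

# Hypothetical stub admin. As described in the issue, it returns nil when
# the server is not part of any RSGroup (e.g. it has already stopped).
class StubRSGroupAdmin
  def initialize(groups)
    @groups = groups
  end

  def getRSGroupOfServer(server)
    @groups.find { |g| g.servers.include?(server) }
  end
end

# Returns the other servers in the same RSGroup as `server`, or an empty
# list when the admin reports no group, instead of calling getName on nil.
def same_rsgroup_servers(admin, server)
  group = admin.getRSGroupOfServer(server)
  if group.nil?
    warn "No RSGroup found for #{server}; nothing to unload"
    return []
  end
  warn "Server #{server} belongs to RSGroup #{group.getName}"
  group.servers.reject { |s| s == server }
end
```

With this guard, a caller such as an unload routine can treat an empty result as "no target servers" and return early, rather than failing with `NoMethodError` as in the trace above.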