[
https://issues.apache.org/jira/browse/HBASE-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861931#action_12861931
]
Miklos Kurucz commented on HBASE-2497:
--------------------------------------
All right.
I would also like to note that this same defective code exists in
master/TableOperation.java at lines 106-107 in 0.20.3, but is already fixed in
trunk.
> ProcessServerShutdown throws NullPointerException for offline regions
> ---------------------------------------------------------------------
>
> Key: HBASE-2497
> URL: https://issues.apache.org/jira/browse/HBASE-2497
> Project: Hadoop HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.20.3
> Reporter: Miklos Kurucz
> Fix For: 0.20.5, 0.21.0
>
> Attachments: 2497-v2.patch, pss_diff.txt
>
>
> When a region server dies, the master can run into the following bug:
> 2010-04-27 17:20:37,303 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of dell106.cluster,60020,1272377612991
> 2010-04-27 17:20:37,303 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server dell106.cluster,60020,1272377612991: logSplit: true, rootRescanned: true, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2010-04-27 17:20:01,637 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete, meta reassignment and scanning:
> 2010-04-27 17:20:01,653 DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanRootRegion: process server shutdown scanning root region on 10.1.3.124
> 2010-04-27 17:20:01,664 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: process server shutdown scanning root region on 10.1.3.124 finished master
> 2010-04-27 17:20:01,683 DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions: process server shutdown scanning .META.,,1 on 10.1.3.104:60020
> 2010-04-27 17:20:18,087 DEBUG org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions: Exception in RetryableMetaOperation:
> 2010-04-27 17:20:18,118 WARN org.apache.hadoop.hbase.master.HMaster: Adding to delayed queue: ProcessServerShutdown of dell106.cluster,60020,1272377612991
> java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:100)
>     at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:345)
>     at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:509)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:448)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:487)
>     at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:461)
>     at org.apache.hadoop.hbase.master.ProcessServerShutdown.scanMetaRegion(ProcessServerShutdown.java:147)
>     at org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions.call(ProcessServerShutdown.java:264)
>     at org.apache.hadoop.hbase.master.ProcessServerShutdown$ScanMetaRegions.call(ProcessServerShutdown.java:250)
>     at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:69)
>     ... 3 more
> The problem is in ProcessServerShutdown.java at lines 148-149:
> 146 String serverAddress =
> 147   Bytes.toString(values.getValue(CATALOG_FAMILY, SERVER_QUALIFIER));
> 148 long startCode =
> 149   Bytes.toLong(values.getValue(CATALOG_FAMILY, STARTCODE_QUALIFIER));
> 150 String serverName = null;
> 151 if (serverAddress != null && serverAddress.length() > 0) {
> 152   serverName = HServerInfo.getServerName(serverAddress, startCode);
> 153 }
> It should be modified to:
> 146 String serverAddress =
> 147   Bytes.toString(values.getValue(CATALOG_FAMILY, SERVER_QUALIFIER));
> 150 String serverName = null;
> 151 if (serverAddress != null && serverAddress.length() > 0) {
> 148   long startCode =
> 149     Bytes.toLong(values.getValue(CATALOG_FAMILY, STARTCODE_QUALIFIER));
> 152   serverName = HServerInfo.getServerName(serverAddress, startCode);
> 153 }
> This change is needed because Bytes.toLong cannot handle the null that
> getValue returns when STARTCODE_QUALIFIER is missing, which is the case for
> offline regions in META.
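The reordering described above can be illustrated with a small standalone sketch. It does not use the HBase classes: StartCodeGuard, toLong, toStringOrNull and resolveServerName are hypothetical stand-ins, and the "address,startCode" name format is only an illustration, not necessarily what HServerInfo.getServerName produces. The point is only that the start code is decoded after the server address has been checked, so a row with neither cell no longer reaches the decoder with a null.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the master-side logic; not the HBase API.
final class StartCodeGuard {

    // Stand-in decoder: like the decoder in the report, it fails with
    // NullPointerException when handed a null array.
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Stand-in string decoder that tolerates a missing cell.
    static String toStringOrNull(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the start code only once the server address is known to exist.
    // For an offline region both cells are missing (null), so the decoder
    // is never called and the method returns null.
    static String resolveServerName(byte[] serverCell, byte[] startCodeCell) {
        String serverAddress = toStringOrNull(serverCell);
        String serverName = null;
        if (serverAddress != null && serverAddress.length() > 0) {
            long startCode = toLong(startCodeCell);
            // Illustrative name format only.
            serverName = serverAddress + "," + startCode;
        }
        return serverName;
    }

    public static void main(String[] args) {
        byte[] server = "10.1.3.104:60020".getBytes(StandardCharsets.UTF_8);
        byte[] startCode = ByteBuffer.allocate(8).putLong(1272377612991L).array();
        System.out.println(resolveServerName(server, startCode)); // online region
        System.out.println(resolveServerName(null, null));        // offline region
    }
}
```

Like the proposed change, this sketch still assumes that a start code is present whenever a server address is; a row with an address but no start code would fail in the same way.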