Yu Li created HBASE-16201:
-----------------------------

             Summary: NPE in RpcServer causing intermittent UT failure of 
TestMasterReplication#testHFileCyclicReplication
                 Key: HBASE-16201
                 URL: https://issues.apache.org/jira/browse/HBASE-16201
             Project: HBase
          Issue Type: Bug
            Reporter: Yu Li
            Assignee: Yu Li


Every several rounds of {{TestMasterReplication#testHFileCyclicReplication}}, 
we could observe below NPE in UT log:
{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2257)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
{noformat}
And related codes at RpcServer line 2257 are:
{code}
      if (e instanceof ServiceException) {
        e = e.getCause();
      }

      // increment the number of requests that were exceptions.
      metrics.exception(e);

      if (e instanceof LinkageError) throw new DoNotRetryIOException(e);
      if (e instanceof IOException) throw (IOException)e;
{code}
And after some debugging, we could find several places that constructing 
ServiceException with no cause, such as in {{RsRpcServices#replicateWALEntry}}:
{code}
      if (regionServer.replicationSinkHandler != null) {
        ...
      } else {
        throw new ServiceException("Replication services are not initialized 
yet");
      }
{code}
So we should firstly check and only reset {{e=e.getCause()}} when the cause is 
not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to