Yu Li created HBASE-16201:
-----------------------------

             Summary: NPE in RpcServer causing intermittent UT failure of TestMasterReplication#testHFileCyclicReplication
                 Key: HBASE-16201
                 URL: https://issues.apache.org/jira/browse/HBASE-16201
             Project: HBase
          Issue Type: Bug
            Reporter: Yu Li
            Assignee: Yu Li

Every few runs of {{TestMasterReplication#testHFileCyclicReplication}}, the following NPE shows up in the UT log:
{noformat}
java.lang.NullPointerException
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2257)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
{noformat}

The related code at RpcServer line 2257 is:
{code}
      if (e instanceof ServiceException) {
        e = e.getCause();
      }

      // increment the number of requests that were exceptions.
      metrics.exception(e);

      if (e instanceof LinkageError) throw new DoNotRetryIOException(e);
      if (e instanceof IOException) throw (IOException)e;
{code}

After some debugging, we found several places that construct a ServiceException with no cause, such as in {{RsRpcServices#replicateWALEntry}}:
{code}
      if (regionServer.replicationSinkHandler != null) {
        ...
      } else {
        throw new ServiceException("Replication services are not initialized yet");
      }
{code}

In that case {{e.getCause()}} returns null, so {{e}} is already null when it is passed to {{metrics.exception(e)}}. So we should first check the cause and only reset {{e = e.getCause()}} when it is not null.
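
A minimal sketch of the guarded unwrap, for illustration only and not the actual patch. The nested {{ServiceException}} here is a hypothetical stand-in for the protobuf class so the example compiles on its own, and {{unwrap}} is a made-up helper name; in RpcServer the check would presumably sit inline where {{e}} is reassigned.

```java
import java.io.IOException;

public class UnwrapCauseSketch {

    // Hypothetical stand-in for the real ServiceException, so that this
    // sketch is self-contained. It only mirrors the two constructor shapes
    // that matter here: message-only (no cause) and cause-only.
    static class ServiceException extends Exception {
        ServiceException(String message) {
            super(message);
        }

        ServiceException(Throwable cause) {
            super(cause);
        }
    }

    // Hypothetical helper: unwrap a ServiceException only when it actually
    // carries a cause; otherwise keep the original exception so callers
    // never receive null.
    static Throwable unwrap(Throwable e) {
        if (e instanceof ServiceException && e.getCause() != null) {
            return e.getCause();
        }
        return e;
    }

    public static void main(String[] args) {
        // Same shape as the throw in RsRpcServices#replicateWALEntry: no cause.
        Throwable noCause =
            new ServiceException("Replication services are not initialized yet");
        // A ServiceException that wraps an underlying IOException.
        Throwable withCause = new ServiceException(new IOException("wrapped failure"));

        // The cause-less exception is returned unchanged instead of null.
        System.out.println(unwrap(noCause) == noCause);
        // The wrapped IOException is still surfaced as before.
        System.out.println(unwrap(withCause) instanceof IOException);
    }
}
```

Both lines print {{true}}: a ServiceException without a cause is kept as-is, while one with a cause is still unwrapped, so the later {{instanceof}} checks behave as they do today.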