[ https://issues.apache.org/jira/browse/HADOOP-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ConfX updated HADOOP-18811:
---------------------------
    Description: 
h2. What happened:

In ZKFailoverController.java, the initRPC() method obtains the ZKFC RpcServer bind address and constructs a new ZKFCRpcServer object, rpcServer. However, when getPolicyProvider() returns null, the ZKFCRpcServer constructor throws a NullPointerException, so rpcServer is left null and any later use of rpcServer fails with a NullPointerException.

h2. Buggy code:

In ZKFailoverController.java:
{code:java}
protected void initRPC() throws IOException {
  InetSocketAddress bindAddr = getRpcAddressToBindTo();
  LOG.info("ZKFC RpcServer binding to {}", bindAddr);
  rpcServer = new ZKFCRpcServer(conf, bindAddr, this, getPolicyProvider()); // <-- getPolicyProvider() may return null
}
{code}
ZKFCRpcServer() eventually calls the refreshWithLoadedConfiguration() method below. This method dereferences provider without a null check; the resulting NullPointerException aborts construction, which is what leaves the rpcServer above null.

In ServiceAuthorizationManager.java:
{code:java}
@Private
public void refreshWithLoadedConfiguration(Configuration conf,
    PolicyProvider provider) {
  ...
  // Parse the config file
  Service[] services = provider.getServices(); // <--- provider may be null here
  ...
{code}
h2. How to trigger this bug:

(1) Set hadoop.security.authorization to true
(2) Run the test org.apache.hadoop.ha.TestZKFailoverControllerStress#testRandomExpirations
(3) You will see the following stack trace:
{code:java}
java.lang.NullPointerException
	at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:258)
	at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:63)
	at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:181)
	at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:177)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:503)
	at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:177)
	at org.apache.hadoop.ha.MiniZKFCCluster$DummyZKFCThread.doWork(MiniZKFCCluster.java:301)
	at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
{code}
(4) The NullPointerException here is due to the null {{rpcServer}} object caused by the bug described above.

You can use the reproduce.sh in the attachment to easily reproduce the bug.

We are happy to provide a patch if this issue is confirmed.
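The failure mode above can be shown in isolation. The sketch below uses hypothetical stand-in classes (PolicyProviderSketch, RpcServerSketch, GuardedRpcServerSketch — not Hadoop's real ones) to demonstrate how a NullPointerException thrown inside a constructor leaves the caller's field null, and what one possible fail-fast guard could look like; the actual Hadoop fix may differ.

```java
// Minimal, self-contained sketch (hypothetical names, not Hadoop's real
// classes) of the bug pattern: a constructor that dereferences a possibly
// null policy provider throws NPE, so the caller's field is never assigned.

final class PolicyProviderSketch {
    String[] getServices() {
        return new String[] { "zkfc" };
    }
}

final class RpcServerSketch {
    RpcServerSketch(PolicyProviderSketch provider) {
        // Mirrors refreshWithLoadedConfiguration(): provider used unchecked.
        String[] services = provider.getServices(); // NPE when provider == null
    }
}

final class GuardedRpcServerSketch {
    GuardedRpcServerSketch(PolicyProviderSketch provider) {
        // One possible defensive fix: fail fast with a clear message.
        if (provider == null) {
            throw new IllegalArgumentException("policy provider must not be null");
        }
        String[] services = provider.getServices();
    }
}

public class NullProviderDemo {
    static RpcServerSketch rpcServer; // stays null if the constructor throws

    public static void main(String[] args) {
        try {
            // Simulates getPolicyProvider() returning null in initRPC().
            rpcServer = new RpcServerSketch(null);
        } catch (NullPointerException e) {
            System.out.println("constructor threw, rpcServer == null: " + (rpcServer == null));
        }
        try {
            new GuardedRpcServerSketch(null);
        } catch (IllegalArgumentException e) {
            System.out.println("guarded constructor rejected null provider: " + e.getMessage());
        }
    }
}
```

Running the demo prints that rpcServer remains null after the unguarded constructor throws, which matches the later NPE at ZKFailoverController.doRun in the stack trace above: the crash surfaces far from the construction site that caused it.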
> Buggy ZKFCRpcServer constructor creates null object and crashes the rpcServer
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-18811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18811
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org