[ https://issues.apache.org/jira/browse/HADOOP-18811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ConfX updated HADOOP-18811:
---------------------------
    Attachment: reproduce.sh

> Buggy ZKFCRpcServer constructor creates null object and crashes the rpcServer
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-18811
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18811
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> In ZKFailoverController.java, the initRPC() method gets the ZKFC RpcServer binding
> address and creates a new ZKFCRpcServer object, rpcServer. However, when
> getPolicyProvider() returns null, the ZKFCRpcServer constructor throws a
> NullPointerException, so rpcServer is never assigned and stays null. Any later
> use of rpcServer then fails with a NullPointerException as well.
> h2. Buggy code:
> In ZKFailoverController.java
> {code:java}
> protected void initRPC() throws IOException {
>   InetSocketAddress bindAddr = getRpcAddressToBindTo();
>   LOG.info("ZKFC RpcServer binding to {}", bindAddr);
>   rpcServer = new ZKFCRpcServer(conf, bindAddr, this, getPolicyProvider());
>   // <-- getPolicyProvider() may return null here
> }
> {code}
> When hadoop.security.authorization is true, the ZKFCRpcServer constructor eventually
> calls the refreshWithLoadedConfiguration() method below. This method dereferences
> provider without a null check, so the constructor throws before rpcServer above
> is ever assigned.
> In ServiceAuthorizationManager.java
> {code:java}
> @Private
> public void refreshWithLoadedConfiguration(Configuration conf,
>     PolicyProvider provider) {
>   ...
>   // Parse the config file
>   Service[] services = provider.getServices(); // <--- provider may be null here
>   ...
> {code}
> h2. How to trigger this bug:
> (1) Set hadoop.security.authorization to true.
> (2) Run the test
> org.apache.hadoop.ha.TestZKFailoverControllerStress#testRandomExpirations.
> (3) You will see the following stack trace:
> {code:java}
> java.lang.NullPointerException
>     at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:258)
>     at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:63)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:181)
>     at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:177)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:503)
>     at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:177)
>     at org.apache.hadoop.ha.MiniZKFCCluster$DummyZKFCThread.doWork(MiniZKFCCluster.java:301)
>     at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
> {code}
> (4) This NullPointerException is caused by the null {{rpcServer}} object left
> behind by the bug described above.
> The attached reproduce.sh can be used to reproduce the bug.
> We are happy to provide a patch if this issue is confirmed.
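A minimal, self-contained Java sketch of the failure mode described above. PolicyProviderStub, RpcServerStub, ZkfcSketch and Hadoop18811Demo are hypothetical, simplified stand-ins, not Hadoop classes. The sketch assumes, as the stack trace suggests, that doRun() dereferences rpcServer during cleanup after initRPC() fails. It illustrates that a Java constructor never returns null: when it throws, the field is simply never assigned, and an exception thrown from a finally block replaces the original one from provider.getServices(). The guarded variant is one possible fix under these assumptions, not the project's patch.

```java
// Hypothetical stand-in for Hadoop's PolicyProvider (not the real API).
interface PolicyProviderStub {
    String[] getServices();
}

// Hypothetical, simplified stand-in for ZKFCRpcServer: it touches the
// provider only when authorization is enabled.
class RpcServerStub {
    RpcServerStub(boolean authorizationEnabled, PolicyProviderStub provider) {
        if (authorizationEnabled) {
            // Mirrors refreshWithLoadedConfiguration(): no null check,
            // so this throws NullPointerException when provider is null.
            provider.getServices();
        }
    }

    void stopAndJoin() {
        // No-op in this sketch.
    }
}

// Hypothetical, simplified model of ZKFailoverController.
class ZkfcSketch {
    // Defaults to null; never assigned if the constructor throws.
    RpcServerStub rpcServer;

    // Mirrors the buggy initRPC(): the provider is passed through unchecked.
    void initRpc(boolean auth, PolicyProviderStub provider) {
        rpcServer = new RpcServerStub(auth, provider);
    }

    // Assumed doRun() shape: cleanup dereferences rpcServer unconditionally,
    // so the NPE thrown here replaces the constructor's NPE.
    void doRunBuggy(boolean auth, PolicyProviderStub provider) {
        try {
            initRpc(auth, provider);
        } finally {
            rpcServer.stopAndJoin();
        }
    }

    // Hypothetical guard: fail fast with a descriptive error...
    void initRpcGuarded(boolean auth, PolicyProviderStub provider) {
        if (auth && provider == null) {
            throw new IllegalStateException(
                "hadoop.security.authorization is true but getPolicyProvider() returned null");
        }
        rpcServer = new RpcServerStub(auth, provider);
    }

    // ...and only clean up what was actually created.
    void doRunGuarded(boolean auth, PolicyProviderStub provider) {
        try {
            initRpcGuarded(auth, provider);
        } finally {
            if (rpcServer != null) {
                rpcServer.stopAndJoin();
            }
        }
    }
}

class Hadoop18811Demo {
    public static void main(String[] args) {
        ZkfcSketch buggy = new ZkfcSketch();
        try {
            buggy.doRunBuggy(true, null);
        } catch (NullPointerException e) {
            System.out.println("buggy: NullPointerException, rpcServer is null: "
                + (buggy.rpcServer == null));
        }

        ZkfcSketch guarded = new ZkfcSketch();
        try {
            guarded.doRunGuarded(true, null);
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

With authorization disabled, or with a non-null provider, both variants behave identically, which matches step (1) being required to trigger the bug. Validating the provider in initRPC() surfaces the real cause, whereas the unguarded cleanup reports only the secondary NullPointerException seen in the stack trace.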