[ https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhihai xu updated HIVE-14303: ----------------------------- Attachment: (was: HIVE-14303.000.patch) > CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to > avoid NPE if ExecReducer.close is called twice. > ----------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-14303 > URL: https://issues.apache.org/jira/browse/HIVE-14303 > Project: Hive > Issue Type: Bug > Reporter: zhihai xu > Assignee: zhihai xu > Fix For: 2.1.0 > > > CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to > avoid NPE if ExecReducer.close is called twice. ExecReducer.close implements > Closeable interface and ExecReducer.close can be called multiple time. We saw > the following NPE which hide the real exception due to this bug. > {code} > Error: java.lang.RuntimeException: Hive Runtime Error while closing > operators: null > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284) > ... 8 more > {code} > The code from ReduceTask.runOldReducer: > {code} > reducer.close(); //line 453 > reducer = null; > > out.close(reporter); > out = null; > } finally { > IOUtils.cleanup(LOG, reducer);// line 459 > closeQuietly(out, reporter); > } > {code} > Based on the above stack trace and code, reducer.close() is called twice > because the exception happened when reducer.close() is called for the first > time at line 453, the code exit before reducer was set to null. > NullPointerException is triggered when reducer.close() is called for the > second time in IOUtils.cleanup at line 459. NullPointerException hide the > real exception which happened when reducer.close() is called for the first > time at line 453. > The reason for NPE is: > The first reducer.close called CommonJoinOperator.closeOp which clear > {{storage}} > {code} > Arrays.fill(storage, null); > {code} > the second reduce.close generated NPE due to null {{storage[alias]}} which is > set to null by first reducer.close. > The following reducer log can give more proof: > {code} > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0 > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... > 2016-07-14 22:24:51,016 INFO [main] > org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - > 53466 > 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing > operators - failing tree > 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: Hive Runtime Error > while closing operators: null > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718) > at > org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284) > ... 8 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)