[ 
https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401656#comment-15401656
 ] 

zhihai xu commented on HIVE-14303:
----------------------------------

Attached the stack trace which prove checkAndGenObject is called by Derived 
class(SMBMapJoinOperator)'s closeOp
{code}
        at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:686)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.joinObject(SMBMapJoinOperator.java:414)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.joinOneGroup(SMBMapJoinOperator.java:383)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.joinFinalLeftData(SMBMapJoinOperator.java:357)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.closeOp(SMBMapJoinOperator.java:625)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:683) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
[hadoop-mapreduce-client-core-2.6.1.jar:?]
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 [hadoop-mapreduce-client-common-2.6.1.jar:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[?:1.7.0_79]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_79]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[?:1.7.0_79]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[?:1.7.0_79]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_79]
{code}

> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid NPE if ExecReducer.close is called twice.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14303
>                 URL: https://issues.apache.org/jira/browse/HIVE-14303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14303.0.patch, HIVE-14303.1.patch
>
>
> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid NPE if ExecReducer.close is called twice. ExecReducer.close implements 
> Closeable interface and ExecReducer.close can be called multiple time. We saw 
> the following NPE which hide the real exception due to this bug.
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>         at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>         at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
>         at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
>         ... 8 more
> {code}
> The code from ReduceTask.runOldReducer:
> {code}
>       reducer.close(); //line 453
>       reducer = null;
>       
>       out.close(reporter);
>       out = null;
>     } finally {
>       IOUtils.cleanup(LOG, reducer);// line 459
>       closeQuietly(out, reporter);
>     }
> {code}
> Based on the above stack trace and code, reducer.close() is called twice 
> because the exception happened when reducer.close() is called for the first 
> time at line 453, the code exit before reducer was set to null. 
> NullPointerException is triggered when reducer.close() is called for the 
> second time in IOUtils.cleanup at line 459. NullPointerException hide the 
> real exception which happened when reducer.close() is called for the first 
> time at line 453.
> The reason for NPE is:
> The first reducer.close called CommonJoinOperator.closeOp which clear 
> {{storage}}
> {code}
> Arrays.fill(storage, null);
> {code}
> the second reduce.close generated NPE due to null {{storage[alias]}} which is 
> set to null by first reducer.close.
> The following reducer log can give more proof:
> {code}
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 
> 53466
> 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing 
> operators - failing tree
> 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators: null
>       at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>       at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>       at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
>       at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>       at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
>       ... 8 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to