[jira] [Commented] (HIVE-15278) PTF+MergeJoin = NPE
[ https://issues.apache.org/jira/browse/HIVE-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710499#comment-15710499 ]

Sergey Shelukhin commented on HIVE-15278:
-----------------------------------------

[~hagleitn] ping, does +1 still stand?

> PTF+MergeJoin = NPE
> -------------------
>
>                 Key: HIVE-15278
>                 URL: https://issues.apache.org/jira/browse/HIVE-15278
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-15278.patch
>
>
> Manifests as
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.persistence.PTFRowContainer.first(PTFRowContainer.java:115)
>     at org.apache.hadoop.hive.ql.exec.PTFPartition.iterator(PTFPartition.java:114)
>     at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:340)
>     at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:114)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
>     ... 29 more
> {noformat}
> It's actually a somewhat subtle ordering problem in sortmerge - as it stands,
> it calls different branches of the tree in closeOp after they themselves have
> already been closed. Other operators that clean stuff up in close may result
> in different errors. The common pattern is
> {noformat}
> 1125 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> 1126 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> 1127 at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:404)
> ...
> 1131 at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:428)
> 1132 at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:388)
> 1133 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
> ...
> 1139 at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:294)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
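The ordering problem described in the issue can be shown in miniature. The sketch below is not Hive code: `CloseOrderSketch`, `Branch`, `Join`, and `run` are hypothetical stand-ins that only model a branch which frees its state in close() and a join that pushes a row through that branch from its own close(). It assumes nothing about the real operator classes beyond what the stack traces show.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins, not Hive classes: they only model the close ordering. */
public class CloseOrderSketch {

    /** A branch that frees its buffer on close, like an operator that cleans up in close. */
    static class Branch {
        private List<String> buffer = new ArrayList<>();

        void process(String row) {
            buffer.add(row); // throws NullPointerException once the branch has been closed
        }

        void close() {
            buffer = null; // cleanup makes any later process() call fail
        }
    }

    /** A join that pushes one more row through a parent branch from its own close. */
    static class Join {
        private final Branch branch;

        Join(Branch branch) {
            this.branch = branch;
        }

        void close() {
            branch.process("late row");
        }
    }

    /** Returns "ok" if the join finished cleanly, or "NPE" if it hit the closed branch. */
    public static String run(boolean closeBranchFirst) {
        Branch branch = new Branch();
        Join join = new Join(branch);
        try {
            if (closeBranchFirst) {
                branch.close();
                join.close();
            } else {
                join.close();
                branch.close();
            }
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

Only the order of the two close() calls differs between the two runs, which is why this kind of failure tends to surface far from its cause.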
[jira] [Commented] (HIVE-15278) PTF+MergeJoin = NPE
[ https://issues.apache.org/jira/browse/HIVE-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702903#comment-15702903 ]

Sergey Shelukhin commented on HIVE-15278:
-----------------------------------------

Yes, we make 2 assumptions:
1) That it won't try to pump more records through the big table side, which wouldn't work in any case; logically, it makes no sense because the big table side is the one that causes the operators to get closed in the first place, so it should be done with all records.
2) That the main table side is closed first. That is true now; see reduceWork vs mergeWorkList in ReduceRecordProcessor.

I am not sure if we can add a test. The repro that we have is too specific (and potentially large) for q files, and this code is too much of a mess to repro with a unit test.
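One way to read the two assumptions above is as a sketch in which the join drains its pending group as soon as the big table side closes, while the other branch is still open. The code below is a hypothetical model, not the attached patch: `EarlyFlushSketch`, `SmallSide`, `Join`, `onBigSideClosed`, and `flushPendingGroup` are names invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the early flush; none of these names come from Hive. */
public class EarlyFlushSketch {

    /** Small-table source that releases its rows on close. */
    static class SmallSide {
        private Deque<String> rows;

        SmallSide(List<String> rows) {
            this.rows = new ArrayDeque<>(rows);
        }

        /** Returns the next row, or null when drained; throws NullPointerException after close(). */
        String fetch() {
            return rows.poll();
        }

        void close() {
            rows = null;
        }
    }

    static class Join {
        private final SmallSide smallSide;
        private final List<String> joined = new ArrayList<>();
        private boolean flushed = false;

        Join(SmallSide smallSide) {
            this.smallSide = smallSide;
        }

        /** Called when the big table side closes, which is assumed to happen first. */
        void onBigSideClosed() {
            flushPendingGroup();
        }

        /** Called after every parent has closed; no parent is touched if the flush already ran. */
        void close() {
            if (!flushed) {
                flushPendingGroup();
            }
        }

        /** Pulls only from the small side; the big side is assumed to have no records left. */
        private void flushPendingGroup() {
            String row;
            while ((row = smallSide.fetch()) != null) {
                joined.add(row);
            }
            flushed = true;
        }

        List<String> joined() {
            return joined;
        }
    }

    public static List<String> run() {
        SmallSide small = new SmallSide(Arrays.asList("s1", "s2"));
        Join join = new Join(small);
        join.onBigSideClosed(); // big table side finishes first (assumption 2)
        small.close();          // the remaining branch closes afterwards
        join.close();           // nothing left to pull, so the closed branch is never touched
        return join.joined();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the small side were closed before `onBigSideClosed()` ran, the sketch would fail in the same way as the original report, which is why the second assumption matters.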
[jira] [Commented] (HIVE-15278) PTF+MergeJoin = NPE
[ https://issues.apache.org/jira/browse/HIVE-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702566#comment-15702566 ]

Gunther Hagleitner commented on HIVE-15278:
-------------------------------------------

LGTM +1. This does look like it'd be painful to debug. Is it possible to add a small test to avoid this debug pain for the next person?

One thing I'm not completely sure of: The bug is that the join operator is trying to pump records through its parents after they have been closed. It's doing that to finish the last pending group when the first of its parents is closed. Your fix finishes the group after the first parent is closed, not the last - do you know for a fact that the join operator won't try to push records through that (closed) parent? (I think that's the case because it's the big table side and all remaining records should be from other branches).
[jira] [Commented] (HIVE-15278) PTF+MergeJoin = NPE
[ https://issues.apache.org/jira/browse/HIVE-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692201#comment-15692201 ]

Hive QA commented on HIVE-15278:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12840352/HIVE-15278.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10733 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2277/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2277/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2277/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12840352 - PreCommit-HIVE-Build