[
https://issues.apache.org/jira/browse/HIVE-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897608#comment-13897608
]
Harish Butani commented on HIVE-6403:
-------------------------------------
[~navis] I am no expert on the MapJoinProcessor. Here is what I see so far; I
will need to spend more time on this.
Maybe you can spot the issue from my comments.
1. The Plan generated at genPlan is:
TS[0] (the scan for b) has 2 child operators: [RS[4], RS[25]]
These are for the joins for each of the SubQuery expressions:
b.key in
(select a.key
from src a
where b.value = a.value and a.key > '9'
)
and
b.key not in ( select key from src s1 where s1.key > '2')
The plan looks complex because handling 'not in' requires a null check.
This issue will occur even if the second insert uses an 'in' subquery predicate.
It will be easier to follow with such an example.
2. With set hive.auto.convert.join=false
The second RS gets converted to a FileSink. You can observe this from the
explain output. A subsequent Stage reads this intermediate output to perform
the processing for the 2nd SubQuery.
3. With set hive.auto.convert.join=true;
When it comes to CommonJoinResolver, TS[0] has children [RS[4], FS[44]], i.e.
the 2nd ReduceSink has been converted to a FileSink.
MapJoinProcessor:genMapJoinLocalWork (line 145) assumes that a
TableScanOp can only have 1 child.
The fix may be to ignore any FileSink operators that are children of a TableScan.
Another test to add is a multi insert on 3 tables.
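To make the suggestion in point 3 concrete, here is a rough sketch of the idea in Java. It is not the actual Hive code or a proposed patch: Op, ReduceSinkOp, FileSinkOp and nonFileSinkChildren are hypothetical stand-ins for the real operator classes, and the tree is built by hand to mirror TS[0] -> [RS[4], FS[44]].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the operator tree; these are NOT Hive's real
// org.apache.hadoop.hive.ql.exec classes, only an illustration.
public class TableScanChildrenSketch {

  static class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name, Op... kids) {
      this.name = name;
      children.addAll(Arrays.asList(kids));
    }
  }

  static class ReduceSinkOp extends Op {
    ReduceSinkOp(String name) { super(name); }
  }

  static class FileSinkOp extends Op {
    FileSinkOp(String name) { super(name); }
  }

  // Assumed helper: return the children of a table scan, skipping any
  // FileSink children, so a caller that expects a single join-side child
  // is not confused by the extra FileSink branch.
  static List<Op> nonFileSinkChildren(Op tableScan) {
    List<Op> result = new ArrayList<>();
    for (Op child : tableScan.children) {
      if (!(child instanceof FileSinkOp)) {
        result.add(child);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors the shape seen in CommonJoinResolver: TS[0] -> [RS[4], FS[44]]
    Op ts = new Op("TS[0]", new ReduceSinkOp("RS[4]"), new FileSinkOp("FS[44]"));
    for (Op child : nonFileSinkChildren(ts)) {
      System.out.println(ts.name + " -> " + child.name);
    }
  }
}
```

Whether skipping FileSink children is safe everywhere in genMapJoinLocalWork still needs to be checked against the real code; the sketch only shows the filtering step.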
> uncorrelated subquery is failing with auto.convert.join=true
> ------------------------------------------------------------
>
> Key: HIVE-6403
> URL: https://issues.apache.org/jira/browse/HIVE-6403
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Navis
>
> Fixing HIVE-5690, I've found query in subquery_multiinsert.q is not working
> with hive.auto.convert.join=true
> {noformat}
> set hive.auto.convert.join=true;
> hive> explain
> > from src b
> > INSERT OVERWRITE TABLE src_4
> > select *
> > where b.key in
> > (select a.key
> > from src a
> > where b.value = a.value and a.key > '9'
> > )
> > INSERT OVERWRITE TABLE src_5
> > select *
> > where b.key not in ( select key from src s1 where s1.key > '2')
> > order by key
> > ;
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinLocalWork(MapJoinProcessor.java:149)
> at
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genLocalWorkForMapJoin(MapJoinProcessor.java:256)
> at
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:248)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinTaskDispatcher.java:191)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:481)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
> at
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
> at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9167)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
> at
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:446)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:346)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1056)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1099)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:992)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to generate new
> mapJoin operator by exception : Index: 0, Size: 0
> at
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genLocalWorkForMapJoin(MapJoinProcessor.java:266)
> at
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.genMapJoinOpAndLocalWork(MapJoinProcessor.java:248)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.convertTaskToMapJoinTask(CommonJoinTaskDispatcher.java:191)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:481)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
> at
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
> at
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
> at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9167)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
> at
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:446)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:346)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1056)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1099)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:992)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> FAILED: SemanticException Generate Map Join Task Error: Failed to generate
> new mapJoin operator by exception : Index: 0, Size: 0
> {noformat}