[jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization

Muthu (JIRA) Wed, 26 Feb 2014 18:26:31 -0800

    [ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913950#comment-13913950 ]


Muthu commented on HIVE-6041:
-----------------------------

This patch doesn't seem to work on Hive 0.12 for queries that are auto-converted to MAPJOIN, for example:
set hive.optimize.skewjoin=true;
set hive.auto.convert.join=true;
SELECT ru.userid, SUM(ru.total_count)
FROM BIGTABLE ru
JOIN SMALLTABLE c ON c.creative_id = ru.creative_id
JOIN placement_dapi p ON p.placement_id = c.placement_id
WHERE ru.realdate = '2014-01-02' AND ru.userid > 0
GROUP BY ru.userid;

Stage-1 is selected by condition resolver.
java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
        at org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:917)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:232)
        at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(ConditionalResolverCommonJoin.java:185)
        at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(ConditionalResolverCommonJoin.java:117)
        at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:55)

> Incorrect task dependency graph for skewed join optimization
> ------------------------------------------------------------
>
>                 Key: HIVE-6041
>                 URL: https://issues.apache.org/jira/browse/HIVE-6041
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0
>         Environment: Hadoop 1.0.3
>            Reporter: Adrian Popescu
>            Assignee: Navis
>            Priority: Critical
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6041.1.patch.txt
>
>
> The dependency graph among task stages is incorrect in the plan optimized
> for skewed joins. Skewed join optimization is enabled through
> "hive.optimize.skewjoin". When no skewed keys exist, all the tasks following
> the common join are filtered out at runtime.
> In particular, the conditional task in the optimized plan maintains no
> dependency on the child tasks of the common join task in the original plan.
> The conditional task is composed of the map join task, which maintains all
> these dependencies. However, when the map join task is filtered out (i.e.,
> no skewed keys exist), all these dependencies are lost. Hence, all the other
> task stages of the query (e.g., the move stage, which writes the results into
> the result table) are skipped.
> The bug resides in the processSkewJoin() function of
> "ql/optimizer/physical/GenMRSkewJoinProcessor.java", immediately after the
> ConditionalTask is created and its dependencies are set.
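
The failure mode described above can be modeled in a few lines. What follows is a minimal Java sketch under a simplified model, not Hive code. `Stage`, `ConditionalStage`, `run` and `execute` are hypothetical names and are not Hive's Task/ConditionalTask API. Also attaching the moved children to the conditional stage itself is an assumed fix chosen for illustration, not necessarily what HIVE-6041.1.patch.txt does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

public class SkewJoinDagSketch {

    // Hypothetical stand-in for a plan stage (not Hive's Task class).
    static class Stage {
        final String name;
        final List<Stage> children = new ArrayList<>();

        Stage(String name) {
            this.name = name;
        }

        // Stages to follow once this one finishes.
        List<Stage> next() {
            return children;
        }
    }

    // Hypothetical conditional stage: a runtime check decides whether the
    // candidate stages run; the stage's own children are always followed.
    static class ConditionalStage extends Stage {
        final List<Stage> candidates = new ArrayList<>();
        final BooleanSupplier resolver;

        ConditionalStage(String name, BooleanSupplier resolver) {
            super(name);
            this.resolver = resolver;
        }

        @Override
        List<Stage> next() {
            List<Stage> out = new ArrayList<>();
            if (resolver.getAsBoolean()) {
                out.addAll(candidates);
            }
            out.addAll(children);
            return out;
        }
    }

    // Depth-first walk that records every stage reached, once each.
    static void run(Stage stage, Set<String> executed) {
        if (!executed.add(stage.name)) {
            return;
        }
        for (Stage child : stage.next()) {
            run(child, executed);
        }
    }

    // Builds commonJoin -> conditional -> {skewMapJoin} -> move and walks it.
    static Set<String> execute(boolean skewedKeysExist, boolean keepDependencies) {
        Stage commonJoin = new Stage("commonJoin");
        ConditionalStage conditional = new ConditionalStage("conditional", () -> skewedKeysExist);
        Stage skewMapJoin = new Stage("skewMapJoin");
        Stage move = new Stage("move");

        commonJoin.children.add(conditional);
        conditional.candidates.add(skewMapJoin);
        // Wiring as described in the issue: the common join's former child
        // (the move stage) hangs only off the map join candidate.
        skewMapJoin.children.add(move);
        if (keepDependencies) {
            // Assumed fix for this sketch: the conditional stage keeps the
            // dependency too, so it survives when the candidate is skipped.
            conditional.children.add(move);
        }

        Set<String> executed = new LinkedHashSet<>();
        run(commonJoin, executed);
        return executed;
    }

    public static void main(String[] args) {
        System.out.println("no skew, lost deps: " + execute(false, false));
        System.out.println("no skew, kept deps: " + execute(false, true));
        System.out.println("skew,    kept deps: " + execute(true, true));
    }
}
```

With the dependencies lost and no skewed keys, the walk prints `[commonJoin, conditional]`, so the move stage never runs. This matches the skipped stages reported in the issue. The walk only models reachability. A real scheduler would also wait for all of a stage's parents before launching it.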



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
