[jira] [Commented] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns

Zhenxiao Luo (JIRA) Sat, 29 Sep 2012 22:45:11 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466422#comment-13466422
 ]


Zhenxiao Luo commented on HIVE-3467:
------------------------------------

Currently, BucketMapJoinOptimizer does not keep Partition information in its 
aliasToPartitionBucketNumberMapping and aliasToPartitionBucketFileNamesMapping 
maps. Without the partition column information, it cannot do the 
partition-aware optimization. How about adding Partition info to the maps:


-      LinkedHashMap<String, List<Integer>> aliasToPartitionBucketNumberMapping =
-          new LinkedHashMap<String, List<Integer>>();
-      LinkedHashMap<String, List<List<String>>> aliasToPartitionBucketFileNamesMapping =
-          new LinkedHashMap<String, List<List<String>>>();
+
+      // (alias to <Partition, BucketNumber>)
+      // AND (alias to <Partition, BucketFileNames>)
+      // one pair for each partition
+      // partition key/values info is needed in optimization
+      LinkedHashMap<String, List<Map<Partition, Integer>>>
+        aliasToPartitionBucketNumberMapping =
+        new LinkedHashMap<String, List<Map<Partition, Integer>>>();
+      LinkedHashMap<String, List<Map<Partition, List<String>>>>
+        aliasToPartitionBucketFileNamesMapping =
+        new LinkedHashMap<String, List<Map<Partition, List<String>>>>();
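
To illustrate why the partition key/values are needed, here is a minimal 
standalone sketch (not Hive code). The class BucketMatchSketch, the method 
match, the plain "part=N" strings standing in for Partition, and the file 
paths are all made up for illustration. It also assumes both tables have the 
same number of buckets in every partition, as in the example from the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hive.
final class BucketMatchSketch {

    /**
     * For every bucket file of the big table, lists the small-table bucket
     * files that would be loaded alongside it.
     *
     * Both inputs map a partition spec (e.g. "part=1") to that partition's
     * bucket files in bucket order. Assumes equal bucket counts.
     */
    static Map<String, List<String>> match(
            Map<String, List<String>> bigTable,
            Map<String, List<String>> smallTable,
            boolean partitionAware) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> big : bigTable.entrySet()) {
            List<String> bigFiles = big.getValue();
            for (int b = 0; b < bigFiles.size(); b++) {
                List<String> matched = new ArrayList<>();
                for (Map.Entry<String, List<String>> small : smallTable.entrySet()) {
                    // Without partition info every small-table partition is
                    // taken; with it, only the partition with the same spec.
                    if (partitionAware && !small.getKey().equals(big.getKey())) {
                        continue;
                    }
                    List<String> smallFiles = small.getValue();
                    if (b < smallFiles.size()) {
                        matched.add(smallFiles.get(b));
                    }
                }
                result.put(bigFiles.get(b), matched);
            }
        }
        return result;
    }
}
```

With t1 and t2 each having partitions part=1 and part=2 and two buckets per 
partition, calling match with partitionAware set to false pairs the first 
bucket of t1's part=1 partition with the first buckets of both t2 partitions. 
With partitionAware set to true it pairs it only with the first bucket of 
t2's part=1 partition. That pruning is only possible when the partition is 
available next to the bucket numbers and bucket file names.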

                
> BucketMapJoinOptimizer should optimize joins on partition columns
> -----------------------------------------------------------------
>
>                 Key: HIVE-3467
>                 URL: https://issues.apache.org/jira/browse/HIVE-3467
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>
> Consider the query:
> SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
> Where t1 and t2 are partitioned by part and bucketed by key.
> Suppose part takes values 1 and 2 and t1 and t2 are bucketed into 2 buckets.
> The bucket map join optimizer will put the first bucket of part=1 and part=2 
> partitions of t2 into the same mapper as that of part=1 partition of t1.  It 
> will do the same for the part=2 partition of t1.
> It could take advantage of the partition values and send the first bucket of 
> only the part=1 partitions of t1 and t2 into one mapper and the first bucket 
> of only the part=2 partitions into another.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
