[ 
https://issues.apache.org/jira/browse/HIVE-28798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shohei Okumiya updated HIVE-28798:
----------------------------------
    Fix Version/s: 4.2.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Bucket Map Join partially using partition transforms
> ----------------------------------------------------
>
>                 Key: HIVE-28798
>                 URL: https://issues.apache.org/jira/browse/HIVE-28798
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Iceberg integration
>            Reporter: Shohei Okumiya
>            Assignee: Shohei Okumiya
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>
> The current implementation requires all bucket transforms to be projected. 
> Unlike Hive's native bucketing, Iceberg allows multiple bucket keys to be 
> decomposed into multiple partition transforms. For example,
> {code:sql}
> CREATE TABLE srcbucket_big(key1 int, key2 string, value string, id int)
> PARTITIONED BY SPEC(bucket(4, key1), bucket(8, key2)) STORED BY ICEBERG; 
> {code}
> Currently, Bucket Map Join (BMJ) is applied only when both key1 and key2 are 
> used as join keys.
> {code:sql}
> SELECT a.key1, a.key2, a.id
> FROM srcbucket_big a
> JOIN src_small b ON a.key1 = b.key1 AND a.key2 = b.key2
> ORDER BY a.id;
> {code}
> Because Iceberg stores each bucket transform as a separate partition field, 
> the following query, which joins on key1 alone, can also leverage BMJ.
> {code:sql}
> SELECT a.key1, a.id
> FROM srcbucket_big a
> JOIN src_small b ON a.key1 = b.key1
> ORDER BY a.id;
> {code}
> This optimization would be particularly helpful once HIVE-28414 extends the 
> optimization to non-bucket transforms, such as daily partitioning, which are 
> typically not used as join keys.
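
The partial match described above can be illustrated with a small sketch. It is not Hive's implementation: the class name, method names, and the simplified hash are hypothetical, and Iceberg's real bucket transform uses a Murmur3-based hash rather than `hashCode()`. The sketch only shows why a join on key1 alone can still be routed by the bucket(4, key1) partition field: the small side is grouped by key1 bucket, and every big-table partition picks its group from its key1 bucket value, whatever its key2 bucket is.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialBucketJoinSketch {
    static final int KEY1_BUCKETS = 4; // from bucket(4, key1)
    static final int KEY2_BUCKETS = 8; // from bucket(8, key2)

    // Stand-in for Iceberg's bucket transform. ASSUMPTION: hashCode() replaces
    // the Murmur3-based hash that Iceberg actually uses.
    static int bucketOf(Object key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Splits the small table's key1 values into one list per key1 bucket.
    static Map<Integer, List<Integer>> groupSmallSideByKey1Bucket(List<Integer> smallKey1Values) {
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (Integer key1 : smallKey1Values) {
            groups.computeIfAbsent(bucketOf(key1, KEY1_BUCKETS), b -> new ArrayList<>()).add(key1);
        }
        return groups;
    }

    // Returns the small-side rows needed by a task that reads the big-table
    // partition (key1Bucket, key2Bucket). key2Bucket is deliberately ignored:
    // the join condition only involves key1.
    static List<Integer> smallRowsForPartition(Map<Integer, List<Integer>> groups,
                                               int key1Bucket, int key2Bucket) {
        return groups.getOrDefault(key1Bucket, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Integer> smallSide = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            smallSide.add(i);
        }
        Map<Integer, List<Integer>> groups = groupSmallSideByKey1Bucket(smallSide);
        // All 4 x 8 = 32 partitions are covered by only 4 small-side groups.
        for (int b1 = 0; b1 < KEY1_BUCKETS; b1++) {
            for (int b2 = 0; b2 < KEY2_BUCKETS; b2++) {
                System.out.println("partition (" + b1 + ", " + b2 + ") -> "
                        + smallRowsForPartition(groups, b1, b2));
            }
        }
    }
}
```

In this sketch the 32 physical partitions need only 4 distinct small-side groups, which is the property the proposed optimization relies on.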



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
