[
https://issues.apache.org/jira/browse/HIVE-28798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shohei Okumiya updated HIVE-28798:
----------------------------------
Fix Version/s: 4.2.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Bucket Map Join partially using partition transforms
> ----------------------------------------------------
>
> Key: HIVE-28798
> URL: https://issues.apache.org/jira/browse/HIVE-28798
> Project: Hive
> Issue Type: Sub-task
> Components: Iceberg integration
> Reporter: Shohei Okumiya
> Assignee: Shohei Okumiya
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> The current implementation requires all bucket transforms to be projected.
> Unlike Hive's native bucketing, Iceberg allows multiple bucket keys to be
> decomposed into multiple partition transforms. For example,
> {code:java}
> CREATE TABLE srcbucket_big(key1 int, key2 string, value string, id int)
> PARTITIONED BY SPEC(bucket(4, key1), bucket(8, key2)) STORED BY ICEBERG;
> {code}
> Currently, BMJ is applied only when both key1 and key2 are used as join keys.
> {code:java}
> SELECT a.key1, a.key2, a.id
> FROM srcbucket_big a
> JOIN src_small b ON a.key1 = b.key1 AND a.key2 = b.key2
> ORDER BY a.id;
> {code}
> Considering the storage layout of Apache Iceberg, the following query can
> also leverage BMJ.
> {code:java}
> SELECT a.key1, a.id
> FROM srcbucket_big a
> JOIN src_small b ON a.key1 = b.key1
> ORDER BY a.id;
> {code}
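> To check whether BMJ is chosen for the single-key join, the query plan can be
> inspected. This is a minimal sketch, assuming Tez execution and that the two
> settings below are what enables BMJ conversion in your build; a Map Join
> Operator marked as a bucket map join is what one would expect to see, though
> the exact plan output may vary by version.
> {code:sql}
> -- Assumption: these settings enable map join and bucket map join conversion
> SET hive.auto.convert.join=true;
> SET hive.convert.join.bucket.mapjoin.tez=true;
>
> -- Inspect the plan of the join that uses only one of the two bucket keys
> EXPLAIN
> SELECT a.key1, a.id
> FROM srcbucket_big a
> JOIN src_small b ON a.key1 = b.key1
> ORDER BY a.id;
> {code}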
> This optimization would be helpful once HIVE-28414 extends the optimization
> to non-bucket transforms, such as daily partitioning, which are typically not
> used as join keys.
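> As an illustration of that scenario, a table might combine a bucket transform
> on the join key with a daily transform on a timestamp column. The table
> events_big and its columns are hypothetical, and whether BMJ is actually chosen
> here depends on how HIVE-28414 handles non-bucket transforms.
> {code:sql}
> -- Hypothetical table: bucket on the join key, daily partition on event time
> CREATE TABLE events_big(key1 int, value string, ts timestamp)
> PARTITIONED BY SPEC(bucket(4, key1), day(ts)) STORED BY ICEBERG;
>
> -- The join condition uses only key1; ts never appears as a join key
> SELECT a.key1, a.value
> FROM events_big a
> JOIN src_small b ON a.key1 = b.key1;
> {code}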