[
https://issues.apache.org/jira/browse/HIVE-29312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060761#comment-18060761
]
Wechar commented on HIVE-29312:
-------------------------------
[~zabetak] I test on MySQL, and here is a prod table with 1.74 millions
partitions, the explain result is same:
{code:sql}
mysql> explain extended select * from PARTITIONS where TBL_ID=1616 and
PART_NAME like 'app_id=0/usage_id=0/grass_region=AR/grass_date=2026-02-24/%';
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key
| key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| 1 | SIMPLE | PARTITIONS | NULL | range | uniq_part |
uniq_part | 2060 | NULL | 2668150 | 100.00 | Using index condition; Using
where |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
1 row in set, 2 warnings (0.00 sec)
mysql> explain extended select * from PARTITIONS where TBL_ID=1616 and
PART_NAME like 'app_id=0/%' and PART_NAME like '%/usage_id=0/%' and PART_NAME
like '%/grass_region=AR/%' and PART_NAME like '%/grass_date=2026-02-24/%';
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key
| key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| 1 | SIMPLE | PARTITIONS | NULL | range | uniq_part |
uniq_part | 2060 | NULL | 2668150 | 100.00 | Using index condition; Using
where |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
1 row in set, 2 warnings (0.00 sec)
{code}
But the actual execution time is different:
{code:sql}
mysql> select * from PARTITIONS where TBL_ID=1616 and PART_NAME like
'app_id=0/%' and PART_NAME like '%/usage_id=0/%' and PART_NAME like
'%/grass_region=AR/%' and PART_NAME like '%/grass_date=2026-02-24/%';
+-----------+-------------+------------------+---------------------------------------------------------------------------------------------+-----------+--------+
110 rows in set (8.28 sec)
mysql> select * from PARTITIONS where TBL_ID=1616 and PART_NAME like
'app_id=0/usage_id=0/grass_region=AR/grass_date=2026-02-24/%';
+-----------+-------------+------------------+---------------------------------------------------------------------------------------------+-----------+--------+
110 rows in set (8.01 sec)
{code}
I think the index length in the explain result is not accurate enough.
> Concatenate equality conditions in AND nodes
> --------------------------------------------
>
> Key: HIVE-29312
> URL: https://issues.apache.org/jira/browse/HIVE-29312
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Affects Versions: 4.1.0
> Reporter: Wechar
> Assignee: Wechar
> Priority: Major
> Labels: pull-request-available
>
> Current partition filtert tree is a binary tree with single condition in the
> leaf node, actually the adjacent AND leaf nodes can be combined together to
> match longer index prefix. For example, a table has partition key: a, b, c,
> d, and a filter is {{a=1 and b=1 and c=1}}, when visit as a binary tree
> *AND(AND(a=1, b=1), c=1)*, the filter is:
> {code:sql}
> PART_NAME like 'a=1/%' and PART_NAME like '%/b=2/%' and PART_NAME like
> '%/c=3/%'
> {code}
> If we combine the adjacent AND conditions, the tree will be *MultiAnd(a=1,
> b=2, c=3)*, aand the filte can be:
> {code:bash}
> PART_NAME like 'a=1/b=2/c=3/%'
> {code}
> Obviously, the latter could match a longer index prefix and is more efficient.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)