[ 
https://issues.apache.org/jira/browse/HIVE-29312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060761#comment-18060761
 ] 

Wechar commented on HIVE-29312:
-------------------------------

[~zabetak] I test on MySQL, and here is a prod table with 1.74 millions 
partitions,  the explain result is same:
{code:sql}
mysql> explain extended select * from PARTITIONS where TBL_ID=1616 and 
PART_NAME like 'app_id=0/usage_id=0/grass_region=AR/grass_date=2026-02-24/%';
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key      
 | key_len | ref  | rows    | filtered | Extra                              |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
|  1 | SIMPLE      | PARTITIONS | NULL       | range | uniq_part     | 
uniq_part | 2060    | NULL | 2668150 |   100.00 | Using index condition; Using 
where |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
1 row in set, 2 warnings (0.00 sec)

mysql> explain extended select * from PARTITIONS where TBL_ID=1616 and 
PART_NAME like 'app_id=0/%' and PART_NAME like '%/usage_id=0/%' and PART_NAME 
like '%/grass_region=AR/%' and PART_NAME like '%/grass_date=2026-02-24/%';
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key      
 | key_len | ref  | rows    | filtered | Extra                              |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
|  1 | SIMPLE      | PARTITIONS | NULL       | range | uniq_part     | 
uniq_part | 2060    | NULL | 2668150 |   100.00 | Using index condition; Using 
where |
+----+-------------+------------+------------+-------+---------------+-----------+---------+------+---------+----------+------------------------------------+
1 row in set, 2 warnings (0.00 sec)
{code}
But the actual execution time is different:
{code:sql}
mysql> select * from PARTITIONS where TBL_ID=1616 and PART_NAME like 
'app_id=0/%' and PART_NAME like '%/usage_id=0/%' and PART_NAME like 
'%/grass_region=AR/%' and PART_NAME like '%/grass_date=2026-02-24/%';
+-----------+-------------+------------------+---------------------------------------------------------------------------------------------+-----------+--------+
110 rows in set (8.28 sec)

mysql> select * from PARTITIONS where TBL_ID=1616 and PART_NAME like 
'app_id=0/usage_id=0/grass_region=AR/grass_date=2026-02-24/%';
+-----------+-------------+------------------+---------------------------------------------------------------------------------------------+-----------+--------+
110 rows in set (8.01 sec)
{code}

I think the index length in the explain result is not accurate enough.


> Concatenate equality conditions in AND nodes
> --------------------------------------------
>
>                 Key: HIVE-29312
>                 URL: https://issues.apache.org/jira/browse/HIVE-29312
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 4.1.0
>            Reporter: Wechar
>            Assignee: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> Current partition filtert tree is a binary tree with single condition in the 
> leaf node, actually the adjacent AND leaf nodes can be combined together to 
> match longer index prefix. For example, a table has partition key: a, b, c, 
> d, and a filter is {{a=1 and b=1 and c=1}}, when visit as a binary tree 
> *AND(AND(a=1, b=1), c=1)*,  the filter is:
> {code:sql}
> PART_NAME like 'a=1/%' and PART_NAME like '%/b=2/%' and PART_NAME like 
> '%/c=3/%'
> {code}
> If we combine the adjacent AND conditions, the tree will be *MultiAnd(a=1, 
> b=2, c=3)*, aand the filte can be:
> {code:bash}
> PART_NAME like 'a=1/b=2/c=3/%'
> {code}
> Obviously, the latter could match a longer index prefix and is more efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to