[jira] [Commented] (HIVE-29312) Concatenate equality conditions in AND nodes

Stamatis Zampetakis (Jira) Tue, 17 Feb 2026 09:30:08 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-29312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059120#comment-18059120
 ]


Stamatis Zampetakis commented on HIVE-29312:
--------------------------------------------

[~wechar] Thanks for the additional details. I don't readily see how the 
following transformation can improve index efficiency
{code:sql}
PART_NAME like 'a=1/%' and PART_NAME like '%/b=2/%' and PART_NAME like '%/c=3/%'
PART_NAME like 'a=1/b=2/c=3/%'
{code}
The description says that with the transformation the second expression will 
match a longer index prefix so I would expect to see this somehow in the query 
plan.

I played a bit with the [Postgres TPCDS 30TB metastore 
dump|https://hub.docker.com/repository/docker/zabetak/postgres-tpcds-metastore] 
but the plans I get for the following queries do not indicate a better index 
usage.
{code:sql}
EXPLAIN ANALYZE SELECT * FROM "PARTITIONS" WHERE 
"PART_NAME" LIKE 'dt=2020-05-07/%' AND 
"PART_NAME" LIKE '%/ns=compute-1588357950-c2h8/%' AND 
"PART_NAME" LIKE '%/app=usage-monitor' AND 
"TBL_ID" = 56;
{code}
{noformat}
                                                                                
        QUERY PLAN                                                              
                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on "PARTITIONS"  (cost=105.05..459.75 rows=1 width=84) 
(actual time=1.121..1.162 rows=1 loops=1)
   Recheck Cond: ("TBL_ID" = 56)
   Filter: ((("PART_NAME")::text ~~ 'dt=2020-05-07/%'::text) AND 
(("PART_NAME")::text ~~ '%/ns=compute-1588357950-c2h8/%'::text) AND 
(("PART_NAME")::text ~~ '%/app=usage-monitor'::text))
   Rows Removed by Filter: 5434
   Heap Blocks: exact=99
   ->  Bitmap Index Scan on "PARTITIONS_N49"  (cost=0.00..105.05 rows=5435 
width=0) (actual time=0.242..0.242 rows=5435 loops=1)
         Index Cond: ("TBL_ID" = 56)
 Planning Time: 0.190 ms
 Execution Time: 1.189 ms
(9 rows)
{noformat}
{code:sql}
EXPLAIN ANALYZE SELECT * FROM "PARTITIONS" WHERE 
"PART_NAME" LIKE 'dt=2020-05-07/ns=compute-1588357950-c2h8/app=usage-monitor%' 
AND 
"TBL_ID" = 56;
{code}
{noformat}
                                                           QUERY PLAN           
                                                 
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on "PARTITIONS"  (cost=105.05..432.58 rows=1 width=84) 
(actual time=0.897..0.933 rows=1 loops=1)
   Recheck Cond: ("TBL_ID" = 56)
   Filter: (("PART_NAME")::text ~~ 
'dt=2020-05-07/ns=compute-1588357950-c2h8/app=usage-monitor%'::text)
   Rows Removed by Filter: 5434
   Heap Blocks: exact=99
   ->  Bitmap Index Scan on "PARTITIONS_N49"  (cost=0.00..105.05 rows=5435 
width=0) (actual time=0.187..0.187 rows=5435 loops=1)
         Index Cond: ("TBL_ID" = 56)
 Planning Time: 0.113 ms
 Execution Time: 0.957 ms
(9 rows)
{noformat}
In both cases the index is accessed based on the TBL_ID; the filter on 
PART_NAME is applied as post-processing on the matching tuples.

The only case where the index access is affected is if the filter on PART_NAME 
is an exact match where the LIKE does not have a REGEX expression:
{code:sql}
EXPLAIN ANALYZE SELECT * FROM "PARTITIONS" WHERE 
"PART_NAME" LIKE 'dt=2020-05-07/ns=compute-1588357950-c2h8/app=usage-monitor' 
AND 
"TBL_ID" = 56;
{code}
{noformat}
                                                           QUERY PLAN           
                                                 
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "UNIQUEPARTITION" on "PARTITIONS"  (cost=0.41..8.43 rows=1 
width=84) (actual time=0.107..0.109 rows=1 loops=1)
   Index Cond: ((("PART_NAME")::text = 
'dt=2020-05-07/ns=compute-1588357950-c2h8/app=usage-monitor'::text) AND 
("TBL_ID" = 56))
   Filter: (("PART_NAME")::text ~~ 
'dt=2020-05-07/ns=compute-1588357950-c2h8/app=usage-monitor'::text)
 Planning Time: 0.110 ms
 Execution Time: 0.129 ms
(5 rows)
{noformat}
[~wechar] can you please clarify with which DBMS did you run the experiments 
and also outline the impact on the query execution plan similarly to what I 
have done above?

> Concatenate equality conditions in AND nodes
> --------------------------------------------
>
>                 Key: HIVE-29312
>                 URL: https://issues.apache.org/jira/browse/HIVE-29312
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 4.1.0
>            Reporter: Wechar
>            Assignee: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> Current partition filtert tree is a binary tree with single condition in the 
> leaf node, actually the adjacent AND leaf nodes can be combined together to 
> match longer index prefix. For example, a table has partition key: a, b, c, 
> d, and a filter is {{a=1 and b=1 and c=1}}, when visit as a binary tree 
> *AND(AND(a=1, b=1), c=1)*,  the filter is:
> {code:sql}
> PART_NAME like 'a=1/%' and PART_NAME like '%/b=2/%' and PART_NAME like 
> '%/c=3/%'
> {code}
> If we combine the adjacent AND conditions, the tree will be *MultiAnd(a=1, 
> b=2, c=3)*, aand the filte can be:
> {code:bash}
> PART_NAME like 'a=1/b=2/c=3/%'
> {code}
> Obviously, the latter could match a longer index prefix and is more efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-29312) Concatenate equality conditions in AND nodes

Reply via email to