Naresh P R created HIVE-26526:
---------------------------------
Summary: MSCK sync is not removing partitions with special
characters
Key: HIVE-26526
URL: https://issues.apache.org/jira/browse/HIVE-26526
Project: Hive
Issue Type: New Feature
Reporter: Naresh P R
PARTITIONS table were having encoding string & PARTITION_KEY_VALS were having
original string.
{code:java}
hive=> select * from "PARTITION_KEY_VALS" where "PART_ID" IN (46753, 46754,
46755, 46756);
PART_ID | PART_KEY_VAL | INTEGER_IDX
---------+---------------------+-------------
46753 | 2022-02-* | 0
46754 | 2011-03-01 | 0
46755 | 2022-01-* | 0
46756 | 2010-01-01 | 0
hive=> select * from "PARTITIONS" where "TBL_ID" = 23567 ;
PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID |
TBL_ID | WRITE_ID
---------+-------------+------------------+-----------------------+-------+--------+----------
46753 | 0 | 0 | part_date=2022-02-%2A | 70195 |
23567 | 0
46754 | 0 | 0 | part_date=2011-03-01 | 70196 |
23567 | 0
46755 | 0 | 0 | part_date=2022-01-%2A | 70197 |
23567 | 0
46756 | 0 | 0 | part_date=2010-01-01 | 70198 |
23567 | 0
(4 rows){code}
1) DirectSQL has a join condition on PARTITION_KEY_VALS.PART_KEY_VAL =
"2022-02-%2A" at here
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L883
2) Jdo is having filter condition on PARTITIONS.PART_NAME =
"part_date=2022-02-%252A" (ie., 2 times url encoded)
Once from HS2
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L353
2nd from HMS
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java#L365]
Above conditions returns 0 partitions, so those are not removed from HMS
metadata.
Attaching repro q file
--
This message was sent by Atlassian Jira
(v8.20.10#820010)