vikramahuja1001 commented on code in PR #5976: URL: https://github.com/apache/hive/pull/5976#discussion_r2225269017
########## ql/src/test/results/clientpositive/llap/msck_repair_9.q.out: ########## @@ -90,7 +90,7 @@ PREHOOK: Output: default@tbl_y POSTHOOK: query: MSCK REPAIR TABLE tbl_y POSTHOOK: type: MSCK POSTHOOK: Output: default@tbl_y -Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__ tbl_y:month=ANOTHER_PARTITION/day=3 +Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=ANOTHER_PARTITION/day=3 Review Comment: Hey, I have gone through the discussion in the thread [PR5568](https://github.com/apache/hive/pull/5568#discussion_r1871258402) . Have gone through both the scenarios that you have mentioned in that discussion. I believe under no circumstance the user should be allowed to add arbitrary strings for non string partition values by using MSCK repair it can lead to unsafe comparisons down the line. Only DEFAULT PARTITION VALUE should be considered at a time. If not, this leads to various issues like: 1. **Type Integrity Violation:** The core expectation in Hive is that partition values must adhere to the column's defined type. Allowing arbitrary strings in a BIGINT/INT or any other non string column undermines this contract. 2. Downstream Failures: Operations downstream (like queries, filters, and metadata tools) may fail or behave unpredictably when encountering such invalid partition values opening a pandora box of further issues. For instance: [HIVE-29100](https://issues.apache.org/jira/browse/HIVE-29100). We will need to specifically handle such cases now. 3. Silent Corruption Risk: Since MSCK is a metadata recovery tool, silently accepting bad values gives a false sense of correctness, potentially leading to hard-to-debug issues in the future. 4. Inconsistent Behavior Across Interfaces: Hive does not allow such partitions to be added via standard DDL operations (e.g., ALTER TABLE ADD PARTITION) due to strict type checking. This inconsistency between interfaces (DDL vs MSCK) can confuse users and lead to misconfiguration. We should allow partitions under hive.exec.default.partition.name property and thus a basic validation like is must and the property` hive.msck.path.validation` can decide whether to throw an error for such cases or just log and skip those arbitrary partitions. The config ` hive.msck.path.validation=skip` can be set as session level. By default I have kept this value as `throw`. No need to set anything at cluster level. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org