vikramahuja1001 commented on code in PR #5976:
URL: https://github.com/apache/hive/pull/5976#discussion_r2225269017


##########
ql/src/test/results/clientpositive/llap/msck_repair_9.q.out:
##########
@@ -90,7 +90,7 @@ PREHOOK: Output: default@tbl_y
 POSTHOOK: query: MSCK REPAIR TABLE tbl_y
 POSTHOOK: type: MSCK
 POSTHOOK: Output: default@tbl_y
-Partitions not in metastore:   tbl_y:month=12/day=2    tbl_y:month=12/day=3    
tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__   
tbl_y:month=ANOTHER_PARTITION/day=3
+Partitions not in metastore:   tbl_y:month=12/day=2    tbl_y:month=12/day=3    
tbl_y:month=ANOTHER_PARTITION/day=3

Review Comment:
   Hey, I have gone through the discussion in the thread 
[PR5568](https://github.com/apache/hive/pull/5568#discussion_r1871258402) . 
   
   Have gone through both the scenarios that you have mentioned in that 
discussion.  I believe under no circumstance the user should be allowed to add 
arbitrary strings for non string partition values by using MSCK repair it can 
lead to unsafe comparisons down the line. Only DEFAULT PARTITION VALUE should 
be considered at a time. If not, this leads to various issues like:
   1. **Type Integrity Violation:**
   The core expectation in Hive is that partition values must adhere to the 
column's defined type. Allowing arbitrary strings in a BIGINT/INT or any other 
non string column undermines this contract.
   2. Downstream Failures:
   Operations downstream (like queries, filters, and metadata tools) may fail 
or behave unpredictably when encountering such invalid partition values opening 
a pandora box of further issues. For instance: 
[HIVE-29100](https://issues.apache.org/jira/browse/HIVE-29100). We will need to 
specifically handle such cases now.
   3. Silent Corruption Risk:
   Since MSCK is a metadata recovery tool, silently accepting bad values gives 
a false sense of correctness, potentially leading to hard-to-debug issues in 
the future.
   4. Inconsistent Behavior Across Interfaces:
   Hive does not allow such partitions to be added via standard DDL operations 
(e.g., ALTER TABLE ADD PARTITION) due to strict type checking. This 
inconsistency between interfaces (DDL vs MSCK) can confuse users and lead to 
misconfiguration.
   
   We should allow partitions under hive.exec.default.partition.name property 
and thus a basic validation like is must and the property` 
hive.msck.path.validation` can decide whether to throw an error for such cases 
or just log and skip those arbitrary partitions.
   
   The config ` hive.msck.path.validation=skip` can be set as session level. By 
default I have kept this value as `throw`. No need to set anything at cluster 
level. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to