Copilot commented on code in PR #6271:
URL: https://github.com/apache/hive/pull/6271#discussion_r2809556536


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();
+      //relative path starts after tablepath and the / afterwards
+      String relPath = 
partitionPath.toString().substring(tablePath.toString().length() + 1);
+      Matcher pathMatcher = customPathPattern.matcher(relPath);
+      boolean didMatch = pathMatcher.matches();
+      if (!didMatch) { //partition path doesn't match the pattern, should have 
been detected at an earlier step
+        throw new MetastoreException("Path " + relPath + "doesn't match custom 
partition pattern "
+                + customPathPattern + "partitionPathFull: " + partitionPath);
+      }
+      if (!patternPartCols.isEmpty()) {
+        result = new StringBuilder(patternPartCols.get(0) + "=" + 
pathMatcher.group(1));
+      }
+      for (int i = 1; i < patternPartCols.size(); i++) {
+        
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));

Review Comment:
   The partition name is constructed using the order of columns as they appear 
in the custom pattern, but Hive expects partition names to follow the order 
defined in the table's partition columns. For example, if the table has 
partition columns (partcity, partstate, partid) but the custom pattern is 
"const/${partstate}/const2/${partid}-${partcity}", the code creates 
"partstate=CO/partid=1/partcity=denver" instead of the required 
"partcity=denver/partstate=CO/partid=1". The extracted values should be 
reordered to match the table's partition column order before constructing the 
partition name.
   ```suggestion
         // Map captured values by column name as they appear in the pattern.
         Map<String, String> capturedValues = new HashMap<>();
         for (int i = 0; i < patternPartCols.size(); i++) {
           String colName = patternPartCols.get(i);
           String value = pathMatcher.group(i + 1);
           capturedValues.put(colName, value);
         }
         // Rebuild the partition name in the order of the table's partition 
columns.
         for (String colName : partCols) {
           String value = capturedValues.get(colName);
           if (value == null) {
             continue;
           }
           if (result == null) {
             result = new StringBuilder(colName + "=" + value);
           } else {
             
result.append(Path.SEPARATOR).append(colName).append("=").append(value);
           }
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();
+      //relative path starts after tablepath and the / afterwards
+      String relPath = 
partitionPath.toString().substring(tablePath.toString().length() + 1);
+      Matcher pathMatcher = customPathPattern.matcher(relPath);
+      boolean didMatch = pathMatcher.matches();
+      if (!didMatch) { //partition path doesn't match the pattern, should have 
been detected at an earlier step
+        throw new MetastoreException("Path " + relPath + "doesn't match custom 
partition pattern "
+                + customPathPattern + "partitionPathFull: " + partitionPath);

Review Comment:
   The error message is missing proper spacing. It should include a space 
before "partitionPathFull:" for better readability.
   ```suggestion
                   + customPathPattern + " partitionPathFull: " + 
partitionPath);
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();

Review Comment:
   There is no validation that the partition columns extracted from the custom 
pattern match the table's partition columns (provided in `partCols`). The code 
should verify that: (1) all columns in `patternPartCols` exist in the table's 
partition columns, and (2) the number of columns matches. Without this 
validation, the code could create invalid partition names with columns that 
don't exist in the table definition or with mismatched column counts.



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java:
##########
@@ -593,7 +608,17 @@ private Path processPathDepthInfo(final PathDepthInfo pd)
         throws IOException, MetastoreException {
       final Path currentPath = pd.p;
       final int currentDepth = pd.depth;
-      if (currentDepth == partColNames.size()) {
+      if (currentDepth == maxDepth) {
+        if (compiledCustomPattern != null) { //validate path against pattern
+          Pattern fullPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+          String[] parts = currentPath.toString().split(Path.SEPARATOR);
+          String relPath = String.join(Path.SEPARATOR,
+                  Arrays.copyOfRange(parts, parts.length - maxDepth, 
parts.length));
+          if (!fullPattern.matcher(relPath).matches()) {
+            logOrThrowExceptionWithMsg("File path " + currentPath + " does not 
match custom partitioning pattern"

Review Comment:
   The error message is missing a space between the pattern and 
"partitionPathFull". It should be "does not match custom partitioning pattern " 
instead of "does not match custom partitioning pattern" to improve readability.
   ```suggestion
               logOrThrowExceptionWithMsg("File path " + currentPath + " does 
not match custom partitioning pattern "
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();
+      //relative path starts after tablepath and the / afterwards
+      String relPath = 
partitionPath.toString().substring(tablePath.toString().length() + 1);
+      Matcher pathMatcher = customPathPattern.matcher(relPath);
+      boolean didMatch = pathMatcher.matches();
+      if (!didMatch) { //partition path doesn't match the pattern, should have 
been detected at an earlier step
+        throw new MetastoreException("Path " + relPath + "doesn't match custom 
partition pattern "
+                + customPathPattern + "partitionPathFull: " + partitionPath);
+      }
+      if (!patternPartCols.isEmpty()) {
+        result = new StringBuilder(patternPartCols.get(0) + "=" + 
pathMatcher.group(1));
+      }
+      for (int i = 1; i < patternPartCols.size(); i++) {
+        
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));

Review Comment:
   Partition column names extracted from the custom pattern are not converted 
to lowercase, unlike the standard path (line 1656). Hive stores partition keys 
in lowercase, so the extracted column names from `patternPartCols.get(i)` 
should be lowercased before being used in the result to maintain consistency 
with the standard partition name handling.
   ```suggestion
           String partitionName = patternPartCols.get(0).toLowerCase();
           result = new StringBuilder(partitionName + "=" + 
pathMatcher.group(1));
         }
         for (int i = 1; i < patternPartCols.size(); i++) {
           String partitionName = patternPartCols.get(i).toLowerCase();
           
result.append(Path.SEPARATOR).append(partitionName).append("=").append(pathMatcher.group(i
 + 1));
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java:
##########
@@ -142,6 +150,15 @@ public static 
org.apache.hadoop.hive.metastore.api.Partition createMetaPartition
 
     if (!tbl.isView()) {
       tpart.setSd(tbl.getSd().deepCopy());
+      if (location == null && customPattern != null) { //should only happen 
for msck repair table with custom pattern
+        Matcher m = customPathPattern.matcher(customPattern);
+        StringBuffer sb = new StringBuffer();
+        while (m.find()) {
+          m.appendReplacement(sb, partSpec.get(m.group(2)));

Review Comment:
   The code `partSpec.get(m.group(2))` can return null if the partition spec 
doesn't contain a value for the column name extracted from the pattern. This 
would result in `appendReplacement` inserting "null" as a string into the path. 
Add a null check and throw an appropriate exception if a required partition 
column is missing from the spec.
   ```suggestion
             String partColName = m.group(2);
             String partValue = partSpec.get(partColName);
             if (partValue == null) {
               throw new HiveException("partition spec is invalid; required 
partition column '" 
                   + partColName + "' is missing for custom pattern: " + 
customPattern);
             }
             m.appendReplacement(sb, partValue);
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/DynamicPartitioningCustomPattern.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.utils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * When using msck repair table with custom partitioning patterns, we need to 
capture
+ * the partition keys from the pattern, and use those to construct a regex 
which will
+ * match the paths and extract the partition key values.*/
+public final class DynamicPartitioningCustomPattern {
+
+  private final String customPattern;
+  private final Pattern partitionCapturePattern;
+  private final List<String> partitionColumns;
+
+    // regex of the form: ${column name}. Following characters are not allowed 
in column name:
+    // whitespace characters, /, {, }, \
+  public static final Pattern customPathPattern = 
Pattern.compile("(\\$\\{)([^\\s/\\{\\}\\\\]+)(\\})");
+
+  private DynamicPartitioningCustomPattern(String customPattern, Pattern 
partitionCapturePattern,
+                                           List<String> partitionColumns) {
+    this.customPattern = customPattern;
+    this.partitionCapturePattern = partitionCapturePattern;
+    this.partitionColumns = partitionColumns;
+  }
+
+  /**
+   * @return stored custom pattern string
+   * */
+  public String getCustomPattern() {
+    return customPattern;
+  }
+
+  /**
+   * @return stored custom pattern regex matcher
+   * */
+  public Pattern getPartitionCapturePattern() {
+    return partitionCapturePattern;
+  }
+
+  /**
+   * @return list of partition key columns
+   * */
+  public List<String> getPartitionColumns() {
+    return partitionColumns;
+  }
+
+  public static class Builder {
+    private String customPattern;
+
+    public Builder setCustomPattern(String customPattern) {
+      this.customPattern = customPattern;
+      return this;
+    }
+
+    /**
+     * Constructs the regex to match the partition values in a path based on 
the custom pattern.
+     *
+     * @return custom partition pattern matcher */
+    public DynamicPartitioningCustomPattern build() {
+      StringBuffer sb = new StringBuffer();
+      Matcher m = customPathPattern.matcher(customPattern);
+      List<String> partColumns = new ArrayList<>();
+      while (m.find()) {
+        m.appendReplacement(sb, "(.*)");
+        partColumns.add(m.group(2));
+      }
+      m.appendTail(sb);
+      return new DynamicPartitioningCustomPattern(customPattern, 
Pattern.compile(sb.toString()), partColumns);

Review Comment:
   The custom pattern is compiled as a regex without escaping special regex 
characters. If the pattern contains characters like `.`, `*`, `+`, `?`, etc., 
they will be interpreted as regex metacharacters rather than literals. For 
example, `const./${partstate}` would match `constX/CO` where X is any 
character, not just `const./CO`. Consider using `Pattern.quote()` to escape the 
literal portions of the pattern between the `${}` placeholders, or document 
that the pattern supports regex metacharacters.
   ```suggestion
         StringBuilder sb = new StringBuilder();
         Matcher m = customPathPattern.matcher(customPattern);
         List<String> partColumns = new ArrayList<>();
         int lastEnd = 0;
         while (m.find()) {
           // Append the literal text before this placeholder, escaped so regex 
metacharacters are treated literally.
           if (m.start() > lastEnd) {
             sb.append(Pattern.quote(customPattern.substring(lastEnd, 
m.start())));
           }
           // Replace the placeholder itself with a capturing group to extract 
the partition value.
           sb.append("(.*)");
           partColumns.add(m.group(2));
           lastEnd = m.end();
         }
         // Append any remaining literal text after the last placeholder.
         if (lastEnd < customPattern.length()) {
           sb.append(Pattern.quote(customPattern.substring(lastEnd)));
         }
         Pattern partitionCapturePattern = Pattern.compile(sb.toString());
         return new DynamicPartitioningCustomPattern(customPattern, 
partitionCapturePattern, partColumns);
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();
+      //relative path starts after tablepath and the / afterwards
+      String relPath = 
partitionPath.toString().substring(tablePath.toString().length() + 1);
+      Matcher pathMatcher = customPathPattern.matcher(relPath);
+      boolean didMatch = pathMatcher.matches();
+      if (!didMatch) { //partition path doesn't match the pattern, should have 
been detected at an earlier step
+        throw new MetastoreException("Path " + relPath + "doesn't match custom 
partition pattern "
+                + customPathPattern + "partitionPathFull: " + partitionPath);
+      }
+      if (!patternPartCols.isEmpty()) {
+        result = new StringBuilder(patternPartCols.get(0) + "=" + 
pathMatcher.group(1));
+      }
+      for (int i = 1; i < patternPartCols.size(); i++) {
+        
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));

Review Comment:
   The custom pattern path does not call `getNormalisedPartitionValue` for the 
extracted partition values, unlike the standard path (lines 1660-1661). This 
means numeric partition values won't be normalized (e.g., "01" vs "1") when 
using custom patterns, leading to inconsistent behavior. The extracted 
partition values should be normalized using 
`getNormalisedPartitionValue(pathMatcher.group(i+1), 
partitionColToTypeMap.get(patternPartCols.get(i)), conf)` before being appended 
to the result.
   ```suggestion
           String firstColName = patternPartCols.get(0);
           String firstRawValue = pathMatcher.group(1);
           String firstNormalisedValue = getNormalisedPartitionValue(
                   firstRawValue, partitionColToTypeMap.get(firstColName), 
conf);
           if (firstNormalisedValue == null) {
             return null;
           }
           result = new StringBuilder(firstColName + "=" + 
firstNormalisedValue);
         }
         for (int i = 1; i < patternPartCols.size(); i++) {
           String colName = patternPartCols.get(i);
           String rawValue = pathMatcher.group(i + 1);
           String normalisedValue = getNormalisedPartitionValue(
                   rawValue, partitionColToTypeMap.get(colName), conf);
           if (normalisedValue == null) {
             return null;
           }
           
result.append(Path.SEPARATOR).append(colName).append("=").append(normalisedValue);
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##########
@@ -1610,41 +1613,65 @@ public static Partition getPartition(IMetaStoreClient 
msc, Table tbl, Map<String
    * @return Partition name, for example partitiondate=2008-01-01
    */
   public static String getPartitionName(Path tablePath, Path partitionPath, 
Set<String> partCols,
-                                        Map<String, String> 
partitionColToTypeMap, Configuration conf) {
+                                        Map<String, String> 
partitionColToTypeMap, Configuration conf)
+          throws MetastoreException {
     StringBuilder result = null;
+    String customPattern = conf.get(HCAT_CUSTOM_DYNAMIC_PATTERN);
     Path currPath = partitionPath;
     LOG.debug("tablePath:" + tablePath + ", partCols: " + partCols);
-
-    while (currPath != null && !tablePath.equals(currPath)) {
-      // format: partition=p_val
-      // Add only when table partition colName matches
-      String[] parts = currPath.getName().split("=");
-      if (parts.length > 0) {
-        if (parts.length != 2) {
-          LOG.warn(currPath.getName() + " is not a valid partition name");
-          return result.toString();
-        }
-
-        // Since hive stores partitions keys in lower case, if the hdfs path 
contains mixed case,
-        // it should be converted to lower case
-        String partitionName = parts[0].toLowerCase();
-        // Do not convert the partitionValue to lowercase
-        String partitionValue = parts[1];
-        if (partCols.contains(partitionName)) {
-          String normalisedPartitionValue = 
getNormalisedPartitionValue(partitionValue,
-                  partitionColToTypeMap.get(partitionName), conf);
-          if (normalisedPartitionValue == null) {
-            return null;
+    if (customPattern != null) {
+      DynamicPartitioningCustomPattern compiledCustomPattern = new 
DynamicPartitioningCustomPattern.Builder()
+              .setCustomPattern(customPattern)
+              .build();
+      Pattern customPathPattern = 
compiledCustomPattern.getPartitionCapturePattern();
+      //partition columns in order that they appear in the pattern
+      List<String> patternPartCols = 
compiledCustomPattern.getPartitionColumns();
+      //relative path starts after tablepath and the / afterwards
+      String relPath = 
partitionPath.toString().substring(tablePath.toString().length() + 1);
+      Matcher pathMatcher = customPathPattern.matcher(relPath);
+      boolean didMatch = pathMatcher.matches();
+      if (!didMatch) { //partition path doesn't match the pattern, should have 
been detected at an earlier step
+        throw new MetastoreException("Path " + relPath + "doesn't match custom 
partition pattern "
+                + customPathPattern + "partitionPathFull: " + partitionPath);
+      }
+      if (!patternPartCols.isEmpty()) {
+        result = new StringBuilder(patternPartCols.get(0) + "=" + 
pathMatcher.group(1));
+      }
+      for (int i = 1; i < patternPartCols.size(); i++) {
+        
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));

Review Comment:
   If the custom pattern contains no partition column placeholders (empty 
`patternPartCols`), the result will be null, which is returned at line 1676. 
However, this case should be validated earlier and an appropriate error should 
be thrown, as a partition pattern without any partition columns doesn't make 
sense. Consider adding validation in the 
DynamicPartitioningCustomPattern.Builder.build() method to ensure at least one 
partition column is present.
   ```suggestion
         if (patternPartCols.isEmpty()) {
           throw new MetastoreException("Custom dynamic partition pattern must 
contain at least one partition column: "
                   + customPattern);
         }
         result = new StringBuilder(patternPartCols.get(0) + "=" + 
pathMatcher.group(1));
         for (int i = 1; i < patternPartCols.size(); i++) {
           
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i
 + 1));
   ```



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/DynamicPartitioningCustomPattern.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.utils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * When using msck repair table with custom partitioning patterns, we need to 
capture
+ * the partition keys from the pattern, and use those to construct a regex 
which will
+ * match the paths and extract the partition key values.*/
+public final class DynamicPartitioningCustomPattern {
+
+  private final String customPattern;
+  private final Pattern partitionCapturePattern;
+  private final List<String> partitionColumns;
+
+    // regex of the form: ${column name}. Following characters are not allowed 
in column name:
+    // whitespace characters, /, {, }, \
+  public static final Pattern customPathPattern = 
Pattern.compile("(\\$\\{)([^\\s/\\{\\}\\\\]+)(\\})");
+
+  private DynamicPartitioningCustomPattern(String customPattern, Pattern 
partitionCapturePattern,
+                                           List<String> partitionColumns) {
+    this.customPattern = customPattern;
+    this.partitionCapturePattern = partitionCapturePattern;
+    this.partitionColumns = partitionColumns;
+  }
+
+  /**
+   * @return stored custom pattern string
+   * */
+  public String getCustomPattern() {
+    return customPattern;
+  }
+
+  /**
+   * @return stored custom pattern regex matcher
+   * */
+  public Pattern getPartitionCapturePattern() {
+    return partitionCapturePattern;
+  }
+
+  /**
+   * @return list of partition key columns
+   * */
+  public List<String> getPartitionColumns() {
+    return partitionColumns;
+  }
+
+  public static class Builder {
+    private String customPattern;
+
+    public Builder setCustomPattern(String customPattern) {
+      this.customPattern = customPattern;
+      return this;
+    }
+
+    /**
+     * Constructs the regex to match the partition values in a path based on 
the custom pattern.
+     *
+     * @return custom partition pattern matcher */
+    public DynamicPartitioningCustomPattern build() {
+      StringBuffer sb = new StringBuffer();
+      Matcher m = customPathPattern.matcher(customPattern);
+      List<String> partColumns = new ArrayList<>();
+      while (m.find()) {
+        m.appendReplacement(sb, "(.*)");

Review Comment:
   The greedy regex pattern `(.*)` can lead to performance issues and 
unexpected matching behavior. Consider using a non-greedy pattern `(.*?)` or, 
better yet, `([^/]+)` to match partition values that don't contain path 
separators. The greedy pattern requires excessive backtracking when the regex 
engine tries to match subsequent literals in the pattern, and could incorrectly 
match values that span multiple path segments.
   ```suggestion
           m.appendReplacement(sb, "([^/]+)");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to