cshuo commented on code in PR #18229:
URL: https://github.com/apache/hudi/pull/18229#discussion_r2844215651


##########
hudi-common/src/main/java/org/apache/hudi/common/table/PartitionPathParser.java:
##########
@@ -164,4 +171,53 @@ private static Object inferDateValue(
   private static boolean isTimeBasedType(HoodieSchemaType type) {
     return type == HoodieSchemaType.DATE || type == HoodieSchemaType.TIMESTAMP || type == HoodieSchemaType.TIME;
   }
+
+  /**
+   * Parses the {@code lookup.partitions} option value into a list of Hudi partition paths.
+   *
+   * <p>The spec format is {@code key1=val1,key2=val2} per partition, with multiple partitions
+   * separated by {@code ;}. Example: {@code "dt=2024-01-01,region=us;dt=2024-01-02,region=eu"}.
+   *
+   * @param spec         the raw option value
+   * @param partitionKeys ordered list of partition key names as defined in the table schema
+   * @param hiveStyle    whether the table uses Hive-style partitioning ({@code key=value} directories)
+   * @return list of partition paths in the format used by Hudi's file index
+   * @throws IllegalArgumentException if any key in the spec is not a valid partition key,
+   *                                  or if a key-value pair does not follow {@code key=value} format
+   */
+  public static List<String> parseLookupPartitionPaths(String spec, List<String> partitionKeys, boolean hiveStyle) {

Review Comment:
   lookup.partitions currently accepts partial key sets for tables partitioned by multiple keys (e.g. only year=2024 when the partition keys are year,month).
   This is risky for lookup pruning because StaticPartitionPruner matches partition paths exactly. A partial path such as year=2024 never equals a real partition directory such as year=2024/month=01, so it silently matches nothing and the lookup cache ends up empty even though the expected rows exist.
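To illustrate the failure mode, here is a minimal Java sketch that models exact partition-path matching as plain string equality against the requested paths. This is an assumption based on the description above, not StaticPartitionPruner's actual code. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

public class ExactMatchDemo {

  // Hypothetical model of exact matching: a partition is kept only if its
  // path equals one of the requested lookup paths verbatim.
  public static boolean matches(String partitionPath, Set<String> requestedPaths) {
    return requestedPaths.contains(partitionPath);
  }

  public static void main(String[] args) {
    List<String> onDisk = List.of("year=2024/month=01", "year=2024/month=02");
    Set<String> partialSpec = Set.of("year=2024");         // only the leading key
    Set<String> fullSpec = Set.of("year=2024/month=01");   // every declared key

    long partialHits = onDisk.stream().filter(p -> matches(p, partialSpec)).count();
    long fullHits = onDisk.stream().filter(p -> matches(p, fullSpec)).count();

    System.out.println("partial spec matches: " + partialHits); // prints 0
    System.out.println("full spec matches: " + fullHits);       // prints 1
  }
}
```

Under this model, a prefix is no better than a typo. Nothing matches, and no error is raised.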
   Could we enforce that each partition spec includes all declared partition keys (fail fast with IllegalArgumentException if any key is missing)?
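Below is a minimal sketch of the requested fail-fast check. It is written as a standalone class that mirrors the PR's method signature; it is not the PR's actual implementation, and the class name is hypothetical. It makes these assumptions, none taken from the PR:

- Hive-style paths are key=value segments joined by / in declared key order.
- Non-Hive-style paths are the bare values joined the same way.
- Keys may appear in any order within a spec.
- Duplicate keys and empty values are rejected.
- No partition-value encoding the table might use is applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class LookupPartitionSpecParser {

  private LookupPartitionSpecParser() {
  }

  // Hypothetical sketch: parses "k1=v1,k2=v2;k1=v3,k2=v4" and requires every declared key.
  public static List<String> parseLookupPartitionPaths(
      String spec, List<String> partitionKeys, boolean hiveStyle) {
    Set<String> declared = new HashSet<>(partitionKeys);
    List<String> paths = new ArrayList<>();
    for (String partition : spec.split(";")) {
      if (partition.trim().isEmpty()) {
        continue; // skip empty segments, e.g. from an empty spec or a leading ';'
      }
      Map<String, String> values = new HashMap<>();
      for (String pair : partition.split(",")) {
        String[] kv = pair.split("=", 2);
        if (kv.length != 2 || kv[0].trim().isEmpty() || kv[1].trim().isEmpty()) {
          throw new IllegalArgumentException("Expected key=value but got '" + pair + "'");
        }
        String key = kv[0].trim();
        if (!declared.contains(key)) {
          throw new IllegalArgumentException(
              "Unknown partition key '" + key + "'; declared keys are " + partitionKeys);
        }
        if (values.put(key, kv[1].trim()) != null) {
          throw new IllegalArgumentException(
              "Duplicate partition key '" + key + "' in '" + partition + "'");
        }
      }
      // Fail fast on partial specs: a path built from a subset of keys can never
      // equal a real partition path under exact matching.
      List<String> missing = new ArrayList<>();
      for (String key : partitionKeys) {
        if (!values.containsKey(key)) {
          missing.add(key);
        }
      }
      if (!missing.isEmpty()) {
        throw new IllegalArgumentException("Partition spec '" + partition
            + "' is missing partition key(s) " + missing + "; all of " + partitionKeys + " are required");
      }
      // Emit segments in declared key order, regardless of the order written in the spec.
      List<String> segments = new ArrayList<>();
      for (String key : partitionKeys) {
        String value = values.get(key);
        segments.add(hiveStyle ? key + "=" + value : value);
      }
      paths.add(String.join("/", segments));
    }
    return paths;
  }

  public static void main(String[] args) {
    List<String> keys = List.of("year", "month");
    // prints [year=2024/month=01, year=2024/month=02]
    System.out.println(parseLookupPartitionPaths("year=2024,month=01;month=02,year=2024", keys, true));
    try {
      parseLookupPartitionPaths("year=2024", keys, true);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // reports the missing key [month]
    }
  }
}
```

Building each path in declared key order lets users write keys in any order. Rejecting the partial spec at parse time surfaces the misconfiguration when the option is read, rather than as an empty lookup cache at runtime.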
