cshuo commented on code in PR #18229:
URL: https://github.com/apache/hudi/pull/18229#discussion_r2844215651
##########
hudi-common/src/main/java/org/apache/hudi/common/table/PartitionPathParser.java:
##########
@@ -164,4 +171,53 @@ private static Object inferDateValue(
private static boolean isTimeBasedType(HoodieSchemaType type) {
return type == HoodieSchemaType.DATE || type == HoodieSchemaType.TIMESTAMP
|| type == HoodieSchemaType.TIME;
}
+
+ /**
+ * Parses the {@code lookup.partitions} option value into a list of Hudi partition paths.
+ *
+ * <p>The spec format is {@code key1=val1,key2=val2} per partition, with multiple partitions
+ * separated by {@code ;}. Example: {@code "dt=2024-01-01,region=us;dt=2024-01-02,region=eu"}.
+ *
+ * @param spec the raw option value
+ * @param partitionKeys ordered list of partition key names as defined in the table schema
+ * @param hiveStyle whether the table uses Hive-style partitioning ({@code key=value} directories)
+ * @return list of partition paths in the format used by Hudi's file index
+ * @throws IllegalArgumentException if any key in the spec is not a valid partition key,
+ * or if a key-value pair does not follow {@code key=value} format
+ */
+ public static List<String> parseLookupPartitionPaths(String spec, List<String> partitionKeys, boolean hiveStyle) {
Review Comment:
lookup.partitions currently accepts a partial key set for tables with multiple
partition keys (e.g. only year=2024 when the partition keys are year,month).
This is risky for lookup pruning because StaticPartitionPruner matches
partition paths exactly. A partial path such as year=2024 can silently fail to
match real partition directories (e.g. year=2024/month=01), leaving the lookup
cache empty even though rows were expected.
Could we enforce that each partition spec includes all declared partition
keys, and fail fast with an IllegalArgumentException if any key is missing?
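
To make the suggestion concrete, here is a minimal, standalone sketch of the completeness check. It is not the PR's implementation: the class name `LookupPartitionSpecCheck` and the method `parse` are hypothetical, and the path layout (values joined by `/` in declared key order, with a `key=` prefix when Hive-style) is an assumption based on the Javadoc above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; names are not from the PR.
final class LookupPartitionSpecCheck {

  static List<String> parse(String spec, List<String> partitionKeys, boolean hiveStyle) {
    List<String> paths = new ArrayList<>();
    for (String partitionSpec : spec.split(";")) {
      if (partitionSpec.trim().isEmpty()) {
        continue; // tolerate a trailing ';' or blank segment
      }
      Map<String, String> keyToValue = new LinkedHashMap<>();
      for (String rawPair : partitionSpec.split(",")) {
        String pair = rawPair.trim();
        int idx = pair.indexOf('=');
        if (idx <= 0 || idx == pair.length() - 1) {
          throw new IllegalArgumentException("Expected key=value but got: '" + pair + "'");
        }
        String key = pair.substring(0, idx).trim();
        String value = pair.substring(idx + 1).trim();
        if (value.isEmpty()) {
          throw new IllegalArgumentException("Expected key=value but got: '" + pair + "'");
        }
        if (!partitionKeys.contains(key)) {
          throw new IllegalArgumentException(
              "Unknown partition key '" + key + "', expected one of " + partitionKeys);
        }
        keyToValue.put(key, value);
      }
      // Fail fast on partial specs: every declared partition key must be present.
      List<String> missing = new ArrayList<>(partitionKeys);
      missing.removeAll(keyToValue.keySet());
      if (!missing.isEmpty()) {
        throw new IllegalArgumentException(
            "Partition spec '" + partitionSpec.trim() + "' is missing keys " + missing);
      }
      // Build the path in declared key order, regardless of the order in the spec.
      StringBuilder path = new StringBuilder();
      for (String key : partitionKeys) {
        if (path.length() > 0) {
          path.append('/');
        }
        if (hiveStyle) {
          path.append(key).append('=');
        }
        path.append(keyToValue.get(key));
      }
      paths.add(path.toString());
    }
    return paths;
  }
}
```

With this check, a spec of year=2024 against keys year,month is rejected up front instead of producing a path that matches no partition directory.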