kosiew commented on code in PR #23226:
URL: https://github.com/apache/datafusion/pull/23226#discussion_r3511047744


##########
datafusion/catalog-listing/src/helpers.rs:
##########
@@ -549,15 +559,39 @@ mod tests {
             )
         );
         assert_eq!(
-            Some(vec!["v1"]),
+            Some(vec![Cow::Borrowed("v1")]),
             parse_partitions_for_path(
                 &ListingTableUrl::parse("file:///bucket/mytable").unwrap(),
                 &Path::from("bucket/mytable/mypartition=v1/file.csv"),
                 vec!["mypartition"]
             )

Review Comment:
   The test coverage looks solid. It might also be worth adding a couple of 
extra cases for completeness:
   
   - Multiple percent-encoded sequences in the same value, for example 
`test%20dir%2Ffile` -> `test dir/file`
   - A multi-byte UTF-8 sequence such as `%C3%A9` -> `é`
   
   These are probably uncommon in Hive partition values, but they would give a 
little more confidence that the decoding behaves correctly across a wider range 
of valid inputs.



##########
datafusion/catalog-listing/src/helpers.rs:
##########
@@ -449,7 +454,12 @@ where
     let mut part_values = vec![];
     for (part, expected_partition) in subpath.zip(table_partition_cols) {
         match part.split_once('=') {
-            Some((name, val)) if name == expected_partition => 
part_values.push(val),
+            Some((name, val)) if name == expected_partition => {
+                let decoded = percent_decode_str(val)
+                    .decode_utf8()
+                    .unwrap_or(Cow::Borrowed(val));

Review Comment:
   Nice use of `unwrap_or(Cow::Borrowed(val))` here. It took me a second to 
realize why we intentionally preserve the original value on decode failure. A 
short comment could make that intent clearer for future readers, for example:
   
   ```rust
   // Preserve the original value if percent-decoding produces invalid UTF-8
   ```
   
   That makes it obvious that malformed input is passed through intentionally 
rather than being silently modified.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to