[PR] fix: Reproduce nested partition columns pruning data validation failure [hudi]

via GitHub Tue, 30 Dec 2025 17:35:36 -0800


vinishjail97 opened a new pull request, #17759:
URL: https://github.com/apache/hudi/pull/17759


   ### Describe the issue this Pull Request addresses
   
   There's a change in behavior for SparkHoodieTableFileIndex since 
0.14.1. The StructType(partitionFields) it returns no longer has the full path 
of nested partition fields, which causes data validation failures. This 
behavior was changed as part of this PR: 
https://github.com/apache/hudi/pull/9863/changes
   
   ### Summary and Changelog
   
   If a table has a nested partition column whose leaf name conflicts with 
another top-level field, the partitionedSchema passed to the new file group 
reader is incorrect. When I tried reverting the previous change, I found 
another issue: we rely on 
`HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema` to get 
requestedSchema in buildReaderWithPartitionValues, but this fails because 
HoodieSchema does not accept dots in names.
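
A minimal sketch of the leaf-name conflict described above, using plain Python dicts rather than Spark or Hudi types. The schema, the column names (`level`, `nested_record`), and both helper functions are hypothetical and only illustrate why dropping the full path can resolve a nested partition column to the wrong field; this is not Hudi's actual code path.

```python
# Hypothetical table schema: a top-level `level` column plus a nested
# `nested_record.level` column that is used as the partition column.
TABLE_SCHEMA = {
    "id": "string",
    "level": "int",
    "nested_record": {"level": "string", "ts": "long"},
}


def resolve_full_path(schema, path):
    """Walk the dotted path through nested structs and return the leaf type."""
    node = schema
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def resolve_leaf_only(schema, path):
    """Keep only the last path segment and look it up among top-level fields."""
    leaf = path.rsplit(".", 1)[-1]
    return schema[leaf]


partition_path = "nested_record.level"

# Resolving with the full path finds the nested field.
full_type = resolve_full_path(TABLE_SCHEMA, partition_path)

# Resolving by leaf name alone silently picks the unrelated top-level field.
leaf_type = resolve_leaf_only(TABLE_SCHEMA, partition_path)

print(full_type, leaf_type)
```

In this toy model the two lookups return different types for the same partition column, which is the kind of mismatch that could surface as a data validation failure.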
   
   Looking for guidance or feedback on how to read nested partition columns 
through the parquet reader.
   
   
   ### Impact
   High
   
   
   ### Risk Level
   High
   
   ### Documentation Update
   
   None.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
