[I] SparkValidatorUtils should be able to run precommit validation queries against an empty write [hudi]

via GitHub Fri, 16 Jan 2026 16:22:37 -0800


kbuci opened a new issue, #17923:
URL: https://github.com/apache/hudi/issues/17923


   ### Task Description
   
   **What needs to be done:**
   When creating the Spark dataset of records, 
`org.apache.hudi.client.utils.SparkValidatorUtils` should infer the schema if 
the write has no base files (because the write is empty).
   
   **Why this task is needed:**
   In our internal Hudi 0.x and Spark 3 build, we have seen the precommit 
validation check fail with the error `Column . . . does not exist` when the user 
performs an empty write with no files or records changed.
   When a configured SQL validation query references specific column names, 
Spark can fail to execute the query because it cannot resolve those columns. 
This happens because when 
`org.apache.hudi.client.utils.SparkValidatorUtils#getRecordsFromPendingCommits` 
is called on an "empty" write with no records or base files, it creates an 
empty DataFrame with no schema. 
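
   The failure mode can be reproduced outside Hudi with plain Spark. The sketch below is only an illustration, not Hudi's implementation: the class `EmptyWriteValidationSketch`, the helper `runValidation`, the view name `pending_commit`, and the columns `id` and `name` are all hypothetical. It assumes a Spark 3 dependency on the classpath and compares a schemaless empty DataFrame with an empty DataFrame that carries an explicit schema.

```java
import java.util.Collections;

import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EmptyWriteValidationSketch {

  // Registers the dataset as a temp view and runs the validation query.
  // Returns the number of rows the query produced, or -1 if Spark rejected
  // the query during analysis (for example, an unresolvable column).
  static long runValidation(SparkSession spark, Dataset<Row> pending, String query) {
    pending.createOrReplaceTempView("pending_commit"); // hypothetical view name
    try {
      return spark.sql(query).count();
    } catch (Exception e) {
      if (e instanceof AnalysisException) {
        return -1L;
      }
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("empty-write-validation-sketch")
        .getOrCreate();

    // Hypothetical validation query that references a specific column.
    String query = "SELECT * FROM pending_commit WHERE id IS NULL";

    // Case 1: empty DataFrame with no columns, as described in the issue.
    Dataset<Row> noSchema = spark.emptyDataFrame();
    System.out.println("no schema:   " + runValidation(spark, noSchema, query));

    // Case 2: still zero rows, but with a schema attached (hypothetical
    // columns), so the column reference in the query can be resolved.
    StructType schema = new StructType()
        .add("id", DataTypes.LongType)
        .add("name", DataTypes.StringType);
    Dataset<Row> withSchema =
        spark.createDataFrame(Collections.<Row>emptyList(), schema);
    System.out.println("with schema: " + runValidation(spark, withSchema, query));

    spark.stop();
  }
}
```

   In the first case the query is rejected during analysis because the view has no columns; in the second it runs and returns zero rows. Where the schema for an empty write should come from in Hudi is left open here.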
   
   
   ### Task Type
   
   Code improvement/refactoring
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable)
   **Related issues:**
   NOTE: Use `Relationships` button to add parent/blocking issues after issue 
is created.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
