kbuci opened a new issue, #17923: URL: https://github.com/apache/hudi/issues/17923
### Task Description

**What needs to be done:**
When building the Spark dataset of records, `org.apache.hudi.client.utils.SparkValidatorUtils` should infer the schema if the write produced no base files (because the write was empty).

**Why this task is needed:**
In our internal Hudi 0.x and Spark 3 build, we have seen the pre-commit validation check fail with the error `Column . . . does not exist` when a user performs an empty write that changes no files or records. If a SQL validation query that references specific column names is configured, Spark fails to execute it because it cannot resolve those columns. This happens because `org.apache.hudi.client.utils.SparkValidatorUtils#getRecordsFromPendingCommits`, when called on an "empty" write with no records or base files, creates an empty DataFrame with no schema.

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)
**Related issues:**
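The failure mode can be illustrated with a minimal, Spark-free sketch. It assumes the table schema is known even when the write produced no base files. In Spark terms, this is roughly the difference between a schemaless empty DataFrame and an empty DataFrame created with an explicit `StructType`. All names below (`EmptyWriteSchemaSketch`, `Frame`, `loadRecords`, `countWhereNotNull`) are hypothetical and only model how a validation query resolves columns. They are not Hudi or Spark APIs, and this is not the actual change to `SparkValidatorUtils#getRecordsFromPendingCommits`.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the bug: a "frame" is a column list plus rows.
 * None of these names are real Hudi or Spark APIs.
 */
public class EmptyWriteSchemaSketch {

    /** Minimal stand-in for a DataFrame: column names plus row values. */
    record Frame(List<String> columns, List<List<Object>> rows) {}

    /**
     * Stand-in for building the dataset of records written by a pending commit.
     * With no base files, the current behavior yields a frame with no columns;
     * the proposed behavior falls back to the (assumed known) table schema.
     */
    static Frame loadRecords(List<String> baseFiles, List<String> tableSchema,
                             boolean inferSchemaWhenEmpty) {
        if (baseFiles.isEmpty()) {
            List<String> cols = inferSchemaWhenEmpty ? tableSchema : Collections.emptyList();
            return new Frame(cols, Collections.emptyList());
        }
        // Reading real base files is outside the scope of this sketch.
        throw new UnsupportedOperationException("file reading not modeled");
    }

    /** Mimics a validation query such as: SELECT count(*) FROM t WHERE col IS NOT NULL */
    static long countWhereNotNull(Frame frame, String column) {
        int idx = frame.columns().indexOf(column);
        if (idx < 0) {
            // Analogous to Spark failing to resolve the column name.
            throw new IllegalArgumentException("Column " + column + " does not exist");
        }
        return frame.rows().stream().filter(r -> r.get(idx) != null).count();
    }

    public static void main(String[] args) {
        List<String> tableSchema = List.of("_hoodie_commit_time", "id", "name");
        List<String> noBaseFiles = List.of();

        // Current behavior: empty write -> schemaless frame -> column lookup fails.
        Frame before = loadRecords(noBaseFiles, tableSchema, false);
        try {
            countWhereNotNull(before, "id");
        } catch (IllegalArgumentException e) {
            System.out.println("before: " + e.getMessage()); // before: Column id does not exist
        }

        // Proposed behavior: empty write -> empty frame that still carries the schema.
        Frame after = loadRecords(noBaseFiles, tableSchema, true);
        System.out.println("after: " + countWhereNotNull(after, "id")); // after: 0
    }
}
```

With the schema present, the query resolves its columns and simply matches zero rows, which is the outcome a validator would expect from an empty write.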
