lokeshj1703 opened a new pull request, #18977:
URL: https://github.com/apache/hudi/pull/18977
### Describe the issue this Pull Request addresses
Closes #18668.
`org.apache.hudi.DefaultSource` has two read-side overloads of
`createRelation`:
- The 2-arg overload `createRelation(sqlContext, parameters)` wraps its body
in a `try { … } catch { case _: HoodieSchemaNotFoundException => new
EmptyRelation(…) }`. This catch was added in [HUDI-7147 /
#10689](https://github.com/apache/hudi/pull/10689) so that schema-less Hudi
tables (no commits / commit metadata deleted / legacy schema-less layout) do
not explode at query analysis time.
- The 3-arg overload `createRelation(sqlContext, optParams, schema)` calls
`DefaultSource.createRelation(sqlContext, metaClient, schema, options.toMap)`
directly, **without** the same catch.
Spark's `DataSource.resolveRelation()` chooses the overload based on whether
a user-supplied schema is present:
```scala
case (dataSource: SchemaRelationProvider, Some(schema)) =>
  dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions, schema)
case (dataSource: RelationProvider, _) =>
  dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
```
So any read path that supplies a schema (e.g.
`spark.read.schema(s).format("hudi").load(path)`, or HMS-catalog resolution
that already knows the schema) bypasses the 2-arg catch and surfaces
`HoodieSchemaNotFoundException` directly.
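The bypass can be reproduced in a dependency-free toy model. This is only a sketch: `DispatchSketch`, `SchemaNotFound`, `Relation`, `loadTable`, and `resolve` are hypothetical stand-ins rather than Spark or Hudi types, and `loadTable` is assumed to always fail, as it would for a schema-less table.

```scala
import scala.util.Try

object DispatchSketch {
  // Hypothetical stand-ins; none of these are the real Spark or Hudi types.
  final class SchemaNotFound(msg: String) extends RuntimeException(msg)

  sealed trait Relation
  case object EmptyRelation extends Relation
  final case class TableRelation(columns: Seq[String]) extends Relation

  // Assumption: models a schema-less table, so resolving the table always fails.
  private def loadTable(userSchema: Option[Seq[String]]): Relation =
    throw new SchemaNotFound("no table schema could be resolved")

  // 2-arg shape: guarded, so a missing schema degrades to an empty relation.
  def createRelation(options: Map[String, String]): Relation =
    try loadTable(None)
    catch { case _: SchemaNotFound => EmptyRelation }

  // 3-arg shape before the fix: unguarded, so the exception escapes.
  def createRelation(options: Map[String, String], schema: Seq[String]): Relation =
    loadTable(Some(schema))

  // Mirrors the dispatch: a user-supplied schema selects the 3-arg overload.
  def resolve(options: Map[String, String], userSchema: Option[Seq[String]]): Relation =
    userSchema match {
      case Some(schema) => createRelation(options, schema)
      case None         => createRelation(options)
    }

  def main(args: Array[String]): Unit = {
    val opts = Map("path" -> "/tmp/hypothetical_table")
    println(resolve(opts, None))                         // guarded path
    println(Try(resolve(opts, Some(Seq("id", "name"))))) // unguarded path
  }
}
```

In this model the read without a schema yields the empty relation, while the read with a schema ends in a failure, which is the asymmetry described above.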
### Summary and Changelog
- **`DefaultSource.scala` (3-arg `createRelation`)**: mirror the existing
2-arg catch so `HoodieSchemaNotFoundException` resolves to `EmptyRelation` on
this overload too. Adds an inline comment explaining why both overloads need
the same catch.
- **`TestCOWDataSource.testReadOfAnEmptyTableWithUserSuppliedSchema`**:
sibling of the existing `testReadOfAnEmptyTable` that asserts
`spark.read.schema(userSchema).format("hudi").load(basePath).count() == 0`
instead of throwing on a schema-less table.
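The shape of the `DefaultSource.scala` change can be illustrated with a small standalone sketch. It is not the actual patch: `MirroredCatchSketch`, `SchemaNotFound`, `loadTable`, and the `tableHasSchema` flag are hypothetical names introduced for illustration, and the empty relation is modeled as a plain object because the PR text elides its constructor arguments.

```scala
object MirroredCatchSketch {
  // Hypothetical stand-ins; none of these are the real Spark or Hudi types.
  final class SchemaNotFound(msg: String) extends RuntimeException(msg)

  sealed trait Relation
  case object EmptyRelation extends Relation
  final case class TableRelation(schema: Seq[String]) extends Relation

  // Hypothetical loader: fails only when the table schema cannot be resolved.
  private def loadTable(tableHasSchema: Boolean, schema: Seq[String]): Relation =
    if (tableHasSchema) TableRelation(schema)
    else throw new SchemaNotFound("no table schema could be resolved")

  // Schema-supplied overload with the mirrored catch: the failure case
  // becomes an empty relation, and the success case is left untouched.
  def createRelation(tableHasSchema: Boolean, userSchema: Seq[String]): Relation =
    try loadTable(tableHasSchema, userSchema)
    catch { case _: SchemaNotFound => EmptyRelation }

  def main(args: Array[String]): Unit = {
    println(createRelation(tableHasSchema = false, Seq("id"))) // schema-less table
    println(createRelation(tableHasSchema = true, Seq("id")))  // ordinary table
  }
}
```

Catching only the one exception type keeps the change narrow: any other failure still propagates, and a table whose schema resolves follows the same path as before.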
### Impact
User-facing: a Hudi table whose schema is unresolvable will now return an
empty relation when queried with a user-supplied schema, matching the existing
no-schema-supplied behavior. No previously successful path changes behavior;
this only converts a previously thrown exception into an empty result under
the exact same failure condition.
### Risk Level
low — minimal scope (one try/catch mirroring existing logic), covered by a
new unit test that mirrors an existing one.
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
- [x] CI passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)