Pavan-249 commented on PR #18893:
URL: https://github.com/apache/hudi/pull/18893#issuecomment-4712472097
## Describe the issue this Pull Request addresses
Part of #14281 (RFC-99 #14263)
## Summary and Changelog
Migrate schema providers in `hudi-utilities` from Avro `Schema` to
`HoodieSchema`.
Updated schema providers:
* `FilebasedSchemaProvider`
  * Replace Avro `Schema` field/return types with `HoodieSchema`
  * Rename overrides to `getSourceHoodieSchema()` / `getTargetHoodieSchema()`
* `HiveSchemaProvider`
  * Replace Avro `Schema` field/return types with `HoodieSchema`
  * Rename overrides to `getSourceHoodieSchema()` / `getTargetHoodieSchema()`
* `SchemaRegistryProvider`
  * Replace Avro `Schema.Parser` usage with `HoodieSchema.Parser`
  * Return `HoodieSchema` directly from source/target schema methods
* `SimpleSchemaProvider`
  * Replace Avro `Schema` field/return types with `HoodieSchema`
  * Add explicit `getSourceHoodieSchema()` override
* `JdbcbasedSchemaProvider`
  * Replace Avro `Schema` field/return types with `HoodieSchema`
  * Return the `HoodieSchema` produced by JDBC schema utilities directly instead of converting back to Avro
* `ProtoClassBasedSchemaProvider`
  * Replace Avro `Schema` field/return types with `HoodieSchema`
  * Use `HoodieSchema.Parser` for parsing the generated proto schema string
* `RowBasedSchemaProvider`
  * Replace Avro `Schema` return type with `HoodieSchema`
  * Return the `HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema(...)` result directly
Also updated:
* `SchemaProviderWithPostProcessor`
  * Add `getSourceHoodieSchema()` and `getTargetHoodieSchema()` paths
  * Keep deprecated Avro `getSourceSchema()` / `getTargetSchema()` bridge methods for compatibility with existing callers
Note: `hudi-sync` does not require changes in this PR.
## Impact
No on-disk formats or serialization formats are changed.
This migration updates the schema provider APIs to use `HoodieSchema` while
preserving Avro compatibility through existing bridge methods where needed.
## Risk Level
Medium.
The migration follows the established RFC-99 pattern. The main
compatibility-sensitive area is `SchemaProviderWithPostProcessor`, where
deprecated Avro bridge methods are intentionally retained while the internal
processing path now uses `HoodieSchema`.
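The bridge arrangement described above can be illustrated with a minimal, self-contained sketch. It does not use the real Hudi or Avro classes: `LegacySchema`, `NewSchema`, `toLegacy()`, `Provider`, `FixedProvider`, and `PostProcessingProvider` are all hypothetical stand-ins invented for illustration, and the actual `SchemaProviderWithPostProcessor` may be structured differently. The point is only the shape: the new-style methods carry the processing path, and the deprecated methods delegate to them.

```java
import java.util.function.UnaryOperator;

final class BridgeSketch {

    // Hypothetical stand-in for Avro's Schema (NOT org.apache.avro.Schema).
    static final class LegacySchema {
        private final String json;
        LegacySchema(String json) { this.json = json; }
        String json() { return json; }
    }

    // Hypothetical stand-in for HoodieSchema; toLegacy() is an assumed conversion.
    static final class NewSchema {
        private final String json;
        NewSchema(String json) { this.json = json; }
        String json() { return json; }
        LegacySchema toLegacy() { return new LegacySchema(json); }
    }

    // Hypothetical provider base: new-style methods are primary,
    // deprecated legacy methods only convert the new-style result.
    abstract static class Provider {
        abstract NewSchema getSourceHoodieSchema();

        NewSchema getTargetHoodieSchema() { return getSourceHoodieSchema(); }

        @Deprecated
        LegacySchema getSourceSchema() { return getSourceHoodieSchema().toLegacy(); }

        @Deprecated
        LegacySchema getTargetSchema() { return getTargetHoodieSchema().toLegacy(); }
    }

    // Trivial provider that always returns the schema it was given.
    static final class FixedProvider extends Provider {
        private final NewSchema schema;
        FixedProvider(NewSchema schema) { this.schema = schema; }
        @Override NewSchema getSourceHoodieSchema() { return schema; }
    }

    // Wrapper that post-processes on the new-style path; the inherited
    // deprecated bridge methods pick up the processed result automatically.
    static final class PostProcessingProvider extends Provider {
        private final Provider delegate;
        private final UnaryOperator<NewSchema> postProcessor;

        PostProcessingProvider(Provider delegate, UnaryOperator<NewSchema> postProcessor) {
            this.delegate = delegate;
            this.postProcessor = postProcessor;
        }

        @Override NewSchema getSourceHoodieSchema() {
            return postProcessor.apply(delegate.getSourceHoodieSchema());
        }

        @Override NewSchema getTargetHoodieSchema() {
            return postProcessor.apply(delegate.getTargetHoodieSchema());
        }
    }

    public static void main(String[] args) {
        Provider wrapped = new PostProcessingProvider(
            new FixedProvider(new NewSchema("schema-v1")),
            s -> new NewSchema(s.json() + "+processed"));
        // New-style callers and legacy callers observe the same processed schema.
        System.out.println(wrapped.getSourceHoodieSchema().json());
        System.out.println(wrapped.getSourceSchema().json());
    }
}
```

In this sketch, keeping the deprecated methods as thin delegates means post-processing logic lives in one place, so legacy callers cannot drift from the new path.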
## Testing
Added unit tests:
* `TestSimpleSchemaProvider`
* `TestRowBasedSchemaProvider`
Validated locally with:
```bash
mvn -pl hudi-utilities -Dtest=TestSimpleSchemaProvider,TestRowBasedSchemaProvider test
```
This passed locally with:
* Checkstyle: 0 violations
* RAT: passed
* Tests run: 2
* Failures: 0
* Errors: 0
* Skipped: 0
## Documentation Update
None.
## Contributor's checklist
* [x] Read through contributor's guide
* [x] Enough context is provided in the sections above
* [x] Adequate tests were added where applicable