Pavan-249 commented on PR #18893:
URL: https://github.com/apache/hudi/pull/18893#issuecomment-4712472097

   ## Describe the issue this Pull Request addresses
   
   Part of #14281 (RFC-99 #14263)
   
   ## Summary and Changelog
   
   Migrate schema providers in `hudi-utilities` from Avro `Schema` to `HoodieSchema`.
   
   Updated schema providers:
   
   * `FilebasedSchemaProvider`
   
     * Replace Avro `Schema` field/return types with `HoodieSchema`
     * Rename overrides to `getSourceHoodieSchema()` / `getTargetHoodieSchema()`
   
   * `HiveSchemaProvider`
   
     * Replace Avro `Schema` field/return types with `HoodieSchema`
     * Rename overrides to `getSourceHoodieSchema()` / `getTargetHoodieSchema()`
   
   * `SchemaRegistryProvider`
   
     * Replace Avro `Schema.Parser` usage with `HoodieSchema.Parser`
     * Return `HoodieSchema` directly from source/target schema methods
   
   * `SimpleSchemaProvider`
   
     * Replace Avro `Schema` field/return types with `HoodieSchema`
     * Add explicit `getSourceHoodieSchema()` override
   
   * `JdbcbasedSchemaProvider`
   
     * Replace Avro `Schema` field/return types with `HoodieSchema`
     * Return the `HoodieSchema` produced by JDBC schema utilities directly instead of converting back to Avro
   
   * `ProtoClassBasedSchemaProvider`
   
     * Replace Avro `Schema` field/return types with `HoodieSchema`
     * Use `HoodieSchema.Parser` for parsing the generated proto schema string
   
   * `RowBasedSchemaProvider`
   
     * Replace Avro `Schema` return type with `HoodieSchema`
     * Return the `HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema(...)` result directly
   
   Also updated:
   
   * `SchemaProviderWithPostProcessor`
   
     * Add `getSourceHoodieSchema()` and `getTargetHoodieSchema()` paths
     * Keep deprecated Avro `getSourceSchema()` / `getTargetSchema()` bridge methods for compatibility with existing callers
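   
   For reviewers unfamiliar with the bridge pattern described for `SchemaProviderWithPostProcessor`, here is a minimal, self-contained sketch. It is not the code in this PR: `LegacySchema`, `NewSchema`, `SketchSchemaProvider`, `FixedSchemaProvider`, `PostProcessingProvider`, and `toLegacy()` are hypothetical stand-ins for Avro `Schema`, `HoodieSchema`, and the provider classes, and the default of target falling back to source is an assumption made for the sketch. The point is only that the new methods carry the logic and the deprecated methods delegate to them.
   
   ```java
   import java.util.Objects;
   import java.util.function.UnaryOperator;
   
   // Hypothetical stand-in for Avro's Schema; NOT the real class.
   final class LegacySchema {
     final String json;
     LegacySchema(String json) { this.json = Objects.requireNonNull(json); }
   }
   
   // Hypothetical stand-in for HoodieSchema; NOT the real class.
   final class NewSchema {
     final String json;
     NewSchema(String json) { this.json = Objects.requireNonNull(json); }
     // Assumed conversion hook; the real conversion API may differ.
     LegacySchema toLegacy() { return new LegacySchema(json); }
   }
   
   abstract class SketchSchemaProvider {
     // New-style path: subclasses implement this one.
     abstract NewSchema getSourceHoodieSchema();
   
     // Assumption for this sketch: target defaults to source.
     NewSchema getTargetHoodieSchema() { return getSourceHoodieSchema(); }
   
     // Deprecated bridges: existing callers keep compiling, logic lives above.
     @Deprecated
     final LegacySchema getSourceSchema() { return getSourceHoodieSchema().toLegacy(); }
   
     @Deprecated
     final LegacySchema getTargetSchema() { return getTargetHoodieSchema().toLegacy(); }
   }
   
   // Provider that returns a fixed schema.
   final class FixedSchemaProvider extends SketchSchemaProvider {
     private final NewSchema schema;
     FixedSchemaProvider(NewSchema schema) { this.schema = Objects.requireNonNull(schema); }
     @Override NewSchema getSourceHoodieSchema() { return schema; }
   }
   
   // Wrapper that post-processes whatever the delegate returns.
   final class PostProcessingProvider extends SketchSchemaProvider {
     private final SketchSchemaProvider delegate;
     private final UnaryOperator<NewSchema> postProcessor;
   
     PostProcessingProvider(SketchSchemaProvider delegate, UnaryOperator<NewSchema> postProcessor) {
       this.delegate = Objects.requireNonNull(delegate);
       this.postProcessor = Objects.requireNonNull(postProcessor);
     }
   
     @Override NewSchema getSourceHoodieSchema() {
       return postProcessor.apply(delegate.getSourceHoodieSchema());
     }
   
     @Override NewSchema getTargetHoodieSchema() {
       return postProcessor.apply(delegate.getTargetHoodieSchema());
     }
   }
   
   final class SchemaBridgeSketch {
     public static void main(String[] args) {
       SketchSchemaProvider base =
           new FixedSchemaProvider(new NewSchema("  {\"type\":\"string\"}  "));
       // Illustrative post-processor: trims surrounding whitespace.
       SketchSchemaProvider wrapped =
           new PostProcessingProvider(base, s -> new NewSchema(s.json.trim()));
       // A legacy caller still goes through the deprecated bridge.
       System.out.println(wrapped.getTargetSchema().json);
     }
   }
   ```
   
   Because the bridges are `final` and only delegate, a legacy caller and a migrated caller always observe the same post-processed schema in this sketch.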
   
   Note: `hudi-sync` does not require changes in this PR.
   
   ## Impact
   
   No on-disk formats or serialization formats are changed.
   
   This migration updates the schema provider APIs to use `HoodieSchema` while preserving Avro compatibility through existing bridge methods where needed.
   
   ## Risk Level
   
   Medium.
   
   The migration follows the established RFC-99 pattern. The main compatibility-sensitive area is `SchemaProviderWithPostProcessor`, where deprecated Avro bridge methods are intentionally retained while the internal processing path now uses `HoodieSchema`.
   
   ## Testing
   
   Added unit tests:
   
   * `TestSimpleSchemaProvider`
   * `TestRowBasedSchemaProvider`
   
   Validated locally with:
   
   ```bash
   mvn -pl hudi-utilities -Dtest=TestSimpleSchemaProvider,TestRowBasedSchemaProvider test
   ```
   
   This passed locally with:
   
   * Checkstyle: 0 violations
   * RAT: passed
   * Tests run: 2
   * Failures: 0
   * Errors: 0
   * Skipped: 0
   
   ## Documentation Update
   
   None.
   
   ## Contributor's checklist
   
   * [x] Read through contributor's guide
   * [x] Enough context is provided in the sections above
   * [x] Adequate tests were added where applicable
   