yihua opened a new pull request, #18773: URL: https://github.com/apache/hudi/pull/18773
### Describe the issue this Pull Request addresses

Closes #18606

Spark 4.1 pulls in Avro 1.12, which installs default `Conversion`s on `GenericData.get()` for date/time logical types. As a result, generic records returned by `GenericDatumReader` now materialize `java.time.LocalDate` / `java.time.Instant` / `java.time.LocalDateTime` for fields that Avro 1.11.x (Spark 3.5 / 4.0) exposed as raw `Integer` / `Long`.

This breaks Hudi's in-memory comparison and casting on the read path. For example, `MERGE INTO` with a `timestamp` or `date` precombine field fails with a `ClassCastException` in `DefaultHoodieRecordPayload.compareOrderingVal` (`Instant` vs `Long`). Even after the Spark deserializer is fixed, the same mismatch surfaces in `HoodieAvroUtils.getNestedFieldVal` during ordering-value extraction.

### Summary and Changelog

Read-side normalization only: the on-disk byte format is unaffected, and writer/reader cross-compatibility between Spark 3.5 / 4.0 and Spark 4.1 is preserved.

- `hudi-common` `HoodieAvroUtils.convertValueForAvroLogicalTypes`: accepts both the Avro 1.11.x primitive form (`Integer` / `Long`) and the Avro 1.12 `java.time` form (`LocalDate` / `Instant` / `LocalDateTime`), normalizing both to the same canonical value (epoch-day / epoch-millis / epoch-micros). Added Javadoc explaining the Avro 1.12 behavior and why storage bytes are not affected. Added private `extract*` helpers.
- `hudi-common` `HoodieAvroWrapperUtils.unwrapAvroValueWrapper(Object, String)`: fixed three unguarded `(Integer)` / `(Long)` casts on `GenericRecord.get(0)` for `DateWrapper` / `LocalDateWrapper` / `TimestampMicrosWrapper` by routing them through local helpers that accept both encodings.
- `hudi-spark4.1.x` `AvroDeserializer.scala`: restored the `Instant` / `LocalDate` / `LocalDateTime` fallback branches in `(INT, IntegerType)`, `(INT, DateType)`, `(LONG, LongType)`, `(LONG, TimestampType)`, and `(LONG, TimestampNTZType)`.
  Added a block comment explaining the Avro 1.12 vs 1.11.x behavior and noting that the change is read-side only.
- Re-enabled `TestMergeIntoTable.Test Different Type of PreCombineField` on Spark 4.1 (the previous `assume(!gteqSpark4_1, ...)` workaround is no longer needed).

Tests added:

- `TestHoodieAvroUtils.testConvertValueForAvroLogicalTypesCrossAvroVersion`: feeds both encodings for date / timestamp-millis / timestamp-micros / local-timestamp-millis / local-timestamp-micros and asserts identical canonical output.
- `TestHoodieAvroUtils.testGetNestedFieldValOrderingInvariantAcrossAvroVersions`: builds two records (primitive vs `java.time`) and asserts that `compareTo` returns 0, which is exactly the contract `DefaultHoodieRecordPayload.compareOrderingVal` relies on.
- `TestSpark4_1AvroLogicalTypeBytes` (new, in `hudi-spark4.1.x`): asserts that `HoodieSpark4_1AvroSerializer` emits raw `Long` / `Integer` (never `java.time`) into the `GenericRecord`, and that `GenericDatumWriter` output matches an independent zig-zag varlong encoding per the Avro spec. This pins the storage-byte invariant directly, without having to build both Spark profiles.

### Impact

No public API change. No on-disk format change. Bug fix.

### Risk Level

low

The change is scoped to in-memory deserialization and ordering-value extraction in `hudi-common` and `hudi-spark4.1.x`. The write path is untouched: `AvroSerializer` (Spark 4.1) emits only primitive `Long` / `Int` into `GenericRecord`s, and `GenericDatumWriter` encodes those values per the Avro spec, producing bytes identical to what Avro 1.11.x writes. The new write-side test enforces this contract.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
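To make the normalization in the `HoodieAvroUtils.convertValueForAvroLogicalTypes` changelog entry concrete, here is a minimal sketch that accepts either encoding and maps it to the canonical forms named there (epoch-day, epoch-millis, epoch-micros). It is an illustration under stated assumptions, not the code in this PR: `LogicalTypeNormalizer` and its methods are hypothetical names. It assumes a `LocalDateTime` is counted from 1970-01-01T00:00 with no zone applied (equivalently, at UTC offset), which matches how the Avro specification defines `local-timestamp-millis` / `local-timestamp-micros`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/**
 * Hypothetical sketch: normalizes date/time logical-type values that may arrive either as
 * Avro 1.11.x primitives (Integer/Long) or as Avro 1.12 java.time objects
 * (LocalDate/Instant/LocalDateTime) into a single canonical primitive form.
 */
final class LogicalTypeNormalizer {

  private LogicalTypeNormalizer() {}

  /** date: canonical form is days since 1970-01-01. */
  static int toEpochDay(Object value) {
    if (value instanceof Integer) {
      return (Integer) value;
    }
    if (value instanceof LocalDate) {
      return Math.toIntExact(((LocalDate) value).toEpochDay());
    }
    throw unsupported("date", value);
  }

  /** timestamp-millis / local-timestamp-millis: canonical form is millis since the epoch. */
  static long toEpochMillis(Object value) {
    if (value instanceof Long) {
      return (Long) value;
    }
    if (value instanceof Instant) {
      return ((Instant) value).toEpochMilli();
    }
    if (value instanceof LocalDateTime) {
      // Assumption: local timestamps are counted from the epoch as if at UTC offset.
      return ((LocalDateTime) value).toInstant(ZoneOffset.UTC).toEpochMilli();
    }
    throw unsupported("timestamp-millis", value);
  }

  /** timestamp-micros / local-timestamp-micros: canonical form is micros since the epoch. */
  static long toEpochMicros(Object value) {
    if (value instanceof Long) {
      return (Long) value;
    }
    if (value instanceof Instant) {
      return instantToMicros((Instant) value);
    }
    if (value instanceof LocalDateTime) {
      return instantToMicros(((LocalDateTime) value).toInstant(ZoneOffset.UTC));
    }
    throw unsupported("timestamp-micros", value);
  }

  private static long instantToMicros(Instant instant) {
    // getNano() is always in [0, 999_999_999], so this also floors correctly before 1970.
    return Math.addExact(
        Math.multiplyExact(instant.getEpochSecond(), 1_000_000L), instant.getNano() / 1_000L);
  }

  private static IllegalArgumentException unsupported(String logicalType, Object value) {
    return new IllegalArgumentException(
        "Unsupported " + logicalType + " value: "
            + (value == null ? "null" : value.getClass().getName()));
  }
}
```

Because both encodings collapse to one primitive, `Long.compare(toEpochMicros(1_700_000_000_123_456L), toEpochMicros(Instant.ofEpochSecond(1_700_000_000L, 123_456_000L)))` is 0. That is the same kind of cross-encoding equality that `testGetNestedFieldValOrderingInvariantAcrossAvroVersions` checks through `compareTo`.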

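The storage-byte invariant pinned by `TestSpark4_1AvroLogicalTypeBytes` rests on the Avro binary encoding for `int` and `long`, which can be reproduced without Avro itself. The following is a minimal sketch that assumes only the encoding rule from the Avro specification: zig-zag mapping, then a variable-length base-128 integer with the low-order 7-bit group first. `AvroZigZag` and `encodeLong` are hypothetical names, not Hudi or Avro APIs. An in-range `int` zig-zags to the same unsigned value as when widened to `long`, so one routine covers `date` (int) fields and timestamp (long) fields alike.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical helper (not from the PR): encodes a long the way the Avro specification
 * describes for int and long values, i.e. zig-zag mapping followed by a base-128 varint.
 */
final class AvroZigZag {

  private AvroZigZag() {}

  static byte[] encodeLong(long n) {
    // Zig-zag: interleave signed values so small magnitudes stay small when unsigned.
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    long z = (n << 1) ^ (n >> 63);
    ByteArrayOutputStream out = new ByteArrayOutputStream(10);
    // Varint: 7 bits per byte, low-order group first; the high bit means "more bytes follow".
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));
      z >>>= 7; // unsigned shift, because z is treated as an unsigned 64-bit value
    }
    out.write((int) z); // ByteArrayOutputStream.write keeps only the low 8 bits
    return out.toByteArray();
  }
}
```

In a byte-level test of the kind `TestSpark4_1AvroLogicalTypeBytes` describes, the expected bytes from `encodeLong(...)` would be compared with what `GenericDatumWriter` writes for the same field value. The Avro-side call is left out here so the sketch runs without the Avro dependency.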