jecsand838 opened a new pull request, #8595: URL: https://github.com/apache/arrow-rs/pull/8595
# Which issue does this PR close?

- Part of #4886
- Stacked on #8584

# Rationale for this change

This PR brings Arrow-Avro round‑trip coverage up to date with modern Arrow types and the latest Avro logical types. In particular, Avro 1.12 adds `timestamp-nanos` and `local-timestamp-nanos`. Enabling these logical types and filling in missing Avro writer encoders for Arrow’s newer *view* and list families allows lossless read/write and simpler pipelines. It also hardens timestamp/time scaling in the writer to avoid silent overflow when converting seconds to milliseconds, surfacing a clear error instead.

# What changes are included in this PR?

* **Nanosecond timestamps**: Introduces a `TimestampNanos(bool)` codec in `arrow-avro` that maps Avro `timestamp-nanos` / `local-timestamp-nanos` to Arrow `Timestamp(Nanosecond, tz)`. The reader/decoder, union field kinds, and Arrow `DataType` mapping are all extended accordingly. Logical type detection is wired through both `logicalType` and the `arrowTimeUnit="nanosecond"` attribute.
* **UUID logical type round‑trip fix**: When reading Avro `logicalType="uuid"` fields, preserve that logical type in Arrow field metadata so writers can round‑trip it back to Avro.
* **Avro writer encoders**: Add the missing array encoders and coverage for Arrow’s `ListView`, `LargeListView`, and `FixedSizeList`, and extend array encoder support to `BinaryView` and `Utf8View`. (See large additions in `writer/encoder.rs`.)
* **Safer time/timestamp scaling**: Guard second to millisecond conversions in `Time32`/`Timestamp` encoders to prevent overflow; encoding now returns a clear `InvalidArgument` error in those cases.
* **Schema utilities**: Add `AvroSchemaOptions` with `null_order` and `strip_metadata` flags so Avro JSON can be built while optionally omitting internal Arrow keys during round‑trip schema generation.
* **Tests & round‑trip coverage**: Add unit tests for nanosecond timestamp decoding (UTC, local, and with nulls) and additional end‑to‑end/round‑trip tests for the updated writer paths.

# Are these changes tested?

Yes.

* New decoder tests validate `Timestamp(Nanosecond, tz)` behavior for UTC and local timestamps and for nullable unions.
* Writer tests validate the nanosecond encoder and exercise an overflow path for second→millisecond conversion that now returns an error.
* Additional round‑trip tests were added alongside the new encoders.

# Are there any user-facing changes?

N/A since `arrow-avro` is not public yet.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
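For readers skimming the "Safer time/timestamp scaling" item, the idea of guarding the second to millisecond conversion can be sketched roughly as below. This is an illustrative sketch only, not the code in this PR: `ScaleOverflow`, `timestamp_secs_to_millis`, and `time32_secs_to_millis` are hypothetical names, and it assumes the scaled values are destined for Avro `timestamp-millis` (a 64-bit long) and `time-millis` (a 32-bit int). The actual encoder reports an Arrow error rather than this stand-in type.

```rust
/// Hypothetical error type used only in this sketch; the real encoder
/// surfaces an Arrow error instead.
#[derive(Debug, PartialEq)]
pub struct ScaleOverflow {
    /// The input value (in seconds) that could not be scaled.
    pub seconds: i64,
}

/// Scale a seconds-resolution timestamp to milliseconds.
/// Assumed target: Avro `timestamp-millis`, stored as a 64-bit long.
pub fn timestamp_secs_to_millis(seconds: i64) -> Result<i64, ScaleOverflow> {
    // `checked_mul` returns `None` on overflow instead of wrapping silently.
    seconds.checked_mul(1_000).ok_or(ScaleOverflow { seconds })
}

/// Scale a seconds-resolution time of day to milliseconds.
/// Assumed target: Avro `time-millis`, stored as a 32-bit int.
pub fn time32_secs_to_millis(seconds: i32) -> Result<i32, ScaleOverflow> {
    seconds.checked_mul(1_000).ok_or(ScaleOverflow {
        seconds: i64::from(seconds),
    })
}
```

The design point is that a plain `seconds * 1_000` would panic in debug builds and wrap in release builds, whereas `checked_mul` lets the caller turn the overflow into an ordinary error value.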
