jecsand838 opened a new pull request, #8595:
URL: https://github.com/apache/arrow-rs/pull/8595

   # Which issue does this PR close?
   
   - Part of #4886 
   - Stacked on #8584 
   
   # Rationale for this change
   
   This PR brings Arrow-Avro round‑trip coverage up to date with modern Arrow 
types and the latest Avro logical types. In particular, Avro 1.12 adds 
`timestamp-nanos` and `local-timestamp-nanos`. Enabling these logical types and 
filling in missing Avro writer encoders for Arrow’s newer *view* and list 
families allows lossless read/write and simpler pipelines.
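
   As a rough illustration of the logical-type mapping described above, the sketch below resolves an Avro timestamp logical type name to a simplified Arrow-style timestamp description. The names `TimeUnit`, `ArrowTimestamp`, and `map_timestamp_logical_type` are hypothetical stand-ins, not the `arrow-avro` API, and representing UTC-adjusted values with a `"+00:00"` zone is an assumption made for this example.

```rust
/// Simplified stand-in for Arrow's time unit (hypothetical, not the arrow crate type).
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Simplified stand-in for Arrow `Timestamp(unit, tz)`.
#[derive(Debug, PartialEq)]
struct ArrowTimestamp {
    unit: TimeUnit,
    /// `Some("+00:00")` for UTC-adjusted Avro types, `None` for local ones (assumption).
    tz: Option<&'static str>,
}

/// Map an Avro `logicalType` string to a timestamp description.
/// Returns `None` for logical types that are not timestamps.
fn map_timestamp_logical_type(logical: &str) -> Option<ArrowTimestamp> {
    let (unit, is_utc) = match logical {
        "timestamp-millis" => (TimeUnit::Millisecond, true),
        "timestamp-micros" => (TimeUnit::Microsecond, true),
        "timestamp-nanos" => (TimeUnit::Nanosecond, true),
        "local-timestamp-millis" => (TimeUnit::Millisecond, false),
        "local-timestamp-micros" => (TimeUnit::Microsecond, false),
        "local-timestamp-nanos" => (TimeUnit::Nanosecond, false),
        _ => return None,
    };
    Some(ArrowTimestamp {
        unit,
        tz: is_utc.then_some("+00:00"),
    })
}
```

Both nanosecond variants share one unit and differ only in whether a time zone is attached, which is the distinction the `bool` in a `TimestampNanos(bool)`-style codec would carry.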
   
   It also hardens timestamp/time scaling in the writer to avoid silent 
overflow when converting seconds to milliseconds, surfacing a clear error 
instead.
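
   A minimal sketch of the kind of guard described here, using Rust's checked arithmetic. The `EncodeError` enum and the two helper functions are hypothetical and only mirror the idea of returning an invalid-argument error instead of wrapping silently; the actual encoder code in the PR may be structured differently.

```rust
/// Hypothetical error type standing in for the writer's error.
#[derive(Debug, PartialEq)]
enum EncodeError {
    InvalidArgument(String),
}

/// Scale a `Time32(Second)`-style value to milliseconds, failing on overflow.
fn time32_secs_to_millis(secs: i32) -> Result<i32, EncodeError> {
    secs.checked_mul(1_000).ok_or_else(|| {
        EncodeError::InvalidArgument(format!(
            "time32 value {secs}s overflows i32 when scaled to milliseconds"
        ))
    })
}

/// Scale a `Timestamp(Second)`-style value to milliseconds, failing on overflow.
fn timestamp_secs_to_millis(secs: i64) -> Result<i64, EncodeError> {
    secs.checked_mul(1_000).ok_or_else(|| {
        EncodeError::InvalidArgument(format!(
            "timestamp value {secs}s overflows i64 when scaled to milliseconds"
        ))
    })
}
```

With plain `*`, an out-of-range input would panic in debug builds and wrap in release builds; `checked_mul` makes the failure explicit in both.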
   
   # What changes are included in this PR?
   
   * **Nanosecond timestamps**: Introduces a `TimestampNanos(bool)` codec in 
`arrow-avro` that maps Avro `timestamp-nanos` / `local-timestamp-nanos` to 
Arrow `Timestamp(Nanosecond, tz)`. The reader/decoder, union field kinds, and 
Arrow `DataType` mapping are all extended accordingly. Logical type detection 
is wired through both `logicalType` and the `arrowTimeUnit="nanosecond"` 
attribute.
   * **UUID logical type round‑trip fix**: When reading Avro 
`logicalType="uuid"` fields, preserve that logical type in Arrow field metadata 
so writers can round‑trip it back to Avro.
   * **Avro writer encoders**: Add the missing array encoders and coverage for 
Arrow’s `ListView`, `LargeListView`, and `FixedSizeList`, and extend array 
encoder support to `BinaryView` and `Utf8View`. (See large additions in 
`writer/encoder.rs`.)
   * **Safer time/timestamp scaling**: Guard second-to-millisecond conversions 
in the `Time32`/`Timestamp` encoders against overflow; encoding now returns a 
clear `InvalidArgument` error in those cases instead of silently overflowing.
   * **Schema utilities**: Add `AvroSchemaOptions` with `null_order` and 
`strip_metadata` flags so Avro JSON can be built while optionally omitting 
internal Arrow keys during round‑trip schema generation.
   * **Tests & round‑trip coverage**: Add unit tests for nanosecond timestamp 
decoding (UTC, local, and with nulls) and additional end‑to‑end/round‑trip 
tests for the updated writer paths.
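
   To make the list-view encoder item above more concrete, here is a hedged sketch of how one `ListView` row could be written as an Avro array. It follows the Avro binary encoding (zigzag varint longs; an array is a sequence of counted blocks ended by a zero count) and the `ListView` layout (each row is `values[offset..offset + size]`). The function names are hypothetical and this is not the code in `writer/encoder.rs`; it handles only `i64` items and emits a single block per row.

```rust
/// Append an Avro `long` using zigzag + variable-length encoding.
fn write_long(out: &mut Vec<u8>, v: i64) {
    // Zigzag maps small negative and positive values to small unsigned ones.
    let mut z = ((v << 1) ^ (v >> 63)) as u64;
    while z >= 0x80 {
        out.push(((z & 0x7F) as u8) | 0x80); // low 7 bits, continuation bit set
        z >>= 7;
    }
    out.push(z as u8);
}

/// Encode one list-view row as an Avro array of longs.
/// `offset` and `size` come from the row's entries in the offsets and sizes buffers.
fn encode_list_view_row(out: &mut Vec<u8>, values: &[i64], offset: usize, size: usize) {
    if size > 0 {
        write_long(out, size as i64); // block item count
        for v in &values[offset..offset + size] {
            write_long(out, *v);
        }
    }
    write_long(out, 0); // end-of-array marker
}

/// Encode every row of a list view given parallel `offsets` and `sizes` slices.
fn encode_list_view(values: &[i64], offsets: &[usize], sizes: &[usize]) -> Vec<u8> {
    let mut out = Vec::new();
    for (&offset, &size) in offsets.iter().zip(sizes) {
        encode_list_view_row(&mut out, values, offset, size);
    }
    out
}
```

Because each row carries its own offset and size, rows may appear out of order or overlap in the values buffer, so the sketch slices per row rather than walking a monotonically increasing offsets buffer as a plain `List` encoder could.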
   
   # Are these changes tested?
   
   Yes.
   
   * New decoder tests validate `Timestamp(Nanosecond, tz)` behavior for UTC 
and local timestamps and for nullable unions.
   * Writer tests validate the nanosecond encoder and exercise an overflow path 
for second→millisecond conversion that now returns an error.
   * Additional round‑trip tests were added alongside the new encoders. 
   
   # Are there any user-facing changes?
   
   None, since `arrow-avro` is not public yet.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
