Hi all,

Bringing up https://github.com/apache/arrow/issues/35746 to the
mailing list: this issue proposes to bump the default Parquet format
version used when writing Parquet files in the C++ library (and in
the various bindings, including pyarrow and R arrow) from the current
default of "2.4" to "2.6".

In practice, the only change is that the writer will, by default,
write the Timestamp LogicalType with NANOS unit
(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp)
if your data uses timestamp("ns"). Currently, such data gets coerced
to microsecond resolution when writing to Parquet.

In theory this could cause compatibility issues if the files you are
writing need to be read by other Parquet implementations which don't
yet support nanoseconds. However, Parquet format 2.6 was released in
Sept 2018, and parquet-mr added support for it in 2018 as well.

Unless there is pushback on this, we are currently planning to make
this change for the upcoming Arrow 13.0.0 release.

Best,
Joris
