It will still be possible to write files using Parquet 2.4 by
explicitly specifying the 2.4 version to the Parquet writer, correct?
If yes, that provides a simple workaround for users who encounter
compatibility issues.

However, we should take care to document this as a potentially breaking
change, and document the workaround in the release notes, release blog,
etc.

Ian

On Thu, Jun 15, 2023 at 12:25 PM Joris Van den Bossche
<[email protected]> wrote:
>
> Hi all,
>
> Bringing up https://github.com/apache/arrow/issues/35746 to the
> mailing list: this issue proposes to bump the default Parquet version
> we use for writing to Parquet files in the C++ library (and in the
> various bindings including pyarrow and R arrow) from the current
> default of "2.4" to "2.6".
>
> In practice, the only change is that the writer will, by default,
> write the Timestamp LogicalType with NANOS unit
> (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp)
> if your data uses timestamp("ns") (currently, such data gets coerced
> to microsecond resolution when writing to Parquet).
>
> In theory this could cause compatibility issues if the files you are
> writing need to be read by other Parquet implementations which don't
> yet support nanoseconds. But the Parquet format 2.6 was released in
> Sept 2018, and parquet-mr added support for it in 2018 as well.
>
> Unless there is pushback on this, we are currently planning to make
> this change for the upcoming Arrow 13.0.0 release.
>
> Best,
> Joris
