jorisvandenbossche commented on code in PR #36137:
URL: https://github.com/apache/arrow/pull/36137#discussion_r1250505457
##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3334,7 +3334,7 @@ TEST(ArrowReadWrite, NestedRequiredOuterOptional) {
auto arrow_writer_props = ArrowWriterProperties::Builder();
arrow_writer_props.store_schema();
if (inner_type->id() == ::arrow::Type::UINT32) {
- writer_props.version(ParquetVersion::PARQUET_2_4);
+ writer_props.version(ParquetVersion::PARQUET_2_6);
Review Comment:
The default writer_props created above with `WriterProperties::Builder();`
will now use 2_6 by default, so potentially this could be removed alltogether
##########
python/pyarrow/tests/parquet/test_datetime.py:
##########
@@ -378,28 +379,31 @@ def test_parquet_version_timestamp_differences():
a_us = pa.array(d_us, type=pa.timestamp('us'))
a_ns = pa.array(d_ns, type=pa.timestamp('ns'))
+ all_versions = ['1.0', '2.4', '2.6']
+
names = ['ts:s', 'ts:ms', 'ts:us', 'ts:ns']
table = pa.Table.from_arrays([a_s, a_ms, a_us, a_ns], names)
- # Using Parquet version 1.0, seconds should be coerced to milliseconds
+ # Using Parquet version 1.0 and 2.4, seconds should be coerced to
milliseconds
# and nanoseconds should be coerced to microseconds by default
expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_us], names)
- _check_roundtrip(table, expected)
+ _check_roundtrip(table, expected, version='1.0')
+ _check_roundtrip(table, expected, version='2.4')
- # Using Parquet version 2.0, seconds should be coerced to milliseconds
+ # Using Parquet version 2.6, seconds should be coerced to milliseconds
# and nanoseconds should be retained by default
expected = pa.Table.from_arrays([a_ms, a_ms, a_us, a_ns], names)
_check_roundtrip(table, expected, version='2.6')
- # Using Parquet version 1.0, coercing to milliseconds or microseconds
+ # For either Parquet version coercing to milliseconds or microseconds
# is allowed
expected = pa.Table.from_arrays([a_ms, a_ms, a_ms, a_ms], names)
- _check_roundtrip(table, expected, coerce_timestamps='ms')
+ for ver in all_versions:
+ _check_roundtrip(table, expected, coerce_timestamps='ms', version=ver)
- # Using Parquet version 2.0, coercing to milliseconds or microseconds
- # is allowed
expected = pa.Table.from_arrays([a_us, a_us, a_us, a_us], names)
- _check_roundtrip(table, expected, version='2.6', coerce_timestamps='us')
+ for ver in all_versions:
+ _check_roundtrip(table, expected, version=ver, coerce_timestamps='us')
# TODO: after pyarrow allows coerce_timestamps='ns', tests like the
# following should pass ...
Review Comment:
Could also be left for a follow-up, but we could now allow this (in
`_create_arrow_writer_properties` in _parquet.pyx we would need to update the
cython bindings to pass it through, I think on C++ side it's already
implemented)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]