Will Jones created ARROW-15725:
----------------------------------

             Summary: [Python] Legacy dataset can't roundtrip Int64 with nulls if partitioned
                 Key: ARROW-15725
                 URL: https://issues.apache.org/jira/browse/ARROW-15725
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 7.0.0, 4.0.0
            Reporter: Will Jones

When a dataset is written with partitioning through the legacy datasets implementation, Int64 columns that contain nulls may not round-trip successfully: large values can come back changed.

Simple reproduction:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import tempfile

# 'x' is an int64 column with a null; 'y' is the partition column.
table = pa.table({
    'x': pa.array([None, 7753285016841556620]),
    'y': pa.array(['a', 'b'])
})

# Write a dataset partitioned on 'y', then read it back.
ds_dir = tempfile.mkdtemp()
pq.write_to_dataset(table, ds_dir, partition_cols=['y'])
table_after = ds.dataset(ds_dir).to_table()

print(table['x'])
print(table_after['x'])

# Fails: the non-null value read back differs from the original.
assert table['x'] == table_after['x']
{code}

Output (original column first, then the column read back):

{code}
[
  [
    null,
    7753285016841556620
  ]
]
[
  [
    null
  ],
  [
    7753285016841556992
  ]
]
{code}

The value written was 7753285016841556620, but the value read back is 7753285016841556992.
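The report does not state a cause, but the altered value is exactly what a round trip through a 64-bit float would produce. The sketch below checks that arithmetic using only the standard library. It assumes (this is not confirmed in the report) that the column is held as float64 at some point during the write, as happens for example when an int64 column with nulls is converted to pandas with default settings.

```python
# The two values printed in the reproduction output.
original = 7753285016841556620
read_back = 7753285016841556992

# Assumption: somewhere in the write path the value is stored as float64.
# Python's float is an IEEE 754 double, so float() models that step.
as_float64 = float(original)
round_tripped = int(as_float64)

# Between 2**62 and 2**63 adjacent doubles are 1024 apart, so an integer
# of this size is rounded to the nearest multiple of 1024.
print(round_tripped)               # 7753285016841556992
print(round_tripped == read_back)  # True
print(read_back - original)        # 372
print(read_back % 1024)            # 0
```

If this is the mechanism, any int64 value above 2**53 in a column with nulls could be affected, while columns without nulls would not be.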