[ https://issues.apache.org/jira/browse/ARROW-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-6339:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python][C++] Rowgroup statistics for pd.NaT array ill defined
> --------------------------------------------------------------
>
>                 Key: ARROW-6339
>                 URL: https://issues.apache.org/jira/browse/ARROW-6339
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.1
>            Reporter: Florian Jetter
>            Priority: Minor
>              Labels: pull-request-available
>
> When initialising an array with only NaT values, the row group statistics are
> corrupt: reading them either returns random values or raises an integer
> out-of-bounds exception.
> {code:python}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
> buf = pa.BufferOutputStream()
> pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
> buf = io.BytesIO(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(buf)
> # Asserting behaviour is difficult since it is random and the state is ill defined.
> # After a few iterations an exception is raised.
> while True:
>     parquet_file.metadata.row_group(0).column(0).statistics.max
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)