[ https://issues.apache.org/jira/browse/ARROW-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-8142:
-----------------------------------------
    Fix Version/s: 0.17.0

> [Python/C++] Casting empty table from after parquet roundtrip causes critical failure
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-8142
>                 URL: https://issues.apache.org/jira/browse/ARROW-8142
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Florian Jetter
>            Priority: Major
>             Fix For: 0.17.0
>
>
> When casting the schema of an empty table from a dict-encoded to a non-dict-encoded
> type, a critical error is raised and not handled, causing the interpreter to
> shut down.
> This only happens after a parquet roundtrip.
>
> {code:python}
> import pyarrow as pa
> import pandas as pd
> import pyarrow.parquet as pq
>
> # Build an empty table with a single dictionary-encoded (categorical) column
> df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
> table = pa.Table.from_pandas(df)
>
> # Target field: same name, nullability and metadata, but the plain value type
> field = table.schema[0]
> new_field = pa.field(field.name, field.type.value_type, field.nullable, field.metadata)
>
> # Round-trip the empty table through an in-memory parquet buffer
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> table = pq.read_table(reader)
>
> # Casting to the non-dictionary schema triggers the failure
> schema = table.schema.remove(0).insert(0, new_field)
> new_table = table.cast(schema)
> assert new_table.schema == schema
> {code}
>
> Output
> {code:java}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)