[ https://issues.apache.org/jira/browse/ARROW-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Florian Jetter updated ARROW-8142:
----------------------------------
Description:
When casting the schema of an empty table from a dictionary-encoded type to the corresponding non-dictionary-encoded type, a fatal error is raised in C++. The error is not handled, so the interpreter shuts down.
This only happens after a Parquet roundtrip.

{code:python}
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq

# Empty table with a single dictionary-encoded (pandas categorical) column
df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
table = pa.Table.from_pandas(df)

# Same field, but with the dictionary's value type (string) instead of the dictionary type
field = table.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable, field.metadata)

# Parquet roundtrip through an in-memory buffer
buf = pa.BufferOutputStream()
pq.write_table(table, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
table = pq.read_table(reader)

# Replace the dictionary field with the plain field and cast; on pyarrow 0.16.0 this aborts
schema = table.schema.remove(0).insert(0, new_field)
new_table = table.cast(schema)
assert new_table.schema == schema
{code}

Output
{code:java}
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
{code}

Tested on pyarrow==0.16.0

> [Python/C++] Casting empty table from after parquet roundtrip causes critical failure
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-8142
>                 URL: https://issues.apache.org/jira/browse/ARROW-8142
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Florian Jetter
>            Priority: Major
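The failed check in the output says that a ChunkedArray cannot be built from an empty list of chunks unless its type is given explicitly. One plausible reading (an assumption, not confirmed in this report) is that the dictionary column comes back from the Parquet roundtrip with zero chunks, and the cast then tries to build the result column without a type. The sketch below shows that rule from Python on a recent pyarrow and then casts a zero-chunk dictionary column to its value type; on pyarrow 0.16.0 that last cast may fail the same way as the reproduction.

{code:python}
import pyarrow as pa

# With zero chunks there is nothing to infer a type from, so construction fails.
# Recent versions raise pa.ArrowInvalid, which is a subclass of ValueError.
try:
    pa.chunked_array([])
    raised = False
except ValueError as exc:
    raised = True
    print("error:", exc)

# Passing the type explicitly allows a column with zero chunks.
empty_dict = pa.chunked_array([], type=pa.dictionary(pa.int32(), pa.string()))
print(empty_dict.num_chunks, empty_dict.type)

# Cast the zero-chunk dictionary column to its value type (string).
as_string = empty_dict.cast(pa.string())
print(as_string.type, len(as_string))
{code}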
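On affected versions, one possible workaround is to skip the cast for empty tables and build an empty table directly from the target schema. This is a minimal sketch under that assumption, not the fix adopted in Arrow; `cast_table` is a hypothetical helper name, and the non-empty path still relies on `Table.cast`.

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def cast_table(table: pa.Table, schema: pa.Schema) -> pa.Table:
    """Hypothetical helper: cast `table` to `schema`, but build empty tables
    directly from the schema instead of casting their (possibly zero-chunk) columns."""
    if table.num_rows == 0:
        # One empty array per field, typed by the target schema
        arrays = [pa.array([], type=f.type) for f in schema]
        return pa.Table.from_arrays(arrays, schema=schema)
    return table.cast(schema)


# Same setup as the reproduction: empty categorical column, Parquet roundtrip
df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
original = pa.Table.from_pandas(df)
field = original.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable, field.metadata)

buf = pa.BufferOutputStream()
pq.write_table(original, buf)
table = pq.read_table(pa.BufferReader(buf.getvalue()))

target = table.schema.remove(0).insert(0, new_field)
result = cast_table(table, target)
print(result.schema)
{code}

Building each column from {{pa.array([], type=...)}} gives it one empty chunk with an explicit type, so no ChunkedArray has to be constructed from an empty vector of chunks.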