[ 
https://issues.apache.org/jira/browse/ARROW-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florian Jetter updated ARROW-8142:
----------------------------------
    Description: 
When casting the schema of an empty table from a dict-encoded to a 
non-dict-encoded type, a critical error is raised and not handled, causing the 
interpreter to shut down.

This only happens after a Parquet roundtrip.

 
{code:python}
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq

df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
table = pa.Table.from_pandas(df)
field = table.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable,
                     field.metadata)

buf = pa.BufferOutputStream()
pq.write_table(table, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
table = pq.read_table(reader)

schema = table.schema.remove(0).insert(0, new_field)
new_table = table.cast(schema)
assert new_table.schema == schema
 {code}
 

Output
{code:java}
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
{code}
 

Tested on pyarrow==0.16.0 

  was:
When casting the schema of an empty table from a dict-encoded to a 
non-dict-encoded type, a critical error is raised and not handled, causing the 
interpreter to shut down.

This only happens after a Parquet roundtrip.

 
{code:python}
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq

df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
table = pa.Table.from_pandas(df)
field = table.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable,
                     field.metadata)

buf = pa.BufferOutputStream()
pq.write_table(table, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
table = pq.read_table(reader)

schema = table.schema.remove(0).insert(0, new_field)
new_table = table.cast(schema)
assert new_table.schema == schema
 {code}
 

Output
{code:java}
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
{code}
 


> [Python/C++] Casting empty table from after parquet roundtrip causes critical 
> failure
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-8142
>                 URL: https://issues.apache.org/jira/browse/ARROW-8142
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Florian Jetter
>            Priority: Major
>
> When casting the schema of an empty table from a dict-encoded to a 
> non-dict-encoded type, a critical error is raised and not handled, causing 
> the interpreter to shut down.
> This only happens after a Parquet roundtrip.
>  
> {code:python}
> import pyarrow as pa
> import pandas as pd
> import pyarrow.parquet as pq
> df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
> table = pa.Table.from_pandas(df)
> field = table.schema[0]
> new_field = pa.field(field.name, field.type.value_type, field.nullable,
>                      field.metadata)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> table = pq.read_table(reader)
> schema = table.schema.remove(0).insert(0, new_field)
> new_table = table.cast(schema)
> assert new_table.schema == schema
>  {code}
>  
> Output
> {code:java}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> {code}
>  
> Tested on pyarrow==0.16.0 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
