Karthik created ARROW-15142:
-------------------------------

             Summary: Cannot mix struct and non-struct, non-null values error 
when saving nested types with PyArrow 
                 Key: ARROW-15142
                 URL: https://issues.apache.org/jira/browse/ARROW-15142
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 6.0.1
            Reporter: Karthik


When trying to save a pandas DataFrame with a nested type (a list within a list, 
or a list within a dict) using the pyarrow engine, the following error is raised:

{color:#e75c58}ArrowInvalid{color}: ('cannot mix list and non-list, non-null 
values', 'Conversion failed for column A with type object')

 

Repro:
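{code:python}
import pandas as pd

# Column "A" holds a single cell: a list mixing scalars (24, 27) with a nested list ([1, 1]).
x = pd.DataFrame({"A": [[24, 27, [1, 1]]]})
# This call raises the ArrowInvalid error quoted above.
x.to_parquet('/tmp/a.pqt', engine="pyarrow")
{code}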
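To narrow down which cells trigger the error in a larger frame, a small pure-Python check along these lines may help. It is only a sketch: the helper name find_mixed_cells is hypothetical and not part of pandas or pyarrow, and it assumes the problem is limited to lists that mix nested lists with non-null scalars, as in the repro.

```python
def find_mixed_cells(column):
    """Return the positions of cells whose list mixes nested lists with scalars.

    Hypothetical helper for illustration; not a pandas or pyarrow API.
    """
    mixed = []
    for index, cell in enumerate(column):
        if not isinstance(cell, (list, tuple)):
            continue  # only list-like cells can mix lists and scalars
        has_list = any(isinstance(item, (list, tuple)) for item in cell)
        has_scalar = any(
            item is not None and not isinstance(item, (list, tuple))
            for item in cell
        )
        if has_list and has_scalar:
            mixed.append(index)
    return mixed


# The first cell is the one from the repro; the other two are homogeneous.
column_a = [[24, 27, [1, 1]], [1, 2, 3], [[1], [2]]]
print(find_mixed_cells(column_a))
```

With a DataFrame, the same check could be run on x["A"].tolist().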
From a quick search, this appears to be a known Arrow limitation. 
However, this is a commonly encountered data structure, and 'fastparquet' 
handles it seamlessly. Is there a proposed timeline or plan for fixing this?
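In the meantime, one possible workaround is to serialize each nested cell to a JSON string before writing, so the column has a single string type. The sketch below uses only the standard library; the helper names encode_nested and decode_nested are hypothetical, and it assumes that storing the column as text (and decoding it after reading) is acceptable.

```python
import json


def encode_nested(column):
    """Serialize each cell to a JSON string so the column has one uniform type."""
    return [json.dumps(cell) for cell in column]


def decode_nested(column):
    """Reverse encode_nested after the data has been read back."""
    return [json.loads(cell) for cell in column]


cells = [[24, 27, [1, 1]]]
encoded = encode_nested(cells)
print(encoded)                 # each cell is now a plain string
print(decode_nested(encoded))  # round-trips to the original nested lists
```

With pandas, the same idea would be x["A"] = x["A"].map(json.dumps) before calling to_parquet, and .map(json.loads) after reading. The trade-off is that the nested structure is no longer visible to Parquet readers as a list type.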



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
