[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-4814:
-----------------------------------------
    Labels: pandas  (was: pandas parquet)

> [Python] Exception when writing nested columns that are tuples to parquet
> -------------------------------------------------------------------------
>
>                 Key: ARROW-4814
>                 URL: https://issues.apache.org/jira/browse/ARROW-4814
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: 4.20.8-100.fc28.x86_64
>            Reporter: Suvayu Ali
>            Priority: Major
>              Labels: pandas
>         Attachments: df_to_parquet_fail.py, test.csv
>
>
> I get an exception when I try to write a {{pandas.DataFrame}} to a Parquet 
> file where one of the columns contains tuples.  I use tuples here because 
> they allow for easier querying in pandas (see ARROW-3806 for a more detailed 
> description).
> {code}
> Traceback (most recent call last):
>   File "df_to_parquet_fail.py", line 5, in <module>
>     df.to_parquet("test.parquet")  # crashes
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
>     table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
>   File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
>     convert_types)]
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
>     for c, t in zip(columns_to_convert,
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
>     raise e
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
>     return pa.array(col, type=ty, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
> {code}
> The issue may be replicated with the attached script and CSV file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
