[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche closed ARROW-4814.
----------------------------------------
    Resolution: Resolved

I opened ARROW-5286 to also support specifying a struct type (in addition to the 
list type I showed above), and 
https://issues.apache.org/jira/browse/ARROW-5287 to discuss whether we should do 
automatic type inference for tuples, so that your example above would work 
automatically.

Therefore, I am closing this issue, as those two should cover the remaining 
questions.


> [Python] Exception when writing nested columns that are tuples to parquet
> -------------------------------------------------------------------------
>
>                 Key: ARROW-4814
>                 URL: https://issues.apache.org/jira/browse/ARROW-4814
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: 4.20.8-100.fc28.x86_64
>            Reporter: Suvayu Ali
>            Priority: Major
>              Labels: pandas
>         Attachments: df_to_parquet_fail.py, test.csv
>
>
> I get an exception when I try to write a {{pandas.DataFrame}} to a parquet 
> file where one of the columns has tuples in them.  I use tuples here because 
> it allows for easier querying in pandas (see ARROW-3806 for a more detailed 
> description).
> {code}
> Traceback (most recent call last):
>   File "df_to_parquet_fail.py", line 5, in <module>
>     df.to_parquet("test.parquet")  # crashes
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
>     table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
>   File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
>     convert_types)]
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
>     for c, t in zip(columns_to_convert,
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
>     raise e
>   File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
>     return pa.array(col, type=ty, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
> {code}
> The issue may be replicated with the attached script and CSV file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
