Suvayu Ali created ARROW-4814:
---------------------------------

             Summary: [Python] Exception when writing nested columns that are tuples to parquet
                 Key: ARROW-4814
                 URL: https://issues.apache.org/jira/browse/ARROW-4814
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: 4.20.8-100.fc28.x86_64
            Reporter: Suvayu Ali
         Attachments: df_to_parquet_fail.py, test.csv

I get an exception when I try to write a {{pandas.DataFrame}} to a parquet file where one of the columns contains tuples.  I use tuples because they allow for easier querying in pandas (see ARROW-3806 for a more detailed description).
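
For context, the kind of query that tuples enable is sketched below (a minimal example, not the exact code from ARROW-3806; the column name {{ALTS}} is taken from the traceback):

{code}
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "ALTS": [("G",), ("A", "T")]})

# Tuples are hashable, so membership queries work directly,
# which list-valued cells would not allow:
hits = df[df["ALTS"].isin([("G",)])]

# Containment queries are also straightforward:
with_g = df[df["ALTS"].apply(lambda alts: "G" in alts)]
{code}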

{code}
Traceback (most recent call last):
  File "df_to_parquet_fail.py", line 5, in <module>
    df.to_parquet("test.parquet")  # crashes
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", 
line 2203, in to_parquet                                                        
                               
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", 
line 252, in to_parquet                                                         
                               
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", 
line 113, in write                                                              
                               
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
  File 
"/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 
431, in dataframe_to_arrays                                                     
                      
    convert_types)]
  File 
"/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 
430, in <listcomp>                                                              
                      
    for c, t in zip(columns_to_convert,
  File 
"/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 
426, in convert_column                                                          
                      
    raise e
  File 
"/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 
420, in convert_column                                                          
                      
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not 
recognize Python value type when inferring an Arrow data type", 'Conversion 
failed for column ALTS with type object')
{code}

The issue may be reproduced with the attached script and CSV file.
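
For anyone without the attachments handy, a minimal sketch along these lines should trigger the same failure (the column name {{ALTS}} and the value {{('G',)}} are taken from the traceback; the attached df_to_parquet_fail.py and test.csv may differ in detail):

{code}
import pandas as pd

# Hypothetical stand-in for the attached test.csv: one column holds tuples.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "ALTS": [("G",), ("A", "T"), ("C",)]})

df.to_parquet("test.parquet")  # raises pyarrow.lib.ArrowInvalid with pyarrow 0.12.1
{code}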


