Suvayu Ali created ARROW-4814:
---------------------------------

Summary: [Python] Exception when writing nested columns that are tuples to parquet
Key: ARROW-4814
URL: https://issues.apache.org/jira/browse/ARROW-4814
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.12.1
Environment: 4.20.8-100.fc28.x86_64
Reporter: Suvayu Ali
Attachments: df_to_parquet_fail.py, test.csv

I get an exception when I try to write a {{pandas.DataFrame}} to a parquet file where one of the columns contains tuples. I use tuples here because they allow for easier querying in pandas (see ARROW-3806 for a more detailed description).

{code}
Traceback (most recent call last):
  File "df_to_parquet_fail.py", line 5, in <module>
    df.to_parquet("test.parquet") # crashes
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/io/parquet.py", line 113, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1141, in pyarrow.lib.Table.from_pandas
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 431, in dataframe_to_arrays
    convert_types)]
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 430, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 426, in convert_column
    raise e
  File "/home/user/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 420, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 176, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert ('G',) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column ALTS with type object')
{code}

The issue may be reproduced with the attached script and CSV file.
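
The error message says that type inference does not recognize the Python {{tuple}} value {{('G',)}} in column {{ALTS}}. One possible workaround, sketched below, is to turn the tuples into lists before writing and back into tuples after reading. This is an assumption on my part and not taken from the attached script: the helper names ({{tuples_to_lists}}, {{lists_to_tuples}}, {{roundtrip_demo}}), the {{POS}} column, and the sample data are made up for illustration. Only the {{ALTS}} column name and the {{('G',)}} value come from the traceback.

```python
from typing import Any, Iterable, List


def tuples_to_lists(values: Iterable[Any]) -> List[Any]:
    """Replace each tuple with a list; leave other values (e.g. None) as-is."""
    return [list(v) if isinstance(v, tuple) else v for v in values]


def lists_to_tuples(values: Iterable[Any]) -> List[Any]:
    """Inverse step after reading: turn list-like cells back into tuples."""
    out = []
    for v in values:
        # Cells read back from parquet may be numpy arrays rather than lists;
        # anything exposing tolist() is normalized first.
        if hasattr(v, "tolist"):
            v = v.tolist()
        out.append(tuple(v) if isinstance(v, list) else v)
    return out


def roundtrip_demo(path: str = "test.parquet"):
    """Hypothetical round trip; requires pandas and pyarrow to be installed."""
    import pandas as pd

    # Made-up data shaped like the failing column from the traceback.
    df = pd.DataFrame({"POS": [100, 200], "ALTS": [("G",), ("A", "T")]})

    # Write with lists instead of tuples.
    df["ALTS"] = pd.Series(tuples_to_lists(df["ALTS"]), index=df.index, dtype=object)
    df.to_parquet(path)

    # Read back and restore tuples for querying in pandas.
    back = pd.read_parquet(path)
    back["ALTS"] = pd.Series(lists_to_tuples(back["ALTS"]), index=back.index, dtype=object)
    return back
```

The two helpers work on plain Python sequences, so they can be checked without pandas or pyarrow. {{roundtrip_demo()}} is not called automatically because it writes a file to disk. Whether the conversion back to tuples is needed depends on how the column is queried afterwards.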