Giora Simchoni created ARROW-6607:
-------------------------------------

             Summary: Support for set/list columns in python
                 Key: ARROW-6607
                 URL: https://issues.apache.org/jira/browse/ARROW-6607
             Project: Apache Arrow
          Issue Type: Wish
          Components: Python
         Environment: python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10
            Reporter: Giora Simchoni


Hi,

Using python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10...

```python
import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b': [set([1,2]), set([2,3]), set([3,4,5])]})

df.to_feather('test.ft')
```

I get:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 93, in write
    table = Table.from_pandas(df, preserve_index=False)
  File "pyarrow/table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 487, in convert_column
    raise e
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 481, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 191, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert {1, 2} with type set: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')
```

And obviously `df.drop('b', axis=1).to_feather('test.ft')` works.

Questions:
(1) Is it possible to support this kind of set/list column?
(2) Does anyone have an idea on how to deal with this? I *cannot* unnest these 
set/list columns, as this would explode the DataFrame. My only other idea is to 
convert a set such as `{1,2}` into the string `1,2` and parse it after reading 
the file, hoping that it won't be slow.
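
A minimal sketch of the string workaround from question (2), assuming the sets contain only integers. The helper names `encode_int_set` and `decode_int_set` are hypothetical, not part of pandas or pyarrow, and I have not measured how this performs on a large DataFrame:

```python
def encode_int_set(values):
    """Serialize a set of ints to a comma-separated string, e.g. {1, 2} -> '1,2'."""
    # Sort so that the same set always produces the same string.
    return ",".join(str(v) for v in sorted(values))


def decode_int_set(text):
    """Parse a string produced by encode_int_set back into a set of ints."""
    # An empty string stands for the empty set.
    return {int(part) for part in text.split(",")} if text else set()


# Stand-in for the values of column 'b' in the example above.
column = [{1, 2}, {2, 3}, {3, 4, 5}]

encoded = [encode_int_set(s) for s in column]   # plain strings, one per row
decoded = [decode_int_set(t) for t in encoded]  # sets again, after reading back
```

With pandas, the same helpers could be applied through `df['b'].map(encode_int_set)` before `to_feather` and `.map(decode_int_set)` after reading the file back.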

 

Update:

With a list column, the error is different:

```python
import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b': [[1,2], [2,3], [3,4,5]]})

df.to_feather('test.ft')
```

```

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 97, in write
    self.writer.write_array(name, col.data.chunk(0))
  File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherWriter.write_array
  File "pyarrow/error.pxi", line 93, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: list<item: int64>

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
