[ https://issues.apache.org/jira/browse/ARROW-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Szucs updated ARROW-6607:
-----------------------------------
    Fix Version/s:     (was: 2.0.0)
                   3.0.0

> [Python] Support for set/list columns when converting from Pandas
> -----------------------------------------------------------------
>
>                 Key: ARROW-6607
>                 URL: https://issues.apache.org/jira/browse/ARROW-6607
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Python
>         Environment: python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10
>            Reporter: Giora Simchoni
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Hi,
> Using python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10...
> ```python
> import pandas as pd
> df = pd.DataFrame({'a': [1,2,3], 'b': [set([1,2]), set([2,3]), set([3,4,5])]})
> df.to_feather('test.ft')
> ```
> I get:
> ```
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
>     to_feather(self, fname)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
>     feather.write_feather(df, path)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
>     writer.write(df)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 93, in write
>     table = Table.from_pandas(df, preserve_index=False)
>   File "pyarrow/table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in dataframe_to_arrays
>     for c, f in zip(columns_to_convert, convert_fields)]
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in <listcomp>
>     for c, f in zip(columns_to_convert, convert_fields)]
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 487, in convert_column
>     raise e
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 481, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 191, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ('Could not convert {1, 2} with type set: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')
> ```
> And obviously `df.drop('b', axis=1).to_feather('test.ft')` works.
> Questions:
> (1) Is it possible to support these kinds of set/list columns?
> (2) Does anyone have an idea of how to deal with this? I *cannot* unnest these
> set/list columns, as this would explode the DataFrame. My only other idea is
> to convert a set `{1,2}` into a string `1,2` and parse it after reading the
> file, hoping it won't be slow.
>
> Update:
> With a list column the error is different:
> ```python
> import pandas as pd
> df = pd.DataFrame({'a': [1,2,3], 'b': [[1,2], [2,3], [3,4,5]]})
> df.to_feather('test.ft')
> ```
> ```
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
>     to_feather(self, fname)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
>     feather.write_feather(df, path)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
>     writer.write(df)
>   File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 97, in write
>     self.writer.write_array(name, col.data.chunk(0))
>   File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherWriter.write_array
>   File "pyarrow/error.pxi", line 93, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: list<item: int64>
> ```
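
The string workaround floated in question (2) could be sketched roughly as follows. This is only an illustration under the assumption that every set holds integers; `encode_set` and `decode_set` are hypothetical helper names, not part of pandas or pyarrow, and the sample values mirror column `b` from the report.

```python
def encode_set(values):
    """Serialize a set of ints as a sorted, comma-separated string."""
    return ",".join(str(v) for v in sorted(values))


def decode_set(text):
    """Parse a string produced by encode_set back into a set of ints."""
    # An empty string stands for an empty set.
    return {int(part) for part in text.split(",")} if text else set()


# Hypothetical sample data mirroring column 'b' from the report.
column_b = [{1, 2}, {2, 3}, {3, 4, 5}]

# Plain strings are a column type the writer already handles.
encoded = [encode_set(s) for s in column_b]

# After reading the file back, the strings can be turned into sets again.
decoded = [decode_set(t) for t in encoded]
```

With pandas, the helpers could be applied through `df['b'].map(encode_set)` before `to_feather` and `.map(decode_set)` after reading the file back. Whether this is fast enough on a large DataFrame would need to be measured.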