thatlittleboy opened a new issue, #12899:
URL: https://github.com/apache/arrow/issues/12899
Consider the following example with pandas:
```python
[ins] In [11]: df = pd.DataFrame({
...: "cat1": pd.Categorical(["a", "b", "a"]),
...: "cat2": pd.cut(range(1, 10, 3), [-1, 5, 10]),
...: })
[ins] In [14]: df['cat2'].cat.categories
Out[14]: IntervalIndex([(-1, 5], (5, 10]], dtype='interval[int64, right]')
```
The categories of the categorical column `cat2` are intervals (an `IntervalIndex`).
I can write the dataframe to a Feather file without issues, but reading it back
raises an `ArrowInvalid` error:
```python
[ins] In [19]: feather.write_feather(df, "test.feather")
[ins] In [20]: feather.read_feather("test.feather")
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
Input In [20], in <cell line: 1>()
----> 1 feather.read_feather("test.feather")
File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:220, in read_feather(source, columns, use_threads, memory_map)
198 """
199 Read a pandas.DataFrame from Feather format. To read as pyarrow.Table use
200 feather.read_table.
(...)
217 df : pandas.DataFrame
218 """
219 _check_pandas_version()
--> 220 return (read_table(
221 source, columns=columns, memory_map=memory_map,
222 use_threads=use_threads).to_pandas(use_threads=use_threads))
File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/feather.py:248, in read_table(source, columns, memory_map, use_threads)
244 reader = _feather.FeatherReader(
245 source, use_memory_map=memory_map, use_threads=use_threads)
247 if columns is None:
--> 248 return reader.read()
250 column_types = [type(column) for column in columns]
251 if all(map(lambda t: t == int, column_types)):
File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/_feather.pyx:88, in pyarrow._feather.FeatherReader.read()
File ~/Desktop/test/venv/lib/python3.9/site-packages/pyarrow/error.pxi:99, in pyarrow.lib.check_status()
ArrowInvalid: Ran out of field metadata, likely malformed
```
The error only occurs with the `cat2` (category[interval]) column. Ordinary
categorical columns such as `cat1` in my example round-trip without issues.
I note that Interval types are listed as supported
([here](https://github.com/apache/arrow/blob/master/docs/source/status.rst)),
so is this a bug, or am I misunderstanding something (and the error is expected)?
## versions
python 3.9.10
pandas==1.4.2
pyarrow==7.0.0
macOS 12.2