atherkevin opened a new issue, #15133: URL: https://github.com/apache/arrow/issues/15133
### Describe the bug, including details regarding any error messages, version, and platform.

Hello, I'm running into an issue using the pandas `to_feather()` call with pyarrow. Pyarrow seems to ignore the data types in the dataframe and instead tries to convert objects to its own data type, but then fails on the conversion if the column has mixed data types.

I'm running Python 3.11.1, pandas 1.5.2, numpy 1.24.1, and pyarrow 10.0.1, and have reproduced this in a Docker image of python 3.11.1-slim and on a Mac M1.

    import pandas as pd

    df = pd.DataFrame({
        'a': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
        'b': ['c', 'c', None, 'd', 'd', 'd', 'e'],
        'c': [1.5, 2.0, 3.5, 5.0, 8.0, 10.0, 'a string'],  # mixed floats and a string
        'd': [1, 2, 3, 1, 2, 3, 1],
        'a1': [1, 1, 1, 1, 1, 1, 1],  # junk columns to test return df
        'a2': [2, 2, 2, 2, 2, 2, 2],  # junk columns to test return df
    })
    df.to_feather('file')

Pandas handles this fine, and will say the dtype of column 'c' is either object or string. When attempting to save as a Feather file, the following stack trace happens:

> Error
> Traceback (most recent call last):
>   File "/Users/kevin/PycharmProjects/analyzethatv1/tests/test_pyarrow_file_failure.py", line 16, in test_error_conversion
>     df.to_feather('file')
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
>     return func(*args, **kwargs)
>            ^^^^^^^^^^^^^^^^^^^^^
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/core/frame.py", line 2794, in to_feather
>     to_feather(self, path, **kwargs)
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/io/feather_format.py", line 93, in to_feather
>     feather.write_feather(df, handles.handle, **kwargs)
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/feather.py", line 164, in write_feather
>     table = Table.from_pandas(df, preserve_index=preserve_index)
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
>     arrays = [convert_column(c, f)
>              ^^^^^^^^^^^^^^^^^^^^^
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
>     arrays = [convert_column(c, f)
>               ^^^^^^^^^^^^^^^^^^^^
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
>     raise e
>   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
>     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: ("Could not convert 'a string' with type str: tried to convert to double", 'Conversion failed for column c with type object')

Questions: why would this not obey the pandas dtype? I can appreciate attempting the conversion, but if the conversion fails, why not maintain the pandas dtype for saving? Is there some option I'm missing?

### Component(s)

Python
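The mixed-type situation described in this report can be checked for before calling `to_feather()`. The following is a minimal sketch, assuming the columns are available as a plain dict of lists; `mixed_type_columns` is a hypothetical helper written for illustration and is not part of pandas or pyarrow. It only reports columns that hold more than one non-null Python type and does not change how pyarrow infers types.

```python
def mixed_type_columns(columns):
    """Return {column name: sorted type names} for every column that
    holds more than one non-null Python type (hypothetical helper)."""
    mixed = {}
    for name, values in columns.items():
        # None is treated as a missing value and ignored.
        kinds = {type(v).__name__ for v in values if v is not None}
        if len(kinds) > 1:
            mixed[name] = sorted(kinds)
    return mixed


# Same data as in the report above.
data = {
    'a': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
    'b': ['c', 'c', None, 'd', 'd', 'd', 'e'],
    'c': [1.5, 2.0, 3.5, 5.0, 8.0, 10.0, 'a string'],
    'd': [1, 2, 3, 1, 2, 3, 1],
    'a1': [1, 1, 1, 1, 1, 1, 1],
    'a2': [2, 2, 2, 2, 2, 2, 2],
}

print(mixed_type_columns(data))  # {'c': ['float', 'str']}
```

For the data in this report the check flags only column 'c'. One possible workaround, at the cost of turning the floats into text, would be to cast such a column explicitly before writing, for example with `df['c'] = df['c'].astype(str)`.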