AlenkaF commented on issue #41625:
URL: https://github.com/apache/arrow/issues/41625#issuecomment-2120014321
Hi, thank you for opening an issue @djouallah!
I was able to reproduce this in my dev environment. Next time, it will be
much easier to help if you provide a simple reproducible example. The Google
Colab notebook you linked has lots (lots!) of code unrelated to the issue, and
I was very reluctant at first to download files and manipulate them, but did so
after taking the time to check the source and all the code.
Also, the chances of actually getting an answer on your issue will be higher
with a simple example ;)
Here is one I created that shows the issue:
```python
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> data = {'UNIT': ["DUNIT", "DUNIT", "DUNIT", "DUNIT"],
...         'version': [1, 1, 3, 3]}
>>> df = pd.DataFrame(data)
>>> df.index = df['version']
>>> # The issue is here: the column index name is a numpy int64
>>> df.columns.name = np.int64(142564)
>>> df
142564    UNIT  version
version
1        DUNIT        1
1        DUNIT        1
3        DUNIT        3
3        DUNIT        3
>>> pa.Table.from_pandas(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 4559, in pyarrow.lib.Table.from_pandas
    arrays, schema, n_rows = dataframe_to_arrays(
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 635, in dataframe_to_arrays
    pandas_metadata = construct_metadata(
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 257, in construct_metadata
    b'pandas': json.dumps({
               ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
```
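The last frames of the traceback are in the standard library `json` encoder, so
the failure can be shown without pyarrow or pandas at all. The sketch below is
only an illustration: the dictionary key `column_index_name` is made up for the
example and is not the actual layout of the pandas metadata that pyarrow
builds.

```python
import json

import numpy as np

name = np.int64(142564)

# json.dumps only knows built-in Python types; numpy scalars such as
# np.int64 are not subclasses of int, so the encoder rejects them.
try:
    json.dumps({"column_index_name": name})  # hypothetical key, for illustration
    numpy_ok = True
except TypeError as exc:
    numpy_ok = False
    print(exc)

# Converting to a built-in int first makes the same value serializable.
builtin_result = json.dumps({"column_index_name": int(name)})
print(builtin_result)
```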
The code works if I remove the column index name:
```python
>>> df.columns.name = None
>>> pa.Table.from_pandas(df)
pyarrow.Table
UNIT: string
version: int64
__index_level_0__: int64
----
UNIT: [["DUNIT","DUNIT","DUNIT","DUNIT"]]
version: [[1,1,3,3]]
__index_level_0__: [[1,1,3,3]]
```
It would also have worked if a Python `int` had been used instead of
`numpy.int64`.