jorisvandenbossche commented on code in PR #39609:
URL: https://github.com/apache/arrow/pull/39609#discussion_r1453098551
##########
python/pyarrow/pandas_compat.py:
##########
@@ -950,7 +950,7 @@ def _reconstruct_index(table, index_descriptors,
all_columns, types_mapper=None)
index = index_arrays[0]
if not isinstance(index, pd.Index):
# Box anything that wasn't boxed above
- index = pd.Index(index, name=index_names[0])
+ index = pd.Index(index.infer_objects(), name=index_names[0])
Review Comment:
The index object is created in `_extract_index_level`, and it seems that for
interval dtype, the `.values` call is causing conversion to object dtype:
```
(Pdb) l
971 return result_table, None, None
972
973 pd = _pandas_api.pd
974
975 col = table.column(i)
976 -> values = col.to_pandas(types_mapper=types_mapper).values
977
978 if hasattr(values, 'flags') and not values.flags.writeable:
979 # ARROW-1054: in pandas 0.19.2, factorize will reject
980 # non-writeable arrays when calling MultiIndex.from_arrays
981 values = values.copy()
(Pdb) col.to_pandas()
0 (0, 1]
1 (1, 2]
2 (2, 3]
Name: __index_level_0__, dtype: interval
(Pdb) col.to_pandas().values
array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
Interval(2, 3, closed='right')], dtype=object)
```
We call `.values` only so that we can check whether the data is writeable,
which is a workaround for old pandas versions. This `if` block can probably be
removed nowadays, and then we no longer need to convert the Series to an array.
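
A minimal sketch of the behavior outside the debugger, assuming a recent pandas version. The Series `ser` is a hypothetical stand-in for the result of `col.to_pandas()` above, and this is not the actual change to `_extract_index_level`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for col.to_pandas() in the pdb session above.
ser = pd.Series(pd.interval_range(start=0, end=3), name="__index_level_0__")

# .values materializes the intervals as an object-dtype ndarray,
# which loses the interval dtype.
values = ser.values
print(type(values), values.dtype)

# Passing the Series itself to pd.Index keeps the interval dtype,
# so no later inference step is needed for this case.
index = pd.Index(ser, name=ser.name)
print(type(index), index.dtype)
```

If the writeable check is dropped, passing the Series through directly (as in the second half of the sketch) would keep the dtype for this case.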