mikelui opened a new issue, #34729: URL: https://github.com/apache/arrow/issues/34729
### Describe the enhancement requested

Today, (Py)Arrow -> Pandas conversion treats:

1. structs as Python dicts (pydicts)
2. maps as Python lists of tuples (i.e. `[(key1, value1), (key2, value2), ...]`)

While treating maps as lists of tuples has various pros (preserves ordering, allows duplicates, speed of iteration/creation), many times users simply want a ... map! (i.e. a pydict). Having to convert every element via `dict(map_elem)` is cumbersome, slow, and downright nasty when working with arbitrarily nested maps in Pandas.

Today, PyArrow already supports (pydicts -> arrow maps) when a schema is provided. So, it's a known use-case.

I propose a simple switch in PandasOptions for `table.to_pandas(...)` to generate pydicts for maps. This creates a symmetrical option for the (pydicts -> arrow maps) conversion, as well.

----

As alluded to above, the cons are that:

1. Users lose ordering.
2. Duplicates will be removed, resulting in potential data loss. This should be made clear to the user.
3. Potential ambiguity when examining data that has both maps and structs.

I think the upsides of ergonomic flexibility outweigh these cons.

----

Separately, I think there's a bug that precludes (pydicts -> arrow maps) when the type is nested (e.g. list of maps). That should be fixed as well to provide a more featureful map experience.

### Component(s)

C++, Python
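
For illustration, here is a minimal pure-Python sketch of the manual conversion that users have to do today on nested values. The helper name `maps_to_dicts` is hypothetical and not part of PyArrow or Pandas. It also assumes that any non-empty list consisting only of 2-tuples is a map, which is a heuristic rather than something the converted data guarantees, and it only handles plain Python lists and dicts.

```python
def maps_to_dicts(value):
    """Recursively turn map-like lists of (key, value) tuples into dicts.

    Hypothetical helper for illustration only; not a PyArrow or Pandas API.
    """
    if isinstance(value, dict):
        # Struct: already a dict, but its fields may still contain maps.
        return {k: maps_to_dicts(v) for k, v in value.items()}
    if isinstance(value, list):
        # Assumption: a non-empty list made up only of 2-tuples is a map.
        if value and all(isinstance(item, tuple) and len(item) == 2 for item in value):
            # For a duplicated key, the last value wins and earlier ones are dropped.
            return {k: maps_to_dicts(v) for k, v in value}
        # Otherwise treat it as an ordinary list and convert its elements.
        return [maps_to_dicts(item) for item in value]
    return value


# A map whose values are themselves maps; the inner map has a duplicate key.
row = [("a", [("x", 1), ("x", 2)]), ("b", [])]
converted = maps_to_dicts(row)
print(converted)
```

The duplicate key `"x"` collapses to a single entry, which is the data-loss con listed above. The empty list under `"b"` is left as a list because an empty map and an empty list look identical at this level, which is part of why a type-aware switch in the conversion itself would be more reliable than post-processing.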
