[GitHub] [arrow] mikelui opened a new issue, #34729: Better support for Maps in Pandas

via GitHub Sat, 25 Mar 2023 22:28:46 -0700


mikelui opened a new issue, #34729:
URL: https://github.com/apache/arrow/issues/34729


   ### Describe the enhancement requested
   
   Today (Py)Arrow -> Pandas treats:
   
   1. structs as Python dicts (pydicts)
   2. maps as Python lists of tuples (i.e. [(key1, value1), (key2, value2), ...])
   
   While treating maps as a list of tuples has various pros (preserves 
ordering, allows duplicate keys, fast iteration/creation), many times users 
simply want a ... map! (i.e. pydict). 
   
   Having to convert every element via `dict(map_elem)` is cumbersome, slow, 
and downright nasty when working with arbitrarily nested maps in Pandas.
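
To make the nested case concrete, here is a minimal pure-Python sketch. It assumes the converted values look like what is described above (maps as lists of `(key, value)` tuples, structs as dicts); the helper name `maps_to_dicts` is hypothetical and not part of PyArrow or Pandas. The "non-empty list of 2-tuples means map" check is only a heuristic, and an empty list stays a list because it cannot be told apart from an empty map.

```python
def maps_to_dicts(value):
    """Recursively turn lists of (key, value) tuples into dicts (heuristic)."""
    if isinstance(value, dict):
        # Struct-like value: keep the keys, convert the fields.
        return {k: maps_to_dicts(v) for k, v in value.items()}
    if isinstance(value, list):
        looks_like_map = bool(value) and all(
            isinstance(item, tuple) and len(item) == 2 for item in value
        )
        if looks_like_map:
            # Map-like value: later duplicate keys overwrite earlier ones.
            return {k: maps_to_dicts(v) for k, v in value}
        return [maps_to_dicts(item) for item in value]
    return value


# A map whose values are themselves maps, in the list-of-tuples form.
nested = [("a", [("x", 1), ("y", 2)]), ("b", [])]
converted = maps_to_dicts(nested)
print(converted)  # {'a': {'x': 1, 'y': 2}, 'b': []}
```

Every nesting level needs its own pass like this, which is the overhead a built-in switch would avoid.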
   
   Today, PyArrow already supports (pydicts -> arrow maps) when a schema is 
provided, so it's a known use case. 
   
   I propose a simple switch in PandasOptions for `table.to_pandas(...)` to 
generate pydicts for maps. This would also make the conversion symmetrical 
with the existing (pydicts -> arrow maps) path.
   
   ----
   
   As alluded to above, the cons are that:
   1. Users lose ordering.
   2. Duplicate keys will be removed, resulting in potential data loss. This 
should be made clear to the user.
   3. Potential ambiguity when examining data that has both maps and structs.
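
The first two cons can be seen with plain Python, independent of Arrow. This is only a sketch: `pairs_to_dict` and its `on_duplicate` parameter are hypothetical names used for illustration, not an existing or proposed PyArrow API.

```python
def pairs_to_dict(pairs, on_duplicate="last"):
    """Build a dict from (key, value) pairs.

    on_duplicate="last" silently keeps the last value for a repeated key;
    on_duplicate="error" raises instead of dropping data.
    """
    result = {}
    for key, value in pairs:
        if key in result and on_duplicate == "error":
            raise KeyError(f"duplicate map key: {key!r}")
        result[key] = value
    return result


pairs = [("b", 1), ("a", 2), ("b", 3)]

# Duplicates: three entries go in, two come out, and ("b", 1) is lost.
lossy = pairs_to_dict(pairs)
print(lossy)  # {'b': 3, 'a': 2}

# Ordering: lists of tuples compare in order, dicts compare without it.
print([("a", 1), ("b", 2)] == [("b", 2), ("a", 1)])  # False
print({"a": 1, "b": 2} == {"b": 2, "a": 1})          # True
```

A strict mode such as `on_duplicate="error"` is one way the data loss could be made visible to the user rather than silent.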
   
   I think the upsides of ergonomic flexibility outweigh these cons.
   
   ----
   
   Separately, I think there's a bug that precludes (pydicts -> arrow maps) 
when the type is nested (e.g. list of maps). That should be fixed as well to 
provide a more featureful map experience.
   
   ### Component(s)
   
   C++, Python


