[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488413#comment-17488413 ]

Sarah Gilmore commented on ARROW-15554:
---------------------------------------

Will do!

> [Format][C++] Add "LargeMap" type with 64-bit offsets
> -----------------------------------------------------
>
> Key: ARROW-15554
> URL: https://issues.apache.org/jira/browse/ARROW-15554
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Format
> Reporter: Sarah Gilmore
> Priority: Major
>
> It would be nice if a "LargeMap" type existed alongside the "Map" type for
> parity. For other datatypes that require offset arrays/buffers, such as
> String, List, and BinaryArray, Arrow provides a "large" version of these
> types, i.e. LargeString, LargeList, and LargeBinaryArray. It would be nice
> to have a "LargeMap" for parity.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488293#comment-17488293 ]

Antoine Pitrou commented on ARROW-15554:
----------------------------------------

Hi [~sgilmore], thank you for the explanation. In any case, format additions have to be discussed on the development mailing list and voted on there. I encourage you to start a new discussion there: see [https://arrow.apache.org/community/]
[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487136#comment-17487136 ]

Sarah Gilmore commented on ARROW-15554:
---------------------------------------

Hi [~apitrou],

I was thinking more about the future when I created this Jira issue. I don't have a concrete need now, but I can picture a few scenarios in which the size limitation imposed by MapArray's 32-bit offsets cannot be worked around.

*Scenario 1:* Suppose you have a ListArray of MapArrays. If one of the maps requires more than int32::max key-value pairs, there is currently no way to represent it. You could try using a ChunkedArray, but you would still need to split the large map across multiple rows in the list.

*Scenario 2:* Even if the MapArray is at the top of the object hierarchy, the same problem arises if a single row within the array needs to contain more than int32::max key-value pairs. You could try to use a ChunkedArray to resolve the issue, but the key-value pairs would still be split across multiple rows.

I've seen Parquet files with MAP columns, and I can imagine a situation in which someone has a very large MAP as the top-most data structure or within a nested one. Being unable to represent such data with MapArrays is probably rare, but it is not impossible given int32's size restrictions.

I'd honestly be interested in looking into this myself. I hope this helps.

Best,
Sarah
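To make the offset limit in the two scenarios above concrete, here is a minimal C++ sketch. It deliberately does not use the Arrow library: `TryBuildOffsets` is a hypothetical helper introduced only for illustration, not an Arrow API. It assumes the usual list-style layout in which row i spans child entries offsets[i] up to offsets[i+1], so the last offset equals the total number of key-value pairs held by the array.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not part of Arrow): builds a list-style offsets buffer
// from the number of key-value pairs in each row. Returns false as soon as a
// running total cannot be represented in OffsetT.
template <typename OffsetT>
bool TryBuildOffsets(const std::vector<int64_t>& entries_per_row,
                     std::vector<OffsetT>* out) {
  const int64_t max_offset =
      static_cast<int64_t>(std::numeric_limits<OffsetT>::max());
  out->clear();
  out->reserve(entries_per_row.size() + 1);
  out->push_back(0);  // the first row always starts at child index 0
  int64_t total = 0;
  for (int64_t n : entries_per_row) {
    assert(n >= 0);  // a row cannot hold a negative number of entries
    // Compare before adding so the int64 accumulator itself never overflows.
    if (n > max_offset - total) return false;
    total += n;
    out->push_back(static_cast<OffsetT>(total));
  }
  return true;
}
```

With 32-bit offsets, a single row of 2^31 entries is rejected, and so are two rows whose sizes only exceed the limit when added together; the same inputs are accepted with 64-bit offsets, which is what a "LargeMap" layout would use under this assumption.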
[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
[ https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486676#comment-17486676 ]

Antoine Pitrou commented on ARROW-15554:
----------------------------------------

Is this out of a concrete need?