[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets

2022-02-07 Thread Sarah Gilmore (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488413#comment-17488413
 ] 

Sarah Gilmore commented on ARROW-15554:
---

Will do!

> [Format][C++] Add "LargeMap" type with 64-bit offsets
> -
>
> Key: ARROW-15554
> URL: https://issues.apache.org/jira/browse/ARROW-15554
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Format
> Reporter: Sarah Gilmore
> Priority: Major
>
> It would be nice if a "LargeMap" type existed alongside the "Map" type for 
> parity. For other datatypes that require offset arrays/buffers, such as 
> String, List, and BinaryArray, Arrow provides a "large" version of these 
> types, i.e. LargeString, LargeList, and LargeBinaryArray. It would be nice to 
> have a "LargeMap" for parity.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets

2022-02-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488293#comment-17488293
 ] 

Antoine Pitrou commented on ARROW-15554:


Hi [~sgilmore], thank you for the explanation. In any case, format additions 
have to be discussed and put to a vote on the development mailing list. I 
encourage you to start a new discussion there: see 
[https://arrow.apache.org/community/]



[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets

2022-02-04 Thread Sarah Gilmore (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17487136#comment-17487136
 ] 

Sarah Gilmore commented on ARROW-15554:
---

Hi [~apitrou],
 
I was thinking more about the future when I created this Jira issue. I don't 
have a concrete need right now, but I can picture a few scenarios in which the 
size limitation imposed by MapArray's 32-bit offsets cannot be worked around.
 
*Scenario 1:*
 
Suppose you have a ListArray of MapArrays. If one of the maps requires more 
than int32::max key-value pairs, there is currently no way to represent it. You 
could try using a ChunkedArray, but you would still need to split the large map 
across multiple rows in the list.
 
*Scenario 2:*
 
Even if the MapArray is at the top of the object hierarchy, the same problem 
could potentially arise if a row within the array needs to contain more than 
int32::max key-value pairs. You could try to use a ChunkedArray to resolve the 
issue, but the key-value pairs would still be split across multiple rows.
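
To make the limit in both scenarios concrete, here is a minimal sketch in plain C++ that does not use the Arrow library. `BuildOffsets` is a hypothetical helper, not an Arrow API. It assumes a list-like layout in which row i spans offsets[i] to offsets[i + 1] in the child key-value data, so the last offset equals the total number of key-value pairs in the array.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper (not part of Arrow): builds cumulative offsets from the
// number of key-value pairs in each row. Returns std::nullopt as soon as an
// offset would not fit in OffsetT.
template <typename OffsetT>
std::optional<std::vector<OffsetT>> BuildOffsets(
    const std::vector<std::int64_t>& entries_per_row) {
  constexpr std::int64_t kMax = std::numeric_limits<OffsetT>::max();
  std::vector<OffsetT> offsets;
  offsets.reserve(entries_per_row.size() + 1);
  offsets.push_back(0);
  std::int64_t running = 0;
  for (std::int64_t n : entries_per_row) {
    // Check before adding so the running total itself never overflows.
    if (n < 0 || n > kMax - running) return std::nullopt;
    running += n;
    offsets.push_back(static_cast<OffsetT>(running));
  }
  return offsets;
}
```

With `OffsetT = std::int32_t`, a single row holding int32::max + 1 pairs is rejected, while the same input succeeds with `std::int64_t`, which is the offset width a "LargeMap" would use.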
 
I've seen Parquet files with MAP columns, and I can imagine a situation in 
which someone has a very large MAP as the top-most data structure or within a 
nested one. A case in which MapArrays cannot represent their data is probably 
rare, but it is not impossible given int32's size restrictions.
 
I'd honestly be interested in looking into this myself.
 
I hope this helps.
 
Best,
Sarah
 
 



[jira] [Commented] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets

2022-02-03 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486676#comment-17486676
 ] 

Antoine Pitrou commented on ARROW-15554:


Is this out of a concrete need?
