drin commented on PR #13487:
URL: https://github.com/apache/arrow/pull/13487#issuecomment-1400854026

   The part that added extra dev time was my thinking that I needed something 
in between a 
[KeyColumnArray](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/light_array.h#L78-L83)
 (an alternative to `ArraySpan`) and a 
[ResizableArrayData](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/light_array.h#L247-L253)
 (a version of `KeyColumnArray` that owns its buffers). For now I am calling it 
a `KeyColumnPseudoSpan`.
   
   For example, when hashing a `ListArray`, a class in between the two 
mentioned above would let me keep a view into the data owned by the `ListArray` 
but alter the offsets so that only the top-most structure is preserved. For a 
standard `ListArray`, this means just representing it with a `KeyColumnArray`. 
For a nested `ListArray` with 2 levels, this means flattening it into a 
standard (1-level) `ListArray` and then representing its data and modified 
offsets with the new `KeyColumnPseudoSpan` class.
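
   To make the offset rewriting concrete, here is a rough standalone sketch. 
It is not the code in this PR and does not use the Arrow API: 
`FlattenListOffsets` is a hypothetical helper, and it assumes 32-bit offsets, 
no nulls, and unsliced arrays. The idea is that each top-level offset indexes 
into the child list's offsets, so composing the two gives offsets that point 
straight at the leaf values, which themselves stay untouched (only viewed).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): collapse the offsets of a 2-level
// list into 1-level offsets over the leaf values.
// Assumes 32-bit offsets, no nulls, and no slicing.
std::vector<int32_t> FlattenListOffsets(
    const std::vector<int32_t>& outer_offsets,
    const std::vector<int32_t>& inner_offsets) {
  std::vector<int32_t> flattened;
  flattened.reserve(outer_offsets.size());
  for (int32_t outer : outer_offsets) {
    // Each outer offset is an index into the inner offsets array.
    assert(outer >= 0 &&
           static_cast<std::size_t>(outer) < inner_offsets.size());
    // Look up where that inner list starts in the leaf values.
    flattened.push_back(inner_offsets[static_cast<std::size_t>(outer)]);
  }
  return flattened;
}
```

   For `[[[1, 2], [3]], [[4, 5, 6]]]`, the outer offsets are `{0, 2, 3}` and 
the inner offsets are `{0, 2, 3, 6}`. The composed offsets are `{0, 3, 6}`, 
i.e. the rows `[1, 2, 3]` and `[4, 5, 6]` over the same leaf buffer. Only the 
new offsets need to be owned; the values remain a view.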
   
   I'll try to get the commit in today so that I can clarify the approach and 
get feedback.


