drin commented on PR #13487: URL: https://github.com/apache/arrow/pull/13487#issuecomment-1400854026
The key part that added extra dev time was that I thought I needed something in between a [KeyColumnArray](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/light_array.h#L78-L83) (an alternative to `ArraySpan`) and a [ResizableArrayData](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/light_array.h#L247-L253) (a version of `KeyColumnArray` that owns its buffers). For now I am calling it a `KeyColumnPseudoSpan`.

For example, when hashing a `ListArray`, a class in between the two mentioned above would let me keep a view into the data owned by the `ListArray` but alter the offsets so that only the top-most structure is preserved. For a standard `ListArray`, this means just representing the `ListArray` with a `KeyColumnArray`. For a nested `ListArray` with 2 levels, this means flattening it into a standard (1-level) `ListArray` and then representing the `ListArray` data and the modified offsets with the new `KeyColumnPseudoSpan` class.

I'll try to get the commit in today so that I can clarify the approach and get feedback.
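To illustrate the 2-level case, here is a minimal sketch of how the modified offsets could be derived while the leaf values stay borrowed: each top-level offset is used as an index into the inner offsets, so every top-level list ends up spanning all leaf values of its sublists. This assumes 32-bit offsets, int32 leaf values, no nulls, and no slicing. `FlattenedListView` and `FlattenTwoLevel` are hypothetical names for illustration only; they are not Arrow APIs and not the `KeyColumnPseudoSpan` from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a "pseudo span": the leaf values are borrowed
// (not copied), while the recomputed offsets are owned by the view.
struct FlattenedListView {
  const int32_t* values;         // borrowed from the original nested array
  std::vector<int32_t> offsets;  // owned; length = number of top-level lists + 1
};

// Composes the offsets of a List<List<T>> so that the result describes a
// 1-level List<T> over the same leaf values: new_offsets[i] = inner[outer[i]].
FlattenedListView FlattenTwoLevel(const std::vector<int32_t>& outer_offsets,
                                  const std::vector<int32_t>& inner_offsets,
                                  const int32_t* leaf_values) {
  FlattenedListView view{leaf_values, {}};
  view.offsets.reserve(outer_offsets.size());
  for (int32_t outer : outer_offsets) {
    // Each outer offset points at a sublist boundary; the inner offset stored
    // there is where that top-level list's leaf values begin (or end).
    assert(outer >= 0 && static_cast<std::size_t>(outer) < inner_offsets.size());
    view.offsets.push_back(inner_offsets[outer]);
  }
  return view;
}
```

For `[[[1, 2], [3]], [], [[4], [5, 6]]]` the outer offsets are `{0, 2, 2, 4}`, the inner offsets are `{0, 2, 3, 4, 6}`, and the leaf values are `{1, 2, 3, 4, 5, 6}`; composing gives `{0, 3, 3, 6}`, i.e. `[[1, 2, 3], [], [4, 5, 6]]`, without touching the values buffer. Only the offsets are allocated, which is the "in between" ownership described above.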