maxburke opened a new issue, #9014: URL: https://github.com/apache/arrow-rs/issues/9014
The comment for `GenericByteViewArray::slice` says that it's zero-copy, but it isn't.

This thread from the DataFusion Slack channel has some details about a particularly gnarly query I was trying to debug: https://the-asf.slack.com/archives/C04RJ0C85UZ/p1765917347919129

The upshot is that if the query uses Utf8 it finishes pretty quickly, but if it uses Utf8View it is so slow that it effectively never completes, because of all the allocation, reallocation, and copying that happens when a `GenericByteViewArray` is sliced.

I hacked up my local versions of DataFusion and Arrow to make the `GenericByteViewArray::buffers` field an `Arc<Vec<Buffer>>`, and it improved the performance dramatically. It wasn't quite as fast as the plain-Utf8 version, possibly because my implementation was pretty hacky, but it at least completed.
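
For illustration, here is a minimal std-only sketch of the difference between the two layouts. `Buffer`, `OwnedBuffersArray`, and `SharedBuffersArray` are hypothetical stand-ins, not arrow-rs types, and the sketch assumes that slicing clones the whole buffer list, which is what the behavior described above suggests. It is not the actual patch.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `Buffer`: cloning bumps a refcount
/// and never copies the bytes. Not the real arrow-rs type.
#[derive(Clone)]
pub struct Buffer(Arc<[u8]>);

impl Buffer {
    pub fn from_bytes(bytes: &[u8]) -> Self {
        Buffer(Arc::from(bytes))
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

/// Assumed shape of the existing layout: each array owns its own `Vec<Buffer>`.
pub struct OwnedBuffersArray {
    pub offset: usize,
    pub len: usize,
    pub buffers: Vec<Buffer>,
}

impl OwnedBuffersArray {
    pub fn new(len: usize, buffers: Vec<Buffer>) -> Self {
        Self { offset: 0, len, buffers }
    }

    /// Allocates a new `Vec` and clones every `Buffer` handle, so the cost
    /// grows with the number of buffers even though no string bytes move.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            offset: self.offset + offset,
            len,
            buffers: self.buffers.clone(),
        }
    }
}

/// Sketch of the workaround: the buffer list sits behind an `Arc`.
pub struct SharedBuffersArray {
    pub offset: usize,
    pub len: usize,
    pub buffers: Arc<Vec<Buffer>>,
}

impl SharedBuffersArray {
    pub fn new(len: usize, buffers: Vec<Buffer>) -> Self {
        Self { offset: 0, len, buffers: Arc::new(buffers) }
    }

    /// One refcount increment, independent of how many buffers there are.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            offset: self.offset + offset,
            len,
            buffers: Arc::clone(&self.buffers),
        }
    }
}
```

In this sketch, slicing an `OwnedBuffersArray` yields a `buffers` vector with its own heap allocation, while slicing a `SharedBuffersArray` yields a handle for which `Arc::ptr_eq` against the parent's `buffers` is true. With many buffers and many slices, the first pattern repeats an allocation plus a per-buffer clone on every call.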
