maxburke opened a new issue, #9014:
URL: https://github.com/apache/arrow-rs/issues/9014

   The doc comment for `GenericByteViewArray::slice` says that it's zero-copy, 
but it isn't.
   
   This thread from the DataFusion Slack channel has some details about a 
particularly gnarly query I was trying to debug: 
https://the-asf.slack.com/archives/C04RJ0C85UZ/p1765917347919129
   
   The upshot is that if the query uses `Utf8` it finishes pretty quickly, but 
if it uses `Utf8View` it is so slow that it effectively never completes, 
because of all the allocation, reallocation, and copying that happens every 
time a `GenericByteViewArray` is sliced.
   
   I hacked up my local versions of DataFusion and Arrow to make the 
`GenericByteViewArray::buffers` field an `Arc<Vec<Buffer>>`, and that improved 
performance dramatically. It wasn't quite as fast as the plain-`Utf8` 
version, possibly because my implementation was pretty hacky, but it at least 
completed.
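
   To illustrate the difference, here is a minimal sketch using only the 
standard library. Everything in it is a hypothetical stand-in rather than 
arrow-rs code: `Buffer` is modelled as an `Arc<[u8]>`, and `OwnedBuffers` and 
`SharedBuffers` are made-up types that only mimic the two layouts (a plain 
`Vec<Buffer>` versus an `Arc<Vec<Buffer>>`). It assumes that slicing clones 
the buffer list, which is the behaviour described above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Buffer`: a cheaply clonable,
// reference-counted region of bytes.
type Buffer = Arc<[u8]>;

// Layout where every array owns its own Vec of buffer handles.
struct OwnedBuffers {
    buffers: Vec<Buffer>,
    offset: usize,
    len: usize,
}

impl OwnedBuffers {
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            // Allocates a new Vec and bumps the refcount of every buffer:
            // the cost grows with the number of buffers, not the slice length.
            buffers: self.buffers.clone(),
            offset: self.offset + offset,
            len,
        }
    }
}

// Layout where the Vec of buffer handles is itself shared behind an Arc.
struct SharedBuffers {
    buffers: Arc<Vec<Buffer>>,
    offset: usize,
    len: usize,
}

impl SharedBuffers {
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            // A single refcount increment, with no allocation,
            // regardless of how many buffers the array holds.
            buffers: Arc::clone(&self.buffers),
            offset: self.offset + offset,
            len,
        }
    }
}

// Helper that builds `n` small dummy buffers (n is assumed to be <= 256).
fn make_buffers(n: usize) -> Vec<Buffer> {
    (0..n).map(|i| Buffer::from(vec![i as u8; 4])).collect()
}
```

   In neither layout are the underlying bytes copied. The cost in the first 
layout comes from allocating and filling a new `Vec` on every slice, which 
adds up when an array with many buffers is sliced many times.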


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.