alamb opened a new issue, #12338:
URL: https://github.com/apache/datafusion/issues/12338

   ### Is your feature request related to a problem or challenge?
   
   Part of https://github.com/apache/datafusion/issues/11752
   
   StringView is a new Arrow array type that allows for more efficient string 
processing. Specifically, it allows string values to be adjusted (for example, 
sliced) without copying the underlying data.
   
   See this blog post for more details: 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
   
   @Kev1n8 added support for `StringView` to the `substr` function in  
https://github.com/apache/datafusion/pull/12044
   
   At the moment `substr` produces a `StringArray` output when the input is a 
`StringArray`, but we could instead generate a `StringViewArray` as output, 
which would be more efficient in most cases because it avoids copying the 
string values.
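   
   To illustrate why a view-based output avoids copies, here is a minimal, 
std-only sketch. It is not the arrow-rs `StringViewArray` layout (the real 
views are packed and inline short strings); `View`, `byte_offset`, and 
`substr_view` are hypothetical names used only for this example, and `start` 
is assumed to be 1-based and at least 1.

```rust
use std::sync::Arc;

/// Simplified stand-in for a string view: a shared buffer plus a byte range.
/// Assumption: this only models the idea, not Arrow's actual view layout.
#[derive(Clone, Debug)]
struct View {
    buffer: Arc<str>, // shared, never copied by `substr_view`
    offset: u32,      // byte offset of the value inside `buffer`
    len: u32,         // byte length of the value
}

impl View {
    fn new(s: &str) -> Self {
        View { buffer: Arc::from(s), offset: 0, len: s.len() as u32 }
    }

    fn as_str(&self) -> &str {
        let start = self.offset as usize;
        &self.buffer[start..start + self.len as usize]
    }
}

/// Byte offset of the `char_idx`-th character, or `s.len()` if past the end.
fn byte_offset(s: &str, char_idx: usize) -> usize {
    s.char_indices().nth(char_idx).map(|(i, _)| i).unwrap_or(s.len())
}

/// Simplified `substr(s, start, count)` with a 1-based character `start`.
/// Only the offset and length change; the string bytes are shared.
fn substr_view(v: &View, start: usize, count: usize) -> View {
    let s = v.as_str();
    let first = start.saturating_sub(1); // 1-based -> 0-based
    let begin = byte_offset(s, first);
    let end = byte_offset(s, first.saturating_add(count));
    View {
        buffer: Arc::clone(&v.buffer), // bumps a refcount, copies no bytes
        offset: v.offset + begin as u32,
        len: (end - begin) as u32,
    }
}
```

   Calling `substr_view(&View::new("héllo world"), 2, 4)` yields a view over 
`éllo` that points into the same buffer as the input. A `StringArray` output, 
by contrast, has to copy those bytes into a new contiguous values buffer.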
   
   However, in order to avoid errors when `substr` is used in an expression, 
we need to make sure that all the rest of the String functions support 
StringView as input as well. In other words, we should wait for the "Required 
for enabling StringView by default" list on 
https://github.com/apache/datafusion/issues/11752 to be completed.
   
   
   
   ### Describe the solution you'd like
   
   1. Change the output type of `substr` to be `StringViewArray` when the input 
is `StringArray` (note that for `LargeStringArray` we will still need to copy 
the data, I think, as `StringView` is limited to 2^32 bytes)
   2. Change the implementation of `substr` to use `StringView` internally
   3. Add tests
   
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   
   
   Note that @Kev1n8 has already added support for `StringView` to the `substr` 
function in https://github.com/apache/datafusion/pull/12044
   
   They also suggested that this same optimization could be applied: 
https://github.com/apache/datafusion/pull/12044#issuecomment-2316111793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
