alamb opened a new issue, #22148: URL: https://github.com/apache/datafusion/issues/22148
- A follow-on to https://github.com/apache/datafusion/pull/22029

The idea is that, now that we have some very optimized string builder APIs that generalize across the three string types (`Utf8`, `LargeUtf8`, and `Utf8View`), we can reuse them in multiple kernels.

At the moment the code all lives in the datafusion-functions crate: https://github.com/apache/datafusion/blob/7708aa2dc61271423a5c334bd2e2025b5e275133/datafusion/functions/src/strings.rs

However, that means the builders can't be used in other crates.

I suggest we put the string code in https://github.com/apache/datafusion/blob/0dfcd97a37e083e48aefc5267539ac453cc07b44/datafusion/physical-expr-common

This is consistent with things like String/BinaryMap: https://github.com/apache/datafusion/blob/0dfcd97a37e083e48aefc5267539ac453cc07b44/datafusion/physical-expr-common/src/binary_map.rs#L40-L39

This might make the builders easier to reuse across crates.

As @neilconway says:

Other places where these APIs should be useful:

* `initcap`
* `lower`, `upper`: at least for the Unicode code path; for ASCII, we might not beat the hand-optimized code added in #21980
* `translate`
* `reverse` (might need a slightly different API)
* `to_char` (might need a small API extension)
* `lpad`, `rpad` (needs a closer look)

If we make the builders accessible outside the current crate, some of the Spark functions could use these APIs, as well as `||` for `Utf8View` values.

_Originally posted by @neilconway in https://github.com/apache/datafusion/issues/22029#issuecomment-4382974325_
