edponce commented on pull request #10869: URL: https://github.com/apache/arrow/pull/10869#issuecomment-907122705
The capitalize and title kernels are the first vector string kernels that perform code point transforms. Because a case change can increase the number of bytes in a code point's UTF-8 encoding, these kernels must use `MaxCodeUnits` with the 3/2 growth factor. They also depend on `EnsureLookupTablesFilled` being called in the `PreExec` method. These two requirements lived in different classes (`CaseMappingTransform` and `StringTransformCodepoint`), and using those classes applied `TransformCodepoint` (scalar method) instead of a custom `Transform`. For these reasons, I created a `StringTransformCodepointBase` class that provides the growth factor and lookup table methods. The appropriate class can now be chosen depending on the kind of kernel, for those that apply code point transforms.

cc @pitrou
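
To illustrate the two requirements described above (an output bound with a 3/2 growth factor, and lookup tables filled before execution), here is a minimal, self-contained sketch. It is not the code in this PR: the `sketch` namespace, `CodepointTransformBase`, `CapitalizeTransform`, the toy case tables, and the simplified signatures are all hypothetical stand-ins, and the UTF-8 helpers only handle valid 1- to 3-byte sequences.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch; none of these names are Arrow's actual classes.
namespace sketch {

// Toy case tables. A real kernel would cover all of Unicode.
struct CaseTables {
  std::unordered_map<uint32_t, uint32_t> upper, lower;
};
CaseTables g_tables;
std::once_flag g_tables_once;

// Fill the tables exactly once, however many times this is called.
void EnsureLookupTablesFilled() {
  std::call_once(g_tables_once, [] {
    for (uint32_t c = 'a'; c <= 'z'; ++c) {
      g_tables.upper[c] = c - 32;
      g_tables.lower[c - 32] = c;
    }
    // U+0251 (2 bytes in UTF-8) uppercases to U+2C6D (3 bytes).
    g_tables.upper[0x0251] = 0x2C6D;
    g_tables.lower[0x2C6D] = 0x0251;
  });
}

uint32_t Map(const std::unordered_map<uint32_t, uint32_t>& table, uint32_t c) {
  auto it = table.find(c);
  return it == table.end() ? c : it->second;
}

// Decode one code point; assumes valid UTF-8 of at most 3 bytes.
uint32_t Decode(const std::string& s, size_t* i) {
  auto next = [&] { return static_cast<uint32_t>(static_cast<unsigned char>(s[(*i)++])); };
  uint32_t b = next();
  if (b < 0x80) return b;
  if (b < 0xE0) return ((b & 0x1F) << 6) | (next() & 0x3F);
  uint32_t c = (b & 0x0F) << 12;
  c |= (next() & 0x3F) << 6;
  return c | (next() & 0x3F);
}

// Encode one code point below U+10000 as UTF-8.
void Encode(uint32_t c, std::string* out) {
  if (c < 0x80) {
    out->push_back(static_cast<char>(c));
  } else if (c < 0x800) {
    out->push_back(static_cast<char>(0xC0 | (c >> 6)));
    out->push_back(static_cast<char>(0x80 | (c & 0x3F)));
  } else {
    out->push_back(static_cast<char>(0xE0 | (c >> 12)));
    out->push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (c & 0x3F)));
  }
}

// Shared base: holds the growth factor and the lookup-table setup.
struct CodepointTransformBase {
  // Must run before any transform so the tables are ready.
  void PreExec() const { EnsureLookupTablesFilled(); }
  // Upper bound on output bytes, using the 3/2 growth factor.
  int64_t MaxCodeunits(int64_t input_ncodeunits) const {
    return 3 * input_ncodeunits / 2;
  }
};

// Custom whole-string transform: the mapping depends on position,
// so a per-code-point scalar callback is not enough.
struct CapitalizeTransform : CodepointTransformBase {
  std::string Transform(const std::string& in) const {
    const int64_t bound = MaxCodeunits(static_cast<int64_t>(in.size()));
    std::string out;
    out.reserve(static_cast<size_t>(bound));
    size_t i = 0;
    bool first = true;
    while (i < in.size()) {
      uint32_t c = Decode(in, &i);
      Encode(Map(first ? g_tables.upper : g_tables.lower, c), &out);
      first = false;
    }
    assert(static_cast<int64_t>(out.size()) <= bound);
    return out;
  }
};

}  // namespace sketch
```

In this sketch, a caller would invoke `PreExec()` once and then `Transform()` per string. The design point is that both the bound and the table setup come from the shared base, so a kernel with a custom `Transform` gets them without also inheriting a per-code-point method.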