[ https://issues.apache.org/jira/browse/ARROW-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maarten Breddels updated ARROW-9991:
------------------------------------
    Summary: [C++] split kernels for strings/binary  (was: [C++] split kernsl for strings/binary)

> [C++] split kernels for strings/binary
> --------------------------------------
>
>                 Key: ARROW-9991
>                 URL: https://issues.apache.org/jira/browse/ARROW-9991
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Maarten Breddels
>            Assignee: Maarten Breddels
>            Priority: Major
>
> Similar to Python's str.split and bytes.split, we'd like a way to
> convert str into list[str] (and similarly for bytes).
> When a separator is given, the algorithm is the same for both types.
> Python, however, overloads split: when given no separator, it splits on
> any run of whitespace (Unicode for str, ASCII for bytes).
> I'd rather not see heavily overloaded kernels, and instead have e.g.:
> binary_split (takes a string/binary separator and a maxsplit argument; no
> special utf8 version needed)
> utf8_split_whitespace (similar to Python's str.split given no separator)
> ascii_split_whitespace (similar to Python's bytes.split given no separator,
> considering only ASCII whitespace, although this could work on any binary data)
> There could also be rsplit versions of these, or the direction could be an
> argument.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
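
For reference, the semantics the three proposed kernels would mirror can be sketched in plain Python. This is only an illustration of the Python behaviour described in the issue, not Arrow code: the function names are borrowed from the proposal, while the list-based signatures, the `reverse` flag (standing in for the rsplit variant) and the handling of nulls as `None` are assumptions made for this sketch.

```python
from typing import List, Optional, Union

BinaryLike = Union[str, bytes]


def binary_split(values: List[Optional[BinaryLike]], sep: BinaryLike,
                 maxsplit: int = -1,
                 reverse: bool = False) -> List[Optional[List[BinaryLike]]]:
    """Split each value on an explicit separator (hypothetical stand-in).

    The same code path serves str and bytes, which is why the issue says
    no special utf8 version is needed. `reverse` is an assumed way of
    expressing rsplit as an argument.
    """
    out = []
    for v in values:
        if v is None:
            out.append(None)  # assumption: nulls propagate
        elif reverse:
            out.append(v.rsplit(sep, maxsplit))
        else:
            out.append(v.split(sep, maxsplit))
    return out


def utf8_split_whitespace(values: List[Optional[str]],
                          maxsplit: int = -1) -> List[Optional[List[str]]]:
    """Split on runs of Unicode whitespace, like str.split() with no separator."""
    return [None if v is None else v.split(None, maxsplit) for v in values]


def ascii_split_whitespace(values: List[Optional[bytes]],
                           maxsplit: int = -1) -> List[Optional[List[bytes]]]:
    """Split on runs of ASCII whitespace, like bytes.split() with no separator."""
    return [None if v is None else v.split(None, maxsplit) for v in values]


# Explicit separator: identical behaviour for str and bytes.
print(binary_split(["a,b,,c", None], ","))
print(binary_split([b"a,b,,c"], b",", maxsplit=1))
print(binary_split(["a,b,,c"], ",", maxsplit=1, reverse=True))

# No separator: U+2003 (EM SPACE) is whitespace for str, but its UTF-8
# bytes are not ASCII whitespace, so the bytes version keeps "a" and "b" joined.
sample = " a\u2003b  c "
print(utf8_split_whitespace([sample]))
print(ascii_split_whitespace([sample.encode("utf-8")]))
```

The last two calls show why the whitespace case needs separate utf8 and ascii kernels while the explicit-separator case does not: the same input yields three pieces under Unicode rules and two under ASCII rules.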