pitrou commented on a change in pull request #8271:
URL: https://github.com/apache/arrow/pull/8271#discussion_r497570569
##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -809,6 +809,475 @@ struct IsUpperAscii : CharacterPredicateAscii<IsUpperAscii> {
}
};
+// splitting
+
+template <typename Type, typename ListType, typename Options, typename Derived>
+struct SplitBaseTransform {
+ // TODO: assert offsets types are the same?
+ using offset_type = typename Type::offset_type;
+ using ArrayType = typename TypeTraits<Type>::ArrayType;
+ using ArrayListType = typename TypeTraits<ListType>::ArrayType;
+ using ListScalarType = typename TypeTraits<ListType>::ScalarType;
+ using ScalarType = typename TypeTraits<Type>::ScalarType;
+ using Builder = typename TypeTraits<Type>::BuilderType;
+ using State = OptionsWrapper<Options>;
+
+ static void Split(const uint8_t* input_string, offset_type input_string_nbytes,
+ offset_type** output_string_offsets, offset_type* string_output_count,
+ offset_type* string_output_offset, uint8_t** output_string_data,
+ const Options& options) {
+ const uint8_t* begin = input_string;
+ const uint8_t* end = begin + input_string_nbytes;
+
+ int64_t max_splits = options.max_splits;
+ // if there is no max splits, reversing does not make sense (and is probably less
+ // efficient), but is useful for testing
+ if (options.reverse) {
+ // note that i points 1 further than the 'current'
+ const uint8_t* i = end;
+ // we will record the parts in reverse order
+ std::vector<std::pair<const uint8_t*, const uint8_t*>> parts;
Review comment:
I'll also note it's a pity to allocate a `vector` dynamically for each
input string. Perhaps you can store this scratch space on the instance instead?
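
To illustrate the idea, here is a minimal standalone sketch, not the PR's code. `ReverseSplitter`, its `parts_` member, the single-byte separator and the `SplitToStrings` helper are all hypothetical names and simplifications made up for this example. The scratch vector lives on the instance and is `clear()`ed on each call, which keeps its capacity, so after the first few strings there should be no further allocations for it.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the split transform; not the Arrow kernel API.
class ReverseSplitter {
 public:
  using Part = std::pair<const uint8_t*, const uint8_t*>;

  // Splits [begin, end) on `sep`, scanning from the right. A negative
  // max_splits means "no limit". The returned reference points at scratch
  // space owned by this instance and is invalidated by the next call.
  const std::vector<Part>& Split(const uint8_t* begin, const uint8_t* end,
                                 uint8_t sep, int64_t max_splits) {
    parts_.clear();  // drops the elements but keeps the allocated capacity
    const uint8_t* part_end = end;
    const uint8_t* i = end;  // i points 1 further than the 'current' byte
    int64_t splits = 0;
    while (i > begin && (max_splits < 0 || splits < max_splits)) {
      --i;
      if (*i == sep) {
        parts_.emplace_back(i + 1, part_end);
        part_end = i;
        ++splits;
      }
    }
    parts_.emplace_back(begin, part_end);  // leftmost remainder
    // parts were recorded right to left; restore left-to-right order
    std::reverse(parts_.begin(), parts_.end());
    return parts_;
  }

 private:
  std::vector<Part> parts_;  // scratch space reused across input strings
};

// Hypothetical helper that copies the parts out so they are easy to inspect.
std::vector<std::string> SplitToStrings(ReverseSplitter& splitter,
                                        const std::string& input, char sep,
                                        int64_t max_splits) {
  const auto* begin = reinterpret_cast<const uint8_t*>(input.data());
  const auto& parts = splitter.Split(begin, begin + input.size(),
                                     static_cast<uint8_t>(sep), max_splits);
  std::vector<std::string> out;
  for (const auto& part : parts) {
    out.emplace_back(reinterpret_cast<const char*>(part.first),
                     reinterpret_cast<const char*>(part.second));
  }
  return out;
}
```

The trade-off is that `Split` can no longer be a `static` function, and an instance with mutable scratch space must not be shared between threads without synchronization.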