Jefffrey commented on code in PR #22387:
URL: https://github.com/apache/datafusion/pull/22387#discussion_r3274459851
##########
datafusion/functions-nested/src/replace.rs:
##########
@@ -412,63 +473,165 @@ fn general_replace<O: OffsetSizeTrait>(
)?))
}
-fn array_replace_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
- let [array, from, to] = take_function_args("array_replace", args)?;
+/// For each element of `list_array[i]`, replaces up to `arr_n[i]` occurrences
+/// of `needle[0]` with `to_array[i]`.
+///
+/// This is a specialized version of `general_replace` for scalar needles that
+/// uses a single bulk comparison for better performance.
+fn general_replace_with_scalar<O: OffsetSizeTrait>(
+ list_array: &GenericListArray<O>,
+ needle: &ArrayRef,
+ to_array: &ArrayRef,
+ arr_n: &[i64],
+) -> Result<ArrayRef> {
+ let mut offsets: Vec<O> = vec![O::usize_as(0)];
Review Comment:
Using
[`OffsetBufferBuilder`](https://docs.rs/arrow/latest/arrow/array/struct.OffsetBufferBuilder.html)
provides a nicer API for doing these operations (can just push length)
##########
datafusion/functions-nested/src/replace.rs:
##########
@@ -412,63 +473,165 @@ fn general_replace<O: OffsetSizeTrait>(
)?))
}
-fn array_replace_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
- let [array, from, to] = take_function_args("array_replace", args)?;
+/// For each element of `list_array[i]`, replaces up to `arr_n[i]` occurrences
+/// of `needle[0]` with `to_array[i]`.
+///
+/// This is a specialized version of `general_replace` for scalar needles that
+/// uses a single bulk comparison for better performance.
+fn general_replace_with_scalar<O: OffsetSizeTrait>(
+ list_array: &GenericListArray<O>,
+ needle: &ArrayRef,
Review Comment:
I think `needle` should be a `Scalar` in the arguments here, to make it
clear this is the scalar (without needing to read the docstring)
- This can be passed in all the way from `replace_with_scalar_needle` for
example, since its still a `ScalarValue` at that point
##########
datafusion/functions-nested/src/replace.rs:
##########
@@ -412,63 +473,165 @@ fn general_replace<O: OffsetSizeTrait>(
)?))
}
-fn array_replace_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
- let [array, from, to] = take_function_args("array_replace", args)?;
+/// For each element of `list_array[i]`, replaces up to `arr_n[i]` occurrences
+/// of `needle[0]` with `to_array[i]`.
+///
+/// This is a specialized version of `general_replace` for scalar needles that
+/// uses a single bulk comparison for better performance.
+fn general_replace_with_scalar<O: OffsetSizeTrait>(
+ list_array: &GenericListArray<O>,
+ needle: &ArrayRef,
+ to_array: &ArrayRef,
+ arr_n: &[i64],
+) -> Result<ArrayRef> {
+ let mut offsets: Vec<O> = vec![O::usize_as(0)];
+ let values = list_array.values();
+ let original_data = values.to_data();
+ let to_data = to_array.to_data();
+ let capacity = Capacities::Array(original_data.len());
- // replace at most one occurrence for each element
- let arr_n = vec![1; array.len()];
- match array.data_type() {
- DataType::List(_) => {
- let list_array = array.as_list::<i32>();
- general_replace::<i32>(list_array, from, to, &arr_n)
+ let mut mutable = MutableArrayData::with_capacities(
+ vec![&original_data, &to_data],
+ false,
+ capacity,
+ );
+
+ let mut valid = NullBufferBuilder::new(list_array.len());
Review Comment:
I don't think we need a builder for nulls, we can copy the input array null
buffer as is
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]