Dandandan commented on code in PR #21006:
URL: https://github.com/apache/datafusion/pull/21006#discussion_r2955756061
##########
datafusion/functions-nested/src/sort.rs:
##########
@@ -208,55 +211,150 @@ fn array_sort_generic<OffsetSize: OffsetSizeTrait>(
list_array: &GenericListArray<OffsetSize>,
field: FieldRef,
sort_options: Option<SortOptions>,
+) -> Result<ArrayRef> {
+ let values = list_array.values();
+
+ if values.data_type().is_primitive() {
+ array_sort_direct(list_array, field, sort_options)
+ } else {
+ array_sort_batch_indices(list_array, field, sort_options)
+ }
+}
+
+/// Sort each row using `compute::sort()` and concatenate the results.
+///
+/// This is efficient for primitive element types because Arrow's sort kernel
+/// does the sorting in-place.
+fn array_sort_direct<OffsetSize: OffsetSizeTrait>(
+ list_array: &GenericListArray<OffsetSize>,
+ field: FieldRef,
+ sort_options: Option<SortOptions>,
) -> Result<ArrayRef> {
let row_count = list_array.len();
+ let values = list_array.values();
- let mut array_lengths = vec![];
- let mut arrays = vec![];
+ let mut array_lengths = Vec::with_capacity(row_count);
+ let mut sorted_arrays = Vec::with_capacity(row_count);
for i in 0..row_count {
if list_array.is_null(i) {
array_lengths.push(0);
} else {
let arr_ref = list_array.value(i);
Review Comment:
This existing code seems _very_ inefficient to me: it converts each list
into an individual array, sorts each one, and then concatenates all of those
small arrays.
I think one could write a sort kernel that sorts the lists directly in a
target buffer, which would be much faster.
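To illustrate roughly what I mean, here is a minimal sketch using plain slices instead of Arrow buffers. `sort_lists_into_buffer` is a hypothetical name, not an existing arrow-rs or DataFusion function, and the sketch assumes `i64` elements, `usize` offsets, and no null handling. The child values are copied once into a target buffer, and each row's range is then sorted in place using the list offsets, so there are no per-row arrays and no final concat.

```rust
/// Hypothetical helper (not part of arrow-rs or DataFusion): sort every list
/// row directly inside one target buffer.
///
/// `values` stands in for the child values of the list array and `offsets`
/// for its offsets: row `i` covers `values[offsets[i]..offsets[i + 1]]`.
/// Null rows and null elements are ignored in this sketch.
fn sort_lists_into_buffer(values: &[i64], offsets: &[usize], descending: bool) -> Vec<i64> {
    // No offsets means no rows, so there is nothing to copy or sort.
    let (start, end) = match (offsets.first(), offsets.last()) {
        (Some(&s), Some(&e)) => (s, e),
        _ => return Vec::new(),
    };

    // Single allocation: copy only the range the rows actually reference,
    // which also covers list arrays whose offsets do not start at 0.
    let mut out = values[start..end].to_vec();

    // Sort each row's sub-slice in place; offsets are rebased by `start`.
    for w in offsets.windows(2) {
        let row = &mut out[w[0] - start..w[1] - start];
        if descending {
            row.sort_unstable_by(|a, b| b.cmp(a));
        } else {
            row.sort_unstable();
        }
    }
    out
}
```

With `values = [3, 1, 2, 9, 7, 5, 4]` and `offsets = [0, 3, 3, 7]`, each of the three rows (the middle one empty) is sorted independently within the same buffer. A real kernel would additionally have to carry over the null buffer, rebase the output offsets, and dispatch on the element type.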
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]