korowa commented on code in PR #14232:
URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929071971
##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs:
&LexOrdering) -> Vec<Sort
#[cfg(test)]
mod tests {
use arrow::array::Int64Array;
+ use arrow_schema::Schema;
+ use compute::SortOptions;
+ use datafusion_physical_expr::{expressions::col, PhysicalSortExpr};
use super::*;
+ #[test]
+ fn test_last_value_with_order_bys() -> Result<()> {
+ // TODO: Move this kind of test to slt, we don't have a nice way to
define the batch size for each `update_batch`
Review Comment:
Reading data from source file (not sure about memory tables) into
aggregation with `batch_size = N` should help. Is this not working?
##########
datafusion/functions-aggregate/src/first_last.rs:
##########
@@ -569,6 +573,13 @@ impl LastValueAccumulator {
})
.collect::<Vec<_>>();
+ // Order by indices for cases where the values are the same, we expect
the last index
+ let indices: UInt64Array = (0..num_rows).map(|x| x as u64).collect();
+ sort_columns.push(SortColumn {
Review Comment:
This part is a bit concerning in terms of performance -- won't sorting by
multiple columns (due to row indices being added, now there are no cases with a
single column) be noticeably slower due to falling back to lexicographical
comparator?
If so, perhaps, it should either be worked around (to somehow sort leaving
last `is_ge()` value) or maybe it doesn't worth that, since this will work only
for single threaded queries, while, I suppose, the primarily use-case is
multithreaded execution, which messes up the original order of records in files
due to parallel reads followed by repartitioning.
\+ Not sure if current behaviour of this function is incorrect and must be
fixed, it looks more like nice-to-have feature (but that is just an opinion).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]