berkaysynnada commented on code in PR #16905:
URL: https://github.com/apache/datafusion/pull/16905#discussion_r2232864563
##########
datafusion/core/benches/partial_sort_benchmark.rs:
##########
@@ -0,0 +1,239 @@
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use datafusion::arrow::array::Int32Array;
+use datafusion::arrow::datatypes::{DataType, Field, Schema};
+use datafusion::arrow::record_batch::RecordBatch;
+use datafusion::datasource::MemTable;
+use datafusion::logical_expr::{col, SortExpr};
+use datafusion::prelude::*;
+use datafusion_common::Result;
+use std::sync::Arc;
+use tokio::runtime::Runtime;
+
+fn create_presorted_data(rows: usize, groups: usize) -> Result<RecordBatch> {
Review Comment:
Can you share these benchmark results, before and after the change, in the PR
body?
I think we need a more comprehensive analysis before applying this change,
covering factors such as total row count, batch size, number of distinct prefix
values, whether a fetch value is set, cardinality of the sort columns,
parallelism, etc. If you have time, investigating these would be very helpful
for making the right call.