Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

via GitHub Tue, 21 Oct 2025 06:42:15 -0700


alamb commented on code in PR #17337:
URL: https://github.com/apache/datafusion/pull/17337#discussion_r2448213631



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2593,6 +2608,68 @@ impl PartialOrd for Window {
     }
 }
 
+/// Communicates the desired ordering of the output of a scan operation.
+///
+/// Preferred orderings can potentially help DataFusion optimize queries, even 
in cases
+/// when the output does not completely follow that order. This is information 
passed
+/// to the scan about what might help.
+///
+/// For example, a query with `ORDER BY time DESC LIMIT 10`, DataFusion's 
dynamic

Review Comment:
   How about also linking to the blog that explains this in more detail: 
https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/



##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -247,6 +248,8 @@ impl Optimizer {
             Arc::new(EliminateOuterJoin::new()),
             // Filters can't be pushed down past Limits, we should do 
PushDownFilter after PushDownLimit
             Arc::new(PushDownLimit::new()),
+            // Sort pushdown should happen before filter pushdown to maximize 
optimization opportunities

Review Comment:
   Most of the other sort operations happen at the ExecutionPlan layer. Can you 
remind me why this pushdown is happening at the LogicalLevel? 



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2593,6 +2608,68 @@ impl PartialOrd for Window {
     }
 }
 
+/// Communicates the desired ordering of the output of a scan operation.

Review Comment:
   Most of this text is about the preferred_ordering field, so maybe it would 
be best moved closer there.
   
   I can imagine potentially adding other fields like `required_ordering` in 
the future, which could communicate if the scan was required (if/when we extend 
the ExecutionPlan API to communicate what type of sort pushdowns are supported 
🤔 )



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2593,6 +2608,68 @@ impl PartialOrd for Window {
     }
 }
 
+/// Communicates the desired ordering of the output of a scan operation.
+///
+/// Preferred orderings can potentially help DataFusion optimize queries, even 
in cases
+/// when the output does not completely follow that order. This is information 
passed
+/// to the scan about what might help.
+///
+/// For example, a query with `ORDER BY time DESC LIMIT 10`, DataFusion's 
dynamic
+/// predicates and TopK operator will work better if the data is roughly 
ordered by descending
+/// time (more recent data first).
+///
+/// Implementers of [`TableProvider`] should use this information to optimize 
the order in which data is output from the scan.
+///
+/// It is a hint and not a requirement:
+/// - If this information is completely ignored, e.g. data is scanned 
randomly, the query will still be correct because a sort will be applied to the 
data.
+/// - Partially ordered data will also be re-sorted but this may result in 
optimizations like early stopping, additional data pruning, reduced memory 
usage during the sort, etc.
+/// - If the scan produces exactly the requested ordering, and sets it's 
properties to reflect this, upstream sorts may be optimized away.
+///
+/// Actually removing unnecessary sorts is done at the physical plan level: 
logical operators like a join may or may not preserve ordering
+/// depending on what physical operator is chosen (e.g. HashJoin vs. 
SortMergeJoin).
+/// If you as a [`TableProvider`] implementer would like to eliminiate 
unnecessary sorts you should make sure the [`ExecutionPlan`]
+/// you produce reflects the ordering in it's properties.
+///
+/// [`TableProvider`]: 
https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html
+/// [`ExecutionPlan`]: 
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
+#[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Default)]
+pub struct ScanOrdering {
+    /// Optional preferred ordering for the scan that matches the output order 
of upstream query nodes.
+    /// It is optional / best effort for the scan to produce this ordering.
+    /// If the scan produces this exact ordering and sets it's properties to 
reflect this upstream sorts may be optimized away.
+    /// Otherwise the sorts may remain in place but partial ordering may be 
exploited e.g. to do early stopping or reduce complexity of the sort.
+    /// Thus it is recommended for the scan to also do a best effort to 
produce partially sorted data if possible.
+    pub preferred_ordering: Option<Vec<SortExpr>>,

Review Comment:
   It comes down to what does a `Vec` of zero length mean? That would imply to 
me there is no preferred ordering, so an Option<Vec<SortExpr>> is redundant
   
   I would personally recommend making this non `pub` and adding an accessor 
like `fn preferred_ordering(&self) -> Option<&Vec<...>> {}` and documenting in 
doc comments what the invariants are



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

Reply via email to