gene-bordegaray commented on code in PR #19304:
URL: https://github.com/apache/datafusion/pull/19304#discussion_r2628775400
##########
datafusion/common/src/config.rs:
##########
@@ -1000,6 +1000,34 @@ config_namespace! {
/// ```
pub repartition_sorts: bool, default = true
+ /// Partition count threshold for subset satisfaction optimization.
+ ///
+ /// When the current partition count is >= this threshold, DataFusion will
+ /// skip repartitioning if the required partitioning expression is a subset
+ /// of the current partition expression such as Hash(a) satisfies Hash(a, b).
+ ///
+ /// When the current partition count is < this threshold, DataFusion will
+ /// repartition to increase parallelism even when subset satisfaction applies.
+ ///
+ /// Set to 0 to always repartition (disable subset satisfaction optimization).
+ /// Set to a high value to always use subset satisfaction.
+ ///
+ /// Example (subset_satisfaction_partition_threshold = 4):
+ /// ```text
+ /// Hash([a]) satisfies Hash([a, b]) because Hash([a, b]) is a subset of Hash([a])
+ ///
+ /// If current partitions (3) < threshold (4), repartition:
+ /// AggregateExec: mode=FinalPartitioned, gby=[a, b], aggr=[SUM(x)]
+ /// RepartitionExec: partitioning=Hash([a, b], 8), input_partitions=3
+ /// AggregateExec: mode=Partial, gby=[a, b], aggr=[SUM(x)]
+ /// DataSourceExec: file_groups={...}, output_partitioning=Hash([a], 3)
+ ///
+ /// If current partitions (8) >= threshold (4), use subset satisfaction:
+ /// AggregateExec: mode=SinglePartitioned, gby=[a, b], aggr=[SUM(x)]
+ /// DataSourceExec: file_groups={...}, output_partitioning=Hash([a], 8)
+ /// ```
+ pub subset_satisfaction_partition_threshold: usize, default = 4
Review Comment:
I was battling with a name for a while 😅
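
For readers following the thread, here is a rough sketch of the decision the doc comment describes, in case it helps when weighing names. It is not the PR's implementation: `current_keys_within_required` and `skip_repartition` are hypothetical names, keys are modeled as plain strings, and the subset check follows the example plans (current `Hash([a])` against required `Hash([a, b])`). It applies only the `>=` / `<` rule from the doc comment and does not model any special handling of particular threshold values.

```rust
/// Illustrative only: these names are hypothetical, not DataFusion's API.
/// Returns true when every current hash key also appears in the required
/// keys, e.g. current Hash([a]) against required Hash([a, b]).
fn current_keys_within_required(current: &[&str], required: &[&str]) -> bool {
    !current.is_empty() && current.iter().all(|key| required.contains(key))
}

/// Applies the threshold rule from the doc comment: skip the repartition
/// only when subset satisfaction holds and the current partition count is
/// at least the threshold.
fn skip_repartition(
    current: &[&str],
    required: &[&str],
    current_partitions: usize,
    threshold: usize,
) -> bool {
    current_keys_within_required(current, required) && current_partitions >= threshold
}
```

With a threshold of 4, `skip_repartition(&["a"], &["a", "b"], 3, 4)` is false (repartition for parallelism) and `skip_repartition(&["a"], &["a", "b"], 8, 4)` is true (keep the existing partitioning), matching the two plans in the example.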