gene-bordegaray commented on code in PR #19124:
URL: https://github.com/apache/datafusion/pull/19124#discussion_r2606686879
##########
datafusion/common/src/config.rs:
##########
@@ -965,6 +965,19 @@ config_namespace! {
/// record tables provided to the MemTable on creation.
pub repartition_file_scans: bool, default = true
+ /// Minimum number of distinct partition values required to group files by their
+ /// Hive partition column values (enabling Hash partitioning declaration).
+ ///
+ /// How the option is used:
+ /// - preserve_file_partitions=0: Disable it.
+ /// - preserve_file_partitions=1: Always enable it.
+ /// - preserve_file_partitions=N, actual file partitions=M: Only enable when M >= N.
+ /// This threshold preserves I/O parallelism when file partitioning is below it.
+ ///
+ /// Note: This may reduce parallelism at the I/O level if the number of distinct
Review Comment:
Yes, you are correct. I think a better way to convey what I am trying to say
is "rooting from the I/O level".
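
For reference, the threshold rule in the doc comment under review can be sketched roughly as below. This is only an illustration, not the code in the PR: `should_preserve_file_partitions` is a hypothetical helper, `threshold` stands for the `preserve_file_partitions` setting (N), and `distinct_partitions` stands for the number of distinct Hive partition values found in the files (M).

```rust
/// Hypothetical helper (an assumption for illustration, not a DataFusion API).
/// Decides whether files are grouped by their Hive partition values,
/// following the three cases listed in the doc comment.
fn should_preserve_file_partitions(threshold: usize, distinct_partitions: usize) -> bool {
    match threshold {
        // preserve_file_partitions=0: the feature is disabled.
        0 => false,
        // preserve_file_partitions=1: always enabled, as the doc comment states.
        1 => true,
        // preserve_file_partitions=N: enabled only when M >= N, so a small
        // number of distinct partition values does not limit I/O parallelism.
        n => distinct_partitions >= n,
    }
}
```

With a threshold of 4, for example, three distinct partition values would leave grouping off, while four or more would turn it on.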
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]