Re: [PR] Support reverse parquet scan and fast parquet order inversion at row group level [datafusion]

via GitHub Sun, 23 Nov 2025 04:07:08 -0800


zhuqi-lucas commented on code in PR #18817:
URL: https://github.com/apache/datafusion/pull/18817#discussion_r2554023909



##########
datafusion/common/src/config.rs:
##########
@@ -836,6 +836,13 @@ config_namespace! {
         /// writing out already in-memory data, such as from a cached
         /// data frame.
         pub maximum_buffered_record_batches_per_stream: usize, default = 2
+
+        /// Enable reverse scan optimization for ORDER BY ... DESC queries
+        /// on sorted Parquet files. When enabled, row groups and batches
+        /// are read in reverse order to eliminate sort operations.
+        /// Note: This buffers one row group at a time (typically ~128MB).
+        /// Default: true
+        pub enable_reverse_scan: bool, default = true

Review Comment:
   Note: I defaulted the reverse scan optimization to true; we can default it 
to false if you think it's risky in some cases.
   
   The key risk is memory overhead. Because the reversal happens at the row 
group level, we need to buffer all the batches of a row group before emitting 
them, so a large maximum row group size will lead to high memory usage.
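
   To make the memory trade-off concrete, here is a minimal sketch of 
row-group-level reversal. It is not the PR's implementation: `Batch`, 
`reverse_row_group`, and `reverse_scan` are hypothetical names, and a plain 
`Vec<i64>` stands in for an Arrow record batch. It only illustrates that the 
peak buffered size is bounded by the largest row group, not by the file.

```rust
/// Hypothetical stand-in for a record batch: one column of ascending values.
type Batch = Vec<i64>;

/// Reverse one row group: buffer all of its batches, then emit them
/// last-to-first with the rows inside each batch reversed as well.
/// Returns the reversed batches and the number of rows that were buffered.
fn reverse_row_group(batches: Vec<Batch>) -> (Vec<Batch>, usize) {
    let buffered_rows: usize = batches.iter().map(|b| b.len()).sum();
    let reversed: Vec<Batch> = batches
        .into_iter()
        .rev()
        .map(|mut b| {
            b.reverse();
            b
        })
        .collect();
    (reversed, buffered_rows)
}

/// Reverse scan over a file: visit row groups last-to-first, holding only
/// one row group in memory at a time. Returns the rows in descending order
/// and the peak number of rows buffered at once.
fn reverse_scan(row_groups: Vec<Vec<Batch>>) -> (Vec<i64>, usize) {
    let mut out = Vec::new();
    let mut peak = 0usize;
    for row_group in row_groups.into_iter().rev() {
        let (batches, buffered) = reverse_row_group(row_group);
        peak = peak.max(buffered);
        out.extend(batches.into_iter().flatten());
    }
    (out, peak)
}
```

   In this sketch, a file with two row groups of 4 and 3 rows never buffers 
more than 4 rows at once, which is why the row group size setting drives the 
memory cost.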



