zhuqi-lucas commented on PR #18817:
URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3569024898

   > Supporting scanning Parquet files in reverse order is an absolutely great 
idea. I have a few questions.
   > 
   > Let me first rephrase it to make sure I understand correctly, this PR does:
   > 
   > 1. For applicable query patterns (topK that has reverse order to the 
parquet existing order), reverse the row-group scanning order
   > 2. For each row group, first cache all the result, then reverse the 
row-level order batch by batch.
   > 
   > This implementation is quite aggressive, I think it can get a bit tricky 
to tune it right, to avoid excessive caching, or reversing rows batch by batch 
become too expensive.
   > 
   > What if we limit the initial implementation only to reverse the row-group 
order, similar to what @adriangb is planning to do at file level in #17271 
After scanning the last row-group, the topk dynamic filter will automatically 
get updated and skip the preceding row groups.
   > 
   > * The benefits are simplicity and lower risk of regressions
   > * The downside is it's too conservative and can't get the optimal 
performance. But once we have native reverse parquet decoding support in 
`arrow-rs` (that is described in the original issue [Fast parquet order 
inversionĀ #17172](https://github.com/apache/datafusion/issues/17172)), we can 
implement the reverse scan at the row level as follow-ups.
   
   Thank you @2010YOUY01 for review and valid concern:
   You raise valid concerns about memory overhead is what i mentioned the key 
risk for this approach.
   However, I want to clarify that row group reversal alone cannot eliminate 
the SortExec - it only provides TopK filtering benefits. Without reversing rows 
within each row group, the data remains in the original order (e.g., ASC when 
we need DESC), so the sort must stay. I propose we keep the complete 
optimization but default enable_reverse_scan to false. Once we implement 
page-level caching in arrow-rs (which will reduce memory overhead 
significantly), we can consider enabling it by default.
   
   And we've been running the full implementation (row group + row-level 
reversal) in production for very long time with excellent results: 10-100x 
speedups for time-series queries, well-controlled memory usage (~one row group 
cached at a time), but we need to note we should not make the row group size 
big if we enable this feature. And with very small limit, the high memory usage 
is very short time. Also the reverse time is very small compared to the benefit 
we remove all sort.
   
   And if we want to improve the original scan to support limit the initial 
implementation only to reverse the row-group order, i think we can add 
follow-up PRs because this is another optimization which can't remove the sort 
for optimization so we need to do this in another PR.
   
   What's your opinion?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to