GitHub user berkaysynnada added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files

> > Yes, please, I actually did some testing today,
> >
> > * [Entire input is resorted when the data is partially sorted (not using `PartialSortExec`) #16899](https://github.com/apache/datafusion/issues/16899)
> > * [Add partial_sort.slt test for partially sorted data #16900](https://github.com/apache/datafusion/pull/16900)
>
> I noticed for [Add partial_sort.slt test for partially sorted data #16900](https://github.com/apache/datafusion/pull/16900), a related change: #16881 was made by @berkaysynnada. Would their change solve this issue?

I didn't notice any `PartialSortExec` in your plans. With the current DataFusion configuration, `PartialSortExec` only appears in a plan when the source is unbounded, so I guess #16881 won't change the behavior in your scenario.

GitHub link: https://github.com/apache/datafusion/discussions/16776#discussioncomment-13888197