GitHub user berkaysynnada added a comment to the discussion: Best practices for 
memory-efficient deduplication of pre-sorted Parquet files

> > Yes, please, I actually did some testing today,
> > 
> > * [Entire input is resorted when the data is partially sorted (not using 
> > `PartialSortExec`) 
> > #16899](https://github.com/apache/datafusion/issues/16899)
> > * [Add partial_sort.slt test for partially sorted data 
> > #16900](https://github.com/apache/datafusion/pull/16900)
> 
> I noticed for [Add partial_sort.slt test for partially sorted data 
> #16900](https://github.com/apache/datafusion/pull/16900), a related change: 
> #16881 was made by @berkaysynnada. Would their change solve this issue?

I didn't notice any `PartialSortExec` in your plans. With the current DataFusion 
configuration, `PartialSortExec` is only planned when the source is unbounded, so 
I don't think #16881 will change the behavior in your scenario.
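
For context, here is a rough sketch of the idea behind a partial sort, in plain Python. It is not DataFusion's implementation; `partial_sort`, `prefix_len` and `key_len` are names made up for illustration. It assumes the input already arrives sorted on the first `prefix_len` columns, so only each run of equal prefix values needs to be buffered and sorted on the remaining key columns.

```python
from itertools import groupby


def partial_sort(rows, prefix_len, key_len):
    """Yield rows sorted by their first `key_len` columns.

    Assumption: `rows` is already sorted by its first `prefix_len` columns.
    Only one run of equal prefix values is held in memory at a time.
    """
    # groupby only merges adjacent rows, which is enough because the
    # prefix columns are assumed to be sorted already.
    for _, run in groupby(rows, key=lambda r: r[:prefix_len]):
        # Sort the run on the remaining key columns only.
        yield from sorted(run, key=lambda r: r[prefix_len:key_len])


# Hypothetical data: sorted on column 0, unsorted on column 1.
rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "z")]
result = list(partial_sort(rows, prefix_len=1, key_len=2))
```

A full sort, by contrast, has to see the entire input before it can emit anything, which is what the resorting reported in #16899 amounts to.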

GitHub link: 
https://github.com/apache/datafusion/discussions/16776#discussioncomment-13888197

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org