GitHub user zheniasigayev added a comment to the discussion: Best practices for 
memory-efficient deduplication of pre-sorted Parquet files

> Yes, please, I actually did some testing today,
> 
> * [Entire input is resorted when the data is partially sorted (not using 
> `PartialSortExec`) #16899](https://github.com/apache/datafusion/issues/16899)
> * [Add partial_sort.slt test for partially sorted data 
> #16900](https://github.com/apache/datafusion/pull/16900)

Regarding [Add partial_sort.slt test for partially sorted data 
#16900](https://github.com/apache/datafusion/pull/16900): I noticed that 
@berkaysynnada made a related change in 
https://github.com/apache/datafusion/pull/16881. Would that change solve 
this issue?


GitHub link: 
https://github.com/apache/datafusion/discussions/16776#discussioncomment-13882839

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org

