GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files
I created a GitHub issue summarizing the relevant details. See: `Streaming Aggregate operator not being used in deduplication of pre-sorted Parquet files` #16919.

@alamb, let me know what other help I can provide from my end.

GitHub link: https://github.com/apache/datafusion/discussions/16776#discussioncomment-13893368
