ShashidharM0118 commented on issue #18595: URL: https://github.com/apache/datafusion/issues/18595#issuecomment-3538482301
Hey everyone! I've been following this discussion and agree with @LiaCastaneda that Option 2 (Physical Planner approach) is better since we can choose `AggregateMode::Single` upfront for small datasets, avoiding the backwards compatibility issues of skipping repartitions after `FinalPartitioned` mode is set. As @NGA-TRAN mentioned, I'd like to start with Parquet files since they already have row count statistics available. I'm thinking we could add a check in the physical planner that uses the row count to decide the aggregate mode. Would it be okay if I work on this?
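
For illustration, a rough sketch of what a row-count check like this might look like. It uses local stand-in types, not DataFusion's real planner API: the `RowCount` enum, the `choose_aggregate_mode` function, and the `SMALL_INPUT_ROW_THRESHOLD` value are all hypothetical, and the real `AggregateMode` has more variants than the two shown here. Trusting only exact counts is also an assumption, not something settled in the discussion.

```rust
// Local stand-ins for illustration only; these are NOT DataFusion's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AggregateMode {
    // One aggregation pass, no repartition in between.
    Single,
    // Stands in for the usual partial + final (repartitioned) plan.
    FinalPartitioned,
}

// Hypothetical simplification of a row-count statistic that may be
// exact, estimated, or missing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowCount {
    Exact(usize),
    Inexact(usize),
    Absent,
}

// Hypothetical cutoff; a real value would need benchmarking and would
// probably be a configuration option.
const SMALL_INPUT_ROW_THRESHOLD: usize = 100_000;

// Pick the aggregate mode up front from the input's row count.
// Assumption: only exact counts are trusted; estimated or missing
// statistics fall back to the multi-stage plan.
fn choose_aggregate_mode(rows: RowCount, threshold: usize) -> AggregateMode {
    match rows {
        RowCount::Exact(n) if n <= threshold => AggregateMode::Single,
        _ => AggregateMode::FinalPartitioned,
    }
}
```

With this sketch, `choose_aggregate_mode(RowCount::Exact(500), SMALL_INPUT_ROW_THRESHOLD)` yields `AggregateMode::Single`, while an `Inexact` or `Absent` count keeps the multi-stage plan regardless of size.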
