gene-bordegaray commented on PR #23184: URL: https://github.com/apache/datafusion/pull/23184#issuecomment-4827385756
@2010YOUY01 thank you for the comments and suggestions. I do think that `KeyPartitioned` and `HashPartitioned` can probably collapse into a single distribution type (I prefer `KeyPartitioned` as hash is / has been histroically misleading). I also believe my current implementation of co-partitioning is satisfying the same contract as described here. Since `Distribution` was not designed for multiple children I like the current `co_partitioned_with()` API as it can accept any `Distribution`s and check them without a need to redesign the API. I would be ok with supporting aggregations before multi-input operators. Something I would like to point out though, is if we are going to collapse `KeyPartitioned` and `HashPartitioned`, and have `RangePartitioning` satisfy the `KeyPartitioned` type in the public `satisfaction()` method, this will apply to the checks in `EnforceDistribution` which applies repartitioning across many operators. If we take this approach we could add explicit checks of what operator we are checking distribuion for in `EnforceDistribution` and just start with aggregations or add private helpers for each before having `RangePartitioning` saisfy `KeyPartitioned` After thinking about it for a bit I would prefer adding private helpers for each operator and then have general satisfaction for `RangePartitioning` once a good amount of operators are supported. Let me know what you think 😄 cc: @gabotechs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
