irenjj commented on PR #15993: URL: https://github.com/apache/datafusion/pull/15993#issuecomment-2865547053
> Hi @irenjj. Thank you working on this, but did you try to set that config to false and run the tests? > > I think people should see here how the plans are changed if we do so (perhaps you can open another PR for that -- I don't know if there is an easier way) > > As I said in the discussion, I afraid our physical rules are not ready for this. we might either miss some sort eliminations, or possibly eliminate some subquery sorts even if config says opposite. Therefore; I was suggesting enabling this option from lower level to higher. If people still consent this, I couldn't say more but we should add some warnings at least. Yep, I've tried to set config to false and it can forbid eliminating sort in subquery. I'll add a test. >If I approach this case practically, when "order by" clauses are given in subqueries: these are converted into SortExecs at somewhere in the plan. However, in enforce_sorting, we don't track ordering requirements through SortExecs directly (otherwise, we wouldn't be able to eliminate truly necessary SortExecs). Instead, we track the requirement by inserting OutputRequirementExec at the top of the plan, which corresponds to the global ordering - that is the ordering expected when an explicit ORDER BY is given in the outermost query. > > If we decide to introduce a new config for this setting, we first need to improve the optimizer phase. Specifically, OutputRequirementExecs could be inserted into intermediate nodes in the plan, meaning the subplan beneath must guarantee to bring the required ordering at that point. So, TLDR, unless we adapt the current enforce_sorting rule accordingly, we risk losing the ability to eliminate unnecessary SortExecs in some cases. This is a very good approach. We can implement it later to replace the current approach of adding configuration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
