On Wed, Sep 18, 2024 at 11:48 AM Robert Haas <robertmh...@gmail.com> wrote: > Still, I think it's a pretty useful starting point. It is mostly > enough to give you control over join planning, and if combined with > similar work for scan planning, I think it would be enough for > pg_hint_plan. If we also got control over appendrel and agg planning, > then you could do a bunch more cool things.
Here's a new set of patches where I added a similar mechanism for scan type control. See the commit message for some notes on limitations of this approach. In the absence of better ideas, I'd like to proceed with something along the lines of 0001 and 0002. I upgraded the hint_via_alias contrib module (which is not intended for commit, it's just a demo) so that it can hint either scan type or join type. I think this is sufficient to demonstrate that it's quite easy to use hooks to leverage this new infrastructure. In fact, the biggest thing I'm unhappy about right now is the difficulty of providing the hooks with any sort of useful information. I don't think it should be the job of this patch set to solve that problem, but I do think we should do something about it. The problem exists on two levels: 1. If you want to specify in-query hints using comments, how does your extension get access to the comments? I realize that a lot of people hate in-query hints and hope they die in a fire, but pg_hint_plan does implement them, among other things, and the way it does that is to have a complete copy of the backend lexer so that it can re-lex the query text and pull out the comments, which it can then parse. The fact that someone was willing to do that is impressive, but it's pretty ugly. Still, it's not clear what other approach you could adopt. We could make the core system able to extract and pass through comments to extensions. We could add new syntax so that instead of saying SELECT ... FROM foo AS bar you can say SELECT ... FROM foo AS bar ADVICE 'anything you want goes here' and arrange to pass that string through to extensions. We could also spend a lot of time ranting about how this is a terrible idea on principle and therefore we shouldn't care about supporting it, but the horse is already out of the barn, so I'm not very impressed by that approach. 2. If you want a certain base relation or join relation to be treated in a certain way, how do you identify it? You might think that this is easy because, even when a query contains multiple references to a relation with the same name, or identical aliases in different parts of the query, EXPLAIN renames them so they have disjoint names. What would be nice is if you could feed those names back into your extension and use them as a way of specifying what behavior you want where. But that doesn't work, because what actually happens is that the plan can contain duplicated aliases, and then when EXPLAIN deparses it using ruleutils.c, that's when we rename things so that they're distinct. This means that, at the time we're planning, we don't yet know what name EXPLAIN will end up giving to any relation yet, which means we can't use the names that EXPLAIN produced for an earlier plan for the same query to associate behaviors with relations. I wonder if we could think about reversing the order of operations here and making it so that we do the distinct-ification during parse analysis or maybe early in planning, so that the name you see EXPLAIN print out is the real name of that thing and not just a value invented for display purposes. This second problem affects practically any use of the mechanism added by this patch, as well as things like pg_hint_plan and EDB's own internal implementation of planner hints. As far as I know, nobody has a good solution, and everybody just punts. Hints get specified by table name or table alias and then you hope that things match in the right places. This is sort of workable if hints are what you're trying to implement, but AFAICS it's a complete disaster if what you want to do is recreate automatically a plan you saw before. If you're hinting your queries and the hints aren't applying in quite the right places because of some name collisions, you can maybe adjust the query to avoid the collisions and still win. But if you are trying to recreate a previous plan, you really need to look at what happened last time and then make the same things happen in the same places this time, and how are you supposed to do that if there's no unique key that you can use to reliably identify the rels involved in the query? Before somebody says it, I do realize that these patches as proposed aren't enough to ensure reliably recreating a plan even if we had a perfect solution to this problem, but you have to start someplace. It seems fundamentally reasonable to me to say "hey, if I want to modify the planner behavior, I need a way to say which part of the query plan should get modified," and right now it appears to me that we don't have that. So again, I am definitely not saying that these patches get us all the way to where we should be -- not in terms of the ability to control the plan, and definitely not in terms of giving extensions all the information they need to be effective. But if we insist on a perfect solution before doing anything, we will never get anywhere, and I personally believe these are going in a useful direction. Comments, preferably constructive ones, welcome. -- Robert Haas EDB: http://www.enterprisedb.com
v4-0003-New-contrib-module-alphabet_join.patch
Description: Binary data
v4-0004-New-contrib-module-hint_via_alias.patch
Description: Binary data
v4-0002-Allow-extensions-to-control-scan-strategy.patch
Description: Binary data
v4-0001-Allow-extensions-to-control-join-strategy.patch
Description: Binary data