Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
> I don't have a problem with the third explanation ;-). The issue here
> is really planner speed relative to execution speed, and that's not so
> hardware-sensitive as all that. Yeah, you can plan a 12-join query
> way faster than ten years ago, but you can execute it way faster too,
> and that's what drives expectations for planning speed. Flat-planning
> a 15-way query costs just as much more relative to a 12-way query as
> it did ten years ago.

Even granted that the ratio has stayed the same (and I'd think planning speed would increase slightly faster than execution speed, no?), the underlying factor is that there is a magic sweet spot at which we start making tradeoffs for the user. Even if a flat 15-way is the same relative to a 12-way, the absolute numbers should be the prime consideration. In other words, if the average flat plan-plus-execute time for a 13-way join on an average box was 15 seconds in 2000 but 2 seconds in 2009, I would presume it would now be worth considering 13 as needing default GEQO coverage.

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200912040823
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Wed, Dec 2, 2009 at 10:53 PM, Tom Lane wrote:
> Robert Haas writes:
>> Well, when I was testing, I believe I observed that an n-way join with
>> 1 cross join was slower to plan than an n-way join with no cross
>> joins. ISTM that it should actually be faster, because you should
>> plan it like an (n-1)-way join and then do the cross join at the end.
>
> It's not entirely clear to me what case you're describing, but I wonder
> whether this was a "flat" join problem or restricted by the collapse
> limits.

Argh. I can't reproduce exactly what I thought I was seeing before. However, with the attached schema, geqo off, and the collapse thresholds set to 100, "explain select * from bar2_view" and "explain select * from bar3_view" have roughly the same run time. The two views are identical except that one of the join clauses has been omitted in the second case. One would think that the second case could be planned faster, if we planned to just leave the cross join to the end. (And in fact, on my system, if you remove bar8 from the second view entirely, the plan time improves by a factor of six.)

...Robert

join2.sql
Description: Binary data
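[For readers without the attachment, a rough sketch of the shape being described; the table and view definitions here are invented stand-ins, not the actual contents of join2.sql:]

  -- bar3_view is bar2_view minus one join clause, leaving a cross join
  -- that the planner could, in principle, save for last.
  CREATE TABLE bar1 (id int PRIMARY KEY);
  CREATE TABLE bar2 (id int PRIMARY KEY, bar1_id int);
  CREATE TABLE bar8 (id int PRIMARY KEY, bar1_id int);

  CREATE VIEW bar2_view AS
    SELECT bar1.id
    FROM bar1
    JOIN bar2 ON bar2.bar1_id = bar1.id
    JOIN bar8 ON bar8.bar1_id = bar1.id;

  CREATE VIEW bar3_view AS
    SELECT bar1.id
    FROM bar1
    JOIN bar2 ON bar2.bar1_id = bar1.id
    CROSS JOIN bar8;                      -- the omitted join clause

  SET geqo TO off;
  SET join_collapse_limit TO 100;
  SET from_collapse_limit TO 100;
  EXPLAIN SELECT * FROM bar2_view;
  EXPLAIN SELECT * FROM bar3_view;  -- one might expect this to plan faster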
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> Well, when I was testing, I believe I observed that an n-way join with
> 1 cross join was slower to plan than an n-way join with no cross
> joins. ISTM that it should actually be faster, because you should
> plan it like an (n-1)-way join and then do the cross join at the end.

It's not entirely clear to me what case you're describing, but I wonder whether this was a "flat" join problem or restricted by the collapse limits.

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Wed, Dec 2, 2009 at 10:32 PM, Tom Lane wrote:
> Robert Haas writes:
>> Not sure what you mean. There's already a special-case code path for
>> cross joins; but I think it's probably considering a lot of silly
>> paths. Is there a case where it makes sense to do cross joins at some
>> stage of the process other than last?
>
> They *are* done last, as a rule, because of the heuristic that prefers to
> join where there's a join clause.

Well, when I was testing, I believe I observed that an n-way join with 1 cross join was slower to plan than an n-way join with no cross joins. ISTM that it should actually be faster, because you should plan it like an (n-1)-way join and then do the cross join at the end.

> (However I've gotten negative comments
> about that --- some people think that when joining small detail tables
> to a big fact table, it'd be better to cross-join the detail tables and
> then do one multi-clause join to the big table. I'm unconvinced myself
> but there does seem to be more than one school of thought about it.)

Sounds weird to me. There might be a sweet spot where that's true (3 or 4 detail tables with 2 or 3 rows each, that aren't too wide?), but even if there is, I bet it's not very big. If someone cares, though, it should be possible to convince the planner to execute the query that way (using OFFSET 0, maybe) and benchmark it vs. whatever the planner wants to do otherwise.

...Robert
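[A minimal sketch of the OFFSET 0 trick, with invented tables: a subquery containing OFFSET 0 is not flattened into the outer query, so it acts as a fence that forces the detail tables to be cross-joined first, followed by one multi-clause join to the fact table:]

  -- Invented schema: one big fact table, three tiny detail tables.
  CREATE TABLE fact (d1_id int, d2_id int, d3_id int, payload text);
  CREATE TABLE d1 (id int PRIMARY KEY, name text);
  CREATE TABLE d2 (id int PRIMARY KEY, name text);
  CREATE TABLE d3 (id int PRIMARY KEY, name text);

  -- The OFFSET 0 subquery is a planner fence: the detail tables are
  -- cross-joined first, then joined to the fact table in one step.
  EXPLAIN ANALYZE
  SELECT *
  FROM (SELECT d1.id AS d1_id, d2.id AS d2_id, d3.id AS d3_id
        FROM d1, d2, d3
        OFFSET 0) details
  JOIN fact f ON f.d1_id = details.d1_id
             AND f.d2_id = details.d2_id
             AND f.d3_id = details.d3_id;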
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> Not sure what you mean. There's already a special-case code path for
> cross joins; but I think it's probably considering a lot of silly
> paths. Is there a case where it makes sense to do cross joins at some
> stage of the process other than last?

They *are* done last, as a rule, because of the heuristic that prefers to join where there's a join clause. (However I've gotten negative comments about that --- some people think that when joining small detail tables to a big fact table, it'd be better to cross-join the detail tables and then do one multi-clause join to the big table. I'm unconvinced myself, but there does seem to be more than one school of thought about it.)

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Wed, Dec 2, 2009 at 9:55 PM, Tom Lane wrote:
>> One other thing I'm noticing about the current implementation is that
>> it seems to spend an entirely excessive amount of brain power
>> considering the best order in which to execute cross-joins. If I do
>> X, A JOIN B ON Pab JOIN C ON Pac JOIN D ON Pad JOIN E ON Pae, it looks
>> to me like join_search_one_level() will try joining X to each of A-E.
>> That seems fairly pointless; why would I ever want to join X to
>> anything other than {A B C D E}?
>
> Not sure that a lot of cross joins with no conditions is the case to
> design around. Usually queries aren't that devoid of features of
> interest, and so different join paths are actually usefully different.

Not sure what you mean. There's already a special-case code path for cross joins; but I think it's probably considering a lot of silly paths. Is there a case where it makes sense to do cross joins at some stage of the process other than last?

>> ... We should maybe also
>> think about raising the default value for work_mem. It's hard for me
>> to believe that the average Postgres user wants a sort that takes more
>> than 1MB of memory to spill to disk; there certainly are people who
>> probably want that, but I doubt there are very many. I believe we've
>> been using that value for a decade, and memory size has increased a
>> lot in that time.
>
> Maybe. I'll certainly grant that machines have more memory, but is the
> average Postgres installation using that to run bigger sorts, or to run
> more sorts (either more concurrent queries or more complex queries
> containing more sorts)? We know that increasing work_mem too much
> can be counterproductive, and much sooner than one might think.

A further confounding factor is that work_mem also controls memory usage for hash tables - whereas the original sort_mem did not - and at least in my experience it's more common to have multiple hashes in a query than multiple sorts. It would be nice to have some data on this rather than just hand-waving, but I'm not sure how to get it. For default_statistics_target, the *_collapse_limit settings, and geqo_threshold, we were able to construct worst-case queries and benchmark them. I have no idea how to do something comparable for work_mem.

...Robert
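[The 1MB spill behavior itself is easy to observe; a small demonstration with an invented table, assuming a server recent enough to report the sort method in EXPLAIN ANALYZE:]

  CREATE TABLE spill_test AS SELECT generate_series(1, 500000) AS x;

  SET work_mem = '1MB';
  EXPLAIN ANALYZE SELECT * FROM spill_test ORDER BY x;
  -- expect: Sort Method: external merge

  SET work_mem = '64MB';
  EXPLAIN ANALYZE SELECT * FROM spill_test ORDER BY x;
  -- expect: Sort Method: quicksort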
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> Actually, I think Tom made some changes for 8.5 that should eliminate
> the randomness, if not the badness. Or am I misremembering?

It was mostly Andres' work, see
http://archives.postgresql.org/pgsql-committers/2009-07/msg00148.php

> One other thing I'm noticing about the current implementation is that
> it seems to spend an entirely excessive amount of brain power
> considering the best order in which to execute cross-joins. If I do
> X, A JOIN B ON Pab JOIN C ON Pac JOIN D ON Pad JOIN E ON Pae, it looks
> to me like join_search_one_level() will try joining X to each of A-E.
> That seems fairly pointless; why would I ever want to join X to
> anything other than {A B C D E}?

Not sure that a lot of cross joins with no conditions is the case to design around. Usually queries aren't that devoid of features of interest, and so different join paths are actually usefully different.

> ... We should maybe also
> think about raising the default value for work_mem. It's hard for me
> to believe that the average Postgres user wants a sort that takes more
> than 1MB of memory to spill to disk; there certainly are people who
> probably want that, but I doubt there are very many. I believe we've
> been using that value for a decade, and memory size has increased a
> lot in that time.

Maybe. I'll certainly grant that machines have more memory, but is the average Postgres installation using that to run bigger sorts, or to run more sorts (either more concurrent queries or more complex queries containing more sorts)? We know that increasing work_mem too much can be counterproductive, and much sooner than one might think.

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Wed, Dec 2, 2009 at 5:08 PM, Greg Sabino Mullane wrote:
>>> What about 14? Could we at least raise it to 14? 1/2 :)
>
>> I doubt we can raise it at all without lying to ourselves about the
>> likely results of so doing. The GEQO planning times are in the low
>> double digits of milliseconds. My apps typically have a budget of at
>> most ~200 ms to plan and execute the query, and I'm not always
>> operating on empty tables.
>
> Well, this might be addressed elsewhere, but it's not just the planning
> time, it's the non-repeatable plans when you hit geqo. That tends to
> drive my clients mad (as in very confused, and then angry). And the plans
> that geqo comes up with are seldom very good ones either. So yes, it's an
> adjustable knob, but I'd rather see it default to 0 or >= 14.

Actually, I think Tom made some changes for 8.5 that should eliminate the randomness, if not the badness. Or am I misremembering?

One other thing I'm noticing about the current implementation is that it seems to spend an entirely excessive amount of brain power considering the best order in which to execute cross-joins. If I do X, A JOIN B ON Pab JOIN C ON Pac JOIN D ON Pad JOIN E ON Pae, it looks to me like join_search_one_level() will try joining X to each of A-E. That seems fairly pointless; why would I ever want to join X to anything other than {A B C D E}?

> Well, it's more a matter of consensus on the Right Thing To Do rather
> than a Simple Matter of Coding. Some of the more interesting conversations
> over the years have been on what to set the defaults to (random_page_cost,
> anyone?). The conflict is then "real world anecdotes" versus "test-backed,
> data-driven numbers". It's hard to get real numbers on many things,
> especially when people are using Postgres on a staggeringly large
> collection of hardware, database sizes, activity levels, etc. For many
> knobs (both in postgresql.conf and elsewhere) there's always a balance to
> strike between benefitting the most people and adversely impacting the
> fewest. The project has been very, very conservative in this respect,
> which is why they need people like me who keep pushing in the other
> direction. Even if I secretly agree with Tom 99% of the time. :)

Heh. Well, we did raise default_statistics_target quite a bit for 8.4. I suggested raising from_collapse_limit, join_collapse_limit, and geqo_threshold, but diligent experimentation by Tom and Andres Freund revealed this idea to suck. I think it's an interesting area for more work to try to eliminate the suckage, but I don't think we're there yet. I don't think I remember the last round of debates about random_page_cost and seq_page_cost; the current values probably are too high for most people, because typically you've got a lot of stuff cached.

We should maybe also think about raising the default value for work_mem. It's hard for me to believe that the average Postgres user wants a sort that takes more than 1MB of memory to spill to disk; there certainly are people who probably want that, but I doubt there are very many. I believe we've been using that value for a decade, and memory size has increased a lot in that time.

...Robert
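[To make that shape concrete, a sketch with placeholder tables: x has no join clause connecting it to anything, so every pairing of x with a subset of {a b c d e} is a cross join, yet join_search_one_level() still evaluates joining x to each of a-e individually:]

  CREATE TABLE x (id int);
  CREATE TABLE a (id int);
  CREATE TABLE b (a_id int);
  CREATE TABLE c (a_id int);
  CREATE TABLE d (a_id int);
  CREATE TABLE e (a_id int);

  EXPLAIN
  SELECT *
  FROM x, a
  JOIN b ON b.a_id = a.id   -- Pab
  JOIN c ON c.a_id = a.id   -- Pac
  JOIN d ON d.a_id = a.id   -- Pad
  JOIN e ON e.a_id = a.id;  -- Pae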
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
"Greg Sabino Mullane" writes: >> The reason this is a configurable parameter is so that >> people can tune it to their own needs. I think the current >> default fits all right with our usual policy of being >> conservative about hardware requirements. > That only makes sense if you adjust it accordingly over time. > It's been 12 for a long time - since January 2004 - while > hardware has radically improved in that time, which means that > either 12 was too high five years ago, is too low now, or is very > insensitive to the speed of the hardware. I submit it's probably > more of the second option. I don't have a problem with the third explanation ;-). The issue here is really planner speed relative to execution speed, and that's not so hardware-sensitive as all that. Yeah, you can plan a 12-join query way faster than ten years ago, but you can execute it way faster too, and that's what drives expectations for planning speed. Flat-planning a 15-way query costs just as much more relative to a 12-way query as it did ten years ago. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
>> What about 14? Could we at least raise it to 14? 1/2 :)
>
> I doubt we can raise it at all without lying to ourselves about the
> likely results of so doing. The GEQO planning times are in the low
> double digits of milliseconds. My apps typically have a budget of at
> most ~200 ms to plan and execute the query, and I'm not always
> operating on empty tables.

Well, this might be addressed elsewhere, but it's not just the planning time, it's the non-repeatable plans when you hit geqo. That tends to drive my clients mad (as in very confused, and then angry). And the plans that geqo comes up with are seldom very good ones either. So yes, it's an adjustable knob, but I'd rather see it default to 0 or >= 14.

>> I'm not sure I agree with the premise that there is a problem in need
>> of fixing. I think we're usually pretty good about fixing things
>> when there is a simple, straightforward fix. When the only real fixes
>> involve writing a lot of code, we tend to be good about fixing them
>> when - and only when - someone is willing and able to write that code.
>> Often that's not the case, but that's an economic problem more than a
>> process problem. And then there are cases (like CRCs) where we can't
>> even figure out what the code would look like, and then we tend to do
>> nothing, but what's the other choice?
>
>> Obviously you see this issue differently so I'd like to hear more of
>> your thoughts.

Well, it's more a matter of consensus on the Right Thing To Do rather than a Simple Matter of Coding. Some of the more interesting conversations over the years have been on what to set the defaults to (random_page_cost, anyone?). The conflict is then "real world anecdotes" versus "test-backed, data-driven numbers". It's hard to get real numbers on many things, especially when people are using Postgres on a staggeringly large collection of hardware, database sizes, activity levels, etc. For many knobs (both in postgresql.conf and elsewhere) there's always a balance to strike between benefitting the most people and adversely impacting the fewest. The project has been very, very conservative in this respect, which is why they need people like me who keep pushing in the other direction. Even if I secretly agree with Tom 99% of the time. :)

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200912021705
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
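[One partial workaround for the repeatability complaint, sketched with invented tables: geqo_seed is an existing knob, and on servers where GEQO has its own random number generator (the 8.5-era change discussed elsewhere in this thread), a fixed seed plus identical query text should yield the same plan on every run; on older releases GEQO shares the backend-wide random state, so this helps less:]

  CREATE TABLE g1 (id int);
  CREATE TABLE g2 (id int);
  CREATE TABLE g3 (id int);

  SET geqo TO on;
  SET geqo_threshold = 2;   -- artificially low, just to force GEQO here
  SET geqo_seed = 0.0;      -- any fixed value in [0.0, 1.0]
  EXPLAIN SELECT * FROM g1, g2, g3
  WHERE g1.id = g2.id AND g2.id = g3.id;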
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
> The reason this is a configurable parameter is so that
> people can tune it to their own needs. I think the current
> default fits all right with our usual policy of being
> conservative about hardware requirements.

That only makes sense if you adjust it accordingly over time. It's been 12 for a long time - since January 2004 - while hardware has radically improved in that time, which means that either 12 was too high five years ago, is too low now, or is very insensitive to the speed of the hardware. I submit it's probably more of the second option.

The postgresql.conf file has been supporting "a toaster" for a long time now, but we don't seem to recognize that the minimum toaster^H^Hhardware encountered in the wild changes quite a bit from year to year. IMHO.

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200912021651
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Fri, Nov 27, 2009 at 8:23 PM, Tom Lane wrote:
>> I don't think there's any easy fix for this.
>
> Nope :-(. I was able to get rid of the specific O(N^2) behavior that
> you were running into, but that just pushes the problem out a bit.

Yeah. Testing a couple of the cases I was looking at last night suggests that your patch shaved off close to 30%, which isn't bad for a Friday evening's work. My original email was really about some GEQO difficulties under 8.3.x, but I think the effort towards improving the standard planner is well spent, because the empirical evidence suggests that everyone always prefers to use the standard planner whenever possible.

> FWIW, a profile of CVS HEAD running on this query looks like
>
> samples   %        symbol name
> 208189    15.9398  compare_fuzzy_path_costs
> 116260     8.9014  add_path
> 97509      7.4657  AllocSetAlloc
> 78354      5.9991  make_canonical_pathkey
> 69027      5.2850  generate_join_implied_equalities
> 65153      4.9884  cost_nestloop
> 59689      4.5700  bms_is_subset
> 49176      3.7651  add_paths_to_joinrel
> 39720      3.0411  bms_overlap
> 32562      2.4931  cost_mergejoin
> 30101      2.3047  pathkeys_useful_for_merging
> 26409      2.0220  AllocSetFree
> 24459      1.8727  MemoryContextAllocZeroAligned
> 23247      1.7799  hash_search_with_hash_value
> 16018      1.2264  check_list_invariants
>
> which is at least unsurprising. However, it eats memory fast enough
> to start swapping after level 7 or so, so there is no way that the
> exhaustive planner is ever going to finish this problem.

Yes, that looks similar to what I've seen in the past. The thing about this is that in a case like this one, the actual join order barely matters at all, because the joins are mostly equivalent. None of them change the cardinality of the output set, and while in a real application the various foo{i}, bar{i}, baz{i}, and bletch{i} would be of somewhat different sizes, it typically doesn't matter very much which one you do first. If there were a filter criterion on, say, bar7_id, you'd want to do the joins necessary to allow the filter to be applied as early as possible, and then the order of the rest doesn't matter. Unfortunately, it's hard to know how to formalize that.

>> Playing around with this a bit, I was easily able to get 2-second
>> planning times on a 15 table join, 6-second planning times on a 16 table
>> join, and 30-second planning times on a 17 table join. This makes it
>> hard to support raising the GEQO threshold, as most recently suggested
>> by Greg Sabino Mullane here (and previously by me on an earlier
>> thread):
>
> Yeah. I think GEQO or other approximate methods are the only hope for
> very large join problems. We could however hope to get to the point of
> being able to raise the collapse limits, rather than keeping them
> below the geqo_threshold by default. In principle that should result
> in better plans ...

Yes. The from_collapse_limit implementation is all but guaranteed to generate bad plans on queries like "SELECT * FROM foo_view WHERE id = 1". It would be OK to constrain the search space at any given step by considering joins to only a subset of the relations to which a join would actually be legal - what we need to avoid is anything that encourages joining relations to each other before they've been joined to foo1, which is exactly what from_collapse_limit does. If from_collapse_limit could be read to mean "once you've joined anything to a relation within bar_view, you must finish joining to everything else in bar_view before you can think about joining to anything else", it would generate tolerable plans. Of course, it also wouldn't constrain the search space nearly as effectively.

...Robert
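[An invented illustration of the failure mode, not the schema from the actual test case: when a view's join list exceeds from_collapse_limit, the view is planned as a separate subproblem, so its join order cannot be interleaved with the outer query's, and its member relations get joined to each other before anything is joined to the relation carrying the selective clause:]

  CREATE TABLE foo1 (id int PRIMARY KEY);
  CREATE TABLE detail1 (foo1_id int);
  CREATE TABLE detail2 (foo1_id int);
  CREATE TABLE detail3 (foo1_id int);

  CREATE VIEW sub_view AS
    SELECT d1.foo1_id
    FROM detail1 d1
    JOIN detail2 d2 ON d2.foo1_id = d1.foo1_id
    JOIN detail3 d3 ON d3.foo1_id = d1.foo1_id;

  SET from_collapse_limit = 2;    -- artificially low, to show the effect
  EXPLAIN SELECT * FROM foo1 JOIN sub_view s ON s.foo1_id = foo1.id
  WHERE foo1.id = 1;

  SET from_collapse_limit = 100;  -- flattened: one big join search instead
  EXPLAIN SELECT * FROM foo1 JOIN sub_view s ON s.foo1_id = foo1.id
  WHERE foo1.id = 1;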
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
marcin mank writes:
> This is pretty off-topic, but if we had some upper bound on the cost
> of the complete plan, we could discard pieces of the plan that already
> cost more.
> One way to get the upper bound is to generate the plan in depth-first
> fashion, instead of the current breadth-first. Instead of bottom-up
> dynamic programming, employ memoization.

Well, the breadth-first approach also allows elimination of large parts of the search space. It's not immediately obvious to me that the above would be better. You should be able to try it if you want though --- these days, there's even a hook to allow replacement of the join search strategy at runtime.

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> On Mon, Nov 9, 2009 at 1:10 PM, Tom Lane wrote:
>> Yowza. 18000 distinct paths for one relation? Could we see the test
>> case?
>
> Hey, wait a minute. Unless I'm missing something, the items being
> accumulated here are not paths (as Tom said upthread and I naively
> believed), but RelOptInfos.

Yeah, I failed to think carefully about that.

> It looks like we create a RelOptInfo for
> every legal join order, so this is going to happen on any large-ish
> query where the joins can be reordered relatively freely.

Actually, it's more like a RelOptInfo for every combination of base rels that could be formed along any legal join path (where "legal" includes the heuristics that avoid clauseless joins when possible). So the worst case is 2^n for n base rels, but usually considerably fewer. Nonetheless, even a small fraction of 2^37 is Too Many.

> I don't think there's any easy fix for this.

Nope :-(. I was able to get rid of the specific O(N^2) behavior that you were running into, but that just pushes the problem out a bit.

FWIW, a profile of CVS HEAD running on this query looks like

samples   %        symbol name
208189    15.9398  compare_fuzzy_path_costs
116260     8.9014  add_path
97509      7.4657  AllocSetAlloc
78354      5.9991  make_canonical_pathkey
69027      5.2850  generate_join_implied_equalities
65153      4.9884  cost_nestloop
59689      4.5700  bms_is_subset
49176      3.7651  add_paths_to_joinrel
39720      3.0411  bms_overlap
32562      2.4931  cost_mergejoin
30101      2.3047  pathkeys_useful_for_merging
26409      2.0220  AllocSetFree
24459      1.8727  MemoryContextAllocZeroAligned
23247      1.7799  hash_search_with_hash_value
16018      1.2264  check_list_invariants

which is at least unsurprising. However, it eats memory fast enough to start swapping after level 7 or so, so there is no way that the exhaustive planner is ever going to finish this problem.

> Playing around with this a bit, I was easily able to get 2-second
> planning times on a 15 table join, 6-second planning times on a 16 table
> join, and 30-second planning times on a 17 table join. This makes it
> hard to support raising the GEQO threshold, as most recently suggested
> by Greg Sabino Mullane here (and previously by me on an earlier
> thread):

Yeah. I think GEQO or other approximate methods are the only hope for very large join problems. We could however hope to get to the point of being able to raise the collapse limits, rather than keeping them below the geqo_threshold by default. In principle that should result in better plans ...

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Sat, Nov 28, 2009 at 12:04 AM, Tom Lane wrote:
> It's not so much so-many-paths as so-many-join-relations that's killing
> it. I put some instrumentation into join_search_one_level to count the
> number of joinrels it was creating, and got this before getting bored:

This is pretty off-topic, but if we had some upper bound on the cost of the complete plan, we could discard pieces of the plan that already cost more.

One way to get the upper bound is to generate the plan in depth-first fashion, instead of the current breadth-first. Instead of bottom-up dynamic programming, employ memoization.

The doubt I have is that this could turn out not to be a win, because to discard a sub-plan we would have to consider the startup cost, not the total cost, and therefore we would not be discarding enough to make it worthwhile. But I thought I'd mention it anyway, in case someone has a better idea :)

Greetings
Marcin Mańk
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Fri, Nov 27, 2009 at 3:05 PM, Robert Haas wrote:
> On Mon, Nov 9, 2009 at 1:42 PM, Robert Haas wrote:
>> On Mon, Nov 9, 2009 at 1:10 PM, Tom Lane wrote:
>>> Robert Haas writes:
>>>> On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane wrote:
>>>>> Too bad you don't have debug symbols ... it'd be interesting to see
>>>>> how long that list is.
>>>
>>>> I stopped it a couple of times. Lengths of list1, list2 respectively:
>>>> 8876, 20 14649, 18 15334, 10 17148, 18 18173, 18
>>>
>>> Yowza. 18000 distinct paths for one relation? Could we see the test
>>> case?
>>
>> Well, the test case isn't simple, and I'm not sure that my employer
>> would be pleased if I posted it to a public mailing list. The general
>> thrust of it is that [...]
>
> Test case attached. Load this into an empty database and then:
>
> set geqo to off;
> set join_collapse_limit to 100;
> set from_collapse_limit to 100;
> select * from foo_view order by name;
>
> I guess it's somewhat unsurprising that you can make the planner get
> into trouble with the above settings - we've been over that ground
> before. But it might be possibly interesting that such a simple
> schema is capable of generating so many paths. This breaks 40,000
> without finishing.

Hey, wait a minute. Unless I'm missing something, the items being accumulated here are not paths (as Tom said upthread and I naively believed), but RelOptInfos. It looks like we create a RelOptInfo for every legal join order, so this is going to happen on any large-ish query where the joins can be reordered relatively freely.

I don't think there's any easy fix for this. When the joins can be reordered relatively freely, the number of legal join orders is roughly n!. It might be possible to reduce the overhead associated with a single one of those join orderings (which consists of a RelOptInfo, some supporting data structures, and however many paths), but given the explosive growth of n! that won't buy very much at all, and it won't make the lists shorter in any case.

Playing around with this a bit, I was easily able to get 2-second planning times on a 15 table join, 6-second planning times on a 16 table join, and 30-second planning times on a 17 table join. This makes it hard to support raising the GEQO threshold, as most recently suggested by Greg Sabino Mullane here (and previously by me on an earlier thread):

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01106.php

What sucks about the current geqo_threshold mechanism is that the number of tables is a fairly coarse predictor of the number of legal join orders. On some queries it is close to n! - on others it is far less. It would be nice if we had a way to either (1) approximate the number of legal join orders before we start planning the query, and base the GEQO decision on that number rather than on the number of tables, or (2) always start with the standard (exhaustive) planner, and switch to some kind of approximation algorithm (GEQO or otherwise) if we find that to be impractical. But neither of these seems at all straightforward.

...Robert
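[Measurements of that sort can be reproduced without a proprietary schema; a sketch with invented tables: a star of empty tables joined only through the hub can be reordered almost freely, so planning time explodes even though execution is trivial. Time the EXPLAIN itself, e.g. with psql's \timing:]

  CREATE TABLE hub (id int PRIMARY KEY);
  CREATE TABLE s1 (hub_id int);   CREATE TABLE s2 (hub_id int);
  CREATE TABLE s3 (hub_id int);   CREATE TABLE s4 (hub_id int);
  CREATE TABLE s5 (hub_id int);   CREATE TABLE s6 (hub_id int);
  CREATE TABLE s7 (hub_id int);   CREATE TABLE s8 (hub_id int);
  CREATE TABLE s9 (hub_id int);   CREATE TABLE s10 (hub_id int);
  CREATE TABLE s11 (hub_id int);  CREATE TABLE s12 (hub_id int);
  CREATE TABLE s13 (hub_id int);  CREATE TABLE s14 (hub_id int);
  CREATE TABLE s15 (hub_id int);

  SET geqo TO off;
  SET join_collapse_limit TO 100;
  SET from_collapse_limit TO 100;
  \timing on
  EXPLAIN SELECT *
  FROM hub h
  JOIN s1  ON s1.hub_id  = h.id  JOIN s2  ON s2.hub_id  = h.id
  JOIN s3  ON s3.hub_id  = h.id  JOIN s4  ON s4.hub_id  = h.id
  JOIN s5  ON s5.hub_id  = h.id  JOIN s6  ON s6.hub_id  = h.id
  JOIN s7  ON s7.hub_id  = h.id  JOIN s8  ON s8.hub_id  = h.id
  JOIN s9  ON s9.hub_id  = h.id  JOIN s10 ON s10.hub_id = h.id
  JOIN s11 ON s11.hub_id = h.id  JOIN s12 ON s12.hub_id = h.id
  JOIN s13 ON s13.hub_id = h.id  JOIN s14 ON s14.hub_id = h.id
  JOIN s15 ON s15.hub_id = h.id;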
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> I guess it's somewhat unsurprising that you can make the planner get
> into trouble with the above settings - we've been over that ground
> before. But it might be possibly interesting that such a simple
> schema is capable of generating so many paths.

It's not so much so-many-paths as so-many-join-relations that's killing it. I put some instrumentation into join_search_one_level to count the number of joinrels it was creating, and got this before getting bored:

join_search_one_level(2): 36 total, 36 by clause, 0 by clauseless, 0 bushy, 0 lastditch; paths/rel min 1 max 4 avg 3.06
join_search_one_level(3): 192 total, 384 by clause, 0 by clauseless, 0 bushy, 0 lastditch; paths/rel min 1 max 6 avg 4.70
join_search_one_level(4): 865 total, 2373 by clause, 0 by clauseless, 222 bushy, 0 lastditch; paths/rel min 1 max 8 avg 6.20
join_search_one_level(5): 3719 total, 12637 by clause, 0 by clauseless, 2239 bushy, 0 lastditch; paths/rel min 2 max 10 avg 7.81
join_search_one_level(6): 15387 total, 62373 by clause, 0 by clauseless, 14562 bushy, 0 lastditch; paths/rel min 3 max 12 avg 9.43
join_search_one_level(7): 59857 total, 283371 by clause, 0 by clauseless, 75771 bushy, 0 lastditch; paths/rel min 4 max 15 avg 10.92

So the number of paths per relation seems to be staying manageable, but the number of join relations is going through the roof. On the other hand, you've got 37 base relations here, and (37 choose 7) is 10295472, so the planner is doing reasonably well at pruning the search tree.

Anyway, the immediate problem is that list_concat_unique_ptr is a bad way of forming the lists of relations with a certain number of members. It strikes me that that's easily fixable given that we know when we are constructing a new RelOptInfo how many members it has. We could add the rel to the right list at that time, instead of waiting until we get back up to join_search_one_level. It implies making the joinrels[] array part of the planner's global state instead of local to make_one_rel and join_search_one_level, but for the sorts of list lengths we're seeing here it seems clearly worthwhile.

Maybe I'll go try that while I'm sitting here digesting leftover turkey.

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Mon, Nov 9, 2009 at 1:10 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane wrote:
>>> Too bad you don't have debug symbols ... it'd be interesting to see
>>> how long that list is.
>
>> I stopped it a couple of times. Lengths of list1, list2 respectively:
>
>> 8876, 20
>> 14649, 18
>> 15334, 10
>> 17148, 18
>> 18173, 18
>
> Yowza. 18000 distinct paths for one relation? Could we see the test
> case?

Well, the test case isn't simple, and I'm not sure that my employer would be pleased if I posted it to a public mailing list. The general thrust of it is that there is a view, let's call it foo_view, of the following form, where foo[1-6] are base tables:

foo1
  JOIN bar_view
  JOIN baz_view
  JOIN foo3
  LEFT JOIN foo4
  LEFT JOIN foo5
  LEFT JOIN foo6

bar_view is of the following form, bar[1-14] being base tables:

bar1, bletch_view, bar2, bar3, bar4, bar5, bar6, bar7
  LEFT JOIN bar8
  LEFT JOIN bar9
  LEFT JOIN bar10
  LEFT JOIN bar11
  LEFT JOIN bar12
  LEFT JOIN bar13
  LEFT JOIN bar14

baz_view is of the following form, baz[1-9] being base tables:

baz1, baz2, baz3
  JOIN baz4
  LEFT JOIN baz5
  LEFT JOIN baz6
  LEFT JOIN baz7
  LEFT JOIN baz8
  LEFT JOIN baz9

bletch_view is of the following form, bletch[1-9] being base tables:

bletch1, bletch2
  LEFT JOIN bletch3
  LEFT JOIN bletch4
  LEFT JOIN bletch5
  LEFT JOIN bletch6
  LEFT JOIN bletch7
  LEFT JOIN bletch8
  LEFT JOIN bletch9

Since the webapp front-end gives users a choice of which columns to pull down, values from most of these tables can potentially appear in the output. There are a handful of rels in bar_view none of whose attributes can possibly be needed in the output, so I may make a slightly stripped down version of bar_view just for this purpose, and keep the original one around for other queries. I've already done this for bletch_view, which is a significantly stripped-down version of a more complex view that is used in other queries.

Most if not all of the joins are from some random column of the left-hand relation to the primary key of the right-hand relation. There are no Cartesian products. Most of the base tables have a unique index on the primary key and no other indices, although a few of them have one or two additional indices.

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane wrote:
>> Too bad you don't have debug symbols ... it'd be interesting to see
>> how long that list is.

> I stopped it a couple of times. Lengths of list1, list2 respectively:

> 8876, 20
> 14649, 18
> 15334, 10
> 17148, 18
> 18173, 18

Yowza. 18000 distinct paths for one relation? Could we see the test case?

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane wrote:
> Robert Haas writes:
>> I've stopped the query more than 10 times now and EVERY SINGLE ONE
>> finds it in list_concat_unique_ptr(). :-(
>
> Too bad you don't have debug symbols ... it'd be interesting to see
> how long that list is.

[ teaches self how to install debug symbols ]

I stopped it a couple of times. Lengths of list1, list2 respectively:

8876, 20
14649, 18
15334, 10
17148, 18
18173, 18

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Robert Haas writes:
> I've stopped the query more than 10 times now and EVERY SINGLE ONE
> finds it in list_concat_unique_ptr(). :-(

Too bad you don't have debug symbols ... it'd be interesting to see how long that list is.

> It's also using about 12x as much RAM as the GEQO version.

No surprise. GEQO is designed to constrain its memory use, the exhaustive planner not so much.

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Monday 09 November 2009 16:28:46 Robert Haas wrote:
> On Mon, Nov 9, 2009 at 10:27 AM, Andres Freund wrote:
>> On Monday 09 November 2009 16:23:52 Robert Haas wrote:
>>> On Mon, Nov 9, 2009 at 10:21 AM, Andres Freund wrote:
>>>> On Monday 09 November 2009 16:18:10 Robert Haas wrote:
>>>>> On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
>>>>>> [snip: commit message, quoted in full in the original mail below]
>>>>> I think I just ran smack dab into this bug on 8.3.8 (RPM:
>>>>> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
>>>>> very well with the default settings so I raised the collapse limits
>>>>> and let GEQO have a crack at it. This was not a rousing success.
>>>>> It didn't actually fail, but it did this sort of thing for a real
>>>>> long time.
>>>> Yea. Seeing those backtraces all the time was what led me to use 64bit
>>>> bitmapsets...
>>>> The problem with that change is that it might change existing queries
>>>> that work well today to get very slow - I have one such case. It's just
>>>> a happenstance, but...
>>> Wait, which change can make existing queries slow? My original
>>> change, this fix by Tom, or 64-bit bitmapsets?
>> The fix by Tom - it completely changes which plans will get produced.
>> (Oh, well. Your change did as well, but nobody thought of backpatching
>> those.)
>> Although even the old plans were not really reproducible, so I guess my
>> argument isn't that strong.
> Well, we might want to look at your example then - this wasn't
> backpatched, but it's in HEAD.

Hm. It's a heuristic search - so it's not surprising that it finds a good plan with one sort of heuristic (the <=8.4 behaviour) and not with another. I guess my point is just that different plans will be found which are currently not found (because the old geqo gives up quite early).

Fixing this will probably require a much more intelligent/new heuristic planner - which is a relatively big undertaking that I see nobody really doing right now.

Andres
[HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Mon, Nov 9, 2009 at 10:18 AM, Robert Haas wrote:
> I think I just ran smack dab into this bug on 8.3.8 (RPM:
> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
> very well with the default settings so I raised the collapse limits
> and let GEQO have a crack at it. This was not a rousing success.
> It didn't actually fail, but it did this sort of thing for a real long
> time.
>
> #0 0x08192c6b in bms_overlap ()
> #1 0x081b6046 in have_join_order_restriction ()
> #2 0x081a9438 in gimme_tree ()
> #3 0x081a9515 in geqo_eval ()
> #4 0x081a9d6c in random_init_pool ()
> #5 0x081a9728 in geqo ()
[snip]

That backtrace actually bounced around a little bit - it was always in gimme_tree(), but the rest varied. If I raise the collapse limits AND disable GEQO, then I get backtraces that CONSISTENTLY look like this.

#0 0x081923bc in list_concat_unique_ptr ()
#1 0x081b6922 in join_search_one_level ()
#2 0x081ab22b in standard_join_search ()
#3 0x081abf72 in ?? ()
#4 0x081be4c3 in query_planner ()
#5 0x081beedb in ?? ()
#6 0x081c00ce in subquery_planner ()
#7 0x081c057c in standard_planner ()
#8 0x08207caa in pg_plan_query ()
#9 0x08207df4 in pg_plan_queries ()
#10 0x082083d4 in ?? ()
#11 0x08209b3f in PostgresMain ()
#12 0x081dc1e5 in ?? ()
#13 0x081dd20a in PostmasterMain ()
#14 0x08190f96 in main ()

I've stopped the query more than 10 times now and EVERY SINGLE ONE finds it in list_concat_unique_ptr(). :-(

It's also using about 12x as much RAM as the GEQO version.

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Mon, Nov 9, 2009 at 10:27 AM, Andres Freund wrote:
> On Monday 09 November 2009 16:23:52 Robert Haas wrote:
>> On Mon, Nov 9, 2009 at 10:21 AM, Andres Freund wrote:
>>> On Monday 09 November 2009 16:18:10 Robert Haas wrote:
>>>> On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
>>>>> [snip: commit message, quoted in full in the original mail below]
>>>> I think I just ran smack dab into this bug on 8.3.8 (RPM:
>>>> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
>>>> very well with the default settings so I raised the collapse limits
>>>> and let GEQO have a crack at it. This was not a rousing success.
>>>> It didn't actually fail, but it did this sort of thing for a real
>>>> long time.
>>> Yea. Seeing those backtraces all the time was what led me to use 64bit
>>> bitmapsets...
>>> The problem with that change is that it might change existing queries
>>> that work well today to get very slow - I have one such case. It's just
>>> a happenstance, but...
>> Wait, which change can make existing queries slow? My original
>> change, this fix by Tom, or 64-bit bitmapsets?
> The fix by Tom - it completely changes which plans will get produced.
> (Oh, well. Your change did as well, but nobody thought of backpatching
> those.)
>
> Although even the old plans were not really reproducible, so I guess my
> argument isn't that strong.

Well, we might want to look at your example then - this wasn't backpatched, but it's in HEAD.

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Monday 09 November 2009 16:23:52 Robert Haas wrote:
> On Mon, Nov 9, 2009 at 10:21 AM, Andres Freund wrote:
>> On Monday 09 November 2009 16:18:10 Robert Haas wrote:
>>> On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
>>>> [snip: commit message, quoted in full in the original mail below]
>>> I think I just ran smack dab into this bug on 8.3.8 (RPM:
>>> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
>>> very well with the default settings so I raised the collapse limits
>>> and let GEQO have a crack at it. This was not a rousing success.
>>> It didn't actually fail, but it did this sort of thing for a real long
>>> time.
>> Yea. Seeing those backtraces all the time was what led me to use 64bit
>> bitmapsets...
>> The problem with that change is that it might change existing queries
>> that work well today to get very slow - I have one such case. It's just
>> a happenstance, but...
> Wait, which change can make existing queries slow? My original
> change, this fix by Tom, or 64-bit bitmapsets?

The fix by Tom - it completely changes which plans will get produced. (Oh, well. Your change did as well, but nobody thought of backpatching those.)

Although even the old plans were not really reproducible, so I guess my argument isn't that strong.

Andres
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Mon, Nov 9, 2009 at 10:21 AM, Andres Freund wrote:
> On Monday 09 November 2009 16:18:10 Robert Haas wrote:
>> On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
>>> [snip: commit message, quoted in full in the original mail below]
>> I think I just ran smack dab into this bug on 8.3.8 (RPM:
>> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
>> very well with the default settings so I raised the collapse limits
>> and let GEQO have a crack at it. This was not a rousing success.
>> It didn't actually fail, but it did this sort of thing for a real long
>> time.
> Yea. Seeing those backtraces all the time was what led me to use 64bit
> bitmapsets...
>
> The problem with that change is that it might change existing queries that
> work well today to get very slow - I have one such case. It's just a
> happenstance, but...

Wait, which change can make existing queries slow? My original change, this fix by Tom, or 64-bit bitmapsets?

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Monday 09 November 2009 16:18:10 Robert Haas wrote:
> On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
>> [snip: commit message, quoted in full in the original mail below]
> I think I just ran smack dab into this bug on 8.3.8 (RPM:
> postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out
> very well with the default settings so I raised the collapse limits
> and let GEQO have a crack at it. This was not a rousing success.
> It didn't actually fail, but it did this sort of thing for a real long
> time.

Yea. Seeing those backtraces all the time was what led me to use 64bit bitmapsets...

The problem with that change is that it might change existing queries that work well today to get very slow - I have one such case. It's just a happenstance, but...

Andres
[HACKERS] Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
On Sun, Jul 19, 2009 at 4:00 PM, Tom Lane wrote:
> Log Message:
> -----------
> Rewrite GEQO's gimme_tree function so that it always finds a legal join
> sequence, even when the input "tour" doesn't lead directly to such a
> sequence. The stack logic that was added in 2004 only supported cases
> where relations that had to be joined to each other (due to join order
> restrictions) were adjacent in the tour. However, relying on a random
> search to figure that out is tremendously inefficient in large join
> problems, and could even fail completely (leading to "failed to make a
> valid plan" errors) if random_init_pool ran out of patience. It seems
> better to make the tour-to-plan transformation a little bit fuzzier so
> that every tour can form a legal plan, even though this means that
> apparently different tours will sometimes yield the same plan.
>
> In the same vein, get rid of the logic that knew that tours (a,b,c,d,...)
> are the same as tours (b,a,c,d,...), and therefore insisted the latter
> are invalid. The chance of generating two tours that differ only in
> this way isn't that high, and throwing out 50% of possible tours to
> avoid such duplication seems more likely to waste valuable genetic-
> refinement generations than to do anything useful.
>
> This leaves us with no cases in which geqo_eval will deem a tour invalid,
> so get rid of assorted kluges that tried to deal with such cases, in
> particular the undocumented assumption that DBL_MAX is an impossible
> plan cost.
>
> This is all per testing of Robert Haas' lets-remove-the-collapse-limits
> patch. That idea has crashed and burned, at least for now, but we still
> got something useful out of it.
>
> It's possible we should back-patch this change, since the "failed to make
> a valid plan" error can happen in existing releases; but I'd rather not
> until it has gotten more testing.

I think I just ran smack dab into this bug on 8.3.8 (RPM: postgresql-8.3.8-1.fc10.i386). I had a query that wasn't coming out very well with the default settings so I raised the collapse limits and let GEQO have a crack at it. This was not a rousing success. It didn't actually fail, but it did this sort of thing for a real long time.

#0 0x08192c6b in bms_overlap ()
#1 0x081b6046 in have_join_order_restriction ()
#2 0x081a9438 in gimme_tree ()
#3 0x081a9515 in geqo_eval ()
#4 0x081a9d6c in random_init_pool ()
#5 0x081a9728 in geqo ()
#6 0x081abf8d in ?? ()
#7 0x081be4c3 in query_planner ()
#8 0x081beedb in ?? ()
#9 0x081c00ce in subquery_planner ()
#10 0x081c057c in standard_planner ()
#11 0x08207caa in pg_plan_query ()
#12 0x08207df4 in pg_plan_queries ()
#13 0x082083d4 in ?? ()
#14 0x08209b3f in PostgresMain ()
#15 0x081dc1e5 in ?? ()
#16 0x081dd20a in PostmasterMain ()
#17 0x08190f96 in main ()

Nothing makes you appreciate a query that takes 3564.709 ms to run like one that takes 272302.799 ms...

...Robert