> On 30 Dec 2015, at 10:16, David Rowley <david.row...@2ndquadrant.com> wrote:
>
> Hi,
>
> In [1] I suggested an idea to make improvements to the planner around the
> Equivalence Class code. Later, in [2], Tom raised concerns that this adds
> too many planning cycles for a situation that is perhaps not common enough.
> I don't want to discuss that particular patch here; I want to discuss more
> generally the dilemma of adding more smarts to the planner to allow it to
> generate a more optimal plan in order to save on execution time.
>
> In the case of the Equivalence Class Filters code, I quoted an example where
> pushing these filters down into the joined relation caused a significant
> performance improvement to a query. Now, I understand Tom's concerns with
> slowing down the planner: in cases where the query is short-running, or the
> optimisations don't apply, we could cause the query overall (including
> planning time) to perform worse. Nobody wants that, but on the other hand,
> if spending 5-10 extra microseconds during planning equates to 6 hours
> shaved off execution time, then nobody would begrudge that extra 5-10
> microseconds during planning.
>
> What I'd like to discuss here is what was touched on in that other thread:
> ways to get around this problem.
>
> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling extra
> optimisations when the planner sees that the plan will cost more than some
> given threshold. That's perhaps an option, but it may not work well for
> optimisations which must take place very early in planning, for example [4].
> Another idea came from Evgeniy [5]; it was more of a request not to do it
> this way, but nevertheless the idea was basically to add lots of GUCs to
> enable/disable each extra planner feature.

Well, my idea was to track planning/execution cost in something like
pg_stat_statements. That way we can track actual time, not estimated cost as
Simon proposed. This table could be combined with Tomas's proposal of plan
caching.
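As a rough sketch of what that could look like (the planning-time columns
below are hypothetical; today pg_stat_statements records execution time
only):

    -- total_plan_time and total_exec_time are invented columns. Statements
    -- whose execution time dwarfs their planning time are the ones where
    -- spending extra planner cycles is most likely to pay off.
    SELECT query, calls, total_plan_time, total_exec_time
    FROM pg_stat_statements
    WHERE total_exec_time > 100 * total_plan_time
    ORDER BY total_exec_time DESC
    LIMIT 20;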
> Another option which I've thought about previously was a planner_strength
> GUC, with various additional optimisations enabled at various predefined
> strength levels, so that databases which tend to spend a great deal more
> execution time compared to planning time can turn this up to see if that
> helps change the ratio. This idea is far from perfect though, as who's to
> say that planner feature X should "kick in" before planner feature Y? I've
> also often thought that it might be nice if the planner did not modify the
> Parse object, so that it would have the option to throw away what it has
> done so far and start planning all over again with the "planner_strength"
> knob turned up to the maximum, if the cost happened to indicate that the
> query was going to take a long time to execute.
>
> In reality we already have some planner features which are possible
> candidates for non-essential optimisations. For example, join removals
> likely don't apply in all that many cases, but when they do, this feature
> is a great win. So by having some sort of ability to enable/disable planner
> features, we also stand to actually speed the planner up for fast, simple
> queries.
>
> I do strongly believe that we need to come up with something to solve this
> problem. I already summarised my thoughts on the other thread.
>
> I wrote:
> > I believe that with parallel query on the horizon for 9.6 we're now
> > aiming to support bigger OLAP-type databases than ever before. So if we
> > ignore patches like this one, then it appears that we have some
> > conflicting goals in the community, as it seems that we're willing to
> > add the brawn, but we're not willing to add the brain. If this is the
> > case then it's a shame, as I think we can have both.

So I very much agree that we must also find a way to maintain support for,
and the high performance of, small OLTP databases.

> So here I'd very much like to kick off discussion on an acceptable way to
> solve this problem, in a realistic way which we're all happy with.
>
> Comments are of course welcome.
>
> [1] http://www.postgresql.org/message-id/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com
> [2] http://www.postgresql.org/message-id/30810.1449335...@sss.pgh.pa.us
> [3] http://www.postgresql.org/message-id/canp8+jlrprn4ynmsrkoqhyi-dw5jrodmot05qejhrayrsex...@mail.gmail.com
> [4] http://www.postgresql.org/message-id/cakjs1f_uz_mxtpot6epxsghsujoucrkuxyhlh06h072rdxs...@mail.gmail.com
> [5] http://www.postgresql.org/message-id/2f30ba8b-dab9-4907-9e4e-102d24256...@gmail.com
>
> --
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
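P.S. For anyone who hasn't followed the other thread, the Equivalence Class
Filters case David describes looks roughly like this (a minimal sketch;
table and column names are invented):

    -- The join condition puts d.id and f.d_id into one equivalence class,
    -- so the filter "d.id < 10" implies "f.d_id < 10". Currently only the
    -- former is applied; the patch would let the planner also apply the
    -- derived filter to the scan of "fact", shrinking the join input.
    SELECT *
    FROM fact f
    JOIN dim d ON d.id = f.d_id
    WHERE d.id < 10;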
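P.P.S. And the join-removal case, for reference (again with invented table
names):

    CREATE TABLE dim  (id int PRIMARY KEY);
    CREATE TABLE fact (id int PRIMARY KEY, d_id int REFERENCES dim);

    -- dim.id is unique and no column of dim is referenced, so the LEFT
    -- JOIN cannot change the result and the planner removes it: the plan
    -- is a plain scan of "fact" that never touches "dim".
    EXPLAIN SELECT f.id FROM fact f LEFT JOIN dim d ON f.d_id = d.id;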