Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread Evgeniy Shishkin

> On 30 Dec 2015, at 10:16, David Rowley  wrote:
> 
> Hi,
> 
> On [1] I suggested an idea to make improvements to the planner around the 
> Equivalence Class code. Later in [2] Tom raised concerns with this adding too 
> many planning cycles for a perhaps not common enough situation.  I don't want 
> to discuss that particular patch here; rather, I want to discuss more generally 
> the dilemma of adding more smarts to the planner so that it can generate a 
> more optimal plan and save on execution time.
> 
> In the case of the Equivalence Class Filters code, I quoted an example where 
> pushing these filters down into the joined relation caused a significant 
> performance improvement to a query. Now, I understand Tom's concerns with 
> slowing down the planner: in cases where the query is short-running, or 
> the optimisations don't apply, we could cause the query overall 
> (including planning time) to perform worse. Nobody wants that, but on the other 
> hand, if spending 5-10 extra microseconds during planning equates to 6 hours 
> shaved off execution time, then nobody would begrudge that extra 5-10 
> microseconds during planning.
> 
> What I'd like to discuss here is something touched on in that other thread: 
> ways to get around this problem.
> 
> A number of ideas were suggested on the other thread about how we might go 
> about solving this problem. In [3] Simon talked about perhaps enabling extra 
> optimisations when the planner sees that the plan will cost more than some 
> given threshold. That's perhaps an option, but may not work well for 
> optimisations which must take place very early in planning, for example [4].
> Another idea which came up was from Evgeniy [5], which was more of a request 
> not to do it this way, but nevertheless the idea was basically to add lots 
> of GUCs to enable/disable each extra planner feature.
> 

Well, my idea was to track planning/execution cost in something like 
pg_stat_statements.
That way we can track actual time, not estimated cost as Simon proposed.

This table could also be combined with Tomas's proposal of plan caching.
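The idea above can be sketched in a few lines. This is an illustrative simulation only (not PostgreSQL code, and the ratio threshold is an invented number): a per-query accumulator in the spirit of pg_stat_statements that tracks *actual* planning and execution time, so the ratio can tell the planner which queries deserve extra planning effort.

```python
from collections import defaultdict

class QueryStats:
    """Accumulated timing for one normalised query."""
    def __init__(self):
        self.calls = 0
        self.plan_time_ms = 0.0
        self.exec_time_ms = 0.0

stats = defaultdict(QueryStats)

def record(query_id, plan_ms, exec_ms):
    s = stats[query_id]
    s.calls += 1
    s.plan_time_ms += plan_ms
    s.exec_time_ms += exec_ms

def deserves_expensive_planning(query_id, ratio_threshold=100.0):
    """True when measured execution time dwarfs measured planning time,
    so extra planner cycles are likely to pay for themselves."""
    s = stats[query_id]
    if s.plan_time_ms == 0:
        return False
    return s.exec_time_ms / s.plan_time_ms >= ratio_threshold

# A short OLTP lookup: planning is a large fraction of total time.
record("oltp_lookup", plan_ms=0.05, exec_ms=0.2)
# A long OLAP report: planning cost is noise next to execution.
record("olap_report", plan_ms=2.0, exec_ms=3_600_000.0)

assert not deserves_expensive_planning("oltp_lookup")
assert deserves_expensive_planning("olap_report")
```

Unlike a cost threshold, this keys the decision off observed history, so a query that once ran for hours gets the expensive planning on its next arrival.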


> Another option which I've thought about previously was a planner_strength 
> GUC, with which various additional optimisations are enabled at various 
> predefined strength levels, so that databases which tend to spend a great 
> deal more execution time than planning time can turn this up a bit to 
> see if that helps change the ratio.  This idea is far from perfect 
> though, as who's to say that planner feature X should "kick in" before 
> planner feature Y? I've also often thought that it might be nice to have it 
> so the planner does not modify the Parse object, so that the planner has the 
> option to throw away what it's done so far and start planning all over again 
> with the "planner_strength" knob turned up to the maximum, if the cost 
> happened to indicate that the query was going to take a long time to execute.
> 
> In reality we already have some planner features which are possible 
> candidates for non-essential optimisations. For example, join removals likely 
> don't apply in all that many cases, but when they do, this feature is a great 
> win. So by having some sort of ability to enable/disable planner features we 
> also stand to actually speed the planner up for fast simple queries. 
> 
> I do strongly believe that we need to come up with something to solve this 
> problem. I already summarised my thoughts on the other thread.
> 
> I wrote:
> > I believe that with parallel query on the horizon for 9.6 that we're now
> > aiming to support bigger OLAP-type databases than ever before. So if we
> > ignore patches like this one then it appears that we have some conflicting
> > goals in the community as it seems that we're willing to add the brawn, but
> > we're not willing to add the brain. If this is the case then it's a shame,
> > as I think we can have both. So I very much agree on the fact that we must
> > find a way to maintain support and high performance of small OLTP databases
> > too.
> 
> So here I'd very much like to kick off discussion on an acceptable way to 
> solve this problem, in a realistic way which we're all happy with.
> 
> Comments are of course welcome.
> 
> [1] 
> http://www.postgresql.org/message-id/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com
> [2] http://www.postgresql.org/message-id/30810.1449335...@sss.pgh.pa.us
> [3] 
> http://www.postgresql.org/message-id/canp8+jlrprn4ynmsrkoqhyi-dw5jrodmot05qejhrayrsex...@mail.gmail.com
> [4] 
> http://www.postgresql.org/message-id/cakjs1f_uz_mxtpot6epxsghsujoucrkuxyhlh06h072rdxs...@mail.gmail.com
> [5] 
> http://www.postgresql.org/message-id/2f30ba8b-dab9-4907-9e4e-102d24256...@gmail.com
> 
> -- 
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread David Rowley
On 30 December 2015 at 21:12, Benedikt Grundmann wrote:

> On Wed, Dec 30, 2015 at 7:16 AM, David Rowley <
> david.row...@2ndquadrant.com> wrote:
>
>>
>> A number of ideas were suggested on the other thread about how we might
>> go about solving this problem. In [3] Simon talked about perhaps enabling
>> extra optimisations when the planner sees that the plan will cost more than
>> some given threshold. That's perhaps an option, but may not work well for
>> optimisations which must take place very early in planning, for example [4].
>>
>
> A small tweak on [3] to deal with [4].  If the returned plan cost is quite
> high (say you estimate minutes+) you could just restart planning from
> scratch with all costly planning enabled, because even in the worst case
> (that is the additional options don't find a better plan), the total
> planning cost won't matter much in the grand scheme of things.
>

I do personally quite like this idea. The extra logic could quite likely be
added to the planner() function so that it calls standard_planner() again
in the event that the cost exceeds some specified threshold. I think the
planner might need a little bit of work before replanning on the same parse
is OK, as there are places where the planner modifies this object in ways
that cause things not to work well during the replan. So if we went down
this route, the first steps should be to find alternative ways to do things
so that the parse is never edited, and to set a new standard that the parse
cannot be changed within the planner anymore.
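The control flow being discussed can be sketched as follows. This is a simulation, not PostgreSQL source: the cost numbers, the threshold, and the stand-in standard_planner() are all invented for illustration; the real work, as noted above, is making the planner leave the parse tree untouched so a second pass is even possible.

```python
REPLAN_COST_THRESHOLD = 10_000.0  # invented value for illustration

def standard_planner(parse, enable_costly_optimisations):
    # Stand-in for the real standard_planner(): assume some plans get
    # dramatically cheaper when the costly optimisations are enabled.
    base_cost = parse["base_cost"]
    if enable_costly_optimisations and parse["optimisable"]:
        return {"cost": base_cost / 100.0}
    return {"cost": base_cost}

def planner(parse):
    # First pass: fast planning, no extra smarts.  Note this only works
    # because our stand-in never mutates `parse`; today's planner does.
    plan = standard_planner(parse, enable_costly_optimisations=False)
    if plan["cost"] > REPLAN_COST_THRESHOLD:
        # The plan is expensive enough that extra planning time is noise,
        # so retry with everything enabled and keep the cheaper result.
        replan = standard_planner(parse, enable_costly_optimisations=True)
        if replan["cost"] < plan["cost"]:
            plan = replan
    return plan

cheap = planner({"base_cost": 50.0, "optimisable": True})
big = planner({"base_cost": 5_000_000.0, "optimisable": True})
assert cheap["cost"] == 50.0      # below threshold: single fast pass
assert big["cost"] == 50_000.0    # replanned: far cheaper plan found
```

Short queries pay nothing extra, and the worst case for a big query is one wasted planning pass whose cost is negligible next to its execution time.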

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread Benedikt Grundmann
On Wed, Dec 30, 2015 at 7:16 AM, David Rowley wrote:

>
> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling
> extra optimisations when the planner sees that the plan will cost more than
> some given threshold. That's perhaps an option, but may not work well for
> optimisations which must take place very early in planning, for example [4].
>

A small tweak on [3] to deal with [4].  If the returned plan cost is quite high
(say you estimate minutes+) you could just restart planning from scratch
with all costly planning enabled, because even in the worst case (that is
the additional options don't find a better plan), the total planning cost
won't matter much in the grand scheme of things.


[HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-29 Thread David Rowley
Hi,

On [1] I suggested an idea to make improvements to the planner around the
Equivalence Class code. Later in [2] Tom raised concerns with this adding
too many planning cycles for a perhaps not common enough situation.  I
don't want to discuss that particular patch here; rather, I want to discuss
more generally the dilemma of adding more smarts to the planner so that it
can generate a more optimal plan and save on execution time.

In the case of the Equivalence Class Filters code, I quoted an example
where pushing these filters down into the joined relation caused a
significant performance improvement to a query. Now, I understand Tom's
concerns with slowing down the planner: in cases where the query is
short-running, or the optimisations don't apply, we could cause the
query overall (including planning time) to perform worse. Nobody wants
that, but on the other hand, if spending 5-10 extra microseconds during
planning equates to 6 hours shaved off execution time, then nobody would
begrudge that extra 5-10 microseconds during planning.

What I'd like to discuss here is something touched on in that other thread:
ways to get around this problem.

A number of ideas were suggested on the other thread about how we might go
about solving this problem. In [3] Simon talked about perhaps enabling
extra optimisations when the planner sees that the plan will cost more than
some given threshold. That's perhaps an option, but may not work well for
optimisations which must take place very early in planning, for example [4].
Another idea which came up was from Evgeniy [5], which was more of a
request not to do it this way, but nevertheless the idea was basically
to add lots of GUCs to enable/disable each extra planner feature.

Another option which I've thought about previously was a planner_strength
GUC, with which various additional optimisations are enabled at various
predefined strength levels, so that databases which tend to spend a great
deal more execution time than planning time can turn this up a bit
to see if that helps change the ratio.  This idea is far from
perfect though, as who's to say that planner feature X should "kick in"
before planner feature Y? I've also often thought that it might be nice to
have it so the planner does not modify the Parse object, so that the
planner has the option to throw away what it's done so far and start
planning all over again with the "planner_strength" knob turned up to the
maximum, if the cost happened to indicate that the query was going to take
a long time to execute.
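A minimal sketch of what such a knob might look like (the feature names and levels here are invented for illustration, not existing GUCs): each optional optimisation declares the minimum strength at which it kicks in. The arbitrariness of that ordering is exactly the objection raised above.

```python
# Hypothetical table of optional optimisations and the minimum
# planner_strength level at which each would be enabled.
OPTIONAL_OPTIMISATIONS = [
    ("join_removal",              1),
    ("equivalence_class_filters", 2),
    ("exhaustive_join_search",    3),
]

def enabled_optimisations(planner_strength):
    """Return the optimisations active at the given strength level."""
    return [name for name, min_level in OPTIONAL_OPTIMISATIONS
            if planner_strength >= min_level]

assert enabled_optimisations(0) == []
assert enabled_optimisations(1) == ["join_removal"]
assert enabled_optimisations(3) == ["join_removal",
                                    "equivalence_class_filters",
                                    "exhaustive_join_search"]
```

The appeal is a single knob instead of one GUC per feature; the weakness is that nothing in the model justifies why any feature sits at level 1 rather than level 3.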

In reality we already have some planner features which are possible
candidates for non-essential optimisations. For example, join removals
likely don't apply in all that many cases, but when they do, this feature
is a great win. So by having some sort of ability to enable/disable planner
features we also stand to actually speed the planner up for fast simple
queries.
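To make the join-removal example concrete, here is a much-simplified illustration of the idea (not the real PostgreSQL implementation, and the query representation is invented): a LEFT JOIN can be dropped when nothing above the join references the inner side's columns and the join key is unique on the inner side, so removing it cannot change the rows returned.

```python
def removable_joins(query):
    """Return the inner relations whose LEFT JOINs are provably no-ops
    under the (simplified) conditions described above."""
    removable = []
    for join in query["left_joins"]:
        inner_name = join["inner_rel"]
        unused = inner_name not in query["referenced_rels"]
        unique = join["inner_key_is_unique"]
        if unused and unique:
            removable.append(inner_name)
    return removable

# SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id   -- b.id unique
query = {
    "referenced_rels": {"a"},
    "left_joins": [
        {"inner_rel": "b", "inner_key_is_unique": True},
    ],
}
assert removable_joins(query) == ["b"]

# If the join key were not unique, the join could duplicate rows of
# "a", so it must be kept.
query["left_joins"][0]["inner_key_is_unique"] = False
assert removable_joins(query) == []
```

The check itself is cheap, but it still costs cycles on every query, which is what makes it a candidate for being optional in simple OLTP workloads.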

I do strongly believe that we need to come up with something to solve this
problem. I already summarised my thoughts on the other thread.

I wrote:
> I believe that with parallel query on the horizon for 9.6 that we're now
> aiming to support bigger OLAP-type databases than ever before. So if we
> ignore patches like this one then it appears that we have some conflicting
> goals in the community, as it seems that we're willing to add the brawn but
> we're not willing to add the brain. If this is the case then it's a shame,
> as I think we can have both. So I very much agree on the fact that we must
> find a way to maintain support and high performance of small OLTP
> databases too.

So here I'd very much like to kick off discussion on an acceptable way to
solve this problem, in a realistic way which we're all happy with.

Comments are of course welcome.

[1]
http://www.postgresql.org/message-id/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com
[2] http://www.postgresql.org/message-id/30810.1449335...@sss.pgh.pa.us
[3]
http://www.postgresql.org/message-id/canp8+jlrprn4ynmsrkoqhyi-dw5jrodmot05qejhrayrsex...@mail.gmail.com
[4]
http://www.postgresql.org/message-id/cakjs1f_uz_mxtpot6epxsghsujoucrkuxyhlh06h072rdxs...@mail.gmail.com
[5]
http://www.postgresql.org/message-id/2f30ba8b-dab9-4907-9e4e-102d24256...@gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services