Re: [HACKERS] More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)

2015-12-30 Thread Evgeniy Shishkin

> On 30 Dec 2015, at 10:16, David Rowley  wrote:
> 
> Hi,
> 
> In [1] I suggested an idea to make improvements to the planner around the 
> Equivalence Class code. Later, in [2], Tom raised concerns that this adds too 
> many planning cycles for a situation that is perhaps not common enough.  I don't want 
> to discuss that particular patch here; I want to discuss more generally the 
> dilemma of adding more smarts to the planner to allow it to generate a 
> more optimal plan in order to save on execution time.
> 
> In the case of the Equivalence Class Filters code, I quoted an example where 
> pushing these filters down into the joined relation caused a significant 
> performance improvement to a query. Now, I understand Tom's concerns about 
> slowing down the planner: in cases where the query is short-running, or 
> the optimisations don't apply, we could cause the query overall 
> (including planning time) to perform worse. Nobody wants that, but on the other 
> hand, if spending 5-10 extra microseconds during planning equates to 6 hours 
> shaved off execution time, then nobody would begrudge those extra 5-10 
> microseconds of planning.
> 
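For illustration, with hypothetical tables t1 and t2, this is roughly the kind
of query and derived filter being discussed:

    SELECT *
    FROM   t1
    JOIN   t2 ON t1.id = t2.id
    WHERE  t1.id BETWEEN 1 AND 1000;

    -- The join condition puts t1.id and t2.id in the same equivalence class,
    -- so the range qual written against t1.id could also be applied to t2's
    -- scan before the join: t2.id BETWEEN 1 AND 1000.
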
> What I'd like to discuss here is what was touched on in that other thread: 
> ways to get around this problem.
> 
> A number of ideas were suggested on the other thread about how we might go 
> about solving this problem. In [3] Simon talked about perhaps enabling extra 
> optimisations when the planner sees that the plan will cost more than some 
> given threshold. That's perhaps an option, but it may not work well for 
> optimisations which must take place very early in planning, for example [4].
> Another idea came from Evgeniy [5]; it was more of a request 
> not to do it this way, but nevertheless the idea was basically to add lots 
> of GUCs to enable/disable each extra planner feature.
> 

Well, my idea was to track planning and execution cost in something like 
pg_stat_statements.
That way we can track actual time, not estimated cost as Simon proposed.

This table can be combined with Tomas's proposal of plan caching.
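
A sketch of the kind of lookup this would enable. pg_stat_statements did not 
track planning time at the time of this thread; the query below assumes the 
columns the extension later gained in PostgreSQL 13 when 
pg_stat_statements.track_planning is enabled:

    SELECT queryid,
           calls,
           total_plan_time,
           total_exec_time,
           total_exec_time / NULLIF(total_plan_time, 0) AS exec_to_plan_ratio
    FROM   pg_stat_statements
    ORDER  BY exec_to_plan_ratio DESC NULLS LAST
    LIMIT  20;
    -- Statements at the top spend far more time executing than planning,
    -- so they are the ones where extra planner effort is most likely to pay off.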


> Another option which I've thought about previously was a planner_strength 
> GUC, where various additional optimisations are enabled at various 
> predefined strength levels, so that databases which tend to spend a great 
> deal more time on execution than on planning can turn this up a bit to 
> see if that helps change the ratio.  This idea is far from perfect 
> though, as who's to say that planner feature X should "kick in" before 
> planner feature Y? I've also often thought that it might be nice if 
> the planner did not modify the Parse object, so that it would have the 
> option to throw away what it's done so far and start planning all over again 
> with the "planner_strength" knob turned up to the maximum, if the cost 
> happened to indicate that the query was going to take a long time to execute.
> 
> In reality we already have some planner features which are possible 
> candidates for being treated as non-essential optimisations. For example, join 
> removal likely doesn't apply in all that many cases, but when it does, it is a 
> great win. So by having some sort of ability to enable/disable planner features 
> we also stand to actually speed the planner up for fast, simple queries. 
> 
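To make the join removal example concrete (illustrative schema; this
optimisation already exists today):

    CREATE TABLE users (id int PRIMARY KEY, name text);
    CREATE TABLE addr  (user_id int PRIMARY KEY REFERENCES users, city text);

    EXPLAIN SELECT u.id, u.name
    FROM users u
    LEFT JOIN addr a ON a.user_id = u.id;
    -- Because addr.user_id is unique and no column from addr is referenced,
    -- the LEFT JOIN cannot change the result, so the planner removes it and
    -- the plan is just a scan of users.
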
> I do strongly believe that we need to come up with something to solve this 
> problem. I already summarised my thoughts on the other thread.
> 
> I wrote:
> > I believe that with parallel query on the horizon for 9.6 we're now
> > aiming to support bigger OLAP-type databases than ever before. So if we
> > ignore patches like this one then it appears that we have some conflicting
> > goals in the community, as it seems that we're willing to add the brawn, but
> > we're not willing to add the brain. If this is the case then it's a shame,
> > as I think we can have both. So I very much agree that we must
> > find a way to maintain support and high performance for small OLTP databases
> > too.
> 
> So here I'd very much like to kick off discussion on an acceptable way to 
> solve this problem, in a realistic way which we're all happy with.
> 
> Comments are of course welcome.
> 
> [1] 
> http://www.postgresql.org/message-id/cakjs1f9fk_x_5hkcpcseimy16owe3empmmgsgwlckkj_rw9...@mail.gmail.com
> [2] http://www.postgresql.org/message-id/30810.1449335...@sss.pgh.pa.us
> [3] 
> http://www.postgresql.org/message-id/canp8+jlrprn4ynmsrkoqhyi-dw5jrodmot05qejhrayrsex...@mail.gmail.com
> [4] 
> http://www.postgresql.org/message-id/cakjs1f_uz_mxtpot6epxsghsujoucrkuxyhlh06h072rdxs...@mail.gmail.com
> [5] 
> http://www.postgresql.org/message-id/2f30ba8b-dab9-4907-9e4e-102d24256...@gmail.com
> 
> -- 
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Evgeniy Shishkin

> On 07 Dec 2015, at 22:27, Gavin Flower  wrote:
> 
> On 08/12/15 05:27, David G. Johnston wrote:
>> On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby wrote:
>> 
>>    On 12/6/15 10:38 AM, Tom Lane wrote:
>> 
>>    I said "in most cases".  You can find example cases to support almost
>>    any weird planner optimization no matter how expensive and
>>    single-purpose; but that is the wrong way to think about it.  What you
>>    have to think about is average cases, and in particular, not putting a
>>    drag on planning time in cases where no benefit ensues.  We're not
>>    committing any patches that give one uncommon case an 1100X speedup by
>>    penalizing every other query 10%, or even 1%; especially not when there
>>    may be other ways to fix it.
>> 
>> 
>>    This is a problem that seriously hurts Postgres in data warehousing
>>    applications. We can't keep ignoring optimizations that provide even as
>>    little as 10% execution improvements for 10x worse planner performance,
>>    because in a warehouse it's next to impossible for planning time to
>>    matter.
>> 
>>    Obviously it'd be great if there was a fast, easy way to figure out
>>    whether a query would be expensive enough to go the whole 9 yards on
>>    planning it but at this point I suspect a simple GUC would be a big
>>    improvement.
>> 
>> 
>> Something like "enable_equivalencefilters", but one that defaults to false, unlike 
>> every existing "enable_*" GUC?
>> 
>> It would be a lot more user-friendly to have something along the lines of 
>> "planner_mode (text)" with labels like "star, transactional, bulk_load, 
>> etc...", because I suspect there are other things we'd want to add if we 
>> start identifying queries by their type/usage and optimizing accordingly. 
>> Having the knobs available is necessary, but putting a façade on top would make 
>> things more straightforward for the user in the common cases.
>> 
>> David J.
>> 
> How about:
> 
> planning_time_base 10  # Default effort, may be increased or decreased as 
> required - must be at least 1
> planning_time_<feature>  0  # By default, planner makes no (or minimal) effort to 
> optimise for feature <feature>
> 
> So for some people, adjusting planning_time_base may be sufficient - but for 
> more specialised cases, people can tell the planner to consider expending 
> more effort.
> 

MySQL now has 19 optimizer_switch parameters:
https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

Please don't do that.


I'd rather like some sort of pg_stat_statements which would track execution 
and planning time.
For a new query, we could then look up whether it can benefit from more planning time.
But I don't know how costly that would be. 

> 
> Cheers,
> Gavin
> 
> 
> 





Re: [HACKERS] TODO: replica information functions

2015-07-28 Thread Evgeniy Shishkin

> On 28 Jul 2015, at 21:35, Josh Berkus  wrote:
> 
> Hackers,
> 
> Since merging recovery.conf with postgresql.conf is apparently off the
> table indefinitely, we could really use some additional information
> functions which work on the replica.  Here's my list of what I need for
> failover automation:
> 
> pg_standby_is_streaming()
>   returns true if the standby is configured for streaming and
>   is currently connected with the master.
>   returns false if the connection to the master is broken,
>   or if there is no primary_conninfo
> 

I believe we should have one function to tell whether the standby is configured for 
streaming,
and another function to tell whether everything is okay.
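
For reference, a rough approximation of pg_standby_is_streaming(), assuming a 
server with the pg_stat_wal_receiver view (added in 9.6; it does not exist in 
the releases being discussed here):

    SELECT pg_is_in_recovery() AS is_standby,
           EXISTS (SELECT 1
                   FROM pg_stat_wal_receiver
                   WHERE status = 'streaming') AS is_streaming;
    -- is_standby: the server is running in recovery;
    -- is_streaming: a WAL receiver is currently connected and streaming.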


> pg_standby_conninfo()
>   returns connection string to master.  Superuser-only for
>   previously discussed reasons
> 
> pg_recovery_config(config_item TEXT)
>   returns the specified configuration item from recovery.conf
>   superuser-only?
> 
> Does this make sense?  Is there other information we need?
> 
> -- 
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
> 
> 





Re: [HACKERS] JSON data type and schema validation

2015-05-04 Thread Evgeniy Shishkin

> On 04 May 2015, at 06:20, Dmitry Shirokov  wrote:
> 
> Hi all,
> 
> Are there any plans to introduce, in a future version of Postgres, schema 
> validation for the JSON field type? It would be very nice to have support for 
> something like the json-schema spec; see 
> http://json-schema.org/documentation.html. Right now the only way to do it is 
> via individual constraints, which is not very convenient in most cases. 
> Please correct me if I'm wrong.
> 

Take a look at https://github.com/akorotkov/jsquery
You can find a schema validation example in the docs.
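
A minimal sketch of that approach, based on my reading of the jsquery docs 
(table and field names are made up):

    CREATE EXTENSION jsquery;

    CREATE TABLE users (
        data jsonb,
        CONSTRAINT data_shape CHECK (
            data @@ 'name IS STRING AND age IS NUMERIC AND tags IS ARRAY'::jsquery
        )
    );

    INSERT INTO users VALUES ('{"name": "Ann", "age": 30, "tags": ["x"]}');  -- ok
    INSERT INTO users VALUES ('{"name": "Bob", "age": "old", "tags": []}');  -- rejected: age is not numeric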

> Cheers,
> Dmitry


