Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

Greg Smith Sun, 06 Nov 2011 11:57:08 -0800

On 11/06/2011 01:08 PM, Tom Lane wrote:

Peter Geoghegan<pe...@2ndquadrant.com>  writes:

... It also does things
like intelligently distinguishing between queries with different
limit/offset constant values, as these constants are deemed to be
differentiators of queries for our purposes. A guiding principal that
I've followed is that anything that could result in a different plan
is a differentiator of queries.

This claim seems like bunk, unless you're hashing the plan tree,
in which case it's tautological.

Peter's patch adds a planner hook and hashes at that level. Since thiscat is rapidly escaping its bag and impacting other contributors, wemight as well share the work in progress early if anyone has a burningdesire to look at the code. A diff against the version I've beeninternally reviewing in prep for community submission is athttps://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_normEasier to browse than ask questions probing about it I think. Apologiesto Peter for sharing his work before he was completely ready; there is alist of known problems with it. I also regret the thread hijack here.

The first chunk of code is a Python based regression test program, andan open item here is the best way to turn that into a standardregression test set.

There will be additional infrastructure added to the parser to support
normalisation of query strings for the patch I'll be submitting (that
obviously won't be supported in the version that builds against
existing Postgres versions that I'll make available). Essentially,
I'll be adding a length field to certain nodes,

This seems like a good way to get your patch rejected: adding overhead
to the core system for the benefit of a feature that not everyone cares
about is problematic.

Struggling around this area is the main reason this code hasn't beensubmitted yet. To step back for a moment, there are basically threeoptions here that any code like this can take, in regards to storing theprocessed query name used as the key:


1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen

3) Transform the text of the query in a way that's stable across allduplicates of that statement, and output that

Downstream tools operating on this data, things that will samplepg_stat_statements multiple times, need some sort of stable query textthey can operate on. It really needs to survive server restart too.Neither (1) nor (2) seem like usable solutions. Both have beenimplemented already in Peter's patch, with (2) being what's in thecurrent patch. How best to do (3) instead is not obvious though.

Doing the query matching operating at the planner level seems effectiveat side-stepping the problem of needing to parse the SQL, which is wheremost implementations of this idea get stuck doing fragiletransformations. My own first try at this back in September and Tomas'spatch both fall into that category. That part of Peter's work seems towork as expected. That still leaves the problem of "parsed query ->stable normalized string". I think that is an easier one to solve thandirectly taking on the entirety of "query text -> stable normalizedstring" though. Peter has an idea he's exploring for how to implementthat, and any amount of overhead it adds to people who don't use thisfeature is an obvious concern. There are certainly use cases that don'tcare about this problem, ones that would happily take (1) or (2). I'drather not ship yet another not quite right for everyone version ofpg_stat_statements again though; only solving too limited of a use caseis the big problem with the one that's already there.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements

Reply via email to