Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Dec 8, 2013 at 1:00 AM, Peter Geoghegan p...@heroku.com wrote: On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote: 32-bit buildfarm members are having problems with this patch. This should fix that problem. Thanks. Applied. I also noted on http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=frogmouthdt=2013-12-08%2007%3A30%3A01stg=make-contrib that there are compiler warnings being generated in pgss. But from a quick look that looks like something pre-existing and not caused by the latest patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, 2013-12-07 at 16:00 -0800, Peter Geoghegan wrote: On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote: 32-bit buildfarm members are having problems with this patch. This should fix that problem. Thanks. This is incidentally the same problem that was reported here about another pg_stat_statements patch: http://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab7713ddac...@szxeml508-mbx.china.huawei.com Can we make this more robust so that we don't accidentally keep breaking 32-bit systems? Maybe a static assert? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 7, 2013 at 6:31 AM, Peter Geoghegan p...@heroku.com wrote: On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: There seems to be no problem even if we use bigint as the type of unsigned 32-bit integer like queryid. For example, txid_current() returns the transaction ID, i.e., unsigned 32-bit integer, as bigint. Could you tell me what the problem is when using bigint for queryid? We're talking about the output of some view, right, not internal storage? +1 for using bigint for that. Using OID is definitely an abuse, because the value *isn't* an OID. And besides, what if we someday decide we need 64-bit keys not 32-bit? Fair enough. I was concerned about the cost of external storage of 64-bit integers (unlike query text, they might have to be stored many times for many distinct intervals or something like that), but in hindsight that was fairly miserly of me. Attached revision displays signed 64-bit integers instead. Thanks! Looks good to me. Committed! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, 2013-12-08 at 02:08 +0900, Fujii Masao wrote: Attached revision displays signed 64-bit integers instead. Thanks! Looks good to me. Committed! 32-bit buildfarm members are having problems with this patch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote: 32-bit buildfarm members are having problems with this patch. This should fix that problem. Thanks. -- Peter Geoghegan diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c new file mode 100644 index 4e262b4..9f3e376 *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** pg_stat_statements(PG_FUNCTION_ARGS) *** 1160,1165 --- 1160,1166 bool nulls[PG_STAT_STATEMENTS_COLS]; int i = 0; Counters tmp; + int64 queryid = entry-key.queryid; memset(values, 0, sizeof(values)); memset(nulls, 0, sizeof(nulls)); *** pg_stat_statements(PG_FUNCTION_ARGS) *** 1172,1178 char *qstr; if (detected_version = PGSS_V1_2) ! values[i++] = Int64GetDatumFast((int64) entry-key.queryid); qstr = (char *) pg_do_encoding_conversion((unsigned char *) entry-query, --- 1173,1179 char *qstr; if (detected_version = PGSS_V1_2) ! values[i++] = Int64GetDatumFast(queryid); qstr = (char *) pg_do_encoding_conversion((unsigned char *) entry-query, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur samthaku...@gmail.com wrote: Please find v10 of patch attached. This patch addresses following review comments I've cleaned this up - revision attached - and marked it ready for committer. I decided that queryid should be of type oid, not bigint. This is arguably a slight abuse of notation, but since ultimately Oids are just abstract object identifiers (so say the docs), but also because there is no other convenient, minimal way of representing unsigned 32-bit integers in the view that I'm aware of, I'm inclined to think that it's appropriate. There seems to be no problem even if we use bigint as the type of unsigned 32-bit integer like queryid. For example, txid_current() returns the transaction ID, i.e., unsigned 32-bit integer, as bigint. Could you tell me what the problem is when using bigint for queryid? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Fujii Masao masao.fu...@gmail.com writes: On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan p...@heroku.com wrote: I decided that queryid should be of type oid, not bigint. This is arguably a slight abuse of notation, but since ultimately Oids are just abstract object identifiers (so say the docs), but also because there is no other convenient, minimal way of representing unsigned 32-bit integers in the view that I'm aware of, I'm inclined to think that it's appropriate. There seems to be no problem even if we use bigint as the type of unsigned 32-bit integer like queryid. For example, txid_current() returns the transaction ID, i.e., unsigned 32-bit integer, as bigint. Could you tell me what the problem is when using bigint for queryid? We're talking about the output of some view, right, not internal storage? +1 for using bigint for that. Using OID is definitely an abuse, because the value *isn't* an OID. And besides, what if we someday decide we need 64-bit keys not 32-bit? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: There seems to be no problem even if we use bigint as the type of unsigned 32-bit integer like queryid. For example, txid_current() returns the transaction ID, i.e., unsigned 32-bit integer, as bigint. Could you tell me what the problem is when using bigint for queryid? We're talking about the output of some view, right, not internal storage? +1 for using bigint for that. Using OID is definitely an abuse, because the value *isn't* an OID. And besides, what if we someday decide we need 64-bit keys not 32-bit? Fair enough. I was concerned about the cost of external storage of 64-bit integers (unlike query text, they might have to be stored many times for many distinct intervals or something like that), but in hindsight that was fairly miserly of me. Attached revision displays signed 64-bit integers instead. -- Peter Geoghegan diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile new file mode 100644 index e8aed61..95a2767 *** a/contrib/pg_stat_statements/Makefile --- b/contrib/pg_stat_statements/Makefile *** MODULE_big = pg_stat_statements *** 4,11 OBJS = pg_stat_statements.o EXTENSION = pg_stat_statements ! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \ ! pg_stat_statements--unpackaged--1.0.sql ifdef USE_PGXS PG_CONFIG = pg_config --- 4,11 OBJS = pg_stat_statements.o EXTENSION = pg_stat_statements ! DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \ ! pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql new file mode 100644 index ...7824e3e *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 0 --- 1,43 + /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */ + + -- complain if script is sourced in psql, rather than via ALTER EXTENSION + \echo Use ALTER EXTENSION pg_stat_statements UPDATE to load this file. \quit + + /* First we have to remove them from the extension */ + ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; + ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(); + + /* Then we can drop them */ + DROP VIEW pg_stat_statements; + DROP FUNCTION pg_stat_statements(); + + /* Now redefine */ + CREATE FUNCTION pg_stat_statements( + OUT userid oid, + OUT dbid oid, + OUT queryid bigint, + OUT query text, + OUT calls int8, + OUT total_time float8, + OUT rows int8, + OUT shared_blks_hit int8, + OUT shared_blks_read int8, + OUT shared_blks_dirtied int8, + OUT shared_blks_written int8, + OUT local_blks_hit int8, + OUT local_blks_read int8, + OUT local_blks_dirtied int8, + OUT local_blks_written int8, + OUT temp_blks_read int8, + OUT temp_blks_written int8, + OUT blk_read_time float8, + OUT blk_write_time float8 + ) + RETURNS SETOF record + AS 'MODULE_PATHNAME' + LANGUAGE C; + + CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(); + + GRANT SELECT ON pg_stat_statements TO PUBLIC; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql new file mode . index 42e4d68..e69de29 *** a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql *** *** 1,43 - /* contrib/pg_stat_statements/pg_stat_statements--1.1.sql */ - - -- complain if script is sourced in psql, rather than via CREATE EXTENSION - \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit - - -- Register functions. - CREATE FUNCTION pg_stat_statements_reset() - RETURNS void - AS 'MODULE_PATHNAME' - LANGUAGE C; - - CREATE FUNCTION pg_stat_statements( - OUT userid oid, - OUT dbid oid, - OUT query text, - OUT calls int8, - OUT total_time float8, - OUT rows int8, - OUT shared_blks_hit int8, - OUT shared_blks_read int8, - OUT shared_blks_dirtied int8, - OUT shared_blks_written int8, - OUT local_blks_hit int8, - OUT local_blks_read int8, - OUT local_blks_dirtied int8, - OUT local_blks_written int8, - OUT temp_blks_read int8, - OUT temp_blks_written int8, - OUT blk_read_time float8, - OUT blk_write_time float8 - ) - RETURNS SETOF record - AS 'MODULE_PATHNAME' - LANGUAGE C; - - -- Register a view on the function for ease of use. - CREATE VIEW pg_stat_statements AS - SELECT * FROM pg_stat_statements(); - - GRANT SELECT ON pg_stat_statements TO PUBLIC; - - -- Don't want this to be available to non-superusers. - REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; --- 0
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
I've cleaned this up - revision attached - and marked it ready for committer. Thank you for this. I did the basic hygiene test. The patch applies correctly and compiles with no warnings. Did not find anything broken in basic functionality. In the documentation i have a minor suggestion of replacing phrase might judge to be a non-distinct with - may judge to be non- distinct. regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5781577.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur samthaku...@gmail.com wrote: Please find v10 of patch attached. This patch addresses following review comments I've cleaned this up - revision attached - and marked it ready for committer. I decided that queryid should be of type oid, not bigint. This is arguably a slight abuse of notation, but since ultimately Oids are just abstract object identifiers (so say the docs), but also because there is no other convenient, minimal way of representing unsigned 32-bit integers in the view that I'm aware of, I'm inclined to think that it's appropriate. In passing, I've made pg_stat_statements invalidate serialized entries if there is a change in major version. This seems desirable as a catch-all invalidator of entries. I note that Tom has objected to exposing the queryid in the past, on numerous occasions. I'm more confident than ever that it's actually the right thing to do. I've had people I don't know walk up to me at conferences and ask me what we don't already expose this at least twice now. There are very strong indications that many people want this, and given that I've documented the caveats, I think that we should trust those calling for this. At the very least, it allows people to GROUP BY queryid, when they don't want things broken out by userid. We're self-evidently already effectively relying on the queryid to be as stable as it is documented to be in this patch. The hash function cannot really change in minor releases, because to do so would at the very least necessitate re-indexing hash indexes, and would of course invalidate internally managed pg_stat_statements entries, both of which are undesirable outcomes (and therefore, for these reasons and more, unlikely). Arguments for not documenting hash_any() do not apply here -- we're already suffering the full consequences of whatever queryid instability may exist. Quite apart from all of that, I think we need to have a way of identifying particular entries for the purposes of supporting per-entry settings. Recent discussion about min/max query time, or somehow characterizing the distribution of each entry's historic execution time (or whatever) have not considered one important questoin: What are you supposed to do when you find out that there is an outlier (whatever an outlier is)? I won't go into the details, because there is little point, but I'm reasonably confident that it will be virtually impossible for pg_stat_statements itself to usefully classify particular query executions as outliers (I'm not even sure that we could do it if we assumed a normal distribution, which would be bogus, and certainly made very noisy by caching effects and so on. Furthermore, who are we to say that an outlier is an execution time two sigmas to the right? Seems useless). Outliers are typically caused by things like bad plans, or problematic constant values that appear in the most common values list (and are therefore just inherently far more expensive to query against), or lock contention. In all of those cases, with a min/max or something we probably won't even get to know what the problematic constant values were when response time suddenly suffers, because of course pg_stat_statements doesn't help with that. So have we gained much? Even with detective work, the trail might have gone cold by the time the outlier is examined. And, monitoring is only one concern -- what about alerting? The bigger point is that having this will facilitate being able to mark entries as SLA queries or something like that, where if their execution exceeds a time (specified by the DBA per entry), that is assumed to be very bad, and pg_stat_statements complains. That gets dumped to the logs (which ought to be a rare occurrence when the facility is used correctly). Of course, the (typically particularly problematic) constant values *do* appear in the logs, and there is a greppable keyword, potentially for the benefit of a tool like tail_n_mail. You could think of this as being like a smart log_min_duration_statement. I think that the DBA needs to tell pg_stat_statements what to care about, and what bad looks like. If the DBA doesn't know where to start specifying such things, the 5 queries with the most calls can usefully have this set to (mean_execution_time * 1.5) or something. SLA queries can also be pinned, perhaps (that is, given a stay of execution when eviction occurs). -- Peter Geoghegan diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile new file mode 100644 index e8aed61..95a2767 *** a/contrib/pg_stat_statements/Makefile --- b/contrib/pg_stat_statements/Makefile *** MODULE_big = pg_stat_statements *** 4,11 OBJS = pg_stat_statements.o EXTENSION = pg_stat_statements ! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \ ! pg_stat_statements--unpackaged--1.0.sql ifdef USE_PGXS PG_CONFIG = pg_config --- 4,11 OBJS =
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Hello, Please find v10 of patch attached. This patch addresses following review comments 1. Removed errcode and used elogs for error pg_stat_statements schema is not supported by its binary 2. Removed comments and other code formatting not directly relevant to patch functionality 3. changed position of query_id in view to userid,dbid,query_id.. 4 cleaned the patch some more to avoid unnecessary whitespaces, newlines. I assume the usage of PGSS_TUP_LATEST after explanation given. Also the mixing of PG_VERSION_NUM with query_id is ok after after explanation given. regards Sameer pg_stat_statements-identification-v10.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote: Hello, Please find attached pg_stat_statements-identification-v9.patch. I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query-queryId |= PG_VERSION_NUM 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key;/* hash key of entry - MUST BE FIRST */ Counterscounters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go update all usage of that constant only when your version isn't the latest? Like here: + if (detected_version = PGSS_TUP_LATEST) I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? This is what I call a can't happen error, or a defensive one: + else + { + /* +* Couldn't identify the tuple format. Raise error. +* +* This is an exceptional case that may only happen in bizarre +* situations, since it is thought that every released version +* of pg_stat_statements has a matching schema. +*/ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg(pg_stat_statements schema is not supported + by its installed binary))); + } I'll generally make these simple elogs(), which are more terse. No one is going to find all that dressing useful. Please take a look at this, for future reference: https://wiki.postgresql.org/wiki/Creating_Clean_Patches The whitespace changes are distracting. It probably isn't useful to comment random, unaffected code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. */ + if (fread(num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? @@ -43,6 +43,7 @@ */ #include postgres.h +#include time.h #include unistd.h #include access/hash.h @@ -59,15 +60,18 @@ #include storage/spin.h #include tcop/utility.h #include utils/builtins.h +#include utils/timestamp.h Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote: Hello, Please find attached pg_stat_statements-identification-v9.patch. I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query-queryId |= PG_VERSION_NUM 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? The destabilization of the query_id is to attempt to skirt the objection that the unpredictability of instability in the query_id would otherwise cause problems and present a false contract, particular in regards to point release upgrades. Earliest drafts of mine included destabilizing every session, but this kills off a nice use case of correlating query ids between primaries and standbys, if memory serves. This has the semantics of destabilizing -- for sure -- every point release. @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key;/* hash key of entry - MUST BE FIRST */ Counterscounters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go update all usage of that constant only when your version isn't the latest? Like here: + if (detected_version = PGSS_TUP_LATEST) It is roughly modeled after the column version of the same thing that pre-existed in pg_stat_statements. The only reason to have a LATEST is for some of the invariants though; otherwise, most uses should use the version-stamped symbol. I did this wrongly a bunch of times as spotted by Alvaro, I believe. I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? Testing. It was very useful to set to small numbers, like two or three. It's not crucial to the patch at all but having to whack it back all the time to keep the patch submission devoid of it seemed impractically tedious during revisions. This is what I call a can't happen error, or a defensive one: + else + { + /* +* Couldn't identify the tuple format. Raise error. +* +* This is an exceptional case that may only happen in bizarre +* situations, since it is thought that every released version +* of pg_stat_statements has a matching schema. +*/ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg(pg_stat_statements schema is not supported + by its installed binary))); + } I'll generally make these simple elogs(), which are more terse. No one is going to find all that dressing useful. That seems reasonable. Yes, it's basically a soft assert. I hit this one in development a few times, it was handy. It probably isn't useful to comment random, unaffected code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. */ + if (fread(num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? The whitespace changes are not intentional, but I think the comments help a reader track what's going on better, which becomes more important if one adds more fields. It can be broken out into a separate patch, but it didn't seem like that bookkeeping was necessary. @@ -43,6 +43,7 @@ */ #include postgres.h +#include time.h #include unistd.h #include access/hash.h @@ -59,15 +60,18 @@ #include storage/spin.h #include tcop/utility.h #include utils/builtins.h +#include utils/timestamp.h Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). I made no design decisions here...no complaints from me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
I took a quick look. Observations: + /* Making query ID dependent on PG version */ + query-queryId |= PG_VERSION_NUM 16; If you want to do something like this, make the value of PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why are you doing this? The thought was queryid should have a different value for the same query across PG versions, to ensure that clients using the view,do not assume otherwise. @@ -128,6 +146,7 @@ typedef struct pgssEntry pgssHashKey key; /* hash key of entry - MUST BE FIRST */ Counters counters; /* the statistics for this query */ int query_len; /* # of valid bytes in query string */ + uint32 query_id; /* jumble value for this entry */ query_id is already in key. Not sure I like the idea of the new enum at all, but in any case you shouldn't have a PGSS_TUP_LATEST constant - should someone go update all usage of that constant only when your version isn't the latest? Like here: + if (detected_version = PGSS_TUP_LATEST) There is #define PGSS_TUP_LATEST PGSS_TUP_V1_2 So if an update has to be done, this is the one place to do it. I forget why Daniel originally altered the min value of pg_stat_statements.max to 1 (I just remember that he did), but I don't think it holds that you should keep it there. Have you considered the failure modes when it is actually set to 1? Will set it back to the original value and also test for max value = 1 This is what I call a can't happen error, or a defensive one: + else + { + /* + * Couldn't identify the tuple format. Raise error. + * + * This is an exceptional case that may only happen in bizarre + * situations, since it is thought that every released version + * of pg_stat_statements has a matching schema. + */ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg(pg_stat_statements schema is not supported + by its installed binary))); + } I'll generally make these simple elogs(), which are more terse. No one is going to find all that dressing useful. Will convert to using elogs Please take a look at this, for future reference: https://wiki.postgresql.org/wiki/Creating_Clean_Patches The whitespace changes are distracting. Thanks! Still learning the art of clean patch submission. It probably isn't useful to comment random, unaffected code that isn't affected by your patch - I don't find this new refactoring useful, and am surprised to see it in your patch: + /* Check header existence and magic number match. */ if (fread(header, sizeof(uint32), 1, file) != 1 || - header != PGSS_FILE_HEADER || - fread(num, sizeof(int32), 1, file) != 1) + header != PGSS_FILE_HEADER) + goto error; + + /* Read how many table entries there are. */ + if (fread(num, sizeof(int32), 1, file) != 1) goto error; Did you mean to add all this, or is it left over from Daniel's patch? I think its a carry over from Daniel's code. I understand the thought. Will keep patch strictly restricted to functionality implemented @@ -43,6 +43,7 @@ */ #include postgres.h +#include time.h #include unistd.h #include access/hash.h @@ -59,15 +60,18 @@ #include storage/spin.h #include tcop/utility.h #include utils/builtins.h +#include utils/timestamp.h Final thought: I think the order in the pg_stat_statements view is wrong. It ought to be like a composite primary key - (userid, dbid, query_id). Will make the change. -- Peter Geoghegan Thank you for the review Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5778472.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Hello, Please find attached pg_stat_statements-identification-v9.patch. I have tried to address the following review comments 1. Use version PGSS_TUP_V1_2 2.Fixed total time being zero 3. Remove 'session_start' from the view and use point release number to generate queryid 4. Hide only queryid and query text and not all fields from unauthorized user 5. Removed introduced field from view and code as statistics session concept is not being used 6. Removed struct Instrumentation usage 7. Updated sgml to reflect changes made. Removed all references to statistics session, and introduced fields. regards Sameer pg_stat_statements-identification-v9.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
This paragraph reads a bit strange to me: + A statistics session is the time period when statistics are gathered by statistics collector + without being reset. So a statistics session continues across normal shutdowns, + but whenever statistics are reset, like during a crash or upgrade, a new time period + of statistics collection commences i.e. a new statistics session. + The query_id value generation is linked to statistics session to emphasize the fact + that whenever statistics are reset,the query_id for the same queries will also change. time period when? Shouldn't that be time period during which. Also, doesn't a new statistics session start when a stats reset is invoked by the user? The bit after commences appears correct (to me, not a native by any means) but seems also a bit strange. I have tried to rephrase this. Hopefully less confusing A statistics session refers to the time period when statement statistics are gathered by statistics collector. A statistics session persists across normal shutdowns. Whenever statistics are reset like during a crash or upgrade, a new statistics session starts. The query_id value generation is linked to statistics session to emphasize that whenever statistics are reset,the query_id for the same queries will also change. regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5774365.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 10/10/13 6:20 AM, Sameer Thakur wrote: Please find patch attached which adds documentation for session_start and introduced fields and corrects documentation for queryid to be query_id. session_start remains in the view as agreed. Please fix the tabs in the SGML files. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
The V7-Patch applied cleanly and I got no issues in my first tests. The change from column session_start to a function seems very reasonable for me. Concernig the usability, I would like to suggest a minor change, that massively increases the usefulness of the patch for beginners, who often use this view as a first approach to optimize index structure. The history of this tool contains a first version without normalization. This wasn't useful enough except for prepared queries. The actual version has normalized queries, so calls get summarized to get a glimpse of bad queries. But the drawback of this approach is impossibility to use explain analyze without further substitutions. The identification patch provides the possibility to summarize calls by query_id, so that the normalized query string itself is no longer needed to be exposed in the view for everyone. I suggest to add a parameter to recover the possibility to display real queries. The following very minor change (based on V7) exposes the first real query getting this query_id if normalization of the exposed string ist deactivated (The actual behaviour is the default). This new option is not always the best starting point to discover index shortfalls, but a huge gain for beginners because it serves the needs in more than 90% of the normal use cases. What do you think? Arne P.S. This message is resent, because it is stalled two days, so sorry for a possible duplicate Date: Mon Oct 7 17:54:08 2013 + Switch to disable normalized query strings diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index e50dfba..6cc9244 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -234,7 +234,7 @@ static int pgss_max; /* max # statements to track */ static int pgss_track; /* tracking level */ static bool pgss_track_utility; /* whether to track utility commands */ static bool pgss_save; /* whether to save stats across shutdown */ - +static bool pgss_normalize; /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */ #define pgss_enabled() \ (pgss_track == PGSS_TRACK_ALL || \ @@ -356,6 +356,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable(pg_stat_statements.normalize, +Selects whether the view column contains the query strings in a normalized form., + NULL, + pgss_normalize, + true, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + EmitWarningsOnPlaceholders(pg_stat_statements); /* @@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId, query_len = strlen(query); - if (jstate) + if (jstate pgss_normalize) { - /* Normalize the string if enabled */ + /* Normalize the string is not NULL and normalized query strings are enabled */ norm_query = generate_normalized_query(jstate, query, query_len, key.encoding); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Please find patch attached which adds documentation for session_start and introduced fields and corrects documentation for queryid to be query_id. session_start remains in the view as agreed. regards Sameer pg_stat_statements-identification-v8.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 3:11 AM, pilum...@uni-muenster.de wrote: But the drawback of this approach is impossibility to use explain analyze without further substitutions. You can fairly easily disable the swapping of constants with '?' symbols, so that the query text stored would match the full originally executed query. Why would you want to, though? There could be many actual plans whose costs are aggregated as one query. Seeing one of them is not necessarily useful at all, and could be misleading. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Thx for your reply. On Thu, 10 Oct 2013, Peter Geoghegan wrote: On Thu, Oct 10, 2013 at 3:11 AM, pilum...@uni-muenster.de wrote: But the drawback of this approach is impossibility to use explain analyze without further substitutions. You can fairly easily disable the swapping of constants with '?' symbols, so that the query text stored would match the full originally I thought I did ?! I introduced an additional user parameter to disable the normalization in the patch shown in my last mail. If there is already an easier way in the actual distribution, i simply missed ist. Where is this behaviour documented? executed query. Why would you want to, though? There could be many actual plans whose costs are aggregated as one query. Seeing one of them is not necessarily useful at all, and could be misleading. Yeah, (thinking of for example parameter ranges) I mentioned that, I think, but in the majority of cases beginners can easily conclude missing indices executing explain analyze, because the queries, that are aggregated and displayed under one query_id have very similar (or simply the same) query plans. It's also only an option disabled by default: You can simply do nothing, if you don't like it :-) VlG Arne Scheffer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote: Please find patch attached which adds documentation for session_start and introduced fields and corrects documentation for queryid to be query_id. session_start remains in the view as agreed. Thanks for updating the document! I'm not clear about the use case of 'session_start' and 'introduced' yet. Could you elaborate it? Maybe we should add that explanation into the document? In my test, I found that pg_stat_statements.total_time always indicates a zero. I guess that the patch might handle pg_stat_statements.total_time wrongly. +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(pgss-session_start)); +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(entry-introduced)); These should be executed only when detected_version = PGSS_TUP_LATEST? + entrystructfieldsession_start/structfield/entry + entrytypetext/type/entry + entry/entry + entryStart time of a statistics session./entry + /row + + row + entrystructfieldintroduced/structfield/entry + entrytypetext/type/entry The data type of these columns is not text. +instr_time session_start; +uint64private_stat_session_key; it's better to add the comments explaining these fields. +microsec = INSTR_TIME_GET_MICROSEC(t) - +((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY); HAVE_INT64_TIMESTAMP should be considered here. +if (log_cannot_read) +ereport(LOG, +(errcode_for_file_access(), + errmsg(could not read pg_stat_statement file \%s\: %m, +PGSS_DUMP_FILE))); Is it better to use WARNING instead of LOG as the log level here? +/* + * The role calling this function is unable to see + * sensitive aspects of this tuple. + * + * Nullify everything except the insufficient privilege + * message for this entry + */ +memset(nulls, 1, sizeof nulls); + +nulls[i] = 0; +values[i] = CStringGetTextDatum(insufficient privilege); Why do we need to hide *all* the fields in pg_stat_statements, when it's accessed by improper user? This is a big change of pg_stat_statements behavior, and it might break the compatibility. This is not directly related to the patch itself, but why does the queryid need to be calculated based on also the statistics session? If we expose hash value of query tree, without using statistics session, it is possible that users might make wrong assumption that this value remains stable across version upgrades, when in reality it depends on whether the version has make changes to query tree internals. So to explicitly ensure that users do not make this wrong assumption, hash value generation use statistics session id, which is newly created under situations described above. So, ISTM that we can use, for example, the version number to calculate the query_id. Right? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote: Please find patch attached which adds documentation for session_start and introduced fields and corrects documentation for queryid to be query_id. session_start remains in the view as agreed. Thanks for updating the document! I'm not clear about the use case of 'session_start' and 'introduced' yet. Could you elaborate it? Maybe we should add that explanation into the document? Probably. The idea is that without those fields it's, to wit, impossible to explain non-monotonic movement in metrics of those queries for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date I think. pg_stat_statements_reset() or crashing loses the session, so one expects ncalls may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics be lost, so its statistics will also decrease. This can cause un-resolvable artifact in analysis tools. The two fields allow for precise understanding of when the statistics for a given query_id are continuous since the last time a program inspected it. In my test, I found that pg_stat_statements.total_time always indicates a zero. I guess that the patch might handle pg_stat_statements.total_time wrongly. +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(pgss-session_start)); +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(entry-introduced)); These should be executed only when detected_version = PGSS_TUP_LATEST? Yes. Oversight. + entrystructfieldsession_start/structfield/entry + entrytypetext/type/entry + entry/entry + entryStart time of a statistics session./entry + /row + + row + entrystructfieldintroduced/structfield/entry + entrytypetext/type/entry The data type of these columns is not text. Oops +instr_time session_start; +uint64private_stat_session_key; it's better to add the comments explaining these fields. Yeah. +microsec = INSTR_TIME_GET_MICROSEC(t) - +((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY); HAVE_INT64_TIMESTAMP should be considered here. That's not a bad idea. +if (log_cannot_read) +ereport(LOG, +(errcode_for_file_access(), + errmsg(could not read pg_stat_statement file \%s\: %m, +PGSS_DUMP_FILE))); Is it better to use WARNING instead of LOG as the log level here? Is this new code? Why did I add it again? Seems reasonable though to call it a WARNING. +/* + * The role calling this function is unable to see + * sensitive aspects of this tuple. + * + * Nullify everything except the insufficient privilege + * message for this entry + */ +memset(nulls, 1, sizeof nulls); + +nulls[i] = 0; +values[i] = CStringGetTextDatum(insufficient privilege); Why do we need to hide *all* the fields in pg_stat_statements, when it's accessed by improper user? This is a big change of pg_stat_statements behavior, and it might break the compatibility. It seems like an information leak that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text. This is not directly related to the patch itself, but why does the queryid need to be calculated based on also the statistics session? If we expose hash value of query tree, without using statistics session, it is possible that users might make wrong assumption that this value remains stable across version upgrades, when in reality it depends on whether the version has make changes to query tree internals. So to explicitly ensure that users do not make this wrong assumption, hash value generation use statistics session id, which is newly created under situations described above. So, ISTM that we can use, for example, the version number to calculate the query_id. Right? That does seem like it may be a more reasonable stability vs. once per statistics session, because then use case with standbys work somewhat better. That said, the general concern was accidental contracts being assumed by consuming code, so this approach is tuned for making the query_id as unstable as possible while still being useful: stable, but only in one statistics gathering section. I did not raise the objection about over-aggressive contracts being exposed although I think the concern has merit...but given the use case involving standbys, I am for now charitable to increasing the stability to the level you indicate: a guaranteed break every point release. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote: Probably. The idea is that without those fields it's, to wit, impossible to explain non-monotonic movement in metrics of those queries for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date I think. pg_stat_statements_reset() or crashing loses the session, so one expects ncalls may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics be lost, so its statistics will also decrease. This can cause un-resolvable artifact in analysis tools. The two fields allow for precise understanding of when the statistics for a given query_id are continuous since the last time a program inspected it. Thanks for elaborating them! Since 'introduced' is reset even when the statistics is reset, maybe we can do without 'session_start' for that purpose? +/* + * The role calling this function is unable to see + * sensitive aspects of this tuple. + * + * Nullify everything except the insufficient privilege + * message for this entry + */ +memset(nulls, 1, sizeof nulls); + +nulls[i] = 0; +values[i] = CStringGetTextDatum(insufficient privilege); Why do we need to hide *all* the fields in pg_stat_statements, when it's accessed by improper user? This is a big change of pg_stat_statements behavior, and it might break the compatibility. It seems like an information leak that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text. So hiding only query and query_id is enough? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Daniel Farina escribió: On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote: In my test, I found that pg_stat_statements.total_time always indicates a zero. I guess that the patch might handle pg_stat_statements.total_time wrongly. +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(pgss-session_start)); +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(entry-introduced)); These should be executed only when detected_version = PGSS_TUP_LATEST? Yes. Oversight. Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that becomes latest, somebody running the current definition with the updated .so will no longer get these values. Or is the intention that PGSS_TUP_LATEST will never be updated again, and future versions will get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on? I mean, surely we can always come up with new symbols to use, but it seems hard to follow. There's one other use of PGSS_TUP_LATEST here which I think should also be changed to = SOME_SPECIFIC_VERSION: + if (detected_version = PGSS_TUP_LATEST) + { + uint64 qid = pgss-private_stat_session_key; + + qid ^= (uint64) entry-query_id; + qid ^= ((uint64) entry-query_id) 32; + + values[i++] = Int64GetDatumFast(qid); + } This paragraph reads a bit strange to me: + A statistics session is the time period when statistics are gathered by statistics collector + without being reset. So a statistics session continues across normal shutdowns, + but whenever statistics are reset, like during a crash or upgrade, a new time period + of statistics collection commences i.e. a new statistics session. + The query_id value generation is linked to statistics session to emphasize the fact + that whenever statistics are reset,the query_id for the same queries will also change. time period when? Shouldn't that be time period during which. Also, doesn't a new statistics session start when a stats reset is invoked by the user? The bit after commences appears correct (to me, not a native by any means) but seems also a bit strange. I just noticed that the table describing the output indicates that session_start and introduced are of type text, but the SQL defines timestamptz. The instr_time thingy being used for these times maps to QueryPerformanceCounter() on Windows, and I'm not sure how useful this is for long-term time tracking; see http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163 for instance. I think it'd be better to use TimestampTz and GetCurrentTimestamp() for this. Just noticed that you changed the timer to struct Instrumentation. Not really sure about that change. Since you seem to be using only the start time and counter, wouldn't it be better to store only those? Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Daniel Farina escribió: On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote: In my test, I found that pg_stat_statements.total_time always indicates a zero. I guess that the patch might handle pg_stat_statements.total_time wrongly. +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(pgss-session_start)); +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(entry-introduced)); These should be executed only when detected_version = PGSS_TUP_LATEST? Yes. Oversight. Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I was just thinking the same thing. Agreed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote: Probably. The idea is that without those fields it's, to wit, impossible to explain non-monotonic movement in metrics of those queries for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date I think. pg_stat_statements_reset() or crashing loses the session, so one expects ncalls may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics be lost, so its statistics will also decrease. This can cause un-resolvable artifact in analysis tools. The two fields allow for precise understanding of when the statistics for a given query_id are continuous since the last time a program inspected it. Thanks for elaborating them! Since 'introduced' is reset even when the statistics is reset, maybe we can do without 'session_start' for that purpose? There is a small loss of precision. The original reason was that if one wanted to know, given two samples of pg_stat_statements, when the query_id is going to remain stable for a given query. For example: If the session changes on account of a reset, then it is known that all query_ids one's external program is tracking are no longer going to be updated, and the program can take account for the fact that the same query text is going to have a new query_id. Given the new idea of mixing in the point release number: If the code is changed to instead mixing in the full version of Postgres, as you suggested recently, this can probably be removed. The caveat there is then the client is going to have to do something a bit weird like ask for the point release and perhaps even compile options of Postgres to know when the query_id is going to have a different value for a given query text. But, maybe this is an acceptable compromise to remove one field. +/* + * The role calling this function is unable to see + * sensitive aspects of this tuple. + * + * Nullify everything except the insufficient privilege + * message for this entry + */ +memset(nulls, 1, sizeof nulls); + +nulls[i] = 0; +values[i] = CStringGetTextDatum(insufficient privilege); Why do we need to hide *all* the fields in pg_stat_statements, when it's accessed by improper user? This is a big change of pg_stat_statements behavior, and it might break the compatibility. It seems like an information leak that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text. So hiding only query and query_id is enough? Yeah, I think so. The other fields feel a bit weird to leave hanging around as well, so I thought I'd just fix it in one shot, but doing the minimum or only applying this idea to new fields is safer. It's shorter to hit all the fields, though, which is why I was tempted to do that. Perhaps not a good economy for potential subtle breaks in the next version. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Daniel Farina escribió: On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote: In my test, I found that pg_stat_statements.total_time always indicates a zero. I guess that the patch might handle pg_stat_statements.total_time wrongly. +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(pgss-session_start)); +values[i++] = DatumGetTimestamp( +instr_get_timestamptz(entry-introduced)); These should be executed only when detected_version = PGSS_TUP_LATEST? Yes. Oversight. Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that becomes latest, somebody running the current definition with the updated .so will no longer get these values. Or is the intention that PGSS_TUP_LATEST will never be updated again, and future versions will get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on? I mean, surely we can always come up with new symbols to use, but it seems hard to follow. There's one other use of PGSS_TUP_LATEST here which I think should also be changed to = SOME_SPECIFIC_VERSION: + if (detected_version = PGSS_TUP_LATEST) + { + uint64 qid = pgss-private_stat_session_key; + + qid ^= (uint64) entry-query_id; + qid ^= ((uint64) entry-query_id) 32; + + values[i++] = Int64GetDatumFast(qid); + } I made some confusing mistakes here in using the newer tuple versioning. Let me try to explain what the motivation was: I was adding new fields to pg_stat_statements and was afraid that it'd be hard to get a very clear view that all the set fields are in alignment and there were no accidental overruns, with the problem getting more convoluted as more versions are added. The idea of PGSS_TUP_LATEST is useful to catch common programmer error by testing some invariants, and it'd be nice not to have to thrash the invariant checking code every release, which would probably defeat the point of such oops prevention code. But, the fact that I went on to rampantly do questionable things PGSS_TUP_LATEST is a bad sign. By example, here are the two uses that have served me very well: /* Perform version detection */ if (tupdesc-natts == PG_STAT_STATEMENTS_COLS_V1_0) detected_version = PGSS_TUP_V1_0; else if (tupdesc-natts == PG_STAT_STATEMENTS_COLS_V1_1) detected_version = PGSS_TUP_V1_1; else if (tupdesc-natts == PG_STAT_STATEMENTS_COLS) detected_version = PGSS_TUP_LATEST; else { /* * Couldn't identify the tuple format. Raise error. * * This is an exceptional case that may only happen in bizarre * situations, since it is thought that every released version * of pg_stat_statements has a matching schema. */ ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg(pg_stat_statements schema is not supported by its installed binary))); } And #ifdef USE_ASSERT_CHECKING /* Check that every column appears to be filled */ switch (detected_version) { case PGSS_TUP_V1_0: Assert(i == PG_STAT_STATEMENTS_COLS_V1_0); break; case PGSS_TUP_V1_1: Assert(i == PG_STAT_STATEMENTS_COLS_V1_1); break; case PGSS_TUP_LATEST: Assert(i == PG_STAT_STATEMENTS_COLS); break; default: Assert(false); } #endif Given that, perhaps a way to fix this is something like this patch-fragment: { PGSS_TUP_V1_0 = 1, PGSS_TUP_V1_1, - PGSS_TUP_LATEST + PGSS_TUP_V1_2 } pgssTupVersion; +#define PGSS_TUP_LATEST PGSS_TUP_V1_2 This way when a programmer is making new tuple versions, they are much more likely to see the immediate need to teach those two sites about the new tuple size. But, the fact that one does not get the invariants updated in a completely compulsory way may push the value of this checking under water. I'd be sad to see it go, it has saved me a lot of effort when returning to the code after a while. What do you think? The instr_time thingy being used for these times maps to QueryPerformanceCounter() on Windows, and I'm not sure how useful this is for long-term time tracking; see http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163 for instance. I think it'd be better to use TimestampTz and GetCurrentTimestamp() for this. Ah. I was going to do that, but thought it'd be nice to merely push down the already-extant Instr struct in most cases, as to get the 'start' time stored there for free. But if it can't
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Daniel Farina escribió: Given that, perhaps a way to fix this is something like this patch-fragment: { PGSS_TUP_V1_0 = 1, PGSS_TUP_V1_1, - PGSS_TUP_LATEST + PGSS_TUP_V1_2 } pgssTupVersion; +#define PGSS_TUP_LATEST PGSS_TUP_V1_2 This sounds good. I have seen other places that have the LATEST definition as part of the enum, assigning the previous value to it. I'm not really sure which of these is harder to miss when updating the code. I'm happy with either. Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any case. Just noticed that you changed the timer to struct Instrumentation. Not really sure about that change. Since you seem to be using only the start time and counter, wouldn't it be better to store only those? Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). Yeah, I was unsure about that too. The motivation was that I need one more piece of information in pgss_store (the absolute start time). I was going to widen the argument list, but it was looking pretty long, so instead I was thinking it'd be more concise to push the entire, typically extant Instrumentation struct pointer down. Would it work to define your own struct to pass around? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
The V7-Patch applied cleanly and I got no issues in my first tests. The change from column session_start to a function seems very reasonable for me. Concernig the usability, I would like to suggest a minor change, that massively increases the usefulness of the patch for beginners, who often use this view as a first approach to optimize index structure. The history of this tool contains a first version without normalization. This wasn't useful enough except for prepared queries. The actual version has normalized queries, so calls get summarized to get a glimpse of bad queries. But the drawback of this approach is impossibility to use explain analyze without further substitutions. The identification patch provides the possibility to summarize calls by query_id, so that the normalized query string itself is no longer needed to be exposed in the view for everyone. I suggest to add a parameter to recover the possibility to display real queries. The following very minor change (based on V7) exposes the first real query getting this query_id if normalization of the exposed string ist deactivated (The actual behaviour is the default). This new option is not always the best starting point to discover index shortfalls, but a huge gain for beginners because it serves the needs in more than 90% of the normal use cases. What do you think? Arne Date: Mon Oct 7 17:54:08 2013 + Switch to disable normalized query strings diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index e50dfba..6cc9244 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -234,7 +234,7 @@ static int pgss_max; /* max # statements to track */ static int pgss_track; /* tracking level */ static bool pgss_track_utility; /* whether to track utility commands */ static bool pgss_save; /* whether to save stats across shutdown */ - +static bool pgss_normalize; /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */ #define pgss_enabled() \ (pgss_track == PGSS_TRACK_ALL || \ @@ -356,6 +356,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable(pg_stat_statements.normalize, +Selects whether the view column contains the query strings in a normalized form., + NULL, + pgss_normalize, + true, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + EmitWarningsOnPlaceholders(pg_stat_statements); /* @@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId, query_len = strlen(query); - if (jstate) + if (jstate pgss_normalize) { - /* Normalize the string if enabled */ + /* Normalize the string is not NULL and normalized query strings are enabled */ norm_query = generate_normalized_query(jstate, query, query_len, key.encoding); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Just noticed that you changed the timer to struct Instrumentation. Not really sure about that change. Since you seem to be using only the start time and counter, wouldn't it be better to store only those? Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc(). Yeah, I was unsure about that too. The motivation was that I need one more piece of information in pgss_store (the absolute start time). I was going to widen the argument list, but it was looking pretty long, so instead I was thinking it'd be more concise to push the entire, typically extant Instrumentation struct pointer down. Would it work to define your own struct to pass around? Absolutely, I was just hoping to spare the code another abstraction if another was a precise superset. Looks like that isn't going to happen, though, so a pgss-oriented struct is likely what will have to be. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote: On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Looks pretty good. Do you want to package up the patch with your change and do the honors and re-submit it? Thanks for helping out so much! Sure, will do. Need to add a bit of documentation explaining statistics session as well. I did some more basic testing around pg_stat_statements.max, now that we have clarity from Peter about its value being legitimate below 100. Seems to work fine, with pg_stat_statements =4 the max unique queries in the view are 4. On the 5th query the view holds just the latest unique query discarding the previous 4. Fujii had reported a segmentation fault in this scenario. Thank you for the patch Please find the patch attached Thanks for the patch! Here are the review comments: +OUT session_start timestamptz, +OUT introduced timestamptz, The patch exposes these columns in pg_stat_statements view. These should be documented. I don't think that session_start should be exposed in every rows in pg_stat_statements because it's updated only when all statistics are reset, i.e., session_start of all entries in pg_stat_statements indicate the same. Dunno. I agree it'd be less query traffic and noise. Maybe hidden behind a UDF? I thought stats_reset on pg_database may be prior art, but realized that the statistics there differ depending on stats resets per database (maybe a name change of 'session' to 'stats_reset' would be useful to avoid too much in-cohesion, though). I didn't want to bloat the taxonomy of exposed API/symbols too much for pg_stat_statements, but perhaps in this instance it is reasonable. Also, isn't the interlock with the result set is perhaps more precise/fine-grained with the current solution? Yet, that's awfully corner-casey. I'm on the fence because the simplicity and precision of the current regime for aggregation tools is nice, but avoiding the noise for inspecting humans in the common case is also nice. I don't see a reason right now to go strongly either way, so if you feel moderately strongly that the repetitive column should be stripped then I am happy to relent there and help out. Let me know of your detailed thoughts (or modify the patch) with your idea. +OUT query_id int8, query_id or queryid? I like the latter. Also the document uses the latter. Not much opinion. I prefer expending the _ character because most compound words in postgres performance statistics catalogs seem to use an underscore, though. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Please find the patch attached Thanks for the patch! Here are the review comments: +OUT session_start timestamptz, +OUT introduced timestamptz, The patch exposes these columns in pg_stat_statements view. These should be documented. Yes, will add to documentation. I don't think that session_start should be exposed in every rows in pg_stat_statements because it's updated only when all statistics are reset, i.e., session_start of all entries in pg_stat_statements indicate the same. I understand. Will remove session_start from view and expose it via a function pg_stat_statements_current_session_start() ? +OUT query_id int8, query_id or queryid? I like the latter. Also the document uses the latter. Will change to queryid Thank you Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5773448.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Oct 5, 2013 at 1:38 PM, Daniel Farina dan...@heroku.com wrote: On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote: On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Looks pretty good. Do you want to package up the patch with your change and do the honors and re-submit it? Thanks for helping out so much! Sure, will do. Need to add a bit of documentation explaining statistics session as well. I did some more basic testing around pg_stat_statements.max, now that we have clarity from Peter about its value being legitimate below 100. Seems to work fine, with pg_stat_statements =4 the max unique queries in the view are 4. On the 5th query the view holds just the latest unique query discarding the previous 4. Fujii had reported a segmentation fault in this scenario. Thank you for the patch Please find the patch attached Thanks for the patch! Here are the review comments: +OUT session_start timestamptz, +OUT introduced timestamptz, The patch exposes these columns in pg_stat_statements view. These should be documented. I don't think that session_start should be exposed in every rows in pg_stat_statements because it's updated only when all statistics are reset, i.e., session_start of all entries in pg_stat_statements indicate the same. Dunno. I agree it'd be less query traffic and noise. Maybe hidden behind a UDF? I thought stats_reset on pg_database may be prior art, but realized that the statistics there differ depending on stats resets per database (maybe a name change of 'session' to 'stats_reset' would be useful to avoid too much in-cohesion, though). I didn't want to bloat the taxonomy of exposed API/symbols too much for pg_stat_statements, but perhaps in this instance it is reasonable. Also, isn't the interlock with the result set is perhaps more precise/fine-grained with the current solution? Yet, that's awfully corner-casey. I'm on the fence because the simplicity and precision of the current regime for aggregation tools is nice, but avoiding the noise for inspecting humans in the common case is also nice. I don't see a reason right now to go strongly either way, so if you feel moderately strongly that the repetitive column should be stripped then I am happy to relent there and help out. Let me know of your detailed thoughts (or modify the patch) with your idea. Thinking a bit more, if its just a question of a repeating value we have the same situation for userid and dbid. They would be the same for a user across multiple queries in same statistics session. So userid,dbid and session_start do repeat across rows. Not sure why treatment for session_start be different. I also checked pg_stat_plans @ https://github.com/2ndQuadrant/pg_stat_plans and did not see any special treatment given for a particular field in terms of access i.e. the granularity of api wrt pg_stat_statements has been maintained. regards Sameer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote: On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Looks pretty good. Do you want to package up the patch with your change and do the honors and re-submit it? Thanks for helping out so much! Sure, will do. Need to add a bit of documentation explaining statistics session as well. I did some more basic testing around pg_stat_statements.max, now that we have clarity from Peter about its value being legitimate below 100. Seems to work fine, with pg_stat_statements =4 the max unique queries in the view are 4. On the 5th query the view holds just the latest unique query discarding the previous 4. Fujii had reported a segmentation fault in this scenario. Thank you for the patch Please find the patch attached Thanks for the patch! Here are the review comments: +OUT session_start timestamptz, +OUT introduced timestamptz, The patch exposes these columns in pg_stat_statements view. These should be documented. I don't think that session_start should be exposed in every rows in pg_stat_statements because it's updated only when all statistics are reset, i.e., session_start of all entries in pg_stat_statements indicate the same. +OUT query_id int8, query_id or queryid? I like the latter. Also the document uses the latter. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Looks pretty good. Do you want to package up the patch with your change and do the honors and re-submit it? Thanks for helping out so much! Sure, will do. Need to add a bit of documentation explaining statistics session as well. I did some more basic testing around pg_stat_statements.max, now that we have clarity from Peter about its value being legitimate below 100. Seems to work fine, with pg_stat_statements =4 the max unique queries in the view are 4. On the 5th query the view holds just the latest unique query discarding the previous 4. Fujii had reported a segmentation fault in this scenario. Thank you for the patch Please find the patch attached regards Sameer pg_stat_statements-identification-v7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Looks pretty good. Do you want to package up the patch with your change and do the honors and re-submit it? Thanks for helping out so much! Sure, will do. Need to add a bit of documentation explaining statistics session as well. I did some more basic testing around pg_stat_statements.max, now that we have clarity from Peter about its value being legitimate below 100. Seems to work fine, with pg_stat_statements =4 the max unique queries in the view are 4. On the 5th query the view holds just the latest unique query discarding the previous 4. Fujii had reported a segmentation fault in this scenario. Thank you for the patch regards Sameer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL] ml-node+s1045698n5772887...@n5.nabble.com wrote: On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote: Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. I did the following 1. Forked from fdr/postgres 2. cloned branch queryid 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 d813096e29049667151a49fc5e5cf3d6bbe55702 picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5 4. usual make/make install/create extension pg_stat_statements. (pg_stat_statements.max=100). 5. select * from pg_stat_statements_reset(), select * from pgbench_tellers. result below: userid | dbid | session_start |introduced | query | query_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | t emp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+---+---+-+---++ --+-+--+-+-++-+++-- --+---+---+ 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 | 1 | 0 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ; | 7580333025384382649 | 1 | 0 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) I understand session_start and verified that it changes with each database restart to reflect current time. It should only restart when the statistics file cannot be loaded. I am not sure why introduced keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent) time query statements statistics is added to hashtable. Is this a bug? Will continue to test and try and understand the code. Yes, a bug. There are a few calls to pgss store and I must be submitting a zero value for the introduction time in one of those cases. Heh, I thought that was fixed, but maybe I broke something. Like I said; preliminary. At the earliest I can look at this Wednesday, but feel free to amend and resubmit including my changes if you feel inclined and get to it first. In pg_stat_statements.c line 1440 changed if (instr == NULL) to if (instr == NULL || INSTR_TIME_IS_ZERO(instr-starttime)) This seemed to do the trick. I will continue to test some more. regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772930.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL] ml-node+s1045698n5772887...@n5.nabble.com wrote: On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote: Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. I did the following 1. Forked from fdr/postgres 2. cloned branch queryid 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 d813096e29049667151a49fc5e5cf3d6bbe55702 picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5 4. usual make/make install/create extension pg_stat_statements. (pg_stat_statements.max=100). 5. select * from pg_stat_statements_reset(), select * from pgbench_tellers. result below: userid | dbid | session_start |introduced | query | query_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | t emp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+---+---+-+---++ --+-+--+-+-++-+++-- --+---+---+ 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 | 1 | 0 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ; | 7580333025384382649 | 1 | 0 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) I understand session_start and verified that it changes with each database restart to reflect current time. It should only restart when the statistics file cannot be loaded. This seems to work fine. 1. Started the instance 2. Executed pg_stat_statements_reset(), select * from pgbench_history,select* from pgbench_tellers. Got the following in pg_stat_statements view userid | dbid | session_start | introduced| query | query_id | calls | tota l_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_wri tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+--+---+--+---+- ---+--+-+--+-+-++-++--- -++---+---+ 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:43.724301+05:30 | select * from pgbench_history;| -165801328395488047 | 1 | 0 |0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:37.379785+05:30 | select * from pgbench_tellers;| 8376871363863945311 | 1 | 0 | 10 | 0 |1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); | -1061018443194138344 | 1 | 0 |1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (3 rows) Then restarted the server and saw pg_stat_statements view again. userid | dbid | session_start |
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Oct 1, 2013 at 5:32 AM, Sameer Thakur samthaku...@gmail.com wrote: On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL] [hidden email] wrote: On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote: Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. I did the following 1. Forked from fdr/postgres 2. cloned branch queryid 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 d813096e29049667151a49fc5e5cf3d6bbe55702 picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5 4. usual make/make install/create extension pg_stat_statements. (pg_stat_statements.max=100). 5. select * from pg_stat_statements_reset(), select * from pgbench_tellers. result below: userid | dbid | session_start |introduced | query | query_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | t emp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+---+---+-+---++ --+-+--+-+-++-+++-- --+---+---+ 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 | 1 | 0 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ; | 7580333025384382649 | 1 | 0 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) I understand session_start and verified that it changes with each database restart to reflect current time. It should only restart when the statistics file cannot be loaded. This seems to work fine. 1. Started the instance 2. Executed pg_stat_statements_reset(), select * from pgbench_history,select* from pgbench_tellers. Got the following in pg_stat_statements view userid | dbid | session_start | introduced| query | query_id | calls | tota l_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_wri tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+--+---+--+---+- ---+--+-+--+-+-++-++--- -++---+---+ 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:43.724301+05:30 | select * from pgbench_history;| -165801328395488047 | 1 | 0 |0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:37.379785+05:30 | select * from pgbench_tellers;| 8376871363863945311 | 1 | 0 | 10 | 0 |1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); | -1061018443194138344 | 1 | 0 |1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (3 rows) Then restarted the server and
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 29, 2013 at 10:58 PM, Daniel Farina dan...@fdr.io wrote: I remember hacking that out for testing sake. I can only justify it as a foot-gun to prevent someone from being stuck restarting the database to get a reasonable number in there. Let's CC Peter; maybe he can remember some thoughts about that. On reflection I think it was entirely to do with the testing of the patch. I don't think that the minimum value of pg_stat_statements has any real significance. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur samthaku...@gmail.com wrote: Yes i was. Just saw a warning when pg_stat_statements is loaded that valid values for pg_stat_statements.max is between 100 and 2147483647. Not sure why though. I remember hacking that out for testing sake. I can only justify it as a foot-gun to prevent someone from being stuck restarting the database to get a reasonable number in there. Let's CC Peter; maybe he can remember some thoughts about that. Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. I did the following 1. Forked from fdr/postgres 2. cloned branch queryid 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 d813096e29049667151a49fc5e5cf3d6bbe55702 picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5 4. usual make/make install/create extension pg_stat_statements. (pg_stat_statements.max=100). 5. select * from pg_stat_statements_reset(), select * from pgbench_tellers. result below: userid | dbid | session_start |introduced | query | query_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | t emp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+---+---+-+---++ --+-+--+-+-++-+++-- --+---+---+ 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 | 1 | 0 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ; | 7580333025384382649 | 1 | 0 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) I understand session_start and verified that it changes with each database restart to reflect current time. I am not sure why introduced keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent) time query statements statistics is added to hashtable. Is this a bug? Will continue to test and try and understand the code. regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772841.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sep 30, 2013 4:39 AM, Sameer Thakur samthaku...@gmail.com wrote: Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed-version of that history contains a reasonable patch I think, but a re-read often finds something for me and I've only just completed it yesterday. I did the following 1. Forked from fdr/postgres 2. cloned branch queryid 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 d813096e29049667151a49fc5e5cf3d6bbe55702 picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5 4. usual make/make install/create extension pg_stat_statements. (pg_stat_statements.max=100). 5. select * from pg_stat_statements_reset(), select * from pgbench_tellers. result below: userid | dbid | session_start |introduced | query | query_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | t emp_blks_read | temp_blks_written | blk_read_time | blk_write_time +---+--+---+---+-+---++ --+-+--+-+-++-+++-- --+---+---+ 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 | 1 | 0 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ; | 7580333025384382649 | 1 | 0 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) I understand session_start and verified that it changes with each database restart to reflect current time. It should only restart when the statistics file cannot be loaded. I am not sure why introduced keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent) time query statements statistics is added to hashtable. Is this a bug? Will continue to test and try and understand the code. Yes, a bug. There are a few calls to pgss store and I must be submitting a zero value for the introduction time in one of those cases. Heh, I thought that was fixed, but maybe I broke something. Like I said; preliminary. At the earliest I can look at this Wednesday, but feel free to amend and resubmit including my changes if you feel inclined and get to it first.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Mon, Sep 23, 2013 at 1:26 PM, samthakur74 samthaku...@gmail.com wrote: I forgot about removal of the relevant SGML, amended here in v6. Thank you for this! i did a quick test with following steps: 1. Applied v6 patch 2. make and make install on pg_stat_statements; 3. Restarted Postgres with pg_stat_statements loaded with pg_stat_statements.max = 4 4. Dropped and created extension pg_stat_statements. Executed following: select * from pg_stat_statements_reset(); select * from pgbench_branches ; select * from pgbench_history ; select * from pgbench_tellers ; select * from pgbench_accounts; I expected 4 rows in pg_stat_statements view for each of 4 queries above. But i saw just 2 rows. select * from pg_stat_statements; userid | dbid | stat_session_id | query | quer y_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | l ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | bl k_read_time | blk_write_time +---+-+-+--- ---+---+++-+--+- +-++-+-- --+++---+--- + 10 | 12900 |21595345 | select * from pgbench_accounts; | -803800319 3522943111 | 1 |108.176 | 10 |1640 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 |21595345 | select * from pgbench_tellers ; | -149722997 7134331757 | 1 | 0.227 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) Am i doing something wrong? Yes i was. Just saw a warning when pg_stat_statements is loaded that valid values for pg_stat_statements.max is between 100 and 2147483647. Not sure why though. regards Sameer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
I forgot about removal of the relevant SGML, amended here in v6. Thank you for this! i did a quick test with following steps: 1. Applied v6 patch 2. make and make install on pg_stat_statements; 3. Restarted Postgres with pg_stat_statements loaded with pg_stat_statements.max = 4 4. Dropped and created extension pg_stat_statements. Executed following: select * from pg_stat_statements_reset(); select * from pgbench_branches ; select * from pgbench_history ; select * from pgbench_tellers ; select * from pgbench_accounts; I expected 4 rows in pg_stat_statements view for each of 4 queries above. But i saw just 2 rows. select * from pg_stat_statements; userid | dbid | stat_session_id | query | quer y_id | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | l ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | bl k_read_time | blk_write_time +---+-+-+--- ---+---+++-+--+- +-++-+-- --+++---+--- + 10 | 12900 |21595345 | select * from pgbench_accounts; | -803800319 3522943111 | 1 |108.176 | 10 |1640 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 10 | 12900 |21595345 | select * from pgbench_tellers ; | -149722997 7134331757 | 1 | 0.227 | 10 | 1 |0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 (2 rows) Am i doing something wrong? regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771960.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Daniel Farina escribió: On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote: I think the n-call underestimation propagation may not be quite precise for various detailed reasons (having to do with 'sticky' queries) and to make it precise is probably more work than it's worth. And, on more reflection, I'm also having a hard time imaging people intuiting that value usefully. So, here's a version removing that. I forgot about removal of the relevant SGML, amended here in v6. Nice. You need to remove the --1.1.sql entry (the file itself and the DATA entry in Makefile). Also, I think it would be good to have a common fooRandom() routine, instead of having a second copy of PostmasterRandom. Might I suggest putting it in a new file under src/common/. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Sep 19, 2013 at 11:32 AM, Fujii Masao-2 [via PostgreSQL] ml-node+s1045698n5771565...@n5.nabble.com wrote: On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 [hidden email]http://user/SendEmail.jtp?type=nodenode=5771565i=0 wrote: I got the segmentation fault when I tested the case where the least-executed query statistics is discarded, i.e., when I executed different queries more than pg_stat_statements.max times. I guess that the patch might have a bug. Thanks, will try to fix it. pg_stat_statements--1.1.sql should be removed. Yes will do that + entrystructfieldqueryid/structfield/entry + entrytypebigint/type/entry + entry/entry + entryUnique value of each representative statement for the current statistics session. + This value will change for each new statistics session./entry What does statistics session mean? The time period when statistics are gathered by statistics collector without being reset. So the statistics session continues across normal shutdowns, but in case of abnormal situations like crashes, format upgrades or statistics being reset for any other reason, a new time period of statistics collection starts i.e. a new statistics session. The queryid value generation is linked to statistics session so emphasize the fact that in case of crashes,format upgrades or any situation of statistics reset, the queryid for the same queries will also change. I'm afraid that this behavior narrows down the use case of queryid very much. For example, since the queryid of the same query would not be the same in the master and the standby servers, we cannot associate those two statistics by using the queryid. The queryid changes through the crash recovery, so we cannot associate the query statistics generated before the crash with that generated after the crash recovery even if the query is the same. Yes, these are limitations in this approach. The other approaches suggested included 1. Expose query id hash value as is, in the view, but document the fact that it will be unstable between releases 2. Expose query id hash value via an undocumented function and let more expert users decided if they want to use it. The approach of using statistics session id to generate queryid is a compromise between not exposing it at all and exposing it without warning the users of unstable hash value from query tree between releases. This is not directly related to the patch itself, but why does the queryid need to be calculated based on also the statistics session? If we expose hash value of query tree, without using statistics session, it is possible that users might make wrong assumption that this value remains stable across version upgrades, when in reality it depends on whether the version has make changes to query tree internals. So to explicitly ensure that users do not make this wrong assumption, hash value generation use statistics session id, which is newly created under situations described above. Will update documentation clearly explain the term statistics session in this context Yep, that's helpful! Regards, Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771701.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote: I think the n-call underestimation propagation may not be quite precise for various detailed reasons (having to do with 'sticky' queries) and to make it precise is probably more work than it's worth. And, on more reflection, I'm also having a hard time imaging people intuiting that value usefully. So, here's a version removing that. I forgot about removal of the relevant SGML, amended here in v6. pg_stat_statements-identification-v6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 samthaku...@gmail.com wrote: I got the segmentation fault when I tested the case where the least-executed query statistics is discarded, i.e., when I executed different queries more than pg_stat_statements.max times. I guess that the patch might have a bug. Thanks, will try to fix it. pg_stat_statements--1.1.sql should be removed. Yes will do that + entrystructfieldqueryid/structfield/entry + entrytypebigint/type/entry + entry/entry + entryUnique value of each representative statement for the current statistics session. + This value will change for each new statistics session./entry What does statistics session mean? The time period when statistics are gathered by statistics collector without being reset. So the statistics session continues across normal shutdowns, but in case of abnormal situations like crashes, format upgrades or statistics being reset for any other reason, a new time period of statistics collection starts i.e. a new statistics session. The queryid value generation is linked to statistics session so emphasize the fact that in case of crashes,format upgrades or any situation of statistics reset, the queryid for the same queries will also change. I'm afraid that this behavior narrows down the use case of queryid very much. For example, since the queryid of the same query would not be the same in the master and the standby servers, we cannot associate those two statistics by using the queryid. The queryid changes through the crash recovery, so we cannot associate the query statistics generated before the crash with that generated after the crash recovery even if the query is the same. This is not directly related to the patch itself, but why does the queryid need to be calculated based on also the statistics session? Will update documentation clearly explain the term statistics session in this context Yep, that's helpful! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur samthaku...@gmail.com wrote: You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Sorry again. Please find updated patch attached. I did not add pg_stat_statements--1.2.sql. I have added that now and updated the patch again. Thanks! I got the segmentation fault when I tested the case where the least-executed query statistics is discarded, i.e., when I executed different queries more than pg_stat_statements.max times. I guess that the patch might have a bug. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Sep 19, 2013 at 2:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur samthaku...@gmail.com wrote: You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Sorry again. Please find updated patch attached. I did not add pg_stat_statements--1.2.sql. I have added that now and updated the patch again. pg_stat_statements--1.1.sql should be removed. + entrystructfieldqueryid/structfield/entry + entrytypebigint/type/entry + entry/entry + entryUnique value of each representative statement for the current statistics session. + This value will change for each new statistics session./entry What does statistics session mean? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
I got the segmentation fault when I tested the case where the least-executed query statistics is discarded, i.e., when I executed different queries more than pg_stat_statements.max times. I guess that the patch might have a bug. Thanks, will try to fix it. pg_stat_statements--1.1.sql should be removed. Yes will do that + entrystructfieldqueryid/structfield/entry + entrytypebigint/type/entry + entry/entry + entryUnique value of each representative statement for the current statistics session. + This value will change for each new statistics session./entry What does statistics session mean? The time period when statistics are gathered by statistics collector without being reset. So the statistics session continues across normal shutdowns, but in case of abnormal situations like crashes, format upgrades or statistics being reset for any other reason, a new time period of statistics collection starts i.e. a new statistics session. The queryid value generation is linked to statistics session so emphasize the fact that in case of crashes,format upgrades or any situation of statistics reset, the queryid for the same queries will also change. Will update documentation clearly explain the term statistics session in this context regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771562.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 samthaku...@gmail.com wrote: You have added this email to the commit fest, but it contains no patch. Please add the email with the actual patch. I hope its attached now! You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Sorry again. Please find updated patch attached. http://www.postgresql.org/mailpref/pgsql-hackers NAMLhttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml On Tue, Sep 17, 2013 at 5:47 PM, Fujii Masao-2 [via PostgreSQL] ml-node+s1045698n5771213...@n5.nabble.com wrote: On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 [hidden email]http://user/SendEmail.jtp?type=nodenode=5771213i=0 wrote: You have added this email to the commit fest, but it contains no patch. Please add the email with the actual patch. I hope its attached now! You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list ([hidden email]http://user/SendEmail.jtp?type=nodenode=5771213i=1) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- If you reply to this email, your message will be added to the discussion below: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771213.html To unsubscribe from pg_stat_statements: calls under-estimation propagation, click herehttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=5738128code=c2FtdGhha3VyNzRAZ21haWwuY29tfDU3MzgxMjh8ODM4MzYxMTcy . NAMLhttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml pg_stat_statements-identification-v4.patch.gz (9K) http://postgresql.1045698.n5.nabble.com/attachment/5771248/0/pg_stat_statements-identification-v4.patch.gz -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771248.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Sep 17, 2013 at 10:48 PM, samthakur74 samthaku...@gmail.com wrote: You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Sorry again. Please find updated patch attached. pg_stat_statements--1.2.sql is missing. Could you confirm that the patch includes all the changes? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
You seem to have forgotten to include the pg_stat_statements--1.2.sql and pg_stat_statements--1.1--1.2.sql in the patch. Sorry again. Please find updated patch attached. I did not add pg_stat_statements--1.2.sql. I have added that now and updated the patch again. The patch attached should contain following file changes patching file contrib/pg_stat_statements/Makefile patching file contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql patching file contrib/pg_stat_statements/pg_stat_statements--1.2.sql patching file contrib/pg_stat_statements/pg_stat_statements.c patching file contrib/pg_stat_statements/pg_stat_statements.control patching file doc/src/sgml/pgstatstatements.sgml regards Sameer pg_stat_statements-identification-v4.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
You have added this email to the commit fest, but it contains no patch. Please add the email with the actual patch. I hope its attached now! Maybe the author should be given a chance to update the patches, though, because they are quite old. I did connect with Daniel and he did have some improvement ideas. I am not sure when they could be implemented. Since we have a interest in the current version of the patch, which needed documentation, i tried to complete that. Thank you, Sameer pg_stat_statements-identification-v4.patch.gz (8K) http://postgresql.1045698.n5.nabble.com/attachment/5770937/0/pg_stat_statements-identification-v4.patch.gz -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770937.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
This patch needs documentation. At a minimum, the new calls_underest field needs to be listed in the description of the pg_stat_statements. I have attached a version which includes documentation. pg_stat_statements-identification-v4.patch.gz http://postgresql.1045698.n5.nabble.com/file/n5770844/pg_stat_statements-identification-v4.patch.gz Pardon for not following the discussion: What exactly does the calls_underest field mean? I couldn't decipher it from the patch. What can an admin do with the value? How does it compare with just bumping up pg_stat_statements.max? Paraphrasing the documentation. There is a possibility that,for queries which are tracked intermittently, could have their statistics silently reset. The calls_underest field gives an indication that a query has been tracked intermittently and consequently have a higher probability of erroneous statistics. Queries tracked constantly will have a zero value for calls_underest indicating zero probability of erroneous statistics. A greater value of calls_underest indicates greater degree of inconsistent tracking which in turn means greater possibility of erroneous statistics due to statistics being reset when query was not being tracked. Increasing pg_stat_statements.max reduces the possibility of a query being tracked intermittently but does not address the problem of indicating the probability of erroneous statistics if the query is indeed being tracked intermittently. I do not believe the admin can influence the value of calls_underest as it is not a GUC parameter. We have a need to see this patch committed hence the revived interest in this thread regards Sameer -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770844.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, 2013-09-14 at 03:51 -0700, samthakur74 wrote: We have a need to see this patch committed hence the revived interest in this thread You have added this email to the commit fest, but it contains no patch. Please add the email with the actual patch. Maybe the author should be given a chance to update the patches, though, because they are quite old. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 30.12.2012 08:31, Daniel Farina wrote: A version implementing that is attached, except I generate an additional 64-bit session not exposed to the client to prevent even casual de-leaking of the session state. That may seem absurd, until someone writes a tool that de-xors things and relies on it and then nobody feels inclined to break it. It also keeps the public session number short. I also opted to save the underestimate since I'm adding a handful of fixed width fields to the file format anyway. This patch needs documentation. At a minimum, the new calls_underest field needs to be listed in the description of the pg_stat_statements. Pardon for not following the discussion: What exactly does the calls_underest field mean? I couldn't decipher it from the patch. What can an admin do with the value? How does it compare with just bumping up pg_stat_statements.max? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Attached is a cumulative patch attempting to address the below. One can see the deltas to get there at https://github.com/fdr/postgres.git error-prop-pg_stat_statements-v2. On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: However, with this approach, calls_underest values might appear to the user in what might be considered an astonishing order. Now, I'm not suggesting that that's a real problem - just that they may not be the semantics we want, particularly as we can reasonably defer assigning a calls_underest until a sticky entry is unstuck, and an entry becomes user-visible, within pgss_store(). Fix for this as I understand it: *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 1036,1041 pgss_store(const char *query, uint32 queryId, --- 1036,1042 e-counters.usage = USAGE_INIT; e-counters.calls += 1; + e-counters.calls_underest = pgss-calls_max_underest; e-counters.total_time += total_time; e-counters.rows += rows; e-counters.shared_blks_hit += bufusage-shared_blks_hit; *** *** 1264,1272 entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky) /* set the appropriate initial usage count */ entry-counters.usage = sticky ? pgss-cur_median_usage : USAGE_INIT; - /* propagate calls under-estimation bound */ - entry-counters.calls_underest = pgss-calls_max_underest; - /* re-initialize the mutex each time ... we assume no one using it */ SpinLockInit(entry-mutex); /* ... and don't forget the query text */ --- 1265,1270 Also, it seems like you should initialise pgss-calls_max_underest, within pgss_shmem_startup(). Easy enough. Somehow I wrongly thought zero-initialization was a thing for the shmem functions. *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 426,431 pgss_shmem_startup(void) --- 426,432 { /* First time through ... */ pgss-lock = LWLockAssign(); + pgss-calls_max_underest = 0; pgss-query_size = pgstat_track_activity_query_size; pgss-cur_median_usage = ASSUMED_MEDIAN_INIT; } You should probably serialise the value to disk, and initialise it to 0 if there is no such value to begin with. I prefer different approach here: just compute it while loading the entries from disk, since the calls + underestimation can be used to find a new pessimum underestimation global value. *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 419,424 pgss_shmem_startup(void) --- 419,425 int query_size; int buffer_size; char *buffer = NULL; + int64 calls_max_underest = 0; if (prev_shmem_startup_hook) prev_shmem_startup_hook(); *** *** 440,446 pgss_shmem_startup(void) { /* First time through ... */ pgss-lock = LWLockAssign(); ! pgss-calls_max_underest = 0; pgss-query_size = pgstat_track_activity_query_size; pgss-cur_median_usage = ASSUMED_MEDIAN_INIT; } --- 441,447 { /* First time through ... */ pgss-lock = LWLockAssign(); ! pgss-calls_max_underest = calls_max_underest; pgss-query_size = pgstat_track_activity_query_size; pgss-cur_median_usage = ASSUMED_MEDIAN_INIT; } *** *** 528,533 pgss_shmem_startup(void) --- 529,545 temp.query_len, query_size - 1); + /* +* Compute maxima of under-estimation over the read entries +* for reinitializing pgss-calls_max_underest. +*/ + { + int64 cur_underest; + + cur_underest = temp.calls + temp.calls_underest; + calls_max_underest = Max(calls_max_underest, cur_underest); + } + /* make the hashtable entry (discards old entries if too many) */ entry = entry_alloc(temp.key, buffer, temp.query_len, false); *** *** 535,540 pgss_shmem_startup(void) --- 547,559 entry-counters = temp.counters; } + /* +* Reinitialize global under-estimation
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina drfar...@acm.org wrote: These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field. Notably, I also opted to nullify extra pg_stat_statements fields when they'd also show insufficient privileges (that one is spared from this censorship), because I feel as though a bit too much information leaks from pg_stat_statement's statistics to ignore, especially after adding the query id. Since the common theme here is identifying queries, I have called it pg_stat_statements-identification, and it can be found in the git repo above under the same name (...-v1). A small amendment that doesn't really change the spirit of the narrative is attached. -- fdr pg_stat_statements-identification-v2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote: These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field. Cool. I had a thought about Tom's objection to exposing the hash value. I would like to propose a compromise between exposing the hash value and not doing so at all. What if we expose the hash value without documenting it, in a way not apparent to normal users, while letting experts willing to make an executive decision about its stability use it? What I have in mind is to expose the hash value from the pg_stat_statements function, and yet to avoid exposing it within the pg_stat_statements view definition. The existence of the hash value would not need to be documented, since the pg_stat_statements function is an undocumented implementation detail. Precedent for this exists, I think - the undocumented system hash functions are exposed via an SQL interface. Some satellite projects rely on this (apparently the pl/proxy documentation shows the use of hashtext(), which is a thin wrapper on hash_any(), and there is chatter about it elsewhere). So it is already the case that people are using hashtext(), which should not be problematic if the applications that do so have a reasonable set of expectations about its stability (i.e. it's not going to change in a point release, because that would break hash indexes, but may well change across major releases). We've already in effect promised to not break hashtext() in a point release, just as we've already in effect promised to not break the hash values that pg_stat_statements uses internally (to do any less would invalidate the on-disk representation, and necessitate bumping PGSS_FILE_HEADER to wipe the stored stats). Thoughts? Notably, I also opted to nullify extra pg_stat_statements fields when they'd also show insufficient privileges (that one is spared from this censorship), because I feel as though a bit too much information leaks from pg_stat_statement's statistics to ignore, especially after adding the query id. That seems sensible. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote: These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field. Cool. I had a thought about Tom's objection to exposing the hash value. I would like to propose a compromise between exposing the hash value and not doing so at all. As I recall, the gist of this objection had to do with a false sense of stability of the hash value, and the desire to enforce the ability to alter it. Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). A change in hash function should also then necessitate changing the stat file header. Another more minor issue is the hard-wiring of the precision of the id. For that reason, I have thought it reasonable to expose this as a string also. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote: As I recall, the gist of this objection had to do with a false sense of stability of the hash value, and the desire to enforce the ability to alter it. Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). Hmm. I like the idea, but a concern there would be that you'd introduce additional scope for collisions in the third-party utility building time-series data from snapshots. I currently put the probability of a collision within pg_stat_statements as 1% in the event of a pg_stat_statements.max of 10,000. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote: As I recall, the gist of this objection had to do with a false sense of stability of the hash value, and the desire to enforce the ability to alter it. Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). Hmm. I like the idea, but a concern there would be that you'd introduce additional scope for collisions in the third-party utility building time-series data from snapshots. I currently put the probability of a collision within pg_stat_statements as 1% in the event of a pg_stat_statements.max of 10,000. We can use a longer session key and duplicate the queryid (effectively padding) a couple of times to complete the XOR. I think that makes the cases of collisions introduced by this astronomically low, as an increase over the base collision rate. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 7:16 PM, Daniel Farina dan...@heroku.com wrote: On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote: As I recall, the gist of this objection had to do with a false sense of stability of the hash value, and the desire to enforce the ability to alter it. Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). Hmm. I like the idea, but a concern there would be that you'd introduce additional scope for collisions in the third-party utility building time-series data from snapshots. I currently put the probability of a collision within pg_stat_statements as 1% in the event of a pg_stat_statements.max of 10,000. We can use a longer session key and duplicate the queryid (effectively padding) a couple of times to complete the XOR. I think that makes the cases of collisions introduced by this astronomically low, as an increase over the base collision rate. A version implementing that is attached, except I generate an additional 64-bit session not exposed to the client to prevent even casual de-leaking of the session state. That may seem absurd, until someone writes a tool that de-xors things and relies on it and then nobody feels inclined to break it. It also keeps the public session number short. I also opted to save the underestimate since I'm adding a handful of fixed width fields to the file format anyway. -- fdr pg_stat_statements-identification-v3.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_stat_statements: calls under-estimation propagation
Hello, After long delay (sorry) here's a patch implementing what was hand-waved at in http://archives.postgresql.org/pgsql-hackers/2012-10/msg00176.php I am still something at a loss at how to test it besides prodding it by hand; it seems like it's going to involve infrastructure or introducing hooks into pg_stat_statements for the express purpose. The patch can also be sourced from: https://github.com/fdr/postgres.git error-prop-pg_stat_statements Without further ado, the cover letter taken from the top of the patch: This tries to establish a maximum under-estimate of the number of calls for a given pg_stat_statements entry. That means the number of calls to the canonical form of the query is between 'calls' and 'calls + calls_underest'. This is useful to determine when accumulating statistics if a given record is bouncing in and out of the pg_stat_statements table, having its ncalls reset all the time, but also having calls_underest grow very rapidly. Records that always stay in pg_stat_statements will have a calls_underest that do not change at all. An interesting case is when a query that usually is called is not called for a while, and falls out of pg_stat_statements. The result can be that the query with the most 'calls' can also have more uncertainty than the query with the second most calls, which is also exactly the truth in reality. Unceremoniously bundled into this patch is a reduction in the minimum table size for pg_stat_statements, from 100 to 1. Using tiny values is not likely to be seen in production, but makes testing the patch a lot easier in some situations. I will add this to the commitfest. -- fdr add-pg_stat_statements-calls-underestimation-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On 28 December 2012 11:43, Daniel Farina drfar...@acm.org wrote: Without further ado, the cover letter taken from the top of the patch: This tries to establish a maximum under-estimate of the number of calls for a given pg_stat_statements entry. That means the number of calls to the canonical form of the query is between 'calls' and 'calls + calls_underest'. Cool. One possible issue I see with this is that this code: + + /* propagate calls under-estimation bound */ + entry-counters.calls_underest = pgss-calls_max_underest; + which appears within entry_alloc(). So if the first execution of the query results in an error during planning or (much more likely) execution, as in the event of an integrity constraint violation, you're going to get a dead sticky entry that no one will ever see. Now, we currently decay sticky entries much more aggressively, so the mere fact that we waste an entry isn't a real problem. This was established by this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea However, with this approach, calls_underest values might appear to the user in what might be considered an astonishing order. Now, I'm not suggesting that that's a real problem - just that they may not be the semantics we want, particularly as we can reasonably defer assigning a calls_underest until a sticky entry is unstuck, and an entry becomes user-visible, within pgss_store(). Also, it seems like you should initialise pgss-calls_max_underest, within pgss_shmem_startup(). You should probably serialise the value to disk, and initialise it to 0 if there is no such value to begin with. I wonder if the way that pg_stat_statements throws its hands up when it comes to crash safety (i.e. the contents of the hash table are completely lost) could be a concern here. In other words, a program tasked with tracking execution costs over time and graphing time-series data from snapshots has to take responsibility for ensuring that there hasn't been a crash (or, indeed, a reset). Another issue is that I don't think that what you've done here solves the problem of uniquely identify each entry over time, in the same way that simply exposing the hash value would. I'm concerned with the related problem to the problem solved here - simply identifying the entry uniquely. As we've already discussed, the query string is an imperfect proxy for each entry, even with constants swapped with '?' characters (and even when combined with the userid and dbid values - it's still not the same as the hashtable key, in a way that is likely to bother people that use pg_stat_statements for long enough). I think you probably should have created a PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module is now legacy, like *V1_0 is in HEAD. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers