subject:"\[HACKERS\] pg_stat_statements\: calls under\-estimation propagation"

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-08 Thread Magnus Hagander

On Sun, Dec 8, 2013 at 1:00 AM, Peter Geoghegan p...@heroku.com wrote:

 On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote:
  32-bit buildfarm members are having problems with this patch.

 This should fix that problem. Thanks.


Applied.

I also noted on
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=frogmouthdt=2013-12-08%2007%3A30%3A01stg=make-contrib
that
there are compiler warnings being generated in pgss. But from a quick look
that looks like something pre-existing and not caused by the latest patch.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-08 Thread Peter Eisentraut

On Sat, 2013-12-07 at 16:00 -0800, Peter Geoghegan wrote:
 On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote:
  32-bit buildfarm members are having problems with this patch.
 
 This should fix that problem. Thanks.

This is incidentally the same problem that was reported here about
another pg_stat_statements patch:
http://www.postgresql.org/message-id/bf2827dcce55594c8d7a8f7ffd3ab7713ddac...@szxeml508-mbx.china.huawei.com

Can we make this more robust so that we don't accidentally keep breaking
32-bit systems?  Maybe a static assert?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-07 Thread Fujii Masao

On Sat, Dec 7, 2013 at 6:31 AM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There seems to be no problem even if we use bigint as the type of
 unsigned 32-bit integer like queryid. For example, txid_current()
 returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
 Could you tell me what the problem is when using bigint for queryid?

 We're talking about the output of some view, right, not internal storage?
 +1 for using bigint for that.  Using OID is definitely an abuse, because
 the value *isn't* an OID.  And besides, what if we someday decide we need
 64-bit keys not 32-bit?

 Fair enough. I was concerned about the cost of external storage of
 64-bit integers (unlike query text, they might have to be stored many
 times for many distinct intervals or something like that), but in
 hindsight that was fairly miserly of me.

 Attached revision displays signed 64-bit integers instead.

Thanks! Looks good to me. Committed!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-07 Thread Peter Eisentraut

On Sun, 2013-12-08 at 02:08 +0900, Fujii Masao wrote:
  Attached revision displays signed 64-bit integers instead.
 
 Thanks! Looks good to me. Committed!

32-bit buildfarm members are having problems with this patch.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-07 Thread Peter Geoghegan

On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut pete...@gmx.net wrote:
 32-bit buildfarm members are having problems with this patch.

This should fix that problem. Thanks.


-- 
Peter Geoghegan
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
new file mode 100644
index 4e262b4..9f3e376
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1160,1165 
--- 1160,1166 
  		bool		nulls[PG_STAT_STATEMENTS_COLS];
  		int			i = 0;
  		Counters	tmp;
+ 		int64		queryid = entry-key.queryid;
  
  		memset(values, 0, sizeof(values));
  		memset(nulls, 0, sizeof(nulls));
*** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1172,1178 
  			char	   *qstr;
  
  			if (detected_version = PGSS_V1_2)
! values[i++] = Int64GetDatumFast((int64) entry-key.queryid);
  
  			qstr = (char *)
  pg_do_encoding_conversion((unsigned char *) entry-query,
--- 1173,1179 
  			char	   *qstr;
  
  			if (detected_version = PGSS_V1_2)
! values[i++] = Int64GetDatumFast(queryid);
  
  			qstr = (char *)
  pg_do_encoding_conversion((unsigned char *) entry-query,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Fujii Masao

On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find v10 of patch attached. This patch addresses following
 review comments

 I've cleaned this up - revision attached - and marked it ready for 
 committer.

 I decided that queryid should be of type oid, not bigint. This is
 arguably a slight abuse of notation, but since ultimately Oids are
 just abstract object identifiers (so say the docs), but also because
 there is no other convenient, minimal way of representing unsigned
 32-bit integers in the view that I'm aware of, I'm inclined to think
 that it's appropriate.

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Tom Lane

Fujii Masao masao.fu...@gmail.com writes:
 On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan p...@heroku.com wrote:
 I decided that queryid should be of type oid, not bigint. This is
 arguably a slight abuse of notation, but since ultimately Oids are
 just abstract object identifiers (so say the docs), but also because
 there is no other convenient, minimal way of representing unsigned
 32-bit integers in the view that I'm aware of, I'm inclined to think
 that it's appropriate.

 There seems to be no problem even if we use bigint as the type of
 unsigned 32-bit integer like queryid. For example, txid_current()
 returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
 Could you tell me what the problem is when using bigint for queryid?

We're talking about the output of some view, right, not internal storage?
+1 for using bigint for that.  Using OID is definitely an abuse, because
the value *isn't* an OID.  And besides, what if we someday decide we need
64-bit keys not 32-bit?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-06 Thread Peter Geoghegan

On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There seems to be no problem even if we use bigint as the type of
 unsigned 32-bit integer like queryid. For example, txid_current()
 returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
 Could you tell me what the problem is when using bigint for queryid?

 We're talking about the output of some view, right, not internal storage?
 +1 for using bigint for that.  Using OID is definitely an abuse, because
 the value *isn't* an OID.  And besides, what if we someday decide we need
 64-bit keys not 32-bit?

Fair enough. I was concerned about the cost of external storage of
64-bit integers (unlike query text, they might have to be stored many
times for many distinct intervals or something like that), but in
hindsight that was fairly miserly of me.

Attached revision displays signed 64-bit integers instead.

-- 
Peter Geoghegan
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
new file mode 100644
index e8aed61..95a2767
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
*** MODULE_big = pg_stat_statements
*** 4,11 
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
! 	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 4,11 
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
! 	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index ...7824e3e
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 0 
--- 1,43 
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use ALTER EXTENSION pg_stat_statements UPDATE to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode .
index 42e4d68..e69de29
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
***
*** 1,43 
- /* contrib/pg_stat_statements/pg_stat_statements--1.1.sql */
- 
- -- complain if script is sourced in psql, rather than via CREATE EXTENSION
- \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
- 
- -- Register functions.
- CREATE FUNCTION pg_stat_statements_reset()
- RETURNS void
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- CREATE FUNCTION pg_stat_statements(
- OUT userid oid,
- OUT dbid oid,
- OUT query text,
- OUT calls int8,
- OUT total_time float8,
- OUT rows int8,
- OUT shared_blks_hit int8,
- OUT shared_blks_read int8,
- OUT shared_blks_dirtied int8,
- OUT shared_blks_written int8,
- OUT local_blks_hit int8,
- OUT local_blks_read int8,
- OUT local_blks_dirtied int8,
- OUT local_blks_written int8,
- OUT temp_blks_read int8,
- OUT temp_blks_written int8,
- OUT blk_read_time float8,
- OUT blk_write_time float8
- )
- RETURNS SETOF record
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- -- Register a view on the function for ease of use.
- CREATE VIEW pg_stat_statements AS
-   SELECT * FROM pg_stat_statements();
- 
- GRANT SELECT ON pg_stat_statements TO PUBLIC;
- 
- -- Don't want this to be available to non-superusers.
- REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
--- 0

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-12-04 Thread Sameer Thakur

 I've cleaned this up - revision attached - and marked it ready for
 committer.
 Thank you for this.

  I did the basic hygiene test. The patch applies correctly and compiles
with no warnings. Did not find anything broken in basic functionality.
  In the documentation i have a minor suggestion of replacing phrase might
judge to be a non-distinct  with - may judge to be non-  distinct.

regards
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5781577.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-23 Thread Peter Geoghegan

On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find v10 of patch attached. This patch addresses following
 review comments

I've cleaned this up - revision attached - and marked it ready for committer.

I decided that queryid should be of type oid, not bigint. This is
arguably a slight abuse of notation, but since ultimately Oids are
just abstract object identifiers (so say the docs), but also because
there is no other convenient, minimal way of representing unsigned
32-bit integers in the view that I'm aware of, I'm inclined to think
that it's appropriate.

In passing, I've made pg_stat_statements invalidate serialized entries
if there is a change in major version. This seems desirable as a
catch-all invalidator of entries.

I note that Tom has objected to exposing the queryid in the past, on
numerous occasions. I'm more confident than ever that it's actually
the right thing to do. I've had people I don't know walk up to me at
conferences and ask me what we don't already expose this at least
twice now. There are very strong indications that many people want
this, and given that I've documented the caveats, I think that we
should trust those calling for this. At the very least, it allows
people to GROUP BY queryid, when they don't want things broken out by
userid.

We're self-evidently already effectively relying on the queryid to be
as stable as it is documented to be in this patch. The hash function
cannot really change in minor releases, because to do so would at the
very least necessitate re-indexing hash indexes, and would of course
invalidate internally managed pg_stat_statements entries, both of
which are undesirable outcomes (and therefore, for these reasons and
more, unlikely). Arguments for not documenting hash_any() do not apply
here -- we're already suffering the full consequences of whatever
queryid instability may exist.

Quite apart from all of that, I think we need to have a way of
identifying particular entries for the purposes of supporting
per-entry settings. Recent discussion about min/max query time, or
somehow characterizing the distribution of each entry's historic
execution time (or whatever) have not considered one important
questoin: What are you supposed to do when you find out that there is
an outlier (whatever an outlier is)?

I won't go into the details, because there is little point, but I'm
reasonably confident that it will be virtually impossible for
pg_stat_statements itself to usefully classify particular query
executions as outliers (I'm not even sure that we could do it if we
assumed a normal distribution, which would be bogus, and certainly
made very noisy by caching effects and so on. Furthermore, who are we
to say that an outlier is an execution time two sigmas to the right?
Seems useless).

Outliers are typically caused by things like bad plans, or problematic
constant values that appear in the most common values list (and are
therefore just inherently far more expensive to query against), or
lock contention. In all of those cases, with a min/max or something we
probably won't even get to know what the problematic constant values
were when response time suddenly suffers, because of course
pg_stat_statements doesn't help with that. So have we gained much?
Even with detective work, the trail might have gone cold by the time
the outlier is examined. And, monitoring is only one concern -- what
about alerting?

The bigger point is that having this will facilitate being able to
mark entries as SLA queries or something like that, where if their
execution exceeds a time (specified by the DBA per entry), that is
assumed to be very bad, and pg_stat_statements complains. That gets
dumped to the logs (which ought to be a rare occurrence when the
facility is used correctly). Of course, the (typically particularly
problematic) constant values *do* appear in the logs, and there is a
greppable keyword, potentially for the benefit of a tool like
tail_n_mail. You could think of this as being like a smart
log_min_duration_statement. I think that the DBA needs to tell
pg_stat_statements what to care about, and what bad looks like. If the
DBA doesn't know where to start specifying such things, the 5 queries
with the most calls can usefully have this set to (mean_execution_time
* 1.5) or something. SLA queries can also be pinned, perhaps  (that
is, given a stay of execution when eviction occurs).

-- 
Peter Geoghegan
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
new file mode 100644
index e8aed61..95a2767
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
*** MODULE_big = pg_stat_statements
*** 4,11 
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
! 	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 4,11 
  OBJS =

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-18 Thread Sameer Thakur

Hello,
Please find v10 of patch attached. This patch addresses following
review comments
1. Removed errcode and used elogs for error pg_stat_statements schema
is not supported by its binary
2. Removed comments and other code formatting not directly relevant to
patch functionality
3. changed position of query_id in view to userid,dbid,query_id..
4 cleaned the patch some more to avoid unnecessary  whitespaces, newlines.

I assume the usage of PGSS_TUP_LATEST after explanation given.
Also the mixing of PG_VERSION_NUM with query_id is ok after after
explanation given.

regards
Sameer


pg_stat_statements-identification-v10.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Peter Geoghegan

On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Hello,
 Please find attached pg_stat_statements-identification-v9.patch.

I took a quick look. Observations:

+   /* Making query ID dependent on PG version */
+   query-queryId |= PG_VERSION_NUM  16;

If you want to do something like this, make the value of
PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

Why are you doing this?

@@ -128,6 +146,7 @@ typedef struct pgssEntry
pgssHashKey key;/* hash key of entry - MUST BE 
FIRST */
Counterscounters;   /* the statistics for this 
query */
int query_len;  /* # of valid bytes in 
query string */
+   uint32  query_id;   /* jumble value for this entry 
*/

query_id is already in key.

Not sure I like the idea of the new enum at all, but in any case you
shouldn't have a PGSS_TUP_LATEST constant - should someone go update
all usage of that constant only when your version isn't the latest?
Like here:

+   if (detected_version = PGSS_TUP_LATEST)

I forget why Daniel originally altered the min value of
pg_stat_statements.max to 1 (I just remember that he did), but I don't
think it holds that you should keep it there. Have you considered the
failure modes when it is actually set to 1?

This is what I call a can't happen error, or a defensive one:

+   else
+   {
+   /*
+* Couldn't identify the tuple format.  Raise error.
+*
+* This is an exceptional case that may only happen in bizarre
+* situations, since it is thought that every released version
+* of pg_stat_statements has a matching schema.
+*/
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg(pg_stat_statements schema is not 
supported 
+   by its installed binary)));
+   }

I'll generally make these simple elogs(), which are more terse. No one
is going to find all that dressing useful.

Please take a look at this, for future reference:

https://wiki.postgresql.org/wiki/Creating_Clean_Patches

The whitespace changes are distracting.

It probably isn't useful to comment random, unaffected code that isn't
affected by your patch - I don't find this new refactoring useful, and
am surprised to see it in your patch:

+   /* Check header existence and magic number match. */
if (fread(header, sizeof(uint32), 1, file) != 1 ||
-   header != PGSS_FILE_HEADER ||
-   fread(num, sizeof(int32), 1, file) != 1)
+   header != PGSS_FILE_HEADER)
+   goto error;
+
+   /* Read how many table entries there are. */
+   if (fread(num, sizeof(int32), 1, file) != 1)
goto error;

Did you mean to add all this, or is it left over from Daniel's patch?

@@ -43,6 +43,7 @@
  */
 #include postgres.h

+#include time.h
 #include unistd.h

 #include access/hash.h
@@ -59,15 +60,18 @@
 #include storage/spin.h
 #include tcop/utility.h
 #include utils/builtins.h
+#include utils/timestamp.h

Final thought: I think the order in the pg_stat_statements view is
wrong. It ought to be like a composite primary key - (userid, dbid,
query_id).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Daniel Farina

On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Hello,
 Please find attached pg_stat_statements-identification-v9.patch.

 I took a quick look. Observations:

 +   /* Making query ID dependent on PG version */
 +   query-queryId |= PG_VERSION_NUM  16;

 If you want to do something like this, make the value of
 PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

 Why are you doing this?

The destabilization of the query_id is to attempt to skirt the
objection that the unpredictability of instability in the query_id
would otherwise cause problems and present a false contract,
particular in regards to point release upgrades.

Earliest drafts of mine included destabilizing every session, but this
kills off a nice use case of correlating query ids between primaries
and standbys, if memory serves.  This has the semantics of
destabilizing -- for sure -- every point release.

 @@ -128,6 +146,7 @@ typedef struct pgssEntry
 pgssHashKey key;/* hash key of entry - MUST 
 BE FIRST */
 Counterscounters;   /* the statistics for this 
 query */
 int query_len;  /* # of valid bytes 
 in query string */
 +   uint32  query_id;   /* jumble value for this 
 entry */

 query_id is already in key.

 Not sure I like the idea of the new enum at all, but in any case you
 shouldn't have a PGSS_TUP_LATEST constant - should someone go update
 all usage of that constant only when your version isn't the latest?
 Like here:

 +   if (detected_version = PGSS_TUP_LATEST)

It is roughly modeled after the column version of the same thing
that pre-existed in pg_stat_statements.  The only reason to have a
LATEST is for some of the invariants though; otherwise, most uses
should use the version-stamped symbol.  I did this wrongly a bunch of
times as spotted by Alvaro, I believe.

 I forget why Daniel originally altered the min value of
 pg_stat_statements.max to 1 (I just remember that he did), but I don't
 think it holds that you should keep it there. Have you considered the
 failure modes when it is actually set to 1?

Testing.  It was very useful to set to small numbers, like two or
three.  It's not crucial to the patch at all but having to whack it
back all the time to keep the patch submission devoid of it seemed
impractically tedious during revisions.

 This is what I call a can't happen error, or a defensive one:

 +   else
 +   {
 +   /*
 +* Couldn't identify the tuple format.  Raise error.
 +*
 +* This is an exceptional case that may only happen in bizarre
 +* situations, since it is thought that every released version
 +* of pg_stat_statements has a matching schema.
 +*/
 +   ereport(ERROR,
 +   
 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 +errmsg(pg_stat_statements schema is not 
 supported 
 +   by its installed binary)));
 +   }

 I'll generally make these simple elogs(), which are more terse. No one
 is going to find all that dressing useful.

That seems reasonable.  Yes, it's basically a soft assert.  I hit this
one in development a few times, it was handy.

 It probably isn't useful to comment random, unaffected code that isn't
 affected by your patch - I don't find this new refactoring useful, and
 am surprised to see it in your patch:

 +   /* Check header existence and magic number match. */
 if (fread(header, sizeof(uint32), 1, file) != 1 ||
 -   header != PGSS_FILE_HEADER ||
 -   fread(num, sizeof(int32), 1, file) != 1)
 +   header != PGSS_FILE_HEADER)
 +   goto error;
 +
 +   /* Read how many table entries there are. */
 +   if (fread(num, sizeof(int32), 1, file) != 1)
 goto error;

 Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments
help a reader track what's going on better, which becomes more
important if one adds more fields.  It can be broken out into a
separate patch, but it didn't seem like that bookkeeping was
necessary.

 @@ -43,6 +43,7 @@
   */
  #include postgres.h

 +#include time.h
  #include unistd.h

  #include access/hash.h
 @@ -59,15 +60,18 @@
  #include storage/spin.h
  #include tcop/utility.h
  #include utils/builtins.h
 +#include utils/timestamp.h

 Final thought: I think the order in the pg_stat_statements view is
 wrong. It ought to be like a composite primary key - (userid, dbid,
 query_id).

I made no design decisions here...no complaints from me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-14 Thread Sameer Thakur

 I took a quick look. Observations:

 + /* Making query ID dependent on PG version */
 + query-queryId |= PG_VERSION_NUM  16;

 If you want to do something like this, make the value of
 PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

 Why are you doing this?

The thought was queryid should have a different value for the same
query across PG versions, to ensure that clients using
the view,do not assume otherwise.

 @@ -128,6 +146,7 @@ typedef struct pgssEntry
   pgssHashKey key; /* hash key of entry - MUST BE FIRST */
   Counters counters; /* the statistics for this query */
   int query_len; /* # of valid bytes in query string */
 + uint32 query_id; /* jumble value for this entry */

 query_id is already in key.

 Not sure I like the idea of the new enum at all, but in any case you
 shouldn't have a PGSS_TUP_LATEST constant - should someone go update
 all usage of that constant only when your version isn't the latest?
 Like here:

 + if (detected_version = PGSS_TUP_LATEST)

There is #define PGSS_TUP_LATEST PGSS_TUP_V1_2
So if an update has to be done, this is the one place to do it.

 I forget why Daniel originally altered the min value of
 pg_stat_statements.max to 1 (I just remember that he did), but I don't
 think it holds that you should keep it there. Have you considered the
 failure modes when it is actually set to 1?
Will set it back to the original value and also test for max value = 1

 This is what I call a can't happen error, or a defensive one:

 + else
 + {
 + /*
 + * Couldn't identify the tuple format.  Raise error.
 + *
 + * This is an exceptional case that may only happen in bizarre
 + * situations, since it is thought that every released version
 + * of pg_stat_statements has a matching schema.
 + */
 + ereport(ERROR,
 + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 + errmsg(pg_stat_statements schema is not supported 
 + by its installed binary)));
 + }

 I'll generally make these simple elogs(), which are more terse. No one
 is going to find all that dressing useful.
 Will convert to using elogs

 Please take a look at this, for future reference:

 https://wiki.postgresql.org/wiki/Creating_Clean_Patches

 The whitespace changes are distracting.

Thanks! Still learning the art of clean patch submission.

 It probably isn't useful to comment random, unaffected code that isn't
 affected by your patch - I don't find this new refactoring useful, and
 am surprised to see it in your patch:

 + /* Check header existence and magic number match. */
   if (fread(header, sizeof(uint32), 1, file) != 1 ||
 - header != PGSS_FILE_HEADER ||
 - fread(num, sizeof(int32), 1, file) != 1)
 + header != PGSS_FILE_HEADER)
 + goto error;
 +
 + /* Read how many table entries there are. */
 + if (fread(num, sizeof(int32), 1, file) != 1)
   goto error;

 Did you mean to add all this, or is it left over from Daniel's patch?
I think its a carry over from Daniel's code. I understand the thought.
Will keep patch strictly restricted to functionality implemented
 @@ -43,6 +43,7 @@
   */
  #include postgres.h

 +#include time.h
  #include unistd.h

  #include access/hash.h
 @@ -59,15 +60,18 @@
  #include storage/spin.h
  #include tcop/utility.h
  #include utils/builtins.h
 +#include utils/timestamp.h

 Final thought: I think the order in the pg_stat_statements view is
 wrong. It ought to be like a composite primary key - (userid, dbid,
 query_id).
Will make the change.
 --
 Peter Geoghegan

Thank you for the review
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5778472.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-11-05 Thread Sameer Thakur

Hello,
Please find attached pg_stat_statements-identification-v9.patch.
I have tried to address the following review comments
1. Use version PGSS_TUP_V1_2
2.Fixed total time being zero
3. Remove 'session_start' from the view and use point release number
to generate queryid
4. Hide only queryid and query text and not all fields from unauthorized user
5. Removed introduced field from view and code as statistics session
concept is not being used
6. Removed struct Instrumentation usage
7. Updated sgml to reflect changes made. Removed all references to
statistics session, and introduced fields.

regards
Sameer


pg_stat_statements-identification-v9.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-12 Thread Sameer Thakur

This paragraph reads a bit strange to me:

+ A statistics session is the time period when statistics are gathered by
statistics collector
+ without being reset. So a statistics session continues across normal
shutdowns,
+ but whenever statistics are reset, like during a crash or upgrade, a new
time period
+ of statistics collection commences i.e. a new statistics session.
+ The query_id value generation is linked to statistics session to
emphasize the fact
+ that whenever statistics are reset,the query_id for the same queries will
also change.

time period when? Shouldn't that be time period during which.
Also, doesn't a new statistics session start when a stats reset is
invoked by the user? The bit after commences appears correct (to me,
not a native by any means) but seems also a bit strange.

I have tried to rephrase this. Hopefully less confusing

A statistics session refers to the time period when statement
statistics are gathered by
statistics collector. A statistics session persists across normal
shutdowns. Whenever statistics are reset like during a crash or upgrade, a new
statistics session starts. The query_id value generation is linked to
statistics session to
emphasize that whenever statistics are reset,the query_id for the same
queries will also change.

regards
Sameer

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5774365.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-11 Thread Peter Eisentraut

On 10/10/13 6:20 AM, Sameer Thakur wrote:
 Please find patch attached which adds documentation for session_start
 and introduced fields and corrects documentation for queryid to be
 query_id. session_start remains in the view as agreed.

Please fix the tabs in the SGML files.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70


The V7-Patch applied cleanly and I got no issues in my first tests.

The change from column session_start to a function seems very reasonable for 
me.


Concernig the usability, I would like to suggest a minor change, that massively 
increases the usefulness of the patch for beginners, who often use this view as 
a first approach to optimize index structure.


The history of this tool contains a first version without normalization.
This wasn't useful enough except for prepared queries.
The actual version has normalized queries, so calls get
summarized to get a glimpse of bad queries.
But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself is no longer needed to 
be exposed in the view for everyone.


I suggest to add a parameter to recover the possibility to
display real queries. The following very minor change
(based on V7) exposes the first real query getting this
query_id if normalization of the exposed string ist deactivated (The actual 
behaviour is the default). This new option is not always the best starting 
point to discover index shortfalls, but a huge gain for beginners because it 
serves the needs

in more than 90% of the normal use cases.

What do you think?

Arne

P.S. This message is resent, because it is stalled two days, so sorry for a 
possible duplicate

Date:   Mon Oct 7 17:54:08 2013 +

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c

index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int  pgss_max;   /* max # 
statements to track */
 static int pgss_track; /* tracking level */
 static bool pgss_track_utility; /* whether to track utility commands */
 static bool pgss_save; /* whether to save stats across 
shutdown */
-
+static bool pgss_normalize; /* whether to normalize the query 
representation shown in the view (otherwise show the first query executed with 
this query_id) */

 #define pgss_enabled() \
(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
 NULL,
 NULL);

+   DefineCustomBoolVariable(pg_stat_statements.normalize,
+Selects whether the view column contains the query 
strings in a normalized form.,
+  NULL,
+  pgss_normalize,
+  true,
+  PGC_SUSET,
+  0,
+  NULL,
+  NULL,
+  NULL);
+
EmitWarningsOnPlaceholders(pg_stat_statements);

/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,

query_len = strlen(query);

-   if (jstate)
+   if (jstate  pgss_normalize)
{
-   /* Normalize the string if enabled */
+   /* Normalize the string is not NULL and normalized 
query strings are enabled */
norm_query = generate_normalized_query(jstate, query,

   query_len,

   key.encoding);






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Sameer Thakur

Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.
regards
Sameer


pg_stat_statements-identification-v8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Peter Geoghegan

On Thu, Oct 10, 2013 at 3:11 AM,  pilum...@uni-muenster.de wrote:
 But the drawback of this approach is impossibility to use
 explain analyze without further substitutions.

You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally
executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70


Thx for your reply.

On Thu, 10 Oct 2013, Peter Geoghegan wrote:


On Thu, Oct 10, 2013 at 3:11 AM,  pilum...@uni-muenster.de wrote:

But the drawback of this approach is impossibility to use
explain analyze without further substitutions.


You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally


I thought I did ?! I introduced an additional user parameter
to disable the normalization in the patch shown in my last mail.

If there is already an easier way in the actual distribution, 
i simply missed ist. 
Where is this behaviour documented?




executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.



Yeah, (thinking of for example parameter ranges) I mentioned that, I think,
but in the majority of cases beginners can easily conclude missing indices
executing explain analyze, because the queries, that are aggregated
and displayed under one query_id have very similar (or simply the same) query 
plans.

It's also only an option disabled by default: You can simply do nothing, if you 
don't like it :-)

VlG

Arne Scheffer


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao

On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find patch attached which adds documentation for session_start
 and introduced fields and corrects documentation for queryid to be
 query_id. session_start remains in the view as agreed.

Thanks for updating the document!

I'm not clear about the use case of 'session_start' and 'introduced' yet.
Could you elaborate it? Maybe we should add that explanation into
the document?

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+values[i++] = DatumGetTimestamp(
+instr_get_timestamptz(pgss-session_start));
+values[i++] = DatumGetTimestamp(
+instr_get_timestamptz(entry-introduced));

These should be executed only when detected_version = PGSS_TUP_LATEST?

+  entrystructfieldsession_start/structfield/entry
+  entrytypetext/type/entry
+  entry/entry
+  entryStart time of a statistics session./entry
+ /row
+
+ row
+  entrystructfieldintroduced/structfield/entry
+  entrytypetext/type/entry

The data type of these columns is not text.

+instr_time  session_start;
+uint64private_stat_session_key;

it's better to add the comments explaining these fields.

+microsec = INSTR_TIME_GET_MICROSEC(t) -
+((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

HAVE_INT64_TIMESTAMP should be considered here.

+if (log_cannot_read)
+ereport(LOG,
+(errcode_for_file_access(),
+ errmsg(could not read pg_stat_statement file \%s\: %m,
+PGSS_DUMP_FILE)));

Is it better to use WARNING instead of LOG as the log level here?

+/*
+ * The role calling this function is unable to see
+ * sensitive aspects of this tuple.
+ *
+ * Nullify everything except the insufficient privilege
+ * message for this entry
+ */
+memset(nulls, 1, sizeof nulls);
+
+nulls[i]  = 0;
+values[i] = CStringGetTextDatum(insufficient privilege);

Why do we need to hide *all* the fields in pg_stat_statements, when
it's accessed by improper user? This is a big change of pg_stat_statements
behavior, and it might break the compatibility.

 This is not directly related to the patch itself, but why does the
  queryid
 need to be calculated based on also the statistics session?

   If we expose hash value of query tree, without using statistics session,
 it is possible that users might make wrong assumption that this value
 remains stable across version upgrades, when in reality it depends on
 whether the version has make changes to query tree internals. So to
 explicitly ensure that users do not make this wrong assumption, hash value
 generation use statistics session id, which is newly created under
 situations described above.

So, ISTM that we can use, for example, the version number to calculate
the query_id. Right?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina

On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find patch attached which adds documentation for session_start
 and introduced fields and corrects documentation for queryid to be
 query_id. session_start remains in the view as agreed.

 Thanks for updating the document!

 I'm not clear about the use case of 'session_start' and 'introduced' yet.
 Could you elaborate it? Maybe we should add that explanation into
 the document?

Probably.

The idea is that without those fields it's, to wit, impossible to
explain non-monotonic movement in metrics of those queries for precise
use in tools that insist on monotonicity of the fields, which are all
cumulative to date I think.

pg_stat_statements_reset() or crashing loses the session, so one
expects ncalls may decrease.  In addition, a query that is bouncing
in and out of the hash table will have its statistics be lost, so its
statistics will also decrease.  This can cause un-resolvable artifact
in analysis tools.

The two fields allow for precise understanding of when the statistics
for a given query_id are continuous since the last time a program
inspected it.

 In my test, I found that pg_stat_statements.total_time always indicates a 
 zero.
 I guess that the patch might handle pg_stat_statements.total_time wrongly.

 +values[i++] = DatumGetTimestamp(
 +instr_get_timestamptz(pgss-session_start));
 +values[i++] = DatumGetTimestamp(
 +instr_get_timestamptz(entry-introduced));

 These should be executed only when detected_version = PGSS_TUP_LATEST?

Yes. Oversight.

 +  entrystructfieldsession_start/structfield/entry
 +  entrytypetext/type/entry
 +  entry/entry
 +  entryStart time of a statistics session./entry
 + /row
 +
 + row
 +  entrystructfieldintroduced/structfield/entry
 +  entrytypetext/type/entry

 The data type of these columns is not text.

Oops

 +instr_time  session_start;
 +uint64private_stat_session_key;

 it's better to add the comments explaining these fields.

Yeah.


 +microsec = INSTR_TIME_GET_MICROSEC(t) -
 +((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

 HAVE_INT64_TIMESTAMP should be considered here.

That's not a bad idea.

 +if (log_cannot_read)
 +ereport(LOG,
 +(errcode_for_file_access(),
 + errmsg(could not read pg_stat_statement file \%s\: %m,
 +PGSS_DUMP_FILE)));

 Is it better to use WARNING instead of LOG as the log level here?

Is this new code?  Why did I add it again?  Seems reasonable
though to call it a WARNING.

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum(insufficient privilege);

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by improper user? This is a big change of pg_stat_statements
 behavior, and it might break the compatibility.

It seems like an information leak that grows more major if query_id is
exposed and, at any point, one can determine the query_id for a query
text.

 This is not directly related to the patch itself, but why does the
  queryid
 need to be calculated based on also the statistics session?

   If we expose hash value of query tree, without using statistics session,
 it is possible that users might make wrong assumption that this value
 remains stable across version upgrades, when in reality it depends on
 whether the version has make changes to query tree internals. So to
 explicitly ensure that users do not make this wrong assumption, hash value
 generation use statistics session id, which is newly created under
 situations described above.

 So, ISTM that we can use, for example, the version number to calculate
 the query_id. Right?

That does seem like it may be a more reasonable stability vs. once per
statistics session, because then use case with standbys work somewhat
better.

That said, the general concern was accidental contracts being assumed
by consuming code, so this approach is tuned for making the query_id
as unstable as possible while still being useful: stable, but only in
one statistics gathering section.

I did not raise the objection about over-aggressive contracts being
exposed although I think the concern has merit...but given the use
case involving standbys, I am for now charitable to increasing the
stability to the level you indicate: a guaranteed break every point
release.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao

On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
 Probably.

 The idea is that without those fields it's, to wit, impossible to
 explain non-monotonic movement in metrics of those queries for precise
 use in tools that insist on monotonicity of the fields, which are all
 cumulative to date I think.

 pg_stat_statements_reset() or crashing loses the session, so one
 expects ncalls may decrease.  In addition, a query that is bouncing
 in and out of the hash table will have its statistics be lost, so its
 statistics will also decrease.  This can cause un-resolvable artifact
 in analysis tools.

 The two fields allow for precise understanding of when the statistics
 for a given query_id are continuous since the last time a program
 inspected it.

Thanks for elaborating them! Since 'introduced' is reset even when
the statistics is reset, maybe we can do without 'session_start' for
that purpose?

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum(insufficient privilege);

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by improper user? This is a big change of pg_stat_statements
 behavior, and it might break the compatibility.

 It seems like an information leak that grows more major if query_id is
 exposed and, at any point, one can determine the query_id for a query
 text.

So hiding only query and query_id is enough?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera

Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(pgss-session_start));
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(entry-introduced));
 
  These should be executed only when detected_version = PGSS_TUP_LATEST?
 
 Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
becomes latest, somebody running the current definition with the updated
.so will no longer get these values.  Or is the intention that
PGSS_TUP_LATEST will never be updated again, and future versions will
get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
I mean, surely we can always come up with new symbols to use, but it
seems hard to follow.

There's one other use of PGSS_TUP_LATEST here which I think should also
be changed to = SOME_SPECIFIC_VERSION:

+   if (detected_version = PGSS_TUP_LATEST)
+   {
+   uint64 qid = pgss-private_stat_session_key;
+ 
+   qid ^= (uint64) entry-query_id;
+   qid ^= ((uint64) entry-query_id)  32;
+ 
+   values[i++] = Int64GetDatumFast(qid);
+   }


This paragraph reads a bit strange to me:

+  A statistics session is the time period when statistics are gathered by 
statistics collector 
+  without being reset. So a statistics session continues across normal 
shutdowns, 
+  but whenever statistics are reset, like during a crash or upgrade, a new 
time period 
+  of statistics collection commences i.e. a new statistics session. 
+  The query_id value generation is linked to statistics session to emphasize 
the fact 
+  that whenever statistics are reset,the query_id for the same queries will 
also change.

time period when?  Shouldn't that be time period during which.
Also, doesn't a new statistics session start when a stats reset is
invoked by the user?  The bit after commences appears correct (to me,
not a native by any means) but seems also a bit strange.

I just noticed that the table describing the output indicates that
session_start and introduced are of type text, but the SQL defines
timestamptz.


The instr_time thingy being used for these times maps to
QueryPerformanceCounter() on Windows, and I'm not sure how useful this
is for long-term time tracking; see
http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
for instance.  I think it'd be better to use TimestampTz and
GetCurrentTimestamp() for this.

Just noticed that you changed the timer to struct Instrumentation.  Not
really sure about that change.  Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao

On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(pgss-session_start));
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(entry-introduced));
 
  These should be executed only when detected_version = PGSS_TUP_LATEST?

 Yes. Oversight.

 Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?

I was just thinking the same thing. Agreed.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina

On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
 Probably.

 The idea is that without those fields it's, to wit, impossible to
 explain non-monotonic movement in metrics of those queries for precise
 use in tools that insist on monotonicity of the fields, which are all
 cumulative to date I think.

 pg_stat_statements_reset() or crashing loses the session, so one
 expects ncalls may decrease.  In addition, a query that is bouncing
 in and out of the hash table will have its statistics be lost, so its
 statistics will also decrease.  This can cause un-resolvable artifact
 in analysis tools.

 The two fields allow for precise understanding of when the statistics
 for a given query_id are continuous since the last time a program
 inspected it.

 Thanks for elaborating them! Since 'introduced' is reset even when
 the statistics is reset, maybe we can do without 'session_start' for
 that purpose?

There is a small loss of precision.

The original reason was that if one wanted to know, given two samples
of pg_stat_statements, when the query_id is going to remain stable for
a given query.

For example:

If the session changes on account of a reset, then it is known that
all query_ids one's external program is tracking are no longer going
to be updated, and the program can take account for the fact that the
same query text is going to have a new query_id.

Given the new idea of mixing in the point release number:

If the code is changed to instead mixing in the full version of
Postgres, as you suggested recently, this can probably be removed.
The caveat there is then the client is going to have to do something a
bit weird like ask for the point release and perhaps even compile
options of Postgres to know when the query_id is going to have a
different value for a given query text.  But, maybe this is an
acceptable compromise to remove one field.

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum(insufficient privilege);

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by improper user? This is a big change of pg_stat_statements
 behavior, and it might break the compatibility.

 It seems like an information leak that grows more major if query_id is
 exposed and, at any point, one can determine the query_id for a query
 text.

 So hiding only query and query_id is enough?

Yeah, I think so.  The other fields feel a bit weird to leave hanging
around as well, so I thought I'd just fix it in one shot, but doing
the minimum or only applying this idea to new fields is safer.  It's
shorter to hit all the fields, though, which is why I was tempted to
do that.

Perhaps not a good economy for potential subtle breaks in the next version.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina

On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(pgss-session_start));
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(entry-introduced));
 
  These should be executed only when detected_version = PGSS_TUP_LATEST?

 Yes. Oversight.

 Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
 later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
 becomes latest, somebody running the current definition with the updated
 .so will no longer get these values.  Or is the intention that
 PGSS_TUP_LATEST will never be updated again, and future versions will
 get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
 I mean, surely we can always come up with new symbols to use, but it
 seems hard to follow.

 There's one other use of PGSS_TUP_LATEST here which I think should also
 be changed to = SOME_SPECIFIC_VERSION:

 +   if (detected_version = PGSS_TUP_LATEST)
 +   {
 +   uint64 qid = pgss-private_stat_session_key;
 +
 +   qid ^= (uint64) entry-query_id;
 +   qid ^= ((uint64) entry-query_id)  32;
 +
 +   values[i++] = Int64GetDatumFast(qid);
 +   }

I made some confusing mistakes here in using the newer tuple
versioning.  Let me try to explain what the motivation was:

I was adding new fields to pg_stat_statements and was afraid that it'd
be hard to get a very clear view that all the set fields are in
alignment and there were no accidental overruns, with the problem
getting more convoluted as more versions are added.

The idea of PGSS_TUP_LATEST is useful to catch common programmer error
by testing some invariants, and it'd be nice not to have to thrash the
invariant checking code every release, which would probably defeat the
point of such oops prevention code.  But, the fact that I went on to
rampantly do questionable things PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

/* Perform version detection */
if (tupdesc-natts == PG_STAT_STATEMENTS_COLS_V1_0)
detected_version = PGSS_TUP_V1_0;
else if (tupdesc-natts == PG_STAT_STATEMENTS_COLS_V1_1)
detected_version = PGSS_TUP_V1_1;
else if (tupdesc-natts == PG_STAT_STATEMENTS_COLS)
detected_version = PGSS_TUP_LATEST;
else
{
/*
 * Couldn't identify the tuple format.  Raise error.
 *
 * This is an exceptional case that may only happen in bizarre
 * situations, since it is thought that every released version
 * of pg_stat_statements has a matching schema.
 */
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg(pg_stat_statements schema is not supported 
by its installed binary)));
}

And

#ifdef USE_ASSERT_CHECKING
/* Check that every column appears to be filled */
switch (detected_version)
{
case PGSS_TUP_V1_0:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
break;
case PGSS_TUP_V1_1:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
break;
case PGSS_TUP_LATEST:
Assert(i == PG_STAT_STATEMENTS_COLS);
break;
default:
Assert(false);
}
#endif

Given that, perhaps a way to fix this is something like this patch-fragment:


 {
  PGSS_TUP_V1_0 = 1,
  PGSS_TUP_V1_1,
- PGSS_TUP_LATEST
+ PGSS_TUP_V1_2
 } pgssTupVersion;

+#define PGSS_TUP_LATEST PGSS_TUP_V1_2


This way when a programmer is making new tuple versions, they are much
more likely to see the immediate need to teach those two sites about
the new tuple size.  But, the fact that one does not get the
invariants updated in a completely compulsory way may push the value
of this checking under water.

I'd be sad to see it go, it has saved me a lot of effort when
returning to the code after a while.

What do you think?

 The instr_time thingy being used for these times maps to
 QueryPerformanceCounter() on Windows, and I'm not sure how useful this
 is for long-term time tracking; see
 http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
 for instance.  I think it'd be better to use TimestampTz and
 GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push
down the already-extant Instr struct in most cases, as to get the
'start' time stored there for free.  But if it can't

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera

Daniel Farina escribió:

 Given that, perhaps a way to fix this is something like this patch-fragment:
 
 
  {
   PGSS_TUP_V1_0 = 1,
   PGSS_TUP_V1_1,
 - PGSS_TUP_LATEST
 + PGSS_TUP_V1_2
  } pgssTupVersion;
 
 +#define PGSS_TUP_LATEST PGSS_TUP_V1_2
 

This sounds good.  I have seen other places that have the LATEST
definition as part of the enum, assigning the previous value to it.  I'm
not really sure which of these is harder to miss when updating the code.
I'm happy with either.

Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any
case.

  Just noticed that you changed the timer to struct Instrumentation.  Not
  really sure about that change.  Since you seem to be using only the
  start time and counter, wouldn't it be better to store only those?
  Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().
 
 Yeah, I was unsure about that too.
 
 The motivation was that I need one more piece of information in
 pgss_store (the absolute start time).  I was going to widen the
 argument list, but it was looking pretty long, so instead I was
 thinking it'd be more concise to push the entire, typically extant
 Instrumentation struct pointer down.

Would it work to define your own struct to pass around?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70


The V7-Patch applied cleanly and I got no issues in my first tests.

The change from column session_start to a function seems very reasonable for me.

Concernig the usability, I would like to suggest a minor change, 
that massively increases the usefulness of the patch for beginners, 
who often use this view as a first approach to optimize index structure.


The history of this tool contains a first version without normalization.
This wasn't useful enough except for prepared queries.
The actual version has normalized queries, so calls get
summarized to get a glimpse of bad queries.
But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself is no longer 
needed to be exposed in the view for everyone.


I suggest to add a parameter to recover the possibility to
display real queries. The following very minor change
(based on V7) exposes the first real query getting this
query_id if normalization of the exposed string ist deactivated 
(The actual behaviour is the default). 
This new option is not always the best starting point to discover index shortfalls, 
but a huge gain for beginners because it serves the needs

in more than 90% of the normal use cases.

What do you think?

Arne

Date:   Mon Oct 7 17:54:08 2013 +

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int  pgss_max;   /* max # 
statements to track */
 static int pgss_track; /* tracking level */
 static bool pgss_track_utility; /* whether to track utility commands */
 static bool pgss_save; /* whether to save stats across 
shutdown */
-
+static bool pgss_normalize; /* whether to normalize the query 
representation shown in the view (otherwise show the first query executed with 
this query_id) */

 #define pgss_enabled() \
(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
 NULL,
 NULL);

+   DefineCustomBoolVariable(pg_stat_statements.normalize,
+Selects whether the view column contains the query 
strings in a normalized form.,
+  NULL,
+  pgss_normalize,
+  true,
+  PGC_SUSET,
+  0,
+  NULL,
+  NULL,
+  NULL);
+
EmitWarningsOnPlaceholders(pg_stat_statements);

/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,

query_len = strlen(query);

-   if (jstate)
+   if (jstate  pgss_normalize)
{
-   /* Normalize the string if enabled */
+   /* Normalize the string is not NULL and normalized 
query strings are enabled */
norm_query = generate_normalized_query(jstate, query,

   query_len,

   key.encoding);






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina

On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
  Just noticed that you changed the timer to struct Instrumentation.  Not
  really sure about that change.  Since you seem to be using only the
  start time and counter, wouldn't it be better to store only those?
  Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

 Yeah, I was unsure about that too.

 The motivation was that I need one more piece of information in
 pgss_store (the absolute start time).  I was going to widen the
 argument list, but it was looking pretty long, so instead I was
 thinking it'd be more concise to push the entire, typically extant
 Instrumentation struct pointer down.

 Would it work to define your own struct to pass around?

Absolutely, I was just hoping to spare the code another abstraction if
another was a precise superset.

Looks like that isn't going to happen, though, so a pgss-oriented
struct is likely what will have to be.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-05 Thread Daniel Farina

On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote:
 On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:

 Looks pretty good. Do you want to package up the patch with your
 change and do the honors and re-submit it? Thanks for helping out so
 much!
 Sure, will do. Need to add a bit of documentation explaining
 statistics session as well.
 I did some more basic testing around pg_stat_statements.max, now that
 we have clarity from Peter about its value being legitimate below 100.
 Seems to work fine, with pg_stat_statements =4 the max unique queries
 in the view are 4. On the 5th query the view holds just the latest
 unique query discarding the previous 4. Fujii had reported a
 segmentation fault in this scenario.
 Thank you for the patch

 Please find the patch attached

 Thanks for the patch! Here are the review comments:

 +OUT session_start timestamptz,
 +OUT introduced timestamptz,

 The patch exposes these columns in pg_stat_statements view.
 These should be documented.

 I don't think that session_start should be exposed in every
 rows in pg_stat_statements because it's updated only when
 all statistics are reset, i.e., session_start of all entries
 in pg_stat_statements indicate the same.

Dunno.  I agree it'd be less query traffic and noise.  Maybe hidden
behind a UDF?  I thought stats_reset on pg_database may be prior
art, but realized that the statistics there differ depending on stats
resets per database (maybe a name change of 'session' to 'stats_reset'
would be useful to avoid too much in-cohesion, though).

I didn't want to bloat the taxonomy of exposed API/symbols too much
for pg_stat_statements, but perhaps in this instance it is reasonable.
 Also, isn't the interlock with the result set is perhaps more
precise/fine-grained with the current solution?  Yet, that's awfully
corner-casey.

I'm on the fence because the simplicity and precision of the current
regime for aggregation tools is nice, but avoiding the noise for
inspecting humans in the common case is also nice.  I don't see a
reason right now to go strongly either way, so if you feel moderately
strongly that the repetitive column should be stripped then I am happy
to relent there and help out.  Let me know of your detailed thoughts
(or modify the patch) with your idea.

 +OUT query_id int8,

 query_id or queryid? I like the latter. Also the document
 uses the latter.

Not much opinion.  I prefer expending the _ character because most
compound words in postgres performance statistics catalogs seem to use
an underscore, though.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-05 Thread Sameer Thakur

 Please find the patch attached

 Thanks for the patch! Here are the review comments:

 +OUT session_start timestamptz,
 +OUT introduced timestamptz,

 The patch exposes these columns in pg_stat_statements view.
 These should be documented.
 Yes, will add to documentation.
 I don't think that session_start should be exposed in every
 rows in pg_stat_statements because it's updated only when
 all statistics are reset, i.e., session_start of all entries
 in pg_stat_statements indicate the same.
I understand. Will remove session_start from view and expose it via a
function pg_stat_statements_current_session_start() ?
 +OUT query_id int8,
 query_id or queryid? I like the latter. Also the document
 uses the latter.

 Will change to queryid

Thank you
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5773448.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-05 Thread Sameer Thakur

On Sat, Oct 5, 2013 at 1:38 PM, Daniel Farina dan...@heroku.com wrote:
 On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote:
 On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:

 Looks pretty good. Do you want to package up the patch with your
 change and do the honors and re-submit it? Thanks for helping out so
 much!
 Sure, will do. Need to add a bit of documentation explaining
 statistics session as well.
 I did some more basic testing around pg_stat_statements.max, now that
 we have clarity from Peter about its value being legitimate below 100.
 Seems to work fine, with pg_stat_statements =4 the max unique queries
 in the view are 4. On the 5th query the view holds just the latest
 unique query discarding the previous 4. Fujii had reported a
 segmentation fault in this scenario.
 Thank you for the patch

 Please find the patch attached

 Thanks for the patch! Here are the review comments:

 +OUT session_start timestamptz,
 +OUT introduced timestamptz,

 The patch exposes these columns in pg_stat_statements view.
 These should be documented.

 I don't think that session_start should be exposed in every
 rows in pg_stat_statements because it's updated only when
 all statistics are reset, i.e., session_start of all entries
 in pg_stat_statements indicate the same.

 Dunno.  I agree it'd be less query traffic and noise.  Maybe hidden
 behind a UDF?  I thought stats_reset on pg_database may be prior
 art, but realized that the statistics there differ depending on stats
 resets per database (maybe a name change of 'session' to 'stats_reset'
 would be useful to avoid too much in-cohesion, though).

 I didn't want to bloat the taxonomy of exposed API/symbols too much
 for pg_stat_statements, but perhaps in this instance it is reasonable.
  Also, isn't the interlock with the result set is perhaps more
 precise/fine-grained with the current solution?  Yet, that's awfully
 corner-casey.

 I'm on the fence because the simplicity and precision of the current
 regime for aggregation tools is nice, but avoiding the noise for
 inspecting humans in the common case is also nice.  I don't see a
 reason right now to go strongly either way, so if you feel moderately
 strongly that the repetitive column should be stripped then I am happy
 to relent there and help out.  Let me know of your detailed thoughts
 (or modify the patch) with your idea.


Thinking a bit more, if its just a question of a repeating value we
have the same situation for userid and dbid. They would be the same
for a user across multiple queries in same statistics session. So
userid,dbid and session_start do repeat across rows. Not sure why
treatment for session_start be different. I also checked pg_stat_plans
@ https://github.com/2ndQuadrant/pg_stat_plans and did not see any
special treatment given for a particular field in terms of access i.e.
the granularity of api wrt pg_stat_statements has been maintained.

regards
Sameer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-04 Thread Fujii Masao

On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote:
 On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:

 Looks pretty good. Do you want to package up the patch with your
 change and do the honors and re-submit it? Thanks for helping out so
 much!
 Sure, will do. Need to add a bit of documentation explaining
 statistics session as well.
 I did some more basic testing around pg_stat_statements.max, now that
 we have clarity from Peter about its value being legitimate below 100.
 Seems to work fine, with pg_stat_statements =4 the max unique queries
 in the view are 4. On the 5th query the view holds just the latest
 unique query discarding the previous 4. Fujii had reported a
 segmentation fault in this scenario.
 Thank you for the patch

 Please find the patch attached

Thanks for the patch! Here are the review comments:

+OUT session_start timestamptz,
+OUT introduced timestamptz,

The patch exposes these columns in pg_stat_statements view.
These should be documented.

I don't think that session_start should be exposed in every
rows in pg_stat_statements because it's updated only when
all statistics are reset, i.e., session_start of all entries
in pg_stat_statements indicate the same.

+OUT query_id int8,

query_id or queryid? I like the latter. Also the document
uses the latter.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-03 Thread Sameer Thakur

On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:

 Looks pretty good. Do you want to package up the patch with your
 change and do the honors and re-submit it? Thanks for helping out so
 much!
 Sure, will do. Need to add a bit of documentation explaining
 statistics session as well.
 I did some more basic testing around pg_stat_statements.max, now that
 we have clarity from Peter about its value being legitimate below 100.
 Seems to work fine, with pg_stat_statements =4 the max unique queries
 in the view are 4. On the 5th query the view holds just the latest
 unique query discarding the previous 4. Fujii had reported a
 segmentation fault in this scenario.
 Thank you for the patch

Please find the patch attached

regards
Sameer


pg_stat_statements-identification-v7.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-02 Thread Sameer Thakur


 Looks pretty good. Do you want to package up the patch with your
 change and do the honors and re-submit it? Thanks for helping out so
 much!
Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch
regards
Sameer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-01 Thread Sameer Thakur

On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
ml-node+s1045698n5772887...@n5.nabble.com wrote:

 On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote:

  Also, for onlookers, I have changed this patch around to do the
  date-oriented stuff but want to look it over before stapling it up and
  sending it.  If one cannot wait, one can look at
  https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
  that history contains a reasonable patch I think, but a re-read often
  finds something for me and I've only just completed it yesterday.
 

 I did the following
 1. Forked from fdr/postgres
 2. cloned branch queryid
 3. squashed
 22899c802571a57cfaf0df38e6c5c366b5430c74
 d813096e29049667151a49fc5e5cf3d6bbe55702
 picked
 be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
 4. usual make/make install/create extension pg_stat_statements.
 (pg_stat_statements.max=100).
 5. select * from pg_stat_statements_reset(), select * from
 pgbench_tellers.
 result below:

 userid | dbid  |  session_start   |introduced
|   query   |  query_id
   | calls | total_time |
  rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
 shared_blks_written | local_blks_hit | local_blks_read |
 local_blks_dirtied | local_blks_written | t
 emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

 +---+--+---+---+-+---++

 --+-+--+-+-++-+++--
 --+---+---+
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pg_stat_statements_reset(); |
 2531907647060518039 | 1 |  0 |
 1 |   0 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pgbench_tellers ;   |
 7580333025384382649 | 1 |  0 |
10 |   1 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
 (2 rows)


 I understand session_start and verified that it changes with each
 database restart to reflect current time.

 It should only restart when the statistics file cannot be loaded.

 I am not sure why introduced

 keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it
 reflected the (most recent) time query statements statistics is added
 to hashtable. Is this a bug?
 Will continue to test and try and understand the code.

 Yes, a bug.  There are a few calls to pgss store and I must be submitting a
 zero value for the introduction time in one of those cases.

 Heh, I thought that was fixed, but maybe I broke something.  Like I said;
 preliminary. At the earliest I can look at this Wednesday, but feel free to
 amend and resubmit including my changes if you feel inclined and get to it
 first.

In pg_stat_statements.c line 1440
changed
if (instr == NULL)
to
if (instr == NULL || INSTR_TIME_IS_ZERO(instr-starttime))

This seemed to do the trick. I will continue to test some more.
regards
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772930.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-01 Thread Sameer Thakur

On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
ml-node+s1045698n5772887...@n5.nabble.com wrote:

 On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote:

  Also, for onlookers, I have changed this patch around to do the
  date-oriented stuff but want to look it over before stapling it up and
  sending it.  If one cannot wait, one can look at
  https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
  that history contains a reasonable patch I think, but a re-read often
  finds something for me and I've only just completed it yesterday.
 

 I did the following
 1. Forked from fdr/postgres
 2. cloned branch queryid
 3. squashed
 22899c802571a57cfaf0df38e6c5c366b5430c74
 d813096e29049667151a49fc5e5cf3d6bbe55702
 picked
 be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
 4. usual make/make install/create extension pg_stat_statements.
 (pg_stat_statements.max=100).
 5. select * from pg_stat_statements_reset(), select * from
 pgbench_tellers.
 result below:

 userid | dbid  |  session_start   |introduced
|   query   |  query_id
   | calls | total_time |
  rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
 shared_blks_written | local_blks_hit | local_blks_read |
 local_blks_dirtied | local_blks_written | t
 emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

 +---+--+---+---+-+---++

 --+-+--+-+-++-+++--
 --+---+---+
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pg_stat_statements_reset(); |
 2531907647060518039 | 1 |  0 |
 1 |   0 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pgbench_tellers ;   |
 7580333025384382649 | 1 |  0 |
10 |   1 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
 (2 rows)


 I understand session_start and verified that it changes with each
 database restart to reflect current time.

 It should only restart when the statistics file cannot be loaded.

This seems to work fine.
1. Started the instance
2. Executed pg_stat_statements_reset(), select * from
pgbench_history,select* from pgbench_tellers. Got the following in
pg_stat_statements view
userid | dbid  |  session_start   |
introduced|   query   |
   query_id   | calls | tota
l_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit |
local_blks_read | local_blks_dirtied | local_blks_wri
tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
+---+--+--+---+--+---+-
---+--+-+--+-+-++-++---
-++---+---+
 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:43.724301+05:30 | select * from pgbench_history;|
-165801328395488047 | 1 |
 0 |0 |   0 |0 |
0 |   0 |  0 |   0 |
   0 |
   0 |  0 | 0 | 0 |  0
 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:37.379785+05:30 | select * from pgbench_tellers;|
8376871363863945311 | 1 |
 0 |   10 |   0 |1 |
0 |   0 |  0 |   0 |
   0 |
   0 |  0 | 0 | 0 |  0
 10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); |
-1061018443194138344 | 1 |
 0 |1 |   0 |0 |
0 |   0 |  0 |   0 |
   0 |
   0 |  0 | 0 | 0 |  0
(3 rows)

Then restarted the server and saw pg_stat_statements view again.

userid | dbid  |  session_start   |

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-01 Thread Daniel Farina

On Tue, Oct 1, 2013 at 5:32 AM, Sameer Thakur samthaku...@gmail.com wrote:
 On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
 [hidden email] wrote:


 On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote:

  Also, for onlookers, I have changed this patch around to do the
  date-oriented stuff but want to look it over before stapling it up and
  sending it.  If one cannot wait, one can look at
  https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
  that history contains a reasonable patch I think, but a re-read often
  finds something for me and I've only just completed it yesterday.
 

 I did the following
 1. Forked from fdr/postgres
 2. cloned branch queryid
 3. squashed
 22899c802571a57cfaf0df38e6c5c366b5430c74
 d813096e29049667151a49fc5e5cf3d6bbe55702
 picked
 be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
 4. usual make/make install/create extension pg_stat_statements.
 (pg_stat_statements.max=100).
 5. select * from pg_stat_statements_reset(), select * from
 pgbench_tellers.
 result below:

 userid | dbid  |  session_start   |introduced
|   query   |  query_id
   | calls | total_time |
  rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
 shared_blks_written | local_blks_hit | local_blks_read |
 local_blks_dirtied | local_blks_written | t
 emp_blks_read | temp_blks_written | blk_read_time | blk_write_time


 +---+--+---+---+-+---++


 --+-+--+-+-++-+++--
 --+---+---+
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pg_stat_statements_reset(); |
 2531907647060518039 | 1 |  0 |
 1 |   0 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pgbench_tellers ;   |
 7580333025384382649 | 1 |  0 |
10 |   1 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
 (2 rows)


 I understand session_start and verified that it changes with each
 database restart to reflect current time.

 It should only restart when the statistics file cannot be loaded.

 This seems to work fine.
 1. Started the instance
 2. Executed pg_stat_statements_reset(), select * from
 pgbench_history,select* from pgbench_tellers. Got the following in
 pg_stat_statements view
 userid | dbid  |  session_start   |
 introduced|   query   |
query_id   | calls | tota
 l_time | rows | shared_blks_hit | shared_blks_read |
 shared_blks_dirtied | shared_blks_written | local_blks_hit |
 local_blks_read | local_blks_dirtied | local_blks_wri
 tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
 +---+--+--+---+--+---+-
 ---+--+-+--+-+-++-++---
 -++---+---+
  10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
 17:43:43.724301+05:30 | select * from pgbench_history;|
 -165801328395488047 | 1 |
  0 |0 |   0 |0 |
 0 |   0 |  0 |   0 |
0 |
0 |  0 | 0 | 0 |  0
  10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
 17:43:37.379785+05:30 | select * from pgbench_tellers;|
 8376871363863945311 | 1 |
  0 |   10 |   0 |1 |
 0 |   0 |  0 |   0 |
0 |
0 |  0 | 0 | 0 |  0
  10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
 17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); |
 -1061018443194138344 | 1 |
  0 |1 |   0 |0 |
 0 |   0 |  0 |   0 |
0 |
0 |  0 | 0 | 0 |  0
 (3 rows)

 Then restarted the server and

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-01 Thread Peter Geoghegan

On Sun, Sep 29, 2013 at 10:58 PM, Daniel Farina dan...@fdr.io wrote:
 I remember hacking that out for testing sake.

 I can only justify it as a foot-gun to prevent someone from being
 stuck restarting the database to get a reasonable number in there.
 Let's CC Peter; maybe he can remember some thoughts about that.

On reflection I think it was entirely to do with the testing of the
patch.  I don't think that the minimum value of pg_stat_statements has
any real significance.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-30 Thread Daniel Farina

On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur samthaku...@gmail.com wrote:
 Yes i was. Just saw a warning when pg_stat_statements is loaded that
 valid values for pg_stat_statements.max is between 100 and 2147483647.
 Not sure why though.

I remember hacking that out for testing sake.

I can only justify it as a foot-gun to prevent someone from being
stuck restarting the database to get a reasonable number in there.
Let's CC Peter; maybe he can remember some thoughts about that.

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it.  If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-30 Thread Sameer Thakur

 Also, for onlookers, I have changed this patch around to do the
 date-oriented stuff but want to look it over before stapling it up and
 sending it.  If one cannot wait, one can look at
 https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
 that history contains a reasonable patch I think, but a re-read often
 finds something for me and I've only just completed it yesterday.


I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from pgbench_tellers.
result below:

userid | dbid  |  session_start   |introduced
   |   query   |  query_id
  | calls | total_time |
 rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time
+---+--+---+---+-+---++
--+-+--+-+-++-+++--
--+---+---+
 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 |  0 |
1 |   0 |0 |   0 |
  0 |  0 |   0 |
0 |  0 |
0 | 0 | 0 |  0
 10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ;   |
7580333025384382649 | 1 |  0 |
   10 |   1 |0 |   0 |
  0 |  0 |   0 |
0 |  0 |
0 | 0 | 0 |  0
(2 rows)


I understand session_start and verified that it changes with each
database restart to reflect current time. I am not sure why introduced
keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it
reflected the (most recent) time query statements statistics is added
to hashtable. Is this a bug?
Will continue to test and try and understand the code.

regards
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772841.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-30 Thread Daniel Farina

On Sep 30, 2013 4:39 AM, Sameer Thakur samthaku...@gmail.com wrote:

  Also, for onlookers, I have changed this patch around to do the
  date-oriented stuff but want to look it over before stapling it up and
  sending it.  If one cannot wait, one can look at
  https://github.com/fdr/postgres/tree/queryid.  The squashed-version of
  that history contains a reasonable patch I think, but a re-read often
  finds something for me and I've only just completed it yesterday.
 

 I did the following
 1. Forked from fdr/postgres
 2. cloned branch queryid
 3. squashed
 22899c802571a57cfaf0df38e6c5c366b5430c74
 d813096e29049667151a49fc5e5cf3d6bbe55702
 picked
 be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
 4. usual make/make install/create extension pg_stat_statements.
 (pg_stat_statements.max=100).
 5. select * from pg_stat_statements_reset(), select * from
pgbench_tellers.
 result below:

 userid | dbid  |  session_start   |introduced
|   query   |  query_id
   | calls | total_time |
  rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
 shared_blks_written | local_blks_hit | local_blks_read |
 local_blks_dirtied | local_blks_written | t
 emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

+---+--+---+---+-+---++


--+-+--+-+-++-+++--

 --+---+---+
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pg_stat_statements_reset(); |
 2531907647060518039 | 1 |  0 |
 1 |   0 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
  10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
 05:30:00+05:30 | select * from pgbench_tellers ;   |
 7580333025384382649 | 1 |  0 |
10 |   1 |0 |   0 |
   0 |  0 |   0 |
 0 |  0 |
 0 | 0 | 0 |  0
 (2 rows)


 I understand session_start and verified that it changes with each
 database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

I am not sure why introduced
 keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it
 reflected the (most recent) time query statements statistics is added
 to hashtable. Is this a bug?
 Will continue to test and try and understand the code.

Yes, a bug.  There are a few calls to pgss store and I must be submitting a
zero value for the introduction time in one of those cases.

Heh, I thought that was fixed, but maybe I broke something.  Like I said;
preliminary. At the earliest I can look at this Wednesday, but feel free to
amend and resubmit including my changes if you feel inclined and get to it
first.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-29 Thread Sameer Thakur

On Mon, Sep 23, 2013 at 1:26 PM, samthakur74 samthaku...@gmail.com wrote:

 I forgot about removal of the relevant SGML, amended here in v6.

 Thank you for this!
 i did a quick test with following steps:
 1. Applied v6 patch
 2. make and make install on pg_stat_statements;
 3. Restarted Postgres with pg_stat_statements loaded with
 pg_stat_statements.max = 4
 4. Dropped and created extension pg_stat_statements.

 Executed following:
 select * from pg_stat_statements_reset();
 select * from pgbench_branches ;
 select * from pgbench_history ;
  select * from pgbench_tellers ;
  select * from pgbench_accounts;

 I expected 4 rows in pg_stat_statements view for each of 4 queries
 above. But i saw just 2 rows.

 select * from pg_stat_statements;

 userid | dbid  | stat_session_id |  query  |
 quer
 y_id   | calls | total_time |  rows  | shared_blks_hit |
 shared_blks_read |
 shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read
 | l
 ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written
 | bl
 k_read_time | blk_write_time
 +---+-+-+---
 ---+---+++-+--+-
 +-++-+--
 --+++---+---
 +
  10 | 12900 |21595345 | select * from pgbench_accounts; |
 -803800319
 3522943111 | 1 |108.176 | 10 |1640 |
 0 |
   0 |   0 |  0 |   0
 |
 0 |  0 |  0 | 0
 |
   0 |  0
  10 | 12900 |21595345 | select * from pgbench_tellers ; |
 -149722997
 7134331757 | 1 |  0.227 | 10 |   1 |
 0 |
   0 |   0 |  0 |   0
 |
 0 |  0 |  0 | 0
 |
   0 |  0
 (2 rows)


 Am i doing something wrong?

Yes i was. Just saw a warning when pg_stat_statements is loaded that
valid values for pg_stat_statements.max is between 100 and 2147483647.
Not sure why though.
regards
Sameer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-23 Thread samthakur74


 I forgot about removal of the relevant SGML, amended here in v6.

Thank you for this!
i did a quick test with following steps:
1. Applied v6 patch
2. make and make install on pg_stat_statements;
3. Restarted Postgres with pg_stat_statements loaded with
pg_stat_statements.max = 4
4. Dropped and created extension pg_stat_statements.

Executed following:
select * from pg_stat_statements_reset();
select * from pgbench_branches ;
select * from pgbench_history ;
 select * from pgbench_tellers ;
 select * from pgbench_accounts;

I expected 4 rows in pg_stat_statements view for each of 4 queries
above. But i saw just 2 rows.

select * from pg_stat_statements;

userid | dbid  | stat_session_id |  query  |   quer
y_id   | calls | total_time |  rows  | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | l
ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | bl
k_read_time | blk_write_time
+---+-+-+---
---+---+++-+--+-
+-++-+--
--+++---+---
+
 10 | 12900 |21595345 | select * from pgbench_accounts; | -803800319
3522943111 | 1 |108.176 | 10 |1640 |0 |
  0 |   0 |  0 |   0 |
0 |  0 |  0 | 0 |
  0 |  0
 10 | 12900 |21595345 | select * from pgbench_tellers ; | -149722997
7134331757 | 1 |  0.227 | 10 |   1 |0 |
  0 |   0 |  0 |   0 |
0 |  0 |  0 | 0 |
  0 |  0
(2 rows)


Am i doing something wrong?

regards
Sameer




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771960.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-23 Thread Alvaro Herrera

Daniel Farina escribió:
 On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote:
  I think the n-call underestimation propagation may not be quite precise for
  various detailed reasons (having to do with 'sticky' queries) and to make it
  precise is probably more work than it's worth.  And, on more reflection, I'm
  also having a hard time imaging people intuiting that value usefully.  So,
  here's a version removing that.
 
 I forgot about removal of the relevant SGML, amended here in v6.

Nice.

You need to remove the --1.1.sql entry (the file itself and the DATA
entry in Makefile).

Also, I think it would be good to have a common fooRandom() routine,
instead of having a second copy of PostmasterRandom.  Might I suggest
putting it in a new file under src/common/.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-20 Thread samthakur74

On Thu, Sep 19, 2013 at 11:32 AM, Fujii Masao-2 [via PostgreSQL]
ml-node+s1045698n5771565...@n5.nabble.com wrote:

On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 [hidden
email]http://user/SendEmail.jtp?type=nodenode=5771565i=0
wrote:

I got the segmentation fault when I tested the case where the
least-executed
query statistics is discarded, i.e., when I executed different queries
more
than
pg_stat_statements.max times. I guess that the patch might have a bug.
Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.
Yes will do that

+ entrystructfieldqueryid/structfield/entry
+ entrytypebigint/type/entry
+ entry/entry
+ entryUnique value of each representative statement for the
current statistics session.
+ This value will change for each new statistics
session./entry

What does statistics session mean?
The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format
upgrades
or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact
that
in case of crashes,format upgrades or any situation of statistics
reset, the
queryid for the same queries will also change.

I'm afraid that this behavior narrows down the use case of queryid very
much.
For example, since the queryid of the same query would not be the same in
the master and the standby servers, we cannot associate those two
statistics
by using the queryid. The queryid changes through the crash recovery, so
we cannot associate the query statistics generated before the crash with
that
generated after the crash recovery even if the query is the same.

Yes, these are limitations in this approach. The other approaches
suggested included
1. Expose query id hash value as is, in the view, but document the fact
that it will be unstable between releases
2. Expose query id hash value via an undocumented function and let more
expert users decided if they want to use it.

The approach of using statistics session id to generate queryid is a
compromise between not exposing it at all and exposing it without warning
the users of unstable hash value from query tree between releases.

This is not directly related to the patch itself, but why does the
queryid
need to be calculated based on also the statistics session?

If we expose hash value of query tree, without using statistics session,
it is possible that users might make wrong assumption that this value
remains stable across version upgrades, when in reality it depends on
whether the version has make changes to query tree internals. So to
explicitly ensure that users do not make this wrong assumption, hash value
generation use statistics session id, which is newly created under
situations described above.

Will update documentation
clearly explain the term statistics session in this context

Yep, that's helpful!

Regards,
Sameer

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771701.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-20 Thread Daniel Farina

On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote:
 I think the n-call underestimation propagation may not be quite precise for
 various detailed reasons (having to do with 'sticky' queries) and to make it
 precise is probably more work than it's worth.  And, on more reflection, I'm
 also having a hard time imaging people intuiting that value usefully.  So,
 here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.


pg_stat_statements-identification-v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-19 Thread Fujii Masao

On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 samthaku...@gmail.com wrote:
I got the segmentation fault when I tested the case where the
 least-executed
query statistics is discarded, i.e., when I executed different queries more
 than
pg_stat_statements.max times. I guess that the patch might have a bug.
 Thanks, will try to fix it.

 pg_stat_statements--1.1.sql should be removed.
 Yes will do that



 +  entrystructfieldqueryid/structfield/entry
 +  entrytypebigint/type/entry
 +  entry/entry
 +  entryUnique value of each representative statement for the
 current statistics session.
 +   This value will change for each new statistics session./entry

 What does statistics session mean?
 The time period when statistics are gathered by statistics collector
 without being reset. So the statistics session continues across normal
 shutdowns, but in case of abnormal situations like crashes, format upgrades
 or statistics being reset for any other reason, a new time period of
 statistics collection starts i.e. a new statistics session. The queryid
 value generation is linked to statistics session so emphasize the fact that
 in case of crashes,format upgrades or any situation of statistics reset, the
 queryid for the same queries will also change.

I'm afraid that this behavior narrows down the use case of queryid very much.
For example, since the queryid of the same query would not be the same in
the master and the standby servers, we cannot associate those two statistics
by using the queryid. The queryid changes through the crash recovery, so
we cannot associate the query statistics generated before the crash with that
generated after the crash recovery even if the query is the same.

This is not directly related to the patch itself, but why does the queryid
need to be calculated based on also the statistics session?

 Will update documentation
 clearly explain the term statistics session in this context

Yep, that's helpful!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-18 Thread Fujii Masao

On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur samthaku...@gmail.com wrote:
 You seem to have forgotten to include the pg_stat_statements--1.2.sql
 and pg_stat_statements--1.1--1.2.sql in the patch.
 Sorry again. Please find updated patch attached.

 I did not add pg_stat_statements--1.2.sql. I have added that now and updated
 the patch again.

Thanks!

I got the segmentation fault when I tested the case where the least-executed
query statistics is discarded, i.e., when I executed different queries more than
pg_stat_statements.max times. I guess that the patch might have a bug.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-18 Thread Fujii Masao

On Thu, Sep 19, 2013 at 2:41 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur samthaku...@gmail.com wrote:
 You seem to have forgotten to include the pg_stat_statements--1.2.sql
 and pg_stat_statements--1.1--1.2.sql in the patch.
 Sorry again. Please find updated patch attached.

 I did not add pg_stat_statements--1.2.sql. I have added that now and updated
 the patch again.

pg_stat_statements--1.1.sql should be removed.

+  entrystructfieldqueryid/structfield/entry
+  entrytypebigint/type/entry
+  entry/entry
+  entryUnique value of each representative statement for the
current statistics session.
+   This value will change for each new statistics session./entry

What does statistics session mean?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-18 Thread samthakur74

I got the segmentation fault when I tested the case where the
least-executed
query statistics is discarded, i.e., when I executed different queries
more than
pg_stat_statements.max times. I guess that the patch might have a bug.
Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.
Yes will do that

What does statistics session mean?
The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format upgrades
or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact that
in case of crashes,format upgrades or any situation of statistics reset,
the queryid for the same queries will also change. Will update
documentation clearly explain the term statistics session in this context

regards
Sameer

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771562.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-17 Thread Fujii Masao

On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 samthaku...@gmail.com wrote:


 You have added this email to the commit fest, but it contains no patch.

 Please add the email with the actual patch.

  I hope its attached now!

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-17 Thread samthakur74

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.


 Sorry again. Please find updated patch attached.



 http://www.postgresql.org/mailpref/pgsql-hackers

 NAMLhttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml




On Tue, Sep 17, 2013 at 5:47 PM, Fujii Masao-2 [via PostgreSQL] 
ml-node+s1045698n5771213...@n5.nabble.com wrote:

 On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=5771213i=0
 wrote:
 
 
  You have added this email to the commit fest, but it contains no
 patch.
 
  Please add the email with the actual patch.
 
   I hope its attached now!

 You seem to have forgotten to include the pg_stat_statements--1.2.sql
 and pg_stat_statements--1.1--1.2.sql in the patch.

 Regards,

 --
 Fujii Masao


 --
 Sent via pgsql-hackers mailing list ([hidden 
 email]http://user/SendEmail.jtp?type=nodenode=5771213i=1)

 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771213.html
  To unsubscribe from pg_stat_statements: calls under-estimation
 propagation, click 
 herehttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=5738128code=c2FtdGhha3VyNzRAZ21haWwuY29tfDU3MzgxMjh8ODM4MzYxMTcy
 .
 NAMLhttp://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml



pg_stat_statements-identification-v4.patch.gz (9K) 
http://postgresql.1045698.n5.nabble.com/attachment/5771248/0/pg_stat_statements-identification-v4.patch.gz




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771248.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-17 Thread Fujii Masao

On Tue, Sep 17, 2013 at 10:48 PM, samthakur74 samthaku...@gmail.com wrote:




 You seem to have forgotten to include the pg_stat_statements--1.2.sql
 and pg_stat_statements--1.1--1.2.sql in the patch.


 Sorry again. Please find updated patch attached.

pg_stat_statements--1.2.sql is missing. Could you confirm that the patch
includes all the changes?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-17 Thread Sameer Thakur

 You seem to have forgotten to include the pg_stat_statements--1.2.sql
 and pg_stat_statements--1.1--1.2.sql in the patch.
 Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and
updated the patch again.

The patch attached should contain following file changes
patching file contrib/pg_stat_statements/Makefile
patching file contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements.c
patching file contrib/pg_stat_statements/pg_stat_statements.control
patching file doc/src/sgml/pgstatstatements.sgml

regards
Sameer


pg_stat_statements-identification-v4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-15 Thread samthakur74

 You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

 I hope its attached now!

  Maybe the author should be
 given a chance to update the patches, though, because they are quite
 old.

 I did connect with Daniel and he did have some improvement ideas. I am not
sure when they could be implemented. Since we have a interest in the
current version of the patch, which needed documentation, i tried to
complete that.
Thank you,
Sameer


pg_stat_statements-identification-v4.patch.gz (8K) 
http://postgresql.1045698.n5.nabble.com/attachment/5770937/0/pg_stat_statements-identification-v4.patch.gz




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770937.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-14 Thread samthakur74

This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements.
I have attached a version which includes documentation.
pg_stat_statements-identification-v4.patch.gz
http://postgresql.1045698.n5.nabble.com/file/n5770844/pg_stat_statements-identification-v4.patch.gz

Pardon for not following the discussion: What exactly does the
calls_underest field mean? I couldn't decipher it from the patch. What
can an admin do with the value? How does it compare with just bumping up
pg_stat_statements.max?
Paraphrasing the documentation.
There is a possibility that,for queries which are tracked intermittently,
could have their statistics silently reset.
The calls_underest field gives an indication that a query has been tracked
intermittently and consequently
have a higher probability of erroneous statistics. Queries tracked
constantly will have a zero value for
calls_underest indicating zero probability of erroneous statistics.
A greater value of calls_underest indicates greater degree of inconsistent
tracking which in
turn means greater possibility of erroneous statistics due to statistics
being reset when
query was not being tracked.

Increasing pg_stat_statements.max reduces the possibility of a query being
tracked intermittently but does not address the problem of indicating the
probability of erroneous statistics if the query is indeed being tracked
intermittently. I do not believe the admin can influence the value of
calls_underest as it is not a GUC parameter.

We have a need to see this patch committed hence the revived interest in
this thread

regards
Sameer

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770844.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-09-14 Thread Peter Eisentraut

On Sat, 2013-09-14 at 03:51 -0700, samthakur74 wrote:
 We have a need to see this patch committed hence the revived interest
 in this thread 

You have added this email to the commit fest, but it contains no patch.
Please add the email with the actual patch.  Maybe the author should be
given a chance to update the patches, though, because they are quite
old.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-03-26 Thread Heikki Linnakangas


On 30.12.2012 08:31, Daniel Farina wrote:

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state.  That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it.  It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.


This patch needs documentation. At a minimum, the new calls_underest 
field needs to be listed in the description of the pg_stat_statements.


Pardon for not following the discussion: What exactly does the 
calls_underest field mean? I couldn't decipher it from the patch. What 
can an admin do with the value? How does it compare with just bumping up 
pg_stat_statements.max?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Daniel Farina

Attached is a cumulative patch attempting to address the below.  One
can see the deltas to get there at https://github.com/fdr/postgres.git
error-prop-pg_stat_statements-v2.

On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 However, with this approach, calls_underest values might appear to the
 user in what might be considered an astonishing order. Now, I'm not
 suggesting that that's a real problem - just that they may not be the
 semantics we want, particularly as we can reasonably defer assigning a
 calls_underest until a sticky entry is unstuck, and an entry becomes
 user-visible, within pgss_store().

Fix for this as I understand it:

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 1036,1041  pgss_store(const char *query, uint32 queryId,
--- 1036,1042 
e-counters.usage = USAGE_INIT;

e-counters.calls += 1;
+   e-counters.calls_underest = pgss-calls_max_underest;
e-counters.total_time += total_time;
e-counters.rows += rows;
e-counters.shared_blks_hit += bufusage-shared_blks_hit;
***
*** 1264,1272  entry_alloc(pgssHashKey *key, const char *query,
int query_len, bool sticky)
/* set the appropriate initial usage count */
entry-counters.usage = sticky ? pgss-cur_median_usage : 
USAGE_INIT;

-   /* propagate calls under-estimation bound */
-   entry-counters.calls_underest = pgss-calls_max_underest;
-
/* re-initialize the mutex each time ... we assume no one using 
it */
SpinLockInit(entry-mutex);
/* ... and don't forget the query text */
--- 1265,1270 

 Also, it seems like you should initialise pgss-calls_max_underest,
 within pgss_shmem_startup().

Easy enough. Somehow I wrongly thought zero-initialization was a thing
for the shmem functions.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 426,431  pgss_shmem_startup(void)
--- 426,432 
{
/* First time through ... */
pgss-lock = LWLockAssign();
+   pgss-calls_max_underest = 0;
pgss-query_size = pgstat_track_activity_query_size;
pgss-cur_median_usage = ASSUMED_MEDIAN_INIT;
}

 You should probably serialise the value
 to disk, and initialise it to 0 if there is no such value to begin
 with.

I prefer different approach here: just compute it while loading the
entries from disk, since the calls + underestimation can be used to
find a new pessimum underestimation global value.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 419,424  pgss_shmem_startup(void)
--- 419,425 
int query_size;
int buffer_size;
char   *buffer = NULL;
+   int64   calls_max_underest = 0;

if (prev_shmem_startup_hook)
prev_shmem_startup_hook();
***
*** 440,446  pgss_shmem_startup(void)
{
/* First time through ... */
pgss-lock = LWLockAssign();
!   pgss-calls_max_underest = 0;
pgss-query_size = pgstat_track_activity_query_size;
pgss-cur_median_usage = ASSUMED_MEDIAN_INIT;
}
--- 441,447 
{
/* First time through ... */
pgss-lock = LWLockAssign();
!   pgss-calls_max_underest = calls_max_underest;
pgss-query_size = pgstat_track_activity_query_size;
pgss-cur_median_usage = ASSUMED_MEDIAN_INIT;
}
***
*** 528,533  pgss_shmem_startup(void)
--- 529,545 

   temp.query_len,

   query_size - 1);

+   /*
+* Compute maxima of under-estimation over the read entries
+* for reinitializing pgss-calls_max_underest.
+*/
+   {
+   int64 cur_underest;
+   
+   cur_underest = temp.calls + temp.calls_underest;
+   calls_max_underest = Max(calls_max_underest, 
cur_underest);
+   }
+
/* make the hashtable entry (discards old entries if too many) 
*/
entry = entry_alloc(temp.key, buffer, temp.query_len, false);

***
*** 535,540  pgss_shmem_startup(void)
--- 547,559 
entry-counters = temp.counters;
}

+   /*
+* Reinitialize global under-estimation

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Daniel Farina

On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina drfar...@acm.org wrote:
 These were not express goals of the patch, but so long as you are
 inviting features, attached is a bonus patch that exposes the queryid
 and also the notion of a statistics session that is re-rolled
 whenever the stats file could not be read or the stats are reset, able
 to fully explain all obvious causes of retrograde motion in
 statistics.  It too is cumulative, so it includes the under-estimation
 field.  Notably, I also opted to nullify extra pg_stat_statements
 fields when they'd also show insufficient privileges (that one is
 spared from this censorship), because I feel as though a bit too much
 information leaks from pg_stat_statement's statistics to ignore,
 especially after adding the query id.  Since the common theme here is
 identifying queries, I have called it
 pg_stat_statements-identification, and it can be found in the git
 repo above under the same name (...-v1).

A small amendment that doesn't really change the spirit of the
narrative is attached.

--
fdr


pg_stat_statements-identification-v2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Peter Geoghegan

On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote:
 These were not express goals of the patch, but so long as you are
 inviting features, attached is a bonus patch that exposes the queryid
 and also the notion of a statistics session that is re-rolled
 whenever the stats file could not be read or the stats are reset, able
 to fully explain all obvious causes of retrograde motion in
 statistics.  It too is cumulative, so it includes the under-estimation
 field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I
would like to propose a compromise between exposing the hash value and
not doing so at all.

What if we expose the hash value without documenting it, in a way not
apparent to normal users, while letting experts willing to make an
executive decision about its stability use it? What I have in mind is
to expose the hash value from the pg_stat_statements function, and yet
to avoid exposing it within the pg_stat_statements view definition.
The existence of the hash value would not need to be documented, since
the pg_stat_statements function is an undocumented implementation
detail.

Precedent for this exists, I think - the undocumented system hash
functions are exposed via an SQL interface. Some satellite projects
rely on this (apparently the pl/proxy documentation shows the use of
hashtext(), which is a thin wrapper on hash_any(), and there is
chatter about it elsewhere). So it is already the case that people are
using hashtext(), which should not be problematic if the applications
that do so have a reasonable set of expectations about its stability
(i.e. it's not going to change in a point release, because that would
break hash indexes, but may well change across major releases). We've
already in effect promised to not break hashtext() in a point release,
just as we've already in effect promised to not break the hash values
that pg_stat_statements uses internally (to do any less would
invalidate the on-disk representation, and necessitate bumping
PGSS_FILE_HEADER to wipe the stored stats).

Thoughts?

 Notably, I also opted to nullify extra pg_stat_statements
 fields when they'd also show insufficient privileges (that one is
 spared from this censorship), because I feel as though a bit too much
 information leaks from pg_stat_statement's statistics to ignore,
 especially after adding the query id.

That seems sensible.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Daniel Farina

On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote:
 These were not express goals of the patch, but so long as you are
 inviting features, attached is a bonus patch that exposes the queryid
 and also the notion of a statistics session that is re-rolled
 whenever the stats file could not be read or the stats are reset, able
 to fully explain all obvious causes of retrograde motion in
 statistics.  It too is cumulative, so it includes the under-estimation
 field.

 Cool.

 I had a thought about Tom's objection to exposing the hash value. I
 would like to propose a compromise between exposing the hash value and
 not doing so at all.

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it.  Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions.  That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

A change in hash function should also then necessitate changing the
stat file header.

Another more minor issue is the hard-wiring of the precision of the
id.  For that reason, I have thought it reasonable to expose this as a
string also.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Peter Geoghegan

On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote:
 As I recall, the gist of this objection had to do with a false sense
 of stability of the hash value, and the desire to enforce the ability
 to alter it.  Here's an option: xor the hash value with the
 'statistics session id', so it's *known* to be unstable between
 sessions.  That gets you continuity in the common case and sound
 deprecation in the less-common cases (crashes, format upgrades, stat
 resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Daniel Farina

On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote:
 As I recall, the gist of this objection had to do with a false sense
 of stability of the hash value, and the desire to enforce the ability
 to alter it.  Here's an option: xor the hash value with the
 'statistics session id', so it's *known* to be unstable between
 sessions.  That gets you continuity in the common case and sound
 deprecation in the less-common cases (crashes, format upgrades, stat
 resetting).

 Hmm. I like the idea, but a concern there would be that you'd
 introduce additional scope for collisions in the third-party utility
 building time-series data from snapshots. I currently put the
 probability of a collision within pg_stat_statements as 1% in the
 event of a pg_stat_statements.max of 10,000.

We can use a longer session key and duplicate the queryid (effectively
padding) a couple of times to complete the XOR.  I think that makes
the cases of collisions introduced by this astronomically low, as an
increase over the base collision rate.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-29 Thread Daniel Farina

On Sat, Dec 29, 2012 at 7:16 PM, Daniel Farina dan...@heroku.com wrote:
 On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 On 30 December 2012 02:45, Daniel Farina dan...@heroku.com wrote:
 As I recall, the gist of this objection had to do with a false sense
 of stability of the hash value, and the desire to enforce the ability
 to alter it.  Here's an option: xor the hash value with the
 'statistics session id', so it's *known* to be unstable between
 sessions.  That gets you continuity in the common case and sound
 deprecation in the less-common cases (crashes, format upgrades, stat
 resetting).

 Hmm. I like the idea, but a concern there would be that you'd
 introduce additional scope for collisions in the third-party utility
 building time-series data from snapshots. I currently put the
 probability of a collision within pg_stat_statements as 1% in the
 event of a pg_stat_statements.max of 10,000.

 We can use a longer session key and duplicate the queryid (effectively
 padding) a couple of times to complete the XOR.  I think that makes
 the cases of collisions introduced by this astronomically low, as an
 increase over the base collision rate.

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state.  That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it.  It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.

--
fdr


pg_stat_statements-identification-v3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-28 Thread Daniel Farina

Hello,

After long delay (sorry) here's a patch implementing what was
hand-waved at in
http://archives.postgresql.org/pgsql-hackers/2012-10/msg00176.php

I am still something at a loss at how to test it besides prodding it
by hand; it seems like it's going to involve infrastructure or
introducing hooks into pg_stat_statements for the express purpose.

The patch can also be sourced from:

  https://github.com/fdr/postgres.git error-prop-pg_stat_statements

Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry.  That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

This is useful to determine when accumulating statistics if a given
record is bouncing in and out of the pg_stat_statements table, having
its ncalls reset all the time, but also having calls_underest grow
very rapidly.

Records that always stay in pg_stat_statements will have a
calls_underest that do not change at all.

An interesting case is when a query that usually is called is not
called for a while, and falls out of pg_stat_statements.  The result
can be that the query with the most 'calls' can also have more
uncertainty than the query with the second most calls, which is also
exactly the truth in reality.

Unceremoniously bundled into this patch is a reduction in the minimum
table size for pg_stat_statements, from 100 to 1.  Using tiny values
is not likely to be seen in production, but makes testing the patch a
lot easier in some situations.

I will add this to the commitfest.

--
fdr


add-pg_stat_statements-calls-underestimation-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2012-12-28 Thread Peter Geoghegan

On 28 December 2012 11:43, Daniel Farina drfar...@acm.org wrote:
Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry. That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

Cool.

One possible issue I see with this is that this code:

+
+ /* propagate calls under-estimation bound */
+ entry-counters.calls_underest = pgss-calls_max_underest;
+

which appears within entry_alloc(). So if the first execution of the
query results in an error during planning or (much more likely)
execution, as in the event of an integrity constraint violation,
you're going to get a dead sticky entry that no one will ever see.
Now, we currently decay sticky entries much more aggressively, so the
mere fact that we waste an entry isn't a real problem. This was
established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is unstuck, and an entry becomes
user-visible, within pgss_store().

Also, it seems like you should initialise pgss-calls_max_underest,
within pgss_shmem_startup(). You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

70 matches

Mail list logo