Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 2015-02-21 16:09:02 -0500, Andrew Dunstan wrote: I think all the outstanding issues are fixed in this patch. Do you plan to push this? I don't see a benefit in delaying things any further... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 22/02/15 01:59, Petr Jelinek wrote: On 21/02/15 22:09, Andrew Dunstan wrote: On 02/16/2015 09:05 PM, Petr Jelinek wrote: I found one more issue with the 1.2--1.3 upgrade script, the DROP FUNCTION pg_stat_statements(); should be DROP FUNCTION pg_stat_statements(bool); since in 1.2 the function identity has changed. I think all the outstanding issues are fixed in this patch. I think PGSS_FILE_HEADER should be also updated, otherwise it's good. I see you marked this as 'needs review', I am marking it as 'ready for committer' as it looks good to me. Just don't forget to update the PGSS_FILE_HEADER while committing (I think that one can be threated same way as catversion, ie be updated by committer and not in patch submission). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21/02/15 22:09, Andrew Dunstan wrote: On 02/16/2015 09:05 PM, Petr Jelinek wrote: I found one more issue with the 1.2--1.3 upgrade script, the DROP FUNCTION pg_stat_statements(); should be DROP FUNCTION pg_stat_statements(bool); since in 1.2 the function identity has changed. I think all the outstanding issues are fixed in this patch. I think PGSS_FILE_HEADER should be also updated, otherwise it's good. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/16/2015 09:05 PM, Petr Jelinek wrote: On 17/02/15 02:57, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. I think so too. I found one more issue with the 1.2--1.3 upgrade script, the DROP FUNCTION pg_stat_statements(); should be DROP FUNCTION pg_stat_statements(bool); since in 1.2 the function identity has changed. I think all the outstanding issues are fixed in this patch. cheers andrew diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 2709909..975a637 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,8 +4,9 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o $(WIN32RES) EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \ - pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql +DATA = pg_stat_statements--1.3.sql pg_stat_statements--1.2--1.3.sql \ + pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \ + pg_stat_statements--unpackaged--1.0.sql PGFILEDESC = pg_stat_statements - execution statistics of SQL statements ifdef USE_PGXS diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql new file mode 100644 index 000..a56f151 --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql @@ -0,0 +1,47 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use ALTER EXTENSION pg_stat_statements UPDATE TO '1.3' to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(boolean); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements(IN showtext boolean, +OUT userid oid, +OUT dbid oid, +OUT queryid bigint, +OUT query text, +OUT calls int8, +OUT total_time float8, +OUT min_time float8, +OUT max_time float8, +OUT mean_time float8, +OUT stddev_time float8, +OUT rows int8, +OUT shared_blks_hit int8, +OUT shared_blks_read int8, +OUT shared_blks_dirtied int8, +OUT shared_blks_written int8, +OUT local_blks_hit int8, +OUT local_blks_read int8, +OUT local_blks_dirtied int8, +OUT local_blks_written int8, +OUT temp_blks_read int8, +OUT temp_blks_written int8, +OUT blk_read_time float8, +OUT blk_write_time float8 +) +RETURNS SETOF record +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_3' +LANGUAGE C STRICT VOLATILE; + +CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(true); + +GRANT SELECT ON pg_stat_statements TO PUBLIC; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql deleted file mode 100644 index 5bfa9a5..000 --- a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql +++ /dev/null @@ -1,44 +0,0 @@ -/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */ - --- complain if script is sourced in psql, rather than via CREATE EXTENSION -\echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit - --- Register functions. -CREATE FUNCTION pg_stat_statements_reset() -RETURNS void -AS 'MODULE_PATHNAME' -LANGUAGE C; - -CREATE FUNCTION pg_stat_statements(IN showtext boolean, -OUT userid oid, -OUT dbid oid, -OUT queryid bigint, -OUT query text, -OUT calls int8, -OUT total_time float8, -OUT rows int8, -OUT shared_blks_hit int8, -OUT shared_blks_read int8, -OUT shared_blks_dirtied int8, -OUT shared_blks_written int8, -OUT local_blks_hit int8, -OUT local_blks_read int8, -OUT local_blks_dirtied int8, -OUT local_blks_written int8, -OUT temp_blks_read int8, -OUT temp_blks_written int8, -OUT blk_read_time float8, -OUT blk_write_time float8 -) -RETURNS SETOF record -AS 'MODULE_PATHNAME', 'pg_stat_statements_1_2' -LANGUAGE C STRICT VOLATILE; - --- Register a view on the function for ease of use. -CREATE VIEW pg_stat_statements AS - SELECT * FROM pg_stat_statements(true); - -GRANT SELECT ON pg_stat_statements TO PUBLIC; - --- Don't want this to be available to non-superusers. -REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; diff --git
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote: On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/18/2015 08:34 PM, David Fetter wrote: On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Because we're actually measuring the whole population, and not a sample? We're not. We're taking a sample, which is to say past measurements, and using it to make inferences about the population, which includes all queries in the future. The key incorrect word in David Fetter's statement is estimator. We are not estimating anything but rather providing descriptive statistics for a defined population. See above. Users extrapolate that the next member added to the population would be expected to conform to this statistical description without bias (though see below). We can also then define the new population by including this new member and generating new descriptive statistics (which allows for evolution to be captured in the statistics). Currently (I think) we allow the end user to kill off the entire population and build up from scratch so that while, in the short term, the ability to predict the attributes of future members is limited once the population has reached a statistically significant level new predictions will no longer be skewed by population members who attributes were defined in a older and possibly significantly different environment. In theory it would be nice to be able to give the user the ability to specify - by time or percentage - a subset of the population to leave alive. I agree that stale numbers can fuzz things in a way we don't like, but let's not creep too much feature in here. Actual time-weighted sampling would be an alternative but likely one significantly more difficult to accomplish. Right. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Feb 19, 2015 at 11:10 AM, David Fetter da...@fetter.org wrote: On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote: On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/18/2015 08:34 PM, David Fetter wrote: On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Because we're actually measuring the whole population, and not a sample? We're not. We're taking a sample, which is to say past measurements, and using it to make inferences about the population, which includes all queries in the future. All past measurements does not qualify as a random sample of a population made up of all past measurements and any potential members that may be added in the future. Without the random sample aspect you throw away all pretense of avoiding bias so you might as well just call the totality of the past measurements the population, describe them using population descriptive statistics, and call it a day. For large populations it isn't going to matter anyway but for small populations the difference of one in the divisor seems like it would make the past performance look worse than it actually was. David J.
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/17/2015 08:21 PM, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. We are clearly taking the population to be all past queries (from the last reset point). Otherwise calculating min and max wouldn't make sense. The difference is likely to be very small in any case where you actually want to examine the standard deviation, so I feel we're rather arguing about how many angels can fit on the head of a pin, but if this is the consensus I'll change it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/18/2015 08:34 PM, David Fetter wrote: On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Because we're actually measuring the whole population, and not a sample? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote: On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. Why population variance and not sample variance? In distributions where the second moment about the mean exists, it's an unbiased estimator of the variance. In this, it's different from the population variance. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 1/20/15 6:32 PM, David G Johnston wrote: In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. I think we should be calculating the population variance. We are clearly taking the population to be all past queries (from the last reset point). Otherwise calculating min and max wouldn't make sense. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 16:12, Andres Freund wrote: On 2015-02-17 15:50:39 +0100, Petr Jelinek wrote: On 17/02/15 03:07, Petr Jelinek wrote: On 17/02/15 03:03, Andrew Dunstan wrote: On 02/16/2015 08:57 PM, Andrew Dunstan wrote: Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. Yes there are variations between individual runs so it might be really just that, I can leave it running for much longer time tomorrow. Ok so I let it run for more than hour on a different system, the difference is negligible - 14461 vs 14448 TPS. I think there is bigger difference between individual runs than between the two versions... These numbers sound like you measured them without concurrency, am I right? If so, the benchmark isn't that interesting - the computation happens while a spinlock is held, and that'll mainly matter if there are many backends running at the same time. It's pgbench -j16 -c32 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 2015-02-17 15:50:39 +0100, Petr Jelinek wrote: On 17/02/15 03:07, Petr Jelinek wrote: On 17/02/15 03:03, Andrew Dunstan wrote: On 02/16/2015 08:57 PM, Andrew Dunstan wrote: Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. Yes there are variations between individual runs so it might be really just that, I can leave it running for much longer time tomorrow. Ok so I let it run for more than hour on a different system, the difference is negligible - 14461 vs 14448 TPS. I think there is bigger difference between individual runs than between the two versions... These numbers sound like you measured them without concurrency, am I right? If so, the benchmark isn't that interesting - the computation happens while a spinlock is held, and that'll mainly matter if there are many backends running at the same time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 03:07, Petr Jelinek wrote: On 17/02/15 03:03, Andrew Dunstan wrote: On 02/16/2015 08:57 PM, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. Yes there are variations between individual runs so it might be really just that, I can leave it running for much longer time tomorrow. Ok so I let it run for more than hour on a different system, the difference is negligible - 14461 vs 14448 TPS. I think there is bigger difference between individual runs than between the two versions... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21/01/15 17:32, Andrew Dunstan wrote: On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? ... But I will add a note to the documentation, that seems reasonable. I agree this is worth mentioning in the doc. In any case here some review from me: We definitely want this feature, I wished to have this info many times. The patch has couple of issues though: The 1.2 to 1.3 upgrade file is named pg_stat_statements--1.2-1.3.sql and should be renamed to pg_stat_statements--1.2--1.3.sql (two dashes). There is new sqrtd function for fast sqrt calculation, which I think is a good idea as it will reduce the overhead of the additional calculation and this is not something where little loss of precision would matter too much. The problem is that the new function is actually not used anywhere in the code, I only see use of plain sqrt. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 03:03, Andrew Dunstan wrote: On 02/16/2015 08:57 PM, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. Yes there are variations between individual runs so it might be really just that, I can leave it running for much longer time tomorrow. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/16/2015 08:57 PM, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 02:57, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. I think so too. I found one more issue with the 1.2--1.3 upgrade script, the DROP FUNCTION pg_stat_statements(); should be DROP FUNCTION pg_stat_statements(bool); since in 1.2 the function identity has changed. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, 21 Jan 2015, Andrew Dunstan wrote: On 01/21/2015 09:27 AM, Arne Scheffer wrote: Sorry, corrected second try because of copypaste mistakes: VlG-Arne Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²) +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ I have no idea what you are saying here. I'm sorry for that statistics stuff, my attempt was only to visualize in detail the mathematical reason for the iterating part of Welfords algorithm being computing the current sum of squared differences in every step - therefore it's in my opinion better to call the variable sum_of_squared_diffs (every statistician will be confused bei sum_of_variances, because: sample variance = sum_of_squared_diffs / n-1, have a look at Mr. Cooks explanation) - therefore deviding by n-1 is the unbiased estimator by definition. (have a look at Mr. Cooks explanation) - therefore I suggested (as a minor nomenclature issue) to call the column/description stdev_samp (PostgreSQL-nomenclature) / sample_ to indicate that information. (have a look at the PostgreSQL aggregate functions, it's doing that the same way) Here are comments in email to me from the author of http://www.johndcook.com/blog/standard_deviation regarding the divisor used: My code is using the unbiased form of the sample variance, dividing by n-1. I am relieved, now we are at least two persons saying that. :-) Insert into the commonly known definition Definition stddev_samp = sqrt(var_samp) from above, and it's exactly my point. Maybe I should add that in the code comments. Otherwise, I don't think we need a change. Huh? Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? I really think it not a good strategy having the user to make a test or dive into the source code to determine the divisor used. E.g. David expected stdev_pop, so there is a need for documentation for cases with a small sample. VlG-Arne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? I think you are making a mountain out of a molehill, frankly. These stats are not intended as anything other than a pretty indication of the shape, to see if they are significantly influenced by outliers. For any significantly large sample size the difference will be negligible. But I will add a note to the documentation, that seems reasonable. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan schrieb am 2015-01-21: On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp analog to the aggregate function or make a note in the documentation, that the sample stddev is used to compute the solution? I think you are making a mountain out of a molehill, frankly. These stats are not intended as anything other than a pretty indication of the shape, to see if they are significantly influenced by outliers. For any significantly large sample size the difference will be negligible. You're right, I maybe exaggerated the statistics part a bit. I wanted to help, because the patch is of interest for us. I will try to keep focus in the future. But I will add a note to the documentation, that seems reasonable. *happy* Thx Arne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
David G Johnston schrieb am 2015-01-21: Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? I don't understand. I'm following pretty exactly the calculations stated at lt;http://www.johndcook.com/blog/standard_deviation/gt; I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction Yes, it is. but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. That's wrong, it's applied in the end to the sum of squared differences and therefore per definition the corrected sample standard deviation estimator. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I That would probably be an exotic assumption in a working database and it is not, what is computed here! guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Yes, indeed correct. And exactly to avoid that misunderstanding, I suggested to use the sample term. To speak in Postgresql terms; applied in Andrews/Welfords algorithm is stddev_samp(le), not stddev_pop(ulation). Therefore stddev in Postgres is only kept for historical reasons, look at http://www.postgresql.org/docs/9.4/static/functions-aggregate.html Table 9-43. VlG-Arne Note point 3 in the linked Wikipedia article. David J. -- View this message in context: http://postgresql.nabble.com/Add-min-and-max-execute-statement-time-in-pg-stat-statement-tp5774989p5834805.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
I don't understand. I'm following pretty exactly the calculations stated at lt;http://www.johndcook.com/blog/standard_deviation/gt; I'm not a statistician. Perhaps others who are more literate in Maybe I'm mistaken here, but I think, the algorithm is not that complicated. I try to explain it further: Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM3)²) +(-1_1Sum(Xi-XM2)²+1_1Sum(Xi-XM3)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM2)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ Thanks. Still not quite sure what to do, though :-) I guess in the end we want the answer to come up with similar results to the builtin stddev SQL function. I'll try to set up a test program, to see if we do. If you want to go this way: Maybe this is one of the very few times, you have to use a small sample ;-) VlG-Arne cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Sorry, corrected second try because of copypaste mistakes: VlG-Arne Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²) +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/21/2015 09:27 AM, Arne Scheffer wrote: Sorry, corrected second try because of copypaste mistakes: VlG-Arne Comments appreciated. Definition var_samp = Sum of squared differences /n-1 Definition stddev_samp = sqrt(var_samp) Example N=4 1.) Sum of squared differences 1_4Sum(Xi-XM4)² = 2.) adding nothing 1_4Sum(Xi-XM4)² +0 +0 +0 = 3.) nothing changed 1_4Sum(Xi-XM4)² +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²) +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²) +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²) = 4.) parts reordered (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²) +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²) +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²) +1_1Sum(X1-XM1)² = 5.) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-XM1) + (X1-XM1)² = 6.) XM1=X1 = There it is - The iteration part of Welfords Algorithm (in reverse order) (X4-XM4)(X4-XM3) + (X3-XM3)(X3-XM2) + (X2-XM2)(X2-X1) + 0 The missing piece is 4.) to 5.) it's algebra, look at e.g.: http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ I have no idea what you are saying here. Here are comments in email to me from the author of http://www.johndcook.com/blog/standard_deviation regarding the divisor used: My code is using the unbiased form of the sample variance, dividing by n-1. It's usually not worthwhile to make a distinction between a sample and a population because the population is often itself a sample. For example, if you could measure the height of everyone on earth at one instance, that's the entire population, but it's still a sample from all who have lived and who ever will live. Also, for large n, there's hardly any difference between 1/n and 1/(n-1). Maybe I should add that in the code comments. Otherwise, I don't think we need a change. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. e-counters.sum_var_time += (total_time - old_mean) * (total_time - e-counters.mean_time); This is not a square that's being added. old_mean is not the same as e-counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/20/2015 06:32 PM, David G Johnston wrote: Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? I don't understand. I'm following pretty exactly the calculations stated at lt;http://www.johndcook.com/blog/standard_deviation/gt; I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. Thanks. Still not quite sure what to do, though :-) I guess in the end we want the answer to come up with similar results to the builtin stddev SQL function. I'll try to set up a test program, to see if we do. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan wrote On 01/20/2015 01:26 PM, Arne Scheffer wrote: And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? I don't understand. I'm following pretty exactly the calculations stated at lt;http://www.johndcook.com/blog/standard_deviation/gt; I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. I'm largely in the same boat as Andrew but... I take it that Arne is referring to: http://en.wikipedia.org/wiki/Bessel's_correction but the mere presence of an (n-1) divisor does not mean that is what is happening. In this particular situation I believe the (n-1) simply is a necessary part of the recurrence formula and not any attempt to correct for sampling bias when estimating a population's variance. In fact, as far as the database knows, the values provided to this function do represent an entire population and such a correction would be unnecessary. I guess it boils down to whether future queries are considered part of the population or whether the population changes upon each query being run and thus we are calculating the ever-changing population variance. Note point 3 in the linked Wikipedia article. David J. -- View this message in context: http://postgresql.nabble.com/Add-min-and-max-execute-statement-time-in-pg-stat-statement-tp5774989p5834805.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan schrieb am 2015-01-20: On 01/20/2015 01:26 PM, Arne Scheffer wrote: Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. Umm, no. It's not. Umm, yes, i think, it is ;-) e-counters.sum_var_time += (total_time - old_mean) * (total_time - e-counters.mean_time); This is not a square that's being added. That's correct. Nevertheless it's the difference between the computed sum of squared differences and the preceeding one, added in every step. old_mean is not the same as e-counters.mean_time. Since the variance is this value divided by (n - 1), AIUI, I think sum of variances isn't a bad description. I'm open to alternative suggestions. And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? I don't understand. I'm following pretty exactly the calculations stated at http://www.johndcook.com/blog/standard_deviation/ (There is nothing bad about that calculations, Welford's algorithm is simply sequently adding the differences mentioned above.) VlG-Arne I'm not a statistician. Perhaps others who are more literate in statistics can comment on this paragraph. And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? Good catch. Will fix. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Interesting patch. I did a quick review looking only into the patch file. The sum of variances variable contains the sum of squared differences instead, I think. And a very minor aspect: The term standard deviation in your code stands for (corrected) sample standard deviation, I think, because you devide by n-1 instead of n to keep the estimator unbiased. How about mentioning the prefix sample to indicate this beiing the estimator? And I'm sure I'm missing C specifics (again) (or it's the reduced patch file scope), but you introduce sqrtd, but sqrt is called? VlG Arne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 12/21/2014 02:50 PM, Andrew Dunstan wrote: On 12/21/2014 02:12 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 12/21/2014 01:23 PM, Alvaro Herrera wrote: The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. Hmm, do we do that now? We already have a spinlock mutex around the counter adjustment code, so I'm not sure why this discussion is being held. Because Peter suggested we might be able to use atomics. I'm a bit dubious that we can for min and max anyway. I would like someone more versed in numerical analysis than me to tell me how safe using sum of squares actually is in our case. That, on the other hand, might be a real issue. I'm afraid that accumulating across a very long series of statements could lead to severe roundoff error in the reported values, unless we use a method chosen for numerical stability. Right. The next question along those lines is whether we need to keep a running mean or whether that can safely be calculated on the fly. The code at http://www.johndcook.com/blog/standard_deviation/ does keep a running mean, and maybe that's required to prevent ill conditioned results, although I'm quite sure I see how it would. But this isn't my area of expertise. I played it safe and kept a running mean, and since it's there and useful in itself I exposed it via the function, so there are four new columns, min_time, max_time, mean_time and stddev_time. I'll add this to the upcoming commitfest. cheers andrew diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 2709909..975a637 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,8 +4,9 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o $(WIN32RES) EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \ - pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql +DATA = pg_stat_statements--1.3.sql pg_stat_statements--1.2--1.3.sql \ + pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \ + pg_stat_statements--unpackaged--1.0.sql PGFILEDESC = pg_stat_statements - execution statistics of SQL statements ifdef USE_PGXS diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql new file mode 100644 index 000..506959d --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.2-1.3.sql @@ -0,0 +1,47 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.2--1.3.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use ALTER EXTENSION pg_stat_statements UPDATE TO '1.3' to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements(IN showtext boolean, +OUT userid oid, +OUT dbid oid, +OUT queryid bigint, +OUT query text, +OUT calls int8, +OUT total_time float8, +OUT min_time float8, +OUT max_time float8, +OUT mean_time float8, +OUT stddev_time float8, +OUT rows int8, +OUT shared_blks_hit int8, +OUT shared_blks_read int8, +OUT shared_blks_dirtied int8, +OUT shared_blks_written int8, +OUT local_blks_hit int8, +OUT local_blks_read int8, +OUT local_blks_dirtied int8, +OUT local_blks_written int8, +OUT temp_blks_read int8, +OUT temp_blks_written int8, +OUT blk_read_time float8, +OUT blk_write_time float8 +) +RETURNS SETOF record +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_3' +LANGUAGE C STRICT VOLATILE; + +CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(true); + +GRANT SELECT ON pg_stat_statements TO PUBLIC; diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql deleted file mode 100644 index 5bfa9a5..000 --- a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql +++ /dev/null @@ -1,44 +0,0 @@ -/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */ - --- complain if script is sourced in psql, rather than via CREATE EXTENSION -\echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit - --- Register functions. -CREATE FUNCTION pg_stat_statements_reset() -RETURNS void -AS 'MODULE_PATHNAME' -LANGUAGE C; - -CREATE FUNCTION pg_stat_statements(IN showtext boolean, -OUT userid oid, -OUT dbid oid, -OUT queryid bigint, -OUT query text, -OUT calls int8, -OUT total_time float8, -OUT rows int8, -OUT shared_blks_hit int8, -OUT shared_blks_read int8, -OUT shared_blks_dirtied int8, -OUT
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 04/07/2014 04:19 PM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I just noticed that this patch not only adds min,max,stddev, but it also adds the ability to reset an entry's counters. This hasn't been mentioned in this thread at all; there has been no discussion on whether this is something we want to have, nor on whether this is the right API. What it does is add a new function pg_stat_statements_reset_time() which resets the min and max values from all function's entries, without touching the stddev state variables nor the number of calls. Note there's no ability to reset the values for a single function. That seems pretty bizarre. At this late date, the best thing would probably be to strip out the undiscussed functionality. It can get resubmitted for 9.5 if there's a real use-case for it. This seems to have fallen through the slats completely. I'd like to revive it for 9.5, without the extra function. If someone wants to be able to reset stats on a finer grained basis that should probably be a separate patch. One thing that bothered me slightly is that the stddev calculation appears to use a rather naive sum of squares method of calculation, which is known to give ill conditioned or impossible results under some pathological conditions: see http://www.johndcook.com/blog/standard_deviation for some details. I don't know if our results are likely to be so skewed as to produce pathological results, but ISTM we should probably take the trouble to be correct and use Welford's method anyway. On my blog Peter Geoghegan mentioned something about atomic fetch-and-add being useful here, but I'm not quite sure what that's referring to. Perhaps someone can give me a pointer. Thoughts? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan wrote: On my blog Peter Geoghegan mentioned something about atomic fetch-and-add being useful here, but I'm not quite sure what that's referring to. Perhaps someone can give me a pointer. The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On December 21, 2014 7:23:27 PM CET, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andrew Dunstan wrote: On my blog Peter Geoghegan mentioned something about atomic fetch-and-add being useful here, but I'm not quite sure what that's referring to. Perhaps someone can give me a pointer. The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. Port/atomics.h provides a abstraction for doing such atomic manipulations. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 12/21/2014 01:23 PM, Alvaro Herrera wrote: Andrew Dunstan wrote: On my blog Peter Geoghegan mentioned something about atomic fetch-and-add being useful here, but I'm not quite sure what that's referring to. Perhaps someone can give me a pointer. The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. Hmm, do we do that now? That won't work for the stddev method I was referring to, although it could for the sum of squares method. In that case I would like someone more versed in numerical analysis than me to tell me how safe using sum of squares actually is in our case. And how about min and max? They don't look like good candidates for atomic operations. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan and...@dunslane.net writes: On 12/21/2014 01:23 PM, Alvaro Herrera wrote: The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. Hmm, do we do that now? We already have a spinlock mutex around the counter adjustment code, so I'm not sure why this discussion is being held. I would like someone more versed in numerical analysis than me to tell me how safe using sum of squares actually is in our case. That, on the other hand, might be a real issue. I'm afraid that accumulating across a very long series of statements could lead to severe roundoff error in the reported values, unless we use a method chosen for numerical stability. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 12/21/2014 02:12 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 12/21/2014 01:23 PM, Alvaro Herrera wrote: The point, I think, is that without atomic instructions you have to hold a lock while incrementing the counters. Hmm, do we do that now? We already have a spinlock mutex around the counter adjustment code, so I'm not sure why this discussion is being held. Because Peter suggested we might be able to use atomics. I'm a bit dubious that we can for min and max anyway. I would like someone more versed in numerical analysis than me to tell me how safe using sum of squares actually is in our case. That, on the other hand, might be a real issue. I'm afraid that accumulating across a very long series of statements could lead to severe roundoff error in the reported values, unless we use a method chosen for numerical stability. Right. The next question along those lines is whether we need to keep a running mean or whether that can safely be calculated on the fly. The code at http://www.johndcook.com/blog/standard_deviation/ does keep a running mean, and maybe that's required to prevent ill conditioned results, although I'm quite sure I see how it would. But this isn't my area of expertise. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
KONDO Mitsumasa wrote: Hi all, I think this patch is completely forgotten, and feel very unfortunate:( Min, max, and stdev is basic statistics in general monitoring tools, So I'd like to push it. I just noticed that this patch not only adds min,max,stddev, but it also adds the ability to reset an entry's counters. This hasn't been mentioned in this thread at all; there has been no discussion on whether this is something we want to have, nor on whether this is the right API. What it does is add a new function pg_stat_statements_reset_time() which resets the min and max values from all function's entries, without touching the stddev state variables nor the number of calls. Note there's no ability to reset the values for a single function. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Alvaro Herrera alvhe...@2ndquadrant.com writes: I just noticed that this patch not only adds min,max,stddev, but it also adds the ability to reset an entry's counters. This hasn't been mentioned in this thread at all; there has been no discussion on whether this is something we want to have, nor on whether this is the right API. What it does is add a new function pg_stat_statements_reset_time() which resets the min and max values from all function's entries, without touching the stddev state variables nor the number of calls. Note there's no ability to reset the values for a single function. That seems pretty bizarre. At this late date, the best thing would probably be to strip out the undiscussed functionality. It can get resubmitted for 9.5 if there's a real use-case for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Apr 7, 2014 at 4:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I just noticed that this patch not only adds min,max,stddev, but it also adds the ability to reset an entry's counters. This hasn't been mentioned in this thread at all; there has been no discussion on whether this is something we want to have, nor on whether this is the right API. What it does is add a new function pg_stat_statements_reset_time() which resets the min and max values from all function's entries, without touching the stddev state variables nor the number of calls. Note there's no ability to reset the values for a single function. That seems pretty bizarre. At this late date, the best thing would probably be to strip out the undiscussed functionality. It can get resubmitted for 9.5 if there's a real use-case for it. If I'm understanding correctly, that actually seems reasonably sensible. I mean, the big problem with min/max is that you might be taking averages and standard deviations over a period of months, but the maximum and minimum are not that meaningful over such a long period. You might well want to keep a longer-term average while seeing the maximum and minimum for, say, today. I don't know exactly how useful that is, but it doesn't seem obviously dumb. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 7, 2014 at 4:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: What it does is add a new function pg_stat_statements_reset_time() which resets the min and max values from all function's entries, without touching the stddev state variables nor the number of calls. Note there's no ability to reset the values for a single function. That seems pretty bizarre. At this late date, the best thing would probably be to strip out the undiscussed functionality. It can get resubmitted for 9.5 if there's a real use-case for it. If I'm understanding correctly, that actually seems reasonably sensible. I mean, the big problem with min/max is that you might be taking averages and standard deviations over a period of months, but the maximum and minimum are not that meaningful over such a long period. You might well want to keep a longer-term average while seeing the maximum and minimum for, say, today. I don't know exactly how useful that is, but it doesn't seem obviously dumb. Well, maybe it's okay, but not being able to reset the stddev state seems odd, and the chosen function name seems odder. In any case, the time to be discussing the design of such functionality was several months ago. I'm still for ripping it out for 9.4, rather than being stuck with behavior that's not been adequately discussed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Hi all, I think this patch is completely forgotten, and feel very unfortunate:( Min, max, and stdev is basic statistics in general monitoring tools, So I'd like to push it. (2014/02/12 15:45), KONDO Mitsumasa wrote: (2014/01/29 17:31), Rajeev rastogi wrote: No Issue, you can share me the test cases, I will take the performance report. Attached patch is supported to latest pg_stat_statements. It includes min, max, and stdev statistics. Could you run compiling test on your windows enviroments? I think compiling error was fixed. We had disscuttion about which is needed useful statistics in community, I think both of statistics have storong and weak point. When we see the less(2 or 3) executed statement, stdev will be meaningless because it cannot calculate estimated value precisely very much, however in this situation, min and max will be propety work well because it isn't estimated value but fact value. On the other hand, when we see the more frequency executed statement, they will be contrary position statistics, stdev will be very useful statistics for estimating whole statements, and min and max might be extremely value. At the end of the day, these value were needed each other for more useful statistics when we want to see several actual statments. And past my experience showed no performance problems in this patch. So I'd like to implements all these values in pg_stat_statements. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/02/17 21:44), Rajeev rastogi wrote: It got compiled successfully on Windows. Thank you for checking on Windows! It is very helpful for me. Can we allow to add three statistics? I think only adding stdev is difficult to image for user. But if there are min and max, we can image each statements situations more easily. And I don't want to manage this feature in my monitoring tool that is called pg_statsinfo. Because it is beneficial for a lot of pg_stat_statements user and for improvement of postgres performance in the future. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 12 February 2014 12:16, KONDO Mitsumasa Wrote: Hi Rajeev, (2014/01/29 17:31), Rajeev rastogi wrote: No Issue, you can share me the test cases, I will take the performance report. Attached patch is supported to latest pg_stat_statements. It includes min, max, and stdev statistics. Could you run compiling test on your windows enviroments? I think compiling error was fixed. It got compiled successfully on Windows. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Hi Rajeev, (2014/01/29 17:31), Rajeev rastogi wrote: No Issue, you can share me the test cases, I will take the performance report. Attached patch is supported to latest pg_stat_statements. It includes min, max, and stdev statistics. Could you run compiling test on your windows enviroments? I think compiling error was fixed. We had disscuttion about which is needed useful statistics in community, I think both of statistics have storong and weak point. When we see the less(2 or 3) executed statement, stdev will be meaningless because it cannot calculate estimated value precisely very much, however in this situation, min and max will be propety work well because it isn't estimated value but fact value. On the other hand, when we see the more frequency executed statement, they will be contrary position statistics, stdev will be very useful statistics for estimating whole statements, and min and max might be extremely value. At the end of the day, these value were needed each other for more useful statistics when we want to see several actual statments. And past my experience showed no performance problems in this patch. So I'd like to implements all these values in pg_stat_statements. Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 19,24 CREATE FUNCTION pg_stat_statements(IN showtext boolean, --- 19,27 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 41,43 CREATE VIEW pg_stat_statements AS --- 44,51 SELECT * FROM pg_stat_statements(true); GRANT SELECT ON pg_stat_statements TO PUBLIC; + + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; *** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql *** *** 9,14 RETURNS void --- 9,19 AS 'MODULE_PATHNAME' LANGUAGE C; + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; + CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT userid oid, OUT dbid oid, *** *** 16,21 CREATE FUNCTION pg_stat_statements(IN showtext boolean, --- 21,29 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 42,44 GRANT SELECT ON pg_stat_statements TO PUBLIC; --- 50,53 -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; + REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; *** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql --- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql *** *** 4,8 --- 4,9 \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset(); + ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time(); ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements(); ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements; *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 106,111 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; --- 106,113 #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ + #define EXEC_TIME_INIT_MIN DBL_MAX /* initial execution min time */ + #define EXEC_TIME_INIT_MAX -DBL_MAX /* initial execution max time */ #define JUMBLE_SIZE1024 /* query serialization buffer size */ *** *** 137,142 typedef struct Counters --- 139,147 { int64 calls; /* # of times executed */ double total_time; /* total execution time, in msec */ + double total_sqtime; /* cumulated square execution time, in msec */ + double min_time; /* maximum execution time, in msec */ + double max_time; /* minimum execution time, in msec */ int64 rows; /* total # of retrieved or affected rows */ int64 shared_blks_hit; /* # of shared buffer hits */ int64 shared_blks_read; /* # of shared disk blocks read */ *** *** 274,283 void _PG_init(void); --- 279,290 void _PG_fini(void); Datum
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
2014-01-31 Peter Geoghegan p...@heroku.com On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: In reality, actual applications could hardly be further from the perfectly uniform distribution of distinct queries presented here. Yeah, I made the same point in different words. I think any realistic comparison of this code to what we had before needs to measure a workload with a more plausible query frequency distribution. Even though that distribution just doesn't square with anybody's reality, you can still increase the pg_stat_statements.max setting to 10k and the problem goes away at little cost (a lower setting is better, but a setting high enough to cache everything is best). But you're not going to have terribly much use for pg_stat_statements anywayif you really do experience churn at that rate with 5,000 possible entries, the module is ipso facto useless, and should be disabled. I run extra test your and my patch with the pg_stat_statements.max setting=10k in other same setting and servers. They are faster than past results. method | try1 | try2 | try3 peter 3 | 6.769 | 6.784 | 6.785 method 5 | 6.770 | 6.774 | 6.761 I think that most significant overhead in pg_stat_statements is deleting and inserting cost in hash table update, and not at LWLocks. If LWLock is the most overhead, we can see the overhead -S pgbench, because it have one select pet tern which are most Lock conflict case. But we can't see such result. I'm not sure about dynahash.c, but we can see hash conflict case in this code. IMHO, I think It might heavy because it have to run list search and compare one until not conflict it. And past result shows that your patch's most weak point is that deleting most old statement and inserting new old statement cost is very high, as you know. It accelerate to affect update(delete and insert) cost in pg_stat_statements table. So you proposed new setting 10k in default max value. But it is not essential solution, because it is also good perfomance for old pg_stat_statements. And when we set max=10K in your patch and want to get most used only 1000 queries in pg_stat_statements, we have to use order-by-query with limit 1000. Sort cost is relatively high, so monitoring query will be slow and high cost. But old one is only set pg_stat_statements.max=1000, and performance is not relatively bad. It will be best settings for getting most used 1000 queries infomation. That' all my assumption. Sorry for a few extra test, I had no time in my office today. If we hope, I can run 1/N distribution pgbench test next week, I modify my perl script little bit, for creating multiple sql files with various sleep time. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Fri, Jan 31, 2014 at 5:07 AM, Mitsumasa KONDO kondo.mitsum...@gmail.com wrote: And past result shows that your patch's most weak point is that deleting most old statement and inserting new old statement cost is very high, as you know. No, there is no reason to imagine that entry_dealloc() is any slower, really. There will perhaps be some additional contention with shared lockers, but that isn't likely to be a major aspect. When the hash table is full, in reality at that point it's very unlikely that there will be two simultaneous sessions that need to create a new entry. As I said, on many of the systems I work with it takes weeks to see a new entry. This will be especially true now that the *.max default is 5,000. It accelerate to affect update(delete and insert) cost in pg_stat_statements table. So you proposed new setting 10k in default max value. But it is not essential solution, because it is also good perfomance for old pg_stat_statements. I was merely pointing out that that would totally change the outcome of your very artificial test-case. Decreasing the number of statements to 5,000 would too. I don't think we should assign much weight to any test case where the large majority or all statistics are wrong afterwards, due to there being so much churn. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Peter Geoghegan p...@heroku.com writes: On Fri, Jan 31, 2014 at 5:07 AM, Mitsumasa KONDO kondo.mitsum...@gmail.com wrote: It accelerate to affect update(delete and insert) cost in pg_stat_statements table. So you proposed new setting 10k in default max value. But it is not essential solution, because it is also good perfomance for old pg_stat_statements. I was merely pointing out that that would totally change the outcome of your very artificial test-case. Decreasing the number of statements to 5,000 would too. I don't think we should assign much weight to any test case where the large majority or all statistics are wrong afterwards, due to there being so much churn. To expand a bit: (1) It's really not very interesting to consider pg_stat_statement's behavior when the table size is too small to avoid 100% turnover; that's not a regime where the module will provide useful functionality. (2) It's completely unfair to pretend that a given hashtable size has the same cost in old and new code; there's more than a 7X difference in the shared-memory space eaten per hashtable entry. That does have an impact on whether people can set the table size large enough to avoid churn. The theory behind this patch is that for the same shared-memory consumption you can make the table size enough larger to move the cache hit rate up significantly, and that that will more than compensate performance-wise for the increased time needed to make a new table entry. Now that theory is capable of being disproven, but an artificial example with a flat query frequency distribution isn't an interesting test case. We don't care about optimizing performance for such cases, because they have nothing to do with real-world usage. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/30 8:29), Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: I could live with just stddev. Not sure others would be so happy. FWIW, I'd vote for just stddev, on the basis that min/max appear to add more to the counter update time than stddev does; you've got this: + e-counters.total_sqtime += total_time * total_time; versus this: + if (e-counters.min_time total_time || e-counters.min_time == EXEC_TIME_INIT) + e-counters.min_time = total_time; + if (e-counters.max_time total_time) + e-counters.max_time = total_time; I think on most modern machines, a float multiply-and-add is pretty darn cheap; a branch that might or might not be taken, OTOH, is a performance bottleneck. Similarly, the shared memory footprint hit is more: two new doubles for min/max versus one for total_sqtime (assuming we're happy with the naive stddev calculation). If we felt that min/max were of similar value to stddev then this would be mere nitpicking. But since people seem to agree they're worth less, I'm thinking the cost/benefit ratio isn't there. Why do you surplus worried about cost in my patch? Were three double variables assumed a lot of memory, or having lot of calculating cost? My test result showed LWlock bottele-neck is urban legend. If you have more fair test pattern, please tell me, I'll do it if the community will do fair judge. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 8:48 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I'd like to know the truth and the fact in your patch, rather than your argument:-) I see. [part of sample sqls file, they are executed in pgbench] SELECT 1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4 SELECT 1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9 SELECT 1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6 [methods] method 1: with old pgss, pg_stat_statements.max=1000(default) method 2: with old pgss with my patch, pg_stat_statements.max=1000(default) peter 1 : with new pgss(Peter's patch), pg_stat_statements.max=5000(default) peter 2 : with new pgss(Peter's patch), pg_stat_statements.max=1000 [for reference] method 5:with old pgss, pg_stat_statements.max=5000 method 6:with old pgss with my patch, pg_stat_statements.max=5000 [results] method | try1 | try2 | try3 | degrade performance ratio - method 1 | 6.546 | 6.558 | 6.638 | 0% (reference score) method 2 | 6.527 | 6.556 | 6.574 | 1% peter 1 | 5.204 | 5.203 | 5.216 |20% peter 2 | 4.241 | 4.236 | 4.262 |35% method 5 | 5.931 | 5.848 | 5.872 |11% method 6 | 5.794 | 5.792 | 5.776 |12% So, if we compare the old pg_stat_statements and old default with the new pg_stat_statements and the new default (why not? The latter still uses less shared memory than the former), by the standard of this benchmark there is a regression of about 20%. But you also note that there is an 11% regression in the old pg_stat_statements against *itself* when used with a higher pg_stat_statements.max setting of 5,000. This is completely contrary to everyone's prior understanding that increasing the size of the hash table had a very *positive* effect on performance by relieving cache pressure and thereby causing less exclusive locking for an entry_dealloc(). Do you suppose that this very surprising result might just be because this isn't a very representative benchmark? Nothing ever has the benefit of *not* having to exclusive lock. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: (2014/01/30 8:29), Tom Lane wrote: If we felt that min/max were of similar value to stddev then this would be mere nitpicking. But since people seem to agree they're worth less, I'm thinking the cost/benefit ratio isn't there. Why do you surplus worried about cost in my patch? Were three double variables assumed a lot of memory, or having lot of calculating cost? My test result showed LWlock bottele-neck is urban legend. If you have more fair test pattern, please tell me, I'll do it if the community will do fair judge. [ shrug... ] Standard pgbench tests are useless for measuring microscopic issues like this one; whatever incremental cost is there will be swamped by client communication and parse/plan overhead. You may have proven that the overhead isn't 10% of round-trip query time, but we knew that without testing. If I were trying to measure the costs of these changes, I'd use short statements executed inside loops in a plpgsql function. And I'd definitely do it on a machine with quite a few cores (and a running session on each one), so that spinlock contention might conceivably happen often enough to be measurable. If that still says that the runtime cost is down in the noise, we could then proceed to debate whether min/max are worth a 10% increase in the size of the shared pgss hash table. Because by my count, that's about what two more doubles would be, now that we removed the query texts from the hash table --- a table entry is currently 21 doublewords (at least on 64-bit machines) and we're debating increasing it to 22 or 24. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Peter Geoghegan p...@heroku.com writes: On Wed, Jan 29, 2014 at 8:48 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: method | try1 | try2 | try3 | degrade performance ratio - method 1 | 6.546 | 6.558 | 6.638 | 0% (reference score) method 2 | 6.527 | 6.556 | 6.574 | 1% peter 1 | 5.204 | 5.203 | 5.216 |20% peter 2 | 4.241 | 4.236 | 4.262 |35% method 5 | 5.931 | 5.848 | 5.872 |11% method 6 | 5.794 | 5.792 | 5.776 |12% So, if we compare the old pg_stat_statements and old default with the new pg_stat_statements and the new default (why not? The latter still uses less shared memory than the former), by the standard of this benchmark there is a regression of about 20%. But you also note that there is an 11% regression in the old pg_stat_statements against *itself* when used with a higher pg_stat_statements.max setting of 5,000. This is completely contrary to everyone's prior understanding that increasing the size of the hash table had a very *positive* effect on performance by relieving cache pressure and thereby causing less exclusive locking for an entry_dealloc(). Do you suppose that this very surprising result might just be because this isn't a very representative benchmark? Nothing ever has the benefit of *not* having to exclusive lock. If I understand this test scenario properly, there are no duplicate queries, so that each iteration creates a new hashtable entry (possibly evicting an old one). And it's a single-threaded test, so that there can be no benefit from reduced locking. I don't find the results particularly surprising given that. We knew going in that the external-texts patch would slow down creation of a new hashtable entry: there's no way that writing a query text to a file isn't slower than memcpy'ing it into shared memory. The performance argument for doing it anyway is that by greatly reducing the size of hashtable entries, it becomes more feasible to size the hashtable large enough so that it's seldom necessary to evict hashtable entries. It's always going to be possible to make the patch look bad by testing cases where no savings in hashtable evictions is possible; which is exactly what this test case does. But I don't think that's relevant to real-world usage. pg_stat_statements isn't going to be useful unless there's a great deal of repeated statement execution, so what we should be worrying about is the case where a table entry exists not the case where it doesn't. At the very least, test cases where there are *no* repeats are not representative of cases people would be using pg_stat_statements with. As for the measured slowdown with larger hashtable size (but still smaller than the number of distinct queries in the test), my money is on the larger table not fitting in L3 cache or something like that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
I wrote: If I understand this test scenario properly, there are no duplicate queries, so that each iteration creates a new hashtable entry (possibly evicting an old one). And it's a single-threaded test, so that there can be no benefit from reduced locking. After looking more closely, it's *not* single-threaded, but each pgbench thread is running through the same sequence of 1 randomly generated SQL statements. So everything depends on how nearly those clients stay in step. I bet they'd stay pretty nearly in step, though --- any process lagging behind would find all the hashtable entries already created, and thus be able to catch up relative to the ones in the lead which are being slowed by having to write out their query texts. So it seems fairly likely that this scenario is greatly stressing the case where multiple processes redundantly create the same hashtable entry. In any case, while the same table entry does get touched once-per-client over a short interval, there's no long-term reuse of table entries. So I still say this isn't at all representative of real-world use of pg_stat_statements. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Jan 30, 2014 at 12:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: If I understand this test scenario properly, there are no duplicate queries, so that each iteration creates a new hashtable entry (possibly evicting an old one). And it's a single-threaded test, so that there can be no benefit from reduced locking. After looking more closely, it's *not* single-threaded, but each pgbench thread is running through the same sequence of 1 randomly generated SQL statements. So everything depends on how nearly those clients stay in step. I bet they'd stay pretty nearly in step, though --- any process lagging behind would find all the hashtable entries already created, and thus be able to catch up relative to the ones in the lead which are being slowed by having to write out their query texts. So it seems fairly likely that this scenario is greatly stressing the case where multiple processes redundantly create the same hashtable entry. In any case, while the same table entry does get touched once-per-client over a short interval, there's no long-term reuse of table entries. So I still say this isn't at all representative of real-world use of pg_stat_statements. One could test it with each pgbench thread starting at a random point in the same sequence and wrapping at the end. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Robert Haas robertmh...@gmail.com writes: One could test it with each pgbench thread starting at a random point in the same sequence and wrapping at the end. Well, the real point is that 1 distinct statements all occurring with exactly the same frequency isn't a realistic scenario: any hashtable size less than 1 necessarily sucks, any size = 1 is perfect. I'd think that what you want to test is a long-tailed frequency distribution (probably a 1/N type of law) where a small number of statements account for most of the hits and there are progressively fewer uses of less common statements. What would then be interesting is how does the performance change as the hashtable size is varied to cover more or less of that distribution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
BTW ... it occurs to me to wonder if it'd be feasible to keep the query-texts file mmap'd in each backend, thereby reducing the overhead to write a new text to about the cost of a memcpy, and eliminating the read cost in pg_stat_statements() altogether. It's most likely not worth the trouble; but if a more-realistic benchmark test shows that we actually have a performance issue there, that might be a way out without giving up the functional advantages of Peter's patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Jan 30, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: BTW ... it occurs to me to wonder if it'd be feasible to keep the query-texts file mmap'd in each backend, thereby reducing the overhead to write a new text to about the cost of a memcpy, and eliminating the read cost in pg_stat_statements() altogether. It's most likely not worth the trouble; but if a more-realistic benchmark test shows that we actually have a performance issue there, that might be a way out without giving up the functional advantages of Peter's patch. There could be a worst case for that scheme too, plus we'd have to figure out how to make in work with windows, which in the case of mmap() is not a sunk cost AFAIK. I'm skeptical of the benefit of pursuing that. I'm totally unimpressed with the benchmark as things stand. It relies on keeping 64 clients in perfect lockstep as each executes 10,000 queries that are each unique snowflakes. Though even though they're unique snowflakes, and even though there are 10,000 of them, everyone executes the same one at exactly the same time relative to each other, in exactly the same order as quickly as possible. Even still, the headline reference score of -35% is completely misleading, because it isn't comparing like with like in terms of has table size. This benchmark incidentally recommends that we reduce the default hash table size to improve performance when the hash table is under pressure, which is ludicrous. It's completely backwards. You could also use the benchmark to demonstrate that the overhead of calling pg_stat_statements() is ridiculously high, since like creating a new query text, that only requires a shared lock too. This is an implausibly bad worst case for larger hash table sizes in pg_stat_statements generally. 5,000 entries is enough for the large majority of applications. But for those that hit that limit, in practice they're still going to find the vast majority of queries already in the table as they're executed. If they don't, they can double or triple their max setting, because the shared memory overhead is so low. No one has any additional overhead once their query is in the hash table already. In reality, actual applications could hardly be further from the perfectly uniform distribution of distinct queries presented here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Peter Geoghegan p...@heroku.com writes: On Thu, Jan 30, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: BTW ... it occurs to me to wonder if it'd be feasible to keep the query-texts file mmap'd in each backend, thereby reducing the overhead to write a new text to about the cost of a memcpy, and eliminating the read cost in pg_stat_statements() altogether. It's most likely not worth the trouble; but if a more-realistic benchmark test shows that we actually have a performance issue there, that might be a way out without giving up the functional advantages of Peter's patch. There could be a worst case for that scheme too, plus we'd have to figure out how to make in work with windows, which in the case of mmap() is not a sunk cost AFAIK. I'm skeptical of the benefit of pursuing that. Well, it'd be something like #ifdef HAVE_MMAP then use mmap, else use what's there today. But I agree that it's not something to pursue unless we see a more credible demonstration of a problem. Presumably any such code would have to be prepared to remap if the file grows larger than it initially allowed for; and that would be complicated and perhaps have unwanted failure modes. In reality, actual applications could hardly be further from the perfectly uniform distribution of distinct queries presented here. Yeah, I made the same point in different words. I think any realistic comparison of this code to what we had before needs to measure a workload with a more plausible query frequency distribution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: In reality, actual applications could hardly be further from the perfectly uniform distribution of distinct queries presented here. Yeah, I made the same point in different words. I think any realistic comparison of this code to what we had before needs to measure a workload with a more plausible query frequency distribution. Even though that distribution just doesn't square with anybody's reality, you can still increase the pg_stat_statements.max setting to 10k and the problem goes away at little cost (a lower setting is better, but a setting high enough to cache everything is best). But you're not going to have terribly much use for pg_stat_statements anywayif you really do experience churn at that rate with 5,000 possible entries, the module is ipso facto useless, and should be disabled. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 28th January, Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp mailto:kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is Kumar! I searched only e-mail address and title by his name... I don't have windows compiler enviroment, but attached patch might be fixed. Could I ask Mr. Rajeev Rastogi to test my patch again? I tried to test this but I could not apply the patch on latest git HEAD. This may be because of recent patch (related to pg_stat_statement only pg_stat_statements external query text storage ), which got committed on 27th January. Thank you for trying to test my patch. As you say, recently commit changes pg_stat_statements.c a lot. So I have to revise my patch. Please wait for a while. By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. It works well in Linux system, but I'm not sure in Windows system. If you have time, could you test it on your Windows system? If it affects perfomance a lot, we can still change it. No Issue, you can share me the test cases, I will take the performance report. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 9:03 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: (2014/01/29 15:51), Tom Lane wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. This statement doesn't seem to have much to do with the patch as committed. There are no fflush calls, and no notion of warming the file cache either. Oh, all right. We do assume that the OS is smart enough to keep a frequently-read file in cache ... is Windows too stupid for that? I don't know about it. But I think Windows cache feature is stupid. It seems to always write/read data to/from disk, nevertheless having large memory... I'd like to know test result on Windows, if we can... I am quite certain the Windows cache is *not* that stupid, and hasn't been since the Windows 3.1 days. If you want to make certain, set FILE_ATTRIBUTE_TEMPORARY when the file is opened, then it will work really hard not to write it to disk - harder than most Linux fs'es do AFAIK. This should of course only be done if we don't mind it going away :) As per port/open.c, pg will set this when O_SHORT_LIVED is specified. But AFAICT, we don't actually use this *anywhere* in the backend? Perhaps we should for this - and also for the pgstat files? (I may have missed a codepath, only had a minute to do some greping) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Not (at this point) commenting on the details, but a big +1 on the fact that the overhead is low is a *very* important property. If the overhead starts growing it will be uninstalled - and that will make things even harder to diagnose. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 29 January 2014 09:16, Magnus Hagander mag...@hagander.net wrote: On Wed, Jan 29, 2014 at 8:58 AM, Peter Geoghegan p...@heroku.com wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Not (at this point) commenting on the details, but a big +1 on the fact that the overhead is low is a *very* important property. If the overhead starts growing it will be uninstalled - and that will make things even harder to diagnose. +1 also. I think almost everybody agrees. Having said that, I'm not certain that moves us forwards with how to handle the patch at hand. Here is my attempt at an objective view of whether or not to apply the patch: The purpose of the patch is to provide additional help to diagnose performance issues. Everybody seems to want this as an additional feature, given the above caveat. The author has addressed the original performance concerns there and provided some evidence to show that appears effective. It is possible that adding this extra straw breaks the camel's back, but it seems unlikely to be that simple. A new feature to help performance problems will be of a great use to many people; complaints about the performance of pg_stat_statements are unlikely to increase greatly from this change, since such changes would only affect those right on the threshold of impact. The needs of the many... Further changes to improve scalability of pg_stat_statements seem warranted in the future, but I see no reason to block the patch for that reason. If somebody has some specific evidence that the impact outweighs the benefit, please produce it or I am inclined to proceed to commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 02:58 AM, Peter Geoghegan wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 04:55 AM, Simon Riggs wrote: It is possible that adding this extra straw breaks the camel's back, but it seems unlikely to be that simple. A new feature to help performance problems will be of a great use to many people; complaints about the performance of pg_stat_statements are unlikely to increase greatly from this change, since such changes would only affect those right on the threshold of impact. The needs of the many... Further changes to improve scalability of pg_stat_statements seem warranted in the future, but I see no reason to block the patch for that reason If we're talking here about min, max and stddev, then +1. The stats we've seen so far seem to indicate that any affect on query response time is within the margin of error for pgbench. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 9:06 AM, Andrew Dunstan and...@dunslane.net wrote: I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. Importance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I agree. I find it somewhat unlikely that pg_stat_statements is fragile enough that these few extra counters are going to make much of a difference. At the same time, I find min and max a dubious value proposition. It seems highly likely to me that stddev will pay its way, but I'm much less certain about the others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 11:54 AM, Robert Haas wrote: I agree. I find it somewhat unlikely that pg_stat_statements is fragile enough that these few extra counters are going to make much of a difference. At the same time, I find min and max a dubious value proposition. It seems highly likely to me that stddev will pay its way, but I'm much less certain about the others. What I really want is percentiles, but I'm pretty sure we already shot that down. ;-) I could use min/max -- think of performance test runs. However, I agree that they're less valuable than stddev. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote: mportance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I certainly agree that stddev is far more valuable than min and max. I'd be inclined to not accept the latter on the grounds that they aren't that useful. So, am I correct in my understanding: The benchmark performed here [1] was of a standard pgbench SELECT workload, with no other factor influencing the general overhead (unlike the later test that queried pg_stat_statements as well)? I'd prefer to have seen the impact on a 32 core system, where spinlock contention would naturally be more likely to appear, but even still I'm duly satisfied. There was no testing of the performance impact prior to 6 days ago, and I'm surprised it took Mitsumasa that long to come up with something like this, since I raised this concern about 3 months ago. [1] http://www.postgresql.org/message-id/52e10e6a.5060...@lab.ntt.co.jp -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/29/2014 04:14 PM, Peter Geoghegan wrote: On Wed, Jan 29, 2014 at 6:06 AM, Andrew Dunstan and...@dunslane.net wrote: mportance is in the eye of the beholder. As far as I'm concerned, min and max are of FAR less value than stddev. If stddev gets left out I'm going to be pretty darned annoyed, especially since the benchmarks seem to show the marginal cost as being virtually unmeasurable. If those aren't enough for you, perhaps you'd like to state what sort of tests would satisfy you. I certainly agree that stddev is far more valuable than min and max. I'd be inclined to not accept the latter on the grounds that they aren't that useful. So, am I correct in my understanding: The benchmark performed here [1] was of a standard pgbench SELECT workload, with no other factor influencing the general overhead (unlike the later test that queried pg_stat_statements as well)? I'd prefer to have seen the impact on a 32 core system, where spinlock contention would naturally be more likely to appear, but even still I'm duly satisfied. I could live with just stddev. Not sure others would be so happy. Glad we're good on the performance impact front. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Andrew Dunstan and...@dunslane.net writes: I could live with just stddev. Not sure others would be so happy. FWIW, I'd vote for just stddev, on the basis that min/max appear to add more to the counter update time than stddev does; you've got this: + e-counters.total_sqtime += total_time * total_time; versus this: + if (e-counters.min_time total_time || e-counters.min_time == EXEC_TIME_INIT) + e-counters.min_time = total_time; + if (e-counters.max_time total_time) + e-counters.max_time = total_time; I think on most modern machines, a float multiply-and-add is pretty darn cheap; a branch that might or might not be taken, OTOH, is a performance bottleneck. Similarly, the shared memory footprint hit is more: two new doubles for min/max versus one for total_sqtime (assuming we're happy with the naive stddev calculation). If we felt that min/max were of similar value to stddev then this would be mere nitpicking. But since people seem to agree they're worth less, I'm thinking the cost/benefit ratio isn't there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/29 16:58), Peter Geoghegan wrote: On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. This statement doesn't seem to have much to do with the patch as committed. You could make a strong case for the patch improving performance in practice for many users, by allowing us to increase the pg_stat_statements.max default value to 5,000, 5 times the previous value. The real expense of creating a new entry is the exclusive locking of the hash table, which blocks *everything* in pg_stat_statements. That isn't expanded at all, since queries are only written with a shared lock, which only blocks the creation of new entries which was already relatively expensive. In particular, it does not block the maintenance of costs for all already observed entries in the hash table. It's obvious that simply reducing the pressure on the cache will improve matters a lot, which for many users the external texts patch does. Since Mitsumasa has compared that patch and at least one of his proposed pg_stat_statements patches on several occasions, I will re-iterate the difference: any increased overhead from the external texts patch is paid only when a query is first entered into the hash table. Then, there is obviously and self-evidently no additional overhead from contention for any future execution of the same query, no matter what, indefinitely (the exclusive locking section of creating a new entry does not do I/O, except in fantastically unlikely circumstances). So for many of the busy production systems I work with, that's the difference between a cost paid perhaps tens of thousands of times a second, and a cost paid every few days or weeks. Even if he is right about a write() taking an unreasonably large amount of time on Windows, the consequences felt as pg_stat_statements LWLock contention would be very limited. I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. I'd like to know the truth and the fact in your patch, rather than your argument:-) So I create detail pg_stat_statements benchmark tool using pgbench. This tool can create 1 pattern unique sqls in a file, and it is only for measuring pg_stat_statements performance. Because it only updates pg_stat_statements data and doesn't write to disk at all. It's fair benchmark. [usage] perl create_sql.pl file.sql psql -h -h xxx.xxx.xxx.xxx mitsu-ko -c 'SELECT pg_stat_statements_reset()' pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -n -T 180 -f file.sql [part of sample sqls file, they are executed in pgbench] SELECT 1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4 SELECT 1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9 SELECT 1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6 ... I test some methods that include old pgss, old pgss with my patch, and new pgss. Test server and postgresql.conf are same in I tested last day in this ML-thread. And test methods and test results are here, [methods] method 1: with old pgss, pg_stat_statements.max=1000(default) method 2: with old pgss with my patch, pg_stat_statements.max=1000(default) peter 1 : with new pgss(Peter's patch), pg_stat_statements.max=5000(default) peter 2 : with new pgss(Peter's patch), pg_stat_statements.max=1000 [for reference] method 5:with old pgss, pg_stat_statements.max=5000 method 6:with old pgss with my patch, pg_stat_statements.max=5000 [results] method | try1 | try2 | try3 | degrade performance ratio - method 1 | 6.546 | 6.558 | 6.638 | 0% (reference score) method 2 | 6.527 | 6.556 | 6.574 | 1% peter 1 | 5.204 | 5.203 | 5.216 |20% peter 2 | 4.241 | 4.236 | 4.262 |35% method 5 | 5.931 | 5.848 | 5.872 |11% method 6 | 5.794 | 5.792 | 5.776 |12% New pgss is extremely degrade performance than old pgss, and I cannot see performance scaling. I understand that his argument was My patch reduces time of LWLock in pg_stat_statements, it is good for performance. However, Kondo's patch will be degrade performance in
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/29 17:31), Rajeev rastogi wrote: On 28th January, Mitsumasa KONDO wrote: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. It works well in Linux system, but I'm not sure in Windows system. If you have time, could you test it on your Windows system? If it affects perfomance a lot, we can still change it. No Issue, you can share me the test cases, I will take the performance report. Thank you for your kind! I posted another opinion in his patch. So please wait for a while, for not waste your test time. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/28 15:17), Rajeev rastogi wrote: On 27th January, Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp mailto:kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is Kumar! I searched only e-mail address and title by his name... I don't have windows compiler enviroment, but attached patch might be fixed. Could I ask Mr. Rajeev Rastogi to test my patch again? I tried to test this but I could not apply the patch on latest git HEAD. This may be because of recent patch (related to pg_stat_statement only pg_stat_statements external query text storage ), which got committed on 27th January. Thank you for trying to test my patch. As you say, recently commit changes pg_stat_statements.c a lot. So I have to revise my patch. Please wait for a while. By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. It works well in Linux system, but I'm not sure in Windows system. If you have time, could you test it on your Windows system? If it affects perfomance a lot, we can still change it. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. This statement doesn't seem to have much to do with the patch as committed. There are no fflush calls, and no notion of warming the file cache either. We do assume that the OS is smart enough to keep a frequently-read file in cache ... is Windows too stupid for that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. This statement doesn't seem to have much to do with the patch as committed. You could make a strong case for the patch improving performance in practice for many users, by allowing us to increase the pg_stat_statements.max default value to 5,000, 5 times the previous value. The real expense of creating a new entry is the exclusive locking of the hash table, which blocks *everything* in pg_stat_statements. That isn't expanded at all, since queries are only written with a shared lock, which only blocks the creation of new entries which was already relatively expensive. In particular, it does not block the maintenance of costs for all already observed entries in the hash table. It's obvious that simply reducing the pressure on the cache will improve matters a lot, which for many users the external texts patch does. Since Mitsumasa has compared that patch and at least one of his proposed pg_stat_statements patches on several occasions, I will re-iterate the difference: any increased overhead from the external texts patch is paid only when a query is first entered into the hash table. Then, there is obviously and self-evidently no additional overhead from contention for any future execution of the same query, no matter what, indefinitely (the exclusive locking section of creating a new entry does not do I/O, except in fantastically unlikely circumstances). So for many of the busy production systems I work with, that's the difference between a cost paid perhaps tens of thousands of times a second, and a cost paid every few days or weeks. Even if he is right about a write() taking an unreasonably large amount of time on Windows, the consequences felt as pg_stat_statements LWLock contention would be very limited. I am not opposed in principle to adding new things to the counters struct in pg_stat_statements. I just think that the fact that the overhead of installing the module on a busy production system is currently so low is of *major* value, and therefore any person that proposes to expand that struct should be required to very conclusively demonstrate that there is no appreciably increase in overhead. Having a standard deviation column would be nice, but it's still not that important. Maybe when we have portable atomic addition we can afford to be much more inclusive of that kind of thing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/29 15:51), Tom Lane wrote: KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp writes: By the way, latest pg_stat_statement might affect performance in Windows system. Because it uses fflush() system call every creating new entry in pg_stat_statements, and it calls many fread() to warm file cache. This statement doesn't seem to have much to do with the patch as committed. There are no fflush calls, and no notion of warming the file cache either. Oh, all right. We do assume that the OS is smart enough to keep a frequently-read file in cache ... is Windows too stupid for that? I don't know about it. But I think Windows cache feature is stupid. It seems to always write/read data to/from disk, nevertheless having large memory... I'd like to know test result on Windows, if we can... Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 23:18), Andrew Dunstan wrote: What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing. I run another test that has two pgbench-clients in same time, one is select-only-query and another is executing 'SELECT * pg_stat_statement' query in every one second. I used v6 patch in this test. * Benchmark Commands $bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 -n $bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -T 180 -n -f file.sql ** file.sql SELECT * FROM pg_stat_statement; \sleep 1s * Select-only-query Result (Test result is represented by tps.) method| try1 | try2 | try3 with pgss| 125502 | 125818 | 125809 with patched pgss| 125909 | 125699 | 126040 This result shows my patch is almost same performance than before. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/26 17:43), Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp mailto:kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is Kumar! I searched only e-mail address and title by his name... I don't have windows compiler enviroment, but attached patch might be fixed. Could I ask Mr. Rajeev Rastogi to test my patch again? Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 19,24 CREATE FUNCTION pg_stat_statements( --- 19,27 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 41,43 CREATE VIEW pg_stat_statements AS --- 44,52 SELECT * FROM pg_stat_statements(); GRANT SELECT ON pg_stat_statements TO PUBLIC; + + /* New Function */ + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; *** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql *** *** 9,14 RETURNS void --- 9,19 AS 'MODULE_PATHNAME' LANGUAGE C; + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; + CREATE FUNCTION pg_stat_statements( OUT userid oid, OUT dbid oid, *** *** 16,21 CREATE FUNCTION pg_stat_statements( --- 21,29 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 42,44 GRANT SELECT ON pg_stat_statements TO PUBLIC; --- 50,53 -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; + REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; *** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql --- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql *** *** 4,8 --- 4,9 \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset(); + ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time(); ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements(); ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements; *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 78,84 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define JUMBLE_SIZE1024 /* query serialization buffer size */ /* --- 78,85 #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define EXEC_TIME_INIT_MIN DBL_MAX /* initial execution min time */ ! #define EXEC_TIME_INIT_MAX -DBL_MAX /* initial execution max time */ #define JUMBLE_SIZE1024 /* query serialization buffer size */ /* *** *** 114,119 typedef struct Counters --- 115,123 { int64 calls; /* # of times executed */ double total_time; /* total execution time, in msec */ + double total_sqtime; /* cumulated square execution time, in msec */ + double min_time; /* maximum execution time, in msec */ + double max_time; /* minimum execution time, in msec */ int64 rows; /* total # of retrieved or affected rows */ int64 shared_blks_hit; /* # of shared buffer hits */ int64 shared_blks_read; /* # of shared disk blocks read */ *** *** 237,245 void _PG_init(void); --- 241,251 void _PG_fini(void); Datum pg_stat_statements_reset(PG_FUNCTION_ARGS); + Datum pg_stat_statements_reset_time(PG_FUNCTION_ARGS); Datum pg_stat_statements(PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(pg_stat_statements_reset); +
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote: (2014/01/23 23:18), Andrew Dunstan wrote: What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing. I run another test that has two pgbench-clients in same time, one is select-only-query and another is executing 'SELECT * pg_stat_statement' query in every one second. I used v6 patch in this test. The issue of concern is not the performance of pg_stat_statements, AUIU. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
2014-01-27 Andrew Dunstan and...@dunslane.net On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote: (2014/01/23 23:18), Andrew Dunstan wrote: What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing. I run another test that has two pgbench-clients in same time, one is select-only-query and another is executing 'SELECT * pg_stat_statement' query in every one second. I used v6 patch in this test. The issue of concern is not the performance of pg_stat_statements, AUIU. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I showed first test result which is compared with without pg_stat_statements and without patch last day. They ran in same server and same benchmark settings(clients and scale factor) as today's result. When you merge and see the results, you can confirm not to affect of performance in my patch. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 08:48 AM, Mitsumasa KONDO wrote: The issue of concern is not the performance of pg_stat_statements, AUIU. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I showed first test result which is compared with without pg_stat_statements and without patch last day. They ran in same server and same benchmark settings(clients and scale factor) as today's result. When you merge and see the results, you can confirm not to affect of performance in my patch. Yeah, sorry, I misread your message. I think this is good to go from a performance point of view, although I'm still a bit worried about the validity of the method (accumulating a sum of squares). OTOH, Welford's method probably requires slightly more per statement overhead, and certainly requires one more accumulator per statement. I guess if we find ill conditioned results it wouldn't be terribly hard to change. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely usually without a second thought. That's hugely valuable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 04:37 PM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely usually without a second thought. That's hugely valuable. I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 2:01 PM, Andrew Dunstan and...@dunslane.net wrote: I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. Oh, I see. Of course the overhead of calling the pg_stat_statements() function is much less important. Actually, I think that calling that function is going to add a lot of noise to any benchmark aimed at measuring added overhead as the counters struct is expanded. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 27th January, Mitsumasa KONDO wrote: 2014-01-26 Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com mailto:si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp mailto:kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Sorry, I was misunderstanding. First name of Mr. Rajeev Rastogi is Kumar! I searched only e-mail address and title by his name... I don't have windows compiler enviroment, but attached patch might be fixed. Could I ask Mr. Rajeev Rastogi to test my patch again? I tried to test this but I could not apply the patch on latest git HEAD. This may be because of recent patch (related to pg_stat_statement only pg_stat_statements external query text storage ), which got committed on 27th January. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Please respond. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
2014-01-26 Simon Riggs si...@2ndquadrant.com On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote: On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? Please respond. Oh.. Very sorry... Last day, I tried to find Kumar mail at 20/11/2013. But I couldn't find it... Could you tell me e-mail title? My patch catches up with latest 9.4HEAD. Regards, -- Mitsumasa KONDO
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 23 January 2014 12:43, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I tested my patch in pgbench, but I cannot find bottleneck of my latest patch. ... Attached is latest developping patch. It hasn't been test much yet, but sqrt caluclation may be faster. Thank you for reworking this so that the calculation happens outside the lock. Given your testing, and my own observation that the existing code could be reworked to avoid contention if people become concerned, I think this is now ready for commit, as soon as the last point about Windows is answered. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 10:28), Peter Geoghegan wrote: On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. Well, that's a pretty easy theory to test. Just stop calling them (and do something similar to what we do for current counter fields instead) and see how much difference it makes. What means calling them? I think that part of heavy you think is pg_stat_statement view that is called by select query, not a part of LWLock getting statistic by hook. Right? I tested my patch in pgbench, but I cannot find bottleneck of my latest patch. (Sorry, I haven't been test select query in pg_stat_statement view...) Here is a test result. * Result (Test result is represented by tps.) method| try1 | try2 | try3 without pgss | 130938 | 131558 | 131796 with pgss| 125164 | 125146 | 125358 with patched pgss| 126091 | 126681 | 126433 * Test Setting shared_buffers=1024MB checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.7 * pgbench script pgbench -h xxx.xxx.xxx.xxx mitsu-ko -i -s 100 psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'CHECKPOINT' pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 * Server SPEC: CPU: Xeon E5-2670 1P/8C 2.6GHz #We don't have 32 core cpu... Memory: 24GB RAID: i420 2GB cache Disk: 15K * 6 (RAID 1+0) Attached is latest developping patch. It hasn't been test much yet, but sqrt caluclation may be faster. Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 19,24 CREATE FUNCTION pg_stat_statements( --- 19,27 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 41,43 CREATE VIEW pg_stat_statements AS --- 44,52 SELECT * FROM pg_stat_statements(); GRANT SELECT ON pg_stat_statements TO PUBLIC; + + /* New Function */ + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; *** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql *** *** 9,14 RETURNS void --- 9,19 AS 'MODULE_PATHNAME' LANGUAGE C; + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; + CREATE FUNCTION pg_stat_statements( OUT userid oid, OUT dbid oid, *** *** 16,21 CREATE FUNCTION pg_stat_statements( --- 21,29 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 42,44 GRANT SELECT ON pg_stat_statements TO PUBLIC; --- 50,53 -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; + REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; *** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql --- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql *** *** 4,8 --- 4,9 \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset(); + ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time(); ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements(); ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements; *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 78,84 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define JUMBLE_SIZE1024 /* query serialization buffer size */ /* --- 78,85 #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define EXEC_TIME_INIT_MIN DBL_MAX /* initial execution min time */ ! #define EXEC_TIME_INIT_MAX -DBL_MAX /* initial execution max time */
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/22/2014 11:33 PM, KONDO Mitsumasa wrote: (2014/01/23 12:00), Andrew Dunstan wrote: On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote: (2014/01/22 22:26), Robert Haas wrote: On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. I will replcace sqrt() function in math.h to more faster algorithm. And moving unneccessary part of caluculation in LWlocks or other locks. It might take time to improvement, so please wait for a while. Umm, I have not read the patch, but are you not using Welford's method? Its per-statement overhead should be absolutely tiny (and should not compute a square root at all per statement - the square root should only be computed when the standard deviation is actually wanted, e.g. when a user examines pg_stat_statements) See for example http://www.johndcook.com/standard_deviation.html Thanks for your advice. I read your example roughly, however, I think calculating variance is not so heavy in my patch. Double based sqrt caluculation is most heavily in my mind. And I find fast square root algorithm that is used in 3D games. http://en.wikipedia.org/wiki/Fast_inverse_square_root This page shows inverse square root algorithm, but it can caluculate normal square root, and it is faster algorithm at the price of precision than general algorithm. I think we want to fast algorithm, so it will be suitable. According to the link I gave above: The most obvious way to compute variance then would be to have two sums: one to accumulate the sum of the x's and another to accumulate the sums of the squares of the x's. If the x's are large and the differences between them small, direct evaluation of the equation above would require computing a small number as the difference of two large numbers, a red flag for numerical computing. The loss of precision can be so bad that the expression above evaluates to a /negative/ number even though variance is always positive. As I read your patch that's what it seems to be doing. What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing. The benchmarks should not call for a single square root calculation. What we really want to know is what is the overhead from keeping these stats. But your total runtime code (i.e. code NOT from calling pg_stat_statements()) for stddev appears to be this: e-counters.total_sqtime += total_time * total_time; cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/22 9:34), Simon Riggs wrote: AFAICS, all that has happened is that people have given their opinions and we've got almost the same identical patch, with a rush-rush comment to commit even though we've waited months. If you submit a patch, then you need to listen to feedback and be clear about what you will do next, if you don't people will learn to ignore you and nobody wants that. I think it was replied that will be heavily. If we realize histogram in pg_stat_statements, we have to implement dobuble precision arrays for storing histogram data. And when we update histogram data in each statements, we must update arrays with searching what response time is the smallest or biggest? It is very big cost, assuming large memory, and too hevily when updating than we get benefit from it. So I just add stddev for as fast as latest pg_stat_statements. I got some agreed from some people, as you say. On 21 January 2014 21:19, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 21, 2014 at 11:48 AM, Simon Riggs si...@2ndquadrant.com wrote: I agree with people saying that stddev is better than nothing at all, so I am inclined to commit this, in spite of the above. I could live with stddev. But we really ought to be investing in making pg_stat_statements work well with third-party tools. I am very wary of enlarging the counters structure, because it is protected by a spinlock. There has been no attempt to quantify that cost, nor has anyone even theorized that it is not likely to be appreciable. OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/22 22:26), Robert Haas wrote: On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. I will replcace sqrt() function in math.h to more faster algorithm. And moving unneccessary part of caluculation in LWlocks or other locks. It might take time to improvement, so please wait for a while. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. Well, that's a pretty easy theory to test. Just stop calling them (and do something similar to what we do for current counter fields instead) and see how much difference it makes. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote: (2014/01/22 22:26), Robert Haas wrote: On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. I will replcace sqrt() function in math.h to more faster algorithm. And moving unneccessary part of caluculation in LWlocks or other locks. It might take time to improvement, so please wait for a while. Umm, I have not read the patch, but are you not using Welford's method? Its per-statement overhead should be absolutely tiny (and should not compute a square root at all per statement - the square root should only be computed when the standard deviation is actually wanted, e.g. when a user examines pg_stat_statements) See for example http://www.johndcook.com/standard_deviation.html cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 12:00), Andrew Dunstan wrote: On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote: (2014/01/22 22:26), Robert Haas wrote: On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. I will replcace sqrt() function in math.h to more faster algorithm. And moving unneccessary part of caluculation in LWlocks or other locks. It might take time to improvement, so please wait for a while. Umm, I have not read the patch, but are you not using Welford's method? Its per-statement overhead should be absolutely tiny (and should not compute a square root at all per statement - the square root should only be computed when the standard deviation is actually wanted, e.g. when a user examines pg_stat_statements) See for example http://www.johndcook.com/standard_deviation.html Thanks for your advice. I read your example roughly, however, I think calculating variance is not so heavy in my patch. Double based sqrt caluculation is most heavily in my mind. And I find fast square root algorithm that is used in 3D games. http://en.wikipedia.org/wiki/Fast_inverse_square_root This page shows inverse square root algorithm, but it can caluculate normal square root, and it is faster algorithm at the price of precision than general algorithm. I think we want to fast algorithm, so it will be suitable. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
Rebased patch is attached. pg_stat_statements in PG9.4dev has already changed table columns in. So I hope this patch will be committed in PG9.4dev. Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 19,24 CREATE FUNCTION pg_stat_statements( --- 19,27 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 41,43 CREATE VIEW pg_stat_statements AS --- 44,52 SELECT * FROM pg_stat_statements(); GRANT SELECT ON pg_stat_statements TO PUBLIC; + + /* New Function */ + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; *** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql *** *** 9,14 RETURNS void --- 9,19 AS 'MODULE_PATHNAME' LANGUAGE C; + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; + CREATE FUNCTION pg_stat_statements( OUT userid oid, OUT dbid oid, *** *** 16,21 CREATE FUNCTION pg_stat_statements( --- 21,29 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 42,44 GRANT SELECT ON pg_stat_statements TO PUBLIC; --- 50,53 -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; + REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; *** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql --- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql *** *** 4,8 --- 4,9 \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset(); + ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time(); ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements(); ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements; *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 78,83 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; --- 78,84 #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ + #define EXEC_TIME_INIT (-1) /* initial execution time */ #define JUMBLE_SIZE1024 /* query serialization buffer size */ *** *** 114,119 typedef struct Counters --- 115,123 { int64 calls; /* # of times executed */ double total_time; /* total execution time, in msec */ + double total_sqtime; /* cumulated square execution time, in msec */ + double min_time; /* maximum execution time, in msec */ + double max_time; /* minimum execution time, in msec */ int64 rows; /* total # of retrieved or affected rows */ int64 shared_blks_hit; /* # of shared buffer hits */ int64 shared_blks_read; /* # of shared disk blocks read */ *** *** 237,245 void _PG_init(void); --- 241,251 void _PG_fini(void); Datum pg_stat_statements_reset(PG_FUNCTION_ARGS); + Datum pg_stat_statements_reset_time(PG_FUNCTION_ARGS); Datum pg_stat_statements(PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(pg_stat_statements_reset); + PG_FUNCTION_INFO_V1(pg_stat_statements_reset_time); PG_FUNCTION_INFO_V1(pg_stat_statements); static void pgss_shmem_startup(void); *** *** 266,271 static pgssEntry *entry_alloc(pgssHashKey *key, const char *query, --- 272,278 int query_len, bool sticky); static void entry_dealloc(void); static void entry_reset(void); + static void entry_reset_time(void); static void AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size); static void JumbleQuery(pgssJumbleState *jstate, Query *query); *** *** 1046,1051 pgss_store(const char *query, uint32 queryId, --- 1053,1059 e-counters.calls += 1; e-counters.total_time += total_time; + e-counters.total_sqtime += total_time * total_time; e-counters.rows += rows; e-counters.shared_blks_hit += bufusage-shared_blks_hit; e-counters.shared_blks_read += bufusage-shared_blks_read; *** *** 1061,1066 pgss_store(const char *query, uint32
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Rebased patch is attached. Does this fix the Windows bug reported by Kumar on 20/11/2013 ? pg_stat_statements in PG9.4dev has already changed table columns in. So I hope this patch will be committed in PG9.4dev. I've read through the preceding discussion and I'm not very impressed. Lots of people have spoken about wanting histogram output and I can't even see a clear statement of whether that will or will not happen. AFAICS, all that has happened is that people have given their opinions and we've got almost the same identical patch, with a rush-rush comment to commit even though we've waited months. If you submit a patch, then you need to listen to feedback and be clear about what you will do next, if you don't people will learn to ignore you and nobody wants that. I should point out that technically this patch is late and we could reject it solely on that basis, if we wanted to. I agree with people saying that stddev is better than nothing at all, so I am inclined to commit this, in spite of the above. Any objections to commit? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers