Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-30 Thread Greg Stark
On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote:
 As for track_iotiming -> track_io_timing, I'm fine with that as well.

I'm still grumpy about the idea of a GUC changing the explain analyze
output. How would people feel about adding an explain option that
explicitly requests io timing for this explain analyze and then having
the io timing be enabled if either it's requested by explain analyze
or if it's set on globally? That would make it more consistent with
the other explain analyze options?
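
For concreteness, roughly what I have in mind (the IO_TIMING option name
is purely a placeholder to illustrate the proposal; track_io_timing and
the BUFFERS option are the existing pieces):

SET track_io_timing = on;      -- global switch; also feeds the stats views
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM some_table;

-- versus requesting it for just this one statement:
EXPLAIN (ANALYZE, BUFFERS, IO_TIMING) SELECT count(*) FROM some_table;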

I realize I don't get to be grumpy without actually contributing
anything, but I'm happy to write up the patch if people agree with the
change.


-- 
greg



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-30 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote:
 As for track_iotiming -> track_io_timing, I'm fine with that as well.

 I'm still grumpy about the idea of a GUC changing the explain analyze
 output. How would people feel about adding an explain option that
 explicitly requests io timing for this explain analyze and then having
 the io timing be enabled if either it's requested by explain analyze
 or if it's set on globally? That would make it more consistent with
 the other explain analyze options?

I think it's going to be hard to decouple that altogether.  For
instance, if track_io_timing were not on but you did EXPLAIN (TIMING),
you'd end up with timing info getting sent to the stats collector for
just that one statement.  That seems a bit weird too.

I see where you're coming from but I don't think it's a good idea to
add an EXPLAIN option unless you can make the two behaviors (EXPLAIN
reporting and stats collection) truly independent.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I like the idea of including the word block in there.  I don't think
 it was probably a terribly good idea to abbreviate that to blk
 everywhere, but at this point it's probably better to be consistent,
 sigh.

 As for track_iotiming -> track_io_timing, I'm fine with that as well.

I made these changes, so I think we are done with the naming issues.
However, I'd still like to propose that we think about adjusting the
timing column datatypes, ie uniformly use float8 msec for values
representing elapsed times.  By my count there are six columns that
would be affected:

pg_stat_bgwriter.checkpoint_write_time
pg_stat_bgwriter.checkpoint_sync_time
pg_stat_database.blk_read_time
pg_stat_database.blk_write_time
pg_stat_user_functions.total_time
pg_stat_user_functions.self_time

The first four of these are new in 9.2, meaning that we would only be
creating a compatibility issue for the last two.  If we wait to do this
in the future, we will have a significantly bigger problem than if we
do it today.  Advantages of the change are:

* Better precision exposed to the user (pg_stat_user_functions has
historically provided only millisecond precision).

* Removal of arbitrary limit of microsecond precision.  Of course,
the underlying data is still no better than microsecond, but if we
ever are able to migrate to OS APIs that return better-than-microsecond
data, we won't have to adjust the stats view APIs to expose that data.

* A chance to make the functions underlying these stats view columns
agree with the exposed column definitions.
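
For concreteness, reading the affected columns after the change would
look like this (just a sketch; the columns are the six listed above,
all uniformly float8 msec):

SELECT checkpoint_write_time, checkpoint_sync_time FROM pg_stat_bgwriter;
SELECT datname, blk_read_time, blk_write_time FROM pg_stat_database;
SELECT funcname, total_time, self_time FROM pg_stat_user_functions;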

Any objections out there?

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... You might want to revisit the issue of how the new
 columns in pg_stat_statements are named, as well.  I am not sure I'm
 happy with that, but neither am I sure that I know what I'd like
 better.  It's not too clear that the timing is specifically for data
 block reads and writes, for example.

Well, the names time_read and time_write are certainly out of step
with every other stats view in the system; everyplace else, such columns
are named something_time (and even in this view itself the other
timing column is total_time, not time_total).  So that's got to
change.  We could just reverse the word order to read_time and
write_time, or we could do something like buf_read_time or
data_read_time.  IIUC block_read_time/block_write_time in the
pg_stat_database view are database-wide totals for the same numbers, so
perhaps the pg_stat_statements column names should be consistent with
those.  I am kinda wondering though why those columns spell out block
where every single other column name in the stats views uses the
abbreviation blk.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-28 Thread Tom Lane
... btw, while I'm criticizing names, how about changing
track_iotiming to track_io_timing?  The former seems inelegant and
unreadable.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-28 Thread Robert Haas
On Sat, Apr 28, 2012 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ... You might want to revisit the issue of how the new
 columns in pg_stat_statements are named, as well.  I am not sure I'm
 happy with that, but neither am I sure that I know what I'd like
 better.  It's not too clear that the timing is specifically for data
 block reads and writes, for example.

 Well, the names time_read and time_write are certainly out of step
 with every other stats view in the system; everyplace else, such columns
 are named something_time (and even in this view itself the other
 timing column is total_time, not time_total).  So that's got to
 change.  We could just reverse the word order to read_time and
 write_time, or we could do something like buf_read_time or
 data_read_time.  IIUC block_read_time/block_write_time in the
 pg_stat_database view are database-wide totals for the same numbers, so
 perhaps the pg_stat_statements column names should be consistent with
 those.  I am kinda wondering though why those columns spell out block
 where every single other column name in the stats views uses the
 abbreviation blk.

I like the idea of including the word block in there.  I don't think
it was probably a terribly good idea to abbreviate that to blk
everywhere, but at this point it's probably better to be consistent,
sigh.

As for track_iotiming -> track_io_timing, I'm fine with that as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There's no particular reason to think that Moore's Law is going to
 result in an increase in the fractional precision of timing data.
 It hasn't done so in the past, for sure.

 Perhaps, but nobody's explained what we gain out of NOT using numeric.
 "It's slow" doesn't impress me; selecting from a system view doesn't
 need to be lightning-fast.

Well, how about the code is going to be quite a lot less readable?
C can manipulate floats natively, but not numerics.

Also, as was pointed out upthread, the underlying data in shared memory
is almost certainly never going to be infinite-precision; so using
numeric in the API seems to me to be more likely to convey a false
impression of exactness than to do anything useful.

 However, the main thing here is that we need to do *something* here...

Agreed, this has got to be pushed forward.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Robert Haas
On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There's no particular reason to think that Moore's Law is going to
 result in an increase in the fractional precision of timing data.
 It hasn't done so in the past, for sure.

 Perhaps, but nobody's explained what we gain out of NOT using numeric.
 "It's slow" doesn't impress me; selecting from a system view doesn't
 need to be lightning-fast.

 Well, how about the code is going to be quite a lot less readable?
 C can manipulate floats natively, but not numerics.

 Also, as was pointed out upthread, the underlying data in shared memory
 is almost certainly never going to be infinite-precision; so using
 numeric in the API seems to me to be more likely to convey a false
 impression of exactness than to do anything useful.

 However, the main thing here is that we need to do *something* here...

 Agreed, this has got to be pushed forward.

In the interest of furthering that goal, I propose that whoever is
willing to take the time to clean this up gets to decide what to
standardize on, and I'm happy to give you first crack at that if you
have the cycles.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 However, the main thing here is that we need to do *something* here...

 Agreed, this has got to be pushed forward.

 In the interest of furthering that goal, I propose that whoever is
 willing to take the time to clean this up gets to decide what to
 standardize on, and I'm happy to give you first crack at that if you
 have the cycles.

OK.  I have just returned from some emergency family business, and have
got assorted catching-up to do, but I will try to get to that later
this week.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Robert Haas
On Wed, Apr 25, 2012 at 1:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 However, the main thing here is that we need to do *something* here...

 Agreed, this has got to be pushed forward.

 In the interest of furthering that goal, I propose that whoever is
 willing to take the time to clean this up gets to decide what to
 standardize on, and I'm happy to give you first crack at that if you
 have the cycles.

 OK.  I have just returned from some emergency family business, and have
 got assorted catching-up to do, but I will try to get to that later
 this week.

Sounds good to me.  You might want to revisit the issue of how the new
columns in pg_stat_statements are named, as well.  I am not sure I'm
happy with that, but neither am I sure that I know what I'd like
better.  It's not too clear that the timing is specifically for data
block reads and writes, for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Greg Stark
On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also, as was pointed out upthread, the underlying data in shared memory
 is almost certainly never going to be infinite-precision; so using
 numeric in the API seems to me to be more likely to convey a false
 impression of exactness than to do anything useful.

I don't think that follows. The underlying data will be measured in
some metric unit of time like microsecond or nanosecond or something
like that. So a base-10 representation will show exactly the precision
that the underlying data has. On the other hand a floating point
number will show a base-2 approximation that may in fact display with
more digits than the underlying data representation has.
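
A quick illustration from psql (illustrative only; the exact digits
shown depend on extra_float_digits):

SET extra_float_digits = 3;                         -- show full binary precision
SELECT 0.1::float8 + 0.2::float8   AS float8_sum,   -- ~0.300000000000000044
       0.1::numeric + 0.2::numeric AS numeric_sum;  -- exactly 0.3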

-- 
greg



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also, as was pointed out upthread, the underlying data in shared memory
 is almost certainly never going to be infinite-precision; so using
 numeric in the API seems to me to be more likely to convey a false
 impression of exactness than to do anything useful.

 I don't think that follows. The underlying data will be measured in
 some metric unit of time like microsecond or nanosecond or something
 like that. So a base-10 representation will show exactly the precision
 that the underlying data has. On the other hand a floating point
 number will show a base-2 approximation that may in fact display with
 more digits than the underlying data representation has.

My point is that the underlying data is going to be stored in a
fixed-width representation, and therefore it will have accuracy and/or
range limitations that are considerably more severe than use of
numeric for output might suggest to the user.  In the current
pg_stat_statements code, timings are in fact accumulated in float8,
and emitting them as something other than float8 is just plain
misleading IMHO.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Robert Haas
On Wed, Apr 25, 2012 at 1:58 PM, Greg Stark st...@mit.edu wrote:
 On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also, as was pointed out upthread, the underlying data in shared memory
 is almost certainly never going to be infinite-precision; so using
 numeric in the API seems to me to be more likely to convey a false
 impression of exactness than to do anything useful.

 I don't think that follows. The underlying data will be measured in
 some metric unit of time like microsecond or nanosecond or something
 like that. So a base-10 representation will show exactly the precision
 that the underlying data has. On the other hand a floating point
 number will show a base-2 approximation that may in fact display with
 more digits than the underlying data representation has.

I wholeheartedly agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Peter Eisentraut
On Mon, 2012-04-23 at 22:03 -0400, Robert Haas wrote:
 Perhaps, but nobody's explained what we gain out of NOT using numeric.

So if you want to have possibly different internal and external
representations, why not use interval for the external one?




Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Mon, 2012-04-23 at 22:03 -0400, Robert Haas wrote:
 Perhaps, but nobody's explained what we gain out of NOT using numeric.

 So if you want to have possibly different internal and external
 representations, why not use interval for the external one?

That doesn't add any usefulness, only extra complication for clients
that want to do more arithmetic with the values.  Also, as was pointed
out earlier, we have a hard-coded restriction to microsecond precision
with the default implementation of interval; and it's not hard to
foresee the day when that won't do.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-25 Thread Robert Haas
On Wed, Apr 25, 2012 at 5:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On Mon, 2012-04-23 at 22:03 -0400, Robert Haas wrote:
 Perhaps, but nobody's explained what we gain out of NOT using numeric.

 So if you want to have possibly different internal and external
 representations, why not use interval for the external one?

 That doesn't add any usefulness, only extra complication for clients
 that want to do more arithmetic with the values.  Also, as was pointed
 out earlier, we have a hard-coded restriction to microsecond precision
 with the default implementation of interval; and it's not hard to
 foresee the day when that won't do.

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-23 Thread Robert Haas
On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 The internal representation doesn't have to be (and certainly
 shouldn't be) numeric.  But if you translate to numeric before
 returning the data to the user, then you have the freedom, in the
 future, to whack around the internal representation however you like,
 without breaking backward compatibility.  Choosing float8 for the
 external representation is fine as long as we're sure we're not ever
 going to want more than 16 significant digits, but I see no particular
 value in baking in that assumption.  But perhaps, as the saying goes,
 16 digits ought to be enough for anyone.

 There's no particular reason to think that Moore's Law is going to
 result in an increase in the fractional precision of timing data.
 It hasn't done so in the past, for sure.

Perhaps, but nobody's explained what we gain out of NOT using numeric.
"It's slow" doesn't impress me; selecting from a system view doesn't
need to be lightning-fast.

However, the main thing here is that we need to do *something* here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Greg Smith

On 04/13/2012 06:22 PM, Tom Lane wrote:

But (a) I *don't* want to seriously break things, and don't see a need
to; (b) interval is expensive and has got its own problems, notably an
internal limitation to usec resolution that we would not be able to get
rid of easily.


A straight float seems pretty future proof compared to a usec resolution 
interval.  Jim was commenting in the same direction I already did, that 
ns resolution is not impossible to see coming.


I also expect to compute plenty of derived statistics from these 
numbers.  Interval math is good enough that I'm sure such things could 
be done, but it seems odd to start with those units.  I appreciate that 
the interval type has a nice purist feel to it.  My pragmatic side says 
we're going to pay overhead to create values of that type, only to find people 
end up converting them right back to other types for easier math tricks.
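
A sketch of the kind of derived statistic I mean, using the 9.2 column
names (only meaningful with track_io_timing enabled):

-- average time per timed block read, in msec
SELECT datname,
       blk_read_time / NULLIF(blks_read, 0) AS avg_ms_per_block_read
FROM pg_stat_database;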


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Robert Haas
On Sat, Apr 14, 2012 at 3:27 AM, Greg Smith g...@2ndquadrant.com wrote:
 On 04/13/2012 06:22 PM, Tom Lane wrote:

 But (a) I *don't* want to seriously break things, and don't see a need
 to; (b) interval is expensive and has got its own problems, notably an
 internal limitation to usec resolution that we would not be able to get
 rid of easily.

 A straight float seems pretty future proof compared to a usec resolution
 interval.  Jim was commenting in the same direction I already did, that ns
 resolution is not impossible to see coming.

 I also expect to compute plenty of derived statistics from these numbers.
  Interval math is good enough that I'm sure such things could be done, but
 it seems odd to start with those units.  I appreciate that the interval type
 has a nice purist feel to it.  My pragmatic side says we're going to pay
 overhead to create values of that type, only to find people end up converting them
 right back to other types for easier math tricks.

I'm still rooting for numeric.  As somebody said upthread, performance
ain't critical here; and that lets us whack around the internal
representation however we like without worrying about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Peter Geoghegan
On 10 April 2012 19:10, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm.  Maybe we should think about numeric ms, which would have all the
 same advantages but without the round-off error.

 Color me unimpressed ... numeric calculations are vastly more expensive
 than float, and where are you going to get timing data that has more
 than sixteen decimal digits of accuracy?

+1

Besides, how do you propose to solve the problem of storing numerics
in a fixed allocation of shared memory? The only comparable thing in
pg_stat_statements is the query string, which is capped at
track_activity_query_size bytes for this very reason.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Peter Geoghegan
On 14 April 2012 02:42, Greg Stark st...@mit.edu wrote:
 Is this not subject to the birthday paradox? If you have a given hash
 you're worried about a collision with then you have a
 one-in-four-billion chance. But if you have a collection of hashes and
 you're worried about any collisions then it only takes about 64k
 before there's likely a collision.

Just for the sake of the archives, assuming that there is a uniform
distribution of values, by my calculations that puts the probability
of collision at:

pg_stat_statements.max of 1,000 =   0.00011562303995116263

and perhaps more representatively, if we follow the example in the docs:

pg_stat_statements.max of 10,000 = 0.011496378237656368

It's enough of a problem that I'd expect to hear one or two complaints
about it in the next few years, maybe. This is the probability of
there being a collision, not the probability of someone caring about
it.

You probably wouldn't want to push your luck too far:

pg_stat_statements.max of 100,000 = 0.6853509059051395

Even if you did, that would only mean that there was usually one, but
perhaps two or three values that were collisions, out of an entire
100,000. To labour the point, you'd have to have a lot of bad luck for
those to be the values that a human actually ended up caring about.
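
For anyone who wants to sanity-check the ballpark, the standard
birthday approximation P ~= 1 - exp(-n*(n-1)/(2*N)), with N = 2^32
possible hash values, can be evaluated straight from SQL and lands in
the same neighbourhood as the figures above:

SELECT n,
       1 - exp(-(n::float8 * (n - 1)) / (2 * 4294967296)) AS approx_collision_prob
FROM (VALUES (1000), (10000), (100000)) AS t(n);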

Jim Nasby said upthread that selecting from the stats view isn't
performance critical, and he's right. However, maintaining the stats
themselves certainly is, since each and every query will have to
update them, adding latency. pg_stat_statements is far from being a
tool of minority interest, particularly now. People are going to want
to add additional bells and whistles to it, which is fine by me, but
I'm very much opposed to making everyone pay for additional features
that imply performance overhead for all queries, particularly if the
feature is of minority interest.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Robert Haas
On Sat, Apr 14, 2012 at 9:54 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 10 April 2012 19:10, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm.  Maybe we should think about numeric ms, which would have all the
 same advantages but without the round-off error.

 Color me unimpressed ... numeric calculations are vastly more expensive
 than float, and where are you going to get timing data that has more
 than sixteen decimal digits of accuracy?

 +1

 Besides, how do you propose to solve the problem of storing numerics
 in a fixed allocation of shared memory? The only comparable thing in
 pg_stat_statements is the query string, which is capped at
 track_activity_query_size bytes for this very reason.

The internal representation doesn't have to be (and certainly
shouldn't be) numeric.  But if you translate to numeric before
returning the data to the user, then you have the freedom, in the
future, to whack around the internal representation however you like,
without breaking backward compatibility.  Choosing float8 for the
external representation is fine as long as we're sure we're not ever
going to want more than 16 significant digits, but I see no particular
value in baking in that assumption.  But perhaps, as the saying goes,
16 digits ought to be enough for anyone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The internal representation doesn't have to be (and certainly
 shouldn't be) numeric.  But if you translate to numeric before
 returning the data to the user, then you have the freedom, in the
 future, to whack around the internal representation however you like,
 without breaking backward compatibility.  Choosing float8 for the
 external representation is fine as long as we're sure we're not ever
 going to want more than 16 significant digits, but I see no particular
 value in baking in that assumption.  But perhaps, as the saying goes,
 16 digits ought to be enough for anyone.

There's no particular reason to think that Moore's Law is going to
result in an increase in the fractional precision of timing data.
It hasn't done so in the past, for sure.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu k...@rice.edu wrote:
 By using all 64-bits of the hash that we currently calculate, instead
 of the current use of 32-bits only, the collision probabilities are
 very low.

That's probably true, but I'm not sure it's worth worrying about -
one-in-four-billion is a pretty small probability.

On the broader issue, Peter's argument here seems to boil down to
"there is probably a reason why this is useful" and Tom's argument
seems to boil down to "I don't want to expose it without knowing what
that reason is."  Peter may well be right, but that doesn't make Tom
wrong.  If we are going to expose this, we ought to be able to
document why we have it, and right now we can't, because we don't
*really* know what it's good for, other than raising awareness of the
theoretical possibility of collisions, which doesn't seem like quite
enough.

On another note, I think this is a sufficiently major change that we
ought to add Peter's name to the Author section of the
pg_stat_statements documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On another note, I think this is a sufficiently major change that we
 ought to add Peter's name to the Author section of the
 pg_stat_statements documentation.

+1 ... as long as we have those at all, they probably ought to credit
anybody who did substantial work on the module.

I think that eventually we'll have to give them up, just the way that
we don't credit anybody in particular as author of the core code; but
for now this is a good change.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On another note, I think this is a sufficiently major change that we
 ought to add Peter's name to the Author section of the
 pg_stat_statements documentation.

 +1 ... as long as we have those at all, they probably ought to credit
 anybody who did substantial work on the module.

 I think that eventually we'll have to give them up, just the way that
 we don't credit anybody in particular as author of the core code; but
 for now this is a good change.

Yeah.  I'd actually be in favor of removing them, and similar
references in the comments, because they do lead and have led to
disputes about who deserves to be mentioned.  However, a change of
this magnitude certainly deserves mention; it's not really the same
module any more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 20:15, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu k...@rice.edu wrote:
 By using all 64-bits of the hash that we currently calculate, instead
 of the current use of 32-bits only, the collision probabilities are
 very low.

 That's probably true, but I'm not sure it's worth worrying about -
 one-in-four-billion is a pretty small probability.

That assumes that our hashing of query trees will exhibit uniform
distribution. While it's easy enough to prove that hash_any does that,
it would seem to me that that does not imply that we will exhibit
perfectly uniform distribution within pg_stat_statements. The reason
that I invented the "jumble" nomenclature to distinguish it from a
straight serialisation is that, apart from the fact that many fields
are simply ignored, we still couldn't deserialise what we do store
from the jumble, because the correct way to serialise a tree entails
serialising NULL pointers too - the jumbles of two non-equivalent
trees could actually be bitwise identical. In practice, I think that adding
NodeTags ought to be sufficient here, but I wouldn't like to bet my
life on it. Actually, that is a point that perhaps should have been
touched on in the comments at the top of the file.

You may wish to take a look at my original analysis in the
pg_stat_statements normalisation thread, which attempts to quantify
the odds by drawing upon the mathematics of the birthday problem.

 On the broader issue, Peter's argument here seems to boil down to
 "there is probably a reason why this is useful" and Tom's argument
 seems to boil down to "I don't want to expose it without knowing what
 that reason is."  Peter may well be right, but that doesn't make Tom
 wrong.  If we are going to expose this, we ought to be able to
 document why we have it, and right now we can't, because we don't
 *really* know what it's good for, other than raising awareness of the
 theoretical possibility of collisions, which doesn't seem like quite
 enough.

Well, it's important to realise that the query string isn't really
stable, to the extent that it could differ as the query is evicted and
re-enters pg_stat_statements over time. The hash value is necessarily
a stable identifier, as it is the very identifier used by
pg_stat_statements. People are going to want to aggregate this
information over long periods and across database clusters, and I
would really like to facilitate that.

To be honest, with the plans that we have for replication, with the
addition of things like cascading replication, I think it will
increasingly be the case that people prefer built-in replication. The
fact that the actual hash value is affected by factors like the
endianness of the architecture in question seems mostly irrelevant.

If we were to expose this, I'd suggest that the documentation in the
table describing each column say of the value:

query_hash | stable identifier used by pg_stat_statements for the query

There'd then be additional clarification after the existing reference
to the hash value, which gave the required caveats about the hash
value being subject to various implementation artefacts that
effectively only make the values persistently identify queries
referencing particular objects in the same database (that is, the
object OID values are used), or across databases that are binary
clones, such as between a streaming replica master and its standby.
The OID restriction alone effectively shadows all others, so there's
no need to mention endianness or anything like that.
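
To make the aggregation use case concrete, this is roughly the sort of
query I'd expect people to run (query_hash is the proposed column name
from above; the rest are existing pg_stat_statements columns):

SELECT query_hash,
       sum(calls)      AS calls,
       sum(total_time) AS total_time
FROM pg_stat_statements       -- or a table of snapshots gathered over time
GROUP BY query_hash
ORDER BY sum(total_time) DESC;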

Anyone who jumped to the conclusion that their aggregation tool would
work fine with Slony or something based on the query_hash description
alone would probably have bigger problems.

 On another note, I think this is a sufficiently major change that we
 ought to add Peter's name to the Author section of the
 pg_stat_statements documentation.

Thanks. I actually thought this myself, but didn't want to mention it
because I didn't think that it was up to me to decide, or to attempt
to influence that decision.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Jim Nasby

On 4/10/12 5:07 PM, Greg Smith wrote:

I'd prefer to see at least usec resolution and 8 bytes of dynamic range for 
query related statistics.  Any of these would be fine from a UI perspective to me:

-float8 seconds
-float8 msec
-float8 usec
-int8 usec

I don't think int8 msec will be enough resolution to time queries for very 
long, if it's not already obsolete.  The committed example for pg_test_timing 
on good hardware already clocks trivial events at a single usec.  Even I/O is 
getting there.  I've measured my Fusion-io loaner card peaking at 8GB/s, which 
works out to 1 usec per 8K page. None of that is even price-no-object hardware 
today; it's the stuff sitting in my office.

If anything, I'd expect more timing code in the database that only has ms 
resolution right now will start looking fat in a year or two, and more things 
might need to be shifted to usec instead.  Checkpoint timing can survive having 
less resolution because its primary drumbeat is very unlikely to drop below the 
minutes range.


I agree that ms is on its way out... and frankly it wouldn't surprise me if at 
some point we actually had need of ns resolution.

Given that, I don't think ms or us definitions make sense... float8 seconds 
seems much more logical to me.

Though, if we're going to end up seriously breaking things anyway, perhaps it 
would make sense to switch everything over to interval... I realize that 
there's more overhead there, but I don't think selecting from the stats views 
is exactly performance critical.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Eisentraut
On Tue, 2012-04-10 at 09:33 -0400, Robert Haas wrote:
 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

Wouldn't interval be the proper type to represent elapsed time?




Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 I agree that ms is on its way out... and frankly it wouldn't surprise me if 
 at some point we actually had need of ns resolution.

 Given that, I don't think ms or us definitions make sense... float8 seconds 
 seems much more logical to me.

Well, the important point is that it be float8, so that fractions of
whatever units you choose can be represented.  I do not feel a strong
need to change the units in all the existing pg_stat_ columns from msec
to sec; that strikes me as just breaking things to little gain.  If the
majority of them were in sec then I'd want to converge on that, but
since the majority are in msec it seems like the path of least breakage
is to converge on that.

 Though, if we're going to end up seriously breaking things anyway,
 perhaps it would make sense to switch everything over to interval... I
 realize that there's more overhead there, but I don't think selecting
 from the stats views is exactly performance critical.

But (a) I *don't* want to seriously break things, and don't see a need
to; (b) interval is expensive and has got its own problems, notably an
internal limitation to usec resolution that we would not be able to get
rid of easily.  It's not even apparent to me that interval is
semantically The Right Thing for values that represent an accumulation
of a lot of different measurements.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Greg Stark
On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote:
 That's probably true, but I'm not sure it's worth worrying about -
 one-in-four-billion is a pretty small probability.

Is this not subject to the birthday paradox? If you have a given hash
you're worried about a collision with then you have a
one-in-four-billion chance. But if you have a collection of hashes and
you're worried about any collisions then it only takes about 64k
before there's likely a collision.

-- 
greg



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote:
 That's probably true, but I'm not sure it's worth worrying about -
 one-in-four-billion is a pretty small probability.

 Is this not subject to the birthday paradox? If you have a given hash
 you're worried about a collision with then you have a
 one-in-four-billion chance. But if you have a collection of hashes and
 you're worried about any collisions then it only takes about 64k
 before there's likely a collision.

... so, if pg_stat_statements.max were set as high as 64k, you would
need to worry.

Realistically, I'm more worried about collisions due to inadequacies in
the jumble calculation logic (Peter already pointed out some risk
factors in that regard).

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Geoghegan
On 14 April 2012 03:01, Tom Lane t...@sss.pgh.pa.us wrote:
 Realistically, I'm more worried about collisions due to inadequacies in
 the jumble calculation logic (Peter already pointed out some risk
 factors in that regard).

It's important to have a sense of proportion about the problem. The
worst thing that a collision can do is lead the DBA on a bit of a wild
goose chase. Importantly, collisions across databases and users are
impossible. I've always taken the view that aggregating query
statistics is a lossy process, and these kinds of problems seem like a
more than acceptable price to pay for low-overhead dynamic query
statistics.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 10:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote:
 That's probably true, but I'm not sure it's worth worrying about -
 one-in-four-billion is a pretty small probability.

 Is this not subject to the birthday paradox? If you have a given hash
 you're worried about a collision with then you have a
 one-in-four-billion chance. But if you have a collection of hashes and
 you're worried about any collisions then it only takes about 64k
 before there's likely a collision.

 ... so, if pg_stat_statements.max were set as high as 64k, you would
 need to worry.

Well... at 64k, you'd be very likely to have a collision.  But the
whole birthday paradox thing means that there's a non-trivial
collision probability even at lower numbers of entries.  Seems like
maybe we ought to be using 64 bits here...
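
For comparison, the same back-of-the-envelope approximation with a
64-bit hash space (a sketch, not a measurement):

SELECT 1 - exp(-(100000::float8 * 99999)
               / (2 * 18446744073709551616::numeric)::float8)
       AS p_collision_64bit;   -- roughly 2.7e-10 even at 100,000 entries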

 Realistically, I'm more worried about collisions due to inadequacies in
 the jumble calculation logic (Peter already pointed out some risk
 factors in that regard).

...especially if collisions are even more frequent than random chance
would suggest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-11 Thread k...@rice.edu
On Wed, Apr 11, 2012 at 01:53:06AM +0100, Peter Geoghegan wrote:
 On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote:
  Peter Geoghegan pe...@2ndquadrant.com writes:
  On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
  If people need something like that, couldn't they create it by hashing
  the normalized query text with an arbitrary algorithm?
 
  That supposes that the normalised query text is perfectly stable. It
  may well not be, particularly for things like ad-hoc queries or
  queries generated by ORMs, across database clusters and over long
  periods of time -
 
  Indeed, but the hash value isn't stable either given those sorts of
  assumptions, so I'm not convinced that there's any advantage there.
 
 Isn't it? The hash captures the true meaning of the query, while
 having the database server's platform as a usually irrelevant
 artefact. Another thing that I forgot to mention is client encoding -
 it may well be fairly inconvenient to have to use the same algorithm
 to hash the query string across applications. You also have to hash
 the query string yourself again and again, which is expensive to do
 from Python or something, and is often inconvenient - differences
 beyond track_activity_query_size bytes (default:1024) are not
 recognised. Using an SQL code beautifier where a single byte varies
 now breaks everything, which developers don't expect at all (we've
 trained them not to), so in many ways you're back to the same
 limitations as classic pg_stat_statements if you attempt to aggregate
 queries over time and across machines, which is a very real use case.
 
 It's probably pretty annoying to have to get your Python app to use
  the same hash function as your Java app or whatever, unless you want
 to use something heavyweight like a cryptographic hash function. By
 doing it within Postgres, you avoid those headaches.
 
 I'm not asking you to very loudly proclaim that it should be used like
 this - just expose it, accurately document it, and I'm quite confident
 that it will be widely used and relied upon by those that are
 reasonably well informed, and understand its limitations, which are
 really quite straightforward.
 
  What I think people would actually like to know, if they're in a
  situation where distinct query texts are getting hashed to the same
  thing, is *which* different texts got hashed to the same thing.
  But there's no good way to expose that given the pg_stat_statements
  infrastructure, and exposing the hash value doesn't help.
 
 Apart from detecting the case where we get a straightforward
 collision, I don't expect that that would be useful. The whole point
 is that the user doesn't care about the difference, and I think we've
 specified a practical, widely useful standard for when queries should
 be considered equivalent.
 -- 
 Peter Geoghegan       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services
 

By using all 64-bits of the hash that we currently calculate, instead
of the current use of 32-bits only, the collision probabilities are
very low.

Regards,
Ken



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Thu, Apr 5, 2012 at 11:45 AM, Robert Haas robertmh...@gmail.com wrote:
 Meanwhile, pg_stat_statements converts the same data to seconds but
 makes it a double rather than a bigint.  I think that was a bad idea
 and we should make it consistently use a bigint and milliseconds while
 we still can.

Hmm.  So, on further review, this is not as simple as it seems.  I'd
like some input from other people on what we should do here.

pg_stat_statements has long exposed a column called total_time as a
float8.  It now exposes columns time_read and time_write which are
actually measuring the time spent reading and writing data blocks, and
those are also exposed as a float8; all these count seconds.

Meanwhile, all times exposed by the stats collector (including the new
and analogous pg_stat_database.block_read_time and
pg_stat_database.block_write_time columns) are exposed as int8; these
count milliseconds.

So, should we make the new columns exposed by pg_stat_statements use
milliseconds, so that the block read/write timings are everywhere in
milliseconds, or should we keep them as a float8, so that all the
times exposed by pg_stat_statements use float8?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Hmm.  So, on further review, this is not as simple as it seems.  I'd
 like some input from other people on what we should do here.

 pg_stat_statements has long exposed a column called total_time as a
 float8.  It now exposes columns time_read and time_write which are
 actually measuring the time spent reading and writing data blocks, and
 those are also exposed as a float8; all these count seconds.

 Meanwhile, all times exposed by the stats collector (including the new
 and analogous pg_stat_database.block_read_time and
 pg_stat_database.block_write_time columns) are exposed as int8; these
 count milliseconds.

 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

Given that we've whacked pg_stat_statements' behavior around rather
thoroughly in this release, maybe we could get away with redefining
total_time as being measured in msec rather than sec, thereby aligning
units as msec across the board.  It's arguably a smaller deal than the
way we've redefined what the query column contains...

float8 vs int8 is a distinct issue, and probably one that is not as
much of an impact on clients if we change it.  It is not hard to predict
that somebody will eventually want sub-msec resolution on these things,
which would suggest that float8 would be the better idea.  But perhaps
we could leave that change for a future release.

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 14:33, Robert Haas robertmh...@gmail.com wrote:
 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

I believe that we should keep them as float8, on the basis that a user
is more likely to generalise from total_time's format (when writing a
script to query the view or whatever) than from that of
pg_stat_database.

A part of me would like to change the view definitions so that all the
columns are strongly typed (i.e. all these values would be intervals).
I realise that that isn't practical though.
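
For what it's worth, anyone who really wants intervals can derive them
cheaply on the client side; a sketch against the pg_stat_database
columns (assuming the float8-msec convention discussed elsewhere in
this thread):

SELECT datname,
       blk_read_time  * interval '1 millisecond' AS blk_read_interval,
       blk_write_time * interval '1 millisecond' AS blk_write_interval
FROM pg_stat_database;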

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm.  So, on further review, this is not as simple as it seems.  I'd
 like some input from other people on what we should do here.

 pg_stat_statements has long exposed a column called total_time as a
 float8.  It now exposes columns time_read and time_write which are
 actually measuring the time spent reading and writing data blocks, and
 those are also exposed as a float8; all these count seconds.

 Meanwhile, all times exposed by the stats collector (including the new
 and analogous pg_stat_database.block_read_time and
 pg_stat_database.block_write_time columns) are exposed as int8; these
 count milliseconds.

 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

 Given that we've whacked pg_stat_statements' behavior around rather
 thoroughly in this release, maybe we could get away with redefining
 total_time as being measured in msec rather than sec, thereby aligning
 units as msec across the board.  It's arguably a smaller deal than the
 way we've redefined what the query column contains...

Retyping columns is an awfully good way to produce grumpy users.  Then
again, if we're going to do it, it would certainly be better to do it
now rather than later, because right now I'm guessing
pg_stat_statements is a lot less heavily used than it will be after
9.2 hits shelves.

 float8 vs int8 is a distinct issue, and probably one that is not as
 much of an impact on clients if we change it.  It is not hard to predict
 that somebody will eventually want sub-msec resolution on these things,
 which would suggest that float8 would be the better idea.  But perhaps
 we could leave that change for a future release.

Well, internally, the I/O timing stuff as well as the function timing
stuff uses microseconds, and the SQL functions expose it as
microseconds, but then the view divides by 1000 to convert to
milliseconds.  I made the I/O timing stuff do it that way because
that's how the function timing stuff does it, but it does seem a
little random.  One thing in its favor is that it provides a way for
users to get this if they want it, without screwing readability for
the vast majority who don't care.
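
The pattern in question looks roughly like this, modelled on the
pg_stat_user_functions definition (treat the exact function names and
filter as illustrative rather than authoritative):

CREATE VIEW my_stat_user_functions AS
    SELECT p.oid AS funcid,
           n.nspname AS schemaname,
           p.proname AS funcname,
           pg_stat_get_function_calls(p.oid) AS calls,
           pg_stat_get_function_time(p.oid) / 1000 AS total_time,    -- usec -> msec
           pg_stat_get_function_self_time(p.oid) / 1000 AS self_time -- usec -> msec
    FROM pg_proc p
         JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE pg_stat_get_function_calls(p.oid) IS NOT NULL;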

On the flip side, the new checkpoint timing stuff is in milliseconds
all the way through, though it seems vanishingly unlikely that anyone
needs more resolution in that case.  We have lots of other things in
milliseconds, too.

No matter what we end up doing here it will be consistent with
something; I am reminded of the phrase "the good thing about standards
is that there are so many to choose from."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 No matter what we end up doing here it will be consistent with
 something; I am reminded of the phrase the good thing about standards
 is that there are so many to choose from

Well, FWIW I vote for making the new columns be float8 msec.  If you
don't want to change total_time to match, I guess there's no law that
says it *has* to be consistent ...

regards, tom lane



Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Magnus Hagander
On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm.  So, on further review, this is not as simple as it seems.  I'd
 like some input from other people on what we should do here.

 pg_stat_statements has long exposed a column called total_time as a
 float8.  It now exposes columns time_read and time_write which are
 actually measuring the time spent reading and writing data blocks, and
 those are also exposed as a float8; all these count seconds.

 Meanwhile, all times exposed by the stats collector (including the new
 and analogous pg_stat_database.block_read_time and
 pg_stat_database.block_write_time columns) are exposed as int8; these
 count milliseconds.

 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

 Given that we've whacked pg_stat_statements' behavior around rather
 thoroughly in this release, maybe we could get away with redefining
 total_time as being measured in msec rather than sec, thereby aligning
 units as msec across the board.  It's arguably a smaller deal than the
 way we've redefined what the query column contains...

 Retyping columns is an awfully good way to produce grumpy users.  Then
 again, if we're going to do it, it would certainly be better to do it
 now rather than later, because right now I'm guessing
 pg_stat_statements is a lot less heavily used than it will be after
 9.2 hits shelves.

Agreed. It's better if we can also change the name of it - provided we
can come up with a reasonable new name. Then people's applications will
break *visibly*, which is a lot better than breaking invisibly. (This
is the main reason why we renamed current_query in pg_stat_activity..)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given that we've whacked pg_stat_statements' behavior around rather
 thoroughly in this release, maybe we could get away with redefining
 total_time as being measured in msec rather than sec, thereby aligning
 units as msec across the board.  It's arguably a smaller deal than the
 way we've redefined what the query column contains...
 
 Retyping columns is an awfully good way to produce grumpy users.  Then
 again, if we're going to do it, it would certainly be better to do it
 now rather than later, because right now I'm guessing
 pg_stat_statements is a lot less heavily used than it will be after
 9.2 hits shelves.

 Agreed. It's better if we can also change the name of it - provided we
 can come up with a reasonable new name. Then people's applications will
 break *visibly*, which is a lot better than breaking invisibly. (This
 is the main reason why we renamed current_query in pg_stat_activity..)

That might be overkill.  Changing the column name will definitely break
anything more specific than "select * from pg_stat_statements".
However, it's less clear that changing the units in which the column is
expressed will break things.  It seems likely to me that nobody out
there is doing anything much more sophisticated than sorting by the
column, and that's still going to work the same.
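
(For the simple case assumed here, something like

    SELECT query, calls, total_time
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 10;

keeps working unchanged whether total_time is float8 sec or float8 msec,
since only the relative ordering matters.)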

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Magnus Hagander
On Tue, Apr 10, 2012 at 18:27, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given that we've whacked pg_stat_statements' behavior around rather
 thoroughly in this release, maybe we could get away with redefining
 total_time as being measured in msec rather than sec, thereby aligning
 units as msec across the board.  It's arguably a smaller deal than the
 way we've redefined what the query column contains...

 Retyping columns is an awfully good way to produce grumpy users.  Then
 again, if we're going to do it, it would certainly be better to do it
 now rather than later, because right now I'm guessing
 pg_stat_statements is a lot less heavily used than it will be after
 9.2 hits shelves.

 Agreed. It's better if we can also change the name of it - provided we
 can come up with a reasonable new name. Then people's applications will
 break *visibly*, which is a lot better than breaking invisibly. (This
 is the main reason why we renamed current_query in pg_stat_activity..)

 That might be overkill.  Changing the column name will definitely break
 anything more specific than "select * from pg_stat_statements".
 However, it's less clear that changing the units in which the column is
 expressed will break things.  It seems likely to me that nobody out
 there is doing anything much more sophisticated than sorting by the
 column, and that's still going to work the same.

I've seen cases where the timing is correlated with external timings,
e.g. from the application. Have I seen it a lot? No - but then I
haven't seen a big usage of pg_stat_statements either, which might be
the better argument for allowing a change of unit but not name.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 No matter what we end up doing here it will be consistent with
 something; I am reminded of the phrase "the good thing about standards
 is that there are so many to choose from".

 Well, FWIW I vote for making the new columns be float8 msec.  If you
 don't want to change total_time to match, I guess there's no law that
 says it *has* to be consistent ...

Ugh.  So the three ways of doing timing that we have already aren't
enough, and we need a fourth one?  Ack!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, FWIW I vote for making the new columns be float8 msec.

 Ugh.  So the three ways of doing timing that we have already aren't
 enough, and we need a fourth one?  Ack!

Huh?  I understood what you said upthread to be that we have two ways
in existing releases (anything unreleased has zero standing in this
discussion): float8 sec in pg_stat_statements.total_time, and
int8 msec everywhere else.  Did I miss something?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, FWIW I vote for making the new columns be float8 msec.

 Ugh.  So the three ways of doing timing that we have already aren't
 enough, and we need a fourth one?  Ack!

 Huh?  I understood what you said upthread to be that we have two ways
 in existing releases (anything unreleased has zero standing in this
 discussion): float8 sec in pg_stat_statements.total_time, and
 int8 msec everywhere else.  Did I miss something?

We also have int8 usec floating around.  But even if we didn't, float8
msec would be a new one, regardless of whether it would be third or
fourth...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Huh?  I understood what you said upthread to be that we have two ways
 in existing releases (anything unreleased has zero standing in this
 discussion): float8 sec in pg_stat_statements.total_time, and
 int8 msec everywhere else.  Did I miss something?

 We also have int8 usec floating around.  But even if we didn't, float8
 msec would be a new one, regardless of whether it would be third or
 fourth...

It would still be the second one, because it would replace the only use
of float8 sec, no?  And more to the point, it converges us on msec being
the only exposed unit.

The business about underlying microseconds is maybe not so good, but
I don't think we want to touch that right now.  In the long run
I think it would make sense to converge on float8 msec as being the
standard for exposed timing values, because that is readily adaptable to
the underlying data having nsec or even better precision.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Huh?  I understood what you said upthread to be that we have two ways
 in existing releases (anything unreleased has zero standing in this
 discussion): float8 sec in pg_stat_statements.total_time, and
 int8 msec everywhere else.  Did I miss something?

 We also have int8 usec floating around.  But even if we didn't, float8
 msec would be a new one, regardless of whether it would be third or
 fourth...

 It would still be the second one, because it would replace the only use
 of float8 sec, no?  And more to the point, it converges us on msec being
 the only exposed unit.

 The business about underlying microseconds is maybe not so good, but
 I don't think we want to touch that right now.  In the long run
 I think it would make sense to converge on float8 msec as being the
 standard for exposed timing values, because that is readily adaptable to
 the underlying data having nsec or even better precision.

Hmm.  Maybe we should think about numeric ms, which would have all the
same advantages but without the round-off error.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread k...@rice.edu
On Tue, Apr 10, 2012 at 02:01:02PM -0400, Robert Haas wrote:
 On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Huh?  I understood what you said upthread to be that we have two ways
  in existing releases (anything unreleased has zero standing in this
  discussion): float8 sec in pg_stat_statements.total_time, and
  int8 msec everywhere else.  Did I miss something?
 
  We also have int8 usec floating around.  But even if we didn't, float8
  msec would be a new one, regardless of whether it would be third or
  fourth...
 
  It would still be the second one, because it would replace the only use
  of float8 sec, no?  And more to the point, it converges us on msec being
  the only exposed unit.
 
  The business about underlying microseconds is maybe not so good, but
  I don't think we want to touch that right now.  In the long run
  I think it would make sense to converge on float8 msec as being the
  standard for exposed timing values, because that is readily adaptable to
  the underlying data having nsec or even better precision.
 
 Hmm.  Maybe we should think about numeric ms, which would have all the
 same advantages but without the round-off error.
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 

They are also a lot bigger with tons of added overhead. :)

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The business about underlying microseconds is maybe not so good, but
 I don't think we want to touch that right now.  In the long run
 I think it would make sense to converge on float8 msec as being the
 standard for exposed timing values, because that is readily adaptable to
 the underlying data having nsec or even better precision.

 Hmm.  Maybe we should think about numeric ms, which would have all the
 same advantages but without the round-off error.

Color me unimpressed ... numeric calculations are vastly more expensive
than float, and where are you going to get timing data that has more
than sixteen decimal digits of accuracy?  IME we're lucky to get three
repeatable digits in any timing measurement.  The point of using a
non-integer type here is not so much precision as dynamic range:
sometimes you might be measuring queries that run for hours, and other
times ones that run for microseconds.  In the latter case it's important
to be able to represent nanoseconds, but not so much in the former.
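
To put rough numbers on the dynamic-range point (the values are made up), a
single float8 msec column holds both extremes without a unit change:

    SELECT 3 * 3600 * 1000.0 AS three_hour_query_ms,   -- 1.08e+07
           0.00025::float8   AS quarter_usec_read_ms;  -- 2.5e-04

and both are far inside float8's ~15 significant digits.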

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Greg Smith

On 04/10/2012 12:27 PM, Tom Lane wrote:

Changing the column name will definitely break
anything more specific than "select * from pg_stat_statements".
However, it's less clear that changing the units in which the column is
expressed will break things.  It seems likely to me that nobody out
there is doing anything much more sophisticated than sorting by the
column, and that's still going to work the same.


I am doing more sophisticated things with it, so I'll celebrate this as 
my opportunity to say I did something you didn't see coming for 2012.


All the sites involved will happily shred those scripts and rewrite for 
either normalized queries *or* better I/O timing info though, so net 
positive for 9.2 changes even if this part breaks on them.  I think this 
is one of those rare opportunities where there's enough positive 
goodwill from changes to ask "what's the best way to handle this
long-term?" and get away with whatever change that requires, too.  I'm 
really not liking the look of this wart now that Robert has pointed it out.


I'd prefer to see at least usec resolution and 8 bytes of dynamic 
range for query related statistics.  Any of these would be fine from a 
UI perspective to me:


-float8 seconds
-float8 msec
-float8 usec
-int8 usec

I don't think int8 msec will be enough resolution to time queries for 
very long, if it's not already obsolete.  The committed example for 
pg_test_timing on good hardware already clocks trivial events at a 
single usec.  Even I/O is getting there.  I've measured my Fusion-io 
loaner card peaking at 8GB/s, which works out to 1 usec per 8K page. 
None of that is even "price no object" hardware today; it's the stuff 
sitting in my office.
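
A quick sanity check of that figure (arithmetic only, nothing measured here):

    SELECT 8192.0 / 8e9 * 1e6 AS usec_per_8k_page;   -- about 1.02

so at 8GB/s an 8K page really is about a microsecond, which int8 msec would
simply round down to zero.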


If anything, I'd expect more of the timing code in the database that only 
has ms resolution right now to start looking fat in a year or two, and 
more things might need to be shifted to usec instead.  Checkpoint timing 
can survive having less resolution because its primary drumbeat is very 
unlikely to drop below the minutes range.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote:
 On 04/10/2012 12:27 PM, Tom Lane wrote:
 I am doing more sophisticated things with it, so I'll celebrate this as my
 opportunity to say I did something you didn't see coming for 2012.

This is why I requested that we expose the query_id hash value - I
believe that it will be generally useful in clustering situations. It
would be nice to have a persistent identifier. While we're discussing
revising pg_stat_statement's interface, are you still opposed to
exposing that value, Tom?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Robert Haas
On Tue, Apr 10, 2012 at 6:32 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote:
 On 04/10/2012 12:27 PM, Tom Lane wrote:
 I am doing more sophisticated things with it, so I'll celebrate this as my
 opportunity to say I did something you didn't see coming for 2012.

 This is why I requested that we expose the query_id hash value - I
 believe that it will be generally useful in clustering situations. It
 would be nice to have a persistent identifier. While we're discussing
 revising pg_stat_statement's interface, are you still opposed to
 exposing that value, Tom?

If people need something like that, couldn't they create it by hashing
the normalized query text with an arbitrary algorithm?

The only obvious advantage of exposing the value used internally is
that it might be helpful in terms of understanding the collision
behavior.  But hopefully collisions are pretty rare anyway, so...
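
For example (md5 is just an arbitrary choice here, nothing magic about it):

    SELECT md5(query) AS query_key, calls, total_time
      FROM pg_stat_statements;

gives an identifier any client can recompute from the query text, for as
long as that text stays stable.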

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote:
 On 04/10/2012 12:27 PM, Tom Lane wrote:
 I am doing more sophisticated things with it, so I'll celebrate this as my
 opportunity to say I did something you didn't see coming for 2012.

 This is why I requested that we expose the query_id hash value - I
 believe that it will be generally useful in clustering situations. It
 would be nice to have a persistent identifier. While we're discussing
 revising pg_stat_statement's interface, are you still opposed to
 exposing that value, Tom?

I still am.  I'm unconvinced by references to clustering situations,
because as constructed the hash is extremely database-specific.
It will vary depending on OID assignments, not to mention platform
characteristics such as word width and endianness.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
 If people need something like that, couldn't they create it by hashing
 the normalized query text with an arbitrary algorithm?

That supposes that the normalised query text is perfectly stable. It
may well not be, particularly for things like ad-hoc queries or
queries generated by ORMs, across database clusters and over long
periods of time - you're basically throwing the benefit of all of that
intelligent normalisation out of the window, because it's pretty close
to free to expose it. What if a developer tweaks an alias in the
application for clarity? Also, as you point out, it has additional
utility in advertising when a collision has happened, and setting the
user's expectations appropriately. I assume that collisions are very
rare, but when they do happen, this gives you a fighting chance of
noticing them.

As Tom points out, the query hash will vary according to platform
specific characteristics, including endianness, and will require OIDs
are the same on every node. However, it is still going to be useful in
clusters that use streaming replication, though not a third party
trigger based replication system for example, because streaming
replication does of course require that those factors (and rather a
lot more) will be identical across the cluster anyway. Realistically,
I'd expect a large majority of people interested in this feature to
only want to use it with streaming replication anyway.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
 If people need something like that, couldn't they create it by hashing
 the normalized query text with an arbitrary algorithm?

 That supposes that the normalised query text is perfectly stable. It
 may well not be, particularly for things like ad-hoc queries or
 queries generated by ORMs, across database clusters and over long
 periods of time -

Indeed, but the hash value isn't stable either given those sorts of
assumptions, so I'm not convinced that there's any advantage there.

What I think people would actually like to know, if they're in a
situation where distinct query texts are getting hashed to the same
thing, is *which* different texts got hashed to the same thing.
But there's no good way to expose that given the pg_stat_statements
infrastructure, and exposing the hash value doesn't help.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
 If people need something like that, couldn't they create it by hashing
 the normalized query text with an arbitrary algorithm?

 That supposes that the normalised query text is perfectly stable. It
 may well not be, particularly for things like ad-hoc queries or
 queries generated by ORMs, across database clusters and over long
 periods of time -

 Indeed, but the hash value isn't stable either given those sorts of
 assumptions, so I'm not convinced that there's any advantage there.

Isn't it? The hash captures the true meaning of the query, while
having the database server's platform as a usually irrelevant
artefact. Another thing that I forgot to mention is client encoding -
it may well be fairly inconvenient to have to use the same algorithm
to hash the query string across applications. You also have to hash
the query string yourself again and again, which is expensive to do
from Python or something, and is often inconvenient - differences
beyond track_activity_query_size bytes (default: 1024) are not
recognised. Using an SQL code beautifier where a single byte varies
now breaks everything, which developers don't expect at all (we've
trained them not to), so in many ways you're back to the same
limitations as classic pg_stat_statements if you attempt to aggregate
queries over time and across machines, which is a very real use case.

It's probably pretty annoying to have to get your Python app to use
the same hash function as your Java app or whatever, unless you want
to use something heavyweight like a cryptographic hash function. By
doing it within Postgres, you avoid those headaches.

I'm not asking you to very loudly proclaim that it should be used like
this - just expose it, accurately document it, and I'm quite confident
that it will be widely used and relied upon by those that are
reasonably well informed, and understand its limitations, which are
really quite straightforward.

 What I think people would actually like to know, if they're in a
 situation where distinct query texts are getting hashed to the same
 thing, is *which* different texts got hashed to the same thing.
 But there's no good way to expose that given the pg_stat_statements
 infrastructure, and exposing the hash value doesn't help.

Apart from detecting the case where we get a straightforward
collision, I don't expect that that would be useful. The whole point
is that the user doesn't care about the difference, and I think we've
specified a practical, widely useful standard for when queries should
be considered equivalent.
-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-05 Thread Robert Haas
On Tue, Mar 27, 2012 at 8:30 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 27, 2012 at 3:22 PM, Ants Aasma a...@cybertec.at wrote:
 On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

 Thank you for working on this.

 Taking a fresh look at it, I agree with you that conflating waiting
 for backend local IO, and IO performed by some other backend might
 paint us into a corner. For most workloads the wait for IO performed
 by others should be the minority anyway.

 I won't really miss the per table stats. But if the pg_stat_statements
 normalisation patch gets committed, it would be really neat to also
 have IO waits there. It would be much easier to quickly diagnose what
 is causing all these IO issues.

 So, the pg_stat_statements part of this is committed now.  Do you want
 to prepare a revised patch to add per-database counters to the
 statistics collector?  I think that might be a good idea...

Hearing nothing further on this point, I went and did it myself.

In the process I noticed a couple of things that I think we need to fix.

Currently, the statistics views that include timing information are
displayed in milliseconds (see pg_stat_user_functions), while the
underlying SQL-callable functions return the data in microseconds.
Whether or not this was a good design decision, we're stuck with it
now; the documentation implies that the views and functions use the
same units.  I'll go fix that next.

Meanwhile, pg_stat_statements converts the same data to seconds but
makes it a double rather than a bigint.  I think that was a bad idea
and we should make it consistent: use a bigint and milliseconds while
we still can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Robert Haas
On Thu, Mar 22, 2012 at 7:38 PM, Ants Aasma a...@cybertec.at wrote:
 On Wed, Mar 21, 2012 at 5:01 PM, Robert Haas robertmh...@gmail.com wrote:
 This seems to have bitrotted again.  :-(

 Can you please rebase again?

 Rebased.

I've committed the core of this.  I left out the stats collector
stuff, because it's still per-table and I think perhaps we should back
off to just per-database.  I changed it so that it does not conflate
wait time with I/O time.  Maybe there should be a separate method of
measuring wait time, but I don't think it's a good idea for the
per-backend stats to measure a different thing than what gets reported
up to the stats collector - we should have ONE definition of each
counter.  I also tweaked the EXPLAIN output format a bit, and the
docs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Robert Haas
On Tue, Mar 27, 2012 at 2:58 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Mar 22, 2012 at 7:38 PM, Ants Aasma a...@cybertec.at wrote:
 On Wed, Mar 21, 2012 at 5:01 PM, Robert Haas robertmh...@gmail.com wrote:
 This seems to have bitrotted again.  :-(

 Can you please rebase again?

 Rebased.

 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

And I've now committed the pg_stat_statements code as well, hopefully
not stomping too badly on what Tom's apparently in the midst of doing
with Peter's pg_stat_statements patch.  I committed this almost
exactly as submitted; just a minor code style correction and a few
documentation enhancements.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Ants Aasma
On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

Thank you for working on this.

Taking a fresh look at it, I agree with you that conflating waiting
for backend local IO, and IO performed by some other backend might
paint us into a corner. For most workloads the wait for IO performed
by others should be the minority anyway.

I won't really miss the per table stats. But if the pg_stat_statements
normalisation patch gets committed, it would be really neat to also
have IO waits there. It would be much easier to quickly diagnose what
is causing all these IO issues.

Thanks again,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Robert Haas
On Tue, Mar 27, 2012 at 3:22 PM, Ants Aasma a...@cybertec.at wrote:
 On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

 Thank you for working on this.

 Taking a fresh look at it, I agree with you that conflating waiting
 for backend local IO, and IO performed by some other backend might
 paint us into a corner. For most workloads the wait for IO performed
 by others should be the minority anyway.

 I won't really miss the per table stats. But if the pg_stat_statements
 normalisation patch gets committed, it would be really neat to also
 have IO waits there. It would be much easier to quickly diagnose what
 is causing all these IO issues.

So, the pg_stat_statements part of this is committed now.  Do you want
to prepare a revised patch to add per-database counters to the
statistics collector?  I think that might be a good idea...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Greg Stark
On Tue, Mar 27, 2012 at 7:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

Maybe I missed some earlier discussion -- I've been having trouble
keeping up with the lists.

But was there discussion of why this is a GUC? Why not just another
parameter to EXPLAIN like the others?
i.e. EXPLAIN (ANALYZE, BUFFERS, IOTIMING)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-27 Thread Robert Haas
On Tue, Mar 27, 2012 at 8:41 PM, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 27, 2012 at 7:58 PM, Robert Haas robertmh...@gmail.com wrote:
 I've committed the core of this.  I left out the stats collector
 stuff, because it's still per-table and I think perhaps we should back
 off to just per-database.  I changed it so that it does not conflate
 wait time with I/O time.  Maybe there should be a separate method of
 measuring wait time, but I don't think it's a good idea for the
 per-backend stats to measure a different thing than what gets reported
 up to the stats collector - we should have ONE definition of each
 counter.  I also tweaked the EXPLAIN output format a bit, and the
 docs.

 Maybe I missed some earlier discussion -- I've been having trouble
 keeping up with the lists.

 But was there discussion of why this is a GUC? Why not just another
 parameter to EXPLAIN like the others?
 i.e. EXPLAIN (ANALYZE, BUFFERS, IOTIMING)

Because you want to be able to expose the data even for queries that
aren't explained.  Right now, you can do that with pg_stat_statements;
and the original patch also had per-table counters, but I didn't
commit that part due to some concerns about stats bloat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-03-21 Thread Robert Haas
On Fri, Feb 24, 2012 at 2:23 PM, Ants Aasma ants.aa...@eesti.ee wrote:
 On Wed, Feb 22, 2012 at 6:35 PM, Ants Aasma ants.aa...@eesti.ee wrote:
 Some implementation notes.  This currently fails regression test
 create_function_3, haven't looked into why yet.

 I'll take a look at it.

 The failure was due to leakproof changes to pgproc. Attached patches
 are adjusted accordingly and rebased over Robert's blocks dirtied
 patch.

This seems to have bitrotted again.  :-(

Can you please rebase again?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-02-24 Thread Jim Nasby

On 2/22/12 10:35 AM, Ants Aasma wrote:

The reason why I didn't add write timings to relation stats is that I
couldn't figure out what the semantics should be. It could be either
time spent waiting for this relation's blocks to be written out, or
time spent waiting for some other relation's blocks to be written out
to free space for this relation's block, or maybe distribute the cost;
background writes could be included or excluded. Writes usually return
quickly, unless lots of possibly unrelated writes have dirtied enough
of OS cache, etc. I figured that what ever choices I made, they
wouldn't really help anyone diagnose anything. Having global write
timings in pg_stat_bgwriter might be useful, but I feel that is
something for another patch.


I know it's not perfect, but I would argue that what users care about most of 
the time is time taken up in actual backends. So I wouldn't worry about 
bgwriter. I also wouldn't worry about time spent waiting to find a buffer at 
this point (see below).


  I still think a full wait timing interface is the right long-term direction
  here.  It's hard to reject this idea when it seems to be working right now
  though, while more comprehensive wait storage is still at least a release
  off.   Opinions welcome, I'm still juggling this around now that I have it
  working again.

I agree that wait timing interface is the right direction. I have
thought a bit about it and could share some ideas - maybe I should
create a wiki page where the general design could be hashed out?


Yes, I think a wiki would be a good place to start. As you showed in your 
previous question about writes there's a *lot* of places where timing info 
would be useful to us.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-02-22 Thread Ants Aasma
On Wed, Feb 22, 2012 at 4:43 PM, Greg Smith g...@2ndquadrant.com wrote:
 Attached are updated versions of this feature without the pg_test_timing
 tool part, since I broke that out into another discussion thread.  I've
 split the part that updates pg_stat_statistics out from the main feature
 too, separate patch attached to here (but I'm not reviewing that yet). Lots
 of bitrot since this was submitted, and yes I noticed that I've almost
 recreated earlier versions of this patch--by splitting off the parts that
 were developed later.

Thanks for the review and splitting. Sorry I didn't fix up the bit rot myself.

 Earlier discussion of this got sidetracked on a few things, partly my
 fault. It's worth taking a look at what this provides before judging it too
 much.  It can demo well.

 The stated purpose is helping figure out what relations are gobbling up the
 most access time, presumably to optimize them and/or the storage they are
 on.  "What do I put onto SSD?" is surely a popular request nowadays.

I should have stated the purpose more clearly. The original reason for
developing this patch was to figure out what queries are taking the
most time and why, specifically in the case where OS memory is a lot
larger than shared_buffers. Basically, the following query gives a
quick overview of where the bottlenecks are:
SELECT query, total_time, (time_read+time_write)/total_time AS
io_fraction FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

This of course hugely benefits from Peter's pg_stat_statements
normalization patch.

Tracking timings per relation was actually an afterthought.

 Now, the first critical question to ask is what additional information is
 this providing above the existing counters?  After all, it's possible to
 tell pgbench_accounts is the hotspot just from comparing heap_blks_read,
 right?

Like I said above, I find it mostly useful to see what is missing the
OS cache. With memory being as cheap as it is, a reasonably priced
server can have 128G of memory, while max recommended value for
shared_buffers is 8GB. It's quite likely to have tables that fit into
OS cache but not into shared_buffers, but it's not trivial to figure
out which those are.
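
The sort of check I mean, using the per-relation counters from this patch
(treat the exact column name, heap_blks_time, as illustrative):

    SELECT relname, heap_blks_read,
           heap_blks_time / NULLIF(heap_blks_read, 0) AS avg_read_cost_per_block
      FROM pg_statio_user_tables
     ORDER BY heap_blks_read DESC;

Relations whose average read cost per block is far below real disk latency
are most likely being served out of the OS cache rather than doing real I/O.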

 This run looks useful at providing the data wished for--that read times are
 slower per capita from the accounts table.  The first time I tried this I
 got a bizarrely high number for pgbench_branches.heap_blks_time; I'm not sure
 how reliable this is yet.  One problem that might be easy to fix is that the
 write timing info doesn't show in any of these system views, only in EXPLAIN
 and statement level ones.

I'm not sure about the source of the huge number; it might be instability
in the clock source. Have you tried running the monotonicity check for a
longer period while the system is under load? Another issue with the
current timing code is that gettimeofday isn't guaranteed to be
monotonic anyway; things like NTP adjustments can make time go
backwards. clock_gettime with CLOCK_MONOTONIC_RAW would be better, but
that's Linux-specific :(

The reason why I didn't add write timings to relation stats is that I
couldn't figure out what the semantics should be. It could be either
time spent waiting for this relation's blocks to be written out, or
time spent waiting for some other relation's blocks to be written out
to free space for this relation's block, or maybe distribute the cost;
background writes could be included or excluded. Writes usually return
quickly, unless lots of possibly unrelated writes have dirtied enough
of OS cache, etc. I figured that what ever choices I made, they
wouldn't really help anyone diagnose anything. Having global write
timings in pg_stat_bgwriter might be useful, but I feel that is
something for another patch.
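
(Checkpoint-level write and sync timing does show up in pg_stat_bgwriter in
9.2 -- checkpoint_write_time and checkpoint_sync_time, in milliseconds, if I
have the names right -- so a quick look at that side is just:

    SELECT checkpoints_timed, checkpoints_req,
           checkpoint_write_time, checkpoint_sync_time
      FROM pg_stat_bgwriter;

The per-relation write semantics above remain the harder question.)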

 I still think a full wait timing interface is the right long-term direction
 here.  It's hard to reject this idea when it seems to be working right now
 though, while more comprehensive wait storage is still at least a release
 off.   Opinions welcome, I'm still juggling this around now that I have it
 working again.

I agree that wait timing interface is the right direction. I have
thought a bit about it and could share some ideas - maybe I should
create a wiki page where the general design could be hashed out?

Anyway, the user visible information from this patch should be trivial
to extract from a general wait timing framework. Pushing my own agenda
a bit - having this patch in the current release would help to get
some field experience on any issues surrounding timing :)

 Some implementation notes.  This currently fails regression test
 create_function_3, haven't looked into why yet.

I'll take a look at it.

Thanks again.

--
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-01-17 Thread Greg Smith
Attached is the pg_test_timing utility portion of this submission, 
broken out into its own patch.  It's a contrib module modeled on 
pg_test_fsync.


The documentation is still a bit rough, I'm not done with that yet.  I 
have included an example of good timing results, switching to a bad 
clock source, and the resulting bad results.  Code review found some 
formatting things to nitpick that I've already fixed:  non-standard brace 
locations and not including enough spaces in expressions were the main two.


This is now referenced by the existing cryptic documentation comment 
around EXPLAIN ANALYZE, which says that overhead can be high because 
gettimeofday is slow on some systems.  Since this utility measures that 
directly, I think it's a clear win to include it just for that purpose.  
The fact that there are more places coming where timing overhead matters 
is also true.  But this existing one is already bad enough to justify 
shipping something to help measure/manage it in my mind.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

diff --git a/contrib/Makefile b/contrib/Makefile
index 0c238aa..45b601c 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -35,6 +35,7 @@ SUBDIRS = \
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
+		pg_test_timing	\
 		pg_trgm		\
 		pg_upgrade	\
 		pg_upgrade_support \
diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile
new file mode 100644
index 000..b8b266a
--- /dev/null
+++ b/contrib/pg_test_timing/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_test_timing/Makefile
+
+PGFILEDESC = "pg_test_timing - test timing overhead"
+PGAPPICON = win32
+
+PROGRAM  = pg_test_timing
+OBJS = pg_test_timing.o
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_test_timing
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c
new file mode 100644
index 000..bcf3c3a
--- /dev/null
+++ b/contrib/pg_test_timing/pg_test_timing.c
@@ -0,0 +1,157 @@
+/*
+ *	pg_test_timing.c
+ *		tests overhead of timing calls and their monotonicity:  that
+ *		they always move forward
+ */
+
+#include "postgres_fe.h"
+
+#include "getopt_long.h"
+#include "portability/instr_time.h"
+
+static const char *progname;
+
+static int32 test_duration = 3;
+
+static void handle_args(int argc, char *argv[]);
+static void test_timing(int32);
+
+int
+main(int argc, char *argv[])
+{
+	progname = get_progname(argv[0]);
+
+	handle_args(argc, argv);
+
+	test_timing(test_duration);
+
+	return 0;
+}
+
+static void
+handle_args(int argc, char *argv[])
+{
+	static struct option long_options[] = {
+		{"duration", required_argument, NULL, 'd'},
+		{NULL, 0, NULL, 0}
+	};
+	int option;			/* Command line option */
+	int	optindex = 0;	/* used by getopt_long */
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-h") == 0 ||
+			strcmp(argv[1], "-?") == 0)
+		{
+			printf("Usage: %s [-d DURATION]\n", progname);
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_test_timing (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((option = getopt_long(argc, argv, "d:",
+								 long_options, &optindex)) != -1)
+	{
+		switch (option)
+		{
+			case 'd':
+				test_duration = atoi(optarg);
+				break;
+
+			default:
+				fprintf(stderr, "Try \"%s --help\" for more information.\n",
+						progname);
+				exit(1);
+				break;
+		}
+	}
+
+	if (argc > optind)
+	{
+		fprintf(stderr,
+				"%s: too many command-line arguments (first is \"%s\")\n",
+				progname, argv[optind]);
+		fprintf(stderr, "Try \"%s --help\" for more information.\n",
+				progname);
+		exit(1);
+	}
+
+	if (test_duration > 0)
+	{
+		printf("Testing timing overhead for %d seconds.\n", test_duration);
+	}
+	else
+	{
+		printf("Testing timing was interrupted.\n");
+	}
+}
+
+static void
+test_timing(int32 duration)
+{
+	uint64 total_time;
+	int64 time_elapsed = 0;
+	uint64 loop_count = 0;
+	uint64 prev, cur;
+	int32 diff, i, bits, found;
+
+	instr_time start_time, end_time, temp;
+
+	static int64 histogram[32];
+
+	total_time = duration > 0 ? duration * 1000000 : 0;
+
+	INSTR_TIME_SET_CURRENT(start_time);
+	cur = INSTR_TIME_GET_MICROSEC(start_time);
+
+	while (time_elapsed < total_time)
+	{
+		prev = cur;
+		INSTR_TIME_SET_CURRENT(temp);
+		cur = INSTR_TIME_GET_MICROSEC(temp);
+		diff = cur - prev;
+
+		if (diff < 0)
+		{
+			printf("Detected clock going backwards in time.\n");
+			printf("Time warp: %d microseconds\n", diff);
+			exit(1);
+		}
+
+		bits = 0;
+		while (diff)
+		{
+			diff >>= 1;
+			bits++;
+		}
+		histogram[bits]++;
+
+		loop_count++;
+		INSTR_TIME_SUBTRACT(temp, start_time);
+		time_elapsed = INSTR_TIME_GET_MICROSEC(temp);
+	}
+
+	INSTR_TIME_SET_CURRENT(end_time);
+
+	

Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-01-15 Thread Greg Smith

On 01/15/2012 05:14 PM, Ants Aasma wrote:

I hope that having a tool to measure the overhead and check the sanity
of clock sources is enough to answer the worries about the potential
performance hit. We could also check that the clock source is fast
enough on start-up/when the guc is changed, but that seems a bit too
much and leaves open the question about what is fast enough.


I've been thinking along those same lines--check at startup, provide 
some guidance on the general range of what's considered fast vs. slow in 
both the code and documentation.  What I'm hoping to do here is split 
your patch in half and work on the pg_test_timing contrib utility 
first.  That part answers some overdue questions about when EXPLAIN 
ANALYZE can be expected to add a lot of overhead, which means it's 
useful all on its own.  I'd like to see that utility go into 9.2, along 
with a new documentation section covering that topic.  I'll write the 
new documentation bit.


As far as the buffer timing goes, there is a lot of low-level timing 
information I'd like to see the database collect.  To pick a second 
example with very similar mechanics, I'd like to know which queries 
spend a lot of their time waiting on locks.  The subset of time a 
statement spends waiting just for commit related things is a third.  The 
industry standard term for these is wait events, as seen in Oracle, 
MySQL, MS SQL Server. etc.  That's so standard I don't see an 
intellectual property issue with PostgreSQL using the same term.  Talk 
with a random person who is converting from Oracle to PostgreSQL, ask 
them about their performance concerns.  At least 3/4 of those 
conversations I have mention being nervous about not having wait event data.


Right now, I feel the biggest hurdle to performance tuning PostgreSQL is 
not having good enough built-in query log analysis tools.  If the 
pg_stat_statements normalization upgrade in the CF queue is committed, 
that's enough to make me bump that to "solved well enough".  After 
clearing that hurdle, figuring out how to log, analyze, and manage 
storage of wait events is the next biggest missing piece.  One of my top 
goals for 9.3 was to make sure that happened.


I don't think the long-term answer for how to manage wait event data is 
to collect it as part of pg_stat_statements though.  But I don't have a 
good alternate proposal, while you've submitted a patch that actually 
does something useful right now.  I'm going to think some more about how 
to reconcile all that.  There is an intermediate point to consider as 
well, which is just committing something that adjusts the core code to 
make the buffer wait event data available.  pg_stat_statements is easy 
enough to continue work on outside of core.  I could see a path where 
that happens in parallel with adding a better core wait event 
infrastructure, just to get the initial buffer wait info into people's 
hands earlier.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tomas Vondra
On 28 Listopad 2011, 8:54, Greg Smith wrote:
 -Have one of the PostgreSQL background processes keep track of a time
 estimate on its own, only periodically pausing to sync against the real
 time.  Then most calls to gettimeofday() can use that value instead.  I
 was thinking of that idea for slightly longer running things though; I
 doubt that can be made accurate enough to test instrument buffer

What about random sampling, i.e. measure just 5% of the events or
something like that? Sure, it's not exact but it significantly reduces the
overhead. And it might be a config parameter, so the user might decide how
precise results are needed, and even consider how fast the clocks are.

 Something more ambitious like the v$ stuff would also take care of what
 you're doing here; I'm not sure that what you've done helps build it
 though.  Please don't take that personally.  Part of one of my own
 instrumentation patches recently was rejected out of hand for the same
 reason, just not being general enough.

Yes, that'd be a significant improvement. The wait-event stuff is very
useful and changes the tuning significantly.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Robert Haas
On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith g...@2ndquadrant.com wrote:
 The real problem with this whole area is that we know there are
 systems floating around where the amount of time taken to grab timestamps
 like this is just terrible.

Assuming the feature is off by default (and I can't imagine we'd
consider anything else), I don't see why that should be cause for
concern.  If the instrumentation creates too much system load, then
don't use it: simple as that.  A more interesting question is "how
much load does this feature create even when it's turned off?"

The other big problem for a patch of this sort is that it would bloat
the stats file.  I think we really need to come up with a more
scalable alternative to the current system, but I haven't looked the
current system in enough detail to have a clear feeling about what
such an alternative would look like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Greg Stark
On Nov 28, 2011 8:55 AM, Greg Smith g...@2ndquadrant.com wrote:

 On 11/27/2011 04:39 PM, Ants Aasma wrote:

 On the AMD I saw about 3% performance drop with timing enabled. On the
 Intel machine I couldn't measure any statistically significant change.


 Oh no, it's party pooper time again.  Sorry I have to be the one to do it
this round.  The real problem with this whole area is that we know there
are systems floating around where the amount of time taken to grab
timestamps like this is just terrible.

I believe on most systems on modern linux kernels gettimeofday an its ilk
will be a vsyscall and nearly as fast as a regular function call.


 I recall a patch similar to this one was submitted by Greg Stark some
time ago.  It used the info for different reasons--to try and figure out
whether reads were cached or not--but I believe it withered rather than
being implemented mainly because it ran into the same fundamental
roadblocks here.  My memory could be wrong here, there were also concerns
about what the data would be used for.

I speculated about doing that but never did. I had an experimental patch
using mincore to do what you describe but it wasn't intended for production
code I think. The only real patch was to use getrusage which I still intend
to commit but it doesn't tell you the time spent in I/O -- though it does
tell you the sys time which should be similar.


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tomas Vondra
On 28 Listopad 2011, 15:40, Greg Stark wrote:
 On Nov 28, 2011 8:55 AM, Greg Smith g...@2ndquadrant.com wrote:

 On 11/27/2011 04:39 PM, Ants Aasma wrote:

 On the AMD I saw about 3% performance drop with timing enabled. On the
 Intel machine I couldn't measure any statistically significant change.


 Oh no, it's party pooper time again.  Sorry I have to be the one to do
 it
 this round.  The real problem with this whole area is that we know there
 are systems floating around where the amount of time taken to grab
 timestamps like this is just terrible.

 I believe on most systems on modern linux kernels gettimeofday an its ilk
 will be a vsyscall and nearly as fast as a regular function call.

AFAIK a vsyscall should be faster than a regular syscall. It does not need
to switch to kernel space at all, it just reads the data from a shared
page. The problem is that this is Linux-specific - for example FreeBSD
does not have vsyscall at all (it's actually one of the Linux-isms
mentioned here: http://wiki.freebsd.org/AvoidingLinuxisms).

There's also something called VDSO, that (among other things) uses
vsyscall if availabe, or the best implementation available. So there are
platforms that do not provide vsyscall, and in that case it'd be just as
slow as a regular syscall :(

I wouldn't expect a patch that works fine on Linux but not on other
platforms to be accepted, unless there's a compile-time configure switch
(--with-timings) that'd allow to disable that.

Another option would be to reimplement the vsyscall, even on platforms
that don't provide it. The principle is actually quite simple - allocate a
shared memory, store there a current time and update it whenever a clock
interrupt happens. This is basically what Greg suggested in one of the
previous posts, where regularly means on every interrupt. Greg was
worried about the precision, but this should be just fine I guess. It's
the precision you get on Linux, anyway ...

 I recall a patch similar to this one was submitted by Greg Stark some
 time ago.  It used the info for different reasons--to try and figure out
 whether reads were cached or not--but I believe it withered rather than
 being implemented mainly because it ran into the same fundamental
 roadblocks here.  My memory could be wrong here, there were also concerns
 about what the data would be used for.

The difficulty when distinguishing whether the reads were cached or not is
the price we pay for using filesystem cache instead of managing our own.
Not sure if this can be solved just by measuring the latency - with
spinners it's quite easy, the differences are rather huge (and it's not
difficult to derive that even from pgbench log). But with SSDs, multiple
tablespaces on different storage, etc. it gets much harder.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith g...@2ndquadrant.com wrote:
 The real problem with this whole area is that we know there are
 systems floating around where the amount of time taken to grab timestamps
 like this is just terrible.

 Assuming the feature is off by default (and I can't imagine we'd
 consider anything else), I don't see why that should be cause for
 concern.  If the instrumentation creates too much system load, then
 don't use it: simple as that.  A more interesting question is how
 much load does this feature create even when it's turned off?.

Right.  I see that the code already has a switch to skip the
gettimeofday calls, so the objection is only problematic if the added
overhead is significant even with the switch off.  I would worry mainly
about the added time/space to deal with the extra stats counters.

 The other big problem for a patch of this sort is that it would bloat
 the stats file.

Yes.  Which begs the question of why we need to measure this per-table.
I would think per-tablespace would be sufficient.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Martijn van Oosterhout
On Sun, Nov 27, 2011 at 11:54:38PM -0800, Greg Smith wrote:
 On 11/27/2011 04:39 PM, Ants Aasma wrote:
 On the AMD I saw about 3% performance drop with timing enabled. On the
 Intel machine I couldn't measure any statistically significant change.
 
 Oh no, it's party pooper time again.  Sorry I have to be the one to
 do it this round.  The real problem with this whole area is that we
 know there are systems floating around where the amount of time
 taken to grab timestamps like this is just terrible.  I've been
 annoyed enough by that problem to spend some time digging into why
 that is--seems to be a bunch of trivia around the multiple ways to
 collect time info on x86 systems--and after this CommitFest is over

Something good to know: in Linux the file
/sys/devices/system/clocksource/clocksource0/current_clocksource
lists the current clock source, and
/sys/devices/system/clocksource/clocksource0/available_clocksource
lists the available clock sources. With cat you can switch them. That
way you may be able to quantify the effects on a single machine.

Learned the hard way while tracking clock-skew on a multicore system. 
The hpet may not be the fastest (that would be the cpu timer), but it's
the fastest (IME) that gives guarenteed monotonic time.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Dimitri Fontaine
Tomas Vondra t...@fuzzy.cz writes:
 Another option would be to reimplement the vsyscall, even on platforms
 that don't provide it. The principle is actually quite simple - allocate a
 shared memory, store there a current time and update it whenever a clock
 interrupt happens. This is basically what Greg suggested in one of the
 previous posts, where regularly means on every interrupt. Greg was
 worried about the precision, but this should be just fine I guess. It's
 the precision you get on Linux, anyway ...

That sounds good for other interesting things, which entails being able
to have timing information attached to the XID sequence.  If we go this
way, how far are we from having a ticker in PostgreSQL?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Jim Nasby
On Nov 28, 2011, at 9:29 AM, Tomas Vondra wrote:
 I recall a patch similar to this one was submitted by Greg Stark some
 time ago.  It used the info for different reasons--to try and figure out
 whether reads were cached or not--but I believe it withered rather than
 being implemented mainly because it ran into the same fundamental
 roadblocks here.  My memory could be wrong here, there were also concerns
 about what the data would be used for.
 
 The difficulty when distinguishing whether the reads were cached or not is
 the price we pay for using filesystem cache instead of managing our own.
 Not sure if this can be solved just by measuring the latency - with
 spinners it's quite easy, the differences are rather huge (and it's not
 difficult to derive that even from pgbench log). But with SSDs, multiple
 tablespaces on different storage, etc. it gets much harder.

True, but every use case for this information I can think of ultimately only 
cares about how long it took to perform some kind of IO; it doesn't *really* 
care about whether it was cached. So in that context, we don't really care if 
SSDs are fast enough that they look like cache, because that means they're 
performing (essentially) the same as cache.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tomas Vondra
On 28.11.2011 22:32, Dimitri Fontaine wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 Another option would be to reimplement the vsyscall, even on platforms
 that don't provide it. The principle is actually quite simple - allocate a
 shared memory, store there a current time and update it whenever a clock
 interrupt happens. This is basically what Greg suggested in one of the
 previous posts, where regularly means on every interrupt. Greg was
 worried about the precision, but this should be just fine I guess. It's
 the precision you get on Linux, anyway ...
 
 That sounds good for other interesting things, which entails being able
 to have timing information attached to the XID sequence.  If we go this
 way, how far are we from having a ticker in PostgreSQL?

I'm not sure. On Linux/x86 this is already done, but my knowledge of
kernel development is rather limited, especially when it comes to other
OSes and platforms. E.g. I'm not sure why it's not available in FreeBSD
on x86, I guess it's rather we don't want it than it's not possible.


In Linux sources, the most interesting pieces are probably these:

1) arch/x86/include/asm/vgtod.h - that's the shared memory structure

2) arch/x86/kernel/vsyscall_64.c - this is how the memory is read
   (do_vgettimeofday)

3) arch/x86/kernel/vsyscall_64.c - this is how the memory is updated
   (update_vsyscall)

4) kernel/time/timekeeping.c - do_settimeofday (calls update_vsyscall)

5) drivers/rtc/class.c (and other) RTC drivers call do_settimeofday


Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tomas Vondra
On 29.11.2011 02:14, Jim Nasby wrote:
 On Nov 28, 2011, at 9:29 AM, Tomas Vondra wrote:
 I recall a patch similar to this one was submitted by Greg
 Stark some
 time ago.  It used the info for different reasons--to try and
 figure out whether reads were cached or not--but I believe it
 withered rather than being implemented mainly because it ran into
 the same fundamental roadblocks here.  My memory could be wrong
 here, there were also concerns about what the data would be used
 for.
 
 The difficulty when distinguishing whether the reads were cached or
 not is the price we pay for using filesystem cache instead of
 managing our own. Not sure if this can be solved just by measuring
 the latency - with spinners it's quite easy, the differences are
 rather huge (and it's not difficult to derive that even from
 pgbench log). But with SSDs, multiple tablespaces on different
 storage, etc. it gets much harder.
 
 True, but every use case for this information I can think of
 ultimately only cares about how long it took to perform some kind of
 IO; it doesn't *really* care about whether it was cached. So in that
 context, we don't really care if SSDs are fast enough that they look
 like cache, because that means they're performing (essentially) the
 same as cache.

Yup, that's right. The wait times are generally much more interesting
than the cached/not cached ratio.

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 That sounds good for other interesting things, which entails being able
 to have timing information attached to the XID sequence.  If we go this
 way, how far are we from having a ticker in PostgreSQL?

Those of us who are trying to get rid of idle-state process wakeups will
protest any such thing with vigor.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Ants Aasma
Sorry for taking so long to respong, had a pretty busy day at work. Anyway..

On Mon, Nov 28, 2011 at 9:54 AM, Greg Smith g...@2ndquadrant.com wrote:
 Oh no, it's party pooper time again.  Sorry I have to be the one to do it
 this round.  The real problem with this whole area is that we know there are
 systems floating around where the amount of time taken to grab timestamps
 like this is just terrible.  I've been annoyed enough by that problem to
 spend some time digging into why that is--seems to be a bunch of trivia
 around the multiple ways to collect time info on x86 systems--and after this
 CommitFest is over I was already hoping to dig through my notes and start
 quantifying that more.  So you can't really prove the overhead of this
 approach is acceptable just by showing two examples; we need to find one of
 the really terrible clocks and test there to get a real feel for the
 worst-case.

Sure, I know that the timing calls might be awfully slow. That's why I turned
it off by default. I saw that track_functions was already using this, so I
figured it was ok to have it potentially run very slowly.

 -Document the underlying problem and known workarounds, provide a way to
 test how bad the overhead is, and just throw our hands up and say sorry,
 you just can't instrument like this if someone has a slow system.

Some documentation about potential problems would definitely be good.
Same goes for a test tool. ISTM that fast accurate timing is just not
possible on all supported platforms. That doesn't seem like a good enough
justification to refuse implementing something useful for the majority that
do as long as it doesn't cause regressions for those that don't or
significant code complexity.

 -Have one of the PostgreSQL background processes keep track of a time
 estimate on its own, only periodically pausing to sync against the real
 time.  Then most calls to gettimeofday() can use that value instead.  I was
 thinking of that idea for slightly longer running things though; I doubt
 that can be made accurate enough to test instrument buffer

This would limit it to those cases where hundreds of milliseconds of jitter
or more don't bother all that much.

 And while I hate to kick off massive bike-shedding in your direction, I'm
 also afraid this area--collecting stats about how long individual operations
 take--will need a much wider ranging approach than just looking at the
 buffer cache ones.  If you step back and ask what do people expect here?,
 there's a pretty large number who really want something like Oracle's
 v$session_wait  and v$system_event interface for finding the underlying
 source of slow things.  There's enough demand for that that EnterpriseDB has
 even done some work in this area too; what I've been told about it suggests
 the code isn't a great fit for contribution to community PostgreSQL though.
  Like I said, this area is really messy and hard to get right.

Yeah, something like that should probably be something to strive for. I'll
ponder a bit more  about resource and latency tracking a general. Maybe the
question here should be about the cost/benefit ratio of having some utility
now vs maintaining/deprecating the user visible interface when a more
general framework will turn up.

 Something more ambitious like the v$ stuff would also take care of what
 you're doing here; I'm not sure that what you've done helps built it though.
  Please don't take that personally.  Part of one of my own instrumentation
 patches recently was rejected out of hand for the same reason, just not
 being general enough.

No problem, I understand that half-way solutions can be more trouble than
they're worth. I actually built this to help with performance testing an
application and thought it would be an interesting experience to try to
give the community back something.

On Mon, Nov 28, 2011 at 4:40 PM, Greg Stark st...@mit.edu wrote:
 I believe on most systems on modern linux kernels gettimeofday an its ilk
 will be a vsyscall and nearly as fast as a regular function call.

clock_gettime() is implemented as a vDSO since 2.6.23. gettimeofday() has
been user context callable since before git shows any history (2.6.12).

On Mon, Nov 28, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The other big problem for a patch of this sort is that it would bloat
 the stats file.

 Yes.  Which begs the question of why we need to measure this per-table.
 I would think per-tablespace would be sufficient.

Yeah, I figured that this is something that should be discussed. I
implemented per-table collection because I thought it might be useful for
tools to pick up and show a quick overview on which tables are causing the
most IO overhead for queries.

On Mon, Nov 28, 2011 at 8:10 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 Something good to know: in Linux the file
 /sys/devices/system/clocksource/clocksource0/current_clocksource
 lists the current clock source, and
 

Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Greg Smith

On 11/28/2011 05:51 AM, Robert Haas wrote:

On Mon, Nov 28, 2011 at 2:54 AM, Greg Smithg...@2ndquadrant.com  wrote:

The real problem with this whole area is that we know there are
systems floating around where the amount of time taken to grab timestamps
like this is just terrible.

Assuming the feature is off by default (and I can't imagine we'd
consider anything else), I don't see why that should be cause for
concern.  If the instrumentation creates too much system load, then
don't use it: simple as that.


It's not quite that simple though.  Releasing a performance measurement 
feature that itself can perform terribly under undocumented conditions 
has a wider downside than that.


Consider that people aren't going to turn it on until they are already 
overloaded.  If that has the potential to completely tank performance, 
we better make sure that area is at least explored usefully first; the 
minimum diligence should be to document that fact and make suggestions 
for avoiding or testing it.


Instrumentation that can itself become a performance problem is an 
advocacy problem waiting to happen.  As I write this I'm picturing such 
an encounter resulting in an angry blog post, about how this proves 
PostgreSQL isn't usable for serious systems because someone sees massive 
overhead turning this on.  Right now the primary exposure to this class 
of issue is EXPLAIN ANALYZE.  When I was working on my book, I went out 
of my way to find a worst case for that[1], and that turned out to be a 
query that went from 7.994ms to 69.837ms when instrumented.  I've been 
meaning to investigate what was up there since finding that one.  The 
fact that we already have one such problem bit exposed already worries 
me; I'd really prefer not to have two.


[1] (Dell Store 2 schema, query was SELECT count(*) FROM customers;)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-28 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 On 11/28/2011 05:51 AM, Robert Haas wrote:
 Assuming the feature is off by default (and I can't imagine we'd
 consider anything else), I don't see why that should be cause for
 concern.  If the instrumentation creates too much system load, then
 don't use it: simple as that.

 It's not quite that simple though.  Releasing a performance measurement 
 feature that itself can perform terribly under undocumented conditions 
 has a wider downside than that.

Yeah, that's a good point, and the machines on which this would suck
are exactly the ones where EXPLAIN ANALYZE creates very large overhead.
We don't seem to see a lot of complaints about that anymore, but we do
still see some ... and yes, it's documented that EXPLAIN ANALYZE can add
significant overhead, but that doesn't stop the questions.

 Instrumentation that can itself become a performance problem is an 
 advocacy problem waiting to happen.  As I write this I'm picturing such 
 an encounter resulting in an angry blog post, about how this proves 
 PostgreSQL isn't usable for serious systems because someone sees massive 
 overhead turning this on.

Of course, the rejoinder could be that if you see that, you're not
testing on serious hardware.  But still, I take your point.

 Right now the primary exposure to this class 
 of issue is EXPLAIN ANALYZE.  When I was working on my book, I went out 
 of my way to find a worst case for that[1],
 [1] (Dell Store 2 schema, query was SELECT count(*) FROM customers;)

That's pretty meaningless without saying what sort of clock hardware
was on the machine...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2011-11-27 Thread Greg Smith

On 11/27/2011 04:39 PM, Ants Aasma wrote:

On the AMD I saw about 3% performance drop with timing enabled. On the
Intel machine I couldn't measure any statistically significant change.


Oh no, it's party pooper time again.  Sorry I have to be the one to do 
it this round.  The real problem with this whole area is that we know 
there are systems floating around where the amount of time taken to grab 
timestamps like this is just terrible.  I've been annoyed enough by that 
problem to spend some time digging into why that is--seems to be a bunch 
of trivia around the multiple ways to collect time info on x86 
systems--and after this CommitFest is over I was already hoping to dig 
through my notes and start quantifying that more.  So you can't really 
prove the overhead of this approach is acceptable just by showing two 
examples; we need to find one of the really terrible clocks and test 
there to get a real feel for the worst-case.


I recall a patch similar to this one was submitted by Greg Stark some 
time ago.  It used the info for different reasons--to try and figure out 
whether reads were cached or not--but I believe it withered rather than 
being implemented mainly because it ran into the same fundamental 
roadblocks here.  My memory could be wrong here, there were also 
concerns about what the data would be used for.


I've been thinking about a few ways to try and cope with this whole 
class of timing problem:


-Document the underlying problem and known workarounds, provide a way to 
test how bad the overhead is, and just throw our hands up and say 
sorry, you just can't instrument like this if someone has a slow system.


-Have one of the PostgreSQL background processes keep track of a time 
estimate on its own, only periodically pausing to sync against the real 
time.  Then most calls to gettimeofday() can use that value instead.  I 
was thinking of that idea for slightly longer running things though; I 
doubt that can be made accurate enough to test instrument buffer


And while I hate to kick off massive bike-shedding in your direction, 
I'm also afraid this area--collecting stats about how long individual 
operations take--will need a much wider ranging approach than just 
looking at the buffer cache ones.  If you step back and ask what do 
people expect here?, there's a pretty large number who really want 
something like Oracle's v$session_wait  and v$system_event interface for 
finding the underlying source of slow things.  There's enough demand for 
that that EnterpriseDB has even done some work in this area too; what 
I've been told about it suggests the code isn't a great fit for 
contribution to community PostgreSQL though.  Like I said, this area is 
really messy and hard to get right.


Something more ambitious like the v$ stuff would also take care of what 
you're doing here; I'm not sure that what you've done helps built it 
though.  Please don't take that personally.  Part of one of my own 
instrumentation patches recently was rejected out of hand for the same 
reason, just not being general enough.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers