Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote: As for track_iotiming - track_io_timing, I'm fine with that as well. I'm still grumpy about the idea of a GUC changing the explain analyze output. How would people feel about adding an explain option that explicitly requests io timing for this explain analyze and then having the io timing be enabled if either it's requested by explain analyze or if it's set on globally? That would make it more consistent with the other explain analyze options? I realize I don't get to be grumpy without actually contributing anything, but I'm happy to write up the patch if people agree with the change. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Stark st...@mit.edu writes: On Sun, Apr 29, 2012 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote: As for track_iotiming - track_io_timing, I'm fine with that as well. I'm still grumpy about the idea of a GUC changing the explain analyze output. How would people feel about adding an explain option that explicitly requests io timing for this explain analyze and then having the io timing be enabled if either it's requested by explain analyze or if it's set on globally? That would make it more consistent with the other explain analyze options? I think it's going to be hard to decouple that altogether. For instance, if track_io_timing were not on but you did EXPLAIN (TIMING), you'd end up with timing info getting sent to the stats collector for just that one statement. That seems a bit weird too. I see where you're coming from but I don't think it's a good idea to add an EXPLAIN option unless you can make the two behaviors (EXPLAIN reporting and stats collection) truly independent. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: I like the idea of including the word block in there. I don't think it was probably a terribly good idea to abbreviate that to blk everywhere, but at this point it's probably better to be consistent, sigh. As for track_iotiming - track_io_timing, I'm fine with that as well. I made these changes, so I think we are done with the naming issues. However, I'd still like to propose that we think about adjusting the timing column datatypes, ie uniformly use float8 msec for values representing elapsed times. By my count there are six columns that would be affected: pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time The first four of these are new in 9.2, meaning that we would only be creating a compatibility issue for the last two. If we wait to do this in the future, we will have a significantly bigger problem than if we do it today. Advantages of the change are: * Better precision exposed to the user (pg_stat_user_functions has historically provided only millisecond precision). * Removal of arbitrary limit of microsecond precision. Of course, the underlying data is still no better than microsecond, but if we ever are able to migrate to OS APIs that return better-than-microsecond data, we won't have to adjust the stats view APIs to expose that data. * A chance to make the functions underlying these stats view columns agree with the exposed column definitions. Any objections out there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: ... You might want to revisit the issue of how the new columns in pg_stat_statements are named, as well. I am not sure I'm happy with that, but neither am I sure that I know what I'd like better. It's not too clear that the timing is specifically for data block reads and writes, for example. Well, the names time_read and time_write are certainly out of step with every other stats view in the system; everyplace else, such columns are named something_time (and even in this view itself the other timing column is total_time, not time_total). So that's got to change. We could just reverse the word order to read_time and write_time, or we could do something like buf_read_time or data_read_time. IIUC block_read_time/block_write_time in the pg_stat_database view are database-wide totals for the same numbers, so perhaps the pg_stat_statements column names should be consistent with those. I am kinda wondering though why those columns spell out block where every single other column name in the stats views uses the abbreviation blk. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
... btw, while I'm criticizing names, how about changing track_iotiming to track_io_timing? The former seems inelegant and unreadable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sat, Apr 28, 2012 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... You might want to revisit the issue of how the new columns in pg_stat_statements are named, as well. I am not sure I'm happy with that, but neither am I sure that I know what I'd like better. It's not too clear that the timing is specifically for data block reads and writes, for example. Well, the names time_read and time_write are certainly out of step with every other stats view in the system; everyplace else, such columns are named something_time (and even in this view itself the other timing column is total_time, not time_total). So that's got to change. We could just reverse the word order to read_time and write_time, or we could do something like buf_read_time or data_read_time. IIUC block_read_time/block_write_time in the pg_stat_database view are database-wide totals for the same numbers, so perhaps the pg_stat_statements column names should be consistent with those. I am kinda wondering though why those columns spell out block where every single other column name in the stats views uses the abbreviation blk. I like the idea of including the word block in there. I don't think it was probably a terribly good idea to abbreviate that to blk everywhere, but at this point it's probably better to be consistent, sigh. As for track_iotiming - track_io_timing, I'm fine with that as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: There's no particular reason to think that Moore's Law is going to result in an increase in the fractional precision of timing data. It hasn't done so in the past, for sure. Perhaps, but nobody's explained what we gain out of NOT using numeric. It's slow doesn't impress me; selecting from a system view doesn't need to be lightning-fast. Well, how about the code is going to be quite a lot less readable? C can manipulate floats natively, but not numerics. Also, as was pointed out upthread, the underlying data in shared memory is almost certainly never going to be infinite-precision; so using numeric in the API seems to me to be more likely to convey a false impression of exactness than to do anything useful. However, the main thing here is that we need to do *something* here... Agreed, this has got to be pushed forward. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: There's no particular reason to think that Moore's Law is going to result in an increase in the fractional precision of timing data. It hasn't done so in the past, for sure. Perhaps, but nobody's explained what we gain out of NOT using numeric. It's slow doesn't impress me; selecting from a system view doesn't need to be lightning-fast. Well, how about the code is going to be quite a lot less readable? C can manipulate floats natively, but not numerics. Also, as was pointed out upthread, the underlying data in shared memory is almost certainly never going to be infinite-precision; so using numeric in the API seems to me to be more likely to convey a false impression of exactness than to do anything useful. However, the main thing here is that we need to do *something* here... Agreed, this has got to be pushed forward. In the interest of furthering that goal, I propose that whoever is willing to take the time to clean this up gets to decide what to standardize on, and I'm happy to give you first crack at that if you have the cycles. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: However, the main thing here is that we need to do *something* here... Agreed, this has got to be pushed forward. In the interest of furthering that goal, I propose that whoever is willing to take the time to clean this up gets to decide what to standardize on, and I'm happy to give you first crack at that if you have the cycles. OK. I have just returned from some emergency family business, and have got assorted catching-up to do, but I will try to get to that later this week. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 25, 2012 at 1:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Apr 25, 2012 at 12:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: However, the main thing here is that we need to do *something* here... Agreed, this has got to be pushed forward. In the interest of furthering that goal, I propose that whoever is willing to take the time to clean this up gets to decide what to standardize on, and I'm happy to give you first crack at that if you have the cycles. OK. I have just returned from some emergency family business, and have got assorted catching-up to do, but I will try to get to that later this week. Sounds good to me. You might want to revisit the issue of how the new columns in pg_stat_statements are named, as well. I am not sure I'm happy with that, but neither am I sure that I know what I'd like better. It's not too clear that the timing is specifically for data block reads and writes, for example. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, as was pointed out upthread, the underlying data in shared memory is almost certainly never going to be infinite-precision; so using numeric in the API seems to me to be more likely to convey a false impression of exactness than to do anything useful. I don't think that follows. The underlyng data will be measured in some metric unit of time like microsecond or nanosecond or something like that. So a base-10 representation will show exactly the precision that the underlying data has. On the other hand a floating point number will show a base-2 approximation that may in fact display with more digits than the underlying data representation has. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Stark st...@mit.edu writes: On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, as was pointed out upthread, the underlying data in shared memory is almost certainly never going to be infinite-precision; so using numeric in the API seems to me to be more likely to convey a false impression of exactness than to do anything useful. I don't think that follows. The underlyng data will be measured in some metric unit of time like microsecond or nanosecond or something like that. So a base-10 representation will show exactly the precision that the underlying data has. On the other hand a floating point number will show a base-2 approximation that may in fact display with more digits than the underlying data representation has. My point is that the underlying data is going to be stored in a fixed-width representation, and therefore it will have accuracy and/or range limitations that are considerably more severe than use of numeric for output might suggest to the user. In the current pg_stat_statements code, timings are in fact accumulated in float8, and emitting them as something other than float8 is just plain misleading IMHO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 25, 2012 at 1:58 PM, Greg Stark st...@mit.edu wrote: On Wed, Apr 25, 2012 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also, as was pointed out upthread, the underlying data in shared memory is almost certainly never going to be infinite-precision; so using numeric in the API seems to me to be more likely to convey a false impression of exactness than to do anything useful. I don't think that follows. The underlyng data will be measured in some metric unit of time like microsecond or nanosecond or something like that. So a base-10 representation will show exactly the precision that the underlying data has. On the other hand a floating point number will show a base-2 approximation that may in fact display with more digits than the underlying data representation has. I wholeheartedly agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On mån, 2012-04-23 at 22:03 -0400, Robert Haas wrote: Perhaps, but nobody's explained what we gain out of NOT using numeric. So if you want to have possibly different internal and external representations, why not use interval for the external one? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Peter Eisentraut pete...@gmx.net writes: On mån, 2012-04-23 at 22:03 -0400, Robert Haas wrote: Perhaps, but nobody's explained what we gain out of NOT using numeric. So if you want to have possibly different internal and external representations, why not use interval for the external one? That doesn't add any usefulness, only extra complication for clients that want to do more arithmetic with the values. Also, as was pointed out earlier, we have a hard-coded restriction to microsecond precision with the default implementation of interval; and it's not hard to foresee the day when that won't do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 25, 2012 at 5:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On mån, 2012-04-23 at 22:03 -0400, Robert Haas wrote: Perhaps, but nobody's explained what we gain out of NOT using numeric. So if you want to have possibly different internal and external representations, why not use interval for the external one? That doesn't add any usefulness, only extra complication for clients that want to do more arithmetic with the values. Also, as was pointed out earlier, we have a hard-coded restriction to microsecond precision with the default implementation of interval; and it's not hard to foresee the day when that won't do. Agreed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sat, Apr 14, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The internal representation doesn't have to be (and certainly shouldn't be) numeric. But if you translate to numeric before returning the data to the user, then you have the freedom, in the future, to whack around the internal representation however you like, without breaking backward compatibility. Choosing float8 for the external representation is fine as long as we're sure we're not ever going to want more than 16 significant digits, but I see no particular value in baking in that assumption. But perhaps, as the saying goes, 16 digits ought to be enough for anyone. There's no particular reason to think that Moore's Law is going to result in an increase in the fractional precision of timing data. It hasn't done so in the past, for sure. Perhaps, but nobody's explained what we gain out of NOT using numeric. It's slow doesn't impress me; selecting from a system view doesn't need to be lightning-fast. However, the main thing here is that we need to do *something* here... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 04/13/2012 06:22 PM, Tom Lane wrote: But (a) I *don't* want to seriously break things, and don't see a need to; (b) interval is expensive and has got its own problems, notably an internal limitation to usec resolution that we would not be able to get rid of easily. A straight float seems pretty future proof compared to a usec resolution interval. Jim was commenting in the same direction I already did, that ns resolution is not impossible to see coming. I also expect to compute plenty of derived statistics from these numbers. Interval math is good enough that I'm sure such things could be done, but it seems odd to start with those units. I appreciate that the interval type has a nice purist feel to it. My pragmatic side says we're going to pay overhead to create in that type, only to find people end up converting it right back to other types for easier math tricks. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sat, Apr 14, 2012 at 3:27 AM, Greg Smith g...@2ndquadrant.com wrote: On 04/13/2012 06:22 PM, Tom Lane wrote: But (a) I *don't* want to seriously break things, and don't see a need to; (b) interval is expensive and has got its own problems, notably an internal limitation to usec resolution that we would not be able to get rid of easily. A straight float seems pretty future proof compared to a usec resolution interval. Jim was commenting in the same direction I already did, that ns resolution is not impossible to see coming. I also expect to compute plenty of derived statistics from these numbers. Interval math is good enough that I'm sure such things could be done, but it seems odd to start with those units. I appreciate that the interval type has a nice purist feel to it. My pragmatic side says we're going to pay overhead to create in that type, only to find people end up converting it right back to other types for easier math tricks. I'm still rooting for numeric. As somebody said upthread, performance ain't critical here; and that lets us whack around the internal representation however we like without worrying about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 10 April 2012 19:10, Tom Lane t...@sss.pgh.pa.us wrote: Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. Color me unimpressed ... numeric calculations are vastly more expensive than float, and where are you going to get timing data that has more than sixteen decimal digits of accuracy? +1 Besides, how do you propose to solve the problem of storing numerics in a fixed allocation of shared memory? The only comparable thing in pg_stat_statements is the query string, which is capped at track_activity_query_size bytes for this very reason. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 14 April 2012 02:42, Greg Stark st...@mit.edu wrote: Is this not subject to the birthday paradox? If you have a given hash you're worried about a collision with then you have a one-in-four-billion chance. But if you have a collection of hashes and you're worried about any collisions then it only takes about 64k before there's likely a collision. Just for the sake of the archives, assuming that there is a uniform distribution of values, by my calculations that puts the probability of collision at: pg_stat_statements.max of 1,000 = 0.00011562303995116263 and perhaps more representatively, if we follow the example in the docs: pg_stat_statements.max of 10,000 = 0.011496378237656368 It's enough of a problem that I'd expect to hear one or two complaints about it in the next few years, maybe. This is the probability of there being a collision, not the probability of someone caring about it. You probably wouldn't want to push your luck too far: pg_stat_statements.max of 100,000 = 0.6853509059051395 Even if you did, that would only mean that there was usually one, but perhaps two or three values that were collisions, out of an entire 100,000. To labour the point, you'd have to have a lot of bad luck for those to be the values that a human actually ended up caring about. Jim Nasby said upthread that selecting from the stats view isn't performance critical, and he's right. However, maintaining the stats themselves certainly is, since each and every query will have to update them, adding latency. pg_stat_statements is far from being a tool of minority interest, particularly now. People are going to want to add additional bells and whistles to it, which is fine by me, but I'm very much opposed to making everyone pay for additional features that imply performance overhead for all queries, particularly if the feature is of minority interest. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sat, Apr 14, 2012 at 9:54 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 10 April 2012 19:10, Tom Lane t...@sss.pgh.pa.us wrote: Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. Color me unimpressed ... numeric calculations are vastly more expensive than float, and where are you going to get timing data that has more than sixteen decimal digits of accuracy? +1 Besides, how do you propose to solve the problem of storing numerics in a fixed allocation of shared memory? The only comparable thing in pg_stat_statements is the query string, which is capped at track_activity_query_size bytes for this very reason. The internal representation doesn't have to be (and certainly shouldn't be) numeric. But if you translate to numeric before returning the data to the user, then you have the freedom, in the future, to whack around the internal representation however you like, without breaking backward compatibility. Choosing float8 for the external representation is fine as long as we're sure we're not ever going to want more than 16 significant digits, but I see no particular value in baking in that assumption. But perhaps, as the saying goes, 16 digits ought to be enough for anyone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: The internal representation doesn't have to be (and certainly shouldn't be) numeric. But if you translate to numeric before returning the data to the user, then you have the freedom, in the future, to whack around the internal representation however you like, without breaking backward compatibility. Choosing float8 for the external representation is fine as long as we're sure we're not ever going to want more than 16 significant digits, but I see no particular value in baking in that assumption. But perhaps, as the saying goes, 16 digits ought to be enough for anyone. There's no particular reason to think that Moore's Law is going to result in an increase in the fractional precision of timing data. It hasn't done so in the past, for sure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu k...@rice.edu wrote: By using all 64-bits of the hash that we currently calculate, instead of the current use of 32-bits only, the collision probabilities are very low. That's probably true, but I'm not sure it's worth worrying about - one-in-four-billion is a pretty small probability. On the broader issue, Peter's argument here seems to boil down to there is probably a reason why this is useful and Tom's argument seems to boil down to i don't want to expose it without knowing what that reason is. Peter may well be right, but that doesn't make Tom wrong. If we are going to expose this, we ought to be able to document why we have it, and right now we can't, because we don't *really* know what it's good for, other than raising awareness of the theoretical possibility of collisions, which doesn't seem like quite enough. On another note, I think this is a sufficiently major change that we ought to add Peter's name to the Author section of the pg_stat_statements documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On another note, I think this is a sufficiently major change that we ought to add Peter's name to the Author section of the pg_stat_statements documentation. +1 ... as long as we have those at all, they probably ought to credit anybody who did substantial work on the module. I think that eventually we'll have to give them up, just the way that we don't credit anybody in particular as author of the core code; but for now this is a good change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On another note, I think this is a sufficiently major change that we ought to add Peter's name to the Author section of the pg_stat_statements documentation. +1 ... as long as we have those at all, they probably ought to credit anybody who did substantial work on the module. I think that eventually we'll have to give them up, just the way that we don't credit anybody in particular as author of the core code; but for now this is a good change. Yeah. I'd actually be in favor of removing them, and similar references in the comments, because they do lead and have led to disputes about who deserves to be mentioned. However, a change of this magnitude certainly deserves mention; it's not really the same module any more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 13 April 2012 20:15, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu k...@rice.edu wrote: By using all 64-bits of the hash that we currently calculate, instead of the current use of 32-bits only, the collision probabilities are very low. That's probably true, but I'm not sure it's worth worrying about - one-in-four-billion is a pretty small probability. That assumes that our hashing of query trees will exhibit uniform distribution. While it's easy enough to prove that hash_any does that, it would seem to me that that does not imply that we will exhibit perfectly uniform distribution within pg_stat_statements. The reason that I invented the jumble nomenclature to distinguish it from a straight serialisation is that, apart from the fact that many fields are simply ignored, we still couldn't deserialize what we do store from the jumble, because the correct way to serialise a tree entails serialising NULL pointers too - two non-equivalent jumbles could actually be bitwise identical. In practice, I think that adding NodeTags ought to be sufficient here, but I wouldn't like to bet my life on it. Actually, that is a point that perhaps should have been touched on in the comments at the top of the file. You may wish to take a look at my original analysis in the pg_stat_statements normalisation thread, which attempts to quantify the odds by drawing upon the mathematics of the birthday problem. On the broader issue, Peter's argument here seems to boil down to there is probably a reason why this is useful and Tom's argument seems to boil down to i don't want to expose it without knowing what that reason is. Peter may well be right, but that doesn't make Tom wrong. If we are going to expose this, we ought to be able to document why we have it, and right now we can't, because we don't *really* know what it's good for, other than raising awareness of the theoretical possibility of collisions, which doesn't seem like quite enough. Well, it's important to realise that the query string isn't really stable, to the extent that it could differ as the query is evicted and re-enters pg_stat_statements over time. The hash value is necessarily a stable identifier, as it is the very identifier used by pg_stat_statements. People are going to want to aggregate this information over long periods and across database clusters, and I would really like to facilitate that. To be honest, with the plans that we have for replication, with the addition of things like cascading replication, I think it will increasingly be the case that people prefer built-in replication. The fact that the actual hash value is affected by factors like the endianness of the architecture in question seems mostly irrelevant. If we were to expose this, I'd suggest that the documentation in the table describing each column say of the value: query_hash | stable identifier used by pg_stat_statements for the query There'd then be additional clarification after the existing reference to the hash value, which gave the required caveats about the hash value being subject to various implementation artefacts that effectively only make the values persistently indentify queries referencing particular objects in the same database (that is, the object OID values are used), or across databases that are binary clones, such as between a streaming replica master and its standby. The OID restriction alone effectively shadows all others, so there's no need to mention endianness or anything like that. Anyone who jumped to the conclusion that their aggregation tool would work fine with Slony or something based on the query_hash description alone would probably have bigger problems. On another note, I think this is a sufficiently major change that we ought to add Peter's name to the Author section of the pg_stat_statements documentation. Thanks. I actually thought this myself, but didn't want to mention it because I didn't think that it was up to me to decide, or to attempt to influence that decision. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 4/10/12 5:07 PM, Greg Smith wrote: I'd prefer to see at least usec resolution and 8 bytes of dynamic range for query related statistics. Any of these would be fine from a UI perspective to me: -float8 seconds -float8 msec -float8 usec -int8 usec I don't think int8 msec will be enough resolution to time queries for very long, if it's not already obsolete. The committed example for pg_test_timing on good hardware already clocks trivial events at a single usec. Even I/O is getting there. I've measured my Fusion-io loaner card peaking at 8GB/s, which works out to 1 usec per 8K page. None of that is even price no object hardware today; it's the stuff sitting in my office. If anything, I'd expect more timing code in the database that only has ms resolution right now will start looking fat in a year or two, and more things might need to be shifted to usec instead. Checkpoint timing can survive having less resolution because its primary drumbeat is very unlikely to drop below the minutes range. I agree that ms is on it's way out... and frankly it wouldn't surprise me if at some point we actually had need of ns resolution. Given that, I don't think ms or us definitions make sense... float8 seconds seams much more logical to me. Though, if we're going to end up seriously breaking things anyway, perhaps it would make sense to switch everything over to interval... I realize that there's more overhead there, but I don't think selecting from the stats views is exactly performance critical. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On tis, 2012-04-10 at 09:33 -0400, Robert Haas wrote: So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? Wouldn't interval be the proper type to represent elapsed time? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Jim Nasby j...@nasby.net writes: I agree that ms is on it's way out... and frankly it wouldn't surprise me if at some point we actually had need of ns resolution. Given that, I don't think ms or us definitions make sense... float8 seconds seams much more logical to me. Well, the important point is that it be float8, so that fractions of whatever units you choose can be represented. I do not feel a strong need to change the units in all the existing pg_stat_ columns from msec to sec; that strikes me as just breaking things to little gain. If the majority of them were in sec then I'd want to converge on that, but since the majority are in msec it seems like the path of least breakage is to converge on that. Though, if we're going to end up seriously breaking things anyway, perhaps it would make sense to switch everything over to interval... I realize that there's more overhead there, but I don't think selecting from the stats views is exactly performance critical. But (a) I *don't* want to seriously break things, and don't see a need to; (b) interval is expensive and has got its own problems, notably an internal limitation to usec resolution that we would not be able to get rid of easily. It's not even apparent to me that interval is semantically The Right Thing for values that represent an accumulation of a lot of different measurements. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote: That's probably true, but I'm not sure it's worth worrying about - one-in-four-billion is a pretty small probability. Is this not subject to the birthday paradox? If you have a given hash you're worried about a collision with then you have a one-in-four-billion chance. But if you have a collection of hashes and you're worried about any collisions then it only takes about 64k before there's likely a collision. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Stark st...@mit.edu writes: On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote: That's probably true, but I'm not sure it's worth worrying about - one-in-four-billion is a pretty small probability. Is this not subject to the birthday paradox? If you have a given hash you're worried about a collision with then you have a one-in-four-billion chance. But if you have a collection of hashes and you're worried about any collisions then it only takes about 64k before there's likely a collision. ... so, if pg_stat_statements.max were set as high as 64k, you would need to worry. Realistically, I'm more worried about collisions due to inadequacies in the jumble calculation logic (Peter already pointed out some risk factors in that regard). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 14 April 2012 03:01, Tom Lane t...@sss.pgh.pa.us wrote: Realistically, I'm more worried about collisions due to inadequacies in the jumble calculation logic (Peter already pointed out some risk factors in that regard). It's important to have a sense of proportion about the problem. The worst thing that a collision can do is lead the DBA on a bit of a wild goose chase. Importantly, collisions across databases and users are impossible. I've always taken the view that aggregating query statistics is a lossy process, and these kinds of problems seem like a more than acceptable price to pay for low-overhead dynamic query statistics . -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 10:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas robertmh...@gmail.com wrote: That's probably true, but I'm not sure it's worth worrying about - one-in-four-billion is a pretty small probability. Is this not subject to the birthday paradox? If you have a given hash you're worried about a collision with then you have a one-in-four-billion chance. But if you have a collection of hashes and you're worried about any collisions then it only takes about 64k before there's likely a collision. ... so, if pg_stat_statements.max were set as high as 64k, you would need to worry. Well... at 64k, you'd be very likely to have a collision. But the whole birthday paradox thing means that there's a non-trivial collision probability even at lower numbers of entries. Seems like maybe we ought to be using 64 bits here... Realistically, I'm more worried about collisions due to inadequacies in the jumble calculation logic (Peter already pointed out some risk factors in that regard). ...especially if collisions are even more frequent than random chance would suggest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 11, 2012 at 01:53:06AM +0100, Peter Geoghegan wrote: On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - Indeed, but the hash value isn't stable either given those sorts of assumptions, so I'm not convinced that there's any advantage there. Isn't it? The hash captures the true meaning of the query, while having the database server's platform as a usually irrelevant artefact. Another thing that I forgot to mention is client encoding - it may well be fairly inconvenient to have to use the same algorithm to hash the query string across applications. You also have to hash the query string yourself again and again, which is expensive to do from Python or something, and is often inconvenient - differences beyond track_activity_query_size bytes (default:1024) are not recognised. Using an SQL code beautifier where a single byte varies now breaks everything, which developers don't expect at all (we've trained them not to), so in many ways you're back to the same limitations as classic pg_stat_statements if you attempt to aggregate queries over time and across machines, which is a very real use case. It's probably pretty annoying to have to get your Python app to use the same hash function as your Java app or whatever I, unless you want to use something heavyweight like a cryptographic hash function. By doing it within Postgres, you avoid those headaches. I'm not asking you to very loudly proclaim that it should be used like this - just expose it, accurately document it, and I'm quite confident that it will be widely used and relied upon by those that are reasonably well informed, and understand its limitations, which are really quite straightforward. What I think people would actually like to know, if they're in a situation where distinct query texts are getting hashed to the same thing, is *which* different texts got hashed to the same thing. But there's no good way to expose that given the pg_stat_statements infrastructure, and exposing the hash value doesn't help. Apart from detecting the case where we get a straightforward collision, I don't expect that that would be useful. The whole point is that the user doesn't care about the difference, and I think we've specified a practical, widely useful standard for when queries should be considered equivalent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services By using all 64-bits of the hash that we currently calculate, instead of the current use of 32-bits only, the collision probabilities are very low. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Thu, Apr 5, 2012 at 11:45 AM, Robert Haas robertmh...@gmail.com wrote: Meanwhile, pg_stat_statements converts the same data to seconds but makes it a double rather than a bigint. I think that was a bad idea and we should make it consistent use a bigint and milliseconds while we still can. Hmm. So, on further review, this is not as simple as it seems. I'd like some input from other people on what we should do here. pg_stat_statements has long exposed a column called total_time as a float8. It now exposes columns time_read and time_write which are actually measuring the time spent reading and writing data blocks, and those are also exposed as a float8; all these count seconds. Meanwhile, all times exposed by the stats collector (including the new and analagous pg_stat_database.block_read_time and pg_stat_database.block_write_time columns) are exposed as int8; these count milliseconds. So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: Hmm. So, on further review, this is not as simple as it seems. I'd like some input from other people on what we should do here. pg_stat_statements has long exposed a column called total_time as a float8. It now exposes columns time_read and time_write which are actually measuring the time spent reading and writing data blocks, and those are also exposed as a float8; all these count seconds. Meanwhile, all times exposed by the stats collector (including the new and analagous pg_stat_database.block_read_time and pg_stat_database.block_write_time columns) are exposed as int8; these count milliseconds. So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? Given that we've whacked pg_stat_statements' behavior around rather thoroughly in this release, maybe we could get away with redefining total_time as being measured in msec rather than sec, thereby aligning units as msec across the board. It's arguably a smaller deal than the way we've redefined what the query column contains... float8 vs int8 is a distinct issue, and probably one that is not as much of an impact on clients if we change it. It is not hard to predict that somebody will eventually want sub-msec resolution on these things, which would suggest that float8 would be the better idea. But perhaps we could leave that change for a future release. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 10 April 2012 14:33, Robert Haas robertmh...@gmail.com wrote: So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? I believe that we should keep them as float8, on the basis that a user is more likely to generalise from total_time's format (when writing a script to query the view of whatever) than from that of pg_stat_database. A part of me would like to change the view definitions so that all the columns are strongly typed (i.e. all these values would be intervals). I realise that that isn't practical though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Hmm. So, on further review, this is not as simple as it seems. I'd like some input from other people on what we should do here. pg_stat_statements has long exposed a column called total_time as a float8. It now exposes columns time_read and time_write which are actually measuring the time spent reading and writing data blocks, and those are also exposed as a float8; all these count seconds. Meanwhile, all times exposed by the stats collector (including the new and analagous pg_stat_database.block_read_time and pg_stat_database.block_write_time columns) are exposed as int8; these count milliseconds. So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? Given that we've whacked pg_stat_statements' behavior around rather thoroughly in this release, maybe we could get away with redefining total_time as being measured in msec rather than sec, thereby aligning units as msec across the board. It's arguably a smaller deal than the way we've redefined what the query column contains... Retyping columns is an awfully good way to produce grumpy users. Then again, if we're going to do it, it would certainly be better to do it now rather than later, because right now I'm guessing pg_stat_statements is a lot less heavily used than it will be after 9.2 hits shelves. float8 vs int8 is a distinct issue, and probably one that is not as much of an impact on clients if we change it. It is not hard to predict that somebody will eventually want sub-msec resolution on these things, which would suggest that float8 would be the better idea. But perhaps we could leave that change for a future release. Well, internally, the I/O timing stuff as well as the function timing stuff use microseconds, and the SQL functions expose it as microseconds, but then the view divides by 1000 to convert to milliseconds. I made the I/O timing stuff do it that way because that's how the function timing stuff does it, but it does seem a little random. One thing in its favor is that it provides a way for users to get this if they want it, without screwing readability for the vast majority who don't care. On the flip side, the new checkpoint timing stuff is in milliseconds all the way through, though it seems vanishingly unlikely that anyone needs more resolution in that case. We have lots of other things in milliseconds, too. No matter what we end up doing here it will be consistent with something; I am reminded of the phrase the good thing about standards is that there are so many to choose from -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: No matter what we end up doing here it will be consistent with something; I am reminded of the phrase the good thing about standards is that there are so many to choose from Well, FWIW I vote for making the new columns be float8 msec. If you don't want to change total_time to match, I guess there's no law that says it *has* to be consistent ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Hmm. So, on further review, this is not as simple as it seems. I'd like some input from other people on what we should do here. pg_stat_statements has long exposed a column called total_time as a float8. It now exposes columns time_read and time_write which are actually measuring the time spent reading and writing data blocks, and those are also exposed as a float8; all these count seconds. Meanwhile, all times exposed by the stats collector (including the new and analagous pg_stat_database.block_read_time and pg_stat_database.block_write_time columns) are exposed as int8; these count milliseconds. So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? Given that we've whacked pg_stat_statements' behavior around rather thoroughly in this release, maybe we could get away with redefining total_time as being measured in msec rather than sec, thereby aligning units as msec across the board. It's arguably a smaller deal than the way we've redefined what the query column contains... Retyping columns is an awfully good way to produce grumpy users. Then again, if we're going to do it, it would certainly be better to do it now rather than later, because right now I'm guessing pg_stat_statements is a lot less heavily used than it will be after 9.2 hits shelves. Agreed. It's better if we can also change the name of it - provided we can come up with a reasonable new name. Then peoples applications will break *visibly*, which is a lot better than breaking invisibly. (This is the main reason why we renamed current_query in pg_stat_activity..) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Magnus Hagander mag...@hagander.net writes: On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote: Given that we've whacked pg_stat_statements' behavior around rather thoroughly in this release, maybe we could get away with redefining total_time as being measured in msec rather than sec, thereby aligning units as msec across the board. It's arguably a smaller deal than the way we've redefined what the query column contains... Retyping columns is an awfully good way to produce grumpy users. Then again, if we're going to do it, it would certainly be better to do it now rather than later, because right now I'm guessing pg_stat_statements is a lot less heavily used than it will be after 9.2 hits shelves. Agreed. It's better if we can also change the name of it - provided we can come up with a reasonable new name. Then peoples applications will break *visibly*, which is a lot better than breaking invisibly. (This is the main reason why we renamed current_query in pg_stat_activity..) That might be overkill. Changing the column name will definitely break anything more specific than select * from pg_stat_statements. However, it's less clear that changing the units in which the column is expressed will break things. It seems likely to me that nobody out there is doing anything much more sophisticated than sorting by the column, and that's still going to work the same. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 18:27, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Apr 10, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 10, 2012 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote: Given that we've whacked pg_stat_statements' behavior around rather thoroughly in this release, maybe we could get away with redefining total_time as being measured in msec rather than sec, thereby aligning units as msec across the board. It's arguably a smaller deal than the way we've redefined what the query column contains... Retyping columns is an awfully good way to produce grumpy users. Then again, if we're going to do it, it would certainly be better to do it now rather than later, because right now I'm guessing pg_stat_statements is a lot less heavily used than it will be after 9.2 hits shelves. Agreed. It's better if we can also change the name of it - provided we can come up with a reasonable new name. Then peoples applications will break *visibly*, which is a lot better than breaking invisibly. (This is the main reason why we renamed current_query in pg_stat_activity..) That might be overkill. Changing the column name will definitely break anything more specific than select * from pg_stat_statements. However, it's less clear that changing the units in which the column is expressed will break things. It seems likely to me that nobody out there is doing anything much more sophisticated than sorting by the column, and that's still going to work the same. I've seen cases where the timing is correlated with external timings, e.g. from the application. Have I seen it a lot? No - but then I haven't seen a big usage of pg_stat_statements either, which might be the better argument for allowing a change of unit but not name. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: No matter what we end up doing here it will be consistent with something; I am reminded of the phrase the good thing about standards is that there are so many to choose from Well, FWIW I vote for making the new columns be float8 msec. If you don't want to change total_time to match, I guess there's no law that says it *has* to be consistent ... Ugh. So the three ways of doing timing that we have already aren't enough, and we need a fourth one? Ack! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, FWIW I vote for making the new columns be float8 msec. Ugh. So the three ways of doing timing that we have already aren't enough, and we need a fourth one? Ack! Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, FWIW I vote for making the new columns be float8 msec. Ugh. So the three ways of doing timing that we have already aren't enough, and we need a fourth one? Ack! Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? We also have int8 usec floating around. But even if we didn't, float8 msec would be a new one, regardless of whether it would be third or fourth... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? We also have int8 usec floating around. But even if we didn't, float8 msec would be a new one, regardless of whether it would be third or fourth... It would still be the second one, because it would replace the only use of float8 sec, no? And more to the point, it converges us on msec being the only exposed unit. The business about underlying microseconds is maybe not so good, but I don't think we want to touch that right now. In the long run I think it would make sense to converge on float8 msec as being the standard for exposed timing values, because that is readily adaptable to the underlying data having nsec or even better precision. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? We also have int8 usec floating around. But even if we didn't, float8 msec would be a new one, regardless of whether it would be third or fourth... It would still be the second one, because it would replace the only use of float8 sec, no? And more to the point, it converges us on msec being the only exposed unit. The business about underlying microseconds is maybe not so good, but I don't think we want to touch that right now. In the long run I think it would make sense to converge on float8 msec as being the standard for exposed timing values, because that is readily adaptable to the underlying data having nsec or even better precision. Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 02:01:02PM -0400, Robert Haas wrote: On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Huh? I understood what you said upthread to be that we have two ways in existing releases (anything unreleased has zero standing in this discussion): float8 sec in pg_stat_statements.total_time, and int8 msec everywhere else. Did I miss something? We also have int8 usec floating around. But even if we didn't, float8 msec would be a new one, regardless of whether it would be third or fourth... It would still be the second one, because it would replace the only use of float8 sec, no? And more to the point, it converges us on msec being the only exposed unit. The business about underlying microseconds is maybe not so good, but I don't think we want to touch that right now. In the long run I think it would make sense to converge on float8 msec as being the standard for exposed timing values, because that is readily adaptable to the underlying data having nsec or even better precision. Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company They are also a lot bigger with tons of added overhead. :) Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Tue, Apr 10, 2012 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: The business about underlying microseconds is maybe not so good, but I don't think we want to touch that right now. In the long run I think it would make sense to converge on float8 msec as being the standard for exposed timing values, because that is readily adaptable to the underlying data having nsec or even better precision. Hmm. Maybe we should think about numeric ms, which would have all the same advantages but without the round-off error. Color me unimpressed ... numeric calculations are vastly more expensive than float, and where are you going to get timing data that has more than sixteen decimal digits of accuracy? IME we're lucky to get three repeatable digits in any timing measurement. The point of using a non-integer type here is not so much precision as dynamic range: sometimes you might be measuring queries that run for hours, and other times ones that run for microseconds. In the latter case it's important to be able to represent nanoseconds, but not so much in the former. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 04/10/2012 12:27 PM, Tom Lane wrote: Changing the column name will definitely break anything more specific than select * from pg_stat_statements. However, it's less clear that changing the units in which the column is expressed will break things. It seems likely to me that nobody out there is doing anything much more sophisticated than sorting by the column, and that's still going to work the same. I am doing more sophisticated things with it, so I'll celebrate this as my opportunity to say I did something you didn't see coming for 2012. All the sites involved will happily shred those scripts and rewrite for either normalized queries *or* better I/O timing info though, so net positive for 9.2 changes even if this part breaks on them. I think this is one of those rare opportunities where there's enough positive goodwill from changes to ask what's the best way to handle this long-term? and get away with whatever change that requires, too. I'm really not liking the look of this wart now that Robert has pointed it out. I'd prefer to see at least usec resolution and 8 bytes of dynamic range for query related statistics. Any of these would be fine from a UI perspective to me: -float8 seconds -float8 msec -float8 usec -int8 usec I don't think int8 msec will be enough resolution to time queries for very long, if it's not already obsolete. The committed example for pg_test_timing on good hardware already clocks trivial events at a single usec. Even I/O is getting there. I've measured my Fusion-io loaner card peaking at 8GB/s, which works out to 1 usec per 8K page. None of that is even price no object hardware today; it's the stuff sitting in my office. If anything, I'd expect more timing code in the database that only has ms resolution right now will start looking fat in a year or two, and more things might need to be shifted to usec instead. Checkpoint timing can survive having less resolution because its primary drumbeat is very unlikely to drop below the minutes range. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote: On 04/10/2012 12:27 PM, Tom Lane wrote: I am doing more sophisticated things with it, so I'll celebrate this as my opportunity to say I did something you didn't see coming for 2012. This is why I requested that we expose the query_id hash value - I believe that it will be generally useful in clustering situations. It would be nice to have a persistent identifier. While we're discussing revising pg_stat_statement's interface, are you still opposed to exposing that value, Tom? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Apr 10, 2012 at 6:32 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote: On 04/10/2012 12:27 PM, Tom Lane wrote: I am doing more sophisticated things with it, so I'll celebrate this as my opportunity to say I did something you didn't see coming for 2012. This is why I requested that we expose the query_id hash value - I believe that it will be generally useful in clustering situations. It would be nice to have a persistent identifier. While we're discussing revising pg_stat_statement's interface, are you still opposed to exposing that value, Tom? If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? The only obvious advantage of exposing the value used internally is that it might be helpful in terms of understanding the collision behavior. But hopefully collisions are pretty rare anyway, so... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Peter Geoghegan pe...@2ndquadrant.com writes: On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote: On 04/10/2012 12:27 PM, Tom Lane wrote: I am doing more sophisticated things with it, so I'll celebrate this as my opportunity to say I did something you didn't see coming for 2012. This is why I requested that we expose the query_id hash value - I believe that it will be generally useful in clustering situations. It would be nice to have a persistent identifier. While we're discussing revising pg_stat_statement's interface, are you still opposed to exposing that value, Tom? I still am. I'm unconvinced by references to clustering situations, because as constructed the hash is extremely database-specific. It will vary depending on OID assignments, not to mention platform characteristics such as word width and endianness. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - you're basically throwing the benefit of all of that intelligent normalisation out of the window, because it's pretty close to free to expose it. What if a developer tweaks an alias in the application for clarity? Also, as you point out, it has additional utility in advertising when a collision has happened, and setting the user's expectations appropriately. I assume that collisions are very rare, but when they do happen, this gives you a fighting chance of noticing them. As Tom points out, the query hash will vary according to platform specific characteristics, including endianness, and will require OIDs are the same on every node. However, it is still going to be useful in clusters that use streaming replication, though not a third party trigger based replication system for example, because streaming replication does of course require that those factors (and rather a lot more) will be identical across the cluster anyway. Realistically, I'd expect a large majority of people interested in this feature to only want to use it with streaming replication anyway. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Peter Geoghegan pe...@2ndquadrant.com writes: On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - Indeed, but the hash value isn't stable either given those sorts of assumptions, so I'm not convinced that there's any advantage there. What I think people would actually like to know, if they're in a situation where distinct query texts are getting hashed to the same thing, is *which* different texts got hashed to the same thing. But there's no good way to expose that given the pg_stat_statements infrastructure, and exposing the hash value doesn't help. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - Indeed, but the hash value isn't stable either given those sorts of assumptions, so I'm not convinced that there's any advantage there. Isn't it? The hash captures the true meaning of the query, while having the database server's platform as a usually irrelevant artefact. Another thing that I forgot to mention is client encoding - it may well be fairly inconvenient to have to use the same algorithm to hash the query string across applications. You also have to hash the query string yourself again and again, which is expensive to do from Python or something, and is often inconvenient - differences beyond track_activity_query_size bytes (default:1024) are not recognised. Using an SQL code beautifier where a single byte varies now breaks everything, which developers don't expect at all (we've trained them not to), so in many ways you're back to the same limitations as classic pg_stat_statements if you attempt to aggregate queries over time and across machines, which is a very real use case. It's probably pretty annoying to have to get your Python app to use the same hash function as your Java app or whatever I, unless you want to use something heavyweight like a cryptographic hash function. By doing it within Postgres, you avoid those headaches. I'm not asking you to very loudly proclaim that it should be used like this - just expose it, accurately document it, and I'm quite confident that it will be widely used and relied upon by those that are reasonably well informed, and understand its limitations, which are really quite straightforward. What I think people would actually like to know, if they're in a situation where distinct query texts are getting hashed to the same thing, is *which* different texts got hashed to the same thing. But there's no good way to expose that given the pg_stat_statements infrastructure, and exposing the hash value doesn't help. Apart from detecting the case where we get a straightforward collision, I don't expect that that would be useful. The whole point is that the user doesn't care about the difference, and I think we've specified a practical, widely useful standard for when queries should be considered equivalent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 8:30 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Mar 27, 2012 at 3:22 PM, Ants Aasma a...@cybertec.at wrote: On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. Thank you for working on this. Taking a fresh look at it, I agree with you that conflating waiting for backend local IO, and IO performed by some other backend might paint us into a corner. For most workloads the wait for IO performed by others should be the minority anyway. I won't really miss the per table stats. But if the pg_stat_statements normalisation patch gets commited, it would be really neat to also have IO waits there. It would be much easier to quickly diagnose what is causing all this IO issues. So, the pg_stat_statements part of this is committed now. Do you want to prepare a revised patch to add per-database counters to the statistics collector? I think that might be a good idea... Hearing nothing further on this point, I went and did it myself. In the process I noticed a couple of things that I think we need to fix. Currently, the statistics views that include timing information are displayed in milliseconds (see pg_stat_user_functions), while the underlying SQL-callable functions return the data in microseconds. Whether or not this was a good design decision, we're stuck with it now; the documentation implies that the views and functions use the same units. I'll go fix that next. Meanwhile, pg_stat_statements converts the same data to seconds but makes it a double rather than a bigint. I think that was a bad idea and we should make it consistent use a bigint and milliseconds while we still can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Thu, Mar 22, 2012 at 7:38 PM, Ants Aasma a...@cybertec.at wrote: On Wed, Mar 21, 2012 at 5:01 PM, Robert Haas robertmh...@gmail.com wrote: This seems to have bitrotted again. :-( Can you please rebase again? Rebased. I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 2:58 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 22, 2012 at 7:38 PM, Ants Aasma a...@cybertec.at wrote: On Wed, Mar 21, 2012 at 5:01 PM, Robert Haas robertmh...@gmail.com wrote: This seems to have bitrotted again. :-( Can you please rebase again? Rebased. I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. And I've now committed the pg_stat_statements code as well, hopefully not stomping too badly one what Tom's apparently in the midst of doing with Peter's pg_stat_statements patch. I committed this almost exactly as submitted; just a minor code style correction and a few documentation enhancements. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. Thank you for working on this. Taking a fresh look at it, I agree with you that conflating waiting for backend local IO, and IO performed by some other backend might paint us into a corner. For most workloads the wait for IO performed by others should be the minority anyway. I won't really miss the per table stats. But if the pg_stat_statements normalisation patch gets commited, it would be really neat to also have IO waits there. It would be much easier to quickly diagnose what is causing all this IO issues. Thanks again, Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 3:22 PM, Ants Aasma a...@cybertec.at wrote: On Tue, Mar 27, 2012 at 9:58 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. Thank you for working on this. Taking a fresh look at it, I agree with you that conflating waiting for backend local IO, and IO performed by some other backend might paint us into a corner. For most workloads the wait for IO performed by others should be the minority anyway. I won't really miss the per table stats. But if the pg_stat_statements normalisation patch gets commited, it would be really neat to also have IO waits there. It would be much easier to quickly diagnose what is causing all this IO issues. So, the pg_stat_statements part of this is committed now. Do you want to prepare a revised patch to add per-database counters to the statistics collector? I think that might be a good idea... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 7:58 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. Maybe I missed some earlier discussoin -- I've been having trouble keeping up with the lists. But was there discussion of why this is a GUC? Why not just another parameter to EXPLAIN like the others? i.e. EXPLAIN (ANALYZE, BUFFERS, IOTIMING) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Tue, Mar 27, 2012 at 8:41 PM, Greg Stark st...@mit.edu wrote: On Tue, Mar 27, 2012 at 7:58 PM, Robert Haas robertmh...@gmail.com wrote: I've committed the core of this. I left out the stats collector stuff, because it's still per-table and I think perhaps we should back off to just per-database. I changed it so that it does not conflate wait time with I/O time. Maybe there should be a separate method of measuring wait time, but I don't think it's a good idea for the per-backend stats to measure a different thing than what gets reported up to the stats collector - we should have ONE definition of each counter. I also tweaked the EXPLAIN output format a bit, and the docs. Maybe I missed some earlier discussoin -- I've been having trouble keeping up with the lists. But was there discussion of why this is a GUC? Why not just another parameter to EXPLAIN like the others? i.e. EXPLAIN (ANALYZE, BUFFERS, IOTIMING) Because you want to be able to expose the data even for queries that aren't explained. Right now, you can do that with pg_stat_statements; and the original patch also had per-table counters, but I didn't commit that part due to some concerns about stats bloat. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Feb 24, 2012 at 2:23 PM, Ants Aasma ants.aa...@eesti.ee wrote: On Wed, Feb 22, 2012 at 6:35 PM, Ants Aasma ants.aa...@eesti.ee wrote: Some implementation notes. This currently fails regression test create_function_3, haven't looked into why yet. I'll take a look at it. The failure was due to leakproof changes to pgproc. Attached patches are adjusted accordingly and rebased over Robert's blocks dirtied patch. This seems to have bitrotted again. :-( Can you please rebase again? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 2/22/12 10:35 AM, Ants Aasma wrote: The reason why I didn't add write timings to relation stats is that I couldn't figure out what the semantics should be. It could be either time spent waiting for this relations blocks to be written out or time spent waiting for some other relations blocks to be written out to free space for this relations block or maybe distribute the cost, background writes could be included or excluded. Writes usually return quickly, unless lots of possibly unrelated writes have dirtied enough of OS cache, etc. I figured that what ever choices I made, they wouldn't really help anyone diagnose anything. Having global write timings in pg_stat_bgwriter might be useful, but I feel that is something for another patch. I know it's not perfect, but I would argue that what users care about most of the time is time taken up in actual backends. So I wouldn't worry about bgwriter. I also wouldn't worry about time spent waiting to find a buffer at this point (see below). I still think a full wait timing interface is the right long-term direction here. It's hard to reject this idea when it seems to be working right now though, while more comprehensive wait storage is still at least a release off. Opinions welcome, I'm still juggling this around now that I have it working again. I agree that wait timing interface is the right direction. I have thought a bit about it and could share some ideas - maybe I should create a wiki page where the general design could be hashed out? Yes, I think a wiki would be a good place to start. As you showed in your previous question about writes there's a *lot* of places where timing info would be useful to us. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Feb 22, 2012 at 4:43 PM, Greg Smith g...@2ndquadrant.com wrote: Attached are updated versions of this feature without the pg_test_timing tool part, since I broke that out into another discussion thread. I've split the part that updates pg_stat_statistics out from the main feature too, separate patch attached to here (but I'm not reviewing that yet). Lots of bitrot since this was submitted, and yes I noticed that I've almost recreated earlier versions of this patch--by splitting off the parts that were developed later. Thanks for the review and splitting. Sorry I didn't fix up the bit rot myself. Earlier discussion of this got side tracked on a few things, partly my fault. It's worth taking a look at what this provides before judging it too much. It can demo well. The stated purpose is helping figure out what relations are gobbling up the most access time, presumably to optimize them and/or the storage they are on. What do I put onto SSD is surely a popular request nowadays. I should have stated the purpose more clearly. The original reason for developing this patch was to figure out what queries are taking the most time and why, specifically in the case where OS memory is a lot larger than shared_buffers. Basically the following query to get a quick overview where the bottlenecks are: SELECT query, total_time, (time_read+time_write)/total_time AS io_fraction FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20; This of course hugely benefits from Peter's pg_stat_statements normalization patch. Tracking timings per relation was actually an afterthought. Now, the first critical question to ask is what additional information is this providing above the existing counters? After all, it's possible to tell pgbench_accounts is the hotspot just from comparing heap_blks_read, right? Like I said above, I find it mostly useful to see what is missing the OS cache. With memory being as cheap as it is, a reasonably priced server can have 128G of memory, while max recommended value for shared_buffers is 8GB. It's quite likely to have tables that fit into OS cache but not into shared_buffers, but it's not trivial to figure out which those are. This run looks useful at providing the data wished for--that read times are slower per capita from the accounts table. The first time I tried this I got a bizarre high number for pgbench_branches.heap_blks_time ; I'm not sure how reliable this is yet. One problem that might be easy to fix is that the write timing info doesn't show in any of these system views, only in EXPLAIN and statement level ones. I'm not sure about the source of the huge number, might instability in the clock source. Have you tried running the monotonicity check for a longer period while the system is under load? Another issue with the current timing code is that gettimeofday isn't guaranteed to be monotonic anyway, things like NTP adjustments can make time go backwards. clock_gettime with CLOCK_MONOTONIC_RAW would be better, but that's linux specific :( The reason why I didn't add write timings to relation stats is that I couldn't figure out what the semantics should be. It could be either time spent waiting for this relations blocks to be written out or time spent waiting for some other relations blocks to be written out to free space for this relations block or maybe distribute the cost, background writes could be included or excluded. Writes usually return quickly, unless lots of possibly unrelated writes have dirtied enough of OS cache, etc. I figured that what ever choices I made, they wouldn't really help anyone diagnose anything. Having global write timings in pg_stat_bgwriter might be useful, but I feel that is something for another patch. I still think a full wait timing interface is the right long-term direction here. It's hard to reject this idea when it seems to be working right now though, while more comprehensive wait storage is still at least a release off. Opinions welcome, I'm still juggling this around now that I have it working again. I agree that wait timing interface is the right direction. I have thought a bit about it and could share some ideas - maybe I should create a wiki page where the general design could be hashed out? Anyway, the user visible information from this patch should be trivial to extract from a general wait timing framework. Pushing my own agenda a bit - having this patch in the current release would help to get some field experience on any issues surrounding timing :) Some implementation notes. This currently fails regression test create_function_3, haven't looked into why yet. I'll take a look at it. Thanks again. -- Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Attached is the pg_test_timing utility portion of this submission, broken out into its own patch. It's a contrib module modeled on pg_test_fsync. The documentation is still a bit rough, I'm not done with that yet. I have included an example of good timing results, switching to a bad clock source, and the resulting bad results. Code review found some formatting things to nitpick I've already fixed: non-standard brace locations and not including enough spaces in expressions were the main two. This is now referenced by the existing cryptic documentation comment around EXPLAIN ANALYZE, which says that overhead can be high because gettimeofday is slow on some systems. Since this utility measures that directly, I think it's a clear win to include it just for that purpose. The fact that there are more places coming where timing overhead matters is also true. But this existing one is already bad enough to justify shipping something to help measure/manage it in my mind. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com diff --git a/contrib/Makefile b/contrib/Makefile index 0c238aa..45b601c 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_test_fsync \ + pg_test_timing \ pg_trgm \ pg_upgrade \ pg_upgrade_support \ diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile new file mode 100644 index 000..b8b266a --- /dev/null +++ b/contrib/pg_test_timing/Makefile @@ -0,0 +1,18 @@ +# contrib/pg_test_timing/Makefile + +PGFILEDESC = pg_test_timing - test timing overhead +PGAPPICON = win32 + +PROGRAM = pg_test_timing +OBJS = pg_test_timing.o + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_test_timing +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c new file mode 100644 index 000..bcf3c3a --- /dev/null +++ b/contrib/pg_test_timing/pg_test_timing.c @@ -0,0 +1,157 @@ +/* + * pg_test_timing.c + * tests overhead of timing calls and their monoticity: that + * they always move forward + */ + +#include postgres_fe.h + +#include getopt_long.h +#include portability/instr_time.h + +static const char *progname; + +static int32 test_duration = 3; + +static void handle_args(int argc, char *argv[]); +static void test_timing(int32); + +int +main(int argc, char *argv[]) +{ + progname = get_progname(argv[0]); + + handle_args(argc, argv); + + test_timing(test_duration); + + return 0; +} + +static void +handle_args(int argc, char *argv[]) +{ + static struct option long_options[] = { + {duration, required_argument, NULL, 'd'}, + {NULL, 0, NULL, 0} + }; + int option; /* Command line option */ + int optindex = 0; /* used by getopt_long */ + + if (argc 1) + { + if (strcmp(argv[1], --help) == 0 || strcmp(argv[1], -h) == 0 || + strcmp(argv[1], -?) == 0) + { + printf(Usage: %s [-d DURATION]\n, progname); + exit(0); + } + if (strcmp(argv[1], --version) == 0 || strcmp(argv[1], -V) == 0) + { + puts(pg_test_timing (PostgreSQL) PG_VERSION); + exit(0); + } + } + + while ((option = getopt_long(argc, argv, d:, + long_options, optindex)) != -1) + { + switch (option) + { + case 'd': +test_duration = atoi(optarg); +break; + + default: +fprintf(stderr, Try \%s --help\ for more information.\n, + progname); +exit(1); +break; + } + } + + if (argc optind) + { + fprintf(stderr, +%s: too many command-line arguments (first is \%s\)\n, +progname, argv[optind]); + fprintf(stderr, Try \%s --help\ for more information.\n, +progname); + exit(1); + } + + if (test_duration 0) + { + printf(Testing timing overhead for %d seconds.\n, test_duration); + } + else + { + printf(Testing timing was interrupted.\n); + } +} + +static void +test_timing(int32 duration) +{ + uint64 total_time; + int64 time_elapsed = 0; + uint64 loop_count = 0; + uint64 prev, cur; + int32 diff, i, bits, found; + + instr_time start_time, end_time, temp; + + static int64 histogram[32]; + + total_time = duration 0 ? duration * 100 : 0; + + INSTR_TIME_SET_CURRENT(start_time); + cur = INSTR_TIME_GET_MICROSEC(start_time); + + while (time_elapsed total_time) + { + prev = cur; + INSTR_TIME_SET_CURRENT(temp); + cur = INSTR_TIME_GET_MICROSEC(temp); + diff = cur - prev; + + if (diff 0) + { + printf(Detected clock going backwards in time.\n); + printf(Time warp: %d microseconds\n, diff); + exit(1); + } + + bits = 0; + while (diff) + { + diff = 1; + bits++; + } + histogram[bits]++; + + loop_count++; + INSTR_TIME_SUBTRACT(temp, start_time); + time_elapsed = INSTR_TIME_GET_MICROSEC(temp); + } + + INSTR_TIME_SET_CURRENT(end_time); + +
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 01/15/2012 05:14 PM, Ants Aasma wrote: I hope that having a tool to measure the overhead and check the sanity of clock sources is enough to answer the worries about the potential performance hit. We could also check that the clock source is fast enough on start-up/when the guc is changed, but that seems a bit too much and leaves open the question about what is fast enough. I've been thinking along those same lines--check at startup, provide some guidance on the general range of what's considered fast vs. slow in both the code and documentation. What I'm hoping to do here is split your patch in half and work on the pg_test_timing contrib utility first. That part answers some overdue questions about when EXPLAIN ANALYZE can be expected to add a lot of overhead, which means it's useful all on its own. I'd like to see that utility go into 9.2, along with a new documentation section covering that topic. I'll write the new documentation bit. As far as the buffer timing goes, there is a lot of low-level timing information I'd like to see the database collect. To pick a second example with very similar mechanics, I'd like to know which queries spend a lot of their time waiting on locks. The subset of time a statement spends waiting just for commit related things is a third. The industry standard term for these is wait events, as seen in Oracle, MySQL, MS SQL Server. etc. That's so standard I don't see an intellectual property issue with PostgreSQL using the same term. Talk with a random person who is converting from Oracle to PostgreSQL, ask them about their performance concerns. At least 3/4 of those conversations I have mention being nervous about not having wait event data. Right now, I feel the biggest hurdle to performance tuning PostgreSQL is not having good enough built-in query log analysis tools. If the pg_stat_statements normalization upgrade in the CF queue is commited, that's enough to make me bump that to solved well enough. After clearing that hurdle, figuring out how to log, analyze, and manage storage of wait events is the next biggest missing piece. One of my top goals for 9.3 was to make sure that happened. I don't think the long-term answer for how to manage wait event data is to collect it as part of pg_stat_statements though. But I don't have a good alternate proposal, while you've submitted a patch that actually does something useful right now. I'm going to think some more about how to reconcile all that. There is an intermediate point to consider as well, which is just committing something that adjusts the core code to make the buffer wait event data available. pg_stat_statements is easy enough to continue work on outside of core. I could see a path where that happens in parallel with adding a better core wait event infrastructure, just to get the initial buffer wait info into people's hands earlier. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 28 Listopad 2011, 8:54, Greg Smith wrote: -Have one of the PostgreSQL background processes keep track of a time estimate on its own, only periodically pausing to sync against the real time. Then most calls to gettimeofday() can use that value instead. I was thinking of that idea for slightly longer running things though; I doubt that can be made accurate enough to test instrument buffer What about random sampling, i.e. measure just 5% of the events or something like that? Sure, it's not exact but it significantly reduces the overhead. And it might be a config parameter, so the user might decide how precise results are needed, and even consider how fast the clocks are. Something more ambitious like the v$ stuff would also take care of what you're doing here; I'm not sure that what you've done helps built it though. Please don't take that personally. Part of one of my own instrumentation patches recently was rejected out of hand for the same reason, just not being general enough. Yes, that'd be significant improvement. The wait-event stuff is very useful and changes the tuning significantly. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith g...@2ndquadrant.com wrote: The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. Assuming the feature is off by default (and I can't imagine we'd consider anything else), I don't see why that should be cause for concern. If the instrumentation creates too much system load, then don't use it: simple as that. A more interesting question is how much load does this feature create even when it's turned off?. The other big problem for a patch of this sort is that it would bloat the stats file. I think we really need to come up with a more scalable alternative to the current system, but I haven't looked the current system in enough detail to have a clear feeling about what such an alternative would look like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Nov 28, 2011 8:55 AM, Greg Smith g...@2ndquadrant.com wrote: On 11/27/2011 04:39 PM, Ants Aasma wrote: On the AMD I saw about 3% performance drop with timing enabled. On the Intel machine I couldn't measure any statistically significant change. Oh no, it's party pooper time again. Sorry I have to be the one to do it this round. The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I believe on most systems on modern linux kernels gettimeofday an its ilk will be a vsyscall and nearly as fast as a regular function call. I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for. I speculated about doing that but never did. I had an experimental patch using mincore to do what you describe but it wasn't intended for production code I think. The only real patch was to use getrusage which I still intend to commit but it doesn't tell you the time spent in I/O -- though it does tell you the sys time which should be similar.
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 28 Listopad 2011, 15:40, Greg Stark wrote: On Nov 28, 2011 8:55 AM, Greg Smith g...@2ndquadrant.com wrote: On 11/27/2011 04:39 PM, Ants Aasma wrote: On the AMD I saw about 3% performance drop with timing enabled. On the Intel machine I couldn't measure any statistically significant change. Oh no, it's party pooper time again. Sorry I have to be the one to do it this round. The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I believe on most systems on modern linux kernels gettimeofday an its ilk will be a vsyscall and nearly as fast as a regular function call. AFAIK a vsyscall should be faster than a regular syscall. It does not need to switch to kernel space at all, it just reads the data from a shared page. The problem is that this is Linux-specific - for example FreeBSD does not have vsyscall at all (it's actually one of the Linux-isms mentioned here: http://wiki.freebsd.org/AvoidingLinuxisms). There's also something called VDSO, that (among other things) uses vsyscall if availabe, or the best implementation available. So there are platforms that do not provide vsyscall, and in that case it'd be just as slow as a regular syscall :( I wouldn't expect a patch that works fine on Linux but not on other platforms to be accepted, unless there's a compile-time configure switch (--with-timings) that'd allow to disable that. Another option would be to reimplement the vsyscall, even on platforms that don't provide it. The principle is actually quite simple - allocate a shared memory, store there a current time and update it whenever a clock interrupt happens. This is basically what Greg suggested in one of the previous posts, where regularly means on every interrupt. Greg was worried about the precision, but this should be just fine I guess. It's the precision you get on Linux, anyway ... I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for. The difficulty when distinguishing whether the reads were cached or not is the price we pay for using filesystem cache instead of managing our own. Not sure if this can be solved just by measuring the latency - with spinners it's quite easy, the differences are rather huge (and it's not difficult to derive that even from pgbench log). But with SSDs, multiple tablespaces on different storage, etc. it gets much harder. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas robertmh...@gmail.com writes: On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith g...@2ndquadrant.com wrote: The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. Assuming the feature is off by default (and I can't imagine we'd consider anything else), I don't see why that should be cause for concern. If the instrumentation creates too much system load, then don't use it: simple as that. A more interesting question is how much load does this feature create even when it's turned off?. Right. I see that the code already has a switch to skip the gettimeofday calls, so the objection is only problematic if the added overhead is significant even with the switch off. I would worry mainly about the added time/space to deal with the extra stats counters. The other big problem for a patch of this sort is that it would bloat the stats file. Yes. Which begs the question of why we need to measure this per-table. I would think per-tablespace would be sufficient. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Sun, Nov 27, 2011 at 11:54:38PM -0800, Greg Smith wrote: On 11/27/2011 04:39 PM, Ants Aasma wrote: On the AMD I saw about 3% performance drop with timing enabled. On the Intel machine I couldn't measure any statistically significant change. Oh no, it's party pooper time again. Sorry I have to be the one to do it this round. The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I've been annoyed enough by that problem to spend some time digging into why that is--seems to be a bunch of trivia around the multiple ways to collect time info on x86 systems--and after this CommitFest is over Something good to know: in Linux the file /sys/devices/system/clocksource/clocksource0/current_clocksource lists the current clock source, and /sys/devices/system/clocksource/clocksource0/available_clocksource lists the available clock sources. With cat you can switch them. That way you may be able to quantify the effects on a single machine. Learned the hard way while tracking clock-skew on a multicore system. The hpet may not be the fastest (that would be the cpu timer), but it's the fastest (IME) that gives guarenteed monotonic time. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] Patch: add timing of buffer I/O requests
Tomas Vondra t...@fuzzy.cz writes: Another option would be to reimplement the vsyscall, even on platforms that don't provide it. The principle is actually quite simple - allocate a shared memory, store there a current time and update it whenever a clock interrupt happens. This is basically what Greg suggested in one of the previous posts, where regularly means on every interrupt. Greg was worried about the precision, but this should be just fine I guess. It's the precision you get on Linux, anyway ... That sounds good for other interesting things, which entails being able to have timing information attached to the XID sequence. If we go this way, how far are we from having a ticker in PostgreSQL? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Nov 28, 2011, at 9:29 AM, Tomas Vondra wrote: I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for. The difficulty when distinguishing whether the reads were cached or not is the price we pay for using filesystem cache instead of managing our own. Not sure if this can be solved just by measuring the latency - with spinners it's quite easy, the differences are rather huge (and it's not difficult to derive that even from pgbench log). But with SSDs, multiple tablespaces on different storage, etc. it gets much harder. True, but every use case for this information I can think of ultimately only cares about how long it took to perform some kind of IO; it doesn't *really* care about whether it was cached. So in that context, we don't really care if SSDs are fast enough that they look like cache, because that means they're performing (essentially) the same as cache. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 28.11.2011 22:32, Dimitri Fontaine wrote: Tomas Vondra t...@fuzzy.cz writes: Another option would be to reimplement the vsyscall, even on platforms that don't provide it. The principle is actually quite simple - allocate a shared memory, store there a current time and update it whenever a clock interrupt happens. This is basically what Greg suggested in one of the previous posts, where regularly means on every interrupt. Greg was worried about the precision, but this should be just fine I guess. It's the precision you get on Linux, anyway ... That sounds good for other interesting things, which entails being able to have timing information attached to the XID sequence. If we go this way, how far are we from having a ticker in PostgreSQL? I'm not sure. On Linux/x86 this is already done, but my knowledge of kernel development is rather limited, especially when it comes to other OSes and platforms. E.g. I'm not sure why it's not available in FreeBSD on x86, I guess it's rather we don't want it than it's not possible. In Linux sources, the most interesting pieces are probably these: 1) arch/x86/include/asm/vgtod.h - that's the shared memory structure 2) arch/x86/kernel/vsyscall_64.c - this is how the memory is read (do_vgettimeofday) 3) arch/x86/kernel/vsyscall_64.c - this is how the memory is updated (update_vsyscall) 4) kernel/time/timekeeping.c - do_settimeofday (calls update_vsyscall) 5) drivers/rtc/class.c (and other) RTC drivers call do_settimeofday Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 29.11.2011 02:14, Jim Nasby wrote: On Nov 28, 2011, at 9:29 AM, Tomas Vondra wrote: I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for. The difficulty when distinguishing whether the reads were cached or not is the price we pay for using filesystem cache instead of managing our own. Not sure if this can be solved just by measuring the latency - with spinners it's quite easy, the differences are rather huge (and it's not difficult to derive that even from pgbench log). But with SSDs, multiple tablespaces on different storage, etc. it gets much harder. True, but every use case for this information I can think of ultimately only cares about how long it took to perform some kind of IO; it doesn't *really* care about whether it was cached. So in that context, we don't really care if SSDs are fast enough that they look like cache, because that means they're performing (essentially) the same as cache. Yup, that's right. The wait times are generally much more interesting than the cached/not cached ratio. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Dimitri Fontaine dimi...@2ndquadrant.fr writes: That sounds good for other interesting things, which entails being able to have timing information attached to the XID sequence. If we go this way, how far are we from having a ticker in PostgreSQL? Those of us who are trying to get rid of idle-state process wakeups will protest any such thing with vigor. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Sorry for taking so long to respong, had a pretty busy day at work. Anyway.. On Mon, Nov 28, 2011 at 9:54 AM, Greg Smith g...@2ndquadrant.com wrote: Oh no, it's party pooper time again. Sorry I have to be the one to do it this round. The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I've been annoyed enough by that problem to spend some time digging into why that is--seems to be a bunch of trivia around the multiple ways to collect time info on x86 systems--and after this CommitFest is over I was already hoping to dig through my notes and start quantifying that more. So you can't really prove the overhead of this approach is acceptable just by showing two examples; we need to find one of the really terrible clocks and test there to get a real feel for the worst-case. Sure, I know that the timing calls might be awfully slow. That's why I turned it off by default. I saw that track_functions was already using this, so I figured it was ok to have it potentially run very slowly. -Document the underlying problem and known workarounds, provide a way to test how bad the overhead is, and just throw our hands up and say sorry, you just can't instrument like this if someone has a slow system. Some documentation about potential problems would definitely be good. Same goes for a test tool. ISTM that fast accurate timing is just not possible on all supported platforms. That doesn't seem like a good enough justification to refuse implementing something useful for the majority that do as long as it doesn't cause regressions for those that don't or significant code complexity. -Have one of the PostgreSQL background processes keep track of a time estimate on its own, only periodically pausing to sync against the real time. Then most calls to gettimeofday() can use that value instead. I was thinking of that idea for slightly longer running things though; I doubt that can be made accurate enough to test instrument buffer This would limit it to those cases where hundreds of milliseconds of jitter or more don't bother all that much. And while I hate to kick off massive bike-shedding in your direction, I'm also afraid this area--collecting stats about how long individual operations take--will need a much wider ranging approach than just looking at the buffer cache ones. If you step back and ask what do people expect here?, there's a pretty large number who really want something like Oracle's v$session_wait and v$system_event interface for finding the underlying source of slow things. There's enough demand for that that EnterpriseDB has even done some work in this area too; what I've been told about it suggests the code isn't a great fit for contribution to community PostgreSQL though. Like I said, this area is really messy and hard to get right. Yeah, something like that should probably be something to strive for. I'll ponder a bit more about resource and latency tracking a general. Maybe the question here should be about the cost/benefit ratio of having some utility now vs maintaining/deprecating the user visible interface when a more general framework will turn up. Something more ambitious like the v$ stuff would also take care of what you're doing here; I'm not sure that what you've done helps built it though. Please don't take that personally. Part of one of my own instrumentation patches recently was rejected out of hand for the same reason, just not being general enough. No problem, I understand that half-way solutions can be more trouble than they're worth. I actually built this to help with performance testing an application and thought it would be an interesting experience to try to give the community back something. On Mon, Nov 28, 2011 at 4:40 PM, Greg Stark st...@mit.edu wrote: I believe on most systems on modern linux kernels gettimeofday an its ilk will be a vsyscall and nearly as fast as a regular function call. clock_gettime() is implemented as a vDSO since 2.6.23. gettimeofday() has been user context callable since before git shows any history (2.6.12). On Mon, Nov 28, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: The other big problem for a patch of this sort is that it would bloat the stats file. Yes. Which begs the question of why we need to measure this per-table. I would think per-tablespace would be sufficient. Yeah, I figured that this is something that should be discussed. I implemented per-table collection because I thought it might be useful for tools to pick up and show a quick overview on which tables are causing the most IO overhead for queries. On Mon, Nov 28, 2011 at 8:10 PM, Martijn van Oosterhout klep...@svana.org wrote: Something good to know: in Linux the file /sys/devices/system/clocksource/clocksource0/current_clocksource lists the current clock source, and
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11/28/2011 05:51 AM, Robert Haas wrote: On Mon, Nov 28, 2011 at 2:54 AM, Greg Smithg...@2ndquadrant.com wrote: The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. Assuming the feature is off by default (and I can't imagine we'd consider anything else), I don't see why that should be cause for concern. If the instrumentation creates too much system load, then don't use it: simple as that. It's not quite that simple though. Releasing a performance measurement feature that itself can perform terribly under undocumented conditions has a wider downside than that. Consider that people aren't going to turn it on until they are already overloaded. If that has the potential to completely tank performance, we better make sure that area is at least explored usefully first; the minimum diligence should be to document that fact and make suggestions for avoiding or testing it. Instrumentation that can itself become a performance problem is an advocacy problem waiting to happen. As I write this I'm picturing such an encounter resulting in an angry blog post, about how this proves PostgreSQL isn't usable for serious systems because someone sees massive overhead turning this on. Right now the primary exposure to this class of issue is EXPLAIN ANALYZE. When I was working on my book, I went out of my way to find a worst case for that[1], and that turned out to be a query that went from 7.994ms to 69.837ms when instrumented. I've been meaning to investigate what was up there since finding that one. The fact that we already have one such problem bit exposed already worries me; I'd really prefer not to have two. [1] (Dell Store 2 schema, query was SELECT count(*) FROM customers;) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Smith g...@2ndquadrant.com writes: On 11/28/2011 05:51 AM, Robert Haas wrote: Assuming the feature is off by default (and I can't imagine we'd consider anything else), I don't see why that should be cause for concern. If the instrumentation creates too much system load, then don't use it: simple as that. It's not quite that simple though. Releasing a performance measurement feature that itself can perform terribly under undocumented conditions has a wider downside than that. Yeah, that's a good point, and the machines on which this would suck are exactly the ones where EXPLAIN ANALYZE creates very large overhead. We don't seem to see a lot of complaints about that anymore, but we do still see some ... and yes, it's documented that EXPLAIN ANALYZE can add significant overhead, but that doesn't stop the questions. Instrumentation that can itself become a performance problem is an advocacy problem waiting to happen. As I write this I'm picturing such an encounter resulting in an angry blog post, about how this proves PostgreSQL isn't usable for serious systems because someone sees massive overhead turning this on. Of course, the rejoinder could be that if you see that, you're not testing on serious hardware. But still, I take your point. Right now the primary exposure to this class of issue is EXPLAIN ANALYZE. When I was working on my book, I went out of my way to find a worst case for that[1], [1] (Dell Store 2 schema, query was SELECT count(*) FROM customers;) That's pretty meaningless without saying what sort of clock hardware was on the machine... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11/27/2011 04:39 PM, Ants Aasma wrote: On the AMD I saw about 3% performance drop with timing enabled. On the Intel machine I couldn't measure any statistically significant change. Oh no, it's party pooper time again. Sorry I have to be the one to do it this round. The real problem with this whole area is that we know there are systems floating around where the amount of time taken to grab timestamps like this is just terrible. I've been annoyed enough by that problem to spend some time digging into why that is--seems to be a bunch of trivia around the multiple ways to collect time info on x86 systems--and after this CommitFest is over I was already hoping to dig through my notes and start quantifying that more. So you can't really prove the overhead of this approach is acceptable just by showing two examples; we need to find one of the really terrible clocks and test there to get a real feel for the worst-case. I recall a patch similar to this one was submitted by Greg Stark some time ago. It used the info for different reasons--to try and figure out whether reads were cached or not--but I believe it withered rather than being implemented mainly because it ran into the same fundamental roadblocks here. My memory could be wrong here, there were also concerns about what the data would be used for. I've been thinking about a few ways to try and cope with this whole class of timing problem: -Document the underlying problem and known workarounds, provide a way to test how bad the overhead is, and just throw our hands up and say sorry, you just can't instrument like this if someone has a slow system. -Have one of the PostgreSQL background processes keep track of a time estimate on its own, only periodically pausing to sync against the real time. Then most calls to gettimeofday() can use that value instead. I was thinking of that idea for slightly longer running things though; I doubt that can be made accurate enough to test instrument buffer And while I hate to kick off massive bike-shedding in your direction, I'm also afraid this area--collecting stats about how long individual operations take--will need a much wider ranging approach than just looking at the buffer cache ones. If you step back and ask what do people expect here?, there's a pretty large number who really want something like Oracle's v$session_wait and v$system_event interface for finding the underlying source of slow things. There's enough demand for that that EnterpriseDB has even done some work in this area too; what I've been told about it suggests the code isn't a great fit for contribution to community PostgreSQL though. Like I said, this area is really messy and hard to get right. Something more ambitious like the v$ stuff would also take care of what you're doing here; I'm not sure that what you've done helps built it though. Please don't take that personally. Part of one of my own instrumentation patches recently was rejected out of hand for the same reason, just not being general enough. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers