On 11/28/2011 05:51 AM, Robert Haas wrote:
> On Mon, Nov 28, 2011 at 2:54 AM, Greg Smith <g...@2ndquadrant.com> wrote:
>> The real problem with this whole area is that we know there are
>> systems floating around where the amount of time taken to grab
>> timestamps like this is just terrible.
>
> Assuming the feature is off by default (and I can't imagine we'd
> consider anything else), I don't see why that should be cause for
> concern.  If the instrumentation creates too much system load, then
> don't use it: simple as that.

It's not quite that simple, though. Releasing a performance measurement feature that can itself perform terribly under undocumented conditions has a wider downside than that.

Consider that people aren't going to turn it on until they are already overloaded. If turning it on at that point has the potential to completely tank performance, we'd better make sure that area is at least usefully explored first; the minimum diligence should be to document the risk and make suggestions for avoiding or testing it.

Instrumentation that can itself become a performance problem is an advocacy problem waiting to happen. As I write this I'm picturing such an encounter resulting in an angry blog post about how PostgreSQL obviously isn't usable for serious systems, because someone saw massive overhead after turning this on. Right now the primary exposure to this class of issue is EXPLAIN ANALYZE. When I was working on my book, I went out of my way to find a worst case for that[1], and it turned out to be a query that went from 7.994ms to 69.837ms when instrumented. I've been meaning to investigate what was going on there since finding it. The fact that we already have one such problem spot exposed worries me; I'd really prefer not to have two.

[1] (Dell Store 2 schema, query was "SELECT count(*) FROM customers;")
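
To make that worst case easy to reproduce, here's a minimal sketch of the comparison in psql, assuming the Dell Store 2 sample database is loaded (the exact numbers will depend on how expensive timestamp reads are on your hardware):

    -- Baseline: run the query with only client-side timing enabled
    \timing on
    SELECT count(*) FROM customers;

    -- Instrumented: the same query under EXPLAIN ANALYZE, which reads
    -- the clock before and after every row passing through each plan node
    EXPLAIN ANALYZE SELECT count(*) FROM customers;

Comparing the \timing figure from the plain run against the "Total runtime" reported by EXPLAIN ANALYZE shows the instrumentation overhead. The gap is widest for queries like this one that push a large number of rows through cheap plan nodes, since the per-row timestamp cost dwarfs the per-row execution cost.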
