On Thu, Jan 30, 2014 at 9:57 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> BTW ... it occurs to me to wonder if it'd be feasible to keep the
> query-texts file mmap'd in each backend, thereby reducing the overhead
> to write a new text to about the cost of a memcpy, and eliminating the
> read cost in pg_stat_statements() altogether.  It's most likely not
> worth the trouble; but if a more-realistic benchmark test shows that we
> actually have a performance issue there, that might be a way out without
> giving up the functional advantages of Peter's patch.
There could be a worst case for that scheme too, plus we'd have to figure out how to make it work with Windows, which in the case of mmap() is not a sunk cost AFAIK. I'm skeptical of the benefit of pursuing that.

I'm totally unimpressed with the benchmark as things stand. It relies on keeping 64 clients in perfect lockstep as each executes 10,000 queries that are each unique snowflakes. Yet even though they're unique snowflakes, and even though there are 10,000 of them, every client executes the same one at exactly the same time relative to the others, in exactly the same order, as quickly as possible. Even so, the headline "reference score" of -35% is completely misleading, because it isn't comparing like with like in terms of hash table size. This benchmark incidentally recommends that we reduce the default hash table size to improve performance when the hash table is under pressure, which is ludicrous. It's completely backwards. You could also use the benchmark to demonstrate that the overhead of calling pg_stat_statements() is ridiculously high, since, like creating a new query text, that only requires a shared lock too.

This is an implausibly bad worst case for larger hash table sizes in pg_stat_statements generally. 5,000 entries is enough for the large majority of applications. But for those that hit that limit, in practice they're still going to find the vast majority of queries already in the table as they're executed. If they don't, they can double or triple their "max" setting, because the shared memory overhead is so low. No one incurs any additional overhead once their query is in the hash table already. In reality, actual applications could hardly be further from the perfectly uniform distribution of distinct queries presented here.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers