2014-01-31 Peter Geoghegan <p...@heroku.com>
> On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> In reality, actual applications
> >> could hardly be further from the perfectly uniform distribution of
> >> distinct queries presented here.
> >
> > Yeah, I made the same point in different words.  I think any realistic
> > comparison of this code to what we had before needs to measure a workload
> > with a more plausible query frequency distribution.
>
> Even though that distribution just doesn't square with anybody's
> reality, you can still increase the pg_stat_statements.max setting to
> 10k and the problem goes away at little cost (a lower setting is
> better, but a setting high enough to cache everything is best). But
> you're not going to have terribly much use for pg_stat_statements
> anyway....if you really do experience churn at that rate with 5,000
> possible entries, the module is ipso facto useless, and should be
> disabled.
I ran an extra test of your patch and mine with pg_stat_statements.max = 10k,
with otherwise the same settings and servers. Both are faster than the
previous results.

method    | try1  | try2  | try3
--------------------------------------------
peter 3   | 6.769 | 6.784 | 6.785
method 5  | 6.770 | 6.774 | 6.761

I think the most significant overhead in pg_stat_statements is the cost of
deleting and inserting entries when the hash table is updated, not the
LWLocks. If the LWLocks were the main overhead, we would see it with
pgbench -S, because that workload has a single SELECT pattern and is
therefore the worst lock-conflict case. But we don't see such a result.

I'm not sure about dynahash.c, but we can see how hash collisions are
handled in that code. IMHO it might be heavy because it has to walk the
bucket's list and compare entries until it finds one that does not conflict.

The previous results also show that the weakest point of your patch is that
evicting the oldest statement and inserting a new one is very expensive, as
you know; it amplifies the update (delete and insert) cost in the
pg_stat_statements hash table. That is why you proposed 10k as the new
default max value. But that is not an essential solution, because the old
pg_stat_statements also performs well at that setting. And when we set
max = 10k with your patch but only want the 1000 most-used queries, we have
to run an ORDER BY ... LIMIT 1000 query against pg_stat_statements (see the
example at the end of this mail). The sort cost is relatively high, so the
monitoring query becomes slow and expensive. With the old code we can simply
set pg_stat_statements.max = 1000, and its performance is not bad; that is
the best setting for getting information about the 1000 most-used queries.
That's all my assumption.

Sorry for running only a few extra tests; I had no time in my office today.
If we want it, I can run a 1/N-distribution pgbench test next week; I will
modify my perl script a little to create multiple SQL files with various
sleep times.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
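P.S. To be concrete, the monitoring query I mean above would be something
like the following (ordering by calls is only an example; total_time would
work the same way):

    SELECT query, calls, total_time, rows
      FROM pg_stat_statements
     ORDER BY calls DESC
     LIMIT 1000;

With max = 10k this has to scan and sort all cached entries just to get the
top 1000, whereas with the old code and pg_stat_statements.max = 1000 we can
simply read the whole view.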