On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> Actually, I think the whole thing is rather badly designed, as warm
>> cache or no you're still proposing to do thousands of kernel calls
>> while holding a potentially contended LWLock.  I suggest what we
>> do is (1) read in the whole file, (2) acquire the lock, (3)
>> re-read the whole file in the same buffer, (4) work from the buffer.
>
> fseeko() and fread() are part of the standard library, not any kernel.
> I don't believe that what I've done in qtext_fetch() implies thousands
> of kernel calls, or thousands of context switches.
I ran strace on a pg_stat_statements backend with a full ~5,000 entries. It
seems that glibc is actually content to always make lseek() and read()
syscalls for every fseek(), even though it maintains its own buffer and could
in principle show a lot more gumption about that [1]. I agree that we should
copy everything into a buffer in one go.

I don't think that making the LWLocking duration as brief as possible is
critically important. Automated tools will only be interested in new query
texts (implying possible I/O while share locking) as they observe new
entries. But new entries are the only thing worth considering that may
potentially block on that shared lock (though not while they write out their
query texts, which almost always requires only a shared lock). New entries
ought to be a very rare occurrence compared to the execution of queries -
certainly, on the busy production systems I use pg_stat_statements on, it may
be weeks before a new query is observed (uh, with the possible exception of
*my* ad-hoc pg_stat_statements queries).

On my laptop, "explain analyze select * from pg_stat_statements" indicates a
total runtime of ~9ms with 5,000 regression test entries, even with the
(still modest) overhead of doing all those read()/lseek() calls. So we're
talking about a small impact on a new entry that will only be affected once,
and that was always going to have to write out its query text to file
anyway. On top of all this, it's in general very improbable that any given
new entry will be affected at all, because two unlikely things need to happen
at the same time for that to be possible.

The most important aspect of keeping the overhead of pg_stat_statements low
is that there not be too much cache pressure. Obviously the huge reduction in
its shared memory footprint that this patch allows helps with that.
[1] http://www.pixelbeat.org/programming/stdio_buffering/

-- 
Peter Geoghegan