Andres Freund wrote on 2018-03-01:
> I think the patch probably doesn't apply anymore, due to other changes
> to pg_stat_statements since its posting. Could you refresh?

pgss_plans_v02.patch applies cleanly to master, there were no changes to 
pg_stat_statements since the copyright updates at the beginning of January.
(pgss_plans_v02.patch is attached to message 
1bd396a9-4573-55ad-7ce8-fe7adffa1...@uni-muenster.de and can be found in the 
current commitfest as well.)

> I've not done any sort of review. Scrolling through I noticed //
> comments which aren't pg coding style.

I'll fix that along with any other problems that might be found in a review.


> I'd like to see a small benchmark showing the overhead of the feature.
> Both in runtime and storage size.

I've tried to gather some meaningful results, however either my testing 
methodology was flawed (as variance between all my passes of pgbench was rather 
high) or the takeaway is that the feature only generates little overhead.
This is what I've run on my workstation using a Ryzen 1700 and 16GB of RAM and 
an old Samsung 840 Evo as boot drive, which also held the database:
The database used for the tests was dropped and pgbench initialized anew for 
each test (pgss off, pgss on, pgss on with plan collection) using a scaling of 
16437704*0.003~=50 (roughly what the phoronix test suite uses for a buffer 
test).
Also similar to the phoronix test suite, I used 8 jobs and 32 connections for a 
normal multithreaded load.

I then ran 10 passes, each for 60 seconds, with a 30 second pause between them, 
as well as another test which ran for 10 minutes.

With pg_stat_statements on, the latter test (10 minutes) resulted in 1833 tps, 
while the patched version resulted in 1700 tps, so a little over 7% overhead? 
Well, the "control run", without pg_stat_statements delivered only 1806 tps, so 
variance seems to be quite high.

The results of the ten successive tests, each running 60 seconds and then 
waiting for 30 seconds, are displayed in the attached plot.
I've tinkered with different settings with pgbench for quite some time now and 
all I can come up with are runs with high variance between them.

If anybody has any recommendations for a setup that generates less variance, 
I'll try this again.

Finally, the more interesting metric regarding this patch is the size of the 
pg_stat_statements.stat file, which stores all the metrics while the database 
is shut down. I reckon that the size of pgss_query_texts.stat (which holds only 
the query strings and plan strings while the database is running) will be 
similar, however it might fluctuate more as new strings are simply appended to 
the file until the garbagecollector decides that it has to be cleaned up.
After running the aforementioned tests, the file was 8566 bytes in size for 
pgss in it's unmodified form, while the tests resulted in 32607 bytes for the 
pgss that collects plans as well. This seems reasonable as plans strings are 
usually longer than the statements from which they result. Worst case, the 
pg_stat_statements.stat holds two plans for each type of statement.
I've not tested the length of the file with different encodings, such as JSON, 
YAML, or XML, however I do not expect any hugely different results.

Greetings
Julian

Attachment: pgss_plans_pgbench.pdf
Description: Adobe PDF document

Reply via email to