Benedikt Grundmann wrote:
What I think is missing is a clear way to know if you are vacuuming (and analyzing) enough, and how much you are paying for that.

Any good way to measure if you're vacuuming a particular table enough needs to note how much free space is in that table and its indexes over time. That's why one of the other building blocks in the submission queue for 9.2 is a function to make that easier to do. It's one of the important dependencies to get settled before we can move very far toward answering "am I vacuuming enough?".
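For reference, the existing contrib module pgstattuple already reports per-relation free and dead space, which is roughly the raw data such a measurement needs; on 9.1 or later it installs as an extension (the table name below is just an example):

    CREATE EXTENSION pgstattuple;
    SELECT table_len, free_space, free_percent,
           dead_tuple_len, dead_tuple_percent
    FROM pgstattuple('pgbench_accounts');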

A second piece to that is recording a history of that information over time. You can't predict the future need for something without some record of its past to extrapolate from. That's probably a job better suited for an external tool. The easiest way to audit this is to graph it over time, which isn't the sort of thing PostgreSQL is likely to build in. It's also the easiest way to validate whether a suggested implementation for a vacuum "meter" is actually useful. No sense in doing the difficult work of building one until there's a working prototype, which can be put together more quickly in languages other than C.
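A minimal sketch of what that external recording might look like, assuming pgstattuple is available and using a sample table name; something like cron would run the INSERT on a fixed schedule, and a graphing tool would read the history table:

    CREATE TABLE free_space_history (
        sample_time    timestamptz NOT NULL DEFAULT now(),
        relation       regclass NOT NULL,
        table_len      bigint,
        free_space     bigint,
        dead_tuple_len bigint
    );

    -- run periodically, once per table being tracked
    INSERT INTO free_space_history (relation, table_len, free_space, dead_tuple_len)
    SELECT 'pgbench_accounts'::regclass, table_len, free_space, dead_tuple_len
    FROM pgstattuple('pgbench_accounts');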

A simple meter might not be possible to create even with some better building blocks to base it on. A lot of what matters in this area is workload dependent, and there are many types of database workloads out there. The two systems I work on where vacuum is hardest to tune have settled on completely different approaches to the problem. The only thing I've found so far that is true of both of them is that they'd really appreciate easier controls on the maximum rate.
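The only rate controls available today are the cost-based delay settings, which are indirect--you tune abstract cost numbers rather than something like an MB/s ceiling. The values below are only examples, set per table here but also available globally in postgresql.conf:

    ALTER TABLE pgbench_accounts SET (
        autovacuum_vacuum_cost_delay = 20,
        autovacuum_vacuum_cost_limit = 200
    );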

At the moment we are basically changing the knobs blindly, based on some back-of-the-envelope calculations and hearsay. Then sometimes, months later, we find out that oops, we haven't been analyzing enough, and that's why on that particular table the planner is now picking a "bad" plan.

Unlike VACUUM, ANALYZE is so cheap to run that it's possible to improve this situation more easily--just do it a lot more often. Lowering autovacuum_analyze_scale_factor is the easiest way. By default that is 0.10, requiring approximately a 10% change in the table size before a new ANALYZE is done. I think the lowest production setting I have for that somewhere is 0.03, on a roughly terabyte-scale database. There, a 10% change in one of the larger tables is well past the point where query plans are already badly impacted.
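In postgresql.conf that's a one-line change, and a reload (not a restart) is enough for it to take effect; 0.03 here is just the aggressive value mentioned above:

    autovacuum_analyze_scale_factor = 0.03    # default is 0.10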

If your data changes its character quite frequently as new information arrives, I wouldn't be afraid in that case to drop as low as 0.01 here. That would give you an ANALYZE that happens 10X as often as it does now. You'll waste a moderate amount of CPU and disk resources, but a tuning error that leans toward analyzing too frequently isn't that expensive.
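If only a handful of tables change that quickly, a per-table override leaves the global default alone; the table name and value here are just examples:

    ALTER TABLE volatile_table SET (autovacuum_analyze_scale_factor = 0.01);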

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

