On 8/28/14, 12:18 PM, Robert Haas wrote:
At least in situations that I've encountered, it's typical to be able to determine the frequency with which a given table needs to be vacuumed to avoid runaway bloat, and from that you can work backwards to figure out how fast you must process it in MB/s, and from there you can work backwards to figure out what cost delay will achieve that effect. But if the system tinkers with the cost delay under the hood, then you're vacuuming at a different (slower) rate and, of course, the table bloats.
The last time I took a whack at this, I worked toward making all of the parameters operate in terms of target MB/s, for exactly this style of thinking and goal. Those converted into the same old mechanism under the hood and I got the math right to give the same behavior for the simple cases, but that could have been simplified eventually. I consider that line of thinking to be the only useful one here.
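
The "work backwards" arithmetic described above can be sketched roughly as follows. This is only an illustration, not the patch discussed in the thread: the function names are hypothetical, it assumes 8 kB pages, it charges every page the same cost (for example the page-miss cost), and it counts only sleep time, so the result is an upper bound on throughput rather than a measured rate.

```python
# Rough sketch: convert between cost-based vacuum settings and MB/s.
# Assumptions (not from the original message): 8 kB pages, a single
# per-page cost, and time spent doing actual I/O is ignored.

BLOCK_SIZE = 8192          # bytes per page
MB = 1024 * 1024           # bytes per MB as used here


def required_rate_mb_s(table_mb, window_seconds):
    """Rate needed to scan a table of table_mb within window_seconds."""
    return table_mb / window_seconds


def max_rate_mb_s(cost_limit, cost_delay_ms, page_cost, block_size=BLOCK_SIZE):
    """Upper bound on throughput for a given cost limit and delay."""
    pages_per_round = cost_limit / page_cost      # pages before each sleep
    rounds_per_second = 1000.0 / cost_delay_ms    # sleeps per second
    return pages_per_round * rounds_per_second * block_size / MB


def cost_delay_ms_for_rate(target_mb_s, cost_limit, page_cost,
                           block_size=BLOCK_SIZE):
    """Inverse of max_rate_mb_s: the delay that yields target_mb_s."""
    pages_per_second = target_mb_s * MB / block_size
    rounds_per_second = pages_per_second * page_cost / cost_limit
    return 1000.0 / rounds_per_second


if __name__ == "__main__":
    # Illustrative inputs: cost limit 200, 20 ms delay, per-page cost 10.
    rate = max_rate_mb_s(cost_limit=200, cost_delay_ms=20, page_cost=10)
    delay = cost_delay_ms_for_rate(rate, cost_limit=200, page_cost=10)
    print(f"rate = {rate} MB/s, delay for that rate = {delay} ms")
```

If something else changes the effective delay under the hood, the first function shows directly how far the achieved rate falls below the one the schedule was designed around.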

The answer I like to these values that don't inherit as expected in the GUC tree is to nuke that style of interface altogether in favor of a simpler, bandwidth-measured one, then perhaps add multiple QoS levels. I certainly have no interest in treating the overly complicated innards of cost computation as a bug and fixing them with even more complicated behavior.

The part of this I was trying hard to find time to do myself by the next CF was a better bloat measurement tool, which is needed to actually see the problem. With that in hand, and some nasty test cases, I wanted to come back to simplified MB/s vacuum parameters with easier-to-understand sharing rules. If other people are hot to go on that topic, I don't care if I actually do the work; I just have a pretty clear view of what I think people want.

The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. That's the main one, for these must-run-on-schedule-or-else jobs.
Yes.

On 8/29/14, 9:45 AM, Alvaro Herrera wrote:
Anyway it seems to me maybe there is room for a new table storage
parameter, say autovacuum_do_balance which means to participate in the
balancing program or not.

If that eliminates some of the hairy edge cases, sure.

A useful concept to consider is having a soft limit that most things work against, along with a total hard limit for the server. When one of these tight schedule queries with !autovacuum_do_balance starts, it must run at its designed speed with no concern for anyone else. Which means:

a) Their bandwidth gets pulled out of the regular, soft limit numbers until they're done. Last time I had one of these jobs, once the big important boys were running, everyone else in the regular shared set was capped at vacuum_cost_limit=5 worth of work. Just enough to keep up with system catalog things and, over the course of many hours, process small tables.

b) If you try to submit multiple locked rate jobs at once, and the total goes over the hard limit, they have to just be aborted. If the rush of users comes back at 8AM, and you can clean the table up by then if you give it 10MB/s, what you cannot do is let some other user decrease your rate such that you're unfinished at 8AM. Then you'll have aggressive AV competing against the user load you were trying to prepare for. It's better to just throw a serious error that forces someone to look at the hard limit budget and adjust the schedule instead. The systems with this sort of problem are getting cleaned up every single day, almost continuously; missing a day is not bad as long as it's noted and fixed again before the next cleanup window.
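
To make rules (a) and (b) concrete, here is a minimal sketch of the bookkeeping they imply. Everything in it is hypothetical: the class, method, and parameter names are invented for illustration, rates are plain MB/s numbers, and the small floor stands in for the "just enough to keep up" allowance left to the shared set. It is not a description of how autovacuum is implemented.

```python
# Sketch of a soft/hard bandwidth budget for vacuum jobs.
# All names here are hypothetical and exist only for illustration.


class HardLimitExceeded(Exception):
    """Raised when locked-rate jobs would exceed the server's hard limit."""


class VacuumBudget:
    def __init__(self, hard_limit_mb_s, soft_limit_mb_s, floor_mb_s):
        self.hard_limit = hard_limit_mb_s   # total the server may ever use
        self.soft_limit = soft_limit_mb_s   # normal shared (balanced) pool
        self.floor = floor_mb_s             # minimum left for shared jobs
        self.locked = {}                    # job name -> fixed rate in MB/s

    def locked_total(self):
        return sum(self.locked.values())

    def start_locked(self, name, rate_mb_s):
        """Rule (b): refuse the job rather than slow down any locked job."""
        if self.locked_total() + rate_mb_s > self.hard_limit:
            raise HardLimitExceeded(
                f"{name}: {rate_mb_s} MB/s does not fit in the hard limit")
        self.locked[name] = rate_mb_s

    def finish_locked(self, name):
        """Return a finished job's bandwidth to the shared pool."""
        del self.locked[name]

    def shared_pool(self):
        """Rule (a): locked jobs are paid for out of the soft limit."""
        return max(self.floor, self.soft_limit - self.locked_total())


if __name__ == "__main__":
    budget = VacuumBudget(hard_limit_mb_s=20, soft_limit_mb_s=12,
                          floor_mb_s=0.25)
    budget.start_locked("nightly_cleanup", 10)
    print("shared pool while locked job runs:", budget.shared_pool())
```

The design choice worth noting is that a locked-rate job is never throttled after it is admitted; the only failure mode is a loud error at submission time, which is what forces someone to revisit the hard limit budget or the schedule.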

