On 12/7/14, 6:16 PM, Simon Riggs wrote:
> On 20 October 2014 at 10:57, Jim Nasby <jim.na...@bluetreble.com> wrote:
>
>> Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
>> lock on, with no retry. Presumably this should be a rare occurrence, but I
>> think it's bad that we just assume that and won't warn the user if something
>> bad is going on.
>
> (I'm having email problems, so I can't see later mails on this thread,
> so replying here.)
>
> Logging patch looks fine, but I would rather not add a line of text
> for each VACUUM, just in case this is non-zero. I think we should add
> that log line only if the blocks skipped > 0.

I thought about doing that, but I'm loath to duplicate a rather large ereport 
call. Happy to make the change if that's the consensus though.

> What I'm more interested in is what you plan to do with the
> information once we get it?
>
> The assumption that skipping blocks is something bad is strange. I
> added it because VACUUM could and did regularly hang on busy tables,
> which resulted in bloat because other blocks that needed cleaning
> didn't get any attention.
>
> Which is better, spend time obsessively trying to vacuum particular
> blocks, or to spend the time on other blocks that are in need of
> cleaning and are available to be cleaned?
>
> Which is better, have autovacuum or system wide vacuum progress on to
> other tables that need cleaning, or spend lots of effort retrying?
>
> How do we know what is the best next action?
>
> I'd really want to see some analysis of those things before we spend
> even more cycles on this.

That's the entire point of logging this information. There is an underlying 
assumption that we won't actually skip many pages, but there's no data to back 
that up, nor is there currently any way to get that data.

My hope is that the logging shows that there isn't anything more that needs to 
be done here. If this is something that causes problems, at least now DBAs will 
be aware of it and hopefully we'll be able to identify specific problem 
scenarios and find a solution.



BTW, my initial proposal[1] was strictly logging. The only difference was 
raising it to a warning if a significant portion of the table was skipped. I 
only investigated retrying locks at the suggestion of others. I never intended 
this to become a big time sink.

[1]:
"Currently, a non-freeze vacuum will punt on any page it can't get a cleanup 
lock on, with no retry. Presumably this should be a rare occurrence, but I think 
it's bad that we just assume that and won't warn the user if something bad is going 
on.

"My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip 
more than 1% of the pages we visited (not relpages) then elog(WARNING) instead."
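
For illustration only, the threshold rule quoted above could be sketched
roughly as follows. This is not the submitted patch: the enum, the function
name skip_report_level, and its parameters are hypothetical stand-ins, and a
real version would pass LOG or WARNING to ereport() rather than return an
enum. It also assumes "more than 1%" is a strict comparison and that nothing
is reported when no pages were skipped.

```c
#include <stdint.h>

/* Hypothetical stand-ins for "no message", LOG and WARNING. */
typedef enum
{
    SKIP_REPORT_NONE,
    SKIP_REPORT_LOG,
    SKIP_REPORT_WARNING
} SkipReportLevel;

/*
 * Decide how loudly to report pages skipped for lack of a cleanup lock.
 * pages_visited counts pages the vacuum actually visited, not relpages.
 */
static SkipReportLevel
skip_report_level(uint64_t pages_skipped, uint64_t pages_visited)
{
    /* Nothing skipped: emit no extra line at all. */
    if (pages_skipped == 0)
        return SKIP_REPORT_NONE;

    /*
     * skipped / visited > 1% is rewritten as skipped * 100 > visited,
     * which stays in integer math and avoids dividing by zero.
     */
    if (pages_skipped * 100 > pages_visited)
        return SKIP_REPORT_WARNING;

    return SKIP_REPORT_LOG;
}
```

Under these assumptions, 10 skipped pages out of 1000 visited would be
reported at LOG (exactly 1% is not "more than 1%"), while 11 out of 1000
would be raised to WARNING.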
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)