[ADMIN] Errors on VACUUM

2002-03-15 Thread Bob Smith


I received the following error from a VACUUM ANALYZE:

NOTICE:  FlushRelationBuffers(place, 454): block 231 is referenced 
(private 0, global 4)
FATAL 1:  VACUUM (vc_repair_frag): FlushRelationBuffers returned -2

and psql lost the connection right after that.  This was repeatable, 
after the first occurrence I re-started the postmaster and tried again, 
same result.  The particular table in question is static data, that is 
it was originally filled in once by COPY and there were no subsequent 
inserts or deletes.  But I had just added a column to the table and 
updated it from another table to eliminate a frequently-done join, that 
was the reason for doing the VACUUM.

Since I had the text file with the complete table contents from the 
original COPY, I decided to re-build the table, so next I did DROP 
TABLE, and I got this:

NOTICE:  Buffer Leak: [004] (freeNext=-3, freePrev=-3, 
relname=place_pid, blockNum=1, flags=0xc, refcount=2 -1)
NOTICE:  Buffer Leak: [005] (freeNext=-3, freePrev=-3, relname=place, 
blockNum=231, flags=0xc, refcount=4 -1)
NOTICE:  Buffer Leak: [008] (freeNext=-3, freePrev=-3, relname=place, 
blockNum=85, flags=0xc, refcount=3 -1)
NOTICE:  Buffer Leak: [011] (freeNext=-3, freePrev=-3, relname=place, 
blockNum=0, flags=0xc, refcount=2 -1)
.
.

repeating about 20 more times with blockNum varying, and finally ending 
with the table being dropped.  I then re-built the table from scratch 
and everything seems to be fine now, VACUUM no longer gives errors.

Here is my system configuration:

PostgreSQL 7.0.2
RedHat Linux 7.0
P-III 800, 768MB RAM, 80GB disk

So I'm wondering what happened here, what might have caused the original 
error, if something else is potentially still corrupted, and if I should 
maybe re-build the whole database to be safe?

Advice much appreciated!

Bob Smith
Hammett  Edison, Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [ADMIN] Errors on VACUUM

2002-03-15 Thread Tom Lane

Bob Smith [EMAIL PROTECTED] writes:
 PostgreSQL 7.0.2

Time to update ...

I'm rather surprised that restarting the postmaster didn't make the
error go away, but it's unlikely that anyone will care to investigate
--- unless you can reproduce the problem in 7.1.3 or later.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [ADMIN] Errors on VACUUM

2002-03-15 Thread Bob Smith


On Friday, March 15, 2002, at 09:53 , Tom Lane wrote:

 Bob Smith [EMAIL PROTECTED] writes:
 PostgreSQL 7.0.2

 Time to update ...

I know, I know...  But I'm about two weeks from bringing a whole new 
server on-line with the latest version of Postgres, so I don't want to 
hassle with upgrading the old one.  I just need to keep it limping along 
for a little while longer.

 I'm rather surprised that restarting the postmaster didn't make the
 error go away, but it's unlikely that anyone will care to investigate
 --- unless you can reproduce the problem in 7.1.3 or later.

So this isn't an error that would be caused by some sort of file 
problem?  The server got new hard drives a few months ago and files went 
through a backup/restore with tar, I didn't use pg_dump/pg_restore, so 
it occurred to me that something might have been scrambled.  But until 
this VACUUM failure, the only problem I've _ever_ had, before or after 
the drive upgrade, was one occurrence of the Buffer Leak error which I 
couldn't reproduce.

Well, I guess I just keep my fingers crossed for a couple of weeks until 
I have my new server up.

Thanks!

Bob Smith
Hammett  Edison, Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [ADMIN] Errors on VACUUM

2002-03-15 Thread Tom Lane

Bob Smith [EMAIL PROTECTED] writes:
 I'm rather surprised that restarting the postmaster didn't make the
 error go away, but it's unlikely that anyone will care to investigate
 --- unless you can reproduce the problem in 7.1.3 or later.

 So this isn't an error that would be caused by some sort of file 
 problem?

AFAICT what you have there is just dangling buffer reference counts;
ie, something didn't release its reference to a disk page buffer in
shared memory.  We used to have quite a number of bugs of that ilk,
though I've not heard of such lately.  They're fairly harmless
normally.

It's not clear to me why a restart (which would naturally clear shared
memory) didn't fix the problem.  Perhaps you had some configuration
on disk that tickled the original buffer-leak bug again each time you
restarted; but I dunno what.  VACUUM itself was never prone to such
errors, so it's hard to see how a restart and immediate VACUUM would
show the problem.  Were you doing other stuff in between?  Have you
got any interesting stuff (like functional indexes, maybe) on this
table?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [ADMIN] Errors on VACUUM

2002-03-15 Thread Bob Smith


On Friday, March 15, 2002, at 03:07 , Tom Lane wrote:

 It's not clear to me why a restart (which would naturally clear shared
 memory) didn't fix the problem.

Er, um, (humble look), I just studied my syslog, and in fact the 
postmaster _didn't_ restart.  I tried to restart it using the Red Hat 
control script from /etc/rc.d/init.d,  but apparently that script lied 
to me when it said the restart succeeded.  The syslog clearly shows the 
postmaster didn't actually restart until about 10 minutes later, _after_ 
the second error.  So I was talking to the same postmaster every time.  
Arrggh.

I'm reassured now that this is just a passing memory management issue 
and not a symptom of a more serious problem, I'm sure the error wouldn't 
have happened the second time if the postmaster had in fact restarted.  
Everything looks fine now so I'm going to stop worrying about it.

Thanks much for your help!

Bob Smith
Hammett  Edison, Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster