On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Andrew Hammond" <[EMAIL PROTECTED]> writes:
>> On Thu, Jul 3, 2008 at 2:35 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> The whole thing is pretty mystifying, especially the ENOSPC write
>>> failure on what seems like it couldn't have been a full disk.
>
>> Yes, I've passed along the task of explaining why PG thought the disk
>> was full to the sysadmin responsible for the box. I'll post the answer
>> here, when and if we have one.
>
> I just noticed something even more mystifying: you said that the ENOSPC
> error occurred once a day during vacuuming.

Actually, the ENOSPC happened once. After that first error, we got

vacuumdb: vacuuming of database "adecndb" failed: ERROR: failed to re-find parent key in "ledgerdetail_2008_03_idx2" for deletion target page 64767

repeatedly.

> That doesn't make any
> sense, because a write error would leave the shared buffer still marked
> dirty, and so the next checkpoint would try to write it again. If
> there's a persistent write error on a particular block, you should see
> it being complained of at least once per checkpoint interval.
>
> If you didn't see that, it suggests that the ENOSPC was transient,
> which isn't unreasonable --- but why would it recur for the exact
> same block each night?
>
> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)? I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests. Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...

I will pass this along to the sysadmin in charge of this box.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers