On Wed, Jun 25, 2008 at 2:58 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Andrew Hammond wrote: > >> I found this error message in my log files repeatedly: >> >> Error: failed to re-find parent key in "ledgerdetail_2008_03_idx2" for >> deletion target page 64767 >> >> I though "hmm, that index looks broken. I'd better re-create it." So, I >> dropped the index and then tried to create a new one to replace it. Which >> completely locked up the backend that was running the CREATE TABLE. I ran >> truss against the backend in question and it didn't register anything >> (except signals 2 and 15 when I tried to cancel the query and kill the >> backend respectively). I eventually had to restart the database to get the >> CREATE INDEX process to go away (well, to release that big nasty lock). >> > > What kind of an index is it? Does "SELECT COUNT(*) from <table>" work? After the restart I did a count(*) and it worked. A little under 13m rows. So, sequential scans seem to work. > posting here in case there's interest in gathering some forensic data or a >> clever suggetion about how I can recover this situation or even some ideas >> about what's causing it. >> > Anyway, the current plan is to drop the table and reload it from backup. > I'm > > Yes, please take a filesystem-level backup right away to retain the > evidence. Well, I've already burned our downtime allowance for this month, but we do a regular PITR type backup which hopefully will be sufficient to replicate the problem. > Could you connect to the hung backend with gdb and get a stacktrace? The backend is no longer hung (two restarts later). I'll try to reproduce this problem on my workstation (same binary, same OS, libraries etc) using the PITR dump. Andrew