Hi,

Is anybody reading this thread?
This problem still seems to remain on REL9_3_STABLE.
Many users could hit it, so we should resolve it in the next release.

I think his patch is a reasonable fix for this problem.
Please check it again.

regards,
--------------------------
Tomonari Katsumata

2013/12/12 Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp>

> Hello, we happened to see a server crash during archive recovery
> under certain conditions.
>
> After the TLI has been incremented, it can happen that the WAL
> file for the older timeline is archived, but the file with the
> same segment id for the newer timeline is not. Archive recovery
> fails in that case with a PANIC error like the following,
>
> | PANIC: record with zero length at 0/1820D40
>
> Replay script is attached. This issue occurred on 9.4dev and 9.3.2,
> but not on 9.2.6 and 9.1.11. The latter search pg_xlog for the
> TLI before trying the archive for older TLIs.
>
> This occurs while fetching the checkpoint redo record in archive
> recovery.
>
> > if (checkPoint.redo < RecPtr)
> > {
> >     /* back up to find the record */
> >     record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);
>
> It is caused by the segment file for the older timeline in the
> archive directory being preferred over the one for the newer
> timeline in pg_xlog.
>
> Looking into pg_xlog before trying the older TLIs in the archive,
> as 9.2- does, fixes this issue. The attached patch is one possible
> solution for 9.4dev.
>
> Attached files are,
>
> - recvtest.sh: Replay script. Steps 1 and 2 set up the condition
>   and step 3 causes the issue.
>
> - archrecvfix_20131212.patch: The patch that fixes the issue. With
>   this patch, archive recovery reads pg_xlog before trying older
>   TLIs in the archive, similarly to 9.1-.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
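
To make the lookup order described in the quoted report easier to follow, here is a toy model in C. It is only a sketch based on the description above, not PostgreSQL code: every name in it (Choice, has_tli, pick_archive_first, pick_per_timeline) is made up for illustration, and each array simply lists the timelines for which a file of one given segment id exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in PostgreSQL. */

typedef enum { SRC_NONE, SRC_ARCHIVE, SRC_PG_XLOG } Source;

typedef struct {
    unsigned tli;  /* timeline of the chosen segment file, 0 if none */
    Source   src;  /* where the file was found */
} Choice;

/* Is a file for timeline `tli` (same segment id) present in `set`? */
static bool has_tli(const unsigned *set, size_t n, unsigned tli)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == tli)
            return true;
    return false;
}

/*
 * Order described as problematic: every timeline is tried in the
 * archive first, so an older-timeline file in the archive wins over
 * a newer-timeline file that exists only in pg_xlog.
 * `tlis` must be ordered newest first.
 */
static Choice pick_archive_first(const unsigned *tlis, size_t ntli,
                                 const unsigned *archive, size_t narch,
                                 const unsigned *pgxlog, size_t nxlog)
{
    Choice c = { 0, SRC_NONE };

    assert(tlis != NULL || ntli == 0);
    for (size_t i = 0; i < ntli; i++)
        if (has_tli(archive, narch, tlis[i])) {
            c.tli = tlis[i];
            c.src = SRC_ARCHIVE;
            return c;
        }
    for (size_t i = 0; i < ntli; i++)
        if (has_tli(pgxlog, nxlog, tlis[i])) {
            c.tli = tlis[i];
            c.src = SRC_PG_XLOG;
            return c;
        }
    return c;
}

/*
 * Order described as the fix: for each timeline, newest first, look
 * in the archive and then in pg_xlog before moving to an older one.
 */
static Choice pick_per_timeline(const unsigned *tlis, size_t ntli,
                                const unsigned *archive, size_t narch,
                                const unsigned *pgxlog, size_t nxlog)
{
    Choice c = { 0, SRC_NONE };

    assert(tlis != NULL || ntli == 0);
    for (size_t i = 0; i < ntli; i++) {
        if (has_tli(archive, narch, tlis[i])) {
            c.tli = tlis[i];
            c.src = SRC_ARCHIVE;
            return c;
        }
        if (has_tli(pgxlog, nxlog, tlis[i])) {
            c.tli = tlis[i];
            c.src = SRC_PG_XLOG;
            return c;
        }
    }
    return c;
}
```

With timelines {2, 1}, an archive that holds only the TLI 1 file and a pg_xlog that holds only the TLI 2 file, pick_archive_first() returns the TLI 1 file from the archive, while pick_per_timeline() returns the TLI 2 file from pg_xlog. That is the situation the report describes, under the assumptions of this model.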