Let me explain this problem in more detail.

This problem is caused by the restartpoint creating an illegal WAL file during
archive recovery. However, I cannot work out why CreateRestartPoint() creates
the illegal WAL file. My attached patch is really plain…

In the problem case, the first check in XLogFileReadAnyTLI() does not get an
fd, because the proper WAL file does not exist in the archive directory.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_ARCHIVE)
>     {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>         if (fd != -1)
>         {
>             elog(DEBUG1, "got WAL segment from archive");
>             return fd;
>         }
>     }

Next, it searches for the WAL file in pg_xlog. There is an illegal WAL file in
pg_xlog, so it returns the illegal WAL file's fd.

XLogFileReadAnyTLI()
>      if (sources & XLOG_FROM_PG_XLOG)
>      {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
>         if (fd != -1)
>            return fd;
>      }

The returned fd becomes the value of readFile. Of course readFile is then
>= 0, so we break out of the for-loop.

XLogPageRead()
>     readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
>                                   sources);
>     switched_segment = true;
>     if (readFile >= 0)
>         break;

Next comes the point where the problem occurs: the illegal WAL file is read,
and an error is raised.

XLogPageRead()
>     if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>         goto next_record_is_invalid;


I think the point that Horiguchi discovered lies after this one.
We must fix CreateRestartPoint() so that it does not create an illegal WAL file.

Best regards,

--
Mitsumasa KONDO
