Hi, Thomas!

There are two variants of the patch now.

1) As for the first workaround:

On 31.01.2023 07:09, Thomas Munro wrote:

Maybe it's unlikely that two samples will match while running that
torture test, because it's overwriting the file as fast as it can.
But the idea is that a real system isn't writing the control file most
of the time.

........
Yeah, I was thinking that we should also put a limit on the loop, just
to be cautious.

At first i didn’t understand that the equality condition with the previous
calculated crc and the current one at the second+ attempts was intended
for the case when the pg_control file is really corrupted.

Indeed, by making a few debugging variables and running the tortue test,
i found that there were ~4000 crc errors (~0.0003%) in ~125 million reads from 
the file,
and there was no case when the crc error appeared twice in a row.
So the second and moreover the third successive occurrence of an crc error
can be neglected and for this workaround seems a simpler and maybe more clear
algorithm is possible.
For instance:

for(try = 0 ; try < 3; try++)
{
   open, read from and close pg_control;
   calculate crc;

   *crc_ok_p = EQ_CRC32C(crc, ControlFile->crc);

   if(*crc_ok_p)
      break;
}

2) As for the second variant of the patch with POSIX locks:

On 31.01.2023 14:38, Thomas Munro wrote:
On Tue, Jan 31, 2023 at 5:09 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
Clearly there is an element of speculation or superstition here.  I
don't know what else to do if both PostgreSQL and ext4 decided not to
add interlocking.  Maybe we should rethink that.  How bad would it
really be if control file access used POSIX file locking?  I mean, the
writer is going to *fsync* the file, so it's not like one more wafer
thin system call is going to hurt too much.

Here's an experimental patch for that alternative.  I wonder if
someone would want to be able to turn it off for some reason -- maybe
some NFS problem?  It's less back-patchable, but maybe more
principled?

It looks very strange to me that there may be cases where the cluster data
is stored in NFS. Can't figure out how this can be useful.

i think this variant of the patch is a normal solution
of the problem, not workaround. Found no problems on Linux.
+1 for this variant.

Might add a custom error message for EDEADLK
since it absent in errcode_for_file_access()?

I don't know if Windows suffers from this type of problem.
Unfortunately its equivalent functionality LockFile() looks non-ideal
for this purpose: if your program crashes, the documentation is very
vague on when exactly it is released by the OS, but it's not
immediately on process exit.  That seems non-ideal for a control file
you might want to access again very soon after a crash, to be able to
recover.

Unfortunately i've not had time to reproduce the problem and apply this patch on
Windows yet but i'm going to do it soon on windows 10. If a crc error
will occur there, then we might use the workaround from the first
variant of the patch.

A thought in passing: if UpdateMinRecoveryPoint() performance is an
issue, maybe we should figure out how to use fdatasync() instead of
fsync().

May be choose it in accordance with GUC wal_sync_method?


Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply via email to