Re: [HACKERS] Causal reads take II

Dmitry Dolgov Sat, 29 Jul 2017 12:08:50 -0700

> On 12 July 2017 at 23:45, Thomas Munro <[email protected]>
wrote:
> I renamed it to "synchronous replay", because "causal reads" seemed a bit
too
> arcane.


Hi

I looked through the code of `synchronous-replay-v1.patch` a bit and ran a
few
tests. I didn't manage to break anything, except one mysterious error that
I've
got only once on one of my replicas, but I couldn't reproduce it yet.
Interesting thing is that this error did not affect another replica or
primary.
Just in case here is the log for this error (maybe you can see something
obvious, that I've not noticed):

```
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211":
Directory not empty
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
LOG:  directories for tablespace 47733 could not be removed
HINT:  You can remove the directories manually if necessary.
CONTEXT:  WAL redo at 0/125F4D90 for Tablespace/DROP: 47733
FATAL:  could not create directory "pg_tblspc/47734/PG_10_201707211/47732":
File exists
CONTEXT:  WAL redo at 0/125F5768 for Storage/CREATE:
pg_tblspc/47734/PG_10_201707211/47732/47736
LOG:  startup process (PID 8034) exited with exit code 1
LOG:  terminating any other active server processes
LOG:  database system is shut down
```

And speaking about the code, so far I have just a few notes (some of them
merely questions):

* In general the idea behind this patch sounds interesting for me, but it
  relies heavily on time synchronization. As mentioned in the documentation:
  "Current hardware clocks, NTP implementations and public time servers are
  unlikely to allow the system clocks to differ more than tens or hundreds
of
  milliseconds, and systems synchronized with dedicated local time servers
may
  be considerably more accurate." But as far as I remember from my own
  experience sometimes it maybe not that trivial on something like AWS
because
  of virtualization. Maybe it's an unreasonable fear, but is it possible to
  address this problem somehow?

* Also I noticed that some time-related values are hardcoded (e.g. 50%/25%
  time shift when we're dealing with leases). Does it make sense to move
them
  out and make them configurable?

* Judging from the `SyncReplayPotentialStandby` function, it's possible to
have
  `synchronous_replay_standby_names = "*, server_name"`, which is basically
an
  equivalent for just `*`, but it looks confusing. Is it worth it to prevent
  this behaviour?

* In the same function `SyncReplayPotentialStandby` there is this code:

```
if (!SplitIdentifierString(rawstring, ',', &elemlist))
{
/* syntax error in list */
pfree(rawstring);
list_free(elemlist);
/* GUC machinery will have already complained - no need to do again */
return false;
}
```

  Am I right that ideally this (a situation when at this point in the code
  `synchronous_replay_standby_names` has incorrect value) should not happen,
  because GUC will prevent us from that? If yes, then it looks for me that
it
  still makes sense to put here a log message, just to give more
information in
  a potentially weird situation.

* In the function `SyncRepReleaseWaiters` there is a commentary:

```
  /*
  * If the number of sync standbys is less than requested or we aren't
* managing a sync standby or a standby in synchronous replay state that
* blocks then just leave.
  * /
if ((!got_recptr || !am_sync) && !walsender_sr_blocker)
```

  Is this commentary correct? If I understand everything right
`!got_recptr` -
  the number of sync standbys is less than requested (a), `!am_sync` - we
aren't
  managing a sync standby (b), `walsender_sr_blocker` - a standby in
synchronous
  replay state that blocks (c). Looks like condition is `(a or b) and not
c`.

* In the function `ProcessStandbyReplyMessage` there is a code that
implements
  this:

```
     * If the standby reports that it has fully replayed the WAL for at
least
* 10 seconds, then let's clear the lag times that were measured when it
* last wrote/flushed/applied a WAL record.  This way we avoid displaying
* stale lag data until more WAL traffic arrives.
```
  but I never found any mention of this 10 seconds in the documentation. Is
it
  not that important? Also, question 2 is related to this one.

* In the function `WalSndGetSyncReplayStateString` all the states are in
lower
  case except `UNKNOWN`, is there any particular reason for that?

There are also few more not that important notes (mostly about some typos
and
few confusing names), but I'm going to do another round of review and
testing
anyway so I'll just send them all next time.

Re: [HACKERS] Causal reads take II

Reply via email to