Hi,

What follows are some comments from trying to understand how the autovacuum launcher works, along with thoughts on how to apply the same approach to the replication manager in Postgres-R.

The initial comments in autovacuum.c say:

If the fork() call fails in the postmaster, it sets a flag in the shared
memory area, and sends a signal to the launcher.

I note that the shmem area that the postmaster is writing to is pretty static and not dependent on any other state stored in shmem. That certainly makes a difference compared to my imessages approach, where a corruption in the shmem for imessages could also confuse the postmaster.

Reading on, the 'can_launch' flag in the launcher's main loop makes sure that only one worker request is outstanding at a time, so that the launcher doesn't miss a failure or success notice from either the postmaster or the newly started worker. The replication manager currently shamelessly requests as many helper backends as it wants. I think I can change that without much trouble, and it would certainly make sense.
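To make the "one outstanding request" rule concrete, here is a minimal sketch of how the replication manager could gate its helper backend requests. All names (LauncherState, launcher_maybe_request, launcher_request_done) are hypothetical and are not taken from autovacuum.c or Postgres-R; signalling and shared memory are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical manager-side bookkeeping; illustrative names only. */
typedef struct LauncherState
{
    bool request_pending;  /* a start request was sent, no outcome seen yet */
    int  workers_wanted;   /* helper backends still to be requested */
    int  workers_running;  /* helpers that reported a successful start */
    int  fork_failures;    /* failed starts reported by the postmaster */
} LauncherState;

/*
 * One iteration of the main loop: request at most one helper, and only
 * if no earlier request is still waiting for its outcome.
 * Returns true if a request was issued.
 */
static bool
launcher_maybe_request(LauncherState *st)
{
    bool can_launch = !st->request_pending;

    if (!can_launch || st->workers_wanted == 0)
        return false;

    st->workers_wanted--;
    st->request_pending = true;
    return true;
}

/*
 * Called when either the postmaster reports a fork() failure
 * (started == false) or the new helper reports in (started == true).
 */
static void
launcher_request_done(LauncherState *st, bool started)
{
    assert(st->request_pending);
    st->request_pending = false;

    if (started)
        st->workers_running++;
    else
    {
        st->fork_failures++;
        st->workers_wanted++;  /* queue the helper for a later retry */
    }
}
```

Because there is never more than one request in flight, every failure or success notice can be attributed to exactly that request.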

What remains is notifying the replication manager when a helper backend terminates or crashes. On normal errors (i.e. elog(ERROR... ), the backend processes themselves should take care of notifying the replication manager. Crashes are more difficult. IMO the replication manager needs to stay alive during the reinitialization, to keep the GCS connection. It can easily detach from shared memory temporarily (the imessages stuff is the only shmem place it touches, IIRC). The harder part is that it must be able to tell whether a backend applied its transaction *before* it died or not. Thus, after all backends have been killed, the postmaster needs to hold off on reinitializing shared memory until the replication manager has consumed all its messages. (Otherwise we would risk "losing" local transactions, and probably remote ones as well.)
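The ordering constraint above can be sketched as a small gate on the postmaster side. This is only an illustration under assumptions: the names (CrashRecovery, backend_reaped, manager_ack_drained, may_reinit_shmem) are hypothetical, and the sketch assumes the replication manager acknowledges "drained" through some channel that does not require the postmaster to read the possibly corrupted message queue itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical postmaster-private state; never stored in shared memory. */
typedef struct CrashRecovery
{
    int  live_backends;    /* backends killed but not yet reaped */
    bool manager_drained;  /* manager confirmed it consumed all messages */
} CrashRecovery;

static void
crash_recovery_begin(CrashRecovery *cr, int backends)
{
    cr->live_backends = backends;
    cr->manager_drained = false;
}

/* The postmaster reaped one backend. */
static void
backend_reaped(CrashRecovery *cr)
{
    assert(cr->live_backends > 0);
    cr->live_backends--;
}

/*
 * The replication manager reports that its queue is empty.  The ack is
 * only accepted once no backend is left that could still add a message;
 * an earlier ack is rejected and has to be repeated.
 */
static bool
manager_ack_drained(CrashRecovery *cr)
{
    if (cr->live_backends > 0)
        return false;
    cr->manager_drained = true;
    return true;
}

/* Shared memory may be reinitialized only after both conditions hold. */
static bool
may_reinit_shmem(const CrashRecovery *cr)
{
    return cr->live_backends == 0 && cr->manager_drained;
}
```

Rejecting an early ack is a design choice of this sketch: it avoids the case where a dying backend posts its "transaction applied" message after the manager believed the queue was empty.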

So, yes, after thinking about it, detaching the postmaster from shared memory seems doable for Postgres-R (in the sense of "the postmaster does not rely on possibly corrupted data in shared memory"). Reinitialization needs some more thought, but in general that seems like the way to go.

Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
