Hi Alexey,

thanks for your feedback; these are interesting points.

Alexey Klyukin wrote:
> In Replicator we avoided the need for the postmaster to read or write
> a backend's shmem data by using it as a signal forwarder. When a
> backend wants to inform a special process (i.e. the queue monitor)
> about a replication-related event (such as a commit), it sends
> SIGUSR1 to the postmaster with a related "reason" flag, and the
> postmaster, upon receiving this signal, forwards it to the
> destination process. Termination of backends and special processes
> is handled by the postmaster itself.
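
(If I understand that correctly, the scheme looks roughly like the
following sketch. All names are invented for illustration; this is not
the actual Replicator source:)

    #include <signal.h>
    #include <sys/types.h>

    /* Sketch of the forwarding idea, invented names throughout. */
    typedef enum { REASON_COMMIT, REASON_ABORT, NUM_REASONS } NotifyReason;

    /* lives in shared memory, one flag per reason */
    static volatile sig_atomic_t *notify_flags;

    static pid_t postmaster_pid;     /* known to every backend */
    static pid_t queue_monitor_pid;  /* known to the postmaster */

    /* backend side: set the reason flag, then poke the postmaster */
    static void
    notify_special_process(NotifyReason reason)
    {
        notify_flags[reason] = 1;
        kill(postmaster_pid, SIGUSR1);
    }

    /* postmaster side, in its SIGUSR1 handler: forward the signal
     * to the destination process for each flagged reason */
    static void
    sigusr1_handler(int signo)
    {
        (void) signo;
        if (notify_flags[REASON_COMMIT])
        {
            notify_flags[REASON_COMMIT] = 0;
            kill(queue_monitor_pid, SIGUSR1);
        }
    }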

Hm.. how about larger data chunks, like change sets? In Postgres-R, those need to travel between the backends and the replication manager, which then sends them on to the GCS.

> Hm... what would happen to new data under heavy load, when the queue
> eventually fills up with messages? Would the relevant transactions be
> aborted, or would they wait for the manager to release the queue
> space occupied by already processed messages? ISTM that having a
> fixed-size buffer limits the maximum transaction rate.

That's why the replication manager is a very simple forwarder: it does not block on messages, but consumes them from shared memory immediately. It already features a message cache, which holds messages it cannot currently forward to a backend because all backends are busy.
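
(Conceptually, the manager's handling of each incoming message is just
the following; a simplified sketch with invented names, not the actual
Postgres-R code:)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Simplified sketch, invented names. */
    typedef struct Message
    {
        size_t len;
        char   data[1];          /* variable-length payload */
    } Message;

    typedef struct CacheEntry
    {
        struct CacheEntry *next;
        Message           *msg;
    } CacheEntry;

    static CacheEntry *cache_head, *cache_tail;

    /* assumed to exist elsewhere in this sketch */
    extern bool idle_helper_available(void);
    extern void forward_to_idle_helper(Message *msg);
    extern void release_shmem_slot(Message *msg);

    /* Consume a message from shared memory immediately: forward it
     * to an idle helper backend if possible, otherwise copy it into
     * the manager's local cache.  Either way, the shmem slot is
     * released right away, so senders are never blocked for long. */
    static void
    consume_message(Message *shmem_msg)
    {
        if (idle_helper_available())
            forward_to_idle_helper(shmem_msg);
        else
        {
            size_t      sz = offsetof(Message, data) + shmem_msg->len;
            Message    *copy = malloc(sz);
            CacheEntry *e = malloc(sizeof(CacheEntry));

            memcpy(copy, shmem_msg, sz);
            e->next = NULL;
            e->msg = copy;
            if (cache_tail)
                cache_tail->next = e;
            else
                cache_head = e;
            cache_tail = e;
        }
        release_shmem_slot(shmem_msg);
    }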

And it takes care to send change sets only to helper backends which are not busy and can process the remote transaction immediately. That way, I don't think the limit on shared memory is the bottleneck. However, I haven't measured it.

WRT waiting vs. aborting: I don't think I handle this situation gracefully at the moment; I've never encountered it. ;-) But the simpler option is letting the sender wait until there is enough room in the queue for its message. To avoid deadlocks, each process should consume its own pending messages before trying to send one. (Which is currently done correctly only for the replication manager, not for the backends, IIRC.)
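
(i.e. something along these lines; again just a sketch with invented
names:)

    #include <stdbool.h>

    typedef struct Queue Queue;
    typedef struct Message Message;

    /* assumed to exist elsewhere in this sketch */
    extern void consume_pending_messages(void);
    extern bool try_enqueue(Queue *q, Message *msg);  /* false if full */
    extern void wait_for_queue_space(Queue *q);       /* sleeps */

    /* The key point: drain our own inbox before blocking on a send,
     * so that two processes which send to each other cannot deadlock
     * on full queues. */
    static void
    send_message_blocking(Queue *out, Message *msg)
    {
        for (;;)
        {
            consume_pending_messages();  /* never block with a full inbox */
            if (try_enqueue(out, msg))
                return;
            wait_for_queue_space(out);
        }
    }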

> What about keeping the per-process message queue in the local memory
> of the process, and exporting only the queue head to the shmem, thus
> having only one message per process there?

The replication manager already does just that with its cache. No other process needs to send messages large enough that they cannot be consumed immediately, so such a local cache would not make much sense for any other process.

Even for the replication manager, I find it dubious to require such a cache, because it introduces unnecessary copying of data within memory.

> When the queue manager gets a message from the process, it may signal
> that process to copy the next message from its local memory into the
> shmem. To keep a correct ordering of queue messages, an additional
> shared memory queue of pid_t can be maintained, containing one pid
> per message.
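
(If I read the proposal correctly, it would look roughly like this;
all names invented for illustration:)

    #include <signal.h>
    #include <sys/types.h>

    /* Sketch of the proposed scheme: one exported message slot per
     * process, plus a shared ring of pids that records the global
     * ordering of pending messages. */
    #define MAX_PROCS   64
    #define RING_DEPTH 256

    typedef struct Message Message;

    typedef struct
    {
        Message *head[MAX_PROCS];    /* one exported message per process */
        pid_t    order[RING_DEPTH];  /* sender pids, in arrival order */
        int      read_pos;
        int      write_pos;
    } SharedQueue;

    /* assumed to exist elsewhere: maps a pid to its head[] index */
    extern int slot_for_pid(pid_t pid);

    /* queue manager side: take the globally oldest message, then
     * signal its sender to export its next message into shmem */
    static Message *
    dequeue_next(SharedQueue *q)
    {
        pid_t    sender = q->order[q->read_pos++ % RING_DEPTH];
        int      slot = slot_for_pid(sender);
        Message *msg = q->head[slot];

        q->head[slot] = NULL;
        kill(sender, SIGUSR1);  /* "please copy your next message in" */
        return msg;
    }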

The replication manager takes care of the ordering for cached messages.

Regards

Markus Wanner

