On Sat, Dec 10, 2011 at 12:29 PM, Greg Smith <g...@2ndquadrant.com> wrote:

> "We can send regular special messages from WALSender to WALReceiver that do
> not form part of the WAL stream, so we don't bulk
> up WAL archives. (i.e. don't use "w" messages)."
>
> Here's my understanding of how this would work.

Let me explain a little more and provide a very partial patch.

We define a new replication protocol message 'k' which sends a
keepalive from primary to standby when there is no WAL to send. The
message does not form part of the WAL stream so does not bloat WAL
files, nor cause them to fill when unattended.

Each keepalive contains the current end of WAL and the current timestamp.
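
As a rough illustration of the layout described above (one type byte 'k', then 8 bytes for the end of WAL, then 8 bytes for the send time), here is a minimal sketch in C. The names SketchRecPtr, SketchTimestamp, keepalive_encode and keepalive_decode are hypothetical and not part of the patch, and copying the fields in host byte order is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte stand-ins for XLogRecPtr and TimestampTz. */
typedef uint64_t SketchRecPtr;
typedef int64_t  SketchTimestamp;

/* Byte1('k') + Byte8 (end of WAL) + Byte8 (send time) */
#define KEEPALIVE_MSG_LEN (1 + 8 + 8)

/* Write a keepalive into buf; returns the number of bytes written. */
static size_t
keepalive_encode(unsigned char *buf, size_t buflen,
                 SketchRecPtr walEnd, SketchTimestamp sendTime)
{
    assert(buflen >= KEEPALIVE_MSG_LEN);
    buf[0] = 'k';
    /* Assumption: fields are copied in host byte order. */
    memcpy(buf + 1, &walEnd, 8);
    memcpy(buf + 9, &sendTime, 8);
    return KEEPALIVE_MSG_LEN;
}

/* Returns 1 and fills the outputs if buf holds a keepalive, else 0. */
static int
keepalive_decode(const unsigned char *buf, size_t len,
                 SketchRecPtr *walEnd, SketchTimestamp *sendTime)
{
    if (len != KEEPALIVE_MSG_LEN || buf[0] != 'k')
        return 0;
    memcpy(walEnd, buf + 1, 8);
    memcpy(sendTime, buf + 9, 8);
    return 1;
}
```

Because the message type byte differs from 'w', a receiver can tell keepalives apart from WAL data before looking at the rest of the payload.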

Nearly all keepalive processing is done on the standby, and there is no
overhead on a primary that does not use replication. A primary that does
replicate incurs a slight overhead for sending keepalives, but only when
there are no writes. On the standby we already update shared state
when we receive data, so there is not much else to do there.

When the standby has applied up to the end of WAL, the replication
delay is the keepalive's receipt time minus its send time.
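
The caught-up case can be sketched as below. This is only an illustration: caught_up_delay and the two typedefs are hypothetical names, timestamps are assumed to be microsecond counts, and the result includes any skew between the primary's and the standby's clocks because the two times come from different machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TimestampTz. */
typedef uint64_t SketchRecPtr;
typedef int64_t  SketchTimestamp;   /* assumed to be microseconds */

/*
 * If the standby has applied everything up to the keepalive's end of WAL,
 * store (receipt time - send time) in *delay and return 1.
 * Otherwise leave *delay untouched and return 0.
 */
static int
caught_up_delay(SketchRecPtr appliedPtr, SketchRecPtr walEnd,
                SketchTimestamp sendTime, SketchTimestamp receiptTime,
                SketchTimestamp *delay)
{
    assert(delay != NULL);
    if (appliedPtr < walEnd)
        return 0;               /* still behind: use packet records instead */
    *delay = receiptTime - sendTime;
    return 1;
}
```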

When the standby receives a data packet it records the packet's WAL
pointer and the time of receipt. As the standby applies each chunk it
removes the record for each data packet applied and sets the last
applied timestamp.

If the standby falls behind, the number of data packet records will build
up, so we begin to keep a record only every 2 packets, then every 4
packets, and so on. So the further the standby falls behind, the less
accurately we record the replication delay, though the accuracy remains
proportional to the delay.
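
One way the record keeping and progressive filtering could look is sketched below. Everything here is an assumption for illustration rather than the patch's code: the DelayTracker structure, the fixed cap MAX_RECORDS, halving the list when it fills, and resetting the stride once the standby has caught up.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RECORDS 8           /* hypothetical cap on stored records */

typedef struct
{
    uint64_t walPtr;            /* WAL position of the data packet */
    int64_t  recvTime;          /* time the packet was received */
} PacketRecord;

typedef struct
{
    PacketRecord rec[MAX_RECORDS];  /* oldest first */
    int      n;                     /* records currently held */
    unsigned stride;                /* keep one record per 'stride' packets */
    unsigned seen;                  /* packets since the last kept record */
    int64_t  lastAppliedTime;       /* receipt time of last applied record */
} DelayTracker;

static void
tracker_init(DelayTracker *t)
{
    t->n = 0;
    t->stride = 1;
    t->seen = 0;
    t->lastAppliedTime = 0;
}

/* Called for each data packet received. */
static void
tracker_on_receive(DelayTracker *t, uint64_t walPtr, int64_t now)
{
    if (++t->seen < t->stride)
        return;                     /* filtered out at the current stride */

    if (t->n == MAX_RECORDS)
    {
        /* Full: keep every second record and double the stride. */
        int i, j = 0;

        for (i = 1; i < t->n; i += 2)
            t->rec[j++] = t->rec[i];
        t->n = j;
        t->stride *= 2;
        return;                     /* this packet is halfway to the next slot */
    }

    t->seen = 0;
    t->rec[t->n].walPtr = walPtr;
    t->rec[t->n].recvTime = now;
    t->n++;
}

/* Called after WAL has been applied up to appliedPtr. */
static void
tracker_on_apply(DelayTracker *t, uint64_t appliedPtr)
{
    int i, done = 0;

    while (done < t->n && t->rec[done].walPtr <= appliedPtr)
        done++;
    if (done == 0)
        return;

    t->lastAppliedTime = t->rec[done - 1].recvTime;
    for (i = done; i < t->n; i++)
        t->rec[i - done] = t->rec[i];
    t->n -= done;

    if (t->n == 0)
    {
        /* Assumption: return to full resolution once caught up. */
        t->stride = 1;
        t->seen = 0;
    }
}
```

With this scheme the memory used stays bounded, and the spacing between kept records grows as the backlog grows, which is what keeps the error roughly proportional to the delay.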

To complete the patch I need to:
* send the keepalive messages when no WAL is outstanding
* receive the messages on the standby
* store timestamp info for data packets and keepalives
* progressively filter the stored records if we get too many

I will be working on this patch some more this week.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index d6332e5..71c40cc 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1467,6 +1467,54 @@ The commands accepted in walsender mode are:
       <variablelist>
       <varlistentry>
       <term>
+          Primary keepalive message (B)
+      </term>
+      <listitem>
+      <para>
+      <variablelist>
+      <varlistentry>
+      <term>
+          Byte1('k')
+      </term>
+      <listitem>
+      <para>
+          Identifies the message as a sender keepalive.
+      </para>
+      </listitem>
+      </varlistentry>
+      <varlistentry>
+      <term>
+          Byte8
+      </term>
+      <listitem>
+      <para>
+          The current end of WAL on the server, given in
+          XLogRecPtr format.
+      </para>
+      </listitem>
+      </varlistentry>
+      <varlistentry>
+      <term>
+          Byte8
+      </term>
+      <listitem>
+      <para>
+          The server's system clock at the time of transmission,
+          given in TimestampTz format.
+      </para>
+      </listitem>
+      </varlistentry>
+      </variablelist>
+      </para>
+      </listitem>
+      </varlistentry>
+      </variablelist>
+     </para>
+
+     <para>
+      <variablelist>
+      <varlistentry>
+      <term>
           Standby status update (F)
       </term>
       <listitem>
diff --git a/src/include/replication/walprotocol.h b/src/include/replication/walprotocol.h
index 656c8fc..1c73d35 100644
--- a/src/include/replication/walprotocol.h
+++ b/src/include/replication/walprotocol.h
@@ -40,6 +40,21 @@ typedef struct
 } WalDataMessageHeader;
 
 /*
+ * Keepalive message from primary (message type 'k', lowercase).
+ * This is wrapped within a CopyData message at the FE/BE protocol level.
+ *
+ * Note that the data length is not specified here.
+ */
+typedef struct
+{
+	/* Current end of WAL on the sender */
+	XLogRecPtr	walEnd;
+
+	/* Sender's system clock at the time of transmission */
+	TimestampTz sendTime;
+} PrimaryKeepaliveMessage;
+
+/*
  * Reply message from standby (message type 'r').  This is wrapped within
  * a CopyData message at the FE/BE protocol level.
  *