Re: [HACKERS] Sync Rep for 2011CF1

Robert Haas Wed, 16 Feb 2011 09:30:18 -0800

On Wed, Feb 16, 2011 at 11:32 AM, Simon Riggs <[email protected]> wrote:
> On Wed, 2011-02-16 at 17:40 +0200, Heikki Linnakangas wrote:
>> On 16.02.2011 17:36, Simon Riggs wrote:
>> > On Tue, 2011-02-15 at 12:08 -0500, Robert Haas wrote:
>> >> On Mon, Feb 14, 2011 at 12:25 AM, Fujii Masao<[email protected]>  
>> >> wrote:
>> >>> On Fri, Feb 11, 2011 at 4:06 AM, Heikki Linnakangas
>> >>> <[email protected]>  wrote:
>> >>>> I added a XLogWalRcvSendReply() call into XLogWalRcvFlush() so that it 
>> >>>> also
>> >>>> sends a status update every time the WAL is flushed. If the walreceiver 
>> >>>> is
>> >>>> busy receiving and flushing, that would happen once per WAL segment, 
>> >>>> which
>> >>>> seems sensible.
>> >>>
>> >>> This change can make the callback function "WalRcvDie()" call 
>> >>> ereport(ERROR)
>> >>> via XLogWalRcvFlush(). This looks unsafe.
>> >>
>> >> Good catch.  Is the cleanest solution to pass a boolean parameter to
>> >> XLogWalRcvFlush() indicating whether we're in the midst of dying?
>> >
>> > Surely if you do this then sync rep will fail to respond correctly if
>> > WalReceiver dies.
>> >
>> > Why is it OK to write to disk, but not OK to reply?
>>
>> Because the connection might be dead. A broken connection is a likely
>> cause of walreceiver death.
>
> Would it not be possible to check that?


I'm not actually sure that it matters that much whether we do or not.
ISTM that the WAL receiver is normally going to exit the main loop (in
WalReceiverMain) right here:

        /* Process any requests or signals received recently */
        ProcessWalRcvInterrupts();

But to get to that point, we either have to be making our first pass
through the loop (in which case nothing interesting has happened yet)
or we have to have just completed an iteration through the loop (in
which case we just sent a reply).  I think that the only thing that
can have changed since the last reply is the replay position, which
this version of the sync rep patch doesn't care about anyway.  Even if
it did, I'm not sure it'd be worth complicating the die path to
squeeze in one final reply.

Actually, on further reflection, I'm not even sure why we bother with
the fsync.  It seems like a useful safeguard but I'm not seeing how we
can get to that point without having fsync'd everything anyway.  Am I
missing something?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Sync Rep for 2011CF1

Reply via email to