On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek 
> <petr.jeli...@2ndquadrant.com> wrote in 
> <e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com>
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine if to send keepalive message. I also added CHECK_INTERRUPRS
>> call to fast code path because otherwise walsender might ignore them for
>> too long on large transactions.
>>
>> Thoughts?
> 
> +     TimestampTz     now = GetCurrentTimestamp();
> 
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that

Yeah that was my main worry for the patch as well, although given that
the loop does tuple processing it might not be very noticeable.

> category.)  If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
> 
> What do you think about this?
> 

I think we could do it that way.

> # I saw that SIGQUIT doens't work for active publisher, which I
> # think mention in another thread.

Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS();
in the fast-path which btw your updated patch is missing as well.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to