On Fri, Jan 14, 2022 at 7:30 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Fri, Jan 14, 2022 at 4:35 PM Andres Freund <and...@anarazel.de> wrote:
> > The more I think about it, the less I see why we *ever* need to re-arm the
> > latch in pq_check_connection() in this approach. pq_check_connection() is
> > only used from ProcessInterrupts(), and there are plenty of things inside
> > ProcessInterrupts() that can cause latches to be reset (e.g. parallel
> > message processing causing log messages to be sent to the client, causing
> > network IO, which obviously can do a latch reset).
>
> Thanks for the detailed explanation.  I guess I was being overly
> cautious and a little myopic, "leave things exactly the way you found
> them", so I didn't have to think about any of that.  I see now that
> the scenario I was worrying about would be a bug in whatever
> latch-wait loop happens to reach this code.  Alright then, here is
> just... one... more... patch, this time consuming any latch that gets
> in the way and retrying, with no restore.

And pushed.

My excuse for taking so long to get this into the tree is that it was
tedious to retest this thing across so many OSes and determine that it
really does behave reliably for killed processes AND lost
processes/yanked cables/keepalive timeout, even with buffered data.
In the process I learned a bit more about TCP and got POLLRDHUP added
to FreeBSD (not that it matters for PostgreSQL 15, now that we can use
EV_EOF).  As for the FD_CLOSE behaviour I thought I saw on Windows
upthread: it was a mirage, caused by the RST thing.  There may be some
other way to implement this feature on that TCP implementation, but I
don't know what it is.
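
For anyone following along, here is a minimal standalone sketch of the kind
of check discussed above: asking the kernel whether the peer has gone away
without reading any of the buffered data. It is only an illustration, not
the PostgreSQL code. The helper name peer_has_hung_up() is hypothetical, and
it assumes Linux, where POLLRDHUP is exposed by defining _GNU_SOURCE.

```c
#define _GNU_SOURCE             /* assumption: Linux/glibc, exposes POLLRDHUP */
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code).
 *
 * Returns 1 if the peer of a stream socket has closed or shut down its
 * sending side, 0 if it has not, and -1 if poll() itself failed.  Nothing
 * is read from the socket, so any buffered data stays where it is.
 */
static int
peer_has_hung_up(int fd)
{
    struct pollfd pfd;
    int         rc;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLRDHUP;     /* POLLHUP and POLLERR are always reported */
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);      /* timeout 0: just look, never block */
    if (rc < 0)
        return -1;

    return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR)) != 0;
}
```

Called on one end of a socketpair(), this should report 0 while the other
end is open (even with unread data queued) and 1 once the other end has been
closed. It says nothing about the lost-peer cases, which only show up after
a keepalive or retransmission timeout.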

