On Wed, Feb 2, 2022 at 6:38 AM Andres Freund <[email protected]> wrote:
> On 2022-02-01 18:02:34 +1300, Thomas Munro wrote:
> > 1. "pgsocket" could become a pointer to a heap-allocated wrapper
> > object containing { socket, event, flags } on Windows, or something
> > like that, but that seems a bit invasive and tangled up with public
> > APIs like libpq, which put me off trying that. I'm willing to explore
> > it if people object to my other idea.
>
> I'm not sure if the libpq aspect really is a problem. We're not going to have
> to do that conversion repeatedly, I think.

Alright, I'm prototyping that variant today.
> > Provide a way to get a callback when a socket is created or closed.
> >
> > XXX TODO handle callback failure
> > XXX TODO investigate overheads/other implications of having a callback
> > installed
>
> What do we need this for? I still don't understand what kind of reconnects we
> need to automatically detect.

libpq makes new sockets in various cases, such as when trying multiple
hosts/ports (the easiest case to test) or in some SSL and GSSAPI
cases.  In the model shown in the most recent patch, where a hash
table holds ExtraSocketState objects that libpq doesn't even know
about, we have to do SocketTableDrop(old socket) and
SocketTableAdd(new socket) at those times, which is why I introduced
that callback.
If we switch to the model where a socket is really a pointer to a
wrapper struct (which I'm about to prototype), the need for all that
bookkeeping goes away: no callbacks, no hash table.  But then libpq
has to participate knowingly in a socket-wrapping scheme to help the
backend, while also somehow providing an unwrapped SOCKET for client
API stability.  Trying some ideas; more on that soon.

> > +#if !defined(FRONTEND)
> > +struct ExtraSocketState
> > +{
> > +#ifdef WIN32
> > +	HANDLE		event_handle;	/* one event for the life of the socket */
> > +	int			flags;			/* most recent WSAEventSelect() flags */
> > +	bool		seen_fd_close;	/* has FD_CLOSE been received? */
> > +#else
> > +	int			dummy;			/* none of this is needed for Unix */
> > +#endif
> > +};
>
> Seems like we might want to track more readiness events than just close? If we
> e.g. started tracking whether we've seen writes blocking / write readiness,
> we could get rid of cruft like
>
> 	/*
> 	 * Windows does not guarantee to log an FD_WRITE network event
> 	 * indicating that more data can be sent unless the previous send()
> 	 * failed with WSAEWOULDBLOCK.  While our caller might well have made
> 	 * such a call, we cannot assume that here.  Therefore, if waiting for
> 	 * write-ready, force the issue by doing a dummy send().  If the dummy
> 	 * send() succeeds, assume that the socket is in fact write-ready, and
> 	 * return immediately.  Also, if it fails with something other than
> 	 * WSAEWOULDBLOCK, return a write-ready indication to let our caller
> 	 * deal with the error condition.
> 	 */
>
> that seems likely to just make bugs less likely, rather than actually fix
> them...

Yeah.  Unlike FD_CLOSE, FD_WRITE is a non-terminal condition, so it
would also need to be cleared, which requires intercepting all
send()/sendto()/write() calls with wrapper functions; but we already
do stuff like that.  Looking into it...