On Fri, Aug 27, 2021 at 4:10 PM Andres Freund <and...@anarazel.de> wrote: > On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote: > > On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <and...@anarazel.de> wrote: > > > I wonder if we could improve the situation somewhat by using the > > > non-blocking > > > pqcomm functions in a few select places. E.g. if elog.c's > > > send_message_to_frontend() sent its message via a new > > > pq_endmessage_noblock() > > > (which'd use the existing pq_putmessage_noblock()) and used > > > pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending > > > to the > > > client before AbortCurrentTransaction(), b) able to queue further error > > > messages safely. > > > > pq_flush_if_writable() could succeed in sending only part of the data, > > so I don't see how this works. > > All the data is buffered though, so I don't see that problem that causes?
OK, I guess I'm confused here. If we're always buffering the data, then I suppose there's no risk of injecting a protocol message into the middle of some other protocol message, assuming that we don't have a non-local transfer of control halfway through putting a message in the buffer. But there's still the risk of getting out of step with the client. Suppose the client does SELECT 1/0 and we send an ErrorResponse complaining about the division by zero. But as we're trying to send that response, we block. Later, a query cancel occurs. We can't queue another ErrorResponse because the client will interpret that as the response to the next query, since the division by zero error is the response to the current one. I don't think that changing pq_flush() to pq_flush_if_writable() in elog.c or anywhere else can fix that problem. But that doesn't mean that it isn't a good idea. Any place where we're doing a pq_flush() and know that another one will happen soon afterward, before we wait for data from the client, can be changed to pq_flush_if_writable() without harm, and it's beneficial to do so, because like you say, it avoids blocking in places that users may find inconvenient - e.g. while holding locks, as Fujii-san said. The comment here claims that "postgres.c will flush out waiting data when control returns to the main loop" but the only pq_flush() call that's directly present in postgres.c is in response to receiving a Flush message, so I suppose this is actually talking about the pq_flush() inside ReadyForQuery. It's not 100% clear to me that we do that in all relevant cases, though. Suppose we hit an error while processing some protocol message that does not set send_ready_for_query = true, like for example Describe ('D'). I think in that case the flush in elog.c is the only one. Perhaps we ought to change postgres.c so that if we don't enter the block guarded by "if (send_ready_for_query)" we instead pq_flush(). -- Robert Haas EDB: http://www.enterprisedb.com