On 08/26/2014 09:17 AM, Kyotaro HORIGUCHI wrote:
but I don't think we want to define the behavior as "usually,
pg_terminate_backend() will kill a backend that's blocked on sending
to the client, but sometimes you have to call it twice (or more!) to
really kill it".

I agree that it would be desirable to avoid that, if there is any way
to do so. But I think it's still better than resorting to kill -9,
which takes down all the innocent backends along with it.

A more robust way is to set ImmediateInterruptOK before calling
send(). That wouldn't let you send data that can be sent without
blocking though. For that, you could put the socket into non-blocking
mode, and sleep with select(), also waiting for the process' latch at
the same time (die() sets the latch, so that will wake up the select()
if a termination request arrives).
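A minimal sketch of that approach in plain POSIX C, not PostgreSQL code. It assumes a self-pipe as a stand-in for the process latch; latch_pipe, die_pending, set_die_and_latch(), set_nonblocking() and send_interruptible() are all hypothetical names. Inside the backend one would presumably use the real latch machinery instead (e.g. WaitLatchOrSocket(), as WalSndWriteData does).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the process latch: a self-pipe. Setting the
 * latch means writing one byte to latch_pipe[1], which makes latch_pipe[0]
 * readable and so wakes up a poll() that includes it. Create it with
 * pipe(latch_pipe) before use. */
static int latch_pipe[2] = {-1, -1};
static volatile sig_atomic_t die_pending = 0;

/* Roughly what a die()-style handler would do: record the termination
 * request, then set the latch. */
static void set_die_and_latch(void)
{
    die_pending = 1;
    if (write(latch_pipe[1], "x", 1) < 0)
    {
        /* ignored in this sketch; a real latch would use a non-blocking pipe */
    }
}

/* Put fd into non-blocking mode, so send() fails with EAGAIN/EWOULDBLOCK
 * instead of sleeping when the kernel send buffer is full. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Send all len bytes on a non-blocking socket, sleeping in poll() on both
 * the socket and the latch whenever the send buffer is full. Returns len
 * on success, or -1 on error or when a termination request is pending; in
 * the latter case part of buf may already have been sent. */
static ssize_t send_interruptible(int sock, const char *buf, size_t len)
{
    size_t sent = 0;

    assert(latch_pipe[0] >= 0);    /* the latch must be created first */
    while (sent < len)
    {
        if (die_pending)
            return -1;

        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n > 0)
        {
            sent += (size_t) n;    /* partial write: loop for the rest */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Buffer full: sleep until writable or the latch is set. */
        struct pollfd pfd[2] = {
            {.fd = sock, .events = POLLOUT},
            {.fd = latch_pipe[0], .events = POLLIN},
        };
        if (poll(pfd, 2, -1) < 0 && errno != EINTR)
            return -1;
    }
    return (ssize_t) sent;
}
```

Because the socket is non-blocking, it does not matter whether poll() or select() predicts writability accurately: if send() cannot make progress it returns EAGAIN and the loop simply sleeps again, waking early if the latch is set.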

I considered it, but select() frequently (in fact in most cases where
send() blocks because the send buffer is exhausted) fails to predict
that the following send() will block, if my memory is correct. So the
remaining problem would be a blocked send()...

My point was to put the socket in non-blocking mode, so that send() will return immediately with EAGAIN instead of blocking if the send buffer is full. See WalSndWriteData for how that would work; it does something similar.

Is it actually safe to process the die-interrupt where send() is
called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
attempt to send a message to the client. If that happens in the middle
of constructing some other message, that will violate the protocol.
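To make the protocol problem concrete: each message is framed as a type byte, a 4-byte length (counting itself) and the payload, so an error message emitted while another message is half-written corrupts the stream. Below is a minimal sketch of one possible guard, assuming an in-memory buffer in place of the socket and a simplified error payload (not the real ErrorResponse field layout); message_in_progress, start_message(), finish_message() and report_fatal_to_client() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory output buffer standing in for the client socket. */
#define OUTBUF_SIZE 1024
static unsigned char outbuf[OUTBUF_SIZE];
static size_t outlen = 0;

/* True between the start and the end of a message. Writing any other
 * message in that window would interleave bytes and break the framing. */
static bool message_in_progress = false;

static void put_bytes(const void *p, size_t n)
{
    assert(outlen + n <= OUTBUF_SIZE);
    memcpy(outbuf + outlen, p, n);
    outlen += n;
}

static void put_uint32_be(uint32_t v)
{
    unsigned char b[4] = {(unsigned char) (v >> 24), (unsigned char) (v >> 16),
                          (unsigned char) (v >> 8), (unsigned char) v};
    put_bytes(b, 4);
}

/* Write the header (type byte + length including itself); the payload
 * follows in one or more put_bytes() calls. */
static void start_message(char type, size_t payload_len)
{
    put_bytes(&type, 1);
    put_uint32_be((uint32_t) (4 + payload_len));
    message_in_progress = true;
}

static void finish_message(void)
{
    message_in_progress = false;
}

/* Simplified error report: payload is the text plus its NUL terminator.
 * Returns false and writes nothing if another message is half-sent; the
 * caller should then terminate without notifying the client. */
static bool report_fatal_to_client(const char *text)
{
    if (message_in_progress)
        return false;
    size_t n = strlen(text) + 1;
    start_message('E', n);
    put_bytes(text, n);
    finish_message();
    return true;
}
```

The design choice here is to drop the notification rather than try to finish or abort the partial message: the client loses the error text, but it never sees a malformed stream.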

So I would strongly agree with you if select() behaved the way the
man page suggests.

Not sure what you mean, but the above is a fatal problem with the patch right now, regardless of how you do the sleeping.

2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time.  But I'm unsure what "a
reasonable period of time" means.  This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
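For comparison, the mechanism for a bounded wait is simple; only the value is in question. A sketch assuming plain POSIX poll() with a deadline on the monotonic clock, recomputing the remaining time if a signal interrupts the wait. wait_writable_for() and timeout_ms are hypothetical, and picking timeout_ms is exactly the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* used by the usage example */
#include <poll.h>
#include <sys/socket.h> /* used by the usage example */
#include <time.h>

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until sock is ready for writing or timeout_ms elapses.
 * Returns 1 if poll() reports the socket ready (writable, or an error or
 * hangup that the next send() will report), 0 on timeout, -1 on error. */
static int wait_writable_for(int sock, int timeout_ms)
{
    assert(timeout_ms >= 0);
    long long deadline = now_ms() + timeout_ms;

    for (;;)
    {
        long long remaining = deadline - now_ms();
        if (remaining < 0)
            remaining = 0;

        struct pollfd pfd = {.fd = sock, .events = POLLOUT};
        int rc = poll(&pfd, 1, (int) remaining);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* interrupted by a signal: retry with whatever time is left */
    }
}
```

A caller would cut the connection without notifying the client when this returns 0.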

I don't think there is any such reasonable time.

I agree it's pretty hard to define any reasonable timeout here. I
think it would be fine to just cut the connection; even if you don't
block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS()
somewhere higher in the stack and kill the connection almost as
abruptly anyway. (You must not violate the protocol, however.)

Yes, closing the blocked connection seems like one of the smartest
ways, and checking for the pending interrupt could avoid the protocol
violation. But the problem is that there seems to be no way to close
the socket from anywhere other than the blocked handle itself; closing
a dup(2)'ed handle alone cannot release the underlying resource.
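For reference, the general POSIX behaviour behind the dup(2) remark: a socket is closed only when the last descriptor referring to it is closed. A sketch assuming a Unix-domain socketpair stands in for the client connection; peer_sees_eof() and set_nonblocking() are hypothetical helpers, and this says nothing about whether the backend's client socket is actually dup'ed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if the peer reads EOF (every descriptor on the other end is
 * closed), 0 if the connection is still open with no data pending, and
 * -1 on any other result. */
static int peer_sees_eof(int peer)
{
    assert(fcntl(peer, F_GETFL) & O_NONBLOCK);  /* else read() could block */

    char c;
    ssize_t n = read(peer, &c, 1);
    if (n == 0)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

With fd = sv[0] and copy = dup(fd), closing fd alone leaves peer_sees_eof(sv[1]) at 0; only after close(copy) does it return 1.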

I didn't understand that, surely you can just close() the socket? There is no dup(2) involved. And we don't necessarily need to close the socket, we just need to avoid writing to it when we're already in the middle of sending a message.

I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single pg_terminate_backend() is enough to always kill the connection.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)