Hello Stephen.

Stephen Isard wrote in
 <32731-1629933287-500...@sneakemail.com>:
 |On Wed, 25 Aug 2021, Steffen Nurpmeso steffen-at-sdaoden.eu |s-nail| wrote:
 |
 |>  s-nail: IMAP error: Server shutting down.
 |>  (continue)
 |>  s-nail: There are messages in the error ring, manageable via `errors' \
 |>  command
 |
 |Ok, that's a different error message from the one we get, which is
 |
 |s-nail: IMAP write error: error:00000000:lib(0):func(0):reason(0)

Ok.

  "/tmp/s-nail-edbaseOeQ9Xo" 7L, 94B written
  IMAP ALARM 1
  NOOP 1
  IMAP FOR SOCKET WRITE
  NOOP 1 - out
  IMAP FOR ANSWER
  ^[OQ^C
  ^[OQ

So immediately after one quits the editor the SIGALRM comes in and
IMAP tries to contact the server.  This blocks.
Also we are still in a condition where signals are mostly blocked
because child_wait() is looking out for the SIGCHLD which signals
that the editor has finished.

It seems the (Linux) socket code does not recognize that the
connection broke in this condition, and SO_{RCV,SND}TIMEO (42
seconds) does not come into play either now, so we are waiting for
the _full_ maximum time a socket read can block.  I wonder .. is
this a kernel bug?

  s-nail: IMAP: error:00000000:lib(0):func(0):reason(0)

But there we go, after a looooooooong time.
You seem to have waited for it to pass?
It seems to be around 15 minutes or so here; is that related to
net.ipv4.tcp_keepalive_time=300?  My net.ipv4.tcp_fin_timeout=21.
It strikes me as a bug that SO_{RCV,SND}TIMEO is not honoured.
But i am not an expert here.
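
For reference, a small sketch of how SO_RCVTIMEO and SO_SNDTIMEO
behave on an otherwise silent socket.  This is a demo under
assumptions, not s-nail code: set_io_timeout and
read_from_silent_peer are hypothetical names, and an AF_UNIX
socketpair stands in for the IMAP connection.  As far as socket(7)
documents it for Linux, a read that hits the timeout fails with
EAGAIN/EWOULDBLOCK, which strerror() renders as "Resource
temporarily unavailable".

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply one timeout to blocking reads and writes. */
int set_io_timeout(int fd, long seconds, long microseconds){
   struct timeval tv;

   assert(fd >= 0);
   tv.tv_sec = seconds;
   tv.tv_usec = microseconds;
   if(setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
      return -1;
   if(setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == -1)
      return -1;
   return 0;
}

/* Demo: read from a peer that never sends anything.
 * Returns the errno of the failed read, 0 if the read unexpectedly
 * succeeded, or -1 if the setup failed. */
int read_from_silent_peer(long microseconds){
   int sv[2], err = 0;
   char buf[16];

   if(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
      return -1;
   if(set_io_timeout(sv[0], 0, microseconds) == -1)
      err = -1;
   else if(read(sv[0], buf, sizeof buf) == -1)
      err = errno; /* timeout expired without data */
   close(sv[0]);
   close(sv[1]);
   return err;
}
```

With a 100 ms timeout, read_from_silent_peer(100000) comes back after
about that long instead of blocking forever.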

  IMAP ALARM 2
  CHILD POST WAIT
  EDITOR POIST RUN 2
  (continue)

So then we are back: the SIGALRM signal handler has finished
(really, doing network I/O in a signal handler is _so_ sick; maybe
that is why SO_*TIMEO does not come into play .. signal handler?),
and the normal child waiter collects the remains gracefully.

  -------
  Message contains:
  Author: steffen@kdc.localdomain
  From: steffen@kdc.localdomain
  To: du
  Subject: ball

  One.
  Two.

And this is the new file.

 |Again, I ask what process are you sending the kill -ALRM signal to?  The 
 |s-nail process?  When I do that it has no effect at all.  I can quit the 
 |editor and get no IMAP error.  I only get "my" IMAP error by waiting for 
 |a sufficiently long time, where "sufficiently long" varies.

With "iptables -I INPUT 1 -s 10.0.1.22 -j DROP".
It seems to me we also have an OpenSSL problem.

  NOOP 1
  IMAP FOR SOCKET WRITE
  NOOP 1 - out
  IMAP FOR ANSWER
  SOCKET TLS READ PRE AGAIN
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295
  SOCKET TLS READ PRE
  ...

for a loooong time.  It seems OpenSSL does not recognize the
connection is dead.  _Even_ if i quit the VM via monitor, it still
loops like the above.  Even though BIO_fd_should_retry() should
return -1 as i ... 

  "/tmp/s-nail-edbasevYZ0Op" 9L, 99B written
  CHILD POST WAIT
  EDITOR POIST RUN 2
  (continue)
  IMAP ALARM 1
  NOOP 1
  IMAP FOR SOCKET WRITE
  NOOP 1 - out
  IMAP FOR ANSWER
  SOCKET TLS READ PRE AGAIN
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295 error Resource temporarily unavailable
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295 error Resource temporarily unavailable
  SOCKET TLS READ PRE
  SOCKET TLS READ POST 4294967295 error Resource temporarily unavailable
  SOCKET TLS READ PRE

Hmm.  Returns EAGAIN even though the VM has been killed.
Should return EPIPE, no????  Maybe because it is all-local??  Hm.

I tell you what, all merde.  Of course the timeouts _are_ handled,
it is just that OpenSSL retries and retries and retries until some
other timeout eventually kicks in.
I need to implement an upper limit on the retries!  In conjunction
with the timeouts we get to

  "/tmp/s-nail-edbaseZlca74" 7L, 97B written
  CHILD POST WAIT
  EDITOR POIST RUN 2
  (continue)
  IMAP ALARM 1
  NOOP 1
  IMAP FOR SOCKET WRITE
  NOOP 1 - out
  IMAP FOR ANSWER
  SOCKET TLS READ PRE AGAIN
  SOCKET TLS READ PRE
  SOCKET TLS READ POST -1 error Resource temporarily unavailable
  SOCKET TLS READ PRE
  SOCKET TLS READ POST -1 error Resource temporarily unavailable
  SOCKET TLS READ PRE
  SOCKET TLS READ POST -1 error Resource temporarily unavailable
  s-nail: IMAP: error:00000000:lib(0):func(0):reason(0)
  IMAP ALARM 2

  -------
  Message contains:
  Author: steffen@kdc.localdomain
  From: steffen@kdc.localdomain
  To: heu
  Subject: stroh

  Ball.
  Fun.
  s-nail: There are messages in the error ring, manageable via `errors' command

Heh.  Let me do it nicely.
In short: (re)enable ALRM while we are in the editor, and
introduce an upper limit on SSL network errors.  We come to

  "/tmp/s-nail-edbaseg3SEf0" 8L, 99B written
  s-nail: TLS socket read error, retrying: error:00000000:lib(0):func(0):reason(0)
  s-nail: TLS socket read error, retrying: error:00000000:lib(0):func(0):reason(0)
  s-nail: IMAP: error:00000000:lib(0):func(0):reason(0)
  IMAP ALARM 2
  CHILD POST WAIT
  EDITOR POIST RUN 2
  (continue)
  s-nail: There are messages in the error ring, manageable via `errors' command

Nah, that is sick.

  s-nail: TLS socket read error, retrying: Resource temporarily unavailable
  s-nail: TLS socket read error, retrying: Resource temporarily unavailable
  s-nail: IMAP: error:00000000:lib(0):func(0):reason(0)
  IMAP ALARM 2
  CHILD POST WAIT
  EDITOR POIST RUN 2
  (continue)
  s-nail: There are messages in the error ring, manageable via `errors' command

Yeah, let's do it like this.
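
A rough sketch of the idea of capping the retries.  Plain read(2) on
a non-blocking pipe stands in for the TLS read here, so this is
neither the OpenSSL API nor the code that was pushed, and
read_with_retry_limit and demo_retry_count are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: retry a read that fails with EAGAIN/EWOULDBLOCK
 * or EINTR, but at most max_retries times.  Returns what read(2)
 * returned last; on -1, errno is that of the last attempt.
 * *retries_used (if not NULL) receives the number of retries made. */
ssize_t read_with_retry_limit(int fd, void *buf, size_t len,
      int max_retries, int *retries_used){
   int tries = 0;
   ssize_t n;

   assert(max_retries >= 0);
   for(;;){
      n = read(fd, buf, len);
      if(n >= 0)
         break; /* data or EOF */
      if(errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
         break; /* hard error, retrying is pointless */
      if(tries == max_retries)
         break; /* retry budget exhausted, give up */
      ++tries;
   }
   if(retries_used != NULL)
      *retries_used = tries;
   return n;
}

/* Demo: an empty non-blocking pipe always answers EAGAIN, so the
 * whole retry budget gets used.  Returns the retries made, or -1 if
 * the setup failed. */
int demo_retry_count(int max_retries){
   int p[2], flags, used = -1;
   char buf[8];

   if(pipe(p) == -1)
      return -1;
   flags = fcntl(p[0], F_GETFL);
   if(flags != -1 && fcntl(p[0], F_SETFL, flags | O_NONBLOCK) != -1)
      (void)read_with_retry_limit(p[0], buf, sizeof buf, max_retries,
         &used);
   close(p[0]);
   close(p[1]);
   return used;
}
```

Assuming each attempt blocks for the full socket timeout T, a cap of
N retries would bound the wait at roughly (N + 1) * T seconds before
the error is reported, instead of looping without limit.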
"Fixes" pushed to the master and stable branch when you read this.

Thanks for reporting the problem and insisting, Stephen!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
