At 9PM -0800 on 5/12/12 Erik A Johnson wrote:
> On December 4, 2012 at 4:43:53 AM PST, Ben Morrow <[email protected]> wrote:
> >
> > So, it looks to me as though you have a firewall problem. You may be
> > able to get more information by setting the kern.ipc.sodefunctlog sysctl
> > to 1: this should make the kernel log to syslog (or wherever the OSX
> > kernel logs go) when sockets are made DEFUNCT and when reads fail for
> > that reason.
>
> sudo sysctl -w kern.ipc.sodefunctlog=1 gives the following in the log:
>
> 12/5/12 9:10:00.000 PM kernel[0]: sosetdefunct[60169]: (target pid
> 60169 level 0) so 0xffffff803159c738 [2,1] marked as defunct
> 12/5/12 9:10:00.000 PM kernel[0]: sodefunct[60169]: (target pid 60169
> level 0) so 0xffffff803159c738 [2,1] is now defunct [rcv_si 0x0,
> snd_si 0x0, rcv_fl 0x9400, snd_fl 0x1400]
> 12/5/12 9:10:00.000 PM kernel[0]: soreceive[60169]: defunct so
> 0xffffff803159c738 [2,1] (57)
>
> The last line is repeated about once every 4 microseconds until I kill it.

OK, so this at least confirms I'm right about what's going on. (I'm
assuming 60169 was the pid of the stuck imap-login process?)

At 8PM -0800 on 5/12/12 Erik A Johnson wrote:
> On December 5, 2012 2:07:14 AM PST, Ben Morrow <[email protected]> wrote:
> >
> > Well, they're certainly different. Are you sure the second trace
> > (without patches) was of a session which went into an infinite loop?
> > The only thing peculiar about that trace is that the server closes the
> > connection after receiving the first packet from the client, but it does
> > so perfectly properly: it ACKs the client's data packet, and does the
> > FIN-FIN/ACK exchange properly. You will notice there are no [R] packets,
> > which would indicate something odd is happening at the server end.
>
> I'm pretty sure, but I've run it again, confirmed that the imap-login
> process is using 100% of a CPU until I kill it, and have attached the
> tcpdump. Looks like one packet from SERVER to CLIENT shifted slightly
> in chronology, but otherwise the same.

OK.

> > At 1AM -0800 on 5/12/12 Erik A Johnson wrote:
> >>
> >> Nope, SO_ISDEFUNCT isn't defined.
> >
> > Oh, sorry, that needs
> >
> >     #include <sys/socket.h>
> >
> > at the top. If that doesn't work, then which version of the OS are you
> > building for? AFAICT the DEFUNCT socket flag has been present since at
> > least 10.5, but the SO_ISDEFUNCT option was only introduced in 10.7.
> > This is irritating, actually: it means that to properly fix this on all
> > versions of Mac OS Dovecot would need to include the previous ENOTCONN
> > code #ifndef SO_ISDEFUNCT.
>
> I've got both 10.7 and 10.8 SDKs in Xcode and neither have
> SO_ISDEFUNCT defined in sys/socket.h (or anywhere else in the
> usr/include directories) -- there's a SS_DEFUNCT mask defined in
> sys/socketvar.h -- is that what you're looking for?

No, it's not: that's the kernel-internal flag, which can't be read from
userland.
http://opensource.apple.com/source/xnu/xnu-2050.18.24/bsd/sys/socket.h
(which is supposedly for 10.8.2) has SO_ISDEFUNCT in among all the other
SO_* constants, but I've just noticed it's under #ifndef PRIVATE so
maybe it gets removed from the published SDK. I don't really know how
Apple system headers get produced.

OK, so testing directly isn't going to work. However, I still don't
really like the idea of relying on select never to return early during
connection setup, nor do I much like testing for this condition every
time we try to read. So, how about this (assuming you're not fed up with
testing things yet...)

Ben

--- src/lib/network.c~	2012-12-06 14:19:33.786585330 +0000
+++ src/lib/network.c	2012-12-06 14:27:46.643586910 +0000
@@ -515,6 +515,22 @@
 		else
 			return -2;
 	}
+
+#ifdef __APPLE__
+	/* Some Apple firewalls appear to be able to disable a socket
+	 * immediately after accepting, by marking it DEFUNCT. Reads on
+	 * such a socket return immediately with ENOTCONN, which causes
+	 * loops since ENOTCONN is supposed to mean 'wait for the
+	 * connection to finish'. This state can be detected by calling
+	 * connect(): a valid accepted socket will fail with EISCONN, a
+	 * DEFUNCT socket will fail with EOPNOTSUPP.
+	 */
+	if (connect(ret, &so.sa, addrlen) >= 0)
+		i_panic("dummy connect to detect DEFUNCT socket succeeded");
+	if (errno == EOPNOTSUPP)
+		return -1;
+#endif
+
 	if (so.sin.sin_family == AF_UNIX) {
 		if (addr != NULL)
 			memset(addr, 0, sizeof(*addr));
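
For anyone who wants to try the dummy-connect idea outside Dovecot, here is a minimal standalone sketch. The helper names probe_accepted_socket() and probe_loopback_pair() are hypothetical and are not part of Dovecot. The sketch only exercises the healthy half of the check: it accepts a loopback TCP connection and confirms that a second connect() on the accepted socket fails with EISCONN. The EOPNOTSUPP result for a DEFUNCT socket is the behaviour described in the patch comment above; this code assumes it and does not reproduce it.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not a Dovecot function): issue a dummy connect()
 * on an already-accepted socket. Returns the errno from the failed
 * connect(), or 0 if the connect unexpectedly succeeded. */
static int probe_accepted_socket(int fd, const struct sockaddr *peer,
                                 socklen_t peerlen)
{
	if (connect(fd, peer, peerlen) == 0)
		return 0;
	return errno;
}

/* Set up a loopback TCP connection, accept it, and probe the accepted
 * end. Returns the errno reported by the probe, or -1 if the setup
 * itself failed. */
static int probe_loopback_pair(void)
{
	struct sockaddr_in addr, peer;
	socklen_t len = sizeof(addr), peerlen = sizeof(peer);
	int lfd, cfd = -1, afd = -1, result = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	/* Bind to 127.0.0.1 on a kernel-chosen port. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto out;

	/* Client side: a blocking connect completes once the handshake is
	 * done, even before the server calls accept(). */
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0 ||
	    connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	afd = accept(lfd, (struct sockaddr *)&peer, &peerlen);
	if (afd < 0)
		goto out;

	/* Same shape as the patch: reuse the peer address from accept(). */
	result = probe_accepted_socket(afd, (struct sockaddr *)&peer, peerlen);
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	close(lfd);
	return result;
}
```

Calling probe_loopback_pair() from a small main() and comparing the result with EISCONN shows whether an ordinary accepted socket behaves as the patch expects on a given system. Note that connect() takes the address length by value, unlike accept(), which takes a pointer to it.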
