We rolled out 2.3.8 a few months ago (configuration specifics are at the end of the message), and for the most part things have gone very well (after Bron solved our ReiserFS deadlock problem... he finds all the good bugs!).

A persistent problem in the new config is Outlook 2003 clients using POP3.

Short summary: Outlook 2003 (with all available updates) intermittently (in an estimated 10-15% of sessions) gets confused and, in turn, confuses our frontend. In the three confirmed cases the client software is Outlook 2003 SP2 using POP3 over SSL on an alternate port. The failure rate does not appear to be statistically related to message size, content, or total message count.

Protocol logging reveals that the client successfully connects and lists the messages in the box. The client then starts receiving messages. In the middle of a message the session stalls: pop3d sits in select() waiting for input to become available on the socket to the client, while the client (also apparently waiting for input) reaches its timeout and bails on the connection. The frontend keeps the connection to the backend alive for 10 min (a la RFC) even after the client gives up.

It seems clear that the "problem" is an Outlook bug. Manually unlocking the box and doing a "send/receive" with Outlook causes the problem again (at the same point in the same message). If Outlook is restarted, however, the next session succeeds.

My real question to the Cyrus list (surprisingly, it isn't "can you fix Outlook?") is twofold:

1. Looking at the code, if the client side of the bitpipe is closed, I can't see a scenario where the backend side of the bitpipe is kept alive. The cleanup code seems pretty much bulletproof, yet the behavior I am seeing is that the client side of the bitpipe is closed and the connection to the backend is left open until the POP3 timeout (what I see is pop3d on the backend waiting in select()). What could cause this? (*Note*: I seriously doubt that resolving this question will do anything to ease the symptoms of this bug in Outlook.)

2. Are other large sites using a (traditional) murder configuration seeing problems like this?
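
For readers who have not looked at the proxy code, the expectation behind question 1 can be illustrated with a minimal sketch of a bidirectional relay loop built on select(). This is not the Cyrus bitpipe implementation; it is a Python illustration, and the function name relay, its parameters, and its return strings are made up for the example. It copies bytes between a client socket and a backend socket and closes both as soon as either side reports EOF or an error, or when nothing arrives within idle_timeout seconds.

```python
import select
import socket


def relay(client: socket.socket, backend: socket.socket,
          idle_timeout: float = 600.0, bufsize: int = 4096) -> str:
    """Copy bytes both ways between two connected sockets.

    Hypothetical helper, not Cyrus code. Returns "client_closed",
    "backend_closed", or "timeout". Both sockets are always closed
    before returning.
    """
    peer = {client: backend, backend: client}
    closed = {client: "client_closed", backend: "backend_closed"}
    try:
        while True:
            # Block until one side has data (or EOF), or the idle timer expires.
            readable, _, _ = select.select([client, backend], [], [], idle_timeout)
            if not readable:
                return "timeout"
            for sock in readable:
                try:
                    data = sock.recv(bufsize)
                except OSError:
                    data = b""  # treat a reset like an orderly close
                if not data:
                    # EOF on one leg: stop relaying, the finally block
                    # tears down the other leg as well.
                    return closed[sock]
                try:
                    peer[sock].sendall(data)
                except OSError:
                    # The destination went away while we were writing.
                    return closed[peer[sock]]
    finally:
        client.close()
        backend.close()
```

In a loop shaped like this, the backend leg is torn down only when the relay itself observes EOF or an error on the client socket; otherwise the idle timer is the only thing that ends the session. Whether the real frontend ever sees the client's close in the failing sessions is exactly the open question above.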

I'd imagined that a bug this annoying in Outlook, combined with the large number of Cyrus-imapd deployments, would have raised quite a few alarms... but in my searches I find no mention of this issue.

Any happy ideas, confirmation, or commiseration would be appreciated,

Shawn

Cyrus 2.3.8 w/ some Fastmail.fm patches (fast index iterator, statuscache, command timer)
Skiplist for all db
1 mupdate master frontend
1 mupdate slave frontend
3 backends (4-5x mail partitions per backend @ 250G ea.)

Sample imapd.conf (relevant bits; frontend, but the backends are very similarly configured):

# Basic Config
configdirectory: /cyrus_config
defaultpartition: default
partition-default: /cyrus_mboxes/default
servername: daytona
sendmail: /usr/lib/sendmail
singleinstancestore: yes
duplicatesuppression: yes
quotawarn: 85
timeout: 60
poptimeout: 10
imapidresponse: no
maxmessagesize: 52428800
postmaster: postmaster
sieve_maxscriptsize: 32
sieve_maxscripts: 1
imapidlepoll: 120
munge8bit: no
username_tolower: 1
allowplaintext: yes
allowusermoves: 1
expunge_mode: delayed

# Namespace stuff
hashimapspool: true
fulldirhash: true
unixhierarchysep: yes
altnamespace: yes

# TLS/SSL
tls_cert_file: /cyrus_config/email_verisign_2007.crt
tls_key_file: /cyrus_config/email_verisign_2007.key
tls_ca_file: /cyrus_config/verisign.ca.pem
tls_session_timeout: 0
imap_tls_request_cert: 0
pop3_tls_request_cert: 0

# Extras
statuscache: 1
statuscache_db: skiplist
duplicate_db: skiplist
#tlscache_db: skiplist

--
Shawn Nock (OpenPGP: 0x5E377505)
Unix Systems Group; UITS
University of Arizona
nock at email.arizona.edu