[Dovecot] Handling ENFILE

2009-04-13 Thread Mike Abbott
My mail server consumed all of its configured file table slots and  
started returning ENFILE (Too many open files in system) for many  
operations.  This wreaked havoc with dovecot-1.1.13.  Here are some  
areas where dovecot should detect and more gracefully handle ENFILE  
error returns:


1.  Deliver should report an error more helpful than Unknown internal  
error when fdatasync_path fails:
deliver(user): Apr 10 14:53:20 Error: fdatasync_path(/Volumes/MailData/ 
user/new) failed: Too many open files in system
deliver(user): Apr 10 14:53:20 Info: msgid=...: save failed to  
INBOX: BUG: Unknown internal error


2.  Index files should not be fscked when they could not be opened due  
to ENFILE:
Error: IMAP(user): open() failed with index file /Volumes/MailData/ 
user/.Deleted Messages/dovecot.index: Too many open files in system
Warning: IMAP(user): fscking index file /Volumes/MailData/ 
user/.Deleted Messages/dovecot.index


3.  Dovecot should not crash when mail transaction logs become  
corrupt, or appear to have become corrupt due to ENFILE.  Here is a  
sequence of error messages from the log but I can't tell whether the  
IMAP ones up through the panic in the middle are all from the same  
process, or from different processes.

deliver(user): Apr 10 14:55:32 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:55:54 Error: open() failed with index file / 
Volumes/MailData/user/dovecot.index: Too many open files in system
deliver(user): Apr 10 14:55:54 Error: open() failed with file /Volumes/ 
MailData/user/dovecot.index.log.2: Too many open files in system
deliver(user): Apr 10 14:55:57 Error: open() failed with index cache  
file /Volumes/MailData/user/dovecot.index.cache: Too many open files  
in system
dovecot: Apr 10 14:56:08 Warning: IMAP(user): fscking index file / 
Volumes/MailData/user/dovecot.index
dovecot: Apr 10 14:56:16 Error: IMAP(user): Fixed index file /Volumes/ 
MailData/user/dovecot.index: log_file_seq 15 - 2

deliver(user): Apr 10 14:56:19 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:56:20 Error: Corrupted transaction log file / 
Volumes/MailData/user/dovecot.index.log seq 2: record size too small  
(type=0x0, offset=23088, size=0) (sync_offset=36848)
deliver(user): Apr 10 14:56:20 Warning: fscking index file (in-memory  
index)
deliver(user): Apr 10 14:56:21 Info: msgid=...: save failed to  
INBOX: Internal error occurred. Refer to server log for more  
information. [2009-04-10 14:56:20]
dovecot: Apr 10 14:56:40 Panic: IMAP(user): file mail-transaction-log- 
view.c: line 103 (mail_transaction_log_view_set): assertion failed:  
(min_file_seq = max_file_seq)
dovecot: Apr 10 14:56:40 Error: IMAP(user): Raw backtrace: [see below  
for better backtrace]
deliver(user): Apr 10 14:56:45 Error: Transaction log file /Volumes/ 
MailData/user/dovecot.index.log: marked corrupted

deliver(user): Apr 10 14:57:04 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:57:23 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:57:49 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:58:21 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:58:41 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:59:09 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:59:28 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 14:59:43 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 15:00:57 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 15:03:01 Info: msgid=...: saved mail to INBOX
deliver(user): Apr 10 15:03:27 Info: msgid=...: saved mail to INBOX
dovecot: Apr 10 15:21:41 Error: IMAP(user): Corrupted transaction log  
file /Volumes/MailData/user/dovecot.index.log seq 2: Invalid  
transaction log size (36848 vs 23088): /Volumes/MailData/user/ 
dovecot.index.log (sync_offset=36848)
dovecot: Apr 10 15:21:42 Info: imap-login: Login: user=user,  
method=CRAM-MD5, rip=..., lip=..., TLS


The backtrace for the assertion failure was:
0   libSystem.B.dylib   0x7fff8007c102 __kill + 10
1   libSystem.B.dylib   0x7fff800f7ffc abort + 83
2   imap  	0x00010006a22e  
default_fatal_finish + 70
3   imap  	0x00010006a27c  
i_syslog_fatal_handler + 0

4   imap0x0001000699db i_info + 0
5   imap  	0x000100058542  
mail_transaction_log_view_set + 104
6   imap  	0x000100053b4c  
view_sync_set_log_view_range + 125
7   imap  	0x000100053fc5  
mail_index_view_sync_begin + 378
8   imap  	0x00010003c26e  
index_mailbox_sync_init + 129
9   imap  	0x000100018457  
maildir_storage_sync_init + 229
10  imap  	0x0001c6bb imap_sync_init +  
132
11  imap  	0x0001c90d cmd_sync_delayed  
+ 477
12  imap   

Re: [Dovecot] Handling ENFILE

2009-04-13 Thread Timo Sirainen
On Mon, 2009-04-13 at 14:29 -0500, Mike Abbott wrote:
 My mail server consumed all of its configured file table slots and  
 started returning ENFILE (Too many open files in system) for many  
 operations.  This wreaked havoc with dovecot-1.1.13.  Here are some  
 areas where dovecot should detect and more gracefully handle ENFILE  
 error returns:

I also just tried making nfs_safe_open() fail randomly and reproducing
these.

 1.  Deliver should report an error more helpful than Unknown internal  
 error when fdatasync_path fails:
 deliver(user): Apr 10 14:53:20 Error: fdatasync_path(/Volumes/MailData/ 
 user/new) failed: Too many open files in system
 deliver(user): Apr 10 14:53:20 Info: msgid=...: save failed to  
 INBOX: BUG: Unknown internal error

http://hg.dovecot.org/dovecot-1.1/rev/576e77047959

 2.  Index files should not be fscked when they could not be opened due  
 to ENFILE:
..
 3.  Dovecot should not crash when mail transaction logs become  
 corrupt, or appear to have become corrupt due to ENFILE.  

I made nfs_safe_open() return failures randomly and found all kinds of
bugs with it. I committed several fixes to hg, but I still get some
weird errors once in a while, I'll try to debug more later..


signature.asc
Description: This is a digitally signed message part