On 03/25/10 06:58, Brian Candler wrote:
On Tue, Mar 23, 2010 at 03:19:49PM +0200, Timo Sirainen wrote:
I have done some small-scale testing and it looks fine.

Stress testing by running imaptest against the same user's same mailbox on 2+ different
servers (i.e. two NFS clients reading/writing the same mailbox files) should quickly show
what kind of errors you could get. http://imapwiki.org/ImapTest

OK, I've now set this up:

      ImapTest --->  dovecot (same host) ----->  NFS server
              `--->  dovecot (diff host) ----'

* 172.16.23.104: dovecot 1.2.11 and ImapTest-latest. FreeBSD 7.2.
* 172.16.23.101: dovecot 1.2.11 only. FreeBSD 7.2.
* 172.16.23.103: NFS server. Ubuntu Karmic.

All three hosts are ntpd synced.

The following was needed in /etc/rc.conf on the FreeBSD boxes to get fcntl locking working:

nfs_client_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

(imapd worked without these, but maillog showed errors about failing to
obtain locks: "operation not supported")
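To check whether fcntl locks work on a mount before pointing dovecot at it, a small probe like the one below can help. It is only a sketch, assuming Python 3 on the NFS client. `fcntl_locking_works` is a hypothetical helper name, and treating EOPNOTSUPP/ENOLCK as "no working lock manager" is an assumption based on the "operation not supported" error above.

```python
import errno
import fcntl
import os
import tempfile


def fcntl_locking_works(directory: str) -> bool:
    """Return True if an exclusive fcntl (POSIX record) lock can be taken on a
    scratch file in `directory`.

    Returns False if the lock attempt fails with EOPNOTSUPP or ENOLCK; any
    other error (e.g. permission denied) is re-raised.
    """
    # Create a uniquely named scratch file inside the directory under test.
    fd, path = tempfile.mkstemp(prefix=".lockprobe-", dir=directory)
    try:
        try:
            # fcntl.lockf() wraps fcntl()'s F_SETLK/F_SETLKW locking calls;
            # LOCK_NB makes it fail immediately instead of waiting.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # Assumption: a missing or unreachable lock manager shows up as
            # one of these errno values (cf. "operation not supported").
            if exc.errno in (errno.EOPNOTSUPP, errno.ENOLCK):
                return False
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, whether or not locking worked.
        os.close(fd)
        os.unlink(path)
```

Run it on each FreeBSD client against the NFS-mounted mail directory, e.g. `fcntl_locking_works("/mail")` (the `/mail` path is taken from the log excerpts and is only an example). A single probe only shows that one client can obtain locks; it does not prove that locks taken on two different clients actually exclude each other.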

Test results
------------

* Pointing a single instance of imaptest at a single host, or two instances
of imaptest at the same host (with clients=5 to avoid hitting the 15-client
limit), worked fine. ImapTest reported no errors, and there was nothing out of
the ordinary in maillog:

$ egrep -v "Login:|Disconnected:|Aborted login" /var/log/maillog

* Things went badly wrong with two instances of imaptest pointing at
different dovecot hosts. I had seen this sort of thing when I was previously
using dotlocking, and had hoped it would be fixed by switching to fcntl, but
unfortunately not.

ImapTest reported errors including:

Error: br...@dev.example.com[8]: SELECT failed: 8.3 NO [SERVERBUG] Internal error occurred. Refer to server log for more information. [2010-03-25 10:22:23]
  - 6 stalled for 16 secs in command: 11 EXPUNGE

All sorts of errors reported in maillog, including:

Mar 25 10:22:23 freebsd-dev dovecot: IMAP(br...@dev.example.com): fscking index file /mail/0/6/37/30/brian%dev.example.com/dovecot.index
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(br...@dev.example.com): Transaction log /mail/0/6/37/30/brian%dev.example.com/dovecot.index.log: duplicate transaction log sequence (10)
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(br...@dev.example.com): Our dotlock file /mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.lock was overridden (locked 0 secs ago, touched 0 secs ago)
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(br...@dev.example.com): fscking index file /mail/0/6/37/30/brian%dev.example.com/dovecot.index
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(br...@dev.example.com): Transaction log /mail/0/6/37/30/brian%dev.example.com/dovecot.index.log: duplicate transaction log sequence (11)
Mar 25 10:22:27 freebsd-dev dovecot: IMAP(br...@dev.example.com): /mail/0/6/37/30/brian%dev.example.com/dovecot.index reset, view is now inconsistent
Mar 25 10:22:46 freebsd-dev dovecot: IMAP(br...@dev.example.com): Panic: file mail-transaction-log-view.c: line 108 (mail_transaction_log_view_set): assertion failed: (min_file_seq<= max_file_seq)
Mar 25 10:22:48 freebsd-dev dovecot: IMAP(br...@dev.example.com): rename(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.tmp, /mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist) failed: No such file or directory
Mar 25 10:22:48 freebsd-dev dovecot: IMAP(br...@dev.example.com): unlink(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.tmp) failed: No such file or directory

Mar 25 10:22:36 wipe-dev dovecot: IMAP(br...@dev.example.com): ftruncate(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.lock) failed: Stale NFS file handle

(Logs from a single test run are attached)

Interestingly, the dovecot-uidlist.lock messages imply that dovecot is still
using dotlocking in some circumstances, even though I've definitely set fcntl locking:

$ grep ^lock /usr/local/etc/dovecot.conf
lock_method = fcntl

$ egrep '^mail_nfs|^mmap' /usr/local/etc/dovecot.conf
mmap_disable = yes
mail_nfs_storage = yes
mail_nfs_index = yes

All this suggests I should put some sort of 'sticky' load balancing in front
so that all client connections from one IP hit the same frontend box. However,
that contradicts the experience Adam McDougall has had with a similar setup:

http://dovecot.org/list/dovecot/2010-March/047815.html

It's possible that replacing the Linux NFS server with a NetApp will help
(which is what this will eventually be deployed on anyway).

Adam: did you do any tuning of the FreeBSD NFS client settings? And have you
tried ImapTest, or only real IMAP users?

I see there are a few tunables:

$ grep nfs /etc/defaults/rc.conf
netfs_types="nfs:NFS nfs4:NFS4 smbfs:SMB portalfs:PORTAL nwfs:NWFS" # Net filesystems.
nfs_client_enable="NO"                # This host is an NFS client (or NO).
nfs_access_cache="60"         # Client cache timeout in seconds
nfs_server_enable="NO"                # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 4" # Flags to nfsd (if enabled).
nfs_reserved_port_only="NO"   # Provide NFS only on secure port (or NO).
nfs_bufpackets=""             # bufspace (in packets) for client

I have tried rerunning with
     sysctl vfs.nfs.access_cache_timeout=0
but saw the same problems.

Maybe the load pattern from 'real' IMAP clients is such that these problems
generally don't show up in practice? (i.e. it would be unusual for a single
IMAP client to make simultaneous changes to the same folder via different
TCP connections)

Regards,

Brian.

I use:
rc.conf:
nfs_client_enable="YES"          # This host is an NFS client (or NO).
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

/etc/fstab:
nfsserver:/vol/mail /egr/mail       nfs     rw,bg,tcp,nosuid 0 0

dovecot.conf (some other settings that helped in general, not necessarily locking-related; some lines got wrapped):
login_max_processes_count: 512
max_mail_processes: 1024
mail_max_userip_connections: 25
mail_location: maildir:%h/Maildir:CONTROL=%h/Maildir/dovecot/private/control:INDEX=%h/Maildir/dovecot/private/indexes
mmap_disable: yes
mail_nfs_storage: yes
mail_nfs_index: yes
mail_process_size: 1024
mail_log_max_lines_per_sec: 0
auth default:
  worker_max_request_count: 500
# internal note: lock_method is "always dotlock for maildir" according to dovecot author
#lock_method = fcntl

I experimented with the access cache, but nothing ultimately came of it, so I leave it untuned.

I have not tried imaptest against my servers; I just let them run with real clients. As long as I am not messing around with the back-end files in bad ways, I don't really see the errors you turned up in real use. I did see plenty of them in earlier versions of dovecot, before there was code to flush the FreeBSD access cache well enough. I also remember Timo saying something about the timestamps on the NetApp NFS server being much more fine-grained than on some other NFS servers, which could be helping me out.



