Re: [Dovecot] 1.1.4 and trouble over NFS

2008-10-09 Thread Mark Zealey
 That looks correct. So either something's changed in Dovecot since I
 last tested it with NFS, or something's changed in Linux kernel. Or
 there's something else weird going on.
 
 I don't really have any good suggestions for solutions and I don't
 really have time to debug NFS problems right now.. Or you could of
 course always try if disabling NFS attribute cache fixes things, but
 that'll most likely also make your performance a lot worse.

Fwiw I just upgraded one of our servers from 1.1.2 to 1.1.4 (32-bit,
centos5.2) and cant see any changes in our nfs access patterns. We use
nfs3 though.

Mark


Re: [Dovecot] 1.1.4 and trouble over NFS

2008-10-08 Thread Timo Sirainen
On Tue, 2008-10-07 at 22:19 +0200, Igor Colombi wrote:
 Timo Sirainen wrote:
  On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:
 
  Hello, I have some trouble with the current setup (it's a testing 
  environment):
 
  2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) 
  (name: exim and exim2)
  1 NFS server (OS Debian testing 2.6.26)
 
  I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot 
  configuration:
 
  Or probably also running imaptest (http://imapwiki.org/ImapTest) on 
  both machines? Could you run http://www.dovecot.org/tools/nfstest.c 
  and tell me what it outputs?
 
 Hello, this is the output of nftest on exim2 (client) vs exim (the other 
 client)
 
 
 ./nfstest 192.168.0.70 50 /mnt/nfs1/pippo/pluto

That looks correct. So either something's changed in Dovecot since I
last tested it with NFS, or something's changed in Linux kernel. Or
there's something else weird going on.

I don't really have any good suggestions for solutions and I don't
really have time to debug NFS problems right now.. Or you could of
course always try if disabling NFS attribute cache fixes things, but
that'll most likely also make your performance a lot worse.


signature.asc
Description: This is a digitally signed message part


[Dovecot] 1.1.4 and trouble over NFS

2008-10-07 Thread Igor Colombi
Hello, I have some trouble with the current setup (it's a testing 
environment):


2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) 
(name: exim and exim2)

1 NFS server (OS Debian testing 2.6.26)

I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot 
configuration:


dotlock_use_excl = yes
mail_nfs_storage = yes
mail_nfs_index = yes
mmap_disable = yes
lock_method = dotlock

Maildir format.

/etc/exports on NFS server:

/mnt 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check,fsid=0)
/mnt/vexim 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)

/etc/fstab on the clients:

192.168.0.71:/vexim /mnt/nfs1 nfs4 rw 0 0

When I access a mailbox with 2 different clients (for example Outlook 
Express that uses exim and Thunderbird that uses exim2) and in the 
same time some message is delivered to the mailbox (the smtp is served 
by Exim 4.69) Dovecot has some trouble as you can see:


*** Log from machine exim

Oct  7 20:00:20 exim dovecot: IMAP([EMAIL PROTECTED]): 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: file size 
changed unexpectedly after write
Oct  7 20:00:20 exim dovecot: IMAP([EMAIL PROTECTED]): Broken file 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist line 3: Invalid 
data:
Oct  7 20:00:21 exim dovecot: IMAP([EMAIL PROTECTED]): 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: Duplicate file 
entry at line 6: 1223402420.H172077P24365.exim.bnscr.it:2, (uid 16919 - 
16922)
Oct  7 20:00:23 exim dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=10612 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 
16925, but next_uid = 16926
Oct  7 20:00:23 exim dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=10724 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record 
update for invalid uid=16926
Oct  7 20:00:23 exim dovecot: IMAP([EMAIL PROTECTED]): 
Synchronization corrupted index header: 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index
Oct  7 20:00:23 exim dovecot: IMAP([EMAIL PROTECTED]): fscking index 
file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index
Oct  7 20:00:23 exim dovecot: IMAP([EMAIL PROTECTED]): Fixed index 
file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: 
first_recent_uid 16927 - 16926
Oct  7 20:00:24 exim dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=11140 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record 
update for invalid uid=16928
Oct  7 20:00:24 exim dovecot: IMAP([EMAIL PROTECTED]): fscking index 
file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index
Oct  7 20:01:00 exim dovecot: imap-login: Login: 
user=[EMAIL PROTECTED], method=PLAIN, rip=127.0.0.1, 
lip=127.0.0.1, secured
Oct  7 20:02:00 exim dovecot: imap-login: Login: 
user=[EMAIL PROTECTED], method=PLAIN, rip=127.0.0.1, 
lip=127.0.0.1, secured
Oct  7 20:02:26 exim dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=11628 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 
16929, but next_uid = 16930
Oct  7 20:02:26 exim dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=11756 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record 
update for invalid uid=16930
Oct  7 20:02:26 exim dovecot: IMAP([EMAIL PROTECTED]): fscking index 
file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index
Oct  7 20:15:23 exim dovecot: Panic: IMAP([EMAIL PROTECTED]): 
Message count decreased
Oct  7 20:15:23 exim dovecot: IMAP([EMAIL PROTECTED]): Raw 
backtrace: imap [0x80d6781] - imap [0x80d6802] - imap [0x80d6199] - 
imap [0x8069711] - imap [0x805e98b] - imap [0x805ea40] - imap 
[0x809c9af] - imap(io_loop_handle_timeouts+0xe9) [0x80ddb19] - 
imap(io_loop_handler_run+0x82) [0x80de3d2] - imap(io_loop_run+0x20) 
[0x80dd8c0] - imap(main+0x46a) [0x806b5aa] - 
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7df2455] - imap 
[0x805c9f1]


*** Log from machine exim

Oct  7 20:00:20 exim2 dovecot: IMAP([EMAIL PROTECTED]): 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: Duplicate file 
entry at line 5: 1223402419.H453623P30627.exim2.bnscr.it:2, (uid 16919 
- 16920)
Oct  7 20:00:22 exim2 dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=10496 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record 
update for invalid uid=16925
Oct  7 20:00:22 exim2 dovecot: IMAP([EMAIL PROTECTED]): fscking 
index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index
Oct  7 20:00:25 exim2 dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=11324 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 
16928, but next_uid = 16929
Oct  7 20:00:25 exim2 dovecot: IMAP([EMAIL PROTECTED]): Log 
synchronization error at seq=3,offset=11528 for 
/mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record 
update 

Re: [Dovecot] 1.1.4 and trouble over NFS

2008-10-07 Thread Timo Sirainen

On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:

Hello, I have some trouble with the current setup (it's a testing  
environment):


2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26)  
(name: exim and exim2)

1 NFS server (OS Debian testing 2.6.26)

I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot  
configuration:

..
Oct  7 20:00:20 exim dovecot: IMAP([EMAIL PROTECTED]): /mnt/nfs1/ 
minnie.tld/postmaster/Maildir/dovecot-uidlist: file size changed  
unexpectedly after write

..
I know that with Dovecot  1.1 the use on multiple computer when  
users are connected randomly was discouraged, but with version 1.1.x  
this setup is safe.


It's safe as long as Dovecot can reliably flush all NFS caches. But  
this clearly isn't happening.


I can reproduce this kind of problem easly: 2 clients on the same  
mailbox connected to two different imap server and a simple bash  
script that send 2 email every 1 second to the mailbox.


Or probably also running imaptest (http://imapwiki.org/ImapTest) on  
both machines? Could you run http://www.dovecot.org/tools/nfstest.c  
and tell me what it outputs?




PGP.sig
Description: This is a digitally signed message part


Re: [Dovecot] 1.1.4 and trouble over NFS

2008-10-07 Thread Igor Colombi

Timo Sirainen wrote:

On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:

Hello, I have some trouble with the current setup (it's a testing 
environment):


2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) 
(name: exim and exim2)

1 NFS server (OS Debian testing 2.6.26)

I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot 
configuration:


Or probably also running imaptest (http://imapwiki.org/ImapTest) on 
both machines? Could you run http://www.dovecot.org/tools/nfstest.c 
and tell me what it outputs?


Hello, this is the output of nftest on exim2 (client) vs exim (the other 
client)



./nfstest 192.168.0.70 50 /mnt/nfs1/pippo/pluto
Connected: Acting as test client
ESTALE errors don't happen
O_EXCL appears to be working, but this could be just faked by NFS client
timestamps resolution: seconds

Testing file attribute cache..
Attr cache flush open+close: failed
Attr cache flush close+open: OK
Attr cache flush fchown(-1, -1): failed
Attr cache flush fchown(uid, -1): OK
Attr cache flush fchmod(mode): OK
Attr cache flush chown(-1, -1): failed
Attr cache flush chown(uid, -1): OK
Attr cache flush chmod(mode): OK
Attr cache flush rmdir(): failed
Attr cache flush rmdir(parent dir): failed
Attr cache flush dup+close: failed
Attr cache flush fcntl(shared): OK
Attr cache flush fcntl(exclusive): OK
Attr cache flush flock(shared): OK
Attr cache flush flock(exclusive): OK
Attr cache flush fsync(): failed
Attr cache flush fcntl(O_SYNC): failed
Attr cache flush O_DIRECT: failed

Testing data cache..
Data cache flush no caching: failed
Data cache flush open+close: failed
Data cache flush close+open: failed
Data cache flush fchown(-1, -1): failed
Data cache flush fchown(uid, -1): failed
Data cache flush fchmod(mode): failed
Data cache flush chown(-1, -1): failed
Data cache flush chown(uid, -1): failed
Data cache flush chmod(mode): failed
Data cache flush rmdir(): failed
Data cache flush rmdir(parent dir): failed
Data cache flush dup+close: failed
Data cache flush fcntl(shared): OK
Data cache flush fcntl(exclusive): OK
Data cache flush flock(shared): OK
Data cache flush flock(exclusive): OK
Data cache flush fsync(): failed
Data cache flush fcntl(O_SYNC): failed
Data cache flush O_DIRECT: OK

Testing write flushing..
Write flush no caching: failed
Write flush open+close: OK
Write flush close+open: OK
Write flush fchown(-1, -1): failed
Write flush fchown(uid, -1): OK
Write flush fchmod(mode): OK
Write flush chown(-1, -1): failed
Write flush chown(uid, -1): OK
Write flush chmod(mode): OK
Write flush rmdir(): failed
Write flush rmdir(parent dir): failed
Write flush dup+close: OK
Write flush fcntl(shared): OK
Write flush fcntl(exclusive): OK
Write flush flock(shared): OK
Write flush flock(exclusive): OK
Write flush fsync(): OK
Write flush fcntl(O_SYNC): failed
Write flush O_DIRECT: OK

Testing partial writing..
OK

Testing file handle cache..
File handle cache flush no caching: failed
File handle cache flush open+close: failed
File handle cache flush close+open: failed
File handle cache flush fchown(-1, -1): failed
File handle cache flush fchown(uid, -1): failed
File handle cache flush fchmod(mode): failed
File handle cache flush chown(-1, -1): failed
File handle cache flush chown(uid, -1): failed
File handle cache flush chmod(mode): failed
File handle cache flush rmdir(): failed
File handle cache flush rmdir(parent dir): OK
File handle cache flush dup+close: failed
File handle cache flush fcntl(shared): failed
fcntl(setlk, write) failed: Bad file descriptor
File handle cache flush fcntl(exclusive): failed
File handle cache flush flock(shared): failed
File handle cache flush flock(exclusive): failed
File handle cache flush fsync(): failed
File handle cache flush fcntl(O_SYNC): failed
fcntl(/mnt/nfs1/pippo, O_DIRECT) failed: Invalid argument
File handle cache flush O_DIRECT: failed

Testing negative file handle cache..
Negative file handle cache flush no caching: failed
Negative file handle cache flush open+close: failed
Negative file handle cache flush close+open: failed
Negative file handle cache flush fchown(-1, -1): failed
Negative file handle cache flush fchown(uid, -1): failed
Negative file handle cache flush fchmod(mode): failed
Negative file handle cache flush chown(-1, -1): failed
Negative file handle cache flush chown(uid, -1): failed
Negative file handle cache flush chmod(mode): failed
rmdir(/mnt/nfs1/pippo/pluto) failed: No such file or directory
rmdir(/mnt/nfs1/pippo/pluto) failed: No such file or directory
Negative file handle cache flush rmdir(): failed
Negative file handle cache flush rmdir(parent dir): OK
Negative file handle cache flush dup+close: failed
Negative file handle cache flush fcntl(shared): failed
fcntl(setlk, write) failed: Bad file descriptor
Negative file handle cache flush fcntl(exclusive): failed
Negative file handle cache flush flock(shared): failed
Negative file handle cache flush flock(exclusive): failed
Negative file handle cache flush fsync(): failed