Re: [Dovecot] quick question

2010-02-18 Thread Timo Sirainen
On Wed, 2010-02-10 at 15:15 -0800, Brandon Davidson wrote:
 rip=67.223.67.45, pid=12881: Timeout while waiting for lock for
 transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log 

That's fcntl lock I guess. You could always try lock_method=dotlock..
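For reference, that's a one-line change in dovecot.conf (a minimal sketch; everything else stays at whatever you have now):

lock_method = dotlock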



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-02-12 Thread alex handle
I think mail is the wrong application for NFS, because NFS is slow for
metadata operations.
I would rather use it for VM hosting than mail.

We used to have a small clustered NetApp with 10k HDDs and three
frontend servers with Postfix and Courier IMAP/POP3.
The setup was stable, however the performance was not good.

So we built an IMAP cluster out of a pair of Dell R710s (6 x 15K HDDs)
with CentOS 5, DRBD and Heartbeat.
I will scale this setup by adding another pair of R710 servers and
randomizing the mailboxes between the IMAP/POP3 cluster pairs.
An IMAP proxy will direct the users to the right server, and the
frontend MX servers will also send the mail to the right server
by using SMTP as transport and Postfix transport maps for routing.
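For the MX routing part, a Postfix transport map sketch (the domains and
hostnames below are made up, not our real ones):

# /etc/postfix/transport
example.com        smtp:[imap-pair1.example.com]
example.org        smtp:[imap-pair2.example.com]

# main.cf
transport_maps = hash:/etc/postfix/transport

Run "postmap /etc/postfix/transport" after editing the map so Postfix
picks up the change.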

In the future I would like to switch from Courier to Dovecot and use
LMTP as transport to our mailstore.

We currently have 1 mailboxes,
only 300 - 400 IMAP connections,
but a lot of POP access.

The load on the active R710 is only 0.10 :)

I think mail is a problem which you can easily partition, so why have
all eggs in one basket :)

alex


Re: [Dovecot] quick question

2010-02-11 Thread David Halik

On 02/10/2010 06:15 PM, Brandon Davidson wrote:

Hi David,

 -Original Message-
 From: David Halik

 It looks like we're still working towards a layer 7 solution anyway.
 Right now we have one of our student programmers hacking Perdition with
 a new plugin for dynamic username caching, storage, and automatic fail
 over. If we get it working I can send you the basics if you're
 interested.

I'd definitely be glad to take a look at what you come up with! I'm
still leaning towards MySQL with quick local fallback, but I'm nowhere
near committed to anything.


We're in the process of doing some beta work on it, but so far it works
nicely. It's basically a plugin for perdition that dynamically builds a
username db, assigns new users to the pool server with the fewest
connections, and then always sends the user back to the same machine.
There's a tool you can run to see who is on what machine and what the
overall layout looks like. If the perdition server goes down, we have
our switch send people to a backup perdition and it dynamically
recreates the db again. We have to do some more testing (and actually
make it live), but so far it's promising.



On a side note, we've been running with the two latest maildir patches
in production for a few days now. The last few days we've been seeing a
lot of lock failures:

Just thought I'd see if this was happening to anyone else.

   


I haven't been seeing this here. As far as I can tell, there has been no 
noticeable change in either direction with the last two patches. Every 
once in a blue moon I'll find a dead lock file somewhere, but it doesn't 
seem to be a recurring issue.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-02-10 Thread Brandon Davidson
Hi David,

 -Original Message-
 From: David Halik
 
 It looks like we're still working towards a layer 7 solution anyway.
 Right now we have one of our student programmers hacking Perdition
with
 a new plugin for dynamic username caching, storage, and automatic fail
 over. If we get it working I can send you the basics if you're
interested.

I'd definitely be glad to take a look at what you come up with! I'm
still leaning towards MySQL with quick local fallback, but I'm nowhere
near committed to anything.

On a side note, we've been running with the two latest maildir patches
in production for a few days now. The last few days we've been seeing a
lot of lock failures:

Feb 10 04:06:02 cc-popmap6p dovecot: imap-login: Login: user=pellerin,
method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=12881 
Feb 10 04:08:03 oh-popmap3p dovecot: imap-login: Login: user=pellerin,
method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=9569 
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=12881: Timeout while waiting for lock for
transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log 
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=12881: Our dotlock file
/home6/pellerin/Maildir/dovecot-uidlist.lock was modified (1265803562 vs
1265803684), assuming it wa
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=12881: Connection closed bytes=31/772 
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=9569: Timeout while waiting for lock for
transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log 
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=9569: Our dotlock file
/home6/pellerin/Maildir/dovecot-uidlist.lock was deleted (locked 180
secs ago, touched 180 secs ago) 
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=pellerin,
rip=67.223.67.45, pid=9569: Connection closed bytes=18/465

I'm not sure if this is just because it's trying more diligently to make
sure it's got the latest info, and is therefore hitting locks where it
didn't previously... but it's been hanging our clients and requiring
manual intervention to clear. We've been removing the lock file and
killing any active dovecot sessions, which seems to resolve things for a
while.

Just thought I'd see if this was happening to anyone else.

-Brad


Re: [Dovecot] quick question

2010-02-08 Thread David Halik

On 02/06/2010 02:32 PM, Timo Sirainen wrote:

On Sat, 2010-02-06 at 14:28 -0500, David Halik wrote:
   

On 2/6/2010 2:06 PM, Timo Sirainen wrote:
 

ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382

Oh, interesting. An infinite loop. Looks like this could have happened
ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed:
http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0


   

Do you think I should try the previous patch with this addition? I never
got a chance to test it for long because of the loop dump.
 

I committed that patch already to hg, so please do test it :)

   


I've been running both patches and so far they're stable with no new 
crashes, but I haven't really seen any better behavior, so I don't 
know if it's accomplishing anything. =)


Still seeing entire uidlist dupes after the list goes stale. I 
think that was what we were originally discussing.


Feb  8 12:55:06 gehenna11.rutgers.edu dovecot: IMAP(user): 
fdatasync(/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist) failed: 
Stale NFS file handle
Feb  8 12:55:20 gehenna11.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: next_uid was 
lowered (40605 -> 40604, hdr=40604)
Feb  8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file 
entry at line 4: 
1251801090.M721811P3983V04240006I01A1DAF9_0.gehenna7.rutgers.edu,S=3001:2,S 
(uid 35314 -> 40606)
Feb  8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file 
entry at line 5: 
1251810220.M816183P3757V04240006I01A1DB04_0.gehenna7.rutgers.edu,S=4899:2,S 
(uid 35315 -> 40607)
Feb  8 13:03:51 gehenna11.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file 
entry at line 6: 
1251810579.M402527P753V045C0007I01A1DB05_0.gehenna8.rutgers.edu,S=36471:2,RS 
(uid 35316 -> 40608)


...and so on until the end of the list.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-02-08 Thread Brandon Davidson
Hi David,

 -Original Message-
 From: David Halik
 
 I've been running both patches and so far they're stable with no new
 crashes, but I haven't really seen any better behavior, so I don't
 know if it's accomplishing anything. =)
 
 Still seeing entire uidlist list dupes after the list goes stale. I
 think that was what we were originally discussing.

I wasn't able to roll the patched packages into production until this
morning, but so far I'm seeing the same thing as you - no real change in
behavior.

I guess that brings us back to Timo's possibility number two?

-Brad


Re: [Dovecot] quick question

2010-02-08 Thread David Halik

On 02/08/2010 01:46 PM, Brandon Davidson wrote:

Hi David,

   

-Original Message-
From: David Halik

I've been running both patches and so far they're stable with no new
crashes, but I haven't really seen any better behavior, so I don't
know if it's accomplishing anything. =)

Still seeing entire uidlist list dupes after the list goes stale. I
think that was what we were originally discussing.
 

I wasn't able to roll the patched packages into production until this
morning, but so far I'm seeing the same thing as you - no real change in
behavior.

I guess that brings us back to Timo's possibility number two?

-Brad
   


It looks like we're still working towards a layer 7 solution anyway. 
Right now we have one of our student programmers hacking Perdition with 
a new plugin for dynamic username caching, storage, and automatic fail 
over. If we get it working I can send you the basics if you're interested.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-02-06 Thread David Halik

On 2/6/2010 2:06 PM, Timo Sirainen wrote:


ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
   
Oh, interesting. An infinite loop. Looks like this could have happened

ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed:
http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0

   


Do you think I should try the previous patch with this addition? I never 
got a chance to test it for long because of the loop dump.


Re: [Dovecot] quick question

2010-02-06 Thread Timo Sirainen
On Sat, 2010-02-06 at 14:28 -0500, David Halik wrote:
 On 2/6/2010 2:06 PM, Timo Sirainen wrote:
 
  ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382
 
  Oh, interesting. An infinite loop. Looks like this could have happened
  ever since v1.1. Wonder why it hasn't shown up before. Anyway, fixed:
  http://hg.dovecot.org/dovecot-1.2/rev/a9710cb350c0
 
 
 
 Do you think I should try the previous patch with this addition? I never 
 got a chance to test it for long because of the loop dump.

I committed that patch already to hg, so please do test it :)



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-25 Thread David Halik

On 01/22/2010 05:14 PM, Brandon Davidson wrote:


Yeah, as long as the users don't see it, I'm happy to live with the messages
in the log file.

-Brad

   


*sigh*, it looks like there still might be the occasional user-visible 
issue. I was hoping that once the assert stopped happening, and the 
process stayed alive, the users wouldn't see their inbox disappear 
and reappear. Apparently, this is still happening occasionally.


I just had a user experience this with TB 2, and after looking at the logs 
I found the good ole' stale NFS message:


Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): 
fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: 
Stale NFS file handle


Fortunately, there were no other messages associated with it (assert or 
otherwise), but I was hoping to have seen the last of the users mail 
momentarily reloading.


For now they'll just have to live with it until I either get proxy_maybe 
set up, or some other solution.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-25 Thread Charles Marcus
On 2010-01-25 12:57 PM, David Halik wrote:
 I just had user experience this with TB 2, and after looking at the logs
 I found the good ole' stale nfs message:

Maybe TB3 would be better behaved? It has many, many IMAP improvements
over TB2... worth a try at least...

-- 

Best regards,

Charles


Re: [Dovecot] quick question

2010-01-25 Thread David Halik

On 01/25/2010 01:00 PM, Charles Marcus wrote:

On 2010-01-25 12:57 PM, David Halik wrote:
   

I just had user experience this with TB 2, and after looking at the logs
I found the good ole' stale nfs message:
 

Maybe TB3 would be better behaved? It has many, many IMAP improvements
over TB2... worth a try at least...

   


I agree, I definitely want the user to try it... especially since 
they're technically inclined and can tell me one way or the other. I'm 
going to wait though until 3.0.3 comes out because of the CONDSTORE issues.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-25 Thread David Halik

On 01/25/2010 01:02 PM, David Halik wrote:

On 01/25/2010 01:00 PM, Charles Marcus wrote:

On 2010-01-25 12:57 PM, David Halik wrote:
I just had user experience this with TB 2, and after looking at the 
logs

I found the good ole' stale nfs message:

Maybe TB3 would be better behaved? It has many, many IMAP improvements
over TB2... worth a try at least...



I agree, I definitely want the user to try it... especially since 
they're technically inclined and can tell me one way or the other. I'm 
going to wait though until 3.0.3 comes out because of the CONDSTORE 
issues.




Err, 3.0.2 rather. Speaking of which, I was just notified that the patch 
was approved for inclusion in 3.0.2. Now it just depends on how long it 
takes to be released.



--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-25 Thread Brandon Davidson
David,

 -Original Message-
 From: David Halik [mailto:dha...@jla.rutgers.edu]
 
 *sigh*, it looks like there still might be the occasional user visible
 issue. I was hoping that once the assert stopped happening, and the
 process stayed alive, that the users wouldn't see their inbox
disappear
 and reappear apparently, this is still happening occasionally.
 
 I just had user experience this with TB 2, and after looking at the
logs
 I found the good ole' stale nfs message:
 

Hmm, that's disappointing to hear. I haven't received any new reports
from our helpdesk, so maybe it's at least less visible?

 For now they're just have to live with it until I either get
proxy_maybe
 setup, or some other solution.

Let me know if you come up with anything. I'm not sure we want to add
MySQL as a dependency for our mail service... but I'm at least curious
to see how things perform with session affinity. I'll add it to my long
list of things to play with when I have time for such things...

-Brad


Re: [Dovecot] quick question

2010-01-25 Thread Timo Sirainen
On Mon, 2010-01-25 at 12:57 -0500, David Halik wrote:
 Jan 25 11:39:24 gehenna21 dovecot: IMAP(user): 
 fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: 
 Stale NFS file handle

Well, two possibilities:

a) The attached patch fixes this

b) Dotlocking isn't working for you..

diff -r 0ff07b4ad306 src/lib-storage/index/maildir/maildir-uidlist.c
--- a/src/lib-storage/index/maildir/maildir-uidlist.c	Mon Jan 25 20:24:54 2010 +0200
+++ b/src/lib-storage/index/maildir/maildir-uidlist.c	Mon Jan 25 20:30:25 2010 +0200
@@ -904,11 +904,10 @@
 	}
 }
 
-int maildir_uidlist_refresh(struct maildir_uidlist *uidlist)
+static int maildir_uidlist_open_latest(struct maildir_uidlist *uidlist)
 {
-	unsigned int i;
-	bool retry, recreated;
-	int ret;
+	bool recreated;
+	int ret;
 
 	if (uidlist->fd != -1) {
 		ret = maildir_uidlist_has_changed(uidlist, &recreated);
@@ -918,10 +917,29 @@
 			return ret < 0 ? -1 : 1;
 		}
 
-		if (recreated)
-			maildir_uidlist_close(uidlist);
+		if (!recreated)
+			return 0;
+		maildir_uidlist_close(uidlist);
 	}
 
+	uidlist->fd = nfs_safe_open(uidlist->path, O_RDWR);
+	if (uidlist->fd == -1 && errno != ENOENT) {
+		mail_storage_set_critical(uidlist->ibox->box.storage,
+			"open(%s) failed: %m", uidlist->path);
+		return -1;
+	}
+	return 0;
+}
+
+int maildir_uidlist_refresh(struct maildir_uidlist *uidlist)
+{
+	unsigned int i;
+	bool retry;
+	int ret;
+
+	if (maildir_uidlist_open_latest(uidlist) < 0)
+		return -1;
+
 	for (i = 0; ; i++) {
 		ret = maildir_uidlist_update_read(uidlist, &retry,
 						  i < UIDLIST_ESTALE_RETRY_COUNT);
@@ -1512,18 +1530,12 @@
 	if (maildir_uidlist_want_recreate(ctx))
 		return maildir_uidlist_recreate(uidlist);
 
-	if (uidlist->fd == -1) {
-		/* NOREFRESH flag used. we're just appending some messages. */
+	if (!uidlist->locked_refresh) {
+		/* make sure we have the latest file (e.g. NOREFRESH used) */
 		i_assert(uidlist->initial_hdr_read);
-
-		uidlist->fd = nfs_safe_open(uidlist->path, O_RDWR);
-		if (uidlist->fd == -1) {
-			mail_storage_set_critical(storage,
-				"open(%s) failed: %m", uidlist->path);
+		if (maildir_uidlist_open_latest(uidlist) < 0)
 			return -1;
-		}
 	}
-
 	i_assert(ctx->first_unwritten_pos != (unsigned int)-1);
 
 	if (lseek(uidlist->fd, 0, SEEK_END) < 0) {


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-25 Thread David Jonas
On 01/22/2010 10:15 AM, Brandon Davidson wrote:
 We've thought about enabling IP-based session affinity on the load
 balancer, but this would concentrate the load of our webmail clients, as
 well as not really solving the problem for users that leave clients open
 on multiple systems. 

Webmail and IMAP servers are on the same network for us so we don't have
to go through the BigIP for this, we just use local round-robin DNS to
avoid any sort of clumping. Imapproxy or dovecot proxy local to the
webmail server would get around that too.

 I've done a small bit of looking at nginx's imap
 proxy support, but it's not really set up to do what we want, and would
 require moving the IMAP virtual server off our load balancers and on to
 something significantly less supportable. Having the dovecot processes
 'talk amongst themselves' to synchronize things, or go into proxy mode
 automatically, would be fantastic.

Though we aren't using NFS we do have a BigIP directing IMAP and POP3
traffic to multiple dovecot stores. We use mysql authentication and the
proxy_maybe option to keep users on the correct box. My tests using an
external proxy box didn't significantly reduce the load on the stores
compared to proxy_maybe. And you don't have to manage another
box/config. Since you only need to keep users on the _same_ box and not
the _correct_ box, if you're using mysql authentication you could hash
the username or domain to a particular IP address:

SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host,
'Y' AS proxy_maybe, ...

Just assign IP addresses 192.168.1.48-90 to your dovecot servers. Shift
the range by adding or subtracting to the ORD. A mysql function would
likely work just as well. If a server goes down, move its IP. You could
probably make pairs with heartbeat or some monitoring software to do it
automatically.
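As a rough dovecot-sql.conf sketch of that idea (the users table, column
names and connect line are placeholders, not a tested config):

driver = mysql
connect = host=127.0.0.1 dbname=mail user=dovecot password=secret
password_query = SELECT password, CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host, 'Y' AS proxy_maybe FROM users WHERE userid = '%u'

With proxy_maybe, a login whose computed host is the server's own IP is
handled locally instead of being proxied.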

-David



Re: [Dovecot] quick question

2010-01-25 Thread Brandon Davidson
David,

 Though we aren't using NFS we do have a BigIP directing IMAP and POP3
 traffic to multiple dovecot stores. We use mysql authentication and
the
 proxy_maybe option to keep users on the correct box. My tests using
an
 external proxy box didn't significantly reduce the load on the stores
 compared to proxy_maybe. And you don't have to manage another
 box/config. Since you only need to keep users on the _same_ box and
not
 the _correct_ box, if you're using mysql authentication you could hash
 the username or domain to a particular IP address:
 
 SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host,
 'Y' AS proxy_maybe, ...
 
 Just assign IP addresses 192.168.1.48-90 to your dovecot servers.
Shift
 the range by adding or subtracting to the ORD. A mysql function would
 likely work just as well. If a server goes down, move it's IP. You
could
 probably make pairs with heartbeat or some monitoring software to do
it
 automatically.

Timo posted a similar suggestion recently, and I might try to find some
time to proof this out over the next few weeks. I liked his idea of
storing the user's current server in the database and proxying to that,
with fallback to a local connection if they're new or their current
server is unavailable. The table cleanup and pool monitoring would
probably be what I'd worry most about testing.

Unfortunately we're currently using LDAP auth via PAM... so even if I
could get the SQL and monitoring issues resolved, I think I'd have a
hard time convincing my peers that adding a SQL server as a single point
of failure was a good idea. If it could be set up to just fall back to
using a local connection in the event of a SQL server outage, that might
help things a bit. Anyone know how that might work?

-Brad


Re: [Dovecot] quick question

2010-01-25 Thread Timo Sirainen
On 25.1.2010, at 21.30, Brandon Davidson wrote:

 Unfortunately we're currently using LDAP auth via PAM... so even if I
 could get the SQL and monitoring issues resolved, I think I'd have a
 hard time convincing my peers that adding a SQL server as a single point
 of failure was a good idea. If it could be set up to just fall back to
 using a local connection in the event of a SQL server outage, that might
 help things a bit. Anyone know how that might work?

Well, you can always fall back to LDAP if SQL isn't working.. Just something 
like:

passdb sql {
 ..
}
passdb ldap {
 ..
}
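Spelled out a bit more, with args pointing each passdb at its own config
file (the paths are just examples):

passdb sql {
  args = /etc/dovecot/dovecot-sql.conf
}
passdb ldap {
  args = /etc/dovecot/dovecot-ldap.conf
}

The order matters: the SQL passdb is tried first and the LDAP one acts as
the fallback.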



Re: [Dovecot] quick question

2010-01-25 Thread Brandon Davidson
Timo,

 -Original Message-
 From: Timo Sirainen [mailto:t...@iki.fi]
 
 On 25.1.2010, at 21.30, Brandon Davidson wrote:
  If it could be set up to just fall back to
  using a local connection in the event of a SQL server outage, that
might
  help things a bit. Anyone know how that might work?
 
 Well, you can always fall back to LDAP if SQL isn't working.. Just
something
 like:
 
 passdb sql {
  ..
 }
 passdb ldap {
  ..
 }

Or just 'passdb pam { ... }' for the second one in our case, since we're
using system auth with pam_ldap/nss_ldap. Is the SQL connection/query
timeout configurable? It would be nice to make a very cursory attempt at
proxying, and immediately give up and use a local connection if anything
isn't working.

-Brad


Re: [Dovecot] quick question

2010-01-25 Thread David Halik

On 01/25/2010 02:18 PM, David Halik wrote:

On 01/25/2010 01:31 PM, Timo Sirainen wrote:

On Mon, 2010-01-25 at 12:57 -0500, David Halik wrote:

Jan 25 11:39:24 gehenna21 dovecot: IMAP(user):
fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed:
Stale NFS file handle

Well, two possibilities:

a) The attached patch fixes this



I patched and immediately started seeing *many* of these:

Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): 
lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: 
Bad file descriptor
Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): 
lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: 
Bad file descriptor


...so I backed out right away.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-25 Thread Timo Sirainen
On Mon, 2010-01-25 at 15:12 -0500, David Halik wrote:
 I patched and immediately starting seeing *many* of these:
 
 Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user): 
 lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed: 
 Bad file descriptor

Hmm. I put it through a few seconds of imaptest but didn't see these, so
I guess there's something it didn't catch. The attached patch fixes the
first obvious potential problem I can think of, try if you still
dare. :)

diff -r 7b0e1b2c9afd src/lib-storage/index/maildir/maildir-uidlist.c
--- a/src/lib-storage/index/maildir/maildir-uidlist.c	Mon Jan 25 01:35:35 2010 +0200
+++ b/src/lib-storage/index/maildir/maildir-uidlist.c	Mon Jan 25 22:24:07 2010 +0200
@@ -123,6 +123,7 @@
 	uint32_t prev_uid;
 };
 
+static int maildir_uidlist_open_latest(struct maildir_uidlist *uidlist);
 static bool maildir_uidlist_iter_next_rec(struct maildir_uidlist_iter_ctx *ctx,
 					  struct maildir_uidlist_rec **rec_r);
 
@@ -875,11 +876,10 @@
 	}
 }
 
-int maildir_uidlist_refresh(struct maildir_uidlist *uidlist)
+static int maildir_uidlist_open_latest(struct maildir_uidlist *uidlist)
 {
-	unsigned int i;
-	bool retry, recreated;
-	int ret;
+	bool recreated;
+	int ret;
 
 	if (uidlist->fd != -1) {
 		ret = maildir_uidlist_has_changed(uidlist, &recreated);
@@ -889,10 +889,29 @@
 			return ret < 0 ? -1 : 1;
 		}
 
-		if (recreated)
-			maildir_uidlist_close(uidlist);
+		if (!recreated)
+			return 0;
+		maildir_uidlist_close(uidlist);
 	}
 
+	uidlist->fd = nfs_safe_open(uidlist->path, O_RDWR);
+	if (uidlist->fd == -1 && errno != ENOENT) {
+		mail_storage_set_critical(uidlist->ibox->box.storage,
+			"open(%s) failed: %m", uidlist->path);
+		return -1;
+	}
+	return 0;
+}
+
+int maildir_uidlist_refresh(struct maildir_uidlist *uidlist)
+{
+	unsigned int i;
+	bool retry;
+	int ret;
+
+	if (maildir_uidlist_open_latest(uidlist) < 0)
+		return -1;
+
 	for (i = 0; ; i++) {
 		ret = maildir_uidlist_update_read(uidlist, &retry,
 						  i < UIDLIST_ESTALE_RETRY_COUNT);
@@ -1434,18 +1453,12 @@
 	if (maildir_uidlist_want_recreate(ctx))
 		return maildir_uidlist_recreate(uidlist);
 
-	if (uidlist->fd == -1) {
-		/* NOREFRESH flag used. we're just appending some messages. */
+	if (!uidlist->locked_refresh || uidlist->fd == -1) {
+		/* make sure we have the latest file (e.g. NOREFRESH used) */
 		i_assert(uidlist->initial_hdr_read);
-
-		uidlist->fd = nfs_safe_open(uidlist->path, O_RDWR);
-		if (uidlist->fd == -1) {
-			mail_storage_set_critical(storage,
-				"open(%s) failed: %m", uidlist->path);
+		if (maildir_uidlist_open_latest(uidlist) < 0)
 			return -1;
-		}
 	}
-
 	i_assert(ctx->first_unwritten_pos != (unsigned int)-1);
 
 	if (lseek(uidlist->fd, 0, SEEK_END) < 0) {


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-25 Thread Timo Sirainen
On 25.1.2010, at 21.53, Brandon Davidson wrote:

 Or just 'passdb pam { ... }' for the second one in our case, since we're
 using system auth with pam_ldap/nss_ldap. Is the SQL connection/query
 timeout configurable? It would be nice to make a very cursory attempt at
 proxying, and immediately give up and use a local connection if anything
 isn't working.

I don't think it's immediate.. But it's probably something like:

 - notice it's not working -> reconnect
 - requests are queued
 - reconnect fails, hopefully soon, but MySQL connect at least fails in max. 10 
seconds
 - reconnect timeout is added, which doubles after each failure
 - requests are failed while it's not trying to connect



Re: [Dovecot] quick question

2010-01-25 Thread Brandon Davidson
Timo,

On 1/25/10 12:31 PM, Timo Sirainen t...@iki.fi wrote:
 
 I don't think it's immediate.. But it's probably something like:
 
  - notice it's not working - reconnect
  - requests are queued
  - reconnect fails, hopefully soon, but MySQL connect at least fails in max.
 10 seconds
  - reconnect timeout is added, which doubles after each failure
  - requests are failed while it's not trying to connect

Hmm, that's not great. Is that tunable at all? Cursory examination shows
that it's hardcoded in src/lib-sql/driver-mysql.c, so I guess not.

I suppose I could also get around to playing with multi-master replication
so I at least have a SQL server available at each of the sites that I have
Dovecot servers...

-Brad



Re: [Dovecot] quick question

2010-01-25 Thread David Halik

On 01/25/2010 03:26 PM, Timo Sirainen wrote:

On Mon, 2010-01-25 at 15:12 -0500, David Halik wrote:
   

I patched and immediately starting seeing *many* of these:

Jan 25 15:05:33 gehenna18.rutgers.edu dovecot: IMAP(user):
lseek(/rci/nqu/rci/u1/sendick/dovecot/.Trash/dovecot-uidlist) failed:
Bad file descriptor
 

Hmm. I put it through a few seconds of imaptest but didn't see these, so
I guess there's something it didn't catch. The attached patch fixes the
first obvious potential problem I can think of, try if you still
dare. :)

   


No guts no glory! So far, so good. The first patch started spewing 
messages within seconds. I've been running for about twenty minutes with 
this version and I haven't seen much of anything yet.


I'll report back tomorrow after it has a day to burn in.

--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-25 Thread David Halik




No guts no glory! So far, so good. The first patch started spewing messages 
within seconds. I've been running for about twenty minutes with this version 
and I haven't seen much of anything yet.


I'll report back tomorrow after it has a day to burn in.



It's still a bit buggy. I haven't seen any messages in the last few hours, 
but then a user just dumped a gigantic 200MB core. Looking at the dump, 
it's because of some recursive loop that goes on forever:


#0  0x2b656f2cba71 in _int_malloc (av=0x2b656f5ab9e0, bytes=368) at 
malloc.c:4650

iters = value optimized out
nb = 384
idx = 759448916
bin = value optimized out
victim = value optimized out
size = value optimized out
victim_index = value optimized out
remainder = value optimized out
remainder_size = value optimized out
block = value optimized out
bit = value optimized out
map = value optimized out
fwd = value optimized out
bck = value optimized out
#1  0x2b656f2cd86d in __libc_calloc (n=value optimized out, 
elem_size=value optimized out) at malloc.c:4006

av = (struct malloc_state *) 0x2b656f5ab9e0
oldtop = (struct malloc_chunk *) 0x1da94070
p = value optimized out
bytes = 368
csz = value optimized out
oldtopsize = 12176
mem = (void *) 0x139cdc40
clearsize = value optimized out
nclears = value optimized out
d = value optimized out
#2  0x004a8ea6 in pool_system_malloc (pool=value optimized out, 
size=368) at mempool-system.c:78

mem = value optimized out
#3  0x004a4daa in i_stream_create_fd (fd=12, max_buffer_size=4096, 
autoclose_fd=96) at istream-file.c:156

fstream = value optimized out
	st = {st_dev = 329008600, st_ino = 4452761, st_nlink = 27, st_mode 
= 799030, st_uid = 0, st_gid = 1, pad0 = 0, st_rdev = 109556025819520, 
st_size = 11013, st_blksize = 0, st_blocks = 95, st_atim = {
tv_sec = 4096, tv_nsec = 8}, st_mtim = {tv_sec = 1264465811, tv_nsec = 
499376000}, st_ctim = {tv_sec = 1264465811, tv_nsec = 499378000}, __unused 
= {1264465811, 499384000, 0}}
#4  0x0043fba6 in maildir_uidlist_refresh (uidlist=0x139d6ab0) at 
maildir-uidlist.c:733

retry = 64
ret = -1
#5  0x00440bb5 in maildir_uidlist_update_hdr 
(uidlist=0x2b656f5ab9e0, st=0x7fffc949d360) at maildir-uidlist.c:382

mhdr = (struct maildir_index_header *) 0x139cdc40
#6  0x0043 in maildir_uidlist_refresh (uidlist=0x139d6ab0) at 
maildir-uidlist.c:793

retry = false
ret = 1
#7  0x00440bb5 in maildir_uidlist_update_hdr 
(uidlist=0x2b656f5ab9e0, st=0x7fffc949d4b0) at maildir-uidlist.c:382

mhdr = (struct maildir_index_header *) 0x139cdc40
#8  0x0043 in maildir_uidlist_refresh (uidlist=0x139d6ab0) at 
maildir-uidlist.c:793

retry = false
ret = 1
#9  0x00440bb5 in maildir_uidlist_update_hdr 
(uidlist=0x2b656f5ab9e0, st=0x7fffc949d600) at maildir-uidlist.c:382

mhdr = (struct maildir_index_header *) 0x139cdc40
#10 0x0043 in maildir_uidlist_refresh (uidlist=0x139d6ab0) at 
maildir-uidlist.c:793

retry = false
ret = 1
#11 0x00440bb5 in maildir_uidlist_update_hdr 
(uidlist=0x2b656f5ab9e0, st=0x7fffc949d750) at maildir-uidlist.c:382

mhdr = (struct maildir_index_header *) 0x139cdc40

...and on and on for thousands of lines. I gave up after 20K. ;)


[Dovecot] quick question

2010-01-22 Thread David Halik


Timo (and anyone else who feels like chiming in),

I was just wondering if you'd be able to tell me if the amount of 
corruption I see on a daily basis is what you consider average for our 
current setup and traffic. Now that we are no longer experiencing any 
core dumps with the latest patches since our migration from courier two 
months ago, I'd like to know what is expected as operational norms. 
Prior to this we had never used Dovecot, so I have nothing to go on.


Our physical setup is 10 Centos 5.4 x86_64 IMAP/POP servers, all with 
the same NFS backend where the index, control, and Maildir's for the 
users reside. Accessing this are direct connections from clients, plus 
multiple squirrelmail webservers, and pine users, all at the same time 
with layer4 switch connection load balancing.


Each server has an average of about 400 connections, for a total of 
around concurrent 4000 during a normal business day. This is out of a 
possible user population of about 15,000.


All our dovecot servers syslog to one machine, and on average I see 
about 50-75 instances of file corruption per day. I'm not counting each 
line, since some instances of corruption generate a log message for each 
uid that's wrong. This is just me counting "user A was corrupted once at 
10:00, user B was corrupted at 10:25", for example.


Examples of the corruption are as follows:

###
Corrupted transaction log file /dovecot/.INBOX/dovecot.index.log seq 
28: Invalid transaction log size (32692 vs 32800): 
./dovecot/.INBOX/dovecot.index.log (sync_offset=32692)


Corrupted index cache file ./dovecot/.Sent 
Messages/dovecot.index.cache: Corrupted physical size for uid=624: 0 != 
53490263


Corrupted transaction log file /dovecot/.INBOX/dovecot.index.log seq 
66: Unexpected garbage at EOF (sync_offset=21608)


Corrupted transaction log file 
./dovecot/.Trash.RFA/dovecot.index.log seq 2: indexid changed 
1264098644 -> 1264098664 (sync_offset=0)


Corrupted index cache file ./dovecot/.INBOX/dovecot.index.cache: 
invalid record size


Corrupted index cache file ./dovecot/.INBOX/dovecot.index.cache: 
field index too large (33 = 19)


Corrupted transaction log file /dovecot/.INBOX/dovecot.index.log seq 
40: record size too small (type=0x0, offset=5788, size=0) (sync_offset=5812)

##

These are most of the unique messages I could find, although the 
majority are the same as the first two I posted. So, my question: is 
this normal for a setup such as ours? I've been arguing with my boss 
over this since the switch. My opinion is that with a setup such as ours, 
where a user can be logged in using Thunderbird, Squirrelmail, and their 
Blackberry all at the same time, there will always be the 
occasional index/log corruption.


Unfortunately, he is of the opinion that there should rarely be any and 
there is a design flaw in how Dovecot is designed to work with multiple 
services with an NFS backend.


What has been your experience so far?

Thanks,
-Dave

--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 11:24 -0500, David Halik wrote:

 Unfortunately, he is of the opinion that there should rarely be any
 and 
 there is a design flaw in how Dovecot is designed to work with
 multiple 
 services with an NFS backend. 

Well, he is pretty much correct. I thought I could add enough NFS cache
flushes to the code to make it work well, but that's highly dependent on
what OS or even kernel version the NFS clients are running on. Looking
at the problems with people using NFS it's pretty clear that this
solution just isn't going to work properly.

But then again, Dovecot is the only (free) IMAP server that even
attempts to support this kind of behavior. Or sure, Courier does too,
but disabling index files on Dovecot should get the same stability.

I see only two proper solutions:

1) Change your architecture so that all mail accesses to a specific user
go through a single server. Install Dovecot proxy so all IMAP/POP3
connections go through it to the correct server.

Later once v2.0 is stable install LMTP and make all mail deliveries go
through it too (possibly also LMTP proxy if your MTA can't figure out
the correct destination server). In the mean time use deliver with a
configuration that doesn't update index files.

This guarantees that only a single server ever accesses the user's mails
simultaneously. This is the only guaranteed way to make it work in the near
future. With this setup you should see zero corruption.
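For the "deliver with a configuration that doesn't update index files"
part, one possible sketch (the mail_location shown is only an example of
a maildir layout, not a tested config):

protocol lda {
  # keep deliver's index updates in memory only, so it never writes
  # shared index files over NFS
  mail_location = maildir:~/Maildir:INDEX=MEMORY
}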

2) Long term solution will be for Dovecot to not use NFS server for
inter-process communication, but instead connect to other Dovecot
servers directly via network. Again in this setup there would be only a
single server reading/writing user's index files.


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
 2) Long term solution will be for Dovecot to not use NFS server for
 inter-process communication, but instead connect to other Dovecot
 servers directly via network. 

Actually not NFS server, but filesystem. So this would be done even
when not using NFS.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Cor Bosman

On Jan 22, 2010, at 1:19 PM, Timo Sirainen wrote:

 On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
 2) Long term solution will be for Dovecot to not use NFS server for
 inter-process communication, but instead connect to other Dovecot
 servers directly via network. 
 
 Actually not NFS server, but filesystem. So this would be done even
 when not using NFS.
 

Is this the situation we discussed once where a dovecot instance becomes a 
proxy if it detects that a user should be on a different server?  The one thing 
I remember sorta missing from that idea at the time was a fallback to local 
spool if the other dovecot server isn't available. 

Cor



Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 13:23 -0400, Cor Bosman wrote:
 On Jan 22, 2010, at 1:19 PM, Timo Sirainen wrote:
 
  On Fri, 2010-01-22 at 19:16 +0200, Timo Sirainen wrote:
  2) Long term solution will be for Dovecot to not use NFS server for
  inter-process communication, but instead connect to other Dovecot
  servers directly via network. 
  
  Actually not NFS server, but filesystem. So this would be done even
  when not using NFS.
  
 
 Is this the situation we discussed once where a dovecot instance becomes a 
 proxy if it detects that a user should be on a different server?

No, that was my 1) plan :) And this is already possible with
proxy_maybe: http://wiki.dovecot.org/PasswordDatabase/ExtraFields/Proxy

 The one thing I remember sorta missing from that idea at the time was a 
 fallback to local spool if the other dovecot server isnt available. 

Right. This still isn't supported. It's also not really the safest
solution, because it could place a user's connections on different
servers due to some temporary problem. Or if the primary has failed and
the user has connections on the secondary server, then the primary comes
back up: new connections go to the primary while the old connections
haven't been killed on the secondary, so you'll potentially get corruption.

Better would be to have some kind of a database that externally monitors
what servers are up and where users currently have connections, and
based on that decide where to redirect a new connection. Although that's
also slightly racy unless done carefully.


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 19:31 +0200, Timo Sirainen wrote:
  Is this the situation we discussed once where a dovecot instance becomes a 
  proxy if it detects that a user should be on a different server?
 
 No, that was my 1) plan :) And this is already possible with
 proxy_maybe: http://wiki.dovecot.org/PasswordDatabase/ExtraFields/Proxy

So, clarification: Either using dedicated proxies or using proxy_maybe
works for 1). I just didn't remember proxy_maybe. I suppose that's a
better/easier solution since it doesn't require new hardware or network
changes.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 19:31 +0200, Timo Sirainen wrote:
 Better would be to have some kind of a database that externally monitors
 what servers are up and where users currently have connections, and
 based on that decide where to redirect a new connection. Although that's
 also slightly racy unless done carefully.

Wonder if something like this would work:

servers (
  id integer,
  host varchar,
  ip varchar,
  last_time_healthy timestamp,
  connection_count integer,
  new_connections_ok boolean
);

user_connections (
  user_id integer primary key,
  server_id integer,
  last_lookup timestamp,
  imap_connections integer
);

Then some kind of logic that:

 - if user already exists in user_connections table AND
(imap_connections > 0 OR last_lookup > now() - 1 hour) use the old
server_id

 - otherwise figure out a new server for it based on servers'
connection_count and new_connections_ok.

 - when inserting, handle on duplicate key error

 - when updating, use update user_connections .. where user_id = $userid
and server_id = $old_server_id, and be prepared to handle when this
returns 0 rows updated.

Once in a while maybe clean up stale rows from user_connections. And
properly keeping track of imap_connections count might also be
problematic, so maybe once in a while somehow check from all servers if
the user actually still has any connections.
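A rough SQL sketch of that logic against the tables above (the statements
and '?' placeholders are illustrative and untested):

-- reuse the existing assignment if it still looks valid
SELECT s.ip FROM user_connections uc
  JOIN servers s ON s.id = uc.server_id
  WHERE uc.user_id = ? AND s.new_connections_ok
    AND (uc.imap_connections > 0 OR uc.last_lookup > NOW() - INTERVAL 1 HOUR);

-- otherwise pick the least loaded healthy server
SELECT id, ip FROM servers WHERE new_connections_ok
  ORDER BY connection_count LIMIT 1;

-- record the assignment, handling the duplicate key race
INSERT INTO user_connections (user_id, server_id, last_lookup, imap_connections)
  VALUES (?, ?, NOW(), 0)
  ON DUPLICATE KEY UPDATE last_lookup = NOW();

-- when moving a user, guard against a concurrent update
UPDATE user_connections SET server_id = ?, last_lookup = NOW()
  WHERE user_id = ? AND server_id = ?;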


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 19:54 +0200, Timo Sirainen wrote:
  - otherwise figure out a new server for it based on servers'
 connection_count and new_connections_ok.

Or in case of proxy_maybe and a external load balancer, maybe just use
the local server in this situation.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
One more spam about this :)

On Fri, 2010-01-22 at 19:54 +0200, Timo Sirainen wrote:
 Then some kind of logic that:
 
  - if user already exists in user_connections table AND
 (imap_connections > 0 OR last_lookup > now() - 1 hour) use the old
 server_id

AND new_connections_ok also here. The idea being that something
externally monitors the servers' health, and if a server is down for n
seconds (n=30 or so?), this field gets updated to FALSE, so new
connections for users that were on the broken server go elsewhere.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Brandon Davidson
David,

 -Original Message-
 From: dovecot-bounces+brandond=uoregon@dovecot.org
[mailto:dovecot-
 Our physical setup is 10 Centos 5.4 x86_64 IMAP/POP servers, all with
 the same NFS backend where the index, control, and Maildir's for the
 users reside. Accessing this are direct connections from clients, plus
 multiple squirrelmail webservers, and pine users, all at the same time
 with layer4 switch connection load balancing.
 
 Each server has an average of about 400 connections, for a total of
 around concurrent 4000 during a normal business day. This is out of a
 possible user population of about 15,000.
 
 All our dovecot servers syslog to one machine, and on average I see
 about 50-75 instances of file corruption per day. I'm not counting
each
 line, since some instances of corruption generate a log message for
each
 uid that's wrong. This is just me counting user A was corrupted once
at
 10:00, user B was corrupted at 10:25 for example.

We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4,
Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster
(active/standby) in an L4 profile distributing connections round-robin,
maildirs on two Netapp Filers (clustered 3070s with 54k RPM SATA disks),
10k peak concurrent connections for 45k total accounts. We used to run
with the noac mount option, but performance was abysmal, and we were
approaching 80% CPU utilization on the filers at peak load. After
removing noac, our CPU is down around 30%, and our NFS ops/sec rate is
maybe 1/10th of what it used to be.
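For reference, the mount-option difference being described is roughly this
in /etc/fstab (server, export path and the other options are placeholders):

# attribute caching disabled: safer for shared maildirs, but hammers the filer
filer:/vol/mail  /mail  nfs  rw,hard,intr,tcp,noac  0 0

# noac removed: far fewer NFS ops, relying on Dovecot's own cache flushing
# (e.g. mail_nfs_storage = yes and mail_nfs_index = yes in dovecot.conf)
filer:/vol/mail  /mail  nfs  rw,hard,intr,tcp  0 0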

The downside to this is that we've started seeing significantly more
crashing and mailbox corruption. Timo's latest patch seems to have fixed
the crashing, but the corruption just seems to be the cost of
distributing users at random across our backend servers.

We've thought about enabling IP-based session affinity on the load
balancer, but this would concentrate the load of our webmail clients, as
well as not really solving the problem for users that leave clients open
on multiple systems. I've done a small bit of looking at nginx's imap
proxy support, but it's not really set up to do what we want, and would
require moving the IMAP virtual server off our load balancers and on to
something significantly less supportable. Having the dovecot processes
'talk amongst themselves' to synchronize things, or go into proxy mode
automatically, would be fantastic.

Anyway, that's where we're at with the issue. As a data point for your
discussion with your boss:
* With 'noac', we would see maybe one or two 'corrupt' errors a day. Most
of these were related to users going over quota.
* After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes
a day. The crashes were highly visible to the users, as their mailbox
would appear to be empty until the rebuild completed.
* Since applying the latest patch, we've seen no crashes, and 60-70
'Corrupt' errors a day. We have not had any new user complaints.

Hope that helps,

-Brad


Re: [Dovecot] quick question

2010-01-22 Thread David Halik

On 01/22/2010 12:16 PM, Timo Sirainen wrote:

Looking at the problems with people using NFS it's pretty clear that this
solution just isn't going to work properly.
   


Actually, considering the number of people and servers we're throwing at 
it, I think it's dealing with it pretty well. I'm sure there are 
always more tweaks and enhancements that can be done, but look at how 
much better 1.2 is over the 1.0 releases. It's definitely not broken, just 
maybe not quite as production ready as it could be. Honestly, at this point 
my users are very happy with the speed increase, and as long as their 
IMAP process isn't dying they don't seem to notice the behind-the-scenes 
corruption, because of the self-healing code.



But then again, Dovecot is the only (free) IMAP server that even
attempts to support this kind of behavior. Or sure, Courier does too,
but disabling index files on Dovecot should get the same stability.
   


By the way, I didn't want to give the impression that we were unhappy 
with the product; rather, I think what you've accomplished with Dovecot 
is great even by non-free enterprise standards, not to mention that the 
level of support you've given us has been excellent and I appreciate it 
greatly. It was a clear choice for us over Courier once NFS support 
became a reality. Loads on the exact same hardware dropped from an 
average of 5 to 0.5, which is quite amazing, not to mention the speed 
benefit of the indexes. Our users with extremely large Maildirs were very 
satisfied.




I see only two proper solutions:

1) Change your architecture so that all mail accesses to a specific user
go through a single server. Install Dovecot proxy so all IMAP/POP3
connections go through it to the correct server.
   


We've discussed this internally and are still considering layer 7 
username balancing as a possibility, but I haven't worked too much on 
the specifics yet. We've only been running on Dovecot for two months, so 
we wanted to give it some burn-in time and see how things progressed. 
Now that the core dumps are fixed, I think we might be able to live with 
the corruption for a while. The only user-visible issue that I was aware 
of was the users' mailbox disappearing when the processes died, but 
since that's not happening any more I'll have to see if anyone notices 
the corruption.


Thanks for all the feedback. I'm going over some of the ideas you 
suggested and we'll be thinking about long term solutions.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-22 Thread David Halik

On 01/22/2010 01:15 PM, Brandon Davidson wrote:


We have a much similar setup - 8 POP/IMAP servers running RHEL 5.4,
Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster
(active/standby) in a L4 profile distributing connections round-robin,
maildirs on two Netapp Filers (clustered 3070s with 54k RPM SATA disks),
10k peak concurrent connections for 45k total accounts. We used to run
with the noac mount option, but performance was abysmal, and we were
approaching 80% CPU utilization on the filers at peak load. After
removing noac, our CPU is down around 30%, and our NFS ops/sec rate is
maybe 1/10th of what it used to be.
   


Wow, that's almost the exact same setup we use, except we have 10 
IMAP/POP and a clustered pair of FAS920's with 10K drives which are 
getting replaced in a few weeks. We also have a pair of clustered 
3050's, but they're not running dovecot (yet).


You're right about noac though, it absolutely destroyed our netapps. Of 
course the corruption was all but eliminated, but the filer performance 
was so bad our users immediately noticed. Definitely not an option.



The downside to this is that we've started seeing significantly more
crashing and mailbox corruption. Timo's latest patch seems to have fixed
the crashing, but the corruption just seems to be the cost of
distributing users at random across our backend servers.
   


Yep, I agree. Like I said in the last email, we'll going to deal with it 
for now and see if anyone really notices. I can live with it if the 
users don't care.


Timo, speaking of which, I'm guessing everyone is happy with the latest 
patches, any ETA on 1.2.10? ;)



We've thought about enabling IP-based session affinity on the load
balancer, but this would concentrate the load of our webmail clients, as
well as not really solving the problem for users that leave clients open
on multiple systems.
   


We currently have IP session 'sticky' on our L4's and it didn't help all 
that much. Yes, it reduces thrashing on the backend, but ultimately it 
won't help the corruption. Like you said, multiple logins will still go 
to different servers when the IPs are different.


How is your webmail architecture set up? We're using imapproxy to spread 
them out across the same load balancer, so essentially all traffic 
from outside and inside gets balanced. The trick is we have an internal 
load-balanced virtual IP that spreads the load out for webmail on 
private IP space. If they were to go outside they would get NAT'd as one 
outbound IP, so we just go inside and get the benefit of balancing.




Anyway, that's where we're at with the issue. As a data point for your
discussion with your boss:
* With 'noac', we would see maybe 1 or two 'corrupt' errors a day. Most
of these were related to users going over quota.
* After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes
a day. The crashes were highly visible to the users, as their mailbox
would appear to be empty until the rebuild completed.
* Since applying the latest patch, we've seen no crashes, and 60-70
'Corrupt' errors a day. We have not had any new user complaints.
   


That's where we are, and as long as the corruption stays invisible to 
users, I'm fine with it. Crashes seem to be the only user-visible issue 
so far, with noac being out of the question unless they buy a 
ridiculously expensive filer.


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-22 Thread David Halik



We've thought about enabling IP-based session affinity on the load
balancer,
   


Brandon, I just thought of something. Have you always been running 
without IP affinity across all your connections? We've always had it 
turned on because we were under the impression that certain clients like 
Outlook had major issues without it. Basically, as the client spawns new 
connections and they go to other servers rather than the same one, the 
client begins to fight itself. IP affinity always seemed like a more 
stable option, but if you've been running without it for a long time, 
maybe it's not such a problem after all. Anyway, what has your experience 
been?


--

David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu




Re: [Dovecot] quick question

2010-01-22 Thread Cor Bosman

 Wow, that's almost the exact same setup we use, except we have 10 IMAP/POP 
 and a clustered pair of FAS920's with 10K drives which are getting replaced 
 in a few weeks. We also have a pair of clustered 3050's, but they're not 
 running dovecot (yet).

Pretty much the same as us as well: 35 IMAP servers, 10 POP servers, a 
clustered pair of 6080s with about 250 15K disks. We're seeing some corruption 
as well. I myself am using IMAP extensively and regularly have problems with my 
inbox disappearing. I'm not running the patch yet though. Is 1.2.10 imminent or 
should I just patch 1.2.9?

Cor



Re: [Dovecot] quick question

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 17:05 -0400, Cor Bosman wrote:

 Is 1.2.10 imminent or should i just patch 1.2.9?

I'll try to get 1.2.10 out on Sunday. There are still some mails I
should read through and maybe fix some other stuff.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] quick question

2010-01-22 Thread Brandon Davidson
Cor,

On 1/22/10 1:05 PM, Cor Bosman c...@xs4all.nl wrote:
 
 Pretty much the same as us as well.  35 imap servers. 10 pop servers.
 clustered pair of 6080s, with about 250 15K disks. We're seeing some
 corruption as well. I myself am using imap extensively and regularly have
 problems with my inbox disappearing. Im not running the patch yet though. Is
 1.2.10 imminent or should i just patch 1.2.9?

You guys must serve a pretty heavy load. What's your peak connection count
across all those machines? How's the load? We recently went through a
hardware replacement cycle, and were targeting < 25% utilization at peak
load so we can lose one of our sites (half of our machines are in each site)
without running into any capacity problems. We're actually at closer to 10%
at peak, if that... Probably less now that we've disabled noac. Dovecot is
fantastic :)

-Brad



Re: [Dovecot] quick question

2010-01-22 Thread Cor Bosman
 
 You guys must serve a pretty heavy load. What's your peak connection count
 across all those machines? How's the load? We recently went through a
 hardware replacement cycle, and were targeting < 25% utilization at peak
 load so we can lose one of our sites (half of our machines are in each site)
 without running into any capacity problems. We're actually at closer to 10%
 at peak, if that... Probably less now that we've disabled noac. Dovecot is
 fantastic :)

I think the peak is around 1 concurrent connections, out of about 500,000 
mailboxes. The servers are way overspecced, so we can lose half of them. The 
netapps are also being used for webservices.

Cor

Re: [Dovecot] quick question

2010-01-22 Thread Brandon Davidson
David,

On 1/22/10 12:34 PM, David Halik dha...@jla.rutgers.edu wrote:
 
 We currently have IP session 'sticky' on our L4's and it didn't help all
 that much. yes, it reduces thrashing on the backend, but ultimately it
 won't help the corruption. Like you said, multiple logins will still go
 to different servers when the IP's are different.
 
 How if your webmail architecture setup? We're using imapproxy to spread
 them them out across the same load balancer, so essentially all traffic
 from outside and inside get's balanced. The trick is we have an internal
 load balanced virtual IP that spreads the load out for webmail on
 private IP space. If they were to go outside they would get NAT'd as one
 outbound IP, so we just go inside and get the benefit of balancing.

We have two webmail interfaces - one is an old in-house open-source project
called Alphamail, the other is Roundcube. Both point at the same VIP that we
send users to, with no special rules. We're running straight round-robin L4
connection distribution, with no least-connections or sticky-client rules.
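
For anyone wiring up the webmail piece David describes above, the imapproxy
side is only a handful of directives. A minimal up-imapproxy sketch - the
internal VIP hostname and the cache numbers below are assumed examples, not
our actual configuration:

## /etc/imapproxy.conf (sketch)
server_hostname imap-vip.internal.example.edu
listen_port 143
server_port 143
# pool of cached connections, so each webmail click reuses an IMAP session
cache_size 2048
cache_expiration_time 300

Everything else can stay at the defaults; the point is just that webmail
logins funnel through the proxy to the balanced VIP instead of opening a
fresh IMAP connection for every page load.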

We've been running this way for about 3 years, I think... I've only been here
a year. We made a number of changes in sequence starting about three and a
half years ago - Linux NFS to NetApp, Courier to Dovecot, mbox to Maildir+,
LVS to F5 BigIP; not necessarily in that order. At no point have we ever had
any sort of session affinity.
 
 That's where we are, and as long as the corruptions stay user-invisible,
 I'm fine with it. Crashes seem to be the only user-visible issue so far,
 with noac being out of the question unless they buy a ridiculously
 expensive filer.

Yeah, as long as the users don't see it, I'm happy to live with the messages
in the log file.

-Brad



Re: [Dovecot] Quick question...

2009-02-26 Thread Michael Segel
Thanks,

I was thinking of doing this in Dovecot because I expected to have to create
virtual mailboxes 'on the fly', and it would be nice to capture the original
mail for historical/auditing purposes.

There are two ways I could do this: use a database back end for the mailbox
and an insert trigger, or process the e-mail with deliver before it reaches
the database and then store it there. (Instead of MySQL, I was looking at the
IIUG's version of IBM's Informix. It's free for some uses ;-)

Thanks for the quick responses from everyone.
I just needed a pointer in the right direction and was too busy focused on
something else to RTFM. ;-)

-Mike


 -Original Message-
 From: to...@tuxteam.de [mailto:to...@tuxteam.de]
 Sent: Wednesday, February 25, 2009 11:53 PM
 To: mse...@segel.com
 Cc: dovecot@dovecot.org
 Subject: Re: [Dovecot] Quick question...
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On Wed, Feb 25, 2009 at 02:28:50PM -0600, dove...@segel.com wrote:
  Hi,
 
  Here's the scenario.
 
  I want to set up a mailbox so that mail sent to the address is piped to
  a processing application, instead of going to a mailbox.
 
 Conceptually, when a mail arrives there are two processes involved: the
 mail transfer agent (MTA) and the mail delivery agent (MDA). The mail
 transfer agent takes the mail from the net (typically from another
 MTA) and decides whether the mail has to be delivered locally (then it
 passes it on to the MDA) or remotely (then it passes it on to another
 MTA).
 
 Note that Dovecot doesn't enter this picture at all (yet). Its primary
 job is serving up mail to end-users when it has already been delivered.
 
 All that said, most MTAs (Postfix, Exim, Sendmail, Qmail, you name it)
 bring along with them delivery functions (can fill in the role of MDAs).
 The dovecot distribution brings along with it a delivery agent (deliver)
 which you can configure to play many tricks on delivery via a language
 designed explicitly for that (called Sieve), and there are quite powerful
 third party delivery agents (e.g. procmail).
 
 So, to sum up your best bet would be:
 
  - if the requirements are simple, like pipe all mail going to this
user through this program, do as Justin said and tell your mail
transfer agent to do that. To be able to give you any hints, I should
at least know the beast by name ;-)
 
  - if the task is more complex (e.g. depending on other headers, time of
day, you name it), then just tell the MTA to push it to the MDA (most
come preconfigured to do that anyway, if circumstances are right) and
tweak the MDA configuration to achieve that.
 
 Hope that helps. Things can be a bit confusing at the beginning.
 
 Regards
 - -- tomás
 
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)
 
 iD8DBQFJpi4yBcgs9XrR2kYRAmOkAJ9XExiCQYVbD6TrSf38qN4IxXuD5wCcDpEa
 Af0M7SFSpUwVhreUmozaGEk=
 =Vni1
 -END PGP SIGNATURE-




[Dovecot] Quick question...

2009-02-25 Thread dovecot
Hi,

Here's the scenario.

I want to set up a mailbox so that mail sent to the address is piped to
a processing application, instead of going to a mailbox.

One way I can do this is to set up a mailbox and then have an application
that checks to see if there's mail and then processes it.
(Old school Unix script)

Is there a way to set it up with dovecot? 
(Cleaner solution)

Thx

-Mike


Re: [Dovecot] Quick question...

2009-02-25 Thread Harry Lachanas

dove...@segel.com wrote:

Hi,

Here's the scenario.

I want to set up a mailbox so that mail sent to the address is piped to
a processing application, instead of going to a mailbox.

One way I can do this is to set up a mailbox and then have an application
that checks to see if there's mail and then processes it.
(Old school Unix script)

Is there a way to set it up with dovecot? 
(Cleaner solution)


Thx

-Mike

  

I am sure this can be done with Sieve, but I am only just starting to learn it now.
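
For the record, a Sieve version would be a near one-liner with the Pigeonhole
sieve_extprograms plugin (the vnd.dovecot.pipe extension) - an assumption,
since that extension may not be available on older Dovecot/Sieve setups, and
the script name below is made up:

# hedged sketch: hand every incoming message to an external program;
# "process-mail" must exist in the configured sieve_pipe_bin_dir
require ["vnd.dovecot.pipe"];

pipe "process-mail";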

With procmail it's trivial. After making sure the mail is destined for your
application, you just pipe it there:


# Last delivering recipe, after spam checks etc.
:0
| /path/to/my/processing/application

You might also have a look at ripmime (it rips MIME attachments from a
mail), so you can combine them all.

Cheers
Harry.







Re: [Dovecot] Quick question...

2009-02-25 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, Feb 25, 2009 at 02:28:50PM -0600, dove...@segel.com wrote:
 Hi,
 
 Here's the scenario.
 
 I want to set up a mailbox so that mail sent to the address is piped to
 a processing application, instead of going to a mailbox.

Conceptually, when a mail arrives there are two processes involved: the
mail transfer agent (MTA) and the mail delivery agent (MDA). The mail
transfer agent takes the mail from the net (typically from another
MTA) and decides whether the mail has to be delivered locally (then it
passes it on to the MDA) or remotely (then it passes it on to another
MTA).

Note that Dovecot doesn't enter this picture at all (yet). Its primary
job is serving up mail to end-users when it has already been delivered.

All that said, most MTAs (Postfix, Exim, Sendmail, Qmail, you name it)
bring along with them delivery functions (can fill in the role of MDAs).
The dovecot distribution brings along with it a delivery agent (deliver)
which you can configure to play many tricks on delivery via a language
designed explicitly for that (called Sieve), and there are quite powerful
third party delivery agents (e.g. procmail).

So, to sum up, your best bet would be:

 - if the requirements are simple, like "pipe all mail going to this
   user through this program", do as Justin said and tell your mail
   transfer agent to do that (a minimal sketch follows after this
   list). To be able to give you any hints, I should at least know
   the beast by name ;-)

 - if the task is more complex (e.g. depending on other headers, time of
   day, you name it), then just tell the MTA to push it to the MDA (most
   come preconfigured to do that anyway, if circumstances are right) and
   tweak the MDA configuration to achieve that.
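
To make the first option concrete, here is a minimal sketch for an MTA that
honours aliases(5) (Sendmail, Postfix and friends); the alias name and the
program path are made-up examples, not anything from this thread:

# /etc/aliases: mail for "processor" is handed to the program on stdin
# instead of being stored in any mailbox
processor: "|/usr/local/bin/process-mail"

# rebuild the alias database afterwards
newaliases

If you later need finer-grained routing, Postfix transport maps or a pipe(8)
service do the same job with more control.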

Hope that helps. Things can be a bit confusing at the beginning.

Regards
- -- tomás

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFJpi4yBcgs9XrR2kYRAmOkAJ9XExiCQYVbD6TrSf38qN4IxXuD5wCcDpEa
Af0M7SFSpUwVhreUmozaGEk=
=Vni1
-END PGP SIGNATURE-


Re: [Dovecot] Quick question regarding autocreate plugin

2008-10-29 Thread Jakob Curdes


just a quick question: if I want to use the autocreate plugin with 
1.1.5, I have to compile it by hand, right? How do I do that? Can I 
adapt a Makefile from another plugin?
I have meanwhile solved this and updated the wiki to explain how the plugin 
can be compiled against the 1.1.x source tree.


cf.
http://wiki.dovecot.org/Plugins/Autocreate
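
The wiki recipe boils down to compiling the plugin as a shared object against
the unpacked, already-configured source tree. A rough sketch only - the source
location, include directories and install path below are assumptions that will
vary with your build and distribution:

# assumed location of the ./configure'd Dovecot source
DOVECOT_SRC=/usr/local/src/dovecot-1.1.5

gcc -fPIC -shared -Wall -DHAVE_CONFIG_H \
    -I$DOVECOT_SRC -I$DOVECOT_SRC/src/lib \
    -I$DOVECOT_SRC/src/lib-mail -I$DOVECOT_SRC/src/lib-storage \
    -I$DOVECOT_SRC/src/lib-index \
    autocreate-plugin.c -o lib20_autocreate_plugin.so

# copy the result into the imap plugin directory named by mail_plugin_dir
cp lib20_autocreate_plugin.so /usr/lib/dovecot/modules/imap/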


Regards,

Jakob


[Dovecot] Quick question regarding autocreate plugin

2008-10-28 Thread Jakob Curdes

Hello,

just a quick question: if I want to use the autocreate plugin with 
1.1.5, I have to compile it by hand, right? How do I do that? Can I 
adapt a Makefile from another plugin?


JC


[Dovecot] quick question about fs quota overhead in plugin

2007-11-28 Thread Adam McDougall
Last night I enabled imap_quota so dovecot can report usage based on the
filesystem disk quota.  I don't intend to actually use the quota plugin to place
any limits anytime soon though.  How much overhead does this add to 
normal operations that allocate disk space?  Ideally I'd like a situation
where the only overhead is incurred when the user uses the mail client to
specifically check their usage.  Is that possible, and/or is there a better
way to do this?  If it does cause general overhead on the NFS filer, I could
just accept it until/unless I feel it becomes a burden I cannot bear.

mail_plugins = acl fts fts_squat quota imap_quota
...
quota = fs
...


Re: [Dovecot] quick question about fs quota overhead in plugin

2007-11-28 Thread Timo Sirainen
On Wed, 2007-11-28 at 09:26 -0500, Adam McDougall wrote:
 Last night I enabled imap_quota so dovecot can report usage based on the
 filesystem disk quota.  I don't intend to actually use the quota plugin to place
 any limits anytime soon though.  How much overhead does this add to 
 normal operations that allocate disk space?  Ideally I'd like a situation
 where the only overhead is incurred when the user uses the mail client to
 specifically check their usage.  Is that possible, and/or is there a better
 way to do this?  If it does cause general overhead on the NFS filer, I could
 just accept it until/unless I feel it becomes a burden I cannot bear.
 
 mail_plugins = acl fts fts_squat quota imap_quota
 ...
 quota = fs
 ...

http://hg.dovecot.org/dovecot/rev/1d0521b7151d



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Quick question on multiple access to dovecot indexes

2007-05-26 Thread Timo Sirainen
On Fri, 2007-05-25 at 11:28 -0400, Adam McDougall wrote:
 I have up to 4 servers that will run dovecot behind a load balancer, which
 means the same user might be accessing the same mailbox from multiple
 servers, and it seems like dovecot doesn't like multiple access to the
 dovecot indexes for the one user since I currently have them stored in an
 NFS home directory.  Is this a bad thing?  Must I keep a separate index
 location per server?

So you're using NFS? Have you read http://wiki.dovecot.org/NFS?
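
The short version of that page, as a rough dovecot.conf sketch (setting names
from the 1.x line; the local index path is only an example):

# keep indexes on local disk so that only the mail files live on NFS
mail_location = maildir:~/Maildir:INDEX=/var/lib/dovecot/indexes/%u

# NFS safety settings
mmap_disable = yes
# the next two exist only in v1.1 and later
mail_nfs_storage = yes
mail_nfs_index = yes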



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Quick question on multiple access to dovecot indexes

2007-05-26 Thread Adam McDougall
On Sat, May 26, 2007 at 06:23:45PM +0300, Timo Sirainen wrote:

  On Fri, 2007-05-25 at 11:28 -0400, Adam McDougall wrote:
   I have up to 4 servers that will run dovecot behind a load balancer,
   which means the same user might be accessing the same mailbox from
   multiple servers, and it seems like dovecot doesn't like multiple access
   to the dovecot indexes for the one user since I currently have them
   stored in an NFS home directory.  Is this a bad thing?  Must I keep a
   separate index location per server?
  
  So you're using NFS? Have you read http://wiki.dovecot.org/NFS?
  
Oops, sorry.  I will take that information into account.



[Dovecot] Quick question on multiple access to dovecot indexes

2007-05-25 Thread Adam McDougall
I have up to 4 servers that will run dovecot behind a load balancer, which means
the same user might be accessing the same mailbox from multiple servers, and it 
seems like dovecot doesn't like multiple access to the dovecot indexes for the 
one user since I currently have them stored in an NFS home directory.  Is this
a bad thing?  Must I keep a separate index location per server? 

Just today I started running dovecot on more than one server like this and 
started
seeing things in the logs like:

May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): Corrupted transaction log 
file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index.log:
 end_offset (1332) > current 
sync_offset (1244)
May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): broken sync positions in 
index file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index
May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): fscking index file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index
May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): Fixed index file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index: 
log file sync pos 2,1332 -> 2,1244
May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): Unexpected transaction log 
desync with index 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index
May 25 11:19:13 boomhauer dovecot: IMAP(mcdouga9): Disconnected: Mailbox is in 
inconsistent state, please relogin.


May 25 11:19:17 dauterive dovecot: IMAP(mcdouga9): file mail-index.c: line 983 
(mail_index_sync_from_transactions): 
assertion failed: (hdr.messages_count == (*map)->hdr.messages_count)
May 25 11:19:17 dauterive dovecot: child 16386 (imap) killed with signal 6
...
May 25 11:19:53 dauterive in.imapproxyd[17211]: LOGIN: 'mcdouga9' 
(127.0.0.1:53650) on existing sd [9]
May 25 11:19:53 dauterive dovecot: IMAP(mcdouga9): Transaction log file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index.log:
 marked corrupted
May 25 11:19:53 dauterive dovecot: IMAP(mcdouga9): Transaction log file 
/home/mcdouga9/Maildir/dovecot/public/indexes/decs/.support.In/dovecot.index.log.2:
 marked corrupted