Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-09-13 Thread David Carter

On Tue, 12 Sep 2006, Wesley Craig wrote:

> On a related note, what was the problem with accepting the Cambridge
> patches for delayed folder deletion?  I'm interested in working on
> getting that or similar code accepted.  Now that we have delayed expunge
> for messages, we continue to run tape backups only for the case where
> users inadvertently delete folders.


Replication and delayed expunge were added by Ken (working as a 
contractor) back before he joined CMU. Replication was sponsored by 
Columbia. Delayed expunge was sponsored by Fastmail, but mostly as a 
performance enhancement. Unexpunge was just a nice side effect.


I believe that Ken implemented the delayed expunge from scratch. My
original two-phase expunge code is rather more involved:


1) Users can access expunged mail and deleted mailboxes using magic
   mailbox hierarchies (.EXPUNGED/ and .DELETED/).

2) It hooks into the quota system to record the amount of expunged
   space in each quota root. Messages are automatically expired when
   global or per quota root limits are reached.

With hindsight (1) was a daft idea on my part. Our users struggle with the 
idea of multiple mailboxes in their account, let alone magic mailbox 
hierarchies. (2) is arguably useful if you don't have infinite storage on 
your IMAP backends. There are however lots of other spool partitions which 
can fill up under a determined denial of service attack.
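
A minimal sketch of the expiry rule in (2), not the Cambridge code. It assumes expunged messages are tracked oldest first and that the oldest are expired first once a limit is exceeded. The ordering policy, the struct, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one expunged-but-retained message. */
struct expunged_msg {
    long expunged_at;    /* time of expunge, seconds */
    unsigned long size;  /* bytes still held on disk */
    int expired;         /* set to 1 once really removed */
};

/*
 * Expire the oldest retained messages until the space they hold is
 * within limit. msgs must be ordered oldest first (assumed policy).
 * Returns the number of bytes still held afterwards.
 */
unsigned long expire_to_limit(struct expunged_msg *msgs, size_t n,
                              unsigned long limit)
{
    unsigned long held = 0;
    size_t i;

    assert(msgs != NULL || n == 0);

    /* Total the space used by messages that are still retained. */
    for (i = 0; i < n; i++)
        if (!msgs[i].expired)
            held += msgs[i].size;

    /* Drop the oldest entries first until we are under the limit. */
    for (i = 0; i < n && held > limit; i++) {
        if (msgs[i].expired)
            continue;
        msgs[i].expired = 1;
        held -= msgs[i].size;
    }
    return held;
}
```

The same routine could be run once with a global limit and once per quota root, which is how the two kinds of limit in (2) would both be enforced under this sketch.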


Without (1) or (2), delayed mailbox deletion is really nothing more 
exciting than a RENAME operation to some part of the mailbox hierarchy 
without a quota root that only the system administrator can access.
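
As an illustration of the rename idea, here is a sketch that only computes the destination name. The "DELETED" prefix and the hexadecimal timestamp suffix are assumptions made for this example and are not taken from the Cambridge patches; the actual rename and the ACL restricting the hierarchy to the administrator are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical admin-only hierarchy that has no quota root. */
#define DELETED_PREFIX "DELETED"

/*
 * Build the name a mailbox would be renamed to instead of being removed.
 * The timestamp suffix keeps repeated deletions of the same name apart.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int delayed_delete_name(const char *mailbox, time_t when,
                        char *out, size_t outlen)
{
    int n;

    assert(mailbox != NULL && out != NULL);
    n = snprintf(out, outlen, "%s.%s.%08lX",
                 DELETED_PREFIX, mailbox, (unsigned long)when);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this naming, deleting user.fred.Drafts at time 0x45071234 would move it to DELETED.user.fred.Drafts.45071234, and a later cleanup job could remove entries whose suffix is older than the retention period.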


--
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-09-13 Thread Bron Gondwana

On Wed, 13 Sep 2006 09:14:17 +0100 (BST), David Carter [EMAIL PROTECTED] 
said:
> On Tue, 12 Sep 2006, Wesley Craig wrote:
>
> > On a related note, what was the problem with accepting the Cambridge
> > patches for delayed folder deletion?  I'm interested in working on
> > getting that or similar code accepted.  Now that we have delayed expunge
> > for messages, we continue to run tape backups only for the case where
> > users inadvertently delete folders.
>
> [ ... other things about delayed mailbox deletion ... ]
>
> Without (1) or (2), delayed mailbox deletion is really nothing more
> exciting than a RENAME operation to some part of the mailbox hierarchy
> without a quota root that only the system administrator can access.

Ho hum de dum... of course.  Why didn't I think of that?  Much easier
than trying to fiddle around with the filesystem level deletion code.

I smell a patch, some time when I'm more awake and not managing the
migration of users to smaller partitions (note to those who haven't
been bitten yet - don't ever put thousands of users on a 2TB filesystem.
Just about everything that could possibly go wrong means days of
downtime, and users _hate_ that).

Hrm... first edge case that sounds interesting, deleting a folder that
has subfolders without deleting the subfolders as well...

My urgent todo list is down to about 10 items now, and I'm sure at
least one of those is easy :)

Bron.
-- 
  Bron Gondwana
  [EMAIL PROTECTED]




Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-09-12 Thread Wesley Craig

On 29 Aug 2006, at 11:39, Ken Murchison wrote:
> The main reason things changed is support for shared mailboxes.  I
> can't elaborate now because I'm driving.


Can you say more about this?  I'd like to fix this code and resubmit  
it.  The current implementation causes sync_client to bail out,  
particularly during xfer.


On a related note, what was the problem with accepting the Cambridge  
patches for delayed folder deletion?  I'm interested in working on  
getting that or similar code accepted.  Now that we have delayed  
expunge for messages, we continue to run tape backups only for the  
case where users inadvertently delete folders.


:wes






Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-09-12 Thread Kjetil Torgrim Homme
On Tue, 2006-09-12 at 10:25 -0400, Wesley Craig wrote:
> Now that we have delayed
> expunge for messages, we continue to run tape backups only for the
> case where users inadvertently delete folders.

interesting.  is one of the replicas off-site?  you don't worry about
EMP or stuff like that?  for how long do you keep the expunged messages?
I think turning off tape backup would be a very tough sell around
here...
-- 
Kjetil T.





Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-09-12 Thread Wesley Craig

On 12 Sep 2006, at 16:51, Kjetil Torgrim Homme wrote:

> interesting.  is one of the replicas off-site?  you don't worry about
> EMP or stuff like that?  for how long do you keep the expunged
> messages?
>
> I think turning off tape backup would be a very tough sell around
> here...


In our test runs, restoring our data (12 TB and growing) from TSM
would take 6-8 weeks.  Getting a reasonable restore time from our
(YMMV) TSM installation for Cyrus IMAP disaster recovery is cost
prohibitive -- more expensive than our entire IMAP installation.
It works fine for user error, tho.


Replicas are off site.  We're keeping expunged messages for one
week, which is per University policy.  Our centrally run backup
service is a rather old TSM installation.  It needs to be upgraded.
The cost difference between upgrading TSM and duplicating our IMAP
infrastructure was strongly in favor of duplicating IMAP.  As an
added benefit, in the event of a more-likely disaster (fire / flood /
power problem in our data center, all of which we've experienced in
the last 10 years), the duplicate IMAP installation could be
immediately put into production.  Our AFS installation is also moving
to disk-only backups.


Personally, if UMich experienced a disaster involving EMP or other  
TEOTWAWKI scenarios, I suspect that losing archived email would be  
the least of our worries.  I haven't heard that from UMich disaster  
planners, tho.


:wes



Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-08-29 Thread David Carter

On Sun, 27 Aug 2006, Bron Gondwana wrote:

> I've attached my trivial solution (against CVS of last week some time),
> but I'm thinking a better (as in, less wasteful) solution might be to
> not return an error at all for a failed mailbox, but instead keep
> walking the entire tree, and then generate a USER event for every
> mailbox that hasn't been marked yet.


My original code (which we are still running: I'm not in any hurry to 
upgrade to 2.3) sorts mailbox actions by user. If a single mailbox action 
associated with a user fails the rest are discarded and a USER event is 
generated. If the USER event fails it locks the given user out of the 
mboxlist and tries again. This is close to what you describe above.
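
The per-user ordering described above can be sketched roughly as follows. This is an illustration under assumptions, not the Cambridge code: the struct, the callback type, and the stub standing in for the real replication call are hypothetical, and the second stage (locking the user out of the mboxlist and retrying when the USER event itself fails) is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one queued mailbox action. */
struct action {
    const char *userid;   /* owner of the mailbox */
    const char *mailbox;  /* mailbox name */
    int will_fail;        /* test hook: makes the stub below fail */
};

typedef int (*sync_fn)(const struct action *a);  /* returns 0 on success */

static int stub_attempts;  /* counts calls made to the stub */

/* Stand-in for the real replication call. */
int stub_do_mailbox(const struct action *a)
{
    stub_attempts++;
    return a->will_fail ? -1 : 0;
}

/* Order by user, then by mailbox so the order is deterministic. */
static int by_user(const void *a, const void *b)
{
    const struct action *x = a, *y = b;
    int c = strcmp(x->userid, y->userid);
    return c ? c : strcmp(x->mailbox, y->mailbox);
}

/*
 * Sort the actions by user and run them. After the first failure for a
 * user, that user's remaining actions are discarded and the user is
 * recorded for a full USER sync. Returns the number of users stored in
 * promoted[] (at most max).
 */
size_t sync_by_user(struct action *acts, size_t n, sync_fn do_mailbox,
                    const char **promoted, size_t max)
{
    size_t i = 0, count = 0;

    assert(acts != NULL && do_mailbox != NULL && promoted != NULL);
    qsort(acts, n, sizeof *acts, by_user);

    while (i < n) {
        const char *user = acts[i].userid;
        int failed = 0;

        /* Walk every action belonging to this user. */
        for (; i < n && strcmp(acts[i].userid, user) == 0; i++) {
            if (!failed && do_mailbox(&acts[i]) != 0)
                failed = 1;  /* skip the rest for this user */
        }
        if (failed && count < max)
            promoted[count++] = user;
    }
    return count;
}
```

Because a failure only affects the user it belongs to, one broken mailbox cannot stall the actions queued for everyone else.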


From memory the 3 retries thing was introduced to cope with transient
problems on shared mailboxes, caused by mailboxes moving around under the
replication engine's feet. No promotion is possible in this case.


> Ken and David - is there a reason why you chose to pass a single
> MAILBOXES command with multiple mailboxes to the backend rather than
> single mailbox commands?  The little birdy in my head is whispering (it
> does that at 1am after many hours of debugging) that it has something to
> do with supporting renames.


Rename and copying messages between mailboxes. With single mailbox 
commands RENAME becomes DELETE + CREATE/UPLOAD (which would work, but 
would be a pain if a GByte mailbox was involved). COPY would upload new 
messages rather than reusing the single instance store on the replica.


--
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-08-29 Thread Wesley Craig

On 29 Aug 2006, at 04:35, David Carter wrote:
> My original code (which we are still running: I'm not in any hurry
> to upgrade to 2.3) sorts mailbox actions by user. If a single
> mailbox action associated with a user fails the rest are discarded
> and a USER event is generated. If the USER event fails it locks the
> given user out of the mboxlist and tries again. This is close to
> what you describe above.


Why is 2.3 different?  I'm fairly sure that these issues:

4) xfer onto a replicating backend causes sync_client to exit
8) renaming users causes sync_client to exit

would be solved with the algorithm you're using (or the one Bron  
outlined).


:wes



RE: sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-08-29 Thread Ken Murchison
The main reason things changed is support for shared mailboxes.  I can't
elaborate now because I'm driving.

-- 
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University






sync_client bails out after 3 MAILBOXES need upgrading to USER in one run

2006-08-26 Thread Bron Gondwana
So I've got a huge log file, and it keeps bailing out trying to process it, but
at a different place each time.  Very odd.

Closer inspection shows that the do_sync code calls do_mailboxes up to 3 times
if there are errors, and if there is a mailbox that can be promoted to user
each time, does that.

This should be fine in the general case, but sort of sucks when there are a
lot of new users being created fast enough that sync_client gets more than three
events that need promoting in a single run.

I've attached my trivial solution (against CVS of last week some time), but I'm
thinking a better (as in, less wasteful) solution might be to not return an error
at all for a failed mailbox, but instead keep walking the entire tree, and then
generate a USER event for every mailbox that hasn't been marked yet.

Ken and David - is there a reason why you chose to pass a single MAILBOXES
command with multiple mailboxes to the backend rather than single mailbox
commands?  The little birdy in my head is whispering (it does that at 1am
after many hours of debugging) that it has something to do with supporting
renames.

Anyway - I think the real answer is to either have access to the user_list
deep down inside do_mailboxes so that things can be appended directly to it
at the time of error finding, or just not to mark those folders during the
mailboxes pass, so you can upgrade those users to a full USER sync.

My patch just says if there has been any progress then that's good enough
for me - start over with the whole list of unmarked folders again without
incrementing the failure count.  It can't starve (always one more folder
marked per loop) but it sure could get slow if every folder was a subfolder
for a user that hadn't been created yet.
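
The retry rule described in the previous paragraph can be sketched in isolation as below. This is not the attached patch: it resets a failure counter whenever a pass marks at least one folder, rather than decrementing the loop counter, and it leaves out the back-off sleep. MAX_RETRIES stands in for SYNC_MAILBOX_RETRIES; the pass callback and the fake queue are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRIES 3  /* stands in for SYNC_MAILBOX_RETRIES */

/*
 * One pass over the unmarked folders. Returns how many folders were
 * newly marked (synced or promoted to USER) and stores the number
 * still unmarked in *remaining.
 */
typedef size_t (*pass_fn)(void *ctx, size_t *remaining);

/* Fake queue for illustration: marks up to per_pass folders per pass. */
struct fake_queue {
    size_t pending;
    size_t per_pass;
};

size_t fake_pass(void *ctx, size_t *remaining)
{
    struct fake_queue *q = ctx;
    size_t marked = q->per_pass < q->pending ? q->per_pass : q->pending;

    q->pending -= marked;
    *remaining = q->pending;
    return marked;
}

/*
 * Repeat passes until nothing is left. Only passes that make no
 * progress count towards MAX_RETRIES. Returns 0 when every folder is
 * marked, -1 after MAX_RETRIES consecutive passes without progress.
 */
int sync_with_progress(pass_fn pass, void *ctx, unsigned *passes_out)
{
    unsigned failures = 0, passes = 0;
    size_t remaining = 0;

    assert(pass != NULL);
    for (;;) {
        size_t marked = pass(ctx, &remaining);

        passes++;
        if (remaining == 0)
            break;                 /* everything is marked */
        if (marked > 0)
            failures = 0;          /* progress: not a failure */
        else if (++failures >= MAX_RETRIES)
            break;                 /* stuck: give up */
    }
    if (passes_out != NULL)
        *passes_out = passes;
    return remaining == 0 ? 0 : -1;
}
```

With five pending folders and one promotion per pass, this finishes after five passes instead of bailing out after three, which is the burst-of-new-users case from the start of this message.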

Regards,

Bron.
-- 
  Bron Gondwana
  [EMAIL PROTECTED]

diff -ur --new-file cyrus-imapd-cvs.orig/imap/sync_client.c cyrus-imapd-cvs/imap/sync_client.c
--- cyrus-imapd-cvs.orig/imap/sync_client.c 2006-07-26 20:03:15.0 -0400
+++ cyrus-imapd-cvs/imap/sync_client.c  2006-08-26 10:45:14.0 -0400
@@ -2988,7 +2988,7 @@
     if (folder_list->count) {
         int n = 0;
         do {
-            sleep(n*2);  /* XXX  should this be longer? */
+            if (n) sleep(n*2);  /* XXX  should this be longer? */
             r = do_mailboxes(folder_list);
             if (r) {
                 /* promote failed personal mailboxes to USER */
@@ -3013,6 +3013,7 @@
                                folder->name, userid);
                     }
                     free(userid);
+                    --n; /* we're still making progress */
                 }
             }
         } while (r && (++n < SYNC_MAILBOX_RETRIES));
