Re: Importing/moving an older cyrus message tree into a new system, without IMAP
Hi,

On Mon, 13 Sep 2010, Forrest Aldrich wrote:
> I have an older system that crashed - cyrus version is a couple years or so old. I have 1000's of messages in the spool that I need to preserve. My question is about whether there's a way to import that huge tree of messages into a new cyrus installation without imap-to-imap connectivity?

We did a migration some months back from an old Kolab v1 (cyrus v2.1) system to a new Kolab v2.2 (cyrus v2.2) system. This was done by writing a script to

- dump the ldap database (you might not have this) and load it on the new system
- rsync the mailboxes from their location on the old server to the correct location on the new server
- recursively reconstruct those mailboxes
- copy the .seen and .sub information to the correct new location
- copy the quota information to the correct new location
- dump the old mailboxes.db and load it on the new system (with cyrus stopped)

It's not trivial, but it can be done with some care. We also had to translate usernames from user to user@domain in various places to match the new kolab setup, but you probably won't have to worry about that.

Gavin

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
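The core of the steps listed above can be sketched as a small plan builder. This is only an illustration, not the script we actually used: the host name, paths and mailbox names are assumptions, the .seen/.sub and quota copies are left out because their locations depend on your configdirectory layout, and the order simply follows the list. It assumes the old mailbox list was dumped beforehand with `ctl_mboxlist -d`. Nothing is executed; the function only returns the command lines for review.

```python
import shlex
from typing import List


def migration_plan(old_host: str, old_spool: str, new_spool: str,
                   mailboxes: List[str], dump_file: str) -> List[str]:
    """Return shell command lines for the new server, in order, without running them.

    All arguments are hypothetical examples supplied by the caller.
    """
    q = shlex.quote
    # Copy the message files from the old spool into the new spool.
    plan = [f"rsync -a {q(old_host + ':' + old_spool + '/')} {q(new_spool + '/')}"]
    # Rebuild the per-mailbox index files, recursing into sub-mailboxes.
    plan += [f"reconstruct -r {q(mbox)}" for mbox in mailboxes]
    # Load the dumped mailbox list (run this with Cyrus stopped).
    plan.append(f"ctl_mboxlist -u < {q(dump_file)}")
    return plan


if __name__ == "__main__":
    for line in migration_plan("old.example.org", "/var/spool/imap",
                               "/var/spool/imap", ["user.alice"],
                               "/tmp/mailboxes.dump"):
        print(line)
```

Printing the plan first and running it by hand (as the cyrus user) makes it easier to stop after any step that looks wrong.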
Re: imapd dumping core due to SEGV
Sorry for the delay getting back about this. I meant to let people know that the reason for this:

> > Also when this happens the cyrus master process kills all other active imapd processes and restarts, is there a reason for this?
>
> I've never heard of master doing that in response to ANY child behavior. Does master log anything?

was the way we had set up SMF on Solaris to control Cyrus IMAP. One needs to make sure SMF is set up to ignore core dumps and child processes signaling death; otherwise SMF will restart the entire service.

On Mon, 12 Jul 2010 18:36:59 +0100, Wesley Craig w...@umich.edu wrote:
> On 05 Jul 2010, at 10:56, Gavin Gray wrote:
> > Two of them have had imapd processes crash and leave core dumps in the past couple of days. Looking at the core dumps with dbx we see
>
> I'm not aware of bug fixes in those code paths. Given how little those two code paths have in common, I'd suspect memory corruption.
>
> :wes

--
Gavin Gray
Edinburgh University Information Services
Rm 2013 JCMB Kings Buildings
Edinburgh EH9 3JZ UK
tel +44 (0)131 650 5987
email gavin.g...@ed.ac.uk
TLS server engine: cannot load CA data
Hello,

Strange problem:

Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load CA data
Sep 14 09:18:12 mail cyrus/imap[21928]: unable to get certificate from '/etc/apache2/ssl/mail_rcg_nl.crt'
Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load cert/key data, may be a cert/key mismatch?
Sep 14 09:18:12 mail cyrus/imap[21928]: error initializing TLS

But this command gives the certificate:

su cyrus -c cat /etc/apache2/ssl/mail_rcg_nl.crt

Cyrus is running as user cyrus. What could be wrong?

With regards,
Paul van der Vlis.

--
http://www.vandervlis.nl/
Re: TLS server engine: cannot load CA data
On 09/14/2010 07:51 AM, Paul van der Vlis wrote:
> Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load CA data
> Sep 14 09:18:12 mail cyrus/imap[21928]: unable to get certificate from '/etc/apache2/ssl/mail_rcg_nl.crt'
> Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load cert/key data, may be a cert/key mismatch?
> Sep 14 09:18:12 mail cyrus/imap[21928]: error initializing TLS
>
> But this command gives the certificate:
>
> su cyrus -c cat /etc/apache2/ssl/mail_rcg_nl.crt
>
> Cyrus is running as user cyrus. What could be wrong?

Can cyrus read the private key file (.key)?
sync-server without deletes?
We've been running sync replication between two servers for a few months now and everything has been working well. Recently, management has come down and asked if it's possible to have the sync only perform additions and to ignore deletions. The idea is that they would like our backup server (or possibly a third box) contain an archive of all mail ever delivered to our users (we would manage expiration manually). From what I can tell, this most likely isn't possible with the current sync-server, so I wanted to confirm that hunch and if I'm correct, see what other people are doing for this kind of thing. Thanks, Derek -- -- Derek Chen-Becker Senior Network Engineer, Security Architect CPI Corp, Inc. 1706 Washington Ave St. Louis, MO 63103 Phone: 314-231-7711 x6455 Fax: 314-613-6724 dbec...@cpicorp.com PGP Key available from public key servers Fingerprint: E4C4 26C0 8588 E80A C29F 636D 1FBE 0FE3 2871 4AE8 -- Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: sync-server without deletes?
Quoting Derek Chen-Becker dbec...@cpicorp.com:

> We've been running sync replication between two servers for a few months now and everything has been working well. Recently, management has come down and asked if it's possible to have the sync only perform additions and to ignore deletions. The idea is that they would like our backup server (or possibly a third box) contain an archive of all mail ever delivered to our users (we would manage expiration manually). From what I can tell, this most likely isn't possible with the current sync-server, so I wanted to confirm that hunch and if I'm correct, see what other people are doing for this kind of thing.

IMHO the sync server uses the options expunge_mode and delete_mode in imapd.conf. So if you don't run cyr_expire, mails should stay on the filesystem.

M. Menge                         Tel.: (49) 7071/29-70316
Universität Tübingen             Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung    mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
Re: sync-server without deletes?
On Tue, Sep 14, 2010 at 08:03:47AM -0500, Derek Chen-Becker wrote: We've been running sync replication between two servers for a few months now and everything has been working well. Recently, management has come down and asked if it's possible to have the sync only perform additions and to ignore deletions. The idea is that they would like our backup server (or possibly a third box) contain an archive of all mail ever delivered to our users (we would manage expiration manually). From what I can tell, this most likely isn't possible with the current sync-server, so I wanted to confirm that hunch and if I'm correct, see what other people are doing for this kind of thing. Yeah, not really I'm afraid. Not only doesn't it work like that, but you can't even guarantee that the deleted email gets replicated at all! If it gets expunged in a replication window, it will never get copied. With the new replication engine in 2.4, it will be possible - deleted messages still get replicated for a week - and if you set an explicit long expiry time on the replica (say, years!) then it wouldn't get cleaned up any earlier. A common pattern is just to duplicate all email to a different folder (during LMTP delivery or via sieve rules). Of course, that doesn't catch stuff that's uploaded via IMAP though. Bron. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: imapd dumping core due to SEGV
For Solaris SMF and Cyrus, please use the following in your manifest for Cyrus IMAP:

    <property_group name='startd' type='framework'>
        <propval name='ignore_error' type='astring' value='core,signal'/>
    </property_group>

The imap service will no longer be restarted when an imap process is killed. Only when master ends will startd believe Cyrus is down. Ditto for an imap process dumping core.

Pascal

Gavin Gray gavin.g...@ed.ac.uk wrote:
> Sorry for the delay getting back about this, I meant to let people know that the reason for this:
>
> > Also when this happens the cyrus master process kills all other active imapd processes and restarts, is there a reason for this?
>
> was the way we had setup SMF on Solaris to control Cyrus IMAP. One needs to make sure SMF is setup to ignore core dumps and child processes signaling death otherwise SMF will restart the entire service.
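As a quick sanity check before importing a manifest, one could verify that the startd property group really carries the ignore_error setting. The sketch below is only an illustration: the function name is made up, and it inspects a fragment or whole manifest passed in as a string (assuming no DOCTYPE that the parser would need to resolve) rather than talking to SMF itself.

```python
import xml.etree.ElementTree as ET

# The fragment from the message, with its angle brackets in place.
FRAGMENT = """
<property_group name='startd' type='framework'>
    <propval name='ignore_error' type='astring' value='core,signal'/>
</property_group>
"""


def ignored_errors(manifest_xml: str) -> set:
    """Return the set of values listed in startd/ignore_error, or an empty set."""
    root = ET.fromstring(manifest_xml)
    # Element.iter() also yields the root itself, so a bare fragment works too.
    for group in root.iter("property_group"):
        if group.get("name") != "startd":
            continue
        for prop in group.iter("propval"):
            if prop.get("name") == "ignore_error":
                return {v.strip() for v in prop.get("value", "").split(",") if v.strip()}
    return set()


if __name__ == "__main__":
    print(sorted(ignored_errors(FRAGMENT)))
```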
Re: Draft: Bugzilla Work Flow
On 09/04/2010 07:41 AM, Jeroen van Meeuwen (Kolab Systems) wrote: To allow some early feedback, I'm putting the page on the list now as opposed to when I feel like I'm done documenting everything in full ;-) http://www.cyrusimap.org/mediawiki/index.php/User:Jeroen_van_Meeuwen/Drafts/Bugzilla_Work_Flow This page is accessible, but what happened to the cyrus wiki? There seems to be a new web page for cyrus which doesn't appear very wiki-like (http://www.cyrusimap.org/), and googling only takes me to this page. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: TLS server engine: cannot load CA data
Patrick Boutilier wrote:
> On 09/14/2010 07:51 AM, Paul van der Vlis wrote:
> > Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load CA data
> > [...]
> > Cyrus is running as user cyrus. What could be wrong?
>
> Can cyrus read the private key file (.key)?

Yes, it can. But I think I've found it: the tls_ca_file in imapd.conf was wrong.

With regards,
Paul van der Vlis.

--
http://www.vandervlis.nl/
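Since the culprit turned out to be a wrong tls_ca_file rather than the certificate named in the log, it can help to check every TLS file option in imapd.conf in one go. The sketch below is a rough helper under stated assumptions: it handles only simple "option: value" lines, looks at tls_cert_file, tls_key_file and tls_ca_file, and the function names are invented for this example. It must be run as the cyrus user, because os.access() tests the permissions of the calling user.

```python
import os

TLS_FILE_OPTIONS = ("tls_cert_file", "tls_key_file", "tls_ca_file")


def parse_imapd_conf(text: str) -> dict:
    """Parse simple 'option: value' lines, skipping comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        options[key.strip()] = value.strip()
    return options


def unreadable_tls_files(conf_text: str) -> list:
    """Return (option, path) pairs whose file the current user cannot read."""
    options = parse_imapd_conf(conf_text)
    problems = []
    for name in TLS_FILE_OPTIONS:
        path = options.get(name)
        if path and not os.access(path, os.R_OK):
            problems.append((name, path))
    return problems


if __name__ == "__main__":
    with open("/etc/imapd.conf") as fh:  # path is an assumption
        for option, path in unreadable_tls_files(fh.read()):
            print(f"{option}: cannot read {path}")
```

A readable file can still be the wrong file, so an empty result only rules out missing paths and permission problems.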
Re: sync-server without deletes?
On 09/14/2010 08:53 AM, Bron Gondwana wrote: With the new replication engine in 2.4, it will be possible - deleted messages still get replicated for a week - and if you set an explicit long expiry time on the replica (say, years!) then it wouldn't get cleaned up any earlier. That sounds like it's what we want, so I'll plan on moving to that. In the short term perhaps I'll just need to copy to a common folder as you indicated, or have postfix just send a duplicate copy to our long-term backup server. Thanks, Derek -- -- Derek Chen-Becker Senior Network Engineer, Security Architect CPI Corp, Inc. 1706 Washington Ave St. Louis, MO 63103 Phone: 314-231-7711 x6455 Fax: 314-613-6724 dbec...@cpicorp.com PGP Key available from public key servers Fingerprint: E4C4 26C0 8588 E80A C29F 636D 1FBE 0FE3 2871 4AE8 -- Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
Dear Dan,

> If you're not concerned about your quota database, seen state, annotations, and subscription information, and assuming you've already regenerated your top level mailbox hierarchy, then you should be able to copy over the individual email files from each mailbox to the new server and perform a reconstruct on each mailbox (with the -r recursive option). If the new location is already live, then you'll need to be careful that you don't hit any filename collisions between the old server (e.g. email '123.') and the new server.
>
> You may also be able to copy over the primary database files (like your configdirectory/mailboxes.db), if your library version and cyrus versions match between the old and new servers. If not, you may need to use cvt_cyrusdb to convert the database from the old server to flat or skiplist and convert them back to their native format on the new server (berkeley db version mismatches are particularly a problem here).

What other meta-data files other than mailboxes.db do I need to copy if I want to restore everything (seen flags, other flags, etc)? And will it be a generally good practice to convert all required database files to flat first, then re-convert to the new server's file format? Will this guarantee a trouble-free migration?

My aim is to be able to restore all meta-data in the event of a bare metal crash recovery. I'm ok with running a reconstruct if needed, but I should be able to re-create all meta-data, including mail folder permissions (which I'll get from mailboxes.db, I think), flags, quota, etc. I am trying to arrive at a proper process for recovery in the event of slight mismatch between Cyrus versions or in the event of moving between 32-bit and 64-bit hardware.

One thing I'm not worried about is how to back up the messages themselves --- a shutdown of Cyrus and simple tar of the spool area will do for me, I think.

thanks and regards,
Shuvam
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
> We did a migration some months back from an old Kolab v1 (cyrus v2.1) system to a new Kolab v2.2 (cyrus v2.2) system. This was done by writing a script to
>
> - dump the ldap database (you might not have this) and load it on the new system
> - rsync the mailboxes from their location on the old server to the correct location on the new server
> - recursively reconstruct those mailboxes
> - copy the .seen and .sub information to the correct new location
> - copy the quota information to the correct new location
> - dump the old mailboxes.db and load it on the new system (with cyrus stopped)

Some questions:

- When you copied all the .seen files, did you dump to flat format and then recreate in the new db format?
- Did you migrate the mailboxes.db before or after reconstructing the mailboxes?
- What do the .sub files contain? On my (very small) system, I found just a list of mail folder names.

thanks and regards,
Shuvam
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
On 14/09/10 22:41 +0530, Shuvam Misra wrote:
> What other meta-data files other than mailboxes.db do I need to copy if I want to restore everything (seen flags, other flags, etc)? And will it be a generally good practice to convert all required database files to flat first, then re-convert to the new server's file format? Will this guarantee a trouble-free migration?

See the manpage for imapd.conf for possible formats, but for my 2.3.12 installation, with configdirectory specified at /var/lib/cyrus (and no customization to my *_db options), my database files are:

File                              Contents                      Format
/var/lib/cyrus/mailboxes.db       list of mailboxes             Cyrus skiplist DB
/var/lib/cyrus/annotations.db     list of annotations           Cyrus skiplist DB
/var/lib/cyrus/tls_sessions.db    cache of TLS sessions         Berkeley DB
/var/lib/cyrus/deliver.db         duplicate delivery database   Berkeley DB

Per mailbox/user files:

File                                                         Contents                   Format
/var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey    backend for mailbox keys   Cyrus skiplist DB
/var/lib/cyrus/domain/e/example.org/user/j/jsmith.seen       seen database              Cyrus skiplist DB
/var/lib/cyrus/domain/e/example.org/user/j/jsmith.sub        subscription database      flat ASCII
/var/lib/cyrus/domain/o/olp.net/quota/j/user.jsmith          quotaroot database         quotalegacy format

Some of those you may not be able to convert to flat (although I haven't actually tried).

> My aim is to be able to restore all meta-data in the event of a bare metal crash recovery. I'm ok with running a reconstruct if needed, but I should be able to re-create all meta-data, including mail folder permissions (which I'll get from mailboxes.db, I think), flags, quota, etc. I am trying to arrive at a proper process for recovery in the event of slight mismatch between Cyrus versions or in the event of moving between 32-bit and 64-bit hardware.
>
> One thing I'm not worried about is how to back up the messages themselves --- a shutdown of Cyrus and simple tar of the spool area will do for me, I think.

The most straightforward way to restore from a filesystem backup is to have a backup system available with identical libraries and Cyrus version. If you are not in a failed scenario, then doing an imap sync, or rolling cyrus replication, is a safe bet.

--
Dan White
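For the databases listed above that do convert, the flat round trip can be expressed as two cvt_cyrusdb invocations per file, following the argument order used elsewhere in this thread (old file, old backend, new file, new backend). The helper names and the ".flat" suffix below are assumptions for illustration, and the insistence on absolute paths is a precaution of this sketch. The functions only build argument lists; they do not run anything or check that a given backend supports the conversion.

```python
import os
from typing import List


def to_flat_command(db_path: str, backend: str) -> List[str]:
    """argv that dumps one database to a flat file next to the original."""
    if not os.path.isabs(db_path):
        raise ValueError("use an absolute path: " + db_path)
    return ["cvt_cyrusdb", db_path, backend, db_path + ".flat", "flat"]


def from_flat_command(flat_path: str, db_path: str, backend: str) -> List[str]:
    """argv that rebuilds a database in the new server's backend from a flat dump."""
    if not (os.path.isabs(flat_path) and os.path.isabs(db_path)):
        raise ValueError("use absolute paths")
    return ["cvt_cyrusdb", flat_path, "flat", db_path, backend]


if __name__ == "__main__":
    # Backends taken from the listing above; adjust to your *_db options.
    for path, backend in (("/var/lib/cyrus/mailboxes.db", "skiplist"),
                          ("/var/lib/cyrus/annotations.db", "skiplist")):
        print(" ".join(to_flat_command(path, backend)))
```

Run the conversions with Cyrus stopped so the databases are not modified mid-dump.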
Re: sync-server without deletes?
On Tue, 14 Sep 2010, Derek Chen-Becker wrote: On 09/14/2010 08:53 AM, Bron Gondwana wrote: With the new replication engine in 2.4, it will be possible - deleted messages still get replicated for a week - and if you set an explicit long expiry time on the replica (say, years!) then it wouldn't get cleaned up any earlier. That sounds like it's what we want, so I'll plan on moving to that. In the short term perhaps I'll just need to copy to a common folder as you indicated, or have postfix just send a duplicate copy to our long-term backup server. Well, unless you have users delivering mail to each other through IMAP on shared folders, one usually configures the MTAs to drop a copy of everything into a system mailbox... -- One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot Henrique Holschuh Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: Store documents in IMAP folders
On 09/12/2010 09:10 AM, Gavin McCullagh wrote: The goal is to have a PDF library available at any time, with basic file search on document/message name, so a file share doesn't solve my problem (and I don't want any document management system, I just want access to files). I don't imagine IMAP's search would work on MIME attachments, unless you did something like add a plain text version to the body of the email. I think he just wants to be able to search on the document name; although, if you're going to write a custom script to get the documents into an IMAP folder, then there's nothing stopping you from harvesting the text from the PDF file and placing it in the body of the message containing the attached document. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.
Hello,

I am having trouble converting a quota skiplist db back to quotalegacy format (I know... this is probably not the most common Cyrus operation :-)

% cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota quotalegacy
Converting from /ssd/cyrs/imap/quotas.db (skiplist) to /ssd/cyrs/imap/quota (quotalegacy)
% find quota -type f | wc -l
126
% strings quotas.db|wc -l
135229

quotas.db was created using the reverse operation and took about one minute. I renamed the original 'quota' directory out of the way before making the second cvt_cyrusdb call.

Closer inspection of the newly created 'quota' directory reveals 125 quota descriptor files named user.aXX created under 'a', all relating to existing top level mailboxes and containing the correct information, and (curiously) one file named 'u' in directory 'u'.

I also tried a Berkeley DB intermediate format and the creation of the quotalegacy structure failed in an identical way.

Other question: would I be better off with 65,000 small files (quotalegacy) in a one-level hash or with a single skiplist db for my quota information, when the files reside on solid state storage anyway?

Thx,
Eric Luyten, Computing Centre VUB/ULB.
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
Dear Dan,

> See the manpage for imapd.conf for possible formats, but for my 2.3.12 installation, with configdirectory specified at /var/lib/cyrus (and no customization to my *_db options), my database files are:

Got it. Thanks a lot for the details.

> /var/lib/cyrus/annotations.db

What are annotations?

> /var/lib/cyrus/tls_sessions.db

You were saying these are transient data -- can one skip this?

> /var/lib/cyrus/deliver.db

This too can be skipped, right? They won't affect the user's perception of his emails, mailfolders, ACLs, quotas, flags, etc.

> /var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey backend for mailbox keys

What are mailbox keys?

> /var/lib/cyrus/domain/e/example.org/user/j/jsmith.seen
> /var/lib/cyrus/domain/e/example.org/user/j/jsmith.sub

Yes, these two are important.

> /var/lib/cyrus/domain/o/olp.net/quota/j/user.jsmith
>
> Some of those you may not be able to convert to flat (although I haven't actually tried).

Okay, got it. In that case, if I can't convert a file to/from flat, I can't safely move it between dissimilar Cyrus servers, so I'll have to drop that file and lose that data, to be safe.

> The most straight forward way to restore from a filesystem backup is to have a backup system available with identical libraries and Cyrus version.

Yes, absolutely, and that's what we offer. However, when there are disaster situations not planned for, I was wondering how far I can provision against data loss when the exact version of Cyrus is simply not available.

> If not in a failed scenario, then doing an imap sync, or rolling cyrus replication, is a safe bet.

Yes, we do this too. The problem was for situations where the installation is too small to justify a second server. I have one customer who intends to deploy our product in about 200 offices all over India. Each office has fewer than 20 users and just one server. I'll have to provide a complete-snapshot backup facility for him onto removable media, and then provide a restore in case there's a disaster. For situations like that, I was weighing options.

Thanks a lot. You've given me more details than I'd hoped for. I'll work on this now.

Shuvam
Re: sync-server without deletes?
> > > With the new replication engine in 2.4, it will be possible - deleted messages still get replicated for a week - and if you set an explicit long expiry time on the replica (say, years!) then it wouldn't get cleaned up any earlier.
> >
> > That sounds like it's what we want, so I'll plan on moving to that. In the short term perhaps I'll just need to copy to a common folder as you indicated, or have postfix just send a duplicate copy to our long-term backup server.
>
> Well, unless you have users delivering mail to each other through IMAP on shared folders, one usually configures the MTAs to drop a copy of everything into a system mailbox...

Yes, this is what we do too. We have a milter in Sendmail which adds an envelope recipient for each mail passing through Sendmail. This new recipient is a system mailbox which holds the mail archive. Once a day, a cronjob moves all mails out of Inbox to a freshly created mail folder whose name contains today's date, thus preventing the Inbox from accumulating millions of messages over time.

It's quite simple, and it doesn't make the archive as easy to search as a search engine would, but it's very reliable.

One of the biggest differences between this approach and any IMAP replication based approach is that you can send an outgoing mail in the latter approach without it being recorded. (Take the worst-case situation where a guy disables his copy-to-Sent-folder flag and sends the mail, or even does a telnet to the SMTP port and hand-crafts the email.) In our approach, each and every outgoing mail also gets captured in the archive.

Our solution can run any number of mail servers, hence the message can actually go through multiple MTAs before leaving the organisation. We make an archive copy at each MTA, thus wasting bandwidth to deliver redundant copies to the archive. But on disk, only one copy is stored, thanks to the de-duplication facility of Cyrus. :)

regards,
Shuvam
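The daily rotation described above might look roughly like the sketch below. It is not our actual cronjob: the folder naming scheme, the "INBOX." prefix (which assumes the default namespace and "." hierarchy separator) and the function names are assumptions. `conn` is expected to be an already authenticated imaplib.IMAP4 connection to the archive mailbox.

```python
import datetime


def archive_folder_name(day: datetime.date, prefix: str = "INBOX.") -> str:
    """Name of the dated folder, e.g. INBOX.2010-09-15 (naming is hypothetical)."""
    return prefix + day.isoformat()


def rotate_inbox(conn, day: datetime.date) -> str:
    """Move everything currently in INBOX into a freshly created dated folder."""
    folder = archive_folder_name(day)
    conn.create(folder)
    conn.select("INBOX")
    _typ, data = conn.search(None, "ALL")
    ids = data[0].split()
    if ids:
        msg_set = b",".join(ids).decode("ascii")
        conn.copy(msg_set, folder)
        # Only the copied messages are flagged, so mail arriving in the
        # meantime stays in INBOX for the next run.
        conn.store(msg_set, "+FLAGS", "\\Deleted")
        conn.expunge()
    return folder
```

A real job would also check the status returned by each call before expunging, and would batch the message set for very large mailboxes.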
De-duping attachments
How difficult or easy would it be to modify Cyrus to strip all attachments from emails and store them separately in files? In the message file, replace the attachment with a special tag which will point to the attachment file. Whenever the message is fetched for any reason, the original MIME-encoded message will be re-constructed and delivered. If this can be implemented, then the file pointer in the message body could be its MD5 sum or something similar. This would ensure automatic de-dup --- if a file with the same MD5 exists, it means I won't store a second copy --- I'll just point to the existing file. Today's de-duping of entire messages is a wonderful facility, based on message-ID. But the problem is that this measure stops halfway -- it does not avoid the enormous duplication when the same JPEG image of Sandra and the kids, Word doc with sales-forecasts or PDF file is forwarded by 20 people in 20 separate messages to their friends and relatives ad infinitum. At the IMAP or POP protocol levels, no clients would see any change. But on the server side, the server's disk space usage would drop sharply and CPU usage would rise somewhat. One problem I can see is tracking of reference counts to attachment files. This intelligence would have to be built into the attachment-stripping layer, and then reference counts would have to be decremented each time a message file is unlink()ed internally by imapd, cyr_expire, etc. One simple way-out of this would be to use the file system itself --- create separate names for each reference to an attachment file, and hard-link these names to the single instance. Each message-file which refers to an existing attachment file will have its own unique reference-name to the attachment. When the message-file is deleted for any reason by Cyrus, it will also look through all embedded reference-names, and delete those reference hardlinks too. 
This means that if a Cyrus message store is spread across multiple partitions, one physical copy of each attachment-file will have to be stored in each partition (potentially), to allow hardlinking from message references. Shuvam Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
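To make the hard-link idea concrete, here is a toy content-addressed store, entirely outside Cyrus. It is a sketch under several assumptions: it uses SHA-256 where the message above suggests MD5 "or something similar", the reference names are invented, blobs and references share one directory on one filesystem, and there is no locking against concurrent delivery or expiry.

```python
import hashlib
import os


def store_attachment(store_dir: str, ref_name: str, data: bytes) -> str:
    """Store data once under its digest and hard-link ref_name to it.

    Returns the digest, which the message file would embed as its pointer.
    """
    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):
        with open(blob, "wb") as fh:
            fh.write(data)
    # One extra name per referencing message; the filesystem keeps the count.
    os.link(blob, os.path.join(store_dir, ref_name))
    return digest


def drop_reference(store_dir: str, ref_name: str, digest: str) -> None:
    """Remove one reference; delete the blob when no reference remains."""
    os.unlink(os.path.join(store_dir, ref_name))
    blob = os.path.join(store_dir, digest)
    if os.stat(blob).st_nlink == 1:  # only the digest-named entry is left
        os.unlink(blob)
```

The link count reported by the filesystem stands in for the reference counter, which is exactly why each partition would need its own copy: hard links cannot cross filesystems.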
Re: De-duping attachments
> How difficult or easy would it be to modify Cyrus to strip all attachments from emails and store them separately in files? In the message file, replace the attachment with a special tag which will point to the attachment file. Whenever the message is fetched for any reason, the original MIME-encoded message will be re-constructed and delivered.

Like anything, doable, but quite a lot of work.

cyrus likes to mmap the whole file so it can just offset into it to extract whichever part is requested. In IMAP, you can request any arbitrary byte range from the raw RFC822 message using the BODY[]<start.length> construct, so you have to be able to byte-accurately reconstruct the original email if you remove attachments.

Consider the problem of transfer encoding. Say you have a base64 encoded attachment (which basically all are). When storing and deduping, you'd want to base64 decode it to get the underlying binary data. But depending on the line length of the base64 encoded data, the same file can be encoded in a large number of different ways. When you reconstruct the base64 data, you have to be byte accurate in your reconstruction so your offsets are correct, and so any signing of the message (e.g. DKIM) isn't broken.

Once you've solved those problems, the rest is pretty straightforward :)

Rob
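The line-length problem is easy to demonstrate. In the sketch below (the helper name and the two widths are arbitrary choices for illustration), the same payload is wrapped at 76 and at 64 columns: both decode to identical bytes, yet the encoded forms differ in content and length, so every byte offset after the attachment would shift if the wrong width were used on reconstruction.

```python
import base64


def encode_with_width(data: bytes, width: int) -> bytes:
    """Base64-encode data and wrap it at `width` characters with CRLF line ends."""
    raw = base64.b64encode(data)
    return b"".join(raw[i:i + width] + b"\r\n" for i in range(0, len(raw), width))


payload = bytes(range(256)) * 4          # stand-in for an attachment
wrapped_76 = encode_with_width(payload, 76)
wrapped_64 = encode_with_width(payload, 64)

if __name__ == "__main__":
    print("same decoded bytes:", base64.b64decode(wrapped_76) == base64.b64decode(wrapped_64))
    print("same encoded bytes:", wrapped_76 == wrapped_64)
    print("encoded lengths:", len(wrapped_76), len(wrapped_64))
```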
Re: De-duping attachments
On Wed, Sep 15, 2010 at 12:13:03PM +1000, Rob Mueller wrote: How difficult or easy would it be to modify Cyrus to strip all attachments from emails and store them separately in files? In the message file, replace the attachment with a special tag which will point to the attachment file. Whenever the message is fetched for any reason, the original MIME-encoded message will be re-constructed and delivered. http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 2TB - US $109. Like anything, doable, but quite a lot of work. Now de-duping messages on copy is valuable, not so much because of the space it saves, but because of the IO it saves. Copying the file around is expensive. De-duping componenets of messages and then reconstructing? Not so much. You'll be causing MORE IO in general looking for the message, finding the parts. The only real benefit I can see is something like replication or a client that's downloading multiple of these large messages and wants to save network bandwidth. Except - there's no protocol to support this for client, so only replication could gain. cyrus likes to mmap the whole file so it can just offset into it to extract which ever part is requested. In IMAP, you can request any arbitrary byte range from the raw RFC822 message using the body[]start.length construct, so you have to be able to byte accurately reconstruct the original email if you remove attachments. Consider the problem of transfer encoding. Say you have a base64 encoded attachment (which basically all are). When storing and deduping, you'd want to base64 decode it to get the underlying binary data. But depending on the line length of the base64 encoded data, the same file can be encoded in a large number of different ways. When you reconstruct the base64 data, you have to be byte accurate in your reconstruction so your offsets are correct, and so any signing of the message (eg DKIM) isn't broken. 
Once you've solved those problems, the rest is pretty straightforward :) Yeah, they really aren't so hard to solve. I didn't actually do the research, but I have an idea what to do. Find a big corpus of emails (i.e. FastMail's one!) and figure out the 10-20 most common base64 widths and surrounding layouts. For each attachment, pick the layout that matches and store just a single marker saying "it's this layout". If none of them match exactly, store a binary diff from the closest one as well; it probably won't be very large. But in general, I'd say you're optimising the wrong problem. It's just not worth it: the savings are minimal and the added complexity is high. Disk space is now cheap, and fast access via a cached copy of the email will beat re-creating the original file from MIME parts hands down. Bron.
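The layout-matching idea above could be sketched roughly as follows. This is only an illustration in Python, not anything that exists in Cyrus: the `CANDIDATE_LAYOUTS` list is a made-up placeholder (a real list would come from surveying a mail corpus, as suggested), and the binary-diff fallback for non-matching bodies is not implemented here.

```python
import base64
from typing import Optional, Tuple

# Hypothetical candidates (line width, line ending); not measured from any real corpus.
CANDIDATE_LAYOUTS = [
    (76, b"\r\n"), (72, b"\r\n"), (64, b"\r\n"),
    (76, b"\n"), (72, b"\n"),
]

def encode_with_layout(data: bytes, width: int, eol: bytes) -> bytes:
    """Re-encode decoded data as base64 with a fixed line width and line ending."""
    flat = base64.b64encode(data)
    return b"".join(flat[i:i + width] + eol for i in range(0, len(flat), width))

def find_layout(encoded: bytes) -> Optional[Tuple[int, bytes]]:
    """Return the first candidate layout that reproduces `encoded` byte for byte, else None."""
    data = base64.b64decode(encoded)
    for width, eol in CANDIDATE_LAYOUTS:
        if encode_with_layout(data, width, eol) == encoded:
            return (width, eol)
    return None  # caller would keep the original bytes, or store a diff from the closest layout
```

When `find_layout` returns a layout, only the decoded data plus that small marker need to be kept; a `None` result means the attachment needs the diff fallback or should not be stripped at all.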
Re: De-duping attachments
Dear Rob, I had reservations about some of these things too. :( In particular, I was wondering about having to remember and recreate the exact transfer-encoding. If both of us forward the same attachment in two emails, and one encodes in quoted-printable, the other in base64, Cyrus had better be able to recreate them exactly or have some other workarounds. I wasn't aware of the mmap() usage and the direct seeking into the middle of the message body. But the bigger problem is what you've described about reproducing the message byte-identically. If that can be solved, then we can make Cyrus re-create the message while loading from disk and stick it into RAM. Can we just brainstorm with you and others in this thread... how do we re-create a byte-identical attachment from a disk file? What is the list of attributes we will need to store per stripped attachment to allow an exact re-creation?
- file name/reference
- full MIME header of the attachment block
- separator string (this will be retained in the message body anyway)
- transfer encoding
- if encoding = base64 then base64 line length
- checksum of encoded attachment (as a sanity check in case the re-encoding fails to recreate exactly the same image as the original)
If encoding = quoted-printable or uuencode, then don't strip the attachment at all. What other conditions may we need to look for to bypass attachment stripping? Can we just tap into all of you to get the ideas on paper, even if it's not being implemented by anyone right now? It'll at least help us understand the system's internals better. thanks a lot, and regards, Shuvam cyrus likes to mmap the whole file so it can just offset into it to extract whichever part is requested. In IMAP, you can request any arbitrary byte range from the raw RFC822 message using the BODY[]<start.length> construct, so you have to be able to reconstruct the original email byte-accurately if you remove attachments. Consider the problem of transfer encoding. 
Say you have a base64 encoded attachment (which basically all are). When storing and deduping, you'd want to base64 decode it to get the underlying binary data. But depending on the line length of the base64 encoded data, the same file can be encoded in a large number of different ways. When you reconstruct the base64 data, you have to be byte accurate in your reconstruction so your offsets are correct, and so any signing of the message (eg DKIM) isn't broken. Once you've solved those problems, the rest is pretty straightforward :) Rob
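As a way of getting the attribute list in this message "on paper", here is a hedged Python sketch of a per-attachment record with the checksum sanity check. Every name in it (`StrippedAttachment`, `try_strip`, `restore`, the field names) is hypothetical, it assumes base64 bodies with uniform line length and CRLF line endings, and it simply refuses to strip anything it cannot re-create exactly.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StrippedAttachment:
    blob_ref: str        # reference to the separately stored decoded file (content hash here)
    mime_header: bytes   # full MIME header of the attachment block, kept verbatim
    line_length: int     # base64 line length (assumes all full lines are the same length)
    encoded_sha256: str  # checksum of the original encoded body, used as a sanity check

def rebuild(decoded: bytes, line_length: int) -> bytes:
    """Re-create the base64 body with fixed-width lines and CRLF endings."""
    flat = base64.b64encode(decoded)
    return b"".join(flat[i:i + line_length] + b"\r\n"
                    for i in range(0, len(flat), line_length))

def try_strip(mime_header: bytes, encoded_body: bytes,
              line_length: int = 76) -> Optional[Tuple[StrippedAttachment, bytes]]:
    """Return (record, decoded data) if the body can be re-created exactly, else None."""
    decoded = base64.b64decode(encoded_body)
    if rebuild(decoded, line_length) != encoded_body:
        return None  # bypass stripping: leave the attachment in the message file
    record = StrippedAttachment(
        blob_ref=hashlib.sha256(decoded).hexdigest(),
        mime_header=mime_header,
        line_length=line_length,
        encoded_sha256=hashlib.sha256(encoded_body).hexdigest(),
    )
    return record, decoded

def restore(record: StrippedAttachment, decoded: bytes) -> bytes:
    """Re-encode the stored data and verify it against the recorded checksum."""
    body = rebuild(decoded, record.line_length)
    if hashlib.sha256(body).hexdigest() != record.encoded_sha256:
        raise ValueError("re-encoded attachment does not match original checksum")
    return body
```

Using the hash of the decoded data as `blob_ref` is one possible design choice: two messages carrying the same file would then point at the same stored blob, whatever their line layouts.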
Re: De-duping attachments
Dear Bron, http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 2TB - US $109. Don't want to nit-pick here, but the effective price we pay is about ten times this. To set up a mail server with a few TB of disk space, we usually land up deploying a separate chassis with RAID controllers and a RAID array, with FC connections from servers, etc, etc. All this adds up to about $1,000/TB of usable space if you're using something like the low-end IBM DS3400 box or Dell/EMC equivalent. This is even with inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives. http://www-07.ibm.com/storage/in/disk/ds3000/ds3400/ And most of our customers actually double this cost because they keep two physically identical chassis for redundancy. (We recommend this too, because we can't trust a single RAID 5 array to withstand controller or PSU failures.) In that case, it's $2000/TB. And you do reach 5-10 TB of email store quite rapidly --- our company has many corporate clients ( 500 email users) whose IMAP store has reached 4TB. No one wants to enforce disk quotas (corporate policy), and most users don't want to delete emails on their own. We keep hearing the logic that storage is cheap, and stories of cloud storage through Amazon, unlimited mailboxes on Gmail, are reinforcing the belief. But at the ground level in mid-market corporate IT budgets, storage costs in data centres (as against inside desktops) are still too high to be trivial, and their prices have only little to do with the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little over $12,000 in India, with a full set of 1TB SATA-II drives from IBM, but even with high cost of IBM drives, the drives themselves contribute less than 30% of the total cost. If we really want to put our collective money where our mouth is, and deliver the storage-is-cheap promise at the ground level, we need to rearchitect every file server and IMAP server to work in map-reduce mode and use disks inside desktops. 
Anyone game for this project? :) Now de-duping messages on copy is valuable, not so much because of the space it saves, but because of the IO it saves. Copying the file around is expensive. De-duping components of messages and then reconstructing? Not so much. You'll be causing MORE IO in general looking for the message, finding the parts. I agree. My aim was not to reduce IOPS but to cut disk space usage. There are two areas where we are seeing a huge increase in inactive disk utilisation for emails. One is for the archive, which is being kept for security and compliance reasons. Every company we work with wants an archive with at least a few years' retention. They search the archive every few weeks to trace lost emails, not for compliance reasons but to find missing information. This means that we can't ask them to move the data out to removable storage. The second area is shared mail folders where all communication with each client/topic/project is stored practically forever. A 500-user company can easily acquire an email archive of 2-5TB. I don't care how much the IO load of that archive server increases, but I'd like to reduce disk space utilisation. If the customer can stick to 2TB of space requirements, he can use a desktop with two 2TB drives in RAID 1, and get a real cheap archive server. If this figure reaches 3-4TB, he goes into a separate RAID chassis --- the hardware cost goes up 5-10 times. These are tradeoffs a lot of small to mid-sized companies in my market fuss about. And in a more generic context, I am seeing that all kinds of intelligent de-duping of infrequently-accessed data are going to become the crying need of every mid-sized and large company. Data is growing too fast, and no one wants to impose user discipline or data cleaning. When we tell the business head "This is crazy!", he turns around and tells the CTO "But disk space is cheap! Haven't you heard of Google? What are you cribbing about? You must be doing something really inefficient here, wasting money!" thanks and regards, Shuvam
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
On 15/09/10 06:46 +0530, Shuvam Misra wrote: What are annotations? Annotations are defined in RFC 5257. They allow an admin to add metadata to a mailbox (or the server). The cyradm utility sets annotations with its internal info, mboxcfg, and setinfo commands. /var/lib/cyrus/tls_sessions.db You were saying these are transient data -- can one skip this? Yes. /var/lib/cyrus/deliver.db This too can be skipped right? They won't affect the user's perception of his emails, mailfolders, ACLs, quotas, flags, etc. Right. /var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey backend for mailbox keys What are mailbox keys? It's for URLAUTH. See RFC 4467, and: http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/internal/database-formats.php -- Dan White
Re: De-duping attachments
A 500-user company can easily acquire an email archive of 2-5TB. I don't care how much the IO load of that archive server increases, but I'd like to reduce disk space utilisation. If the customer can stick to 2TB of It would be interesting to measure the amount of duplication that is going on with attachments in emails. While we could do that with Fastmail data, I think because of the broad range of users, we'd be getting one data point, which might be quite different to a data point inside one company. Eg. An architectural firm might end up sending big blueprint documents back and forth between each other a lot, so they'd gain a lot from deduplication. Also even within deduplication, there are some interesting ideas as well. For instance, if you know the same file is being sent back and forth a lot with minor changes, you might want to store the most recent version, and store binary diffs between the most recent and old versions (eg xdelta). Yes accessing the older versions would be much slower (have to get most recent + apply N deltas), but the space savings could be huge. Can we just brainstorm with you and others in this thread... how do we re-create a byte-identical attachment from a disk file? One overall implementation issue. With the message file, do you:
1. Completely rewrite the message file removing the attachments and adding any extra meta data you want in its place
2. Leave the message file as exactly the same size, just don't write out the attachment content and assume your filesystem supports sparse files (http://en.wikipedia.org/wiki/Sparse_file)
The advantage of 2 is that it leaves the message file size correct, and all the offsets in the file are still correct. The downsides are that you must ensure your FS supports sparse files well, and there's the question of where do you actually store the information that links to the external file? 
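The measurement suggested earlier in this message (how much attachment duplication a given mail store actually contains) could be approximated with something like the Python sketch below. It is an assumption-laden illustration: the function name `duplicate_attachment_bytes` is made up, only parts with a Content-Disposition of "attachment" are counted, and messages are passed in as raw bytes rather than read from a Cyrus spool.

```python
import email
import hashlib
from collections import defaultdict
from typing import Dict, Iterable

def duplicate_attachment_bytes(raw_messages: Iterable[bytes]) -> Dict[str, int]:
    """Hash each decoded attachment and report total, unique and potentially saved bytes."""
    sizes: Dict[str, int] = {}
    counts: Dict[str, int] = defaultdict(int)
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        for part in msg.walk():
            if part.get_content_disposition() != "attachment":
                continue
            payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
            if not payload:
                continue
            digest = hashlib.sha256(payload).hexdigest()
            sizes[digest] = len(payload)
            counts[digest] += 1
    total = sum(sizes[d] * counts[d] for d in counts)
    unique = sum(sizes.values())
    return {"total": total, "unique": unique, "saved": total - unique}
```

The "saved" figure is an upper bound on decoded bytes only; it ignores the per-attachment metadata needed to rebuild each message and the base64 overhead on disk.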
- file name/reference - full MIME header of the attachment block I'd leave these intact in the actual message, and just add an extra X-Detached-File header or something like that, which includes some external reference to the file. Hmmm, that'll break signing though. Not so easy... - separator string (this will be retained in the message body anyway) - transfer encoding - if encoding = base64 then base64 line length Remember every line can actually be a different length! In most cases they will be the same length, but you can't assume it. And you do see messages that have lines in repeating groups like 76, 76, 76, 76, 74, 76, 76, 76, 76, 74, ... repeat ... or cases like that, a pain to deal with. - checksum of encoded attachment (as a sanity check in case the re-encoding fails to recreate exactly the same image as the original) This is seeming a bit more tricky... Rob
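One possible way to cope with lines of differing length, such as the 76, 76, 76, 76, 74 pattern mentioned above, is to record the line lengths run-length encoded instead of storing a single width. The Python sketch below is only an illustration with made-up function names; it assumes the body is one continuous base64 stream with a uniform line ending, which will not hold for every message.

```python
import base64
from itertools import groupby
from typing import List, Tuple

def line_length_runs(encoded: bytes) -> List[Tuple[int, int]]:
    """Run-length encode the line lengths of a base64 body as [(length, count), ...]."""
    lengths = [len(line) for line in encoded.splitlines()]
    return [(length, sum(1 for _ in group)) for length, group in groupby(lengths)]

def rebuild_from_runs(decoded: bytes, runs: List[Tuple[int, int]],
                      eol: bytes = b"\r\n") -> bytes:
    """Re-create the encoded body from decoded data plus the recorded line-length runs."""
    flat = base64.b64encode(decoded)
    out, pos = [], 0
    for length, count in runs:
        for _ in range(count):
            out.append(flat[pos:pos + length] + eol)
            pos += length
    return b"".join(out)
```

For a uniformly wrapped body the run list collapses to one or two entries, so the common case stays cheap; a repeating pattern still costs one entry per run, and a checksum of the encoded body remains useful as a final check.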