Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Gavin McCullagh
Hi,

On Mon, 13 Sep 2010, Forrest Aldrich wrote:

 I have an older system that crashed - cyrus version is a couple years or
 so old.  I have 1000's of messages in the spool that I need to preserve.
 My question is about whether there's a way to import that huge tree of
 messages into a new cyrus installation without imap-to-imap connectivity?

We did a migration some months back from an old Kolab v1 (cyrus v2.1)
system to a new Kolab v2.2 (cyrus v2.2) system.

This was done by writing a script to

 - dump the ldap database (you might not have this) and load it on the new
   system
 - rsync the mailboxes from their location on the old server to the
   correct location on the new server
 - recursively reconstruct those mailboxes
 - copy the .seen and .sub information to the correct new location
 - copy the quota information to the correct new location
 - dump the old mailboxes.db and load it on the new system (with cyrus
   stopped)

It's not trivial, but it can be done with some care.  We also had to
translate usernames from user to user@domain in various places to
match the new kolab setup but you probably won't have to worry about that.  
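
A rough sketch of how the steps listed above might be scripted, for illustration only. It builds the command list for review and runs nothing. The host name, directory locations, dump file and the `user` hierarchy name are all assumptions; `reconstruct -r` and `ctl_mboxlist -d`/`-u` are the standard Cyrus tools, but check their man pages for your version. The order simply follows the list above.

```python
import shlex


def migration_plan(old_host, spool_dir, config_dir, dump_file="/tmp/mailboxes.dump"):
    """Return shell command strings in the order the steps are listed above.

    Every argument is a placeholder (assumption); nothing is executed here.
    """
    def rsync(path):
        # Trailing slashes make rsync copy the directory contents.
        src = shlex.quote(f"{old_host}:{path}/")
        dst = shlex.quote(f"{path}/")
        return f"rsync -a {src} {dst}"

    return [
        rsync(spool_dir),                # message files
        "reconstruct -r user",           # rebuild mailboxes recursively
        rsync(f"{config_dir}/user"),     # .seen and .sub files (assumed location)
        rsync(f"{config_dir}/quota"),    # quota files (assumed location)
        # Load a dump made on the old server with `ctl_mboxlist -d`;
        # run this step with Cyrus stopped.
        f"ctl_mboxlist -u < {shlex.quote(dump_file)}",
    ]


if __name__ == "__main__":
    for cmd in migration_plan("old.example.org",
                              "/var/spool/cyrus/mail",
                              "/var/lib/cyrus"):
        print(cmd)
```

Printing the plan first makes it easy to adjust paths (or the user to user@domain renaming) before anything touches the new spool.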

Gavin


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/


Re: imapd dumping core due to SEGV

2010-09-14 Thread Gavin Gray
Sorry for the delay getting back about this, I meant to let people know  
that the reason for this:

 Also when this happens the cyrus master process kills all other active
 imapd processes and restarts, is there a reason for this?

 I've never heard of master doing that in response to ANY child  
 behavior.  Does master log anything?

was the way we had set up SMF on Solaris to control Cyrus IMAP. One needs
to make sure SMF is set up to ignore core dumps and child processes
dying from signals; otherwise SMF will restart the entire service.

On Mon, 12 Jul 2010 18:36:59 +0100, Wesley Craig w...@umich.edu wrote:

 On 05 Jul 2010, at 10:56, Gavin Gray wrote:
 Two of them have had imapd  processes crash and leave core dumps in
 the past couple of days. Looking at the core dumps with dbx we see

 I'm not aware of bug fixes in those code paths.  Given how little those  
 two code paths have in common, I'd suspect memory corruption.

 Also when this happens the cyrus master process kills all other active
 imapd processes and restarts, is there a reason for this?

 I've never heard of master doing that in response to ANY child  
 behavior.  Does master log anything?

 :wes




-- 
Gavin Gray
Edinburgh University Information Services
Rm 2013 JCMB
Kings Buildings
Edinburgh
EH9 3JZ
UK
tel +44 (0)131 650 5987
email gavin.g...@ed.ac.uk

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



TLS server engine: cannot load CA data

2010-09-14 Thread Paul van der Vlis
Hello,

Strange problem:
Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load CA data
Sep 14 09:18:12 mail cyrus/imap[21928]: unable to get certificate from '/etc/apache2/ssl/mail_rcg_nl.crt'
Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load cert/key data, may be a cert/key mismatch?
Sep 14 09:18:12 mail cyrus/imap[21928]: error initializing TLS


But this command gives the certificate:
su cyrus -c "cat /etc/apache2/ssl/mail_rcg_nl.crt"

Cyrus is running as user cyrus.

What could be wrong?

With regards,
Paul van der Vlis.




-- 
http://www.vandervlis.nl/




Re: TLS server engine: cannot load CA data

2010-09-14 Thread Patrick Boutilier

On 09/14/2010 07:51 AM, Paul van der Vlis wrote:

Hello,

Strange problem:
-
Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load
CA data
Sep 14 09:18:12 mail cyrus/imap[21928]: unable to get certificate from
'/etc/apache2/ssl/mail_rcg_nl.crt'
Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load
cert/key data, may be a cert/key mismatch?
Sep 14 09:18:12 mail cyrus/imap[21928]: error initializing TLS


But this command gives the certificate:
su cyrus -c cat /etc/apache2/ssl/mail_rcg_nl.crt

Cyrus is running as user cyrus.

What could be wrong?


Can cyrus read the private key file (.key) ?







sync-server without deletes?

2010-09-14 Thread Derek Chen-Becker
We've been running sync replication between two servers for a few months
now and everything has been working well. Recently, management has come
down and asked if it's possible to have the sync only perform additions
and to ignore deletions. The idea is that they would like our backup
server (or possibly a third box) contain an archive of all mail ever
delivered to our users (we would manage expiration manually). From what
I can tell, this most likely isn't possible with the current
sync-server, so I wanted to confirm that hunch and if I'm correct, see
what other people are doing for this kind of thing.

Thanks,

Derek

-- 
--
Derek Chen-Becker
Senior Network Engineer, Security Architect
CPI Corp, Inc.
1706 Washington Ave
St. Louis, MO 63103
Phone: 314-231-7711 x6455
Fax:   314-613-6724
dbec...@cpicorp.com
PGP Key available from public key servers
Fingerprint: E4C4 26C0 8588 E80A C29F  636D 1FBE 0FE3 2871 4AE8
--



Re: sync-server without deletes?

2010-09-14 Thread Michael Menge

Quoting Derek Chen-Becker dbec...@cpicorp.com:


We've been running sync replication between two servers for a few months
now and everything has been working well. Recently, management has come
down and asked if it's possible to have the sync only perform additions
and to ignore deletions. The idea is that they would like our backup
server (or possibly a third box) contain an archive of all mail ever
delivered to our users (we would manage expiration manually). From what
I can tell, this most likely isn't possible with the current
sync-server, so I wanted to confirm that hunch and if I'm correct, see
what other people are doing for this kind of thing.



IMHO the sync server uses the expunge_mode and delete_mode options in
imapd.conf. So if you don't run cyr_expire, mails should stay on the
filesystem.










M.Menge                          Tel.: (49) 7071/29-70316
Universität Tübingen             Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung    mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen


Re: sync-server without deletes?

2010-09-14 Thread Bron Gondwana
On Tue, Sep 14, 2010 at 08:03:47AM -0500, Derek Chen-Becker wrote:
 We've been running sync replication between two servers for a few months
 now and everything has been working well. Recently, management has come
 down and asked if it's possible to have the sync only perform additions
 and to ignore deletions. The idea is that they would like our backup
 server (or possibly a third box) contain an archive of all mail ever
 delivered to our users (we would manage expiration manually). From what
 I can tell, this most likely isn't possible with the current
 sync-server, so I wanted to confirm that hunch and if I'm correct, see
 what other people are doing for this kind of thing.

Yeah, not really I'm afraid.  Not only doesn't it work like that, but
you can't even guarantee that the deleted email gets replicated at all!
If it gets expunged in a replication window, it will never get copied.

With the new replication engine in 2.4, it will be possible - deleted
messages still get replicated for a week - and if you set an explicit
long expiry time on the replica (say, years!) then it wouldn't get
cleaned up any earlier.

A common pattern is just to duplicate all email to a different folder
(during LMTP delivery or via sieve rules).  Of course, that doesn't
catch stuff that's uploaded via IMAP though.

Bron.



Re: imapd dumping core due to SEGV

2010-09-14 Thread Pascal Gienger
For Solaris SMF and Cyrus please use in your manifest for Cyrus IMAP:

<property_group name='startd' type='framework'>
  <propval name='ignore_error' type='astring' value='core,signal'/>
</property_group>

With this, the imap service will no longer be restarted when an imap process
is killed. Only when master itself ends will startd believe Cyrus is down.
Ditto for an imap process dumping core.

Pascal


Gavin Gray gavin.g...@ed.ac.uk wrote:

Sorry for the delay getting back about this, I meant to let people know  
that the reason for this:

 Also when this happens the cyrus master process kills all other active
 imapd processes and restarts, is there a reason for this?

 I've never heard of master doing that in response to ANY child  
 behavior.  Does master log anything?

was the way we had setup SMF on Solaris to control Cyrus IMAP. One needs  
to make sure SMF is setup to ignore core dumps and child processes  
signaling death otherwise SMF will restart the entire service.

On Mon, 12 Jul 2010 18:36:59 +0100, Wesley Craig w...@umich.edu wrote:

 On 05 Jul 2010, at 10:56, Gavin Gray wrote:
 Two of them have had imapd  processes crash and leave core dumps in
 the past couple of days. Looking at the core dumps with dbx we see

 I'm not aware of bug fixes in those code paths.  Given how little those  
 two code paths have in common, I'd suspect memory corruption.

 Also when this happens the cyrus master process kills all other active
 imapd processes and restarts, is there a reason for this?

 I've never heard of master doing that in response to ANY child  
 behavior.  Does master log anything?

 :wes





-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Re: Draft: Bugzilla Work Flow

2010-09-14 Thread Patrick Goetz
On 09/04/2010 07:41 AM, Jeroen van Meeuwen (Kolab Systems) wrote:
 To allow some early feedback, I'm putting the page on the list now as opposed
 to when I feel like I'm done documenting everything in full ;-)


 http://www.cyrusimap.org/mediawiki/index.php/User:Jeroen_van_Meeuwen/Drafts/Bugzilla_Work_Flow


This page is accessible, but what happened to the cyrus wiki?  There 
seems to be a new web page for cyrus which doesn't appear very wiki-like 
(http://www.cyrusimap.org/), and googling only takes me to this page.



Re: TLS server engine: cannot load CA data

2010-09-14 Thread Paul van der Vlis
Patrick Boutilier wrote:
 On 09/14/2010 07:51 AM, Paul van der Vlis wrote:
 Hello,

 Strange problem:
 -
 Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load
 CA data
 Sep 14 09:18:12 mail cyrus/imap[21928]: unable to get certificate from
 '/etc/apache2/ssl/mail_rcg_nl.crt'
 Sep 14 09:18:12 mail cyrus/imap[21928]: TLS server engine: cannot load
 cert/key data, may be a cert/key mismatch?
 Sep 14 09:18:12 mail cyrus/imap[21928]: error initializing TLS
 

 But this command gives the certificate:
 su cyrus -c cat /etc/apache2/ssl/mail_rcg_nl.crt

 Cyrus is running as user cyrus.

 What could be wrong?
 
 Can cyrus read the private key file (.key) ?

Yes, it can.

But I think I've found it, the tls_ca_file in imapd.conf was wrong.
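
For anyone hitting the same log messages, a quick way to rule out path problems is to check every TLS file named in imapd.conf, not only the certificate. The following is a minimal sketch under some assumptions: it handles only simple `option: value` lines, it looks at the `tls_cert_file`, `tls_key_file` and `tls_ca_file` options, and it tests readability for whoever runs it, so it should be run as the cyrus user. It does not verify that the files contain valid PEM data or that the key matches the certificate.

```python
import os

# imapd.conf options that name files the TLS engine has to load.
TLS_OPTIONS = ("tls_cert_file", "tls_key_file", "tls_ca_file")


def parse_imapd_conf(text):
    """Parse simple 'option: value' lines, skipping blanks and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        options[key.strip()] = value.strip()
    return options


def unreadable_tls_files(conf_text):
    """Return {option: path} for configured TLS files the current user cannot read."""
    options = parse_imapd_conf(conf_text)
    problems = {}
    for name in TLS_OPTIONS:
        path = options.get(name)
        if path and not os.access(path, os.R_OK):
            problems[name] = path
    return problems


if __name__ == "__main__":
    # /etc/imapd.conf is the usual location, but that is an assumption too.
    with open("/etc/imapd.conf") as conf:
        for option, path in unreadable_tls_files(conf.read()).items():
            print(f"{option}: cannot read {path}")
```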

With regards,
Paul van der Vlis.




-- 
http://www.vandervlis.nl/




Re: sync-server without deletes?

2010-09-14 Thread Derek Chen-Becker
On 09/14/2010 08:53 AM, Bron Gondwana wrote:
 With the new replication engine in 2.4, it will be possible - deleted
 messages still get replicated for a week - and if you set an explicit
 long expiry time on the replica (say, years!) then it wouldn't get
 cleaned up any earlier.

That sounds like it's what we want, so I'll plan on moving to that. In
the short term perhaps I'll just need to copy to a common folder as you
indicated, or have postfix just send a duplicate copy to our long-term
backup server.

Thanks,

Derek



-- 
--
Derek Chen-Becker
Senior Network Engineer, Security Architect
CPI Corp, Inc.
1706 Washington Ave
St. Louis, MO 63103
Phone: 314-231-7711 x6455
Fax:   314-613-6724
dbec...@cpicorp.com
PGP Key available from public key servers
Fingerprint: E4C4 26C0 8588 E80A C29F  636D 1FBE 0FE3 2871 4AE8
--



Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Shuvam Misra
Dear Dan,

 If you're not concerned about your quota database, seen state, annotations,
 and subscription information, and assuming you've already regenerated your
 top level mailbox hierarchy, then you should be able to copy over the
 individual email files from each mailbox to the new server and perform a
 reconstruct on each mailbox (with the -r recursive option).
 
 If the new location is already live, then you'll need to be careful that
 you don't hit any filename collisions between the old server (e.g. email
 '123.') and the new server.
 
 You may also be able to copy over the primary database files (like your
 configdirectory/mailboxes.db), if your library version and cyrus versions
 match between the old and new servers. If not, you may need to use
 cvt_cyrusdb to convert the database from the old server to flat or skiplist
 and convert them back to their native format on the new server (berkeley db
 version mismatches are particularly a problem here).

What other meta-data files other than mailboxes.db do I need to copy if
I want to restore everything (seen flags, other flags, etc)? And will it
be a generally good practice to convert all required database files to
flat first, then re-convert to the new server's file format? Will this
guarantee a trouble-free migration?

My aim is to be able to restore all meta-data in the event of a bare
metal crash recovery. I'm ok with running a reconstruct if needed,
but I should be able to re-create all meta-data, including mail folder
permissions (which I'll get from mailboxes.db, I think), flags, quota,
etc. I am trying to arrive at a proper process for recovery in the
event of slight mismatch between Cyrus versions or in the event of
moving between 32-bit and 64-bit hardware. One thing I'm not worried
about is how to back up the messages themselves --- a shutdown of Cyrus
and simple tar of the spool area will do for me, I think.

thanks and regards,
Shuvam



Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Shuvam Misra
 We did a migration some months back from an old Kolab v1 (cyrus v2.1)
 system to a new Kolab v2.2 (cyrus v2.2) system.
 
 This was done by writing a script to
 
  - dump the ldap database (you might not have this) and load it on the new
    system
  - rsync the mailboxes from their location on the old server to the
    correct location on the new server
  - recursively reconstruct those mailboxes
  - copy the .seen and .sub information to the correct new location
  - copy the quota information to the correct new location
  - dump the old mailboxes.db and load it on the new system (with cyrus
    stopped)

Some questions:

When you copied all the .seen files, did you dump to flat format and then
recreate in the new db format?

Did you migrate the mailboxes.db before or after reconstructing the
mailboxes?

What do the .sub files contain? On my (very small) system, I found just
a list of mail folder names.

thanks and regards,
Shuvam



Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Dan White
On 14/09/10 22:41 +0530, Shuvam Misra wrote:
What other meta-data files other than mailboxes.db do I need to copy if
I want to restore everything (seen flags, other flags, etc)? And will it
be a generally good practice to convert all required database files to
flat first, then re-convert to the new server's file format? Will this
guarantee a trouble-free migration?

See the manpage for imapd.conf for possible formats, but for my 2.3.12
installation, with configdirectory specified at /var/lib/cyrus (and no
customization to my *_db options), my database files are:

/var/lib/cyrus/mailboxes.db
   list of mailboxes
   Cyrus skiplist DB

/var/lib/cyrus/annotations.db
   list of annotations
   Cyrus skiplist DB

/var/lib/cyrus/tls_sessions.db
   cache of TLS sessions
   Berkeley DB

/var/lib/cyrus/deliver.db
   duplicate delivery database
   Berkeley DB

Per mailbox/user files:

/var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey
   backend for mailbox keys
   Cyrus skiplist DB

/var/lib/cyrus/domain/e/example.org/user/j/jsmith.seen
   seen database
   Cyrus skiplist DB

/var/lib/cyrus/domain/e/example.org/user/j/jsmith.sub
   subscription database
   flat ASCII

/var/lib/cyrus/domain/o/olp.net/quota/j/user.jsmith
   quotaroot database
   quotalegacy format

Some of those you may not be able to convert to flat (although I haven't
actually tried).
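
To make the per-user layout above concrete, here is a small sketch that builds the same paths for a given user, e.g. to drive a backup list. It assumes virtual domains and the first-letter directory hashing visible in the listing; with other hashing options the directories will differ. The function name and its arguments are made up for illustration.

```python
import posixpath


def user_metadata_paths(config_dir, user, domain):
    """Build per-user metadata paths following the layout shown above.

    Assumes first-letter hashing of both the domain and the user name.
    """
    domain_dir = posixpath.join(config_dir, "domain", domain[0], domain)
    user_dir = posixpath.join(domain_dir, "user", user[0])
    quota_dir = posixpath.join(domain_dir, "quota", user[0])
    return {
        "mboxkey": posixpath.join(user_dir, f"{user}.mboxkey"),
        "seen": posixpath.join(user_dir, f"{user}.seen"),
        "sub": posixpath.join(user_dir, f"{user}.sub"),
        "quota": posixpath.join(quota_dir, f"user.{user}"),
    }


if __name__ == "__main__":
    for kind, path in user_metadata_paths("/var/lib/cyrus", "jsmith", "example.org").items():
        print(f"{kind:8} {path}")
```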

My aim is to be able to restore all meta-data in the event of a bare
metal crash recovery. I'm ok with running a reconstruct if needed,
but I should be able to re-create all meta-data, including mail folder
permissions (which I'll get from mailboxes.db, I think), flags, quota,
etc. I am trying to arrive at a proper process for recovery in the
event of slight mismatch between Cyrus versions or in the event of
moving between 32-bit and 64-bit hardware. One thing I'm not worried
about is how to back up the messages themselves --- a shutdown of Cyrus
and simple tar of the spool area will do for me, I think.

The most straightforward way to restore from a filesystem backup is to
have a backup system available with identical libraries and Cyrus version.

If not in a failed scenario, then doing an imap sync, or rolling cyrus
replication, is a safe bet.

-- 
Dan White



Re: sync-server without deletes?

2010-09-14 Thread Henrique de Moraes Holschuh
On Tue, 14 Sep 2010, Derek Chen-Becker wrote:
 On 09/14/2010 08:53 AM, Bron Gondwana wrote:
  With the new replication engine in 2.4, it will be possible - deleted
  messages still get replicated for a week - and if you set an explicit
  long expiry time on the replica (say, years!) then it wouldn't get
  cleaned up any earlier.
 
 That sounds like it's what we want, so I'll plan on moving to that. In
 the short term perhaps I'll just need to copy to a common folder as you
 indicated, or have postfix just send a duplicate copy to our long-term
 backup server.

Well, unless you have users delivering mail to each other through IMAP
on shared folders, one usually configures the MTAs to drop a copy of
everything into a system mailbox...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh



Re: Store documents in IMAP folders

2010-09-14 Thread Patrick Goetz
On 09/12/2010 09:10 AM, Gavin McCullagh wrote:
 The goal is to have a PDF library available at any time, with basic file
 search on document/message name, so a file share doesn't solve my problem
 (and I don't want any document management system, I just want access to
 files).

 I don't imagine IMAP's search would work on MIME attachments, unless you
 did something like add a plain text version to the body of the email.


I think he just wants to be able to search on the document name; 
although, if you're going to write a custom script to get the documents 
into an IMAP folder, then there's nothing stopping you from harvesting 
the text from the PDF file and placing it in the body of the message 
containing the attached document.



Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.

2010-09-14 Thread Eric Luyten
Hello,


I am having trouble converting a quota skiplist db back to quotalegacy
format (I know... this is probably not the most common Cyrus operation :-)

% cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota quotalegacy
Converting from /ssd/cyrs/imap/quotas.db (skiplist) to /ssd/cyrs/imap/quota
(quotalegacy)
% find quota -type f | wc -l
 126
% strings quotas.db|wc -l
  135229


quotas.db was created using the reverse operation and took about one minute.
I renamed the original 'quota' directory out of the way before making the
second cvt_cyrusdb call.

Closer inspection of the newly created 'quota' directory reveals 125 quota
descriptor files named user.aXX created under 'a', all relating to
existing top level mailboxes and containing the correct information, and
(curiously) one file named 'u' in directory 'u'.

I also tried a Berkeley DB intermediate format and the creation
of the quotalegacy structure failed in an identical way.


Other question : would I be better off with 65,000 small files
(quotalegacy) in a one-level hash or with a single skiplist db
for my quota information, when the files reside on solid state
storage anyway ?


Thx,
Eric Luyten, Computing Centre VUB/ULB.




Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Shuvam Misra
Dear Dan,

 See the manpage for imapd.conf for possible formats, but for my 2.3.12
 installation, with configdirectory specified at /var/lib/cyrus (and no
 customization to my *_db options), my database files are:

Got it. Thanks a lot for the details.

 /var/lib/cyrus/annotations.db

What are annotations?

 /var/lib/cyrus/tls_sessions.db

You were saying these are transient data -- can one skip this?

 /var/lib/cyrus/deliver.db

This too can be skipped right? They won't affect the user's perception of
his emails, mailfolders, ACLs, quotas, flags, etc.

 /var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey
   backend for mailbox keys

What are mailbox keys?

 /var/lib/cyrus/domain/e/example.org/user/j/jsmith.seen
 /var/lib/cyrus/domain/e/example.org/user/j/jsmith.sub

Yes, these two are important.

 /var/lib/cyrus/domain/o/olp.net/quota/j/user.jsmith
 
 Some of those you may not be able to convert to flat (although I haven't
 actually tried).

Okay, got it. In that case, if I can't convert to/from flat, I can't
safely move between dissimilar Cyrus servers. In that case, I'll have to
drop that file and lose that data, to be safe.

 The most straight forward way to restore from a filesystem backup is to
 have a backup system available with identical libraries and Cyrus version.

yes, absolutely, and that's what we offer. However, when there are
disaster situations not planned for, I was wondering how far I can
provision against data loss when the exact version of Cyrus is simply not
available.

 If not in a failed scenario, then doing an imap sync, or rolling cyrus
 replication, is a safe bet.

Yes, we do this too. The problem was for situations where the
installation is too small to justify a second server. I have one customer
who intends to deploy our product in about 200 offices all over India.
Each office has less than 20 users and just one server. I'll have to
provide a complete-snapshot backup facility for him onto removable media,
and then provide a restore in case there's a disaster. For situations
like that, I was weighing options.

Thanks a lot. You've given me more details than I'd hoped for. I'll work
on this now.

Shuvam



Re: sync-server without deletes?

2010-09-14 Thread Shuvam Misra
   With the new replication engine in 2.4, it will be possible - deleted
   messages still get replicated for a week - and if you set an explicit
   long expiry time on the replica (say, years!) then it wouldn't get
   cleaned up any earlier.
  
  That sounds like it's what we want, so I'll plan on moving to that. In
  the short term perhaps I'll just need to copy to a common folder as you
  indicated, or have postfix just send a duplicate copy to our long-term
  backup server.
 
 Well, unless you have users delivering mail to each other through IMAP
 on shared folders, one usually configures the MTAs to drop a copy of
 everything into a system mailbox...

Yes, this is what we do too. We have a milter in Sendmail which adds an
envelope recipient for each mail passing through Sendmail. This new
recipient is a system mailbox which holds the mail archive. Once a day, a
cronjob moves all mails out of Inbox to a freshly created mail folder
whose name contains today's date, thus preventing the Inbox from accumulating
millions of messages over time.

It's quite simple, and it doesn't provide easy searching of
the archive the way a search engine would, but it's very reliable.

One of the biggest differences between this approach and any IMAP
replication based approach is that you can send an outgoing mail in the
latter approach without it being recorded. (Take the worst-case situation
where a guy disables his copy-to-Sent-folder flag and sends the mail, or
even does a telnet to the SMTP port and hand-crafts the email.) In our
approach, each and every outgoing mail also gets captured in the archive.

Our solution can run any number of mail servers, hence the message can
actually go through multiple MTAs before leaving the organisation. We
make an archive copy at each MTA, thus wasting bandwidth to deliver
redundant copies to the archive. But on disk, only one copy is stored,
thanks to the de-duplication facility of Cyrus. :)

regards,
Shuvam



De-duping attachments

2010-09-14 Thread Shuvam Misra
How difficult or easy would it be to modify Cyrus to strip all
attachments from emails and store them separately in files? In the
message file, replace the attachment with a special tag which will point
to the attachment file. Whenever the message is fetched for any reason,
the original MIME-encoded message will be re-constructed and delivered.

If this can be implemented, then the file pointer in the message body
could be its MD5 sum or something similar. This would ensure automatic
de-dup --- if a file with the same MD5 exists, it means I won't store
a second copy --- I'll just point to the existing file.

Today's de-duping of entire messages is a wonderful facility, based
on message-ID. But the problem is that this measure stops halfway --
it does not avoid the enormous duplication when the same JPEG image
of Sandra and the kids, Word doc with sales-forecasts or PDF file is
forwarded by 20 people in 20 separate messages to their friends and
relatives ad infinitum.

At the IMAP or POP protocol levels, no clients would see any change. But
on the server side, the server's disk space usage would drop sharply
and CPU usage would rise somewhat.

One problem I can see is tracking of reference counts to
attachment files. This intelligence would have to be built into the
attachment-stripping layer, and then reference counts would have to
be decremented each time a message file is unlink()ed internally by
imapd, cyr_expire, etc. One simple way-out of this would be to use the
file system itself --- create separate names for each reference to an
attachment file, and hard-link these names to the single instance. Each
message-file which refers to an existing attachment file will have its
own unique reference-name to the attachment. When the message-file is
deleted for any reason by Cyrus, it will also look through all embedded
reference-names, and delete those reference hardlinks too. This means
that if a Cyrus message store is spread across multiple partitions,
one physical copy of each attachment-file will have to be stored in each
partition (potentially), to allow hardlinking from message references.
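
The hard-link bookkeeping described above can be sketched in a few lines. This is a toy model and not Cyrus code: the flat store directory, the `<digest>.<ref_id>` naming and both function names are assumptions, SHA-256 stands in for the MD5 sum mentioned earlier, and there is no locking, so concurrent deliveries could race. The file system's link count plays the role of the reference counter.

```python
import hashlib
import os


def add_reference(store_dir, data, ref_id):
    """Store data once under its digest and hard-link a per-message name to it."""
    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):
        # First time this content is seen: write the single physical copy.
        with open(blob, "wb") as handle:
            handle.write(data)
    ref = os.path.join(store_dir, f"{digest}.{ref_id}")
    os.link(blob, ref)  # hard links only work within one file system
    return ref


def drop_reference(ref_path):
    """Remove one reference; delete the blob once no references remain."""
    store_dir, name = os.path.split(ref_path)
    blob = os.path.join(store_dir, name.split(".", 1)[0])
    os.unlink(ref_path)
    if os.stat(blob).st_nlink == 1:
        # Only the digest-named entry is left, so nothing points here any more.
        os.unlink(blob)
```

Because `os.link` cannot cross file systems, the sketch runs into the same per-partition limitation noted above.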

Shuvam



Re: De-duping attachments

2010-09-14 Thread Rob Mueller

 How difficult or easy would it be to modify Cyrus to strip all
 attachments from emails and store them separately in files? In the
 message file, replace the attachment with a special tag which will point
 to the attachment file. Whenever the message is fetched for any reason,
 the original MIME-encoded message will be re-constructed and delivered.

Like anything, doable, but quite a lot of work.

cyrus likes to mmap the whole file so it can just offset into it to extract
whichever part is requested. In IMAP, you can request any arbitrary byte
range from the raw RFC822 message using the BODY[]<start.length> construct,
so you have to be able to reconstruct the original email byte-accurately if
you remove attachments.

Consider the problem of transfer encoding. Say you have a base64 encoded 
attachment (which basically all are). When storing and deduping, you'd want 
to base64 decode it to get the underlying binary data. But depending on the 
line length of the base64 encoded data, the same file can be encoded in a 
large number of different ways. When you reconstruct the base64 data, you 
have to be byte accurate in your reconstruction so your offsets are correct, 
and so any signing of the message (eg DKIM) isn't broken.
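
A small illustration of the transfer-encoding point, as a sketch only: the helper name, the CRLF line endings and the two widths are choices made for the example. The same binary payload wrapped at two line lengths decodes to identical bytes but is not identical on the wire, so a de-duplicated store would have to remember at least the wrapping to rebuild the original message exactly.

```python
import base64


def encode_wrapped(data, width):
    """Base64-encode data and wrap the text at `width` characters with CRLF."""
    text = base64.b64encode(data).decode("ascii")
    return "\r\n".join(text[i:i + width] for i in range(0, len(text), width))


if __name__ == "__main__":
    payload = bytes(range(256)) * 4   # stand-in for an attachment
    as_76 = encode_wrapped(payload, 76)
    as_64 = encode_wrapped(payload, 64)

    # Same attachment, two different on-the-wire representations.
    print("same decoded bytes:", base64.b64decode(as_76) == base64.b64decode(as_64))
    print("same encoded text: ", as_76 == as_64)

    # Re-encoding with the remembered width restores the original text.
    print("round trip at 76:  ", encode_wrapped(base64.b64decode(as_76), 76) == as_76)
```

Real messages vary in more than line width (line endings, trailing whitespace, a short final line), so remembering the width alone is the simplest case, not a complete answer.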

Once you've solved those problems, the rest is pretty straight forward :)

Rob




Re: De-duping attachments

2010-09-14 Thread Bron Gondwana
On Wed, Sep 15, 2010 at 12:13:03PM +1000, Rob Mueller wrote:
 
  How difficult or easy would it be to modify Cyrus to strip all
  attachments from emails and store them separately in files? In the
  message file, replace the attachment with a special tag which will point
  to the attachment file. Whenever the message is fetched for any reason,
  the original MIME-encoded message will be re-constructed and delivered.

http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

2TB - US $109.

 Like anything, doable, but quite a lot of work.

Now de-duping messages on copy is valuable, not so much because of
the space it saves, but because of the IO it saves.  Copying the file
around is expensive.

De-duping components of messages and then reconstructing?  Not so much.
You'll be causing MORE IO in general looking for the message, finding the
parts.

The only real benefit I can see is something like replication or a
client that's downloading several of these large messages and wants
to save network bandwidth.

Except - there's no protocol to support this for clients, so only
replication could gain.
 
 Cyrus likes to mmap the whole file so it can just offset into it to extract
 whichever part is requested. In IMAP, you can request any arbitrary byte
 range from the raw RFC822 message using the body[]<start.length> construct,
 so you have to be able to reconstruct the original email byte-accurately if
 you remove attachments.
 
 Consider the problem of transfer encoding. Say you have a base64 encoded 
 attachment (which basically all are). When storing and deduping, you'd want 
 to base64 decode it to get the underlying binary data. But depending on the 
 line length of the base64 encoded data, the same file can be encoded in a 
 large number of different ways. When you reconstruct the base64 data, you 
 have to be byte accurate in your reconstruction so your offsets are correct, 
 and so any signing of the message (eg DKIM) isn't broken.
 
 Once you've solved those problems, the rest is pretty straightforward :)

Yeah, they really aren't so hard to solve.  I didn't actually do the research,
but I have an idea what to do.  Find a big corpus of emails (i.e. FastMail's
one!) and figure out the 10-20 most common base64 widths and surrounding
layouts.  For each attachment, pick the layout that matches and store just a
single "it's this layout" marker.  If none of them match exactly, store a
binary diff from the closest one as well; it probably won't be very large.
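
A rough Python sketch of the layout-matching half of that idea follows. The candidate widths are placeholders and not the result of any corpus analysis, the function names are hypothetical, and only CRLF line endings are considered. The binary-diff fallback is not shown.

```python
import base64
from typing import Optional

# Assumed candidate layouts; a real list would come from measuring a mail corpus.
CANDIDATE_WIDTHS = (76, 72, 64, 60)


def encode_with_width(data: bytes, width: int, eol: bytes = b"\r\n") -> bytes:
    """Base64-encode data into lines of `width` characters, each ending in eol."""
    raw = base64.b64encode(data)
    return b"".join(raw[i:i + width] + eol for i in range(0, len(raw), width))


def find_layout(encoded: bytes) -> Optional[int]:
    """Return the candidate width that reproduces `encoded` exactly, or None."""
    try:
        data = base64.b64decode(encoded)
    except ValueError:
        return None  # not decodable as base64 at all
    for width in CANDIDATE_WIDTHS:
        if encode_with_width(data, width) == encoded:
            return width
    return None  # no exact match: this is where a stored diff would be needed
```

When find_layout returns a width, that one small number is all the layout information that needs storing next to the decoded attachment.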

But in general, I'd say you're optimising the wrong problem.  It's just not
worth it, the savings are minimal and the added complexity is high.  Disk
space is now cheap, and fast access via a cached copy of the email will
beat re-creating the original file from mime parts hands down.

Bron.



Re: De-duping attachments

2010-09-14 Thread Shuvam Misra
Dear Rob,

I had reservations about some of these things too. :( In particular,
I was wondering about having to remember and recreate the exact
transfer-encoding. If both of us forward the same attachment in two
emails, and one encodes in quoted-printable, the other in base64, Cyrus
had better be able to recreate them exactly or have some other
workarounds.

I wasn't aware of the mmap() usage and the direct seeking into the middle
of the message body. But the bigger problem is what you've described about
reproducing the message byte-identically. If that can be solved, then we
can make Cyrus re-create the message while loading from disk and stick it
into RAM.

Can we just brainstorm with you and others in this thread...  how do we
re-create a byte-identical attachment from a disk file?  What is the list
of attributes we will need to store per stripped attachment to allow an
exact re-creation?

  - file name/reference

  - full MIME header of the attachment block

  - separator string (this will be retained in the message body anyway)

  - transfer encoding

  - if encoding = base64 then
    base64 line length

  - checksum of encoded attachment (as a sanity check in case the re-encoding
    fails to recreate exactly the same image as the original)

If encoding = quoted-printable or uuencode, then don't strip the
attachment at all.
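
As a starting point for the brainstorm, the list above could be sketched in Python roughly like this. Everything here is hypothetical: the record layout, the field names, the use of SHA-1, and the assumption of CRLF line endings with a single fixed line length are illustrative choices, not anything Cyrus does today.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrippedAttachment:
    file_ref: str           # reference to the detached file (here: SHA-1 of decoded data)
    mime_header: bytes      # full MIME header of the attachment block
    transfer_encoding: str  # only "base64" is ever stripped in this sketch
    b64_line_length: int    # line length used by the original encoding
    encoded_sha1: str       # checksum of the encoded attachment, as a sanity check


def wrap_base64(decoded: bytes, line_length: int) -> bytes:
    """Re-encode decoded data as CRLF-terminated base64 lines."""
    raw = base64.b64encode(decoded)
    return b"".join(raw[i:i + line_length] + b"\r\n"
                    for i in range(0, len(raw), line_length))


def try_strip(mime_header: bytes, transfer_encoding: str, encoded_body: bytes,
              line_length: int = 76) -> Optional[StrippedAttachment]:
    """Return a record if the attachment can be recreated exactly, else None."""
    if transfer_encoding.lower() != "base64":
        return None  # quoted-printable, uuencode, etc.: leave the message alone
    decoded = base64.b64decode(encoded_body)
    if wrap_base64(decoded, line_length) != encoded_body:
        return None  # re-encoding is not byte-identical: do not strip
    return StrippedAttachment(
        file_ref=hashlib.sha1(decoded).hexdigest(),
        mime_header=mime_header,
        transfer_encoding="base64",
        b64_line_length=line_length,
        encoded_sha1=hashlib.sha1(encoded_body).hexdigest(),
    )


def rebuild(record: StrippedAttachment, decoded: bytes) -> bytes:
    """Recreate the encoded attachment and verify it against the stored checksum."""
    encoded = wrap_base64(decoded, record.b64_line_length)
    if hashlib.sha1(encoded).hexdigest() != record.encoded_sha1:
        raise ValueError("re-encoding does not match the original bytes")
    return encoded
```

The round-trip check inside try_strip is the bypass condition: anything that does not re-encode to exactly the original bytes is simply left in the message.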

What other conditions may we need to look for to bypass attachment
stripping?

Can we just tap into all of you to get the ideas on paper, even if
it's not being implemented by anyone right now?  It'll at least help us
understand the system's internals better.

thanks a lot, and regards,
Shuvam

 cyrus likes to mmap the whole file so it can just offset into it to
 extract which ever part is requested. In IMAP, you can request any
 arbitrary byte range from the raw RFC822 message using the
 body[]<start.length> construct, so you have to be able to byte
 accurately reconstruct the original email if you remove attachments.
 
 Consider the problem of transfer encoding. Say you have a base64
 encoded attachment (which basically all are). When storing and
 deduping, you'd want to base64 decode it to get the underlying
 binary data. But depending on the line length of the base64 encoded
 data, the same file can be encoded in a large number of different
 ways. When you reconstruct the base64 data, you have to be byte
 accurate in your reconstruction so your offsets are correct, and so
 any signing of the message (eg DKIM) isn't broken.
 
 Once you've solved those problems, the rest is pretty straightforward :)
 
 Rob
 



Re: De-duping attachments

2010-09-14 Thread Shuvam Misra
Dear Bron,

 http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
 
 2TB - US $109.

Don't want to nit-pick here, but the effective price we pay is about
ten times this. To set up a mail server with a few TB of disk space,
we usually land up deploying a separate chassis with RAID controllers and
a RAID array, with FC connections from servers, etc, etc.  All this adds
up to about $1,000/TB of usable space if you're using something like the
low-end IBM DS3400 box or Dell/EMC equivalent. This is even with
inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives.

http://www-07.ibm.com/storage/in/disk/ds3000/ds3400/

And most of our customers actually double this cost because they keep two
physically identical chassis for redundancy. (We recommend this too,
because we can't trust a single RAID 5 array to withstand controller or
PSU failures.) In that case, it's $2000/TB.
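
To put the three price points side by side, here is a small Python calculation using only the figures quoted in this thread: $109 for a 2TB drive, about $1,000/TB for a single RAID chassis, and about $2,000/TB for a redundant pair. The 4TB store size is just an example value.

```python
RAW_DRIVE_USD = 109          # quoted price of one bare 2TB drive
RAW_DRIVE_TB = 2
CHASSIS_USD_PER_TB = 1000    # approximate usable-space cost with one RAID chassis
REDUNDANT_USD_PER_TB = 2000  # the same, with two identical chassis


def store_cost(store_tb: int) -> dict:
    """Rough cost in USD of a mail store of store_tb terabytes under each scenario."""
    drives_needed = -(-store_tb // RAW_DRIVE_TB)  # ceiling division
    return {
        "raw_drives": drives_needed * RAW_DRIVE_USD,
        "single_chassis": store_tb * CHASSIS_USD_PER_TB,
        "redundant_pair": store_tb * REDUNDANT_USD_PER_TB,
    }


costs = store_cost(4)
# costs == {"raw_drives": 218, "single_chassis": 4000, "redundant_pair": 8000}
```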

And you do reach 5-10 TB of email store quite rapidly --- our company
has many corporate clients ( 500 email users) whose IMAP store has
reached 4TB. No one wants to enforce disk quotas (corporate policy),
and most users don't want to delete emails on their own.

We keep hearing the logic that storage is cheap, and stories of cloud
storage through Amazon, unlimited mailboxes on Gmail, are reinforcing
the belief. But at the ground level in mid-market corporate IT budgets,
storage costs in data centres (as against inside desktops) are still
too high to be trivial, and their prices have little to do with
the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little
over $12,000 in India, with a full set of 1TB SATA-II drives from IBM,
but even with high cost of IBM drives, the drives themselves contribute
less than 30% of the total cost.

If we really want to put our collective money where our mouth is, and
deliver the storage-is-cheap promise at the ground level, we need to
rearchitect every file server and IMAP server to work in map-reduce mode
and use disks inside desktops. Anyone game for this project? :)

 Now de-duping messages on copy is valuable, not so much because of
 the space it saves, but because of the IO it saves.  Copying the file
 around is expensive.
 
 De-duping components of messages and then reconstructing?  Not so much.
 You'll be causing MORE IO in general looking for the message, finding the
 parts.

I agree. My aim was not to reduce IOPS but to cut disk space usage.

There are two areas where we are seeing a huge increase in inactive
disk utilisation for emails. One is for the archive, which is being kept
for security and compliance reasons. Every company we work with wants an
archive with at least a few years' retention. They search the archive
every few weeks to trace lost emails, not for compliance reasons but to
find missing information. This means that we can't ask them to move the
data out to removable storage.

The second area is shared mail folders where all communication with each
client/topic/project is stored practically forever.

A 500-user company can easily acquire an email archive of 2-5TB. I don't
care how much the IO load of that archive server increases, but I'd like
to reduce disk space utilisation. If the customer can stick to 2TB of
space requirements, he can use a desktop with two 2TB drives in RAID
1, and get a real cheap archive server. If this figure reaches 3-4TB,
he goes into a separate RAID chassis --- the hardware cost goes up 5-10
times. These are tradeoffs a lot of small to mid-sized companies in my
market fuss about.

And in a more generic context, I am seeing that all kinds of intelligent
de-duping of infrequently-accessed data is going to become the crying
need of every mid-sized and large company.  Data is growing too fast,
and no one wants to impose user discipline or data cleaning. When we
tell the business head "This is crazy!", he turns around and tells the
CTO "But disk space is cheap! Haven't you heard of Google? What are you
cribbing about? You must be doing something really inefficient here,
wasting money!"

thanks and regards,
Shuvam



Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-14 Thread Dan White
On 15/09/10 06:46 +0530, Shuvam Misra wrote:
 What are annotations?

Annotations are defined in RFC 5257.

They allow an admin to add metadata to a mailbox (or the server). The
cyradm utility sets annotations with its internal info, mboxcfg, and
setinfo commands.

  /var/lib/cyrus/tls_sessions.db

 You were saying these are transient data -- can one skip this?

Yes.

  /var/lib/cyrus/deliver.db

 This too can be skipped right? They won't affect the user's perception of
 his emails, mailfolders, ACLs, quotas, flags, etc.

Right.

  /var/lib/cyrus/domain/e/example.org/user/j/jsmith.mboxkey
    backend for mailbox keys

 What are mailbox keys?

It's for URLAUTH. See RFC 4467, and:

http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/internal/database-formats.php

-- 
Dan White



Re: De-duping attachments

2010-09-14 Thread Rob Mueller

 A 500-user company can easily acquire an email archive of 2-5TB. I don't
 care how much the IO load of that archive server increases, but I'd like
 to reduce disk space utilisation. If the customer can stick to 2TB of

It would be interesting to measure the amount of duplication that is going 
on with attachments in emails.

While we could do that with Fastmail data, I think because of the broad 
range of users, we'd be getting one data point, which might be quite 
different to a data point inside one company. Eg. An architectural firm 
might end up sending big blueprint documents back and forth between each 
other a lot, so they'd gain a lot from deduplication.

Also, even within deduplication, there are some interesting ideas. For
instance, if you know the same file is being sent back and forth a lot with
minor changes, you might want to store the most recent version, and store
binary diffs between the most recent and old versions (e.g. xdelta). Yes,
accessing the older versions would be much slower (have to get most recent +
apply N deltas), but the space savings could be huge.
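
A toy Python sketch of that "latest version plus backward deltas" layout is below. It uses difflib from the standard library as a stand-in for a real binary diff tool such as xdelta, so it only illustrates the storage scheme and is far too slow for real attachments. All function names are made up.

```python
from difflib import SequenceMatcher


def make_delta(newer: bytes, older: bytes) -> list:
    """Describe `older` as copies from `newer` plus literal data."""
    ops = []
    matcher = SequenceMatcher(None, newer, older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes newer[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("data", older[j1:j2]))  # bytes that exist only in older
        # "delete": bytes only in newer, nothing to emit
    return ops


def apply_delta(newer: bytes, ops: list) -> bytes:
    """Rebuild the older version from the newer one and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += newer[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def build_store(versions: list) -> tuple:
    """Keep the most recent version in full, plus a delta to each previous one."""
    latest = versions[-1]
    deltas = [make_delta(versions[i], versions[i - 1])
              for i in range(len(versions) - 1, 0, -1)]
    return latest, deltas


def fetch(latest: bytes, deltas: list, steps_back: int) -> bytes:
    """Return the version `steps_back` revisions before the latest one."""
    data = latest
    for ops in deltas[:steps_back]:
        data = apply_delta(data, ops)
    return data
```

fetch(latest, deltas, 0) is free, and each step further back applies one more delta, which is the slow path for old versions mentioned above.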

 Can we just brainstorm with you and others in this thread...  how do we
 re-create a byte-identical attachment from a disk file?

One overall implementation issue. With the message file, do you:

1. Completely rewrite the message file removing the attachments and adding 
any extra metadata you want in their place
2. Leave the message file as exactly the same size, just don't write out the 
attachment content and assume your filesystem supports sparse files 
(http://en.wikipedia.org/wiki/Sparse_file)

The advantage of 2 is that it leaves the message file size correct, and all 
the offsets in the file are still correct. The downsides are that you must 
ensure your FS supports sparse files well, and there's the question of where 
do you actually store the information that links to the external file?
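
For option 2, a minimal Python sketch of writing a message file with a hole where the attachment body used to be might look like this. The function name, the fake message, and the offsets are invented for the example. Whether the hole actually saves disk blocks depends on the filesystem; the code only guarantees that the file size and all offsets stay the same.

```python
import os
import tempfile


def write_with_hole(path: str, original: bytes, hole_start: int, hole_end: int) -> None:
    """Write `original` to path, skipping the bytes in [hole_start, hole_end)."""
    with open(path, "wb") as f:
        f.write(original[:hole_start])
        f.seek(hole_end)             # jump over the attachment without writing it
        f.write(original[hole_end:])
        f.truncate(len(original))    # keep the full length even if the tail is empty


# Made-up message: headers, a 4096-byte "attachment", then a closing boundary.
message = b"HEADERS\r\n\r\n" + b"A" * 4096 + b"\r\n--end--\r\n"
hole_start = len(b"HEADERS\r\n\r\n")
hole_end = hole_start + 4096

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.")
    write_with_hole(path, message, hole_start, hole_end)
    size_on_disk = os.path.getsize(path)   # apparent size, not allocated blocks
    with open(path, "rb") as f:
        stored = f.read()
```

The hole reads back as zero bytes, so something outside the file still has to record which range is a hole and which external file fills it.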

  - file name/reference
  - full MIME header of the attachment block

I'd leave these intact in the actual message, and just add an extra 
X-Detached-File header or something like that, which includes some external 
reference to the file. Hmmm, that'll break signing though. Not so easy...

  - separator string (this will be retained in the message body anyway)
  - transfer encoding
  - if encoding = base64 then
    base64 line length

Remember every line can actually be a different length! In most cases they 
will be the same length, but you can't assume it. And you do see messages 
that have lines in repeating groups like 76, 76, 76, 76, 74, 76, 76, 76, 76, 
74, ... repeat ... or cases like that, a pain to deal with.
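
One possible way to cope with that, sketched in Python: record the length of every line as run-length pairs instead of a single number. The function names are hypothetical, CRLF line endings are assumed, and the sample body is synthetic.

```python
import base64
from itertools import groupby


def line_length_runs(encoded_body: bytes) -> list:
    """Run-length encode the line lengths, e.g. [(76, 4), (74, 1), ...]."""
    lengths = [len(line) for line in encoded_body.splitlines()]
    return [(length, sum(1 for _ in group)) for length, group in groupby(lengths)]


def rewrap(raw_b64: bytes, runs: list, eol: bytes = b"\r\n") -> bytes:
    """Re-split unwrapped base64 text according to the recorded runs."""
    out = bytearray()
    pos = 0
    for length, count in runs:
        for _ in range(count):
            out += raw_b64[pos:pos + length] + eol
            pos += length
    return bytes(out)


# Synthetic body using the repeating 76, 76, 76, 76, 74 pattern.
pattern = [(76, 4), (74, 1), (76, 4), (74, 1)]
raw = base64.b64encode(bytes(567))   # 567 zero bytes -> 756 base64 characters
original_body = rewrap(raw, pattern)

runs = line_length_runs(original_body)
rebuilt = rewrap(base64.b64encode(base64.b64decode(original_body)), runs)
```

For the common case of uniform lines the run list collapses to one or two pairs, so the extra metadata stays small.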

  - checksum of encoded attachment (as a sanity check in case the
    re-encoding fails to recreate exactly the same image as the original)

This is seeming a bit more tricky...

Rob

