Re: New Cyrus project site and bugzilla
On Mon, 13 Sep 2010, Mark Cave-Ayland wrote: > (On a separate note, if I go to Downloads -> Getting Started and click > on the "AnonymousCVS" wiki link then I get redirected back to the front > page rather than to a page giving information on how to access CVS) Fixed. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
On Wed, Sep 15, 2010 at 05:24:11PM +0100, Gavin McCullagh wrote:
> Hi,
>
> On Wed, 15 Sep 2010, Nik Conwell wrote:
>
> > Isn't the easy hack for dedup just looking at the above md5 files and
> > then doing appropriate hard links? This could be done by a nightly
> > trawl of the spool space. A bigger win would be to separate the headers
> > from the messages but that's a lot more work.
>
> For what it's worth, I believe the fsdup tool which is part of fslint will
> do this for you.
>
> http://www.pixelbeat.org/fslint/

Or this lovely little toy. It uses the fact that in current versions of
Cyrus the "GUID" field is actually the sha1 of the underlying file.

Bron ( warning: may contain FastMail specific assumptions )

#!/usr/bin/perl -w

# SETUP {{{
use strict;
use warnings;
BEGIN { do "/home/mod_perl/hm/ME/FindLibs.pm"; }
use Date::Manip;
use MailApp::Admin::Actions;
use IO::File;
use ME::Machine;
use Cyrus::HeaderFile;
use Data::Dumper;
use Cyrus::IndexFile;
use Getopt::Std;
use Digest::SHA1;
use ME::CyrusBackup;
use ME::User;
# }}}

# Optional argument: only process the slot with this name.
my $sn = shift;

my (undef,undef,$uid,$gid) = getpwnam('cyrus');

foreach my $Slot (ME::Machine->ImapSlots()) {
  next if ($sn and $sn ne $Slot->Name());
  my $users = $Slot->AllMailboxes();
  my $conf = $Slot->ImapdConf();
  foreach my $user (sort keys %$users) {
    process($conf, $user, $users->{$user});
  }
}

sub process {
  my ($conf, $user, $folders) = @_;
  print "$user\n";

  # Map each message GUID to every [folder, uid] pair that carries it.
  my %ihave;
  foreach my $folder (@$folders) {
    my $meta = $conf->GetUserLocation('meta', $user, 'default', $folder);
    my $index = Cyrus::IndexFile->new_file("$meta/cyrus.index") || die "Failed to open $meta/cyrus.index";
    while (my $record = $index->next_record()) {
      push @{$ihave{$record->{MessageGuid}}}, [$folder, $record->{Uid}];
    }
  }

  foreach my $guid (keys %ihave) {
    next if @{$ihave{$guid}} <= 1;
    my ($inode, $srcname);
    my @others;
    foreach my $item (@{$ihave{$guid}}) {
      my $spool = $conf->GetUserLocation('spool', $user, 'default', $item->[0]);
      $spool =~ s{/$}{};
      my $file = "$spool/$item->[1].";
      my (@sd) = stat($file);
      next unless @sd;  # skip records whose file is missing on disk
      if ($inode) {
        next if $sd[1] == $inode;  # already linked to the first copy
        push @others, $file;
      }
      else {
        $inode = $sd[1];
        $srcname = $file;
      }
    }
    next unless @others;
    print "fixing up files for $guid ($srcname)\n";
    foreach my $file (@others) {
      # Link under a temporary name, then rename over the duplicate.
      my $tmpfile = $file . "tmp";
      unless (link($srcname, $tmpfile)) {
        print "link error $tmpfile\n";
        next;
      }
      chown($uid, $gid, $tmpfile);
      chmod(0600, $tmpfile);
      print "rename error $file\n" unless rename($tmpfile, $file);
    }
  }
}
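The same hard-linking idea can be sketched without the FastMail-specific modules. What follows is only a minimal Python illustration, assuming a plain directory tree of message files and hashing file contents directly rather than reading GUIDs from cyrus.index. The names `sha1_of` and `hardlink_duplicates` are made up for this sketch, and it ignores ownership, permissions and locking, all of which a live spool would need.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root):
    """Hard-link files under root whose contents are identical.

    Returns the number of files that were replaced by links.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            by_digest[sha1_of(path)].append(path)

    replaced = 0
    for paths in by_digest.values():
        source = paths[0]
        source_stat = os.stat(source)
        for other in paths[1:]:
            other_stat = os.stat(other)
            same_file = (other_stat.st_dev, other_stat.st_ino) == (
                source_stat.st_dev, source_stat.st_ino)
            if same_file:
                continue  # already linked to the first copy
            tmp = other + ".tmp"
            os.link(source, tmp)    # second name for the source inode
            os.replace(tmp, other)  # swap it in place of the duplicate
            replaced += 1
    return replaced
```

Linking under a temporary name and renaming afterwards mirrors the approach in the script above: the duplicate is never missing from its folder, even if the run is interrupted.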
Re: De-duping attachments
On 09/14/2010 11:55 PM, Rob Mueller wrote: > > Eg. An architectural firm > might end up sending big blueprint documents back and forth between each > other a lot, so they'd gain a lot from deduplication. > Not to throw a damp towel on this discussion, but isn't this really an administrative problem rather than a technical one? I.e. shouldn't the system administrator set up a version control system or even something like dropbox for file sharing rather than using email for this situation? > if you know the same file is being sent back and forth a lot with > minor changes, you might want to store the most "recent" version, > and store binary diffs between the most recent and old versions > (eg xdelta). Yes accessing the older versions would be much > slower (have to get most recent + > apply N deltas), but the space savings could be huge. My users frequently mail documents to the person in the office next door (never mind that both their home directories are on the same server!); however this content is almost always different for each attached file; i.e. without re-implementing a version control system under IMAP, as you're suggesting, there would be little benefit in keeping and hard linking to a single copy of each file. However, that seems like it fails the UNIX "do one thing, and do it well" test pretty badly. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: sync-server without deletes?
On Wed, 15 Sep 2010, Shuvam Misra wrote: > > Well, unless you have users delivering mail to each other through IMAP > > on shared folders, one usually configures the MTAs to drop a copy of > > everything into a system mailbox... > > Yes, this is what we do too. We have a milter in Sendmail which adds an > envelope recipient for each mail passing through Sendmail. This new In postfix, this is a built-in feature and it is very powerful. You can "always_bcc" to some fixed address(es). Recent versions can use lookup maps to have different "always_bcc" addresses keyed to the original message recipient or original message sender... http://www.postfix.org/ADDRESS_REWRITING_README.html#auto_bcc But this is getting off-topic :-) -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
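For readers who have not used the Postfix feature mentioned above, a main.cf fragment might look roughly like this. The addresses and map file paths are placeholders chosen for illustration and do not come from the thread.

```
# /etc/postfix/main.cf (fragment)

# Copy every message to one fixed archive address.
always_bcc = archive@example.com

# Or pick the BCC address per original recipient or per sender.
recipient_bcc_maps = hash:/etc/postfix/recipient_bcc
sender_bcc_maps = hash:/etc/postfix/sender_bcc

# /etc/postfix/recipient_bcc (run "postmap" on it after editing)
# original recipient        extra BCC recipient
user@example.com            archive-user@example.com
```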
Re: De-duping attachments
Hi, On Wed, 15 Sep 2010, Nik Conwell wrote: > Isn't the easy hack for dedup just looking at the above md5 files and > then doing appropriate hard links? This could be done by a nightly > trawl of the spool space. A bigger win would be to separate the headers > from the messages but that's a lot more work. For what it's worth, I believe the fsdup tool which is part of fslint will do this for you. http://www.pixelbeat.org/fslint/ Gavin Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
Outside the cyrus box: The Mimedefang milter has a built-in function (optional of course) to remove an attachment, write it to a file, and replace the attachment part with a text part giving a web link to the file. The files could be on a slower type of disk drive than you need for email storage. You could write code choosing which attachments to do this to, say by size or file extension. A mechanism to remove the files is not provided, but it's suggested that recipients would need to download the attachment to their own computer and that therefore the files could be deleted by a cron job based on age. I mention this only as another way to do it. Note that this could be implemented for outgoing mail too. We have not implemented it here so I can't say more than that it is possible. Joseph Brennan Columbia University Information Technology Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.
On Tue, Sep 14, 2010 at 11:58:03PM +0200, Eric Luyten wrote: > Hello, > > > I am having trouble converting a quota skiplist db back to quotalegacy > format (I know... this is probably not the most common Cyrus operation :-) Yeah, odd! I wonder what's going on there. I'll take a look. > Other question : would I be better off with 65,000 small files > (quotalegacy) in a one-level hash or with a single skiplist db > for my quota information, when the files reside on solid state > storage anyway ? 65k small files :) Fewer locking issues. Bron. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
On Wed, September 15, 2010 2:12 pm, Simon Matter wrote: > You said ZFS, did you > consider testing its built in deduping? > (If its even there in Solaris 10?) Simon, OpenSolaris does have it (block level dedup) since about one year but it is too recent an addition to the commercial Solaris 10 to start using it (IMO). Apparently (Wikipedia) it is ZFS pool feature 21 listed as 'Reserved' by 'zpool upgrade -v' (h... both 'zfs get all ...' and 'zpool get all ...' do not yield a parameter sounding as 'deduplication' ; it may very well not be there yet) Furthermore, I'd like to repeat what has been written earlier in this thread : a message header that is different in size by even one byte will cause block boundaries to shift and, I suspect, block level dedup to fail. Eric Luyten. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
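The block-boundary concern can be illustrated with a toy model. This sketch assumes a fixed 4096-byte block size and simple per-block hashing; it is not a description of how ZFS implements dedup, and all names in it are invented for the example. Two files carry the same body, first behind headers of equal length, then behind headers that differ in length by one byte.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def pseudo_random_bytes(seed, length):
    """Deterministic filler data so that no two blocks repeat by accident."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block, as a block-level dedup layer might."""
    return {
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    }


def shared_blocks(first, second):
    """Count the distinct blocks that two files have in common."""
    return len(block_hashes(first) & block_hashes(second))


body = pseudo_random_bytes(b"attachment", 10 * BLOCK_SIZE)
header_a = b"A" * BLOCK_SIZE        # header filling exactly one block
header_b = b"B" * (BLOCK_SIZE + 1)  # a different header, one byte longer
header_c = b"C" * BLOCK_SIZE        # a different header, same length as A

aligned = shared_blocks(header_a + body, header_c + body)
shifted = shared_blocks(header_a + body, header_b + body)
print("shared blocks with equal-length headers:", aligned)
print("shared blocks with a one-byte shift:", shifted)
```

In this model every body block is shared when the headers have the same length, and none are once the body is shifted by a single byte.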
Re: Replication sync-server and Delayed Delete
On Wed, Sep 15, 2010 at 12:29:18PM +0100, Gavin Gray wrote:
> Hi there,
>
> We have a cyrus murder using replication and we have a few questions
> about the behaviour we are seeing on our system.
>
> 1. cyr_expire on the master doesn't cause any replication to happen.
> Is that 'correct'? In other words if we want to delete folders from
> the DELETED hierarchy on the replicant then we need to also run
> cyr_expire on the replicant?

Yeah, pretty much.

> 2. We're also a little unclear about replication vis a vis the
> delayed expunge and the unexpunge facility. Could you explain what
> ought to happen in terms of replication when email is expunged and
> then possibly unexpunged if anything?

It's a bit messy. Unexpunge is a sin against IMAP by the way, and has
been replaced with "generate new UID and promote" in 2.4. In which case
it's just like a new append with the same flags, and replicates like an
append :)

2.3 replication ignores expunges - it's as if they don't exist! When the
mailbox syncs, it nukes the records that aren't "alive" on the master
from the replica. If you re-inject them with unexpunge, it should find
them and sync_combine_commit() the result. I don't know if unexpunge
inserts replication events though - somewhat doubt it.

> 3. We are seeing a strange anomaly on the replication of deleting a folder.
> e.g. a user deletes a folder
> the folder goes into the DELETED hierarchy of the partition
> the user's mailbox is on
> the folder is also deleted from the replicant as we would expect
> however the folder on the replicant goes into the DELETED
> hierarchy on a different partition (the default partition as
> specified in imapd.conf). Is this normal?

Replication and partitions are broken in some ways in 2.3. It should be
better in 2.4 I believe, though I haven't tested it. I'm going to be
releasing an alpha super-soon (it's been pushed to git.cyrusimap.org
now!)

Bron.
Re: Mailbox directory structure
Date: Wed, 15 Sep 2010 14:02:38 +0200
> From: Michael Menge
> Subject: Re: Mailbox directory structure
> To: info-cyrus@lists.andrew.cmu.edu
> Message-ID: <20100915140238.18471f8lfaaqs...@webmail.uni-tuebingen.de>
> Content-Type: text/plain; charset="utf-8"
>
> Quoting Artur Kaminski :
>
> > Hey all,
> >
> > I installed imapd server successfully, and moved configuration from old one,
> > but then accidentally loaded another server's configuration from puppet. Now
> > Cyrus looks for user mailboxes in
> >
> > User Mailbox
> > /var/spool/imap/a/
> > user. /var/spool/imap/u/user^
> >
> > (checked using cyradmin, with creating the mailboxes above)
> >
> > Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
> > (moved from old server).
> >
> > In effect Postfix gets response "5.1.1 Mailbox not found" to all requests,
> > including root.
>
> The option unixhierarchysep changes the hierarchy separator from . to /
> The . is represented as ^ in filesystem.
>
> > I can log in using IMAP.
> > I can't increase Cyrus logging.
>
> Cyrus logs everything to Syslog, you have to change your syslog config
> to change the loglevel.

Thank you Michael for the quick reply.

So "unixhierarchysep = yes" in my imapd.conf should change the dot (or
caret) to a slash? Unfortunately it doesn't, though it would indeed solve
my problem. Could it be overridden by the same variable set to "no" in
another file?

I receive a lot of logs from all Cyrus daemons, but I actually wanted to
turn on some kind of debugging (to find out why it doesn't search in the
user subdirectory). Dr Google told me about CYRUS_VERBOSE=1, but placing
it in /etc/init.d/cyrus-master changed nothing. Your advice about
unixhierarchysep probably makes me happy with the current level of
logging.

Thank you
Artur
Re: De-duping attachments
> On Wed, September 15, 2010 10:01 am, Simon Matter wrote:
>
>> I guess much more efficient than a compressing filesystem would be a
>> compressing and de-duping filesystem or disk storage in this case. Has
>> anyone tried this with a Cyrus message store with lots of "corporate
>> message data" stored on it?
>
> Simon,
>
> The Cyrus server I hope to get online tomorrow evening holds 4.2 TB of
> mail and uses ZFS with maximal compression (gzip9) for the message files.
> (OS : Solaris 10)
>
> ZFS reports a compressratio of between 1.95 and 1.97 (we have nine
> partitions)
>
> A series of tests revealed our metadata can actually be compressed by a
> factor of 3.76 (!)
>
> Perhaps a two-university environment with 60,000+ users doesn't quite
> qualify as "corporate" enough but here you have our figures :-)

Eric, that looks interesting, of course. By more "corporate" style I mean
far fewer users but much bigger mailboxes. "Enforcing" quotas in the
multi-GB range seems quite common these days. In such an environment I
expect the compression ratio to increase. But the big question for me is
how much filesystem / block level deduping is going to shrink it.

You said ZFS, did you consider testing its built in deduping? (If it's
even there in Solaris 10?)

Simon
Re: Mailbox directory structure
Quoting Artur Kaminski :

> Hey all,
>
> I installed imapd server successfully, and moved configuration from old one,
> but then accidentally loaded another server's configuration from puppet. Now
> Cyrus looks for user mailboxes in
>
> User Mailbox
> /var/spool/imap/a/
> user. /var/spool/imap/u/user^
>
> (checked using cyradmin, with creating the mailboxes above)
>
> Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
> (moved from old server).
>
> In effect Postfix gets response "5.1.1 Mailbox not found" to all requests,
> including root.

The option unixhierarchysep changes the hierarchy separator from . to /
The . is represented as ^ in the filesystem.

> I can log in using IMAP.
> I can't increase Cyrus logging.

Cyrus logs everything to syslog, so you have to change your syslog
config to change the log level.

Michael

M.Menge                          Tel.: (49) 7071/29-70316
Universität Tübingen             Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung    mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen
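As a rough illustration of the separator rule described above, here is a small Python model of the name translation only. It assumes that with unixhierarchysep enabled a client-visible "/" becomes the stored "." separator and a literal "." becomes "^". The function names are hypothetical, and the sketch does not attempt to reproduce Cyrus's on-disk directory hashing (hashimapspool, fulldirhash).

```python
# Simplified model of mailbox naming with unixhierarchysep enabled.
_TO_INTERNAL = str.maketrans("/.", ".^")
_TO_EXTERNAL = str.maketrans(".^", "/.")


def to_internal(external_name):
    """Translate a client-visible mailbox name to its stored form."""
    return external_name.translate(_TO_INTERNAL)


def to_external(internal_name):
    """Translate a stored mailbox name back to the client-visible form."""
    return internal_name.translate(_TO_EXTERNAL)


print(to_internal("user/john.doe/Sent"))
print(to_external("user.john^doe.Sent"))
```

This would explain a "^" appearing in a spool path for any user whose login name contains a dot.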
Re: De-duping attachments
Great thread.

Here are some real world numbers based on our spools here at BU. One of
our masters has 4,800 users, 22,000 mailboxes, and is using about 374G of
disk. Based on the md5 files for these users there are 6,046,363
messages. If I look at the first md5 value (md5 on the msg if I
understand this) and sort and uniq I get 5,891,974 messages, so assuming
we dedup all those messages that would be a shrink to 97.4% of the
original number of messages. Assuming an even distribution of message
sizes this would mean 374G would drop down to about 364.4G.

Unfortunately not an obvious huge win. But, I think the md5 of the
message file includes headers, which are more likely to be unique than
the body content. (Due to legacy support for UW IMAP, we often end up
routing things differently for users on the same master so the headers
for the same message sent to 2 people could be different).

Isn't the easy hack for dedup just looking at the above md5 files and
then doing appropriate hard links? This could be done by a nightly trawl
of the spool space. A bigger win would be to separate the headers from
the messages but that's a lot more work.

-nik
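The estimate above can be reproduced with a few lines of Python, using the unrounded ratio of unique to total messages. The variable names are chosen for this sketch. Like the message itself, it assumes that message sizes are evenly distributed.

```python
total_messages = 6_046_363
unique_messages = 5_891_974
spool_gb = 374

# Fraction of messages that would remain after whole-message dedup.
ratio = unique_messages / total_messages

# Assuming evenly distributed sizes, disk usage scales by the same ratio.
deduped_gb = spool_gb * ratio
saved_gb = spool_gb - deduped_gb

print(f"unique share of messages: {ratio:.1%}")
print(f"estimated spool after dedup: {deduped_gb:.2f}G")
print(f"estimated saving: {saved_gb:.2f}G")
```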
Replication sync-server and Delayed Delete
Hi there,

We have a cyrus murder using replication and we have a few questions
about the behaviour we are seeing on our system.

1. cyr_expire on the master doesn't cause any replication to happen. Is
that 'correct'? In other words if we want to delete folders from the
DELETED hierarchy on the replicant then we need to also run cyr_expire
on the replicant?

2. We're also a little unclear about replication vis a vis the delayed
expunge and the unexpunge facility. Could you explain what ought to
happen in terms of replication when email is expunged and then possibly
unexpunged if anything?

3. We are seeing a strange anomaly on the replication of deleting a
folder, e.g.
   - a user deletes a folder
   - the folder goes into the DELETED hierarchy of the partition the
     user's mailbox is on
   - the folder is also deleted from the replicant as we would expect
   - however the folder on the replicant goes into the DELETED hierarchy
     on a different partition (the default partition as specified in
     imapd.conf).
   Is this normal?
many thanks, Gavin Gray name : Cyrus IMAPD version: v2.3.15 2009/09/09 12:35:48 vendor : Project Cyrus support-url: http://cyrusimap.web.cmu.edu os : SunOS os-version : 5.11 environment: Built w/Cyrus SASL 2.1.23 Running w/Cyrus SASL 2.1.23 Built w/Berkeley DB 4.7.25: (May 15, 2008) Running w/Berkeley DB 4.7.25: (May 15, 2008) Built w/OpenSSL 0.9.8a 11 Oct 2005 (+ security fixes for: CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339 CVE-2006-4343 CVE-2007-3108 CVE-2007-4995 CVE-2007-5135 CVE-2008-5077 CVE-2009-0590) Running w/OpenSSL 0.9.8a 11 Oct 2005 (+ security fixes for: CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339 CVE-2006-4343 CVE-2007-3108 CVE-2007-4995 CVE-2007-5135 CVE-2008-5077 CVE-2009-0590) Built w/zlib 1.2.3 Running w/zlib 1.2.3 CMU Sieve 2.3 NET-SNMP mmap = shared lock = fcntl nonblock = fcntl idle = poll -- Gavin Gray Edinburgh University Information Services Rm 2013 JCMB Kings Buildings Edinburgh EH9 3JZ UK tel +44 (0)131 650 5987 email gavin.g...@ed.ac.uk -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Mailbox directory structure
Hey all,

I installed imapd server successfully, and moved configuration from old
one, but then accidentally loaded another server's configuration from
puppet. Now Cyrus looks for user mailboxes in

User Mailbox
/var/spool/imap/a/
user. /var/spool/imap/u/user^

(checked using cyradmin, with creating the mailboxes above)

Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
(moved from old server).

In effect Postfix gets response "5.1.1 Mailbox not found" to all
requests, including root.

I can log in using IMAP.
I can't increase Cyrus logging.

Any help/inspiration is appreciated :-)

Regards
Artur

/etc/imapd.conf:

configdirectory: /var/lib/imap
defaultpartition: default
partition-default: /var/spool/imap
altnamespace: no
unixhierarchysep: yes
admins: cyrus
sieve_admins: cyrus
allowanonymouslogin: no
popminpoll: 0
autocreatequota: 0
umask: 077
sieveusehomedir: false
sievedir: /var/spool/sieve
hashimapspool: true
fulldirhash: yes
allowplaintext: yes
sasl_mech_list: PLAIN LOGIN
sasl_pwcheck_method: saslauthd
sasl_auto_transition: no
tls_ca_path: /etc/ssl/certs
tls_session_timeout: 1440
tls_cipher_list: TLSv1:SSLv3:SSLv2:!NULL:!EXPORT:!DES:!LOW:@STRENGTH
lmtpsocket: /var/lib/imap/socket/lmtp
idlesocket: /var/run/cyrus/socket/idle
notifysocket: /var/run/cyrus/socket/notify
tls_cert_file: /etc/ssl/certs/corp.crt
tls_key_file: /etc/ssl/certs/corp.key
Re: De-duping attachments
> Makes sense. There might be some size based logic here too - only > bother applying this on messages over 20k, and where the attachment > is at least 20k in size. Anything smaller than that is pretty > pointless. Yes, absolutely. Left to myself, I'd not have bothered with any attachment less than 100KBytes or so. The stuff that gets my goat is seeing our customers using email to shunt 20MB CAD files back and forth across the world two dozen times. Emails are being used for the kind of work God had meant trucks to do. :( > Sure. Ideas are good :) I don't think I'm sold on the value though. > And given that Rob is actually the one who argued me down from > implementing this years ago ;) But maybe our use case isn't the > same as yours. Let me get some hard data from a few of our large corporate clients' servers, and then we'll talk again. May take a couple of weeks to get this data, because we'll need to look for a time window when the mail server is less loaded to run our scan. Shuvam Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: Importing/moving an older cyrus message tree into a new system, without IMAP
> Annotations are defined in RFC 5257. > > They allow an admin to add metadata to a mailbox (or the server). The > cyradm utility sets annotations with its internal info, mboxcfg, and > setinfo commands. Okay, checked. Don't know where these things are used, other than expiry and sieve, but at least I got the basics. > >What are mailbox keys? > > It's for URLAUTH. See RFC 4467, and: > > http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/internal/database-formats.php Thanks for the pointer. Still reading. Shuvam Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
The sparse file idea is brilliant! Never occurred to me. :) We'd have to store the reference-pointer in the message file, so we would omit the actual attachment but eat up perhaps 50 bytes to keep the reference to the file. Shuvam > 1. Completely rewrite the message file removing the attachments and > adding any extra meta data you want in it's place > 2. Leave the message file as exactly the same size, just don't write > out the attachment content and assume your filesystem supports > sparse files (http://en.wikipedia.org/wiki/Sparse_file) > > The advantage of 2 is that it leaves the message file size correct, > and all the offsets in the file are still correct. The downsides are > that you must ensure your FS supports sparse files well, and there's > the question of where do you actually store the information that > links to the external file? Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
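For anyone who has not met sparse files before, here is a minimal Python sketch of option 2: write the parts of the message you keep and seek past the attachment region instead of writing it. The helper name `write_with_hole` is made up for this example. Whether the skipped range really occupies no disk blocks depends on the filesystem, and the question of where to store the reference to the external copy is left open here, as it is in the thread.

```python
import os


def write_with_hole(path, head, hole_size, tail):
    """Write head and tail, leaving hole_size unwritten bytes between them.

    The file keeps its full logical size, so offsets into the tail stay
    the same as in the original message file.
    """
    with open(path, "wb") as handle:
        handle.write(head)
        handle.seek(hole_size, os.SEEK_CUR)  # skip the attachment region
        handle.write(tail)


def logical_and_allocated(path):
    """Return (logical size, allocated bytes) for a file on Unix systems."""
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512
```

Reading the skipped region back returns zero bytes, so any code that reads the file directly would have to know the range is a placeholder rather than real content.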
Re: De-duping attachments
Dear Bron, > So you save, what, 50%. Does that sound about right? Do you have > statistics on how much space you'd save with this theoretical > patch? No, and this is the first thing I want to do. I'm getting some simple utilities developed which will run all week (niced suitably) and extract and MD5sum each attachment. I'll then count how many unique message-IDs have the same unique document, and I'll get a report. This has been under discussion in our group for some time --- let me get this done and I'll let all of you know. > You're buying a few months. Usage grows to fill the available storage, > whatever it is. And you can only pull this piece of magic once. Unfortunately, you're totally right. The junk will keep growing. Shuvam Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
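A scan of the kind described above could be prototyped along these lines. This is only a sketch using Python's standard email package, not the utilities mentioned in the message. It assumes each message is available as raw bytes, and the function names are invented for the example. It reports, for each attachment digest, how many distinct Message-IDs carry it.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_digests(raw_message):
    """Return (Message-ID, [MD5 hex digest of each attachment])."""
    message = message_from_bytes(raw_message, policy=policy.default)
    digests = []
    for part in message.iter_attachments():
        payload = part.get_payload(decode=True)  # decoded attachment bytes
        if payload is not None:
            digests.append(hashlib.md5(payload).hexdigest())
    return message["Message-ID"], digests


def duplicate_report(raw_messages):
    """Map attachment digest -> number of distinct messages carrying it.

    Only digests seen in more than one message are reported.
    """
    seen = defaultdict(set)
    for raw in raw_messages:
        message_id, digests = attachment_digests(raw)
        for digest in digests:
            seen[digest].add(message_id)
    return {d: len(ids) for d, ids in seen.items() if len(ids) > 1}


def build_message(message_id, attachment):
    """Build a small test message with one binary attachment."""
    message = EmailMessage()
    message["Message-ID"] = message_id
    message["Subject"] = "drawing attached"
    message.set_content("See attached.")
    message.add_attachment(attachment, maintype="application",
                           subtype="octet-stream", filename="drawing.bin")
    return message.as_bytes()
```

Hashing the decoded payload rather than the encoded MIME part means the same file counts as a duplicate even when two mail clients wrap the base64 lines differently.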
Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.
On Wed, September 15, 2010 9:27 am, Simon Matter wrote: >> I am having trouble converting a quota skiplist db back to quotalegacy >> format (I know... this is probably not the most common Cyrus operation :-) >> >> % cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota >> quotalegacy Converting from /ssd/cyrs/imap/quotas.db (skiplist) to >> /ssd/cyrs/imap/quota >> (quotalegacy) >> % find quota -type f | wc -l >> 126 >> % strings quotas.db|wc -l >> 135229 >> >> >> >> "quotas.db" was created using the reverse operation and took about one >> minute. I renamed the original 'quota' directory out of the way before making >> the second cvt_cyrusdb call. >> >> Closer inspection of the newly created 'quota' directory reveals 125 quota >> descriptor files named user.aXX created under 'a', all relating to >> existing top level mailboxes and containing the correct information, and >> (curiously) one file named 'u' in directory 'u'. >> >> >> I also tried a Berkeley DB intermediate format and the creation >> of the quotalegacy structure failed in an identical way. > > I expect this to be a bug in the way cvt_cyrusdb calls the quotalegacy > backend, or in the backend itself. And I guess you are the first to test it > out > :) > Could test it with different dirhashing options to found out how exactly > it fails? Simon, Yes, one could, but this is not my number one priority right now. Converting from skiplist to flat takes two minutes and, since the "quotalegacy" structure/format is trivial, a few lines of scripting will do the rest, if ever needed. Regards, Eric Luyten, Computing Centre VUB/ULB. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
On Wed, September 15, 2010 10:01 am, Simon Matter wrote: > I guess much more efficient than a compressing filesystem would be a > compressing and de-duping filesystem or disk storage in this case. Has anyone > tried this with a Cyrus message store with lots of "corporate message data" > stored on it? Simon, The Cyrus server I hope to get online tomorrow evening holds 4.2 TB of mail and uses ZFS with maximal compression (gzip9) for the message files. (OS : Solaris 10) ZFS reports a compressratio of between 1.95 and 1.97 (we have nine partitions) A series of tests revealed our metadata can actually be compressed by a factor of 3.76 (!) Perhaps a two-university environment with 60,000+ users doesn't quite qualify as "corporate" enough but here you have our figures :-) Regards, Eric. Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
Re: De-duping attachments
> On Wed, Sep 15, 2010 at 09:15:13AM +0530, Shuvam Misra wrote:
>> Dear Bron,
>>
>> > http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>> >
>> > 2TB - US $109.
>>
>> Don't want to nit-pick here, but the effective price we pay is about
>> ten times this.
>
> Yeah, so? It's going down. That's a large number of attachments
> we're talking about there.
>
>> To set up a mail server with a few TB of disk space,
>> we usually land up deploying a separate chassis with RAID controllers
>> and a RAID array, with FC connections from servers, etc, etc. All this
>> adds up to about $1,000/TB of usable space if you're using something
>> like the "low-end" IBM DS3400 box or Dell/EMC equivalent. This is even
>> with inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives.
>
> Hmm... our storage units with metadata on SSD come in about $1200/TB.
> Yes, that sounds about right. That's including hot spares, RAID1 on
> everything (including the SSDs), scads of processor and memory.
> Obviously multiply that by two for replication, and add in a bit of
> extra for backups and I'm happy to arrive at a figure of approximately
> $3000 per terabyte of actual email.
>
>> And most of our customers actually double this cost because they keep
>> two physically identical chassis for redundancy. (We recommend this
>> too, because we can't trust a single RAID 5 array to withstand
>> controller or PSU failures.) In that case, it's $2000/TB.
>
> And because it's nice not to have downtime when you're doing
> maintenance. I replaced an entire drive unit today, including
> about 4 hours downtime on one of our servers as the system was
> swamped with IO creating new filesystems and initialising the
> drives. The users didn't see a thing, and replication is now
> fully operational again.
>
>> And you do reach 5-10 TB of email store quite rapidly --- our company
>> has many corporate clients (< 500 email users) whose IMAP store has
>> reached 4TB. No one wants to enforce disk quotas (corporate policy),
>> and most users don't want to delete emails on their own.
>
> So you save, what, 50%. Does that sound about right? Do you have
> statistics on how much space you'd save with this theoretical
> patch?
>
>> We keep hearing the logic that storage is cheap, and stories of cloud
>> storage through Amazon, unlimited mailboxes on Gmail, are reinforcing
>> the belief. But at the ground level in mid-market corporate IT budgets,
>> storage costs in data centres (as against inside desktops) are still
>> too high to be trivial, and their prices have only little to do with
>> the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little
>> over $12,000 in India, with a full set of 1TB SATA-II drives from IBM,
>> but even with high cost of IBM drives, the drives themselves contribute
>> less than 30% of the total cost.
>
> You're buying a few months. Usage grows to fill the available storage,
> whatever it is. And you can only pull this piece of magic once.
>
>> If we really want to put our collective money where our mouth is, and
>> deliver the storage-is-cheap promise at the ground level, we need to
>> rearchitect every file server and IMAP server to work in map-reduce mode
>> and use disks inside desktops. Anyone game for this project? :)
>
> You could buy as much benefit much more quickly by gzipping the
> individual email files. Either a filesystem that stores files
> compressed, or a cyrus patch to do that and unpack files on the
> fly if the body was read. Along with most/all headers in the

I guess much more efficient than a compressing filesystem would be a
compressing and de-duping filesystem or disk storage in this case. Has
anyone tried this with a Cyrus message store with lots of "corporate
message data" stored on it?

Simon
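The "gzip each message file and unpack on the fly" idea quoted above can be tried out on sample data with a few lines of Python. This is a toy sketch with invented function names, not a Cyrus patch. The ratio it reports depends entirely on the messages you feed it.

```python
import gzip


def store_compressed(raw_message):
    """Compress one message with gzip at the highest level."""
    return gzip.compress(raw_message, compresslevel=9)


def load_message(stored):
    """Unpack a stored message when its body is actually read."""
    return gzip.decompress(stored)


def compress_ratio(raw_message):
    """Original size divided by compressed size for one message."""
    return len(raw_message) / len(store_compressed(raw_message))
```

Attachments are normally base64-encoded inside the message file, so their compressibility will differ from that of plain-text headers and bodies. It is worth measuring on a real sample before drawing conclusions.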
Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.
> Hello,
>
> I am having trouble converting a quota skiplist db back to quotalegacy
> format (I know... this is probably not the most common Cyrus operation :-)
>
> % cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota quotalegacy
> Converting from /ssd/cyrs/imap/quotas.db (skiplist) to /ssd/cyrs/imap/quota (quotalegacy)
> % find quota -type f | wc -l
> 126
> % strings quotas.db|wc -l
> 135229
>
> "quotas.db" was created using the reverse operation and took about one
> minute.
> I renamed the original 'quota' directory out of the way before making the
> second cvt_cyrusdb call.
>
> Closer inspection of the newly created 'quota' directory reveals 125 quota
> descriptor files named user.aXX created under 'a', all relating to
> existing top level mailboxes and containing the correct information, and
> (curiously) one file named 'u' in directory 'u'.
>
> I also tried a Berkeley DB intermediate format and the creation
> of the quotalegacy structure failed in an identical way.

I expect this to be a bug in the way cvt_cyrusdb calls the quotalegacy
backend, or in the backend itself. And I guess you are the first to test
it out :)

Could you test it with different dirhashing options to find out how
exactly it fails?

Simon

> Other question : would I be better off with 65,000 small files
> (quotalegacy) in a one-level hash or with a single skiplist db
> for my quota information, when the files reside on solid state
> storage anyway ?
>
> Thx,
> Eric Luyten, Computing Centre VUB/ULB.