Re: New Cyrus project site and bugzilla

2010-09-15 Thread Matt Selsky
On Mon, 13 Sep 2010, Mark Cave-Ayland wrote:

> (On a separate note, if I go to Downloads -> Getting Started and click
> on the "AnonymousCVS" wiki link then I get redirected back to the front
> page rather than to a page giving information on how to access CVS)

Fixed.

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/


Re: De-duping attachments

2010-09-15 Thread Bron Gondwana
On Wed, Sep 15, 2010 at 05:24:11PM +0100, Gavin McCullagh wrote:
> Hi,
> 
> On Wed, 15 Sep 2010, Nik Conwell wrote:
> 
> > Isn't the easy hack for dedup just looking at the above md5 files and 
> > then doing appropriate hard links?  This could be done by a nightly 
> > trawl of the spool space.  A bigger win would be to separate the headers 
> > from the messages but that's a lot more work.
> 
> For what it's worth, I believe the findup tool which is part of fslint will
> do this for you.
> 
>   http://www.pixelbeat.org/fslint/

Or this lovely little toy.  It uses the fact that in current versions of
Cyrus the "GUID" field is actually the sha1 of the underlying file.

Bron (warning: may contain FastMail-specific assumptions)
#!/usr/bin/perl
# Hard-link together identical message files within each user's mailboxes,
# using the per-message GUID (the SHA1 of the message file) from cyrus.index.

# SETUP {{{
use strict;
use warnings;
BEGIN { do "/home/mod_perl/hm/ME/FindLibs.pm"; }
use Date::Manip;
use MailApp::Admin::Actions;
use IO::File;
use ME::Machine;
use Cyrus::HeaderFile;
use Cyrus::IndexFile;
use Data::Dumper;
use Getopt::Std;
use Digest::SHA1;
use ME::CyrusBackup;
use ME::User;
# }}}

# Optional argument: only process the named IMAP slot (FastMail-specific).
my $sn = shift;

# Relinked files must stay owned by the cyrus user.
my (undef, undef, $uid, $gid) = getpwnam('cyrus');
defined $uid or die "No 'cyrus' user found\n";

foreach my $Slot (ME::Machine->ImapSlots()) {
  next if ($sn and $sn ne $Slot->Name());
  my $users = $Slot->AllMailboxes();
  my $conf = $Slot->ImapdConf();
  foreach my $user (sort keys %$users) {
    process($conf, $user, $users->{$user});
  }
}

sub process {
  my ($conf, $user, $folders) = @_;
  print "$user\n";

  # Map each message GUID to every [folder, uid] that holds a copy.
  my %ihave;
  foreach my $folder (@$folders) {
    my $meta = $conf->GetUserLocation('meta', $user, 'default', $folder);
    my $index = Cyrus::IndexFile->new_file("$meta/cyrus.index")
      || die "Failed to open $meta/cyrus.index";
    while (my $record = $index->next_record()) {
      push @{$ihave{$record->{MessageGuid}}}, [$folder, $record->{Uid}];
    }
  }

  foreach my $guid (keys %ihave) {
    next if @{$ihave{$guid}} <= 1;
    my ($inode, $srcname);
    my @others;
    foreach my $item (@{$ihave{$guid}}) {
      my $spool = $conf->GetUserLocation('spool', $user, 'default', $item->[0]);
      $spool =~ s{/$}{};
      # Cyrus stores each message as "<uid>." in the folder's spool directory.
      my $file = "$spool/$item->[1].";
      my @sd = stat($file);
      unless (@sd) {
        print "stat error $file: $!\n";
        next;
      }
      if (defined $inode) {
        next if $sd[1] == $inode;    # already a hard link to the source
        push @others, $file;
      }
      else {
        # The first copy found becomes the link source.
        $inode = $sd[1];
        $srcname = $file;
      }
    }
    next unless @others;
    print "fixing up files for $guid ($srcname)\n";
    foreach my $file (@others) {
      # Link to a temporary name, then rename over the duplicate so the
      # message file is replaced atomically.
      my $tmpfile = $file . "tmp";
      unless (link($srcname, $tmpfile)) {
        print "link error $tmpfile: $!\n";
        next;
      }
      chown($uid, $gid, $tmpfile);
      chmod(0600, $tmpfile);
      print "rename error $file: $!\n" unless rename($tmpfile, $file);
    }
  }
}


Re: De-duping attachments

2010-09-15 Thread Patrick Goetz
On 09/14/2010 11:55 PM, Rob Mueller wrote:
>
> Eg. An architectural firm
> might end up sending big blueprint documents back and forth between each
> other a lot, so they'd gain a lot from deduplication.
>

Not to throw a damp towel on this discussion, but isn't this really an 
administrative problem rather than a technical one?  I.e. shouldn't the 
system administrator set up a version control system or even something 
like Dropbox for file sharing rather than using email for this situation?

> if you know the same file is being sent back and forth a lot with
> minor changes, you might want to store the most "recent" version,
> and store binary diffs between the most recent and old versions
> (eg xdelta). Yes accessing the older versions would be much
> slower (have to get most recent + apply N deltas), but the space
> savings could be huge.


My users frequently mail documents to the person in the office next door 
(never mind that both their home directories are on the same server!); 
however this content is almost always different for each attached file; 
i.e. without re-implementing a version control system under IMAP, as 
you're suggesting, there would be little benefit in keeping and hard 
linking to a single copy of each file.  However, that seems like it 
fails the UNIX "do one thing, and do it well" test pretty badly.



Re: sync-server without deletes?

2010-09-15 Thread Henrique de Moraes Holschuh
On Wed, 15 Sep 2010, Shuvam Misra wrote:
> > Well, unless you have users delivering mail to each other through IMAP
> > on shared folders, one usually configures the MTAs to drop a copy of
> > everything into a system mailbox...
> 
> Yes, this is what we do too. We have a milter in Sendmail which adds an
> envelope recipient for each mail passing through Sendmail. This new

In Postfix, this is a built-in feature and it is very powerful.  You can
"always_bcc" to some fixed address(es).  Recent versions can use lookup maps
to have different "always_bcc" addresses keyed to the original message
recipient or original message sender...

http://www.postfix.org/ADDRESS_REWRITING_README.html#auto_bcc

But this is getting off-topic :-)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Re: De-duping attachments

2010-09-15 Thread Gavin McCullagh
Hi,

On Wed, 15 Sep 2010, Nik Conwell wrote:

> Isn't the easy hack for dedup just looking at the above md5 files and 
> then doing appropriate hard links?  This could be done by a nightly 
> trawl of the spool space.  A bigger win would be to separate the headers 
> from the messages but that's a lot more work.

For what it's worth, I believe the findup tool which is part of fslint will
do this for you.

http://www.pixelbeat.org/fslint/

Gavin





Re: De-duping attachments

2010-09-15 Thread Joseph Brennan

Outside the cyrus box:  The MIMEDefang milter has a built-in function
(optional of course) to remove an attachment, write it to a file, and
replace the attachment part with a text part giving a web link to the
file.  The files could be on a slower type of disk drive than you need
for email storage.  You could write code choosing which attachments to
do this to, say by size or file extension.  A mechanism to remove the
files is not provided, but it's suggested that recipients would need
to download the attachment to their own computer and that therefore
the files could be deleted by a cron job based on age.

I mention this only as another way to do it.  Note that this could be
implemented for outgoing mail too.  We have not implemented it here
so I can't say more than that it is possible.

Joseph Brennan
Columbia University Information Technology




Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.

2010-09-15 Thread Bron Gondwana
On Tue, Sep 14, 2010 at 11:58:03PM +0200, Eric Luyten wrote:
> Hello,
> 
> 
> I am having trouble converting a quota skiplist db back to quotalegacy
> format (I know... this is probably not the most common Cyrus operation :-)

Yeah, odd!  I wonder what's going on there.  I'll take a look.

> Other question : would I be better off with 65,000 small files
> (quotalegacy) in a one-level hash or with a single skiplist db
> for my quota information, when the files reside on solid state
> storage anyway ?

65k small files :)  Fewer locking issues.

Bron.



Re: De-duping attachments

2010-09-15 Thread Eric Luyten
On Wed, September 15, 2010 2:12 pm, Simon Matter wrote:

> You said ZFS, did you
> consider testing its built in deduping?
> (If it's even there in Solaris 10?)

Simon,


OpenSolaris has had it (block-level dedup) for about a year,
but it is too recent an addition to the commercial Solaris 10 to
start using it (IMO). Apparently (per Wikipedia) it is ZFS pool
version 21, which 'zpool upgrade -v' lists as 'Reserved'.
(Hmm... neither 'zfs get all ...' nor 'zpool get all ...' yields
 a parameter that sounds like 'deduplication'; it may very well
 not be there yet.)

Furthermore, I'd like to repeat what has been written earlier in
this thread : a message header that is different in size by even
one byte will cause block boundaries to shift and, I suspect, block
level dedup to fail.
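A minimal Python sketch of why a shifted header defeats block-level dedup, assuming a simple model in which each message file is cut into fixed-size blocks starting at offset 0 and identical blocks are stored once. The 4096-byte block size and the sample headers are illustrative assumptions; ZFS's real record size and dedup table differ in detail.

```python
import hashlib
import random


def block_hashes(data, block_size):
    """Hash each fixed-size block, counting from the start of the file."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def shared_blocks(a, b, block_size=4096):
    """Count the blocks of `b` that a fixed-block dedup layer could share with `a`."""
    seen = set(block_hashes(a, block_size))
    return sum(1 for h in block_hashes(b, block_size) if h in seen)


# Stand-in for a 256 KiB attachment body (deterministic pseudo-random bytes).
n = 256 * 1024
attachment = random.Random(0).getrandbits(8 * n).to_bytes(n, "little")

hdr_a = b"Received: from mx1.example.org\r\n\r\n"
hdr_b = b"Received: from mx2.example.org\r\n\r\n"   # same length, one byte differs
hdr_c = b"Received: from mx10.example.org\r\n\r\n"  # one byte longer

same_len = shared_blocks(hdr_a + attachment, hdr_b + attachment)
shifted = shared_blocks(hdr_a + attachment, hdr_c + attachment)
print(same_len, shifted)  # 64 0
```

With equal-length headers only the first block differs; a one-byte length change shifts every block boundary and nothing is shared. Content-defined (rolling-hash) chunking is the usual way around this, but it is a different technique from the fixed-block model sketched here.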


Eric Luyten.




Re: Replication sync-server and Delayed Delete

2010-09-15 Thread Bron Gondwana
On Wed, Sep 15, 2010 at 12:29:18PM +0100, Gavin Gray wrote:
> Hi there,
> 
> We have a cyrus murder using replication and we have a few questions
> about the behaviour we are seeing on our system.
> 
> 1. cyr_expire on the master doesn't cause any replication to happen.
> Is that 'correct'? In other words if we want to delete folders from
> the DELETED hierarchy on the replicant then we need to also run
> cyr_expire on the replicant?

Yeah, pretty much.
 
> 2. We're also a little unclear about replication vis a vis the
> delayed expunge and the unexpunge facility. Could you explain what
> ought to happen in terms of replication when email is expunged and
> then possibly unexpunged if anything?

It's a bit messy.  Unexpunge is a sin against IMAP by the way, and
has been replaced with "generate new UID and promote" in 2.4.  In
which case it's just like a new append with the same flags, and
replicates like an append :)

2.3 replication ignores expunges - it's as if they don't exist!  When
the mailbox syncs, it nukes the records that aren't "alive" on the
master from the replica.  If you re-inject them with unexpunge, it
should find them and sync_combine_commit() the result.  I don't know
if unexpunge inserts replication events though - somewhat doubt it.

> 3. We are seeing a strange anomaly on the replication of deleting a folder.
>    e.g. a user deletes a folder
>    the folder goes into the DELETED hierarchy of the partition
>        the user's mailbox is on
>    the folder is also deleted from the replicant as we would expect
>    however the folder on the replicant goes into the DELETED
>        hierarchy on a different partition (the default partition as
>        specified in cyrus.conf). Is this normal?

Replication and partitions is broken in some ways in 2.3.  It should
be better in 2.4 I believe, though I haven't tested it.  I'm going to
be releasing an alpha super-soon (it's been pushed to git.cyrusimap.org
now!)
 
Bron.



Re: Mailbox directory structure

2010-09-15 Thread Artur Kaminski
On Wed, 15 Sep 2010 14:02:38 +0200, Michael Menge wrote:

> Quoting Artur Kaminski:
>
> > Hey all,
> >
> > I installed imapd server successfully, and moved configuration from old one,
> > but then accidentally loaded another server's configuration from puppet. Now
> > Cyrus looks for user mailboxes in
> >
> > User   Mailbox
> >    /var/spool/imap/a/
> > user.  /var/spool/imap/u/user^
> >
> > (checked using cyradm, by creating the mailboxes above)
> >
> >
> > Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
> > (moved from old server).
> >
> > In effect Postfix gets response "5.1.1 Mailbox not found" to all requests,
> > including root.
> >
> The option unixhierarchysep changes the hierarchy separator from . to /
> The . is represented as ^ in the filesystem.
>
>
> > I can log in using IMAP.
> > I can't increase Cyrus logging.
>
> Cyrus logs everything to syslog; you have to change your syslog config
> to change the log level.
>
>
>
Thank you, Michael, for the quick reply.

So "unixhierarchysep = yes" in my imapd.conf should change the dot or caret
to a slash?

Unfortunately it doesn't, but indeed it would solve my problem. Can it be
overridden by the same variable set to "no" in another file?


I receive a lot of logs from all Cyrus daemons, but I actually wanted to
turn on some kind of debugging (to find out why it doesn't look in the user
subdirectory). Dr Google told me about CYRUS_VERBOSE=1, but placing it in
/etc/init.d/cyrus-master changed nothing.

Your advice about unixhierarchysep probably makes me happy with the current
level of logging.




Thank you
Artur


Re: De-duping attachments

2010-09-15 Thread Simon Matter
> On Wed, September 15, 2010 10:01 am, Simon Matter wrote:
>
>> I guess much more efficient than a compressing filesystem would be a
>> compressing and de-duping filesystem or disk storage in this case. Has
>> anyone tried this with a Cyrus message store with lots of "corporate
>> message data" stored on it?
>
>
> Simon,
>
>
> The Cyrus server I hope to get online tomorrow evening holds 4.2 TB of
> mail and uses ZFS with maximal compression (gzip9) for the message files.
> (OS : Solaris 10)
>
> ZFS reports a compressratio of between 1.95 and 1.97 (we have nine
> partitions)
>
> A series of tests revealed our metadata can actually be compressed by a
> factor of 3.76 (!)
>
> Perhaps a two-university environment with 60,000+ users doesn't quite
> qualify as "corporate" enough but here you have our figures :-)

Eric, that does look interesting. By "corporate" style I mean far
fewer users but much bigger mailboxes. "Enforcing" quotas in the
multi-GB range seems quite common these days. In such an environment I
expect the compression ratio to increase.
But the big question for me is how much filesystem / block-level deduping
is going to shrink it. You said ZFS, did you consider testing its built-in
deduping? (If it's even there in Solaris 10?)

Simon




Re: Mailbox directory structure

2010-09-15 Thread Michael Menge

Quoting Artur Kaminski:

> Hey all,
>
> I installed imapd server successfully, and moved configuration from old one,
> but then accidentally loaded another server's configuration from puppet. Now
> Cyrus looks for user mailboxes in
>
> User   Mailbox
>    /var/spool/imap/a/
> user.  /var/spool/imap/u/user^
>
> (checked using cyradm, by creating the mailboxes above)
>
>
> Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
> (moved from old server).
>
> In effect Postfix gets response "5.1.1 Mailbox not found" to all requests,
> including root.

The option unixhierarchysep changes the hierarchy separator from . to /
The . is represented as ^ in the filesystem.

> I can log in using IMAP.
> I can't increase Cyrus logging.

Cyrus logs everything to syslog; you have to change your syslog config
to change the log level.


  Michael



M. Menge                        Tel.: (49) 7071/29-70316
Universität Tübingen            Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung   mail: michael.me...@zdv.uni-tuebingen.de

Wächterstraße 76
72074 Tübingen



Re: De-duping attachments

2010-09-15 Thread Nik Conwell
Great thread.  Here are some real-world numbers based on our spools 
here at BU.

One of our masters has 4,800 users, 22,000 mailboxes, and is using about 
374G of disk.

Based on the md5 files for these users there are 6,046,363 messages.  If 
I look at the first md5 value (md5 on the msg if I understand this) and 
sort and uniq I get 5,891,974 messages, so assuming we dedup all those 
messages that would be a shrink to 97.4% of the original number of 
messages.  Assuming an even distribution of message sizes this would 
mean 374G would drop down to about 364.5G.  Unfortunately not an obvious huge
win.
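The arithmetic above as a small Python sketch. It assumes each line of the md5 files begins with the message's hex MD5 followed by whitespace (an assumption about the file layout), and, as above, that message sizes are evenly distributed.

```python
def unique_first_fields(lines):
    """Rough equivalent of `cut -d' ' -f1 | sort | uniq | wc -l` on md5-file lines."""
    return len({line.split()[0] for line in lines if line.strip()})


def dedup_estimate(total_msgs, unique_msgs, used_gb):
    """Fraction of messages kept after dedup, and projected disk usage,
    assuming every message is the same size."""
    ratio = unique_msgs / total_msgs
    return ratio, used_gb * ratio


ratio, projected = dedup_estimate(6_046_363, 5_891_974, 374)
print(f"{ratio:.1%} of messages kept, {projected:.1f}G projected")
# 97.4% of messages kept, 364.5G projected
```

Either way the saving is only about 2.5% of the spool, so the conclusion that this is not a huge win holds.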

But, I think the md5 of the message file includes headers which may be 
more likely to be unique over the body content.  (Due to legacy support 
for UW IMAP, we often end up routing things differently for users on the 
same master so the headers for the same message sent to 2 people could 
be different).

Isn't the easy hack for dedup just looking at the above md5 files and 
then doing appropriate hard links?  This could be done by a nightly 
trawl of the spool space.  A bigger win would be to separate the headers 
from the messages but that's a lot more work.
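A minimal Python sketch of that nightly trawl: hash every message file under a spool directory and hard-link identical ones together. Assumptions: message files are named `<uid>.` as in Cyrus spools, nothing is writing to the spool while it runs, and files on different filesystems are simply skipped. Because whole files are hashed, copies whose headers differ will not match, exactly as noted above. The function names are hypothetical.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha1_of(path, bufsize=1 << 20):
    """SHA-1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def message_files(root):
    """Yield Cyrus message files, which are named "<uid>." (e.g. "1234.")."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".") and name[:-1].isdigit():
                yield os.path.join(dirpath, name)


def dedup(root, dry_run=True):
    """Hard-link identical message files under `root`; return how many
    files were (or, with dry_run, would be) replaced by a link."""
    by_hash = defaultdict(list)
    for path in message_files(root):
        by_hash[sha1_of(path)].append(path)

    relinked = 0
    for paths in by_hash.values():
        src = paths[0]
        src_st = os.stat(src)
        for other in paths[1:]:
            st = os.stat(other)
            if (st.st_dev, st.st_ino) == (src_st.st_dev, src_st.st_ino):
                continue  # already the same inode
            if st.st_dev != src_st.st_dev:
                continue  # hard links cannot cross filesystems
            if not filecmp.cmp(src, other, shallow=False):
                continue  # paranoia: hash matched but bytes differ
            relinked += 1
            if dry_run:
                continue
            tmp = other + "tmp"
            os.link(src, tmp)       # new name for the source inode...
            os.replace(tmp, other)  # ...atomically swapped in over the duplicate
    return relinked
```

Running it first with `dry_run=True` gives a cheap estimate of how many files would be merged before touching anything.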

-nik




Replication sync-server and Delayed Delete

2010-09-15 Thread Gavin Gray
Hi there,

We have a cyrus murder using replication and we have a few questions about  
the behaviour we are seeing on our system.

1. cyr_expire on the master doesn't cause any replication to happen. Is  
that 'correct'? In other words if we want to delete folders from the  
DELETED hierarchy on the replicant then we need to also run cyr_expire on  
the replicant?

2. We're also a little unclear about replication vis a vis the  delayed  
expunge and the unexpunge facility. Could you explain what ought to happen  
in terms of replication when email is expunged and then possibly  
unexpunged if anything?

3. We are seeing a strange anomaly on the replication of deleting a folder.
e.g. a user deletes a folder
the folder goes into the DELETED hierarchy of the partition the
user's mailbox is on
the folder is also deleted from the replicant as we would expect
however the folder on the replicant goes into the DELETED hierarchy
on a different partition (the default partition as specified in
cyrus.conf). Is this normal?

many thanks,

Gavin Gray



name   : Cyrus IMAPD
version: v2.3.15 2009/09/09 12:35:48
vendor : Project Cyrus
support-url: http://cyrusimap.web.cmu.edu
os : SunOS
os-version : 5.11
environment: Built w/Cyrus SASL 2.1.23
   Running w/Cyrus SASL 2.1.23
   Built w/Berkeley DB 4.7.25: (May 15, 2008)
   Running w/Berkeley DB 4.7.25: (May 15, 2008)
   Built w/OpenSSL 0.9.8a 11 Oct 2005 (+ security fixes
for: CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339
CVE-2006-4343 CVE-2007-3108 CVE-2007-4995 CVE-2007-5135 CVE-2008-5077
CVE-2009-0590)
   Running w/OpenSSL 0.9.8a 11 Oct 2005 (+ security fixes
for: CVE-2006-2937 CVE-2006-2940 CVE-2006-3738 CVE-2006-4339
CVE-2006-4343 CVE-2007-3108 CVE-2007-4995 CVE-2007-5135 CVE-2008-5077
CVE-2009-0590)
   Built w/zlib 1.2.3
   Running w/zlib 1.2.3
   CMU Sieve 2.3
   NET-SNMP
   mmap = shared
   lock = fcntl
   nonblock = fcntl
   idle = poll



-- 
Gavin Gray
Edinburgh University Information Services
Rm 2013 JCMB
Kings Buildings
Edinburgh
EH9 3JZ
UK
tel +44 (0)131 650 5987
email gavin.g...@ed.ac.uk

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Mailbox directory structure

2010-09-15 Thread Artur Kaminski
Hey all,

I installed imapd server successfully, and moved configuration from old one,
but then accidentally loaded another server's configuration from puppet. Now
Cyrus looks for user mailboxes in

User   Mailbox
   /var/spool/imap/a/
user.  /var/spool/imap/u/user^

(checked using cyradm, by creating the mailboxes above)


Actual mailbox for hypothetical user is in /var/spool/imap/a/users/
(moved from old server).

In effect Postfix gets response "5.1.1 Mailbox not found" to all requests,
including root.

I can log in using IMAP.
I can't increase Cyrus logging.


Any help/inspiration is appreciated :-)




Regards
Artur






/etc/imapd.conf:

configdirectory: /var/lib/imap
defaultpartition: default
partition-default: /var/spool/imap
altnamespace: no
unixhierarchysep: yes
admins: cyrus
sieve_admins: cyrus
allowanonymouslogin: no
popminpoll: 0
autocreatequota: 0
umask: 077
sieveusehomedir: false
sievedir: /var/spool/sieve
hashimapspool: true
fulldirhash: yes
allowplaintext: yes
sasl_mech_list: PLAIN LOGIN
sasl_pwcheck_method: saslauthd
sasl_auto_transition: no
tls_ca_path: /etc/ssl/certs
tls_session_timeout: 1440
tls_cipher_list: TLSv1:SSLv3:SSLv2:!NULL:!EXPORT:!DES:!LOW:@STRENGTH
lmtpsocket: /var/lib/imap/socket/lmtp
idlesocket: /var/run/cyrus/socket/idle
notifysocket: /var/run/cyrus/socket/notify
tls_cert_file: /etc/ssl/certs/corp.crt
tls_key_file: /etc/ssl/certs/corp.key
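For reference while debugging this, a rough Python sketch of how a mailbox name maps to a spool directory with `hashimapspool` on. It models only the simple hashing rule as I understand Cyrus 2.3 to implement it (first letter of the name after the first dot, lower-cased, with 'q' for anything that is not a letter). It assumes `fulldirhash` is off, whereas the config above has `fulldirhash: yes`, which hashes the whole name differently and is not reproduced here; virtual domains are also ignored. All helper names are hypothetical.

```python
def internal_name(visible, unixhierarchysep=True):
    """Client-visible mailbox name -> internal name (always '.'-separated).

    With unixhierarchysep the client uses '/', and a literal '.' inside
    a name is stored as '^'.
    """
    if not unixhierarchysep:
        return visible
    return visible.replace(".", "^").replace("/", ".")


def hash_char(mailbox):
    """Simple (non-fulldirhash) rule: first letter after the first dot."""
    rest = mailbox.split(".", 1)[1] if "." in mailbox else mailbox
    c = rest[:1].lower()
    return c if "a" <= c <= "z" else "q"


def spool_path(partition, mailbox, hashimapspool=True):
    """Directory holding `mailbox` (internal name) on `partition`."""
    tail = mailbox.replace(".", "/")
    if hashimapspool:
        return f"{partition}/{hash_char(mailbox)}/{tail}"
    return f"{partition}/{tail}"


print(spool_path("/var/spool/imap", internal_name("user/john.doe")))
# /var/spool/imap/j/user/john^doe
```

Comparing what this predicts against where the old server actually put the mailboxes is one quick way to tell which hashing options the old configuration used.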


Re: De-duping attachments

2010-09-15 Thread Shuvam Misra
> Makes sense.  There might be some size based logic here too - only
> bother applying this on messages over 20k, and where the attachment
> is at least 20k in size.  Anything smaller than that is pretty
> pointless.

Yes, absolutely. Left to myself, I'd not have bothered with any
attachment less than 100KBytes or so. The stuff that gets my goat is
seeing our customers using email to shunt 20MB CAD files back and forth
across the world two dozen times. Emails are being used for the kind of
work God had meant trucks to do. :(

> Sure.  Ideas are good :)  I don't think I'm sold on the value though.
> And given that Rob is actually the one who argued me down from
> implementing this years ago ;)  But maybe our use case isn't the
> same as yours.

Let me get some hard data from a few of our large corporate clients'
servers, and then we'll talk again. May take a couple of weeks to get
this data, because we'll need to look for a time window when the mail
server is less loaded to run our scan.

Shuvam



Re: Importing/moving an older cyrus message tree into a new system, without IMAP

2010-09-15 Thread Shuvam Misra
> Annotations are defined in RFC 5257.
> 
> They allow an admin to add metadata to a mailbox (or the server). The
> cyradm utility sets annotations with its internal info, mboxcfg, and
> setinfo commands.

Okay, checked. Don't know where these things are used, other than expiry
and sieve, but at least I got the basics.

> >What are mailbox keys?
> 
> It's for URLAUTH. See RFC 4467, and:
> 
> http://www.cyrusimap.org/docs/cyrus-imapd/2.3.16/internal/database-formats.php

Thanks for the pointer. Still reading.

Shuvam



Re: De-duping attachments

2010-09-15 Thread Shuvam Misra

The sparse file idea is brilliant! Never occurred to me. :)

We'd have to store the reference-pointer in the message file, so we would
omit the actual attachment but eat up perhaps 50 bytes to keep the
reference to the file.

Shuvam

> 1. Completely rewrite the message file removing the attachments and
> adding any extra metadata you want in its place
> 2. Leave the message file as exactly the same size, just don't write
> out the attachment content and assume your filesystem supports
> sparse files (http://en.wikipedia.org/wiki/Sparse_file)
> 
> The advantage of 2 is that it leaves the message file size correct,
> and all the offsets in the file are still correct. The downsides are
> that you must ensure your FS supports sparse files well, and there's
> the question of where do you actually store the information that
> links to the external file?
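A minimal Python sketch of option 2, assuming a POSIX filesystem with sparse-file support: the attachment's byte range is never written, so the file keeps its full logical size and every offset in it stays valid. The function names and the idea of passing the attachment's offset and length are illustrative; where to record the link to the external copy remains the open question above. Holes only free whole filesystem blocks, so small attachments gain little.

```python
import os


def write_with_hole(path, data, hole_start, hole_len):
    """Write `data` to `path`, leaving data[hole_start:hole_start + hole_len]
    unwritten so the filesystem can store that range as a hole."""
    hole_end = hole_start + hole_len
    with open(path, "wb") as f:
        f.write(data[:hole_start])
        f.seek(hole_end)            # skip the attachment bytes entirely
        f.write(data[hole_end:])
        f.truncate(len(data))       # keep the full size even if the hole is at the end


def allocated_bytes(path):
    """Disk space actually allocated (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512
```

Reading the hole back returns zero bytes, so anything serving the message would have to splice the real attachment back in at that offset.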



Re: De-duping attachments

2010-09-15 Thread Shuvam Misra
Dear Bron,

> So you save, what, 50%.  Does that sound about right?  Do you have
> statistics on how much space you'd save with this theoretical
> patch?

No, and this is the first thing I want to do. I'm getting some simple
utilities developed which will run all week (niced suitably) and extract
and MD5sum each attachment. I'll then count how many unique message-IDs
have the same unique document, and I'll get a report. This has been under
discussion in our group for some time --- let me get this done and I'll
let all of you know.
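A rough Python sketch of that survey using only the standard library `email` package: it decodes each attachment, MD5s it, and counts how many distinct Message-IDs carry each document. The 100 KB threshold echoes the figure mentioned earlier; treating every non-multipart part with `Content-Disposition: attachment` or a filename as an attachment is an assumption, and messages without a Message-ID all share an empty key.

```python
import hashlib
from collections import defaultdict
from email import policy
from email.parser import BytesParser


def attachments(raw, min_size=100 * 1024):
    """Return (message_id, [(md5, size), ...]) for large attachments in `raw`."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    msgid = (msg.get("Message-ID") or "").strip()
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_disposition() != "attachment" and not part.get_filename():
            continue
        payload = part.get_payload(decode=True) or b""   # base64/QP decoded
        if len(payload) >= min_size:
            found.append((hashlib.md5(payload).hexdigest(), len(payload)))
    return msgid, found


def survey(raw_messages, min_size=100 * 1024):
    """Map each attachment digest to the Message-IDs that carry it, and the
    bytes saved by storing each document once instead of once per message."""
    ids_by_digest = defaultdict(set)
    size_by_digest = {}
    for raw in raw_messages:
        msgid, found = attachments(raw, min_size)
        for digest, size in found:
            ids_by_digest[digest].add(msgid)
            size_by_digest[digest] = size
    saved = sum(size_by_digest[d] * (len(ids) - 1) for d, ids in ids_by_digest.items())
    return ids_by_digest, saved
```

Keying on Message-ID means the same message delivered to many mailboxes counts once, which keeps the estimate separate from what single-instance store already saves.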

> You're buying a few months.  Usage grows to fill the available storage,
> whatever it is.  And you can only pull this piece of magic once.

Unfortunately, you're totally right. The junk will keep growing.

Shuvam



Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.

2010-09-15 Thread Eric Luyten
On Wed, September 15, 2010 9:27 am, Simon Matter wrote:

>> I am having trouble converting a quota skiplist db back to quotalegacy
>> format (I know... this is probably not the most common Cyrus operation :-)
>>
>> % cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota quotalegacy
>> Converting from /ssd/cyrs/imap/quotas.db (skiplist) to /ssd/cyrs/imap/quota (quotalegacy)
>> % find quota -type f | wc -l
>> 126
>> % strings quotas.db|wc -l
>> 135229
>>
>>
>>
>> "quotas.db" was created using the reverse operation and took about one
>> minute. I renamed the original 'quota' directory out of the way before making
>> the second cvt_cyrusdb call.
>>
>> Closer inspection of the newly created 'quota' directory reveals 125 quota
>> descriptor files named user.aXX created under 'a', all relating to
>> existing top level mailboxes and containing the correct information, and
>> (curiously) one file named 'u' in directory 'u'.
>>
>>
>> I also tried a Berkeley DB intermediate format and the creation
>> of the quotalegacy structure failed in an identical way.
>
> I expect this to be a bug in the way cvt_cyrusdb calls the quotalegacy
> backend, or in the backend itself. And I guess you are the first to test it
> out :)
> Could you test it with different dirhashing options to find out how exactly
> it fails?


Simon,


Yes, one could, but this is not my number one priority right now.

Converting from skiplist to flat takes two minutes and, since the
"quotalegacy" structure/format is trivial, a few lines of scripting
will do the rest, if ever needed.
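A minimal Python sketch of those "few lines of scripting", under assumptions that should be checked against an existing quotalegacy tree before use: the flat dump has one `quotaroot<TAB>used limit` line per root, each quotalegacy file holds the used value and the limit on two lines, and files go in a subdirectory named by the simple (non-fulldirhash) hash, i.e. the lower-cased first letter after the first dot, which matches the user.aXX files landing under 'a'. Virtual domains are not handled and the function names are hypothetical.

```python
import os


def hash_char(root):
    """Assumed simple dir hash: first letter after the first dot, else 'q'."""
    rest = root.split(".", 1)[1] if "." in root else root
    c = rest[:1].lower()
    return c if "a" <= c <= "z" else "q"


def flat_to_quotalegacy(flat_lines, quota_dir):
    """Write one quotalegacy file per quota root; return how many were written."""
    written = 0
    for line in flat_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        root, value = line.split("\t", 1)
        used, limit = value.split()
        subdir = os.path.join(quota_dir, hash_char(root))
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, root), "w") as f:
            f.write(f"{used}\n{limit}\n")
        written += 1
    return written
```

Writing into a fresh directory and diffing it against a known-good quotalegacy tree is a cheap way to validate the assumed formats.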


Regards,
Eric Luyten, Computing Centre VUB/ULB.





Re: De-duping attachments

2010-09-15 Thread Eric Luyten
On Wed, September 15, 2010 10:01 am, Simon Matter wrote:

> I guess much more efficient than a compressing filesystem would be a
> compressing and de-duping filesystem or disk storage in this case. Has anyone
> tried this with a Cyrus message store with lots of "corporate message data"
> stored on it?


Simon,


The Cyrus server I hope to get online tomorrow evening holds 4.2 TB of mail
and uses ZFS with maximal compression (gzip9) for the message files.
(OS : Solaris 10)

ZFS reports a compressratio of between 1.95 and 1.97 (we have nine partitions)

A series of tests revealed our metadata can actually be compressed by a factor
of 3.76 (!)

Perhaps a two-university environment with 60,000+ users doesn't quite qualify
as "corporate" enough but here you have our figures :-)
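For anyone wanting comparable numbers without ZFS, a rough Python sketch that estimates a gzip -9 ratio over a sample of message files. It compresses each message file individually, which is also roughly what gzipping individual email files would give; ZFS compresses per record and reports `compressratio` from allocated blocks, so treat the result as an approximation rather than a prediction of the ZFS figure.

```python
import gzip


def gzip9_ratio(paths):
    """Total raw bytes divided by total gzip -9 bytes across `paths`."""
    raw = packed = 0
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        raw += len(data)
        packed += len(gzip.compress(data, compresslevel=9))
    return raw / packed if packed else 1.0
```

Feeding it a random sample of message files (and, separately, the metadata files) gives figures that can be compared with the ones above.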


Regards,
Eric.





Re: De-duping attachments

2010-09-15 Thread Simon Matter
> On Wed, Sep 15, 2010 at 09:15:13AM +0530, Shuvam Misra wrote:
>> Dear Bron,
>>
>> > http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>> >
>> > 2TB - US $109.
>>
>> Don't want to nit-pick here, but the effective price we pay is about
>> ten times this.
>
> Yeah, so?  It's going down.  That's a large number of attachments
> we're talking about there.
>
>> To set up a mail server with a few TB of disk space,
>> we usually land up deploying a separate chassis with RAID controllers
>> and a RAID array, with FC connections from servers, etc, etc.  All this adds
>> up to about $1,000/TB of usable space if you're using something like the
>> "low-end" IBM DS3400 box or Dell/EMC equivalent. This is even with
>> inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives.
>
> Hmm... our storage units with metadata on SSD come in about $1200/TB.
> Yes, that sounds about right.  That's including hot spares, RAID1 on
> everything (including the SSDs), scads of processor and memory.
> Obviously multiply that by two for replication, and add in a bit of
> extra for backups and I'm happy to arrive at a figure of approximately
> $3000 per terabyte of actual email.
>
>> And most of our customers actually double this cost because they keep
>> two physically identical chassis for redundancy. (We recommend this too,
>> because we can't trust a single RAID 5 array to withstand controller or
>> PSU failures.) In that case, it's $2000/TB.
>
> And because it's nice not to have downtime when you're doing
> maintenance.  I replaced an entire drive unit today, including
> about 4 hours downtime on one of our servers as the system was
> swamped with IO creating new filesystems and initialising the
> drives.   The users didn't see a thing, and replication is now
> fully operational again.
>
>> And you do reach 5-10 TB of email store quite rapidly --- our company
>> has many corporate clients (< 500 email users) whose IMAP store has
>> reached 4TB. No one wants to enforce disk quotas (corporate policy),
>> and most users don't want to delete emails on their own.
>
> So you save, what, 50%.  Does that sound about right?  Do you have
> statistics on how much space you'd save with this theoretical
> patch?
>
>> We keep hearing the logic that storage is cheap, and stories of cloud
>> storage through Amazon, unlimited mailboxes on Gmail, are reinforcing
>> the belief. But at the ground level in mid-market corporate IT budgets,
>> storage costs in data centres (as against inside desktops) are still
>> too high to be trivial, and their prices have only little to do with
>> the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little
>> over $12,000 in India, with a full set of 1TB SATA-II drives from IBM,
>> but even with high cost of IBM drives, the drives themselves contribute
>> less than 30% of the total cost.
>
> You're buying a few months.  Usage grows to fill the available storage,
> whatever it is.  And you can only pull this piece of magic once.
>
>> If we really want to put our collective money where our mouth is, and
>> deliver the storage-is-cheap promise at the ground level, we need to
>> rearchitect every file server and IMAP server to work in map-reduce mode
>> and use disks inside desktops. Anyone game for this project? :)
>
> You could buy as much benefit much more quickly by gzipping the
> individual email files.  Either a filesystem that stores files
> compressed, or a cyrus patch to do that and unpack files on the
> fly if the body was read.  Along with most/all headers in the

I guess much more efficient than a compressing filesystem would be a
compressing and de-duping filesystem or disk storage in this case. Has
anyone tried this with a Cyrus message store with lots of "corporate
message data" stored on it?

Simon




Re: Using cvt_cyrusdb to convert quota database from skiplist back to quotalegacy.

2010-09-15 Thread Simon Matter
> Hello,
>
>
> I am having trouble converting a quota skiplist db back to quotalegacy
> format (I know... this is probably not the most common Cyrus operation :-)
>
> % cvt_cyrusdb /ssd/cyrs/imap/quotas.db skiplist /ssd/cyrs/imap/quota quotalegacy
> Converting from /ssd/cyrs/imap/quotas.db (skiplist) to /ssd/cyrs/imap/quota (quotalegacy)
> % find quota -type f | wc -l
>  126
> % strings quotas.db|wc -l
>   135229
>
>
> "quotas.db" was created using the reverse operation and took about one
> minute. I renamed the original 'quota' directory out of the way before
> making the second cvt_cyrusdb call.
>
> Closer inspection of the newly created 'quota' directory reveals 125 quota
> descriptor files named user.aXX created under 'a', all relating to
> existing top level mailboxes and containing the correct information, and
> (curiously) one file named 'u' in directory 'u'.
>
> I also tried a Berkeley DB intermediate format and the creation
> of the quotalegacy structure failed in an identical way.

I expect this to be a bug in the way cvt_cyrusdb calls the quotalegacy
backend, or in the backend itself. And I guess you are the first to test
it out :)
Could you test it with different dirhashing options to find out how exactly
it fails?

Simon

>
>
> Other question : would I be better off with 65,000 small files
> (quotalegacy) in a one-level hash or with a single skiplist db
> for my quota information, when the files reside on solid state
> storage anyway ?
>
>
> Thx,
> Eric Luyten, Computing Centre VUB/ULB.
>


