Re: choosing a file system

2009-01-19 Thread LALOT Dominique
Please note: this is a long thread I started a while ago about file
systems. If you want to ask a question about something you read in this
thread but that is not strictly about file systems, please be kind enough
to start another thread rather than replying here.

Dom

2009/1/19 Jorey Bump :
> Andrew McNamara wrote, at 01/19/2009 01:29 AM:
>>> Yeah, except Postfix encodes the inode of the queue files in its queue
>>> IDs, so it gets very confused if you do this.  Same with restoring
>>> queues from backups.
>>
>> You should be able to get away with this if, when moving the queue to
>> another machine, you move the queued mail from hold, incoming, active and
>> deferred directories into the maildrop directory on the target instance.
>>
>> This (somewhat old, but still correct, I think) message from Wietse
>> might shed more light on it:
>>
>> Date: Thu, 12 Sep 2002 20:33:08 -0400 (EDT)
>> From: wie...@porcupine.org (Wietse Venema)
>> Subject: Re: postfix migration
>>
>> > I want to migrate postfix to another machine. What are the steps so
>> > that I won't lose mail in the process?
>>
>> This is the safe procedure.
>>
>> 1) On the old machine, stop Postfix.
>>
>> 2) On the old machine, run as super-user:
>>
>> postsuper -r ALL
>>
>>This moves all queue files to the maildrop queue.
>>
>> 3) On the old machine, back up /var/spool/postfix/maildrop
>>
>> 4) On the new machine, make sure Postfix works.
>>
>> 5) On the new machine, stop Postfix.
>>
>> 6) On the new machine, restore /var/spool/postfix/maildrop
>>
>> 7) On the new machine, start Postfix.
>>
>> There are ways to skip the "postsuper -r ALL" step, and copy the
>> incoming + active + deferred + bounce + defer + flush + hold
>> directories to the new machine, but that would be safe only with
>> an empty queue on the new machine.
>>
>
> This has become somewhat off-topic for this list, but you might be able
> to simply sync the entire Postfix queue to the backup machine, and run
> postsuper -s before starting Postfix on the backup. From the postsuper
> man page:
>
>  -s Structure  check and structure repair.  This should be done
> once before Postfix startup.
>
> Rename files whose name does not match the message file inode
> number. This operation  is necessary after restoring a mail
> queue from a different machine, or from backup media.
>
> The important thing to keep in mind is that Postfix embeds the inode
> number in the filename simply to keep the name unique while the message
> resides on the filesystem. Obviously, this approach breaks when the
> files are copied to another filesystem. Renaming them appropriately on
> the new destination ensures no files will be overwritten as the queue is
> processed or new messages enter the queue. Of course, the scheme I
> proposed earlier requires that once the backup Postfix is brought up, it
> must be impossible for the primary to begin resyncing files to the same
> location on the backup if it becomes active again (or refuses to die a
> graceful death). Certainly tricky, but it sounds like the use case is to
>  preserve the queue in case of a total failure, just to make sure the
> mail goes out (even if it means it goes out twice).
>
>
> 
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>



-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-19 Thread Jorey Bump
Andrew McNamara wrote, at 01/19/2009 01:29 AM:
>> Yeah, except Postfix encodes the inode of the queue files in its queue
>> IDs, so it gets very confused if you do this.  Same with restoring
>> queues from backups.
> 
> You should be able to get away with this if, when moving the queue to
> another machine, you move the queued mail from hold, incoming, active and
> deferred directories into the maildrop directory on the target instance.
> 
> This (somewhat old, but still correct, I think) message from Wietse
> might shed more light on it:
> 
> Date: Thu, 12 Sep 2002 20:33:08 -0400 (EDT)
> From: wie...@porcupine.org (Wietse Venema)
> Subject: Re: postfix migration
> 
> > I want to migrate postfix to another machine. What are the steps so
> > that I won't lose mail in the process?
> 
> This is the safe procedure.
> 
> 1) On the old machine, stop Postfix.
> 
> 2) On the old machine, run as super-user:
> 
> postsuper -r ALL
> 
>This moves all queue files to the maildrop queue.
> 
> 3) On the old machine, back up /var/spool/postfix/maildrop
> 
> 4) On the new machine, make sure Postfix works.
> 
> 5) On the new machine, stop Postfix.
> 
> 6) On the new machine, restore /var/spool/postfix/maildrop
> 
> 7) On the new machine, start Postfix.
> 
> There are ways to skip the "postsuper -r ALL" step, and copy the
> incoming + active + deferred + bounce + defer + flush + hold
> directories to the new machine, but that would be safe only with
> an empty queue on the new machine.
> 

This has become somewhat off-topic for this list, but you might be able
to simply sync the entire Postfix queue to the backup machine, and run
postsuper -s before starting Postfix on the backup. From the postsuper
man page:

  -s Structure  check and structure repair.  This should be done
 once before Postfix startup.

 Rename files whose name does not match the message file inode
 number. This operation  is necessary after restoring a mail
 queue from a different machine, or from backup media.

The important thing to keep in mind is that Postfix embeds the inode
number in the filename simply to keep the name unique while the message
resides on the filesystem. Obviously, this approach breaks when the
files are copied to another filesystem. Renaming them appropriately on
the new destination ensures no files will be overwritten as the queue is
processed or new messages enter the queue. Of course, the scheme I
proposed earlier requires that once the backup Postfix is brought up, it
must be impossible for the primary to begin resyncing files to the same
location on the backup if it becomes active again (or refuses to die a
graceful death). Certainly tricky, but it sounds like the use case is to
 preserve the queue in case of a total failure, just to make sure the
mail goes out (even if it means it goes out twice).
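
For what it's worth, the bring-up on the backup would be roughly this
(a sketch only, untested; the host name "primary" and the default Postfix
spool path are assumptions, and it presumes the primary is already stopped
or dead and Postfix on the backup isn't running yet):

  rsync -a --delete primary:/var/spool/postfix/ /var/spool/postfix/
  postsuper -s    # rename queue files whose names no longer match their inodes
  postfix start

postsuper -s only fixes the name/inode mismatch; it does nothing about the
primary coming back and re-syncing over a live queue, which is the tricky
part described above.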



Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-18 Thread Andrew McNamara
>Yeah, except Postfix encodes the inode of the queue files in its queue
>IDs, so it gets very confused if you do this.  Same with restoring
>queues from backups.

You should be able to get away with this if, when moving the queue to
another machine, you move the queued mail from hold, incoming, active and
deferred directories into the maildrop directory on the target instance.

This (somewhat old, but still correct, I think) message from Wietse
might shed more light on it:

Date: Thu, 12 Sep 2002 20:33:08 -0400 (EDT)
From: wie...@porcupine.org (Wietse Venema)
Subject: Re: postfix migration

> I want to migrate postfix to another machine. What are the steps so
> that I won't lose mail in the process?

This is the safe procedure.

1) On the old machine, stop Postfix.

2) On the old machine, run as super-user:

postsuper -r ALL

   This moves all queue files to the maildrop queue.

3) On the old machine, back up /var/spool/postfix/maildrop

4) On the new machine, make sure Postfix works.

5) On the new machine, stop Postfix.

6) On the new machine, restore /var/spool/postfix/maildrop

7) On the new machine, start Postfix.

There are ways to skip the "postsuper -r ALL" step, and copy the
incoming + active + deferred + bounce + defer + flush + hold
directories to the new machine, but that would be safe only with
an empty queue on the new machine.
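
Scripted, the safe procedure above comes down to something like this (a
sketch only; the host name "newhost" and the tar-over-ssh transport are
assumptions):

  # on the old machine
  postfix stop
  postsuper -r ALL          # requeue everything into the maildrop queue
  tar -C /var/spool/postfix -cf - maildrop | ssh newhost 'cat > /tmp/maildrop.tar'

  # on the new machine, once Postfix has been verified working and stopped
  tar -C /var/spool/postfix -xf /tmp/maildrop.tar
  postfix start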

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-10 Thread Bron Gondwana
On Sat, Jan 10, 2009 at 02:35:53PM -0500, Jorey Bump wrote:
> Bron Gondwana wrote, at 01/10/2009 04:56 AM:
> 
> > So - no filesystem is sacred.  Except for bloody out1 with its 1000+
> > queued postfix emails and no replication.  It's been annoying me for
> > over a year now, because EVERYTHING ELSE is replicated.  We've got
> > some new hardware in place, so I'm investigating drbd as an option
> > here.  Not convinced.  It still puts us at the mercy of a filesystem
> > crash.  
> > 
> > I'd prefer a higher level replication solution, but I don't know 
> > any product that replicates outbound mail queues nicely between
> > multiple machines in a way that guarantees that every mail will be
> > delivered at least once, and if there's a machine failure the only
> > possible failure mode is that the second machine isn't aware that
> > the message hasn't been delivered yet, so delivers it again.  That's
> > what I want.
> 
> You could regularly rsync or rdiff-backup your Postfix queue directory
> to another machine where Postfix lies dormant, but with a similar
> configuration. In the event of a machine failure, you can start up
> Postfix on the backup, which may even be able to function as a complete
> replacement (submission, MX, delivery over LMTP). There is still
> opportunity for minor race conditions and automating failover needs to
> be worked out, but it's better than nothing.

Yeah, except Postfix encodes the inode of the queue files in its queue
IDs, so it gets very confused if you do this.  Same with restoring
queues from backups.

My searches on the postfix mailing list archives have shown similar
questions being asked a couple of times, but nobody has come up with
a really good solution so far.

We do keep inhouse patches against postfix - I think we apply 6 at the
moment.  So I'm happy to make small changes to support this :)
 
> Jorey ( big fan of Bron's occasional parenthetical sig comments! )

Bron ( I try ;) )

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-10 Thread Jorey Bump
Bron Gondwana wrote, at 01/10/2009 04:56 AM:

> So - no filesystem is sacred.  Except for bloody out1 with its 1000+
> queued postfix emails and no replication.  It's been annoying me for
> over a year now, because EVERYTHING ELSE is replicated.  We've got
> some new hardware in place, so I'm investigating drbd as an option
> here.  Not convinced.  It still puts us at the mercy of a filesystem
> crash.  
> 
> I'd prefer a higher level replication solution, but I don't know 
> any product that replicates outbound mail queues nicely between
> multiple machines in a way that guarantees that every mail will be
> delivered at least once, and if there's a machine failure the only
> possible failure mode is that the second machine isn't aware that
> the message hasn't been delivered yet, so delivers it again.  That's
> what I want.

You could regularly rsync or rdiff-backup your Postfix queue directory
to another machine where Postfix lies dormant, but with a similar
configuration. In the event of a machine failure, you can start up
Postfix on the backup, which may even be able to function as a complete
replacement (submission, MX, delivery over LMTP). There is still
opportunity for minor race conditions and automating failover needs to
be worked out, but it's better than nothing.
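
As a rough illustration only (the host name, interval and use of cron are
assumptions, not a recommendation), the sync itself could be as simple as
a cron entry on the primary:

  # /etc/cron.d/postfix-queue-sync (hypothetical)
  */5 * * * * root rsync -a --delete /var/spool/postfix/ backup:/var/spool/postfix/

The Postfix instance on the backup has to stay stopped while it receives
these copies, and the queue file names need fixing up (postsuper -s) before
it is ever started.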

Jorey ( big fan of Bron's occasional parenthetical sig comments! )


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-10 Thread Bron Gondwana
On Fri, Jan 09, 2009 at 05:20:02PM +0200, Janne Peltonen wrote:
> I've even been playing a little with userland ZFS, but it's far from
> usable in production (was a nice little toy, though, and a /lot/ faster
> than could be believed).

Yeah - zfs-on-fuse is not something I'd want to trust production data
to. 
 
> I think other points concerning why not to change to another OS
> completely for the benefits available in ZFS were already covered by
> Bron, so I'm not going to waste bandwidth any more with this matter. :)

I did get a bit worked up about it ;)

Thankfully, I don't get confronted with fsck prompts very often, because
my response to fsck required is pretty simple these days :)

a) it's a system partition - reinstall.  Takes 10 minutes from start to
   finish (ok, 15 on some of the bigger servers, POST being the extra)
   and doesn't blat data partitions.

   Our machines are installed using FAI to bring the base operating
   system up and install the "fastmail-server" Debian package, which
   pulls in all the packages we use as dependencies.  It then checks
   out the latest subversion repository and does "make -C conf install"
   which sets up everything else.

   This is all per-role and per machine configured in a config file
   which contains lots of little micro languages optimised for being
   easy to read in a 'diff -u', since that's what our subversion
   commit hook emails us.

b) if it's a cyrus partition, nuke the data and meta partitions and
   re-sync all users from the replicated pair.

c) if it's a VFS partition, nuke it and let the automated balancing
   script fill it back up in its own time (this is the nicest one,
   all key-value based with sha1.  I know I'll probably have to
   migrate the whole thing to sha3 at some stage, but happy to wait
   until it's finalised)

d) oh yeah, mysql.  That's replicated between two machines as well,
   and dumped with ibbackup every night.  If we lose one of these
   we restore from the previous night's backup and let replication
   catch up.  It's never happened (yet) on the primary pair - I've
   had to rebuild a few slaves though, so the process is well tested.

So - no filesystem is sacred.  Except for bloody out1 with its 1000+
queued postfix emails and no replication.  It's been annoying me for
over a year now, because EVERYTHING ELSE is replicated.  We've got
some new hardware in place, so I'm investigating drbd as an option
here.  Not convinced.  It still puts us at the mercy of a filesystem
crash.  

I'd prefer a higher level replication solution, but I don't know 
any product that replicates outbound mail queues nicely between
multiple machines in a way that guarantees that every mail will be
delivered at least once, and if there's a machine failure the only
possible failure mode is that the second machine isn't aware that
the message hasn't been delivered yet, so delivers it again.  That's
what I want.

I'd also like a replication mode for our IMAP server that guaranteed
the message was actually committed to disk on both machines before
returning OK to the lmtpd or imapd.  That's a whole lot of work
though.

(we actually lost an entire external drive unit the other day, and
had to move replicas to new machines.  ZFS wouldn't have helped here,
the failure was hardware.  We would still have had perfectly good
filesystems that were offline.  Can't serve up emails while offline)

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Janne Peltonen
On Fri, Jan 09, 2009 at 08:41:38AM -0600, Scott Lambert wrote:
> On Fri, Jan 09, 2009 at 10:54:10AM +0200, Janne Peltonen wrote:
> > So have I. But in the current Cyrus installation, I'm stuck with Linux,
> > so I concentrated on what's available on Linux. Moreover, I don't want
> > to use non-free operating systems - if anything, I've become more
> > ideological with age... I'd happily use /any/ free unix variant that ran
> > ZFS, but.
> 
> Well, fire up your test environment and start playing with FreeBSD.  ZFS
> and DTrace and "free."  The better ZFS is in 8-CURRENT.  Apparently,
> you have to tweak things a bit on 7-STABLE still.  But, by the time you
> (third person non-specific) get comfortable with "not Linux" 8 may be
> -STABLE.

OK, I was oversimplifying things. ZFS isn't actually non-free as such,
it's just GPL-incompatible.

And the "but" above did include quite a lot of things, like for instance
"us" being committed to red hat / centos. If I haven't been able to even
alter the Linux distribution here, just how hard would you think it'd be
to try altering the unix variant... For example, one thing is, our SAN
vendor says their product supports Red Hat, and I'm already treading on
thin ice by using Centos.

I've even been playing a little with userland ZFS, but it's far from
usable in production (was a nice little toy, though, and a /lot/ faster
than could be believed).

I think other points concerning why not to change to another OS
completely for the benefits available in ZFS were already covered by
Bron, so I'm not going to waste bandwidth any more with this matter. :)


--Janne
-- 
Janne Peltonen  PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Scott Lambert
On Fri, Jan 09, 2009 at 10:54:10AM +0200, Janne Peltonen wrote:
> So have I. But in the current Cyrus installation, I'm stuck with Linux,
> so I concentrated on what's available on Linux. Moreover, I don't want
> to use non-free operating systems - if anything, I've become more
> ideological with age... I'd happily use /any/ free unix variant that ran
> ZFS, but.

Well, fire up your test environment and start playing with FreeBSD.  ZFS
and DTrace and "free."  The better ZFS is in 8-CURRENT.  Apparently,
you have to tweak things a bit on 7-STABLE still.  But, by the time you
(third person non-specific) get comfortable with "not Linux" 8 may be
-STABLE.

-- 
Scott LambertKC5MLE   Unix SysAdmin
lamb...@lambertfam.org


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Martin Wendel
Bron Gondwana wrote:
> On Thu, Jan 08, 2009 at 10:13:25PM -0800, Robert Banz wrote:

>> (notice, didn't mention AIX. I've got my standards ;)
> 
> Hey - I have a friend who _likes_ AIX.  There are odd people in the
> world.

We at Uppsala University have been running Cyrus on AIX for a little more
than 10 years. Back then, there was no acceptable alternative to the AIX
LVM, and AIX is still, as I see it, very competent when it comes to
handling disk I/O. About three years ago I thought the time was right for
running Cyrus on a shared file system. We had a large installation of IBM
SANFS on another system that was performing well at the time. We purchased
SANFS and four RS/6000 servers for Cyrus, and had to take it into
production rather abruptly after a filesystem crash on the old Cyrus
server.

However, the time was not quite right for running Cyrus on SANFS. After
about two years of SANFS problems (SANFS never got corrupted and was very
stable, it just could not handle the load; sometimes we needed to restart
the filesystem several times a week) we decided to follow the examples
published on this list by the FastMail guys. I did not dare to go with
GPFS when SANFS was discontinued.

We have since split our 20 million Cyrus files onto eight IBM blade
servers, each running an AIX virtualization server that handles SAN
connections for the 6 virtual RedHat servers running on it. We run one
Cyrus instance on each RedHat server, 3 primaries and 3 replicas per
blade. We thus have 24 primary servers and 24 replicas, with about 1
million Cyrus files each.

We did some tests on which file system to choose, but there was not much
difference, so we decided on ext3.

We also have 4 additional blades running Debian, 2 for LVS and 2 for
Nginx, and about 10TB of SAN disk area dedicated to Cyrus. The system has
been running very nicely for six months now.

So I guess this is a success story inspired by FastMail. But I still would
not choose anything other than AIX for our TSM servers.




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Dave McMurtrie
Nic Bernstein wrote:

> PS - This has been a very interesting thread to read.  Some of us just 
> don't have the exposure to large systems like the participants in this 
> thread have, and this can be very educational.

It's actually been helpful to us, as well.

All of our mail backends are currently Solaris with SAN storage using 
vxfs.  We're considering a move to Linux, but which filesystem to choose 
is still a major unanswered question for us.

After reading this entire thread, it makes me realize that I've been 
taking vxfs for granted.  It's been rock solid and the performance is fine.

Thanks,

Dave
-- 
Dave McMurtrie, SPE
Email Systems Team Leader
Carnegie Mellon University,
Computing Services

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Nic Bernstein

On 01/09/2009 12:59 AM, Bron Gondwana wrote:
> On Thu, Jan 08, 2009 at 10:13:25PM -0800, Robert Banz wrote:
>>> There's a significant upfront cost to learning a whole new system
>>> for one killer feature, especially if it comes along with significant
>>> regressions in lots of other features (like a non-sucky userland
>>> out of the box).
>>
>> The "non-sucky" userland comment is simply a matter of preference, and
>> bait for a religious war, which I'm not going to bite.
>
> Well, yeah.  Point.  Though most Solaris admins I know tend to pull in
> gnu or bsd utilities pretty quickly.  I'll take that one back, it was
> baity.

So at the risk of entering into a flame war, I must say I am surprised
that no one has mentioned Nexenta/OS.

   http://www.nexenta.org/os

They have bolted the Ubuntu/Debian userland onto OpenSolaris to give the
Linux lovers out there a linuxy experience with access to all of that
shiny new Solaris bling, such as zfs and dtrace.  You may want to give
it a look-see.

>> Patching is always an issue on any OS, and you do have the choice of
>> running X applications remotely (booting an entire graphic
>> environment!?), and many other tools available such as pca to help you
>> patch on Solaris, which provide many of the features that you're used
>> to.
>
> And I'm seeing there are quite a few third party tools that people have
> written to ease the pain of patch management on Solaris (I believe it's
> actually one of the nicer unixes to manage patches on, but when you're
> used to apt-get, there's a whole world of WTFery in manually downloading
> and applying patch sets - especially when you get permission denied on
> a bunch of things that the tool has just suggested as being missing)

Oh yeah, apt-get included.

Cheers,
   -nic

PS - This has been a very interesting thread to read.  Some of us just 
don't have the exposure to large systems like the participants in this 
thread have, and this can be very educational.


--
Nic Bernstein n...@onlight.com
Onlight llc.  www.onlight.com
2266 North Prospect Avenue #610   v. 414.272.4477
Milwaukee, Wisconsin  53202-6306  f. 414.290.0335


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-09 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 10:13:25PM -0800, Robert Banz wrote:
>>
>> There's a significant upfront cost to learning a whole new system
>> for one killer feature, especially if it comes along with significant
>> regressions in lots of other features (like a non-sucky userland
>> out of the box).
>
> The "non-sucky" userland comment is simply a matter of preference, and  
> bait for a religious war, which I'm not going to bite.

Well, yeah.  Point.  Though most Solaris admins I know tend to pull in
gnu or bsd utilities pretty quickly.  I'll take that one back, it was
baity.

> What I will say is that switching between Solaris, Linux, IRIX, Ultrix, 
> FreeBSD, HP-UX, OSF/1 -- any *nix variant, should not be considered a 
> stumbling block. Your comment shows the narrow-mindedness of the current
> Linux culture; many of us were brought up supporting and using a
> collection of these platforms at any one time.

There's a switching cost, particularly if you don't have any experience
with a new system.  You have to consider that cost when making an
upgrade choice.  I agree that ZFS is better than anything currently
available on Linux - but the question is "does that outweigh the
disadvantages of learning and supporting a new platform?".

There are basically two worthwhile things on Solaris: ZFS and DTrace.
Other things hurt - fork behaviour caused us pain recently: it's just not
as cheap as on Linux, and forking from a big process caused lots of
swapping because, even though it was execing pretty quickly, it had to
commit the memory first.  Oops.  There are downsides to Linux's
overcommit, but having to add complexity to our backup manager because
forking for every backup was too expensive was annoying.

(I do take offence to being considered narrow minded for not blindly
following the latest fashion and wanting to switch everything over to
Solaris because it has the latest bling - I've considered it, but the
numbers just don't add up.  We have something that works, is reliable
and is fast.  Our redundancy is just at a different level)

Hey - back on topic for cyrus.  We store sha1s of message files in
the index file now.  We don't have checksums on index files (yet, I
have crc32 patches half finished somewhere), but we're at a point
where userland scrubs are possible.  Along with replication, you can
restore any damaged file from the replica.  Actually, with our backup
system, you can even pull the original file from the backup, knowing
its sha1 because it gets recalculated again and checked during the
backup phase.
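
(As an aside, assuming you dump those sha1s out of the index into a plain
"sha1  path" manifest with some tool of your own - not shown here - the
scrub itself is a one-liner:

  sha1sum --quiet -c spool-manifest.sha1   # prints only files that fail to verify

where "spool-manifest.sha1" is just an example name.)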

> (notice, didn't mention AIX. I've got my standards ;)

Hey - I have a friend who _likes_ AIX.  There are odd people in the
world.

> Patching is always an issue on any OS, and you do have the choice of  
> running X applications remotely (booting an entire graphic  
> environment!?), and many other tools available such as pca to help you  
> patch on Solaris, which provide many of the features that you're used  
> to.

I take it you haven't run X applications remotely from the other side of
the world before?  I'd hardly call it "running".  Crawling maybe.

My current approach is to run up a vncserver on a box in the same colo
and run X applications remotely to there.  It's significantly less
painful, and also gives me a place to run an iceweasel to talk to the
web interfaces of things that won't talk to me any other way.
Uploading firmware via the web from locally is similarly less sucky
than pushing it out from Australia.

And I'm seeing there are quite a few third party tools that people have
written to ease the pain of patch management on Solaris (I believe it's
actually one of the nicer unixes to manage patches on, but when you're
used to apt-get, there's a whole world of WTFery in manually downloading
and applying patch sets - especially when you get permission denied on
a bunch of things that the tool has just suggested as being missing)

In short - I'm not sold on the value to FastMail of at least two of us
(bus factor) learning to maintain Solaris to the level that we'd want
for running something so core to our operations as the IMAP servers.

Bron ( happy to either stop the flamewar or take it off list at this
   point.  I don't think we're contributing anything meaningful
   any more )

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-09 Thread Janne Peltonen
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
> (Summary of filesystem discussion)
> 
> You left out ZFS.
> 
> Sometimes Linux admins remind me of Windows admins.

I didn't.

--clip--
Btrfs is in so early development that I don't know yet what to say about
it, but the fact of ZFS's being incompatible with GPL might be mitigated
by this.
--clip--

> I have adminned a half-dozen UNIX variants professionally but
> keep running into admins who only do ONE and for whom every
> problem is solved with "how can I do this with one OS only?"

So have I. But in the current Cyrus installation, I'm stuck with Linux,
so I concentrated on what's available on Linux. Moreover, I don't want
to use non-free operating systems - if anything, I've become more
ideological with age... I'd happily use /any/ free unix variant that ran
ZFS, but.

> Dark Ages now for terabytes of mail volume I'd throw a professional fit.
> Even the idea that I need to tune my filesystem for inodes and to avoid it
> wanting to fsck on reboot #20 or whatever seems like caveman discussion.
> Any of them offer cheap and nearly-instant snapshots & online scrubbing?
> No?  Then why use it for large number of files of important nature?

Because there isn't a free FS that does those things (yet). And there
are free systems that do enough...

> I love Linux, I surely do.  Virtually everything of an appliance nature here
> will probably shift over to it in the long run I think and for good reasons.
> But filesystem is one area where the bazaar model has fallen into a very
> deep rut and can't muster energy to climb out.

Really? Btrfs /does/ appear promising to me. I might be wrong, though.

> So far ZFS ticking along with no problems and low iostat numbers
> with everything in one big pool.  I have separate fs for data, imap, mail
> but haven't seen any need to carve mail spool into chunks at all.
> There were initial problems noted here in the mailing lists way back
> in Solaris 10u3 but that was solved with the fsync patch and since then
> it's been like butter.  Mail-store systems nobody ever needs to look
> at them because it "just works".

Well, that's nice. It's a shame they made it GPL-incompatible.


BR
-- 
Janne Peltonen  PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Robert Banz
>
> There's a significant upfront cost to learning a whole new system
> for one killer feature, especially if it comes along with significant
> regressions in lots of other features (like a non-sucky userland
> out of the box).

...

The "non-sucky" userland comment is simply a matter of preference, and  
bait for a religious war, which I'm not going to bite.

What I will say is that switching between Solaris, Linux, IRIX,  
Ultrix, FreeBSD, HP-UX, OSF/1 -- any *nix variant, should not be  
considered a stumbling block. Your comment shows the narrow-mindedness
of the current Linux culture; many of us were brought up supporting
and using a collection of these platforms at any one time.

(notice, didn't mention AIX. I've got my standards ;)

Patching is always an issue on any OS, and you do have the choice of  
running X applications remotely (booting an entire graphic  
environment!?), and many other tools available such as pca to help you  
patch on Solaris, which provide many of the features that you're used  
to.

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Vincent Fox
Bron Gondwana wrote:
> BUT - if someone is asking "what's the best filesystem to use
> on Linux" and gets told ZFS, and by the way you should switch
> operating systems and ditch all the rest of your custom setup/
> experience then you're as bad as a Linux weenie saying "just
> use Cyrus on Linux" in a "how should I tune NTFS on my 
> Exchange server" discussion.
>
>   
Point taken.  We can go around that circle all day long, but I *am*
saying there are other UNIX OSes out there than just Linux, and quite
frankly it blows my mind sometimes how people fall into ruts.

Numerous times in my career I have had to switch some application
from AIX to HP-UX, or IRIX to Linux.  The differing flavors of UNIX are
not so different to me as they perhaps are to others.  Particularly when
it's a single app on a dedicated server, I usually find it odd how people
get stuck on something and won't change.  Or they take the safe
institutional path and never fight it.  Collect your paycheck and go home
at 4.

I sleep very well at night knowing the Cyrus mail-stores are on ZFS.
Once in a while I run a scrub just for fun.  No futzing around.
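
(For anyone following along, kicking one off and checking on it is just -
the pool name here is made up:

  zpool scrub mailpool
  zpool status -v mailpool   # shows scrub progress and any checksum errors

and that's the whole ceremony.)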

This was no cakewalk.  I was pushing a boulder up a hill, particularly
when we ran head-first into the ZFS fsync bottleneck at the start of Fall
quarter.  Managers said we needed a crash program to convert everything
to Linux or Exchange or whatever.  I dug into the bugs instead, Sun got
us an interim patch to fix it, and we moved on.  Now, as I said, it's like
butter and one of those setups nobody thinks about.  There are always
excuses why you will stick with "established" practice even if it's
antiquated and full of aches and pains, and I fought that and won.  It
seems to me there is no bigger deal than having a RELIABLE filesystem for
a mail-store, and this is where every other filesystem I have worked with
since 1989 has been a frigging nightmare.  Everything from bad controllers
to double-disk failures in RAID-5 sets keeps me wondering whether I am
paranoid ENOUGH.

I'll be all over btrfs when it hits beta.  I'm not married to ZFS.  But
I'm quite unashamedly looking down my nose at any filesystem that still
leaves me possibly looking at an fsck prompt.  I've done enough of that in
my career already; it's time to move beyond 30+ years' worth of cruft atop
antique designs that seemed tolerable when a huge disk was 20 gigs.







Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
> (Summary of filesystem discussion)
> 
> You left out ZFS.

Just to come back to this - I should say that I'm a big fan
of ZFS and what Sun have done with filesystem design.  Despite
the issues we've had with that machine, I know it's great for
people who are using it...

BUT - if someone is asking "what's the best filesystem to use
on Linux" and gets told ZFS, and by the way you should switch
operating systems and ditch all the rest of your custom setup/
experience then you're as bad as a Linux weenie saying "just
use Cyrus on Linux" in a "how should I tune NTFS on my 
Exchange server" discussion.

From the original post:

Message-ID: <1617f8010812300849k1c7c878bl2f17e8d4287c1...@mail.gmail.com>

  "zfs (but we should switch to solaris or freebsd and 
   throw away our costly SAN)"

I'd love to do some load testing on a ZFS box with our setup
at some point.  There would be some advantages, though I suspect
having one big mailboxes.db vs the lots of little ones we have
would be a point of contention - and fine-grained skiplist
locking is still very much a wishlist item.  I'd want to take
some time testing it before unleashing it on the world!

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:
>
> On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:
>
>> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>>> (Summary of filesystem discussion)
>>>
>>> You left out ZFS.
>>>
>>> Sometimes Linux admins remind me of Windows admins.
>>>
>>> I have adminned a half-dozen UNIX variants professionally but
>>> keep running into admins who only do ONE and for whom every
>>> problem is solved with "how can I do this with one OS only?"

There's a significant upfront cost to learning a whole new system
for one killer feature, especially if it comes along with significant
regressions in lots of other features (like a non-sucky userland
out of the box).  Applying patches on Solaris seems to be a choice
between incredibly low-level command line tools and booting up a whole
graphical environment on a machine in a datacentre on the other side
of the world.

>> We run one zfs machine.  I've seen it report issues on a scrub
>> only to not have them on the second scrub.  While it looks shiny
>> and great, it's also relatively new.
>
> You'd be surprised how unreliable disks and the transport between the  
> disk and host can be. This isn't a ZFS problem, but a statistical  
> certainty as we're pushing a large amount of bits down the wire.
>
> You can, with a large enough corpus, have on-disk data corruption, or  
> data corruption that appeared in flight to the disk, or in the
> controller, that your standard disk CRCs can't correct for. As we keep  
> pushing the limits, data integrity checking at the filesystem layer --  
> before the information is presented for your application to consume --  
> has basically become a requirement.
>
> BTW, the reason that the first scrub saw the error, and the second scrub 
> didn't, is that the first scrub fixed it -- that's the job of a ZFS 

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk

---

if that's an "error that the scrub fixed" then it's a really badly
written error message.

Same error didn't exist next scrub, which was what confused me.

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Robert Banz

On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:

> On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
>> (Summary of filesystem discussion)
>>
>> You left out ZFS.
>>
>> Sometimes Linux admins remind me of Windows admins.
>>
>> I have adminned a half-dozen UNIX variants professionally but
>> keep running into admins who only do ONE and for whom every
>> problem is solved with "how can I do this with one OS only?"
>
> We run one zfs machine.  I've seen it report issues on a scrub
> only to not have them on the second scrub.  While it looks shiny
> and great, it's also relatively new.

You'd be surprised how unreliable disks and the transport between the  
disk and host can be. This isn't a ZFS problem, but a statistical  
certainty as we're pushing a large amount of bits down the wire.

You can, with a large enough corpus, have on-disk data corruption, or  
data corruption that appeared in flight to the disk, or in the
controller, that your standard disk CRCs can't correct for. As we keep  
pushing the limits, data integrity checking at the filesystem layer --  
before the information is presented for your application to consume --  
has basically become a requirement.

BTW, the reason that the first scrub saw the error, and the second  
scrub didn't, is that the first scrub fixed it -- that's the job of a  
ZFS scrub.

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana


On Thu, 08 Jan 2009 20:03 -0500, "Dale Ghent"  wrote:
> On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:
> 
> > We run one zfs machine.  I've seen it report issues on a scrub
> > only to not have them on the second scrub.  While it looks shiny
> > and great, it's also relatively new.
>
> Wait, weren't you just crowing about ext4? The filesystem that was  
> marked GA in the linux kernel release that happened just a few weeks  
> ago? You also sound pretty enthusiastic, rather than cautious, when  
> talking about brtfs and tux3.

I was saying I find it interesting.  I wouldn't seriously consider
using it for production mail stores just yet.  But I have been testing
it on my laptop, where I'm running an offlineimap replicated copy of
my mail.  I wouldn't consider btrfs for production yet either, and
tux3 isn't even on the radar.  They're interesting to watch though,
as is ZFS.

I also said (or at least meant) that if you have commercial support,
ext4 is probably going to be the next evolutionary step from ext3.

> ZFS, and anyone who even remotely seriously follows Solaris would know  
> this, has been GA for 3 years now. For someone who doesn't have their  
> nose buried in Solaris much or with any serious attention span, I  
> guess it could still seem new.

Yeah, it's true - but I've heard anecdotes of people losing entire
zpools due to bugs.  Google turns up things like:

http://www.techcrunch.com/2008/01/15/joyent-suffers-major-downtime-due-to-zfs-bug/

which points to this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=49020&tstart=0

and finally this comment:

http://www.joyeur.com/2008/01/16/strongspace-and-bingodisk-update#c008480

Not something I would want happening to my entire universe, which is
why having ~280 separate filesystems (at the moment) with our email
spread across them means that a rare filesystem bug is only likely to
affect a single store if it bites - and we can restore one store's
worth of users a lot quicker than the whole system.

It's the same reason we prefer Cyrus replication (and put a LOT of work
into making it stable - check this mailing list from a couple of years
ago.  I wrote most of the patches that stabilised replication between
2.3.3 and 2.3.8)

If all your files are on a single filesystem then a rare bug only has
to hit once.  A frequent bug on the other hand, well - you'll know
about them pretty fast... :)  None of the filesystems mentioned have
frequent bugs (except btrfs and probably tux3 - but they ship with
big fat warnings all over)

> As for your x4500, I can't tell if those syslog lines you pasted were  
> from Aug. 2008 or 2007, but certainly since 2007 the marvel SATA  
> driver has seen some huge improvements to work around some pretty  
> nasty bugs in the marvell chipset. If you still have that x4500, and  
> have not applied the current patch for the marvell88sx driver, I  
> highly suggest doing so. Problems with that chip are some of the  
> reasons Sun switched to the LSI 1068E as the controller in the x4540.

I think it was 2007 actually.  We haven't had any trouble with it for
a while, but then it does pretty little.  The big zpool is just used
for backups, which are pretty much one .tar.gz and one .sqlite3 file
per user - and the .sqlite3 file is just indexing the .tar.gz file,
we can rebuild it by reading the tar file if needed.

As a counterpoint to some of the above, we had an issue with Linux
where there was a bug in 64 bit writev handling of mmapped space.  If
you were doing a writev with an mmapped region that crossed a page boundary
and the following page wasn't mapped in, it would inject spurious zero
bytes in the output where the start of the next page belonged.

It took me a few days to prove it was the kernel and create a repeatable
test case, and then backwards and forwards with Linus and a couple of
other developers we fixed it and tested it _that_day_.  I don't know
anyone with even unobtanium level support with a commercial vendor who
has actually had that sort of turnaround.

This caused pretty massive file corruption of especially our skiplist
files, but bits of every other meta file too.   Luckily, as per above,
we had only upgraded one machine.  We generally do that with new kernels
or software versions - upgrade one production machine and watch it for
a bit.  We also test things on testbed machines first, but you always
find something different on production.  The mmap over boundaries case
was pretty rare - only a few per day would actually cause a crash, the
others were silent corruption that wasn't detected at the time.

If something like this had hit a lone, unreplicated machine, we would
have been seriously screwed.  Since it only hit one machine, we could
apply the fix and
re-replicate all the damaged data from the other machine.  No actual
dataloss.

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-08 Thread Dale Ghent
On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:

> We run one zfs machine.  I've seen it report issues on a scrub
> only to not have them on the second scrub.  While it looks shiny
> and great, it's also relatively new.

Wait, weren't you just crowing about ext4? The filesystem that was  
marked GA in the linux kernel release that happened just a few weeks  
ago? You also sound pretty enthusiastic, rather than cautious, when  
talking about brtfs and tux3.

ZFS, and anyone who even remotely seriously follows Solaris would know  
this, has been GA for 3 years now. For someone who doesn't have their  
nose buried in Solaris much or with any serious attention span, I  
guess it could still seem new.

As for your x4500, I can't tell if those syslog lines you pasted were  
from Aug. 2008 or 2007, but certainly since 2007 the marvel SATA  
driver has seen some huge improvements to work around some pretty  
nasty bugs in the marvell chipset. If you still have that x4500, and  
have not applied the current patch for the marvell88sx driver, I  
highly suggest doing so. Problems with that chip are some of the  
reasons Sun switched to the LSI 1068E as the controller in the x4540.

/dale


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
> (Summary of filesystem discussion)
> 
> You left out ZFS.
> 
> Sometimes Linux admins remind me of Windows admins.
> 
> I have adminned a half-dozen UNIX variants professionally but
> keep running into admins who only do ONE and for whom every
> problem is solved with "how can I do this with one OS only?"

We run one zfs machine.  I've seen it report issues on a scrub
only to not have them on the second scrub.  While it looks shiny
and great, it's also relatively new.

Besides, we had a disk _fail_ early on in our x4500 - Sun shipped
a replacement drive, but the kernel was unable to recognise it:

---

"Nothing odd about how it snaps in. We can see the connectors in the
slot - they seem fine as far as we can tell. The drive's 'ok' light is
on and the blue led lit."

Which suggests the server thinks the drive is fine, but the dmesg data
definitely suggests it isn't.

I've also included the output of hdadm display below as well, which
shows that currently it thinks the drive is not present, even though the
last thing reported in the dmesg log is that the device was connected.

Aug 14 21:59:13 backup1  SATA device attached at port 0
Aug 14 21:59:13 backup1 sata: [ID 663010 kern.info]
+/p...@2,0/pci1022,7...@8/pci11ab,1...@1 :

The output of hdadm display shows that the machine definitely thinks the
drive is NOT connected.

---

Sun's response was to wait for the next kernel upgrade - there was a bug
that made that channel unusable even after a reboot.

> So far ZFS ticking along with no problems and low iostat numbers
> with everything in one big pool.  I have separate fs for data, imap, mail
> but haven't seen any need to carve mail spool into chunks at all.
> There were initial problems noted here in the mailing lists way back
> in Solaris 10u3 but that was solved with the fsync patch and since then
> it's been like butter.  Mail-store systems nobody ever needs to look
> at them because it "just works".

I'd sure hate to lose the entire basket, say due to an unknown bug in
zfs.

Besides, I _know_ Debian quite well.  We don't have any Solaris
experience in our team.  The documentation looks quite good, but it's
still a lot of things that work differently.  I tell you what,
maintaining Solaris and using the Solaris userland feels like going
back 20 years - and the whole "need a sunsolve password and only get
some patches - permission denied on others" crap.  I don't need that.

So while I appreciate that ZFS has some advantages, I'd have to say
that they need to be weighed up against the rest of the system, and
the "all the eggs in a relatively new basket" argument.  Also, the
response we've had from Linus when we find kernel issues has been
absolutely fantastic.

Bron ( Debian on the Solaris kernel would be interesting... )

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 05:20:00PM +0200, Janne Peltonen wrote:
> If I'm still following after reading through all this discussion,
> everyone who is actually using ReiserFS (v3) appears to be very content
> with it, even with very large installations. Apparently the fact that
> ReiserFS uses the BKL in places doesn't hurt performance too badly, even
> with multi core systems? Another thing I don't recall being mentioned
> was fragmentation - ext3 appears to have a problem with it, in typical
> Cyrus usage, but how does ReiserFS compare to it?

Yeah, I'm surprised the BKL hasn't hurt us more.  Fragmentation, yeah
it does hurt performance a bit.  We run a patch which causes a skiplist
checkpoint every time it runs a "recovery", which includes every
restart.  We also tune skiplists to checkpoint more frequently in
everyday use.  This helps reduce meta fragmentation.

For data fragmentation - we don't care.  Honestly.  Data IO is so rare.

The main time it matters is if someone does a body search.

Which leaves... index files.  The worst case are files that are only
ever appended to, never any records deleted.  Each time you expunge
a mailbox (even with delayed expunge) it causes a complete rewrite of
the cyrus.index file.

I also wrote a filthy little script (attached) which can repack cyrus
meta directories.  I'm not 100% certain that it's problem free though,
so I only run it on replicas.  Besides, it's not "protected" like most
of our auto-system functions, which check the database to see if the
machine is reporting high load problems and choke themselves until the
load drops back down again.
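
The script isn't reproduced here, but the gist is roughly the following
(a sketch only, with a made-up mailbox meta path; as above, treat it as
replica-only, and safest with Cyrus stopped on that partition):

  # rewrite each meta file into freshly allocated blocks, then swap it in place
  for f in /var/spool/cyrus-meta/user/j/jbloggs/cyrus.*; do
      cp -p "$f" "$f.repack" && mv "$f.repack" "$f"
  done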
 
> I'm using this happily, with 50k users, 24 distinct mailspools of 240G
> each. Full backups take quite a while to complete (~2 days), but normal
> usage is quite fast. There is the barrier problem, of course... I'm
> using noatime (implying nodiratime) and data=ordered, since
> data=writeback resulted in corrupted skiplist files on crash, while
> data=ordered mostly didn't.

Yeah, full backups.  Ouch.  I think the last time we had to do that it
took somewhat over a week.  Mainly CPU limited on the backup server,
which is doing a LOT of gzipping!

Our incremental backups take about 4 hours.  We could probably speed
this up a little more, but given that it's now down from about 12 hours
two weeks ago, I'm happy.  We were actually rate limited by Perl
'unpack' and hash creation, believe it or not!  I wound up rewriting
Cyrus::IndexFile to provide a raw interface, and unpacking just the
fields that I needed.  I also asserted index file version == 10 in the
backup library so I can guarantee the offsets are correct.

I've described our backup system here before - it's _VERY_ custom,
based on a deep understanding of the Cyrus file structures.  In this
case it's definitely worth it - it allows us to reconstruct partial
mailbox recoveries with flags intact.  Unfortunately, "seen" information
is much trickier.  I've been tempted for a while to patch cyrus's
seen support to store seen information for the user themselves in the
cyrus.index file, and only seen information for unowned folders in the
user.seen files.  The way it works now seems optimised for the uncommon
case at the expense of the common.  That always annoys me!
 
> Ext4 just got stable, so there is no real world Cyrus user experience on
> it. Among other things, it contains an online defragmenter. Journal
> checksumming might also help around the write barrier problem on LVM
> logical volumes, if I've understood correctly.

Yeah, it's interesting.  Local fiddling suggests it's worse for my
Maildir performance than even btrfs, and btrfs feels more jerky than
reiser3, so I stick with reiser3.
 
> Reiser4 might have a future, at least Andrew Morton's -mm patch contains
> it and there are people developing it. But I don't know if it ever will
> be included in the "standard" kernel tree.

Yeah, the mailing list isn't massively active at the moment either... I
do keep an eye on it.

> Btrfs is in so early development that I don't know yet what to say about
> it, but the fact of ZFS's being incompatible with GPL might be mitigated
> by this.

Yeah, btrfs looks interesting.  Especially with their work on improving
locking - even on my little dual processor laptop (yay core processors)
I would expect to see an improvement when they merge the new locking
code.

> I'm going to continue using ext3 for now, and probably ext4 when it's
> available from certain commercial enterprise linux vendor (personally,
> I'd be using Debian, but the department has an official policy of using
> RH / Centos). I'm eagerly waiting for btrfs to appear... I probably /would/
> switch to ReiserFS for now, if RH cluster would support ReiserFS FS
> resources.  Hmm, maybe I should just start hacking... On the other hand,
> the upgrade path from ext3 to ext4 is quite easy, and I don't know yet
> which would be better, ReiserFS or ext4.

Sounds sane.  If vendor support matters, then ext4 is probabl

Re: choosing a file system

2009-01-08 Thread Vincent Fox
(Summary of filesystem discussion)

You left out ZFS.

Sometimes Linux admins remind me of Windows admins.

I have adminned a half-dozen UNIX variants professionally but
keep running into admins who only do ONE and for whom every
problem is solved with "how can I do this with one OS only?"

I admin numerous Linux systems in our data center (a Perdition proxy in
front of Cyrus, for one), but frankly, if you want me to go back into the
filesystem Dark Ages now for terabytes of mail volume, I'd throw a
professional fit.  Even the idea that I need to tune my filesystem for
inodes, and to avoid it wanting to fsck on reboot #20 or whatever, seems
like caveman discussion.  Do any of them offer cheap and nearly-instant
snapshots and online scrubbing?  No?  Then why use them for a large
number of important files?

I love Linux, I surely do.  Virtually everything of an appliance nature here
will probably shift over to it in the long run, I think, and for good reasons.
But filesystems are one area where the bazaar model has fallen into a very
deep rut and can't muster the energy to climb out.

So far ZFS has been ticking along with no problems and low iostat numbers,
with everything in one big pool.  I have separate filesystems for data, imap,
and mail, but haven't seen any need to carve the mail spool into chunks at all.
There were initial problems noted here on the mailing lists way back
in Solaris 10u3, but that was solved with the fsync patch and since then
it's been like butter.  Nobody ever needs to look at the mail-store systems
because they "just work".
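
For the curious, the sort of thing I mean (pool, filesystem and device names
are made up):

   # one pool, separate filesystems for data, imap and mail
   zpool create mailpool mirror c0t0d0 c0t1d0
   zfs create mailpool/data
   zfs create mailpool/imap
   zfs create mailpool/mail

   # nearly-instant snapshot before anything risky, and an online scrub
   zfs snapshot mailpool/mail@before-upgrade
   zpool scrub mailpool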






Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Janne Peltonen
Hm.

ReiserFS:

If I'm still following after reading through all this discussion,
everyone who is actually using ReiserFS (v3) appears to be very content
with it, even with very large installations. Apparently the fact that
ReiserFS uses the BKL in places doesn't hurt performance too badly, even
with multi core systems? Another thing I don't recall being mentioned
was fragmentation - ext3 appears to have a problem with it, in typical
Cyrus usage, but how does ReiserFS compare to it?

Also, the write barrier problem mentioned in response to my earlier post
on ext3 would apparently be there with ReiserFS, too, wouldn't it?
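
(For reference, the mount options involved are, as far as I understand it,
barrier=1 for ext3 and barrier=flush for reiserfs, e.g. an fstab line with
made-up paths:

   /dev/vg0/spool1  /var/spool/imap  reiserfs  noatime,notail,barrier=flush  0 0

though whether the barrier actually survives the trip through LVM /
device-mapper depends on the kernel version.)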

GFS:

Nobody mentioned using GFS, which /is/ a clustered file system and as
such, probably overkill if it's only mounted on one node at a time, but
I'm curious... the overhead of a clustered FS is the fact that all
metadata operations take a long time, because there is a lot of
cluster-wide locking. But how many metadata operations are there, after
all, in Cyrus?

Also, GFS is one of the two file systems available when using RH
clustering...

Ext3:

I'm using this happily, with 50k users, 24 distinct mailspools of 240G
each. Full backups take quite a while to complete (~2 days), but normal
usage is quite fast. There is the barrier problem, of course... I'm
using noatime (implying nodiratime) and data=ordered, since
data=writeback resulted in corrupted skiplist files on crash, while
data=ordered mostly didn't.

Also, ext3 is the other FS available when using RH clustering. (Of
course, it isn't a clustered FS, so it is only available when using the
cluster in active-passive mode.)

XFS:

There was someone using this, too, and happy with it.

JFS:

Mm, apparently no comments on this; none positive, at least.

Future:

Ext4 just got stable, so there is no real world Cyrus user experience on
it. Among other things, it contains an online defragmenter. Journal
checksumming might also help around the write barrier problem on LVM
logical volumes, if I've understood correctly.

Reiser4 might have a future, at least Andrew Morton's -mm patch contains
it and there are people developing it. But I don't know if it ever will
be included in the "standard" kernel tree.

Btrfs is in such early development that I don't know yet what to say about
it, but ZFS's incompatibility with the GPL might be mitigated by it.

Conclusion:

I'm going to continue using ext3 for now, and probably ext4 when it's
available from a certain commercial enterprise Linux vendor (personally,
I'd be using Debian, but the department has an official policy of using
RH / CentOS). I'm eagerly waiting for btrfs to appear... I probably /would/
switch to ReiserFS for now, if RH cluster supported ReiserFS FS
resources.  Hmm, maybe I should just start hacking... On the other hand,
the upgrade path from ext3 to ext4 is quite easy, and I don't know yet
which would be better, ReiserFS or ext4.


-- 
Janne Peltonen  PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-05 Thread Rob Mueller

>> We've found that splitting the data up into more volumes + more cyrus
>> instances seems to help as well because it seems to reduce overall
>> contention points in the kernel + software (eg filesystem locks spread
>> across multiple mounts, db locks are spread across multiple dbs, etc)
>
> Makes sense.  Single cyrus env here, might consider that in the future.
> At
> that point though, I'd probably consider Murder or similar.

That should work fine as well.

I believe murder just does two main things.

1. It merges the mailboxes.db from each instance into each other instance,
so you end up with just one giant single namespace
2. It proxies everything (imap/pop/lmtp) as needed to the appropriate
instance if it's not the local one

We don't use murder as we don't really need (1), and we do (2) ourselves 
with a combination of nginx and a custom lmtpproxy tool.
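
For what it's worth, the nginx side is conceptually simple; a stripped-down
sketch of the sort of config involved (addresses, ports and the auth URL are
placeholders; the auth_http service is the bit that decides which backend
instance a given user should land on):

   mail {
       auth_http 127.0.0.1:8080/auth;   # returns the backend host/port per user
       server {
           listen   143;
           protocol imap;
       }
       server {
           listen   110;
           protocol pop3;
       }
   }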

Rob


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-05 Thread LALOT Dominique
2009/1/5 Patrick Boutilier 

> David Lang wrote:
>
>> On Sat, 3 Jan 2009, Rob Mueller wrote:
>>
>>>> But the new Solid-State-Disks seem very promising. They are claimed to
>>>> give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
>>>> that should make all these optimizations unnecessary.
>>>> As my boss used to tell me ... Good hardware always compensates for
>>>> not-so-good software.

>>> What we've found is that the meta-data (eg mailbox.db, seen db's, quota
>>> files, cyrus.* files) use WAY more IO than the email data, but only use
>>> 1/20th the space.
>>>
>>> By separating the meta data onto RAID1 10k/15k RPM drives, and the email
>>> data onto RAID5/6 7.2k RPM drives, you can get a good balance of
>>> space/speed.
>>>
>>
>> how do you move the cyrus* files onto other drives?
>>
>
> metapartition_files and metapartition-default imapd.conf options in
> cyrus-imapd 2.3.x


So, then, maybe we can easily store pure email data on an NFS appliance,
keeping metadata on a traditional filesystem, which can be synced using
low-level tools.

Dom



>
>
>
>
>> David Lang
>> 
>> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
>> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
>> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>>
>
>
> 
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>



-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-05 Thread Patrick Boutilier

David Lang wrote:

On Sat, 3 Jan 2009, Rob Mueller wrote:


But the new Solid-State-Disks seem very promising. They are claimed to
give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
that should make all these optimizations unnecessary.
As my boss used to tell me ... Good hardware always compensates for
not-so-good software.

What we've found is that the meta-data (eg mailbox.db, seen db's, quota
files, cyrus.* files) use WAY more IO than the email data, but only use
1/20th the space.

By separating the meta data onto RAID1 10k/15k RPM drives, and the email
data onto RAID5/6 7.2k RPM drives, you can get a good balance of
space/speed.


how do you move the cyrus* files onto other drives?


metapartition_files and metapartition-default imapd.conf options in 
cyrus-imapd 2.3.x
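
For example (paths are just illustrative), with the spool on the big slow
drives and the metadata on the fast ones, the imapd.conf entries would look
roughly like:

   partition-default:     /var/spool/imap
   metapartition-default: /var/spool/imapmeta
   metapartition_files:   header index cache expunge squat

Whatever file types are listed in metapartition_files end up on the
metapartition; everything else stays on the normal data partition.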





David Lang

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-05 Thread David Lang
On Sat, 3 Jan 2009, Rob Mueller wrote:

>> But the new Solid-State-Disks seem very promising. They are claimed to
>> give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
>> that should make all these optimizations unnecessary.
>> As my boss used to tell me ... Good hardware always compensates for
>> not-so-good software.
>
> What we've found is that the meta-data (eg mailbox.db, seen db's, quota
> files, cyrus.* files) use WAY more IO than the email data, but only use
> 1/20th the space.
>
> By separating the meta data onto RAID1 10k/15k RPM drives, and the email
> data onto RAID5/6 7.2k RPM drives, you can get a good balance of
> space/speed.

how do you move the cyrus* files onto other drives?

David Lang

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-05 Thread John Madden
> $ mount | wc -l
> 92

Wow.

> We've found that splitting the data up into more volumes + more cyrus
> instances seems to help as well because it seems to reduce overall
> contention points in the kernel + software (eg filesystem locks spread
> across multiple mounts, db locks are spread across multiple dbs, etc)

Makes sense.  Single cyrus env here, might consider that in the future.  At 
that point though, I'd probably consider Murder or similar.

> Also one thing I did fail to mention, was that for the data volumes, you
> should definitely be using the "notail" mount option. Unfortunately that's
> not the default, and I think it probably should be. Tails packing is neat
> for saving space, but it reduces the average meta-data density, which makes
> "stating" lots of files in a directory a lot slower. I think that's what
> you might have been seeing. Of course you also mounted "noatime,nodiratime"
> on both?

Yes, we were using notail,noatime,nodiratime.

John




-- 
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-04 Thread Adam Tauno Williams
> On the other hand, XFS was the only Linux filesystems capable to handle our 
> 5 million files (at that time, we're now at 33 million) we had in these 
> days with an acceptable performance. Ext3 was way too slow with directories 
> with > 1000 files (but many things have changed from kernel 2.4.x to 
> nowadays kernels)

It has; not with Cyrus, but with another application, we had to make a
'hashed' directory structure to avoid the many-files-in-a-directory
situation.  But this isn't true anymore; ext3 now performs well with very
large directories.


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-04 Thread Adam Tauno Williams
> > I had the feeling whatever optimizations done at the FS level would give
> > us a max of 5-10% benefit.
> > We migrated from ext3 to reiserfs  on our cyrus servers with 30k
> > mailboxes. I am not sure I saw a great benefit in terms of the iowait.
> > At peak times I always see a iowait of 40-60%
> To be honest, that's not what we saw in our ext3 <-> reiserfs tests.
> What mount options are you using? Are you using the mount options I 
> mentioned?
> noatime,nodiratime,notail,data=ordered

FYI, noatime implies nodiratime.  You can set nodiratime without noatime,
but noatime always gives you nodiratime as well.

> > But the new Solid-State-Disks seem very promising. They are claimed to
> > give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
> > that should make all these optimizations unnecessary.
> > As my boss used to tell me ... Good hardware always compensates for
> > not-so-good software.
> What we've found is that the meta-data (eg mailbox.db, seen db's, quota 
> files, cyrus.* files) use WAY more IO than the email data, but only use 
> 1/20th the space.

Ditto.  The meta-data is very much the hot-spot for I/O.

> By separating the meta data onto RAID1 10k/15k RPM drives, and the email 
> data onto RAID5/6 7.2k RPM drives, you can get a good balance of 
> space/speed.

Agree.


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-03 Thread Pascal Gienger
Henrique de Moraes Holschuh  wrote:

> Ext4, I never tried.  Nor reiser3.  I may have to, we will build a brand
> new Cyrus spool (small, just 5K users) next month, and the XFS unlink
> [lack of] performance worries me.

Nobody likes deletes. Even databases used to mark deleted space only as 
"deleted" until a vacuum (Postgres) or other periodic maintenance command 
was run. Cyrus offers a similar construct named "delayed expunge". Before 
we migrated our mail system to Solaris 10 it ran on Linux 2.4 with XFS on a 
FC SAN device. Deletes were extremely slow, so we had to delay the expunges 
until the weekend; even at night they were too slow and caused too much IO 
congestion.

On the other hand, XFS was the only Linux filesystem capable of handling the 
5 million files we had in those days (we're now at 33 million) with 
acceptable performance. Ext3 was way too slow with directories with 
> 1000 files (but many things have changed from kernel 2.4.x to today's 
kernels), and IBM JFS was not stable (it crashed during a high-load test, 
which was an immediate k.o.). We were reluctant to use Reiser then, as 
it was "too new" in 2001.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-03 Thread Rob Mueller
> Ext4, I never tried.  Nor reiser3.  I may have to, we will build a brand 
> new
> Cyrus spool (small, just 5K users) next month, and the XFS unlink
> [lack of] performance worries me.

From what I can tell, all filesystems seem to have relatively poor unlink 
performance, and unlinks often cause more contention and IO than you'd think 
they should. And it's not just filesystems: SQL deletes in MySQL InnoDB are 
way slower than you'd expect as well. Maybe deletes in general just aren't as 
optimised a path, or there's something tricky about making atomic deletes 
work; I admit I've never really looked into it.

Anyway, that's part of the reason we sponsored Ken to create the "delayed 
expunge" mode code for cyrus, which allows us to delay unlinks to the 
weekends when IO load due to other things is the lowest.

---
Added support for "delayed" expunge, in which messages are
removed from the mailbox index at the time of the EXPUNGE (hiding them
from the client), but the message files and cache entries are left
behind, to be purged at a later time by cyr_expire.  This
reduces the amount of I/O that takes place at the time of EXPUNGE and
should result in greater responsiveness for the client, especially
when expunging a large number of messages.  The new expunge_mode
option in imapd.conf controls whether expunges are
"immediate" or "delayed".  Development sponsored by FastMail.
---
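
In practice that's just this in imapd.conf:

   expunge_mode: delayed

plus a cyr_expire run from cron (we do ours at the weekend) to actually purge
the expunged messages later, something like:

   cyr_expire -E 3 -X 7

(-X being the number of days to keep delayed-expunged messages around; check
the man page for the exact flags in your version).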

Rob


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-03 Thread Rob Mueller

> Running multiple cyrus instances with different dbs ? How do we do that.
> I have seen the ultimate io-contention point is the mailboxes.db file.
> And that has to be single.
> Do you mean dividing the users to different cyrus instances. That is a
> maintenance issue IMHO.

As Bron said, yes it is, but if you have more than one machine's worth of users 
anyway, you have maintenance issues anyway. So rather than just one instance 
per machine, we run multiple instances per machine. The only issue it really 
introduces is that folder sharing between arbitrary users isn't possible 
(unless you used murder to join all the instances together again, but we 
don't); only users within an instance can share.

> I had the feeling whatever optimizations done at the FS level would give
> us a max of 5-10% benefit.
> We migrated from ext3 to reiserfs  on our cyrus servers with 30k
> mailboxes. I am not sure I saw a great benefit in terms of the iowait.
> At peak times I always see a iowait of 40-60%

To be honest, that's not what we saw in our ext3 <-> reiserfs tests.

What mount options are you using? Are you using the mount options I 
mentioned?

noatime,nodiratime,notail,data=ordered

> But the new Solid-State-Disks seem very promising. They are claimed to
> give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
> that should make all these optimizations unnecessary.
> As my boss used to tell me ... Good hardware always compensates for
> not-so-good software.

What we've found is that the meta-data (eg mailbox.db, seen db's, quota 
files, cyrus.* files) use WAY more IO than the email data, but only use 
1/20th the space.

By separating the meta data onto RAID1 10k/15k RPM drives, and the email 
data onto RAID5/6 7.2k RPM drives, you can get a good balance of 
space/speed.

Rob


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-03 Thread Henrique de Moraes Holschuh
On Wed, 31 Dec 2008, Adam Tauno Williams wrote:
> > I never really got the point of the data=writeback mode. Sure, it
> > increases throughput, but so does disabling the journal completely, and
> > seems to me the end result as concerns data integrity is exactly the
> > same.
> 
> The *filesystem* is recoverable as the meta-data is journaled.
> *Contents* of files may be lost/corrupted.  I'm fine with that since a
> serious abend usually leaves the state of the data in a questionable
> state anyway for reasons other than the filesystem;  I want something I
> can safely (and quickly) remount and investigate/restore.  It is a
> trade-off.

Err... you guys better read the recent threads in LKML where Pavel goes
really hard on the data safety holes in ext3 and Linux VFS (and POSIX).

Short answer: ext3 without barriers (you can also disable disk write cache,
in that case barriers are not needed) is not deserving of the name "safe".
At which point *I* personally prefer XFS, which is just as averse to the
lack of barriers on a disk with an enabled write cache, but performs better
than ext3 on most workloads AND has delayed write allocation.

Ext4, I never tried.  Nor reiser3.  I may have to, we will build a brand new
Cyrus spool (small, just 5K users) next month, and the XFS unlink
[lack of] performance worries me.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-03 Thread Bron Gondwana
On Sat, Jan 03, 2009 at 11:46:41AM +0530, ram wrote:
> Running multiple cyrus instances with different dbs ? How do we do that.
> I have seen the ultimate io-contention point is the mailboxes.db file.
> And that has to be single. 

Yeah, mailboxes.db access kinda sucks like that.  If you're making any
changes then it locks the entire DB with a single writelock.

I did consider fine-grained mailboxes.db locking at one point.  It's
definitely doable with fcntl locking, which is what Cyrus is using on
our machines.  It would require some small format changes to skiplist
though.  Somewhere in a checkout I have cyrusdb_skiplist2.c which
contains a bunch of checksumming code and the start of the new format.
I got sidetracked and never finished it though.

All our cyrus instances are installed on completely different drives.
Entirely self-contained on those external units so we can plug them
into a new machine and go.  The init scripts are in /etc/init.d/, but
they are generated from templates which pull their configuration from
a central file.  We can create a new pair of cyrus instances by adding
a single line that looks like this in a config file:

store$n   slot$s1   slot$s2

where $n, $s1 and $s2 are just numbers.  Slots are numbered as
%d%02d with server and partition numbers (it will break if we ever
have over 100 slots on a machine, but I'm happy to renumber at that
point).  Our biggest so far is 40.  When I set this up the biggest was
8.  Future proofing something so easily reconfigurable would have just
meant more typing in the meanwhile.
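
So, for example, server 4 / partition 3 gives slot 403:

   $ printf "%d%02d\n" 4 3
   403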

> Do you mean dividing the users to different cyrus instances. That is a
> maintenance issue IMHO. 

It's amazing what you can do with good tools - besides, if your site is
already bigger than any one single machine then you already have the
issue.  Might as well be smart about it.

As I said upthread somewhere - moving a user is pretty easy for us:

use ME::User;   # our in-house user-management module

# username and target store are passed on the command line
my $UserName = shift;
my $TargetServer = shift;

# look the user up and move them to the target store
my $User = ME::User->new_find($UserName);
$User->MoveUser($TargetServer);
 
> But the new Solid-State-Disks seem very promising. They are claimed to
> give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
> that should make all these optimizations unnecessary. 
> As my boss used to tell me ... Good hardware always compensates for
> not-so-good software. 

Yeah, that would be nice.  Modulo the rewrite cost of course.  Note that
mailboxes.db is a skiplist file.  They make a lot of random updates to
4 bytes at a time when you append a record.  Imagine what that costs if
your minimum rewrite block is larger than the size of the whole file.  
You'd be better off going to flatfile DB.  I'm not kidding you here.
Running "recovery" at startup time would take days on a reasonable
sized DB.  Check out the seeks and rewrites that baby does.  (ok, so if
your filesystem isn't mounted writeback it would probably only rewrite
twice when you actually did the fsyncs.  So much for rhetorical devices)

Bron ( rambling again )

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-02 Thread ram

On Sat, 2009-01-03 at 13:21 +1100, Rob Mueller wrote:
> > Now see, I've had almost exactly the opposite experience.  Reiserfs seemed 
> > to
> > start out well and work consistently until the filesystem reached a 
> > certain
> > size (around 160GB, ~30m files) at which point backing it up would start 
> > to
> > take too long and at around 180GB would take nearly a week.  This forced 
> > us
> > to move to ext3 and it doesn't seem to be degrade that way.  We did, 
> > however,
> > also move from a single partition to 8 of them, so that obviously has some
> > effect as well.
> 
> As you noted, changing two variables at once doesn't help you determine 
> which was the problem!
> 
> Multiple partitions will definitely allow more parallelism, which definitely 
> helps speed things up, which is one of the other things we have done over 
> time. Basically we went from a few large volumes to hundreds of 
> 300G(data)/15G(meta) volumes. One of our machines has 40 data volumes + 40 
> meta data volumes + the standard FS mounts.
> 
> $ mount | wc -l
> 92
> 
> We've found that splitting the data up into more volumes + more cyrus 
> instances seems to help as well because it seems to reduce overall 
> contention points in the kernel + software (eg filesystem locks spread 
> across multiple mounts, db locks are spread across multiple dbs, etc)
> 

Running multiple cyrus instances with different dbs? How do we do that?
I have seen that the ultimate io-contention point is the mailboxes.db file,
and that has to be single.
Do you mean dividing the users between different cyrus instances? That is a
maintenance issue IMHO.


I had the feeling that whatever optimizations were done at the FS level would
give us a max of 5-10% benefit.
We migrated from ext3 to reiserfs on our cyrus servers with 30k
mailboxes. I am not sure I saw a great benefit in terms of the iowait.
At peak times I always see an iowait of 40-60%.

But the new Solid-State-Disks seem very promising. They are claimed to
give 30x the throughput of a 15k rpm disk. If IO improves by 30 times
that should make all these optimizations unnecessary. 
As my boss used to tell me ... Good hardware always compensates for
not-so-good software. 




> Also one thing I did fail to mention, was that for the data volumes, you 
> should definitely be using the "notail" mount option. Unfortunately that's 
> not the default, and I think it probably should be. Tails packing is neat 
> for saving space, but it reduces the average meta-data density, which makes 
> "stating" lots of files in a directory a lot slower. I think that's what you 
> might have been seeing. Of course you also mounted "noatime,nodiratime" on 
> both?
> 
> I think that's another problem with a lot of filesystem benchmarks, not 
> finding out what the right mount "tuning" options are for your benchmark. 
> Arguing that "the default should be fine" is clearly wrong, because every 
> sane person uses "noatime", so you're already doing some tuning, so you 
> should find out what's best for the filesystem you are trying.
> 
> For the record, we use:
> 
> noatime,nodiratime,notail,data=ordered
> 
> On all our reiserfs volumes.
> 
> Rob
> 
> 
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-02 Thread Rob Mueller

> Now see, I've had almost exactly the opposite experience.  Reiserfs seemed 
> to
> start out well and work consistently until the filesystem reached a 
> certain
> size (around 160GB, ~30m files) at which point backing it up would start 
> to
> take too long and at around 180GB would take nearly a week.  This forced 
> us
> to move to ext3 and it doesn't seem to be degrade that way.  We did, 
> however,
> also move from a single partition to 8 of them, so that obviously has some
> effect as well.

As you noted, changing two variables at once doesn't help you determine 
which was the problem!

Multiple partitions will definitely allow more parallelism, which definitely 
helps speed things up, which is one of the other things we have done over 
time. Basically we went from a few large volumes to hundreds of 
300G(data)/15G(meta) volumes. One of our machines has 40 data volumes + 40 
meta data volumes + the standard FS mounts.

$ mount | wc -l
92

We've found that splitting the data up into more volumes + more cyrus 
instances seems to help as well because it seems to reduce overall 
contention points in the kernel + software (eg filesystem locks spread 
across multiple mounts, db locks are spread across multiple dbs, etc)

Also one thing I did fail to mention, was that for the data volumes, you 
should definitely be using the "notail" mount option. Unfortunately that's 
not the default, and I think it probably should be. Tail packing is neat 
for saving space, but it reduces the average meta-data density, which makes 
"stating" lots of files in a directory a lot slower. I think that's what you 
might have been seeing. Of course you also mounted "noatime,nodiratime" on 
both?

I think that's another problem with a lot of filesystem benchmarks, not 
finding out what the right mount "tuning" options are for your benchmark. 
Arguing that "the default should be fine" is clearly wrong, because every 
sane person uses "noatime", so you're already doing some tuning, so you 
should find out what's best for the filesystem you are trying.

For the record, we use:

noatime,nodiratime,notail,data=ordered

On all our reiserfs volumes.
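
i.e. fstab entries along these lines (device names and mount points are
made up):

   /dev/sdb6  /mnt/meta6  reiserfs  noatime,nodiratime,notail,data=ordered  0 0
   /dev/sdd2  /mnt/data6  reiserfs  noatime,nodiratime,notail,data=ordered  0 0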

Rob


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-02 Thread John Madden
> Now from our experience, I can tell you that ext3 really does poorly on
> this workload compared to reiserfs. We had two exact same servers, one all
> reiserfs and one all ext3. The ext3 one started out ok, but over the course
> of a few weeks/months, it started getting worse and worse and was
> eventually being completely crushed by IO load. The machine running
> reiserfs had no problems at all even though it had more users on it as well
> and was growing at the same rate as the other machine.

Now see, I've had almost exactly the opposite experience.  Reiserfs seemed to 
start out well and work consistently until the filesystem reached a certain 
size (around 160GB, ~30m files), at which point backing it up would start to 
take too long, and at around 180GB it would take nearly a week.  This forced us 
to move to ext3, which doesn't seem to degrade that way.  We did, however, 
also move from a single partition to 8 of them, so that obviously has some 
effect as well.

John





-- 
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-01 Thread Bron Gondwana
On Fri, Jan 02, 2009 at 04:19:52PM +1100, Rob Mueller wrote:
> http://lkml.org/lkml/2008/6/17/9

Ahh, that week.  *sigh*.  Not strictly a reiserfs problem of course,
that would have affected everyone.

Speaking of which, Linus did point out in that thread that the way
Cyrus does IO (mmap for reads, fseek/fwrite for writes) is totally
insane and guaranteed to hit every bug in existence.  Normal people
just use mmap for both.

Does anyone actually run Cyrus on anything that doesn't support
writable mmap these days?

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-01 Thread Rob Mueller

> There are /lots/ of (comparative) tests done: The most recent I could
> find with a quick Google is here:
>
> http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks

Almost every filesystem benchmark I've ever seen is effectively useless for 
comparing what's best for a cyrus mail server. They try and show the 
maximums/minimums of a bunch of discrete operation types (eg streaming IO, 
creating files, deleting files, lots of small random reads, etc) running on 
near empty volumes.

What none of them show is what happens to a filesystem when it's a real 
world cyrus mail spool/index:

* 100,000's of directories
* 10,000,000's of files
* 1-1,000,000 files per directory
* files continuously being created and deleted (emails)
* data being appended to existing files (cyrus.* files)
* lots of fsync calls all over the place (every lmtp append has multiple 
fsyncs, as well as various imap actions)
* run over the course of multiple years of continuous operations
* with a filesystem that's 60-90% full depending on your usage levels

There are serious fragmentation issues going on here that no benchmark even 
comes close to simulating.

Now from our experience, I can tell you that ext3 really does poorly on this 
workload compared to reiserfs. We had two identical servers, one all 
reiserfs and one all ext3. The ext3 one started out ok, but over the course 
of a few weeks/months it got worse and worse, and was eventually 
being completely crushed by IO load. The machine running reiserfs had no 
problems at all, even though it had more users on it and was growing 
at the same rate as the other machine.

Yes we did have directory indexing enabled (we had it turned on from the 
start), and we tried different data modes like data=writeback and 
data=ordered but that didn't help either.

To be honest, I don't know why exactly, and working out what's causing IO 
bottlenecks is not easy. We just went back to reiserfs.

Some previous comments I've made.

http://www.irbs.net/internet/info-cyrus/0412/0042.html
http://lists.andrew.cmu.edu/pipermail/info-cyrus/2006-October/024119.html

> The problem with reiserfs is... well. The developers have explicitely
> stated that the development of v3 has come to its end, and there was the

In this particular case, I'm really almost happy with this! Reiserfs has 
been very stable for us for at least 5 years, and I'm almost glad no-one is 
touching it because invariably people working on something will introduce 
new weird edge case bugs. This was a while back, but it demonstrates how 
apparently just adding 'some "sparse" endian annotations' caused a bug.

http://oss.sgi.com/projects/xfs/faq.html#dir2

That one was really nasty, even the xfs_repair tool couldn't fix it for a 
while!

Having said that, there have been some bugs over the last few years with 
reiserfs, however the kernel developers will still help with bug fixes if 
you find them and can trace them down.

http://blog.fastmail.fm/2007/09/21/reiserfs-bugs-32-bit-vs-64-bit-kernels-cache-vs-inode-memory/
http://lkml.org/lkml/2005/7/12/396
http://lkml.org/lkml/2008/6/17/9

Rob


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Bron Gondwana
On Wed, Dec 31, 2008 at 07:47:31AM -0500, Nik Conwell wrote:
> 
> On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:
> 
> [...]
> 
> > a scripted rename of mailboxes to balance partition utilization when  
> > we
> > add another partition.
> 
> Just curious - how do stop people from accessing their mailboxes  
> during the time they are being renamed and moved to another partition?

All access goes via an nginx proxy - we use the proc directory contents
to detect currently active connections and terminate them after
blocking all new logins in the authentication daemon.

Once they're fully moved, logins are enabled again.

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Shawn Nock
Nik Conwell wrote:
> 
> On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:
> 
> [...]
> 
>> a scripted rename of mailboxes to balance partition utilization when we
>> add another partition.
> 
> Just curious - how do stop people from accessing their mailboxes during
> the time they are being renamed and moved to another partition?
> 

We don't really bother. We run the script overnight (over several
nights) to minimize storage utilization and we haven't run into a
problem. I haven't looked at the code in a while, but as I recall the
rename operation is fairly atomic.

In short: it doesn't take long to move a box. The worst thing that I
could imagine would be a momentary outage for a single user (``Mailbox
does not exist'' or similar). This sort of error (if it does occur in
the wild) would clear almost immediately.
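
For anyone wondering how that works mechanically: moving a mailbox to another
partition is just a rename to the same name with a target partition, e.g. from
cyradm (user and partition names made up):

   localhost> rename user.jbloggs user.jbloggs part2

Cyrus copies the mailbox files over to the new partition and updates
mailboxes.db as part of the rename.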

Shawn

-- 
Shawn Nock (OpenPGP: 0xFF7D08A3)
Unix Systems Group; UITS
University of Arizona
nock at email.arizona.edu




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-31 Thread David Lang
On Wed, 31 Dec 2008, Adam Tauno Williams wrote:

> On Wed, 2008-12-31 at 11:47 +0100, LALOT Dominique wrote:
>> Thanks for everybody. That was an interesting thread. Nobody seems to
>> use a NetApp appliance, may be due to NFS architecture problems.
>
> Personally, I'd never use NFS for anything.  Over the years I've had way
> to many NFS related problems on other things to ever want to try it
> again.

NFS has some very interesting capabilities and limitations. It's really bad for 
multiple processes writing to the same file (the cyrus.* files, for example) and 
for atomic actions (writing the message files, for example).

There are ways you can configure it that will work, but unless you already 
have a big NFS server you are probably much better off using a mechanism that 
makes the drives look more like local drives (SAN, iSCSI, etc) or trying one of 
the cluster filesystems, which have different tradeoffs than NFS does.

>> I believe I'll look to ext4 that seemed to be available in last
>> kernel, and also to Solaris, but we are not enough to support another
>> OS.
>
> We've used Cyrus on XFS for almost a years, no problems.
>
> In regards to ext3 I'd pay attention to the vintage of problem reports
> and performance issues;  ext3 of several years ago is not the ext3 of
> today, many improvements have been made.  "data=writeback" mode can help
> performance quite a bit, as well as enabling "dir_index" if it isn't
> already (did it ever become the default?).  The periodic fsck can also
> be disabled via tune2fs.   I only point this out since, if you already
> have any ext3 setup,  trying the above are all painless and might buy
> you something.

It's definitely worth testing different filesystems. I last did a test about 
two years ago and confirmed XFS as my choice. I have one instance of cyrus 
still running on ext3, and as a user I definitely notice it in the performance.

David Lang

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Scott Likens
Ah, the saga of Hans Reiser.  That, unfortunately, is the downfall of  
Reiserfs.

Yes, his company has disappeared, and a "void" has appeared from his  
lack of presence.

That said, the Reiser4 patch set is current against the Linux kernel  
2.6.28 (see 
http://www.kernel.org/pub/linux/kernel/people/edward/reiser4/reiser4-for-2.6/),
but I think (http://en.wikipedia.org/wiki/Reiser4) pretty  
much sums up the future of Reiser4.

I haven't really run into show-stopping bugs on Reiserfs3  
in quite some time (with excellent hardware).  Replace it  
with dodgy hardware, though, and things change.

I haven't looked at btrfs yet with Cyrus, perhaps I'll do that  
sometime soon.


On Dec 31, 2008, at 6:20 AM, Janne Peltonen wrote:

> On Wed, Dec 31, 2008 at 04:58:57AM -0800, Scott Likens wrote:
>> I would not discount using reiserfs (v3) by any means.  It's still  
>> by far a
>> better choice for a filesystem with Cyrus then Ext3 or Ext4.  I  
>> haven't really
>> seen anyone do any tests with Ext4, but I imagine it should be  
>> about par for
>> the course for Ext3.
>
> There are /lots/ of (comparative) tests done: The most recent I could
> find with a quick Google is here:
>
>  http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks
>
> The problem with reiserfs is... well. The developers have explicitely
> stated that the development of v3 has come to its end, and there was  
> the
> long argument between Hans Reiser and kernel delevopers about  
> whether v4
> could be included in kernel. When Hans Reiser was charged with murder
> (not the crow or Cyrus variant), his company assured that the
> development (of v4) would continue, but the last time I tried to find
> out anything about the project, it appeared more or less dead. Of
> course, the current reiserfs (v3) is very stable, but if you run into
> any issues, there really isn't a developer you can contact (or send
> patches to, if you figure out the bug).
>
>
> --Janne
> -- 
> Janne Peltonen  PGP Key ID: 0x9CFAC88B
> Please consider membership of the Hospitality Club 
> (http://www.hospitalityclub.org 
> )
>
>
>
>


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Adam Tauno Williams
On Wed, 2008-12-31 at 15:46 +0200, Janne Peltonen wrote:
> On Wed, Dec 31, 2008 at 07:38:21AM -0500, Adam Tauno Williams wrote:
> > In regards to ext3 I'd pay attention to the vintage of problem reports
> > and performance issues;  ext3 of several years ago is not the ext3 of
> > today, many improvements have been made.  "data=writeback" mode can help
> > performance quite a bit, as well as enabling "dir_index" if it isn't
> > already (did it ever become the default?).  The periodic fsck can also
> > be disabled via tune2fs.   I only point this out since, if you already
> > have any ext3 setup,  trying the above are all painless and might buy
> > you something.
> I wouldn't call data=writeback painless. I had it on in the testing phase
> of our current Cyrus installation, and if the filesystem had to be
> forcibly unmounted by any reason (yes, there are reasons), the amount of
> corruption in those files that happened to be active during the unmount
> - well, it wasn't a nice sight. And the files weren't recoverable,
> except from backup.
> I never really got the point of the data=writeback mode. Sure, it
> increases throughput, but so does disabling the journal completely, and
> seems to me the end result as concerns data integrity is exactly the
> same.

The *filesystem* is recoverable as the meta-data is journaled.
*Contents* of files may be lost/corrupted.  I'm fine with that since a
serious abend usually leaves the state of the data in a questionable
state anyway for reasons other than the filesystem;  I want something I
can safely (and quickly) remount and investigate/restore.  It is a
trade-off.


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Janne Peltonen
On Wed, Dec 31, 2008 at 04:58:57AM -0800, Scott Likens wrote:
> I would not discount using reiserfs (v3) by any means.  It's still by far a
> better choice for a filesystem with Cyrus then Ext3 or Ext4.  I haven't really
> seen anyone do any tests with Ext4, but I imagine it should be about par for
> the course for Ext3.

There are /lots/ of (comparative) tests done: The most recent I could
find with a quick Google is here:

  http://www.phoronix.com/scan.php?page=article&item=ext4_benchmarks

The problem with reiserfs is... well. The developers have explicitly
stated that the development of v3 has come to its end, and there was the
long argument between Hans Reiser and the kernel developers about whether v4
could be included in the kernel. When Hans Reiser was charged with murder
(not the crow or Cyrus variant), his company assured everyone that the
development (of v4) would continue, but the last time I tried to find
out anything about the project, it appeared more or less dead. Of
course, the current reiserfs (v3) is very stable, but if you run into
any issues, there really isn't a developer you can contact (or send
patches to, if you figure out the bug).


--Janne
-- 
Janne Peltonen  PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread <::.. Teresa_II ..::>
On Tue, 2008-12-30 at 17:49 +0100, LALOT Dominique wrote:

> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours
> on an fsck.

Actually I have been using reiserfs for over 2 years on cyrus-imapd. It performs
great even with a really big number of files in the IMAP spool folders. But I
don't know how it will perform on EMC.

4 years ago I tried ext3. It was a disaster. Slow as hell.

Reiser4 was once used too, and it did even better than reiserfs. But after 2
months of stable running it got a kernel oops because of the FS, and I switched
back to reiserfs.

-- 
Teresa



Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-31 Thread Janne Peltonen
On Wed, Dec 31, 2008 at 07:38:21AM -0500, Adam Tauno Williams wrote:
> In regards to ext3 I'd pay attention to the vintage of problem reports
> and performance issues;  ext3 of several years ago is not the ext3 of
> today, many improvements have been made.  "data=writeback" mode can help
> performance quite a bit, as well as enabling "dir_index" if it isn't
> already (did it ever become the default?).  The periodic fsck can also
> be disabled via tune2fs.   I only point this out since, if you already
> have any ext3 setup,  trying the above are all painless and might buy
> you something.

I wouldn't call data=writeback painless. I had it on in the testing phase
of our current Cyrus installation, and if the filesystem had to be
forcibly unmounted by any reason (yes, there are reasons), the amount of
corruption in those files that happened to be active during the unmount
- well, it wasn't a nice sight. And the files weren't recoverable,
except from backup.

I never really got the point of the data=writeback mode. Sure, it
increases throughput, but so does disabling the journal completely, and
seems to me the end result as concerns data integrity is exactly the
same.


--Janne
-- 
Janne Peltonen  PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Eric Luyten
> -- Nik Conwell  is rumored to have mumbled on 31. Dezember 2008 
> 07:47:31 -0500 regarding Re: choosing a file system:
> 
> > Just curious - how do stop people from accessing their mailboxes
> > during the time they are being renamed and moved to another partition?

I moved a few thousand mailboxes in a similar fashion (summer of 2007) and 
encountered no problems. New message deliveries were nicely "frozen" by
Cyrus while the target Inbox was being renamed/moved.

 

Question: would it, stability-wise, make a difference if the mail data and
metadata are split, allocating the metadata partitions on SAN-based LUNs and
storing messages in NAS (NFS) space?

In other words: are the Cyrus-over-NFS inconveniences confined to the cyrus.*
files?


Rationale:
  NAS space can, typically, be "grown" more easily than SAN space.
  This could be an advantage for older server OSes and filesystems...


Eric Luyten, Brussels Free University Computing Centre (Cyrus 2.2, 58k users, 
2.3 TB)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Sebastian Hagedorn
-- Nik Conwell  is rumored to have mumbled on 31. Dezember 2008 
07:47:31 -0500 regarding Re: choosing a file system:



Just curious - how do stop people from accessing their mailboxes
during the time they are being renamed and moved to another partition?


I just do a grep on the username in the proc directory - if there is no 
process for that user, I figure it's safe enough to move the mailbox. This 
approach has worked well so far. I experimented with accessing a mailbox 
while it was being moved and that seemed to be OK as well, i.e. it failed 
while the operation was in progress.
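
Something along these lines, in shell terms (assuming configdirectory is
/var/imap, so the per-connection proc files live in /var/imap/proc; the user
name is made up):

   if grep -qw jbloggs /var/imap/proc/* 2>/dev/null; then
       echo "jbloggs still has active sessions, skipping"
   else
       echo "no sessions, safe to move jbloggs"
   fi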

--
Sebastian Hagedorn - RZKR-R1 (Flachbau), Zi. 18, Robert-Koch-Str. 10
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-31 Thread Scott Likens

Hi,

I would not discount using reiserfs (v3) by any means.  It's still by  
far a better choice of filesystem for Cyrus than Ext3 or Ext4.  I  
haven't really seen anyone do any tests with Ext4, but I imagine it  
should be about on par with Ext3.


As far as NFS goes... NFS isn't itself that bad, it's just that people  
tend to find ways to use NFS in an incorrect manner that only ends up  
leading to failure.


Scott

On Dec 31, 2008, at 2:47 AM, LALOT Dominique wrote:

Thanks for everybody. That was an interesting thread. Nobody seems  
to use a NetApp appliance, may be due to NFS architecture problems.


I believe I'll look to ext4 that seemed to be available in last  
kernel, and also to Solaris, but we are not enough to support  
another OS.


Dom

And Happy New Year !

2008/12/31 Bron Gondwana 
On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> Bron and the fastmail guys could tell you more about reiserfs...  
we've

> used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.

Yeah, sure could :)

You can probably find plenty of stuff from me in the archives about  
our

setup - the basic things are:

* separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes)  
drives.
* data files on RAID5 big slow drives - data IO isn't a limiting  
factor

* 300Gb "slots" with 15Gb associated meta drives, like this:

/dev/sdb6 14016208   8080360   5935848  58% /mnt/meta6
/dev/sdb7 14016208   8064848   5951360  58% /mnt/meta7
/dev/sdb8 14016208   8498812   5517396  61% /mnt/meta8
/dev/sdd2292959500 248086796  44872704  85% /mnt/data6
/dev/sdd3292959500 242722420  50237080  83% /mnt/data7
/dev/sdd4292959500 248840432  44119068  85% /mnt/data8

as you can see, that balances out pretty nicely.  We also store
per-user bayes databases on the associated meta drives.

We balance our disk usage by moving users between stores when usage
reaches 88% on any partition.  We get emailed if it goes above 92%
and paged if it goes above 95%.

Replication.  We have multiple "slots" on each server, and since
they are all the same size, we have replication pairs spread pretty
randomly around the hosts, so the failure of any one drive unit
(SCSI attached SATA) or imap server doesn't significantly overload
any one other machine.  By using Cyrus replication rather than,
say, DRBD, a filesystem corruption should only affect a single
partition, which won't take so long to fsck.

Moving users is easy - we run a sync_server on the Cyrus master, and
just create a custom config directory with symlinks into the tree on
the real server and a rewritten piece of mailboxes.db so we can
rename them during the move if needed.  It's all automatic.

We also have a "CheckReplication" perl module that can be used to
compare two ends to make sure everything is the same.  It does full
per-message flags checks, random sha1 integrity checks, etc.
Does require a custom patch to expose the GUID (as DIGEST.SHA1)
via IMAP.

I lost an entire drive unit on the 26th.  It stopped responding.
8 x 1TB drives in it.

I tried rebooting everything, then switched the affected stores over
to their replicas.  Total downtime for those users of about 15
minutes because I tried the reboot first just in case (there's a
chance that some messages were delivered and not yet replicated,
so it's better not to bring up the replica uncleanly until you're
sure there's no other choice)

In the end I decided that it wasn't recoverable quickly enough to
be viable, so chose new replica pairs for the slots that had been
on that drive unit (we keep some empty space on our machines for
just this eventuality) and started up another handy little script
"sync_all_users" which runs sync_client -u for every user, then
starts the rolling sync_client again at the end.  It took about
16 hours to bring everything back to fully replicated again.

Bron.



--
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-31 Thread Nik Conwell

On Dec 30, 2008, at 4:43 PM, Shawn Nock wrote:

[...]

> a scripted rename of mailboxes to balance partition utilization when  
> we
> add another partition.

Just curious - how do you stop people from accessing their mailboxes  
during the time they are being renamed and moved to another partition?

-nik

Information Technology
Systems Programming
Boston University


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread Adam Tauno Williams
On Wed, 2008-12-31 at 11:47 +0100, LALOT Dominique wrote:
> Thanks for everybody. That was an interesting thread. Nobody seems to
> use a NetApp appliance, may be due to NFS architecture problems.

Personally, I'd never use NFS for anything.  Over the years I've had way
too many NFS-related problems with other things to ever want to try it
again.

> I believe I'll look to ext4 that seemed to be available in last
> kernel, and also to Solaris, but we are not enough to support another
> OS.

We've used Cyrus on XFS for almost a year, no problems.  

In regards to ext3 I'd pay attention to the vintage of problem reports
and performance issues;  ext3 of several years ago is not the ext3 of
today, many improvements have been made.  "data=writeback" mode can help
performance quite a bit, as well as enabling "dir_index" if it isn't
already (did it ever become the default?).  The periodic fsck can also
be disabled via tune2fs.   I only point this out since, if you already
have any ext3 setup,  trying the above are all painless and might buy
you something.
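
For the record, the tune2fs bits are roughly as follows (the device name is a
placeholder, and e2fsck -D wants the filesystem unmounted):

   tune2fs -O dir_index /dev/sdXN   # enable hashed directory indexes
   tune2fs -c 0 -i 0 /dev/sdXN      # disable mount-count and time-based fsck
   e2fsck -fD /dev/sdXN             # (re)build indexes for existing directories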


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-31 Thread LALOT Dominique
Thanks to everybody. That was an interesting thread. Nobody seems to use a
NetApp appliance, maybe due to NFS architecture problems.

I believe I'll look at ext4, which seems to be available in the latest kernel,
and also at Solaris, but there aren't enough of us to support another OS.

Dom

And Happy New Year !

2008/12/31 Bron Gondwana 

> On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> > Bron and the fastmail guys could tell you more about reiserfs... we've
> > used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.
>
> Yeah, sure could :)
>
> You can probably find plenty of stuff from me in the archives about our
> setup - the basic things are:
>
> * separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
> * data files on RAID5 big slow drives - data IO isn't a limiting factor
> * 300Gb "slots" with 15Gb associated meta drives, like this:
>
> /dev/sdb6 14016208   8080360   5935848  58% /mnt/meta6
> /dev/sdb7 14016208   8064848   5951360  58% /mnt/meta7
> /dev/sdb8 14016208   8498812   5517396  61% /mnt/meta8
> /dev/sdd2292959500 248086796  44872704  85% /mnt/data6
> /dev/sdd3292959500 242722420  50237080  83% /mnt/data7
> /dev/sdd4292959500 248840432  44119068  85% /mnt/data8
>
> as you can see, that balances out pretty nicely.  We also store
> per-user bayes databases on the associated meta drives.
>
> We balance our disk usage by moving users between stores when usage
> reaches 88% on any partition.  We get emailed if it goes above 92%
> and paged if it goes above 95%.
>
> Replication.  We have multiple "slots" on each server, and since
> they are all the same size, we have replication pairs spread pretty
> randomly around the hosts, so the failure of any one drive unit
> (SCSI attached SATA) or imap server doesn't significantly overload
> any one other machine.  By using Cyrus replication rather than,
> say, DRBD, a filesystem corruption should only affect a single
> partition, which won't take so long to fsck.
>
> Moving users is easy - we run a sync_server on the Cyrus master, and
> just create a custom config directory with symlinks into the tree on
> the real server and a rewritten piece of mailboxes.db so we can
> rename them during the move if needed.  It's all automatic.
>
> We also have a "CheckReplication" perl module that can be used to
> compare two ends to make sure everything is the same.  It does full
> per-message flags checks, random sha1 integrity checks, etc.
> Does require a custom patch to expose the GUID (as DIGEST.SHA1)
> via IMAP.
>
> I lost an entire drive unit on the 26th.  It stopped responding.
> 8 x 1TB drives in it.
>
> I tried rebooting everything, then switched the affected stores over
> to their replicas.  Total downtime for those users of about 15
> minutes because I tried the reboot first just in case (there's a
> chance that some messages were delivered and not yet replicated,
> so it's better not to bring up the replica uncleanly until you're
> sure there's no other choice)
>
> In the end I decided that it wasn't recoverable quickly enough to
> be viable, so chose new replica pairs for the slots that had been
> on that drive unit (we keep some empty space on our machines for
> just this eventuality) and started up another handy little script
> "sync_all_users" which runs sync_client -u for every user, then
> starts the rolling sync_client again at the end.  It took about
> 16 hours to bring everything back to fully replicated again.
>
> Bron.
>



-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-30 Thread Bron Gondwana
On Tue, Dec 30, 2008 at 02:43:14PM -0700, Shawn Nock wrote:
> Bron and the fastmail guys could tell you more about reiserfs... we've
> used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.

Yeah, sure could :)

You can probably find plenty of stuff from me in the archives about our
setup - the basic things are:

* separate metadata on RAID1 10kRPM (or 15kRPM in the new boxes) drives.
* data files on RAID5 big slow drives - data IO isn't a limiting factor
* 300Gb "slots" with 15Gb associated meta drives, like this:

/dev/sdb6 14016208   8080360   5935848  58% /mnt/meta6
/dev/sdb7 14016208   8064848   5951360  58% /mnt/meta7
/dev/sdb8 14016208   8498812   5517396  61% /mnt/meta8
/dev/sdd2292959500 248086796  44872704  85% /mnt/data6
/dev/sdd3292959500 242722420  50237080  83% /mnt/data7
/dev/sdd4292959500 248840432  44119068  85% /mnt/data8

as you can see, that balances out pretty nicely.  We also store
per-user bayes databases on the associated meta drives.

We balance our disk usage by moving users between stores when usage
reaches 88% on any partition.  We get emailed if it goes above 92%
and paged if it goes above 95%.
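
A rough sketch of that kind of check (the paths, thresholds and mail
addresses below are placeholders for illustration, not the actual FastMail
tooling):

  #!/bin/sh
  # warn/page when any meta or data partition crosses the thresholds above
  for fs in /mnt/meta* /mnt/data*; do
      pct=$(df -P "$fs" | awk 'NR==2 { sub("%", "", $5); print $5 }')
      if [ "$pct" -ge 95 ]; then
          echo "$fs at ${pct}%" | mail -s "PAGE: $fs nearly full" oncall@example.com
      elif [ "$pct" -ge 92 ]; then
          echo "$fs at ${pct}%" | mail -s "WARN: $fs filling up" admins@example.com
      fi
  done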

Replication.  We have multiple "slots" on each server, and since
they are all the same size, we have replication pairs spread pretty
randomly around the hosts, so the failure of any one drive unit 
(SCSI attached SATA) or imap server doesn't significantly overload
any one other machine.  By using Cyrus replication rather than,
say, DRBD, a filesystem corruption should only affect a single
partition, which won't take so long to fsck.

Moving users is easy - we run a sync_server on the Cyrus master, and
just create a custom config directory with symlinks into the tree on
the real server and a rewritten piece of mailboxes.db so we can
rename them during the move if needed.  It's all automatic.

We also have a "CheckReplication" perl module that can be used to
compare two ends to make sure everything is the same.  It does full
per-message flags checks, random sha1 integrity checks, etc.
It does require a custom patch to expose the GUID (as DIGEST.SHA1)
via IMAP.

I lost an entire drive unit on the 26th.  It stopped responding.
8 x 1TB drives in it.

I tried rebooting everything, then switched the affected stores over
to their replicas.  Total downtime for those users was about 15
minutes, because I tried the reboot first just in case (there's a
chance that some messages were delivered and not yet replicated,
so it's better not to bring up the replica uncleanly until you're
sure there's no other choice).

In the end I decided that it wasn't recoverable quickly enough to
be viable, so I chose new replica pairs for the slots that had been
on that drive unit (we keep some empty space on our machines for
just this eventuality) and started up another handy little script
"sync_all_users" which runs sync_client -u for every user, then
starts the rolling sync_client again at the end.  It took about
16 hours to bring everything back to fully replicated again.
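
Presumably the script boils down to a loop along these lines; the binary
path and the mailbox-listing method are assumptions for illustration, and it
presumes the traditional user.name naming plus a running sync_server on the
other end:

  #!/bin/sh
  # "sync_all_users"-style sketch: full per-user sync, then rolling replication
  BIN=/usr/lib/cyrus/bin
  $BIN/ctl_mboxlist -d | awk '{ print $1 }' \
    | sed -n 's/^user\.\([^.]*\)$/\1/p' | sort -u \
    | while read user; do
          $BIN/sync_client -u "$user"
      done
  # once everything is pushed, resume the rolling sync_client against sync_log
  $BIN/sync_client -r &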

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread Shawn Nock
LALOT Dominique wrote:
> Hello,
> 
> We are using cyrus-imap for a long time. Our architecture is a SAN from EMC
> and thanks to our "DELL support" we are obliged to install redhat. The only
> option we have is to use ext3fs on rather old kernels. We have 4000 accounts
> for staff and 2 for students
> The system is rather fast and reliable. BUT..
> 

We support ~8000 faculty and staff and ~45000 students, on 16x 250G
reiserfs 'partitions' from an EMC CX500 array. Reiserfs has proven to
handle the load much better than ext3 (which we tested... it was a
disaster). We've been using reiserfs since RedHat Linux 7.x. We also
tested an early xfs patchset, but it was prone to corruption (that was
years ago, though).

> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours on an
> fsck.
> Next we discovered that our backup system was going slower and slower. We
> just pointed out that it was due to fragmentation, and guess what, there's
> no online defrag tool for ext3.

We've only had to reiserfsck a partition once (with --rebuild-tree
eek!). It took a while, but the data was intact... it beats restoring
from tape.

We don't defragment (as such). In an attempt to speed up overnight
backups we once did a scripted rename of mailboxes to spare partitions.
Since then we have given up on filesystem-based backup and simply
do a block-level backup in combination with partition snapshots. Keeping
the cyrus partition size low has limited many of our problems, and we do
a scripted rename of mailboxes to balance partition utilization when we
add another partition.
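
For the record, moving a mailbox onto a new partition without changing its
name can be done from cyradm, roughly like this; the mailbox name and
partition label are placeholders, and renamemailbox takes the target
partition as its third argument:

  $ cyradm --user cyrus localhost
  localhost> renamemailbox user.jdoe user.jdoe part2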

Bron and the fastmail guys could tell you more about reiserfs... we've
used RH&SuSE/reiserfs/EMC for quite a while and we are very happy.

Except those loony folks who want Exchange...

Shawn
-- 
Shawn Nock (OpenPGP: 0xFF7D08A3)
Unix Systems Group; UITS
University of Arizona
nock at email.arizona.edu




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-30 Thread Andrew Morgan
On Tue, 30 Dec 2008, LALOT Dominique wrote:

> Hello,
>
> We are using cyrus-imap for a long time. Our architecture is a SAN from EMC
> and thanks to our "DELL support" we are obliged to install redhat. The only
> option we have is to use ext3fs on rather old kernels. We have 4000 accounts
> for staff and 2 for students
> The system is rather fast and reliable. BUT..

We also have a Dell/EMC SAN (currently a CX500, but upgrading to a CX4-240 
soon).  I'd like to dispel any rumors about SAN support, though.  Dell will 
support pretty much any combination of software and hardware that has been 
validated by EMC.  That includes RedHat, Suse, and Solaris that I'm aware 
of, plus more I'm sure.  Now, if you want to get support for the operating 
system itself from Dell, then you are probably limited to RedHat.  I know 
a lot of folks like to get their entire environment supported by a 
single vendor, but that can really limit your choices too.

We run Solaris 10 and Debian Linux with our CX500.  Dell helped us set up 
the Emulex HBA in the Solaris 10 boxes and connected it to the SAN. 
During the initial setup of the SAN, I installed Suse Enterprise on one of 
our servers so I could see what they did to install the Qlogic HBA and 
set up the SAN connection.  After they left, I blew it away and installed 
Debian Linux.  It's not "supported" by Dell/EMC, but this is all 
standardized hardware and software.  It works great with the 
kernel-included Qlogic drivers and even with standard Linux multipathing.
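
For anyone trying the same, the stock dm-multipath setup really is small; a
sketch, assuming the distro defaults are good enough for your array (WWIDs
and any blacklist rules will depend on your hardware):

  # /etc/multipath.conf -- minimal config, rely on the built-in defaults
  defaults {
          user_friendly_names yes
  }

  # load the module, start the daemon, and inspect the discovered paths
  modprobe dm_multipath
  /etc/init.d/multipathd start
  multipath -ll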

> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours on an
> fsck.
> Next we discovered that our backup system was going slower and slower. We
> just pointed out that it was due to fragmentation, and guess what, there's
> no online defrag tool for ext3.

How did you determine that it was due to fragmentation?  We use ext3 here 
as well, so I'm curious.

> I'm looking for other solutions:
> ext4fs (does somebody use such filesystem?), xfs
> zfs (but we should switch to solaris or freebsd and throw away our costly
> SAN)

No need to throw away your SAN if you switch to another OS, see above.  :)

Andy

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread LALOT Dominique
John,

No, that was due to fragmentation. A fresh copy (one night to copy, then 2
hours to back up, so 6 times faster than before) solved that problem.
There's a filefrag utility, and for some mailboxes fragmentation was over 60%.
I have three 500 MB spools at the moment, and one is kept free for the copy.
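
For anyone wanting to check their own spool, filefrag ships with e2fsprogs;
the spool path below is just the Cyrus default (it depends on your hashing
settings), and the mailbox and message names are made up:

  # extent count per message file; many extents per small file means fragmentation
  filefrag /var/spool/imap/user/jdoe/*.

  # detailed extent map for a single message file
  filefrag -v /var/spool/imap/user/jdoe/1234.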

You first copy your data; then small files get deleted at random and the
holes get filled at random.
Ext4 is said to do delayed allocation, so that it has a decent idea of the
file size when writing to disk.

Dom

2008/12/30 John Madden 

> > Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours on
> an
> > fsck.
> > Next we discovered that our backup system was going slower and slower. We
> > just pointed out that it was due to fragmentation, and guess what,
> there's
> > no online defrag tool for ext3.
>
> Sure it isn't due to the number of files on those filesystems?  File-level
> backups will slow down linearly as the filesystems grow, of course.
> I "solve" this by adding more spools (up to 8 at the moment with about 350k
> mailboxes) so they can be backed up in parallel.  All on ext3.
>
> John
>
>
>
>
> --
> John Madden
> Sr. UNIX Systems Engineer
> Ivy Tech Community College of Indiana
> jmad...@ivytech.edu
> 
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
>



-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2008-12-30 Thread Pascal Gienger
Robert Banz  wrote:

> At my last job, we had explored a Dell/EMC SAN at one point. Those
> folks don't seem to understand the idea that Fibre Channel is a well
> established standard -- they only expect you to connect their
> supported stack of hardware and software, otherwise they don't wanna
> talk.

Regarding support as described by the support contract you are right - 
but I have had many EMC big-iron SAN devices running without a problem with 
Solaris 10. You have to adapt scsi_vhci.conf if you want symmetric 
multipathing, as Sun does not recognize many of the FC devices out there 
which can handle symmetric links.

ZFS with SAN devices is perfectly OK. We have 33 million files on our 
(single!) ZFS mail pool, running gzip compression (Solaris 10 patches 
137137-09 and 137138-09). Our Tivoli Storage Manager (TSM) backup runs 
every night for approximately three hours, and within those three hours it 
scans all files. We do a zfs snapshot every day and keep 14 days of 
snapshots to restore mailboxes. We are not conservative enough to run scrub 
regularly; the last time I did was last week, without any error.
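
The snapshot/restore side amounts to something like this (the pool,
filesystem and mailbox names are only placeholders):

  # take today's snapshot of the mail filesystem
  zfs snapshot tank/mail@$(date +%Y-%m-%d)

  # once a snapshot has aged past 14 days, drop it by name
  zfs destroy tank/mail@2008-12-16

  # restoring a message is then a copy out of the read-only snapshot
  # directory (followed by a Cyrus reconstruct of that mailbox)
  cp -p /tank/mail/.zfs/snapshot/2008-12-16/user/jdoe/1234. /tank/mail/user/jdoe/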

A happy and successful 2009 for all of you!

Pascal
-- 
Pascal Gienger
pas...@southbrain.com
http://southbrain.com/


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread Vincent Fox
We run Solaris 10 on our Cyrus mail-store backends.
The mail is stored in a ZFS pool.  The ZFS pool is
composed of 4 SAN volumes in RAID-10.  The active
and failover servers of each backend pair have "fiber multipath"
enabled, so their dual connections to the SAN switch ensure
that if an HBA or SAN switch fails there is no downtime.

Once a month we run a scrub while the systems are online.
Never having to run fsck EVER AGAIN is a good thing.
To be on the safe side, the scrub is run during a weekend and not during
a backup window, since it does keep the disks busy for some hours,
but it has never impacted performance.
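
The monthly check itself is just the following (the pool name is a
placeholder):

  # start an online consistency check of the mail pool
  zpool scrub mailpool

  # watch progress and any checksum errors it repairs
  zpool status -v mailpool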

Using ZFS also allows us easy & CHEAP snapshots.
We keep 14 days' worth of snapshots in the pool, and
that handles 99% of restore requests.  We also run backups
to tape once a week from the most recent snapshot.

LALOT Dominique wrote:
> Hello,
>
> We are using cyrus-imap for a long time. Our architecture is a SAN 
> from EMC and thanks to our "DELL support" we are obliged to install 
> redhat. The only option we have is to use ext3fs on rather old 
> kernels. We have 4000 accounts for staff and 2 for students
> The system is rather fast and reliable. BUT..
>
> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours 
> on an fsck.
> Next we discovered that our backup system was going slower and slower. 
> We just pointed out that it was due to fragmentation, and guess what, 
> there's no online defrag tool for ext3.
>
> I'm looking for other solutions:
> ext4fs (does somebody use such filesystem?), xfs
> zfs (but we should switch to solaris or freebsd and throw away our 
> costly SAN)
> use a NetApp Appliance (are you using such a device?, NFS seems to be 
> tricky with cyrus..)
>
> Thanks for your advice
>
> Dom
>
> -- 
> Dominique LALOT
> Ingénieur Systèmes et Réseaux
> http://annuaire.univmed.fr/showuser?uid=lalot
> 
>
> 
> Cyrus Home Page: http://cyrusimap.web.cmu.edu/
> Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
> List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread John Madden
> Once, there was a bad shutdown corrupting ext3fs and we spent 6 hours on an
> fsck.
> Next we discovered that our backup system was going slower and slower. We
> just pointed out that it was due to fragmentation, and guess what, there's
> no online defrag tool for ext3.

Are you sure it isn't due to the number of files on those filesystems?  
File-level backups will slow down linearly as the filesystems grow, of course.  
I "solve" this by adding more spools (up to 8 at the moment, with about 350k 
mailboxes) so they can be backed up in parallel.  All on ext3.
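
In imapd.conf that is just a matter of defining extra partitions, each of
which can then be backed up (and, if need be, fscked) independently; the
partition names and paths here are examples only:

  # /etc/imapd.conf -- several smaller spools instead of one big one
  defaultpartition: part1
  partition-part1: /var/spool/imap1
  partition-part2: /var/spool/imap2
  partition-part3: /var/spool/imap3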

John




-- 
John Madden
Sr. UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmad...@ivytech.edu

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread Robert Banz

On Dec 30, 2008, at 9:06 AM, Pascal Gienger wrote:

> LALOT Dominique  wrote:
>
>> zfs (but we should switch to solaris or freebsd and throw away our  
>> costly
>> SAN)
>
> Why that? SAN volumes are running very fine with Solaris 10 hosts  
> (SPARC
> and x86). You have extended multipathing (symmetric and asymmetric)  
> onboard.
> Solaris accepts nearly all Q-Logic FC cards  (according to my  
> experience).


At my last job, we had explored a Dell/EMC SAN at one point. Those  
folks don't seem to understand the idea that Fibre Channel is a well  
established standard -- they only expect you to connect their  
supported stack of hardware and software, otherwise they don't wanna  
talk.

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread Pascal Gienger
LALOT Dominique  wrote:

> zfs (but we should switch to solaris or freebsd and throw away our costly
> SAN)

Why that? SAN volumes run very well with Solaris 10 hosts (SPARC 
and x86). You have extended multipathing (symmetric and asymmetric) onboard.
Solaris accepts nearly all Q-Logic FC cards (in my experience).

Pascal


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2008-12-30 Thread Robert Banz

On Dec 30, 2008, at 8:49 AM, LALOT Dominique wrote:

> Hello,
>
> We are using cyrus-imap for a long time. Our architecture is a SAN  
> from EMC and thanks to our "DELL support" we are obliged to install  
> redhat. The only option we have is to use ext3fs on rather old  
> kernels. We have 4000 accounts for staff and 2 for students
> The system is rather fast and reliable. BUT..
>
> Once, there was a bad shutdown corrupting ext3fs and we spent 6  
> hours on an fsck.
> Next we discovered that our backup system was going slower and  
> slower. We just pointed out that it was due to fragmentation, and  
> guess what, there's no online defrag tool for ext3.
>
> I'm looking for other solutions:
> ext4fs (does somebody use such filesystem?), xfs
> zfs (but we should switch to solaris or freebsd and throw away our  
> costly SAN)
> use a NetApp Appliance (are you using such a device?, NFS seems to  
> be tricky with cyrus..)

Run Solaris, but keep a machine on the SAN with that old version of  
RedHat that you can use to replicate any problems you have? ;)

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


choosing a file system

2008-12-30 Thread LALOT Dominique
Hello,

We have been using cyrus-imap for a long time. Our architecture is a SAN from EMC,
and thanks to our "DELL support" we are obliged to install RedHat. The only
option we have is to use ext3fs on rather old kernels. We have 4000 accounts
for staff and 2 for students.
The system is rather fast and reliable. BUT..

Once, a bad shutdown corrupted ext3fs and we spent 6 hours on an
fsck.
Then we discovered that our backup system was getting slower and slower. We
eventually determined that it was due to fragmentation, and guess what, there's
no online defrag tool for ext3.

I'm looking for other solutions:
ext4fs (does somebody use such a filesystem?), xfs
zfs (but we would have to switch to Solaris or FreeBSD and throw away our costly
SAN)
use a NetApp appliance (are you using such a device? NFS seems to be tricky
with cyrus..)

Thanks for your advice

Dom

-- 
Dominique LALOT
Ingénieur Systèmes et Réseaux
http://annuaire.univmed.fr/showuser?uid=lalot

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html