Not receiving e-mail on submission port

2011-10-09 Thread Tolga

Hi,

It was reported to me that the e-mails people send never reach my server. 
When this was first reported, I tested and it was true. I tested some more 
and found out I can't receive e-mail when I switch to the submission port. 
How can I fix this? You can find my postconf -n and master.cf below (at the 
moment it's not using the submission port).


postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
append_dot_mydomain = no
biff = no
broken_sasl_auth_clients = yes
config_directory = /etc/postfix
debug_peer_level = 3
debug_peer_list = localhost
html_directory = /usr/share/doc/postfix/html
inet_interfaces = all
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
mydestination = localhost
myhostname = vps.ozses.net
mynetworks = 127.0.0.0/8 127.0.0.2/32 184.82.40.0/24 64.120.177.0/24
myorigin = /etc/mailname
readme_directory = /usr/share/doc/postfix
recipient_delimiter = +
relayhost =
smtpd_banner = $myhostname ESMTP $mail_name (Ubuntu)
smtpd_recipient_restrictions = permit_mynetworks,  
permit_sasl_authenticated,  reject_non_fqdn_hostname,  
reject_non_fqdn_sender,  reject_non_fqdn_recipient,  
reject_unauth_destination,  reject_unauth_pipelining,  
reject_invalid_hostname

smtpd_sasl_auth_enable = yes
smtpd_sasl_local_domain = $myhostname
smtpd_sasl_path = private/auth
smtpd_sasl_security_options = noanonymous
smtpd_sasl_type = dovecot
virtual_alias_maps = mysql:/etc/postfix/mysql_virtual_alias_maps.cf
virtual_gid_maps = static:5000
virtual_mailbox_base = /srv/vmail
virtual_mailbox_domains = mysql:/etc/postfix/mysql_virtual_domains_maps.cf
virtual_mailbox_maps = mysql:/etc/postfix/mysql_virtual_mailbox_maps.cf
virtual_minimum_uid = 100
virtual_transport = virtual
virtual_uid_maps = static:5000
root@vps:~# cat /etc/postfix/master.cf
#
# Postfix master process configuration file.  For details on the format
# of the file, see the master(5) manual page (command: "man 5 master").
#
# Do not forget to execute "postfix reload" after editing this file.
#
# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ==========================================================================
smtp  inet  n   -   n   -   -   smtpd
#submission inet n   -   -   -   -   smtpd
#  -o smtpd_tls_security_level=encrypt
#  -o smtpd_sasl_auth_enable=yes
#  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
#  -o milter_macro_daemon_name=ORIGINATING
#smtps inet  n   -   -   -   -   smtpd
#  -o smtpd_tls_wrappermode=yes
#  -o smtpd_sasl_auth_enable=yes
#  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
#  -o milter_macro_daemon_name=ORIGINATING
#628   inet  n   -   -   -   -   qmqpd
pickup    fifo  n   -   -   60  1   pickup
cleanup   unix  n   -   -   -   0   cleanup
qmgr  fifo  n   -   n   300 1   qmgr
#qmgr fifo  n   -   -   300 1   oqmgr
tlsmgr    unix  -   -   -   1000?   1   tlsmgr
rewrite   unix  -   -   n   -   -   trivial-rewrite
bounce    unix  -   -   -   -   0   bounce
defer unix  -   -   -   -   0   bounce
trace unix  -   -   -   -   0   bounce
verify    unix  -   -   -   -   1   verify
flush unix  n   -   -   1000?   0   flush
proxymap  unix  -   -   n   -   -   proxymap
proxywrite unix -   -   n   -   1   proxymap
smtp  unix  -   -   -   -   -   smtp
# When relaying mail as backup MX, disable fallback_relay to avoid MX loops
relay unix  -   -   -   -   -   smtp
-o smtp_fallback_relay=
#   -o smtp_helo_timeout=5 -o smtp_connect_timeout=5
showq unix  n   -   -   -   -   showq
error unix  -   -   -   -   -   error
retry unix  -   -   -   -   -   error
discard   unix  -   -   -   -   -   discard
local unix  -   n   n   -   -   local
virtual   unix  -   n   n   -   -   virtual
lmtp  unix  -   -   -   -   -   lmtp
anvil unix  -   -   -   -   1   anvil
scache    unix  -   -   -   -   1   scache
#
# 
# Interfaces to non-Postfix software. Be sure to examine the manual
# pages of the non-Postfix software to find out what options it wants.
#
# Many of the following services use the Postfix pipe(8) delivery
# agent.  See the pipe(8) man page for information about ${recipient}
# and other message envelope options.
# ==========================================================================
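
For reference, enabling the submission listener generally comes down to
uncommenting the commented-out block above and running "postfix reload".
A minimal sketch, reusing the Dovecot SASL settings already in main.cf
(the chroot column is set to "n" here to match the smtp service):

submission inet n       -       n       -       -       smtpd
  -o smtpd_tls_security_level=encrypt
  -o smtpd_sasl_auth_enable=yes
  -o smtpd_client_restrictions=permit_sasl_authenticated,reject
  -o milter_macro_daemon_name=ORIGINATING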

Re: Issue integrating with Cyrus-SASL

2011-10-09 Thread Crazedfred
As previously mentioned, the chroot for smtp is turned off:

cat /etc/postfix/master.cf | grep "smtp  inet"
smtp  inet  n   -   n   -   -   smtpd




From: Wietse Venema 
To: Postfix users 
Sent: Wednesday, September 28, 2011 3:07 PM
Subject: Re: Issue integrating with Cyrus-SASL

Crazedfred:
> Any thoughts?

What does the smtpd line in master.cf look like?

If it looks like this:

    smtp      inet  n       -       -       -       -       smtpd

then chroot is turned on, and Postfix won't
be able to talk to saslauthd.

To turn off chroot, change it to this and do "postfix reload":

    smtp      inet  n       -       n       -       -       smtpd

Only Debian ships Postfix with chroot turned on. Complain there.


    Wietse

Always check for irregular mail usage of your mail server

2011-10-09 Thread The Doctor
http://www.nk.ca/blog/index.php?/archives/1275-Phishing-spam-mail-script-intercepted.html

-- 
Member - Liberal International  This is doc...@nl2k.ab.ca Ici doc...@nl2k.ab.ca
God, Queen and country! Never Satan President Republic! Beware AntiChrist 
rising! 
https://www.fullyfollow.me/rootnl2k
Ontario, Nfld, and Manitoba boot the extremists out and vote Liberal!


Re: Using Postfix for email retention

2011-10-09 Thread Sahil Tandon
On Mon, 2011-10-10 at 07:20:22 +0530, Janantha Marasinghe wrote:

> I want to know if postfix can be used to save a copy of every e-mail
> sent and received (including attachments) by a mail server for email
> retention. 

See http://article.gmane.org/gmane.mail.postfix.user/221022 and the
mailing list archive for similar discussions.

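One approach that often comes up in those threads is a BCC copy of every
message to an archiving mailbox, set in main.cf; a minimal sketch, with a
hypothetical archive address:

    always_bcc = mail-archive@example.com

(sender_bcc_maps and recipient_bcc_maps exist for finer-grained control.)
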
> If it could be indexed for easier searching, that would be great!

This has to happen outside of Postfix.

-- 
Sahil Tandon


Re: LDAP table, recursion filter

2011-10-09 Thread Tom Lanyon
On 20/09/2011, at 11:04 AM, Tom Lanyon wrote:
> When using a LDAP lookup table the 'special_result_attribute' parameter is 
> available to allow me to recurse to other DNs [e.g. recursing to members of a 
> LDAP group].  I can also use the 'leaf_result_attribute' parameter to select 
> the attribute I want to return from those recursive DN lookups, but I can't 
> find a way to filter that recursive lookup to avoid returning certain entries.
> 
> As an example, I have a group with a bunch of members, but a few of those 
> members' objects are marked as 'disabled'.  I'd like to recurse through the 
> group's member DNs to find their 'mail' attribute, but only for members who 
> don't have the 'disabled' attribute set to true [e.g. apply a filter of 
> "(!(disabled=true))"].
> 
> Is it possible to apply such a filter on the recursive DN search?

No bites on this... perhaps it'd help if I gave an example:

LDAP:
 dn: cn=tech-staff,ou=Groups,dc=example,dc=com
 objectclass: top
 objectclass: ldapgroup
 cn: tech-staff
 mail: tech-st...@example.com
 memberdn: uid=adam,ou=People,dc=example,dc=com
 memberdn: uid=bob,ou=People,dc=example,dc=com
 memberdn: uid=chuck,ou=People,dc=example,dc=com

 dn: uid=adam,ou=People,dc=example,dc=com
 objectclass: top
 objectclass: ldapuser
 uid: adam
 mail: a...@example.com

 dn: uid=bob,ou=People,dc=example,dc=com
 objectclass: top
 objectclass: ldapuser
 uid: bob
 mail: b...@example.com
 accountLock: true


Postfix (ldap-group-aliases.cf):

 search_base = ou=Groups,dc=example,dc=com
 query_filter = mail=%s
 result_attribute = mail
 special_result_attribute = memberdn


This is fine, and recurses on the memberdn attributes to find the mail 
attributes for the listed users, but we need a way to apply a 
(!(accountLock=true)) filter to that recursion: even though bob is a group 
member, his account is disabled, so his address shouldn't be expanded...
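
To make the desired semantics concrete, here is the per-member check we'd
like applied during recursion, expressed as a standalone ldapsearch (host
and bind options omitted; this only illustrates the filter, it is not an
existing Postfix parameter):

 ldapsearch -x -b "uid=bob,ou=People,dc=example,dc=com" -s base \
   "(&(objectclass=ldapuser)(!(accountLock=true)))" mail
 # no entry (and hence no mail attribute) is returned for bob,
 # because his accountLock is set to true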

Advice appreciated.

Regards,
Tom

Using Postfix for email retention

2011-10-09 Thread Janantha Marasinghe

Hi All,

I want to know if postfix can be used to save a copy of every e-mail 
sent and received (including attachments) by a mail server for email 
retention. If it could be indexed for easier searching, that would be great!

Thanks

J


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 06:03:36PM -0500, Stan Hoeppner wrote:
> On 10/9/2011 3:29 PM, Bron Gondwana wrote:
> 
> > I'm honestly more interested in maildir type workload too, spool doesn't
> > get enough traffic usually to care about IO.
> > 
> > (sorry, getting a bit off topic for the postfix list)
> 
> Maybe not off topic.  You're delivering into the maildir mailboxes with
> local(8) right?

Cyrus via LMTP (through an intermediate proxy, what's more) actually.

> > We went with lots of small filesystems to reduce single points of
> > failure rather than one giant filesystem across all our spools.
> 
> Not a bad architecture.  Has a few downsides but one big upside.  Did
> you really mean Postfix spools here, or did you mean to say maildir
> directories?

Destination cyrus directories, yes - sorry, not postfix spools.

> > My goodness.  That's REALLY recent in filesystem times.  Something
>
> XFS has been seeing substantial development for a few years now due to
> interest from RedHat, who plan to make it the default RHEL filesystem in
> the future.  They've dedicated serious resources to the effort,
> including hiring Dave Chinner from SGI.  Dave's major contribution while
> at RedHat has been the code that yields the 10X+ increase in unlink
> performance.  It is enabled by default in 2.6.39 and later kernels.

Fair enough.  It's good to see the extra work going in.

> > that recent plus "all my eggs in one basket" of changing to a
> > large multi-spindle filesystem that would really get the benefits
> > of XFS would be more dangerous than I'm willing to consider.  That's
> 
> That's one opinion, probably not shared by most XFS users.  I assume
> your current architecture is designed to mitigate hardware
> failure--focused on the very rare occasion of filesystem corruption in
> absence of some hardware failure event.  I'd make an educated guess that
> the median size XFS filesystem in the wild today is at least 50TB and
> spans dozens of spindles housed in multiple FC SAN array chassis.

Corruption happens for real.  We get maybe 1-2 per month on average.
Wouldn't even notice them if we didn't actually have the sha1 of
every single email file in the metadata files, and THAT protected
with a crc32 per entry as well.  So we can actually detect them.

> > barely a year old.  At least we're not still running Debian's 2.6.32
> > any more, but still.
> 
> We've been discussing a performance patch to a filesystem driver, not a
> Gnome release.  :)  Age is irrelevant.  It's the mainline default.  If
> you have an "age" hangup WRT kernel patches, well that's just silly.

Seriously?  I do actually build my own kernels still, but upgrading is
always an interesting balancing act: random bits of hardware work
differently, and stability is always a question.  Upgrading to a new
Gnome release is much less risky.

> > I'll run up some tests again some time, but I'm not thinking of
> > switching soon.
> 
> Don't migrate just to migrate.  If you currently have deficient
> performance with high mailbox concurrency on many spindles, it may make
> sense.  If your performance is fine, and you have plenty of headroom,
> stick with what you have.
> 
> I evangelize XFS to the masses because it's great for many things, and
> many people haven't heard of it, or know nothing about it.  They simply
> use EXTx because it's the default.  I'm getting the word out WRT
> possibilities and capabilities.  I'm not trying to _convert_ everyone to
> XFS.
> 
> Apologies to *BSD, AIX, Solaris, HP-UX mail server admins if it appears
> I assume the world is all Linux.  I don't assume that--all the numbers
> out here say it has ~99% of all "UNIX like" server installs.

Well, yeah.  I've heard interestingly mixed things from people running
ZFS too, but mostly positive.  We keep our backups on ZFS on real
Solaris - at least one lot.  The others are on XFS on one of those huge
SAN thingies.  But I don't care so much about performance there,
because I'm reading and writing huge .tar.gz files.  And XFS is good
at that.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread karavelov
- Quote from Bron Gondwana (br...@fastmail.fm), on 10.10.2011 at 01:50 - 

> On Mon, Oct 10, 2011 at 01:33:31AM +0300, karave...@mail.bg wrote: 
>> Nice setup. And thanks for your work on Cyrus. We are 
>> looking also to move the metadata on SSDs but we have not 
>> found yet cost effective devices - we need at least a pair of 
>> 250G disk for 20-30T spool on a server. 
> 
> You can move cyrus.cache to data now, that's the whole 
> point, because it doesn't need to be mmaped in so much. 
> 
Thanks for the info. 

>> Setting a higher number of allocation groups per XFS 
>> filesystem helps a lot for the concurrency. My rule of 
>> thumb (learnt from databases) is: 
>> number of spindles + 2 * number of CPUs. 
>> You have done the same with multiple filesystems. 
>> 
>> About the fsck times. We experienced a couple of power 
>> failures and XFS comes up in 30-45 minutes (30T in 
>> RAID5 of 12 SATA disks). If the server is shut down 
>> correctly it comes up in a second. 
> 
> Interesting - is that 30-45 minutes actually a proper 
> fsck, or just a log replay? 
> 

I think it is some kind of recovery procedure internal to XFS. The 
XFS log is 2G, so I think it is not just replaying. 

> More interestingly, what's your disaster recovery plan 
> for when you lose multiple disks? Our design is 
> heavily influenced by having lost 3 disks in a RAID6 
> within 12 hours. It took a week to get everyone back 
> from backups, just because of the IO rate limits of 
> the backup server. 

Ouch! You had really bad luck. I do not know how long 
it will take for us to recover from backups. My estimate is 
2-3 weeks if one server fails. We are looking for better 
options. Your partitioning is a better plan here - a smaller 
probability that the two failing disks come from one array, 
faster recovery time, etc. 

> 
>> We know that RAID5 is not the best option for write 
>> scalability, but the controller write cache helps a lot. 
> 
> Yeah, we did RAID5 for a while - but it turned out we 
> were still being write limited more than disk space 
> limited, so the last RAID5s are being phased out for 
> more RAID1. 
> 
> Bron. 
> 

-- 
Luben Karavelov



Re: Premature "No Space left on device" on XFS

2011-10-09 Thread vg_ us

--
From: "Bron Gondwana" 
Sent: Sunday, October 09, 2011 6:28 PM
To: "vg_ us" 
Cc: "Bron Gondwana" ; "Stan Hoeppner" 
; 

Subject: Re: Premature "No Space left on device" on XFS


On Sun, Oct 09, 2011 at 04:42:25PM -0400, vg_ us wrote:

From: "Bron Gondwana" 
>I'm honestly more interested in maildir type workload too, spool doesn't
>get enough traffic usually to care about IO.

will postmark transaction test do? here - 
http://www.phoronix.com/scan.php?page=article&item=linux_2639_fs&num=1

stop arguing - I think postmark transaction was the only relevant
test XFS was losing badly - not anymore...
search www.phoronix.com for other tests - there is one for every
kernel version.


Sorry, I don't change filesystems every week just because
the latest shiny got a better benchmark.  I need a pretty
compelling reason, and what's most impressive there is
how shockingly bad XFS was before 2.6.39.  I don't think
there's many stable distributions out there shipping 2.6.39
yet, which means you're bleeding all sorts of edges to get
a faster filesystem...



Ahhh - which part of "search www.phoronix.com for other tests" did you miss?
All I meant - benchmarks are out there...


... and you're storing your customers' email on that.

But - you have convinced me that it may be time to take
another round of tests - particularly since we've added
another couple of database files since my last test,
which will increase the linear IO slightly on regular use.
It may be worth comparing again.  But I will still advise
ext4 to anyone who asks right now.

Bron.



Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Stan Hoeppner
On 10/9/2011 3:29 PM, Bron Gondwana wrote:

> I'm honestly more interested in maildir type workload too, spool doesn't
> get enough traffic usually to care about IO.
> 
> (sorry, getting a bit off topic for the postfix list)

Maybe not off topic.  You're delivering into the maildir mailboxes with
local(8) right?

> We went with lots of small filesystems to reduce single points of
> failure rather than one giant filesystem across all our spools.

Not a bad architecture.  Has a few downsides but one big upside.  Did
you really mean Postfix spools here, or did you mean to say maildir
directories?

> No, not really.  I'm not going to advise people to use something that
> requires a lot of tuning.

My point was that if a workload requires, or can benefit from XFS, it
requires a learning curve, and is worth the effort.

> My goodness.  That's REALLY recent in filesystem times.  Something

XFS has been seeing substantial development for a few years now due to
interest from RedHat, who plan to make it the default RHEL filesystem in
the future.  They've dedicated serious resources to the effort,
including hiring Dave Chinner from SGI.  Dave's major contribution while
at RedHat has been the code that yields the 10X+ increase in unlink
performance.  It is enabled by default in 2.6.39 and later kernels.

> that recent plus "all my eggs in one basket" of changing to a
> large multi-spindle filesystem that would really get the benefits
> of XFS would be more dangerous than I'm willing to consider.  That's

That's one opinion, probably not shared by most XFS users.  I assume
your current architecture is designed to mitigate hardware
failure--focused on the very rare occasion of filesystem corruption in
absence of some hardware failure event.  I'd make an educated guess that
the median size XFS filesystem in the wild today is at least 50TB and
spans dozens of spindles housed in multiple FC SAN array chassis.

> barely a year old.  At least we're not still running Debian's 2.6.32
> any more, but still.

We've been discussing a performance patch to a filesystem driver, not a
Gnome release.  :)  Age is irrelevant.  It's the mainline default.  If
you have an "age" hangup WRT kernel patches, well that's just silly.

> I'll run up some tests again some time, but I'm not thinking of
> switching soon.

Don't migrate just to migrate.  If you currently have deficient
performance with high mailbox concurrency on many spindles, it may make
sense.  If your performance is fine, and you have plenty of headroom,
stick with what you have.

I evangelize XFS to the masses because it's great for many things, and
many people haven't heard of it, or know nothing about it.  They simply
use EXTx because it's the default.  I'm getting the word out WRT
possibilities and capabilities.  I'm not trying to _convert_ everyone to
XFS.

Apologies to *BSD, AIX, Solaris, HP-UX mail server admins if it appears
I assume the world is all Linux.  I don't assume that--all the numbers
out here say it has ~99% of all "UNIX like" server installs.

-- 
Stan


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Mon, Oct 10, 2011 at 01:49:44AM +0300, karave...@mail.bg wrote:
> I do not trust Postmark - it models mbox appending and skips
> fsync-s. So it is too different from our setup. The best benchmark 
> tool I have found is imaptest (from dovecot fame) - it is actually 
> end to end benchmarking, including the IMAP server.

I use imaptest as something to throw against my Cyrus
dev builds to check I haven't broken anything quickly.
It's very good.  Of course, I run those on tmpfs so my
machine doesn't grind to a halt!

> The last fs tests I have done were in April and there is no 
> fundamental change in the filesystems since then. Make your 
> test and see yourself. The setup here was XFS so we changed 
> only a mount option - delaylog was not default before 2.6.39.
> Ext4 is also a nice choice but we have problems with long fsck 
> times.

Agree, long fsck times suck.  Then again, we have a very reliable
UPS setup and multiple power supplies on separate UPSes for
every machine.  I think the last time we lost a single power
channel was about 4 years ago, and I don't recall ever losing
both channels.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Mon, Oct 10, 2011 at 01:33:31AM +0300, karave...@mail.bg wrote:
> Nice setup. And thanks for your work on Cyrus. We are 
> looking also to move the metadata on SSDs but we have not
> found yet cost effective devices - we need at least a pair of 
> 250G disk for 20-30T spool on a server. 

You can move cyrus.cache to data now, that's the whole
point, because it doesn't need to be mmaped in so much.

> Setting a higher number  of allocation groups per XFS 
> filesystem helps a lot for the concurrency. My rule of 
> thumb (learnt from databases) is: 
> number of spindles + 2 * number of CPUs.
> You have done the same with multiple filesystems.
>
> About the fsck times. We experienced a couple of power
> failures and XFS comes up in 30-45 minutes  (30T in
> RAID5 of 12 SATA disks).  If the server is shut down 
> correctly it comes up in a second.

Interesting - is that 30-45 minutes actually a proper
fsck, or just a log replay?

More interestingly, what's your disaster recovery plan
for when you lose multiple disks?  Our design is
heavily influenced by having lost 3 disks in a RAID6
within 12 hours.  It took a week to get everyone back
from backups, just because of the IO rate limits of
the backup server.

> We know that RAID5 is not the best option for write 
> scalability, but the controller write cache helps a lot.

Yeah, we did RAID5 for a while - but it turned out we
were still being write limited more than disk space
limited, so the last RAID5s are being phased out for
more RAID1.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread karavelov
- Quote from Bron Gondwana (br...@fastmail.fm), on 10.10.2011 at 01:28 -

> On Sun, Oct 09, 2011 at 04:42:25PM -0400, vg_ us wrote:
>> From: "Bron Gondwana" 
>> >I'm honestly more interested in maildir type workload too, spool doesn't
>> >get enough traffic usually to care about IO.
>> 
>> will postmark transaction test do? here - 
>> http://www.phoronix.com/scan.php?page=article&item=linux_2639_fs&num=1
>> stop arguing - I think postmark transaction was the only relevant
>> test XFS was losing badly - not anymore...
>> search www.phoronix.com for other tests - there is one for every
>> kernel version.
> 
> Sorry, I don't change filesystems every week just because
> the latest shiny got a better benchmark.  I need a pretty
> compelling reason, and what's most impressive there is
> how shockingly bad XFS was before 2.6.39.  I don't think
> there's many stable distributions out there shipping 2.6.39
> yet, which means you're bleeding all sorts of edges to get
> a faster filesystem...
> 
> ... and you're storing your customers' email on that.
> 
> But - you have convinced me that it may be time to take
> another round of tests - particularly since we've added
> another couple of database files since my last test,
> which will increase the linear IO slightly on regular use.
> It may be worth comparing again.  But I will still advise
> ext4 to anyone who asks right now.
> 
> Bron.
> 

I do not trust Postmark - it models mbox appending and skips
fsyncs, so it is too different from our setup. The best benchmark 
tool I have found is imaptest (of Dovecot fame) - it is actually 
end to end benchmarking, including the IMAP server.
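
An illustrative invocation (host, credentials and the source mbox 
file are hypothetical):

 imaptest host=127.0.0.1 port=143 user=test%d pass=secret \
   mbox=dovecot-crlf clients=50 secs=300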

The last fs tests I have done were in April, and there has been no 
fundamental change in the filesystems since then. Run your own 
test and see for yourself. The setup here was already XFS, so we changed 
only a mount option - delaylog was not the default before 2.6.39.
Ext4 is also a nice choice but we have problems with long fsck 
times.

Best regards
--
Luben Karavelov

Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 04:42:25PM -0400, vg_ us wrote:
> will postmark transaction test do? here - 
> http://www.phoronix.com/scan.php?page=article&item=linux_2639_fs&num=1

Oh:

http://blog.goolamabbas.org/2007/06/17/postmark-is-not-a-mail-server-benchmark/

  "Thus it pains me a lot that they are trying to pass of a
   benchmark (Postmark) which does not have a single fsync(2)
   as appropiate for a mail server. "

And that other benchmark posted earlier had barriers turned
off.  Anything that benchmarks without fsync is a lie, because
it can create the file, do something with it, and unlink without
ever having written a byte down to storage.  Woo f'ing hoo.
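
To make that concrete, this is roughly the per-message write pattern a
realistic delivery benchmark has to model (a sketch only - hypothetical
file name, error handling trimmed):

    /* create, write and *fsync* one message, as an MTA/MDA must */
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int deliver(const char *path, const char *msg, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd < 0)
            return -1;
        if (write(fd, msg, len) != (ssize_t) len) {
            close(fd);
            return -1;
        }
        if (fsync(fd) != 0) {   /* skip this, and nothing need ever hit the disk */
            close(fd);
            return -1;
        }
        return close(fd);
    }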

So no, a postmark transaction test won't do unless you can
show me a resource that says it does fsyncs now, otherwise
you're just playing with improved in-memory datastructures,
and my workload is limited by random disk IO, not CPU.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread karavelov
- Quote from Bron Gondwana (br...@fastmail.fm), on 10.10.2011 at 01:12 -
> 
> Here's what our current IMAP servers look like:
> 
> 2 x 92GB SSD
> 12 x 2TB SATA
> 
> two of the SATA drives are hotspares - though I'm
> wondering if that's actually necessary now, we
> haven't lost any yet, and we have 24 hr support in
> our datacentres.  Hot swap is probably fine.
> 
> so - 5 x RAID1 for a total of 10TB storage.
> 
> Each 2TB volume is then further split into 4 x 500GB
> partitions.  The SSD is just a single partition with
> all the metadata, which is a change from our previous
> pattern of separate metadata partitions as well, but
> has been performing OK thanks to the performance of
> SSD.
> 
> The SSDs are in RAID1 as well.
> 
> This gives us 20 separate mailbox databases, which
> not only keeps the size down, but gives us concurrency
> for free - so there's no single points of contention
> for the entire machine.  It gives us small enough
> filesystems that you can actually fsck them in a day,
> and fill up a new replica in a day as well.
> 
> And it means when we need to shut down a single machine,
> the masters transfer to quite a few other machines
> rather than one replica host taking all the load, so
> it spreads things around nicely.
> 
> This is letting us throw a couple of hundred thousand
> users on a single one of these machines and barely
> break a sweat.  It took a year or so of work to rewrite
> the internals of Cyrus IMAP to cut down the IO hits on
> the SATA drives, but it was worth it.
> 
> Total cost for one of these boxes, with 48GB RAM and a
> pair of CPUs is under US $13k - and they scale very
> linearly - throw a handful of them into the datacentre
> and toss some replicas on there.  Easy.
> 
> And there's no single point of failure - each machine
> is totally standalone - with its own CPU, its own
> storage, its own metadata.  Nice.
> 
> So yeah, I'm quite happy with the sweet spot that I've
> found at the moment - and it means that a single machine
> has 21 separate filesystems on it.  So long as there's
> no massive lock that all the filesystems have to go
> through, we get the scalability horizontally rather
> than vertically.
> 
> Bron.
> 

Nice setup. And thanks for your work on Cyrus. We are 
also looking to move the metadata onto SSDs but we have not
yet found cost-effective devices - we need at least a pair of 
250G disks for a 20-30T spool on a server. 

Setting a higher number of allocation groups per XFS 
filesystem helps a lot with concurrency. My rule of 
thumb (learnt from databases) is: 
number of spindles + 2 * number of CPUs.
You have done the same with multiple filesystems.
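
As a sketch of how that rule of thumb translates to mkfs.xfs (the device
name is hypothetical; say 12 spindles and 8 CPUs):

 mkfs.xfs -d agcount=28 /dev/sdb1    # 12 spindles + 2 * 8 CPUs = 28 AGs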

About the fsck times. We experienced a couple of power
failures and XFS comes up in 30-45 minutes  (30T in
RAID5 of 12 SATA disks).  If the server is shut down 
correctly it comes up in a second.

We know that RAID5 is not the best option for write 
scalability, but the controller write cache helps a lot.

Best regards
--
Luben Karavelov

Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 04:42:25PM -0400, vg_ us wrote:
> From: "Bron Gondwana" 
> >I'm honestly more interested in maildir type workload too, spool doesn't
> >get enough traffic usually to care about IO.
> 
> will postmark transaction test do? here - 
> http://www.phoronix.com/scan.php?page=article&item=linux_2639_fs&num=1
> stop arguing - I think postmark transaction was the only relevant
> test XFS was losing badly - not anymore...
> search www.phoronix.com for other tests - there is one for every
> kernel version.

Sorry, I don't change filesystems every week just because
the latest shiny got a better benchmark.  I need a pretty
compelling reason, and what's most impressive there is
how shockingly bad XFS was before 2.6.39.  I don't think
there's many stable distributions out there shipping 2.6.39
yet, which means you're bleeding all sorts of edges to get
a faster filesystem...

... and you're storing your customers' email on that.

But - you have convinced me that it may be time to take
another round of tests - particularly since we've added
another couple of database files since my last test,
which will increase the linear IO slightly on regular use.
It may be worth comparing again.  But I will still advise
ext4 to anyone who asks right now.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 03:24:44PM -0500, Stan Hoeppner wrote:
> That said, there are plenty of mailbox
> servers in the wild that would benefit from the XFS + linear concat
> setup.  It doesn't require an insane drive count, such as the 136 in the
> test system above, to demonstrate the gains, especially against EXT3/4
> with RAID5/6 on the same set of disks.  I think somewhere between 16-32
> should do it, which is probably somewhat typical of mailbox storage
> servers at many sites.

Here's what our current IMAP servers look like:

2 x 92GB SSD
12 x 2TB SATA

two of the SATA drives are hotspares - though I'm
wondering if that's actually necessary now, we
haven't lost any yet, and we have 24 hr support in
our datacentres.  Hot swap is probably fine.

so - 5 x RAID1 for a total of 10TB storage.

Each 2TB volume is then further split into 4 x 500GB
partitions.  The SSD is just a single partition with
all the metadata, which is a change from our previous
pattern of separate metadata partitions as well, but
has been performing OK thanks to the performance of
SSD.

The SSDs are in RAID1 as well.

This gives us 20 separate mailbox databases, which
not only keeps the size down, but gives us concurrency
for free - so there's no single points of contention
for the entire machine.  It gives us small enough
filesystems that you can actually fsck them in a day,
and fill up a new replica in a day as well.

And it means when we need to shut down a single machine,
the masters transfer to quite a few other machines
rather than one replica host taking all the load, so
it spreads things around nicely.

This is letting us throw a couple of hundred thousand
users on a single one of these machines and barely
break a sweat.  It took a year or so of work to rewrite
the internals of Cyrus IMAP to cut down the IO hits on
the SATA drives, but it was worth it.

Total cost for one of these boxes, with 48GB RAM and a
pair of CPUs is under US $13k - and they scale very
linearly - throw a handful of them into the datacentre
and toss some replicas on there.  Easy.

And there's no single point of failure - each machine
is totally standalone - with its own CPU, its own
storage, its own metadata.  Nice.

So yeah, I'm quite happy with the sweet spot that I've
found at the moment - and it means that a single machine
has 21 separate filesystems on it.  So long as there's
no massive lock that all the filesystems have to go
through, we get the scalability horizontally rather
than vertically.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread karavelov
- Quote from Bron Gondwana (br...@fastmail.fm), on 09.10.2011 at 23:29 -
> 
> My goodness.  That's REALLY recent in filesystem times.  Something
> that recent plus "all my eggs in one basket" of changing to a
> large multi-spindle filesystem that would really get the benefits
> of XFS would be more dangerous than I'm willing to consider.  That's
> barely a year old.  At least we're not still running Debian's 2.6.32
> any more, but still.
> 
> I'll run up some tests again some time, but I'm not thinking of
> switching soon.
> 

I run a couple of busy postfix MX servers with queues now on XFS:
average: 400 deliveries per minute 
peak: 1200 deliveries per minute.

4 months ago they were hosted on 8-core Xeon machines with 
6 x 10k SAS drives in RAID 10. The spools were on ext4.

When I switched the queue filesystem to XFS with the delaylog option 
(around 2.6.36), the load average dropped from 2.5 to 0.5.
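
For reference, on those pre-2.6.39 kernels delaylog had to be requested
explicitly at mount time; a sketch of an fstab entry (device and mount
point are hypothetical):

 /dev/sdb1  /var/spool/postfix  xfs  noatime,delaylog  0  0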

Now I run the same servers on smaller machines - dual core Opterons. 
The queues are on one Intel SLC SSD. The load average of the machines 
is under 0.2.

Now, about the spools. They are managed by Cyrus, so not quite Maildir but 
close. We now have 2 types of servers in use for spools:
24 x 1T SATA disks in RAID5
12 x 3T SATA disks in RAID5.
The mail spools and other mail-related filesystems are on XFS with the 
delaylog option. They run at an average of 200 TPS.

Yes, the expunges take some time. But we run the task every night for 
1/7 of the mailboxes, so every mailbox is expunged once a week. The 
expunge task runs for 2-3 hours on around 50k mailboxes.

I have done some tests with BTRFS for spools but I am quite disappointed -
horrible performance and horrible stability. 

The only other promising option was ZFS, but that would mean also switching 
the OS to FreeBSD or some form of Solaris. And we are not there yet.

Best regards
--
Luben Karavelov

Re: Premature "No Space left on device" on XFS

2011-10-09 Thread vg_ us



--
From: "Bron Gondwana" 
Sent: Sunday, October 09, 2011 4:29 PM
To: "Stan Hoeppner" 
Cc: 
Subject: Re: Premature "No Space left on device" on XFS


On Sun, Oct 09, 2011 at 02:31:19PM -0500, Stan Hoeppner wrote:

On 10/9/2011 8:36 AM, Bron Gondwana wrote:
> How many people are running their mail servers on 24-32 SAS spindles
> versus those running them on two spindles in RAID1?

These results are for a maildir type workload, i.e. POP/IMAP, not a
spool workload.  I believe I already stated previously that XFS is not
an optimal filesystem for a spool workload but would work well enough if
setup properly.  There's typically not enough spindles nor concurrency
to take advantage of XFS' strengths on a spool workload.


I'm honestly more interested in maildir type workload too, spool doesn't
get enough traffic usually to care about IO.


will postmark transaction test do? here - 
http://www.phoronix.com/scan.php?page=article&item=linux_2639_fs&num=1
stop arguing - I think postmark transaction was the only relevant test XFS 
was losing badly - not anymore...
search www.phoronix.com for other tests - there is one for every kernel 
version.


- Vadim Grigoryan



(sorry, getting a bit off topic for the postfix list)


> Wow - just what I love doing.  Building intimate knowledge of the
> XFS allocation group architecture to run up a mail server.  I'll
> get right on it.

As with anything you pick the right tool for the job.  If your job
requires the scalability of XFS you'd learn to use it.  Apparently your
workload doesn't.


We went with lots of small filesystems to reduce single points of
failure rather than one giant filesystem across all our spools.
I'm still convinced that it's a better way to do it, despite people
trying to convince me to throw all my eggs in one basket again.
SANs are great they say, never had any problems, they say.


> Sarcasm aside - if you ship with stupid-ass defaults, don't be
> surprised if people say the product isn't a good choice for
> regular users.

I think you missed the point.


No, not really.  I'm not going to advise people to use something that
requires a lot of tuning.


> I tried XFS for our workload (RAID1 sets, massive set of unlinks once
> per week when we do the weekly expunge cleanup) - and the unlinks were
> just so nasty that we decided not to use it.  I was really hoping for
> btrfs to be ready for prime-time by now, but that's looking unlikely
> to happen any time soon.

Take another look at XFS.  See below, specifically the unlink numbers in
the 2nd linked doc.


hmm...


> Maybe my tuning fu was bad - but you know what, I did a bit of reading
> and chose options that provided similar consistency guarantees to the
> options we were currently using with reiserfs.  Besides, 2.6.17 was
> still recent memory at the time, and it didn't encourage me much.

It was not your lack of tuning fu.  XFS metadata write performance was
abysmal before 2.6.35.  For example deleting a kernel source tree took
10+ times longer than EXT3/4.  Look at the performance since the delayed
logging patch was introduced in 2.6.35.  With a pure unlink workload
it's now up to par with EXT4 performance up to 4 threads, and surpasses
it by a factor of two or more at 8 threads and greater.  XFS' greatest
strength, parallelism, now covers unlink performance, where it was
severely lacking for many years, both on IRIX and Linux.

The design document:
http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead

Thread discussing the performance gains:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html


My goodness.  That's REALLY recent in filesystem times.  Something
that recent plus "all my eggs in one basket" of changing to a
large multi-spindle filesystem that would really get the benefits
of XFS would be more dangerous than I'm willing to consider.  That's
barely a year old.  At least we're not still running Debian's 2.6.32
any more, but still.

I'll run up some tests again some time, but I'm not thinking of
switching soon.

Bron.



Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 02:31:19PM -0500, Stan Hoeppner wrote:
> On 10/9/2011 8:36 AM, Bron Gondwana wrote:
> > How many people are running their mail servers on 24-32 SAS spindles
> > versus those running them on two spindles in RAID1?
> 
> These results are for a maildir type workload, i.e. POP/IMAP, not a
> spool workload.  I believe I already stated previously that XFS is not
> an optimal filesystem for a spool workload but would work well enough if
> setup properly.  There's typically not enough spindles nor concurrency
> to take advantage of XFS' strengths on a spool workload.

I'm honestly more interested in maildir type workload too, spool doesn't
get enough traffic usually to care about IO.

(sorry, getting a bit off topic for the postfix list)

> > Wow - just what I love doing.  Building intimate knowledge of the
> > XFS allocation group architecture to run up a mail server.  I'll
> > get right on it.
> 
> As with anything you pick the right tool for the job.  If your job
> requires the scalability of XFS you'd learn to use it.  Apparently your
> workload doesn't.

We went with lots of small filesystems to reduce single points of
failure rather than one giant filesystem across all our spools.
I'm still convinced that it's a better way to do it, despite people
trying to convince me to throw all my eggs in one basket again.
SANs are great they say, never had any problems, they say.

> > Sarcasm aside - if you ship with stupid-ass defaults, don't be
> > surprised if people say the product isn't a good choice for
> > regular users.
> 
> I think you missed the point.

No, not really.  I'm not going to advise people to use something that
requires a lot of tuning.

> > I tried XFS for our workload (RAID1 sets, massive set of unlinks once
> > per week when we do the weekly expunge cleanup) - and the unlinks were
> > just so nasty that we decided not to use it.  I was really hoping for
> > btrfs to be ready for prime-time by now, but that's looking unlikely
> > to happen any time soon.
> 
> Take another look at XFS.  See below, specifically the unlink numbers in
> the 2nd linked doc.

hmm...

> > Maybe my tuning fu was bad - but you know what, I did a bit of reading
> > and chose options that provided similar consistency guarantees to the
> > options we were currently using with reiserfs.  Besides, 2.6.17 was
> > still recent memory at the time, and it didn't encourage me much.
> 
> It was not your lack of tuning fu.  XFS metadata write performance was
> abysmal before 2.6.35.  For example deleting a kernel source tree took
> 10+ times longer than EXT3/4.  Look at the performance since the delayed
> logging patch was introduced in 2.6.35.  With a pure unlink workload
> it's now up to par with EXT4 performance up to 4 threads, and surpasses
> it by a factor of two or more at 8 threads and greater.  XFS' greatest
> strength, parallelism, now covers unlink performance, where it was
> severely lacking for many years, both on IRIX and Linux.
> 
> The design document:
> http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead
> 
> Thread discussing the performance gains:
> http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

My goodness.  That's REALLY recent in filesystem times.  Something
that recent plus "all my eggs in one basket" of changing to a
large multi-spindle filesystem that would really get the benefits
of XFS would be more dangerous than I'm willing to consider.  That's
barely a year old.  At least we're not still running Debian's 2.6.32
any more, but still.

I'll run up some tests again some time, but I'm not thinking of
switching soon.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Stan Hoeppner
On 10/9/2011 9:32 AM, Wietse Venema wrote:
> Stan Hoeppner:
>> On 10/8/2011 3:33 PM, Wietse Venema wrote:
>>> That's a lot of text. How about some hard numbers?
>>
>> Maybe not the perfect example, but here's one such high concurrency
>> synthetic mail server workload comparison showing XFS with a substantial
>> lead over everything but JFS, in which case the lead is much smaller:
>>
>> http://btrfs.boxacle.net/repository/raid/history/History_Mail_server_simulation._num_threads=128.html
> 
> I see no write operations, no unlink operations, and no rename
> operations.

Apologies.  I should have provided more links.  The site isn't set up for
easy navigation...

From the webroot of the site: http://btrfs.boxacle.net/

Mail Server (raid, single-disk)

Start with one million files spread across one thousand directories.
File sizes range from 1 kB to 1 MB
Each thread creates a new file, reads an entire existing file, or
deletes a file.
57% (4/7) reads
29% (2/7) creates
14% (1/7) deletes
All reads and writes are done in 4 kB blocks.

> Comments on performance are welcome, but I prefer that they are
> based on first-hand experience, and preferably on configurations
> that are likely to be seen in the wild.

I would love to publish first hand experience.  Unfortunately to
sufficiently demonstrate the gains I would need quite a few more
spindles than I currently have available.  With my current hardware the
gains w/XFS are in the statistical noise range as I can't sustain enough
parallelism at the spindles.  That said, there are plenty of mailbox
servers in the wild that would benefit from the XFS + linear concat
setup.  It doesn't require an insane drive count, such as the 136 in the
test system above, to demonstrate the gains, especially against EXT3/4
with RAID5/6 on the same set of disks.  I think somewhere between 16-32
should do it, which is probably somewhat typical of mailbox storage
servers at many sites.

Again, this setup is geared to parallel IMAP/POP and Postfix local
delivery type performance, not spool performance.  The discussion in
this thread drifted at one point away from strictly the spool.  I've
been addressing the other part.  Again, XFS isn't optimal for a typical
Postfix spool.  I never made that case.  For XFS to yield an increase in
spool performance would likely require an unrealistically high inbound
mail flow rate and a high spindle count to sink the messages.

I'll work on getting access to suitable hardware so I can publish some
thorough first hand head-to-head numbers, hopefully with a test harness
that will use Postfix/SMTP and Dovecot/IMAP instead of a purely
synthetic benchmark.

-- 
Stan


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Stan Hoeppner
On 10/9/2011 8:36 AM, Bron Gondwana wrote:

>> http://btrfs.boxacle.net/repository/raid/history/History_Mail_server_simulation._num_threads=128.html
> 
> Sorry - I don't see unlinks there.  Maybe I'm not reading very
> carefully...

Unfortunately the web isn't littered with a gazillion head-to-head
filesystem scalability benchmark results using a spool or maildir type
workload.  And I've yet to see one covering multiple operating systems.
 If not for the creation of BTRFS we'd not have the limited set of
results above, which to this point is the most comprehensive I've seen
for anything resembling recent Linux kernel versions.

> How many people are running their mail servers on 24-32 SAS spindles
> versus those running them on two spindles in RAID1?

These results are for a maildir type workload, i.e. POP/IMAP, not a
spool workload.  I believe I already stated previously that XFS is not
an optimal filesystem for a spool workload but would work well enough if
setup properly.  There's typically not enough spindles nor concurrency
to take advantage of XFS' strengths on a spool workload.

> Wow - just what I love doing.  Building intimate knowledge of the
> XFS allocation group architecture to run up a mail server.  I'll
> get right on it.

As with anything you pick the right tool for the job.  If your job
requires the scalability of XFS you'd learn to use it.  Apparently your
workload doesn't.

> Sarcasm aside - if you ship with stupid-ass defaults, don't be
> surprised if people say the product isn't a good choice for
> regular users.

I think you missed the point.

> I tried XFS for our workload (RAID1 sets, massive set of unlinks once
> per week when we do the weekly expunge cleanup) - and the unlinks were
> just so nasty that we decided not to use it.  I was really hoping for
> btrfs to be ready for prime-time by now, but that's looking unlikely
> to happen any time soon.

Take another look at XFS.  See below, specifically the unlink numbers in
the 2nd linked doc.

> Maybe my tuning fu was bad - but you know what, I did a bit of reading
> and chose options that provided similar consistency guarantees to the
> options we were currently using with reiserfs.  Besides, 2.6.17 was
> still recent memory at the time, and it didn't encourage me much.

It was not your lack of tuning fu.  XFS metadata write performance was
abysmal before 2.6.35.  For example deleting a kernel source tree took
10+ times longer than EXT3/4.  Look at the performance since the delayed
logging patch was introduced in 2.6.35.  With a pure unlink workload
it's now up to par with EXT4 performance up to 4 threads, and surpasses
it by a factor of two or more at 8 threads and greater.  XFS' greatest
strength, parallelism, now covers unlink performance, where it was
severely lacking for many years, both on IRIX and Linux.

The design document:
http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead

Thread discussing the performance gains:
http://oss.sgi.com/archives/xfs/2010-05/msg00329.html

-- 
Stan


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Wietse Venema
Stan Hoeppner:
> On 10/8/2011 3:33 PM, Wietse Venema wrote:
> > That's a lot of text. How about some hard numbers?
> 
> Maybe not the perfect example, but here's one such high concurrency
> synthetic mail server workload comparison showing XFS with a substantial
> lead over everything but JFS, in which case the lead is much smaller:
> 
> http://btrfs.boxacle.net/repository/raid/history/History_Mail_server_simulation._num_threads=128.html

I see no write operations, no unlink operations, and no rename
operations.

Comments on performance are welcome, but I prefer that they are
based on first-hand experience, and preferably on configurations
that are likely to be seen in the wild.

Wietse


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Bron Gondwana
On Sun, Oct 09, 2011 at 03:56:39AM -0500, Stan Hoeppner wrote:
> On 10/8/2011 3:33 PM, Wietse Venema wrote:
> > That's a lot of text. How about some hard numbers?
> 
> Maybe not the perfect example, but here's one such high concurrency
> synthetic mail server workload comparison showing XFS with a substantial
> lead over everything but JFS, in which case the lead is much smaller:
> 
> http://btrfs.boxacle.net/repository/raid/history/History_Mail_server_simulation._num_threads=128.html

Sorry - I don't see unlinks there.  Maybe I'm not reading very
carefully...

> If anyone has a relatively current (4 years) bare metal "lab" box with
> say 24-32 locally attached SAS drives (the more the better) to which I
> could get SSH KVM access, have pretty much free rein to destroy
> anything on it and build a proper test rig, I'd be happy to do a bunch
> of maildir type workload tests of the various Linux filesystems and
> publish the results, focusing on getting the XFS+linear concat info into
> public view.

How many people are running their mail servers on 24-32 SAS spindles
versus those running them on two spindles in RAID1?

> If not, but if someone with sufficient hardware would like to do this
> project him/herself, I'd be glad to assist getting the XFS+linear concat
> configured correctly.  Unfortunately it's not something one can setup
> without already having a somewhat intimate knowledge of the XFS
> allocation group architecture.  Once performance data is out there, and
> there is demand generated, I'll try to publish a how-to.

Wow - just what I love doing.  Building intimate knowledge of the
XFS allocation group architecture to run up a mail server.  I'll
get right on it.

Sarcasm aside - if you ship with stupid-ass defaults, don't be
surprised if people say the product isn't a good choice for
regular users.

> Wietse has called me out on my assertion.  The XFS allocation group
> design properly combined with a linear concat dictates the performance
> is greater for this workload, simply based on the IO math vs striped
> RAID.  All those who have stated they use it testify to the increased
> performance.  But no one has published competitive analysis yet.  I'd
> love to get such data published as it's a great solution and many could
> benefit from it, at least Linux users anyway--XFS is only available on
> Linux now that IRIX is dead...

I tried XFS for our workload (RAID1 sets, massive set of unlinks once
per week when we do the weekly expunge cleanup) - and the unlinks were
just so nasty that we decided not to use it.  I was really hoping for
btrfs to be ready for prime-time by now, but that's looking unlikely
to happen any time soon.

Maybe my tuning fu was bad - but you know what, I did a bit of reading
and chose options that provided similar consistency guarantees to the
options we were currently using with reiserfs.  Besides, 2.6.17 was
still recent memory at the time, and it didn't encourage me much.

Bron.


Re: Premature "No Space left on device" on XFS

2011-10-09 Thread Stan Hoeppner
On 10/8/2011 3:33 PM, Wietse Venema wrote:
> Stan Hoeppner:
>> On 10/8/2011 5:17 AM, Wietse Venema wrote:
>>> Stan Hoeppner:
>>>> nicely.  On the other hand, you won't see an EXTx filesystem capable of
>>>> anywhere close to 10GB/s or greater file IO.  Here XFS doesn't break a
>>>> sweat.
>>>
>>> I recall that XFS was optimized for fast read/write with large
>>> files, while email files are small, and have a comparatively high
>>> metadata overhead (updating directories, inodes etc.). XFS is
>>> probably not optimal here.
>>>
>>> Wietse
>>
>>
>> With modern XFS this really depends on the specific workload and custom
>> settings.  Default XFS has always been very good with large file
>> performance and has been optimized for such.  It was historically
>> hampered by write heavy metadata operations, but was sufficiently fast
>> with metadata read operations, especially at high parallelism.  The
>> 'delaylog' code introduced in 2009 has mostly alleviated the metadata
>> write performance issues.  Delaylog is the default mode since Linux 2.6.39.
>>
>> XFS is not optimized by default for the OP's specific mail workload, but
>> is almost infinitely tunable.  The OP has been given multiple options on
>> the XFS list to fix this problem.  XFS is not unsuitable for this
>> workload.  The 10GB XFS filesystem created by the OP for this workload
>> is not suitable.  Doubling the FS size or tweaking the inode layout
>> fixes the problem.
>>
>> As with most things, optimizing the defaults for some workloads may
>> yield less than optimal performance with others.  By default XFS is less
>> than optimal for a high concurrency maildir workload.  However with a
>> proper storage stack architecture and XFS optimizations it handily
>> outperforms all other filesystems.  This would be the "XFS linear
>> concatenation" setup I believe I've described here previously.
>>
>> XFS can do just about anything you want it to at any performance level
>> you need.  For the non default use cases, it simply requires knowledge,
>> planning, tweaking, testing, and tweaking to get it there, not to
>> mention time.  Alas, the learning curve is very steep.
> 
> That's a lot of text. How about some hard numbers?
> 
>   Wietse

Maybe not the perfect example, but here's one such high concurrency
synthetic mail server workload comparison showing XFS with a substantial
lead over everything but JFS, in which case the lead is much smaller:

http://btrfs.boxacle.net/repository/raid/history/History_Mail_server_simulation._num_threads=128.html

I don't have access to this system so I'm unable to demonstrate the
additional performance of an XFS+linear concat setup.  The throughput
would be considerably higher still.  The 8-way LVM stripe over 17 drive
RAID0 stripes would have caused hot and cold spots within the array
spindles, as wide stripe arrays always do with small file random IOPS
workloads.  Using a properly configured XFS+linear concat in these tests
would likely guarantee full concurrency on 128 of the 136 spindles.  I
say likely as I've not read the test code and don't know exactly how it
behaves WRT directory parallelism.

If anyone has a relatively current (4 years) bare metal "lab" box with
say 24-32 locally attached SAS drives (the more the better) to which I
could get SSH KVM access, have pretty much free rein to destroy
anything on it and build a proper test rig, I'd be happy to do a bunch
of maildir type workload tests of the various Linux filesystems and
publish the results, focusing on getting the XFS+linear concat info into
public view.

If not, but if someone with sufficient hardware would like to do this
project him/herself, I'd be glad to assist getting the XFS+linear concat
configured correctly.  Unfortunately it's not something one can set up
without already having a somewhat intimate knowledge of the XFS
allocation group architecture.  Once performance data is out there, and
there is demand generated, I'll try to publish a how-to.
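
A minimal sketch of the kind of layout being described - a linear
(concatenated) md device over RAID1 pairs with one XFS allocation group
per pair, so that concurrent directories land on independent spindles
(device names and counts are hypothetical, not a tuned recipe):

 mdadm --create /dev/md10 --level=linear --raid-devices=4 \
     /dev/md1 /dev/md2 /dev/md3 /dev/md4    # each mdN is a RAID1 pair
 mkfs.xfs -d agcount=4 /dev/md10            # one AG per concat member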

Wietse has called me out on my assertion.  The XFS allocation group
design, properly combined with a linear concat, dictates that performance
is greater for this workload, simply based on the IO math vs. striped
RAID.  All those who have stated they use it testify to the increased
performance.  But no one has published competitive analysis yet.  I'd
love to get such data published as it's a great solution and many could
benefit from it, at least Linux users anyway--XFS is only available on
Linux now that IRIX is dead...

-- 
Stan