Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-15 Thread Patrick Westenberg
Won't 15k U320 SCSI disks also be faster than average SATA disks for the
indexes?


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-15 Thread Javier de Miguel Rodríguez

On 15/12/10 14:28, Patrick Westenberg wrote:
Won't 15k U320 SCSI disks also be faster than average SATA disks for
the indexes?

I am using 2x RAID5 of 8 SAS 15k rpm disks for mailboxes and indexes.

I am evaluating migrating the indexes to 1x RAID 1+0 of 8 SAS 15k rpm
disks.


Regards

Javier


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-15 Thread Stan Hoeppner
Patrick Westenberg put forth on 12/15/2010 7:28 AM:
 Won't 15k U320 SCSI disks also be faster than average SATA disks for the
 indexes?

Yes.  A faster spindle speed allows for greater random IOPS.  Index file
reads/writes are random IOPS, and become more random with greater
concurrency, i.e. more users.

Although, I should point out that parallel SCSI (U320) is pretty much a
dead technology at this point.  AFAIK, no vendor has shipped a new
parallel SCSI disk line (only warranty replacements) for a number of
years now.  It has been superseded by Serial Attached SCSI (SAS).

I should also point out that the interface itself, whether parallel
SCSI, SAS, or SATA, makes little difference for random IOPS performance,
as long as the interface is running below the saturation point.  The
spindle speed and the drive firmware, specifically the queuing
implementation, are what make for a good transactional performance disk
drive.  Typically, 15k drives have the best transactional performance,
then 10k, then 7.2k, etc.

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-14 Thread Stan Hoeppner
Javier de Miguel Rodríguez put forth on 12/14/2010 6:15 AM:

  I attach a screenshot of the performance of the LeftHand: average 15
 MB/s, 1,700 IOPS. Highest load (today) is ~62 MB/s, with a whopping 9,000
 IOPS, much above the theoretical IOPS of 2 RAID5 arrays of 8 disks each
 (SAS 15K). The cache is working as expected, with a queue depth of 226
 (a bit overloaded, though).

Ahh, OK.  Using RAID5 makes a *lot* of difference with random write IOPS
throughput loads such as IMAP with maildir storage or transactional
databases.  For a transactional load like IMAP I assumed you were using
RAID10, which has about double the random write IOPS throughput of RAID5
on most RAID systems.

  I still think that my problem is IOPS related, not bandwidth related. My

That may very well be the case.  As I said, random write IOPS for RAID5
is pretty dismal compared to RAID 10.  Your average IOPS is currently
1,700 and your average queue depth is 12.  Dovecot is write heavy to
index files.  A 15K SAS drive maxes at about 250-300 head seeks/sec.
With RAID5, due to parity read/modify/write cycles for each stripe
block, you end up with about only 2 spindles worth of random write IOPS
seek performance, or about 500-600 random write IOPS.  This actually
gets worse as the number of disks in the parity array increases,
although read performance does scale with additional spindles.

With RAID 10, you get full seek bandwidth to half the disks, or about
1000-1200 IOPS for 8 disks.  At 1700 average IOPS, you are currently
outrunning your RAID5 write IOPS throughput by a factor of 3:1.  Your
disks can't keep up.  This is the reason for your high queue depth.
Even if you ran RAID10 on 8 disks they couldn't keep up with your
current average IOPS needs.  To keep up with your current IOPS load,
you'd need

6x15k SAS = 6x300 seeks/sec = 1800 seeks/sec = 12 SAS drives in RAID10

At *minimum* at this moment, you need a 12 drive RAID10 in each P4300
chassis to satisfy your IOPS needs if you continue to store both indexes
and mailboxen on the same P4300, which is impossible as it maxes out at 8
drives.
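
The arithmetic in this message can be written out as a small Python sketch. The 300 seeks/sec per 15k drive, the "about 2 spindles" RAID5 write figure, and the "half the disks" RAID10 figure are the rules of thumb quoted above, not measured values, and the function names are invented for illustration.

```python
import math

# Upper end of the 250-300 seeks/sec per 15k SAS drive quoted in the thread.
SEEKS_PER_15K_DRIVE = 300


def raid5_write_iops(seeks_per_drive=SEEKS_PER_15K_DRIVE):
    """Thread's rule of thumb: roughly 2 spindles' worth of random writes."""
    return 2 * seeks_per_drive


def raid10_write_iops(drives, seeks_per_drive=SEEKS_PER_15K_DRIVE):
    """Thread's rule of thumb: full seek rate on half of the drives."""
    return (drives // 2) * seeks_per_drive


def raid10_drives_needed(target_iops, seeks_per_drive=SEEKS_PER_15K_DRIVE):
    """Smallest even drive count whose RAID10 write IOPS meets the target."""
    mirrored_pairs = math.ceil(target_iops / seeks_per_drive)
    return 2 * mirrored_pairs


print(raid5_write_iops())          # 600
print(raid10_write_iops(8))        # 1200
print(raid10_drives_needed(1700))  # 12
```

With the 1,700 average IOPS reported earlier, this reproduces the figures in the message: about 600 write IOPS for the RAID5 set, 1,200 for an 8-disk RAID10, and 12 drives to keep up in RAID10.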

The load balancing feature of this product is not designed for
parallel transactional workloads.

 maximum bandwidth today was 60 MB/s, which fits entirely in 1 Gbps, but the
 queue depth is high because of the many IOPS (9,000) that only 16 disks
 cannot handle. I can buy better storage heads to delay all those writes, or
 avoid a lot of them by putting the indexes on an SSD or in a ramdisk.

It's 8 disks, not 16.  HP has led you to believe you actually get linear
IOPS scaling across both server boxes (RAID controllers), which I'm
pretty sure isn't the case.  For file server workloads it may scale
well, but not for transactional workloads.  For these, your performance
is limited to each 8 disk box.

What I would suggest at this point, if you have the spare resources, is
to setup a dedicated P4300 with 8 disks in RAID10, and put nothing on it
but your Dovecot index files (for now anyway).  This will allow you to
still maximize your mailbox storage capacity using RAID5 on the
currently deployed arrays (7 of 8 disks of usable space vs 4 disks with
RAID10), while relieving them of the high IOPS generated to/from the
indexes.

Optimally, in the future, assuming you can't go with SSDs for the
indexes (if you can, do it!), you will want to use the same setup I
mention above, with split index and mail store on separate P4300s with
RAID10 and RAID5 arrays respectively, but using Linux kernel 2.6.36 or
later with XFS and the delaylog mount option for your filesystem
solution for both indexes and mail store.  Combining all of these things
should give you a spindle (not cache) IOPS increase for Dovecot of at
least a factor of 4 over what you have now.

  Thanks for all the info, I did not know about Nexsan.

You're welcome Javier.

Nexsan makes great products, especially for the price.  They are very
popular with sites that need maximum space, good performance, and who
need to avoid single vendor lock-in for replication/backup.  Most of
their customers have heterogeneous SANs including arrays from the likes
of IBM, SUN, SGI, Nexsan, HP, DataDirect, HDS, etc, and necessarily use
third party backup/replication solutions instead of trying to manage each
vendor specific hardware solution.  Thus, Nexsan forgoes implementing
such hardware replication in most of its products, to keep costs down,
and to put those R&D dollars into increasing performance, density,
manageability, and power efficiency.  Their web management GUI is the
slickest, most intuitive, easiest to use interface I've yet seen on a
SAN array.

They have a lot of high profile U.S. government customers including NASA
and many/most of the U.S. nuclear weapons labs.  They've won tons of
industry awards over the past 10 years.  A recent example, Caltech
deployed 2 PB of Nexsan storage early this year to store Spitzer space
telescope data for NASA, a combination of 65 SATABeast and SATABoy 

Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Brad Davidson


On Dec 12, 2010, at 23:26, Javier de Miguel Rodríguez javierdemig...@us.es
wrote:

 
My SAN(s) (HP LeftHand Networks) do not support SSD, though. But I have 
 several LeftHand nodes, some of them with raid5, others with raid 1+0. 
 Maildirs+indexes are now in raid5, maybe I can separate the indexes to raid 
 1+0 iscsi target in a different san
 
I have two RAID5 arrays (7 disks + 1 spare each) and I have joined them via
 LVM striping. Each disk is SAS 15k rpm 450GB, and the SANs have 512 MB of
 battery-backed cache. In our real workload (imapsync), each RAID5 gives
 around 1,700-1,800 IOPS, 3,500 IOPS combined.

Your 'slow' storage is running against 16 15k RPM SAS drives? Those LeftHand 
controllers must be terrible. We have Maildir on NFS on a Netapp with 15k RPM 
450GB FC disks and have never had performance problems, even when running the 
controllers up against the wall by mounting with the noac option (60k NFS 
IOPS!). We were using 500GB 4500 RPM ATA disks at that point - doesn't get much 
slower than that.

Our current environment actually houses POP/IMAP/SMTP/web for 60k accounts, and 
an ESX cluster (12k NFS IOPS) without breaking a sweat. We'll soon be adding 
128 1TB disks to the same controllers for Exchange, and should still have 
capacity to spare.

Not particularly helpful to your situation I know, but next time you are 
looking at storage you might reevaluate your current strategy.

-Brad

Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Javier de Miguel Rodríguez

On 13/12/10 10:16, Brad Davidson wrote:


On Dec 12, 2010, at 23:26, Javier de Miguel Rodríguez javierdemig...@us.es
wrote:


My SAN(s) (HP LeftHand Networks) do not support SSD, though. But I have 
several LeftHand nodes, some of them with raid5, others with raid 1+0. 
Maildirs+indexes are now in raid5, maybe I can separate the indexes to raid 1+0 
iscsi target in a different san

I have two RAID5 arrays (7 disks + 1 spare each) and I have joined them via
LVM striping. Each disk is SAS 15k rpm 450GB, and the SANs have 512 MB of
battery-backed cache. In our real workload (imapsync), each RAID5 gives
around 1,700-1,800 IOPS, 3,500 IOPS combined.

Your 'slow' storage is running against 16 15k RPM SAS drives? Those LeftHand 
controllers must be terrible. We have Maildir on NFS on a Netapp with 15k RPM 
450GB FC disks and have never had performance problems, even when running the 
controllers up against the wall by mounting with the noac option (60k NFS 
IOPS!). We were using 500GB 4500 RPM ATA disks at that point - doesn't get much 
slower than that.


Can you give me (off-list if you prefer) more info about your
setup? I am interested in the number and type of spindles you are using.
We are using LeftHand because of its real-time replication
capabilities, something very interesting to us, and each node pair is
relatively cheap (8x 450 GB @ 15k rpm SAS disks per node, real-time
replication, 512 MB cache, about 25K € per node pair).


We can throw more hardware at this; let's see if using memory-based
indexes (via ramdisk) gives better results. Zlib compression on indexes
should be great for this.


Regards

Javier




Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Timo Sirainen
On Mon, 2010-12-13 at 10:26 +0100, Javier de Miguel Rodríguez wrote:
  We can throw more hardware at this; let's see if using memory-based
 indexes (via ramdisk) gives better results. Zlib compression on indexes
 should be great for this.

Isn't it possible for Linux to compress the ramdisk? It would be a bit
difficult to implement zlib compression for indexes for now. In future
they'll be written to via lib-fs API, which would make compression
possible too by adding a compression middle layer.




Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Timo Sirainen
On Sat, 2010-12-11 at 23:05 +0700, a...@test123.ru wrote:

 Does anybody know concrete answers? Let's consider IMAP and LDA and forget 
 POP3.
 
 1. Is migration to dovecot 2.0 good idea if I want to decrease I/O?

That alone makes no difference.

 2. Can mdbox help decrease IO?

Hopefully! No one has given me any real world numbers yet, though.




Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Seth Mattinen
On 12/12/2010 00:49, Stan Hoeppner wrote:
 
 Since Javier is looking for ways to decrease I/O load on the SAN, not
 necessarily increase Dovecot performance, I think putting the index
 files on a ramdisk is the best thing to try first.  It may not be a silver
 bullet.  If he's still got spare memory to add to this guest, doing both
 would be better.   Using a ramdisk for the index files will instantly
 remove all index I/O from the SAN.  More of Dovecot's IMAP I/O is to the
 index files than mail files isn't it?  So by moving the index files to
 ramdisk you should pretty much instantly remove half your SAN I/O load.
  This is assuming that Javier currently stores his index files on a SAN LUN.
 


Speaking of ramdisk/SSD, has anyone tried a PCIe SSD for indexes?

~Seth


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-13 Thread Stan Hoeppner
Javier de Miguel Rodríguez put forth on 12/13/2010 1:26 AM:

 Sadly, Red Hat Enterprise Linux 5 does not natively support XFS. I
 can install it via CentosPlus, but we need Red Hat support if something
 goes VERY wrong. Red Hat Enterprise Linux 6 supports XFS (and gives me
 dovecot 2.0), but maybe it is too early for a RHEL6 deployment for so
 many users (sigh).
 
 I will continue investigating about indexes. Any additional hint?

Dave Chinner is possibly the most prolific/active developer of XFS.  He
is the author of the XFS delayed logging code (up to 100x+ increase in
metadata write throughput).  He's a distinguished kernel engineer at Red
Hat.  I suggest you send an email to

x...@oss.sgi.com

and ask to be CC'd on replies if you don't join the list.

Briefly, but concisely, describe your current mail storage setup
including hardware (SAN topology and storage devices), software, mail
box store format, average/peak concurrent user load, and what filesystem
you currently use.

Then, save every reply you get and read them at least a couple of times
each, thoroughly.  You should then realize you're leaving a ton of
performance on the table, eating far more resources than you should be,
and generating more SAN traffic than you would using XFS for the
same workload, especially if you're able to use delaylog with a recent
kernel.  Delaylog pushes metadata operation loads almost entirely into
memory, dramatically decreasing physical I/Os over the SAN, while
simultaneously increasing throughput by an order of magnitude.  For
large maildir installations such as yours, you're almost committing a
crime by using EXT3 and not XFS.

Sadly, until a major distro makes XFS its default enterprise filesystem,
the bulk of the Linux world will have no clue what they've been missing
for the past 7+ years, which is very sad.  For any large parallel
workload, XFS trounces EXT3/4/BTRFS/Reiser by a factor of 5-100+
depending on the exact workload.

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Stan Hoeppner
Eric Rostetter put forth on 12/11/2010 9:48 AM:

 Well, it is true I know nothing about vmware/ESX.  I know in my virtual
 machine setups, I _can_ give the virtual instances access to devices which
 are not used by other virtual instances.  This is what I would do.  Yes,
 it is still virtualized, but it is dedicated, and should still perform
 pretty well -- faster than shared storage, and in the case of SSD faster
 than normal disk or iscsi.

He's running an ESX cluster, which assumes use of HA and Vmotion.  For
Vmotion to work, each node in the cluster must have direct hardware
access to every storage device.  Thus, to use an SSD, it would have to
be installed in Javier's iSCSI SAN array.  Many iSCSI arrays are
relatively inexpensive and don't offer SSD support.

However, Javier didn't ask for ways to increase his I/O throughput.  He
asked for the opposite.  I assume this is because they have a 1 GbE
based ethernet SAN, and probably only 2 or 4 GbE ports on the SAN array
controller.  With only 200 to 400MB/s bidirectional bandwidth, and many
busy guests in the ESX farm, probably many applications besides Dovecot,
Javier's organization is likely coming close to bumping up against
the bandwidth limits of the 1 GbE links on the SAN array controller.
Thus, adding an SSD to the mix would exacerbate the I/O problem.

Thus, putting the index files in a ramdisk or using the Dovecot memory
only index file parameter are really his only two options that I can
think of that will help in the way he desires.

 He was already asking about throwing memory at the problem, and I think
 he implied he had a lot of memory. As such, the caching is there already.
 Your statement is true, but it is also a zero config option if he really
 does have lots of memory in the machine.

He has physical memory available, but he isn't currently assigning it to
the Dovecot guest.  To do so would require changing the memory setting
in ESX for this guest, then rebooting the guest (unless both ESX and his
OS support hot plug memory--I don't know if ESX does).  This is what
Javier was referring to when stating adding memory.

 And in ext3, the flush rate.  Good point, that I forgot about.  It is set
 to a very small value by default (2-3 seconds maybe), and can be increased
 without too much danger (to say 10-30 seconds).

Just to be clear and accurate here, and it's probably a little OT to the
thread, XFS delaylog isn't designed to decrease filesystem log I/O
activity.  It was designed to dramatically increase the rate of write
operations to the journal log--metadata operations--and the I/O
efficiency for metadata ops.

The major visible benefit of this is a massive increase in delete
performance for many tens of thousands (or more) of files.  It decreases
journal log file fragmentation as more writes can be packed into each
inode due to in memory organization before the physical write.  This
packing thus decreases physical disk I/O as fewer, larger blocks are
written per I/O.  XFS with delaylog is an excellent match for maildir
storage.  It won't help much at all with mbox, very slightly more with
mdbox.

XFS delaylog is a _perfect_ match for the POP3 workload.  Each time a
user pulls, then deletes all messages, delaylog will optimize and then
burst the metadata journal write operations to disk, again, with far
fewer physical I/Os due to the inode optimization.

XFS with delaylog is now much faster than any version of ReiserFS, whose
claim to fame was lightning fast mass file deletion.  As of 2.6.36, XFS
is now the fastest filesystem, and not just on Linux, for almost any
workload.  This assumes real storage hardware that can handle
massive parallelization of reads and writes.  EXT3 is still faster on a
single disk system.  But EXT3 is the everyman filesystem, which is optimized
more for the single disk case.  XFS was and still is designed for large
parallel servers with big fast storage.

 Assuming normal downtime stats, this would still be a huge win.  Since the
 machine rarely goes down, it would rarely need to rebuild indexes, and
 hence
 would only run poorly a very small percentage of the time.   Of course, it
 could run _very_ poorly right after a reboot for a while, but then will be
 back to normal soon enough.

I totally concur.

 One way to help mitigate this if using a RAM disk is to have your shutdown
 script flush the RAM disk to physical disk (after stopping dovecot) and then
 reload it to RAM disk at startup (before starting dovecot).

Excellent idea Eric.  I'd never considered this.  Truly, that's a
fantastic, creative solution, and should be relatively straightforward
to implement.
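
A minimal Python sketch of the save/restore idea quoted above. It assumes the index files live in one directory tree on the ramdisk and that a second directory on persistent disk is available for the saved copy; both paths, and the two function names, are hypothetical. Hooking the calls into the init scripts so that they run after Dovecot stops and before it starts is left out.

```python
import shutil
from pathlib import Path


def save_indexes(ramdisk_dir, persistent_dir):
    """Copy the ramdisk index tree to persistent disk (run after Dovecot stops)."""
    persistent = Path(persistent_dir)
    if persistent.exists():
        # Drop the previous snapshot so deleted mailboxes do not linger.
        shutil.rmtree(persistent)
    shutil.copytree(ramdisk_dir, persistent, symlinks=True)


def restore_indexes(persistent_dir, ramdisk_dir):
    """Copy a saved snapshot back to the ramdisk (run before Dovecot starts).

    Returns False when no snapshot exists, in which case Dovecot simply
    rebuilds the indexes as users reconnect.
    """
    if not Path(persistent_dir).is_dir():
        return False
    # dirs_exist_ok lets us copy into an already mounted, empty ramdisk
    # (requires Python 3.8 or later).
    shutil.copytree(persistent_dir, ramdisk_dir, symlinks=True,
                    dirs_exist_ok=True)
    return True
```

Note that `shutil.copytree` preserves permissions and timestamps but not file ownership, so a real deployment with per-user index ownership would need an additional step (or a tool such as `cp -a` or `rsync -a`) to keep owners intact.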

 This isn't
 possible if
 you use the dovecot index memory settings though.

Yeah, I think the ramdisk is the way to go here.  At least if/until a
better solution can be found.  I don't really see there is one, other
than his org investing in a faster SAN architecture such as 4/8Gb FC or
10 Gbit iSCSI.

The former can be had relatively 

Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Stan Hoeppner
Patrick Westenberg put forth on 12/11/2010 5:12 AM:
 Stan Hoeppner schrieb:
 
 So, either:

 2.  Move indexes to memory
 
 What steps have to be done and what will the configuration look like
 to have your indexes in memory?

Regarding Dovecot 1.2.x, for maildir, I believe it would be something
like this:

mail_location = maildir:~/Maildir:INDEX=MEMORY

The :INDEX=MEMORY disables writing the index files to disk, and as the
name implies, I believe, simply keeps indexes in memory.

The docs say:

If you really want to, you can also disable the index files completely
by appending :INDEX=MEMORY.

My read of that is that indexing isn't disabled completely, merely
storing the indexes to disk is disabled.  The indexes are still built
and maintained in memory.

Timo, is that correct?


Also, due to the potential size of the index files (mine alone are 276
MB on an 877 MB mbox), you'll need to do some additional research to see
if this is a possibility for you.  If using a ramdisk, 100 mail boxen
like mine would require ~ 27.6 GB of RAM just for the index files.  This
would not be logical to do or feasible given the amount of RAM required.
 I don't know if, or how much, storing them in RAM via :INDEX=MEMORY
consumes, as compared to using a ramdisk.  The memory consumption may be
less or it may be more.  Timo should be able to answer this, and give a
recommendation as to whether this is even a sane thing to do.
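
The ramdisk estimate above is a straight multiplication, sketched here in Python so other mailbox counts or per-mailbox index sizes can be plugged in. The function name is invented, decimal units (1 GB = 1000 MB) are assumed, and filesystem overhead on the ramdisk is ignored.

```python
def ramdisk_size_gb(mailboxes, index_mb_per_mailbox):
    """Total index size in GB (1 GB = 1000 MB), ignoring filesystem overhead."""
    return mailboxes * index_mb_per_mailbox / 1000


# 100 mailboxes with 276 MB of index files each, as in the message above.
print(ramdisk_size_gb(100, 276))  # 27.6
```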

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Timo Sirainen
On 12.12.2010, at 9.39, Stan Hoeppner wrote:

 mail_location = maildir:~/Maildir:INDEX=MEMORY
 
 The :INDEX=MEMORY disables writing the index files to disk, and as the
 name implies, I believe, simply keeps indexes in memory.

I think maybe I shouldn't have called it INDEX=MEMORY, but rather something
like INDEX=DISABLE.

 If you really want to, you can also disable the index files completely
 by appending :INDEX=MEMORY.
 
 My read of that is that indexing isn't disabled completely, merely
 storing the indexes to disk is disabled.  The indexes are still built
 and maintained in memory.
 
 Timo, is that correct?

It's a per-connection in-memory index. Also there is no kind of caching of 
anything (dovecot.index.cache file, which is where most of Dovecot performance 
usually comes from).

 I don't know if, or how much, storing them in RAM via :INDEX=MEMORY
 consumes, as compared to using a ramdisk.  The memory consumption may be
 less or it may be more.  Timo should be able to answer this, and give a
 recommendation as to whether this is even a sane thing to do.

I think INDEX=MEMORY performance is going to suck. 
http://imapwiki.org/Benchmarking explains IMAP performance a bit more. By 
default Dovecot is the Dynamically caching server, but with INDEX=MEMORY it 
becomes Non-caching server.

Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Javier de Miguel Rodríguez


Thank you very much for all the responses in this thread. Now I have 
more questions:


- I have slow I/O (about 3,500-4,000 IOPS, measured via
imapsync). If I enable zlib compression in my maildirs, that should
lower the number of IOPS (less to read, less to write, fewer IOPS, more
CPU). Dovecot 2.0 is better for zlib (LDA support) than Dovecot 1.2.x.


- I understand that indexes should go to the fastest storage I own.
Somebody talked about storing them in a ramdisk and then backing them up to
disk on shutdown. I have several questions about that:


- In my setup I have 25,000+ users and almost 7,000,000
messages in my maildirs. How much memory would I need in
a ramdisk to hold that?


 - What happens if something fails? I think that if I
lose the indexes (e.g. kernel crash), the next time I boot
the system the ramdisk will be empty, so the indexes will have to be
recreated. Am I right?


- If I buy an SSD system and export that small, fast
storage via iSCSI, does zlib compression apply
to indexes?


- Any additional filesystem info? I am using ext3 on RHEL 5.5, in 
RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already 
use noatime, 4k blocksize)?



Regards

Javier






Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Eric Rostetter

Quoting Stan Hoeppner s...@hardwarefreak.com:


Also, due to the potential size of the index files (mine alone are 276
MB on an 877 MB mbox), you'll need to do some additional research to see
if this is a possibility for you.


That's rather high based on my users...  My largest user has 110M of indexes.
The next highest users are 54M, 52M, 43M, 38M, 32M, 30M, 27M, 23M, 22M, and
then tons of users in the teens...  So your situation doesn't seem to be
the norm...

I guess it depends on your site (users, quotas, number of folders per  
user, etc).


--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!



Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Eric Rostetter

Quoting Javier de Miguel Rodríguez javierdemig...@us.es:

- I understand that indexes should go to the fastest storage I
own. Somebody talked about storing them in a ramdisk and then backing
them up to disk on shutdown. I have several questions about that:


- In my setup I have 25.000+ users, almost 7.000.000  
messages in my maildir. How much memory should I  
need in a ramdisk to hold that?


Are you using dovecot with them now?  If so, then you can figure out how
much they are currently using. Otherwise, well, who knows?  It will
depend on the clients used (for dovecot.index.cache), as
well as how often they are accessed (for transaction log), and so on.

Maybe Timo or someone more into the inner workings can give more details.

 - What happens if something fails? I think that if
I lose the indexes (e.g. kernel crash), the next time I
boot the system the ramdisk will be empty, so the indexes will have to be
recreated. Am I right?


Yes.  If the ramdisk is full, it will switch to INDEX=MEMORY automatically
for all new sessions until space frees up.  And if you crash without saving
the indexes, it will rebuild them when the users reconnect.

- If I buy an SSD system and export that small,
fast storage via iSCSI, does zlib compression
apply to indexes?


I don't think so, but maybe Timo can say for sure.

- Any additional filesystem info? I am using ext3 on RHEL 5.5,  
in RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I  
already use noatime, 4k blocksize)?


Only the ext3 commit interval (raising it might lower your I/O on the SAN)
I mentioned earlier...  But of course, it raises your chances of losing
data in a crash (i.e. you could lose more data, since it flushes less often).
But it is a good trade off sometimes (I always raise it on my laptops in
order to cut down on battery usage).

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!



Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Stan Hoeppner
Eric Rostetter put forth on 12/12/2010 9:08 PM:
 Quoting Stan Hoeppner s...@hardwarefreak.com:
 
 Also, due to the potential size of the index files (mine alone are 276
 MB on an 877 MB mbox), you'll need to do some additional research to see
 if this is a possibility for you.
 
 That's rather high based on my users...  My largest user has 110M of
 indexes.
 The next highest users are 54M, 52M, 43M, 38M, 32M, 30M, 27M, 23M, 22M, and
 then tons of users in the teens...  So your situation doesn't seem to be
 the norm...

Oh, I'm positive of that Eric. :)

 I guess it depends on your site (users, quotas, number of folders per
 user, etc).

About 135 MB of that 276 MB mentioned above is search indexes.

This mailbox has a little over 62,000 emails, stored in mbox format.
Most of the 62K reside in less than 10 mbox files (IMAP folders).  It's
definitely out of the norm. :)

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Stan Hoeppner
Javier de Miguel Rodríguez put forth on 12/12/2010 1:26 PM:
 
 Thank you very much for all the responses in this thread. Now I have
 more questions:
 
 - I have slow I/O (about 3,500-4,000 IOPS, measured via
 imapsync). If I enable zlib compression in my maildirs, that should
 lower the number of IOPS (less to read, less to write, fewer IOPS, more
 CPU). Dovecot 2.0 is better for zlib (LDA support) than Dovecot 1.2.x.
 
 - I understand that indexes should go to the fastest storage I own.
 Somebody talked about storing them in a ramdisk and then backing them up to
 disk on shutdown. I have several questions about that:

For that many users I'm guessing you can't physically stuff enough RAM
into the machines in your ESX cluster to use a ramdisk for the index
files, and if you could, you probably couldn't, or wouldn't want to,
afford the DIMMs required to meet the need.

 - In my setup I have 25.000+ users, almost 7.000.000
 messages in my maildir. How much memory should I need in
 a ramdisk to hold that?
 
  - What happens if something fails? I think that if I
 lose the indexes (e.g. kernel crash), the next time I boot
 the system the ramdisk will be empty, so the indexes will have to be
 recreated. Am I right?

Given the size of your mail user base, I'd probably avoid the ramdisk
option, and go with a couple of striped (RAID 0) 100+ GB SSDs connected
on the iSCSI SAN.  This is an ESX cluster of more than one machine
correct?  You never confirmed this, but it seems a logical assumption
based on what you've stated.  If it's a single machine you should
obviously go with locally attached SATA II SSDs as it's far cheaper with
much greater real bandwidth by a factor of 100:1 vs iSCSI connection.

 - If I buy an SSD system and export that small, fast
 storage via iSCSI, does zlib compression apply
 to indexes?

Timo will have to answer this regarding zlib on indexes.

 - Any additional filesystem info? I am using ext3 on RHEL 5.5, in
 RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already
 use noatime, 4k blocksize)?

I'm shocked you're running 25K mailboxen with 7 million messages on
maildir atop EXT3!  On your  fast iSCSI SAN array, I assume with at
least 14 spindles in the RAID group LUN where the mail is stored, you
should be using XFS.

Formatted with the correct parameters, and mounted with the correct
options, XFS will give you _at minimum_ a factor of 2 performance gain
over EXT3 with 128 concurrent users.  As you add more concurrent users,
this ratio will grow even greater in XFS' favor.

XFS was designed specifically, and has been optimized since 1994, for
large parallel workloads.  EXT3 has traditionally been optimized for use
as a single user desktop filesystem.  Its performance with highly
parallel workloads pales in comparison to XFS:

http://btrfs.boxacle.net/repository/raid/2.6.35-rc5/2.6.35-rc5/2.6.35-rc5_Mail_server_simulation._num_threads=128.html


If you do add the two ~100 GB SSDs and stripe them via the RAID hardware
or via mdadm or LVM, format the resulting striped device with XFS and
give it 16 allocation groups.  If using kernel 2.6.36 or grater, mount
the resulting filesystem with the delaylog option.  You will be doing
yourself a huge favor, and will be amazed at the index performance.
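
As a rough sketch of the format and mount steps described above, the following Python helper only builds the two command lines and prints them; it does not run anything. The device name and mount point are hypothetical placeholders, and `noatime` is carried over from the poster's existing ext3 setup as an assumption. `agcount` is the `mkfs.xfs` data-section option for the number of allocation groups, and `delaylog` is the XFS mount option mentioned above.

```python
import shlex


def xfs_index_volume_commands(device, mount_point, agcount=16, use_delaylog=True):
    """Return the mkfs and mount command lines as argv lists, without running them."""
    # 16 allocation groups, as suggested for the striped SSD device.
    mkfs = ["mkfs.xfs", "-d", f"agcount={agcount}", device]

    # noatime is an assumption carried over from the existing ext3 mounts.
    options = ["noatime"]
    if use_delaylog:
        # Only meaningful on kernels that support the delaylog option.
        options.append("delaylog")
    mount = ["mount", "-t", "xfs", "-o", ",".join(options), device, mount_point]
    return [mkfs, mount]


if __name__ == "__main__":
    # /dev/md0 and /srv/dovecot-index are hypothetical names for illustration.
    for argv in xfs_index_volume_commands("/dev/md0", "/srv/dovecot-index"):
        print(shlex.join(argv))
```

Review the printed commands before running them by hand; `mkfs.xfs` destroys whatever is on the target device.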

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-12 Thread Javier de Miguel Rodrí­guez

Thank you for your responses, Stan. My replies are below.

For that many users I'm guessing you can't physically stuff enough RAM
into the machines in your ESX cluster to use a ramdisk for the index
files, and if you could, you probably couldn't, or wouldn't want to,
afford the DIMMs required to meet the need.

Yes, I have a cluster of 4 ESX servers. I am going to do some
scripting to see how much space we are allocating to indexes.
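
A minimal sketch of such a script, assuming the index files live under the mail root and use Dovecot's usual names (`dovecot.index`, `dovecot.index.log`, `dovecot.index.cache`). The default root `/var/vmail` is a hypothetical path; pass the real one as the first argument.

```python
import os
import sys

# Matches dovecot.index, dovecot.index.log and dovecot.index.cache.
INDEX_PREFIX = "dovecot.index"


def index_usage(root):
    """Walk root and return (file_count, total_bytes) for Dovecot index files."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(INDEX_PREFIX):
                continue
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
                count += 1
            except OSError:
                # The file vanished between listing and stat (mailbox in use).
                continue
    return count, total


if __name__ == "__main__":
    # /var/vmail is a hypothetical default mail root.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/vmail"
    files, size = index_usage(root)
    print(f"{files} index files, {size / 2**20:.1f} MiB")
```

The total gives a lower bound for sizing a ramdisk or SSD volume; leave headroom, since the cache files grow as clients request more message fields.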





 - In my setup I have 25,000+ users and almost 7,000,000
messages in my maildirs. How much memory would I need in
a ramdisk to hold that?

  - What happens if something fails? I think that if I
lose the indexes (e.g. a kernel crash), the next time I boot
the system the ramdisk will be empty, so the indexes will have to be
recreated. Am I right?

Given the size of your mail user base, I'd probably avoid the ramdisk
option, and go with a couple of striped (RAID 0) 100+ GB SSDs connected
on the iSCSI SAN.  This is an ESX cluster of more than one machine
correct?  You never confirmed this, but it seems a logical assumption
based on what you've stated.  If it's a single machine you should
obviously go with locally attached SATA II SSDs as it's far cheaper with
much greater real bandwidth by a factor of 100:1 vs iSCSI connection.



My SANs (HP LeftHand Networks) do not support SSDs, though. But I
have several LeftHand nodes, some with RAID 5 and others with RAID
1+0. Maildirs and indexes are both on RAID 5 now; maybe I can move the
indexes to a RAID 1+0 iSCSI target on a different SAN.




 - If I buy an SSD system and export that small, fast
storage via iSCSI, does zlib compression apply
to indexes?

Timo will have to answer this regarding zlib on indexes.



That would be rather interesting.



 - Any additional filesystem info? I am using ext3 on RHEL 5.5, in
RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already
use noatime, 4k blocksize)?

I'm shocked you're running 25K mailboxen with 7 million messages on
maildir atop EXT3!  On your  fast iSCSI SAN array, I assume with at
least 14 spindles in the RAID group LUN where the mail is stored, you
should be using XFS.



I have two RAID 5 arrays (7 disks + 1 spare each) and I have joined them
via LVM striping. Each disk is a 450 GB 15k rpm SAS drive, and the SANs
have 512 MB of battery-backed cache. In our real workload (imapsync), each
RAID 5 gives around 1,700-1,800 IOPS, about 3,500 IOPS combined.



Formatted with the correct parameters, and mounted with the correct
options, XFS will give you _at minimum_ a factor of 2 performance gain
over EXT3 with 128 concurrent users.  As you add more concurrent users,
this ratio will grow even greater in XFS' favor.


Sadly, Red Hat Enterprise Linux 5 does not natively support XFS. I
can install it via CentOSPlus, but we need Red Hat support if something
goes VERY wrong. Red Hat Enterprise Linux 6 supports XFS (and gives me
dovecot 2.0), but maybe it is too early for a RHEL6 deployment for so
many users (sigh).


I will continue investigating the indexes. Any additional hints?

Regards

Javier



Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-11 Thread Stan Hoeppner
Eric Rostetter put forth on 12/10/2010 10:11 PM:
 Quoting javierdemig...@us.es:
 
 in our vmware esx cluster. We want to minimize disk I/O, what config
 options should we use. We can exchange CPU & RAM to minimize disk i/o.
 
 Depends on what you are doing -- pop3, imap, both, deliver or some other
 LDA?  Do you care if the indexes are lost on reboot or not?
 
 You might try putting the indexes in memory (either via dovecot settings
 or a RAM DISK) or on SSD.

snipped good bare metal recommendations

Eric you missed up above that he's running Dovecot on an ESX cluster, so
SSDs or any hardware dedicated to Dovecot isn't possible for the OP.

Javier, email is an I/O intensive application, whether an MTA spool, an
IMAP server, or POP server.  The more concurrent users you have the
greater the file I/O.  Thus, the only way to decrease packets across
your iSCSI SAN is to increase memory so more disk blocks are cached.
But keep in mind, at one point or another, everything has to be written
to disk, or deleted from disk.  So, while you can decrease disk *reads*
by adding memory to the VM, you will never be able to decrease writes,
you can only delay them with things like write cache, or in the case of
XFS, the delaylog mount option.  These comments refer to mail file I/O.

IMAP is a very file I/O intensive application.  As Eric mentioned, you
could put your user *index* files in a RAM disk or make them memory
resident via Dovecot directive.  This would definitely decrease disk
reads and writes quite a bit.  Also as Eric mentioned, if you reboot you
lose the indexes, and along with them Dovecot's key performance enabler.
 User response times will be poor until the indexes get rebuilt.

If this is a POP server, then you really have no way around the disk I/O
issue.  Due to the nature of POP, there is very little opportunity to do
effective disk block caching, unless the bulk of your users configure
their clients to check mail every 60 seconds, 24 hours a day.  In this
scenario you have good opportunity for block caching.  However, if the
bulk of the users only pop the server every half hour or more, there is
no opportunity for file caching.  The files will be read from disk every
time a user pops the server.  However, in this scenario, you'd have
relatively low disk I/O load to begin with, and you'd not be inquiring
here.  Thus, I can only assume your Dovecot server is configured for IMAP.

So, either:

1.  Increase memory and/or
2.  Move indexes to memory

#1 will be less effective at decreasing I/O.  #2 will be very effective,
but at the cost of lost indexes upon reboot or crash.

-- 
Stan


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-11 Thread Eric Rostetter

Quoting Stan Hoeppner s...@hardwarefreak.com:


snipped good bare metal recommendations

Eric you missed up above that he's running Dovecot on an ESX cluster, so
SSDs or any hardware dedicated to Dovecot isn't possible for the OP.


Well, it is true I know nothing about vmware/ESX.  I know in my virtual
machine setups, I _can_ give the virtual instances access to devices which
are not used by other virtual instances.  This is what I would do.  Yes,
it is still virtualized, but it is dedicated, and should still perform
pretty well -- faster than shared storage, and in the case of SSD faster
than normal disk or iscsi.


Javier, email is an I/O intensive application, whether an MTA spool, an
IMAP server, or POP server.  The more concurrent users you have the
greater the file I/O.  Thus, the only way to decrease packets across
your iSCSI SAN is to increase memory so more disk blocks are cached.


He was already asking about throwing memory at the problem, and I think
he implied he had a lot of memory. As such, the caching is there already.
Your statement is true, but it is also a zero config option if he really
does have lots of memory in the machine.


But keep in mind, at one point or another, everything has to be written
to disk, or deleted from disk.  So, while you can decrease disk *reads*
by adding memory to the VM, you will never be able to decrease writes,
you can only delay them with things like write cache, or in the case of
XFS, the delaylog mount option.  These comments refer to mail file I/O.


And in ext3, the flush rate.  Good point, that I forgot about.  It is set
to a small value by default (the journal commit interval is 5 seconds), and
can be increased without too much danger (to say 10-30 seconds).


IMAP is a very file I/O intensive application.  As Eric mentioned, you
could put your user *index* files in a RAM disk or make them memory
resident via Dovecot directive.  This would definitely decrease disk
reads and writes quite a bit.  Also as Eric mentioned, if you reboot you
lose the indexes, and along with them Dovecot's key performance enabler.
 User response times will be poor until the indexes get rebuilt.


Assuming normal downtime stats, this would still be a huge win.  Since the
machine rarely goes down, it would rarely need to rebuild indexes, and hence
would only run poorly a very small percentage of the time.   Of course, it
could run _very_ poorly right after a reboot for a while, but then will be
back to normal soon enough.

One way to help mitigate this if using a RAM disk is to have your shutdown
script flush the RAM disk to physical disk (after stopping dovecot) and then
reload it to the RAM disk at startup (before starting dovecot).  This isn't
possible if you use the dovecot index memory settings though.
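
A minimal sketch of that save/restore step, assuming the indexes sit in their own directory tree on the RAM disk. Both paths are hypothetical, and hooking the script into the init scripts around the dovecot stop and start is left out. Note that `shutil.copytree` preserves modes and timestamps but not file ownership, so run it as the mail user or fix ownership afterwards.

```python
import shutil
import sys

# Hypothetical locations; adjust to the real RAM disk and a persistent disk.
RAMDISK_DIR = "/mnt/index-ramdisk"
BACKUP_DIR = "/var/lib/dovecot-index-backup"


def save_indexes(ramdisk_dir, backup_dir):
    """Copy the index tree from the RAM disk to persistent storage.

    Call this after dovecot has been stopped so the files are not changing.
    Files deleted from the RAM disk are not pruned from the backup.
    """
    shutil.copytree(ramdisk_dir, backup_dir, symlinks=True, dirs_exist_ok=True)


def restore_indexes(backup_dir, ramdisk_dir):
    """Copy the saved index tree back onto the (empty) RAM disk.

    Call this at boot, before dovecot is started.
    """
    shutil.copytree(backup_dir, ramdisk_dir, symlinks=True, dirs_exist_ok=True)


if __name__ == "__main__":
    action = sys.argv[1] if len(sys.argv) > 1 else ""
    if action == "save":
        save_indexes(RAMDISK_DIR, BACKUP_DIR)
    elif action == "restore":
        restore_indexes(BACKUP_DIR, RAMDISK_DIR)
    else:
        print("usage: index_sync.py save|restore")
```

If the machine crashes instead of shutting down cleanly, the saved copy is stale or missing. Dovecot can rebuild its indexes from the mail files, so the cost is a slow period after boot, not lost mail.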


If this is a POP server, then you really have no way around the disk I/O
issue.


I agree.  POP is very inefficient...


So, either:

1.  Increase memory and/or
2.  Move indexes to memory

#1 will be less effective at decreasing I/O.  #2 will be very effective,
but at the cost of lost indexes upon reboot or crash.


Still some room for filesystem tuning, of course, but the above two options
are of course the ones that will make the largest performance improvement
IMHO.


--
Stan


--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!



Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-11 Thread a
Guys. Who is interested in obvious reasoning? More memory, bare metal, depends
on your needs, bla-bla-bla. Let me remind you of the original concrete
question. I am also interested.

 We can exchange CPU & RAM to minimize disk i/o.
 Should we change to dovecot 2.0?
 Maybe mdbox can help us? 
 Maybe ext4 instead of ext3?

Does anybody know concrete answers? Let's consider IMAP and LDA and forget POP3.

1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?
2. Can mdbox help decrease I/O?
3. Which is better for mdbox or maildir: ext3 or ext4?


On Sat, 11 Dec 2010 09:48:36 -0600, Eric Rostetter rostet...@mail.utexas.edu 
wrote:
 Quoting Stan Hoeppner s...@hardwarefreak.com:
 
 snipped good bare metal recommendations

 Eric you missed up above that he's running Dovecot on an ESX cluster, so
 SSDs or any hardware dedicated to Dovecot isn't possible for the OP.
 
 Well, it is true I know nothing about vmware/ESX.  I know in my virtual
 machine setups, I _can_ give the virtual instances access to devices which
 are not used by other virtual instances.  This is what I would do.  Yes,
 it is still virtualized, but it is dedicated, and should still perform
 pretty well -- faster than shared storage, and in the case of SSD faster
 than normal disk or iscsi.
 
 Javier, email is an I/O intensive application, whether an MTA spool, an
 IMAP server, or POP server.  The more concurrent users you have the
 greater the file I/O.  Thus, the only way to decrease packets across
 your iSCSI SAN is to increase memory so more disk blocks are cached.
 
 He was already asking about throwing memory at the problem, and I think
 he implied he had a lot of memory. As such, the caching is there already.
 Your statement is true, but it is also a zero config option if he really
 does have lots of memory in the machine.
 
 But keep in mind, at one point or another, everything has to be written
 to disk, or deleted from disk.  So, while you can decrease disk *reads*
 by adding memory to the VM, you will never be able to decrease writes,
 you can only delay them with things like write cache, or in the case of
 XFS, the delaylog mount option.  These comments refer to mail file I/O.
 
 And in ext3, the flush rate.  Good point, that I forgot about.  It is set
 to a very small value by default (2-3 seconds maybe), and can be increased
 without too much danger (to say 10-30 seconds).
 
 IMAP is a very file I/O intensive application.  As Eric mentioned, you
 could put your user *index* files in a RAM disk or make them memory
 resident via Dovecot directive.  This would definitely decrease disk
 reads and writes quite a bit.  Also as Eric mentioned, if you reboot you
 lose the indexes, and along with them Dovecot's key performance enabler.
  User response times will be poor until the indexes get rebuilt.
 
 Assuming normal downtime stats, this would still be a huge win.  Since the
 machine rarely goes down, it would rarely need to rebuild indexes, and hence
 would only run poorly a very small percentage of the time.   Of course, it
 could run _very_ poorly right after a reboot for a while, but then will be
 back to normal soon enough.
 
 One way to help mitigate this if using a RAM disk is to have your shutdown
 script flush the RAM disk to physical disk (after stopping dovecot) and then
 reload it to the RAM disk at startup (before starting dovecot).  This isn't
 possible if you use the dovecot index memory settings though.
 
 If this is a POP server, then you really have no way around the disk I/O
 issue.
 
 I agree.  POP is very inefficient...
 
 So, either:

 1.  Increase memory and/or
 2.  Move indexes to memory

 #1 will be less effective at decreasing I/O.  #2 will be very effective,
 but at the cost of lost indexes upon reboot or crash.
 
 Still some room for filesystem tuning, of course, but the above two options
 are of course the ones that will make the largest performance improvement
 IMHO.
 
 --
 Stan




Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-11 Thread Eric Rostetter

Quoting a...@test123.ru:


Guys. Who is interested in obvious reasoning?


The same people who are interested in vague questions?


Let me remind you of the original concrete question. I am also interested.


We can exchange CPU & RAM to minimize disk i/o.
Should we change to dovecot 2.0?
Maybe mdbox can help us?
Maybe ext4 instead of ext3?


Uhm, well, again, depends on your needs.  Pop3? Imap? Both?  Number of
accounts?  Can't really help without more details.  Maybe I can't help
with more details either, but that is a risk you take on a mailing list.


1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?


Depends on what version you run now really.  But I would recommend it
anyway just on principle.


2. Can mdbox help decrease IO?
3. What is better for mdbox or maildir - ext3 or ext4?


Don't know.  But you can certainly tune the FS in either case (atime/dtime,
flush rate, external journal, etc).  Some will say XFS is better, etc.
Besides, you can hardly decide the best FS until you know the mailbox
format (mbox, maildir, mdbox, etc).

If you want concrete answers, you need concrete questions...

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!



Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-11 Thread Kerem Erciyes
Hi,

I am running a fair amount of stored e-mail on maildirs (10 GB+) in 846
folders that get a fair amount of searching, with 20+ users accessing them,
mostly via IMAP plus a few POP3 accounts. I am running these on a Linode Xen
server and have yet to hit any limits that would call for bare metal. The
user and virtual databases are plain text files.

# 1.2.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32.16-linode28 i686 Ubuntu 10.04.1 LTS ext3

Postfix + Dovecot + SSL for Both with Amavisd seems a breeze. No problems
related to infrastructure yet.

Still, I will wait to see how this system grows, as we are planning to
add more users and domains in 2011.

So:

1. I am very interested in these questions about performance.
2. My setup should show people another way to do things, since I am
not using MySQL, LDAP etc., just plain old text files updated via scripts.
3. I am going to test this system as we scale out. We are bound to add
LDAP authentication for single sign-on at some point, and I will try to
publish my benchmarks publicly, even if it is just for publicity's sake.

Regards,
Kerem


On Sat, Dec 11, 2010 at 6:47 PM, Eric Rostetter
rostet...@mail.utexas.edu wrote:

 Quoting a...@test123.ru:

  Guys. Who is interested in obvious reasoning?


 The same people who are interested in vague questions?


  Let me remind you of the original concrete question. I am also interested.

  We can exchange CPU & RAM to minimize disk i/o.
 Should we change to dovecot 2.0?
 Maybe mdbox can help us?
 Maybe ext4 instead of ext3?


 Uhm, well, again, depends on your needs.  Pop3? Imap? Both?  Number of
 accounts?  Can't really help without more details.  Maybe I can't help
 with more details either, but that is a risk you take on a mailing list.


  1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?


 Depends on what version you run now really.  But I would recommend it
 anyway just on principle.


  2. Can mdbox help decrease IO?
 3. What is better for mdbox or maildir - ext3 or ext4?


 Don't know.  But you can certainly tune the FS in either case (atime/dtime,
 flush rate, external journal, etc).  Some will say XFS is better, etc.
 Besides, you can hardly decide the best FS until you know the mailbox
 format (mbox, maildir, mdbox, etc).

 If you want concrete answers, you need concrete questions...


 --
 Eric Rostetter
 The Department of Physics
 The University of Texas at Austin

 Go Longhorns!




-- 
Kerem Erciyes
Sistem Danismani
http://proje.keremerciyes.com

kerem.erci...@gmail.com
+90 532 737 05 83


Re: [Dovecot] Question about slow storage but fast cpus, plenty of ram and dovecot

2010-12-10 Thread Eric Rostetter

Quoting javierdemig...@us.es:


in our vmware esx cluster. We want to minimize disk I/O, what config
options should we use. We can exchange CPU & RAM to minimize disk i/o.


Depends on what you are doing -- pop3, imap, both, deliver or some other
LDA?  Do you care if the indexes are lost on reboot or not?

You might try putting the indexes in memory (either via dovecot settings
or a RAM DISK) or on SSD.
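
To illustrate the choices, here is a small sketch that prints the matching `mail_location` lines for dovecot.conf. The maildir path and the index directories are hypothetical; `INDEX=MEMORY` and `INDEX=<path>` are the Dovecot location parameters for in-memory indexes and for indexes kept in a separate directory, and `%u` is Dovecot's username variable.

```python
def mail_location(maildir="~/Maildir", index=None):
    """Build a dovecot.conf mail_location line for a maildir setup.

    index=None     -> indexes stay next to the mail (Dovecot's default)
    index="MEMORY" -> indexes are kept in memory only
    index="/path"  -> indexes go to a separate directory (RAM disk or SSD)
    """
    location = f"maildir:{maildir}"
    if index is not None:
        location += f":INDEX={index}"
    return f"mail_location = {location}"


if __name__ == "__main__":
    # Both directories below are hypothetical mount points.
    print(mail_location())
    print(mail_location(index="MEMORY"))
    print(mail_location(index="/mnt/index-ramdisk/%u"))
    print(mail_location(index="/srv/ssd-index/%u"))
```

With `INDEX=MEMORY` nothing is written to disk, so there is nothing to save across a restart; a separate directory on a RAM disk or SSD keeps ordinary index files that can be copied or backed up.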

You could also try using ext3/4 with an external journal on a SSD.

SSD would preferably be an enterprise SSD, but it could be a lesser
SSD, or even a USB memory stick (replaced periodically).  Which is right
depends on your needs and budget.

Failing that, you should probably put the indexes on the fastest disks
possible (might be local, might be iscsi, you'd have to benchmark).


 Should we change to dovecot 2.0?


For any new system, I'd start with the most recent dovecot 2.x version.
How easy that is if you are upgrading depends on what version you run now.

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!