Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Stan Hoeppner wrote:

> Although, I should point out that parallel SCSI (U320) is pretty much a dead technology at this point. AFAIK, no vendor has shipped a new parallel SCSI disk line (only warranty replacements) for a number of years now. It has been superseded by Serial Attached SCSI (SAS).

That's for sure! :) But if your storage system is self-made and you have some fast U320 disks left, you could use them for the index files.
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Patrick Westenberg put forth on 12/15/2010 7:28 AM:

> Won't 15k U320 SCSI disks also be faster than average SATA disks for the indexes?

Yes. A faster spindle speed allows for greater random IOPS. Index file reads/writes are random I/O, and become more random with greater concurrency, i.e. more users.

Although, I should point out that parallel SCSI (U320) is pretty much a dead technology at this point. AFAIK, no vendor has shipped a new parallel SCSI disk line (only warranty replacements) for a number of years now. It has been superseded by Serial Attached SCSI (SAS).

I should also point out that the interface itself, whether parallel SCSI, SAS, or SATA, makes little difference for random IOPS performance, as long as the interface is running below the saturation point. The spindle speed and the drive firmware, specifically the queuing implementation, are what make for a good transactional performance disk drive. Typically, 15k drives have the best transactional performance, then 10k, then 7.2k, etc.

-- Stan
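Editorial sketch of why spindle speed dominates random IOPS: each random I/O costs roughly one average seek plus half a platter rotation. The seek times below are assumed, illustrative values and not vendor data; command queuing in the drive firmware usually pushes real-world figures higher than this simple model, so only the ordering between drive classes should be read from it.

```python
# Rough per-drive random IOPS estimate: one I/O costs an average seek plus
# half a platter rotation. Seek times are assumed, illustrative values.

def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency (half a revolution) in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm: int, avg_seek_ms: float) -> float:
    """Approximate random IOPS for a single spindle, ignoring queuing."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# name -> (rpm, assumed average seek time in ms); hypothetical numbers
DRIVES = {"15k": (15_000, 3.5), "10k": (10_000, 4.5), "7.2k": (7_200, 8.5)}

if __name__ == "__main__":
    for name, (rpm, seek_ms) in DRIVES.items():
        print(f"{name}: ~{estimated_iops(rpm, seek_ms):.0f} random IOPS")
```

The interface type does not appear anywhere in the model, which matches the point above: below saturation, the bus contributes nothing to the per-I/O cost.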
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On 15/12/10 14:28, Patrick Westenberg wrote:

> Won't 15k U320 SCSI disks also be faster than average SATA disks for the indexes?

I am using two RAID5 arrays of 8 SAS 15k rpm disks each for mailboxes and indexes. I am evaluating migrating the indexes to a single RAID 1+0 array of 8 SAS 15k rpm disks.

Regards

Javier
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Won't 15k U320 SCSI disks also be faster than average SATA disks for the indexes?
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Javier de Miguel Rodríguez put forth on 12/14/2010 6:15 AM:

> I attach a screenshot of the performance of the LeftHand: average 15 MB/s, 1,700 IOPS. Highest load (today) is ~62 MB/s, with a whopping 9,000 IOPS, much above the theoretical IOPS of 2 RAID5 arrays of 8 disks each (SAS 15K); the cache is working as expected, and queue depth is 226 (a bit overloaded, though)

Ahh, OK. Using RAID5 makes a *lot* of difference with random write IOPS loads such as IMAP with maildir storage or transactional databases. For a transactional load like IMAP I assumed you were using RAID10, which has about double the random write IOPS throughput of RAID5 on most RAID systems.

> I still think that my problem is IOPS related, not bandwidth related. My

That may very well be the case. As I said, random write IOPS for RAID5 is pretty dismal compared to RAID 10. Your average IOPS is currently 1,700 and your average queue depth is 12. Dovecot is write heavy to index files. A 15K SAS drive maxes out at about 250-300 head seeks/sec. With RAID5, due to parity read/modify/write cycles for each stripe block, you end up with only about 2 spindles' worth of random write seek performance, or about 500-600 random write IOPS. This actually gets worse as the number of disks in the parity array increases, although read performance does scale with additional spindles. With RAID 10, you get full seek bandwidth to half the disks, or about 1000-1200 IOPS for 8 disks.

At 1,700 average IOPS, you are currently outrunning your RAID5 write IOPS throughput by a factor of 3:1. Your disks can't keep up. This is the reason for your high queue depth. Even if you ran RAID10 on 8 disks they couldn't keep up with your current average IOPS needs.
To keep up with your current IOPS load, you'd need 6 x 15k SAS = 6 x 300 seeks/sec = 1800 seeks/sec = 12 SAS drives in RAID10.

At *minimum* at this moment, you need a 12 drive RAID10 in each P4300 chassis to satisfy your IOPS needs if you continue to store both indexes and mailboxen on the same P4300, which is impossible as it maxes out at 8 drives. The "load balancing" feature of this product is not designed for parallel transactional workloads.

> maximum bandwidth today was 60 MB/s, that fits entirely in 1 Gbps, but the queue depth is high because of the lot of IOPS (9,000) that "only" 16 disks cannot handle. I can buy better storage heads to delay all those writes, or avoid a lot of them by putting the indexes on an SSD or in a ramdisk.

It's 8 disks, not 16. HP has led you to believe you actually get linear IOPS scaling across both server boxes (RAID controllers), which I'm pretty sure isn't the case. For file server workloads it may scale well, but not for transactional workloads. For these, your performance is limited to each 8 disk box.

What I would suggest at this point, if you have the spare resources, is to set up a dedicated P4300 with 8 disks in RAID10, and put nothing on it but your Dovecot index files (for now anyway). This will allow you to still maximize your mailbox storage capacity using RAID5 on the currently deployed arrays (7 of 8 disks of usable space vs 4 disks with RAID10), while relieving them of the high IOPS generated to/from the indexes.

Optimally, in the future, assuming you can't go with SSDs for the indexes (if you can, do it!), you will want to use the same setup I mention above, with split index and mail store on separate P4300s with RAID10 and RAID5 arrays respectively, but using Linux kernel 2.6.36 or later with XFS and the delaylog mount option as the filesystem for both indexes and mail store.
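Editorial sketch: the spindle arithmetic in this message can be reproduced in a few lines. The rules of thumb (about two spindles' worth of random write IOPS for RAID5, half the spindles for RAID10, 250-300 seeks/sec per 15k drive) are the ones stated in the message; the function names are made up for illustration, and none of this is a measured model of the P4300.

```python
import math


def raid5_write_iops(per_drive_iops: int) -> int:
    """Rule of thumb from the message: about two spindles' worth of random
    write IOPS, regardless of how many drives are in the parity array."""
    return 2 * per_drive_iops


def raid10_write_iops(n_drives: int, per_drive_iops: int) -> int:
    """RAID10 gets full seek performance from half of the drives."""
    return (n_drives // 2) * per_drive_iops


def raid10_drives_needed(target_iops: int, per_drive_iops: int) -> int:
    """Drives needed in RAID10 to sustain target_iops of random writes."""
    return 2 * math.ceil(target_iops / per_drive_iops)


if __name__ == "__main__":
    for per_drive in (250, 300):
        print(f"{per_drive} seeks/s per drive: "
              f"RAID5 ~{raid5_write_iops(per_drive)} IOPS, "
              f"8-disk RAID10 ~{raid10_write_iops(8, per_drive)} IOPS")
    print("RAID10 drives for 1700 IOPS:", raid10_drives_needed(1700, 300))
```

With the average load of 1,700 IOPS and the optimistic 300 seeks/sec figure, this gives the 12 drive RAID10 minimum quoted above.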
Combining all of these things should give you a spindle (not cache) IOPS increase for Dovecot of at least a factor of 4 over what you have now.

> Thanks for all the info, I did not know about Nexsan.

You're welcome Javier. Nexsan makes great products, especially for the price. They are very popular with sites that need maximum space and good performance, and that need to avoid single vendor lock-in for replication/backup. Most of their customers have heterogeneous SANs including arrays from the likes of IBM, SUN, SGI, Nexsan, HP, DataDirect, HDS, etc, and necessarily use "third party" backup/replication solutions instead of trying to manage each vendor specific hardware solution. Thus, Nexsan forgoes implementing such hardware replication in most of its products, to keep costs down, and to put those R&D dollars into increasing performance, density, manageability, and power efficiency. Their web management GUI is the slickest, most intuitive, easiest to use interface I've yet seen on a SAN array.

They have a lot of high profile U.S. government customers including NASA and many/most of the U.S. nuclear weapons labs. They've won tons of industry awards over the past 10 years. A recent example: Caltech deployed 2 PB of Nexsan storage early this year to store Spitzer space telescope data for NASA, a combination of 65 SAT
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Javier de Miguel Rodríguez put forth on 12/13/2010 3:26 AM:

> Can you give me (off-list if you desire) more info about your setup? I am interested in the number and type of spindles you are using. We are using LeftHand because of their real time replication capabilities, something very interesting to us, and each node pair is relatively cheap (8x450 g...@15k rpm sas disks per node, real time replication, 512 MB cache, about 25K € each node pair).
>
> We can throw more hardware at this; let's see if using memory-based indexes (via ramdisk) we get better results. Zlib compression on indexes should be great for this.

Javier, your number and type of spindles aren't the source of your I/O bottleneck. Your array controller is. Or, more specifically, its lack of enough fast network ports is the problem. The P4300 BK718A model of which you speak only has two 1GbE iSCSI ports. That's only 200 MB/s full duplex. That's less link bandwidth than a single 2 Gbit Fibre Channel interface. 2 Gbit FC is 2 generations and over 8 years old now. It was superseded by 4 Gbit FC and then 8 Gbit FC, which is the current standard. 8 Gbit FC provides 800 MB/s full duplex bandwidth per link.

Merely two of the 8 SAS drives in the P4300 unit can saturate the iSCSI interfaces on the controller. This has always been the downside of low end iSCSI arrays. The same eight 450GB 15k spindles backed by a decent controller with 512MB to 1GB cache and dual 4 Gbit FC ports or 10 GbE ports would give you well over double the bandwidth using the same RAID level you do now, and would increase IOPS substantially.

You don't need more or faster spindles to increase your performance and decrease interface load, Javier. You need more interface throughput than your two 1 GbE ports currently provide.
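Editorial sketch of the link-bandwidth arithmetic above, using the same rounding as the message (about 100 MB/s per Gbit/s of link speed). The per-drive streaming rate of 100 MB/s is an assumed, illustrative figure for a 15k SAS disk and is not taken from the thread.

```python
import math

# Rounding used in the message: 1 Gbit/s of link speed ~ 100 MB/s
MB_PER_S_PER_GBIT = 100


def link_bandwidth_mb_s(ports: int, gbit_per_port: int) -> int:
    """Aggregate link bandwidth in MB/s for a set of identical ports."""
    return ports * gbit_per_port * MB_PER_S_PER_GBIT


def drives_to_saturate(link_mb_s: int, per_drive_mb_s: int) -> int:
    """Number of drives whose combined streaming rate fills the link."""
    return math.ceil(link_mb_s / per_drive_mb_s)


if __name__ == "__main__":
    iscsi = link_bandwidth_mb_s(ports=2, gbit_per_port=1)   # two 1 GbE ports
    fc8 = link_bandwidth_mb_s(ports=1, gbit_per_port=8)     # one 8 Gbit FC link
    print(f"2 x 1 GbE iSCSI: {iscsi} MB/s, 1 x 8 Gbit FC: {fc8} MB/s")
    # Assumed streaming rate of 100 MB/s per 15k SAS drive (hypothetical)
    print("Drives needed to fill the iSCSI links:", drives_to_saturate(iscsi, 100))
```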
You either need more 1 GbE ports per P4300, say 4 of them, link bonded to your ethernet switch as a single 400 MB/s full duplex channel (assuming the P4300 supports link bonding), or you need two 10 GbE ports per P4300, if it is upgradeable in this manner. If it is not, you need to move up to another HP model with much greater interface throughput, or a competing vendor's unit with 10 GbE or 8 Gbit FC ports. 1 GbE iSCSI is a budget solution, not a performance solution, and I think that's what you're beginning to realize.

Nexsan sells a low end 14 drive SAS array with dual 2 GB cache active/active controllers for about the same price you're paying for the P4300, maybe a little less. It has dual 4 Gbit FC ports _and_ dual 1 GbE iSCSI ports. It stomps the P4300 in performance when using the FC ports, offering a sustained 800+ MB/s and around 4,300 IOPS to disk (50K+ to cache), but it doesn't have the advanced management features of the HP unit such as built-in snapshots, etc.

IMHO VMware Consolidated Backup is a better SAN backup solution than node-to-node replication on the P4000 series, which is just one more load sucking the life out of the two tiny 1 GbE ports. Using VCB with an FC SAN is a much more flexible and higher performance disaster recovery solution than vendor specific hardware mirroring of two SAN arrays.

IMHO, you should look into a phased migration to an FC SAN. Qlogic's FC switches and HBAs are excellent, and affordable on a per port and per chassis basis. They are easy to set up. I taught myself to program the switches in the CLI in a day. The web GUI may be even easier.

For about $50-60K USD you could have the big brother to the Nexsan unit I mentioned above. It's called the SASBeast. It contains dual active/active controllers, each with 2 GB mirrored cache, two 8 Gbit FC ports, and two 1 GbE iSCSI ports, for a total of four 8 Gbit FC ports and four 1 GbE iSCSI ports. It contains 42 x 300, 450, or 600GB SAS 15k drives.
It can be expanded with one additional chassis containing another 60 SAS or SATA drives for a total of 104 drives. It offers sustained throughput of over 1.2 GB/s (about 4 times that of the P4300) and over 12,000 IOPS to disk with 42 SAS drives, 50K+ to cache, increasing to approximately 30k IOPS to disk with a fully populated expansion chassis w/60 SAS 15K drives, 104 drives total in the array. This device's performance is controller limited, similar to the P4300, but due to the processor, not the interfaces, simply because you can expand it to 104 SAS drives for less than $100K USD. If one needs maximum IOPS and bandwidth rather than total storage, one would be better off buying two SASBeasts than one unit with the expansion chassis. IIRC it's also a little cheaper.

There are many other fine SAN arrays on the market. I mention Nexsan merely because they are very affordable, fast as hell for the price, easy to manage, and I've used them. Due to the low price, they lack the more advanced features of higher priced units, such as snapshots, remote replication, etc. As I mentioned, I'm not a big fan of relying on SAN controllers to perform all my disaster recovery functions, so I d
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Javier de Miguel Rodríguez put forth on 12/13/2010 1:26 AM:

> Sadly, Red Hat Enterprise Linux 5 does not natively support XFS. I can install it via CentosPlus, but we need Red Hat support if something goes VERY wrong. Red Hat Enterprise Linux 6 supports XFS (and gives me dovecot 2.0), but maybe it is "too early" for a RHEL6 deployment for so many users (sigh).
>
> I will continue investigating indexes. Any additional hint?

Dave Chinner is possibly the most prolific/active developer of XFS. He is the author of the XFS delayed logging code (up to 100x+ increase in metadata write throughput). He's a distinguished kernel engineer at Red Hat.

I suggest you send an email to x...@oss.sgi.com and ask to be CC'd on replies if you don't join the list. Concisely describe your current mail storage setup including hardware (SAN topology and storage devices), software, mailbox store format, average/peak concurrent user load, and what filesystem you currently use. Then, save every reply you get and read them at least a couple of times each, thoroughly.

You should then realize you're leaving a ton of performance on the table, eating far more resources than you should be, and generating more SAN traffic than you would using XFS for the same workload, especially if you're able to use delaylog with a recent kernel. Delaylog pushes metadata operation loads almost entirely into memory, dramatically decreasing physical I/Os over the SAN, while simultaneously increasing throughput by an order of magnitude.

For large maildir installations such as yours, you're almost committing a crime by using EXT3 and not XFS. Sadly, until a major distro makes XFS its default enterprise filesystem, the bulk of the Linux world will have no clue what they've been missing for the past 7+ years. For any large parallel workload, XFS trounces EXT3/4/BTRFS/Reiser by a factor of 5-100+ depending on the exact workload.

-- Stan
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On 12/12/2010 00:49, Stan Hoeppner wrote:

> Since Javier is looking for ways to decrease I/O load on the SAN, not necessarily increase Dovecot performance, I think putting the index files on a ramdisk is the best thing to try first. It may not be a silver bullet. If he's still got spare memory to add to this guest, doing both would be better. Using a ramdisk for the index files will instantly remove all index I/O from the SAN. More of Dovecot's IMAP I/O is to the index files than mail files, isn't it? So by moving the index files to ramdisk you should pretty much instantly remove half your SAN I/O load. This is assuming that Javier currently stores his index files on a SAN LUN.

Speaking of ramdisk/SSD, has anyone tried a PCIe SSD for indexes?

~Seth
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On Sat, 2010-12-11 at 23:05 +0700, a...@test123.ru wrote:

> Does anybody know concrete answers? Let's consider IMAP and LDA and forget POP3.
>
> 1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?

That alone makes no difference.

> 2. Can mdbox help decrease I/O?

Hopefully! No one has given me any real world numbers yet, though.
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On Mon, 2010-12-13 at 10:26 +0100, Javier de Miguel Rodríguez wrote:

> We can throw more hardware at this; let's see if using memory-based indexes (via ramdisk) we get better results. Zlib compression on indexes should be great for this.

Isn't it possible for Linux to compress the ramdisk?

It would be a bit difficult to implement zlib compression for indexes for now. In the future they'll be written via the lib-fs API, which would make compression possible too by adding a compression middle layer.
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On 13/12/10 10:16, Brad Davidson wrote:

> On Dec 12, 2010, at 23:26, Javier de Miguel Rodríguez wrote:
>
>> My SAN(s) (HP LeftHand Networks) do not support SSD, though. But I have several LeftHand nodes, some of them with raid5, others with raid 1+0. Maildirs+indexes are now on raid5; maybe I can separate the indexes to a raid 1+0 iSCSI target in a different SAN.
>>
>> I have two raid5 arrays (7 disks + 1 spare) and I have joined them via LVM striping. Each disk is SAS 15k rpm 450GB, and the SANs have 512 MB battery-backed cache. In our real workload (imapsync), each raid5 gives around 1,700-1,800 IOPS, combined 3,500 IOPS.
>
> Your 'slow' storage is running against 16 15k RPM SAS drives? Those LeftHand controllers must be terrible. We have Maildir on NFS on a Netapp with 15k RPM 450GB FC disks and have never had performance problems, even when running the controllers up against the wall by mounting with the noac option (60k NFS IOPS!). We were using 500GB 4500 RPM ATA disks at that point - doesn't get much slower than that.

Can you give me (off-list if you desire) more info about your setup? I am interested in the number and type of spindles you are using. We are using LeftHand because of their real time replication capabilities, something very interesting to us, and each node pair is relatively cheap (8x450 g...@15k rpm sas disks per node, real time replication, 512 MB cache, about 25K € each node pair).

We can throw more hardware at this; let's see if using memory-based indexes (via ramdisk) we get better results. Zlib compression on indexes should be great for this.

Regards

Javier
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On Dec 12, 2010, at 23:26, Javier de Miguel Rodríguez wrote:

> My SAN(s) (HP LeftHand Networks) do not support SSD, though. But I have several LeftHand nodes, some of them with raid5, others with raid 1+0. Maildirs+indexes are now on raid5; maybe I can separate the indexes to a raid 1+0 iSCSI target in a different SAN.
>
> I have two raid5 arrays (7 disks + 1 spare) and I have joined them via LVM striping. Each disk is SAS 15k rpm 450GB, and the SANs have 512 MB battery-backed cache. In our real workload (imapsync), each raid5 gives around 1,700-1,800 IOPS, combined 3,500 IOPS.

Your 'slow' storage is running against 16 15k RPM SAS drives? Those LeftHand controllers must be terrible. We have Maildir on NFS on a Netapp with 15k RPM 450GB FC disks and have never had performance problems, even when running the controllers up against the wall by mounting with the noac option (60k NFS IOPS!). We were using 500GB 4500 RPM ATA disks at that point - doesn't get much slower than that.

Our current environment actually houses POP/IMAP/SMTP/web for 60k accounts, and an ESX cluster (12k NFS IOPS) without breaking a sweat. We'll soon be adding 128 1TB disks to the same controllers for Exchange, and should still have capacity to spare.

Not particularly helpful to your situation I know, but next time you are looking at storage you might reevaluate your current strategy.

-Brad
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Thank you for your responses Stan, I reply below.

> For that many users I'm guessing you can't physically stuff enough RAM into the machines in your ESX cluster to use a ramdisk for the index files, and if you could, you probably couldn't, or wouldn't want to, afford the DIMMs required to meet the need.

Yes, I have a cluster of 4 ESX servers. I am going to do some scripting to see how much space we are allocating to indexes.

>> - In my setup I have 25,000+ users, almost 7,000,000 messages in my maildir. How much memory should I need in a ramdisk to hold that?
>>
>> - What happens if something fails? I think that if I lose the indexes (e.g. kernel crash) the next time I boot the system the ramdisk will be empty, so the indexes should be recreated. Am I right?
>
> Given the size of your mail user base, I'd probably avoid the ramdisk option, and go with a couple of striped (RAID 0) 100+ GB SSDs connected on the iSCSI SAN. This is an ESX cluster of more than one machine, correct? You never confirmed this, but it seems a logical assumption based on what you've stated. If it's a single machine you should obviously go with locally attached SATA II SSDs as it's far cheaper with much greater real bandwidth by a factor of 100:1 vs iSCSI connection.

My SAN(s) (HP LeftHand Networks) do not support SSD, though. But I have several LeftHand nodes, some of them with raid5, others with raid 1+0. Maildirs+indexes are now on raid5; maybe I can separate the indexes to a raid 1+0 iSCSI target in a different SAN.

>> - If I buy an SSD system and export that little and fast storage via iSCSI, does zlib compression apply to indexes?
>
> Timo will have to answer this regarding zlib on indexes.

That would be rather interesting.

>> - Any additional filesystem info? I am using ext3 on RHEL 5.5; in RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already use noatime, 4k blocksize)?
>
> I'm shocked you're running 25K mailboxen with 7 million messages on maildir atop EXT3!
> On your fast iSCSI SAN array, I assume with at least 14 spindles in the RAID group LUN where the mail is stored, you should be using XFS.

I have two raid5 arrays (7 disks + 1 spare) and I have joined them via LVM striping. Each disk is SAS 15k rpm 450GB, and the SANs have 512 MB battery-backed cache. In our real workload (imapsync), each raid5 gives around 1,700-1,800 IOPS, combined 3,500 IOPS.

> Formatted with the correct parameters, and mounted with the correct options, XFS will give you _at minimum_ a factor of 2 performance gain over EXT3 with 128 concurrent users. As you add more concurrent users, this ratio will grow even greater in XFS' favor.

Sadly, Red Hat Enterprise Linux 5 does not natively support XFS. I can install it via CentosPlus, but we need Red Hat support if something goes VERY wrong. Red Hat Enterprise Linux 6 supports XFS (and gives me dovecot 2.0), but maybe it is "too early" for a RHEL6 deployment for so many users (sigh).

I will continue investigating indexes. Any additional hint?

Regards

Javier
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Javier de Miguel Rodríguez put forth on 12/12/2010 1:26 PM:

> Thank you very much for all the responses in this thread. Now I have more questions:
>
> - I have "slow" I/O (about 3,500-4,000 IOPS, measured via imapsync). If I enable zlib compression in my maildirs, that should lower the number of IOPS (less to read, less to write, fewer IOPS, more CPU). Dovecot 2.0 is better for zlib (lda support) than dovecot 1.2.X.
>
> - I understand that indexes should go to the fastest storage I own. Somebody talked about storing them in a ramdisk and then backing them up to disk on shutdown. I have several questions about that:

For that many users I'm guessing you can't physically stuff enough RAM into the machines in your ESX cluster to use a ramdisk for the index files, and if you could, you probably couldn't, or wouldn't want to, afford the DIMMs required to meet the need.

> - In my setup I have 25,000+ users, almost 7,000,000 messages in my maildir. How much memory should I need in a ramdisk to hold that?
>
> - What happens if something fails? I think that if I lose the indexes (e.g. kernel crash) the next time I boot the system the ramdisk will be empty, so the indexes should be recreated. Am I right?

Given the size of your mail user base, I'd probably avoid the ramdisk option, and go with a couple of striped (RAID 0) 100+ GB SSDs connected on the iSCSI SAN. This is an ESX cluster of more than one machine, correct? You never confirmed this, but it seems a logical assumption based on what you've stated. If it's a single machine you should obviously go with locally attached SATA II SSDs as it's far cheaper with much greater real bandwidth by a factor of 100:1 vs iSCSI connection.

> - If I buy an SSD system and export that little and fast storage via iSCSI, does zlib compression apply to indexes?

Timo will have to answer this regarding zlib on indexes.

> - Any additional filesystem info?
> I am using ext3 on RHEL 5.5; in RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already use noatime, 4k blocksize)?

I'm shocked you're running 25K mailboxen with 7 million messages on maildir atop EXT3! On your fast iSCSI SAN array, I assume with at least 14 spindles in the RAID group LUN where the mail is stored, you should be using XFS. Formatted with the correct parameters, and mounted with the correct options, XFS will give you _at minimum_ a factor of 2 performance gain over EXT3 with 128 concurrent users. As you add more concurrent users, this ratio will grow even greater in XFS' favor.

XFS was designed specifically, and has been optimized since 1994, for large parallel workloads. EXT3 has traditionally been optimized for use as a single user desktop filesystem. Its performance with highly parallel workloads pales in comparison to XFS:

http://btrfs.boxacle.net/repository/raid/2.6.35-rc5/2.6.35-rc5/2.6.35-rc5_Mail_server_simulation._num_threads=128.html

If you do add the two ~100 GB SSDs and stripe them via the RAID hardware or via mdadm or LVM, format the resulting striped device with XFS and give it 16 allocation groups. If using kernel 2.6.36 or greater, mount the resulting filesystem with the delaylog option. You will be doing yourself a huge favor, and will be amazed at the index performance.

-- Stan
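Editorial sketch of the format and mount steps suggested above, written as a small Python helper that only assembles the command lines and does not run them. The device path and mount point are hypothetical placeholders. `agcount` is the mkfs.xfs data-section option for the number of allocation groups, and `delaylog` is the mount option named in the message; whether that option is still accepted depends on the kernel version, so check the XFS documentation for your kernel before using it.

```python
from typing import List


def mkfs_xfs_cmd(device: str, agcount: int = 16) -> List[str]:
    """Command line to format `device` as XFS with `agcount` allocation groups."""
    return ["mkfs.xfs", "-d", f"agcount={agcount}", device]


def mount_xfs_cmd(device: str, mountpoint: str, delaylog: bool = True) -> List[str]:
    """Command line to mount an XFS filesystem with noatime and, optionally,
    the delaylog option mentioned in the message."""
    options = ["noatime"]
    if delaylog:
        options.append("delaylog")
    return ["mount", "-t", "xfs", "-o", ",".join(options), device, mountpoint]


if __name__ == "__main__":
    device = "/dev/md0"               # hypothetical striped SSD device
    mountpoint = "/srv/dovecot-index"  # hypothetical index location
    # Print only; running these would destroy data on the device.
    print(" ".join(mkfs_xfs_cmd(device)))
    print(" ".join(mount_xfs_cmd(device, mountpoint)))
```

Printing instead of executing is deliberate: formatting is destructive, so the commands are meant to be reviewed and run by hand.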
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Eric Rostetter put forth on 12/12/2010 9:08 PM:

> Quoting Stan Hoeppner:
>
>> Also, due to the potential size of the index files (mine alone are 276 MB on an 877 MB mbox), you'll need to do some additional research to see if this is a possibility for you.
>
> That's rather high based on my users... My largest user has 110M of indexes. The next highest users are 54M, 52M, 43M, 38M, 32M, 30M, 27M, 23M, 22M, and then tons of users in the teens... So your situation doesn't seem to be the norm...

Oh, I'm positive of that Eric. :)

> I guess it depends on your site (users, quotas, number of folders per user, etc).

About 135 MB of that 276 MB mentioned above is search indexes. This mailbox has a little over 62,000 emails, stored in mbox format. Most of the 62K reside in fewer than 10 mbox files (IMAP folders). It's definitely out of the norm. :)

-- Stan
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Quoting Javier de Miguel Rodríguez:

> - I understand that indexes should go to the fastest storage I own. Somebody talked about storing them in a ramdisk and then backing them up to disk on shutdown. I have several questions about that:
>
> - In my setup I have 25,000+ users, almost 7,000,000 messages in my maildir. How much memory should I need in a ramdisk to hold that?

Are you using dovecot with them now? If so, then you can figure out how much they are currently using. Otherwise, well, who knows? It will depend on the clients used (for dovecot.index.cache), as well as how often they are accessed (for the transaction log), and so on. Maybe Timo or someone more into the inner workings can give more details.

> - What happens if something fails? I think that if I lose the indexes (e.g. kernel crash) the next time I boot the system the ramdisk will be empty, so the indexes should be recreated. Am I right?

Yes. If the ramdisk is full, it will switch to INDEX=MEMORY automatically for all new sessions until space frees up. And if you crash without saving the indexes, it will rebuild them when the users reconnect.

> - If I buy an SSD system and export that little and fast storage via iSCSI, does zlib compression apply to indexes?

I don't think so, but maybe Timo can say for sure.

> - Any additional filesystem info? I am using ext3 on RHEL 5.5; in RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already use noatime, 4k blocksize)?

Only the ext3 commit interval (raising it might lower your I/O on the SAN) I mentioned earlier... But of course, it raises your chances of losing data in a crash (i.e. you could lose more data, since it flushes less often). But it is a good trade-off sometimes (I always raise it on my laptops in order to cut down on battery usage).

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!
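Editorial sketch of the "figure out how much they are currently using" step: a small script that walks a mail root and sums the size of Dovecot index files per user. It assumes a hypothetical layout of one directory per user directly under the root, and it treats any file whose name starts with `dovecot.index` (for example `dovecot.index.cache`) as an index file; adjust both assumptions to the real installation.

```python
import os
import sys
from collections import defaultdict
from typing import Dict


def index_bytes_per_user(root: str) -> Dict[str, int]:
    """Sum the size of files named dovecot.index* under each user directory.

    Assumes (hypothetically) that every first-level directory under `root`
    belongs to exactly one user.
    """
    totals: Dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files directly under the root belong to no user
        user = rel.split(os.sep)[0]
        for name in filenames:
            if name.startswith("dovecot.index"):
                totals[user] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    usage = index_bytes_per_user(sys.argv[1])
    # Largest consumers first, then the grand total a ramdisk would need to hold
    for user, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{user}: {size / 1e6:.1f} MB")
    print(f"total: {sum(usage.values()) / 1e9:.2f} GB")
```

The grand total is a lower bound for a ramdisk, since the indexes grow as clients cache more data.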
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Quoting Stan Hoeppner:

> Also, due to the potential size of the index files (mine alone are 276 MB on an 877 MB mbox), you'll need to do some additional research to see if this is a possibility for you.

That's rather high based on my users... My largest user has 110M of indexes. The next highest users are 54M, 52M, 43M, 38M, 32M, 30M, 27M, 23M, 22M, and then tons of users in the teens... So your situation doesn't seem to be the norm...

I guess it depends on your site (users, quotas, number of folders per user, etc).

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Thank you very much for all the responses in this thread. Now I have more questions:

- I have "slow" I/O (about 3,500-4,000 IOPS, measured via imapsync). If I enable zlib compression in my maildirs, that should lower the number of IOPS (less to read, less to write, fewer IOPS, more CPU). Dovecot 2.0 is better for zlib (lda support) than dovecot 1.2.X.

- I understand that indexes should go to the fastest storage I own. Somebody talked about storing them in a ramdisk and then backing them up to disk on shutdown. I have several questions about that:

- In my setup I have 25,000+ users, almost 7,000,000 messages in my maildir. How much memory should I need in a ramdisk to hold that?

- What happens if something fails? I think that if I lose the indexes (e.g. kernel crash) the next time I boot the system the ramdisk will be empty, so the indexes should be recreated. Am I right?

- If I buy an SSD system and export that little and fast storage via iSCSI, does zlib compression apply to indexes?

- Any additional filesystem info? I am using ext3 on RHEL 5.5; in RHEL 5.6 ext4 will be supported. Any performance hint/tuning (I already use noatime, 4k blocksize)?

Regards

Javier
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
On 12.12.2010, at 9.39, Stan Hoeppner wrote:

> mail_location = maildir:~/Maildir:INDEX=MEMORY
>
> The ":INDEX=MEMORY" disables writing the index files to disk, and as the name implies, I believe, simply keeps indexes in memory.

I think maybe I shouldn't have called it INDEX=MEMORY, but rather something more like INDEX=DISABLE.

> "If you really want to, you can also disable the index files completely by appending :INDEX=MEMORY."
>
> My read of that is that indexing isn't disabled completely, merely storing the indexes to disk is disabled. The indexes are still built and maintained in memory.
>
> Timo, is that correct?

It's a per-connection in-memory index. Also there is no kind of caching of anything (the dovecot.index.cache file, which is where most of Dovecot's performance usually comes from).

> I don't know if, or how much, storing them in RAM via :INDEX=MEMORY consumes, as compared to using a ramdisk. The memory consumption may be less or it may be more. Timo should be able to answer this, and give a recommendation as to whether this is even a sane thing to do.

I think INDEX=MEMORY performance is going to suck. http://imapwiki.org/Benchmarking explains IMAP performance a bit more. By default Dovecot is the "Dynamically caching server", but with INDEX=MEMORY it becomes "Non-caching server".
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Patrick Westenberg put forth on 12/11/2010 5:12 AM:

> Stan Hoeppner wrote:
>
>> So, either:
>>
>> 2. Move indexes to memory
>
> What steps have to be done and what will the configuration look like
> to have your indexes in memory?

Regarding Dovecot 1.2.x, for maildir, I believe it would be something like this:

mail_location = maildir:~/Maildir:INDEX=MEMORY

The ":INDEX=MEMORY" disables writing the index files to disk, and as the name implies, I believe, simply keeps indexes in memory. The docs say:

"If you really want to, you can also disable the index files completely by appending :INDEX=MEMORY."

My read of that is that indexing isn't disabled completely, merely storing the indexes to disk is disabled. The indexes are still built and maintained in memory.

Timo, is that correct?

Also, due to the potential size of the index files (mine alone are 276 MB on an 877 MB mbox), you'll need to do some additional research to see if this is a possibility for you. If using a ramdisk, 100 mail boxen like mine would require ~27.6 GB of RAM just for the index files. That would be neither logical nor feasible given the amount of RAM required.

I don't know if, or how much, storing them in RAM via :INDEX=MEMORY consumes, as compared to using a ramdisk. The memory consumption may be less or it may be more. Timo should be able to answer this, and give a recommendation as to whether this is even a sane thing to do.

--
Stan
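The ramdisk arithmetic above can be written out as a small Python sketch. The 276 MB and 877 MB figures are Stan's; the function names are made up for illustration, 1 GB is taken as 1000 MB to match the 27.6 GB figure, and assuming every mailbox has an index as large as Stan's is only a worst-case guess, since index and cache size depends on mailbox size and client behaviour.

```python
def ramdisk_size_gb(mailboxes: int, index_mb_per_mailbox: float) -> float:
    """Total index size in GB (1 GB = 1000 MB), assuming equal-sized indexes."""
    return mailboxes * index_mb_per_mailbox / 1000


def index_to_mail_ratio(index_mb: float, mail_mb: float) -> float:
    """Index size as a fraction of the mail it describes."""
    return index_mb / mail_mb


# Stan's example: 100 mailboxes, each with 276 MB of index files.
print(ramdisk_size_gb(100, 276))                 # 100 * 276 / 1000 = 27.6
# His index files are roughly a third the size of the mbox itself.
print(round(index_to_mail_ratio(276, 877), 3))   # 0.315
```

Measuring the existing on-disk index directories (for example with `du`) would give a far better estimate than scaling from a single mailbox.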
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Eric Rostetter put forth on 12/11/2010 9:48 AM:

> Well, it is true I know nothing about vmware/ESX. I know in my virtual
> machine setups, I _can_ give the virtual instances access to devices which
> are not used by other virtual instances. This is what I would do. Yes,
> it is still virtualized, but it is dedicated, and should still perform
> pretty well -- faster than shared storage, and in the case of SSD faster
> than normal disk or iscsi.

He's running an ESX cluster, which assumes use of HA and Vmotion. For Vmotion to work, each node in the cluster must have direct hardware access to every storage device. Thus, to use an SSD, it would have to be installed in Javier's iSCSI SAN array. Many iSCSI arrays are relatively inexpensive and don't offer SSD support.

However, Javier didn't ask for ways to increase his I/O throughput. He asked for the opposite. I assume this is because they have a 1 GbE based Ethernet SAN, and probably only 2 or 4 GbE ports on the SAN array controller. With only 200 to 400 MB/s bidirectional bandwidth, and many busy guests in the ESX farm, probably running many applications besides Dovecot, Javier's organization is likely coming close to bumping up against the bandwidth limits of the 1 GbE links on the SAN array controller. Thus, adding an SSD to the mix would exacerbate the I/O problem.

Thus, putting the index files in a ramdisk or using the Dovecot memory-only index file parameter are really the only two options I can think of that will help in the way he desires.

> He was already asking about throwing memory at the problem, and I think
> he implied he had a lot of memory. As such, the caching is there already.
> Your statement is true, but it is also a "zero config" option if he really
> does have lots of memory in the machine.

He has physical memory available, but he isn't currently assigning it to the Dovecot guest. To do so would require changing the memory setting in ESX for this guest, then rebooting the guest (unless both ESX and his OS support hot plug memory--I don't know if ESX does). This is what Javier was referring to when stating "adding memory".

> And in ext3, the flush rate. Good point, that I forgot about. It is set
> to a very small value by default (2-3 seconds maybe), and can be increased
> without too much danger (to say 10-30 seconds).

Just to be clear and accurate here, and it's probably a little OT to the thread, XFS delaylog isn't designed to decrease filesystem log I/O activity. It was designed to dramatically increase the rate of write operations to the journal log--metadata operations--and the I/O efficiency for metadata ops. The major visible benefit of this is a massive increase in delete performance for many tens of thousands (or more) of files. It decreases journal log file fragmentation as more writes can be packed into each inode due to in-memory organization before the physical write. This packing thus decreases physical disk I/O, as fewer, larger blocks are written per I/O.

XFS with delaylog is an excellent match for maildir storage. It won't help much at all with mbox, very slightly more with mdbox. XFS delaylog is a _perfect_ match for the POP3 workload. Each time a user pulls, then deletes all messages, delaylog will optimize and then burst the metadata journal write operations to disk, again, with far fewer physical I/Os due to the inode optimization.

XFS with delaylog is now much faster than any version of ReiserFS, whose claim to fame was lightning fast mass file deletion. As of 2.6.36, XFS is now the fastest filesystem, and not just on Linux, for almost any workload. This assumes real storage hardware that can handle massive parallelization of reads and writes. EXT3 is still faster on a single disk system. But EXT3 is the "everyman" filesystem, which is optimized more for the single disk case. XFS was and still is designed for large parallel servers with big fast storage.

> Assuming normal downtime stats, this would still be a huge win. Since the
> machine rarely goes down, it would rarely need to rebuild indexes, and
> hence would only run poorly a very small percentage of the time. Of course,
> it could run _very_ poorly right after a reboot for a while, but then will
> be back to normal soon enough.

I totally concur.

> One way to help mitigate this if using a RAM disk is to have your shutdown
> script flush the RAM disk to physical disk (after stopping dovecot) and then
> reload it to RAM disk at startup (before starting dovecot).

Excellent idea Eric. I'd never considered this. Truly, that's a fantastic, creative solution, and should be relatively straightforward to implement.

> This isn't possible if you use the dovecot index memory settings though.

Yeah, I think the ramdisk is the way to go here. At least if/until a better solution can be found. I don't really see that there is one, other than his org investing in a faster SAN architecture such as 4/8Gb FC or 10 Gbit iSCSI. The former
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Hi,

I am running a fair amount of stored e-mail in maildirs (10 GB+) across 846 folders that get a fair amount of searching, with 20+ users accessing them, mostly via IMAP and a few POP3 accounts. I am running these on a Linode Xen server and have yet to hit any hard limits of "bare metal". User and virtual databases are plain text files.

# 1.2.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32.16-linode28 i686 Ubuntu 10.04.1 LTS ext3

Postfix + Dovecot + SSL for both, with Amavisd, has been a breeze. No problems related to infrastructure yet. I will wait to see how this system grows, as we are planning to include more users and domains in our system in 2011.

So:

1. I am very interested in these questions about performance.
2. My setup should show people another way to do things, since I am not using MySQL, LDAP etc., just plain old text files updated via scripts.
3. I am going to test this system as we scale out, though we are bound to add LDAP for authentication for single sign-on at some point, and I will try to publish my benchmarks, even if it is just for publicity's sake.

Regards,
Kerem

On Sat, Dec 11, 2010 at 6:47 PM, Eric Rostetter wrote:

> Quoting a...@test123.ru:
>
>> Guys. Who is interested in obvious reasoning?
>
> The same people who are interested in vague questions?
>
>> Let me remind you of the original concrete question. I am also interested.
>>
>>> We can "exchange" CPU & RAM to minimize disk i/o.
>>> Should we change to dovecot 2.0?
>>> Maybe mdbox can help us?
>>> Maybe ext4 instead of ext3?
>
> Uhm, well, again, depends on your needs. Pop3? Imap? Both? Number of
> accounts? Can't really help without more details. Maybe I can't help
> with more details either, but that is a risk you take on a mailing list.
>
>> 1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?
>
> Depends on what version you run now really. But I would recommend it
> anyway just on principle.
>
>> 2. Can mdbox help decrease IO?
>> 3. What is better for mdbox or maildir - ext3 or ext4?
>
> Don't know. But you can certainly tune the FS in either case (atime/dtime,
> flush rate, external journal, etc). Some will say XFS is better, etc.
> Besides, you can hardly decide the best FS until you know the mailbox
> format (mbox, maildir, mdbox, etc).
>
> If you want concrete answers, you need concrete questions...
>
> --
> Eric Rostetter
> The Department of Physics
> The University of Texas at Austin
>
> Go Longhorns!

--
Kerem Erciyes
Sistem Danismani
http://proje.keremerciyes.com
kerem.erci...@gmail.com
+90 532 737 05 83
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Quoting a...@test123.ru:

> Guys. Who is interested in obvious reasoning?

The same people who are interested in vague questions?

> Let me remind you of the original concrete question. I am also interested.
>
>> We can "exchange" CPU & RAM to minimize disk i/o.
>> Should we change to dovecot 2.0?
>> Maybe mdbox can help us?
>> Maybe ext4 instead of ext3?

Uhm, well, again, depends on your needs. Pop3? Imap? Both? Number of accounts? Can't really help without more details. Maybe I can't help with more details either, but that is a risk you take on a mailing list.

> 1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?

Depends on what version you run now really. But I would recommend it anyway just on principle.

> 2. Can mdbox help decrease IO?
> 3. What is better for mdbox or maildir - ext3 or ext4?

Don't know. But you can certainly tune the FS in either case (atime/dtime, flush rate, external journal, etc). Some will say XFS is better, etc. Besides, you can hardly decide the best FS until you know the mailbox format (mbox, maildir, mdbox, etc).

If you want concrete answers, you need concrete questions...

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Guys. Who is interested in obvious reasoning? More memory, bare metal, depends on your needs, bla-bla-bla.

Let me remind you of the original concrete question. I am also interested.

> We can "exchange" CPU & RAM to minimize disk i/o.
> Should we change to dovecot 2.0?
> Maybe mdbox can help us?
> Maybe ext4 instead of ext3?

Does anybody know concrete answers? Let's consider IMAP and LDA and forget POP3.

1. Is migration to dovecot 2.0 a good idea if I want to decrease I/O?
2. Can mdbox help decrease IO?
3. What is better for mdbox or maildir - ext3 or ext4?

On Sat, 11 Dec 2010 09:48:36 -0600, Eric Rostetter wrote:

> Quoting Stan Hoeppner:
>
>> Eric, you missed up above that he's running Dovecot on an ESX cluster, so
>> SSDs or any hardware dedicated to Dovecot isn't possible for the OP.
>
> Well, it is true I know nothing about vmware/ESX. I know in my virtual
> machine setups, I _can_ give the virtual instances access to devices which
> are not used by other virtual instances. This is what I would do. Yes,
> it is still virtualized, but it is dedicated, and should still perform
> pretty well -- faster than shared storage, and in the case of SSD faster
> than normal disk or iscsi.
>
>> Javier, email is an I/O intensive application, whether an MTA spool, an
>> IMAP server, or POP server. The more concurrent users you have the
>> greater the file I/O. Thus, the only way to decrease packets across
>> your iSCSI SAN is to increase memory so more disk blocks are cached.
>
> He was already asking about throwing memory at the problem, and I think
> he implied he had a lot of memory. As such, the caching is there already.
> Your statement is true, but it is also a "zero config" option if he really
> does have lots of memory in the machine.
>
>> But keep in mind, at one point or another, everything has to be written
>> to disk, or deleted from disk. So, while you can decrease disk *reads*
>> by adding memory to the VM, you will never be able to decrease writes,
>> you can only delay them with things like write cache, or in the case of
>> XFS, the delaylog mount option. These comments refer to mail file I/O.
>
> And in ext3, the flush rate. Good point, that I forgot about. It is set
> to a very small value by default (2-3 seconds maybe), and can be increased
> without too much danger (to say 10-30 seconds).
>
>> IMAP is a very file I/O intensive application. As Eric mentioned, you
>> could put your user *index* files in a RAM disk or make them memory
>> resident via Dovecot directive. This would definitely decrease disk
>> reads and writes quite a bit. Also as Eric mentioned, if you reboot you
>> lose the indexes, and along with them Dovecot's key performance enabler.
>> User response times will be poor until the indexes get rebuilt.
>
> Assuming normal downtime stats, this would still be a huge win. Since the
> machine rarely goes down, it would rarely need to rebuild indexes, and hence
> would only run poorly a very small percentage of the time. Of course, it
> could run _very_ poorly right after a reboot for a while, but then will be
> back to normal soon enough.
>
> One way to help mitigate this if using a RAM disk is to have your shutdown
> script flush the RAM disk to physical disk (after stopping dovecot) and then
> reload it to RAM disk at startup (before starting dovecot). This isn't
> possible if you use the dovecot index memory settings though.
>
>> If this is a POP server, then you really have no way around the disk I/O
>> issue.
>
> I agree. POP is very inefficient...
>
>> So, either:
>>
>> 1. Increase memory and/or
>> 2. Move indexes to memory
>>
>> #1 will be less effective at decreasing I/O. #2 will be very effective,
>> but at the cost of lost indexes upon reboot or crash.
>
> Still some room for filesystem tuning, of course, but the above two options
> are of course the ones that will make the largest performance improvement
> IMHO.
>
>> --
>> Stan
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Quoting Stan Hoeppner:

> Eric, you missed up above that he's running Dovecot on an ESX cluster, so
> SSDs or any hardware dedicated to Dovecot isn't possible for the OP.

Well, it is true I know nothing about vmware/ESX. I know in my virtual machine setups, I _can_ give the virtual instances access to devices which are not used by other virtual instances. This is what I would do. Yes, it is still virtualized, but it is dedicated, and should still perform pretty well -- faster than shared storage, and in the case of SSD faster than normal disk or iscsi.

> Javier, email is an I/O intensive application, whether an MTA spool, an
> IMAP server, or POP server. The more concurrent users you have the
> greater the file I/O. Thus, the only way to decrease packets across
> your iSCSI SAN is to increase memory so more disk blocks are cached.

He was already asking about throwing memory at the problem, and I think he implied he had a lot of memory. As such, the caching is there already. Your statement is true, but it is also a "zero config" option if he really does have lots of memory in the machine.

> But keep in mind, at one point or another, everything has to be written
> to disk, or deleted from disk. So, while you can decrease disk *reads*
> by adding memory to the VM, you will never be able to decrease writes,
> you can only delay them with things like write cache, or in the case of
> XFS, the delaylog mount option. These comments refer to mail file I/O.

And in ext3, the flush rate. Good point, that I forgot about. It is set to a very small value by default (2-3 seconds maybe), and can be increased without too much danger (to say 10-30 seconds).

> IMAP is a very file I/O intensive application. As Eric mentioned, you
> could put your user *index* files in a RAM disk or make them memory
> resident via Dovecot directive. This would definitely decrease disk
> reads and writes quite a bit. Also as Eric mentioned, if you reboot you
> lose the indexes, and along with them Dovecot's key performance enabler.
> User response times will be poor until the indexes get rebuilt.

Assuming normal downtime stats, this would still be a huge win. Since the machine rarely goes down, it would rarely need to rebuild indexes, and hence would only run poorly a very small percentage of the time. Of course, it could run _very_ poorly right after a reboot for a while, but then will be back to normal soon enough.

One way to help mitigate this if using a RAM disk is to have your shutdown script flush the RAM disk to physical disk (after stopping dovecot) and then reload it to RAM disk at startup (before starting dovecot). This isn't possible if you use the dovecot index memory settings though.

> If this is a POP server, then you really have no way around the disk I/O
> issue.

I agree. POP is very inefficient...

> So, either:
>
> 1. Increase memory and/or
> 2. Move indexes to memory
>
> #1 will be less effective at decreasing I/O. #2 will be very effective,
> but at the cost of lost indexes upon reboot or crash.

Still some room for filesystem tuning, of course, but the above two options are of course the ones that will make the largest performance improvement IMHO.

> --
> Stan

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!
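Eric's shutdown/startup idea above could be sketched roughly as follows in Python. The two paths and all function names are hypothetical placeholders, the script assumes Dovecot is already stopped whenever either function runs, and it only outlines the copy step; it is not a tested init script, and ownership and permissions would need checking on a real system.

```python
import shutil
from pathlib import Path

# Hypothetical locations; adjust to the actual ramdisk mount and backup disk.
RAMDISK_INDEXES = Path("/mnt/ramdisk/indexes")
DISK_BACKUP = Path("/var/lib/dovecot-index-backup")


def copy_tree(src: Path, dst: Path) -> int:
    """Replace dst with a copy of src and return the number of files copied."""
    if not src.is_dir():
        # Nothing to copy (e.g. first boot with no backup yet); leave dst alone.
        return 0
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return sum(1 for p in dst.rglob("*") if p.is_file())


def save_indexes() -> int:
    """Run at shutdown, after Dovecot has stopped."""
    return copy_tree(RAMDISK_INDEXES, DISK_BACKUP)


def restore_indexes() -> int:
    """Run at startup, before Dovecot starts."""
    return copy_tree(DISK_BACKUP, RAMDISK_INDEXES)
```

After a crash the backup will be stale or missing. As discussed in the thread, the indexes then get rebuilt, with poor response times until they are.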
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Eric Rostetter put forth on 12/10/2010 10:11 PM:

> Quoting javierdemig...@us.es:
>
>> in our vmware esx cluster. We want to minimize disk I/O, what config
>> options should we use. We can "exchange" CPU & RAM to minimize disk i/o.
>
> Depends on what you are doing -- pop3, imap, both, deliver or some other
> LDA? Do you care if the indexes are lost on reboot or not?
>
> You might try putting the indexes in memory (either via dovecot settings
> or a RAM DISK) or on SSD.

Eric, you missed up above that he's running Dovecot on an ESX cluster, so SSDs or any hardware dedicated to Dovecot isn't possible for the OP.

Javier, email is an I/O intensive application, whether an MTA spool, an IMAP server, or POP server. The more concurrent users you have the greater the file I/O. Thus, the only way to decrease packets across your iSCSI SAN is to increase memory so more disk blocks are cached.

But keep in mind, at one point or another, everything has to be written to disk, or deleted from disk. So, while you can decrease disk *reads* by adding memory to the VM, you will never be able to decrease writes, you can only delay them with things like write cache, or in the case of XFS, the delaylog mount option. These comments refer to mail file I/O.

IMAP is a very file I/O intensive application. As Eric mentioned, you could put your user *index* files in a RAM disk or make them memory resident via Dovecot directive. This would definitely decrease disk reads and writes quite a bit. Also as Eric mentioned, if you reboot you lose the indexes, and along with them Dovecot's key performance enabler. User response times will be poor until the indexes get rebuilt.

If this is a POP server, then you really have no way around the disk I/O issue. Due to the nature of POP, there is very little opportunity to do effective disk block caching, unless the bulk of your users configure their clients to check mail every 60 seconds, 24 hours a day. In this scenario you have good opportunity for block caching. However, if the bulk of the users only pop the server every half hour or more, there is no opportunity for file caching. The files will be read from disk every time a user pops the server. However, in this scenario, you'd have relatively low disk I/O load to begin with, and you'd not be inquiring here. Thus, I can only assume your Dovecot server is configured for IMAP.

So, either:

1. Increase memory and/or
2. Move indexes to memory

#1 will be less effective at decreasing I/O. #2 will be very effective, but at the cost of lost indexes upon reboot or crash.

--
Stan
Re: [Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Quoting javierdemig...@us.es:

> in our vmware esx cluster. We want to minimize disk I/O, what config
> options should we use. We can "exchange" CPU & RAM to minimize disk i/o.

Depends on what you are doing -- pop3, imap, both, deliver or some other LDA? Do you care if the indexes are lost on reboot or not?

You might try putting the indexes in memory (either via dovecot settings or a RAM DISK) or on SSD. You could also try using ext3/4 with an external journal on an SSD. "SSD" would preferably be an enterprise SSD, but it could be a lesser SSD, or even a USB memory stick (replaced periodically). Which is right depends on your needs and budget.

Failing that, you should probably put the indexes on the fastest disks possible (might be local, might be iscsi, you'd have to benchmark).

> Should we change to dovecot 2.0?

For any new system, I'd start with the most recent dovecot 2.x version. How easy that is if you are upgrading depends on what version you run now.

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!
[Dovecot] Question about "slow" storage but fast cpus, plenty of ram and dovecot
Hello

We are using dovecot 1.2.x. In our setup we will have 1200 concurrent imap users (maildirs) and we have 2xraid5 sas 15k disks mounted by iSCSI. The dovecot server (RHEL 5 x64) is a virtual machine in our vmware esx cluster. We want to minimize disk I/O. What config options should we use? We can "exchange" CPU & RAM to minimize disk i/o.

Should we change to dovecot 2.0? Maybe mdbox can help us? Maybe ext4 instead of ext3?

Any idea is welcome.

Regards

Javier