> Anyhow, just wondering if we're the lone rangers on this particular
> edge of the envelope. We alleviated the problem short-term by
> recycling some V240 class systems with arrays into Cyrus boxes
> with about 3,500 users each, and brought our 2 big Cyrus units
> down to 13K-14K users each, which seems to work okay.
FastMail has many 100,000's of users in a fully replicated setup spread
across 10 backend servers (+ separate MX/Spam/Web/Frontend servers). We
use IBM servers with some off-the-shelf SATA-to-SCSI RAID DAS (eg like
http://www.areasys.com/area.aspx?m=PSS-6120). Hardware will die at some
stage; that's what replication is for.

Over the years we've tuned a number of things to get the best possible
performance. The biggest things we found:

1. Using the status cache was a big win for us

I did some analysis at one stage and found that most IMAP clients issue
STATUS calls to every mailbox a user has on a regular basis (every 5
minutes or so on average, though users can usually change it) so they
can update the unread count on every mailbox. The default STATUS code
has to iterate over the entire cyrus.index file to get the unread
count. Although the cyrus.index file is the smallest file, with
10,000's of users connected and clients doing this regularly for every
folder, it basically means you either need enough memory to keep every
cyrus.index hot in memory, or every 5-15 minutes you'll be forcing a
re-read of gigabytes of data from disk. Or you need a better way.

The better way was to have a status cache (there's a rough sketch of
the idea further down):

http://cyrus.brong.fastmail.fm/#cyrus-statuscache-2.3.8.diff

This helped reduce meta data IO a lot for us.

2. Split your email data + metadata IO

With the 12 drive SATA-to-SCSI arrays, we get 4 x 150G 10k RPM WD
Raptor drives + 8 x (largest you can get) drives. We then build 2 x
2-drive RAID1 + 2 x 4-drive RAID5 arrays. We use the RAID1 arrays for
the meta data (cyrus.* except squatter) and the RAID5 arrays for the
email data. We find the email-to-meta ratio is about 20-to-1 (higher if
you have squatter files), so 150G of meta will support up to 3T of
email data fine. From our iostat data, this seems to be a nice balance.

Checking iostat, a rough estimate shows the meta data gets 2 x the
rkB/s and 3 x the wkB/s vs the email spool, even though it's 1/20th the
data size and we have the status cache patch! Basically the meta data
is "very hot", so optimising access to it is important.

3. Not really related to cyrus, but we switched from perdition to nginx
as a frontend POP/IMAP proxy a while back. If you've got lots of IMAP
connections, it's really a worthwhile improvement.

http://blog.fastmail.fm/2007/01/04/webimappop-frontend-proxies-changed-to-nginx/

4. Lots of other little things

a) putting the proc dir on tmpfs is a good idea
b) make sure you have the right filesystem (on linux, reiserfs is much
   better than ext3, even with ext3's dir hashing) and journaling modes

> That is our hypothesis right now, that the application has certain
> limits and if you go beyond a certain number of very active users on
> a single backend bad things happen.

Every application has that problem at some point. Consider something
that uses CPU only, where every new unit of work takes the CPU 0.1
seconds: you can handle 1-10 units of work arriving per second no
problem. If 11 units per second arrive, then after 1 second you'll have
done 10 but still have 1 unit to do, and another 11 units arrive in
that next second again. Basically your outstanding work queue grows
forever, in theory.

Cyrus isn't CPU limited by a long shot, but it can easily become IO
limited. This same effect happens with IO, it's just more noticeable
because disks are slow. Basically if you start issuing IO requests to a
disk system and it can't keep up, the IO queue grows quickly and the
system starts crawling.
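To make the arrival-rate arithmetic above concrete, here's a toy
simulation (plain Python, using the made-up 0.1 seconds per unit of
work from the paragraph above, nothing Cyrus-specific):

    # Toy model: a server that can complete 10 units of work per second
    # (0.1s each). Anything it can't get to piles up in a backlog.
    def simulate(arrival_rate, capacity_per_sec=10, seconds=60):
        backlog = 0
        for _ in range(seconds):
            backlog += arrival_rate                    # work arriving
            backlog -= min(backlog, capacity_per_sec)  # work completed
        return backlog

    for rate in (9, 10, 11, 12):
        print(rate, "units/sec -> backlog after 60s:", simulate(rate))

At 9 or 10 units/sec the backlog stays at zero; at 11 it's 60 units
behind after a minute and growing without bound. Swap "units of work"
for "IO requests" and that's the IO queue blowout just described.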
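Going back to point 1, the status cache is conceptually just memoising
the STATUS answer per mailbox and throwing it away whenever the mailbox
changes, so the common polling case never has to walk cyrus.index at
all. A very rough sketch of the idea (the real patch at the URL above
lives inside Cyrus and works against cyrus.index; all the names below
are made up for illustration):

    # Conceptual sketch only - not the actual Cyrus implementation.
    # STATUS results are cached per mailbox and tagged with a
    # "generation" (think index mtime/modseq); any change to the
    # mailbox bumps the generation and so invalidates the entry.
    class StatusCache:
        def __init__(self):
            self._cache = {}  # mailbox -> (generation, status dict)

        def get(self, mailbox, generation, compute_status):
            entry = self._cache.get(mailbox)
            if entry and entry[0] == generation:
                return entry[1]               # hot path: no index scan
            status = compute_status(mailbox)  # slow path: scan the index
            self._cache[mailbox] = (generation, status)
            return status

With clients polling every few minutes and mailboxes changing far less
often than that, almost every STATUS becomes a cache hit, which is
where the meta data IO saving comes from.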
The only way to improve it is to reduce your IOPs (eg fewer users, or
optimise the application to issue fewer IOPs in some way) or to
increase the IOPs your disk system can handle (eg more spindles, faster
spindles, NVRAM, etc). That's what 1 (reduce the IOPs the application
generates) and 2 (put the hot data on faster spindles) above are both
about, rather than the other option (reduce users per server).

Rob
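PS: for the "increase the IOPs your disk system can handle" side, the
usual back-of-envelope is spindle count times per-spindle random IOPs.
The per-spindle figures below are rough rules of thumb only, and this
ignores RAID write penalties and controller caches, so treat it as a
sanity check against what iostat shows rather than a promise:

    # Rough sanity check: can the array sustain the random IOPs the
    # load is generating (eg r/s + w/s from iostat)?
    # Per-spindle numbers are ballpark rules of thumb, not measurements.
    PER_SPINDLE_IOPS = {"7200rpm_sata": 80, "10krpm": 130}

    def array_iops(spindles, drive_type):
        return spindles * PER_SPINDLE_IOPS[drive_type]

    print("4-drive SATA RAID5 spool ~", array_iops(4, "7200rpm_sata"), "IOPs")
    print("2-drive Raptor RAID1 meta ~", array_iops(2, "10krpm"), "IOPs")

If the sustained request rate you're seeing is anywhere near those
numbers, you're in the queue-growing territory described above.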