Re: improving concurrency/performance (fwd)
On Wed, 09 Nov 2005, Joshua Schmidlkofer wrote:
> Does this mean that those of us using XFS should run some testing as well?

Yes, XFS doesn't journal data in any way, AFAIK. I don't know how one could go about speeding up fsync()s with it.

What I *do* know is that I don't trust spools to XFS, because anything not fsync()'ed upon crash WILL be lost, but the metadata will likely be there, and it is a total bitch to find out what has been damaged. You will have to hunt the entire fs over for files containing whole blocks of NULs.

--
One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot
Henrique Holschuh

Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
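For anyone facing the hunt described above, here is a minimal sketch of a scan for files containing whole blocks of NULs. The 4096-byte block size and the function names are assumptions for illustration, not anything from XFS tooling. It only checks block-aligned reads, so treat a hit as a candidate for manual inspection rather than proof of damage (sparse or preallocated files will also match).

```python
import os

BLOCK_SIZE = 4096  # assumed filesystem block size; adjust to match the fs


def has_nul_block(path, block_size=BLOCK_SIZE):
    """Return True if the file contains at least one aligned block of NUL bytes."""
    zero_block = bytes(block_size)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if len(chunk) < block_size:
                return False  # a short final chunk is not a whole block
            if chunk == zero_block:
                return True


def find_suspect_files(root):
    """Walk `root` and yield regular files that contain a whole NUL block."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if has_nul_block(path):
                    yield path
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
```

Calling `list(find_suspect_files(spool_dir))` on the spool directory would give the list of files to look at by hand.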
Re: improving concurrency/performance (fwd)
> This guy is having a problem with cyrus-imap and ext3 - when multiple processes are attempting to write to the one filesystem (but not the one file), performance drops to next to nothing when only five processes are writing. An strace shows most of the time is being spent in fdatasync and fsync.

Actually, the thread just got off topic quickly -- I'm running this on reiserfs, not ext3. ...And I've got it mounted with data=writeback, too. But thanks for the info, Andrew.

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]
Re: improving concurrency/performance (fwd)
John Madden wrote:
> > This guy is having a problem with cyrus-imap and ext3 - when multiple processes are attempting to write to the one filesystem (but not the one file), performance drops to next to nothing when only five processes are writing. An strace shows most of the time is being spent in fdatasync and fsync.
>
> Actually, the thread just got off topic quickly -- I'm running this on reiserfs, not ext3. ...And I've got it mounted with data=writeback, too. But thanks for the info, Andrew.
>
> John

I'll bet that the fakesync preload library will make a difference for you.

--
Sergio Bruder
Re: improving concurrency/performance (fwd)
> > This guy is having a problem with cyrus-imap and ext3 - when multiple processes are attempting to write to the one filesystem (but not the one file), performance drops to next to nothing when only five processes are writing. An strace shows most of the time is being spent in fdatasync and fsync.
>
> Actually, the thread just got off topic quickly -- I'm running this on reiserfs, not ext3. ...And I've got it mounted with data=writeback, too. But thanks for the info, Andrew.

Sorry, my confusion. But it might be worth asking the reiserfs guys.

My experience has been that if you are fsync'ing files, then even modern disks only get around 10 fsyncs per second (because not only does the file data get written out, but typically the inode, the directory entry, the free block table and maybe even all the directory entries up to root).

Journalling can help, because the committed data is written sequentially to the journal, rather than being scattered all over the disk, but the journalled operations still need to be applied to the filesystem sooner or later.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
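To check the fsyncs-per-second figure on a given spool, something like the following rough sketch could be used. The function name, the 512-byte payload and the default count are arbitrary assumptions. It measures a single writer only, so it says nothing about the multi-process contention discussed in this thread.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, count=50, payload=b"x" * 512):
    """Append `payload` and fsync `count` times; return fsyncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # returns once the kernel reports the data as flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # The temp directory is only a stand-in; point this at the filesystem under test.
    rate = measure_fsync_rate(tempfile.gettempdir())
    print(f"{rate:.1f} fsync/s")
```

Results depend heavily on whether the drive's write cache is enabled, so numbers from different machines are not directly comparable.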
Re: improving concurrency/performance (fwd)
> Yes, on ext3, an fsync() syncs the entire filesystem. It has to, because all the metadata for each file is shared - it's just a string of journallable blocks. Similar story with the data, in ordered mode.
>
> So effectively, fsync()ing five files one time each is performing 25 fsync()s.
>
> One fix (which makes the application specific to ext3 in ordered-data or journalled-data mode) is to perform a single fsync(), with the understanding that this has the side-effect of fsyncing all the other files. That's an ugly solution and is rather hard to do if the workload consists of five separate processes!
>
> So I'd recommend mounting the filesystem with the `-o data=writeback' mode. This way, each fsync(fd) will sync fd's data only. That's much better than the default data-ordered mode, wherein a single fsync() will sync all the other files' data too.
>
> In data=writeback mode it is still the case that fsync(fd) will sync the other files' metadata, but that's a single linear write to the journal and the additional cost should be low.
>
> Bottom line: please try data=writeback, let me know.

Does this mean that those of us using XFS should run some testing as well?

thanks,
joshua
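Before testing the data=writeback suggestion, it may be worth confirming which data mode a spool is actually mounted with. Below is a small sketch that parses text in the /proc/mounts format. The function name and the /var/spool/imap path are hypothetical, and whether the kernel lists a `data=` option at all for a default mount varies, so a `None` result is not proof of ordered mode.

```python
def data_mode(mounts_text, mount_point):
    """Return the data= mode listed for mount_point, or None if not listed."""
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        mode = None  # a later entry for the same mount point overrides earlier ones
        for option in fields[3].split(","):
            if option.startswith("data="):
                mode = option[len("data="):]
    return mode


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        # "/var/spool/imap" is an assumed spool location, not taken from the thread
        print(data_mode(f.read(), "/var/spool/imap"))
```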
Re: improving concurrency/performance (fwd)
I forwarded John's message to Andrew Morton, linux kernel maintainer, and this is his reply (it was cc'ed to the list, but, not being a subscriber, I presume it bounced):

--- Forwarded Message

Date: Tue, 08 Nov 2005 15:21:31 -0800
From: Andrew Morton [EMAIL PROTECTED]
To: Andrew McNamara [EMAIL PROTECTED]
cc: John Madden [EMAIL PROTECTED], info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance (fwd)

Andrew McNamara [EMAIL PROTECTED] wrote:
> This guy is having a problem with cyrus-imap and ext3 - when multiple processes are attempting to write to the one filesystem (but not the one file), performance drops to next to nothing when only five processes are writing. An strace shows most of the time is being spent in fdatasync and fsync.
> ...

Yes, on ext3, an fsync() syncs the entire filesystem. It has to, because all the metadata for each file is shared - it's just a string of journallable blocks. Similar story with the data, in ordered mode.

So effectively, fsync()ing five files one time each is performing 25 fsync()s.

One fix (which makes the application specific to ext3 in ordered-data or journalled-data mode) is to perform a single fsync(), with the understanding that this has the side-effect of fsyncing all the other files. That's an ugly solution and is rather hard to do if the workload consists of five separate processes!

So I'd recommend mounting the filesystem with the `-o data=writeback' mode. This way, each fsync(fd) will sync fd's data only. That's much better than the default data-ordered mode, wherein a single fsync() will sync all the other files' data too.

In data=writeback mode it is still the case that fsync(fd) will sync the other files' metadata, but that's a single linear write to the journal and the additional cost should be low.

Bottom line: please try data=writeback, let me know.
--- Forwarded Message

Date: Tue, 08 Nov 2005 09:25:54 -0500
From: John Madden [EMAIL PROTECTED]
To: Jure Pečar [EMAIL PROTECTED]
cc: info-cyrus@lists.andrew.cmu.edu
Subject: Re: improving concurrency/performance

> As expected, these are from locking operations. 0x8 is file descriptor, which, if I read lsof output correctly, points to config/socket/imap-0.lock (what would that be?) and 0x7 is F_SETLKW which reads as set lock or wait for it to be released in the manual page.

Yup, that's exactly the sort of thing I was suspecting -- the performance I was seeing just didn't make sense.

imap-0.lock is in /var/imap/socket for me. I believe it's one of the lock files created when cyrus is started, so it wouldn't make any sense for imapd to ever be spinning on it. The delays I was seeing occurred when multiple imapd's were writing to the spool at the same time.

I do see a lot of this though:

    fcntl(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0

It looks like the lock to open a file in the target mailbox. But again, very low actual throughput and still little or no iowait.

However, adding a -c to the strace, the top three syscalls are:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     52.68    0.514720        1243       414           fdatasync
     29.87    0.291830         846       345           fsync
      4.19    0.040898          27      1519           fcntl

Makes me wonder why the fsyncs are taking so long since the disk is performing so well. Anyone know if that's actually typical?

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
[EMAIL PROTECTED]

--- End of Forwarded Message

--- End of Forwarded Message
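As a sanity check on the strace -c summary in John's message, the sketch below recomputes the per-call averages from the reported totals and adds up the share of traced time spent in the two sync calls. The row values are copied from the message; the function names are made up for illustration.

```python
# (syscall, total seconds, calls, % time) as reported by strace -c in the message
ROWS = [
    ("fdatasync", 0.514720, 414, 52.68),
    ("fsync", 0.291830, 345, 29.87),
    ("fcntl", 0.040898, 1519, 4.19),
]


def usecs_per_call(seconds, calls):
    """Average microseconds per call, rounded to the nearest integer."""
    return round(seconds * 1_000_000 / calls)


def per_call_averages(rows):
    """Map each syscall name to its average microseconds per call."""
    return {name: usecs_per_call(secs, calls) for name, secs, calls, _pct in rows}


def sync_share(rows):
    """Combined % time of fdatasync and fsync."""
    return round(sum(pct for name, _s, _c, pct in rows
                     if name in ("fdatasync", "fsync")), 2)


if __name__ == "__main__":
    print(per_call_averages(ROWS))
    print(sync_share(ROWS))
```

The recomputed averages agree with the usecs/call column, and the two sync calls together account for 82.55% of the time strace counted. Note that strace -c by default counts system CPU time rather than wall-clock time, so these totals may understate how long each call actually blocked.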