Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 11:00:29AM -0400, Theodore Tso wrote:
> P.S. Yet another alternative is to specify noatime on an individual
> file/directory basis. We've had this capability for a *long* time,
> and if a distro were to set noatime for all files in certain
> hierarchies (i.e., /usr/include) and certain top-level directories
> (since the chattr +A flag is inherited)

This came across my mind again earlier, and I went digging. Can you explain how this works? I've eyeballed the ext2/ext3 code, and feel like I'm missing something obvious.

I'm guessing that for e.g. /usr/include/stdio.h, we check the inodes for all four parts of the path, and if any of them are +A we avoid the atime update? If so, where does that inheritance happen in the code?

Dave

--
http://www.codemonkey.org.uk
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/23] per device dirty throttling -v8
Andi Kleen wrote:
> I always thought the right solution would be to just sync atime only
> very very lazily. This means if an inode is only dirty because of an
> atime update put it on a "only write out when there is nothing to do
> or the memory is really needed" list.

Seems like a good idea. atimes will then be written only by memory pressure - or umount. The atimes could be wrong after a crash, but losing only atimes is not something I'd worry about.

Helge Hafting
Re: [PATCH 00/23] per device dirty throttling -v8
Christoph Hellwig wrote:
> Umm, no f**king way. atime selection is 100% policy and belongs into
> userspace. Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.

Indeed. Just change /bin/mount so it defaults to "noatime" unless there is an explicit "atime". Similar for diratime. Problem solved.

Helge Hafting
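The defaulting rule suggested above for /bin/mount can be sketched in a few lines. This is only an illustration of the policy, not how any real mount implementation parses its options; the function name `apply_atime_defaults` is hypothetical, and treating `diratime` the same way as `atime` is an assumption based on the "Similar for diratime" remark.

```python
def apply_atime_defaults(options):
    """Return a mount option string with noatime/nodiratime as defaults.

    Hypothetical helper: it only illustrates the proposed policy and is
    not taken from any existing mount(8) implementation.
    """
    opts = [o for o in options.split(",") if o]
    result = list(opts)
    # Add the default only when the admin has not chosen either way.
    if "atime" not in opts and "noatime" not in opts:
        result.append("noatime")
    if "diratime" not in opts and "nodiratime" not in opts:
        result.append("nodiratime")
    return ",".join(result)


print(apply_atime_defaults("rw"))
print(apply_atime_defaults("rw,atime"))
```

An explicit "atime" in fstab keeps access times on, so anyone who depends on them only has to say so once.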
Re: [PATCH 00/23] per device dirty throttling -v8
On Tue, 2007-08-14 at 04:25 +0200, Andi Kleen wrote:
> On Tue, Aug 14, 2007 at 11:44:56AM +1000, Stewart Smith wrote:
> > > Since the database fits in RAM, the only kind of access Mysql is doing
> > > is writing to the innodb log, the mysql binlog and finally to the innodb
> > > database files.
> > > There are certainly a whole lot of fsync'ing happening.
> >
> > yes. Keep in mind that the binlog grows in file size too... so this has
> > to sync all the metadata as well (ick, i know).

Back in the first days of my original bug report I moved the binlogs to another disk and it didn't change anything to my issue.

On Tue, 2007-08-14 at 04:25 +0200, Andi Kleen wrote:
> It might be an interesting experiment to see if it still happens
> with the file system remounted as ext2. ext2 has a much more
> benign fsync than ext3.

Is it possible to perform a live remount of the fs as ext2?

Besides that, the RAID card has a battery-backed RAM in write-back mode, and I was told that fsync doesn't really hurt in this case (moreover the fs is mounted in journal=writeback mode).

I'll soon post blktrace files in the original bug report; this will show exactly what the disk workload is in the baseline case _and_ in the underload atypical case. Maybe that will help to shed some light on the issue?

Anyway, thanks,
--
Brice Figureau <[EMAIL PROTECTED]>
Re: [PATCH 00/23] per device dirty throttling -v8
On Tue, Aug 14, 2007 at 11:44:56AM +1000, Stewart Smith wrote:
> > Since the database fits in RAM, the only kind of access Mysql is doing
> > is writing to the innodb log, the mysql binlog and finally to the innodb
> > database files.
> > There are certainly a whole lot of fsync'ing happening.
>
> yes. Keep in mind that the binlog grows in file size too... so this has
> to sync all the metadata as well (ick, i know).

It might be an interesting experiment to see if it still happens with the file system remounted as ext2. ext2 has a much more benign fsync than ext3.

-Andi
Re: [PATCH 00/23] per device dirty throttling -v8
On Mon, 2007-08-06 at 10:40 +0200, Brice Figureau wrote:
> Mysql accesses its database files in O_DIRECT mode.

binlog is written using buffered IO. for InnoDB, binlog is synced first, then innodb log. on restart (in 5.0) these are synced back up so you don't get inconsistencies.

and from a quick look at the innobase source, only the data file is using O_DIRECT.

> Since the database fits in RAM, the only kind of access Mysql is doing
> is writing to the innodb log, the mysql binlog and finally to the innodb
> database files.
> There are certainly a whole lot of fsync'ing happening.

yes. Keep in mind that the binlog grows in file size too... so this has to sync all the metadata as well (ick, i know).

--
Stewart Smith, Senior Software Engineer
MySQL AB, www.mysql.com
Re: [PATCH 00/23] per device dirty throttling -v8
On Wed, Aug 08, 2007 at 05:54:57PM -0700, Martin Bligh wrote:
> Andrew Morton wrote:
> >On Wed, 08 Aug 2007 14:10:15 -0700
> >"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
> >
> >>Why isn't this easily fixable by just adding an additional dirty
> >>flag that says atime has changed? Then we only cause a write
> >>when we remove the inode from the inode cache, if only atime
> >>is updated.
> >
> >I think that could be made to work, and it would fix the performance
> >issue.
> >
> >It is a behaviour change. At present ext3 (for example) commits everything
> >every five seconds. After a change like this, a crash+recovery could cause
> >a file's atime to go backwards by an arbitrarily large time interval - it
> >could easily be months.
>
> A second pdflush / workqueue at a slower rate would alleviate that.

This becomes delayed atime writes. I'm not sure whether it's better to batch up the writes and do them all in one big seeky go, or to trickle them out as they are done. Best of all is not to do them at all.

Note when talking about saving up atime updates to write out that the final write is going to be sloow. Inodes are typically 128 bytes, and you may have to do a seek between every one. Current disks can do on the order of 100 seeks a second. So do a find on 1000 files and you've just created 10 seconds of I/O hanging out in memory.

-VAL
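The back-of-the-envelope estimate above is easy to rerun with other figures. A minimal sketch, assuming the worst case described in the message (one seek per dirty inode, no batching of adjacent inodes); the function name is hypothetical and the default of 100 seeks per second is the figure quoted above.

```python
def deferred_atime_flush_seconds(dirty_inodes, seeks_per_second=100):
    """Worst-case time to write back inodes that are dirty only for atime.

    Assumes one seek per inode, as in the estimate above; inodes that
    share a disk block would make the real figure smaller.
    """
    return dirty_inodes / seeks_per_second


# A find over 1000 files at about 100 seeks per second.
print(deferred_atime_flush_seconds(1000))  # 10.0
```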
Re: [PATCH 00/23] per device dirty throttling -v8
On Fri, 10 Aug 2007 00:04:45 EDT, Bill Davidsen said:
> > I never imagined that it was the 20%+ hit that is being described, and
> > with so little impact, or I would have switched to it across the board
> > years ago.
> >
> To get that magnitude you need slow disk with very fast CPU. It helps
> most of systems where the disk hardware is marginal or worse for the i/o
> load. Don't take that as typical.

I suspect that almost every single laptop with a Core2 Duo in it falls into that classification, and it's getting worse every year, as we see more disparity between CPU speeds (increasing) and disk seek times (basically nailed to the floor for the last decade).
Re: [PATCH 00/23] per device dirty throttling -v8
Updating the manual page mount(8) with an expanded description of atime/noatime and adding nodiratime and data= seems much more reasonable than hacking the kernel because you want others to run their systems the way you think they should. Almost every web search of "linux fast disk" (or related words) references noatime, and many ext3 specific documents explain the caching options.
Re: [PATCH 00/23] per device dirty throttling -v8
Andi Kleen wrote:
> richard kennedy <[EMAIL PROTECTED]> writes:
>> This is on a standard desktop machine so there are lots of other
>> processes running on it, and although there is a degree of variability
>> in the numbers, they are very repeatable and your patch always out
>> performs the stock mm2.
>> looks good to me
>
> iirc the goal of this is less to get better performance, but to avoid
> long user visible latencies. Of course if it's faster it's great too,
> but that's only secondary.

What a trade-off, if you want to get rid of long latency you have to live with better throughput. I can live with that. ;-)

Your point well taken, not the intent of the patch, but it may indicate where a performance bottleneck happens as well.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
[EMAIL PROTECTED] wrote:
> On Sun, 5 Aug 2007, Diego Calleja wrote:
>> On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>>> Measurements show that noatime helps 20-30% on regular desktop
>>> workloads, easily 50% for kernel builds and much more than that (in
>>> excess of 100%) for file-read-intense workloads. We cannot just walk
>>
>> And as everybody knows, on servers it is a popular practice to disable it.
>> According to an interview with the kernel.org admins:
>>
>> "Beyond that, Peter noted, "very little fancy is going on, and that is
>> good because fancy is hard to maintain." He explained that the only
>> fancy thing being done is that all filesystems are mounted noatime
>> meaning that the system doesn't have to make writes to the filesystem
>> for files which are simply being read, "that cut the load average in
>> half."
>>
>> I bet that some people would consider such a performance hit a bug...
>
> actually, it's popular practice to disable it by people who know how big
> a hit it is and know how few programs use it.
>
> i've been a linux sysadmin for 10 years, and have known about noatime
> for at least 7 years, but I always thought of it in the category of 'use
> it only on your performance critical machines where you are trying to
> extract every ounce of performance, and keep an eye out for things
> misbehaving'
>
> I never imagined that it was the 20%+ hit that is being described, and
> with so little impact, or I would have switched to it across the board
> years ago.

To get that magnitude you need slow disk with very fast CPU. It helps most of systems where the disk hardware is marginal or worse for the i/o load. Don't take that as typical.

> I'll bet there are a lot of admins out there in the same boat.
>
> adding an option in the kernel to change the default sounds like a very
> good first step, even if the default isn't changed today.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
Andrew Morton wrote:
> On Wed, 08 Aug 2007 14:10:15 -0700 "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
>> Why isn't this easily fixable by just adding an additional dirty
>> flag that says atime has changed? Then we only cause a write
>> when we remove the inode from the inode cache, if only atime
>> is updated.
>
> I think that could be made to work, and it would fix the performance
> issue.
>
> It is a behaviour change. At present ext3 (for example) commits everything
> every five seconds. After a change like this, a crash+recovery could cause
> a file's atime to go backwards by an arbitrarily large time interval - it
> could easily be months.

I would think that (really) updating atime on open would be enough, hopefully without being too much. The "lazyatime" thing I was playing with only updated on open, final close, write, and fork.

I like the idea of updating once in a while, but one of the benefits of noatime is allowing drives to spin down via inactivity. If something does get done in the area of less but non-zero atime tracking, perhaps that could be taken into account. I have to check what "laptop_mode" actually does, since my laptops are old installs.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
On Thu, 09 Aug 2007 11:02:38 -0400, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> NT maintains atimes by default, at least up to XP. You have to edit the
> registry to turn them off, and it is a single global switch -- not per
> mountpoint like Unix.
>
> And it makes a huge difference there, too.

In Windows Vista they've disabled atime updates by default. And XP maintains atimes, but it uses a trick to avoid the performance penalty we suffer in Linux, similar to what Andi Kleen suggested: they keep atime updates in memory for one hour, and only sync to disk after that time - of course they also sync it if there's an opportunity to do it, like when updating mtime.
Re: [PATCH 00/23] per device dirty throttling -v8
On 08/09/2007 02:25 AM, Lionel Elie Mamane wrote:
>
>> yeah, it's really ugly. But otherwise i've got no real complaint
>> about ext3 - with the obligatory qualification that
>> "noatime,nodiratime" in /etc/fstab is a must. This speeds up things
>> very visibly (...). So for most file workloads we give Windows a
>> 20%-30% performance edge, for almost nothing.
>
> It has been years since I used MS Windows much, but from my memories
> of those days, I was under the impression that it (at least the NT
> line, the only surviving line these days) also maintained "last
> accessed" times. Except I only ever saw it at "right now" because the
> file explorer ... accesses the file before getting this metadata or
> something like that (when you right-click on a file and ask for its
> properties). It has creation and last modification time, too.
>

NT maintains atimes by default, at least up to XP. You have to edit the registry to turn them off, and it is a single global switch -- not per mountpoint like Unix.

And it makes a huge difference there, too.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 06:37:33PM +0200, Ingo Molnar wrote:
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:

>> The fact is, ext3 *sucks* at fsync. I hate hate hate it. It's
>> totally unusable, imnsho.

> yeah, it's really ugly. But otherwise i've got no real complaint
> about ext3 - with the obligatory qualification that
> "noatime,nodiratime" in /etc/fstab is a must. This speeds up things
> very visibly (...). So for most file workloads we give Windows a
> 20%-30% performance edge, for almost nothing.

It has been years since I used MS Windows much, but from my memories of those days, I was under the impression that it (at least the NT line, the only surviving line these days) also maintained "last accessed" times. Except I only ever saw it at "right now" because the file explorer ... accesses the file before getting this metadata or something like that (when you right-click on a file and ask for its properties). It has creation and last modification time, too.

So, if my memories are correct, there is no performance edge to be conceded by having atime (but one to be gained by not having atime).

--
Lionel
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, 4 Aug 2007, Ray Lee wrote:
> On 8/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> On Sat, 4 Aug 2007, Ingo Molnar wrote:
>
> At least on a surface level, your report has some similarities to
> http://lkml.org/lkml/2007/5/21/84 . In that message, John Miller
> mentions several things he tried without effect:
>
> < - I increased the max allowed receive buffer through
> < proc/sys/net/core/rmem_max and the application calls the right
> < syscall. "netstat -su" does not show any "packet receive errors".

mercury1:/proc/sys/net/core# cat rmem_*
124928
131071
mercury1:/proc/sys/net/core# netstat -su
Udp:
    697853177 packets received
    10025642 packets to unknown port received.
    191726680 packet receive errors
    63194 packets sent
    RcvbufErrors: 191726680
UdpLite:
mercury1:/proc/sys/net/core# echo "512000" >rmem_max

> < - After getting "kernel: swapper: page allocation failure.
> < order:0, mode:0x20", I increased /proc/sys/vm/min_free_kbytes

I have not seen any similar errors

> < - ixgb.txt in kernel network documentation suggests to increase
> < net.core.netdev_max_backlog to 30. This did not help.

mercury1:/proc/sys/net/core# cat netdev_*
300
1000
mercury1:/proc/sys/net/core# echo "30" >netdev_max_backlog

> < - I also had to increase net.core.optmem_max, because the default
> < value was too small for 700 multicast groups.

I'm not running multicast.

> As they're all pretty simple to test, it may be worthwhile to give
> them a shot just to rule things out.

unfortunately the load is not high enough right now to see a real difference (it's only doing ~1400 logs/sec). I'll catch it at a higher load point to see if these make any difference.

David Lang

> Ray
Re: [PATCH 00/23] per device dirty throttling -v8
Greg Trounson <[EMAIL PROTECTED]> writes:
> mount [fs] -o remount,noatime,nodiratime

nodiratime is implied in noatime.

> I get a compile time of 1m23.368s, a mere 6% improvement.

6% is nothing to sneeze at. A lot of optimizations would kill for less

-Andi
Re: [PATCH 00/23] per device dirty throttling -v8
On Thu, 9 Aug 2007, Greg Trounson wrote:
>> Measurements show that noatime helps 20-30% on regular desktop
>> workloads, easily 50% for kernel builds and much more than that (in
>> excess of 100%) for file-read-intense workloads. We cannot just walk
>> past such a _huge_ performance impact so easily without even reacting
>> to the performance arguments, and i'm happy Ubuntu picked up
>> noatime,nodiratime and is whipping up the floor with Fedora on the
>> desktop.
>
> Sorry I'm just not seeing those gains here.
>
> With my filesystems mounted with atime defaults the Quake sources build
> in 1m28.856s. A test with ls -ltu verifies that atime is working as
> expected.
>
> When I remount my filesystems with:
> mount [fs] -o remount,noatime,nodiratime
> I get a compile time of 1m23.368s, a mere 6% improvement.
>
> This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have
> thought this to be close to a best-case file I/O test.

what sort of disks does this box have? and what filesystem?

slower disks/filesystems can result in this showing a larger difference.

however 6% is a fairly significant gain.

David Lang
Re: [PATCH 00/23] per device dirty throttling -v8
Ingo Molnar wrote:
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>>>> People just need to know about the performance differences - very
>>>> few realise its more than a fraction of a percent. I'm sure Gentoo
>>>> will use relatime the moment anyone knows its > 5% 8)
>>> noatime,nodiratime gave 50% of wall-clock kernel rpm build
>>> performance improvement for Dave Jones, on a beefy box. Unless i
>>> misunderstood what you meant under 'fraction of a percent' your
>>> numbers are _WAY_ off.
>> What numbers - I didn't quote any performance numbers ?
>
> ok, i misunderstood your "very few realise its more than a fraction of a
> percent" sentence, i thought you were saying it's a fraction of a
> percent.
>
> Measurements show that noatime helps 20-30% on regular desktop
> workloads, easily 50% for kernel builds and much more than that (in
> excess of 100%) for file-read-intense workloads. We cannot just walk
> past such a _huge_ performance impact so easily without even reacting
> to the performance arguments, and i'm happy Ubuntu picked up
> noatime,nodiratime and is whipping up the floor with Fedora on the
> desktop.

Sorry I'm just not seeing those gains here.

With my filesystems mounted with atime defaults the Quake sources build in 1m28.856s. A test with ls -ltu verifies that atime is working as expected.

When I remount my filesystems with:
mount [fs] -o remount,noatime,nodiratime
I get a compile time of 1m23.368s, a mere 6% improvement.

This is on a dual-core Athlon 4200+ box running 2.6.21, so I would have thought this to be close to a best-case file I/O test.

Greg
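For anyone comparing their own numbers, the percentage follows directly from the two wall-clock times quoted above. A small sketch; the helper name `to_seconds` is only for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a time(1)-style 'XmY.YYYs' reading to seconds."""
    return minutes * 60 + seconds


with_atime = to_seconds(1, 28.856)     # build with default atime
without_atime = to_seconds(1, 23.368)  # build with noatime,nodiratime

improvement = (with_atime - without_atime) / with_atime * 100
print(f"{improvement:.1f}% less wall-clock time")
```

This measures the reduction in wall-clock time relative to the atime run, which is where the "6%" figure comes from.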
Re: [PATCH 00/23] per device dirty throttling -v8
Andrew Morton wrote:
> On Wed, 08 Aug 2007 14:10:15 -0700 "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
>> Why isn't this easily fixable by just adding an additional dirty
>> flag that says atime has changed? Then we only cause a write
>> when we remove the inode from the inode cache, if only atime
>> is updated.
>
> I think that could be made to work, and it would fix the performance
> issue.
>
> It is a behaviour change. At present ext3 (for example) commits everything
> every five seconds. After a change like this, a crash+recovery could cause
> a file's atime to go backwards by an arbitrarily large time interval - it
> could easily be months.

A second pdflush / workqueue at a slower rate would alleviate that.

Yes, it's a semantic change ... but only in an incredibly small corner-case ?
Re: [PATCH 00/23] per device dirty throttling -v8
On Wed, 08 Aug 2007 15:39:52 -0400 Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Bill Davidsen wrote:
> > Being standards compliant is not an argument it's a design goal, a
> > requirement. Standards compliance is like pregnant, you are or you're
>
> Linux history says different. There was always the "final 1%" of
> compliance that required silliness we really did not want to bother with.

This isn't about the 1% however. It's about API and ABI. Changing the default is a fairly evil ABI change. Telling everyone relatime is cool on desktops and defaulting it in the distro is not an ABI change and is very sensible.
Re: [PATCH 00/23] per device dirty throttling -v8
On Wed, 08 Aug 2007 14:10:15 -0700 "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
> Why isn't this easily fixable by just adding an additional dirty
> flag that says atime has changed? Then we only cause a write
> when we remove the inode from the inode cache, if only atime
> is updated.

I think that could be made to work, and it would fix the performance issue.

It is a behaviour change. At present ext3 (for example) commits everything every five seconds. After a change like this, a crash+recovery could cause a file's atime to go backwards by an arbitrarily large time interval - it could easily be months.
Re: [PATCH 00/23] per device dirty throttling -v8
Christoph Hellwig wrote:
> On Sat, Aug 04, 2007 at 09:42:59PM +0200, Jörn Engel wrote:
>> On Sat, 4 August 2007 21:26:15 +0200, Jörn Engel wrote:
>>> Given the choice between only "atime" and "noatime" I'd agree with you.
>>> Heck, I use it myself. But "relatime" seems to combine the best of both
>>> worlds. It currently just suffers from mount not supporting it in any
>>> relevant distro.
>> And here is a completely untested patch to enable it by default. Ingo,
>> can you see how good this fares compared to "atime" and
>> "noatime,nodiratime"?
>
> Umm, no f**king way. atime selection is 100% policy and belongs into
> userspace. Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.

From what I've seen the problem seems to be that the inode gets marked dirty when we update atime.

Why isn't this easily fixable by just adding an additional dirty flag that says atime has changed? Then we only cause a write when we remove the inode from the inode cache, if only atime is updated.

Unlike relatime, there's no user-visible change (unless the machine crashes without a clean unmount, but I'm not sure anyone cares that much about that corner case). Atime changes are thus kept in RAM until umount / inode reclaim.
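The separate dirty flag proposed above can be modelled in userspace to see when writes would actually happen. This is a toy model only, not kernel code: the class names, the `atime_dirty` flag and the `disk_writes` log are all hypothetical, and the periodic sync stands in for whatever writeback the filesystem already does for ordinary dirty inodes.

```python
class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.atime = 0
        self.dirty = False        # data or metadata changed: write back soon
        self.atime_dirty = False  # only atime changed: write-back can wait


class InodeCache:
    """Toy inode cache that defers atime-only updates until eviction."""

    def __init__(self):
        self.inodes = {}
        self.disk_writes = []  # (inode number, reason) for each write

    def _get(self, ino):
        if ino not in self.inodes:
            self.inodes[ino] = Inode(ino)
        return self.inodes[ino]

    def read(self, ino, now):
        inode = self._get(ino)
        inode.atime = now
        inode.atime_dirty = True  # remembered in memory, no write queued

    def write(self, ino, now):
        self._get(ino).dirty = True

    def periodic_sync(self):
        # Regular writeback skips inodes that are dirty only for atime.
        for inode in self.inodes.values():
            if inode.dirty:
                self.disk_writes.append((inode.ino, "sync"))
                inode.dirty = False
                inode.atime_dirty = False  # atime rides along with the write

    def evict(self, ino):
        # umount or inode reclaim: the last chance to save the atime.
        inode = self.inodes.pop(ino)
        if inode.dirty or inode.atime_dirty:
            self.disk_writes.append((inode.ino, "evict"))


cache = InodeCache()
cache.read(1, now=100)
cache.periodic_sync()
print(cache.disk_writes)  # no write yet for the read-only inode
cache.evict(1)
print(cache.disk_writes)
```

A crash between the read and the eviction loses the in-memory atime, which is the corner case discussed in this message.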
Re: [PATCH 00/23] per device dirty throttling -v8
Jeff Garzik wrote:
> Bill Davidsen wrote:
>> Being standards compliant is not an argument it's a design goal, a
>> requirement. Standards compliance is like pregnant, you are or you're
>
> Linux history says different. There was always the "final 1%" of
> compliance that required silliness we really did not want to bother with.

This is not 1%, this is a user-visible change in behavior, relative to all previous Linux versions. There has been a way for ages to trade performance for standards for users or distributions, and standards have been chosen. Given that there is now a way to get virtually all of the performance without giving up atime completely, why the sudden attempt to change to a less satisfactory default?

I could understand a push to quickly get relatime with a few enhancements (the functionality if not the exact code) into distributions, even as a default, but forcing user or distribution changes just to retain the same behavior doesn't seem reasonable. It assumes that vendors and users are so stupid they can't understand why benchmark results are more important than standards. People who run servers are smart enough to decide if their application will run as expected without atime.

People have lived with this compromise for a very long time, and it seems that a far more balanced solution will be in the kernel soon.

--
bill davidsen <[EMAIL PROTECTED]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: [PATCH 00/23] per device dirty throttling -v8
Bill Davidsen wrote:
> Being standards compliant is not an argument it's a design goal, a
> requirement. Standards compliance is like pregnant, you are or you're

Linux history says different. There was always the "final 1%" of compliance that required silliness we really did not want to bother with.

Jeff
Re: [PATCH 00/23] per device dirty throttling -v8
Ingo Molnar wrote:
> || ...For me, I would say 50% is not enough to describe the _visible_
> || benefits... Not talking any specific number but past 10sec-1min+
> || lagging in X is history, it's gone and I really don't miss it that
> || much... :-) Cannot reproduce even a second long delay anymore in
> || window focusing under considerable load as it's basically
> || instantaneous (I can see that it's loaded but doesn't affect the
> || feeling of responsiveness I'm now getting), even on some loads that I
> || couldn't previously even dream of... [...]
>
> we really have to ask ourselves whether the "process" is correct if
> advantages to the user of this order of magnitude can be brushed aside
> with simple "this breaks binary-only HSM" and "it's not standards
> compliant" arguments.

Being standards compliant is not an argument it's a design goal, a requirement. Standards compliance is like pregnant, you are or you're not. And to deliberately ignore standards for speed is saying "it's too hard to do it right, I'll do it wrong and it will be faster." The answer is to do it smarter, with solutions like relatime (which can be enhanced as Linus noted) which provide performance benefits without ignoring standards, or use of a filesystem which does a better job. But when it goes in the kernel the choice of having per-filesystem behavior either vanishes or becomes an exercise in complex and as-yet unwritten mount options.

There are certainly ways to improve ext3, not journaling atime updates would certainly be one, less frequent updates of dirty inodes, whatever. But if a user wants to give up standards compliance it should be a deliberate choice, not something which the average user will not understand or learn to do.

--
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
Alan Cox wrote:
>> However, relatime has the POSIX behavior without the overhead. Therefore
>
> No. relatime has approximately SuS behaviour. It's not the same as
> "correct" behaviour.

Actually correct, but in terms of what can or does break, relatime seems
a lot closer than noatime. I can't (personally) come up with any scenario
where real applications would see something which would change behavior
adversely.

Making noatime a default in the kernel, requiring a boot option to
restore current behavior, seems to be a turn toward the "it doesn't
really work right but it's *fast*" model. If vendors wanted noatime they
are smart enough to enable it. Now with relatime giving most of the
benefits and few (if any) of the side effects, I would expect a change.

By all means relatime by default in FC8, but not noatime, and let those
who find some measurable benefit from noatime use it.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
richard kennedy <[EMAIL PROTECTED]> writes:
>
> This is on a standard desktop machine so there are lots of other
> processes running on it, and although there is a degree of variability
> in the numbers, they are very repeatable and your patch always
> outperforms the stock mm2.
> looks good to me

iirc the goal of this is less to get better performance, but to avoid
long user visible latencies. Of course if it's faster it's great too,
but that's only secondary.

-Andi
Re: [PATCH 00/23] per device dirty throttling -v8
On Fri, 2007-08-03 at 14:37 +0200, Peter Zijlstra wrote:
> Per device dirty throttling patches
>
> These patches aim to improve balance_dirty_pages() and directly address three
> issues:
>   1) inter device starvation
>   2) stacked device deadlocks
>   3) inter process starvation

Hi Peter,

I've been testing your patch with a simple test case that copies a 3GB
file from sda -> sda, and copies a 1GB file from sda -> sdb.

The script is roughly this:

	dd bs=64k if=[sda]/data3g of=[sda]/temp_data3g &
	sleep 60
	dd bs=64k if=[sda]/data1g of=[sdb]/temp_data1g &
	wait
	sleep 200

This is on my amd64x2 desktop machine, where sda is a SATA 250 GB drive
and sdb is an IDE 300 GB drive.

Running this test 5 times gives:

2.6.23-rc1-mm2
	1GB copy MB/s	3GB copy MB/s
	16.2		16.1
	15.2		14.6
	17.3		14.6
	18.0		14.5
	19.0		14.6

2.6.23-rc1-mm2+pddt_patch
	1GB copy MB/s	3GB copy MB/s
	23.0		14.7
	24.0		14.6
	20.4		14.8
	22.6		14.5
	23.2		14.5

This is on a standard desktop machine so there are lots of other
processes running on it, and although there is a degree of variability
in the numbers, they are very repeatable and your patch always
outperforms the stock mm2.

looks good to me

Richard
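
For readers who want the two tables reduced to one figure per kernel, here is a minimal sketch that averages the five runs. The numbers are copied from Richard's tables; the dictionary layout and the helper name summarize() are only illustrative and are not part of his test script.

```python
from statistics import mean

# Throughput in MB/s, five runs each, copied from the tables above.
stock = {
    "1GB": [16.2, 15.2, 17.3, 18.0, 19.0],
    "3GB": [16.1, 14.6, 14.6, 14.5, 14.6],
}
patched = {
    "1GB": [23.0, 24.0, 20.4, 22.6, 23.2],
    "3GB": [14.7, 14.6, 14.8, 14.5, 14.5],
}


def summarize(before, after):
    """Return {copy: (mean_before, mean_after, relative_change)}."""
    result = {}
    for key in before:
        b, a = mean(before[key]), mean(after[key])
        result[key] = (b, a, (a - b) / b)
    return result


for key, (b, a, rel) in summarize(stock, patched).items():
    print(f"{key} copy: stock {b:.2f} MB/s, patched {a:.2f} MB/s ({rel:+.1%})")
```

On these figures the 1GB copy to sdb averages about 17.1 MB/s on stock mm2 and about 22.6 MB/s with the patch, while the 3GB copy on sda stays within a few percent. Five runs on a busy desktop is a small sample, so treat the averages as indicative only.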
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 07:17:24PM +0200, Ingo Molnar wrote:
>
> * Diego Calleja <[EMAIL PROTECTED]> wrote:
>
> > On Sat, 4 Aug 2007 18:37:33 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > > thousands of applications. So for most file workloads we give
> > > Windows a 20%-30% performance edge, for almost nothing. (for
> > > RAM-starved kernel builds the performance difference between atime
> > > and noatime+nodiratime setups is more on the order of 40%)
> >
> > Just curious - do you have numbers with relatime?
>
> nope. Stupid question, i just tried it and got this:
>
>   EXT3-fs: Unrecognized mount option "relatime" or missing value
>
> i've got util-linux-2.13-0.46.fc6 and 2.6.22 on that box, shouldnt that

 The relatime patch has been applied to util-linux-ng-2.13 (now -rc3),
 you will see it in Fedora 8 (and probably in the other distros).

    Karel

-- 
 Karel Zak <[EMAIL PROTECTED]>
Re: [PATCH 00/23] per device dirty throttling -v8
Claudio Martins wrote:
> On Saturday 04 August 2007, Alan Cox wrote:
>> Linux has never been a "surprise your kernel interfaces all just
>> changed today" kernel, nor a "gosh you upgraded and didn't notice your
>> backups broke" kernel.
>
> Can you give examples of backup solutions that rely on atime being
> updated? I can understand backup tools using mtime/ctime for incremental
> backups (like tar + Amanda, etc), but I'm having trouble figuring out
> why someone would want to use atime for that.

Programs which migrate unused files or delete them are the usual cases.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
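
As an illustration of the kind of program Bill means, here is a minimal sketch of an atime-based sweep in the spirit of tmpwatch or an HSM migration pass. It is not the code of any real tool; the function name stale_files() and its parameters are made up for this example.

```python
import os
import time


def stale_files(root, max_idle_days, now=None):
    """Return the files under root not accessed for max_idle_days days.

    Only st_atime is consulted, which is why such tools depend on the
    kernel keeping atime at least roughly up to date.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat() reads the inode only; it does not touch atime itself.
            if os.lstat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

On a noatime mount the stored atime stops advancing, so a sweep like this would sooner or later flag files that are in fact read every day. That is the failure mode being discussed in this subthread.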
Re: [PATCH 00/23] per device dirty throttling -v8
> However, relatime has the POSIX behavior without the overhead. Therefore

No. relatime has approximately SuS behaviour. It's not the same as
"correct" behaviour.
Re: [PATCH 00/23] per device dirty throttling -v8
Alan Cox wrote:
>> i cannot over-emphasise how much of a deal it is in practice. Atime
>> updates are by far the biggest IO performance deficiency that Linux has
>> today. Getting rid of atime updates would give us more everyday Linux
>> performance than all the pagecache speedups of the past 10 years,
>> _combined_.
>>
>> it's also perhaps the most stupid Unix design idea of all times. Unix is
>> really nice and well done, but think about this a bit:
>
> Think about the user for a moment instead.
>
> Do things right. The job of the kernel is not to "correct" for
> distribution policy decisions. The distributions need to change policy.
> You do that by showing the distributions the numbers.
>
> With a Red Hat on if we can move from /dev/hda to /dev/sda in FC7 then
> we can move from atime to noatime by default on FC8 with appropriate
> release note warnings and having a couple of betas to find out what
> other than mutt goes boom.

Is there really enough benefit between relatime and noatime to justify
that? If atime doesn't get updated at all it *will* impact operations,
and unless there's a real performance gain the path which provides at
least nominal POSIX compliance seems best. Plauger's law of least
astonishment.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
Jeff Garzik wrote:
> Alan Cox wrote:
>> In some setups it will and in others it won't. Nor is it the only
>> application that has this requirement. Ext3 currently is a standards
>> compliant file system. Turn off atime and its very non standards
>> compliant, turn to relatime and its not standards compliant but nobody
>> will break (which is good)
>
> Linux has always been a "POSIX unless its stupid" type of system. For
> the upstream kernel, we should do the right thing -- noatime by default
> -- but allow distros and people that care about rigid compliance to
> easily change the default.

However, relatime has the POSIX behavior without the overhead. Therefore
that (and maybe reldiratime?) are a far better choice. I don't see a big
problem with some version of utils not supporting it, since it can be in
the kernel and will be in the utils soon enough. We have lived without
it this long, sounds as if we could live a bit longer.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Re: [PATCH 00/23] per device dirty throttling -v8
* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> > Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it
> > might finally make sense to do so.
>
> Do we report max(ctime, mtime) as the atime by default when noatime is
> set or do we still need that to be done?

noatime is unchanged by my patch (it is not the same as the 'improved
relatime' mode my patch activates), but it would make sense to do your
change, independently.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
> Per device dirty throttling patches

Andrew, may I inquire about your plans with this?

> These patches aim to improve balance_dirty_pages() and directly address three
> issues:
>   1) inter device starvation
>   2) stacked device deadlocks

This one interests me most, due to various real life, reported problems
with fuse filesystems. For this reason I'd really like to get this or a
subset of it into mainline as soon as possible.

This patchset (or rather the -v7 version) has been running on my laptop
for a couple of weeks without problems. I've also verified that it
solves the fuse and loop issues.

I have some qualms about the complexity of various parts though.
Especially the "proportions" library, which I'm having problems
understanding. I'm not sure that this level of sophistication is really
needed to solve the issues with the old code.

Miklos
Re: [PATCH 00/23] per device dirty throttling -v8
On 08/06/2007 03:37 PM, Alan Cox wrote:
>> We already tried that here. The response: "If noatime is so great, why
>> isn't it the default in the kernel?"
>
> Ok so we have a pile of people @redhat.com sitting on linux-kernel
> complaining about Red Hat distributions not taking it up. Guys - can
> we just fix it internally please like sensible folk ?
>
> Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
> finally make sense to do so.

Do we report max(ctime, mtime) as the atime by default when noatime is
set or do we still need that to be done?
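
To make the question concrete, here is a small userspace model of the idea: on a noatime mount, report the later of ctime and mtime in place of the stored atime. This is only a sketch of the suggestion as worded above, not kernel code, and the function name reported_atime() is hypothetical.

```python
from types import SimpleNamespace


def reported_atime(st):
    """atime a noatime mount would report under the max(ctime, mtime) idea.

    st is any object with st_mtime and st_ctime attributes, such as the
    result of os.stat(). The stored st_atime is deliberately ignored,
    since on a noatime mount it is no longer maintained.
    """
    return max(st.st_ctime, st.st_mtime)


# A file last read long ago (stale atime) but modified recently.
st = SimpleNamespace(st_atime=1_000.0, st_mtime=5_000.0, st_ctime=4_000.0)
print(reported_atime(st))
```

One consequence of this model is that the reported atime is never older than the last modification or inode change, even though it says nothing about when the file was last read.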
Re: [PATCH 00/23] per device dirty throttling -v8
> We already tried that here. The response: "If noatime is so great, why
> isn't it the default in the kernel?"

Ok so we have a pile of people @redhat.com sitting on linux-kernel
complaining about Red Hat distributions not taking it up. Guys - can
we just fix it internally please like sensible folk ?

Ingo's latest 'not quite noatime' seems to cure mutt/tmpwatch so it might
finally make sense to do so.

Alan
Re: [PATCH 00/23] per device dirty throttling -v8
Chuck Ebbert wrote:
> On 08/05/2007 04:36 PM, Christoph Hellwig wrote:
>> Umm, no f**king way. atime selection is 100% policy and belongs into
>> userspace. Add to that the problem that we can't actually re-enable
>> atimes because of the way the vfs-level mount flags API is designed.
>> Instead of doing such a fugly kernel patch just talk to the handful
>> of distributions that matter to update their defaults.
>
> We already tried that here. The response: "If noatime is so great, why
> isn't it the default in the kernel?"

Yes, and around and around we go :/

	Jeff
Re: [PATCH 00/23] per device dirty throttling -v8
On 08/05/2007 04:36 PM, Christoph Hellwig wrote:
>
> Umm, no f**king way. atime selection is 100% policy and belongs into
> userspace. Add to that the problem that we can't actually re-enable
> atimes because of the way the vfs-level mount flags API is designed.
> Instead of doing such a fugly kernel patch just talk to the handful
> of distributions that matter to update their defaults.
>

We already tried that here. The response: "If noatime is so great, why
isn't it the default in the kernel?"
Re: [PATCH 00/23] per device dirty throttling -v8
* Dave Jones <[EMAIL PROTECTED]> wrote:

> > does it work with the "atime on steroids" patch below? (no need to
> > configure anything, just apply the patch and go.)
>
> people have reported that relatime does work, but my util-linux isn't
> new enough to support it, so I've never got it to work. I'll give your
> diff a try later, though as it seems to be equivalent I expect it'll
> work.

would still be nice if you could test it and report back :)

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
On Mon, Aug 06, 2007 at 08:39:09AM +0200, Ingo Molnar wrote:
>
> * Dave Jones <[EMAIL PROTECTED]> wrote:
>
> > > btw., Mutt does not go boom, i use it myself. It works just fine
> > > and notices new mails even on a noatime,nodiratime filesystem.
> >
> > It still fails miserably for me.
> >
> > If I hit 'C' and '?' I get a list of my mail folders, with some of
> > them marked 'N' if they have new mail. Without atime, those N's never
> > show up and every mbox looks like it has no new mail.
>
> does it work with the "atime on steroids" patch below? (no need to
> configure anything, just apply the patch and go.)

people have reported that relatime does work, but my util-linux isn't
new enough to support it, so I've never got it to work. I'll give your
diff a try later, though as it seems to be equivalent I expect it'll
work.

	Dave

-- 
http://www.codemonkey.org.uk
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007 11:00:29 -0400 Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> > I always thought the right solution would be to just sync atime only
> > very very lazily. This means if a inode is only dirty because of an
> > atime update put it on a "only write out when there is nothing to do
> > or the memory is really needed" list.
>
> As I've mentioned earlier, the memory balancing issues that arise when
> we add an "atime dirty" bit scare me a little. It can be addressed,
> obviously, but at the cost of more code complexity.

ext3 and reiser both use a dirty_inode method to make sure that we don't
actually have dirty inodes. This way, kswapd doesn't get stuck on the
log and is able to do real work.

It would be interesting to see a comparison of relatime with a kinoded
that is willing to get stuck on the log. The FS would need a few tweaks
so that write_inode() could know if it really needed to log or not, but
for testing you could just drop ext3_dirty_inode and have
ext3_write_inode do real work.

Then just change kswapd to kick a new kinoded and benchmark away. A real
patch would have to look for places where mark_inode_dirty was used and
expected the dirty_inode callback to log things right away, but for
testing it's good enough.

-chris
Re: [PATCH 00/23] per device dirty throttling -v8
On Mon, Aug 06, 2007 at 08:57:12AM +0200, Ingo Molnar wrote:
>
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>
> > In your example above, maybe it's the opposite, users know they can
> > keep a file in /tmp one more week by simply cat'ing it.
>
> sure - and i'm not arguing that noatime should be the kernel-wide default.
> In every single patch i sent it was a .config option (and a boot option
> _and_ a sysctl option that i think you missed) that a user/distro
> enables or disables. But i think the /tmp argument is not very strong:
> /tmp is fundamentally volatile, and you can grow dependencies on pretty
> much _any_ aspect of the kernel. So the question isn't "is there impact"
> (there is, at least for noatime), the question is "is it still worth
> doing it".
>
> > Changing the kernel in a non-easily reversible way is not kind to the
> > users.
>
> none of my patches did any of that...

I did not notice you talked about a sysctl. A sysctl provides the ability
to switch the behaviour without rebooting, while both the config option
and the command line require a reboot.

> anyway, my latest patch doesn't do noatime, it does the "more intelligent
> relatime" approach.

... which is not equivalent to noatime in the initial example.

Regards,
Willy
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 09:41:12PM +0100, Christoph Hellwig wrote:
> On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> > I always thought the right solution would be to just sync atime only
> > very very lazily. This means if a inode is only dirty because of an
> > atime update put it on a "only write out when there is nothing to do
> > or the memory is really needed" list.
>
> Which is the policy I implemented for XFS a while ago.

How would that work? I didn't think XFS had separate inode lists.

-Andi
Re: [PATCH 00/23] per device dirty throttling -v8
Hi Andi,

On Mon, 2007-08-06 at 00:17 +0200, Andi Kleen wrote:
> Brice Figureau <[EMAIL PROTECTED]> writes:
> >
> > 2) I _still_ don't get the "performances" of 2.6.17, but since that's the
> > better combination I could get, I think there is IMHO progress in the right
> > direction (to be compared to no progress since 2.6.18, that's better :-)).
>
> If you could characterize your workload well (e.g. how many disks,
> what file systems, what load on mysql) perhaps it would be possible
> to reproduce the problem with a test program or a mysql driver.
> Then it could be bisected.

My server is a Dell Poweredge 2850 (bi-Xeon EM64T 3GHz running without
HT, 4GB of RAM), with a Perc 4/Di (a LSI megaraid with a BBU of 256MB).
The hardware RAID card has 2 channels. One is connected to two 10k RPM
146GB SCSI disks that are mirrored in a RAID 1 array on which the system
resides (/dev/sda). The second channel is connected to four 10k RPM
146GB disks, on a RAID 10 array which contains the database files and
database logs (/dev/sdb).

The kernel and userspace are 64bits.

Above the hardware RAID arrays there is LVM2 with two physical groups
(one per array). The RAID10 has only one logical volume.

The database volume (the RAID10) is an ext3 volume mounted with
rw,noexec,nosuid,nodev,noatime,data=writeback.

The I/O scheduler on all arrays is deadline.

/proc knobs with values other than defaults are:
/proc/sys/vm/swappiness = 2
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/vm/vfs_cache_pressure = 1

The only thing running on the server is mysql.
Mysql memory footprint is about 90% of physical RAM.
Mysql is configured to use exclusively InnoDB.
Mysql accesses its database files in O_DIRECT mode.

Since the database fits in RAM, the only kind of access Mysql is doing
is writing to the innodb log, the mysql binlog and finally to the innodb
database files. There are certainly a whole lot of fsync'ing happening.
All the database reads are done from the innodb in-RAM cache.

During all my kernel tests (see the original bug report) the machine was
not swapping (so that's not the reason for the stuttering).

If that helps:
db1:~# cat /proc/meminfo
MemTotal:      4052420 kB
MemFree:         23972 kB
Buffers:         54420 kB
Cached:         168096 kB
SwapCached:    1541744 kB
Active:        3723468 kB
Inactive:       157180 kB
SwapTotal:    11863960 kB
SwapFree:     10193064 kB
Dirty:             320 kB
Writeback:           0 kB
AnonPages:     3657744 kB
Mapped:          20508 kB
Slab:           119964 kB
SReclaimable:   103564 kB
SUnreclaim:      16400 kB
PageTables:       9408 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  13890168 kB
Committed_AS:  3826764 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    268604 kB
VmallocChunk: 34359469435 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

A typical iostat (taken every 2s under light load):

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    2.00   0.00    3.50    0.00    44.00    12.57     0.00   0.00   0.00   0.00
sdb        0.00    9.00   0.50   27.00    4.00   288.00    10.62     0.01   0.36   0.36   1.00

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        0.00  223.50   7.50  185.50   60.00  5964.00    31.21     0.15   0.78   0.56  10.80

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    1.00   0.00    1.00    0.00    15.92    16.00     0.00   0.00   0.00   0.00
sdb        0.00  198.01  19.90  156.22  159.20  2833.83    16.99     0.04   0.24   0.20   3.58

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00    0.00    0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        0.00    5.00   0.50   17.00    4.00   176.00    10.29     0.01   0.69   0.69   1.20

Would it help if I try blktrace on this server to capture the I/O?
I enabled it while compiling the kernel, but I don't know yet how to use
it: any pointer on how to activate it and capture useful information?

Many thanks,
-- 
Brice Figureau <[EMAIL PROTECTED]>
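
When comparing setups like Brice's, it helps to capture the writeback knobs he lists in one step. Below is a minimal sketch that reads them from /proc/sys/vm. The knob names are the ones given in his message; the function name read_vm_knobs() and the root parameter are assumptions made for this example (root exists so the function can be pointed at a copy of the directory).

```python
import os

# The four knobs Brice reports as changed from their defaults.
KNOBS = (
    "swappiness",
    "dirty_background_ratio",
    "dirty_ratio",
    "vfs_cache_pressure",
)


def read_vm_knobs(root="/proc/sys/vm", knobs=KNOBS):
    """Return {knob: int value}, or None for a knob that cannot be read."""
    values = {}
    for name in knobs:
        try:
            with open(os.path.join(root, name)) as f:
                values[name] = int(f.read().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_knobs().items():
        print(f"vm.{name} = {value}")
```

Attaching this output to a bug report next to the meminfo and iostat dumps makes it easier for others to reproduce the same writeback configuration.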
Re: [PATCH 00/23] per device dirty throttling -v8
* Diego Calleja <[EMAIL PROTECTED]> wrote:

> > Measurements show that noatime helps 20-30% on regular desktop
> > workloads, easily 50% for kernel builds and much more than that (in
> > excess of 100%) for file-read-intense workloads. We cannot just walk
>
> And as everybody knows in servers is a popular practice to disable it.
> According to an interview to the kernel.org admins

yeah - but i'd be surprised if more than 1% of all Linux servers out
there had noatime.

> "Beyond that, Peter noted, "very little fancy is going on, and that is
> good because fancy is hard to maintain." He explained that the only
> fancy thing being done is that all filesystems are mounted noatime
> meaning that the system doesn't have to make writes to the filesystem
> for files which are simply being read, "that cut the load average in
> half."

nice quote :-)

> I bet that some people would consider such performance hit a bug...

yeah.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
* Willy Tarreau <[EMAIL PROTECTED]> wrote:

> In your example above, maybe it's the opposite, users know they can
> keep a file in /tmp one more week by simply cat'ing it.

sure - and i'm not arguing that noatime should be the kernel-wide
default. In every single patch i sent it was a .config option (and a
boot option _and_ a sysctl option that i think you missed) that a
user/distro enables or disables. But i think the /tmp argument is not
very strong: /tmp is fundamentally volatile, and you can grow
dependencies on pretty much _any_ aspect of the kernel. So the question
isn't "is there impact" (there is, at least for noatime), the question
is "is it still worth doing it".

> Changing the kernel in a non-easily reversible way is not kind to the
> users.

none of my patches did any of that...

anyway, my latest patch doesn't do noatime, it does the "more
intelligent relatime" approach.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> i've been a linux sysadmin for 10 years, and have known about noatime
> for at least 7 years, but I always thought of it in the category of
> 'use it only on your performance critical machines where you are
> trying to extract every ounce of performance, and keep an eye out for
> things misbehaving'
>
> I never imagined that it was the 20%+ hit that is being described, and
> with so little impact, or I would have switched to it across the board
> years ago.
>
> I'll bet there are a lot of admins out there in the same boat.
>
> adding an option in the kernel to change the default sounds like a
> very good first step, even if the default isn't changed today.

yep - but note that this was a gradual effect along the years, today the
asymmetry between CPU performance and disk-seek performance is
proportionally larger than 10 years ago. Today CPUs are nearly 100 times
faster than 10 years ago, but disk seeks got only 2-3 times faster. (and
even that only if you have a high rpm disk - most desktops don't.)

10 years ago noatime was a nifty hack that made a difference if you had
lots of files. But it still was a problem with no immediate easy
solution and people developed their counter-arguments. Today the same
counter-arguments are used, but the situation has evolved a lot.

and note that often this has a bigger everyday effect than the tweaking
of CPU scheduling, IO scheduling or swapping behavior (!). My desktop
systems rarely swap, have plenty of CPU power to spare, but atime
updates still have a noticeable latency impact, regardless of the memory
pressure. Linux has _lots_ of "performance reserves", so people don't
normally notice when comparing it to other OSs, but still we should not
be so wasteful with our IO performance, for such a fundamental thing as
reading files.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
* Dave Jones <[EMAIL PROTECTED]> wrote:

> > btw., Mutt does not go boom, i use it myself. It works just fine
> > and notices new mails even on a noatime,nodiratime filesystem.
>
> It still fails miserably for me.
>
> If I hit 'C' and '?' I get a list of my mail folders, with some of
> them marked 'N' if they have new mail. Without atime, those N's never
> show up and every mbox looks like it has no new mail.

does it work with the "atime on steroids" patch below? (no need to
configure anything, just apply the patch and go.)

	Ingo

--->
Subject: [patch] [patch] implement smarter atime updates support
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "default_relatime=0" boot option to turn this off.

also add the /proc/sys/kernel/default_relatime flag which can be changed
runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s "Tue Aug  7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |    8 +
 fs/Kconfig                          |   22 ++
 fs/inode.c                          |   53 +++-
 fs/namespace.c                      |   24
 include/linux/mount.h               |    3 ++
 kernel/sysctl.c                     |   17 +++
 6 files changed, 114 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters.
 			This is a 16-member array composed of values
 			ranging from 0-255.
 
+	default_relatime=
+			[FS] mount all filesystems with relative atime
+			updates by default.
+
 	default_utf8=	[VT]
 			Format=<0|1>
 			Set system-wide default UTF-8 mode for all tty's.
@@ -1468,6 +1472,10 @@ and is between 256 and 4096 characters.
 			Format: [,[,...]]
 			See arch/*/kernel/reboot.c or arch/*/kernel/process.c
 
+	relatime_interval=
+			[FS] relative atime update frequency, in seconds.
+			(default: 1 day: 86400 seconds)
+
 	reserve=	[KNL,BUGS] Force the kernel to ignore some iomem area
 
 	reservetop=	[X86-32]
Index: linux/fs/Kconfig
===================================================================
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+	bool "Mount all filesystems with relatime by default"
+	default y
+	help
+	  If you say Y here, all your filesystems will be mounted
+	  with the "relatime" mount option. This eliminates many atime
+	  ('file last accessed' timestamp) updates (which otherwise
+	  is performed on every file access and generates a write
+	  IO to the inode) and thus speeds up IO. Atime is still updated,
+	  but only once per day.
+
+	  The mtime ('file last modified') and ctime ('file created')
+	  timestamp are unaffected by this change.
+
+	  Use the "norelatime" kernel boot option to turn off this
+	  feature.
+
+config DEFAULT_RELATIME_VAL
+	int
+	default "1" if DEFAULT_RELATIME
+	default "0"
+
 if BLOCK
 menu "Partition Types"
 
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,41 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * Relative atime updates frequency (default: 1 day):
+ */
+int relatime_interval __read_mostly = 24*60*60;
+
+/*
+ * With relative atime, only update atime if the
+ * previous atime is earlier than either the ctime or
+ * mtime.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+	/*
+	 * Is mtime younger than atime? If yes, update atime:
+	 */
+	if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
+		return 1;
+	/*
+	 * Is ctime younger than atime? If yes, update atime:
+	 */
+	if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
[RFC] VFS: mnotify (was: [PATCH 00/23] per device dirty throttling -v8)
Jakob Oestergaard wrote:
> Why on earth would you cripple the kernel defaults for ext3 (which is a
> fine FS for boot/root filesystems), when the *fundamental* problem you
> really want to solve lie much deeper in the implementation of the
> filesystem? Noatime doesn't solve the problem, it just makes it "less
> horrible".

inotify could easily solve the atime problem, but it's got the drawback of forcing the user to register each and every file/dir of interest, which isn't really reasonable on TB-filesystems. It could be feasible to introduce mnotify, which would notify the user of meta changes, like atime, across the filesystem. Something like mnotify could also be helpful in CoW situations, provided it supported an in-sync interface.

Thanks!

--
Al
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 09:16:35PM +0200, Florian Weimer wrote:
> * Andrew Morton:
>
> > The easy preventive is to mount with data=writeback. Maybe that should
> > have been the default.
>
> The documentation I could find suggests that this may lead to a
> security weakness (old data in blocks of a file that was grown just
> before the crash leaks to a different user). XFS overwrites that data
> with zeros upon reboot, which tends to irritate users when it happens.

XFS has never overwritten data on reboot. It leaves holes when the kernel has failed to write out data. A hole == zeros so XFS does not expose stale data in this situation.

As it is, the underlying XFS problem (lack of synchronisation between inode size update and data writes) has been mostly fixed in 2.6.22 by only updating the file size to be written to disk on data I/O completion.

FWIW, fsync() would prevent this from happening, but many application writers seem strangely reluctant to put fsync() calls into code to ensure the data they write is safely on disk.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 06:42:30AM -0400, Jeff Garzik wrote:
> Jakob Oestergaard wrote:
> >Oh dear.
> >
> >Why not just make ext3 fsync() a no-op while you're at it?
> >
> >Distros can turn it back on if it's needed...
> >
> >Of course I'm not serious, but like atime, fsync() is something one
>
> No, they are nothing alike, and you are just making yourself look silly
> if you compare them. fsync has to do with fundamental guarantees about
> data.

Hi Jeff - just as a point to note, I think you should check the spec for fsync before stating that:

"It is explicitly intended that a null implementation is permitted."

and

"... fsync() might or might not actually cause data to be written where it is safe from a power failure."

http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html

So fsync() does not have to provide the fundamental guarantees you think it should.

Note - I'm not saying that this is at all sane (it's crazy, IMO), I'm just pointing out that a "nofsync" mount option to avoid fsync overhead is a legal thing to do.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 09:16:35PM +0200, Florian Weimer wrote:
> * Andrew Morton:
>
> > The easy preventive is to mount with data=writeback. Maybe that should
> > have been the default.
>
> The documentation I could find suggests that this may lead to a
> security weakness (old data in blocks of a file that was grown just
> before the crash leaks to a different user). XFS overwrites that data
> with zeros upon reboot, which tends to irritate users when it happens.
>
> From this point of view, data=ordered doesn't seem too bad.

The other alternative which addresses the security concern is data=journal, which if you have a big enough journal, can sometimes be *faster* than data=ordered or even data=writeback, because it reduces seeking. The problem is that it's workload dependent which is better; if the workload is very, very heavy on data writes, each data block ends up getting written twice, once to the journal and once to the final location on disk, and so this halves your total max write bandwidth.

But if the workload doesn't do as much writing, and is very seeky, and/or is very, very, fsync()-centric (like a mailhub), data=journal is probably the right answer.

						- Ted
Re: [PATCH 00/23] per device dirty throttling -v8
Brice Figureau <[EMAIL PROTECTED]> writes:
>
> 2) I _still_ don't get the "performances" of 2.6.17, but since that's the
> better combination I could get, I think there is IMHO progress in the right
> direction (to be compared to no progress since 2.6.18, that's better :-)).

If you could characterize your workload well (e.g. how many disks, what file systems, what load on mysql) perhaps it would be possible to reproduce the problem with a test program or a mysql driver. Then it could be bisected.

-Andi
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 09:57:02AM +0200, Florian Weimer wrote:
> For instance, some editors don't perform fsync-then-rename, but simply
> truncate the file when saving (because they want to preserve hard
> links). With XFS, this tends to cause null bytes on crashes. Since
> ext3 has got a much larger install base, this would result in lots of
> bug reports, I fear.

XFS has recently been changed to only update the on-disk i_size after data writeback has finished to get rid of this irritation.
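For readers unfamiliar with the "fsync-then-rename" pattern Florian mentions, here is a minimal user-space sketch of it. The function name `save_atomically()` and the caller-supplied temporary path are hypothetical, chosen only for illustration; only standard POSIX calls are used. Note that, as the quoted text says, replacing a file by rename does not preserve hard links to the old file, and this sketch does not fsync the containing directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to tmp_path, force them to disk, then
 * rename tmp_path over path. Returns 0 on success, -1 on failure.
 * (Hypothetical helper, not taken from any editor or library.)
 */
static int save_atomically(const char *path, const char *tmp_path,
                           const void *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Ask the kernel to push the new contents to stable storage
     * before the old file is replaced. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Replace the old file in one step. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would use it as `save_atomically("notes.txt", "notes.txt.tmp", data, data_len)`. The truncate-in-place approach the editors use avoids the rename, which is why a crash between the truncate and the data writeback can leave an empty or partially written file.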
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> I always thought the right solution would be to just sync atime only
> very very lazily. This means if a inode is only dirty because of an
> atime update put it on a "only write out when there is nothing to do
> or the memory is really needed" list.

Which is the policy I implemented for XFS a while ago.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007 22:21:12 +0200 Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 5 August 2007 20:37:14 +0200, Jörn Engel wrote:
> >
> > Guess I should throw in a kernel compile test as well, just to get a
> > feel for the performance.
>
> Three runs each of noatime, relatime and atime, both with cold caches
> and with warm caches. Scripts below. Run on a Thinkpad T40, 1.5GHz,
> 2GiB RAM, 60GB 2.5" IDE disk, ext3.
>
> Biggest difference between atime and noatime (median run, cold cache) is
> ~2.3%, nowhere near the numbers claimed by Ingo. Ingo, how did you
> measure 10% and more?

Ingo had CONFIG_DEBUG_INFO=y, which generates heaps more writeout, but no additional atime updates.

Ingo had a faster computer ;) That will generate many more MB/sec write traffic, so the cost of those atime seeks becomes proportionally higher.

Basically: you're CPU-limited, Ingo is seek-limited.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 09:42:59PM +0200, Jörn Engel wrote:
> On Sat, 4 August 2007 21:26:15 +0200, Jörn Engel wrote:
> >
> > Given the choice between only "atime" and "noatime" I'd agree with you.
> > Heck, I use it myself. But "relatime" seems to combine the best of both
> > worlds. It currently just suffers from mount not supporting it in any
> > relevant distro.
>
> And here is a completely untested patch to enable it by default. Ingo,
> can you see how good this fares compared to "atime" and
> "noatime,nodiratime"?

Umm, no f**king way. atime selection is 100% policy and belongs into userspace. Add to that the problem that we can't actually re-enable atimes because of the way the vfs-level mount flags API is designed. Instead of doing such a fugly kernel patch just talk to the handful of distributions that matter to update their defaults.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 11:01:18AM -0700, Arjan van de Ven wrote:
>
> on the journalling side this would be one transaction (not 5 million)
> and... since inodes are grouped on disk, you can even get some better
> coalescing this way...
>
> Wonder if we could do inode-grouping smartly; eg if we HAVE to write
> inode X, also write out the atime-dirty inodes in range X-Y to X+Y
> (where Y is some tunable) in the same IO..

We already have filesystems in the tree that have been doing such advanced things as inode writeback clustering for more than ten years :)
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 August 2007 20:37:14 +0200, Jörn Engel wrote:
>
> Guess I should throw in a kernel compile test as well, just to get a
> feel for the performance.

Three runs each of noatime, relatime and atime, both with cold caches and with warm caches. Scripts below. Run on a Thinkpad T40, 1.5GHz, 2GiB RAM, 60GB 2.5" IDE disk, ext3.

Biggest difference between atime and noatime (median run, cold cache) is ~2.3%, nowhere near the numbers claimed by Ingo. Ingo, how did you measure 10% and more?

noatime, cold cache     relatime, cold cache    atime, cold cache
real    2m10.242s       real    2m10.549s       real    2m10.388s
user    1m46.886s       user    1m46.680s       user    1m47.000s
sys     0m8.243s        sys     0m8.423s        sys     0m8.239s

real    2m11.270s       real    2m11.212s       real    2m14.280s
user    1m46.940s       user    1m46.776s       user    1m46.670s
sys     0m8.139s        sys     0m8.283s        sys     0m8.503s

real    2m11.601s       real    2m14.861s       real    2m14.335s
user    1m46.920s       user    1m47.103s       user    1m46.846s
sys     0m8.246s        sys     0m8.266s        sys     0m8.349s

noatime, warm cache     relatime, warm cache    atime, warm cache
real    1m55.894s       real    1m56.053s       real    1m56.905s
user    1m46.683s       user    1m46.600s       user    1m46.853s
sys     0m8.186s        sys     0m8.349s        sys     0m8.249s

real    1m55.823s       real    1m56.093s       real    1m57.077s
user    1m46.583s       user    1m46.913s       user    1m46.590s
sys     0m8.259s        sys     0m7.966s        sys     0m8.523s

real    1m55.789s       real    1m56.214s       real    1m57.224s
user    1m46.803s       user    1m46.753s       user    1m46.953s
sys     0m8.053s        sys     0m8.113s        sys     0m8.113s

Jörn

--
Data expands to fill the space available for storage.
-- Parkinson's Law

Cold cache script:
#!/bin/sh
make distclean
echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches
make allnoconfig
time make

Warm cache script:
#!/bin/sh
make distclean
make allnoconfig
rgrep laksdflkdsaflkadsfja .
time make
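The ~2.3% figure above can be re-derived from the posted wall-clock times. The helpers below are a small sketch for doing that arithmetic; `median()` and `percent_slower()` are hypothetical names, and the timings have to be converted to seconds by hand (for example 2m11.270s becomes 131.270).

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place and returns its median. */
static double median(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* How much slower `other` is than `baseline`, in percent. */
static double percent_slower(double baseline, double other)
{
    return (other - baseline) / baseline * 100.0;
}
```

With the cold-cache "real" times, the noatime runs {130.242, 131.270, 131.601} have a median of 131.270s and the atime runs {130.388, 134.280, 134.335} have a median of 134.280s, so `percent_slower(131.270, 134.280)` comes out at roughly 2.3%, matching the number quoted in the message.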
Re: [PATCH 00/23] per device dirty throttling -v8
* Alan Cox <[EMAIL PROTECTED]> wrote:

> > also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
> > "norelatime" the default for all mounts without an extra kernel boot
> > option.
>
> Should be a mount option.

it is already a mount option too.

> > +	relatime	[FS] default to enabled relatime updates on all
> > +			filesystems.
> > +
> > +	relatime=	[FS] default to enabled/disabled relatime updates on
> > +			all filesystems.
> > +
>
> Double patch

no - it was not a double patch, i made all the common variants valid boot options: "relatime", "relatime=0/1", "norelatime" and "norelatime=0/1". Anyway, this is moot, in the latest (v2) version there's only a single boot parameter.

> > +config DEFAULT_RELATIME
> > +	bool "Mount all filesystems with relatime by default"
> > +	default y
>
> Changes behaviour so probably should default n. Better yet it should
> be the mount option so its flexible and strongly encouraged for
> vendors.

relatime is a mount option already. And distros can disable it if they want. (they are conscious about their kernel config selections anyway.)

> > +0
> > +#endif
> > +;
>
> This ifdef mess would go away for a mount option

i fixed that in v2.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
> change relatime updates to be performed once per day. This makes
> relatime a compatible solution for HSM, mailer-notification and
> tmpwatch applications too.

Sweet

> also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
> "norelatime" the default for all mounts without an extra kernel
> boot option.

Should be a mount option.

> +	relatime	[FS] default to enabled relatime updates on all
> +			filesystems.
> +
> +	relatime=	[FS] default to enabled/disabled relatime updates on
> +			all filesystems.
> +

Double patch

> 	atkbd.extra=	[HW] Enable extra LEDs and keys on IBM RapidAccess,
> 			EzKey and similar keyboards
>
> @@ -1100,6 +1106,12 @@ and is between 256 and 4096 characters.
> 	noasync		[HW,M68K] Disables async and sync negotiation for
> 			all devices.
>
> +	norelatime	[FS] default to disabled relatime updates on all
> +			filesystems.
> +
> +	norelatime=	[FS] default to disabled/enabled relatime updates
> +			on all filesystems.
> +

Double patch

> +config DEFAULT_RELATIME
> +	bool "Mount all filesystems with relatime by default"
> +	default y

Changes behaviour so probably should default n. Better yet it should be the mount option so its flexible and strongly encouraged for vendors.

>  /*
> + * Allow users to disable (or enable) atime updates via a .config
> + * option or via the boot line, or via /proc/sys/fs/mount_with_relatime:
> + */
> +int mount_with_relatime __read_mostly =
> +#ifdef CONFIG_DEFAULT_RELATIME
> +1
> +#else
> +0
> +#endif
> +;

This ifdef mess would go away for a mount option

> +/*
> + * The "norelatime=", "atime=", "norelatime" and "relatime" boot parameters:
> + */
> +static int toggle_relatime_updates(int val)
> +{
> +	mount_with_relatime = val;
> +
> +	printk("Relative atime updates are: %s\n", val ? "on" : "off");
> +
> +	return 1;
> +}
> +
> +static int __init set_relatime_setup(char *str)
> +{
> +	int val;
> +
> +	get_option(&str, &val);
> +	return toggle_relatime_updates(val);
> +}
> +__setup("relatime=", set_relatime_setup);
> +
> +static int __init set_norelatime_setup(char *str)
> +{
> +	int val;
> +
> +	get_option(&str, &val);
> +	return toggle_relatime_updates(!val);
> +}
> +__setup("norelatime=", set_norelatime_setup);
> +
> +static int __init set_relatime(char *str)
> +{
> +	return toggle_relatime_updates(1);
> +}
> +__setup("relatime", set_relatime);
> +
> +static int __init set_norelatime(char *str)
> +{
> +	return toggle_relatime_updates(0);
> +}
> +__setup("norelatime", set_norelatime);

All the above chunk is unnecessary as it can be a mount option. That avoids tons of messy extra code and complication. Users are far safer editing fstab than grub.conf.

> +	{
> +		.ctl_name	= CTL_UNNUMBERED,
> +		.procname	= "mount_with_relatime",
> +		.data		= &mount_with_relatime,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= &proc_dointvec,
> +	},

More code you don't need if you just leave it as a mount option.

I'd much rather see the small clean patch for this as a mount option. Leave the rest to users/distros/lwn and it'll just happen now you've sorted the compatibility problems.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007 20:08:26 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > And you honestly think that putting it in Kconfig as well as allowing
> > users to screw up horribly and creating incompatible defaults you
>
> So far you've not offered one realistic scenario of "screw up horribly".
> People have been using noatime for a long time and there are no horror
> stories about that. _Which_ OSS HSM software relies on atime?

Whats this about "OSS". OSS or proprietary. And you've been given one example already - tmpwatch. Although its more of a trash compactor than HSM.

> > can't test for in a user space app where it matters is going to
> > *change* this.
>
> The patch i posted today adds /proc/sys/kernel/mount_with_atime. That
> can be tested by user-space, if it truly cares about atime.

We have an existing API and ABI thank you. See man mount.

> > Do you really think anyone who said "noatime, compatibility, umm errr"
> > is going to say "noatime, compatibility, but hey its in Kconfig lets
> > do it". You argument doesn't hold up to minimal rational
> > consideration. Posting to the distribution devel list with: "Its a 50%
> > performance win, we need to fix these corner cases, here's a tmpwatch
> > patch" is *exactly* what is needed to change it, and Kconfig options
> > are irrelevant to that.
>
> i did exactly that 6 months ago, check your email folders. I went by the
> "process". But it doesnt really matter anymore, Ubuntu has done the step

And your Kconfig argument is still not rational. A question I note you chose not to answer. Anyway if Ubuntu has switched to noatime by default (or relatime) and hasn't used a Kconfig line that proves my whole point - we don't need one and its pointless to add so.

> we really have to ask ourselves whether the "process" is correct if
> advantages to the user of this order of magnitude can be brushed aside
> with simple "this breaks binary-only HSM" and "it's not standards
> compliant" arguments.

Thats a discussion to have with your distribution development team. The kernel provides the required facilities already. Open source means everyone can do cool stuff as they see fit and natural selection will do the rest.

Look I agree entirely with you that relatime, or noatime + minor package patches is the right thing to do for FC8. I've also pointed out you can build and release tuning packages for FC 7 and they'll make the distribution. FC8 beta 1 approaches so now is the time to be talking to the distribution people and to the ever kernel building Dave Jones about it.

But none of this makes stupid Kconfig hacks the right answer.

Alan
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007, Diego Calleja wrote:

> On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> Measurements show that noatime helps 20-30% on regular desktop
>> workloads, easily 50% for kernel builds and much more than that (in
>> excess of 100%) for file-read-intense workloads. We cannot just walk
>
> And as everybody knows in servers is a popular practice to disable it.
> According to an interview to the kernel.org admins
>
> "Beyond that, Peter noted, "very little fancy is going on, and that is good
> because fancy is hard to maintain." He explained that the only fancy thing
> being done is that all filesystems are mounted noatime meaning that the
> system doesn't have to make writes to the filesystem for files which are
> simply being read, "that cut the load average in half."
>
> I bet that some people would consider such performance hit a bug...

actually, it's popular practice to disable it by people who know how big a hit it is and know how few programs use it.

i've been a linux sysadmin for 10 years, and have known about noatime for at least 7 years, but I always thought of it in the category of 'use it only on your performance critical machines where you are trying to extract every ounce of performance, and keep an eye out for things misbehaving'

I never imagined that it was the 20%+ hit that is being described, and with so little impact, or I would have switched to it across the board years ago.

I'll bet there are a lot of admins out there in the same boat.

adding an option in the kernel to change the default sounds like a very good first step, even if the default isn't changed today.

David Lang
Re: [PATCH 00/23] per device dirty throttling -v8
* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Sun, 5 Aug 2007, Ingo Molnar wrote:
> >
> > you mean tmpwatch? The trivial change below fixes this. And with that
> > we've come to the end of an extremely short list of atime dependencies.
>
> You wouldn't even need these kinds of games.
>
> What we could do is to make "relatime" updates a bit smarter.
>
> A bit smarter would be:
>
>  - update atime if the old atime is <= than mtime/ctime
>
>    Logic: things like mailers can care about whether some new state has
>    been read or not. This is the current relatime.
>
>  - update atime if the old atime is more than X seconds in the past
>    (defaulting to one day or something)
>
>    Logic: things like tmpwatch and backup software may want to remove
>    stuff that hasn't been touched in a long time, but they sure don't care
>    about "exact" atime.

ok, i've implemented this and it's working fine. Check out the relatime_need_update() function for the details of the logic. Atime update frequency is 1 day with that, and we update at least once after every modification as well, for the mailer logic.

tested it by moving the date forward:

  # date
  Sun Aug  5 22:55:14 CEST 2007
  # date -s "Tue Aug  7 22:55:14 CEST 2007"
  Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and it generated exactly one IO after the date was set.

( should i perhaps reduce the number of boot options and only use a single "norelatime_default" boot option to turn this off? )

	Ingo

>
Subject: [patch] add norelatime/relatime boot options, CONFIG_DEFAULT_RELATIME
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes relatime a compatible solution for HSM, mailer-notification and tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes "norelatime" the default for all mounts without an extra kernel boot option.

add the "norelatime" (and "relatime") boot options to enable/disable relatime updates for all filesystems.

also add the /proc/sys/kernel/mount_with_relatime flag which can be changed runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

  # date
  Sun Aug  5 22:55:14 CEST 2007
  # date -s "Tue Aug  7 22:55:14 CEST 2007"
  Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |   12 +++
 fs/Kconfig                          |   17 ++
 fs/inode.c                          |   48
 fs/namespace.c                      |   61
 include/linux/mount.h               |    2 +
 kernel/sysctl.c                     |    9 +
 6 files changed, 136 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -303,6 +303,12 @@ and is between 256 and 4096 characters.

 	atascsi=	[HW,SCSI] Atari SCSI

+	relatime	[FS] default to enabled relatime updates on all
+			filesystems.
+
+	relatime=	[FS] default to enabled/disabled relatime updates on
+			all filesystems.
+
 	atkbd.extra=	[HW] Enable extra LEDs and keys on IBM RapidAccess,
 			EzKey and similar keyboards

@@ -1100,6 +1106,12 @@ and is between 256 and 4096 characters.
 	noasync		[HW,M68K] Disables async and sync negotiation for
 			all devices.

+	norelatime	[FS] default to disabled relatime updates on all
+			filesystems.
+
+	norelatime=	[FS] default to disabled/enabled relatime updates
+			on all filesystems.
+
 	nobats		[PPC] Do not use BATs for mapping kernel lowmem
 			on "Classic" PPC cores.

Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,23 @@ config 9P_FS

 endmenu

+config DEFAULT_RELATIME
+	bool "Mount all filesystems with relatime by default"
+	default y
+	help
+	  If you say Y here, all your filesystems will be mounted
+	  with the "relatime" mount option. This eliminates many atime
+	  ('file last accessed' timestamp) updates (which otherwise
+	  is performed on every file access and generates a write
+	  IO to the inode) and thus speeds up IO. Atime is still updated,
+	  but only once per day.
+
+	  The mtime ('file last modified') and ctime ('file created')
+	  timestamp are
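A minimal user-space sketch of the two rules Linus lists in this message, for readers who want to poke at the logic outside the kernel. This is not the code from the patch: `struct file_times`, `relatime_should_update()` and `RELATIME_INTERVAL_SECS` are hypothetical names, plain `time_t` seconds stand in for the kernel's timespec values, and the one-day interval is the default mentioned in the patch description.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an inode's three timestamps (seconds). */
struct file_times {
    time_t access_time;  /* atime */
    time_t modify_time;  /* mtime */
    time_t change_time;  /* ctime */
};

/* Assumed default: one day, as discussed in the thread. */
#define RELATIME_INTERVAL_SECS (24 * 60 * 60)

static bool relatime_should_update(const struct file_times *t, time_t now)
{
    /* Rule 1: the file was modified or changed at or after the last
     * recorded access, so a reader such as a mailer needs to see a
     * fresh atime. */
    if (t->modify_time >= t->access_time ||
        t->change_time >= t->access_time)
        return true;

    /* Rule 2: the recorded atime is at least one interval old, so
     * tools such as tmpwatch still see roughly current access times. */
    if (now - t->access_time >= RELATIME_INTERVAL_SECS)
        return true;

    /* Otherwise skip the update and avoid the inode write. */
    return false;
}
```

For a file last read and last written long ago, the first read after the interval has passed triggers one update and later reads within the next day do not, which is consistent with the "exactly one IO after the date was set" test described in the message.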
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 02:44:08PM -0400, Dave Jones wrote:
> It still fails miserably for me.
>
> If I hit 'C' and '?' I get a list of my mail folders, with some of them
> marked 'N' if they have new mail. Without atime, those N's never show
> up and every mbox looks like it has no new mail.

This is true for anyone using mbox_type=mbox (i.e. the native unix mailbox format). The Maildir type should work just fine, as mutt will notice that new mail has arrived in the 'new' subdir (according to the maildir spec).

So yes, it is configuration dependent.

Regards,
P.Y. Adi Prasaja
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 09:21:41AM +0200, Ingo Molnar wrote:
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > With a Red Hat on if we can move from /dev/hda to /dev/sda in FC7 then
> > we can move from atime to noatime by default on FC8 with appropriate
> > release note warnings and having a couple of betas to find out what
> > other than mutt goes boom.
>
> btw., Mutt does not go boom, i use it myself. It works just fine and
> notices new mails even on a noatime,nodiratime filesystem.

It still fails miserably for me.

If I hit 'C' and '?' I get a list of my mail folders, with some of them marked 'N' if they have new mail. Without atime, those N's never show up and every mbox looks like it has no new mail.

	Dave

--
http://www.codemonkey.org.uk
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 August 2007 11:02:33 -0700, Arjan van de Ven wrote:
>
> but does it work with relatime ?

Like a greased penguin. I had to reboot with my ugly patch posted earlier in the patch to actually test it, though. Relatime suffers from a distribution problem, nothing else.

Guess I should throw in a kernel compile test as well, just to get a feel for the performance.

Jörn

--
Homo Sapiens is a goal, not a description.
-- unknown
Re: [PATCH 00/23] per device dirty throttling -v8
* Jeff Garzik <[EMAIL PROTECTED]> wrote:

> > yeah, i didnt mean to say that it is _always_ a big issue, but "only
> > a small number of files are read" is a very, very small minority of
> > even the database server world.
>
> OTOH, consider a popular Linux task, web serving. atime results in a
> lot of unnecessary disk traffic.

it's a big, noticeable effect on 99% of the Linux boxes.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
>
> In addition, big server boxes are usually not reading a huge *number*
> of files per second. The place where you see this as a problem is (a)
> compilation, thanks to huge /usr/include hierarchies (and here things
> have gotten worse over time as include files have gotten much more
> complex than in the early Unix days), and (b) silly desktop apps that
> want to scan huge numbers of XML files or who want to read every
> single image file on the desktop or in an open file browser window to
> show c00l icons. Oh, and I guess I should include Maildir setups.
>
> If you are always reading from the same small set of files (i.e., a
> database workload), then those inodes only get updated every 5 seconds
> (the traditional/default metadata update sync time, as well as the
> default ext3 journal update time), it's no big deal. Or if you are
> running a mail server, most of the time the mail queue files are
> getting updated anyway as you process them, and usually the mail is
> delivered before 5 seconds is up anyway.

it's just one of those things that get compounded with journaling filesystems though. a single async write that happens "sometime in the future" is one thing... having a full transaction (which acts as barrier and synchronisation point) is something totally worse.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Re: [PATCH 00/23] per device dirty throttling -v8
* Alan Cox <[EMAIL PROTECTED]> wrote:

> And you honestly think that putting it in Kconfig as well as allowing
> users to screw up horribly and creating incompatible defaults you

So far you've not offered one realistic scenario of "screw up horribly". People have been using noatime for a long time and there are no horror stories about that. _Which_ OSS HSM software relies on atime?

> can't test for in a user space app where it matters is going to
> *change* this.

The patch i posted today adds /proc/sys/kernel/mount_with_atime. That can be tested by user-space, if it truly cares about atime.

> Do you really think anyone who said "noatime, compatibility, umm errr"
> is going to say "noatime, compatibility, but hey its in Kconfig lets
> do it". You argument doesn't hold up to minimal rational
> consideration. Posting to the distribution devel list with: "Its a 50%
> performance win, we need to fix these corner cases, here's a tmpwatch
> patch" is *exactly* what is needed to change it, and Kconfig options
> are irrelevant to that.

i did exactly that 6 months ago, check your email folders. I went by the "process". But it doesnt really matter anymore, Ubuntu has done the step and Fedora will be forced to do it too. But it's sad that it took us 10 years.

I'd like to remind you again:

|| ...For me, I would say 50% is not enough to describe the _visible_
|| benefits... Not talking any specific number but past 10sec-1min+
|| lagging in X is history, it's gone and I really don't miss it that
|| much... :-) Cannot reproduce even a second long delay anymore in
|| window focusing under considerable load as it's basically
|| instantaneous (I can see that it's loaded but doesn't affect the
|| feeling of responsiveness I'm now getting), even on some loads that I
|| couldn't previously even dream of... [...]

we really have to ask ourselves whether the "process" is correct if advantages to the user of this order of magnitude can be brushed aside with simple "this breaks binary-only HSM" and "it's not standards compliant" arguments.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 2007-08-05 at 16:17 +0200, Jörn Engel wrote:
> On Sun, 5 August 2007 10:53:54 +0200, Willy Tarreau wrote:
> > On Sun, Aug 05, 2007 at 09:21:41AM +0200, Ingo Molnar wrote:
> > >
> > > btw., Mutt does not go boom, i use it myself. It works just fine and
> > > notices new mails even on a noatime,nodiratime filesystem.
> >
> > IIRC, atime is used by mailers and by the shell to detect that new
> > mail has arrived and report it only once if there are several instances
> > watching the same mbox.
> >
> > I too use mutt and noatime,nodiratime everywhere (same 10 year-old
> > thinko), and the only side effect is that when I have a new mail,
> > it is reported in all of my xterms until I read it, clearly something
> > I can live with (and sometimes it's even desirable).
> >
> > In fact, mutt is pretty good at this. It updates atime and ctime itself
> > as soon as it opens the mbox, so the shell is happy and only reports
> > "you have mail" afterwards.
>
> For me mutt fails to recognize new mail. And the difference might be
> this:

but does it work with relatime?
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, 2007-08-04 at 17:48 -0400, Theodore Tso wrote:
> On Sat, Aug 04, 2007 at 01:13:19PM -0700, Arjan van de Ven wrote:
> > there is another trick possible (more involved though, Al will have to
> > jump in on that one I suspect): Have 2 types of "dirty inode" states;
> > one is the current dirty state (meaning the full range of ext3
> > transactions etc) and "lighter" state of "atime-dirty"; which will not
> > do the background syncs or journal transactions (so if your machine
> > crashes, you lose the atime update) but it does keep atime for most
> > normal cases and keeps it standard compliant "except after a crash".
>
> That would make us standards compliant (POSIX explicitly says that
> what happens after an unclean shutdown is Unspecified) and it would
> make things a heck of a lot faster. However, there is a potential
> problem which is that it will keep a large number of inodes pinned in
> memory, which is its own problem. So there would have to be some way
> to force the atime updates to be merged when under memory pressure,
> and perhaps on some much longer background interval (i.e., every
> hour or so).

on the journalling side this would be one transaction (not 5 million)
and... since inodes are grouped on disk, you can even get some better
coalescing this way...

Wonder if we could do inode-grouping smartly; eg if we HAVE to write
inode X, also write out the atime-dirty inodes in range X-Y to X+Y
(where Y is some tunable) in the same IO..

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
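The inode-grouping idea in the message above can be modelled in a few
lines. This is only a toy sketch in Python, not kernel code: the names
inodes_to_write() and write_inode(), the use of a plain set of inode
numbers for the "atime-dirty" state, and the `window` parameter (the
tunable Y) are all assumptions made for illustration.

```python
def inodes_to_write(forced_ino, atime_dirty, window):
    """Return the inode numbers to write in one I/O, in ascending order.

    forced_ino  -- inode X that HAS to be written now
    atime_dirty -- set of inode numbers that are only atime-dirty
    window      -- the tunable Y; neighbours in [X-Y, X+Y] piggyback
    """
    lo, hi = forced_ino - window, forced_ino + window
    batch = {forced_ino}
    batch.update(ino for ino in atime_dirty if lo <= ino <= hi)
    return sorted(batch)


def write_inode(forced_ino, atime_dirty, window):
    """Pick the batch and mark the piggybacked inodes clean again."""
    batch = inodes_to_write(forced_ino, atime_dirty, window)
    atime_dirty.difference_update(batch)
    return batch
```

With atime-dirty inodes {90, 98, 101, 130} and Y = 4, a forced write of
inode 100 would also carry 98 and 101, while 90 and 130 stay pending
until memory pressure or a later neighbouring write picks them up.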
Re: [PATCH 00/23] per device dirty throttling -v8
Ingo Molnar wrote:
> * Theodore Tso <[EMAIL PROTECTED]> wrote:
> > If you are always reading from the same small set of files (i.e., a
> > database workload), then those inodes only get updated every 5 seconds
> > (the traditional/default metadata update sync time, as well as the
> > default ext3 journal update time), it's no big deal. Or if you are
> > running a mail server, most of the time the mail queue files are
> > getting updated anyway as you process them, and usually the mail is
> > delivered before 5 seconds is up anyway.
> >
> > So earlier, when Ingo characterized it as, "whenever you read from a
> > file, even one in memory cache do a write!", it's probably a bit
> > unfair. Traditional Unix systems simply had very different workload
> > characteristics than many modern desktop systems today.
>
> yeah, i didn't mean to say that it is _always_ a big issue, but "only a
> small number of files are read" is a very, very small minority of even
> the database server world.

OTOH, consider a popular Linux task, web serving. atime results in a
lot of unnecessary disk traffic.

	Jeff
Re: [PATCH 00/23] per device dirty throttling -v8
* Theodore Tso <[EMAIL PROTECTED]> wrote:

> If you are always reading from the same small set of files (i.e., a
> database workload), then those inodes only get updated every 5 seconds
> (the traditional/default metadata update sync time, as well as the
> default ext3 journal update time), it's no big deal. Or if you are
> running a mail server, most of the time the mail queue files are
> getting updated anyway as you process them, and usually the mail is
> delivered before 5 seconds is up anyway.
>
> So earlier, when Ingo characterized it as, "whenever you read from a
> file, even one in memory cache do a write!", it's probably a bit
> unfair. Traditional Unix systems simply had very different workload
> characteristics than many modern desktop systems today.

yeah, i didn't mean to say that it is _always_ a big issue, but "only a
small number of files are read" is a very, very small minority of even
the database server world.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
Hi,

Ingo Molnar elte.hu> writes:
> * Linus Torvalds linux-foundation.org> wrote:
> > On Fri, 3 Aug 2007, Peter Zijlstra wrote:
> > >
> > > These patches aim to improve balance_dirty_pages() and directly address
> > > three issues:
> > > 1) inter device starvation
> > > 2) stacked device deadlocks
> > > 3) inter process starvation
> >
> > Ok, the patches certainly look pretty enough, and you fixed the only
> > thing I complained about last time (naming), so as far as I'm
> > concerned it's now just a matter of whether it *works* or not. I guess
> > being in -mm will help somewhat, but it would be good to have people
> > with several disks etc actively test this out.
>
> There are positive reports in the never-ending "my system crawls like an
> XT when copying large files" bugzilla entry:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=7372
>
>[ snipped part of the bug report ]
>
> so the whole problem area seems to be a "perfect storm" created by a
> combination of TCQ, IO scheduling and VM dirty handling weaknesses. Per
> device dirty throttling is a good step forward and it makes a very
> visible positive difference.

Foreword: I'm the OP of bug #7372.

I just want to say/add that:
1) I've been running the per-bdi patch for about 30 days on a master
   mysql server under somewhat mild load without any adverse effect I
   could notice.
2) I _still_ don't get the "performances" of 2.6.17, but since that's
   the best combination I could get, I think there is IMHO progress in
   the right direction (to be compared to no progress since 2.6.18,
   that's better :-)).

To be honest, a vanilla 2.6.17 not tuned at all (ie vfs_cache_pressure
and other knobs in /proc/sys/vm like swappiness and dirty_*) is still
better than any later kernel I tested. Thus I still think 2.6.18 added
a big regression (which unfortunately I couldn't find).

Read the full bug report for any background information if needed.

Unfortunately it isn't practical to git-bisect my issue as the server is
a production server that can't be rebooted/stopped whenever I want (and
since I found workarounds of the issue...).

Thanks for showing interest in this issue.
Please CC: me on any answers as I'm not subscribed to the list.
--
Brice Figureau
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007, Ingo Molnar wrote:
>
> you mean tmpwatch? The trivial change below fixes this. And with that
> we've come to the end of an extremely short list of atime dependencies.

You wouldn't even need these kinds of games.

What we could do is to make "relatime" updates a bit smarter. A bit
smarter would be:

 - update atime if the old atime is <= mtime/ctime

   Logic: things like mailers can care about whether some new state has
   been read or not. This is the current relatime.

 - update atime if the old atime is more than X seconds in the past
   (defaulting to one day or something)

   Logic: things like tmpwatch and backup software may want to remove
   stuff that hasn't been touched in a long time, but they sure don't
   care about "exact" atime.

Now, you could also make the rule be that "X" depends on mtime/ctime, ie
if a file has been "recently" created or modified, we keep more exact
track of it and use one hour instead of one day, but if it's some old
file that hasn't been modified in the last six months, we change X to a
week. IOW, the "exactness" of atime is relative to how old the inode
modifications are.

We could obviously do with an additional rule:

 - update atime if the inode is dirty anyway.

   Logic: there's no downside.

which just says that we'll make it exact if there is no reason not to.

		Linus
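The three rules proposed in the message above can be written down as a
single predicate. What follows is a minimal sketch in Python rather than
kernel C; should_update_atime() and its parameters are hypothetical
names, timestamps are plain seconds, and the one-day default for X is
the "one day or something" from the message, not a measured value.

```python
DAY = 24 * 60 * 60  # default for X, in seconds


def should_update_atime(atime, mtime, ctime, now, inode_dirty=False, max_age=DAY):
    """Return True if a read at time `now` should store a new atime."""
    # Rule 1 (the existing relatime rule): the file was modified or
    # changed at or after the last recorded access.
    if atime <= mtime or atime <= ctime:
        return True
    # Rule 2: the recorded atime is more than X seconds in the past.
    if now - atime > max_age:
        return True
    # Rule 3: the inode is going to be written out anyway.
    if inode_dirty:
        return True
    return False
```

A file read a few minutes ago and not modified since is left alone,
which is the common case that saves the write; the same file read more
than a day later gets one fresh atime and is then left alone again.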
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 01:49:26AM +0100, Alan Cox wrote:
> HSM is the usual one, and to a large extent probably why Unix originally
> had atime. Basically migrating less used files away so as to keep the
> system disks tidy.
>
> It's not something usually found on desktop boxes so it doesn't in any way
> argue against the distribution using noatime or relative atime, but on
> big server boxes it matters

In addition, big server boxes are usually not reading a huge *number*
of files per second. The place where you see this as a problem is (a)
compilation, thanks to huge /usr/include hierarchies (and here things
have gotten worse over time as include files have gotten much more
complex than in the early Unix days), and (b) silly desktop apps that
want to scan huge numbers of XML files or who want to read every
single image file on the desktop or in an open file browser window to
show c00l icons. Oh, and I guess I should include Maildir setups.

If you are always reading from the same small set of files (i.e., a
database workload), then those inodes only get updated every 5 seconds
(the traditional/default metadata update sync time, as well as the
default ext3 journal update time), it's no big deal. Or if you are
running a mail server, most of the time the mail queue files are
getting updated anyway as you process them, and usually the mail is
delivered before 5 seconds is up anyway.

So earlier, when Ingo characterized it as, "whenever you read from a
file, even one in memory cache do a write!", it's probably a bit
unfair. Traditional Unix systems simply had very different workload
characteristics than many modern desktop systems today.

						- Ted
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> I always thought the right solution would be to just sync atime only
> very very lazily. This means if an inode is only dirty because of an
> atime update put it on a "only write out when there is nothing to do
> or the memory is really needed" list.

As I've mentioned earlier, the memory balancing issues that arise when
we add an "atime dirty" bit scare me a little. It can be addressed,
obviously, but at the cost of more code complexity.

An alternative is to simply have a tunable parameter, via either a
mount option or stashed in the superblock which controls atime's
granularity guarantee. That is, only update the atime if it is older
than some set time that could be configurable as a mount option or in
the superblock. Most of the time, an HSM system simply wants to know
if a file has been used sometime "recently", where recently might be
measured in hours or in days.

This is IMHO slightly better than relatime, since it keeps the spirit
of the atime update, while keeping the performance impact to a very
minimal (and tunable) level.

						- Ted

P.S. Yet another alternative is to specify noatime on an individual
file/directory basis. We've had this capability for a *long* time,
and if a distro were to set noatime for all files in certain
hierarchies (i.e., /usr/include) and certain top-level directories
(since the chattr +A flag is inherited), I think folks would find that
this would reduce atime-update I/O traffic by a huge amount. This also
would be 100% POSIX compliant, since we are extending the filesystem
and setting certain files to use it. But if users want to know when
was the last time they looked at a particular file in their home
directory, they would still have that facility.
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 August 2007 10:53:54 +0200, Willy Tarreau wrote:
> On Sun, Aug 05, 2007 at 09:21:41AM +0200, Ingo Molnar wrote:
> >
> > btw., Mutt does not go boom, i use it myself. It works just fine and
> > notices new mails even on a noatime,nodiratime filesystem.
>
> IIRC, atime is used by mailers and by the shell to detect that new
> mail has arrived and report it only once if there are several instances
> watching the same mbox.
>
> I too use mutt and noatime,nodiratime everywhere (same 10 year-old
> thinko), and the only side effect is that when I have a new mail,
> it is reported in all of my xterms until I read it, clearly something
> I can live with (and sometimes it's even desirable).
>
> In fact, mutt is pretty good at this. It updates atime and ctime itself
> as soon as it opens the mbox, so the shell is happy and only reports
> "you have mail" afterwards.

For me mutt fails to recognize new mail. And the difference might be
this:
http://www.google.de/search?q=enable-buffy-size

Jörn

--
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 02:46:48PM +0200, Ingo Molnar wrote:
>
> * Jakob Oestergaard <[EMAIL PROTECTED]> wrote:
>
> > > If you can show massive amounts of users that will actually be
> > > negatively impacted, please present hard evidence.
> > >
> > > Otherwise all this is useless hot air.
> >
> > Peace Jeff :)
> >
> > In another mail, I gave an example with tmpreaper clearing out unused
> > files; if some of those files are only read and never modified,
> > tmpreaper would start deleting files which were still frequently used.
> >
> > That's a regression, the way I see it. As for 'massive amounts of
> > users', well, tmpreaper exists in most distros, so it's possible it
> > has other users than just me.
>
> you mean tmpwatch?

Same same.

> The trivial change below fixes this. And with that
> we've come to the end of an extremely short list of atime dependencies.

Please read what I wrote, not what you think I wrote.

If I only *read* those files, the mtime will not be updated, only the
atime.

And the files *will* then magically begin to disappear although they are
frequently used.

That will happen with a standard piece of software in a standard
configuration, in a scenario that may or may not be common... I have no
idea how common such a setup is - but I know how much it would suck to
have files magically disappearing because of a kernel upgrade :)

--
 / jakob
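The disagreement in the message above comes down to which timestamp a
/tmp cleaner compares against its age limit. The Python sketch below
models only that comparison; is_stale() is a hypothetical helper, not
how tmpwatch or tmpreaper are actually implemented, and times are plain
seconds.

```python
DAY = 24 * 60 * 60


def is_stale(st_atime, st_mtime, now, max_age_days, use_mtime=False):
    """Return True if a cleaner with this age limit would delete the file."""
    # With use_mtime=True only modifications keep a file alive;
    # otherwise any read (a fresh atime) does.
    stamp = st_mtime if use_mtime else st_atime
    return now - stamp > max_age_days * DAY
```

Take a file written 40 days ago and read yesterday, with a 10 day limit:
judged by atime it survives, judged by mtime it is removed. That is the
read-only case described above, and the same thing happens to an
atime-based cleaner once atime stops being updated.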
Re: [PATCH 00/23] per device dirty throttling -v8
> it's default off of course. A distro can turn it on or off.

...

> i've periodically pushed for a noatime distro kernel for like ... 5-10
> years and last time this argument came up [i brought it up 6 months ago]
> most of the distro kernel developer actually recommended using noatime,
> but it took only 1-2 kernel developers to come out with the
> 'compatibility' and 'compliance' boogeyman to scare the distro userspace
> people away from changing /etc/fstab.

And you honestly think that putting it in Kconfig as well as allowing
users to screw up horribly and creating incompatible defaults you can't
test for in a user space app where it matters is going to *change* this.

Do you really think anyone who said "noatime, compatibility, umm errr"
is going to say "noatime, compatibility, but hey it's in Kconfig let's
do it". Your argument doesn't hold up to minimal rational
consideration. Posting to the distribution devel list with: "It's a 50%
performance win, we need to fix these corner cases, here's a tmpwatch
patch" is *exactly* what is needed to change it, and Kconfig options
are irrelevant to that.

Be serious and do this the proper way, propose it for FC8, go through
the proper due process. Otherwise the FC8 process will simply continue
as "umm err, compatibility" and it'll go nowhere.

You can't really complain about the CK scheduler and Con trying to do
stuff his own way without listening and then do this can you?

Alan
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 02:58:47PM +0200, Ingo Molnar wrote:
>
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > > The only remotely valid compatibility argument would be Mutt - but even
> > > that handles it just fine. (we broke way more software via noexec)
> >
> > And went through a sensible process of resolving it.
> >
> > And its not just mutt. HSM stuff stops working which is a big deal as
> > stuff clogs up. The /tmp/ cleaning tools go wrong as well.
>
> what OSS HSM software stops working and what is its failure mode? /tmp
> cleaning tools will work _just fine_ if we report back max(mtime,ctime)
> as atime - they'll zap more /tmp stuff as they used to. There's no
> guarantee for /tmp contents anyway if tmpwatch is running. Or the patch
> below.

Ingo,

In your example above, maybe it's the opposite, users know they can keep
a file in /tmp one more week by simply cat'ing it.

Changing the kernel in a non-easily reversible way is not kind to the
users. As you pointed out, there's no "atime" option in mount, and quite
frankly, having to reboot an NFS server to change a command line option
which should belong to fstab is quite gross.

And yes, there may be people relying on atime in specific environments.
I remember having used it in the past to automatically archive unused
files. Those people might not be affected by the drop in performance at
all and would rather keep the feature.

I like Alan's idea of a package to automatically add "noatime"
everywhere in fstab, not only because it's easy to use, but because it
will also teach users how they can proceed on their other systems. Also,
if you make the package yourself, it will benefit from the "coolness
factor" many people see in everything that's done by renowned persons
(you know, the type of people who regularly ask you if you use vi/emacs
and what type of window manager, and who then consider it must be good
if you use it). I'll stop ranting here, some of them may be reading ;-)

As a second step, once many people explicitly ask for "noatime" by
default, it will be time to add MS_ATIME to the kernel and to mount, and
set NOATIME as the default with big warnings. This will make everyone
happy.

But expecting the admins to recompile their kernels or to reboot to
change the atime status is not acceptable IMHO. Moreover, they will not
even know they have to do this and they will feel frustrated because the
system will not do what they want.

I've already been bothered a lot by ext3 filesystems with dirindex
enabled. When you boot from an old CD and you cannot mount them, it's
already quite irritating (not to mention that tune2fs from the old CD
does not know about it either so you cannot disable the option). But
it's even worse when you plug a USB hard disk into an old server to
start a backup and notice that you cannot mount the disk without first
upgrading your kernel !

For this reason, I think that the default noatime will be desirable only
after MS_ATIME is supported by both the kernel and the tools.

Cheers,
Willy
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Measurements show that noatime helps 20-30% on regular desktop
> workloads, easily 50% for kernel builds and much more than that (in
> excess of 100%) for file-read-intense workloads. We cannot just walk

And as everybody knows, on servers it is a popular practice to disable
it. According to an interview with the kernel.org admins:

"Beyond that, Peter noted, "very little fancy is going on, and that is
good because fancy is hard to maintain." He explained that the only
fancy thing being done is that all filesystems are mounted noatime
meaning that the system doesn't have to make writes to the filesystem
for files which are simply being read, "that cut the load average in
half."

I bet that some people would consider such a performance hit a bug...
Re: [PATCH 00/23] per device dirty throttling -v8
* Alan Cox <[EMAIL PROTECTED]> wrote:

> > The only remotely valid compatibility argument would be Mutt - but even
> > that handles it just fine. (we broke way more software via noexec)
>
> And went through a sensible process of resolving it.
>
> And its not just mutt. HSM stuff stops working which is a big deal as
> stuff clogs up. The /tmp/ cleaning tools go wrong as well.

what OSS HSM software stops working and what is its failure mode? /tmp
cleaning tools will work _just fine_ if we report back max(mtime,ctime)
as atime - they'll zap more /tmp stuff than they used to. There's no
guarantee for /tmp contents anyway if tmpwatch is running. Or the patch
below.

	Ingo

--- /etc/cron.daily/tmpwatch.orig	2007-08-05 14:44:25.0 +0200
+++ /etc/cron.daily/tmpwatch	2007-08-05 14:45:10.0 +0200
@@ -1,9 +1,9 @@
 #! /bin/sh
-/usr/sbin/tmpwatch -x /tmp/.X11-unix -x /tmp/.XIM-unix -x /tmp/.font-unix \
+/usr/sbin/tmpwatch --mtime -x /tmp/.X11-unix -x /tmp/.XIM-unix -x /tmp/.font-unix \
 	-x /tmp/.ICE-unix -x /tmp/.Test-unix 10d /tmp
-/usr/sbin/tmpwatch 30d /var/tmp
+/usr/sbin/tmpwatch --mtime 30d /var/tmp
 for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
     if [ -d "$d" ]; then
-	/usr/sbin/tmpwatch -f 30d "$d"
+	/usr/sbin/tmpwatch --mtime -f 30d "$d"
     fi
 done
Re: [PATCH 00/23] per device dirty throttling -v8
* Alan Cox <[EMAIL PROTECTED]> wrote:

> > > we can move from atime to noatime by default on FC8 with
> > > appropriate release note warnings and having a couple of betas to
> > > find out what other than mutt goes boom.
> >
> > btw., Mutt does not go boom, i use it myself. It works just fine and
> > notices new mails even on a noatime,nodiratime filesystem.
>
> Configuration dependent, and also mutt and the shell will misreport
> new mail with noatime on the mail spool. The shell should probably use
> inotify of course but that change has to be made.

just to quote from this same email thread:

| I too use mutt and noatime,nodiratime everywhere (same 10 year-old
| thinko), and the only side effect is that when I have a new mail, it
| is reported in all of my xterms until I read it, clearly something I
| can live with (and sometimes it's even desirable).
|
| In fact, mutt is pretty good at this. It updates atime and ctime
| itself as soon as it opens the mbox, so the shell is happy and only
| reports "you have mail" afterwards.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
* Alan Cox <[EMAIL PROTECTED]> wrote:

> > you try to put the blame into distribution makers' shoes but in
> > reality, had the kernel stepped forward with a neat .config option
> > sooner (combined with a neat boot option as well to turn it off),
> > we'd have had noatime systems 10 years ago. A new entry into
> > relnotes and done. It's
>
> Sorry Ingo, having been in the distribution business for over ten
> years I have to disagree. Kernel options that magically totally change
> the kernel API and behaviour are exactly what a vendor does *NOT* want
> to have.

it's default off of course. A distro can turn it on or off.

> > Distro makers did not dare to do this sooner because some kernel
> > developers came forward with these mostly bogus arguments ... The
> > impact of atime is far better understood by the kernel community, so
> > it is the responsibility of _us_ to signal such things towards
> > distributors, not the other way around.
>
> You are trying to put a bogus divide between kernel community and
> developer community. Yet you know perfectly well that a large part of
> the kernel community yourself included work for distribution vendors
> and are actively building the distribution kernels.

i've periodically pushed for a noatime distro kernel for like ... 5-10
years and last time this argument came up [i brought it up 6 months ago]
most of the distro kernel developers actually recommended using noatime,
but it took only 1-2 kernel developers to come out with the
'compatibility' and 'compliance' boogeyman to scare the distro userspace
people away from changing /etc/fstab.

so yes, things like this need a clear message from the kernel folks,
and a kernel option for that is a pretty good way of doing it.

	Ingo
Re: [PATCH 00/23] per device dirty throttling -v8
> > we can move from atime to noatime by default on FC8 with appropriate
> > release note warnings and having a couple of betas to find out what
> > other than mutt goes boom.
>
> btw., Mutt does not go boom, i use it myself. It works just fine and
> notices new mails even on a noatime,nodiratime filesystem.

Configuration dependent, and also mutt and the shell will misreport new
mail with noatime on the mail spool. The shell should probably use
inotify of course but that change has to be made.
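For readers wondering why noatime affects new-mail reporting at all:
the traditional mbox check, as far as I understand it, treats a spool as
holding unread mail when it was modified after it was last accessed. The
Python sketch below models that heuristic only; has_new_mail() is a
hypothetical name, and real shells and mailers may apply additional
checks (file size growth, explicit utime calls, inotify).

```python
def has_new_mail(st_atime, st_mtime, st_size):
    """Classic mbox heuristic: non-empty and modified since last access."""
    return st_size > 0 and st_mtime > st_atime
```

With atime updates enabled, reading the spool moves atime past mtime and
the report stops. With noatime the kernel never advances atime on read,
so the condition stays true until the mail program sets the timestamps
itself, which matches the "reported in all of my xterms until I read it"
behaviour described elsewhere in this thread.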
Re: [PATCH 00/23] per device dirty throttling -v8
* Jakob Oestergaard <[EMAIL PROTECTED]> wrote:

> > If you can show massive amounts of users that will actually be
> > negatively impacted, please present hard evidence.
> >
> > Otherwise all this is useless hot air.
>
> Peace Jeff :)
>
> In another mail, I gave an example with tmpreaper clearing out unused
> files; if some of those files are only read and never modified,
> tmpreaper would start deleting files which were still frequently used.
>
> That's a regression, the way I see it. As for 'massive amounts of
> users', well, tmpreaper exists in most distros, so it's possible it
> has other users than just me.

you mean tmpwatch? The trivial change below fixes this. And with that
we've come to the end of an extremely short list of atime dependencies.

	Ingo

--- /etc/cron.daily/tmpwatch.orig
+++ /etc/cron.daily/tmpwatch
@@ -1,9 +1,9 @@
 #! /bin/sh
-/usr/sbin/tmpwatch -x /tmp/.X11-unix -x /tmp/.XIM-unix -x /tmp/.font-unix \
+/usr/sbin/tmpwatch --mtime -x /tmp/.X11-unix -x /tmp/.XIM-unix -x /tmp/.font-unix \
 	-x /tmp/.ICE-unix -x /tmp/.Test-unix 10d /tmp
-/usr/sbin/tmpwatch 30d /var/tmp
+/usr/sbin/tmpwatch --mtime 30d /var/tmp
 for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
     if [ -d "$d" ]; then
-	/usr/sbin/tmpwatch -f 30d "$d"
+	/usr/sbin/tmpwatch --mtime -f 30d "$d"
     fi
 done
Re: [PATCH 00/23] per device dirty throttling -v8
> The only remotely valid compatibility argument would be Mutt - but even
> that handles it just fine. (we broke way more software via noexec)

And went through a sensible process of resolving it.

And it's not just mutt. HSM stuff stops working which is a big deal as
stuff clogs up. The /tmp/ cleaning tools go wrong as well.

These are big deals because you seem intent on using a large hammer to
force a change that should be done properly by other means. The /tmp
cleaning for example can probably be done other ways in future but the
changes should be in place first.
Re: [PATCH 00/23] per device dirty throttling -v8
> you try to put the blame into distribution makers' shoes but in reality, > had the kernel stepped forward with a neat .config option sooner > (combined with a neat boot option as well to turn it off), we'd have had > noatime systems 10 years ago. A new entry into relnotes and done. It's Sorry Ingo, having been in the distribution business for over ten years I have to disagree. Kernel options that magically totally change the kernel API and behaviour are exactly what a vendor does *NOT* want to have. > Distro makers did not dare to do this sooner because some kernel > developers came forward with these mostly bogus arguments ... The impact > of atime is far better understood by the kernel community, so it is the > responsibility of _us_ to signal such things towards distributors, not > the other way around. You are trying to put a bogus divide between kernel community and developer community. Yet you know perfectly well that a large part of the kernel community yourself included work for distribution vendors and are actively building the distribution kernels. You are perfectly positioned to provide timing examples to the Fedora development team and make the case for FC8 beta going out that way. You are perfectly able to propose, build and submit a FC7 extras package of tuning which people can try in the meantime, but you haven't do so. Other people in this discussion can do likewise for Debian, SuSE etc. Your argument appears to be "I can't be bothered to use the due processes of the distribution but I can do it quickly with an ugly kernel hack". That is not the right approach. Propose it with your presented numbers to fedora-devel and I'll be happy to back up such a proposal for the next FC as will many other kernel folk I'm sure. Heck, go write a piece for LWN with the benchmark numbers and how to change your atime options. You'll make Jon happy and lots of folks read it and will give feedback on improvements as a result. 
Alan
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 06:42:30AM -0400, Jeff Garzik wrote:
...
> If you can show massive amounts of users that will actually be
> negatively impacted, please present hard evidence.
>
> Otherwise all this is useless hot air.

Peace Jeff :)

In another mail, I gave an example with tmpreaper clearing out unused files; if some of those files are only read and never modified, tmpreaper would start deleting files which were still frequently used.

That's a regression, the way I see it. As for 'massive amounts of users', well, tmpreaper exists in most distros, so it's possible it has other users than just me.

--
 / jakob
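The concern about tmpreaper can be made concrete with a small sketch. This is not tmpreaper's actual code; `stale_files` and its parameters are hypothetical names, and the only assumption is that the cleaner picks victims by comparing `st_atime` against an idle threshold. On a filesystem where reads no longer refresh atime, a file that is read every day but never written would eventually cross that threshold.

```python
import os
import time


def stale_files(root, max_idle_seconds, now=None):
    """Return files under root whose last recorded access is too old.

    Hypothetical helper: models an atime-driven cleaner, nothing more.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            # If reads stop updating st_atime (e.g. under noatime), this
            # test can no longer tell "unused" from "read-only but in use".
            if now - st.st_atime > max_idle_seconds:
                stale.append(path)
    return sorted(stale)
```

A cleaner built this way would need to fall back to mtime, or the affected directories would need atime updates left enabled.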
Re: [PATCH 00/23] per device dirty throttling -v8
On Sat, Aug 04, 2007 at 02:08:40PM -0400, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >The "relatime" thing that David mentioned might well be very useful, but
> >it's probably even less used than "noatime" is. And sadly, I don't really
> >see that changing (unless we were to actually change the defaults inside
> >the kernel).
>
>
> I actually vote for that. IMO, distros should turn -on- atime updates
> when they know it's needed.

Oh dear.

Why not just make ext3 fsync() a no-op while you're at it?

Distros can turn it back on if it's needed...

Of course I'm not serious, but like atime, fsync() is something one expects to work if it's there. Disabling atime updates or making fsync() a no-op will both result in silent failure, which I am sure we can agree is disastrous.

Why on earth would you cripple the kernel defaults for ext3 (which is a fine FS for boot/root filesystems), when the *fundamental* problem you really want to solve lies much deeper in the implementation of the filesystem? Noatime doesn't solve the problem, it just makes it "less horrible".

If you really need different filesystem performance characteristics, you can switch to another filesystem. There's plenty to choose from.

--
 / jakob
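For readers unfamiliar with the "relatime" option quoted above, its decision rule can be modelled as a one-line predicate. This is a sketch of the rule as I understand it for kernels of this period (refresh atime only when the stored atime is not newer than mtime or ctime); the function name and the plain numeric timestamps are illustrative assumptions, and later kernels add further conditions that are not modelled here.

```python
def relatime_should_update(atime, mtime, ctime):
    """Return True if a read should refresh atime under the modelled rule.

    All three arguments are timestamps in the same unit (e.g. seconds).
    Illustrative model only, not kernel code.
    """
    # atime is rewritten only if the file was modified or changed at or
    # after the last recorded access; repeated reads of an unmodified
    # file therefore stop dirtying the inode after the first one.
    return mtime >= atime or ctime >= atime
```

Under this model, tools that only ask "was the file read since it was last written?" keep working, while tools that need the exact time of the last read do not.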
Re: [PATCH 00/23] per device dirty throttling -v8
Jakob Oestergaard wrote:
> Oh dear.
>
> Why not just make ext3 fsync() a no-op while you're at it?
>
> Distros can turn it back on if it's needed...
>
> Of course I'm not serious, but like atime, fsync() is something one

No, they are nothing alike, and you are just making yourself look silly if you compare them. fsync has to do with fundamental guarantees about data.

> expects to work if it's there. Disabling atime updates or making fsync() a
> no-op will both result in silent failure, which I am sure we can agree is
> disastrous.

Climb down from hyperbole mountain.

If you can show massive amounts of users that will actually be negatively impacted, please present hard evidence.

Otherwise all this is useless hot air.

> Why on earth would you cripple the kernel defaults for ext3 (which is a
> fine FS for boot/root filesystems), when the *fundamental* problem you
> really want to solve lies much deeper in the implementation of the
> filesystem? Noatime doesn't solve the problem, it just makes it "less
> horrible".

atime updates -are- a fundamental problem, one you cannot solve by tweaking filesystem implementations. No matter how much you try to hide or batch, atime dirties an inode each time on every read... for a feature a tiny minority of programs care about, much less depend on.

Remember several filesystems lock atime to mtime, because they do not have a concept of atime, and programs continue to work just fine. We already have field proof of how little atime matters in reality.

	Jeff
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 09:28:05AM +0200, Ingo Molnar wrote:
>
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > > Can you give examples of backup solutions that rely on atime being
> > > updated? I can understand backup tools using mtime/ctime for
> > > incremental backups (like tar + Amanda, etc), but I'm having trouble
> > > figuring out why someone would want to use atime for that.
> >
> > HSM is the usual one, and to a large extent probably why Unix
> > originally had atime. Basically migrating less used files away so as
> > to keep the system disks tidy.
>
> atime is used as a _hint_, at most and HSM sure works just fine on an
> atime-incapable filesystem too. So it's the same deal as "add user_xattr
> mount option to the filesystem to make Beagle index faster". It's now:
> "if you use HSM storage add the atime mount option to make it slightly
> more intelligent. Expect huge IO slowdowns though."
>
> The only remotely valid compatibility argument would be Mutt - but even
> that handles it just fine. (we broke way more software via noexec)

I find it pretty normal to use tmpreaper to clear out unused files from certain types of semi-temporary directory structures. Those files are often only ever read. They'd start randomly disappearing while in use.

But then again, maybe I'm the only guy on the planet who uses tmpreaper.

--
 / jakob
Re: [PATCH 00/23] per device dirty throttling -v8
Ingo Molnar wrote:
> Distro makers did not dare to do this sooner because some kernel
> developers came forward with these mostly bogus arguments ... The impact
> of atime is far better understood by the kernel community, so it is the
> responsibility of _us_ to signal such things towards distributors, not
> the other way around.

Pretty much. AFAICS there was never a "policy decision" on the part of distro makers to begin with. The kernel had its default -- atime -- and the distros ran with that.

	Jeff
Re: [PATCH 00/23] per device dirty throttling -v8
On Sun, Aug 05, 2007 at 09:21:41AM +0200, Ingo Molnar wrote:
>
> * Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > With a Red Hat on if we can move from /dev/hda to /dev/sda in FC7 then
> > we can move from atime to noatime by default on FC8 with appropriate
> > release note warnings and having a couple of betas to find out what
> > other than mutt goes boom.
>
> btw., Mutt does not go boom, i use it myself. It works just fine and
> notices new mails even on a noatime,nodiratime filesystem.

IIRC, atime is used by mailers and by the shell to detect that new mail has arrived and report it only once if there are several instances watching the same mbox. I too use mutt and noatime,nodiratime everywhere (same 10 year-old thinko), and the only side effect is that when I have a new mail, it is reported in all of my xterms until I read it, clearly something I can live with (and sometimes it's even desirable).

In fact, mutt is pretty good at this. It updates atime and ctime itself as soon as it opens the mbox, so the shell is happy and only reports "you have mail" afterwards.

Well, I hope we're not getting too much off-topic here...

Willy
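The mbox behaviour described above can be sketched as follows. The assumption here, which is mine and not spelled out in the message, is the common convention that a mailbox "has new mail" when its mtime is newer than its atime; `has_new_mail` and `mark_read` are hypothetical names. The point of the sketch is that a reader which sets atime explicitly keeps the heuristic working even when the kernel does not update atime on reads.

```python
import os
import time


def has_new_mail(path):
    """Heuristic: non-empty mailbox written after it was last accessed."""
    st = os.stat(path)
    return st.st_size > 0 and st.st_mtime_ns > st.st_atime_ns


def mark_read(path):
    """Set atime explicitly, leaving mtime untouched.

    An explicit utime call does not depend on the filesystem updating
    atime on read, so it also works on a noatime mount.
    """
    st = os.stat(path)
    new_atime = max(time.time_ns(), st.st_mtime_ns)
    os.utime(path, ns=(new_atime, st.st_mtime_ns))
```

A watcher that never calls something like `mark_read` would keep reporting the same message on a noatime filesystem, which matches the "reported in all of my xterms" side effect.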
Re: [PATCH 00/23] per device dirty throttling -v8
* Andrew Morton:

>> XFS overwrites that data with zeros upon reboot, which tends to
>> irritate users when it happens.
>
> yup.
>
>> From this point of view, data=ordered doesn't seem too bad.
>
> If your computer is used by multiple users who don't trust each other,
> sure. That covers, what? About 2% of machines?

I wasn't concerned so much with security, but with user experience. For instance, some editors don't perform fsync-then-rename, but simply truncate the file when saving (because they want to preserve hard links). With XFS, this tends to cause null bytes on crashes. Since ext3 has got a much larger install base, this would result in lots of bug reports, I fear.

Without zeroing, the truncating editor might garble the file in a more obvious way, but you've got the security issue (and I agree that this is more of a PR issue).
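The fsync-then-rename save pattern mentioned above looks roughly like this. It is a minimal sketch, not what any particular editor does; `save_atomically` is a hypothetical name, and the sketch deliberately omits preserving file permissions and syncing the containing directory.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write data (bytes) to a temporary file, fsync it, then rename it
    over path, so a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data reaches disk before the rename
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The rename gives the path a new inode, so any other hard links to the old file keep the old contents. That is the trade-off behind editors that truncate and rewrite in place instead.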
Re: [PATCH 00/23] per device dirty throttling -v8
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Sun, 5 Aug 2007 09:21:41 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > even on a noatime,nodiratime filesystem
>
> noatime is a superset of nodiratime, btw.

heh, indeed. I've been using this trick for 10 years on my desktops so it's an ancient thinko :)

	Ingo