Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Tue, 15 Jan 2008, Daniel Phillips wrote:

> Along with this effort, could you let me know if the world actually cares about online fsck? Now we know how to do it I think, but is it worth the effort.

Most users seem to care deeply that things "just work". Here is why ntfs-3g also took the online fsck path some time ago.

NTFS support had a very bad reputation on Linux, so the new code was written with rigid sanity checks and extensive automatic regression testing. One of the consequences is that we're detecting way too many inconsistencies left behind by Windows and other NTFS drivers, hardware faults, and device drivers.

To better utilize the non-existent developer resources, it was obvious to suggest the already existing Windows fsck (chkdsk) in such cases. Simple and safe, as most people like us who never used Windows would think. However, years of experience show that, depending on several factors, chkdsk may start or not, may report the real problems or not, but on the other hand it may report bogus issues, it may run long or just forever, and it may even remove completely valid files. So one could perhaps even consider a suggestion to run chkdsk a call to play Russian roulette.

Thankfully NTFS has some level of metadata redundancy, with signatures and weak checksums, which makes it possible to correct some common and obvious corruptions on the fly.

Similarly to ZFS, Windows Server 2008 also has self-healing NTFS:

http://technet2.microsoft.com/windowsserver2008/en/library/6f883d0d-3668-4e15-b7ad-4df0f6e6805d1033.mspx?mfr=true

Szaka

--
NTFS-3G: http://ntfs-3g.org
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Tue 2008-01-15 20:36:16, Chris Mason wrote:
> On Tue, 15 Jan 2008 20:24:27 -0500 Daniel Phillips [EMAIL PROTECTED] wrote:
> > On Jan 15, 2008 7:15 PM, Alan Cox [EMAIL PROTECTED] wrote:
> > > > Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.
> > >
> > > AFAIK no drive saves the cache. The worst case cache flush for drives is several seconds with no retries and a couple of minutes if something really bad happens.
> > >
> > > This is why the kernel has some knowledge of barriers and uses them to issue flushes when needed.
> >
> > Indeed, you are right, which is supported by actual measurements:
> >
> > http://sr5tech.com/write_back_cache_experiments.htm
> >
> > Sorry for implying that anybody has engineered a drive that can do such a nice thing with writeback cache.
> >
> > The disk motor as a generator tale may not be purely folklore. When an IDE drive is not in writeback mode, something special needs to be done to ensure the last write to media is not a scribble.
> >
> > A small UPS can make writeback mode actually reliable, provided the system is smart enough to take the drives out of writeback mode when the line power is off.
>
> We've had mount -o barrier=1 for ext3 for a while now, it makes writeback caching safe. XFS has this on by default, as does reiserfs.

Maybe ext3 should do barriers by default? Having ext3 in "let's corrupt data" mode by default... seems like a bad idea.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 17, 2008 7:29 AM, Szabolcs Szakacsits [EMAIL PROTECTED] wrote:
> Similarly to ZFS, Windows Server 2008 also has self-healing NTFS:

I guess that is enough votes to justify going ahead and trying an implementation of the reverse mapping ideas I posted. But of course more votes for this is better. If online incremental fsck is something people want, then please speak up here and that will very definitely help make it happen.

On the walk-before-run principle, it would initially just be filesystem checking, not repair. But even this would help, by setting per-group checked flags that offline fsck could use to do a much quicker repair pass. And it will let you know when a volume needs to be taken offline without having to build in planned downtime just in case, which already eats a bunch of nines.

Regards,

Daniel
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
Hi!

> Along with this effort, could you let me know if the world actually cares about online fsck?

I'm not the world's spokesperson (yet ;-).

> Now we know how to do it I think, but is it worth the effort.

ext3's "let's fsck on every 20 mounts" is a good idea, but it can be annoying when developing. Having the option to fsck while the filesystem is online takes that annoyance away.

So yes, it would be very useful for me...

For long-running servers, this may be less of a problem... but OTOH their filesystems are not checked at all as long as the servers are online... so online fsck is actually important there, too, but for other reasons.

So yes, it is very useful for the world.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Wed, Jan 16, 2008 at 08:43:25AM +1100, David Chinner wrote:
> ext3 is not the only filesystem that will have trouble due to volatile write caches. We see problems often enough with XFS due to volatile write caches that it's in our FAQ:

In fact it will hit every filesystem. A write-back cache that can't be forced to write back by the filesystem will cause corruption on uncontained power loss, period.
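The failure mode described in this message can be illustrated with a toy model. The following is only a minimal sketch in Python, not how any drive or the kernel block layer works: `ToyDisk`, `journal_commit` and the block names are hypothetical, and the "barrier" is modelled as a plain cache flush issued between the journal data and the commit record.

```python
class ToyDisk:
    """Toy model of a drive with a volatile write-back cache (hypothetical)."""

    def __init__(self):
        self.media = {}  # blocks that are durable
        self.cache = {}  # blocks acknowledged to the host but still volatile

    def write(self, block, data):
        # The drive acknowledges the write as soon as it is cached.
        self.cache[block] = data

    def flush(self):
        """Barrier-style flush: everything cached reaches the media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self, reached_media=()):
        """Lose the cache; only the named blocks were written out in time."""
        for block in reached_media:
            if block in self.cache:
                self.media[block] = self.cache[block]
        self.cache.clear()


def journal_commit(disk, use_barrier):
    disk.write("journal-data", "new metadata")
    if use_barrier:
        disk.flush()  # journal data is durable before the commit record is issued
    disk.write("commit-record", "transaction complete")


def consistent(disk):
    """A commit record without its journal data is the corrupt case."""
    return "commit-record" not in disk.media or "journal-data" in disk.media


def crash_after_commit(use_barrier):
    disk = ToyDisk()
    journal_commit(disk, use_barrier)
    # Worst case: the drive wrote the commit record first, then lost power.
    disk.power_loss(reached_media=["commit-record"])
    return consistent(disk)
```

In this model the crash leaves the journal consistent only when the flush sits between the two writes, which is the ordering guarantee the thread is asking the disk to honour.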
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 16, 2008 3:49 AM, Pavel Machek [EMAIL PROTECTED] wrote:
> ext3's "let's fsck on every 20 mounts" is a good idea, but it can be annoying when developing. Having the option to fsck while the filesystem is online takes that annoyance away.

I'm sure everyone on cc: knows this, but for the record you can change ext3's "fsck on N mounts or every N days" to something that makes sense for your use case. Usually I just turn it off entirely and run fsck by hand when I'm worried:

# tune2fs -c 0 -i 0 /dev/whatever

-VAL
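To see what a filesystem is currently set to before or after such a change, the superblock listing from `tune2fs -l` can be inspected. The following is a rough sketch in Python that parses text in that style; the sample listing is made up for illustration, and the function name and returned keys are hypothetical. It assumes the "Maximum mount count" and "Check interval" fields, and treats values of zero or below as "disabled".

```python
def periodic_fsck_settings(tune2fs_listing):
    """Extract the periodic-check limits from `tune2fs -l` style text."""
    fields = {}
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    max_mounts = int(fields.get("Maximum mount count", "0"))
    # The interval is printed in seconds, optionally followed by a note in parentheses.
    interval = int(fields.get("Check interval", "0").split()[0])
    return {
        "max_mount_count": max_mounts,
        "check_interval_seconds": interval,
        "periodic_check_enabled": max_mounts > 0 or interval > 0,
    }


# Made-up sample listings, not output captured from a real device.
SAMPLE_DEFAULT = """\
Mount count:              7
Maximum mount count:      20
Check interval:           15552000 (6 months)
"""

SAMPLE_DISABLED = """\
Mount count:              7
Maximum mount count:      -1
Check interval:           0 (<none>)
"""
```

Calling `periodic_fsck_settings(SAMPLE_DISABLED)` reports `periodic_check_enabled` as false, which is the state the command above is meant to produce.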
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
Alan Cox wrote:
> > Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.
>
> AFAIK no drive saves the cache. The worst case cache flush for drives is several seconds with no retries and a couple of minutes if something really bad happens.
>
> This is why the kernel has some knowledge of barriers and uses them to issue flushes when needed.

Problem is, ext3 has barriers off by default so it's not saving most people.

And then if you turn them on, but have your filesystem on an lvm device, lvm strips them out again.

-Eric
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 15, 2008 22:05 -0500, Rik van Riel wrote:
> With a filesystem that is compartmentalized and checksums metadata, I believe that an online fsck is absolutely worth having.
>
> Instead of the filesystem resorting to mounting the whole volume read-only on certain errors, part of the filesystem can be offlined while an fsck runs. This could even be done automatically in many situations.

In ext4 we store per-group state flags in each group, and the group descriptor is checksummed (to detect spurious flags), so it should be relatively straightforward to store an error flag in a single group and have it become read-only.

As a starting point, it would be worthwhile to check instances of ext4_error() to see how many of them can be targeted at a specific group. I'd guess most of them could be (corrupt inodes, directory and indirect blocks, incorrect bitmaps).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
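The per-group idea in this message can be sketched as follows. This is a minimal Python illustration under loud assumptions, not ext4 code: `GroupDesc`, `GROUP_ERROR`, `report_error` and `writable` are hypothetical names, the descriptor holds only a group number and flags, and `zlib.crc32` stands in for whatever checksum the real group descriptor uses.

```python
import zlib
from dataclasses import dataclass

GROUP_ERROR = 0x1  # hypothetical flag bit, not an actual ext4 constant


@dataclass
class GroupDesc:
    group_no: int
    flags: int = 0
    checksum: int = 0

    def compute_checksum(self):
        payload = self.group_no.to_bytes(4, "little") + self.flags.to_bytes(2, "little")
        return zlib.crc32(payload)

    def seal(self):
        """Store a checksum covering the current flags."""
        self.checksum = self.compute_checksum()

    def checksum_ok(self):
        return self.checksum == self.compute_checksum()


def make_groups(count):
    groups = [GroupDesc(n) for n in range(count)]
    for g in groups:
        g.seal()
    return groups


def report_error(groups, group_no):
    """Confine an error to one group instead of the whole volume."""
    g = groups[group_no]
    g.flags |= GROUP_ERROR
    g.seal()


def writable(groups, group_no):
    g = groups[group_no]
    # A bad checksum means the flags cannot be trusted, so that group is
    # treated as read-only as well.
    return g.checksum_ok() and not (g.flags & GROUP_ERROR)
```

After `report_error(groups, 2)`, only group 2 refuses writes while the other groups stay writable, which is the behaviour an error path targeted at a specific group would need.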
[Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
Hi!

> > What are ext3 expectations of disk (is there doc somewhere)? For example... if disk does not lie, but powerfail during write damages the sector -- is ext3 still going to work properly?
>
> Nope. However the few disks that did this rapidly got firmware updates because there are other OS's that can't cope.
>
> > If disk does not lie, but powerfail during write may cause random numbers to be returned on read -- can fsck handle that?
>
> most of the time. and fsck knows about writing sectors to remove read errors in metadata blocks.
>
> > What about disk that kills 5 sectors around sector being written during powerfail; can ext3 survive that?
>
> generally. Note btw that for added fun there is nothing that guarantees the blocks around a block on the media are sequentially numbered. They usually are but you never know.

Ok, should something like this be added to the documentation?

It would be cool to be able to include a few examples (modern SATA disks support barriers so are safe, any IDE from 1989 is unsafe), but I do not know enough about hw...

Signed-off-by: Pavel Machek [EMAIL PROTECTED]

diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index b45f3c1..adfcc9d 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -183,6 +183,18 @@ mke2fs:create a ext3 partition with th
 debugfs: ext2 and ext3 file system debugger.
 ext2online:online (mounted) ext2 and ext3 filesystem resizer

+Requirements
+
+
+Ext3 needs disk that does not do write-back caching or disk that
+supports barriers and Linux configuration that can use them.
+
+* if disk damages the sector being written during powerfail, ext3
+  can't cope with that. Fortunately, such disks got firmware updates
+  to fix this long time ago.
+
+* if disk writes random data during powerfail, ext3 should survive
+  that most of the time.

 References
 ==

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Tue, Jan 15, 2008 at 09:16:53PM +0100, Pavel Machek wrote:
> Hi!
>
> > > What are ext3 expectations of disk (is there doc somewhere)? For example... if disk does not lie, but powerfail during write damages the sector -- is ext3 still going to work properly?
> >
> > Nope. However the few disks that did this rapidly got firmware updates because there are other OS's that can't cope.
> >
> > > If disk does not lie, but powerfail during write may cause random numbers to be returned on read -- can fsck handle that?
> >
> > most of the time. and fsck knows about writing sectors to remove read errors in metadata blocks.
> >
> > > What about disk that kills 5 sectors around sector being written during powerfail; can ext3 survive that?
> >
> > generally. Note btw that for added fun there is nothing that guarantees the blocks around a block on the media are sequentially numbered. They usually are but you never know.
>
> Ok, should something like this be added to the documentation?
>
> It would be cool to be able to include a few examples (modern SATA disks support barriers so are safe, any IDE from 1989 is unsafe), but I do not know enough about hw...

ext3 is not the only filesystem that will have trouble due to volatile write caches. We see problems often enough with XFS due to volatile write caches that it's in our FAQ:

http://oss.sgi.com/projects/xfs/faq.html#wcache

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
Hi!

> > > > What are ext3 expectations of disk (is there doc somewhere)? For example... if disk does not lie, but powerfail during write damages the sector -- is ext3 still going to work properly?
> > >
> > > Nope. However the few disks that did this rapidly got firmware updates because there are other OS's that can't cope.
> > >
> > > > If disk does not lie, but powerfail during write may cause random numbers to be returned on read -- can fsck handle that?
> > >
> > > most of the time. and fsck knows about writing sectors to remove read errors in metadata blocks.
> > >
> > > > What about disk that kills 5 sectors around sector being written during powerfail; can ext3 survive that?
> > >
> > > generally. Note btw that for added fun there is nothing that guarantees the blocks around a block on the media are sequentially numbered. They usually are but you never know.
> >
> > Ok, should something like this be added to the documentation?
> >
> > It would be cool to be able to include a few examples (modern SATA disks support barriers so are safe, any IDE from 1989 is unsafe), but I do not know enough about hw...
>
> ext3 is not the only filesystem that will have trouble due to volatile write caches. We see problems often enough with XFS due to volatile write caches that it's in our FAQ:
>
> http://oss.sgi.com/projects/xfs/faq.html#wcache

Nice FAQ, yep. Perhaps you should move parts of it to Documentation/, and I could then make the ext3 FAQ point to it?

I had write cache enabled on my main computer. Oops. I guess that means we do need better documentation.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 15, 2008 6:07 PM, Pavel Machek [EMAIL PROTECTED] wrote:
> I had write cache enabled on my main computer. Oops. I guess that means we do need better documentation.

Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.

Regards,

Daniel
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
> Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.

AFAIK no drive saves the cache. The worst case cache flush for drives is several seconds with no retries and a couple of minutes if something really bad happens.

This is why the kernel has some knowledge of barriers and uses them to issue flushes when needed.
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 15, 2008 7:15 PM, Alan Cox [EMAIL PROTECTED] wrote:
> > Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.
>
> AFAIK no drive saves the cache. The worst case cache flush for drives is several seconds with no retries and a couple of minutes if something really bad happens.
>
> This is why the kernel has some knowledge of barriers and uses them to issue flushes when needed.

Indeed, you are right, which is supported by actual measurements:

http://sr5tech.com/write_back_cache_experiments.htm

Sorry for implying that anybody has engineered a drive that can do such a nice thing with writeback cache.

The disk motor as a generator tale may not be purely folklore. When an IDE drive is not in writeback mode, something special needs to be done to ensure the last write to media is not a scribble.

A small UPS can make writeback mode actually reliable, provided the system is smart enough to take the drives out of writeback mode when the line power is off.

Regards,

Daniel
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Tue, 15 Jan 2008 20:24:27 -0500 Daniel Phillips [EMAIL PROTECTED] wrote:
> On Jan 15, 2008 7:15 PM, Alan Cox [EMAIL PROTECTED] wrote:
> > > Writeback cache on disk in itself is not bad, it only gets bad if the disk is not engineered to save all its dirty cache on power loss, using the disk motor as a generator or alternatively a small battery. It would be awfully nice to know which brands fail here, if any, because writeback cache is a big performance booster.
> >
> > AFAIK no drive saves the cache. The worst case cache flush for drives is several seconds with no retries and a couple of minutes if something really bad happens.
> >
> > This is why the kernel has some knowledge of barriers and uses them to issue flushes when needed.
>
> Indeed, you are right, which is supported by actual measurements:
>
> http://sr5tech.com/write_back_cache_experiments.htm
>
> Sorry for implying that anybody has engineered a drive that can do such a nice thing with writeback cache.
>
> The disk motor as a generator tale may not be purely folklore. When an IDE drive is not in writeback mode, something special needs to be done to ensure the last write to media is not a scribble.
>
> A small UPS can make writeback mode actually reliable, provided the system is smart enough to take the drives out of writeback mode when the line power is off.

We've had mount -o barrier=1 for ext3 for a while now, it makes writeback caching safe. XFS has this on by default, as does reiserfs.

-chris
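Whether a given ext3 mount was made with an explicit barrier option can be checked by looking at its mount options. The following is a minimal sketch in Python that scans text in the /proc/mounts layout; the sample table is made up, the function name is hypothetical, and it assumes the option appears literally as barrier=1 or barrier=0 in the options column. A result of None only means the option was not listed, so the filesystem default applies.

```python
def barrier_setting(mounts_text, mount_point):
    """Return True/False for an explicit barrier=1/0 option, or None if not listed."""
    for line in mounts_text.splitlines():
        parts = line.split()
        # Expected columns: device, mount point, fs type, options, dump, pass.
        if len(parts) < 4 or parts[1] != mount_point:
            continue
        for option in parts[3].split(","):
            if option.startswith("barrier="):
                return option.split("=", 1)[1] != "0"
        return None
    return None


# Made-up sample in /proc/mounts layout, not captured from a real system.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,barrier=1,data=ordered 0 0
/dev/sda2 /home ext3 rw,data=ordered 0 0
/dev/sdb1 /scratch ext3 rw,barrier=0 0 0
"""
```

On a live system the same function could be fed the contents of /proc/mounts, though an explicit barrier=1 does not by itself show that barriers reach the disk through every layer underneath.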
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
Hi Pavel,

Along with this effort, could you let me know if the world actually cares about online fsck? Now we know how to do it I think, but is it worth the effort.

Regards,

Daniel
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Tue, 15 Jan 2008 20:44:38 -0500 Daniel Phillips [EMAIL PROTECTED] wrote:
> Along with this effort, could you let me know if the world actually cares about online fsck? Now we know how to do it I think, but is it worth the effort.

With a filesystem that is compartmentalized and checksums metadata, I believe that an online fsck is absolutely worth having.

Instead of the filesystem resorting to mounting the whole volume read-only on certain errors, part of the filesystem can be offlined while an fsck runs. This could even be done automatically in many situations.

--
All rights reversed.