Dieter Ries posted <[EMAIL PROTECTED]>, excerpted below, on Mon, 08 May 2006 10:30:02 +0200:
> I still dont understand why
>     Checking all filesystems
> is running in the boot-up process without checkfs and checkroot in one of
> the runlevels.

There are two reasons for that.

One, Gentoo has an initscript dependency system. If you had read the Working with Gentoo section of the handbook, you'd probably understand this a bit better. Unfortunately, many people apparently think the handbook is only for installation, and end up missing out on a lot of what the rest of the handbook covers about Gentoo. Without that understanding, they are much less efficient at administering their Gentoo system than they'd otherwise be, as they end up doing things the hard way and making mistakes they'd not make had they read the documentation. Gentoo has a reputation for some of the best documentation in the community, so it's a shame when folks don't read it.

Anyway, what it amounts to is that other initscripts depend on checkfs and checkroot, so the system ensures they are run before those other initscripts run, even if checkroot and checkfs aren't themselves directly listed to be run. Again, this is covered in the handbook, if you want to better understand how and why it works that way.

Reason two is what's actually at work here, however. Without it, things would fall back to reason one above. Unfortunately, this one is /not/ covered in the handbook, or wasn't last I looked, anyway. However, it's a logical extension of reason one, so understanding reason one makes reason two easier to follow.
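
To make reason one concrete, here is a toy sketch of dependency-driven startup. It is not Gentoo's implementation: the needs() map and the start_service helper are made up for illustration, and the dependency chain shown is an assumption, not a quote from the real initscripts. The point is only that asking for one service pulls in everything it needs first.

```shell
#!/bin/sh
# Hypothetical dependency map: prints what each service "needs".
# (Illustrative only; check the depend() functions in /etc/init.d
# for what your system really declares.)
needs() {
  case "$1" in
    checkfs)    echo "checkroot" ;;
    localmount) echo "checkfs" ;;
    bootmisc)   echo "localmount" ;;
    *)          echo "" ;;
  esac
}

STARTED=""

# Start a service after recursively starting everything it needs.
start_service() {
  # Skip services that have already been started.
  case " $STARTED " in *" $1 "*) return 0 ;; esac
  for dep in $(needs "$1"); do
    start_service "$dep"
  done
  STARTED="${STARTED:+$STARTED }$1"
}

# Only bootmisc is requested, yet the checks run first.
start_service bootmisc
echo "$STARTED"
```

Here checkroot and checkfs end up at the front of the start order even though nobody asked for them by name, which is the effect the question was about.
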
As actually implemented by the /sbin/rc script (which init runs repeatedly during boot, as configured in /etc/inittab), certain scripts are considered "critical" to the boot process. Barring a local configuration that bypasses them, /sbin/rc runs them directly, regardless of whether they are in the boot runlevel or not.

Take a look at the "get_critical_services" routine in /sbin/rc. Basically, unless you have an /etc/runlevels/boot/.critical file, rc sets:

CRITICAL_SERVICES="checkroot modules checkfs localmount clock"

Those services are then started in exactly that order, directly by rc, before the boot runlevel is run, regardless of whether they are set to be started by the boot runlevel or not.

If the modules needed to mount your automatically mounted filesystems are built into the kernel, you can eliminate modules from that list. You can also try eliminating checkroot and checkfs, and localmount in some cases, but the results won't always be quite what you expected. Certain other services might not start in the expected order, or at all, because stuff they depend on and assume is there is missing.

With my system, I can safely list only checkroot and clock in my /etc/runlevels/boot/.critical file. That works, although I have checkfs and localmount in the boot runlevel so they get run anyway -- they just parallelize a bit better (I have RC_PARALLEL_STARTUP="yes" set in /etc/conf.d/rc). However, if I remove checkroot or clock from the .critical file, things don't work quite right -- they have to be there and started by rc directly, or the rest of the services in the boot runlevel don't work as intended.

The question then arises: why are these services considered so critical? In general, you will find your system remains much more stable if you run checkroot and checkfs at boot every time, for your normally mounted filesystems.
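
Before getting to why, here is a rough sketch of the selection logic described above. It is a paraphrase of the behavior, not a copy of the routine in /sbin/rc. The optional path argument is added purely for illustration, and it assumes the .critical file holds whitespace-separated service names.

```shell
#!/bin/sh
# Sketch only: mimics the described behavior of get_critical_services.
# The path argument is a hypothetical addition so the function can be
# tried against a scratch file instead of the real one.
get_critical_services() {
  critical_file=${1:-/etc/runlevels/boot/.critical}
  if [ -f "$critical_file" ]; then
    # Local override: take the names listed in the file, in file order.
    CRITICAL_SERVICES=""
    for svc in $(cat "$critical_file"); do
      CRITICAL_SERVICES="${CRITICAL_SERVICES:+$CRITICAL_SERVICES }$svc"
    done
  else
    # No override file: fall back to the default list, in this order.
    CRITICAL_SERVICES="checkroot modules checkfs localmount clock"
  fi
}

# Show what would be started directly, before the boot runlevel.
get_critical_services
for svc in $CRITICAL_SERVICES; do
  echo "would start: $svc"
done
```

With a .critical file naming just checkroot and clock, as on the system described above, the loop would list only those two.
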
The danger in skipping those checks is that a hardware fault that would cause only a small problem if caught by an fsck at the next boot may end up being a HUGE problem if the system is allowed to continue writing to that filesystem as if nothing were wrong. A single cross-linked file can soon become hundreds or thousands, as the metadata becomes increasingly jumbled, until it's impossible to recover from without simply overwriting it with a good backup. The problem may take weeks or months, even years, to develop into a stability-compromising issue that's finally noticed when something critical gets damaged. Regularly running those at-boot fscks ensures that doesn't happen.

With a journaled filesystem, it's not as if it takes hours to run those checks anyway. A few extra seconds or a minute taken at boot can save you the huge amount of work that results when a small and initially insignificant error isn't caught until hundreds of files have been corrupted.

Of course, one is also expected to use fstab appropriately, turning off fsck at boot for non-critical or not automounted filesystems. Here, I have identical backup snapshots of all the filesystems I consider valuable enough to want to retain. Those are not automounted, and are only written to when I mkfs them and recopy the data over from the live filesystem periodically as part of my backup routine. As such, there's no need to fsck them at every boot, because they've most likely not been touched since the last boot: not written to, not read from, not even mounted. Likewise, for any partitions (like /tmp) that contain essentially throwaway data, it's probably safe to skip the fsck by putting a zero in the sixth (last) column of fstab.

For any partitions you depend on, however, while you can probably get away with avoiding fsck at boot in the short term, it's far safer just to do it.

As mentioned by someone else, you can set ext3 partitions to not fsck at every boot, if desired. That's a useful option.
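
As a rough illustration of those two knobs, the sketch below lists which fstab entries have a non-zero sixth field (and so are candidates for the boot-time fsck), and prints, without running it, a tune2fs command that sets an ext3 maximum mount count. The helper names, the sample fstab, and the /dev/sdX device names are all hypothetical, and "every Nth boot" assumes the filesystem is mounted once per boot.

```shell
#!/bin/sh
# List mount points whose sixth fstab field (fs_passno) is non-zero.
# Comment lines and entries with fewer than six fields are skipped.
fsck_targets() {
  awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $2 }' "${1:-/etc/fstab}"
}

# Dry run: print the tune2fs command that would force a check every
# COUNT mounts. Nothing is executed; the real command needs root and
# a real device. Counts below 1 are refused here on purpose.
ext3_check_every() {
  count=$1
  device=$2
  case "$count" in
    ''|*[!0-9]*) echo "count must be a positive integer" >&2; return 1 ;;
  esac
  [ "$count" -ge 1 ] || { echo "count must be at least 1" >&2; return 1; }
  echo "tune2fs -c $count $device"
}

# Hypothetical fstab: / and /home are checked, /tmp and the backup are not.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <fs>      <mountpoint>  <type>  <opts>   <dump>  <pass>
/dev/sdX1   /             ext3    noatime  0       1
/dev/sdX2   /home         ext3    noatime  0       2
/dev/sdX3   /tmp          ext3    noatime  0       0
/dev/sdX4   /mnt/backup   ext3    noauto   0       0
EOF
fsck_targets "$sample"
rm -f "$sample"

ext3_check_every 3 /dev/sdX2
```

Printing the tune2fs command rather than running it keeps the example harmless; check the tune2fs man page for your version before applying it to a real partition.
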
Set the check to run every third boot, or every fifth, but don't turn it off entirely, or you risk not catching minor damage until it's major and causes you serious issues.

Keep in mind that even a partition never written to will develop "bit rot" over time, due to cosmic ray bitflipping and the like. The reality is that at the single-bit level, hard drives aren't nearly as reliable as we like to think they are. Awesome levels of automated redundancy and error correction normally handle the problems as they develop, correcting them behind the scenes. That's normal and good, and generally suffices for partitions not normally written to. However, once you start actively using a partition, writing as well as reading, if one of those normally insignificant bitflips happens in the wrong place, a write intended for one location on the disk might end up at quite a different location. That's what automated fscks at boot, even after proper shutdown, are designed to detect and correct. Catch it early, and it's insignificant background noise, corrected by automated mechanisms such that you likely won't notice it at all. Fail to do those automated boot-time fscks, and you are playing the odds, risking your data.

Setting the fscks to once every third boot is still well within reasonable safety limits. Setting one in five should be safe under normal conditions, but is playing the odds a bit more. I'd not recommend turning it off altogether, or setting it much less frequently than one in five, as that's just undue risk, IMO. You may well have no problems doing it that way for years, if ever. Another person may have problems in a week or a month. It's up to you how much risk you want to put your data at.

Meanwhile, back in the Gentoo init scripts, mandating checkroot and checkfs as "critical" parts of the boot sequence remains the sanest default.
Gentoo provides the configurability to change those defaults for those sysadmins that choose to do so, but setting anything else as the default would simply not be the sane or responsible thing for Gentoo devs to do.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@gentoo.org mailing list