Re: Boot process robustness
John Baldwin wrote:
> /boot/loader.conf perhaps, but how does the loader know that the previous boot failed so that it knows to fall back? This is much harder, as a failed kernel boot usually results in a hang or an instant CPU reset.

The loader sets a flag before booting, and the boot process resets it at the end. Of course, the loader doesn't have write capability.

--
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"There is no spoon." -- Kiki

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
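The flag protocol described in this message can be sketched as a tiny state machine. This is only an illustration in Python, not loader code: `BootState`, `choose_kernel` and `boot_completed` are hypothetical names, and the sketch assumes the loader has some way to persist one bit, which is exactly the capability the message notes is missing.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Hypothetical persistent state shared by the loader and the boot scripts."""
    attempt_pending: bool = False  # set by the loader, cleared at the end of boot


def choose_kernel(state: BootState) -> str:
    """Loader side: pick a kernel, then mark a boot attempt as pending."""
    # If the flag is still set, the previous boot never reached its end
    # (hang, panic or reset), so fall back to the old kernel.
    kernel = "/kernel.old" if state.attempt_pending else "/kernel"
    state.attempt_pending = True
    return kernel


def boot_completed(state: BootState) -> None:
    """End of the boot process: clear the flag to record success."""
    state.attempt_pending = False
```

Note that in this simple form a successful boot of /kernel.old clears the flag, so the following boot would try /kernel again; a real design would probably want to remember which kernel last succeeded.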
Re: Boot process robustness
----- Original Message -----
From: "James Halstead" [EMAIL PROTECTED]
To: "Poul-Henning Kamp" [EMAIL PROTECTED]
Sent: Tuesday, January 02, 2001 11:30 PM
Subject: Re: Boot process robustness

----- Original Message -----
From: "Poul-Henning Kamp" [EMAIL PROTECTED]
To: "Walter W. Hop" [EMAIL PROTECTED]
Cc: "FreeBSD hackers" [EMAIL PROTECTED]
Sent: Thursday, December 28, 2000 9:31 AM
Subject: Re: Boot process robustness

> In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> > Hi all,
> >
> > I was wondering how to increase the robustness of the booting process, so that a box would be able to keep itself on its feet without intervention of the console. I think this would be of great value to the many people who administer colocated boxes.
> >
> > I'm not much of a coder so all I can do is mail this (at the risk of wasting your time with totally useless crap of course, in which case I apologize).
> >
> > 1. Old kernel recovery
> >
> > When 'make install'ing a new kernel, a flag is raised (say, 'revert_on_fail') which is only cleared after a successful system initialisation. When the new kernel boots, a panic in this state or an unexpected reboot (reset after a system hang) would cause /kernel.old to be loaded on the next boot instead (maybe the same could work for /etc/rc.conf.old).
>
> This is actually more a question of where to store the flag than anything else.

Couldn't you just modify the shutdown command to have an option for revert on fail, which would create a file on the root filesystem with a timestamp of when the reboot started? Then at boot time, if that timestamp is still there and has been around for too long, boot kernel.old instead of kernel. Then the question is what amount of time is reasonable for the wait period. This may have the machine boot the new kernel and panic a few times, but at least you can be assured that after x minutes it would boot the old kernel instead. Once a boot was successful, the timestamp file could be removed. Just a thought.

~James

> Julian made a rather hackish thing for Whistle, but I think we lost that with the advent of the new bootblocks.
>
> > 2. Automatic file system checks
> >
> > In case of a powercycle or crash, it could be that a filesystem needs fixing. Now I don't know much about fs internals, but I guess that in most cases just answering 'Y' to fsck's questions will fix things. I would appreciate an option where an inconsistency would start up fsck in an "automatic" repair mode, with all actions logged and "undo" data being saved (in case manual review is needed).
>
> Alternatively it might be worth considering adding a "remote-single-user" capability: If an fsck fails, ifconfig the interfaces and start an sshd so people can get in remotely and fsck...
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> [EMAIL PROTECTED] | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
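The timestamp idea in this message boils down to one comparison at boot time. The following is a rough Python sketch of that decision only; `pick_kernel`, `stamp_time` and `grace_seconds` are hypothetical names, and it assumes the boot code can read both the timestamp file and a clock, which the thread does not confirm.

```python
from typing import Optional


def pick_kernel(stamp_time: Optional[float], now: float, grace_seconds: float) -> str:
    """Decide which kernel to boot from a 'revert on fail' timestamp.

    stamp_time is the time recorded when the reboot was requested,
    or None if no timestamp file exists (no revert was requested,
    or a previous boot succeeded and removed the file).
    """
    if stamp_time is None:
        return "/kernel"
    # The stamp is still present: no boot has succeeded since the reboot.
    # Keep trying the new kernel until the grace period has run out.
    if now - stamp_time > grace_seconds:
        return "/kernel.old"
    return "/kernel"
```

Choosing `grace_seconds` is the open question raised in the message: too short and a slow but healthy boot is treated as a failure, too long and the box keeps panicking on the new kernel.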
Re: Boot process robustness
> > > If an fsck fails, ifconfig the interfaces and start an sshd so people can get in remotely and fsck...
> >
> > What if an fsck on /usr fails? Other than that, I love the idea!
>
> Force-mount it read-only if necessary, or simply copy a static sshd into /sbin. Running fsck -y is the wrong solution, since if fsck can't fix an error automatically, something pretty bad has happened (physical media error, someone dd'ing onto the raw disk, etc.).

Even so, unless the machine contains invaluable data, I guess 99% of people still run fsck -y if fsck fails. I'd rather have my remote boxes do that by themselves, and perhaps email me, than either have to drive there, or give somebody the root password and remote-control that person to just do fsck -y.

In almost all cases, when a machine can't fsck itself after a power failure, an fsck -y fixes it. But then, most of the disk is either squid's cache or unused stuff like termcaps, kernel source, man pages etc. Most of it is there just because it could be handy one day, and it is not worth the trouble of pruning it.

Leif
Re: Boot process robustness
John Baldwin wrote:
> On 28-Dec-00 Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> > > Hi all,
> > >
> > > I was wondering how to increase the robustness of the booting process, so that a box would be able to keep itself on its feet without intervention of the console. I think this would be of great value to the many people who administer colocated boxes.

The old 'nextboot(8)' system used to do this if you had the 'writeback' option enabled. It wrote a sequence of boot strings into block 1 (not 0) and zeroed them out as it used them up. /etc/rc would then be used to write an appropriate set of strings back in the case of a successful boot.

This is used successfully in the thousands of InterJets out there in the field. Unfortunately the new bootblock writers never considered this an important enough feature to emulate, but it gives you a place to look.

We wrote a list of increasingly conservative boot strings, eventually even moving to an alternate root partition.

> > > I'm not much of a coder so all I can do is mail this (at the risk of wasting your time with totally useless crap of course, in which case I apologize).
> > >
> > > 1. Old kernel recovery
> > >
> > > When 'make install'ing a new kernel, a flag is raised (say, 'revert_on_fail') which is only cleared after a successful system initialisation. When the new kernel boots, a panic in this state or an unexpected reboot (reset after a system hang) would cause /kernel.old to be loaded on the next boot instead (maybe the same could work for /etc/rc.conf.old).
> >
> > This is actually more a question of where to store the flag than anything else.
>
> /boot/loader.conf perhaps, but how does the loader know that the previous boot failed so that it knows to fall back? This is much harder, as a failed kernel boot usually results in a hang or an instant CPU reset.
>
> --
> John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

--
Julian Elischer [EMAIL PROTECTED]
World tour 2000: from Perth, presently in Budapest
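The writeback scheme described above (consume one boot string per attempt, refill the list after a successful boot) can be modelled in a few lines. This Python sketch only illustrates the bookkeeping; it is not the nextboot(8) on-disk format, and the function names and slot contents are hypothetical placeholders.

```python
from typing import List, Optional


def take_next_boot_string(slots: List[str]) -> Optional[str]:
    """Boot-block side: use the first remaining string and zero its slot.

    Zeroing before booting means a hang or reset cannot cause the same
    string to be tried again on the next power-up.
    """
    for i, boot_string in enumerate(slots):
        if boot_string:
            slots[i] = ""  # models overwriting the slot with zeroes on disk
            return boot_string
    return None  # every option has been used up


def restore_on_success(slots: List[str], defaults: List[str]) -> None:
    """/etc/rc side: after a successful boot, write the full list back."""
    slots[:] = defaults
```

Ordering the list from the newest kernel to the most conservative option, ending with an alternate root partition, gives the fallback sequence the message describes.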
Re: Boot process robustness
> This is used successfully in the thousands of InterJets out there in the field. Unfortunately the new bootblock writers never considered this an important enough feature to emulate, but it gives you a place to look.

One of the "new bootblock authors" actually commented on this thread, including some of the reasons why this approach wasn't taken...

--
... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt]
V I C T O R Y   N O T   V E N G E A N C E
Re: Boot process robustness
> the current ideas all fail badly in the face of 'a' partition filesystem corruption.

Very true. I also looked at the CMOS scratchpad registers...

--
... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt]
V I C T O R Y   N O T   V E N G E A N C E
Re: Boot process robustness
On 28-Dec-00 Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> > Hi all,
> >
> > I was wondering how to increase the robustness of the booting process, so that a box would be able to keep itself on its feet without intervention of the console. I think this would be of great value to the many people who administer colocated boxes.
> >
> > I'm not much of a coder so all I can do is mail this (at the risk of wasting your time with totally useless crap of course, in which case I apologize).
> >
> > 1. Old kernel recovery
> >
> > When 'make install'ing a new kernel, a flag is raised (say, 'revert_on_fail') which is only cleared after a successful system initialisation. When the new kernel boots, a panic in this state or an unexpected reboot (reset after a system hang) would cause /kernel.old to be loaded on the next boot instead (maybe the same could work for /etc/rc.conf.old).
>
> This is actually more a question of where to store the flag than anything else.

/boot/loader.conf perhaps, but how does the loader know that the previous boot failed so that it knows to fall back? This is much harder, as a failed kernel boot usually results in a hang or an instant CPU reset.

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: Boot process robustness
> > This is actually more a question of where to store the flag than anything else.
>
> /boot/loader.conf perhaps, but how does the loader know that the previous boot failed so that it knows to fall back? This is much harder, as a failed kernel boot usually results in a hang or an instant CPU reset.

I had always planned to write a fixed-size file to disk (probably 512 bytes) and then implement "overwrite only" write support in the various filesystems to allow us to use it as a "persistent" environment store, e.g. have a 'save foo' keyword which would update the persistent store with the 'foo' variable.

This would avoid the bloat that block allocation, directory creation etc. would entail with "real" write support, whilst allowing us most of the desirable features. All of the primary boot filesystems (ffs, nfs, tftp, fat, ext2) could handle this with trivial modifications.

--
... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt]
V I C T O R Y   N O T   V E N G E A N C E
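To make the "overwrite only" idea concrete, here is a minimal Python model of a fixed-size environment block. It is a sketch under assumptions, not the loader's format: the NUL-separated `name=value` layout and the names `STORE_SIZE`, `pack_env`, `unpack_env` and `save_variable` are invented for illustration. The point it shows is that an update always yields a block of exactly the same size, so it can be written back over the existing sectors without any block allocation.

```python
from typing import Dict

STORE_SIZE = 512  # the fixed file size suggested in the message


def pack_env(env: Dict[str, str]) -> bytes:
    """Serialise variables as NUL-terminated name=value records, zero-padded."""
    data = b"".join(
        f"{name}={value}".encode("ascii") + b"\0"
        for name, value in sorted(env.items())
    )
    if len(data) > STORE_SIZE:
        raise ValueError("environment does not fit in the fixed-size store")
    return data.ljust(STORE_SIZE, b"\0")


def unpack_env(block: bytes) -> Dict[str, str]:
    """Parse a block produced by pack_env, skipping the zero padding."""
    env = {}
    for record in block.split(b"\0"):
        if record:
            name, _, value = record.decode("ascii").partition("=")
            env[name] = value
    return env


def save_variable(block: bytes, name: str, value: str) -> bytes:
    """Model of a 'save foo' keyword: return the same-size block with one variable updated."""
    env = unpack_env(block)
    env[name] = value
    return pack_env(env)
```

For example, `save_variable(bytes(STORE_SIZE), "boot_failed", "1")` returns a 512-byte block that the boot code could overwrite in place.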
Boot process robustness
Hi all,

I was wondering how to increase the robustness of the booting process, so that a box would be able to keep itself on its feet without intervention of the console. I think this would be of great value to the many people who administer colocated boxes.

I'm not much of a coder so all I can do is mail this (at the risk of wasting your time with totally useless crap of course, in which case I apologize).

1. Old kernel recovery

When 'make install'ing a new kernel, a flag is raised (say, 'revert_on_fail') which is only cleared after a successful system initialisation. When the new kernel boots, a panic in this state or an unexpected reboot (reset after a system hang) would cause /kernel.old to be loaded on the next boot instead (maybe the same could work for /etc/rc.conf.old).

2. Automatic file system checks

In case of a powercycle or crash, it could be that a filesystem needs fixing. Now I don't know much about fs internals, but I guess that in most cases just answering 'Y' to fsck's questions will fix things. I would appreciate an option where an inconsistency would start up fsck in an "automatic" repair mode, with all actions logged and "undo" data being saved (in case manual review is needed).

There! (Merry etc etc, by the way!)

walter

--
Walter W. Hop [EMAIL PROTECTED] | +31 6 24290808 | PGP key: 0xD4DD8DEB
Re: Boot process robustness
In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> Hi all,
>
> I was wondering how to increase the robustness of the booting process, so that a box would be able to keep itself on its feet without intervention of the console. I think this would be of great value to the many people who administer colocated boxes.
>
> I'm not much of a coder so all I can do is mail this (at the risk of wasting your time with totally useless crap of course, in which case I apologize).
>
> 1. Old kernel recovery
>
> When 'make install'ing a new kernel, a flag is raised (say, 'revert_on_fail') which is only cleared after a successful system initialisation. When the new kernel boots, a panic in this state or an unexpected reboot (reset after a system hang) would cause /kernel.old to be loaded on the next boot instead (maybe the same could work for /etc/rc.conf.old).

This is actually more a question of where to store the flag than anything else.

Julian made a rather hackish thing for Whistle, but I think we lost that with the advent of the new bootblocks.

> 2. Automatic file system checks
>
> In case of a powercycle or crash, it could be that a filesystem needs fixing. Now I don't know much about fs internals, but I guess that in most cases just answering 'Y' to fsck's questions will fix things. I would appreciate an option where an inconsistency would start up fsck in an "automatic" repair mode, with all actions logged and "undo" data being saved (in case manual review is needed).

Alternatively it might be worth considering adding a "remote-single-user" capability: If an fsck fails, ifconfig the interfaces and start an sshd so people can get in remotely and fsck...

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: Boot process robustness
On Thu, Dec 28, 2000 at 03:31:55PM +0100, Poul-Henning Kamp wrote:
> In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> > [snip]
> > 2. Automatic file system checks
> >
> > In case of a powercycle or crash, it could be that a filesystem needs fixing. Now I don't know much about fs internals, but I guess that in most cases just answering 'Y' to fsck's questions will fix things. I would appreciate an option where an inconsistency would start up fsck in an "automatic" repair mode, with all actions logged and "undo" data being saved (in case manual review is needed).
>
> Alternatively it might be worth considering adding a "remote-single-user" capability: If an fsck fails, ifconfig the interfaces and start an sshd so people can get in remotely and fsck...

What if an fsck on /usr fails? Other than that, I love the idea!

G'luck,
Peter

--
I am not the subject of this sentence.
Re: Boot process robustness
In the last episode (Dec 28), Peter Pentchev said:
> On Thu, Dec 28, 2000 at 03:31:55PM +0100, Poul-Henning Kamp wrote:
> > In message [EMAIL PROTECTED], "Walter W. Hop" writes:
> > > 2. Automatic file system checks
> > >
> > > In case of a powercycle or crash, it could be that a filesystem needs fixing. Now I don't know much about fs internals, but I guess that in most cases just answering 'Y' to fsck's questions will fix things. I would appreciate an option where an inconsistency would start up fsck in an "automatic" repair mode, with all actions logged and "undo" data being saved (in case manual review is needed).
> >
> > Alternatively it might be worth considering adding a "remote-single-user" capability: If an fsck fails, ifconfig the interfaces and start an sshd so people can get in remotely and fsck...
>
> What if an fsck on /usr fails? Other than that, I love the idea!

Force-mount it read-only if necessary, or simply copy a static sshd into /sbin. Running fsck -y is the wrong solution, since if fsck can't fix an error automatically, something pretty bad has happened (physical media error, someone dd'ing onto the raw disk, etc.).

--
Dan Nelson
[EMAIL PROTECTED]
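The two recovery policies argued over in this thread can be laid side by side as a small decision function. This is a hedged Python sketch of the choices only: `plan_after_fsck` and the policy names are hypothetical, the steps are plain descriptions rather than commands that get run, and nothing here reflects how /etc/rc actually behaves.

```python
from typing import List


def plan_after_fsck(auto_repair_ok: bool, policy: str = "remote-single-user") -> List[str]:
    """Return the recovery steps for an unattended box after the automatic fsck pass.

    policy "force-yes" models the 'just run fsck -y' position;
    policy "remote-single-user" models bringing up the network and sshd
    so an administrator can repair the filesystem by hand.
    """
    if auto_repair_ok:
        return ["continue normal boot"]
    if policy == "force-yes":
        return ["run fsck -y", "mail a log to the administrator", "continue normal boot"]
    if policy == "remote-single-user":
        return [
            "force-mount the damaged filesystem read-only if needed",
            "ifconfig the interfaces",
            "start a static sshd from /sbin",
            "wait for the administrator",
        ]
    raise ValueError(f"unknown policy: {policy}")
```

Keeping sshd static and under /sbin is what lets the second policy work even when /usr is the filesystem that failed its check.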