Bug#239111: Grub is shockingly bad code
tags 239111 patch thanks Hi, Please could you try the attached patch, and confirm that it works? Thanks -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." diff -ur grub-0.97/util/grub-install.in grub-0.97.new/util/grub-install.in --- grub-0.97/util/grub-install.in 2004-07-24 20:57:31.0 +0200 +++ grub-0.97.new/util/grub-install.in 2009-01-13 12:18:43.0 +0100 @@ -450,6 +450,17 @@ # Create a safe temporary file. test -n "$mklog" && log_file=`$mklog` +# GRUB will try to verify that stage2 is accessible using its own +# filesystem drivers. Make sure it's committed to disk. +sync + +# On XFS, sync() is not enough. +if [ `grub-probe -t fs ${grubdir}` = "xfs" ] ; then + xfs_freeze -f ${grubdir} && xfs_freeze -u ${grubdir} + # We don't have set -e. If xfs_freeze failed, it's worth trying anyway, + # maybe we're lucky. +fi + # Now perform the installation. $grub_shell --batch $no_floppy --device-map=$device_map <$log_file root $root_drive
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 07:57:44PM +, Robert McQueen wrote: > I didn't test that, I was basing on my experiences of installing GRUB > manually. I usually do: > > mkdir /boot/grub > cp -a /usr/lib/grub/i386-pc/* /boot/grub > vi /boot/grub/device.map > grub --device-map=/boot/grub/device.map > # doesn't work because XFS hasn't put /boot/grub onto disk ... > # swear a bit > xfs_freeze -f / > xfs_freeze -u / > grub --device-map=/boot/grub/device.map > # works > update-grub > # makes me a menu.lst file > > This never froze for me. I wasn't aware you were running xfs_freeze by hand. But I don't trust that this won't hang the kernel. See the link I pasted in the previous mail. This situation is very unfortunate. If we call xfs_freeze, we risk hanging Linux. If we don't, we risk the file being inaccesible. As you point out, an error message is preferable to hanging the kernel, but it's a certain error message against a possible but unlikely hang. Maybe we should go with the latter then? In any case, both are better than a certain hang which is what we have now. > > And if we _really_ need xfs_freeze, it means we're doing something wrong. > > I thought the wrongness was well known: the GRUB shell tries to read the > block device directly and traverse the filesystem directories itself to > to find which blocks on the disk are where the stage* files are located. No, this is a missunderstanding (which I got misled with, too). GRUB only tries to verify that the file will be readable later on (that is, when Linux isn't running). stage1 is the MBR. stage1.5 is supposed to be embedded after the MBR. stage2 is in the filesystem, and only accessed using the filesystem driver in stage1.5. > XFS considers putting the metadata into the journal an acceptable > implementation of sync(), but grub's filesystem reading code ignores the > journal, so can't find the files. If the shell used FIBMAP ioctl to find > the blocks when running under the OS, it'd always find them and all > would be sweetness and light. We can't do that. Using kernel facilities to find the blocks would defeat the point of verifiing that GRUB can read the file. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 08:13:52PM +, Robert McQueen wrote: > > - sync is not enough > > (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#53) > > Correct. This is a property of XFS. As I said, it considers that putting > metadata into the journal, and knowing it's flushed there, to meet the > API guarantee of sync(). sync() doesn't make any guarantee that the > files can be found by an an incomplete implementation of the filesystem > (such as GRUB's, which ignores the journal). There was a conscious decision to avoid implementing journaling. I don't remember why; code size, I guess. > > - xfs_freeze could hang even if you don't write anything > > (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#194) > > I believe this report is unreliable, perhaps it froze because it's made > an echo in a process whose stdout is being logged? Good catch. Ok, then I'm fine with that approach. > I will try to find some time to test > it as well as Ben's simple freeze; unfreeze patch which I prefer, You mean the one in #242 ? This looks good to me. Btw, Ben, could you test your patch on GRUB 2 too? It probably has exactly the same problem. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Robert Millan wrote: > It would seem that running sync would suffice for that. Unfortunately, it > seems that: > > - sync is not enough > (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#53) Correct. This is a property of XFS. As I said, it considers that putting metadata into the journal, and knowing it's flushed there, to meet the API guarantee of sync(). sync() doesn't make any guarantee that the files can be found by an an incomplete implementation of the filesystem (such as GRUB's, which ignores the journal). > - xfs_freeze could hang even if you don't write anything > (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#194) I believe this report is unreliable, perhaps it froze because it's made an echo in a process whose stdout is being logged? > This is odd. I think I'm going to assume sync does what is expected from > it. If it doesn't maybe it's a bug in Linux, which maybe got fixed already, > or maybe it's a (lesser) bug in GRUB which we can't find untill it's > exposed. It's not a bug, it's the behaviour of XFS which I've explained a few times already. The bug in GRUB is trying to read the filesystem at a block level, behind the kernel's back. > In any case, we're better off not using xfs_freeze untill we know for sure > why it is needed, and are certain it won't cause hangs. I know why it's needed, and have done for around 4 years, and have had this confirmed from the XFS developers too. > I'm waiting for Ron's confirmation that this works for him, and then I'll > remove the patch & upload. I'm pretty sure it won't work reliably without some xfs_freeze, so I still prefer including the patch. I will try to find some time to test it as well as Ben's simple freeze; unfreeze patch which I prefer, as I believe it will make things work in lenny's kernel (although not etch, where freeze is sadly broken, which is why the patch in #179 has such horrible looping sleep/freeze hacks anyway). Cheers, Rob -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Robert Millan wrote: > Grmf. I was making wrong assumptions. This is not about block lists > (I still think block lists suck, but let's be fair...): ... > So we freeze the filesystem and afterwards try to write to it. Not a > good idea... Indeed not. > #239111 initial report claims GRUB hangs before we added > xfs_freeze.diff, but according to what Rob says, not freezing makes it > work for him. I didn't test that, I was basing on my experiences of installing GRUB manually. I usually do: mkdir /boot/grub cp -a /usr/lib/grub/i386-pc/* /boot/grub vi /boot/grub/device.map grub --device-map=/boot/grub/device.map # doesn't work because XFS hasn't put /boot/grub onto disk ... # swear a bit xfs_freeze -f / xfs_freeze -u / grub --device-map=/boot/grub/device.map # works update-grub # makes me a menu.lst file This never froze for me. > Rob, can you test removing the patch completely? I suppose, it would at least make it fail rather than fuck my system, but based on my understanding of the problem, won't make installations on XFS work reliably. > I'm almost certain it's the fwrite() call in install_func. I could be > wrong, but I still don't see why we would want to use xfs_freeze anyway. You use xfs_freeze to force XFS to flush the metadata for the newly created stage* files out of the journal and into the directory entries themselves. sync() puts the data on disk and the metadata in the journal, but xfs_freeze gives the extra guarantee of the journal being clean and empty so the filesystem can be shapshotted and mounted as read-only (with no journal replay). > And if we _really_ need xfs_freeze, it means we're doing something wrong. I thought the wrongness was well known: the GRUB shell tries to read the block device directly and traverse the filesystem directories itself to to find which blocks on the disk are where the stage* files are located. XFS considers putting the metadata into the journal an acceptable implementation of sync(), but grub's filesystem reading code ignores the journal, so can't find the files. If the shell used FIBMAP ioctl to find the blocks when running under the OS, it'd always find them and all would be sweetness and light. Thats what Steve was trying to do, I think, before he gave up and sent the OP on this thread. Regards, Rob -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 08:28:19PM +0100, Robert Millan wrote: > > It's not the log file. Joeyh tried that (see the bug log). > > I'm almost certain it's the fwrite() call in install_func. I could be > wrong, but I still don't see why we would want to use xfs_freeze anyway. > > And if we _really_ need xfs_freeze, it means we're doing something wrong. Ah, I remember. There was a sanity check. The GRUB shell tries to read the file using its own filesystem driver, to make sure real GRUB will be able to do it as well. At least, that's what we do in GRUB 2. It would seem that running sync would suffice for that. Unfortunately, it seems that: - sync is not enough (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#53) - xfs_freeze could hang even if you don't write anything (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=239111#194) This is odd. I think I'm going to assume sync does what is expected from it. If it doesn't maybe it's a bug in Linux, which maybe got fixed already, or maybe it's a (lesser) bug in GRUB which we can't find untill it's exposed. In any case, we're better off not using xfs_freeze untill we know for sure why it is needed, and are certain it won't cause hangs. I'm waiting for Ron's confirmation that this works for him, and then I'll remove the patch & upload. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 06:57:55PM +, Robert McQueen wrote: > > So no, it's nothing to do with where I'm trying to install GRUB to, the > problem is the Debian patch to grub-install did xfs_freeze on my root > filesystem and then did various crap including trying to write to the > log file (and make a /boot/grub/default file, I think?), It's not the log file. Joeyh tried that (see the bug log). I'm almost certain it's the fwrite() call in install_func. I could be wrong, but I still don't see why we would want to use xfs_freeze anyway. And if we _really_ need xfs_freeze, it means we're doing something wrong. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 07:32:49PM +0100, Robert Millan wrote: > > I (and upstream in general) believe that the only right way to rely on a > hardcoded list of blocks that live inside a filesystem is _not to_. Grmf. I was making wrong assumptions. This is not about block lists (I still think block lists suck, but let's be fair...): #ifdef GRUB_UTIL /* In the grub shell, access the Stage 2 via the OS filesystem service, if possible. */ if (stage2_os_file) { [...] if (fwrite (stage2_buffer, 1, SECTOR_SIZE, fp) != SECTOR_SIZE) [...] } else #endif /* GRUB_UTIL */ { if (! devwrite (saved_sector - part_start, 1, stage2_buffer)) goto fail; } So we freeze the filesystem and afterwards try to write to it. Not a good idea... #239111 initial report claims GRUB hangs before we added xfs_freeze.diff, but according to what Rob says, not freezing makes it work for him. Rob, can you test removing the patch completely? -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Robert Millan wrote: > Rob, > > Did you hit this problem when installing GRUB to a partition, or to the > whole disk? I was upgrading from etch to lenny on a box where / is XFS and /boot and /var are on the same partition. GRUB is installed into the MBR. I know you can't install bootloaders onto XFS partitions. (If that had happened, based on previous experience the FS would've panicked and all IO returned with an error, so I'd probably have been able to log in and get a lot of errors, assuming bash was still in the cache. :D) So no, it's nothing to do with where I'm trying to install GRUB to, the problem is the Debian patch to grub-install did xfs_freeze on my root filesystem and then did various crap including trying to write to the log file (and make a /boot/grub/default file, I think?), all of which go into D state while the filesystem is frozen so don't finish, and meanwhile everything else on my system ground to a halt too. Obviously hard for me to debug given I lost all my access to the system, but asking SysRq for backtraces showed everything was blocked on FS access. (Actually as a result of my experience I think XFS devs are adding an unfreeze SysRq atm...) Just make the script not try do anything at all which might potentially cause a write to the filesystem while it's frozen, is the main thing... Backing out the bullshit XFS Debian patch would do that, applying Ben's might even make it work properly on lenny. Regards, Rob -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Rob, Did you hit this problem when installing GRUB to a partition, or to the whole disk? -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Robert Millan wrote: > Hi Rob, Hi Robert, > You just convinced me that this is completely fucked up. This is not the > first time someone claims to have fixed this problem, only to discover that > it wasn't, and I'm not going to gamble with ioctls, freeze/unfreeze combos > or Linux version checks. We all know this is fucked up. :) I'm not claiming to fix anything so it somehow works flawlessly. I just want to avoid my system being locked up when I follow the instructions in NEWS.Debian. So the changes I'm proposing for lenny are just: a) change the Debian XFS hack to grub-install so that it does "copy stage files; freeze; unfreeze; install grub" rather than "copy stage files; freeze; lots of shit; unfreeze" which is guaranteed. I don't care if it *works*, but it will at least print me an error message which is fine because everyone who has Grub and XFS probably installed it manually once anyway. Anyway, Ben's simpler patch (just xfs_freeze followed by unfreeze) probably *does* work on lenny and at least fails gracefully on etch. I'll test it. b) clarify the documentation in NEWS.Debian which encourages people to run "grub-install" unequivocally, so that it tells people with an XFS /boot filesystem to either upgrade after rebooting, or upgrade later. If you can tell me which kernel versions that etch's Grub can boot, I can propose a patch. For what it's worth I've chatted to XFS developers on IRC (Eric Sandeen amongst others) and pointed them to this bug, and they agree with my suggested approach here. > I (and upstream in general) believe that the only right way to rely on a > hardcoded list of blocks that live inside a filesystem is _not to_. You're in 100% agreement with the XFS developers here. > I'll see if it's feasible to make it use embedding, and if not, XFS support > in GRUB Legacy will be terminated, and users will be advised to either > leave /boot/grub untouched or upgrade to GRUB 2. Maybe sensible, but I suggest the above two changes are made as well to avoid fucking people's systems if they /do/ run grub-install, because the current Debian XFS hack makes things much much worse than upstream. Regards, Rob -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 06:15:52PM +, Robert McQueen wrote: > severity 239111 grave > thanks > > Robert Millan wrote: > > The whole approach is wrong, so maybe it makes sense to avoid it, or maybe > > it's too late for that, and we should issue a critical debconf warning when > > XFS is detected. > > > > I will have to think about it. > > The problem is that Debian's patch to try and make it work on XFS makes > it /worse/. It turns "grub-install fails" into "grub-install fucks my > system". There's no question we should just fix that ASAP by applying a > patch which just does freeze immediately followed by unfreeze. That's > why I raised the severity, and I still consider this RC for lenny > because the upgrade instructions in NEWS.Debian encouraged me to do > something which bricked my system. > > (Note that although I didn't try it yet, I'm almost certain that Ben's > approach with a helper binary to run FIBMAP ioctl won't help any more > than calling sync() will. sync is /not/ broken in XFS - if you call it > then it does flush the data to disk - the "problem" is that it considers > metadata synced to the journal as safely sync()'d, so calling it isn't > sufficient to udate the dentry's for grub to read directly.) > > Given even xfs_freeze is broken in etch's kernel, we can't make > grub-install work reliably there, but if we put freeze/unfreeze between > copying the files and invoking grub then it will a) not hose people's > systems under etch, and b) will work with lenny's kernel. There's also > no need for a critical warning, because grub-install has to be done by > the user manually anyway. We should just include the correct advice. > > If you'd answer my question about which kernel / grub versions are > actually incompatible, then we can choose between the two behaviours. > If etch's grub can boot lenny's kernel, we can tell XFS users to just > update grub after rebooting, otherwise we can direct them at some manual > install instruction and say "please freeze and unfreeze the filesystem, > run sync a few times, and then have a cup of tea, between copying the > stage* files and running the grub shell". Hi Rob, You just convinced me that this is completely fucked up. This is not the first time someone claims to have fixed this problem, only to discover that it wasn't, and I'm not going to gamble with ioctls, freeze/unfreeze combos or Linux version checks. I (and upstream in general) believe that the only right way to rely on a hardcoded list of blocks that live inside a filesystem is _not to_. I'll see if it's feasible to make it use embedding, and if not, XFS support in GRUB Legacy will be terminated, and users will be advised to either leave /boot/grub untouched or upgrade to GRUB 2. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Mon, Jan 12, 2009 at 07:00:58PM +0100, Robert Millan wrote: >On Sun, Jan 11, 2009 at 05:16:03PM +, Steve McIntyre wrote: >> Yes, I've looked at the grub2 source and I'm much happier. It doesn't >> look like a novice had written it, which is a major improvement. I'm >> still curious WTH anybody would think the nested functions are a good >> thing, though... > >It's kind of a tradition here. I think the point is just to limit the >scope of "hook functions" same way as is usually done for variables. :-/ As an experienced C programmer, they make me shudder. For simple readability of the code, I'd give up on them altogether. But it's your code... >> The thing I'm more bothered about right now is what will happen to >> people with existing systems who upgrade. That's why Rob raised this >> bug to grave: his remote system locked up as part of an upgrade. What >> do you plan to do to stop this happening again? > >The whole approach is wrong, so maybe it makes sense to avoid it, or maybe >it's too late for that, and we should issue a critical debconf warning when >XFS is detected. > >I will have to think about it. OK; please be quick, as we could really do with something in place for Lenny and we're not far away. -- Steve McIntyre, Cambridge, UK.st...@einval.com You lock the door And throw away the key There's someone in my head but it's not me -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
severity 239111 grave thanks Robert Millan wrote: > The whole approach is wrong, so maybe it makes sense to avoid it, or maybe > it's too late for that, and we should issue a critical debconf warning when > XFS is detected. > > I will have to think about it. The problem is that Debian's patch to try and make it work on XFS makes it /worse/. It turns "grub-install fails" into "grub-install fucks my system". There's no question we should just fix that ASAP by applying a patch which just does freeze immediately followed by unfreeze. That's why I raised the severity, and I still consider this RC for lenny because the upgrade instructions in NEWS.Debian encouraged me to do something which bricked my system. (Note that although I didn't try it yet, I'm almost certain that Ben's approach with a helper binary to run FIBMAP ioctl won't help any more than calling sync() will. sync is /not/ broken in XFS - if you call it then it does flush the data to disk - the "problem" is that it considers metadata synced to the journal as safely sync()'d, so calling it isn't sufficient to udate the dentry's for grub to read directly.) Given even xfs_freeze is broken in etch's kernel, we can't make grub-install work reliably there, but if we put freeze/unfreeze between copying the files and invoking grub then it will a) not hose people's systems under etch, and b) will work with lenny's kernel. There's also no need for a critical warning, because grub-install has to be done by the user manually anyway. We should just include the correct advice. If you'd answer my question about which kernel / grub versions are actually incompatible, then we can choose between the two behaviours. If etch's grub can boot lenny's kernel, we can tell XFS users to just update grub after rebooting, otherwise we can direct them at some manual install instruction and say "please freeze and unfreeze the filesystem, run sync a few times, and then have a cup of tea, between copying the stage* files and running the grub shell". Regards, Rob -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Sun, Jan 11, 2009 at 05:16:03PM +, Steve McIntyre wrote: > Yes, I've looked at the grub2 source and I'm much happier. It doesn't > look like a novice had written it, which is a major improvement. I'm > still curious WTH anybody would think the nested functions are a good > thing, though... It's kind of a tradition here. I think the point is just to limit the scope of "hook functions" same way as is usually done for variables. > >As for the bug severity, this bug has been around for a while, and we never > >considered it to be Release Critical. D-I avoids using GRUB Legacy when user > >selected XFS, and rightly so. As I said, with GRUB 2 it's going to be > >different (if someone wants to know more about the technical details on how > >we changed our approach, feel free to contact me). > > The thing I'm more bothered about right now is what will happen to > people with existing systems who upgrade. That's why Rob raised this > bug to grave: his remote system locked up as part of an upgrade. What > do you plan to do to stop this happening again? The whole approach is wrong, so maybe it makes sense to avoid it, or maybe it's too late for that, and we should issue a critical debconf warning when XFS is detected. I will have to think about it. -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
On Sun, Jan 11, 2009 at 12:26:48PM +0100, Robert Millan wrote: >severity 239111 important >clone 239111 -1 >retitle -1 should refuse to install on XFS unless embedding can be used >reassign -1 grub2 >thanks > >On Sun, Jan 04, 2009 at 02:25:28AM +, Steve McIntyre wrote: >> >> After several hours of working through the source, I give up. It's a >> total mess and I'd much rather see grub simply removed from Debian >> altogether than fix this bug and allow it to live on. Highlights: >> >> * gratuitous use of nested functions >> * horrific interfaces that don't pass enough information around >>internally, leading to: >> * functions passing internal state around via umarked global >>variables, relying on side effects >> * significantly obfuscated code >> * the core bug as described by Rob: accessing a block device >>underneath an active filesystem and assuming that the results will >>be safe and/or correct. >> >> I *know* that grub is a bootloader, so it's always going to end up >> having some nasty lowlevel code somewhere. But that's no excuse for >> the code quality I've just seen. After this experience, I'm about to >> remove grub from all my systems. Come back lilo, all is forgiven. > >You're right. Code in GRUB Legacy is such a mess that we would have to >rewrite it from scratch in order to get things right. > >Oh, wait, we already did. It's called GRUB 2. Though, we're still fond of >the nested functions. :-) Yes, I've looked at the grub2 source and I'm much happier. It doesn't look like a novice had written it, which is a major improvement. I'm still curious WTH anybody would think the nested functions are a good thing, though... >As for accessing a block device underneath an active filesystem, we don't >do this anymore, except when we have no other choice. Yay. >For XFS, this means that as long as you install GRUB to your MBR (and >implicitly, post-MBR area), you'll never hit this problem. That is, >as long as your core.img fits in 32 kiB, which we are very careful to >ensure it will always do. > >This reminds me, when it doesn't, in the particular case of XFS we need to >abort install and tell the user why. I'm cloning this bug for GRUB 2 with >that purpose. > >Robert (et al): Sorry that you spent so much effort debugging GRUB Legacy; >we (upstream) have abandoned this codebase long ago. > >As for the bug severity, this bug has been around for a while, and we never >considered it to be Release Critical. D-I avoids using GRUB Legacy when user >selected XFS, and rightly so. As I said, with GRUB 2 it's going to be >different (if someone wants to know more about the technical details on how >we changed our approach, feel free to contact me). The thing I'm more bothered about right now is what will happen to people with existing systems who upgrade. That's why Rob raised this bug to grave: his remote system locked up as part of an upgrade. What do you plan to do to stop this happening again? -- Steve McIntyre, Cambridge, UK.st...@einval.com "This dress doesn't reverse." -- Alden Spiess -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
severity 239111 important clone 239111 -1 retitle -1 should refuse to install on XFS unless embedding can be used reassign -1 grub2 thanks On Sun, Jan 04, 2009 at 02:25:28AM +, Steve McIntyre wrote: > > After several hours of working through the source, I give up. It's a > total mess and I'd much rather see grub simply removed from Debian > altogether than fix this bug and allow it to live on. Highlights: > > * gratuitous use of nested functions > * horrific interfaces that don't pass enough information around >internally, leading to: > * functions passing internal state around via umarked global >variables, relying on side effects > * significantly obfuscated code > * the core bug as described by Rob: accessing a block device >underneath an active filesystem and assuming that the results will >be safe and/or correct. > > I *know* that grub is a bootloader, so it's always going to end up > having some nasty lowlevel code somewhere. But that's no excuse for > the code quality I've just seen. After this experience, I'm about to > remove grub from all my systems. Come back lilo, all is forgiven. You're right. Code in GRUB Legacy is such a mess that we would have to rewrite it from scratch in order to get things right. Oh, wait, we already did. It's called GRUB 2. Though, we're still fond of the nested functions. :-) As for accessing a block device underneath an active filesystem, we don't do this anymore, except when we have no other choice. For XFS, this means that as long as you install GRUB to your MBR (and implicitly, post-MBR area), you'll never hit this problem. That is, as long as your core.img fits in 32 kiB, which we are very careful to ensure it will always do. This reminds me, when it doesn't, in the particular case of XFS we need to abort install and tell the user why. I'm cloning this bug for GRUB 2 with that purpose. Robert (et al): Sorry that you spent so much effort debugging GRUB Legacy; we (upstream) have abandoned this codebase long ago. As for the bug severity, this bug has been around for a while, and we never considered it to be Release Critical. D-I avoids using GRUB Legacy when user selected XFS, and rightly so. As I said, with GRUB 2 it's going to be different (if someone wants to know more about the technical details on how we changed our approach, feel free to contact me). -- Robert Millan The DRM opt-in fallacy: "Your data belongs to us. We will decide when (and how) you may access your data; but nobody's threatening your freedom: we still allow you to remove your data and not access it at all." -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#239111: Grub is shockingly bad code
Thanks to Rob McQueen for doing a lot of research into this for his mail about this bug on 12 Dec 2008. Based on that mail, I've looked through the code in an attempt to find a way to implement a reasonable fix: using FIBMAP as he suggested. After several hours of working through the source, I give up. It's a total mess and I'd much rather see grub simply removed from Debian altogether than fix this bug and allow it to live on. Highlights: * gratuitous use of nested functions * horrific interfaces that don't pass enough information around internally, leading to: * functions passing internal state around via umarked global variables, relying on side effects * significantly obfuscated code * the core bug as described by Rob: accessing a block device underneath an active filesystem and assuming that the results will be safe and/or correct. I *know* that grub is a bootloader, so it's always going to end up having some nasty lowlevel code somewhere. But that's no excuse for the code quality I've just seen. After this experience, I'm about to remove grub from all my systems. Come back lilo, all is forgiven. -- Steve McIntyre, Cambridge, UK.st...@einval.com Is there anybody out there? -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org