On Thu, May 29, 2014 at 09:40:34AM -0400, Phillip Susi wrote:
> Indeed, part of the problem is that everyone piled into the same bug
> with several different issues rather than troubleshooting it on a case
> by case basis.

This certainly happens, and I realise that's annoying; any bug like this
is likely to be only partially fixed, and at some point people who still
have problems will need to be directed to file new bugs rather than
continuing to comment on the closed bug. However, closing the bug
without making any technical changes is likely to be read as blowing
*everyone* off, no matter the good intentions, and just compounds the
problem.

> It wasn't a starting point for a conversation; I had tried dozens of
> times for weeks to get more information, identify the cause(s), and
> explain why it was a result of incorrect action on the user's part.
> That statement was made in direct response to someone saying that as a
> user they felt they needed to reopen it ( yet again ) without
> understanding why I had closed it, or offering any real
> counter-argument. By that point I was throwing my arms in the air.

When people repeatedly reopen a bug, it's often worth considering
whether it was actually the right thing to do to close it in the first
place. The sheer number of people affected by this class of bugs is an
indication that we shouldn't be closing it out of hand, even if you
don't immediately see what we can do about it. Given that we have
extensive maintainer script code for dealing with situations like this,
there's clearly scope for further improvement.

> It would be helpful if you would comment if you think there actually
> is something that might be done. Since this had gone on for some time
> without any comment from you, I assumed you were ignoring it as just
> another kvetch fest. I certainly would be interested in any ideas you
> might have.

I'm afraid I don't have time to read more than a tiny fraction of the
bug mail I get, although this had been escalated to me by several folks
in my management chain and I'd put it on my to-do list for 14.04.1; I'd
just been heads-down in the image build infrastructure changes I'm
currently doing, so hadn't emerged for long enough to dig through the
bug.

I don't yet have specific fixes in mind, but there is certainly plenty
of fodder for investigation here. For example, skimming through the bug
log, I see an instance
(https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1289977/comments/207)
where somebody swapped disks and then the maintainer scripts didn't
realise that they needed to install GRUB to the new disk. This
situation is *specifically* intended to be handled by the maintainer
script code I wrote some time ago (and wrote up in
http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/debian/2010-06-21-grub2-boot-problems.html),
so if it's failing then I need to investigate that, not discard it as a
situation we can't fix.

This is a long-standing class of bug, although the precise details have
varied over time. The reason it's so difficult to address is that the
root causes are often far removed in time: if you get your configuration
wrong then you often don't find out about it until the next upgrade.
That makes this very challenging to deal with, although not impossible.

In many cases this is user error, narrowly defined (that is, the user
did not do the "right thing", but perhaps we didn't do much to help them
know what the right thing would be). Still, it's sometimes
possible to detect it heuristically and offer to correct the situation
on upgrade: given that the result of failure is a failed boot, it's
worth going beyond what we would ordinarily do to handle user error.

For example, I'm considering approaches such as looking for binary
signatures which would serve to identify GRUB across a wide range of
versions, or patching grub-install to leave a note for future
grub-pc.postinst runs, or going through my existing detection code again
to try to find paths where it's supposed to ask questions but fails to
do so.
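
As a rough illustration of the binary-signature idea (a sketch only, not
the actual maintainer script code): the check below reads the first
sector of a device and looks for two things, the standard 0x55 0xAA boot
signature at offset 510 and the byte string "GRUB" somewhere before it.
Treating a bare "GRUB" string as the marker is an assumption made for
this example; a real check across many GRUB versions would need more
careful signatures. The function names are hypothetical.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"   # standard PC boot sector signature at offset 510
GRUB_MARKER = b"GRUB"          # assumption: crude marker, not a per-version signature


def looks_like_grub_boot_sector(sector: bytes) -> bool:
    """Heuristically decide whether a boot sector appears to contain GRUB."""
    if len(sector) < SECTOR_SIZE:
        return False
    sector = sector[:SECTOR_SIZE]
    if sector[510:512] != BOOT_SIGNATURE:
        return False
    # Only search the part of the sector before the boot signature.
    return GRUB_MARKER in sector[:510]


def read_boot_sector(path: str) -> bytes:
    """Read the first sector of a device or disk image (usually needs root)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

For instance, looks_like_grub_boot_sector(read_boot_sector("/dev/sda"))
would report whether that disk's first sector passes this heuristic; a
negative result on a disk the system boots from would be a reason to ask
the user a question rather than stay silent.
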
The other strand of investigation is to try to track down reasons why
this happened in the first place. For example, I suspect that there may
be some paths where installing Ubuntu leaves the wrong thing in
grub-pc/install_devices. I'd also like to go through some of our
user-facing documentation such as
https://help.ubuntu.com/community/Grub2, and try to cut it down a bit
and review closely for any inaccuracies. If I find time I'd also like
to review tools such as boot-repair and see if I can make sure that they
don't fix immediate problems while leaving future timebombs around
(which might relate to patching grub-install).
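
To make the grub-pc/install_devices concern concrete, here is a minimal
sketch of a sanity check on that value. It assumes the value has already
been retrieved from debconf as a comma-separated string of device paths,
and simply reports entries that no longer exist on the system. The
function names and the injectable exists parameter are hypothetical,
chosen so the logic can be exercised without real devices.

```python
import os
from typing import Callable, List


def parse_install_devices(value: str) -> List[str]:
    """Split a comma-separated device list, dropping empty entries."""
    return [dev.strip() for dev in value.split(",") if dev.strip()]


def missing_install_devices(
    value: str, exists: Callable[[str], bool] = os.path.exists
) -> List[str]:
    """Return the recorded devices that are not present any more."""
    return [dev for dev in parse_install_devices(value) if not exists(dev)]
```

A non-empty result (for example, a recorded path that pointed at a disk
which has since been swapped out) would be one signal that the recorded
configuration is stale and that the next upgrade may install GRUB to the
wrong place, or nowhere useful at all.
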
That's a rough idea of what I plan to look at here. As you can see it's
extensive and will require a good deal of continuous concentration; I
expect to have to carve out at least three solid days to work on this.
--
Colin Watson [cjwat...@ubuntu.c