So Mozilla broke on my laptop one day.  I first realized there was a
problem when my friend Kelly tried to use it for Gmail, and clicking
on a certain link reliably crashed the browser.

    (Unfortunately, Kelly didn't realize that when the dialog box
    popped up and said, "Start New Session or Restore Previous
    Session?", the "Start New Session" button should actually have
    been labeled "Discard All of the Web Pages Kragen Previously Had
    Open, Making Him Very Sad".)

There had actually been an incident a few days earlier in which some
unknown page crashed the browser.  After a few "Restore Session"s, the
browser had forgotten most of the pages I had previously had open.
But I figured that this was not likely to re-occur.

But then it did.  After Kelly's problem, there was a page on
www.cnn.com that crashed the browser every time I tried to view it.  I
edited the "sessionstore.js" file (in
.mozilla/firefox/m9e6kquo.default/) to change all the recent URLs to
something that wouldn't work --- mostly by adding "/broken" in the URL
path, e.g. http://www.example.com/foo/bar would become
http://www.example.com/broken/foo/bar.  This let me figure out which
page caused the problem.
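
The "/broken" rewrite could also be done mechanically rather than by hand.  Here is a minimal Python sketch of the idea.  The function names are made up for illustration, and it treats the session file as plain text with URLs in it, which is an assumption; the real sessionstore.js may escape or structure URLs differently.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def break_url(url, marker="/broken"):
    """Insert a marker at the start of the URL path so the page won't load."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=marker + parts.path))

def break_urls_in_text(text):
    """Apply break_url to every http/https URL found in a blob of text."""
    # Hypothetical URL pattern: stops at whitespace, quotes, and angle brackets.
    return re.sub(r'https?://[^\s"\'<>]+', lambda m: break_url(m.group(0)), text)
```

Breaking every URL first and then restoring them a few at a time narrows down which page triggers the crash.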

At this point, I figured I was running into a browser bug that
probably every Firefox user had in a latent form, maybe one triggered
by my own unusual configuration, and that was only appearing now due
to something having changed on the web.  I asked my friend Meredith
Patterson to look at the page on her Mac OS X browser, but she had no
trouble.

Tracking it down
----------------

I saved the CNN page to my local machine (using Konqueror as a web
browser) and found that it still crashed.  My next priority was to cut
down the crash to a small, reproducible case, so I cut out all the
external links --- inline images, stylesheets, and the like --- by
rewriting "http://" links as "xhttp://" links, which were guaranteed
to not work.

It stopped crashing, so I changed some of the "xhttp://" things back
to "http://" until I found one that made the browser crash again.
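
The disable-everything-then-restore approach can be sketched in Python roughly as follows.  This is only an illustration of the technique, not the procedure actually used (that was done by hand in an editor); the helper names are hypothetical, and it assumes the saved page doesn't already contain the string "xhttp://".

```python
import re

def disable_external(html):
    """Turn every http:// reference into a dead xhttp:// one."""
    return html.replace("http://", "xhttp://")

def reenable(disabled_html, indices):
    """Restore only the disabled links whose position (0-based) is in indices."""
    wanted = set(indices)
    counter = iter(range(disabled_html.count("xhttp://")))

    def restore(match):
        # Each match consumes one position from the counter, in document order.
        return "http://" if next(counter) in wanted else match.group(0)

    return re.sub(r"xhttp://", restore, disabled_html)
```

Re-enabling half of the links at a time, rather than one by one, would find the culprit in fewer browser restarts.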

I tried running gdb on the Firefox process at this point, but all it
told me was that it crashed while trying to execute a nonsense
address.  No clue about where it jumped to the nonsense address from,
or why, although it did make me feel that the bug was probably fairly
serious and possibly an exploitable security hole.  Eventually,
though, I gave up on the gdb approach.

It turned out that the only external resource needed to reproduce the
crash was a single JavaScript file, main.js, together with the contents
of the page.  (I tried an HTML file with just a <script> tag to run
main.js, but it didn't crash the browser.)  So I saved a local copy of
the file, reproduced the crash with the page using the local copy, and
then added an "x" to the beginning of each function named in the file
so that they wouldn't be called; this rendered the file harmless.
Then, I started removing the "x"es one by one to see where the crash
happened, but I didn't have to go very far --- the first function I
re-enabled, called "cnnHandleCSIs", crashed the browser.

So then I sprinkled "alert" calls through the function to see where it
crashed, and it was on this line here:

    if(document.body && document.body.innerHTML && cnnUseDelayedCSI)

So I evaluated each of the three conditions in Jesse Ruderman's
JavaScript shell, and found that just evaluating
document.body.innerHTML was sufficient to crash the browser.

So I figured something was funky about this particular page that made
it unsafe to evaluate document.body.innerHTML, and so I thought I'd
see whether it was the page in its original form, or something unusual
one of the scripts in the page was doing to it.  I'd already disabled
the links to all the external scripts except main.js, but I removed
all the inline JavaScript as well --- but the page kept crashing the
browser whenever I asked for document.body.innerHTML.

Verifying the package
---------------------

My next thought was to cut the page down bit by bit to figure out
which feature or combination of features was sufficient to cause the
crash.  But at this point I whined about the situation to some of my
friends, and Paul Visscher suggested that maybe my copy of the browser
was corrupted, or something in my profile.  I tried with a fresh
.mozilla directory (mv .mozilla .mozilla-tosave; restart browser;
verify crash; rm -rf the newly created .mozilla; mv .mozilla-tosave
.mozilla), and the browser still crashed, which ruled out the profile.

So I installed the "debsums" package ("sudo apt-get install debsums"),
which verifies files installed from Debian packages to see if they've
been corrupted, and ran "debsums -s firefox":

    $ debsums -s firefox
    debsums: checksum mismatch firefox file
        /usr/lib/firefox/components/libgklayout.so

Well, so one of the components of Firefox, a "shared object" or shared
library, had been changed since installation --- fairly unusual!  And
it would make sense that if there were an error in that file, some
normal-ish operations like "document.body.innerHTML" might crash.

So I ran "sudo apt-get install --reinstall firefox", and the problem
went away; the browser no longer crashed on the pages that I'd been
able to reproduce the problem with numerous times before, and debsums
stopped complaining.
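
For the curious, the kind of check debsums performs can be approximated in a few lines of Python.  This is a sketch, not debsums itself: it assumes the checksum list has the "md5hash  relative/path" layout that dpkg keeps in files such as /var/lib/dpkg/info/firefox.md5sums, the function names are made up, and it reports a missing file the same way as a mismatch.

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_mismatches(md5sums_text, root="/"):
    """Return the files whose current MD5 differs from the recorded one."""
    mismatches = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        expected, relpath = line.split(None, 1)
        path = Path(root) / relpath.strip()
        # Assumption: a file that has gone missing counts as a mismatch too.
        if not path.is_file() or md5_of(path) != expected:
            mismatches.append(str(path))
    return mismatches
```

Calling find_mismatches on the text of a package's md5sums file would list the files whose contents no longer match.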

The Problem
-----------

I saved a copy of the broken libgklayout.so, and it turned out that it
differed from the correct one only in a single bit.  I opened the two
files with "hexl-find-file" in Emacs and ran "ediff-buffers" to find
the differences between the two, and the differences came down to
this:

Correct:
    002b1250: fea0 000f 8441 0200 0066 83fe 3e76 c166  .....A...f..>v.f
Broken:
    002b1250: fea0 000f 8441 0200 1066 83fe 3e76 c166  .....A...f..>v.f

The broken file had a "10" byte where the correct file had a "00"
byte.  So somewhere along the line, I'd had a single-bit error in a
disk file, one that probably shouldn't be getting modified during the
normal course of operation, and it resulted in all of this
frustration.
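
To double-check the "single bit" claim without squinting at hex dumps, XOR the two byte sequences and list the bits that are set.  Here is a small Python sketch using the sixteen bytes from the dump above; the function name is made up for illustration.

```python
def bit_differences(good, bad, base_offset=0):
    """Return (file offset, bit number) for every bit that differs.

    Bit 0 is the least significant bit of the byte.
    """
    diffs = []
    for i, (a, b) in enumerate(zip(good, bad)):
        changed = a ^ b  # set bits mark the positions that differ
        for bit in range(8):
            if changed & (1 << bit):
                diffs.append((base_offset + i, bit))
    return diffs

# The sixteen bytes at offset 0x002b1250 from the two hexl dumps.
correct = bytes.fromhex("fea0000f84410200006683fe3e76c166")
broken = bytes.fromhex("fea0000f84410200106683fe3e76c166")

print(bit_differences(correct, broken, base_offset=0x2B1250))
```

For these bytes the only difference is bit 4 of the byte at offset 0x002b1258: 0x00 XOR 0x10 is 0x10, which has exactly one bit set.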

The Follow-Up
-------------

Naturally I wanted to see if there were any other packages that had
also been corrupted, so I ran this command:

    debsums -s > all-debsums 2>&1

It found a number of checksum mismatches, like these:

    debsums: checksum mismatch aspell-en file /var/lib/aspell/en-common.rws
    debsums: checksum mismatch aspell-en file /var/lib/aspell/en-variant_0.rws
    debsums: checksum mismatch libpango1.0-common file
        /var/lib/pango/pango.modules
    debsums: checksum mismatch xfonts-scalable file
        /usr/share/fonts/X11/Type1/fonts.dir

There were a lot of packages that didn't have any checksums:

    debsums: no md5sums for binutils

And it had one permission problem:

    debsums: can't open apache2-common file
        /usr/lib/apache2/suexec2 (Permission denied)

I'm not sure why aspell's files are changed; they aren't documented,
as far as I can tell, and they're in binary.  The other
checksum-mismatch files are all "lists of installed stuff", and so
it's OK if they change; they probably should be treated differently by
the packaging system.
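
Sorting a long all-debsums file into these categories by eye is tedious, so here is a rough Python sketch that does it.  It assumes the three message shapes quoted above, treats indented lines as continuations of the previous message (as they are wrapped in this post), and uses a made-up function name.

```python
import re
from collections import defaultdict

def classify(report):
    """Group debsums messages into mismatches, missing md5sums, and the rest."""
    # Re-join wrapped continuation lines onto the message they belong to.
    messages = []
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("debsums:"):
            messages.append(stripped)
        elif stripped and messages:
            messages[-1] += " " + stripped

    groups = defaultdict(list)
    for msg in messages:
        m = re.match(r"debsums: checksum mismatch (\S+) file (\S+)$", msg)
        if m:
            groups["mismatch"].append(m.groups())  # (package, path)
            continue
        m = re.match(r"debsums: no md5sums for (\S+)$", msg)
        if m:
            groups["no_md5sums"].append(m.group(1))
            continue
        groups["other"].append(msg)  # e.g. permission problems
    return dict(groups)
```

The "mismatch" group is the one worth reading closely; the others say more about the packages than about the disk.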

Anyway, so I don't know if any of the rest of my disk is corrupted,
but debsums didn't find any other problems.
