So Mozilla broke on my laptop one day. I first realized there was a problem when my friend Kelly tried to use it for Gmail, and clicking on a certain link reliably crashed the browser.
(Unfortunately, Kelly didn't realize that when the dialog box popped up and said, "Start New Session or Restore Previous Session?", the "Start New Session" button should actually have been labeled "Discard All of the Web Pages Kragen Previously Had Open, Making Him Very Sad".)

There had actually been an incident a few days earlier in which some unknown page crashed the browser. After a few "Restore Session"s, the browser had forgotten most of the pages I had previously had open. But I figured that this was not likely to re-occur.

But then it did. After Kelly's problem, there was a page on www.cnn.com that crashed the browser every time I tried to view it. I edited the "sessionstore.js" file (in .mozilla/firefox/m9e6kquo.default/) to change all the recent URLs to something that wouldn't work --- mostly by adding "/broken" in the URL path, e.g. http://www.example.com/foo/bar would become http://www.example.com/broken/foo/bar. This let me figure out which page caused the problem.

At this point, I figured I was running into a browser bug that probably every Firefox user had in a latent form, maybe one triggered by my own unusual configuration, and that was only appearing now due to something having changed on the web. I asked my friend Meredith Patterson to look at the page on her Mac OS X browser, but she had no trouble.

Tracking it down
----------------

I saved the CNN page to my local machine (using Konqueror as a web browser) and found that it still crashed. My next priority was to cut down the crash to a small, reproducible case, so I cut out all the external links --- inline images, stylesheets, and the like --- by rewriting "http://" links as "xhttp://" links, which were guaranteed to not work. It stopped crashing, so I changed some of the "xhttp://" things back to "http://" until I found one that made the browser crash again.
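
The "xhttp://" trick was done by hand in an editor, but a small script could do the same thing. What follows is only a rough sketch in Python, not what was actually run: the function names and the a.example / b.example hosts are made up for illustration, and it only looks at plain "http://" links (not "https://").

```python
import re

# Matches a link whether it is currently enabled ("http://")
# or already disabled ("xhttp://").
LINK = re.compile(r"x?http://")


def disable_all(html):
    """Rewrite every http:// link as xhttp://, which a browser won't fetch."""
    return LINK.sub("xhttp://", html)


def enable_only(html, keep):
    """Leave only the links whose 0-based position is in `keep` enabled."""
    index = -1

    def swap(match):
        nonlocal index
        index += 1
        return "http://" if index in keep else "xhttp://"

    return LINK.sub(swap, html)


if __name__ == "__main__":
    # Hypothetical two-link page standing in for the saved CNN page.
    page = ('<img src="http://a.example/i.png">'
            '<script src="http://b.example/main.js"></script>')
    print(disable_all(page))
    print(enable_only(disable_all(page), {1}))
```

Calling enable_only with a growing or shrinking set of positions gives the same "turn a few back on and retry" loop described above, one browser restart per attempt.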

I tried running gdb on the Firefox process at this point, but all it told me was that it crashed while trying to execute a nonsense address. No clue about where it jumped to the nonsense address from, or why, although it did make me feel that the bug was probably fairly serious and possibly an exploitable security hole. Eventually, though, I gave up on the gdb approach.

It turned out that the only external thing needed to reproduce the crash was a single external JavaScript file, main.js, and the contents of the page. (I tried an HTML file with just a <script> tag to run main.js, but it didn't crash the browser.) So I saved a local copy of the file, reproduced the crash with the page using the local copy, and then added an "x" to the beginning of each function named in the file so that they wouldn't be called; this rendered the file harmless.

Then, I started removing the "x"es one by one to see where the crash happened, but I didn't have to go very far --- the first function I re-enabled, called "cnnHandleCSIs", crashed the browser. So then I sprinkled "alert" calls through the function to see where it crashed, and it was on this line here:

    if(document.body && document.body.innerHTML && cnnUseDelayedCSI)

So I evaluated each of the three conditions in Jesse Ruderman's JavaScript shell, and found that just evaluating document.body.innerHTML was sufficient to crash the browser.

So I figured something was funky about this particular page that made it unsafe to evaluate document.body.innerHTML, and so I thought I'd see whether it was the page in its original form, or something unusual one of the scripts in the page was doing to it. I'd already disabled the links to all the external scripts except main.js, but I removed all the inline JavaScript as well --- but the page kept crashing the browser whenever I asked for document.body.innerHTML.
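
The step of adding an "x" to every function name in main.js, and then taking the "x"es off one at a time, could be scripted along these lines. This is a hypothetical sketch, not the edit that was actually made: it assumes the functions are written as plain "function name(...)" declarations, which may not match everything in the real file, and the helper names and the "other" function are invented for the example.

```python
import re

# A plain declaration such as "function cnnHandleCSIs(" (assumed form).
DECL = re.compile(r"\b(function\s+)([A-Za-z_$][\w$]*)")


def disable_functions(js):
    """Prefix every declared function name with "x" so existing calls miss it."""
    return DECL.sub(lambda m: m.group(1) + "x" + m.group(2), js)


def enable_function(js, name):
    """Remove the "x" prefix from a single function, re-enabling only that one."""
    pattern = re.compile(r"\b(function\s+)x" + re.escape(name) + r"\b")
    return pattern.sub(lambda m: m.group(1) + name, js)


if __name__ == "__main__":
    # Stand-in source; only the name cnnHandleCSIs comes from the real file.
    source = "function cnnHandleCSIs() { return 1; }\nfunction other(a) {}\n"
    harmless = disable_functions(source)
    print(harmless)
    print(enable_function(harmless, "cnnHandleCSIs"))
```

Re-enabling one name per run and reloading the page reproduces the one-by-one search described above.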

Verifying the package
---------------------

My next thought was to cut the page down bit by bit to figure out which feature or combination of features was sufficient to cause the crash. But at this point I whined about the situation to some of my friends, and Paul Visscher suggested that maybe my copy of the browser was corrupted, or something in my profile.

I tried with a fresh .mozilla directory (mv .mozilla .mozilla-tosave; restart browser; verify crash; rm -rf the newly created .mozilla; mv .mozilla-tosave .mozilla), and the browser still crashed, so my profile wasn't to blame.

So I installed the "debsums" package ("sudo apt-get install debsums"), which verifies files installed from Debian packages to see if they've been corrupted, and ran "debsums -s firefox":

    [EMAIL PROTECTED]:~$ debsums -s firefox
    debsums: checksum mismatch firefox file /usr/lib/firefox/components/libgklayout.so

Well, so one of the components of Firefox, a "shared object" or shared library, had been changed since installation --- fairly unusual! And it would make sense that if there were an error in that file, some normal-ish operations like "document.body.innerHTML" might crash.

So I ran "sudo apt-get install --reinstall firefox", and the problem went away; the browser no longer crashed on the pages that I'd been able to reproduce the problem with numerous times before, and debsums stopped complaining.

The Problem
-----------

I'd saved a copy of the broken libgklayout.so, and it turned out that it differed from the correct one only in a single bit. I opened the two files with "hexl-find-file" in Emacs and ran "ediff-buffers" to find the differences between the two, and the differences came down to this:

    Correct:
    002b1250: fea0 000f 8441 0200 0066 83fe 3e76 c166  .....A...f..>v.f
    Broken:
    002b1250: fea0 000f 8441 0200 1066 83fe 3e76 c166  .....A...f..>v.f

The broken file had a "10" byte where the correct file had a "00" byte, at offset 002b1258; hex 10 is hex 00 with exactly one bit set.
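
The comparison was done with Emacs, but the same check is easy to express in code. Here is a minimal sketch in Python that takes the two 16-byte rows from the dump above and reports which byte and which bit differ; the function name is made up, and the sketch assumes both inputs are the same length.

```python
def differing_bits(good, bad, base_offset=0):
    """List (offset, good_byte, bad_byte, flipped_bit_numbers) per differing byte.

    Bit 0 is the least significant bit. Assumes len(good) == len(bad).
    """
    diffs = []
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            xor = g ^ b  # bits set here are the ones that differ
            bits = [n for n in range(8) if (xor >> n) & 1]
            diffs.append((base_offset + i, g, b, bits))
    return diffs


if __name__ == "__main__":
    # The two rows at 002b1250 from the hex dump.
    good_row = bytes.fromhex("fea0 000f 8441 0200 0066 83fe 3e76 c166")
    bad_row = bytes.fromhex("fea0 000f 8441 0200 1066 83fe 3e76 c166")
    for offset, g, b, bits in differing_bits(good_row, bad_row, 0x2B1250):
        print(f"{offset:08x}: {g:02x} -> {b:02x}, flipped bits {bits}")
```

To compare the whole files rather than one row, the two byte strings could be read with open(path, "rb").read() instead.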

So somewhere along the line, I'd had a single-bit error in a disk file, one that probably shouldn't be getting modified during the normal course of operation, and it resulted in all of this frustration.

The Follow-Up
-------------

Naturally I wanted to see if there were any other packages that had also been corrupted, so I ran this command:

    debsums -s > all-debsums 2>&1

It found a number of checksum mismatches, like these:

    debsums: checksum mismatch aspell-en file /var/lib/aspell/en-common.rws
    debsums: checksum mismatch aspell-en file /var/lib/aspell/en-variant_0.rws
    debsums: checksum mismatch libpango1.0-common file /var/lib/pango/pango.modules
    debsums: checksum mismatch xfonts-scalable file /usr/share/fonts/X11/Type1/fonts.dir

There were a lot of packages that didn't have any checksums:

    debsums: no md5sums for binutils

And it had one permission problem:

    debsums: can't open apache2-common file /usr/lib/apache2/suexec2 (Permission denied)

I'm not sure why aspell's files are changed; they aren't documented, as far as I can tell, and they're in a binary format. The other checksum-mismatch files are all "lists of installed stuff", and so it's OK if they change; they probably should be treated differently by the packaging system.

Anyway, I don't know whether any of the rest of my disk is corrupted, but debsums didn't find any other problems.
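
With a long all-debsums file, it helps to sort the three kinds of message into separate piles. The following is a small hypothetical sketch in Python; the patterns are written only against the message wording quoted above, so other debsums versions may print something different, and the function and key names are my own.

```python
import re

# Patterns based solely on the sample messages quoted above (an assumption).
MISMATCH = re.compile(r"^debsums: checksum mismatch (\S+) file (.+)$")
NO_SUMS = re.compile(r"^debsums: no md5sums for (\S+)$")
CANT_OPEN = re.compile(r"^debsums: can't open (\S+) file (\S+) \((.+)\)$")


def classify(lines):
    """Group debsums messages into mismatches, missing checksums, and unreadable files."""
    report = {"mismatch": [], "no_md5sums": [], "unreadable": [], "other": []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = MISMATCH.match(line)
        if m:
            report["mismatch"].append((m.group(1), m.group(2)))
            continue
        m = NO_SUMS.match(line)
        if m:
            report["no_md5sums"].append(m.group(1))
            continue
        m = CANT_OPEN.match(line)
        if m:
            report["unreadable"].append((m.group(1), m.group(2), m.group(3)))
            continue
        report["other"].append(line)  # anything the patterns don't recognize
    return report


if __name__ == "__main__":
    with open("all-debsums") as f:
        for kind, entries in classify(f).items():
            print(kind, len(entries))
```

Only the "mismatch" pile needs a closer look; files that legitimately change after installation, like the font and module lists above, can then be ruled out by eye.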