On 03/08/13 04:35, Kevin O'Connor wrote:
> On Thu, Mar 07, 2013 at 09:43:04AM +0100, Aurelien Jarno wrote:
>> On Wed, Mar 06, 2013 at 07:53:51PM -0500, Kevin O'Connor wrote:
>>> That change is definitely just build related - I don't see how it
>>> could impact the final SeaBIOS binary. How did you conclude that this
>>> commit is what fixes the issue?
>>>
>>
>> I did a git bisect to find the commit fixing the issue. Then, as I
>> did not believe the result, I tried the following sequence a dozen
>> times (for some unknown reason the FreeBSD install CD doesn't exhibit
>> the issue, so I used the Debian GNU/kFreeBSD installer):
> [...]
>
> Thanks for the detailed bug report. Here's what I see going on:
>
> - the SeaBIOS 4219149a commit does change the resulting binary ever so
>   slightly - the src/virtio_ring.c code has a reference to __FILE__
>   (the only code in SeaBIOS that does that), and due to slightly
>   different build rules in this commit it evaluates to a slightly
>   different string.
>
> - the freebsd crash has nothing to do with 4219149a or
>   src/virtio_ring.c - instead, random changes in the seabios binary
>   layout can cause (or avoid) the crash. You can see this in action
>   by modifying seabios to have higher debug levels, commenting out
>   code, adding dprintf statements, etc.
>
> - the crash happens when freebsd attempts to emulate the bios code (!)
>   in order to determine the keyboard typematic rate (!). (See
>   sys/dev/atkbdc/atkbd.c.) Since SeaBIOS doesn't support the typematic
>   callback rate (int 0x16 ax=0x0306) this doesn't actually achieve
>   anything in practice were the call not to crash. However, a crash
>   does (sometimes) result.
>
> - the freebsd x86bios_get_pages() code is buggy (See
>   sys/compat/x86bios/x86bios.c). It attempts to check that its x86
>   emulator (!) doesn't access a page it hasn't mapped. However, it
>   does not check for the case where a two byte access spans two pages.
>   If the first page is mapped, but the second is not - splat. The
>   crash I've seen in QEMU had a two byte access to 0xffffff8000015fff
>   with the fault at 0xffffff8000016000.
>
> - I have not been able to determine why an attempt was made to access
>   a non-mapped page. My best guess is that the x86emu code (!) goes
>   off the deep end in all cases - just some cases lead it to the bug
>   above and other cases lead it to a more friendly termination.
>   (Recall that SeaBIOS doesn't support the typematic call anyway.) It
>   should be possible to track this down by adding debug statements to
>   the freebsd code if anyone is familiar with the freebsd kernel
>   compile-deploy-run cycle.
Great analysis!

Laszlo (sorry for the noise)