Re: Fixed ! Re: Interesting panic very early in the boot
On Thu, Jul 18, 2002 at 01:40:48PM -0400, Bosko Milekic wrote:
> As pointed out, the change was fairly bogus. There's a new change
> that should be committed soon that fixes the problem in a "sort of"
> less bogus way. When Peter gets around to reviewing it, it'll be
> committed and you shouldn't notice a difference.

Excellent. As I stated, all I wanted to report was that "the panic did
not occur again". This is good news in any event: the panic did not
seem to occur very frequently, which made it look harder to fix. I was
afraid that a fix might take a very long time, e.g. because the problem
does not occur so often on new hardware or some such. A correct fix is
certainly even better.

> As a point of reference, however, what hardware do you have this
> running on? Specifically, what board, CPUs, how many, and how much
> RAM do you have?

Full specs:

- Shuttle Spacewalker HOT-637/P motherboard with Intel 440 LX chipset,
  UP. Has ISA, PCI and AGP slots, doesn't support ACPI (in any
  meaningful way).
- Intel Pentium II 233 MHz (Klamath) CPU, Slot 1
- 128 megs of SDRAM (non-DDR) in two 64 meg units
- Two ATA HDDs, ATAPI CD-ROM, PCI network card (Realtek 8029), ISA PnP
  SB 64 AWE sound, S3 Virge GX2 AGP video card, just in case. No SCSI.

--
Regards:

Szilveszter ADAM
Szombathely Hungary

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Fixed ! Re: Interesting panic very early in the boot
On Thu, Jul 18, 2002 at 07:38:39PM +0200, Szilveszter Adam wrote:
> Hello everybody,
>
> I would like to report that after incorporating today's fixes into the
> kernel source and recompiling, the panic does not occur again.
>
> This is probably due to the commit to pmap.c (rev 1.345 by Peter Wemm).
> Although the log only talks about SMP, this UP box likes it too.
>
> So anyway, thanks for fixing this, and anybody who used the
> "DISABLE_PG_G" workaround can now switch that off.
>
> Happy hacking!
> --
> Regards:
>
> Szilveszter ADAM
> Szombathely Hungary

As pointed out, the change was fairly bogus. There's a new change
that should be committed soon that fixes the problem in a "sort of"
less bogus way. When Peter gets around to reviewing it, it'll be
committed and you shouldn't notice a difference.

As a point of reference, however, what hardware do you have this
running on? Specifically, what board, CPUs, how many, and how much
RAM do you have?

Thanks,
--
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Fixed ! Re: Interesting panic very early in the boot
Hello everybody,

I would like to report that after incorporating today's fixes into the
kernel source and recompiling, the panic does not occur again.

This is probably due to the commit to pmap.c (rev 1.345 by Peter Wemm).
Although the log only talks about SMP, this UP box likes it too.

So anyway, thanks for fixing this, and anybody who used the
"DISABLE_PG_G" workaround can now switch that off.

Happy hacking!
--
Regards:

Szilveszter ADAM
Szombathely Hungary
Re: Interesting panic very early in the boot
> That there could be a real error in that code surprises me, since
> Peter really knows what he's doing, even if that low in the
> hardware, there are undocumented interactions that even Intel's
> errata doesn't seem to know about.

Turns out the workaround is to use DISABLE_PG_G. Two things
made me try this.

One: In his commit of pmap.c and locore.s on 7/12 7:56
Peter had this to say:

+--
|- Try and fix some very bogus PG_G and PG_PS interactions that were bad
|  enough to cause vm86 bios calls to break. vm86 depended on our existing
...
|New option: DISABLE_PG_G - In case I missed something.
+--

Two: cvs diff -r1.336 -r1.337 of i386/pmap.c shows that
#ifdef SMP was changed to #ifndef DISABLE_PG_G, and it is in
here that pgeflag is set (pgeflag is what guards the code at
the crash site!).

> set boot_ddb

Didn't do this.

> set boot_gdb

Did this. I thought the above two were mutually exclusive
options. boot_ddb should be renamed to start_in_debugger or
something. Though boot -d is what I was really looking for.

> higher/later code, the root cause is actually in locore.s. Happy
> bug hunt! ;^).

Thanks, but it looks like I easily escaped from the hunt this time :-)

-- bakul
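The compile-time gating Bakul describes can be pictured with a small sketch. This is not the code from pmap.c: the function names choose_pgeflag and mark_global are hypothetical, and the shape of the #ifndef block is only inferred from the description in the message above. PG_G (PTE bit 8) and the CPUID PGE feature bit (EDX bit 13) are the architectural x86 values. The variable is called pgeflag here, as in the pmap.c listing quoted elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PG_G      0x100u   /* x86 PTE "global" bit (bit 8) */
#define CPUID_PGE 0x2000u  /* CPUID.1:EDX bit 13: global pages supported */

/*
 * HYPOTHETICAL helper: pick the value pgeflag would get at bootstrap.
 * Building with -DDISABLE_PG_G compiles the assignment out, so pgeflag
 * stays 0 and no kernel page is ever marked global.
 */
static inline uint32_t choose_pgeflag(uint32_t cpu_feature)
{
#ifndef DISABLE_PG_G
    if (cpu_feature & CPUID_PGE)
        return PG_G;
#endif
    (void)cpu_feature;      /* unused when DISABLE_PG_G is defined */
    return 0;
}

/* Mirrors the loop body "if (*pte) *pte |= pgeflag;" for one entry. */
static inline uint32_t mark_global(uint32_t pte, uint32_t pgeflag)
{
    return pte ? (pte | pgeflag) : pte;
}

/* Sanity check: empty entries stay empty, valid ones gain PG_G. */
static inline void pg_g_selfcheck(void)
{
    assert(mark_global(0u, PG_G) == 0u);
    assert(mark_global(0x1u, PG_G) == 0x101u);
    assert(mark_global(0x1u, 0u) == 0x1u);
}
```

With pgeflag equal to 0 the OR is a no-op, which is consistent with the option acting as a workaround rather than a fix.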
Re: Interesting panic very early in the boot
Szilveszter Adam wrote:
> On Wed, Jul 17, 2002 at 11:35:35AM -0700, Terry Lambert wrote:
> > My bet on the root cause, if I am correct, means that if you change
> > the amount of physical RAM installed in the machine, the problem will
> > go away, and that the problem is probably rare because it depends on
> > certain things that are more complicated, after Matt's changes after
> > my complaints about machdep.c reservations on large memory machines,
> > as the amount of physical RAM approaches the size of the address
> > space.
>
> 'Key, I can try that too. However, this machine is anything but "large
> memory" these days: it has 128 Megs of (non-DDR) SDRAM. (2x64)

It used to be that you would see the problem on large memory
machines. Matt's changes have likely converted this into a
discontinuous function.

Another way of checking this without cracking the hood on the machine
is to crank up the value of "maxfiles". For complicated reasons, it's
best to put this number at 50,000 or higher, if you can still boot
your machine with the prereservation of that much KVA space; the
result should be an alternative mask for the problem.

-- Terry
Re: Interesting panic very early in the boot
On Wed, Jul 17, 2002 at 11:35:35AM -0700, Terry Lambert wrote:
> My bet on the root cause, if I am correct, means that if you change
> the amount of physical RAM installed in the machine, the problem will
> go away, and that the problem is probably rare because it depends on
> certain things that are more complicated, after Matt's changes after
> my complaints about machdep.c reservations on large memory machines,
> as the amount of physical RAM approaches the size of the address
> space.

'Key, I can try that too. However, this machine is anything but "large
memory" these days: it has 128 Megs of (non-DDR) SDRAM. (2x64)

--
Regards:

Szilveszter ADAM
Szombathely Hungary
Re: Interesting panic very early in the boot
On Wed, Jul 17, 2002 at 07:38:22PM +0200, Szilveszter Adam wrote:
> Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
> am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
> converted to a "-march=pentiumpro" on the actual compile line. This may
> be the same for the PPro. This suggests a remote possibility that there
> might be a problem with this option, so this is the next thing that I am
> going to test.
>
> We'll see...

We did. It did not make a difference. Let's move on...

--
Regards:

Szilveszter ADAM
Szombathely Hungary
Re: Interesting panic very early in the boot
Szilveszter Adam wrote:
> Ok, so it's time to speculate a bit more about this.
>
> Although I have been seeing this panic since Sunday, only one other
> person has reported it so far. Although it may be that this is due to
> the fact that developers do not run up-to-date -CURRENT and hence do not
> see problems that are this "new" (and I bet that the tinderbox only
> tests building, but does not try to actually run the stuff), perhaps
> there is a different explanation.
>
> Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
> am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
> converted to a "-march=pentiumpro" on the actual compile line. This may
> be the same for the PPro. This suggests a remote possibility that there
> might be a problem with this option, so this is the next thing that I am
> going to test.
>
> We'll see...

My bet on the root cause, if I am correct, means that if you change
the amount of physical RAM installed in the machine, the problem will
go away, and that the problem is probably rare because it depends on
certain things that are more complicated, after Matt's changes after
my complaints about machdep.c reservations on large memory machines,
as the amount of physical RAM approaches the size of the address
space.

My suggestion for DISABLE_PSE was intended to mask the problem, not
fix it, because the root problem I was thinking of isn't really
related to that. That change masking the problem would have been
information. The change *not* masking the problem is *not*
information; i.e. the masking as a side effect would have been
proof of the theory, but lack of masking doesn't disprove the
theory.

Finding a gremlin body squished in the gears is proof of gremlins,
but not finding one doesn't prove the non-existence of gremlins... ;^).

-- Terry
Re: Interesting panic very early in the boot
Bakul Shah wrote:
> > I believe setting DISABLE_PSE in the config file and rebuilding
> > will make this go away.
>
> Terry, thanks for the suggestion but that didn't do it.

This surprises me. I thought it would be the result of Peter's
recent pmap changes, and the fact that there is a bad problem
with Intel processors with 4M and 4K page mappings existing in
the TLB caches for the same regions simultaneously, particularly
because invltlb() doesn't do what it's supposed to do when that
happens, and you have to be really, really sneaky to get it to
do the right thing.

FreeBSD was "accidentally sneaky" because it used to do things very
similar to an Intel example of how to do things, and that example
was similar to Intel's test code to make sure the processor was
functioning in its ICE configuration with the extra pins and
things.

That there could be a real error in that code surprises me, since
Peter really knows what he's doing, even if that low in the
hardware, there are undocumented interactions that even Intel's
errata doesn't seem to know about.

> Time to review recent changes and single step the kernel.
> BTW, how do you stop the kernel before it panics? It
> panics so early that there is no time for sending a break.

There is a boot option that breaks directly to the debugger;
here is the information from /usr/src/sys/boot/common/help.common:

    set boot_ddb

        Instructs the kernel to start in the DDB debugger, rather than
        proceeding to initialise when booted.

    set boot_gdb

        Selects gdb-remote mode for the kernel debugger by default.

It should work in this case, because the place you are having
the problem is after the DDB (BDE) debugger has been initialized,
after the kernel has made the hand crafted virtual memory mirror
the physical memory, and jumped into protected mode, and that's
about the earliest you can start stepping through the assembly
code with the debugger.

To use these, you should hit space to interrupt the boot after
the kernel is loaded, while it's in the boot count-down, and
just type the commands manually. You could do it earlier, but
I don't know what your loader configuration files look like
(and, no, I don't want you to send them to me... 8-)).

In a general sense, you should act like the debugger exists
outside the kernel and VM system, and concentrate on what the
system is trying to do at this point, and why, and how the
code does or doesn't do that.

As a starting point, it's important to know that this deep in
the code, the virtual image is *supposed* to match the physical
image, except for the extra content of pages used for the
management of the virtual memory itself. That suggests to me
that while the symptom is in some higher/later code, the root
cause is actually in locore.s.

Happy bug hunt! ;^).

-- Terry
Re: Interesting panic very early in the boot
On Wed, Jul 17, 2002 at 09:57:41AM -0700, Bakul Shah wrote:

Terry suggested:
> > I believe setting DISABLE_PSE in the config file and rebuilding
> > will make this go away.
>
> Terry, thanks for the suggestion but that didn't do it.

Ok, so it's time to speculate a bit more about this.

Although I have been seeing this panic since Sunday, only one other
person has reported it so far. Although it may be that this is due to
the fact that developers do not run up-to-date -CURRENT and hence do not
see problems that are this "new" (and I bet that the tinderbox only
tests building, but does not try to actually run the stuff), perhaps
there is a different explanation.

Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
converted to a "-march=pentiumpro" on the actual compile line. This may
be the same for the PPro. This suggests a remote possibility that there
might be a problem with this option, so this is the next thing that I am
going to test.

We'll see...

--
Regards:

Szilveszter ADAM
Szombathely Hungary
Re: Interesting panic very early in the boot
> I believe setting DISABLE_PSE in the config file and rebuilding
> will make this go away.

Terry, thanks for the suggestion but that didn't do it.

Time to review recent changes and single step the kernel.
BTW, how do you stop the kernel before it panics? It
panics so early that there is no time for sending a break.
Re: Interesting panic very early in the boot
Bakul Shah wrote:
> I've run into a very similar bug -- the kernel panics almost
> right after it is started by the loader. With remote gdb
> I've traced it to this point so far:

I believe setting DISABLE_PSE in the config file and rebuilding
will make this go away.

-- Terry
Re: Interesting panic very early in the boot
I've run into a very similar bug -- the kernel panics almost
right after it is started by the loader. With remote gdb
I've traced it to this point so far:

(kgdb) target remote /dev/cuaa0
Remote debugging using /dev/cuaa0
pmap_set_opt () at /usr/src/sys/i386/i386/pmap.c:449
449                     if (*pte)
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint
(kgdb) where
#0  pmap_set_opt () at /usr/src/sys/i386/i386/pmap.c:449
#1  0xc0307c64 in pmap_bootstrap (firstaddr=3146924, loadaddr=0)
    at /usr/src/sys/i386/i386/pmap.c:403
#2  0xc03056b2 in getmemsize (first=4947968)
    at /usr/src/sys/i386/i386/machdep.c:1473
#3  0xc0305e2f in init386 (first=4947968)
    at /usr/src/sys/i386/i386/machdep.c:1817
(kgdb) l
444             /* Turn on PG_G for text, data, bss pages. */
445             va = (vm_offset_t)btext;
446             endva = KERNBASE + KERNend;
447             while (va < endva) {
448                     pte = vtopte(va);
449                     if (*pte)
450                             *pte |= pgeflag;
451                     va += PAGE_SIZE;
452             }
453             invltlb();      /* Insurance */
(kgdb) p/x va
$2 = 0xc012be70

I can't get to pte for some reason. So hand computing vtopte(va)
we get

(kgdb) p/x btext
$3 = 0xc012be70
(kgdb) p PTmap
$7 = 0xbfc0
(kgdb) p/x PTmap+0xc012b
$8 = 0xbff004ac

This address matches the page fault address. It is a supervisor
read, protection violation fault.

More details: This is with today's (July 16) kernel (synced at
about 5PM PDT) on a PPro system. This system can take two
PPros but I've plugged in just one Pentium Pro. It has 64MB
ECC memory.

I'll continue investigating but I haven't been in this part
of the code for ages, hence the call for help! If it matters, a
kernel built from sources before the KSE changes works fine
on this machine.

Thanks for any hints.

-- bakul
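The hand computation of vtopte(va) in the message above can be written out as plain arithmetic. This is a minimal sketch, not the kernel macro: the helper name vtopte_addr is hypothetical, it assumes non-PAE i386 paging (4 KB pages, 4-byte page-table entries), and it assumes PTmap sits at 0xbfc00000, which is the base implied by the printed result 0xbff004ac (the archived gdb output shows the value only as 0xbfc0).

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE i386 paging: 4 KB pages, 4-byte page-table entries. */
#define PAGE_SHIFT 12u
#define PTE_SIZE   4u

/*
 * ASSUMPTION: base of the page-table window (PTmap). 0xbfc00000 is
 * derived from the gdb result 0xbff004ac, not read from a live kernel.
 */
#define PTMAP_BASE 0xbfc00000u

/* HYPOTHETICAL helper: address of the PTE that maps va. */
static inline uint32_t vtopte_addr(uint32_t va)
{
    uint32_t page_index = va >> PAGE_SHIFT;     /* 0xc012b for btext */

    /* gdb's "PTmap+0xc012b" scales by the entry size, as done here. */
    return PTMAP_BASE + page_index * PTE_SIZE;
}

/* Reproduces the value gdb printed for "p/x PTmap+0xc012b". */
static inline void vtopte_selfcheck(void)
{
    assert(vtopte_addr(0xc012be70u) == 0xbff004acu);
}
```

The index 0xc012b times 4 is 0x3004ac, so the PTE for btext lands 0x3004ac bytes into the window, which is the address gdb reported.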
Re: Interesting panic very early in the boot
You can use addr2line to get info, but at a pinch you can just use
nm -n to figure out what function each address is in.

On Sun, 14 Jul 2002, Szilveszter Adam wrote:

> Hello everybody,
>
> I have recently finished upgrading my system to this morning's
> -CURRENT, with sources just *before* the commit of rev 1.154 to
> src/sys/kern/kern_fork.c by Julian.
>
> I have an UP IA32 machine, I am not using any additional kernel modules,
> and now, upon rebooting with the new kernel, as soon as I allow it to
> continue from the loader prompt, the kernel greets me with this:
>
> (No serial console, transcribed by hand, please excuse any typos)
>
> Fatal trap 12: page fault within kernel mode
>
> fault virtual address   = 0xbff004c0
> fault code              = supervisor read, protection violation
> instruction pointer     = 0x8:0xc035c348
> stack pointer           = 0x10:0xc0532c08
> frame pointer           = 0x10:0xc0532c10
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL=0
> current process         = 0 ()
>
> kernel: type 12 trap, code=0
> Stopped at 0xc035c348: cmpl $0,0xbfc0(%eax)
>
> Unfortunately, there is precious little I could extract from ddb after
> this.
>
> ddb> ps
> pid proc addr uid ppid pgrp flag stat wmesg wchan cmd
> 0 c03f00c0 c053 0 0 000 New
>
> ddb> tr
> (null)(c0418920,c080,537000,c0532d48,c03595bd) at 0xc035c348
> (null)(537000,0,c0532c9c,c0532ce8,10) at 0xc035c290
> (null)(537000,c0352524,f,0,8) at 0xc03595bd
> (null)(537000) at 0xc0359fb9
> (null)() at 0xc0130c7d
>
> An attempt to "show locks" resulted in:
>
> witness_list: witness_cold
>
> Fatal trap 3 breakpoint instruction fault while in kernel mode
>
> An attempt to "show witness" resulted in:
>
> witness_display: witness_cold
>
> Uptime 1s
> and a complete lockup, only a power-cycle helped.
>
> No dump was taken.
>
> Does this ring a bell with anyone? I know that the trace may not help
> much...
>
> I will be only too glad to offer any information or testing that may be
> needed.
>
> --
> Regards:
>
> Szilveszter ADAM
> Szombathely Hungary
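The nm -n suggestion works because nm -n lists symbols sorted by address: the function containing an address is the last symbol that starts at or below it. A minimal sketch of that lookup follows. The struct, the function name sym_lookup and any table passed to it are hypothetical illustrations; nothing here is taken from the actual kernel's symbol list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of "nm -n" output: start address and symbol name. */
struct sym {
    uint32_t    addr;
    const char *name;
};

/*
 * HYPOTHETICAL helper: return the last symbol whose start address is
 * <= pc, or NULL if pc lies below the first symbol. The table must be
 * sorted by ascending address, which is the order nm -n prints.
 */
static inline const struct sym *
sym_lookup(const struct sym *tab, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= pc)
            lo = mid + 1;           /* candidate; look further right */
        else
            hi = mid;               /* starts above pc; look left */
    }
    /* lo now counts the entries with addr <= pc. */
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

Fed the sorted symbol addresses of the panicking kernel, each "at 0x..." value from the ddb trace would resolve to the function that contains it.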
Re: Interesting panic very early in the boot
On Sun, Jul 14, 2002 at 08:06:49PM +0200, Szilveszter Adam wrote:
> On Sun, Jul 14, 2002 at 07:49:57PM +0200, Szilveszter Adam wrote:
> > Hello everybody,
> >
> > I have recently finished upgrading my system to this morning's
> > -CURRENT, with sources just *before* the commit of rev 1.154 to
> > src/sys/kern/kern_fork.c by Julian.
> >
> > I have an UP IA32 machine, I am not using any additional kernel modules,
> > and now, upon rebooting with the new kernel, as soon as I allow it to
> > continue from the loader prompt, the kernel greets me with this:
> > <...>
>
> Sorry, I should have said that I have ACPI compiled into the kernel, but
> it is apparently not supported by the motherboard. Will try without it
> next.

I upgraded the kernel source and removed ACPI from the config, but still
no joy. Something fishy going on here...

--
Regards:

Szilveszter ADAM
Szombathely Hungary
Re: Interesting panic very early in the boot
On Sun, Jul 14, 2002 at 07:49:57PM +0200, Szilveszter Adam wrote:
> Hello everybody,
>
> I have recently finished upgrading my system to this morning's
> -CURRENT, with sources just *before* the commit of rev 1.154 to
> src/sys/kern/kern_fork.c by Julian.
>
> I have an UP IA32 machine, I am not using any additional kernel modules,
> and now, upon rebooting with the new kernel, as soon as I allow it to
> continue from the loader prompt, the kernel greets me with this:
<...>

Sorry, I should have said that I have ACPI compiled into the kernel, but
it is apparently not supported by the motherboard. Will try without it
next.

--
Regards:

Szilveszter ADAM
Szombathely Hungary
Interesting panic very early in the boot
Hello everybody,

I have recently finished upgrading my system to this morning's
-CURRENT, with sources just *before* the commit of rev 1.154 to
src/sys/kern/kern_fork.c by Julian.

I have an UP IA32 machine, I am not using any additional kernel modules,
and now, upon rebooting with the new kernel, as soon as I allow it to
continue from the loader prompt, the kernel greets me with this:

(No serial console, transcribed by hand, please excuse any typos)

Fatal trap 12: page fault within kernel mode

fault virtual address   = 0xbff004c0
fault code              = supervisor read, protection violation
instruction pointer     = 0x8:0xc035c348
stack pointer           = 0x10:0xc0532c08
frame pointer           = 0x10:0xc0532c10
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 0 ()

kernel: type 12 trap, code=0
Stopped at 0xc035c348: cmpl $0,0xbfc0(%eax)

Unfortunately, there is precious little I could extract from ddb after
this.

ddb> ps
pid proc addr uid ppid pgrp flag stat wmesg wchan cmd
0 c03f00c0 c053 0 0 000 New

ddb> tr
(null)(c0418920,c080,537000,c0532d48,c03595bd) at 0xc035c348
(null)(537000,0,c0532c9c,c0532ce8,10) at 0xc035c290
(null)(537000,c0352524,f,0,8) at 0xc03595bd
(null)(537000) at 0xc0359fb9
(null)() at 0xc0130c7d

An attempt to "show locks" resulted in:

witness_list: witness_cold

Fatal trap 3 breakpoint instruction fault while in kernel mode

An attempt to "show witness" resulted in:

witness_display: witness_cold

Uptime 1s
and a complete lockup, only a power-cycle helped.

No dump was taken.

Does this ring a bell with anyone? I know that the trace may not help
much...

I will be only too glad to offer any information or testing that may be
needed.

--
Regards:

Szilveszter ADAM
Szombathely Hungary
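The fault address in the trap report can be related back to a kernel virtual address, on the assumption that the faulting read goes through the page-table window (PTmap) discussed elsewhere in this thread. The sketch below is an illustration under stated assumptions, not kernel code: the helper name pte_addr_to_va is hypothetical, the window base 0xbfc00000 is inferred (the archived trap output shows the displacement only as 0xbfc0), and non-PAE i386 paging with 4 KB pages and 4-byte entries is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE i386 paging: 4 KB pages, 4-byte page-table entries. */
#define PAGE_SHIFT  12u
#define PTE_SIZE    4u

/* ASSUMPTION: base and size of the page-table window (PTmap). */
#define PTMAP_BASE  0xbfc00000u
#define PTMAP_SIZE  0x00400000u   /* 2^20 entries * 4 bytes = 4 MB */

/*
 * HYPOTHETICAL helper: given the address of a page-table entry inside
 * the window, return the page-aligned virtual address that entry maps.
 */
static inline uint32_t pte_addr_to_va(uint32_t pte_addr)
{
    uint32_t page_index;

    /* The address must lie inside the 4 MB window. */
    assert(pte_addr >= PTMAP_BASE);
    assert(pte_addr - PTMAP_BASE < PTMAP_SIZE);

    page_index = (pte_addr - PTMAP_BASE) / PTE_SIZE;
    return page_index << PAGE_SHIFT;
}
```

Under those assumptions, the reported fault address 0xbff004c0 is the entry for the page at 0xc0130000, the same page as the bottom frame of the ddb trace (0xc0130c7d).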