Re: Fixed ! Re: Interesting panic very early in the boot

2002-07-18 Thread Szilveszter Adam

On Thu, Jul 18, 2002 at 01:40:48PM -0400, Bosko Milekic wrote:
>   As pointed out, the change was fairly bogus.  There's a new change
>   that should be committed soon that fixes the problem in a "sort of"
>   less bogus way.  When Peter gets around to reviewing it, it'll be
>   committed and you shouldn't notice a difference.

Excellent. As I stated, all I wanted to report was that "the panic did
not occur again". This is good news in any event, because the panic
did not seem to occur very frequently, which made it look more difficult
to fix. I was afraid it would perhaps take a very long time, e.g.
because it does not occur as often on newer hardware or some such. A
correct fix is certainly even better.

>   As a point of reference, however, what hardware do you have this
>   running on?  Specifically, what board, CPUs, how many, and how much
>   RAM do you have?

Full specs:

- Shuttle Spacewalker HOT-637/P motherboard with Intel 440LX chipset,
  UP. Has ISA, PCI and AGP slots; doesn't support ACPI (in any
  meaningful way).
- Intel Pentium II 233 MHz (Klamath) CPU, Slot 1
- 128 megs of SDRAM (non-DDR) in two 64 meg modules
- Two ATA HDDs, ATAPI CD-ROM, PCI network card (Realtek 8029), ISA PnP
  SB 64 AWE sound, S3 ViRGE GX2 AGP video card, just in case. No SCSI.

-- 
Regards:

Szilveszter ADAM
Szombathely Hungary

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Fixed ! Re: Interesting panic very early in the boot

2002-07-18 Thread Bosko Milekic


On Thu, Jul 18, 2002 at 07:38:39PM +0200, Szilveszter Adam wrote:
> Hello everybody,
> 
> I would like to report that after incorporating today's fixes into the
> kernel source and recompiling, the panic does not occur again.
> 
> This is probably due to the commit to pmap.c (rev 1.345 by Peter Wemm).
> Although the log only talks about SMP, this UP box likes it too.
> 
> So anyway, thanks for fixing this, and anybody who used the
> "DISABLE_PG_G" workaround can now switch that off.
> 
> Happy hacking!
> -- 
> Regards:
> 
> Szilveszter ADAM
> Szombathely Hungary

  As pointed out, the change was fairly bogus.  There's a new change
  that should be committed soon that fixes the problem in a "sort of"
  less bogus way.  When Peter gets around to reviewing it, it'll be
  committed and you shouldn't notice a difference.

  As a point of reference, however, what hardware do you have this
  running on?  Specifically, what board, CPUs, how many, and how much
  RAM do you have?

Thanks,
-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]





Fixed ! Re: Interesting panic very early in the boot

2002-07-18 Thread Szilveszter Adam

Hello everybody,

I would like to report that after incorporating today's fixes into the
kernel source and recompiling, the panic does not occur again.

This is probably due to the commit to pmap.c (rev 1.345 by Peter Wemm).
Although the log only talks about SMP, this UP box likes it too.

So anyway, thanks for fixing this, and anybody who used the
"DISABLE_PG_G" workaround can now switch that off.

Happy hacking!
-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Re: Interesting panic very early in the boot

2002-07-17 Thread Bakul Shah

> That there could be a real error in that code surprises me, since
> Peter really knows what he's doing, even though, that low in the
> hardware, there are undocumented interactions that even Intel's
> errata don't seem to know about.

Turns out the workaround is to use DISABLE_PG_G.

Two things made me try this.  One: In his commit of pmap.c
and locore.s on 7/12 7:56 Peter had this to say:

 +--
 |- Try and fix some very bogus PG_G and PG_PS interactions that were bad
 |  enough to cause vm86 bios calls to break.  vm86 depended on our existing
...
 |New option:  DISABLE_PG_G - In case I missed something.
 +--

Two: cvs diff -r1.336 -r1.337 of i386/pmap.c shows that
#ifdef SMP was changed to #ifndef DISABLE_PG_G, and it is in
here that pgeflag is set (pgeflag is what guards the code at
the crash site!).
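The guard Bakul describes can be sketched as follows. This is a hypothetical reconstruction, not the actual pmap.c source: it assumes pgeflag starts at 0 and is set to PG_G only when the CPU advertises PGE and DISABLE_PG_G is not defined. The helper name compute_pgeflag is invented for illustration, and the two constants are the architectural x86 bit values rather than copies of FreeBSD's headers.

```c
#include <stdint.h>

/* Architectural x86 values (assumed here, not taken from FreeBSD headers). */
#define PG_G      0x100u   /* PTE bit 8: global page, survives %cr3 reloads */
#define CPUID_PGE 0x2000u  /* CPUID leaf 1, EDX bit 13: PGE supported */

/*
 * Hypothetical helper: returns the bit that later code ORs into kernel
 * PTEs. Defining DISABLE_PG_G at compile time forces it to 0, so any
 * later "*pte |= pgeflag" becomes a no-op.
 */
static uint32_t compute_pgeflag(uint32_t cpu_feature)
{
	uint32_t pgeflag = 0;

#ifndef DISABLE_PG_G
	if (cpu_feature & CPUID_PGE)
		pgeflag = PG_G;
#endif
	return pgeflag;
}
```

Compiled normally, compute_pgeflag(CPUID_PGE) returns PG_G; compiled with -DDISABLE_PG_G it returns 0 whatever the CPU reports.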

> set boot_ddb

Didn't do this.

> set boot_gdb

Did this.  I thought the above two were mutually exclusive options.
boot_ddb should be renamed to start_in_debugger or something.

Though boot -d is what I was really looking for.

> higher/later code, the root cause is actually in locore.s.  Happy
> bug hunt!  ;^).

Thanks but looks like I easily escaped from the hunt this
time :-)

-- bakul




Re: Interesting panic very early in the boot

2002-07-17 Thread Terry Lambert

Szilveszter Adam wrote:
> On Wed, Jul 17, 2002 at 11:35:35AM -0700, Terry Lambert wrote:
> > If my bet on the root cause is correct, changing the amount of
> > physical RAM installed in the machine will make the problem go away.
> > The problem is probably rare because it depends on conditions that,
> > since Matt's changes (made after my complaints about machdep.c
> > reservations on large-memory machines), get more complicated as the
> > amount of physical RAM approaches the size of the address space.
> 
> 'Key, I can try that too. However, this machine is anything but "large
> memory" these days: it has 128 Megs of (non-DDR) SDRAM. (2x64)

It used to be that you would see the problem on large memory
machines.

Matt's changes have likely converted this into a discontinuous
function.

Another way of checking this without cracking the hood on the
machine is to crank up the value of "maxfiles".  For complicated
reasons, it's best to set it to 50,000 or higher, provided you
can still boot the machine with that much KVA space reserved up
front; the result should mask the problem in a different way.

-- Terry




Re: Interesting panic very early in the boot

2002-07-17 Thread Szilveszter Adam

On Wed, Jul 17, 2002 at 11:35:35AM -0700, Terry Lambert wrote:
> If my bet on the root cause is correct, changing the amount of
> physical RAM installed in the machine will make the problem go away.
> The problem is probably rare because it depends on conditions that,
> since Matt's changes (made after my complaints about machdep.c
> reservations on large-memory machines), get more complicated as the
> amount of physical RAM approaches the size of the address space.

'Key, I can try that too. However, this machine is anything but "large
memory" these days: it has 128 Megs of (non-DDR) SDRAM. (2x64)

-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Re: Interesting panic very early in the boot

2002-07-17 Thread Szilveszter Adam

On Wed, Jul 17, 2002 at 07:38:22PM +0200, Szilveszter Adam wrote:
> Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
> am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
> converted to a "-march=pentiumpro" on the actual compile line. This may
> be the same for the PPro. This suggests a remote possibility that there
> might be a problem with this option, so this is the next thing that I am
> going to test.
> 
> We'll see...

We did. It did not make a difference. 

Let's move on...
-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Re: Interesting panic very early in the boot

2002-07-17 Thread Terry Lambert

Szilveszter Adam wrote:
> Ok, so it's time to speculate a bit more about this.
> 
> Although I have been seeing this panic since Sunday, only one other
> person has reported it so far. Although it may be that this is due to
> the fact that developers do not run up-to-date -CURRENT and hence do not
> see problems that are this "new" (and I bet that the tinderbox only
> tests building, but does not try to actually run the stuff), perhaps
> there is a different explanation.
> 
> Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
> am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
> converted to a "-march=pentiumpro" on the actual compile line. This may
> be the same for the PPro. This suggests a remote possibility that there
> might be a problem with this option, so this is the next thing that I am
> going to test.
> 
> We'll see...

If my bet on the root cause is correct, changing the amount of
physical RAM installed in the machine will make the problem go away.
The problem is probably rare because it depends on conditions that,
since Matt's changes (made after my complaints about machdep.c
reservations on large-memory machines), get more complicated as the
amount of physical RAM approaches the size of the address space.

My suggestion for DISABLE_PSE was intended to mask the problem,
not fix it, because the root problem I was thinking of isn't
really related to that.

Had that change masked the problem, that would have been information.

The change *not* masking the problem is *not* information; i.e.
masking as a side effect would have been proof of the theory,
but lack of masking doesn't disprove the theory.

Finding a gremlin body squished in the gears is proof of gremlins,
but not finding one doesn't prove the non-existence of gremlins...
;^).

-- Terry




Re: Interesting panic very early in the boot

2002-07-17 Thread Terry Lambert

Bakul Shah wrote:
> > I believe setting DISABLE_PSE in the config file and rebuilding
> > will make this go away.
> 
> Terry, thanks for the suggestion but that didn't do it.

This surprises me.  I thought it would be the result of Peter's
recent pmap changes, and the fact that there is a bad problem
with Intel processors with 4M and 4K page mappings existing in
the TLB caches for the same regions simultaneously, particularly
because invltlb() doesn't do what it's supposed to do when that
happens, and you have to be really, really sneaky to get it to
do the right thing.  FreeBSD was "accidentally sneaky" because it
used to do things very similar to an Intel example of how to do
things, and that example was similar to Intel's test code to
make sure the processor was functioning in its ICE configuration
with the extra pins and things.

That there could be a real error in that code surprises me, since
Peter really knows what he's doing, even though, that low in the
hardware, there are undocumented interactions that even Intel's
errata don't seem to know about.
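Terry's remark that invltlb() does not always do what it is supposed to is easier to picture with a toy model of which TLB entries survive a flush. This sketch models only the documented x86 rule for global pages: with CR4.PGE enabled, a write to %cr3 discards non-global entries but keeps PG_G ones, while clearing and re-setting CR4.PGE discards everything. It does not model the 4M/4K aliasing Terry describes, it assumes (unverified against the 2002 source) that invltlb() was a plain %cr3 reload, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SLOTS 8

/* One cached translation in a toy TLB (page sizes are not modelled). */
struct tlb_entry {
	bool valid;
	bool global;           /* the PTE had PG_G set when it was cached */
	unsigned long vpn;     /* virtual page number */
	unsigned long pfn;     /* physical frame number */
};

struct tlb {
	struct tlb_entry e[TLB_SLOTS];
};

/* Cache a translation in the given slot. */
static void tlb_fill(struct tlb *t, size_t slot, unsigned long vpn,
    unsigned long pfn, bool global)
{
	assert(slot < TLB_SLOTS);
	t->e[slot] = (struct tlb_entry){ true, global, vpn, pfn };
}

/* Model of a %cr3 reload with PGE on: only non-global entries are dropped. */
static void tlb_reload_cr3(struct tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (!t->e[i].global)
			t->e[i].valid = false;
}

/* Model of clearing and then setting CR4.PGE: every entry is dropped. */
static void tlb_toggle_pge(struct tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		t->e[i].valid = false;
}

/* Number of entries that would still be used for translation. */
static size_t tlb_live(const struct tlb *t)
{
	size_t n = 0;

	for (size_t i = 0; i < TLB_SLOTS; i++)
		if (t->e[i].valid)
			n++;
	return n;
}
```

Filling one global and one non-global entry and then calling tlb_reload_cr3() leaves one live entry; only tlb_toggle_pge() empties the model. In this model, a stale global translation is invisible to a %cr3-reload flush until something forces a full invalidation.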


> Time to review recent changes and single step the kernel.
> BTW, how do you stop the kernel before it panics?  It
> panics so early that there is no time for sending a break.

There is a boot option that breaks directly to the debugger;
here is the information from /usr/src/sys/boot/common/help.common:

set boot_ddb

Instructs the kernel to start in the DDB debugger, rather than
proceeding to initialise when booted.

set boot_gdb

Selects gdb-remote mode for the kernel debugger by default.

It should work in this case, because the place where you are having
the problem comes after the DDB (BDE) debugger has been initialized,
after the kernel has made the hand-crafted virtual memory mirror
the physical memory and jumped into protected mode, and that's
about the earliest point at which you can start stepping through
the assembly code with the debugger.

To use these, you should hit space to interrupt the boot after
the kernel is loaded, while it's in the boot countdown, and
just type the commands manually.  You could do it earlier, but
I don't know what your loader configuration files look like
(and no, I don't want you to send them to me... 8-)).

In a general sense, you should act like the debugger exists
outside the kernel and VM system, and concentrate on what the
system is trying to do at this point, and why, and how the code
does or doesn't do that.  As a starting point, it's important
to know that this deep in the code, the virtual image is
*supposed* to match the physical image, except for the extra
content of pages used for the management of the virtual memory
itself.  That suggests to me that while the symptom is in some
higher/later code, the root cause is actually in locore.s.  Happy
bug hunt!  ;^).
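One way to read "the virtual image is supposed to match the physical image" is that, for the kernel image itself, the early mapping is a fixed offset. A minimal sketch, assuming the default i386 KERNBASE of 0xc0000000 (an assumption; the thread does not state the value), with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the default i386 KERNBASE of that era. */
#define KERNBASE 0xc0000000u

/*
 * Hypothetical helpers for the kernel image only: the early mapping is
 * taken to be a fixed offset, virtual = physical + KERNBASE.
 */
static uint32_t kva_to_pa(uint32_t va)
{
	assert(va >= KERNBASE);              /* not a kernel address otherwise */
	return va - KERNBASE;
}

static uint32_t pa_to_kva(uint32_t pa)
{
	assert(pa <= UINT32_MAX - KERNBASE); /* stay inside the top 1 GB */
	return pa + KERNBASE;
}
```

Under that assumption, btext at 0xc012be70 in Bakul's session corresponds to physical address 0x0012be70.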

-- Terry




Re: Interesting panic very early in the boot

2002-07-17 Thread Szilveszter Adam

On Wed, Jul 17, 2002 at 09:57:41AM -0700, Bakul Shah wrote:
Terry suggested:
> > I believe setting DISABLE_PSE in the config file and rebuilding
> > will make this go away.
> 
> Terry, thanks for the suggestion but that didn't do it.

Ok, so it's time to speculate a bit more about this.

Although I have been seeing this panic since Sunday, only one other
person has reported it so far. Although it may be that this is due to
the fact that developers do not run up-to-date -CURRENT and hence do not
see problems that are this "new" (and I bet that the tinderbox only
tests building, but does not try to actually run the stuff), perhaps
there is a different explanation.

Bakul mentioned that the panic happens on a PPro. For me, it's a PII. I
am using a CPUTYPE setting of "p2" in /etc/make.conf. This gets
converted to a "-march=pentiumpro" on the actual compile line. This may
be the same for the PPro. This suggests a remote possibility that there
might be a problem with this option, so this is the next thing that I am
going to test.

We'll see...
-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Re: Interesting panic very early in the boot

2002-07-17 Thread Bakul Shah

> I believe setting DISABLE_PSE in the config file and rebuilding
> will make this go away.

Terry, thanks for the suggestion but that didn't do it.
Time to review recent changes and single step the kernel.
BTW, how do you stop the kernel before it panics?  It
panics so early that there is no time for sending a break.




Re: Interesting panic very early in the boot

2002-07-16 Thread Terry Lambert

Bakul Shah wrote:
> I've run into a very similar bug -- the kernel panics almost
> right after it is started by the loader.  With remote gdb
> I've traced it to this point so far:

I believe setting DISABLE_PSE in the config file and rebuilding
will make this go away.

-- Terry




Re: Interesting panic very early in the boot

2002-07-16 Thread Bakul Shah

I've run into a very similar bug -- the kernel panics almost
right after it is started by the loader.  With remote gdb
I've traced it to this point so far:

(kgdb) target remote /dev/cuaa0
Remote debugging using /dev/cuaa0
pmap_set_opt () at /usr/src/sys/i386/i386/pmap.c:449
449 if (*pte)
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint
(kgdb) where
#0  pmap_set_opt () at /usr/src/sys/i386/i386/pmap.c:449
#1  0xc0307c64 in pmap_bootstrap (firstaddr=3146924, loadaddr=0)
    at /usr/src/sys/i386/i386/pmap.c:403
#2  0xc03056b2 in getmemsize (first=4947968)
    at /usr/src/sys/i386/i386/machdep.c:1473
#3  0xc0305e2f in init386 (first=4947968)
    at /usr/src/sys/i386/i386/machdep.c:1817
(kgdb) l
444         /* Turn on PG_G for text, data, bss pages. */
445         va = (vm_offset_t)btext;
446         endva = KERNBASE + KERNend;
447         while (va < endva) {
448             pte = vtopte(va);
449             if (*pte)
450                 *pte |= pgeflag;
451             va += PAGE_SIZE;
452         }
453         invltlb();  /* Insurance */
(kgdb) p/x va
$2 = 0xc012be70
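The listed loop can be modelled in user space to show exactly which entries it touches. In this sketch a plain array stands in for the recursive PTmap window (an assumption made so the code runs outside the kernel); set_pg_g is a hypothetical name, and PG_V/PG_G use the architectural x86 bit values. In the real kernel it is the read of *pte at line 449 that faults.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PG_V       0x001u  /* x86 PTE bit 0: valid/present */
#define PG_G       0x100u  /* x86 PTE bit 8: global */

typedef uint32_t pt_entry_t;

/*
 * User-space model of the listed loop. The array ptes[] stands in for the
 * recursive PTmap window: ptes[0] maps the page-aligned address base_va.
 * As in the original, only non-zero entries get pgeflag ORed in.
 */
static void set_pg_g(pt_entry_t *ptes, uint32_t base_va, uint32_t va,
    uint32_t endva, pt_entry_t pgeflag)
{
	assert((base_va & (PAGE_SIZE - 1)) == 0 && va >= base_va);

	while (va < endva) {
		pt_entry_t *pte = &ptes[(va - base_va) >> PAGE_SHIFT];

		if (*pte)              /* the read that faults in the kernel */
			*pte |= pgeflag;
		va += PAGE_SIZE;       /* va may be unaligned, like btext */
	}
}
```

With pgeflag set to 0, the value DISABLE_PG_G is meant to produce, no entry changes.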

I can't get at pte for some reason, so hand-computing vtopte(va), we get:

(kgdb) p/x btext
$3 = 0xc012be70
(kgdb) p PTmap
$7 = 0xbfc00000
(kgdb) p/x PTmap+0xc012b
$8 = 0xbff004ac

This address matches the page fault address.  It is a
supervisor read, protection violation fault.
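Bakul's hand computation can be checked with a small sketch. It assumes the non-PAE i386 layout (4-byte PTEs, 4K pages) and takes PTmap to be 0xbfc00000, which is what the printed result implies: 0xbff004ac - 4 * 0xc012b = 0xbfc00000. The helper names vtopte_addr and pte_addr_to_va are hypothetical; the kernel's own vtopte() macro is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PTE_SIZE   4u           /* sizeof(pt_entry_t) without PAE */
#define PTMAP      0xbfc00000u  /* base of the PTE window, per Bakul's numbers */

/* Hypothetical helper: address of the PTE that maps va via PTmap. */
static uint32_t vtopte_addr(uint32_t va)
{
	return PTMAP + (va >> PAGE_SHIFT) * PTE_SIZE;
}

/* Hypothetical inverse: first virtual address of the page a PTE maps. */
static uint32_t pte_addr_to_va(uint32_t pte_addr)
{
	assert(pte_addr >= PTMAP && (pte_addr - PTMAP) % PTE_SIZE == 0);
	return ((pte_addr - PTMAP) / PTE_SIZE) << PAGE_SHIFT;
}
```

vtopte_addr(0xc012be70) reproduces 0xbff004ac. Run backwards on the fault address from the first report, pte_addr_to_va(0xbff004c0) gives 0xc0130000, the page containing the bottom frame address 0xc0130c7d in that ddb trace; this is consistent with (though it does not prove) the same loop faulting on both machines.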

More details:

This is with today's (July 16) kernel (synced at about 5PM
PDT) on a PPro system.  This system can take two PPros but
I've plugged in just one Pentium Pro.  It has 64MB of ECC
memory.  I'll continue investigating, but I haven't been in
this part of the code for ages, hence the call for help!

If it matters, a kernel built from sources before the KSE
changes works fine on this machine.

Thanks for any hints.

-- bakul




Re: Interesting panic very early in the boot

2002-07-14 Thread Julian Elischer

You can use addr2line to get this info, but
at a pinch you can just use nm -n to figure out which function each
address is in.
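The nm -n route works because nm -n prints symbols sorted by address, so the function an address "is in" is the last symbol at or below it. A minimal sketch of that lookup, assuming the nm -n output has already been parsed into an array (the parsing is omitted, and the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One parsed line of `nm -n` output (hypothetical representation). */
struct sym {
	uint32_t addr;
	const char *name;
};

/*
 * Return the name of the last symbol whose address is <= addr, i.e. the
 * symbol an address "is in" when the table is sorted as nm -n prints it.
 * Returns NULL when addr lies below the first symbol.
 */
static const char *sym_lookup(const struct sym *tab, size_t n, uint32_t addr)
{
	size_t lo = 0, hi = n;   /* entries before lo are <= addr, from hi on > addr */

	assert(n == 0 || tab != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == 0 ? NULL : tab[lo - 1].name;
}
```

Fed a table built from nm -n output for the panicking kernel, sym_lookup() applied to each frame address in the ddb trace (0xc035c348, 0xc035c290, and so on) would name the enclosing symbols, with the usual caveat that the nearest preceding symbol may be a local label rather than the function itself.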


On Sun, 14 Jul 2002, Szilveszter Adam wrote:

> Hello everybody,
> 
> I have recently finished upgrading my system to this morning's
> -CURRENT, with sources just *before* the commit of rev 1.154 to
> src/sys/kern/kern_fork.c by Julian.
> 
> I have a UP IA32 machine and am not using any additional kernel
> modules. Upon rebooting with the new kernel, as soon as I let it
> continue from the loader prompt, the kernel greets me with this:
> 
> (No serial console, transcribed by hand, please excuse any typos)
> 
> Fatal trap 12: page fault within kernel mode
> 
> fault virtual address   = 0xbff004c0
> fault code              = supervisor read, protection violation
> instruction pointer     = 0x8:0xc035c348
> stack pointer           = 0x10:0xc0532c08
> frame pointer           = 0x10:0xc0532c10
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL=0
> current process         = 0 ()
> 
> kernel: type 12 trap, code=0
> Stopped at 0xc035c348:  cmpl $0,0xbfc00000(%eax)
> 
> Unfortunately, there is precious little I could extract from ddb after
> this.
> 
> ddb> ps
> pid  proc  addr  uid   ppid   pgrp  flag  stat  wmesg  wchan  cmd
> 0  c03f00c0 c053 0 0  000 New 
> 
> ddb> tr
> (null)(c0418920,c080,537000,c0532d48,c03595bd) at 0xc035c348
> (null)(537000,0,c0532c9c,c0532ce8,10)  at 0xc035c290
> (null)(537000,c0352524,f,0,8) at 0xc03595bd
> (null)(537000) at 0xc0359fb9
> (null)() at 0xc0130c7d
> 
> An attempt to "show locks" resulted in:
> 
> witness_list: witness_cold
> 
> Fatal trap 3 breakpoint instruction fault while in kernel mode
> 
> An attempt to "show witness" resulted in:
> 
> witness_display: witness_cold
> 
> Uptime 1s
> and then a complete lockup; only a power cycle helped.
> 
> No dump was taken.
> 
> Does this ring a bell with anyone? I know that the trace may not help
> much...
> 
> I will be only too glad to offer any information or testing that may be
> needed.
> 
> -- 
> Regards:
> 
> Szilveszter ADAM
> Szombathely Hungary





Re: Interesting panic very early in the boot

2002-07-14 Thread Szilveszter Adam

On Sun, Jul 14, 2002 at 08:06:49PM +0200, Szilveszter Adam wrote:
> On Sun, Jul 14, 2002 at 07:49:57PM +0200, Szilveszter Adam wrote:
> > Hello everybody,
> > 
> > I have recently finished upgrading my system to this morning's
> > -CURRENT, with sources just *before* the commit of rev 1.154 to
> > src/sys/kern/kern_fork.c by Julian.
> > 
> > I have a UP IA32 machine and am not using any additional kernel
> > modules. Upon rebooting with the new kernel, as soon as I let it
> > continue from the loader prompt, the kernel greets me with this:
> 
> <...>
> 
> Sorry, I should have said that I have ACPI compiled into the kernel, but
> it is apparently not supported by the motherboard. Will try without it
> next.

I upgraded the kernel source and removed ACPI from the config, but still
no joy. Something fishy going on here...

-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Re: Interesting panic very early in the boot

2002-07-14 Thread Szilveszter Adam

On Sun, Jul 14, 2002 at 07:49:57PM +0200, Szilveszter Adam wrote:
> Hello everybody,
> 
> I have recently finished upgrading my system to this morning's
> -CURRENT, with sources just *before* the commit of rev 1.154 to
> src/sys/kern/kern_fork.c by Julian.
> 
> I have a UP IA32 machine and am not using any additional kernel
> modules. Upon rebooting with the new kernel, as soon as I let it
> continue from the loader prompt, the kernel greets me with this:

<...>

Sorry, I should have said that I have ACPI compiled into the kernel, but
it is apparently not supported by the motherboard. Will try without it
next.
-- 
Regards:

Szilveszter ADAM
Szombathely Hungary




Interesting panic very early in the boot

2002-07-14 Thread Szilveszter Adam

Hello everybody,

I have recently finished upgrading my system to this morning's
-CURRENT, with sources just *before* the commit of rev 1.154 to
src/sys/kern/kern_fork.c by Julian.

I have a UP IA32 machine and am not using any additional kernel
modules. Upon rebooting with the new kernel, as soon as I let it
continue from the loader prompt, the kernel greets me with this:

(No serial console, transcribed by hand, please excuse any typos)

Fatal trap 12: page fault within kernel mode

fault virtual address   = 0xbff004c0
fault code              = supervisor read, protection violation
instruction pointer     = 0x8:0xc035c348
stack pointer           = 0x10:0xc0532c08
frame pointer           = 0x10:0xc0532c10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 0 ()

kernel: type 12 trap, code=0
Stopped at 0xc035c348:  cmpl $0,0xbfc00000(%eax)

Unfortunately, there is precious little I could extract from ddb after
this.

ddb> ps
pid  proc  addr  uid   ppid   pgrp  flag  stat  wmesg  wchan  cmd
0  c03f00c0 c053 0 0  000 New 

ddb> tr
(null)(c0418920,c080,537000,c0532d48,c03595bd) at 0xc035c348
(null)(537000,0,c0532c9c,c0532ce8,10)  at 0xc035c290
(null)(537000,c0352524,f,0,8) at 0xc03595bd
(null)(537000) at 0xc0359fb9
(null)() at 0xc0130c7d

An attempt to "show locks" resulted in:

witness_list: witness_cold

Fatal trap 3 breakpoint instruction fault while in kernel mode

An attempt to "show witness" resulted in:

witness_display: witness_cold

Uptime 1s
and then a complete lockup; only a power cycle helped.

No dump was taken.

Does this ring a bell with anyone? I know that the trace may not help
much...

I will be only too glad to offer any information or testing that may be
needed.

-- 
Regards:

Szilveszter ADAM
Szombathely Hungary
