Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread Frédéric PRACA
Quoting John Baldwin <[EMAIL PROTECTED]>:

> On Thursday 06 March 2008 08:31:26 am John Baldwin wrote:
> > The Linux AGP driver has the same code.  Perhaps it is forcing a read of
> > the TLB registers so that prior writes clearing the TLB entries get
> > flushed?  I'm not sure why you are getting a panic.  What kind of fault did
> > you get?  (The original kernel panic messages would be needed.)
>
> Actually, it looks like you have a 64MB aperture, and with either a 32MB or a
> 64MB aperture this loop runs off the end of the GATT (with 64MB the GATT has
> 16384 entries * 4 bytes == 64KB == 16 pages on x86, while the loop reads
> from 33 pages), so if it dies before it starts the next loop, that might
> explain it.  The patch below makes it walk the full GATT, reading the first
> word from each page to force a flush, without walking off the end of the GATT.
>
> In fact, this is what appears to have happened:
>
> (gdb) set $start = 0xd4d05000  (ag_virtual)
> (gdb) set $fva = 3570491392  (eva in trap_pfault() frame)
> (gdb) p ($fva - $start) / 4
> $2 = 17408
>
> That's well over your current ag_entries of 16384.  Try this patch (note
> that Linux's in-kernel AGP driver has the same bug):
>
> Index: agp_nvidia.c
> ===
> RCS file: /host/cvs/usr/cvs/src/sys/dev/agp/agp_nvidia.c,v
> retrieving revision 1.13
> diff -u -r1.13 agp_nvidia.c
> --- agp_nvidia.c  12 Nov 2007 21:51:37 -  1.13
> +++ agp_nvidia.c  6 Mar 2008 13:37:43 -
> @@ -347,7 +347,7 @@
>      struct agp_nvidia_softc *sc;
>      u_int32_t wbc_reg, temp;
>      volatile u_int32_t *ag_virtual;
> -    int i;
> +    int i, pages;
>
>      sc = (struct agp_nvidia_softc *)device_get_softc(dev);
>
> @@ -373,9 +373,10 @@
>      ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
>
>      /* Flush TLB entries. */
> -    for(i = 0; i < 32 + 1; i++)
> +    pages = sc->gatt->ag_entries * sizeof(u_int32_t) / PAGE_SIZE;
> +    for(i = 0; i < pages; i++)
>          temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
> -    for(i = 0; i < 32 + 1; i++)
> +    for(i = 0; i < pages; i++)
>          temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
>
>      return (0);
>
> --
> John Baldwin

Thanks a lot, John, this patch works. I have been able to launch X without
crashing the kernel.

Fred
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread John Baldwin
On Thursday 06 March 2008 08:31:26 am John Baldwin wrote:
> On Tuesday 04 March 2008 05:59:59 pm Frédéric PRACA wrote:
> > Hello dear hackers,
> > I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600
> > video card. After upgrading from 6.3 to 7.0, I launched xorg, which
> > crashed the kernel. After looking at the kernel core dump, I found that
> > the agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c
> > crashed on line 377. The loop fails from the beginning (when i==0). I
> > commented out the last two loops and it seems to work now, but as I don't
> > understand what this code is for, I'd like some explanation of it, and I'd
> > like to know if anyone else has had the same problem.
>
> The Linux AGP driver has the same code.  Perhaps it is forcing a read of
> the TLB registers so that prior writes clearing the TLB entries get
> flushed?  I'm not sure why you are getting a panic.  What kind of fault did
> you get?  (The original kernel panic messages would be needed.)

Actually, it looks like you have a 64MB aperture, and with either a 32MB or a
64MB aperture this loop runs off the end of the GATT (with 64MB the GATT has
16384 entries * 4 bytes == 64KB == 16 pages on x86, while the loop reads
from 33 pages), so if it dies before it starts the next loop, that might
explain it.  The patch below makes it walk the full GATT, reading the first
word from each page to force a flush, without walking off the end of the GATT.

In fact, this is what appears to have happened:

(gdb) set $start = 0xd4d05000  (ag_virtual)
(gdb) set $fva = 3570491392  (eva in trap_pfault() frame)
(gdb) p ($fva - $start) / 4
$2 = 17408

That's well over your current ag_entries of 16384.  Try this patch (note
that Linux's in-kernel AGP driver has the same bug):

Index: agp_nvidia.c
===
RCS file: /host/cvs/usr/cvs/src/sys/dev/agp/agp_nvidia.c,v
retrieving revision 1.13
diff -u -r1.13 agp_nvidia.c
--- agp_nvidia.c  12 Nov 2007 21:51:37 -  1.13
+++ agp_nvidia.c  6 Mar 2008 13:37:43 -
@@ -347,7 +347,7 @@
     struct agp_nvidia_softc *sc;
     u_int32_t wbc_reg, temp;
     volatile u_int32_t *ag_virtual;
-    int i;
+    int i, pages;
 
     sc = (struct agp_nvidia_softc *)device_get_softc(dev);
 
@@ -373,9 +373,10 @@
     ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
 
     /* Flush TLB entries. */
-    for(i = 0; i < 32 + 1; i++)
+    pages = sc->gatt->ag_entries * sizeof(u_int32_t) / PAGE_SIZE;
+    for(i = 0; i < pages; i++)
         temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
-    for(i = 0; i < 32 + 1; i++)
+    for(i = 0; i < pages; i++)
         temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
 
     return (0);

-- 
John Baldwin


Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread John Baldwin
On Tuesday 04 March 2008 05:59:59 pm Frédéric PRACA wrote:
> Hello dear hackers,
> I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
> card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
> kernel. After looking at the kernel core dump, I found that the
> agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
> line 377. The loop fails from the beginning (when i==0). I commented out
> the last two loops and it seems to work now, but as I don't understand what
> this code is for, I'd like some explanation of it, and I'd like to know if
> anyone else has had the same problem.

The Linux AGP driver has the same code.  Perhaps it is forcing a read of
the TLB registers so that prior writes clearing the TLB entries get
flushed?  I'm not sure why you are getting a panic.  What kind of fault did
you get?  (The original kernel panic messages would be needed.)

-- 
John Baldwin


Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Frédéric PRACA
Quoting Jeremy Chadwick <[EMAIL PROTECTED]>:

> On Tue, Mar 04, 2008 at 11:59:59PM +0100, Frédéric PRACA wrote:
> > Hello dear hackers,
> > I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
> > card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
> > kernel. After looking at the kernel core dump, I found that the
> > agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
> > line 377. The loop fails from the beginning (when i==0). I commented out
> > the last two loops and it seems to work now, but as I don't understand what
> > this code is for, I'd like some explanation of it, and I'd like to know if
> > anyone else has had the same problem.
>
> I'm in no way familiar with X.
>
> That said: you're using an ATI Radeon card, yet the kernel crashed in
> agp_nvidia.c.  nVidia != ATI.  Is it possible that you changed video
> cards at one point, and you're still using the nVidia AGP driver (loaded
> via /boot/loader.conf)?
No, in fact, agp_nvidia.c is the driver for the AGP bus of the NForce2
motherboard chipset, not the video card.

> dmesg might be useful here.
Sure, but I can't copy it at the moment.

> --
> | Jeremy Chadwick    jdc at parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |
>
>




Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Jeremy Chadwick
On Tue, Mar 04, 2008 at 11:59:59PM +0100, Frédéric PRACA wrote:
> Hello dear hackers,
> I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
> card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
> kernel. After looking at the kernel core dump, I found that the
> agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
> line 377. The loop fails from the beginning (when i==0). I commented out
> the last two loops and it seems to work now, but as I don't understand what
> this code is for, I'd like some explanation of it, and I'd like to know if
> anyone else has had the same problem.

I'm in no way familiar with X.

That said: you're using an ATI Radeon card, yet the kernel crashed in
agp_nvidia.c.  nVidia != ATI.  Is it possible that you changed video
cards at one point, and you're still using the nVidia AGP driver (loaded
via /boot/loader.conf)?

dmesg might be useful here.

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Frédéric PRACA
Quoting Kris Kennaway <[EMAIL PROTECTED]>:

> Frédéric PRACA wrote:
> > Hello dear hackers,
> > I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
> > card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
> > kernel. After looking at the kernel core dump, I found that the
> > agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
> > line 377. The loop fails from the beginning (when i==0). I commented out
> > the last two loops and it seems to work now, but as I don't understand what
> > this code is for, I'd like some explanation of it, and I'd like to know if
> > anyone else has had the same problem.
>
> Usually it's a good idea to show the data that led to your conclusions
> (backtraces, etc), not just your conclusions.  Sometimes there is more
> going on than is immediately apparent.
You're right, sorry.
Here's what I got from kgdb:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
There is no member named pathname.
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc05b49ac in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc05b4b74 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc06f9858 in vm_fault (map=0xc1054000, vaddr=3570491392,
fault_type=Variable "fault_type" is not available.
)
at /usr/src/sys/vm/vm_fault.c:275
#4  0xc0765be5 in trap_pfault (frame=0xd61f1a48, usermode=0, eva=3570491392)
at /usr/src/sys/i386/i386/trap.c:801
#5  0xc0766502 in trap (frame=0xd61f1a48) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc07545bb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0776ed7 in agp_nvidia_flush_tlb (dev=0xc2a3b100, offset=-1029891720)
at /usr/src/sys/pci/agp_nvidia.c:377
#8  0xc06bb9dd in agp_generic_bind_memory (dev=0xc2a3b100, mem=0xc2fe7cc0,
offset=0) at agp_if.h:76
#9  0xc06bb001 in agp_bind_memory (dev=0xc2a3b100, handle=0xc2fe7cc0,
offset=0) at agp_if.h:128
#10 0xc04bd1bf in drm_agp_bind_memory (handle=0xc2fe7cc0, start=0)
at /usr/src/sys/dev/drm/drm_agpsupport.c:456
#11 0xc04bd4dd in drm_agp_bind (dev=0xc2aad000, request=0xd61f1b68)
at /usr/src/sys/dev/drm/drm_agpsupport.c:331
#12 0xc04bd591 in drm_agp_bind_ioctl (kdev=0xc2ac3400, cmd=2148033590,
data=0xc2c309d0 "�|��", flags=67, p=0xc2ed7a50, filp=0x17497)
at /usr/src/sys/dev/drm/drm_agpsupport.c:348
#13 0xc04c25e8 in drm_ioctl (kdev=0xc2ac3400, cmd=2148033590,
data=0xc2c309d0 "�|��", flags=67, p=0xc2ed7a50)
at /usr/src/sys/dev/drm/drm_drv.c:911
#14 0xc0585ad6 in giant_ioctl (dev=0xc2ac3400, cmd=2148033590,
data=0xc2c309d0 "�|��", fflag=67, td=0xc2ed7a50)
at /usr/src/sys/kern/kern_conf.c:349
#15 0xc0552517 in devfs_ioctl_f (fp=0xc2c3fd38, com=2148033590,
data=0xc2c309d0, cred=0xc4e62400, td=0xc2ed7a50)
at /usr/src/sys/fs/devfs/devfs_vnops.c:494
#16 0xc05e9b43 in kern_ioctl (td=0xc2ed7a50, fd=8, com=2148033590,
data=0xc2c309d0 "�|��") at file.h:266
#17 0xc05e9ca4 in ioctl (td=0xc2ed7a50, uap=0xd61f1cfc)
at /usr/src/sys/kern/sys_generic.c:570
#18 0xc0765f0a in syscall (frame=0xd61f1d38)
at /usr/src/sys/i386/i386/trap.c:1035
#19 0xc0754620 in Xint0x80_syscall ()
at /usr/src/sys/i386/i386/exception.s:196
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 7
#7  0xc0776ed7 in agp_nvidia_flush_tlb (dev=0xc2a3b100, offset=-1029891720)
at /usr/src/sys/pci/agp_nvidia.c:377
warning: Source file is more recent than executable.

377 temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
(kgdb) list
372
373 ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
374
375 /* Flush TLB entries. */
376 /*for(i = 0; i < 32 + 1; i++)
377 temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
378 for(i = 0; i < 32 + 1; i++)
379 temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
380 */
381 return (0);
(kgdb) print i
$1 = 0
(kgdb) print *ag_virtual
$2 = 299405313
(kgdb) print ag_virtual
$3 = (volatile u_int32_t *) 0xd4d05000
(kgdb) print *sc
$4 = {agp = {as_aperture = 0xc2a34dc0, as_aperture_rid = 16,
as_maxmem = 461373440, as_allocated = 8388608,
as_state = AGP_ACQUIRE_KERNEL, as_memory = {tqh_first = 0xc2fe7cc0,
  tqh_last = 0xc2fe7cc0}, as_nextid = 2, as_isopen = 0,
as_devnode = 0xc2a3c300, as_lock = {lock_object = {
lo_name = 0xc07d15d1 "agp lock", lo_type = 0xc07d15d1 "

Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Kris Kennaway

Frédéric PRACA wrote:

> Hello dear hackers,
> I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
> card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
> kernel. After looking at the kernel core dump, I found that the
> agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
> line 377. The loop fails from the beginning (when i==0). I commented out
> the last two loops and it seems to work now, but as I don't understand what
> this code is for, I'd like some explanation of it, and I'd like to know if
> anyone else has had the same problem.


Usually it's a good idea to show the data that led to your conclusions 
(backtraces, etc), not just your conclusions.  Sometimes there is more 
going on than is immediately apparent.


Kris


Kernel crash on Asus A7N8X-X

2008-03-04 Thread Frédéric PRACA
Hello dear hackers,
I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
kernel. After looking at the kernel core dump, I found that the
agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
line 377. The loop fails from the beginning (when i==0). I commented out
the last two loops and it seems to work now, but as I don't understand what
this code is for, I'd like some explanation of it, and I'd like to know if
anyone else has had the same problem.

Fred
