Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread John Baldwin
On Tuesday 04 March 2008 05:59:59 pm Frédéric PRACA wrote:
 Hello dear hackers,
 I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
 card. After upgrading from 6.3 to 7.0, I launched xorg which crashed the
 kernel. After looking in the kernel core dump, I found that the
 agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
 the line 377. The loop fails from the beginning (when i==0). I commented
 out the two last loops and it seems to work now but as I didn't understand
 what is this code for, I'd like to have some explanation about it and want
 to know if someone got the same problem.

The Linux AGP driver has the same code.  It appears to force a read of the
TLB registers so that the prior writes that clear the TLB entries actually
get flushed, perhaps?  I'm not sure why you are getting a panic.  What kind
of fault did you get?  (The original kernel panic messages would be needed.)

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread John Baldwin
On Thursday 06 March 2008 08:31:26 am John Baldwin wrote:
 On Tuesday 04 March 2008 05:59:59 pm Frédéric PRACA wrote:
  Hello dear hackers,
  I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600
  video card. After upgrading from 6.3 to 7.0, I launched xorg which
  crashed the kernel. After looking in the kernel core dump, I found that
  the agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c
  crashed on the line 377. The loop fails from the beginning (when i==0).
  I commented out the two last loops and it seems to work now but as I
  didn't understand what is this code for, I'd like to have some
  explanation about it and want to know if someone got the same problem.

 The Linux AGP driver has the same code.  It appears to be forcing a read of
 the TLB registers to force prior writes to clear the TLB entries to flush
 perhaps?  I'm not sure why you are getting a panic.  What kind of fault did
 you get?  (The original kernel panic messages would be needed.)

Actually, it looks like you have a 64MB aperture.  With either a 32MB or
64MB aperture this loop runs off the end of the GATT (your GATT has 16384
entries * 4 bytes == 64k == 16 pages on x86), so if it dies before it starts
the next loop that might explain it.  The patch below makes it walk the full
GATT, reading the first word from each page to force a flush w/o walking off
the end of the GATT.
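
The size arithmetic above can be sketched in a few lines of C. This is only an illustration, not driver code: the helper names are made up for the example, and it assumes one 32-bit GATT entry per 4k aperture page, which is what the 64MB == 16384 entries figure implies.

```c
#include <stdint.h>

#define PAGE_SIZE      4096u     /* x86 page size, as in the message above */
#define OLD_LOOP_READS (32 + 1)  /* iteration count of the original loops */

/*
 * Hypothetical helper: number of GATT entries for an aperture of the
 * given size, assuming one 32-bit entry per 4k aperture page.
 */
static unsigned
gatt_entries(unsigned aperture_mb)
{
	return (aperture_mb * 1024u * 1024u / PAGE_SIZE);
}

/* Hypothetical helper: pages occupied by the GATT itself. */
static unsigned
gatt_pages(unsigned aperture_mb)
{
	return (gatt_entries(aperture_mb) * (unsigned)sizeof(uint32_t) /
	    PAGE_SIZE);
}

/* Nonzero if the fixed-count loop reads past the end of the GATT. */
static int
old_loop_overruns(unsigned aperture_mb)
{
	return (OLD_LOOP_READS > gatt_pages(aperture_mb));
}
```

With these definitions gatt_pages(64) is 16 and gatt_pages(32) is 8, so a loop that always touches 33 pages overruns the table in both cases.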

Actually, this is what appears to have happened:

(gdb) set $start = 0xd4d05000   (ag_virtual)
(gdb) set $fva = 3570491392     (eva in trap_pfault() frame)
(gdb) p ($fva - $start) / 4
$2 = 17408

That's well over your current ag_entries of 16384.  Try this patch (note 
Linux's in-kernel agp driver has the same bug):

Index: agp_nvidia.c
===================================================================
RCS file: /host/cvs/usr/cvs/src/sys/dev/agp/agp_nvidia.c,v
retrieving revision 1.13
diff -u -r1.13 agp_nvidia.c
--- agp_nvidia.c	12 Nov 2007 21:51:37 -	1.13
+++ agp_nvidia.c	6 Mar 2008 13:37:43 -
@@ -347,7 +347,7 @@
 	struct agp_nvidia_softc *sc;
 	u_int32_t wbc_reg, temp;
 	volatile u_int32_t *ag_virtual;
-	int i;
+	int i, pages;
 
 	sc = (struct agp_nvidia_softc *)device_get_softc(dev);
 
@@ -373,9 +373,10 @@
 	ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
 
 	/* Flush TLB entries. */
-	for(i = 0; i < 32 + 1; i++)
+	pages = sc->gatt->ag_entries * sizeof(u_int32_t) / PAGE_SIZE;
+	for(i = 0; i < pages; i++)
 		temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
-	for(i = 0; i < 32 + 1; i++)
+	for(i = 0; i < pages; i++)
 		temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
 
 	return (0);
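
For anyone who wants to check the bounded loop outside the kernel, here is a minimal userland model of it. It is a sketch under stated assumptions, not the driver: touch_gatt_pages() is a made-up name, an ordinary array stands in for sc->gatt->ag_virtual, and the return value exists only so the number of reads can be checked.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* x86 page size */

/*
 * Hypothetical userland model of the patched loop: read the first word
 * of every page of a table holding `entries` 32-bit entries, and return
 * how many reads were issued.  `table` stands in for ag_virtual.
 */
static unsigned
touch_gatt_pages(volatile uint32_t *table, unsigned entries)
{
	unsigned i, pages;
	uint32_t temp = 0;

	/* Same bound as the patch: table size in bytes / page size. */
	pages = entries * (unsigned)sizeof(uint32_t) / PAGE_SIZE;
	for (i = 0; i < pages; i++)
		temp = table[i * PAGE_SIZE / sizeof(uint32_t)];
	(void)temp;	/* the value is irrelevant, only the read matters */
	return (pages);
}
```

For a 16384-entry table this issues 16 reads, the last one at index 15360, so every access stays inside the table.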

-- 
John Baldwin


Re: Kernel crash on Asus A7N8X-X

2008-03-06 Thread Frédéric PRACA
Quoting John Baldwin [EMAIL PROTECTED]:

 On Thursday 06 March 2008 08:31:26 am John Baldwin wrote:
  The Linux AGP driver has the same code.  It appears to be forcing a read of
  the TLB registers to force prior writes to clear the TLB entries to flush
  perhaps?  I'm not sure why you are getting a panic.  What kind of fault did
  you get?  (The original kernel panic messages would be needed.)

 Actually, it looks like you have a 64MB aperture and with either a 32MB or
 64MB aperture this loop runs off the end of the GATT (GATT has 16384 entries
 * 4 bytes == 64k == 16 pages on x86) so if it dies before it starts the next
 loop that might explain it.  The patch below makes it walk the full GATT
 reading the first word from each page to force a flush w/o walking off the
 end of the GATT.

 Actually, this is what appears to have happened:

 (gdb) set $start = 0xd4d05000   (ag_virtual)
 (gdb) set $fva = 3570491392     (eva in trap_pfault() frame)
 (gdb) p ($fva - $start) / 4
 $2 = 17408

 That's well over your current ag_entries of 16384.  Try this patch (note
 Linux's in-kernel agp driver has the same bug):

 Index: agp_nvidia.c
 ===
 RCS file: /host/cvs/usr/cvs/src/sys/dev/agp/agp_nvidia.c,v
 retrieving revision 1.13
 diff -u -r1.13 agp_nvidia.c
 --- agp_nvidia.c  12 Nov 2007 21:51:37 -  1.13
 +++ agp_nvidia.c  6 Mar 2008 13:37:43 -
 @@ -347,7 +347,7 @@
   struct agp_nvidia_softc *sc;
   u_int32_t wbc_reg, temp;
   volatile u_int32_t *ag_virtual;
 - int i;
 + int i, pages;

   sc = (struct agp_nvidia_softc *)device_get_softc(dev);

 @@ -373,9 +373,10 @@
   ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;

   /* Flush TLB entries. */
 - for(i = 0; i < 32 + 1; i++)
 + pages = sc->gatt->ag_entries * sizeof(u_int32_t) / PAGE_SIZE;
 + for(i = 0; i < pages; i++)
   temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
 - for(i = 0; i < 32 + 1; i++)
 + for(i = 0; i < pages; i++)
   temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];

   return (0);

 --
 John Baldwin

Thanks a lot John, this code works. I have been able to launch X w/o crashing
the kernel.

Fred


Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Kris Kennaway

Frédéric PRACA wrote:

Hello dear hackers,
I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
card. After upgrading from 6.3 to 7.0, I launched xorg which crashed the kernel.
After looking in the kernel core dump, I found that the agp_nvidia_flush_tlb
function of /usr/src/sys/pci/agp_nvidia.c crashed on the line 377. The loop
fails from the beginning (when i==0). I commented out the two last loops and it
seems to work now but as I didn't understand what is this code for, I'd like to
have some explanation about it and want to know if someone got the same problem.


Usually it's a good idea to show the data that led to your conclusions 
(backtraces, etc), not just your conclusions.  Sometimes there is more 
going on than is immediately apparent.


Kris


Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Frédéric PRACA
Quoting Kris Kennaway [EMAIL PROTECTED]:

 Frédéric PRACA wrote:
  Hello dear hackers,
  I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
  card. After upgrading from 6.3 to 7.0, I launched xorg which crashed the
  kernel. After looking in the kernel core dump, I found that the
  agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
  the line 377. The loop fails from the beginning (when i==0). I commented
  out the two last loops and it seems to work now but as I didn't
  understand what is this code for, I'd like to have some explanation about
  it and want to know if someone got the same problem.

 Usually it's a good idea to show the data that led to your conclusions
 (backtraces, etc), not just your conclusions.  Sometimes there is more
 going on than is immediately apparent.
For sure, sorry.
Here's what I got from kgdb :
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
There is no member named pathname.
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc05b49ac in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc05b4b74 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc06f9858 in vm_fault (map=0xc1054000, vaddr=3570491392,
    fault_type=Variable "fault_type" is not available.
)
    at /usr/src/sys/vm/vm_fault.c:275
#4  0xc0765be5 in trap_pfault (frame=0xd61f1a48, usermode=0, eva=3570491392)
    at /usr/src/sys/i386/i386/trap.c:801
#5  0xc0766502 in trap (frame=0xd61f1a48) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc07545bb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0776ed7 in agp_nvidia_flush_tlb (dev=0xc2a3b100, offset=-1029891720)
    at /usr/src/sys/pci/agp_nvidia.c:377
#8  0xc06bb9dd in agp_generic_bind_memory (dev=0xc2a3b100, mem=0xc2fe7cc0,
    offset=0) at agp_if.h:76
#9  0xc06bb001 in agp_bind_memory (dev=0xc2a3b100, handle=0xc2fe7cc0,
    offset=0) at agp_if.h:128
#10 0xc04bd1bf in drm_agp_bind_memory (handle=0xc2fe7cc0, start=0)
    at /usr/src/sys/dev/drm/drm_agpsupport.c:456
#11 0xc04bd4dd in drm_agp_bind (dev=0xc2aad000, request=0xd61f1b68)
    at /usr/src/sys/dev/drm/drm_agpsupport.c:331
#12 0xc04bd591 in drm_agp_bind_ioctl (kdev=0xc2ac3400, cmd=2148033590,
    data=0xc2c309d0, flags=67, p=0xc2ed7a50, filp=0x17497)
    at /usr/src/sys/dev/drm/drm_agpsupport.c:348
#13 0xc04c25e8 in drm_ioctl (kdev=0xc2ac3400, cmd=2148033590,
    data=0xc2c309d0, flags=67, p=0xc2ed7a50)
    at /usr/src/sys/dev/drm/drm_drv.c:911
#14 0xc0585ad6 in giant_ioctl (dev=0xc2ac3400, cmd=2148033590,
    data=0xc2c309d0, fflag=67, td=0xc2ed7a50)
    at /usr/src/sys/kern/kern_conf.c:349
#15 0xc0552517 in devfs_ioctl_f (fp=0xc2c3fd38, com=2148033590,
    data=0xc2c309d0, cred=0xc4e62400, td=0xc2ed7a50)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:494
#16 0xc05e9b43 in kern_ioctl (td=0xc2ed7a50, fd=8, com=2148033590,
    data=0xc2c309d0) at file.h:266
#17 0xc05e9ca4 in ioctl (td=0xc2ed7a50, uap=0xd61f1cfc)
    at /usr/src/sys/kern/sys_generic.c:570
#18 0xc0765f0a in syscall (frame=0xd61f1d38)
    at /usr/src/sys/i386/i386/trap.c:1035
#19 0xc0754620 in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:196
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 7
#7  0xc0776ed7 in agp_nvidia_flush_tlb (dev=0xc2a3b100, offset=-1029891720)
at /usr/src/sys/pci/agp_nvidia.c:377
warning: Source file is more recent than executable.

377			temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
(kgdb) list
372
373		ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
374
375		/* Flush TLB entries. */
376		/*for(i = 0; i < 32 + 1; i++)
377			temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
378		for(i = 0; i < 32 + 1; i++)
379			temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
380		*/
381		return (0);
(kgdb) print i
$1 = 0
(kgdb) print *ag_virtual
$2 = 299405313
(kgdb) print ag_virtual
$3 = (volatile u_int32_t *) 0xd4d05000
(kgdb) print *sc
$4 = {agp = {as_aperture = 0xc2a34dc0, as_aperture_rid = 16,
as_maxmem = 461373440, as_allocated = 8388608,
as_state = AGP_ACQUIRE_KERNEL, as_memory = {tqh_first = 0xc2fe7cc0,
  tqh_last = 0xc2fe7cc0}, as_nextid = 2, as_isopen = 0,
as_devnode = 0xc2a3c300, as_lock = {lock_object = {
lo_name = 0xc07d15d1 "agp lock", lo_type = 0xc07d15d1 "agp lock",
lo_flags = 16973824, 
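
Two values from the session above, eva=3570491392 in the trap_pfault() frame and ag_virtual = 0xd4d05000, are enough to work out how far into the table the faulting read was. The following is only a sketch of that arithmetic; the function names are made up for the illustration and 32-bit kernel addresses are assumed, as on i386.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* x86 page size */

/*
 * Hypothetical helper: index of the 32-bit table entry that a faulting
 * address corresponds to, relative to the start of the table.
 */
static uint32_t
fault_entry_index(uint32_t ag_virtual, uint32_t fault_va)
{
	return ((fault_va - ag_virtual) / (uint32_t)sizeof(uint32_t));
}

/* Hypothetical helper: page offset of the faulting read from the start. */
static uint32_t
fault_page_index(uint32_t ag_virtual, uint32_t fault_va)
{
	return ((fault_va - ag_virtual) / PAGE_SIZE);
}
```

Calling fault_entry_index(0xd4d05000, 3570491392) gives 17408 and fault_page_index() gives 17, i.e. the read landed 17 pages past the start of the table.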

Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Jeremy Chadwick
On Tue, Mar 04, 2008 at 11:59:59PM +0100, Frédéric PRACA wrote:
 Hello dear hackers,
 I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
 card. After upgrading from 6.3 to 7.0, I launched xorg which crashed the
 kernel. After looking in the kernel core dump, I found that the
 agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
 the line 377. The loop fails from the beginning (when i==0). I commented
 out the two last loops and it seems to work now but as I didn't understand
 what is this code for, I'd like to have some explanation about it and want
 to know if someone got the same problem.

I'm in no way familiar with X.

That said: you're using an ATI Radeon card, yet the kernel crashed in
agp_nvidia.c.  nVidia != ATI.  Is it possible that you changed video
cards at one point, and you're still using the nVidia AGP driver (loaded
via /boot/loader.conf)?

dmesg might be useful here.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: Kernel crash on Asus A7N8X-X

2008-03-05 Thread Frédéric PRACA
Quoting Jeremy Chadwick [EMAIL PROTECTED]:

 On Tue, Mar 04, 2008 at 11:59:59PM +0100, Frédéric PRACA wrote:
  Hello dear hackers,
  I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
  card. After upgrading from 6.3 to 7.0, I launched xorg which crashed the
  kernel. After looking in the kernel core dump, I found that the
  agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c crashed on
  the line 377. The loop fails from the beginning (when i==0). I commented
  out the two last loops and it seems to work now but as I didn't
  understand what is this code for, I'd like to have some explanation about
  it and want to know if someone got the same problem.

 I'm in no way familiar with X.

 That said: you're using an ATI Radeon card, yet the kernel crashed in
 agp_nvidia.c.  nVidia != ATI.  Is it possible that you changed video
 cards at one point, and you're still using the nVidia AGP driver (loaded
 via /boot/loader.conf)?
No, in fact, agp_nvidia.c is the driver for the AGP bus of the NForce2
motherboard chipset, not the video card.

 dmesg might be useful here.
Sure, but I can't copy it at the moment.

 --
 | Jeremy Chadwick                                jdc at parodius.com |
 | Parodius Networking                       http://www.parodius.com/ |
 | UNIX Systems Administrator                  Mountain View, CA, USA |
 | Making life hard for others since 1977.              PGP: 4BD6C0CB |






Kernel crash on Asus A7N8X-X

2008-03-04 Thread Frédéric PRACA
Hello dear hackers,
I own an Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600 video
card. After upgrading from 6.3 to 7.0, I launched xorg, which crashed the
kernel. After looking at the kernel core dump, I found that the
agp_nvidia_flush_tlb function in /usr/src/sys/pci/agp_nvidia.c crashed on
line 377. The loop fails from the beginning (when i==0). I commented out the
last two loops and it seems to work now, but as I don't understand what this
code is for, I'd like an explanation of it, and I'd like to know whether
anyone else has hit the same problem.

Fred
