8.2-PRERELEASE panic in NETGRAPH (ng_pppoe)
Hello.

I have a problem on one of my PPPoE routers: SMP Xeon X5472, 82575EB network adapter, mpd5.

FreeBSD nas4 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Thu Dec 2 19:07:46 MSK 2010 x...@nas4:/usr/obj/usr/src/sys/router i386

Extra kernel config:

options KVA_PAGES=512

sysctl:

kern.ipc.maxsockbuf=524288
kern.ipc.nmbclusters=65535
net.graph.recvspace=40960
net.graph.maxdgram=40960
net.graph.maxdata=1024
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
vm.kmem_size=1512M
vm.kmem_size_max=1512M

This panic is the second similar incident in the last 7 hours.

Some debug info:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x44
fault code = supervisor read, page not present
instruction pointer = 0x20:0x805dfb56
stack pointer = 0x28:0xfbbab944
frame pointer = 0x28:0xfbbab970
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1944 (mpd5)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0x805fce6d at kdb_backtrace+0x48
#1 0x805cdb9c at panic+0x108
#2 0x8079bbd2 at trap_fatal+0x24c
#3 0x8079bf8e at trap_pfault+0x270
#4 0x8079c3db at trap+0x371
#5 0x807842dc at calltrap+0x6
#6 0x8068c2ae at ng_uncallout+0x1b
#7 0x8069c454 at ng_pppoe_disconnect+0xf8
#8 0x8068d5cc at ng_destroy_hook+0xe0
#9 0x8068e5e9 at ng_apply_item+0x903
#10 0x8068cea7 at ng_snd_item+0x2e9
#11 0x806a04f8 at ngc_send+0x1d3
#12 0x8062e01a at sosend_generic+0x2aa
#13 0x80631df0 at kern_sendit+0xfc
#14 0x8063203f at sendit+0xcd
#15 0x80632122 at sendto+0x48
#16 0x80608641 at syscallenter+0x28d
#17 0x8079bfef at syscall+0x2e
Uptime: 7h53m5s
Physical memory: 2038 MB
Dumping 253 MB: 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:231
231 __asm("movl %%fs:0,%0" : "=r" (td));
(kgdb) f 6
#6 0x807842dc in calltrap () at /usr/src/sys/i386/i386/exception.s:166
166 call trap
Current language: auto; currently asm
(kgdb) l
161 SET_KERNEL_SREGS
162 cld
163 FAKE_MCOUNT(TF_EIP(%esp))
164 calltrap:
165 pushl %esp
166 call trap
167 add $4, %esp
168
169 /*
170 * Return via doreti to handle ASTs.
(kgdb) up
#7 0x805dfb56 in _callout_stop_safe (c=0x8c29f008, safe=0) at /usr/src/sys/kern/kern_timeout.c:683
683 if (c->c_lock == &Giant.lock_object)
Current language: auto; currently c
(kgdb) l
678 /*
679 * Some old subsystems don't hold Giant while running a callout_stop(),
680 * so just discard this check for the moment.
681 */
682 if (!safe && c->c_lock != NULL) {
683 if (c->c_lock == &Giant.lock_object)
684 use_lock = mtx_owned(&Giant);
685 else {
686 use_lock = 1;
687 class = LOCK_CLASS(c->c_lock);
(kgdb) p *c
$1 = {c_links = {sle = {sle_next = 0x9}, tqe = {tqe_next = 0x9, tqe_prev = 0x40}}, c_time = 10, c_arg = 0x40, c_func = 0xc, c_lock = 0x40, c_flags = 13, c_cpu = 64}
(kgdb) up
#8 0x8068c2ae in ng_uncallout (c=0x8c29f008, node=0x874abb00) at /usr/src/sys/netgraph/ng_base.c:3732
3732 rval = callout_stop(c);
(kgdb) l
3727 int rval;
3728
3729 KASSERT(c != NULL, ("ng_uncallout: NULL callout"));
3730 KASSERT(node != NULL, ("ng_uncallout: NULL node"));
3731
3732 rval = callout_stop(c);
3733 item = c->c_arg;
3734 /* Do an extra check */
3735 if ((rval > 0) && (c->c_func == &ng_callout_trampoline) &&
3736 (NGI_NODE(item) == node)) {
(kgdb) up
#9 0x8069c454 in ng_pppoe_disconnect (hook=0x88bcb880) at /usr/src/sys/netgraph/ng_pppoe.c:1791
1791 ng_uncallout(&sp->neg->handle, node);
(kgdb) l
1786 /*
1787 * As long as we have somewhere to store the timeout handle,
1788 * we may have a timeout pending.. get rid of it.
1789 */
1790 if (sp->neg) {
1791 ng_uncallout(&sp->neg->handle, node);
1792 if (sp->neg->m)
1793 m_freem(sp->neg->m);
1794 free(sp->neg, M_NETGRAPH_PPPOE);
1795 }
(kgdb) p sp
$2 = 0x89391180
(kgdb) p *sp
$3 = {hook = 0x88bcb880, Session_ID = 0, state = PPPOE_SOFFER, creator = 47, pkt_hdr = {eh = {ether_dhost = "\000\000\000\000\000", ether_shost = "\000\000\000\000\000", ether_type = 0}, ph = {ver = 0 '\0', type = 0 '\0', code = 0 '\0', sid = 0, length = 0}}, neg = 0x8c29f000, sessions = {le_next = 0x0, le_prev = 0x0}}
(kgdb) p *sp->neg
$4 = {m = 0x7, pkt = 0x40, handle
Re: 8.2-PRERELEASE panic in NETGRAPH (ng_pppoe)
I wrote ng_pppoe in the subject, but that is probably not accurate. It looks like either a problem in NETGRAPH itself or something wrong with my system.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
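The fault address 0x44 in the panic above is consistent with the kernel reading a field through the bogus c_lock value 0x40 that kgdb printed for the callout. A minimal sketch of that arithmetic follows. It assumes that on i386 struct lock_object starts with a 4-byte name pointer followed by lo_flags, and that LOCK_CLASS() reads lo_flags; both are assumptions made for illustration, not facts stated in the thread. The class and helper names are hypothetical.

```python
import ctypes


class LockObjectI386(ctypes.Structure):
    """Assumed i386 layout of struct lock_object (4-byte pointers and ints)."""
    _fields_ = [
        ("lo_name", ctypes.c_uint32),     # const char *, 4 bytes on i386
        ("lo_flags", ctypes.c_uint32),    # assumed to be what LOCK_CLASS() reads
        ("lo_data", ctypes.c_uint32),
        ("lo_witness", ctypes.c_uint32),  # struct witness *, 4 bytes on i386
    ]


def field_address(base, struct_type, field):
    """Address touched when `field` is read through the pointer `base`."""
    return base + getattr(struct_type, field).offset


if __name__ == "__main__":
    c_lock = 0x40  # c->c_lock as printed by "p *c" in the kgdb session
    print(hex(field_address(c_lock, LockObjectI386, "lo_flags")))
```

Under these assumptions the computed address matches the reported fault address, which suggests the callout structure itself held garbage rather than a valid lock pointer.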
Re: Problems with bge (possibly related to r208993)
On Tuesday 15 June 2010 21:50:03 you wrote:
> . . .
>> nas2 # netstat -ndI bge1
>> Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
>> bge1 1500 Link#3 00:1b:78:a3:3c:01 418543876 1972918 0 446063237 0 0 0
>> bge1 1500 XX.XX.6.12 XX.XX.6.133 890,306 - - 1,076,833 - - -
>
> Ok, I see very large number of Ierrs here. When you send some packets from other hosts to nas2(bge1), do you see Ierrs counter is increasing?
> . . .
> It seems RX does not work at all. Because you have zero Drop(from netstat) I think you didn't hit mbuf resource shortage situation. Ierr counter is increased whenever controller drops frames due to receiving errors(e.g. CRC). Given that you have no cabling issue, it could be caused by speed/duplex mismatches between bge1 and link partner. Does the link partner also agrees on resolved speed/duplex of bge1?

I had some negotiation problems, but they were observed on the other NIC, bge0. bge0 is connected to the dlink-3627, and bge1 does not always set up the speed/duplex mode correctly. Usually this is solved by the link0 setting.

I set the link0 flag on both bge1 and bge0, and it has been in use for a long time (years). Both bge1 and bge0 had link0 set when I first hit the problem on NAS2. I then cleared link0 and rebooted NAS2. After some time I got the same problem again (the current state).

However, I do not see any obvious problems between bge0 and the AT-x900.
Current state of the bge0 link partner:

awplus>show int port1.0.12
Interface port1.0.12
  Scope: both
  Link is UP, administrative state is UP
  Thrash-limiting
    Status Not Detected, Action learn-disable, Timeout 1(s)
  Hardware is Ethernet, address is .cd29.6e09
  index 5012 metric 1 mru 1522
  current duplex full, current speed 1000, polarity auto
  configured duplex auto, configured speed auto
  <UP,BROADCAST,RUNNING,MULTICAST>
  VRF Binding: Not bound
  SNMP link-status traps: Disabled
    input packets 136255660241, bytes 119549292157319, dropped 0, multicast packets 5482013
    output packets 122988526534, bytes 121030195520423, multicast packets 532582 broadcast packets 2198512

awplus>show int port1.0.12 status
Port        Name  Status     Vlan  Duplex  Speed   Type
port1.0.12        connected  55    a-full  a-1000  1000BASE-T

awplus>sh mac address-table |i port1.0.12
55 port1.0.12 001b.78a3.3c01 forward dynamic

nas2# ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
ether 00:1b:78:a3:3c:01
inet XX.XX.6.133 netmask 0xffc0 broadcast XX.XX.6.191
media: Ethernet autoselect (1000baseT full-duplex)
status: active

I tried ping -i .01 XX.XX.6.133 from another host:

nas2# netstat -hI bge1 1
            input         (bge1)           output
   packets  errs idrops  bytes    packets  errs  bytes colls
         0     0      0      0          0     0      0     0
         0     1      0      0          0     0      0     0
         0     0      0      0          0     0      0     0
         0     0      0      0          0     0      0     0
ping->
         0    33      0      0          0     0      0     0
         0    94      0      0          0     0      0     0
         0    93      0      0          0     0      0     0
         0    94      0      0          0     0      0     0
         0    94      0      0          0     0      0     0
         0    83      0      0          0     0      0     0
         0     0      0      0          0     0      0     0

ping -i .01 XX.XX.6.129 from NAS2 (XX.XX.6.129 has a static ARP entry):

nas2# netstat -hI bge1 1
            input         (bge1)           output
   packets  errs idrops  bytes    packets  errs  bytes colls
         0     1      0      0          0     0      0     0
         0     1      0      0          0     0      0     0
         0     0      0      0          0     0      0     0
ping->
         0     0      0      0          0     0      0     0
         0    40      0      0         62     0   5.9K     0
         0    93      0      0         89     0   8.5K     0
         0    91      0      0         89     0   8.5K     0
         0    91      0      0         88     0   8.4K     0
         0    91      0      0         89     0   8.5K     0
         0    92      0      0         88     0   8.4K     0
         0    93      0      0         88     0   8.4K     0
         0    92      0      0         89     0   8.5K     0
         0     0      0      0         85     0   8.1K     0
         0    87      0      0          0     0      0     0

ping -i .01 XX.XX.6.133 from another host:

before:
nas2# netstat -ndI bge1
Name Mtu Network Address Ipkts Ierrs Idrop
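The question of whether Ierrs grows while Ipkts stays flat can be answered by subtracting two netstat -ndI snapshots. The following is only a sketch: the "before" values are the totals quoted earlier in this thread, the "after" values are hypothetical because the second snapshot is cut off here, and the names IfCounters and input_error_report are illustrative.

```python
from dataclasses import dataclass


@dataclass
class IfCounters:
    """Input counters copied by hand from one `netstat -ndI bge1` line."""
    ipkts: int
    ierrs: int


def input_error_report(before, after):
    """Return (new packets, new errors, rx_stalled) between two snapshots.

    rx_stalled is True when errors were counted but no packet was received,
    which is the pattern described in this thread.
    """
    d_pkts = after.ipkts - before.ipkts
    d_errs = after.ierrs - before.ierrs
    return d_pkts, d_errs, (d_errs > 0 and d_pkts == 0)


if __name__ == "__main__":
    before = IfCounters(ipkts=418543876, ierrs=1972918)  # totals quoted in the thread
    after = IfCounters(ipkts=418543876, ierrs=1973012)   # hypothetical second snapshot
    print(input_error_report(before, after))
```

If the third element is True while another host is pinging the interface, frames are reaching the controller but are all being counted as receive errors.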
Re: Problems with bge (possibly related to r208993)
On Wednesday 16 June 2010 03:21:09 Pyun YongHyeon wrote:
> Hmm, why you need link0 flag? The link0 flag is used to force the interface MASTER. Normally this configuration is automatically done during auto-negotiation such that one is configured as MASTER and the other is configured as SLAVE. If you manually configure this setting you should be very careful not to use the same configuration of MASTER/SLAVE of link partner. If you have to use link0 option, the link partner should be configured to use SLAVE. Normally you should always use auto-negotiation on 1000baseT unless link partner is severely broken to support NWAY.
>
> It seems link partner does not agree on resolved speed/duplex configuration of bge1. Check link partner's resolved link configuration.

In any case, I no longer use the link0 flag; master/slave is now assigned through auto-negotiation. link0 was not set before the problem occurred this time: I cleared the link0 flag on NAS2 when I hit the problem the first time. Now x900 port1.0.12 and bge0 are configured automatically.

bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
media: Ethernet autoselect (1000baseT full-duplex)
status: active

awplus>show int port1.0.12 status
Port        Name  Status     Vlan  Duplex  Speed   Type
port1.0.12        connected  55    a-full  a-1000  1000BASE-T
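Checking that both ends resolved the same speed and duplex can be done mechanically from the two outputs shown in this message. The sketch below assumes the switch's "a-" prefix means "auto-negotiated" and that the ifconfig media line has the shape shown here; the function names are hypothetical. It compares only speed and duplex and says nothing about MASTER/SLAVE resolution.

```python
import re


def parse_ifconfig_media(line):
    """Parse e.g. 'media: Ethernet autoselect (1000baseT full-duplex)'.

    Returns (speed_mbps, duplex) or None if the line does not match.
    """
    m = re.search(r"\((\d+)base\S*\s+<?(full|half)-duplex", line)
    if not m:
        return None
    return int(m.group(1)), m.group(2)


def parse_switch_status(duplex_field, speed_field):
    """Parse switch status fields such as 'a-full' and 'a-1000'."""
    duplex = duplex_field.split("-")[-1]
    speed = int(speed_field.split("-")[-1])
    return speed, duplex


def link_agrees(ifconfig_line, duplex_field, speed_field):
    """True when host and switch report the same speed and duplex."""
    return parse_ifconfig_media(ifconfig_line) == parse_switch_status(
        duplex_field, speed_field
    )


if __name__ == "__main__":
    host = "media: Ethernet autoselect (1000baseT full-duplex)"
    print(link_agrees(host, "a-full", "a-1000"))
```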
Re: Problems with bge (possibly related to r208993)
On Tuesday 15 June 2010 01:03:43 Pyun YongHyeon wrote:
> On Sun, Jun 13, 2010 at 07:34:11PM +0400, Artem Kim wrote:
>> Hi,
>>
>> I have two routers (HP DL140G3):
>>
>> NAS3 FreeBSD 8.1-PRERELEASE #0: Thu Jun 3 04:13:07 MSD 2010 i386
>> NAS2 FreeBSD 8.1-PRERELEASE #0: Sat Jun 12 16:42:19 UTC 2010 i386 (r208993 included)
>>
>> bge0@pci0:19:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
>> vendor = 'Broadcom Corporation'
>> device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
>> class = network
>> subclass = ethernet
>> bge1@pci0:20:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
>> vendor = 'Broadcom Corporation'
>> device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
>> class = network
>> subclass = ethernet
>>
>> I have some problems with bge on NAS2. After some time (about 15 hours) bge1 stops flowing traffic.
>>
>> NAS3 NAS3 - pppoe server. Through bge1 passes only ip traffic through bge0 no ip-traffic. Problems occur only with the bge1 interface on NAS2.
>>
>> Traffic through bge1 not pass until I will not do ifconfig bge1 down ifconfig bge1 up.
>>
>> When I do ifconfig bge0 down NIC does not shutdown:
>>
>> nas2 # ifconfig bge1 down
>> nas2 #
>> nas2 # ifconfig bge1
>> bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>> options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
>> ether X
>> inet YYY netmask 0xffc0 broadcast
>> media: Ethernet autoselect (1000baseT full-duplex)
>> status: active
>>
>> LED also indicates that the NIC is active.
>>
>> I left the NAS in a state of frozen bge1 - and can provide additional information for diagnosis.
>
> Try run tcpdump on bge1 and see whether driver still see incoming traffic. Also show me the output of netstat -ndI bge1 and output of sysctl dev.bge.1.stats. Verbose dmesg output also would be helpful.

nas2 # netstat -ndI bge1
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
bge1 1500 Link#3 00:1b:78:a3:3c:01 418543876 1972918 0 446063237 0 0 0
bge1 1500 XX.XX.6.12 XX.XX.6.133 890,306 - - 1,076,833 - - -

Should I add additional debugging options?

nas2 # sysctl dev.bge.1.stats
sysctl: unknown oid 'dev.bge.1.stats'
nas2 # sysctl dev.bge.1
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
dev.bge.1.%driver: bge
dev.bge.1.%location: slot=0 function=0
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x103c subdevice=0x3260 class=0x02
dev.bge.1.%parent: pci20
dev.bge.1.forced_collapse: 0

I can provide verbose dmesg output, but that requires a reboot, which would bring bge1 out of its current state.

I ran tcpdump on NAS2 and saw only the ARP requests sent by NAS2 itself (NAS2 is XX.XX.6.133):

nas2 # tcpdump -i bge1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bge1, link-type EN10MB (Ethernet), capture size 96 bytes
01:23:43.063238 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
01:23:43.162257 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
01:23:43.935016 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28

XX.XX.6.129 is the L3 switch (AT-X900), the default router for NAS2; bge1 is directly connected to the x900.

I ran tcpdump on the x900:

awplus # tcpdump -ni vlanXX host XX.XX.6.133
05:36:30.455642 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:30.455898 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:31.483353 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:31.483505 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:32.511260 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:32.511353 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:33.539163 arp who-has XX.XX.6.129 tell XX.XX.6.133

The x900 sees the ARP requests from NAS2 (XX.XX.6.133) and replies to them, but on NAS2 I can _only_ see the ARP requests from NAS2.

I added a static ARP entry on NAS2 and ran ping XX.XX.6.129. Then I looked again at tcpdump on XX.XX.6.129:

awplus # tcpdump -nei vlanXX host XX.XX.6.133
06:13:03.472539 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:03.526768 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 (0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq 1920, length 64
06:13:04.553728 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 (0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq 1921, length 64
06:13:04.554495 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:05.554486 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:05.581488 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 (0x0800), length 98: XX.XX.6.133
Problems with bge (possibly related to r208993)
Hi,

I have two routers (HP DL140G3):

NAS3 FreeBSD 8.1-PRERELEASE #0: Thu Jun 3 04:13:07 MSD 2010 i386
NAS2 FreeBSD 8.1-PRERELEASE #0: Sat Jun 12 16:42:19 UTC 2010 i386 (r208993 included)

bge0@pci0:19:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
class = network
subclass = ethernet
bge1@pci0:20:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
class = network
subclass = ethernet

I have some problems with bge on NAS2. After some time (about 15 hours) bge1 stops passing traffic.

NAS3 NAS3 - pppoe server. Only IP traffic passes through bge1; bge0 carries no IP traffic. The problem occurs only with the bge1 interface on NAS2.

Traffic does not pass through bge1 until I run ifconfig bge1 down and then ifconfig bge1 up.

When I run ifconfig bge1 down, the NIC does not shut down:

nas2 # ifconfig bge1 down
nas2 #
nas2 # ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
ether X
inet YYY netmask 0xffc0 broadcast
media: Ethernet autoselect (1000baseT full-duplex)
status: active

The LED also indicates that the NIC is active.

I have left the NAS with bge1 in the frozen state and can provide additional information for diagnosis.
Re: FreeBSD 7.3/i386 libalias related panic
On Tuesday 06 April 2010 23:24:52 Peter Jeremy wrote:
> On 2010-Apr-06 00:37:51 +0400, Artem Kim artem_...@inbox.ru wrote:
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 01
>> fault virtual address = 0x7d4c
>
> This suggests an offset from a NULL pointer.
>
>> 0x8069ac41 is in DeleteLink (/usr/src/sys/netinet/libalias/alias_db.c:857).
>> 852 {
>> 853 struct libalias *la = lnk->la;
>> 854
>> 855 LIBALIAS_LOCK_ASSERT(la);
>> 856 /* Don't do anything if the link is marked permanent */
>> 857 if (la->deleteAllLinks == 0 && lnk->flags & LINK_PERMANENT)
>> 858 return;
>
>> (kgdb) bt
>> #7 0x8069ac41 in DeleteLink (lnk=0x84e0f980) at /usr/src/sys/netinet/libalias/alias_db.c:853
>> #8 0x8069ae3e in HouseKeeping (la=0x84874000) at /usr/src/sys/netinet/libalias/alias_db.c:843
>
> In the absence of someone who's seen this before, my initial guess is that lnk->la is corrupted in frame #7. I'd start with 'print *lnk' at frame #7 to confirm this. If so, you could go up to frame #8 and work through the linkTableOut chain to find which entry is corrupt - but actually finding _why_ it's corrupt will take a lot more work.
>
> If this is repeatable, I'd suggest adding WITNESS, WITNESS_SKIPSPIN and INVARIANTS and see if you can get the problem to show up closer to its cause.

I have three nearly identical machines (two HP DL-140G3 and one HP DL-160G5) with approximately the same settings. The problem occurred on only one of them (a 140G3). The two panics happened within one hour of each other. The last one was three days ago, and the problem has not recurred since.

Adding extra debugging options to the kernel is very difficult because the machine is under heavy load. On a test bench I cannot reproduce the problem.

(kgdb) f 7
#7 0x8069ac41 in DeleteLink (lnk=0x84e0f980) at /usr/src/sys/netinet/libalias/alias_db.c:853
853 struct libalias *la = lnk->la;
(kgdb) print *lnk
$1 = {la = 0x0, src_addr = {s_addr = 1}, dst_addr = {s_addr = 0}, alias_addr = {s_addr = 0}, proxy_addr = {s_addr = 0}, src_port = 0, dst_port = 0, alias_port = 0, proxy_port = 0, server = 0x0, link_type = 0, flags = 0, pflags = 0, timestamp = 0, expire_time = 0, list_out = {le_next = 0x0, le_prev = 0x853dcdb4}, list_in = {le_next = 0x0, le_prev = 0x84861c48}, data = {frag_ptr = 0x0, frag_addr = {s_addr = 0}, tcp = 0x0}}

I'm sorry, but I do not understand what I should do next.
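The "print *lnk" output above shows la = 0x0, which fits the earlier remark that 0x7d4c looks like an offset from a NULL pointer. One way to carry out the suggestion of walking the linkTableOut chain is to note the la value of each entry in kgdb and check the list offline. The following is a rough sketch of that bookkeeping only: the record format, the function names, and the neighbouring entry are hypothetical; the two addresses marked as coming from the backtrace are the only values taken from the thread.

```python
def implied_offset(fault_address, base_pointer):
    """Offset of the field being read, given the fault address and base pointer."""
    return fault_address - base_pointer


def find_suspect_links(links, expected_la):
    """Return addresses of link records whose `la` back-pointer is not expected_la."""
    return [lnk["addr"] for lnk in links if lnk["la"] != expected_la]


if __name__ == "__main__":
    expected_la = 0x84874000  # la argument of HouseKeeping() in frame #8
    links = [
        {"addr": 0x84E0F900, "la": 0x84874000},  # hypothetical healthy neighbour
        {"addr": 0x84E0F980, "la": 0x0},         # lnk from frame #7, la = 0x0
    ]
    print([hex(a) for a in find_suspect_links(links, expected_la)])
    # With la == NULL, the fault address is the offset of the field that was read.
    print(hex(implied_offset(0x7D4C, 0x0)))
```

This only identifies which entries look wrong; as noted in the quoted reply, finding why they were corrupted is a separate and harder task.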
FreeBSD 7.3/i386 libalias related panic
Hi,

I have a machine that acts as a NAS (mpd5 PPPoE). The same machine also does NAT (ipfw + ng_nat). Not long ago I had two identical kernel panics within one hour:

FreeBSD nas3.xxx.ru 7.3-RELEASE FreeBSD 7.3-RELEASE #0: Sun Mar 21 17:55:26 MSK 2010 i386

nas3# kgdb kernel.debug /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x7d4c
fault code = supervisor read, page not present
instruction pointer = 0x20:0x8069ac41
stack pointer = 0x28:0xd259a8b0
frame pointer = 0x28:0xd259a8c8
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 27 (irq17: bge1)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 1h14m2s
Physical memory: 1014 MB
Dumping 103 MB: 88 72 56 40 24bge1: watchdog timeout -- resetting
8 5bge1: link state changed to DOWN

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0 doadump () at pcpu.h:196
196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) list *0x8069ac41
0x8069ac41 is in DeleteLink (/usr/src/sys/netinet/libalias/alias_db.c:857).
852 {
853 struct libalias *la = lnk->la;
854
855 LIBALIAS_LOCK_ASSERT(la);
856 /* Don't do anything if the link is marked permanent */
857 if (la->deleteAllLinks == 0 && lnk->flags & LINK_PERMANENT)
858 return;
859
860 #ifndef NO_FW_PUNCH
861 /* Delete associated firewall hole, if any */
(kgdb)
(kgdb) bt
#0 doadump () at pcpu.h:196
#1 0x8059ce94 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0x8059d31a in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0x807855dd in trap_fatal (frame=0xd259a870, eva=40) at /usr/src/sys/i386/i386/trap.c:950
#4 0x8078595a in trap_pfault (frame=0xd259a870, usermode=0, eva=32076) at /usr/src/sys/i386/i386/trap.c:863
#5 0x80786277 in trap (frame=0xd259a870) at /usr/src/sys/i386/i386/trap.c:541
#6 0x8076b0eb in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7 0x8069ac41 in DeleteLink (lnk=0x84e0f980) at /usr/src/sys/netinet/libalias/alias_db.c:853
#8 0x8069ae3e in HouseKeeping (la=0x84874000) at /usr/src/sys/netinet/libalias/alias_db.c:843
#9 0x8069947b in LibAliasInLocked (la=0x84874000, ptr=0x8458e810 "E", maxpacketsize=2032) at /usr/src/sys/netinet/libalias/alias.c:1246
#10 0x8069a225 in LibAliasIn (la=0x84874000, ptr=0x8458e810 "E", maxpacketsize=2032) at /usr/src/sys/netinet/libalias/alias.c:1228
#11 0x8065fd91 in ng_nat_rcvdata (hook=0x84842900, item=0x84cebba0) at /usr/src/sys/netgraph/ng_nat.c:707
#12 0x80658606 in ng_apply_item (node=0x847de780, item=0x84cebba0, rw=1) at /usr/src/sys/netgraph/ng_base.c:2336
#13 0x80657607 in ng_snd_item (item=0x84cebba0, flags=Variable "flags" is not available.
) at /usr/src/sys/netgraph/ng_base.c:2254
#14 0x8067e4b6 in ipfw_check_in (arg=0x0, m0=0xd259aba8, ifp=0x84179800, dir=1, inp=0x0) at /usr/src/sys/netinet/ip_fw_pfil.c:189
#15 0x8064af6f in pfil_run_hooks (ph=0x80847c00, mp=0xd259ac00, ifp=0x84179800, dir=1, inp=0x0) at /usr/src/sys/net/pfil.c:78
#16 0x806812bd in ip_input (m=0x87135900) at /usr/src/sys/netinet/ip_input.c:416
#17 0x8063efba in ether_demux (ifp=0x84179800, m=0x87135900) at /usr/src/sys/net/if_ethersubr.c:834
#18 0x8063f1d6 in ether_input (ifp=0x84179800, m=0x87135900) at /usr/src/sys/net/if_ethersubr.c:692
#19 0x80490c8f in bge_rxeof (sc=0x84187000, rx_prod=465, holdlck=1) at /usr/src/sys/dev/bge/if_bge.c:3392
#20 0x80492d67 in bge_intr (xsc=0x84187000) at /usr/src/sys/dev/bge/if_bge.c:3653
#21 0x8057c7bb in ithread_loop (arg=0x84180500) at /usr/src/sys/kern/kern_intr.c:1181
#22 0x80578f25 in fork_exit (callout=0x8057c698 <ithread_loop>, arg=0x84180500, frame=0xd259ad38) at /usr/src/sys/kern/kern_fork.c:811
#23 0x8076b160 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271

Thanks for any help!
Re: 7.2-PRERELEASE X-server hang in drmwtq
I checked 7.2 RC2; the problem is still there.

I found a way to reproduce the problem easily. I used KDE 4.2.2 with the composite manager enabled. The problem occurs when two applications are started so that their windows appear at the same time.

I can reproduce the problem on a Radeon 9800 XT (AMD64 UP) and a Radeon X550 (AMD64 SMP).
Re: 7.2-PRERELEASE X-server hang in drmwtq
On Saturday 25 April 2009 19:18:43 Robert Noland wrote:
> On Sat, 2009-04-25 at 16:24 +0400, Artem Kim wrote:
>> I checked 7.2 RC2 problem still here.
>>
>> I found a way to reproduce the problem easily. I used KDE 4.2.2 composite manager is enabled. The problem occurs when two applications run in a way that their window to appear at the same time.
>
> Ok, luckily I don't think that KDE is important... compositing might be. Can you give a more complete example of how to trigger the hang? I don't have any r300 based cards handy right now. AMD is sending them though, so it shouldn't be long...
>
>> I can reproduce the problem on the cards Radeon 9800 XT (AMD64 UP) and Radeon X550 (AMD64 SMP).
>
> Are these AGP or PCI(e)?
>
> robert.

I'm using KDE 4.2.2 as a test. The problem occurs only if the composite manager is enabled, and it occurs spontaneously when a new window is created.

A reliable way to reproduce it is to start several applications that create new windows at the same time. Typically a window appears on the screen with some delay after the application is started, and the delay before a new window is drawn depends on the application. The problem occurs if several applications open new windows (the windows start being drawn on the screen) at about the same time.

Quickly launching (the speed is important) Konqueror, System Settings, and File Manager one after another is enough to reproduce the problem.

The problem looks like this: the X server is in the drmwtq state. The screen freezes or just turns off. The keyboard sometimes works, sometimes not.

The 9800 is an AGP card in the UP system; the X550 is a PCI-E card in the SMP AMD64 system.
Re: 7.2-PRERELEASE X-server hang in drmwtq
>> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] pid = 782, cmd = 0x80046457, nr = 0x57, dev 0xff0001556d00, auth = 1
>> Apr 25 23:44:04 test kernel: [drm: pid782: drm_ioctl] returning 4
>
> Ok, so what this is saying is that pid 782 is waiting on the rendering engine to catch up. The returning 4 part says that we were interrupted while we were waiting. libdrm retries the wait, which should return immediately if the engine has caught up now. It never appears to catch up, so either the counter is getting corrupted or we failed to get the commands submitted to the card like we thought, or we have locked up the GPU.
>
> What does it take to recover from this? Do you have to reboot, or is killing the process that initiated the wait sufficient?
>
> robert.

In most cases the system remains reachable over the network, and the computer can be turned off with the ACPI power button.

However, if I run kill -KILL on the Xorg PID, it is then impossible to shut the system down cleanly. The system is still reachable over the network, but Xorg keeps running and uses up to 100% of one CPU core. The following messages appear in the kernel log:

Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_wait_for_fifo] wait for fifo failed status : 0x8411413D 0x9C000800
Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_release] radeon_do_cp_idle -16
Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_cp_idle]

After that, rebooting the system is possible only via a hardware reset.
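The behaviour described in the quoted explanation, where a wait that returns 4 (EINTR) is simply retried, can be illustrated with a small loop. This is a sketch of the general retry pattern only, not libdrm's actual code; wait_once is a hypothetical stand-in for the ioctl call, and the retry limit is an addition for illustration so that a wait which never succeeds, as in this report, becomes visible instead of spinning forever.

```python
import errno


def wait_with_retry(wait_once, max_retries=5):
    """Call wait_once() until it stops returning EINTR or retries run out.

    Returns (last_result, attempts). A final result of errno.EINTR means the
    wait was still being interrupted when the retry budget was exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        rc = wait_once()
        if rc != errno.EINTR or attempts > max_retries:
            return rc, attempts


if __name__ == "__main__":
    # Simulated engine that catches up on the third attempt.
    outcomes = iter([errno.EINTR, errno.EINTR, 0])
    print(wait_with_retry(lambda: next(outcomes)))
```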
7.2-PRERELEASE X-server hang in drmwtq
Hi.

Lately I have had a stability problem on my system:

7.2-PRERELEASE Thu Apr 2 20:20:31 MSD 2009 amd64 (UP); ATI 9800 XT

From time to time the X server goes into the drmwtq state if AIGLX is enabled. This usually happens when a new window is created.

If I set hw.dri.0.debug to 1, I get a lot of messages like these:

[drm: pid1469: drm_ioctl] pid = 1469, cmd = 0x80046457, nr = 0x57, dev 0xff0001306800, auth = 1
[drm: pid1469: drm_ioctl] returning -1

I can see a recurring sequence in ktrace:

1469 Xorg PSIG SIGALRM caught handler=0x4dca90 mask=0x0 code=0x0
1469 Xorg CALL sigreturn(0x7fffe5b0)
1469 Xorg RET sigreturn JUSTRETURN
1469 Xorg CALL ioctl(0xa,0x80046457,0x8156e807c)
1469 Xorg RET ioctl RESTART

The problems started after the vblank rework in STABLE. At first I got a panic when I tried to restart or shut down the X server, but the panic was solved (for me ;)) quickly.

I am ready to provide any additional information.

Many thanks for your work.
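The cmd value in the debug messages can be split into its parts to confirm that it matches the nr = 0x57 printed next to it. The sketch below assumes the usual BSD ioctl command encoding (direction in the top three bits, a 13-bit parameter length, an 8-bit group character, and an 8-bit command number); the constants are written out here rather than imported, and the function name is hypothetical.

```python
IOCPARM_MASK = 0x1FFF   # parameter length occupies 13 bits
IOC_VOID = 0x20000000   # no parameters
IOC_OUT = 0x40000000    # kernel copies parameters out
IOC_IN = 0x80000000     # kernel copies parameters in


def decode_ioctl(cmd):
    """Split a BSD-style ioctl command word into direction, length, group, number."""
    directions = []
    if cmd & IOC_IN:
        directions.append("in")
    if cmd & IOC_OUT:
        directions.append("out")
    if cmd & IOC_VOID:
        directions.append("void")
    return {
        "dir": directions,
        "len": (cmd >> 16) & IOCPARM_MASK,
        "group": chr((cmd >> 8) & 0xFF),
        "nr": cmd & 0xFF,
    }


if __name__ == "__main__":
    print(decode_ioctl(0x80046457))
```

For 0x80046457 this gives a 4-byte argument copied in, group 'd', and command number 0x57, which agrees with the nr field in the log.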