8.2-PRERELEASE panic in NETGRAPH (ng_pppoe)

2010-12-03 Thread Artem Kim
Hello.

I have a problem on one of my PPPoE routers:

smp Xeon X5472, network adapter 82575EB 
mpd5,
FreeBSD nas4 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Thu Dec  2 19:07:46 MSK 
2010 x...@nas4:/usr/obj/usr/src/sys/router  i386

extra kernel config:

options KVA_PAGES=512
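
For context, a rough sketch of what this option implies, assuming a non-PAE i386 kernel where each KVA_PAGES unit covers 4 MB of kernel virtual address space (the helper name kernel_va_layout is made up for illustration). With 512 units the kernel gets 2 GB, which is consistent with the 0x80xxxxxx kernel addresses in the backtrace below:

```python
# Sketch: how KVA_PAGES maps to kernel address space on non-PAE i386.
# Assumption: each KVA_PAGES unit covers 4 MB (one page-directory entry).
MB = 1024 * 1024
KVA_UNIT = 4 * MB


def kernel_va_layout(kva_pages: int) -> tuple[int, int]:
    """Return (kernel VA size in bytes, lowest kernel address)."""
    size = kva_pages * KVA_UNIT
    base = (1 << 32) - size  # kernel VA sits at the top of the 4 GB space
    return size, base


size, base = kernel_va_layout(512)
print(f"kernel VA: {size // MB} MB, starting at {base:#010x}")
```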

sysctl:

kern.ipc.maxsockbuf=524288
kern.ipc.nmbclusters=65535

net.graph.recvspace=40960
net.graph.maxdgram=40960
net.graph.maxdata=1024

net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1

vm.kmem_size=1512M
vm.kmem_size_max=1512M

This panic is the second similar incident in the last 7 hours.
Some debug info:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x44
fault code  = supervisor read, page not present
instruction pointer = 0x20:0x805dfb56
stack pointer   = 0x28:0xfbbab944
frame pointer   = 0x28:0xfbbab970
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1944 (mpd5)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0x805fce6d at kdb_backtrace+0x48
#1 0x805cdb9c at panic+0x108
#2 0x8079bbd2 at trap_fatal+0x24c
#3 0x8079bf8e at trap_pfault+0x270
#4 0x8079c3db at trap+0x371
#5 0x807842dc at calltrap+0x6
#6 0x8068c2ae at ng_uncallout+0x1b
#7 0x8069c454 at ng_pppoe_disconnect+0xf8
#8 0x8068d5cc at ng_destroy_hook+0xe0
#9 0x8068e5e9 at ng_apply_item+0x903
#10 0x8068cea7 at ng_snd_item+0x2e9
#11 0x806a04f8 at ngc_send+0x1d3
#12 0x8062e01a at sosend_generic+0x2aa
#13 0x80631df0 at kern_sendit+0xfc
#14 0x8063203f at sendit+0xcd
#15 0x80632122 at sendto+0x48
#16 0x80608641 at syscallenter+0x28d
#17 0x8079bfef at syscall+0x2e
Uptime: 7h53m5s
Physical memory: 2038 MB
Dumping 253 MB: 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0  doadump () at pcpu.h:231
231             __asm("movl %%fs:0,%0" : "=r" (td));
(kgdb) f 6
#6  0x807842dc in calltrap () at /usr/src/sys/i386/i386/exception.s:166
166             call    trap
Current language:  auto; currently asm
(kgdb) l
161             SET_KERNEL_SREGS
162             cld
163             FAKE_MCOUNT(TF_EIP(%esp))
164     calltrap:
165             pushl   %esp
166             call    trap
167             add     $4, %esp
168
169             /*
170              * Return via doreti to handle ASTs.
(kgdb) up
#7  0x805dfb56 in _callout_stop_safe (c=0x8c29f008, safe=0) at 
/usr/src/sys/kern/kern_timeout.c:683
683                     if (c->c_lock == &Giant.lock_object)
Current language:  auto; currently c
(kgdb) l
678             /*
679              * Some old subsystems don't hold Giant while running a callout_stop(),
680              * so just discard this check for the moment.
681              */
682             if (!safe && c->c_lock != NULL) {
683                     if (c->c_lock == &Giant.lock_object)
684                             use_lock = mtx_owned(&Giant);
685                     else {
686                             use_lock = 1;
687                             class = LOCK_CLASS(c->c_lock);
(kgdb) p *c
$1 = {c_links = {sle = {sle_next = 0x9}, tqe = {tqe_next = 0x9, tqe_prev = 
0x40}}, c_time = 10, c_arg = 0x40, c_func = 0xc, c_lock = 0x40, c_flags = 13, 
c_cpu = 64}
(kgdb) up
#8  0x8068c2ae in ng_uncallout (c=0x8c29f008, node=0x874abb00) at 
/usr/src/sys/netgraph/ng_base.c:3732
3732            rval = callout_stop(c);
(kgdb) l
3727            int rval;
3728
3729            KASSERT(c != NULL, ("ng_uncallout: NULL callout"));
3730            KASSERT(node != NULL, ("ng_uncallout: NULL node"));
3731
3732            rval = callout_stop(c);
3733            item = c->c_arg;
3734            /* Do an extra check */
3735            if ((rval > 0) && (c->c_func == &ng_callout_trampoline) &&
3736                (NGI_NODE(item) == node)) {
(kgdb) up  
#9  0x8069c454 in ng_pppoe_disconnect (hook=0x88bcb880) at 
/usr/src/sys/netgraph/ng_pppoe.c:1791
1791                    ng_uncallout(&sp->neg->handle, node);
(kgdb) l
1786            /*
1787             * As long as we have somewhere to store the timeout handle,
1788             * we may have a timeout pending.. get rid of it.
1789             */
1790            if (sp->neg) {
1791                    ng_uncallout(&sp->neg->handle, node);
1792                    if (sp->neg->m)
1793                            m_freem(sp->neg->m);
1794                    free(sp->neg, M_NETGRAPH_PPPOE);
1795            }
(kgdb) p sp
$2 = 0x89391180
(kgdb) p *sp
$3 = {hook = 0x88bcb880, Session_ID = 0, state = PPPOE_SOFFER, creator = 47, 
pkt_hdr = {eh = {ether_dhost = "\000\000\000\000\000", ether_shost = 
"\000\000\000\000\000", 
  ether_type = 0}, ph = {ver = 0 '\0', type = 0 '\0', code = 0 '\0', sid = 
0, length = 0}}, neg = 0x8c29f000, sessions = {le_next = 0x0, le_prev = 0x0}}
(kgdb) p *sp->neg 
$4 = {m = 0x7, pkt = 0x40, handle 
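
One possible reading of the numbers above, offered as a sketch rather than a diagnosis: the callout at 0x8c29f008 sits 8 bytes into *sp->neg, its c_lock field holds the garbage value 0x40, and the fault address 0x44 is what a read of c_lock->lo_flags (as done by LOCK_CLASS()) would touch. The struct layout below is an assumption about struct lock_object on i386, with pointers modelled as 32-bit integers; the class name LockObject32 is made up for illustration.

```python
import ctypes


class LockObject32(ctypes.Structure):
    """Assumed i386 layout of struct lock_object (pointers as 32-bit ints)."""
    _fields_ = [
        ("lo_name", ctypes.c_uint32),     # const char *
        ("lo_flags", ctypes.c_uint32),    # u_int
        ("lo_data", ctypes.c_uint32),     # u_int
        ("lo_witness", ctypes.c_uint32),  # struct witness *
    ]


C_LOCK = 0x40          # c_lock from "p *c"
FAULT_ADDR = 0x44      # fault virtual address from the trap message
NEG = 0x8C29F000       # neg from "p *sp"
CALLOUT = 0x8C29F008   # c passed to ng_uncallout() and _callout_stop_safe()

# Address that reading c_lock->lo_flags would touch.
lo_flags_addr = C_LOCK + LockObject32.lo_flags.offset
print(hex(lo_flags_addr), lo_flags_addr == FAULT_ADDR)
print("callout offset inside *sp->neg:", CALLOUT - NEG)
```

If that reading holds, the memory behind sp->neg no longer contained a valid callout when ng_pppoe_disconnect() ran, which would also fit the implausible m = 0x7 and pkt = 0x40 values.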

Re: 8.2-PRERELEASE panic in NETGRAPH (ng_pppoe)

2010-12-03 Thread Artem Kim
I wrote ng_pppoe in the subject, but that is probably not accurate.

It looks like either a more general NETGRAPH problem or something wrong with my system.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Problems with bge (possibly related to r208993)

2010-06-15 Thread Artem Kim
On Tuesday 15 June 2010 21:50:03 you wrote:
. . .
  nas2# netstat -ndI bge1
  Name  Mtu Network    Address               Ipkts   Ierrs Idrop     Opkts Oerrs Coll Drop
  bge1 1500 Link#3     00:1b:78:a3:3c:01 418543876 1972918     0 446063237     0    0    0
  bge1 1500 XX.XX.6.12 XX.XX.6.133         890,306       -     - 1,076,833     -    -    -
 
 Ok, I see very large number of Ierrs here. When you send some packets
 from other hosts to nas2(bge1), do you see Ierrs counter is
 increasing?
. . .
 
 It seems RX does not work at all. Because you have zero Drop(from
 netstat) I think you didn't hit mbuf resource shortage situation.
 Ierr counter is increased whenever controller drops frames due to
 receiving errors(e.g. CRC). Given that you have no cabling issue,
 it could be caused by speed/duplex mismatches between bge1 and link
 partner. Does the link partner also agrees on resolved speed/duplex
 of bge1?

I did have some negotiation problems, but they were observed on the other 
NIC, bge0. bge0 is connected to the dlink-3627, and bge1 does not always set 
up the speed/duplex mode correctly. This is usually solved by setting link0. 
I set the link0 flag on both bge1 and bge0, and it had been in use for a long 
time (years).

Both bge1 and bge0 had link0 set when I hit the problem on NAS2 for the first 
time. I then cleared link0 and rebooted NAS2. After some time I got the same 
problem again (the current state). However, I do not see any obvious problems 
with bge0 - AT-x900.

current state of the bge0 link partner:

awplus>show int port1.0.12
Interface port1.0.12
  Scope: both
  Link is UP, administrative state is UP
  Thrash-limiting
Status Not Detected, Action learn-disable, Timeout 1(s)
  Hardware is Ethernet, address is .cd29.6e09
  index 5012 metric 1 mru 1522
  current duplex full, current speed 1000, polarity auto
  configured duplex auto, configured speed auto
  UP,BROADCAST,RUNNING,MULTICAST
  VRF Binding: Not bound
  SNMP link-status traps: Disabled
input packets 136255660241, bytes 119549292157319, dropped 0, multicast 
packets 5482013
output packets 122988526534, bytes 121030195520423, multicast packets 
532582 broadcast packets 2198512

awplus>show int port1.0.12 status
Port       Name   Status     Vlan Duplex  Speed  Type
port1.0.12        connected  55   a-full  a-1000 1000BASE-T

awplus>sh mac address-table | i port1.0.12
55   port1.0.12   001b.78a3.3c01   forward   dynamic


nas2# ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:1b:78:a3:3c:01
        inet XX.XX.6.133 netmask 0xffc0 broadcast XX.XX.6.191
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active


I ran ping -i .01 XX.XX.6.133 from another host:
nas2# netstat -hI bge1 1
            input         (bge1)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         0     0      0          0          0     0          0     0
         0     1      0          0          0     0          0     0
         0     0      0          0          0     0          0     0
         0     0      0          0          0     0          0     0
ping->   0    33      0          0          0     0          0     0
         0    94      0          0          0     0          0     0
         0    93      0          0          0     0          0     0
         0    94      0          0          0     0          0     0
         0    94      0          0          0     0          0     0
         0    83      0          0          0     0          0     0
         0     0      0          0          0     0          0     0
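
A quick sanity check of these counters, as a sketch: it assumes the fused values in the output (for example "094") mean 0 received packets and 94 input errors, and that ping -i .01 sends roughly 100 echo requests per second. Under those assumptions nearly every incoming frame is being counted as an input error and none is delivered.

```python
# Per-second "errs" samples taken while the remote ping was running.
# Assumption: a fused value such as "094" means 0 packets and 94 errors.
errs_per_sec = [33, 94, 93, 94, 94, 83]
pkts_per_sec = [0, 0, 0, 0, 0, 0]

PING_INTERVAL = 0.01  # seconds, from "ping -i .01"
expected_rate = 1 / PING_INTERVAL

# Drop the first and last samples, which cover only part of a second.
full_seconds = errs_per_sec[1:-1]
avg_errs = sum(full_seconds) / len(full_seconds)
print(f"expected ~{expected_rate:.0f} frames/s, "
      f"saw {avg_errs:.2f} errs/s and {sum(pkts_per_sec)} packets")
```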


ping -i .01 XX.XX.6.129 from NAS2 (XX.XX.6.129 has a static ARP entry):

nas2# netstat -hI bge1 1
            input         (bge1)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         0     1      0          0          0     0          0     0
         0     1      0          0          0     0          0     0
         0     0      0          0          0     0          0     0
ping->   0     0      0          0          0     0          0     0
         0    40      0          0         62     0       5.9K     0
         0    93      0          0         89     0       8.5K     0
         0    91      0          0         89     0       8.5K     0
         0    91      0          0         88     0       8.4K     0
         0    91      0          0         89     0       8.5K     0
         0    92      0          0         88     0       8.4K     0
         0    93      0          0         88     0       8.4K     0
         0    92      0          0         89     0       8.5K     0
         0     0      0          0         85     0       8.1K     0
         0    87      0          0          0     0          0     0


ping -i .01 XX.XX.6.133 from other host:

before:
nas2# netstat -ndI bge1

NameMtu Network   Address  Ipkts Ierrs Idrop

Re: Problems with bge (possibly related to r208993)

2010-06-15 Thread Artem Kim
On Wednesday 16 June 2010 03:21:09 Pyun YongHyeon wrote:

 Hmm, why you need link0 flag? The link0 flag is used to force the
 interface MASTER. Normally this configuration is automatically
 done during auto-negotiation such that one is configured as MASTER
 and the other is configured as SLAVE. If you manually configure
 this setting you should be very careful not to use the same
 configuration of MASTER/SLAVE of link partner. If you have to use
 link0 option, the link partner should be configured to use SLAVE.
 Normally you should always use auto-negotiation on 1000baseT unless
 link partner is severely broken to support NWAY.
 
 It seems link partner does not agree on resolved speed/duplex
 configuration of bge1. Check link partner's resolved link
 configuration.
 

In any case, I no longer use the link0 flag. 
Master/slave is now assigned through auto-negotiation. 

link0 was not set when the problem occurred this time: I cleared the link0 
flag on NAS2 after I hit the problem the first time.

The x900 port1.0.12 and bge0 are now configured automatically.

bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active

awplus>show int port1.0.12 status
Port       Name   Status     Vlan Duplex  Speed  Type
port1.0.12        connected  55   a-full  a-1000 1000BASE-T


Re: Problems with bge (possibly related to r208993)

2010-06-14 Thread Artem Kim
On Tuesday 15 June 2010 01:03:43 Pyun YongHyeon wrote:
 On Sun, Jun 13, 2010 at 07:34:11PM +0400, Artem Kim wrote:
  Hi,
 
  I have two routers (HP DL140G3):
 
  NAS3 FreeBSD 8.1-PRERELEASE #0: Thu Jun 3 04:13:07 MSD 2010 i386
  NAS2 FreeBSD 8.1-PRERELEASE #0: Sat Jun 12 16:42:19 UTC 2010 i386
  (r208993 included)
 
  bge0@pci0:19:0:0: class=0x02 card=0x3260103c chip=0x165914e4
  rev=0x11 hdr=0x00
      vendor     = 'Broadcom Corporation'
      device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
      class      = network
      subclass   = ethernet
  bge1@pci0:20:0:0: class=0x02 card=0x3260103c chip=0x165914e4
  rev=0x11 hdr=0x00
      vendor     = 'Broadcom Corporation'
      device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
      class      = network
      subclass   = ethernet
 
 
  I have some problems with bge on NAS2.
 
  After some time (about 15 hours) bge1 stops passing traffic.
  NAS3 NAS3 - pppoe server. Only IP traffic passes through bge1; bge0
  carries no IP traffic.
  Problems occur only with the bge1 interface on NAS2.
 
 
  Traffic does not pass through bge1 until I run ifconfig bge1 down;
  ifconfig bge1 up.
 
  When I run ifconfig bge1 down, the NIC does not shut down:
 
  nas2# ifconfig bge1 down
  nas2#
  nas2# ifconfig bge1
  bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
          options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
          ether X
          inet YYY netmask 0xffc0 broadcast 
          media: Ethernet autoselect (1000baseT full-duplex)
          status: active
 
  LED also indicates that the NIC is active.
 
  I left the NAS in a state of frozen bge1 - and can provide additional
  information for diagnosis.
 
 Try run tcpdump on bge1 and see whether driver still see incoming
 traffic. Also show me the output of netstat -ndI bge1 and output
 of sysctl dev.bge.1.stats. Verbose dmesg output also would be
 helpful.

nas2# netstat -ndI bge1
Name  Mtu Network    Address               Ipkts   Ierrs Idrop     Opkts Oerrs Coll Drop
bge1 1500 Link#3     00:1b:78:a3:3c:01 418543876 1972918     0 446063237     0    0    0
bge1 1500 XX.XX.6.12 XX.XX.6.133         890,306       -     - 1,076,833     -    -    -


Should I add additional debugging options?

nas2 # sysctl dev.bge.1.stats
sysctl: unknown oid 'dev.bge.1.stats'

nas2 # sysctl dev.bge.1
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 
0x004101
dev.bge.1.%driver: bge
dev.bge.1.%location: slot=0 function=0
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x103c 
subdevice=0x3260 class=0x02
dev.bge.1.%parent: pci20
dev.bge.1.forced_collapse: 0

I can provide a verbose dmesg, but that requires a reboot, which would take 
bge1 out of its current state.


I ran tcpdump on NAS2 and saw only the ARP requests sent by NAS2 itself 
(NAS2 is XX.XX.6.133):

nas2 # tcpdump -i bge1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bge1, link-type EN10MB (Ethernet), capture size 96 bytes
01:23:43.063238 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
01:23:43.162257 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28
01:23:43.935016 ARP, Request who-has XX.XX.6.129 tell XX.XX.6.133, length 28


XX.XX.6.129 is the L3 switch (AT-X900) that acts as the default router for 
NAS2; bge1 is directly connected to the x900.

I ran tcpdump on the x900:

awplus # tcpdump -ni vlanXX host XX.XX.6.133
05:36:30.455642 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:30.455898 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:31.483353 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:31.483505 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:32.511260 arp who-has XX.XX.6.129 tell XX.XX.6.133
05:36:32.511353 arp reply XX.XX.6.129 is-at 00:00:cd:29:6e:09
05:36:33.539163 arp who-has XX.XX.6.129 tell XX.XX.6.133

The x900 sees the ARP requests from NAS2 (XX.XX.6.133) and replies to them, 
but on NAS2 I can _only_ see the ARP requests sent by NAS2 itself.

I added a static ARP entry on NAS2 and ran ping XX.XX.6.129.

Then I looked at tcpdump on XX.XX.6.129 again:

awplus # tcpdump -nei vlanXX host XX.XX.6.133
06:13:03.472539 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:03.526768 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 
(0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq 
1920, length 64
06:13:04.553728 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 
(0x0800), length 98: XX.XX.6.133 > XX.XX.6.129: ICMP echo request, id 6958, seq 
1921, length 64
06:13:04.554495 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:05.554486 00:00:cd:29:6e:09 > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: arp who-has XX.XX.6.133 tell XX.XX.6.129
06:13:05.581488 00:1b:78:a3:3c:01 > 00:00:cd:29:6e:09, ethertype IPv4 
(0x0800), length 98: XX.XX.6.133

Problems with bge (possibly related to r208993)

2010-06-13 Thread Artem Kim
Hi,

I have two routers (HP DL140G3):

NAS3 FreeBSD 8.1-PRERELEASE #0: Thu Jun 3 04:13:07 MSD 2010 i386
NAS2 FreeBSD 8.1-PRERELEASE #0: Sat Jun 12 16:42:19 UTC 2010 i386 (r208993 
included)

bge0@pci0:19:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
    class      = network
    subclass   = ethernet
bge1@pci0:20:0:0: class=0x02 card=0x3260103c chip=0x165914e4 rev=0x11 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Gigabit Ethernet PCI Express (BCM5721)'
    class      = network
    subclass   = ethernet


I have some problems with bge on NAS2.

After some time (about 15 hours) bge1 stops passing traffic.
NAS3 NAS3 - pppoe server. Only IP traffic passes through bge1; bge0 carries 
no IP traffic.
Problems occur only with the bge1 interface on NAS2.


Traffic does not pass through bge1 until I run ifconfig bge1 down; ifconfig 
bge1 up.

When I run ifconfig bge1 down, the NIC does not shut down:

nas2 # ifconfig bge1 down
nas2 #
nas2 # ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether X
        inet YYY netmask 0xffc0 broadcast 
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
  
The LED also indicates that the NIC is active.

I have left the NAS with bge1 in its frozen state and can provide additional 
information for diagnosis.


Re: FreeBSD 7.3/i386 libalias related panic

2010-04-08 Thread Artem Kim
On Tuesday 06 April 2010 23:24:52 Peter Jeremy wrote:
 On 2010-Apr-06 00:37:51 +0400, Artem Kim artem_...@inbox.ru wrote:
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address   = 0x7d4c
 
 This suggests an offset from a NULL pointer.
 
  0x8069ac41 is in DeleteLink
  (/usr/src/sys/netinet/libalias/alias_db.c:857).
  852 {
  853     struct libalias *la = lnk->la;
  854
  855     LIBALIAS_LOCK_ASSERT(la);
  856     /* Don't do anything if the link is marked permanent */
  857     if (la->deleteAllLinks == 0 && lnk->flags & LINK_PERMANENT)
  858         return;
 
  (kgdb) bt
  #7  0x8069ac41 in DeleteLink (lnk=0x84e0f980) at
  /usr/src/sys/netinet/libalias/alias_db.c:853
  #8  0x8069ae3e in HouseKeeping (la=0x84874000) at
  /usr/src/sys/netinet/libalias/alias_db.c:843
 
 In the absence of someone who's seen this before, my initial guess is
 that lnk->la is corrupted in frame #7.  I'd start with 'print *lnk' at
 frame #7 to confirm this.  If so, you could go up to frame #8 and work
 through the linkTableOut chain to find which entry is corrupt - but
 actually finding _why_ it's corrupt will take a lot more work.
 
 If this is repeatable, I'd suggest adding WITNESS, WITNESS_SKIPSPIN
 and INVARIANTS and see if you can get the problem to show up closer
 to its cause.
 

I have three nearly identical machines (two HP DL-140G3 and one HP 
DL-160G5). They have approximately the same configuration.

The problem occurred on only one of them (a 140G3).

The two panics occurred about an hour apart. The last one happened three days 
ago, and the problem has not recurred since.
Adding extra kernel debugging options is very difficult because the machine 
is under heavy load. I cannot reproduce the problem on a test bench.

(kgdb) f 7
#7  0x8069ac41 in DeleteLink (lnk=0x84e0f980) at 
/usr/src/sys/netinet/libalias/alias_db.c:853
853             struct libalias *la = lnk->la;
(kgdb) print *lnk
$1 = {la = 0x0, src_addr = {s_addr = 1}, dst_addr = {s_addr = 0}, alias_addr = 
{s_addr = 0}, proxy_addr = {s_addr = 0}, src_port = 0, dst_port = 0,
  alias_port = 0, proxy_port = 0, server = 0x0, link_type = 0, flags = 0, 
pflags = 0, timestamp = 0, expire_time = 0, list_out = {le_next = 0x0,
le_prev = 0x853dcdb4}, list_in = {le_next = 0x0, le_prev = 0x84861c48}, 
data = {frag_ptr = 0x0, frag_addr = {s_addr = 0}, tcp = 0x0}}
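
A small sketch of how this output lines up with the trap message: with la = 0x0, the fault address 0x7d4c is simply the byte offset of whatever field was read through la. That the field is deleteAllLinks (line 857) is an inference from the source listing, not something checked against the struct definition.

```python
FAULT_ADDR = 0x7D4C   # "fault virtual address" from the trap message
LA = 0x0              # lnk->la as printed by "print *lnk"

# With a NULL base pointer, the fault address equals the field offset.
offset = FAULT_ADDR - LA
print(f"faulting access is at la + {offset:#x} ({offset} bytes)")
```

The decimal value matches eva=32076 in the trap_pfault frame of the original backtrace.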


I'm sorry, but I do not understand what I should do next.


FreeBSD 7.3/i386 libalias related panic

2010-04-05 Thread Artem Kim
Hi,
I have a machine that acts as a NAS (mpd5 PPPoE).

The same machine also does NAT (ipfw + ng_nat).

Recently I got two identical kernel panics within one hour:


FreeBSD nas3.xxx.ru 7.3-RELEASE FreeBSD 7.3-RELEASE #0: Sun Mar 21 17:55:26 
MSK 2010 i386


nas3# kgdb kernel.debug /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x7d4c
fault code  = supervisor read, page not present
instruction pointer = 0x20:0x8069ac41
stack pointer   = 0x28:0xd259a8b0
frame pointer   = 0x28:0xd259a8c8
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 27 (irq17: bge1)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 1h14m2s
Physical memory: 1014 MB
Dumping 103 MB: 88 72 56 40 24bge1: watchdog timeout -- resetting
 8
5bge1: link state changed to DOWN

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from 
/boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));


(kgdb) list *0x8069ac41
0x8069ac41 is in DeleteLink (/usr/src/sys/netinet/libalias/alias_db.c:857).
852     {
853             struct libalias *la = lnk->la;
854
855             LIBALIAS_LOCK_ASSERT(la);
856             /* Don't do anything if the link is marked permanent */
857             if (la->deleteAllLinks == 0 && lnk->flags & LINK_PERMANENT)
858                     return;
859
860     #ifndef NO_FW_PUNCH
861             /* Delete associated firewall hole, if any */
(kgdb)

(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0x8059ce94 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x8059d31a in panic (fmt=0x104 <Address 0x104 out of bounds>) at 
/usr/src/sys/kern/kern_shutdown.c:574
#3  0x807855dd in trap_fatal (frame=0xd259a870, eva=40) at 
/usr/src/sys/i386/i386/trap.c:950
#4  0x8078595a in trap_pfault (frame=0xd259a870, usermode=0, eva=32076) at 
/usr/src/sys/i386/i386/trap.c:863
#5  0x80786277 in trap (frame=0xd259a870) at /usr/src/sys/i386/i386/trap.c:541
#6  0x8076b0eb in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0x8069ac41 in DeleteLink (lnk=0x84e0f980) at 
/usr/src/sys/netinet/libalias/alias_db.c:853
#8  0x8069ae3e in HouseKeeping (la=0x84874000) at 
/usr/src/sys/netinet/libalias/alias_db.c:843
#9  0x8069947b in LibAliasInLocked (la=0x84874000, ptr=0x8458e810 "E", 
maxpacketsize=2032) at /usr/src/sys/netinet/libalias/alias.c:1246
#10 0x8069a225 in LibAliasIn (la=0x84874000, ptr=0x8458e810 "E", 
maxpacketsize=2032) at /usr/src/sys/netinet/libalias/alias.c:1228
#11 0x8065fd91 in ng_nat_rcvdata (hook=0x84842900, item=0x84cebba0) at 
/usr/src/sys/netgraph/ng_nat.c:707
#12 0x80658606 in ng_apply_item (node=0x847de780, item=0x84cebba0, rw=1) at 
/usr/src/sys/netgraph/ng_base.c:2336
#13 0x80657607 in ng_snd_item (item=0x84cebba0, flags=Variable "flags" is not 
available.
) at /usr/src/sys/netgraph/ng_base.c:2254
#14 0x8067e4b6 in ipfw_check_in (arg=0x0, m0=0xd259aba8, ifp=0x84179800, 
dir=1, inp=0x0) at /usr/src/sys/netinet/ip_fw_pfil.c:189
#15 0x8064af6f in pfil_run_hooks (ph=0x80847c00, mp=0xd259ac00, 
ifp=0x84179800, dir=1, inp=0x0) at /usr/src/sys/net/pfil.c:78
#16 0x806812bd in ip_input (m=0x87135900) at 
/usr/src/sys/netinet/ip_input.c:416
#17 0x8063efba in ether_demux (ifp=0x84179800, m=0x87135900) at 
/usr/src/sys/net/if_ethersubr.c:834
#18 0x8063f1d6 in ether_input (ifp=0x84179800, m=0x87135900) at 
/usr/src/sys/net/if_ethersubr.c:692
#19 0x80490c8f in bge_rxeof (sc=0x84187000, rx_prod=465, holdlck=1) at 
/usr/src/sys/dev/bge/if_bge.c:3392
#20 0x80492d67 in bge_intr (xsc=0x84187000) at 
/usr/src/sys/dev/bge/if_bge.c:3653
#21 0x8057c7bb in ithread_loop (arg=0x84180500) at 
/usr/src/sys/kern/kern_intr.c:1181
#22 0x80578f25 in fork_exit (callout=0x8057c698 <ithread_loop>, 
arg=0x84180500, frame=0xd259ad38) at /usr/src/sys/kern/kern_fork.c:811
#23 0x8076b160 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271


Thanks for any help!


Re: 7.2-PRERELEASE X-server hang in drmwtq

2009-04-25 Thread Artem Kim
I checked 7.2-RC2; the problem is still there.

I found a way to reproduce the problem easily.

I used KDE 4.2.2 with the composite manager enabled. The problem occurs when 
two applications are started so that their windows appear at the same time.

I can reproduce the problem on a Radeon 9800 XT (AMD64 UP) and a Radeon 
X550 (AMD64 SMP).


Re: 7.2-PRERELEASE X-server hang in drmwtq

2009-04-25 Thread Artem Kim
On Saturday 25 April 2009 19:18:43 Robert Noland wrote:
 On Sat, 2009-04-25 at 16:24 +0400, Artem Kim wrote:
  I checked 7.2-RC2; the problem is still there.
 
  I found a way to reproduce the problem easily.
 
  I used KDE 4.2.2 with the composite manager enabled. The problem occurs
  when two applications are started so that their windows appear at the
  same time.

 Ok, luckily I don't think that KDE is important... compositing might be.
 Can you give a more complete example of how to trigger the hang?  I
 don't have any r300 based cards handy right now.  AMD is sending them
 though, so it shouldn't be long...

  I can reproduce the problem on a Radeon 9800 XT (AMD64 UP) and a
  Radeon X550 (AMD64 SMP).

 Are these AGP or PCI(e)?

 robert.


I'm using KDE 4.2.2 as a test.

The problem occurs only if the composite manager is enabled.

It occurs spontaneously when a new window is created.

A reliable way to reproduce it is to start several applications that create 
new windows concurrently. A window typically appears on the screen some time 
after the application is started, and that drawing delay depends on the 
application.

The problem occurs if several applications open new windows (that is, the 
windows start to be drawn on the screen) at about the same time.
Starting Konqueror, System Settings and File Manager quickly one after 
another (the speed is important) is enough to reproduce the problem.

The problem looks like this:
The X server is in the drmwtq state.
The screen freezes or just turns off.
The keyboard sometimes works, sometimes not.

The 9800 is an AGP card in the UP system, and the X550 is a PCI-E card in the 
SMP AMD64 system.



Re: 7.2-PRERELEASE X-server hang in drmwtq

2009-04-25 Thread Artem Kim
  Apr 25 23:44:04 test kernel: [drm:pid782:drm_ioctl] pid=782, cmd=0x80046457,
  nr=0x57, dev 0xff0001556d00, auth=1
  Apr 25 23:44:04 test kernel: [drm:pid782:drm_ioctl] returning 4

 Ok, so what this is saying is that pid 782 is waiting on the rendering
 engine to catch up.  The returning 4 part says that we were
 interrupted while we were waiting.  libdrm retries the wait, which
 should return immediately if the engine has caught up now.  It never
 appears to catch up, so either the counter is getting corrupted or we
 failed to get the commands submitted to the card like we thought, or we
 have locked up the GPU.

 What does it take to recover from this?  Do you have to reboot, or is
 killing the process that initiated the wait sufficient?

 robert.

In most cases the system remains reachable over the network, and the 
computer can be turned off with the ACPI power button.
However, after kill -KILL XORG-PID it is no longer possible to shut the 
system down cleanly. The system remains reachable over the network, but Xorg 
stays active and uses up to 100% of one CPU core.

The following kernel messages appear:

Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_wait_for_fifo] wait for 
fifo failed status : 0x8411413D 0x9C000800
Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_release] radeon_do_cp_idle 
-16
Apr 26 01:30:05 test kernel: [drm:pid1107:radeon_do_cp_idle]

Rebooting the system is then possible only via a hardware reset.
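
As a reading aid for the return values in these DRM debug lines, here is a small sketch that names them. It assumes that positive values are plain errno codes, that negative values other than -1 are negated errno codes, and that -1 is the kernel-internal ERESTART on FreeBSD; the helper name describe is made up for illustration.

```python
import errno

# ERESTART is kernel-internal on FreeBSD and is assumed here to be -1;
# Python's errno module does not export it.
KERNEL_ONLY = {-1: "ERESTART"}


def describe(ret: int) -> str:
    """Name a DRM debug return value (negated errno assumed if negative)."""
    if ret in KERNEL_ONLY:
        return KERNEL_ONLY[ret]
    return errno.errorcode.get(abs(ret), "unknown")


# 4: "drm_ioctl returning 4"; -1: "returning -1" from the first report;
# -16: "radeon_do_cp_idle -16" above.
for ret in (4, -1, -16):
    print(ret, describe(ret))
```

Read that way, 4 (EINTR) agrees with the explanation that the wait was interrupted, and -16 (EBUSY) suggests the command processor never went idle.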



7.2-PRERELEASE X-server hang in drmwtq

2009-04-02 Thread Artem Kim
Hi.

Recently I have been having a stability problem on my system:

7.2-PRERELEASE Thu Apr 2 20:20:31 MSD 2009 amd64 (UP); ati 9800-XT

From time to time the X server goes into the drmwtq state if AIGLX is enabled.
This usually happens when a new window is created.

If I set hw.dri.0.debug to 1, I get a lot of messages like these:

[drm:pid1469:drm_ioctl] pid=1469, cmd=0x80046457, nr=0x57, dev 
0xff0001306800, auth=1
[drm:pid1469:drm_ioctl] returning -1

I can see a recurring sequence in ktrace:

1469 Xorg PSIG  SIGALRM caught handler=0x4dca90 mask=0x0 code=0x0
1469 Xorg CALL  sigreturn(0x7fffe5b0)
1469 Xorg RET   sigreturn JUSTRETURN
1469 Xorg CALL  ioctl(0xa,0x80046457,0x8156e807c)
1469 Xorg RET   ioctl RESTART
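
To see which request keeps being restarted, the ioctl command word can be split into its fields. The sketch below follows the BSD ioctl encoding from sys/ioccom.h; DRM_COMMAND_BASE = 0x40 as the first driver-private DRM ioctl number is an assumption, and decode_ioctl is a made-up helper name.

```python
# BSD ioctl command layout (sys/ioccom.h): direction in the top 3 bits,
# parameter length in bits 16-28, group in bits 8-15, number in bits 0-7.
IOC_DIRS = {0x20000000: "void", 0x40000000: "out", 0x80000000: "in",
            0xC0000000: "inout"}
IOCPARM_MASK = 0x1FFF
DRM_COMMAND_BASE = 0x40  # assumption: first driver-private DRM ioctl number


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl command word into its encoded fields."""
    nr = cmd & 0xFF
    return {
        "dir": IOC_DIRS.get(cmd & 0xE0000000, "?"),
        "len": (cmd >> 16) & IOCPARM_MASK,
        "group": chr((cmd >> 8) & 0xFF),
        "nr": nr,
        "driver_nr": nr - DRM_COMMAND_BASE,
    }


print(decode_ioctl(0x80046457))
```

The decoded number agrees with nr=0x57 in the debug output. Driver-private number 0x17 is, as far as I can tell, the radeon IRQ wait request, but that mapping should be checked against radeon_drm.h.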

The problems started after the vblank rework in STABLE.
At first I got a panic when I tried to restart or shut down the X server,
but the panic problem was solved (for me ;)) quickly.

I am ready to provide any additional information.

Many thanks for your work.
