Re: R8169 driver & Netgear GA511 NIC - soft lockup/hang in rtl8169_phy_timer
On Mon, 2006-01-02 at 00:15 +0100, Francois Romieu wrote: > Russell Steinthal <[EMAIL PROTECTED]> : > [...] > > I'm seeing a "hang" under 2.6.14.4 when the link goes up (I think) on a > > Netgear GA511 PCMCIA gigabit NIC. I put "hang" in quotes because while > > the system is generally unresponsive, Magic SysRq responds and I can > > return the system to normal by ejecting the card. > > Can you reproduce the issue w/o link up at boot time with 2.6.15-rc7 ? Yes, the same problem appears with the card inserted and no ethernet cable plugged in using RC7. On my Ubuntu system, the problem manifested during the initialization of the Hardware Abstraction Layer; the relevant dmesg excerpt is below: BUG: soft lockup detected on CPU#0! Pid: 0, comm: swapper EIP: 0060:[] CPU: 0 EIP is at rtl8169_phy_timer+0x8c/0xbd [r8169] EFLAGS: 0296Not tainted (2.6.15-rc7) EAX: 0001 EBX: ceeec000 ECX: 089e EDX: c0414000 ESI: ceeec260 EDI: d097e000 EBP: ceeec8f8 DS: 007b ES: 007b CR0: 8005003b CR2: 0804fff5 CR3: 0fbfe000 CR4: 0690 [] rtl8169_phy_timer+0x0/0xbd [r8169] [] run_timer_softirq+0xb9/0x208 [] __do_softirq+0x4d/0x9c [] do_softirq+0x2d/0x2f [] irq_exit+0x38/0x3a [] do_IRQ+0x20/0x28 [] common_interrupt+0x1a/0x20 [] default_idle+0x2b/0x53 [] cpu_idle+0x45/0x71 [] start_kernel+0x16b/0x180 [] unknown_bootoption+0x0/0x1bf SysRq : HELP : loglevel0-8 reBoot tErm Full kIll saK showMem Nice powerOff showPc unRaw Sync showTasks Unmount SysRq : Show Regs Pid: 0, comm: swapper EIP: 0060:[] CPU: 0 EIP is at rtl8169_phy_timer+0x8c/0xbd [r8169] EFLAGS: 0296Not tainted (2.6.15-rc7) EAX: 0001 EBX: ceeec000 ECX: 1930 EDX: c0414000 ESI: ceeec260 EDI: d097e000 EBP: ceeec8f8 DS: 007b ES: 007b CR0: 8005003b CR2: 0804fff5 CR3: 0fbfe000 CR4: 0690 [] rtl8169_phy_timer+0x0/0xbd [r8169] [] run_timer_softirq+0xb9/0x208 [] __do_softirq+0x4d/0x9c [] do_softirq+0x2d/0x2f [] irq_exit+0x38/0x3a [] do_IRQ+0x20/0x28 [] common_interrupt+0x1a/0x20 [] default_idle+0x2b/0x53 [] cpu_idle+0x45/0x71 [] start_kernel+0x16b/0x180 [] unknown_bootoption+0x0/0x1bf pccard: card ejected from slot 1 IBM machine detected. Enabling interrupts during APM calls. To avoid making this message overly long, I've excerpted the other portions of the boot logs. The full versions are available, however, at: http://www.steinthal.us/kernel/2.6.15-rc7-boot.log (/var/log/messages) http://www.steinthal.us/kernel/2.6.15-rc7-dmesg (dmesg output) Note that the two logs are actually taken from different (successive) boots, but under identical conditions. > Please send: > - .config Again, here's the networking section of .config; the full .config is available at http://www.steinthal.us/kernel/config-2.6.15-rc7 # # Network device support # CONFIG_NETDEVICES=y CONFIG_DUMMY=m # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set # CONFIG_TUN is not set # CONFIG_NET_SB1000 is not set # # ARCnet devices # # CONFIG_ARCNET is not set # # PHY device support # # CONFIG_PHYLIB is not set # # Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=m # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_CASSINI is not set # CONFIG_NET_VENDOR_3COM is not set # CONFIG_LANCE is not set # CONFIG_NET_VENDOR_SMC is not set # CONFIG_NET_VENDOR_RACAL is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_AT1700 is not set # CONFIG_DEPCA is not set # CONFIG_HP100 is not set # CONFIG_NET_ISA is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_AC3200 is not set # CONFIG_APRICOT is not set # CONFIG_B44 is not set # CONFIG_FORCEDETH is not set # CONFIG_CS89x0 is not set # CONFIG_DGRS is not set CONFIG_EEPRO100=m # CONFIG_E100 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set # CONFIG_8139CP is not set # CONFIG_8139TOO is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set # CONFIG_VIA_RHINE is not set # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set CONFIG_R8169=m # CONFIG_R8169_NAPI is not set # CONFIG_SIS190 is not set # CONFIG_SKGE is not set # CONFIG_SK98LIN is not set # CONFIG_VIA_VELOCITY is not set # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # # Ethernet (1 Mbit) # # CONFIG_CHELSIO_T1 is not set # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # CONFIG_NET_RADIO=y # # Obsolete Wireless cards support (pre-802.11) # # CONFIG_STRIP is not set CONFIG_ARLAN=m # CONFIG_WAVELAN is not set # CONFIG_PCMCIA_WAVELAN is not set # CONFIG_PCMCIA_NETWAVE is not set # # Wireless 802.11 Frequency Hopping card
Re: R8169 driver & Netgear GA511 NIC - soft lockup/hang in rtl8169_phy_timer
Russell Steinthal <[EMAIL PROTECTED]> : [...] > I'm seeing a "hang" under 2.6.14.4 when the link goes up (I think) on a > Netgear GA511 PCMCIA gigabit NIC. I put "hang" in quotes because while > the system is generally unresponsive, Magic SysRq responds and I can > return the system to normal by ejecting the card. Can you reproduce the issue w/o link up at boot time with 2.6.15-rc7 ? Please send: - .config - complete dmesg from boot + SysRq-P r8169.o module could help. -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] latency tracer, 2.6.15-rc7
On Fri, 2005-12-30 at 20:15 -0500, Lee Revell wrote: > On Fri, 2005-12-30 at 17:02 -0800, Linus Torvalds wrote: > > > > On Fri, 30 Dec 2005, Lee Revell wrote: > > > > > > It seems that the networking code's use of RCU can cause 10ms+ > > > latencies: > > > > Hmm. Is there a big jump at the 10ms mark? Do you have a 100Hz timer > > source? > > > > A latency jump at 10ms would tend to indicate a missed wakeup that > > was "picked up" by the next timer tick. > > No there are no large jumps, it really seems that this was the network > code causing an RCU callback to drop ~2K routes at once. Specifically > RCU invokes dst_rcu_free 2085 times in a single batch > (call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free) is only called from > rt_free() and rt_drop()). > > I have found that many of the paths in the network stack that can cause > high latencies can be tuned via sysctls (for example > net.ipv4.route.gc_thresh); this one may be the same. On a related topic: I thought I had solved the route cache flushing problem by tuning these sysctls, but it does not seems to help. The short version is that rt_run_flush and rt_garbage_collect can cause 15ms+ latencies when a lot of routes (up to 4096 it seems) are flushed in one go: $ grep "local_bh_enable (rt_run_flush)" /proc/latency_trace | wc -l 4096 $ grep "local_bh_enable (rt_garbage_collect)" /proc/latency_trace | wc -l 4096 (rt_run_flush and rt_garbage_collect call spin_lock_bh/spin_unlock_bh once for each flushed route so the above grep effectively counts the number of routes flushed at once) I reported this a year or so ago and it led to Ingo adding an option to -rt (then called the voluntary preempt patch) to always run softirqs in threads which makes rt_run_flush preemptible. Anyway I thought I could work around this with mainline by tuning the network stack to minimize the effect of route cache flushing on scheduler latency using these sysctls to cause the route cache to be flushed more often and/or limit the maximum size of the route cache: $ sudo /sbin/sysctl -a | grep route net.ipv4.route.gc_elasticity = 8 net.ipv4.route.gc_interval = 60 net.ipv4.route.gc_timeout = 300 net.ipv4.route.gc_min_interval_ms = 20 net.ipv4.route.gc_min_interval = 0 net.ipv4.route.max_size = 65536 net.ipv4.route.gc_thresh = 256 net.ipv4.route.max_delay = 10 net.ipv4.route.min_delay = 2 I tried lowering gc_min_interval_ms, gc_timeout, max_size, and gc_thresh but rt_run_flush will still process up to 4096 (the size of the route hash table?) routes at once. (stop here if you don't care to interpret long latency_trace reports) 17ms+ latency caused by rt_run_flush then rt_garbage_collect processing 4096 routes: preemption latency trace v1.1.5 on 2.6.15-rc7 latency: 17286 us, #19154/19154, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0) - | task: gtk-gnutella-8581 (uid:1000 nice:0 policy:0 rt_prio:0) - _--=> CPU# / _-=> irqs-off | / _=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth / | delay cmd pid | time | caller \ /| \ | / epiphany-8742 0dns80us : __trace_start_sched_wakeup (try_to_wake_up) epiphany-8742 0dns81us : __trace_start_sched_wakeup <<...>-8581> (73 0) epiphany-8742 0dns72us : preempt_schedule (__trace_start_sched_wakeup) epiphany-8742 0dns62us : preempt_schedule (try_to_wake_up) epiphany-8742 0dns53us : preempt_schedule (__wake_up) epiphany-8742 0dns54us : preempt_schedule (ep_poll_safewake) epiphany-8742 0dnH56us : do_IRQ (c011223e b 0) epiphany-8742 0d.h.6us : __do_IRQ (do_IRQ) epiphany-8742 0d.h17us+: mask_and_ack_8259A (__do_IRQ) epiphany-8742 0d.h. 12us : handle_IRQ_event (__do_IRQ) epiphany-8742 0d.h. 13us : usb_hcd_irq (handle_IRQ_event) epiphany-8742 0d.h. 13us : uhci_irq (usb_hcd_irq) epiphany-8742 0d.h. 14us : via_driver_irq_handler (handle_IRQ_event) epiphany-8742 0d.h. 16us : rhine_interrupt (handle_IRQ_event) epiphany-8742 0d.h. 16us : ioread16 (rhine_interrupt) epiphany-8742 0d.h. 17us : ioread8 (rhine_interrupt) epiphany-8742 0d.h. 18us : iowrite16 (rhine_interrupt) epiphany-8742 0d.h. 19us : ioread8 (rhine_interrupt) epiphany-8742 0d.h. 20us : rhine_tx (rhine_interrupt) epiphany-8742 0d.h1 21us : raise_softirq_irqoff (rhine_tx) epiphany-8742 0d.h. 22us : ioread16 (rhine_interrupt) epiphany-8742 0d.h. 23us : ioread8 (rhine_interrupt) epiphany-8742 0d.h1 25us : note_interrupt (__do_IRQ) epiphany-8742 0d.h1 25us : end_8259A_irq (__do_IRQ) epiphany-8742 0d.h1 26us : enable_8259A_irq (end_8259A_irq) epiphany-8742 0dnH5 28us : irq_exit (do_IRQ) epiphany-8742 0dns5 28us < (20977