Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
David Miller writes:

 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different
 > :-(
 >
 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is it simply returns even leaving interrupts disabled
 > when a pending napi_disable() is pending.
 >
 > This is what trips up the semaphore logic.
 >
 > Robert, give this patch a try.

Yes, it works. e1000 was tested for ~3 hours under very high load with the
interface going up/down every 5th second. Without the patch the IRQs get
disabled within a couple of seconds.

A resolute way of handling the semaphores. :)

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

Cheers
					--ro

 > In the long term this semaphore should be completely eliminated,
 > there is no justification for it.
 >
 > Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
 >
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >
 >  	netif_carrier_off(netdev);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
David Miller writes:

 > > eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
 > > eth0 e1000_irq_disable sem = 2
 > >
 > > **e1000_open                     <- ifconfig eth0 up
 > > eth0 e1000_irq_disable sem = 3   Dead. irq's can't be enabled
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 2
 > > e1000_irq_enable miss
 > > eth0 e1000_irq_enable sem = 1
 > > ADDRCONF(NETDEV_UP): eth0: link is not ready
 >
 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different

I don't understand the idea of using a semaphore for enabling/disabling
IRQs either; the overall logic must be safer/better without it.

 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is it simply returns even leaving interrupts disabled
 > when a pending napi_disable() is pending.
 >
 > This is what trips up the semaphore logic.
 >
 > Robert, give this patch a try.
 >
 > In the long term this semaphore should be completely eliminated,
 > there is no justification for it.

It's on the testing list...

Cheers
					--ro

 > Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
 >
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >
 >  	netif_carrier_off(netdev);
Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
David Miller writes:

 > > On Wednesday 16 January 2008, David Miller wrote:
 > > > Ok, here is the patch I'll propose to fix this.  The goal is to make
 > > > it as simple as possible without regressing the thing we were trying
 > > > to fix.
 > >
 > > Looks good to me. Tested with -rc8.
 >
 > Thanks for testing.

Yes, that code looks nice. I'm using the patch, but I've noticed another
phenomenon with the current e1000 driver. There is a race when taking a
device down at high traffic loads. I've tracked and instrumented it, and
it seems that occasionally irq_sem can get bumped up so that interrupts
can't be enabled again.

eth0 e1000_irq_enable sem = 1    <- High netload
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1    <- ifconfig eth0 down
eth0 e1000_irq_disable sem = 2

**e1000_open                     <- ifconfig eth0 up
eth0 e1000_irq_disable sem = 3   Dead. irq's can't be enabled
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 2
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 1
ADDRCONF(NETDEV_UP): eth0: link is not ready

Cheers
					--ro

static void
e1000_irq_disable(struct e1000_adapter *adapter)
{
	atomic_inc(&adapter->irq_sem);
	E1000_WRITE_REG(&adapter->hw, IMC, ~0);
	E1000_WRITE_FLUSH(&adapter->hw);
	synchronize_irq(adapter->pdev->irq);

	if(adapter->netdev->ifindex == 3)
		printk("%s e1000_irq_disable sem = %d\n", adapter->netdev->name,
		       atomic_read(&adapter->irq_sem));
}

static void
e1000_irq_enable(struct e1000_adapter *adapter)
{
	if (likely(atomic_dec_and_test(&adapter->irq_sem))) {
		E1000_WRITE_REG(&adapter->hw, IMS, IMS_ENABLE_MASK);
		E1000_WRITE_FLUSH(&adapter->hw);
	}
	else
		printk("e1000_irq_enable miss\n");

	if(adapter->netdev->ifindex == 3)
		printk("%s e1000_irq_enable sem = %d\n", adapter->netdev->name,
		       atomic_read(&adapter->irq_sem));
}
Re: [RFC] net: napi fix
David Miller writes:

 > > Is the netif_running() check even required?
 >
 > No, it is not.
 >
 > When a device is brought down, one of the first things
 > that happens is that we wait for all pending NAPI polls
 > to complete, then block any new polls from starting.

Hello!

Yes, but the reason was to not have to wait for all pending polls to
complete, so that a server/router could be rebooted even under high load
and DoS. We've experienced some nasty problems with this.

Cheers.
					--ro
Re: Memory leak in 2.6.11-rc1?
Oh. Linux version 2.6.11-rc2 was used.

Robert Olsson writes:
 >
 > Andrew Morton writes:
 >  > Russell King <[EMAIL PROTECTED]> wrote:
 >  > >
 >  > > ip_dst_cache 1292 1485 256 15 1
 >
 >  > I guess we should find a way to make it happen faster.
 >
 > Here is route DoS attack. Pure routing no NAT no filter.
 >
 > Start
 > =====
 > ip_dst_cache 5 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
 >
 > After DoS
 > =========
 > ip_dst_cache 66045 76125 256 15 1 : tunables 120 60 8 : slabdata 5075 5075 480
 >
 > After some GC runs.
 > ===================
 > ip_dst_cache 2 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
 >
 > No problems here. I saw Martin talked about NAT...
 >
 >					--ro
Re: Memory leak in 2.6.11-rc1?
Andrew Morton writes:
 > Russell King <[EMAIL PROTECTED]> wrote:
 > >
 > > ip_dst_cache 1292 1485 256 15 1

 > I guess we should find a way to make it happen faster.

Here is a route DoS attack. Pure routing, no NAT, no filter.

Start
=====
ip_dst_cache 5 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0

After DoS
=========
ip_dst_cache 66045 76125 256 15 1 : tunables 120 60 8 : slabdata 5075 5075 480

After some GC runs.
===================
ip_dst_cache 2 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0

No problems here. I saw Martin talked about NAT...

					--ro
[BUG] 2.6.* pktgen doesn't set ethnet header properly
Hello!

Look at

	pginfos[i].hh[12] = 0x08; /* fill in protocol.  Rest is filled in later. */
	pginfos[i].hh[13] = 0x00;

					--ro

Junfeng Yang writes:
 > Hi,
 >
 > I tried to use pktgen module from 2.6.* kernels and found out that I
 > couldn't receive any packets generated by pktgen.  I did not even see a
 > "packet dropped by kernel" message.  It turned out that function
 > setup_inject in net/core/pktgen.c doesn't setup the ethernet header field
 > correctly.  Below is a patch that fixes the problem.
 >
 > --- kernel-source-2.6.8-orig/net/core/pktgen.c 2004-08-13 22:37:26.0 -0700
 > +++ kernel-source-2.6.8/net/core/pktgen.c 2005-01-19 17:54:46.0 -0800
 > @@ -259,6 +259,9 @@
 >
 >  	/* Set up Dest MAC */
 >  	memcpy(&(info->hh[0]), info->dst_mac, 6);
 > +
 > +	/* Set up protocol */
 > +	((struct ethhdr *)(info->hh))->h_proto = htons(ETH_P_IP);
 >
 >  	info->saddr_min = 0;
 >  	info->saddr_max = 0;
 >
 > -Junfeng
Re: How to optimize routing performance
Manfred Spraul writes:

 > > http://Linux/net-development/experiments/010313
 > >
 > The link is broken, and I couldn't find it at www.linux.com. Did you
 > forget the host?

Yes Sir!

The profile data from the Linux production router is at:
http://robur.slu.se/Linux/net-development/experiments/010313

Cheers.
					--ro
Re: How to optimize routing performance
Jonathan Morton writes:

 > Nice.  Any chance of similar functionality finding its' way outside the
 > Tulip driver, eg. to 3c509 or via-rhine?  I'd find those useful, since one
 > or two of my Macs appear to be capable of generating pseudo-DoS levels of
 > traffic under certain circumstances which totally lock a 486 (for the
 > duration) and heavily load a P166 - even though said Macs "only" have
 > 10baseT Ethernet.

I'm not the one to tell. :-)

First, it's kind of experimental. Jamal has talked about putting together
a proposal for enhancing the RX process for inclusion in the 2.5 kernels.
There is a meeting about this soon.

But why not experiment a bit?

Cheers.
					--ro
Re: How to optimize routing performance
[Sorry for the length]

Rik van Riel writes:
 > On Thu, 15 Mar 2001, Robert Olsson wrote:
 >
 > > CONFIG_NET_HW_FLOWCONTROL enables kernel code for it. But device
 > > drivers has to have support for it. But unfortunely very few drivers
 > > has support for it.
 >
 > Isn't it possible to put something like this in the layer just
 > above the driver ?

There is a dropping point in netif_rx. The problem is that knowledge of
congestion has to be pushed back to the devices that are causing it.

Alexey added netdev_dropping for drivers to check. And via netdev_wakeup()
the driver's xon_metod can be called when the backlog falls below a certain
threshold.

So from here the driver has to do the work: do not invest any resources
and interrupts in packets we will have to drop anyway. That is what
happens at very high load, a kind of livelock. For routers, routing
protocols will time out and we lose connectivity. But I would say it is
important for all apps.

In 2.4.0-test10 Jamal added sampling of the backlog queue so device
drivers get the current congestion level. This opens new possibilities.

 > It probably won't work as well as putting it directly in the
 > driver, but it'll at least keep Linux from collapsing under
 > really heavy loads ...

And we have done experiments with controlling interrupts and running
the RX at "lower" priority. The idea is to take the RX interrupt and
immediately postpone the RX processing to a tasklet. The tasklet opens for
new RX interrupts when it is done. This way dropping now occurs outside
the box, and dropping becomes very undramatic.

As a little example of this, I monitored a DoS attack on a Linux router
equipped with this RX-tasklet driver.

Admin up          6 day(s) 13 hour(s) 37 min 54 sec
Last input        NOW
Last output       NOW
5min RX bit/s     22.4 M
5min TX bit/s     1.3 M
5min RX pkts/s    44079     <
5min TX pkts/s    877
5min TX errors    0
5min RX errors    0
5min RX dropped   49913     <

Fb: no 3127894088 low 154133938 mod 6 high 0 drp 0    < Congestion levels

Polling: ON starts/pkts/tasklet_count 96545881/2768574948/1850259980
HW_flowcontrol xon's 0

A bit of explanation. Above is output from the tulip driver. We are
forwarding 44079 and we are dropping 49913 packets per second! This box
has full BGP. The DoS attack went on for about 30 minutes; BGP survived
and the box was manageable. Under a heavy attack it still performs well.

Cheers.
					--ro
Re: How to optimize routing performance
Rik van Riel writes:
 > On Thu, 15 Mar 2001, Mårten Wikström wrote:
 >
 > > I've performed a test on the routing capacity of a Linux 2.4.2 box
 > > versus a FreeBSD 4.2 box. I used two Pentium Pro 200Mhz computers with
 > > 64Mb memory, and two DEC 100Mbit ethernet cards. I used a Smartbits
 > > test-tool to measure the packet throughput and the packet size was set
 > > to 64 bytes. Linux dropped no packets up to about 27000 packets/s, but
 > > then it started to drop packets at higher rates. Worse yet, the output
 > > rate actually decreased, so at the input rate of 4 packets/s

It is a known problem, yes. And just as Rik says, it was addressed for the
first time in 2.1.x by Alexey.

 > > almost no packets got through. The behaviour of FreeBSD was different,
 > > it showed a steadily increased output rate up to about 7 packets/s
 > > before the output rate decreased. (Then the output rate was apprx.
 > > 4 packets/s).
 >
 > > So, my question is: are these figures true, or is it possible to
 > > optimize the kernel somehow? The only changes I have made to the
 > > kernel config was to disable advanced routing.
 >
 > There are some flow control options in the kernel which should
 > help. From your description, it looks like they aren't enabled
 > by default ...

CONFIG_NET_HW_FLOWCONTROL enables the kernel code for it, but device
drivers have to support it, and unfortunately very few drivers do.

We have also done experiments where we move the device RX processing to
SoftIRQ rather than IRQ. With this, RX is in better balance with other
kernel tasks and TX. Under very high load and under DoS attacks the
system is now manageable. It's in practical use already.

 > At the NordU/USENIX conference in Stockholm (this february) I
 > saw a nice presentation on the flow control code in the Linux
 > networking code and how it improved networking performance.
 > I'm pretty convinced that flow control _should_ be saving your
 > system in this case.

Thanks Rik.

This is work/experiments by Jamal and me with support from Gurus. :-)
Jamal did this presentation at OLS 2000. At NordU/USENIX I gave an updated
presentation of it. The presentation is not yet available from the USENIX
web site, I think.

It can be fetched by ftp from robur.slu.se:
/pub/Linux/tmp/FF-NordUSENIX.pdf or .ps

In summary, Linux is a very decent router: wire speed with small packets
at 100 Mbps, and capable of Gigabit routing (1440 pkts tested).

Also, if people are interested, we have done profiling on a Linux
production router with full BGP at a pretty loaded site. This is to give
us the costs of route lookup, skb malloc/free, interrupts, etc.

http://Linux/net-development/experiments/010313

I'm on netdev but not the kernel list.

Cheers.
					--ro
Re: How to optimize routing performance
[Sorry for the length]

Rik van Riel writes:
> On Thu, 15 Mar 2001, Robert Olsson wrote:
>
> > CONFIG_NET_HW_FLOWCONTROL enables kernel code for it. But device
> > drivers has to have support for it. But unfortunely very few drivers
> > has support for it.
>
> Isn't it possible to put something like this in the layer just
> above the driver ?

There is a dropping point in netif_rx. The problem is that knowledge of congestion has to be pushed back to the devices that are causing it. Alexey added netdev_dropping for drivers to check, and via netdev_wakeup() the driver's xon_metod can be called when the backlog is below a certain threshold.

So from here the driver has to do the work: not investing any resources and interrupts in packets we still have to drop. That is what happens at very high load, a kind of livelock. For routers, routing protocols will time out and we lose connectivity. But I would say it's important for all apps.

In 2.4.0-test10 Jamal added sampling of the backlog queue so device drivers get the current congestion level. This opens new possibilities.

> It probably won't work as well as putting it directly in the
> driver, but it'll at least keep Linux from collapsing under
> really heavy loads ...

And we have done experiments with controlling interrupts and running the RX at "lower" priority. The idea is to take the RX interrupt and immediately postpone the RX processing to a tasklet. The tasklet re-enables RX interrupts when it is done. This way dropping now occurs outside the box, and dropping becomes very undramatic.

As a little example of this, I monitored a DoS attack on a Linux router equipped with this RX-tasklet driver.

Admin up 6 day(s) 13 hour(s) 37 min 54 sec
Last input NOW
Last output NOW
5min RX bit/s   22.4 M
5min TX bit/s   1.3 M
5min RX pkts/s  44079
5min TX pkts/s  877
5min TX errors  0
5min RX errors  0
5min RX dropped 49913

Fb: no 3127894088 low 154133938 mod 6 high 0 drp 0
Congestion levels
Polling: ON starts/pkts/tasklet_count 96545881/2768574948/1850259980
HW_flowcontrol xon's 0

A bit of explanation. Above is output from the tulip driver. We are forwarding 44079 and we are dropping 49913 packets per second! This box has full BGP. The DoS attack was going on for about 30 minutes; BGP survived and the box was manageable. Under a heavy attack it still performs well.

Cheers.

--ro
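The feedback loop described in this message (a drop point at the backlog queue, the driver backing off, and a wake-up once the backlog has drained below a threshold) can be sketched as a small user-space model. All names here (`Backlog`, `Driver`, `xon`, `max_len`, `xon_below`) are hypothetical and chosen for illustration; this is not the kernel's netif_rx or netdev_wakeup() code, only a sketch of the idea under those assumptions.

```python
from collections import deque


class Backlog:
    """Toy backlog queue with a drop point and an "xon" wake-up threshold."""

    def __init__(self, max_len, xon_below, xon_callback):
        self.queue = deque()
        self.max_len = max_len            # drop point (hypothetical)
        self.xon_below = xon_below        # wake-up threshold (hypothetical)
        self.xon_callback = xon_callback  # driver hook to resume RX
        self.dropping = False
        self.dropped = 0

    def enqueue(self, pkt):
        if len(self.queue) >= self.max_len:
            # Congestion: drop the packet and remember that we are dropping.
            self.dropping = True
            self.dropped += 1
            return False
        self.queue.append(pkt)
        return True

    def dequeue(self):
        pkt = self.queue.popleft() if self.queue else None
        if self.dropping and len(self.queue) < self.xon_below:
            # Backlog has drained far enough: tell the driver to resume.
            self.dropping = False
            self.xon_callback()
        return pkt


class Driver:
    """Toy driver that stops taking RX work while the stack is congested."""

    def __init__(self, max_len, xon_below):
        self.backlog = Backlog(max_len, xon_below, self.xon)
        self.rx_enabled = True
        self.ignored = 0    # packets that cost the CPU nothing
        self.xon_calls = 0

    def xon(self):
        self.rx_enabled = True
        self.xon_calls += 1

    def rx_interrupt(self, pkt):
        if not self.rx_enabled:
            self.ignored += 1             # dropped "outside the box"
            return False
        if not self.backlog.enqueue(pkt):
            self.rx_enabled = False       # back off until xon is called
            return False
        return True
```

With a drop point of 4 and a wake-up threshold of 2, the fifth packet of a burst is dropped at the queue, later packets are ignored without touching the queue, and reception resumes after the third packet has been dequeued.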
Re: How to optimize routing performance
Jonathan Morton writes:
> Nice. Any chance of similar functionality finding its way outside the
> Tulip driver, e.g. to 3c509 or via-rhine? I'd find those useful, since one
> or two of my Macs appear to be capable of generating pseudo-DoS levels of
> traffic under certain circumstances which totally lock a 486 (for the
> duration) and heavily load a P166 - even though said Macs "only" have
> 10baseT Ethernet.

I'm not the one to tell. :-) First, it's kind of experimental. Jamal has talked about putting together a proposal for enhancing the RX process for inclusion in the 2.5 kernels. There is a meeting soon about this.

But why not experiment a bit?

Cheers.

--ro
Re: How to optimize routing performance
Manfred Spraul writes:
> > http://Linux/net-development/experiments/010313
>
> The link is broken, and I couldn't find it at www.linux.com. Did you
> forget the host?

Yes Sir! The profile data from the Linux production router is at:

http://robur.slu.se/Linux/net-development/experiments/010313

Cheers.

--ro
Re: Preallocated skb's?
Yes! The FF experiments with 2.1.X indicated an improvement factor of about 2-3 times with skb recycling. With the combination of FF and skb recycling we could reach Fast Ethernet wire-speed forwarding on a 400 MHz CPU, about 147 KPPS.

As Jamal reported, the improvement is much less today, but the forwarding performance is impressive even without FF and skb recycling. Slab seems to do a good job, especially when debug is disabled. :-)

--ro

Andi Kleen writes:
> On Thu, Sep 14, 2000 at 11:59:32PM +1100, Andrew Morton wrote:
> > That's 20 usec per interrupt, of which 1 usec could be saved by skb
> > pooling.
>
> FF usually runs with interrupt mitigation at higher rates (8-16 or even
> more packets / interrupt). I agree though that it probably does not
> make too much difference. alloc_skb could probably be made cheaper
> for the FF case by being more clever in the slab constructor (I think
> there was some bitrot during 2.3 on the cache line usage -- 2.2 pretty
> much only needed 2 cache lines in the header for a FF packet)
>
> -Andi
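For context on the wire-speed figure quoted in this message, the theoretical packet rate of an Ethernet link can be worked out from the frame size plus the fixed per-frame overhead on the wire. The sketch below assumes minimum-size 64-byte frames with the standard 8-byte preamble and 12-byte inter-frame gap; the function names are made up for illustration. It gives roughly 148.8 KPPS for 100 Mbit/s, in line with the figure of about 147 KPPS above, and shows how few CPU cycles per packet a 400 MHz CPU has at that rate.

```python
def wire_speed_pps(link_bps, frame_bytes, preamble_bytes=8, ifg_bytes=12):
    """Theoretical maximum packets/s on an Ethernet link."""
    # Each frame occupies the wire for its own length plus preamble
    # and inter-frame gap.
    bits_on_wire = (frame_bytes + preamble_bytes + ifg_bytes) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz, pps):
    """CPU cycles available per packet at a given packet rate."""
    return cpu_hz / pps


if __name__ == "__main__":
    pps = wire_speed_pps(100e6, 64)
    print(round(pps))                            # packets per second
    print(round(cycles_per_packet(400e6, pps)))  # cycles per packet
```

The per-packet cycle budget makes it easier to see why a saving in buffer allocation matters more for small packets at wire speed than Andrew Morton's 1 usec out of 20 usec per interrupt might suggest on its own.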