RE: [git patches] net driver updates
Hi Divy, But the race doesn't exist even for LLTX drivers these days. There is no way two cpu's can execute the xmit handler at the same time. Thanks, - KK > > > > The first part of the patch removes the !netif_queue_stopped(dev). > > > > It opens the race discussed a while ago between Stephen hemminger > > and > > > > David Miller: > > > > http://marc.info/?l=linux-netdev=113383224512427=2 > > > > > > I feel this race cannot happen anymore. I think the fix for that > race > > was > > > to introduce the > > > __LINK_STATE_QDISC_RUNNING bit thus eliminating any races between > > CPU's. If > > > one > > > CPU has called xmit, the other CPU will enqueue skbs (by holding the > > > queue_lock) and > > > exit from qdisc_run since it finds the bit set already. > > > > And the race is talking about LLTX, which S2IO doesn't use as > > far as I can tell. > > Dave, > > The driver is cxgb3 here, it uses LLTX. > > Cheers, > Divy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git patches] net driver updates
Hi Divy, > > Explain why, so I can include it in the changelog please... > > Hi Jeff, > > The first part of the patch removes the !netif_queue_stopped(dev). > It opens the race discussed a while ago between Stephen hemminger and > David Miller: > http://marc.info/?l=linux-netdev=113383224512427=2 I feel this race cannot happen anymore. I think the fix for that race was to introduce the __LINK_STATE_QDISC_RUNNING bit thus eliminating any races between CPU's. If one CPU has called xmit, the other CPU will enqueue skbs (by holding the queue_lock) and exit from qdisc_run since it finds the bit set already. Thanks, - KK -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git patches] net driver updates
Hi Divy, Explain why, so I can include it in the changelog please... Hi Jeff, The first part of the patch removes the !netif_queue_stopped(dev). It opens the race discussed a while ago between Stephen hemminger and David Miller: http://marc.info/?l=linux-netdevm=113383224512427w=2 I feel this race cannot happen anymore. I think the fix for that race was to introduce the __LINK_STATE_QDISC_RUNNING bit thus eliminating any races between CPU's. If one CPU has called xmit, the other CPU will enqueue skbs (by holding the queue_lock) and exit from qdisc_run since it finds the bit set already. Thanks, - KK -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [git patches] net driver updates
Hi Divy, But the race doesn't exist even for LLTX drivers these days. There is no way two cpu's can execute the xmit handler at the same time. Thanks, - KK The first part of the patch removes the !netif_queue_stopped(dev). It opens the race discussed a while ago between Stephen hemminger and David Miller: http://marc.info/?l=linux-netdevm=113383224512427w=2 I feel this race cannot happen anymore. I think the fix for that race was to introduce the __LINK_STATE_QDISC_RUNNING bit thus eliminating any races between CPU's. If one CPU has called xmit, the other CPU will enqueue skbs (by holding the queue_lock) and exit from qdisc_run since it finds the bit set already. And the race is talking about LLTX, which S2IO doesn't use as far as I can tell. Dave, The driver is cxgb3 here, it uses LLTX. Cheers, Divy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
? sock_def_readable+0x48/0xa0 > > [] ? sock_queue_rcv_skb+0xb1/0x1a0 > > [] ? sock_queue_rcv_skb+0xf7/0x1a0 > > [] ip_rcv+0x18f/0x290 > > [] ? packet_rcv_spkt+0xd0/0x130 > > [] netif_receive_skb+0x2b6/0x330 > > [] ? netif_receive_skb+0x127/0x330 > > [] ? process_backlog+0x83/0x100 > > [] process_backlog+0x8e/0x100 > > [] net_rx_action+0x13c/0x230 > > [] ? net_rx_action+0x59/0x230 > > [] __do_softirq+0x93/0x120 > > [] do_softirq+0x7a/0x80 > > [] irq_exit+0x65/0x90 > > [] do_IRQ+0x41/0x80 > > [] ? trace_hardirqs_on+0xb9/0x130 > > [] common_interrupt+0x2e/0x34 > > [] ? mwait_idle_with_hints+0x40/0x50 > > [] ? mwait_idle+0x0/0x20 > > [] mwait_idle+0x12/0x20 > > [] cpu_idle+0x61/0x110 > > [] rest_init+0x5d/0x60 > > [] start_kernel+0x1fa/0x260 > > [] ? unknown_bootoption+0x0/0x130 > > === > > ---[ end trace 14b601818e6903ac ]--- > > ...But this no longer is, and even more, L: 5 is not valid state at this > point all (should only happen if we went to RTO but it would reset S to > zero with newreno): > > > P: 5 L: 5 vs 5 S: 0 vs 3 w: 2044790889-2044796616 (0) > > TCP wq(s) l< > > TCP wq(h) +++h+< > > l5 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616 > > Surprisingly, it was the first time the WARN_ON for left_out returned > correct location. This also explains why the patch I sent to Krishna > didn't print anything (it didn't end up into printing because I forgot > to add L+S>P check into to the state checking if). > > ...so please, could you (others than Denys) try this patch, it should > solve the issue. And Denys, could you confirm (and if necessary double > check) that the kernel you saw this similar problem with is the pure > Linus' mainline, i.e., without any net-2.6.25 or mm bits please, if so, > that problem persists. And anyway, there were some fackets_out related > problems reported as well and this doesn't help for that but I think I've > lost track of who was seeing it due to large number of reports :-), could > somebody refresh my memory because I currently don't have time to dig it > up from archives (at least on this week). > > > -- > i. > > -- > [PATCH] [TCP]: NewReno must count every skb while marking losses > > NewReno should add cnt per skb (as with FACK) instead of depending > on SACKED_ACKED bits which won't be set with it at all. > Effectively, NewReno should always exists after the first > iteration anyway (or immediately if there's already head in > lost_out. > > This was fixed earlier in net-2.6.25 but got reverted among other > stuff and I didn't notice that this is still necessary (actually > wasn't even considering this case while trying to figure out the > reports because I lived with different kind of code than it in > reality was). > > This should solve the WARN_ONs in TCP code that as a result of > this triggered multiple times in every place we check for this > invariant. > > Special thanks to Dave Young <[EMAIL PROTECTED]> and > Krishna Kumar2 <[EMAIL PROTECTED]> for trying with my debug > patches. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > --- > net/ipv4/tcp_input.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 295490e..aa409a5 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -2156,7 +2156,7 @@ static void tcp_mark_head_lost(struct sock *sk, int > packets, int fast_rexmit) >tp->lost_skb_hint = skb; >tp->lost_cnt_hint = cnt; > > - if (tcp_is_fack(tp) || > + if (tcp_is_fack(tp) || tcp_is_reno(tp) || >(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) > cnt += tcp_skb_pcount(skb); > > -- > 1.5.2.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
[c03c8ea3] ? process_backlog+0x83/0x100 [c03c8eae] process_backlog+0x8e/0x100 [c03c90bc] net_rx_action+0x13c/0x230 [c03c8fd9] ? net_rx_action+0x59/0x230 [c0136f63] __do_softirq+0x93/0x120 [c013706a] do_softirq+0x7a/0x80 [c0137135] irq_exit+0x65/0x90 [c01078b1] do_IRQ+0x41/0x80 [c0155659] ? trace_hardirqs_on+0xb9/0x130 [c01059ca] common_interrupt+0x2e/0x34 [c0103390] ? mwait_idle_with_hints+0x40/0x50 [c01033a0] ? mwait_idle+0x0/0x20 [c01033b2] mwait_idle+0x12/0x20 [c0103141] cpu_idle+0x61/0x110 [c04339fd] rest_init+0x5d/0x60 [c05a47fa] start_kernel+0x1fa/0x260 [c05a4190] ? unknown_bootoption+0x0/0x130 === ---[ end trace 14b601818e6903ac ]--- ...But this no longer is, and even more, L: 5 is not valid state at this point all (should only happen if we went to RTO but it would reset S to zero with newreno): P: 5 L: 5 vs 5 S: 0 vs 3 w: 2044790889-2044796616 (0) TCP wq(s) l TCP wq(h) +++h+ l5 s3 f0 p5 seq: su2044790889 hs2044795029 sn2044796616 Surprisingly, it was the first time the WARN_ON for left_out returned correct location. This also explains why the patch I sent to Krishna didn't print anything (it didn't end up into printing because I forgot to add L+SP check into to the state checking if). ...so please, could you (others than Denys) try this patch, it should solve the issue. And Denys, could you confirm (and if necessary double check) that the kernel you saw this similar problem with is the pure Linus' mainline, i.e., without any net-2.6.25 or mm bits please, if so, that problem persists. And anyway, there were some fackets_out related problems reported as well and this doesn't help for that but I think I've lost track of who was seeing it due to large number of reports :-), could somebody refresh my memory because I currently don't have time to dig it up from archives (at least on this week). -- i. -- [PATCH] [TCP]: NewReno must count every skb while marking losses NewReno should add cnt per skb (as with FACK) instead of depending on SACKED_ACKED bits which won't be set with it at all. Effectively, NewReno should always exists after the first iteration anyway (or immediately if there's already head in lost_out. This was fixed earlier in net-2.6.25 but got reverted among other stuff and I didn't notice that this is still necessary (actually wasn't even considering this case while trying to figure out the reports because I lived with different kind of code than it in reality was). This should solve the WARN_ONs in TCP code that as a result of this triggered multiple times in every place we check for this invariant. Special thanks to Dave Young [EMAIL PROTECTED] and Krishna Kumar2 [EMAIL PROTECTED] for trying with my debug patches. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 295490e..aa409a5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2156,7 +2156,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit) tp-lost_skb_hint = skb; tp-lost_cnt_hint = cnt; - if (tcp_is_fack(tp) || + if (tcp_is_fack(tp) || tcp_is_reno(tp) || (TCP_SKB_CB(skb)-sacked TCPCB_SACKED_ACKED)) cnt += tcp_skb_pcount(skb); -- 1.5.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/