Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Ilpo Järvinen wrote: > On Wed, 3 Oct 2007, Cedric Le Goater wrote: > >> Cedric Le Goater wrote: >>> Below are the messages I got on 2) right after running ketchup (which does >>> a wget www.kernel.org) > > Oops, those tcp_fragment WARNINGs in the other mail were due to bug in > the debug patch as it called verify too early in there (before queue was > adjusted, no wonder it finds state inconsistent at that point, fixed that)... > > ...So please discard all old debug patches, they're all broken in this > respect... :-( > >>> not a warning on 1) with your extra verbose patch. >> bummer, I got this one on 1) :( >> >> WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 >> tcp_fastretrans_alert() >> Call Trace: >>[] __wake_up+0x1f/0x4c >> [] tcp_ack+0xcee/0x18ac >> [] tcp_rcv_established+0x61f/0x6df > > ...I just wonder why that's the first place where it occurs... Can you try > the debug patch below (fixed verify place in tcp_fragment/collapse, added > some of them to narrow it down, and handled GSO more user friendly way in > the printout). Put it on top of those three patches (mm should be fine :-)). > ...I wish the verify triggers way before the fastretrans trap (for some > reason it didn't do that in the quoted trace, maybe I had some verifys > missing in that old patch or something)... so here are the results on a net-2.6.24 kernel. I've put the patchset here to make sure it's correct: http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans/ and plenty of logs : http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans.messages FYI, config is here : http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans.config C. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
On Wed, 3 Oct 2007, Cedric Le Goater wrote: > Cedric Le Goater wrote: > > > > Below are the messages I got on 2) right after running ketchup (which does > > a wget www.kernel.org) Oops, those tcp_fragment WARNINGs in the other mail were due to bug in the debug patch as it called verify too early in there (before queue was adjusted, no wonder it finds state inconsistent at that point, fixed that)... ...So please discard all old debug patches, they're all broken in this respect... :-( > > not a warning on 1) with your extra verbose patch. > > bummer, I got this one on 1) :( > > WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 > tcp_fastretrans_alert() > Call Trace: >[] __wake_up+0x1f/0x4c > [] tcp_ack+0xcee/0x18ac > [] tcp_rcv_established+0x61f/0x6df ...I just wonder why that's the first place where it occurs... Can you try the debug patch below (fixed verify place in tcp_fragment/collapse, added some of them to narrow it down, and handled GSO more user friendly way in the printout). Put it on top of those three patches (mm should be fine :-)). ...I wish the verify triggers way before the fastretrans trap (for some reason it didn't do that in the quoted trace, maybe I had some verifys missing in that old patch or something)... -- i. include/net/tcp.h |3 + net/ipv4/tcp_input.c | 41 +++--- net/ipv4/tcp_ipv4.c | 113 + net/ipv4/tcp_output.c |4 +- 4 files changed, 154 insertions(+), 7 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 991ccdc..54a0d91 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -43,6 +43,9 @@ #include +extern void tcp_verify_fackets(struct sock *sk); +extern void tcp_print_queue(struct sock *sk); + extern struct inet_hashinfo tcp_hashinfo; extern atomic_t tcp_orphan_count; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 87c9ef5..7707b1d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb, return dup_sack; } -static int +int tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una) { const struct inet_connection_sock *icsk = inet_csk(sk); @@ -1159,6 +1159,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ int i; int first_sack_index; + if (WARN_ON(!tp->sacked_out && tp->fackets_out)) + tcp_print_queue(sk); + if (!tp->sacked_out) { if (WARN_ON(tp->fackets_out)) tp->fackets_out = 0; @@ -1421,6 +1424,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ } } } + tcp_verify_fackets(sk); /* Check for lost retransmit. This superb idea is * borrowed from "ratehalving". Event "C". @@ -1633,13 +1637,14 @@ void tcp_enter_frto(struct sock *sk) tcp_set_ca_state(sk, TCP_CA_Disorder); tp->high_seq = tp->snd_nxt; tp->frto_counter = 1; + tcp_verify_fackets(sk); } /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO, * which indicates that we should follow the traditional RTO recovery, * i.e. mark everything lost and do go-back-N retransmission. */ -static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) +void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; @@ -1676,6 +1681,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) } } tcp_verify_left_out(tp); + tcp_verify_fackets(sk); tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments; tp->snd_cwnd_cnt = 0; @@ -1754,6 +1760,7 @@ void tcp_enter_loss(struct sock *sk, int how) } } tcp_verify_left_out(tp); + tcp_verify_fackets(sk); tp->reordering = min_t(unsigned int, tp->reordering, sysctl_tcp_reordering); @@ -2309,7 +2316,7 @@ static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb) * It does _not_ decide what to send, it is made in function * tcp_xmit_retransmit_queue(). */ -static void +void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag) { struct inet_connection_sock *icsk = inet_csk(sk); @@ -2318,13 +2325,20 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag) int do_lost = is_dupack || ((flag&FLAG_DATA_SACKED) && (tp->fackets_out > tp->reordering)); + tcp_verify_fackets(sk); + /* Some technical things: * 1. Reno does not count dupacks (sacked_out) automatically. */ - if (!tp->packets_out) + if
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Cedric Le Goater wrote: > Ilpo Järvinen wrote: >> On Wed, 3 Oct 2007, Cedric Le Goater wrote: >> >>> Ilpo Järvinen wrote: Ah, that's path 1) then... Since you seem to have enough time, I would say that the path 1 is good as well and bugs unrelated to the fix will show up there too... >>> arg. yes. sorry for the confusion. >>> I should have stated it explicitly that with path 2 those 3 patches should not be applied because the aim is not a fix but reproducal. Path 2 was intentionally left without the potentional fix as then nice backtrace informs when we can stop trying (which would hopefully occurred pretty soon) :-). But lets discard that path 2... >>> I have 2 spare nodes so i'll run both. 1) is on already without any issues >>> i'm just compiling 2) > > Below are the messages I got on 2) right after running ketchup (which does > a wget www.kernel.org) > > not a warning on 1) with your extra verbose patch. bummer, I got this one on 1) :( C. WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert() Call Trace: [] __wake_up+0x1f/0x4c [] tcp_ack+0xcee/0x18ac [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x624/0x9b1 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x57c/0x5c4 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2ba/0x2de [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb9 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbe/0xd8 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] __sched_text_start+0x5f0/0x62b [] __sched_text_start+0x5f0/0x62b [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x57/0x59 [] start_kernel+0x2d1/0x2dd [] _sinittext+0x14e/0x155 WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert() Call Trace: [] __wake_up+0x1f/0x4c [] tcp_ack+0xcee/0x18ac [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x624/0x9b1 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x57c/0x5c4 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2ba/0x2de [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb9 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbe/0xd8 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] __sched_text_start+0x5f0/0x62b [] __sched_text_start+0x5f0/0x62b [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x57/0x59 [] start_kernel+0x2d1/0x2dd - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Ilpo Järvinen wrote: > On Wed, 3 Oct 2007, Cedric Le Goater wrote: > >> Ilpo Järvinen wrote: >>> Ah, that's path 1) then... Since you seem to have enough time, I would say >>> that the path 1 is good as well and bugs unrelated to the fix will show up >>> there too... >> arg. yes. sorry for the confusion. >> >>> I should have stated it explicitly that with path 2 those 3 patches should >>> not be applied because the aim is not a fix but reproducal. Path 2 was >>> intentionally left without the potentional fix as then nice backtrace >>> informs when we can stop trying (which would hopefully occurred >>> pretty soon) :-). But lets discard that path 2... >> I have 2 spare nodes so i'll run both. 1) is on already without any issues >> i'm just compiling 2) Below are the messages I got on 2) right after running ketchup (which does a wget www.kernel.org) not a warning on 1) with your extra verbose patch. >> I usually work on -mm, so what would be interesting for me is to have what >> you >> need in net-2.6.24 which is getting pulled in -mm by andrew. then, if >> you need an extra patch for verbosity, that's fine, i'll include it in >> my usual patchset. > > Ah, I'm sorry about the subject and the extra work it caused, no problem, that was a comment for the futur patchset. > it was meant for DaveM only, didn't realize at that time it would be > meaningful to you as well, thus couldn't warn you back then... Testing on > top of mm would be (/ have been) fine as well... From my point of view > both mm and net-2.6.24 are pretty much the same (I even verified that > those patches apply fine on top of rc8-mm2 since I thought that you might > want to use that one). He, you might have solved it with 1). If not, I'm keeping the hardware for you. Cheers, C. WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets() Call Trace: [] tcp_verify_fackets+0x119/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets() Call Trace: [] vgacon_set_cursor_size+0x39/0xd5 [] tcp_verify_fackets+0x163/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b TCP wq(s) -S--SSS< TCP wq(i) hf < s4 f9 (47) p9 seq: su3460595874 hs3460607374 sn3460659962 (3460608822) WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets() Call Trace: [] vgacon_set_cursor_size+0x39/0xd5 [] tcp_verify_fackets+0x119/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_v
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Cedric Le Goater wrote: > Ilpo Järvinen wrote: >> On Wed, 3 Oct 2007, Cedric Le Goater wrote: >> >>> Ilpo Järvinen wrote: Ah, that's path 1) then... Since you seem to have enough time, I would say that the path 1 is good as well and bugs unrelated to the fix will show up there too... >>> arg. yes. sorry for the confusion. >>> I should have stated it explicitly that with path 2 those 3 patches should not be applied because the aim is not a fix but reproducal. Path 2 was intentionally left without the potentional fix as then nice backtrace informs when we can stop trying (which would hopefully occurred pretty soon) :-). But lets discard that path 2... >>> I have 2 spare nodes so i'll run both. 1) is on already without any issues >>> i'm just compiling 2) > > Below are the messages I got on 2) right after running ketchup (which does > a wget www.kernel.org) and a second run of ketchup gave the following. cheers, C. WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets() Call Trace: [] tcp_verify_fackets+0x119/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b TCP wq(s) --S--< TCP wq(i) h < s1 f5 (14) p6 seq: su110259658 hs110265450 sn110278722 (0) WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets() Call Trace: [] vgacon_scroll+0x188/0x1dd [] tcp_verify_fackets+0x119/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets() Call Trace: [] vgacon_scroll+0x188/0x1dd [] tcp_verify_fackets+0x163/0x237 [] tcp_fragment+0x468/0x4b8 [] tcp_retransmit_skb+0xcf/0x2f4 [] tcp_xmit_retransmit_queue+0xc3/0x31e [] tcp_fastretrans_alert+0xb36/0xb43 [] tcp_ack+0x5d3/0x71b [] tcp_rcv_established+0x61f/0x6df [] __lock_acquire+0x8a1/0xf1b [] tcp_v4_do_rcv+0x3e/0x394 [] tcp_v4_rcv+0x61c/0x9a9 [] ip_local_deliver+0x1da/0x2a4 [] ip_rcv+0x583/0x5c9 [] packet_rcv_spkt+0x19a/0x1a8 [] netif_receive_skb+0x2cf/0x2f5 [] :tg3:tg3_poll+0x65d/0x8a4 [] net_rx_action+0xb8/0x191 [] __do_softirq+0x5f/0xe0 [] call_softirq+0x1c/0x28 [] do_softirq+0x3b/0xb8 [] irq_exit+0x4e/0x50 [] do_IRQ+0xbd/0xd7 [] mwait_idle+0x0/0x4d [] ret_from_intr+0x0/0xf [] mwait_idle+0x43/0x4d [] enter_idle+0x22/0x24 [] cpu_idle+0x9d/0xc0 [] rest_init+0x55/0x57 [] start_kernel+0x2e0/0x2ec [] _sinittext+0x134/0x13b TCP wq(s) ---SSS-< TCP wq(i) hf< s3 f7 (14) p7 seq: su110259658 hs110268346 sn110278722 (110269794) C. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
On Wed, 3 Oct 2007, Cedric Le Goater wrote: > Ilpo Järvinen wrote: > > > > Ah, that's path 1) then... Since you seem to have enough time, I would say > > that the path 1 is good as well and bugs unrelated to the fix will show up > > there too... > > arg. yes. sorry for the confusion. > > > I should have stated it explicitly that with path 2 those 3 patches should > > not be applied because the aim is not a fix but reproducal. Path 2 was > > intentionally left without the potentional fix as then nice backtrace > > informs when we can stop trying (which would hopefully occurred > > pretty soon) :-). But lets discard that path 2... > > I have 2 spare nodes so i'll run both. 1) is on already without any issues > i'm just compiling 2) Thanks a lot. :-) > I usually work on -mm, so what would be interesting for me is to have what > you > need in net-2.6.24 which is getting pulled in -mm by andrew. then, if > you need an extra patch for verbosity, that's fine, i'll include it in > my usual patchset. Ah, I'm sorry about the subject and the extra work it caused, it was meant for DaveM only, didn't realize at that time it would be meaningful to you as well, thus couldn't warn you back then... Testing on top of mm would be (/ have been) fine as well... From my point of view both mm and net-2.6.24 are pretty much the same (I even verified that those patches apply fine on top of rc8-mm2 since I thought that you might want to use that one). -- i.
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Ilpo Järvinen wrote: > On Wed, 3 Oct 2007, Cedric Le Goater wrote: > >> Ilpo Järvinen wrote: >>> On Wed, 3 Oct 2007, Cedric Le Goater wrote: >>> I'm dropping the previous patches you sent me and switching to this patchset. right ? >>> Yes you can do that... However, there are two ways forward: >>> >>> 1) Drop and test with this patchset long enough to verify it's gone... >>> 2) No dropping and get the more exact trace by reproducing, which can >>>point out to tcp_retrans_try_collapse confirming the source of the >>>bug or revealing yet another bug... >>> >>> The first one has one drawback, it cannot prove the fix very well since >>> the bug could just not occur by chance... Path 2 would clearly show the >>> place from where the problem originates because we will know that it got >>> triggered! I personally would prefer path 2 but whether you want to go for >>> that depends on the time you want to invest in it... >>> >>> ...I rediffed the tcp_verify_fackets patch too (below) just in case it >>> would be something else in you case and you choose path 1 (put it on top >>> of this patchset, applies with some offsets). In case the problem is gone, >>> it shouldn't trigger and if it does, we'll have another bug caught. >> I have a spare node so I'm starting 2) with the 3 patches you sent and that >> last one which applied fine. > > Ah, that's path 1) then... Since you seem to have enough time, I would say > that the path 1 is good as well and bugs unrelated to the fix will show up > there too... arg. yes. sorry for the confusion. > I should have stated it explicitly that with path 2 those 3 patches should > not be applied because the aim is not a fix but reproducal. Path 2 was > intentionally left without the potentional fix as then nice backtrace > informs when we can stop trying (which would hopefully occurred > pretty soon) :-). But lets discard that path 2... I have 2 spare nodes so i'll run both. 1) is on already without any issues i'm just compiling 2) I usually work on -mm, so what would be interesting for me is to have what you need in net-2.6.24 which is getting pulled in -mm by andrew. then, if you need an extra patch for verbosity, that's fine, i'll include it in my usual patchset. Cheers, C. >> all of them on a fresh git pull of net-2.6.24 > > That's fine, they're pretty well in sync (mm and net-2.6.24, and > soon 2.6.24-rcs too). > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
On Wed, 3 Oct 2007, Cedric Le Goater wrote: > Ilpo Järvinen wrote: > > On Wed, 3 Oct 2007, Cedric Le Goater wrote: > > > >> I'm dropping the previous patches you sent me and switching to this > >> patchset. > >> right ? > > > > Yes you can do that... However, there are two ways forward: > > > > 1) Drop and test with this patchset long enough to verify it's gone... > > 2) No dropping and get the more exact trace by reproducing, which can > >point out to tcp_retrans_try_collapse confirming the source of the > >bug or revealing yet another bug... > > > > The first one has one drawback, it cannot prove the fix very well since > > the bug could just not occur by chance... Path 2 would clearly show the > > place from where the problem originates because we will know that it got > > triggered! I personally would prefer path 2 but whether you want to go for > > that depends on the time you want to invest in it... > > > > ...I rediffed the tcp_verify_fackets patch too (below) just in case it > > would be something else in you case and you choose path 1 (put it on top > > of this patchset, applies with some offsets). In case the problem is gone, > > it shouldn't trigger and if it does, we'll have another bug caught. > > I have a spare node so I'm starting 2) with the 3 patches you sent and that > last one which applied fine. Ah, that's path 1) then... Since you seem to have enough time, I would say that the path 1 is good as well and bugs unrelated to the fix will show up there too... I should have stated it explicitly that with path 2 those 3 patches should not be applied because the aim is not a fix but reproducal. Path 2 was intentionally left without the potentional fix as then nice backtrace informs when we can stop trying (which would hopefully occurred pretty soon) :-). But lets discard that path 2... > all of them on a fresh git pull of net-2.6.24 That's fine, they're pretty well in sync (mm and net-2.6.24, and soon 2.6.24-rcs too). -- i.
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Ilpo Järvinen wrote: > On Wed, 3 Oct 2007, Cedric Le Goater wrote: > >> Ilpo Järvinen wrote: >>> Sacktag fastpath_cnt_hint seems to be very tricky to get right... >>> I suppose this one fixes Cedric's case. I cannot say for sure >>> until there is something more definite indication of >>> tcp_retrans_try_collapse origin than what the simple late WARN_ON >>> gave for us. ...Especially since it's non-trivial to have skb >>> hint "correctly" positioned in the write_queue while still ending >>> up calling that function. However, considering how difficult it >>> seems to be for Cedric to reproduce, it might well be this one. >>> >>> In addition, I noticed another reset which wasn't previously >>> converted to WARN_ON, so doing that now. Boot + simple xfer >>> tested. Please apply to net-2.6.24. >> I'm dropping the previous patches you sent me and switching to this >> patchset. >> right ? > > Yes you can do that... However, there are two ways forward: > > 1) Drop and test with this patchset long enough to verify it's gone... > 2) No dropping and get the more exact trace by reproducing, which can >point out to tcp_retrans_try_collapse confirming the source of the >bug or revealing yet another bug... > > The first one has one drawback, it cannot prove the fix very well since > the bug could just not occur by chance... Path 2 would clearly show the > place from where the problem originates because we will know that it got > triggered! I personally would prefer path 2 but whether you want to go for > that depends on the time you want to invest in it... > > ...I rediffed the tcp_verify_fackets patch too (below) just in case it > would be something else in you case and you choose path 1 (put it on top > of this patchset, applies with some offsets). In case the problem is gone, > it shouldn't trigger and if it does, we'll have another bug caught. I have a spare node so I'm starting 2) with the 3 patches you sent and that last one which applied fine. all of them on a fresh git pull of net-2.6.24 > Anyway, thanks for ccing right persons and netdev right from the > beginning. thanks to git ! :) C. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
On Wed, 3 Oct 2007, Cedric Le Goater wrote: > Ilpo Järvinen wrote: > > Sacktag fastpath_cnt_hint seems to be very tricky to get right... > > I suppose this one fixes Cedric's case. I cannot say for sure > > until there is something more definite indication of > > tcp_retrans_try_collapse origin than what the simple late WARN_ON > > gave for us. ...Especially since it's non-trivial to have skb > > hint "correctly" positioned in the write_queue while still ending > > up calling that function. However, considering how difficult it > > seems to be for Cedric to reproduce, it might well be this one. > > > > In addition, I noticed another reset which wasn't previously > > converted to WARN_ON, so doing that now. Boot + simple xfer > > tested. Please apply to net-2.6.24. > > I'm dropping the previous patches you sent me and switching to this patchset. > right ? Yes you can do that... However, there are two ways forward: 1) Drop and test with this patchset long enough to verify it's gone... 2) No dropping and get the more exact trace by reproducing, which can point out to tcp_retrans_try_collapse confirming the source of the bug or revealing yet another bug... The first one has one drawback, it cannot prove the fix very well since the bug could just not occur by chance... Path 2 would clearly show the place from where the problem originates because we will know that it got triggered! I personally would prefer path 2 but whether you want to go for that depends on the time you want to invest in it... ...I rediffed the tcp_verify_fackets patch too (below) just in case it would be something else in you case and you choose path 1 (put it on top of this patchset, applies with some offsets). In case the problem is gone, it shouldn't trigger and if it does, we'll have another bug caught. Anyway, thanks for ccing right persons and netdev right from the beginning. -- i. include/net/tcp.h |3 + net/ipv4/tcp_input.c | 25 +--- net/ipv4/tcp_ipv4.c | 103 + net/ipv4/tcp_output.c |6 ++- 4 files changed, 130 insertions(+), 7 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 991ccdc..54a0d91 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -43,6 +43,9 @@ #include +extern void tcp_verify_fackets(struct sock *sk); +extern void tcp_print_queue(struct sock *sk); + extern struct inet_hashinfo tcp_hashinfo; extern atomic_t tcp_orphan_count; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 87c9ef5..93bdc20 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb, return dup_sack; } -static int +int tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una) { const struct inet_connection_sock *icsk = inet_csk(sk); @@ -1160,8 +1160,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ int first_sack_index; if (!tp->sacked_out) { - if (WARN_ON(tp->fackets_out)) + if (WARN_ON(tp->fackets_out)) { tp->fackets_out = 0; + tcp_print_queue(sk); + } tp->highest_sack = tp->snd_una; } prior_fackets = tp->fackets_out; @@ -1421,6 +1423,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ } } } + tcp_verify_fackets(sk); /* Check for lost retransmit. This superb idea is * borrowed from "ratehalving". Event "C". @@ -1633,13 +1636,14 @@ void tcp_enter_frto(struct sock *sk) tcp_set_ca_state(sk, TCP_CA_Disorder); tp->high_seq = tp->snd_nxt; tp->frto_counter = 1; + tcp_verify_fackets(sk); } /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO, * which indicates that we should follow the traditional RTO recovery, * i.e. mark everything lost and do go-back-N retransmission. */ -static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) +void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; @@ -1676,6 +1680,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag) } } tcp_verify_left_out(tp); + tcp_verify_fackets(sk); tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments; tp->snd_cwnd_cnt = 0; @@ -1754,6 +1759,7 @@ void tcp_enter_loss(struct sock *sk, int how) } } tcp_verify_left_out(tp); + tcp_verify_fackets(sk); tp->reordering = min_t(unsigned int, tp->reordering, sysctl_tcp_reordering); @@ -2309,7 +2315,7 @@ static void tcp_mtu
Re: [PATCH net-2.6.24 0/3]: More TCP fixes
Hello Ilpo ! Ilpo Järvinen wrote: > Hi Dave, > > Sacktag fastpath_cnt_hint seems to be very tricky to get right... > I suppose this one fixes Cedric's case. I cannot say for sure > until there is something more definite indication of > tcp_retrans_try_collapse origin than what the simple late WARN_ON > gave for us. ...Especially since it's non-trivial to have skb > hint "correctly" positioned in the write_queue while still ending > up calling that function. However, considering how difficult it > seems to be for Cedric to reproduce, it might well be this one. > > In addition, I noticed another reset which wasn't previously > converted to WARN_ON, so doing that now. Boot + simple xfer > tested. Please apply to net-2.6.24. I'm dropping the previous patches you sent me and switching to this patchset. right ? Thanks, C. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.24 0/3]: More TCP fixes
Hi Dave, Sacktag fastpath_cnt_hint seems to be very tricky to get right... I suppose this one fixes Cedric's case. I cannot say for sure until there is something more definite indication of tcp_retrans_try_collapse origin than what the simple late WARN_ON gave for us. ...Especially since it's non-trivial to have skb hint "correctly" positioned in the write_queue while still ending up calling that function. However, considering how difficult it seems to be for Cedric to reproduce, it might well be this one. In addition, I noticed another reset which wasn't previously converted to WARN_ON, so doing that now. Boot + simple xfer tested. Please apply to net-2.6.24. -- i. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html