ChangeSet 1.2199.14.33, 2005/03/23 12:16:53-08:00, [EMAIL PROTECTED]

[IPV4]: Check mtu instead of frag_list in ip_push_pending_frames()

I still didn't like the fact that ip_append_data was the only user of
dst_pmtu :)  So I went looking for bugs in the surrounding code.

I managed to find something in ip_push_pending_frames.  When
dst_mtu < dst_pmtu - IPsec overhead (which can be caused by PMTU
discovery within an IPsec tunnel), and we transmit a packet that's
longer than dst_mtu but shorter than dst_pmtu - IPsec overhead, then
the DF bit will be incorrectly set in the inner IP header.

This will cause the packet to be dropped when it hits the router that
generated the original PMTU event.  Unfortunately the ICMP packet
coming back doesn't tell us anything new, so the next time we send a
packet we will do exactly the same thing.

The fix is similar to what we did in ip_output.  Instead of checking
whether frag_list is empty, we check the condition
skb->len <= dst_mtu directly and set the DF bit based on that.

We can enumerate all the possibilities to see that this is correct.

If skb->len <= dst_mtu and frag_list is empty then this does the same
thing as before and is obviously correct.

If skb->len <= dst_mtu and frag_list is non-empty then it implies that
dst_pmtu has increased since the fragments were constructed, as
dst_pmtu = dst_mtu + IPsec overhead.  So the skb will now fit within a
single fragment, which means that setting DF is correct.  The
fragments will be merged by skb_linearise in dev_queue_xmit.

If skb->len > dst_mtu and frag_list is non-empty then again this
maintains the status quo.

If skb->len > dst_mtu and frag_list is empty then we will leave the DF
bit clear, as the packet will need to be fragmented between the remote
IPsec gateway and the final destination.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

 ip_output.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c	2005-03-26 17:24:41 -08:00
+++ b/net/ipv4/ip_output.c	2005-03-26 17:24:41 -08:00
@@ -1152,7 +1152,8 @@
 	 * If local_df is set too, we still allow to fragment this frame
 	 * locally. */
 	if (inet->pmtudisc == IP_PMTUDISC_DO ||
-	    (!skb_shinfo(skb)->frag_list && ip_dont_fragment(sk, &rt->u.dst)))
+	    (skb->len <= dst_mtu(&rt->u.dst) &&
+	     ip_dont_fragment(sk, &rt->u.dst)))
 		df = htons(IP_DF);
 
 	if (inet->cork.flags & IPCORK_OPT)
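
The four cases enumerated in the changelog can be checked with a small
userspace model. This is only a sketch, not kernel code: struct
pkt_model, df_old(), df_new() and check_four_cases() are hypothetical
names, the MTU and length values are arbitrary, and the result of
ip_dont_fragment() is passed in as a plain boolean. The unconditional
inet->pmtudisc == IP_PMTUDISC_DO branch is left out because the patch
does not change it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of an skb that the test reads. */
struct pkt_model {
	unsigned int len;	/* models skb->len */
	bool has_frag_list;	/* models skb_shinfo(skb)->frag_list != NULL */
};

/* Old condition: set DF only when no fragments were queued. */
static bool df_old(const struct pkt_model *p, unsigned int mtu,
		   bool dont_fragment)
{
	(void)mtu;		/* the old test never looked at the MTU */
	return !p->has_frag_list && dont_fragment;
}

/* New condition: set DF only when the packet fits within dst_mtu. */
static bool df_new(const struct pkt_model *p, unsigned int mtu,
		   bool dont_fragment)
{
	return p->len <= mtu && dont_fragment;
}

/* Walk the four cases from the changelog, assuming DF is wanted. */
static void check_four_cases(void)
{
	const unsigned int mtu = 1400;	/* arbitrary illustrative dst_mtu */
	const struct pkt_model fits_single = { 1000, false };
	const struct pkt_model fits_frags  = { 1000, true };
	const struct pkt_model big_frags   = { 1450, true };
	const struct pkt_model big_single  = { 1450, false };

	/* len <= mtu, frag_list empty: both tests set DF. */
	assert(df_old(&fits_single, mtu, true));
	assert(df_new(&fits_single, mtu, true));

	/* len <= mtu, frag_list non-empty: only the new test sets DF. */
	assert(!df_old(&fits_frags, mtu, true));
	assert(df_new(&fits_frags, mtu, true));

	/* len > mtu, frag_list non-empty: both tests leave DF clear. */
	assert(!df_old(&big_frags, mtu, true));
	assert(!df_new(&big_frags, mtu, true));

	/* len > mtu, frag_list empty: the old test set DF (the bug). */
	assert(df_old(&big_single, mtu, true));
	assert(!df_new(&big_single, mtu, true));
}
```

Calling check_four_cases() from a main() built without NDEBUG exercises
all four cases. In this model the two conditions disagree only in the
second and fourth cases, which are the two behaviour changes the
changelog argues for.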