ChangeSet 1.2199.14.33, 2005/03/23 12:16:53-08:00, [EMAIL PROTECTED]

        [IPV4]: Check mtu instead of frag_list in ip_push_pending_frames()
        
        I still didn't like the fact that ip_append_data was the only user
        of dst_pmtu :) So I went looking for bugs in the surrounding code.
        I managed to find something in ip_push_pending_frames.
        
        When dst_mtu < dst_pmtu - IPsec overhead (which can be caused by PMTU
        discovery within an IPsec tunnel), and we transmit a packet that's
        longer than dst_mtu but shorter than dst_pmtu - IPsec overhead, then
        the DF bit will be incorrectly set in the inner IP header.
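
        As an illustration only, the window described above can be written
        as a small predicate.  This is a standalone sketch, not kernel code:
        in_df_window() and its parameters are hypothetical names, and the
        figures in the note below are assumed for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code: returns true when a packet of
 * `len` bytes falls in the window described above, i.e. it is longer
 * than dst_mtu but still shorter than dst_pmtu minus the IPsec overhead.
 */
static bool in_df_window(unsigned int len, unsigned int dst_mtu,
                         unsigned int dst_pmtu, unsigned int ipsec_overhead)
{
    /* The window only exists when dst_mtu < dst_pmtu - IPsec overhead. */
    if (dst_pmtu < ipsec_overhead || dst_mtu >= dst_pmtu - ipsec_overhead)
        return false;

    return len > dst_mtu && len < dst_pmtu - ipsec_overhead;
}
```

        With assumed values dst_pmtu = 1500, IPsec overhead = 60 and
        dst_mtu = 1400, a 1420-byte packet falls inside the window, while
        1400-byte and 1440-byte packets do not.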
        
        This will cause the packet to be dropped when it hits the router that
        generated the original PMTU event.  Unfortunately the ICMP packet coming
        back doesn't tell us anything new so the next time we send a packet we
        will do exactly the same thing.
        
        The fix is similar to what we did in ip_output.  Instead of checking
        whether frag_list is empty, we check the condition skb->len <= dst_mtu
        directly and set the DF bit based on that.
        
        We can enumerate all the possibilities to see that this is correct.
        
        If skb->len <= dst_mtu and frag_list is empty then this does the
        same thing as before and is obviously correct.
        
        If skb->len <= dst_mtu and frag_list is non-empty then it implies
        that dst_pmtu has increased since the fragments were constructed
        as dst_pmtu = dst_mtu + IPsec overhead.  So the skb will now fit
        within a single fragment which means that setting DF is correct.
        The fragments will be merged by skb_linearize in dev_queue_xmit.
        
        If skb->len > dst_mtu and frag_list is non-empty then again this
        maintains the status quo.
        
        If skb->len > dst_mtu and frag_list is empty then we will leave the
        DF bit clear as the packet will need to be fragmented between the
        remote IPsec gateway and the final destination.
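
        The four cases above can also be checked against a simplified model
        of the old and new conditions.  This is a sketch under assumptions:
        struct pending, df_old() and df_new() are hypothetical stand-ins for
        the state the kernel consults, not the kernel's own types.

```c
#include <stdbool.h>

/* Simplified stand-in for the state used in ip_push_pending_frames(). */
struct pending {
    unsigned int len;    /* models skb->len */
    unsigned int mtu;    /* models dst_mtu(&rt->u.dst) */
    bool has_frag_list;  /* models skb_shinfo(skb)->frag_list != NULL */
    bool dont_fragment;  /* models ip_dont_fragment(sk, &rt->u.dst) */
    bool pmtudisc_do;    /* models inet->pmtudisc == IP_PMTUDISC_DO */
};

/* Condition before this change: DF depends on frag_list being empty. */
static bool df_old(const struct pending *p)
{
    return p->pmtudisc_do || (!p->has_frag_list && p->dont_fragment);
}

/* Condition after this change: DF depends on skb->len <= dst_mtu. */
static bool df_new(const struct pending *p)
{
    return p->pmtudisc_do || (p->len <= p->mtu && p->dont_fragment);
}
```

        In this model, with dont_fragment true and pmtudisc_do false, the two
        conditions agree when the packet fits and frag_list is empty, and
        when it does not fit and frag_list is non-empty.  They differ only in
        the two mixed cases: df_new() sets DF for a packet that fits despite
        a non-empty frag_list, and clears it for an oversized packet with an
        empty frag_list.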
        
        Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
        Signed-off-by: David S. Miller <[EMAIL PROTECTED]>



 ip_output.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)


diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c      2005-03-26 17:24:41 -08:00
+++ b/net/ipv4/ip_output.c      2005-03-26 17:24:41 -08:00
@@ -1152,7 +1152,8 @@
         * If local_df is set too, we still allow to fragment this frame
         * locally. */
        if (inet->pmtudisc == IP_PMTUDISC_DO ||
-           (!skb_shinfo(skb)->frag_list && ip_dont_fragment(sk, &rt->u.dst)))
+           (skb->len <= dst_mtu(&rt->u.dst) &&
+            ip_dont_fragment(sk, &rt->u.dst)))
                df = htons(IP_DF);
 
        if (inet->cork.flags & IPCORK_OPT)