> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d..65d1dfbf3312 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -612,10 +613,16 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void 
> *hdr, u32 len, bool ingress)
>               ipv4 = true;
>               if (unlikely(len < iph->ihl * 4))
>                       return -EINVAL;

Since len is a u32, if iph->ihl is 0 then iph->ihl * 4 is 0 and the
"len < iph->ihl * 4" check is never true, so it passes without
returning -EINVAL.

Later ip_fast_csum() is called with iph->ihl, and on some architectures
(PowerPC, MIPS) that implementation assumes ihl >= 5 and decrements the
loop counter, which underflows when ihl is 0.

Should there be an explicit iph->ihl >= 5 check here, for example:

        if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
                return -EINVAL;

This was raised in review of v1 and v2 added the guard, but v3 appears
to have dropped it again.  The earlier discussion is here:

  https://lore.kernel.org/bpf/[email protected]

> +             is_udp_tunnel = iph->protocol == IPPROTO_UDP;
> +             if (unlikely(is_udp_tunnel && len < iph->ihl * 4 + 
> sizeof(struct udphdr)))
> +                     return -EINVAL;
>       } else if (iph->version == 6) {
>               ipv4 = false;
>               if (unlikely(len < sizeof(struct ipv6hdr)))
>                       return -EINVAL;
> +             is_udp_tunnel = ((struct ipv6hdr *)iph)->nexthdr == NEXTHDR_UDP;
> +             if (unlikely(is_udp_tunnel && len < sizeof(struct ipv6hdr) + 
> sizeof(struct udphdr)))
> +                     return -EINVAL;
>       } else {
>               return -EINVAL;
>       }

[ ... ]

> @@ -637,6 +644,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void 
> *hdr, u32 len, bool ingress)
>       if (ingress)
>               skb_postpush_rcsum(skb, iph, len);
>       skb_reset_network_header(skb);
> +     if (ipv4 && is_udp_tunnel)
> +             skb_set_transport_header(skb, skb_network_offset(skb) + 
> iph->ihl * 4);
> +     else if (!ipv4 && is_udp_tunnel)
> +             skb_set_transport_header(skb, skb_network_offset(skb) + 
> sizeof(struct ipv6hdr));
>       memcpy(skb_network_header(skb), hdr, len);

Can iph->ihl change between the initial length validation and this call?

The hdr pointer (and therefore iph) refers to a buffer provided by the
BPF program, which can be concurrently modified by userspace through a
shared BPF map.  Because skb_cow_head() acts as a compiler barrier, the
compiler reloads iph->ihl from that untrusted memory here.

This read also happens before the memcpy() into the skb:

        skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
        memcpy(skb_network_header(skb), hdr, len);

If iph->ihl is concurrently changed to a larger value (such as 15) after
the initial validation, can the transport header offset be set beyond
skb->tail, leading to out-of-bounds writes when drivers later dereference
the transport header for checksum updates?

In the v1 discussion the suggestion was to move
skb_set_transport_header() after the memcpy() so it reads from the
copied-in header, but the current code still reads iph->ihl before the
memcpy().  See:

  https://lore.kernel.org/bpf/[email protected]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26763883406

Reply via email to