> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d..65d1dfbf3312 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -612,10 +613,16 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void
> *hdr, u32 len, bool ingress)
> ipv4 = true;
> if (unlikely(len < iph->ihl * 4))
> return -EINVAL;
Since len is a u32, if iph->ihl is 0 then iph->ihl * 4 is 0 and the
"len < iph->ihl * 4" check is never true, so it passes without
returning -EINVAL.
Later ip_fast_csum() is called with iph->ihl, and on some architectures
(PowerPC, MIPS) that implementation assumes ihl >= 5 and decrements the
loop counter, which underflows when ihl is 0.
Should there be an explicit iph->ihl >= 5 check here, for example:
if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
return -EINVAL;
This was raised in review of v1 and v2 added the guard, but v3 appears
to have dropped it again. The earlier discussion is here:
https://lore.kernel.org/bpf/[email protected]
> + is_udp_tunnel = iph->protocol == IPPROTO_UDP;
> + if (unlikely(is_udp_tunnel && len < iph->ihl * 4 +
> sizeof(struct udphdr)))
> + return -EINVAL;
> } else if (iph->version == 6) {
> ipv4 = false;
> if (unlikely(len < sizeof(struct ipv6hdr)))
> return -EINVAL;
> + is_udp_tunnel = ((struct ipv6hdr *)iph)->nexthdr == NEXTHDR_UDP;
> + if (unlikely(is_udp_tunnel && len < sizeof(struct ipv6hdr) +
> sizeof(struct udphdr)))
> + return -EINVAL;
> } else {
> return -EINVAL;
> }
[ ... ]
> @@ -637,6 +644,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void
> *hdr, u32 len, bool ingress)
> if (ingress)
> skb_postpush_rcsum(skb, iph, len);
> skb_reset_network_header(skb);
> + if (ipv4 && is_udp_tunnel)
> + skb_set_transport_header(skb, skb_network_offset(skb) +
> iph->ihl * 4);
> + else if (!ipv4 && is_udp_tunnel)
> + skb_set_transport_header(skb, skb_network_offset(skb) +
> sizeof(struct ipv6hdr));
> memcpy(skb_network_header(skb), hdr, len);
Can iph->ihl change between the initial length validation and this call?
The hdr pointer (and therefore iph) refers to a buffer provided by the
BPF program, which can be concurrently modified by userspace through a
shared BPF map. Because skb_cow_head() acts as a compiler barrier, the
compiler reloads iph->ihl from that untrusted memory here.
This read also happens before the memcpy() into the skb:
skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
memcpy(skb_network_header(skb), hdr, len);
If iph->ihl is concurrently changed to a larger value (such as 15) after
the initial validation, can the transport header offset be set beyond
skb->tail, leading to out-of-bounds writes when drivers later dereference
the transport header for checksum updates?
In the v1 discussion the suggestion was to move
skb_set_transport_header() after the memcpy() so it reads from the
copied-in header, but the current code still reads iph->ihl before the
memcpy(). See:
https://lore.kernel.org/bpf/[email protected]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26763883406