> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d..bf588f508b79 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -612,10 +613,16 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void
> *hdr, u32 len, bool ingress)
> ipv4 = true;
> if (unlikely(len < iph->ihl * 4))
> return -EINVAL;
Can iph->ihl == 0 slip past this check? Since len is a u32, iph->ihl * 4
is 0 when iph->ihl is 0, and len < 0 is never true, so the buffer is
accepted without returning -EINVAL.
ip_fast_csum() is later called with this iph->ihl, and on some
architectures (PowerPC, MIPS) that implementation assumes ihl >= 5 and
decrements the loop counter, which can underflow when ihl is 0.
Would an explicit iph->ihl >= 5 guard be needed here, for example:
if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
return -EINVAL;
This was raised in review of v1 and an earlier revision added the
iph->ihl < 5 guard, but the current revision appears to have dropped it
again.
> is_udp_tunnel = iph->protocol == IPPROTO_UDP;
> if (unlikely(is_udp_tunnel && len < iph->ihl * 4 +
> sizeof(struct udphdr)))
> return -EINVAL;
> } else if (iph->version == 6) {
> ipv4 = false;
> if (unlikely(len < sizeof(struct ipv6hdr)))
> return -EINVAL;
> is_udp_tunnel = ((struct ipv6hdr *)iph)->nexthdr == NEXTHDR_UDP;
> if (unlikely(is_udp_tunnel && len < sizeof(struct ipv6hdr) +
> sizeof(struct udphdr)))
> return -EINVAL;
> } else {
> return -EINVAL;
> }
> @@ -637,6 +644,11 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void
> *hdr, u32 len, bool ingress)
> if (ingress)
> skb_postpush_rcsum(skb, iph, len);
> skb_reset_network_header(skb);
> + if (is_udp_tunnel) {
> + size_t iph_sz = ipv4 ? iph->ihl * 4 : sizeof(struct ipv6hdr);
> +
> + skb_set_transport_header(skb, skb_network_offset(skb) + iph_sz);
> + }
Can iph->ihl change between the initial length validation and this read?
The hdr pointer, and therefore iph, refers to a buffer provided by the BPF
program, which can be concurrently modified by userspace through a shared
BPF map. iph->ihl is re-read here to compute iph_sz, and this read happens
before the memcpy() of the header into the skb:
if (is_udp_tunnel) {
size_t iph_sz = ipv4 ? iph->ihl * 4 : sizeof(struct ipv6hdr);
skb_set_transport_header(skb, skb_network_offset(skb) + iph_sz);
}
memcpy(skb_network_header(skb), hdr, len);
If iph->ihl is concurrently changed to a larger value such as 15 after the
initial validation, can the transport header offset be set past skb->tail,
leading to out-of-bounds access when drivers later dereference the
transport header for checksum updates?
The suggestion in the v1 discussion was to move skb_set_transport_header()
after the memcpy() so it reads from the copied-in header, but the current
revision still reads iph->ihl before the memcpy().
> memcpy(skb_network_header(skb), hdr, len);
> bpf_compute_data_pointers(skb);
> skb_clear_hash(skb);
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26830439032