On Thu, Oct 9, 2025 at 10:28 PM Maciej Fijalkowski <[email protected]> wrote: > > On Wed, Oct 08, 2025 at 06:56:59PM +0200, Alexander Lobakin wrote: > > Turned out certain clearly invalid values passed in &xdp_desc from > > userspace can pass xp_{,un}aligned_validate_desc() and then lead > > to UBs or just invalid frames to be queued for xmit. > > > > desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len > > can cause positive integer overflow and wraparound, the same way low > > enough desc->addr with a non-zero pool->tx_metadata_len can cause > > negative integer overflow. Both scenarios can then pass the > > validation successfully. > > Hmm, when underflow happens the addr would be enormous, passing > existing validation would really be rare. However let us fix it while at > it. > > > This doesn't happen with valid XSk applications, but can be used > > to perform attacks. > > > > Always promote desc->len to ``u64`` first to exclude positive > > overflows of it. Use explicit check_{add,sub}_overflow() when > > validating desc->addr (which is ``u64`` already). > > > > bloat-o-meter reports a little growth of the code size: > > > > add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44) > > Function old new delta > > xskq_cons_peek_desc 299 330 +31 > > xsk_tx_peek_release_desc_batch 973 1002 +29 > > xsk_generic_xmit 3148 3132 -16 > > > > but hopefully this doesn't hurt the performance much. > > Let us be fully transparent and link the previous discussion here? > > I was commenting that breaking up single statement to multiple branches > might affect subtly performance as this code is executed per each > descriptor. Jason tested copy+aligned mode, let us see if zc+unaligned > mode is affected. > > <rant> > I am also thinking about test side, but xsk tx metadata came with a > separate test (xdp_hw_metadata), which was rather about testing positive > cases. That is probably a separate discussion, but metadata negative > tests should appear somewhere, I suppose xskxceiver would be a good fit, > but then, should we merge the existing logic from xdp_hw_metadata? > </rant> > > > > > Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") > > Cc: [email protected] # 6.8+ > > Signed-off-by: Alexander Lobakin <[email protected]> > > --- > > net/xdp/xsk_queue.h | 45 +++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 35 insertions(+), 10 deletions(-) > > > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h > > index f16f390370dc..1eb8d9f8b104 100644 > > --- a/net/xdp/xsk_queue.h > > +++ b/net/xdp/xsk_queue.h > > @@ -143,14 +143,24 @@ static inline bool xp_unused_options_set(u32 options) > > static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, > > struct xdp_desc *desc) > > { > > - u64 addr = desc->addr - pool->tx_metadata_len; > > - u64 len = desc->len + pool->tx_metadata_len; > > - u64 offset = addr & (pool->chunk_size - 1); > > + u64 len = desc->len; > > + u64 addr, offset; > > > > - if (!desc->len) > > + if (!len) > > This is yet another thing being fixed here as for non-zero tx_metadata_len > we were allowing 0 length descriptors... :< overall feels like we relied > too much on contract with userspace WRT descriptor layout. > > If zc perf is fine, then:
Testing on IXGBE that can reach 10Gb/sec line rate, I didn't see any impact either. This time I used -u to cover the unaligned case. What I did was just like below: # sysctl -w vm.nr_hugepages=1024 # taskset -c 1 ./xdpsock -i eth1 -t -z -u -s 64 Thanks, Jason > Reviewed-by: Maciej Fijalkowski <[email protected]> > > > return false; > > > > - if (offset + len > pool->chunk_size) > > + /* Can overflow if desc->addr < pool->tx_metadata_len */ > > + if (check_sub_overflow(desc->addr, pool->tx_metadata_len, &addr)) > > + return false; > > + > > + offset = addr & (pool->chunk_size - 1); > > + > > + /* > > + * Can't overflow: @offset is guaranteed to be < ``U32_MAX`` > > + * (pool->chunk_size is ``u32``), @len is guaranteed > > + * to be <= ``U32_MAX``. > > + */ > > + if (offset + len + pool->tx_metadata_len > pool->chunk_size) > > return false; > > > > if (addr >= pool->addrs_cnt) > > @@ -158,27 +168,42 @@ static inline bool xp_aligned_validate_desc(struct > > xsk_buff_pool *pool, > > > > if (xp_unused_options_set(desc->options)) > > return false; > > + > > return true; > > } > > > > static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, > > struct xdp_desc *desc) > > { > > - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - > > pool->tx_metadata_len; > > - u64 len = desc->len + pool->tx_metadata_len; > > + u64 len = desc->len; > > + u64 addr, end; > > > > - if (!desc->len) > > + if (!len) > > return false; > > > > + /* Can't overflow: @len is guaranteed to be <= ``U32_MAX`` */ > > + len += pool->tx_metadata_len; > > if (len > pool->chunk_size) > > return false; > > > > - if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt || > > - xp_desc_crosses_non_contig_pg(pool, addr, len)) > > + /* Can overflow if desc->addr is close to 0 */ > > + if (check_sub_overflow(xp_unaligned_add_offset_to_addr(desc->addr), > > + pool->tx_metadata_len, &addr)) > > + return false; > > + > > + if (addr >= pool->addrs_cnt) > > + return false; > > + > > + /* Can overflow if pool->addrs_cnt is high enough */ > > + if (check_add_overflow(addr, len, &end) || end > pool->addrs_cnt) > > + return false; > > + > > + if (xp_desc_crosses_non_contig_pg(pool, addr, len)) > > return false; > > > > if (xp_unused_options_set(desc->options)) > > return false; > > + > > return true; > > } > > > > -- > > 2.51.0 > > >
