On Thu, Apr 21, 2022 at 02:37:48PM +0200, Mark Kettenis wrote:
> > Date: Wed, 20 Apr 2022 18:14:57 +0200
> > From: Anton Lindqvist <an...@basename.se>
> > 
> > On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote:
> > > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote:
> > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > > >Synopsis:    bse: null dereference in genet_rxintr()
> > > > > >Category:    arm64
> > > > > >Environment:
> > > > >       System      : OpenBSD 7.1
> > > > >       Details     : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 
> > > > > 06:55:12 MDT 2022
> > > > >                       
> > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > > 
> > > > >       Architecture: OpenBSD.arm64
> > > > >       Machine     : arm64
> > > > > >Description:
> > > > > 
> > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries 
> > > > > to start
> > > > > the bse network interface:
> > > > > 
> > > > > panic: attempt to access user address 0x38 from EL1
> > > > > Stopped at      panic+0x160:    cmp     w21, #0x0
> > > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > > > *     0      0      0     0x10000      0x200    0K swapper
> > > > > db_enter() at panic+0x15c
> > > > > panic() at do_el1h_sync+0x1f8
> > > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > > genet_rxintr() at genet_intr+0x74
> > > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > > ampintc_splx() at genet_ioctl+0x158
> > > > > genet_ioctl() at ifioctl+0x308
> > > > > ifioctl() at nfs_boot_init+0xc0
> > > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > > nfs_mountroot() at main+0x464
> > > > > main() at virtdone+0x70
> > > > > 
> > > > > >Fix:
> > > > > 
> > > > > The mbuf associated with the current index is NULL. I noticed that 
> > > > > the NetBSD
> > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But 
> > > > > even with
> > > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows 
> > > > > that the
> > > > > total is quite high:
> > > > > 
> > > > > RX pidx=0000ca07 total=51463                                          
> > > > >                          
> > > > > 
> > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null 
> > > > > dereference will
> > > > > still happen after doing more than 256 iterations in genet_rxintr() 
> > > > > since we
> > > > > will start accessing mbufs cleared by the previous iteration.
> > > > > 
> > > > > Here's a diff with what I've tried so far. The KASSERT() is just 
> > > > > capturing the
> > > > > problem at an earlier stage. Any pointers would be much appreciated.
> > > > 
> > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > > ignored by the hardware. That's why I ended up with a large amount of
> > > > mbufs available in genet_rxintr() since the software and hardware state
> > > > was out of sync. Honoring any existing value makes the problem go away
> > > > and matches what u-boot[1] does as well.
> > > > 
> > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably
> > > > carefully selected as they ensure that the rx ring is filled with at
> > > > least the configured low watermark number of mbufs. However, instead of
> > > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > > passed as the max argument to if_rxr_get() which will clamp the value
> > > > anyway.
> > > > 
> > > > Also, I've seen up to 8 mbufs being available per rx interrupt which is
> > > > odd as only a less amount of rx ring entries are actually populated. Not
> > > > sure if the driver is missing some interrupt threshold configuration.
> > > > Increasing the rx ring low watermark to 8 "solved" it for now.
> > > > Otherwise, the same null dereference occurs while trying to access empty
> > > > mbuf ring entries.
> > > > 
> > > > Worth mentioning is that the NetBSD driver does not suffer from the same
> > > > problem as they keep all rx ring entries populated all the time.
> > > > 
> > > > Looking for feedback and OKs at this point.
> > > > 
> > > > [1] 
> > > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404
> > > 
> > > While putting more pressure on the network I'm seeing up to 100 mbufs
> > > being available per rx interrupt. Could it simply be explained by the
> > > hardware operating under the assumption that all ring entries are
> > > available? Even if instructing the hardware about the actual amount of
> > > available ring entries would require the driver to keep it in sync
> > > whenever the if_rxr_*() implementation decides to adjust the ring.
> > > 
> > > Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making
> > > all 256 ring entries always available makes the driver stable.
> > 
> > Here's the diff I've been running lately which is deemed to be stable.
> > Changes since last time:
> > 
> > * Ensure all 256 ring entries are always populated. Usage of the
> >   if_rxr_*() API doesn't do much at this point.
> > 
> > * Update the consumer index only after a ring entry has been consumed
> >   and not after filling in blank ones. This should have the same effect
> >   as the previous logic in genet_fill_rx_ring() but it makes our driver
> >   look like the one in NetBSD, FreeBSD and Linux.
> 
> But that means we no longer provide "mclgeti" DoS mitigation on the Pi.
> 
> There has to be a way for the hardware to prevent it from completely
> filling up the ring and overwrite ring entries that haven't been
> processed yet by the software stack.

If I configure GENET_RX_DMA_RING_BUF_SIZE_DESC_COUNT using the same low
watermark passed to if_rxr_init() it works. But I'm unsure on how to
reconfigure it once the software decides to adjust the ring as this
happens while we're already are accepting packets and receive DMA is
therefore enabled. Any ideas?

Reply via email to