> Date: Tue, 19 Apr 2022 07:32:36 +0200
> From: Anton Lindqvist <an...@basename.se>
> 
> On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > >Synopsis:  bse: null dereference in genet_rxintr()
> > >Category:  arm64
> > >Environment:
> >     System      : OpenBSD 7.1
> >     Details     : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 
> > MDT 2022
> >                     
> > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > 
> >     Architecture: OpenBSD.arm64
> >     Machine     : arm64
> > >Description:
> > 
> > Booting my rpi4 often but not always causes a panic while rc(8) tries to 
> > start
> > the bse network interface:
> > 
> > panic: attempt to access user address 0x38 from EL1
> > Stopped at      panic+0x160:    cmp     w21, #0x0
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *     0      0      0     0x10000      0x200    0K swapper
> > db_enter() at panic+0x15c
> > panic() at do_el1h_sync+0x1f8
> > do_el1h_sync() at handle_el1h_sync+0x6c
> > handle_el1h_sync() at genet_rxintr+0x120
> > genet_rxintr() at genet_intr+0x74
> > genet_intr() at ampintc_irq_handler+0x14c
> > ampintc_irq_handler() at arm_cpu_irq+0x30
> > arm_cpu_irq() at handle_el1h_irq+0x6c
> > handle_el1h_irq() at ampintc_splx+0x80
> > ampintc_splx() at genet_ioctl+0x158
> > genet_ioctl() at ifioctl+0x308
> > ifioctl() at nfs_boot_init+0xc0
> > nfs_boot_init() at nfs_mountroot+0x3c
> > nfs_mountroot() at main+0x464
> > main() at virtdone+0x70
> > 
> > >Fix:
> > 
> > The mbuf associated with the current index is NULL. I noticed that the 
> > NetBSD
> > driver allocates mbufs for each ring entry in genet_setup_dma(). But even 
> > with
> > that in place the same panic still occurs. Enabling GENET_DEBUG shows that 
> > the
> > total is quite high:
> > 
> > RX pidx=0000ca07 total=51463                                                
> >                    
> > 
> > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference 
> > will
> > still happen after doing more than 256 iterations in genet_rxintr() since we
> > will start accessing mbufs cleared by the previous iteration.
> > 
> > Here's a diff with what I've tried so far. The KASSERT() is just capturing 
> > the
> > problem at an earlier stage. Any pointers would be much appreciated.
> 
> Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> ignored by the hardware. That's why I ended up with a large amount of
> mbufs available in genet_rxintr() since the software and hardware state
> was out of sync. Honoring any existing value makes the problem go away
> and matches what u-boot[1] does as well.

Writing to GENET_RX_DMA_PROD_INDEX works for me.  The U-Boot code says
that writing 0 doesn't work.  But even that works for me.  So I'm
puzzled.

> The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably
> carefully selected as they ensure that the rx ring is filled with at
> least the configured low watermark number of mbufs. However, instead of
> being forced to ensure a pidx - cidx delta above 0 on the first
> invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> passed as the max argument to if_rxr_get() which will clamp the value
> anyway.

Well, what the code does is setting the "prod" index ahead of the
"cons" index to simulate a full ring.  And then when we (partially)
fill the ring we increase "cons" to make descriptors available to the
hardware.  This seems to work on my hardware and I've never seen the
crash you're seeing.

Reply via email to