Re: bse: null dereference in genet_rxintr()
> Date: Mon, 2 May 2022 07:15:51 +0200 > From: Anton Lindqvist > > On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote: > > > Date: Sun, 1 May 2022 20:13:57 +0200 > > > From: Anton Lindqvist > > > > > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote: > > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200 > > > > > From: Anton Lindqvist > > > > > > > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > > > >Synopsis: bse: null dereference in genet_rxintr() > > > > > > >Category: arm64 > > > > > > >Environment: > > > > > > System : OpenBSD 7.1 > > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > > > 06:55:12 MDT 2022 > > > > > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > > > > > Architecture: OpenBSD.arm64 > > > > > > Machine : arm64 > > > > > > >Description: > > > > > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) > > > > > > tries to start > > > > > > the bse network interface: > > > > > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > > > Stopped at panic+0x160:cmp w21, #0x0 > > > > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > > > > * 0 0 0 0x1 0x2000K swapper > > > > > > db_enter() at panic+0x15c > > > > > > panic() at do_el1h_sync+0x1f8 > > > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > > > genet_rxintr() at genet_intr+0x74 > > > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > > > ampintc_splx() at genet_ioctl+0x158 > > > > > > genet_ioctl() at ifioctl+0x308 > > > > > > ifioctl() at nfs_boot_init+0xc0 > > > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > > > nfs_mountroot() at main+0x464 > > > > > > main() at virtdone+0x70 > > > > > > > > > > > > >Fix: > > > > > > > > > > > > The mbuf associated with the current index is NULL. I noticed that > > > > > > the NetBSD > > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). > > > > > > But even with > > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG > > > > > > shows that the > > > > > > total is quite high: > > > > > > > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > > > dereference will > > > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > > > since we > > > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > > > capturing the > > > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > > > ignored by the hardware. That's why I ended up with a large amount of > > > > > mbufs available in genet_rxintr() since the software and hardware > > > > > state > > > > > was out of sync. Honoring any existing value makes the problem go away > > > > > and matches what u-boot[1] does as well. > > > > > > > > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says > > > > that writing 0 doesn't work. But even that works for me. So I'
Re: bse: null dereference in genet_rxintr()
On Mon, May 02, 2022 at 12:32:24AM +0200, Mark Kettenis wrote: > > Date: Sun, 1 May 2022 20:13:57 +0200 > > From: Anton Lindqvist > > > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote: > > > > Date: Tue, 19 Apr 2022 07:32:36 +0200 > > > > From: Anton Lindqvist > > > > > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > > >Synopsis:bse: null dereference in genet_rxintr() > > > > > >Category:arm64 > > > > > >Environment: > > > > > System : OpenBSD 7.1 > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > > 06:55:12 MDT 2022 > > > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > > > Architecture: OpenBSD.arm64 > > > > > Machine : arm64 > > > > > >Description: > > > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries > > > > > to start > > > > > the bse network interface: > > > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > > Stopped at panic+0x160:cmp w21, #0x0 > > > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > > > * 0 0 0 0x1 0x2000K swapper > > > > > db_enter() at panic+0x15c > > > > > panic() at do_el1h_sync+0x1f8 > > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > > genet_rxintr() at genet_intr+0x74 > > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > > ampintc_splx() at genet_ioctl+0x158 > > > > > genet_ioctl() at ifioctl+0x308 > > > > > ifioctl() at nfs_boot_init+0xc0 > > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > > nfs_mountroot() at main+0x464 > > > > > main() at virtdone+0x70 > > > > > > > > > > >Fix: > > > > > > > > > > The mbuf associated with the current index is NULL. I noticed that > > > > > the NetBSD > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But > > > > > even with > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > > > that the > > > > > total is quite high: > > > > > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > > dereference will > > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > > since we > > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > > capturing the > > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > > ignored by the hardware. That's why I ended up with a large amount of > > > > mbufs available in genet_rxintr() since the software and hardware state > > > > was out of sync. Honoring any existing value makes the problem go away > > > > and matches what u-boot[1] does as well. > > > > > > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says > > > that writing 0 doesn't work. But even that works for me. So I'm > > > puzzled. > > > > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > > > carefully selected as they ensure that the rx ring is filled with at > > > > least the configured low watermark number of mbufs. However, instead of > > > > being forced to ensure a pidx - cidx delta above 0 on the first > > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > > > passed as the max argument to if_rxr_get() which will clamp the value > > > > anyway. > > > > > > Well, what the code does is setting the "prod" index ahead of the > > > "cons" index to simulate a full ring. And then when we (partially) > > > fill the ring we increase "cons" to make descriptors available to the > > > hardware. This seems to work on my hardware and I've never seen the > > > crash you're seeing. > > > > After further meditation I realized this only happens when netbooting > > through u-boot. Disabling DMA as part of the hardware reset solves the > > issue, at last. > > So reseting the MAC doesn't actually reset the DMA engine... > > Doing this should be fine. Although the firmware really should > disable the DMA before handing control to the OS. Can you file a bug > in the appropriate channel? Sure, in the meantime can this go in?
Re: bse: null dereference in genet_rxintr()
> Date: Sun, 1 May 2022 20:13:57 +0200 > From: Anton Lindqvist > > On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote: > > > Date: Tue, 19 Apr 2022 07:32:36 +0200 > > > From: Anton Lindqvist > > > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > >Synopsis: bse: null dereference in genet_rxintr() > > > > >Category: arm64 > > > > >Environment: > > > > System : OpenBSD 7.1 > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > 06:55:12 MDT 2022 > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > Architecture: OpenBSD.arm64 > > > > Machine : arm64 > > > > >Description: > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries > > > > to start > > > > the bse network interface: > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > Stopped at panic+0x160:cmp w21, #0x0 > > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > > * 0 0 0 0x1 0x2000K swapper > > > > db_enter() at panic+0x15c > > > > panic() at do_el1h_sync+0x1f8 > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > genet_rxintr() at genet_intr+0x74 > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > ampintc_splx() at genet_ioctl+0x158 > > > > genet_ioctl() at ifioctl+0x308 > > > > ifioctl() at nfs_boot_init+0xc0 > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > nfs_mountroot() at main+0x464 > > > > main() at virtdone+0x70 > > > > > > > > >Fix: > > > > > > > > The mbuf associated with the current index is NULL. I noticed that the > > > > NetBSD > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But > > > > even with > > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > > that the > > > > total is quite high: > > > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > dereference will > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > since we > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > capturing the > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > ignored by the hardware. That's why I ended up with a large amount of > > > mbufs available in genet_rxintr() since the software and hardware state > > > was out of sync. Honoring any existing value makes the problem go away > > > and matches what u-boot[1] does as well. > > > > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says > > that writing 0 doesn't work. But even that works for me. So I'm > > puzzled. > > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > > carefully selected as they ensure that the rx ring is filled with at > > > least the configured low watermark number of mbufs. However, instead of > > > being forced to ensure a pidx - cidx delta above 0 on the first > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > > passed as the max argument to if_rxr_get() which will clamp the value > > > anyway. > > > > Well, what the code does is setting the "prod" index ahead of the > > "cons" index to simulate a full ring. And then when we (partially) > > fill the ring we increase "cons" to make descriptors available to the > > hardware. This seems to work on my hardware and I've never seen the >
Re: bse: null dereference in genet_rxintr()
On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote: > > Date: Tue, 19 Apr 2022 07:32:36 +0200 > > From: Anton Lindqvist > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > >Synopsis:bse: null dereference in genet_rxintr() > > > >Category:arm64 > > > >Environment: > > > System : OpenBSD 7.1 > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 > > > MDT 2022 > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.arm64 > > > Machine : arm64 > > > >Description: > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to > > > start > > > the bse network interface: > > > > > > panic: attempt to access user address 0x38 from EL1 > > > Stopped at panic+0x160:cmp w21, #0x0 > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > * 0 0 0 0x1 0x2000K swapper > > > db_enter() at panic+0x15c > > > panic() at do_el1h_sync+0x1f8 > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > handle_el1h_sync() at genet_rxintr+0x120 > > > genet_rxintr() at genet_intr+0x74 > > > genet_intr() at ampintc_irq_handler+0x14c > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > handle_el1h_irq() at ampintc_splx+0x80 > > > ampintc_splx() at genet_ioctl+0x158 > > > genet_ioctl() at ifioctl+0x308 > > > ifioctl() at nfs_boot_init+0xc0 > > > nfs_boot_init() at nfs_mountroot+0x3c > > > nfs_mountroot() at main+0x464 > > > main() at virtdone+0x70 > > > > > > >Fix: > > > > > > The mbuf associated with the current index is NULL. I noticed that the > > > NetBSD > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But even > > > with > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > that the > > > total is quite high: > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference > > > will > > > still happen after doing more than 256 iterations in genet_rxintr() since > > > we > > > will start accessing mbufs cleared by the previous iteration. > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > capturing the > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > ignored by the hardware. That's why I ended up with a large amount of > > mbufs available in genet_rxintr() since the software and hardware state > > was out of sync. Honoring any existing value makes the problem go away > > and matches what u-boot[1] does as well. > > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says > that writing 0 doesn't work. But even that works for me. So I'm > puzzled. > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > carefully selected as they ensure that the rx ring is filled with at > > least the configured low watermark number of mbufs. However, instead of > > being forced to ensure a pidx - cidx delta above 0 on the first > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > passed as the max argument to if_rxr_get() which will clamp the value > > anyway. > > Well, what the code does is setting the "prod" index ahead of the > "cons" index to simulate a full ring. And then when we (partially) > fill the ring we increase "cons" to make descriptors available to the > hardware. This seems to work on my hardware and I've never seen the > crash you're seeing. After further meditation I realized this only happens when netbooting through u-boot. Disabling DMA as part of the hardware reset solves the issue, at last. The diff includes one unrelated queue id correction. It's harmless in practice as qid is already equal to GENET_DMA_DEFAULT_QUEUE but makes the code a bit less surprising IMO. diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c index 923df40bdfc..c8a2fbfa2dc 100644 --- sys/dev/ic/bcmgenet.c +++ sys/dev/ic/bcmgenet.c @@ -439,11 +439,46 @@ genet_set
Re: bse: null dereference in genet_rxintr()
> Date: Tue, 19 Apr 2022 07:32:36 +0200 > From: Anton Lindqvist > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > >Synopsis: bse: null dereference in genet_rxintr() > > >Category: arm64 > > >Environment: > > System : OpenBSD 7.1 > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 > > MDT 2022 > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > Architecture: OpenBSD.arm64 > > Machine : arm64 > > >Description: > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to > > start > > the bse network interface: > > > > panic: attempt to access user address 0x38 from EL1 > > Stopped at panic+0x160:cmp w21, #0x0 > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > * 0 0 0 0x1 0x2000K swapper > > db_enter() at panic+0x15c > > panic() at do_el1h_sync+0x1f8 > > do_el1h_sync() at handle_el1h_sync+0x6c > > handle_el1h_sync() at genet_rxintr+0x120 > > genet_rxintr() at genet_intr+0x74 > > genet_intr() at ampintc_irq_handler+0x14c > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > arm_cpu_irq() at handle_el1h_irq+0x6c > > handle_el1h_irq() at ampintc_splx+0x80 > > ampintc_splx() at genet_ioctl+0x158 > > genet_ioctl() at ifioctl+0x308 > > ifioctl() at nfs_boot_init+0xc0 > > nfs_boot_init() at nfs_mountroot+0x3c > > nfs_mountroot() at main+0x464 > > main() at virtdone+0x70 > > > > >Fix: > > > > The mbuf associated with the current index is NULL. I noticed that the > > NetBSD > > driver allocates mbufs for each ring entry in genet_setup_dma(). But even > > with > > that in place the same panic still occurs. Enabling GENET_DEBUG shows that > > the > > total is quite high: > > > > RX pidx=ca07 total=51463 > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference > > will > > still happen after doing more than 256 iterations in genet_rxintr() since we > > will start accessing mbufs cleared by the previous iteration. > > > > Here's a diff with what I've tried so far. The KASSERT() is just capturing > > the > > problem at an earlier stage. Any pointers would be much appreciated. > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > ignored by the hardware. That's why I ended up with a large amount of > mbufs available in genet_rxintr() since the software and hardware state > was out of sync. Honoring any existing value makes the problem go away > and matches what u-boot[1] does as well. Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says that writing 0 doesn't work. But even that works for me. So I'm puzzled. > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > carefully selected as they ensure that the rx ring is filled with at > least the configured low watermark number of mbufs. However, instead of > being forced to ensure a pidx - cidx delta above 0 on the first > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > passed as the max argument to if_rxr_get() which will clamp the value > anyway. Well, what the code does is setting the "prod" index ahead of the "cons" index to simulate a full ring. And then when we (partially) fill the ring we increase "cons" to make descriptors available to the hardware. This seems to work on my hardware and I've never seen the crash you're seeing.
Re: bse: null dereference in genet_rxintr()
On Thu, Apr 21, 2022 at 02:37:48PM +0200, Mark Kettenis wrote: > > Date: Wed, 20 Apr 2022 18:14:57 +0200 > > From: Anton Lindqvist > > > > On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote: > > > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote: > > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > > >Synopsis:bse: null dereference in genet_rxintr() > > > > > >Category:arm64 > > > > > >Environment: > > > > > System : OpenBSD 7.1 > > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > > 06:55:12 MDT 2022 > > > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > > > Architecture: OpenBSD.arm64 > > > > > Machine : arm64 > > > > > >Description: > > > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries > > > > > to start > > > > > the bse network interface: > > > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > > Stopped at panic+0x160:cmp w21, #0x0 > > > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > > > * 0 0 0 0x1 0x2000K swapper > > > > > db_enter() at panic+0x15c > > > > > panic() at do_el1h_sync+0x1f8 > > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > > genet_rxintr() at genet_intr+0x74 > > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > > ampintc_splx() at genet_ioctl+0x158 > > > > > genet_ioctl() at ifioctl+0x308 > > > > > ifioctl() at nfs_boot_init+0xc0 > > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > > nfs_mountroot() at main+0x464 > > > > > main() at virtdone+0x70 > > > > > > > > > > >Fix: > > > > > > > > > > The mbuf associated with the current index is NULL. I noticed that > > > > > the NetBSD > > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But > > > > > even with > > > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > > > that the > > > > > total is quite high: > > > > > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > > dereference will > > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > > since we > > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > > capturing the > > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > > ignored by the hardware. That's why I ended up with a large amount of > > > > mbufs available in genet_rxintr() since the software and hardware state > > > > was out of sync. Honoring any existing value makes the problem go away > > > > and matches what u-boot[1] does as well. > > > > > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > > > carefully selected as they ensure that the rx ring is filled with at > > > > least the configured low watermark number of mbufs. However, instead of > > > > being forced to ensure a pidx - cidx delta above 0 on the first > > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > > > passed as the max argument to if_rxr_get() which will clamp the value > > > > anyway. > > > > > > > > Also, I've seen up to 8 mbufs being available per rx interrupt which is >
Re: bse: null dereference in genet_rxintr()
> Date: Wed, 20 Apr 2022 18:14:57 +0200 > From: Anton Lindqvist > > On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote: > > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote: > > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > > >Synopsis: bse: null dereference in genet_rxintr() > > > > >Category: arm64 > > > > >Environment: > > > > System : OpenBSD 7.1 > > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 > > > > 06:55:12 MDT 2022 > > > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > Architecture: OpenBSD.arm64 > > > > Machine : arm64 > > > > >Description: > > > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries > > > > to start > > > > the bse network interface: > > > > > > > > panic: attempt to access user address 0x38 from EL1 > > > > Stopped at panic+0x160:cmp w21, #0x0 > > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > > * 0 0 0 0x1 0x2000K swapper > > > > db_enter() at panic+0x15c > > > > panic() at do_el1h_sync+0x1f8 > > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > > handle_el1h_sync() at genet_rxintr+0x120 > > > > genet_rxintr() at genet_intr+0x74 > > > > genet_intr() at ampintc_irq_handler+0x14c > > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > > handle_el1h_irq() at ampintc_splx+0x80 > > > > ampintc_splx() at genet_ioctl+0x158 > > > > genet_ioctl() at ifioctl+0x308 > > > > ifioctl() at nfs_boot_init+0xc0 > > > > nfs_boot_init() at nfs_mountroot+0x3c > > > > nfs_mountroot() at main+0x464 > > > > main() at virtdone+0x70 > > > > > > > > >Fix: > > > > > > > > The mbuf associated with the current index is NULL. I noticed that the > > > > NetBSD > > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But > > > > even with > > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > > that the > > > > total is quite high: > > > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null > > > > dereference will > > > > still happen after doing more than 256 iterations in genet_rxintr() > > > > since we > > > > will start accessing mbufs cleared by the previous iteration. > > > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > > capturing the > > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > > ignored by the hardware. That's why I ended up with a large amount of > > > mbufs available in genet_rxintr() since the software and hardware state > > > was out of sync. Honoring any existing value makes the problem go away > > > and matches what u-boot[1] does as well. > > > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > > carefully selected as they ensure that the rx ring is filled with at > > > least the configured low watermark number of mbufs. However, instead of > > > being forced to ensure a pidx - cidx delta above 0 on the first > > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > > passed as the max argument to if_rxr_get() which will clamp the value > > > anyway. > > > > > > Also, I've seen up to 8 mbufs being available per rx interrupt which is > > > odd as only a less amount of rx ring entries are actually populated. Not > > > sure if the driver is missing some interrupt threshold configuration. > > > Increasing the rx ring low watermark to 8 "solved" it for now. > > > Otherwise, the same null dereference occurs while trying to access empty > > > mbuf ring entries. > > > > > > Worth mentioning is that the NetBSD driver does
Re: bse: null dereference in genet_rxintr()
On Tue, Apr 19, 2022 at 06:07:47PM +0200, Anton Lindqvist wrote: > On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote: > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > > >Synopsis:bse: null dereference in genet_rxintr() > > > >Category:arm64 > > > >Environment: > > > System : OpenBSD 7.1 > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 > > > MDT 2022 > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.arm64 > > > Machine : arm64 > > > >Description: > > > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to > > > start > > > the bse network interface: > > > > > > panic: attempt to access user address 0x38 from EL1 > > > Stopped at panic+0x160:cmp w21, #0x0 > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > * 0 0 0 0x1 0x2000K swapper > > > db_enter() at panic+0x15c > > > panic() at do_el1h_sync+0x1f8 > > > do_el1h_sync() at handle_el1h_sync+0x6c > > > handle_el1h_sync() at genet_rxintr+0x120 > > > genet_rxintr() at genet_intr+0x74 > > > genet_intr() at ampintc_irq_handler+0x14c > > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > > arm_cpu_irq() at handle_el1h_irq+0x6c > > > handle_el1h_irq() at ampintc_splx+0x80 > > > ampintc_splx() at genet_ioctl+0x158 > > > genet_ioctl() at ifioctl+0x308 > > > ifioctl() at nfs_boot_init+0xc0 > > > nfs_boot_init() at nfs_mountroot+0x3c > > > nfs_mountroot() at main+0x464 > > > main() at virtdone+0x70 > > > > > > >Fix: > > > > > > The mbuf associated with the current index is NULL. I noticed that the > > > NetBSD > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But even > > > with > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows > > > that the > > > total is quite high: > > > > > > RX pidx=ca07 total=51463 > > > > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference > > > will > > > still happen after doing more than 256 iterations in genet_rxintr() since > > > we > > > will start accessing mbufs cleared by the previous iteration. > > > > > > Here's a diff with what I've tried so far. The KASSERT() is just > > > capturing the > > > problem at an earlier stage. Any pointers would be much appreciated. > > > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > > ignored by the hardware. That's why I ended up with a large amount of > > mbufs available in genet_rxintr() since the software and hardware state > > was out of sync. Honoring any existing value makes the problem go away > > and matches what u-boot[1] does as well. > > > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > > carefully selected as they ensure that the rx ring is filled with at > > least the configured low watermark number of mbufs. However, instead of > > being forced to ensure a pidx - cidx delta above 0 on the first > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > > passed as the max argument to if_rxr_get() which will clamp the value > > anyway. > > > > Also, I've seen up to 8 mbufs being available per rx interrupt which is > > odd as only a less amount of rx ring entries are actually populated. Not > > sure if the driver is missing some interrupt threshold configuration. > > Increasing the rx ring low watermark to 8 "solved" it for now. > > Otherwise, the same null dereference occurs while trying to access empty > > mbuf ring entries. > > > > Worth mentioning is that the NetBSD driver does not suffer from the same > > problem as they keep all rx ring entries populated all the time. > > > > Looking for feedback and OKs at this point. > > > > [1] > > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404 > > While putting more pressure on the network I'm seeing up to 100 mbufs > being available per rx interrupt. Could it simply be explained by the > hardware operating under the assumption that
Re: bse: null dereference in genet_rxintr()
On Tue, Apr 19, 2022 at 07:32:36AM +0200, Anton Lindqvist wrote: > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > > >Synopsis: bse: null dereference in genet_rxintr() > > >Category: arm64 > > >Environment: > > System : OpenBSD 7.1 > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 > > MDT 2022 > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > Architecture: OpenBSD.arm64 > > Machine : arm64 > > >Description: > > > > Booting my rpi4 often but not always causes a panic while rc(8) tries to > > start > > the bse network interface: > > > > panic: attempt to access user address 0x38 from EL1 > > Stopped at panic+0x160:cmp w21, #0x0 > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > * 0 0 0 0x1 0x2000K swapper > > db_enter() at panic+0x15c > > panic() at do_el1h_sync+0x1f8 > > do_el1h_sync() at handle_el1h_sync+0x6c > > handle_el1h_sync() at genet_rxintr+0x120 > > genet_rxintr() at genet_intr+0x74 > > genet_intr() at ampintc_irq_handler+0x14c > > ampintc_irq_handler() at arm_cpu_irq+0x30 > > arm_cpu_irq() at handle_el1h_irq+0x6c > > handle_el1h_irq() at ampintc_splx+0x80 > > ampintc_splx() at genet_ioctl+0x158 > > genet_ioctl() at ifioctl+0x308 > > ifioctl() at nfs_boot_init+0xc0 > > nfs_boot_init() at nfs_mountroot+0x3c > > nfs_mountroot() at main+0x464 > > main() at virtdone+0x70 > > > > >Fix: > > > > The mbuf associated with the current index is NULL. I noticed that the > > NetBSD > > driver allocates mbufs for each ring entry in genet_setup_dma(). But even > > with > > that in place the same panic still occurs. Enabling GENET_DEBUG shows that > > the > > total is quite high: > > > > RX pidx=ca07 total=51463 > > > > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference > > will > > still happen after doing more than 256 iterations in genet_rxintr() since we > > will start accessing mbufs cleared by the previous iteration. > > > > Here's a diff with what I've tried so far. The KASSERT() is just capturing > > the > > problem at an earlier stage. Any pointers would be much appreciated. > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are > ignored by the hardware. That's why I ended up with a large amount of > mbufs available in genet_rxintr() since the software and hardware state > was out of sync. Honoring any existing value makes the problem go away > and matches what u-boot[1] does as well. > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably > carefully selected as they ensure that the rx ring is filled with at > least the configured low watermark number of mbufs. However, instead of > being forced to ensure a pidx - cidx delta above 0 on the first > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be > passed as the max argument to if_rxr_get() which will clamp the value > anyway. > > Also, I've seen up to 8 mbufs being available per rx interrupt which is > odd as only a less amount of rx ring entries are actually populated. Not > sure if the driver is missing some interrupt threshold configuration. > Increasing the rx ring low watermark to 8 "solved" it for now. > Otherwise, the same null dereference occurs while trying to access empty > mbuf ring entries. > > Worth mentioning is that the NetBSD driver does not suffer from the same > problem as they keep all rx ring entries populated all the time. > > Looking for feedback and OKs at this point. > > [1] > https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404 While putting more pressure on the network I'm seeing up to 100 mbufs being available per rx interrupt. Could it simply be explained by the hardware operating under the assumption that all ring entries are available? Even if instructing the hardware about the actual amount of available ring entries would require the driver to keep it in sync whenever the if_rxr_*() implementation decides to adjust the ring. Moving to if_rxr_init(RX_DESC_COUNT, RX_DESC_COUNT) essentially making all 256 ring entries always available makes the driver stable.
Re: bse: null dereference in genet_rxintr()
On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote: > >Synopsis: bse: null dereference in genet_rxintr() > >Category:arm64 > >Environment: > System : OpenBSD 7.1 > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 > MDT 2022 > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > Architecture: OpenBSD.arm64 > Machine : arm64 > >Description: > > Booting my rpi4 often but not always causes a panic while rc(8) tries to start > the bse network interface: > > panic: attempt to access user address 0x38 from EL1 > Stopped at panic+0x160:cmp w21, #0x0 > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > * 0 0 0 0x1 0x2000K swapper > db_enter() at panic+0x15c > panic() at do_el1h_sync+0x1f8 > do_el1h_sync() at handle_el1h_sync+0x6c > handle_el1h_sync() at genet_rxintr+0x120 > genet_rxintr() at genet_intr+0x74 > genet_intr() at ampintc_irq_handler+0x14c > ampintc_irq_handler() at arm_cpu_irq+0x30 > arm_cpu_irq() at handle_el1h_irq+0x6c > handle_el1h_irq() at ampintc_splx+0x80 > ampintc_splx() at genet_ioctl+0x158 > genet_ioctl() at ifioctl+0x308 > ifioctl() at nfs_boot_init+0xc0 > nfs_boot_init() at nfs_mountroot+0x3c > nfs_mountroot() at main+0x464 > main() at virtdone+0x70 > > >Fix: > > The mbuf associated with the current index is NULL. I noticed that the NetBSD > driver allocates mbufs for each ring entry in genet_setup_dma(). But even with > that in place the same panic still occurs. Enabling GENET_DEBUG shows that the > total is quite high: > > RX pidx=ca07 total=51463 > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference will > still happen after doing more than 256 iterations in genet_rxintr() since we > will start accessing mbufs cleared by the previous iteration. > > Here's a diff with what I've tried so far. The KASSERT() is just capturing the > problem at an earlier stage. Any pointers would be much appreciated. Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are ignored by the hardware. That's why I ended up with a large amount of mbufs available in genet_rxintr() since the software and hardware state was out of sync. Honoring any existing value makes the problem go away and matches what u-boot[1] does as well. The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably carefully selected as they ensure that the rx ring is filled with at least the configured low watermark number of mbufs. However, instead of being forced to ensure a pidx - cidx delta above 0 on the first invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be passed as the max argument to if_rxr_get() which will clamp the value anyway. Also, I've seen up to 8 mbufs being available per rx interrupt which is odd as only a less amount of rx ring entries are actually populated. Not sure if the driver is missing some interrupt threshold configuration. Increasing the rx ring low watermark to 8 "solved" it for now. Otherwise, the same null dereference occurs while trying to access empty mbuf ring entries. Worth mentioning is that the NetBSD driver does not suffer from the same problem as they keep all rx ring entries populated all the time. Looking for feedback and OKs at this point. [1] https://github.com/u-boot/u-boot/blob/a94ab561e2f49a80d8579930e840b810ab1a1330/drivers/net/bcmgenet.c#L404 diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c index 923df40bdfc..6070f9d202e 100644 --- sys/dev/ic/bcmgenet.c +++ sys/dev/ic/bcmgenet.c @@ -311,16 +311,13 @@ void genet_fill_rx_ring(struct genet_softc *sc, int qid) { struct mbuf *m; - uint32_t cidx, index, total; + uint32_t cidx, index; u_int slots; int error; cidx = sc->sc_rx.cidx; - total = (sc->sc_rx.pidx - cidx) & 0x; - KASSERT(total <= RX_DESC_COUNT); - index = sc->sc_rx.cidx & (RX_DESC_COUNT - 1); - for (slots = if_rxr_get(>sc_rx_ring, total); + for (slots = if_rxr_get(>sc_rx_ring, RX_DESC_COUNT); slots > 0; slots--) { if ((m = genet_alloc_mbufcl(sc)) == NULL) { printf("%s: cannot allocate RX mbuf\n", @@ -517,9 +514,9 @@ genet_init_rings(struct genet_softc *sc, int qid) /* RX ring */ - sc->sc_rx.next = 0; - sc->sc_rx.cidx = 0; - sc->sc_rx.pidx = RX_DESC_COUNT; + sc->sc_rx.cidx = RD4(sc, GENET_RX_DMA_CONS_INDEX(qid)) & 0x; + sc->sc_rx.pidx = RD4(sc, GENET_RX_DMA_PROD_INDEX(qid)) & 0x; + sc->sc_rx.next = sc->sc_rx.cidx & (RX_DESC_COU
bse: null dereference in genet_rxintr()
>Synopsis: bse: null dereference in genet_rxintr() >Category: arm64 >Environment: System : OpenBSD 7.1 Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21 06:55:12 MDT 2022 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP Architecture: OpenBSD.arm64 Machine : arm64 >Description: Booting my rpi4 often but not always causes a panic while rc(8) tries to start the bse network interface: panic: attempt to access user address 0x38 from EL1 Stopped at panic+0x160:cmp w21, #0x0 TIDPIDUID PRFLAGS PFLAGS CPU COMMAND * 0 0 0 0x1 0x2000K swapper db_enter() at panic+0x15c panic() at do_el1h_sync+0x1f8 do_el1h_sync() at handle_el1h_sync+0x6c handle_el1h_sync() at genet_rxintr+0x120 genet_rxintr() at genet_intr+0x74 genet_intr() at ampintc_irq_handler+0x14c ampintc_irq_handler() at arm_cpu_irq+0x30 arm_cpu_irq() at handle_el1h_irq+0x6c handle_el1h_irq() at ampintc_splx+0x80 ampintc_splx() at genet_ioctl+0x158 genet_ioctl() at ifioctl+0x308 ifioctl() at nfs_boot_init+0xc0 nfs_boot_init() at nfs_mountroot+0x3c nfs_mountroot() at main+0x464 main() at virtdone+0x70 >Fix: The mbuf associated with the current index is NULL. I noticed that the NetBSD driver allocates mbufs for each ring entry in genet_setup_dma(). But even with that in place the same panic still occurs. Enabling GENET_DEBUG shows that the total is quite high: RX pidx=ca07 total=51463 Since it's greater than GENET_DMA_DESC_COUNT (=256) the null dereference will still happen after doing more than 256 iterations in genet_rxintr() since we will start accessing mbufs cleared by the previous iteration. Here's a diff with what I've tried so far. The KASSERT() is just capturing the problem at an earlier stage. Any pointers would be much appreciated. diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c index 923df40bdfc..c91db84375b 100644 --- sys/dev/ic/bcmgenet.c +++ sys/dev/ic/bcmgenet.c @@ -695,7 +695,10 @@ genet_rxintr(struct genet_softc *sc, int qid) status = RD4(sc, GENET_RX_DESC_STATUS(index)); len = __SHIFTOUT(status, GENET_RX_DESC_STATUS_BUFLEN); - /* XXX check for errors */ + if (status & GENET_RX_DESC_STATUS_ALL_ERRS) { + ifp->if_ierrors++; + goto next; + } bus_dmamap_sync(sc->sc_rx.buf_tag, sc->sc_rx.buf_map[index].map, 0, sc->sc_rx.buf_map[index].map->dm_mapsize, @@ -706,6 +709,7 @@ genet_rxintr(struct genet_softc *sc, int qid) n, index, status, len, len - ETHER_ALIGN); m = sc->sc_rx.buf_map[index].mbuf; + KASSERT(m != NULL); sc->sc_rx.buf_map[index].mbuf = NULL; if (len > ETHER_ALIGN) { @@ -722,6 +726,7 @@ genet_rxintr(struct genet_softc *sc, int qid) if_rxr_put(>sc_rx_ring, 1); +next: index = RX_NEXT(index); } @@ -919,6 +924,8 @@ genet_setup_dma(struct genet_softc *sc, int qid) /* Setup RX ring */ sc->sc_rx.buf_tag = sc->sc_dmat; for (i = 0; i < RX_DESC_COUNT; i++) { + struct mbuf *m; + error = bus_dmamap_create(sc->sc_rx.buf_tag, MCLBYTES, 1, MCLBYTES, 0, BUS_DMA_WAITOK, >sc_rx.buf_map[i].map); @@ -927,6 +934,18 @@ genet_setup_dma(struct genet_softc *sc, int qid) sc->sc_dev.dv_xname); return error; } + if ((m = genet_alloc_mbufcl(sc)) == NULL) { + printf("%s: cannot allocate RX mbuf\n", + sc->sc_dev.dv_xname); + return ENOMEM; + } + error = genet_setup_rxbuf(sc, i, m); + if (error != 0) { + printf("%s: cannot create RX buffer\n", + sc->sc_dev.dv_xname); + m_freem(m); + return error; + } } return 0; diff --git sys/dev/ic/bcmgenetreg.h sys/dev/ic/bcmgenetreg.h index 2cf24bb6ac0..b4b2d541425 100644 --- sys/dev/ic/bcmgenetreg.h +++ sys/dev/ic/bcmgenetreg.h @@ -146,6 +146,7 @@ #define GENET_RX_DESC_STATUS_OWN __BIT(15) #define GENET_RX_DESC_STATUS_EOP __BIT(14) #define GENET_RX_DESC_STATUS_SOP __BIT(13) +#define GENET_RX_DESC_STATUS_ALL_ERRS __BITS(4,0) #define GENET_RX_DESC_STATUS_RX_ERROR __BIT(2) #defineGENET_RX_DESC_ADDRESS_LO(idx) (GENET_RX_BASE + GENET_DMA_DESC_SIZE * (idx) + 0x04) #defineGENET_RX_DESC_ADDRESS_HI(idx) (GENET_RX_BASE + G