> Date: Wed, 30 Mar 2016 09:10:04 +0200
> From: Stefan Sperling <s...@stsp.name>
>
> On Wed, Mar 30, 2016 at 07:54:31AM +0100, Fred wrote:
> > This looks similar to the issues in this thread:
> >
> > http://marc.info/?t=143466115000001
> >
> > I'm not sure a definative solution was found - but having dc nic's was part
> > of the issue.
>
> It's not driver dependent.
>
> On my blade100 a similar crash is happening with ral(4).
> http://marc.info/?l=openbsd-bugs&m=144802152407599&w=2
I made dc(4) print the DMA addresses and lengths of the segments it puts
on the tx ring (values in hex):

X 60095868/86
X 6009479a/42
X 60090768/8e
X 60095f9a/42
X 60095f9a/66

The crash happens immediately after that last one.  Note that

  0x60095f9a + 0x66 = 0x60096000

In other words, this is a segment that ends exactly on a page boundary.
In all likelihood the NIC's DMA engine is overfetching and runs into the
next page.  That page isn't mapped into the IOMMU, which triggers the
fault.

In the past, the mbuf pool used in-page pool page headers.  This would
hide the issue, since the pool page header would "misalign" the pool
items and induce some trailing unused space in the page.

Not sure how to solve this yet; I'll be looking at Solaris for
inspiration.  Fairly certain that ral(4) has a similar issue.
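
For anyone who wants to re-check the arithmetic, here is a rough sketch in Python (not driver code) that runs the same end-of-page test over the segments listed above. The 8 KB page size is an assumption about sparc64, and the helper names are made up for illustration. The conclusion does not depend on the exact value, since 0x60096000 is also a multiple of 4 KB.

```python
# Assumption: 8 KB base page size (sparc64). Adjust if your platform differs.
PAGE_SIZE = 8192

# (DMA address, length) pairs copied from the dc(4) debug output above.
SEGMENTS = [
    (0x60095868, 0x86),
    (0x6009479A, 0x42),
    (0x60090768, 0x8E),
    (0x60095F9A, 0x42),
    (0x60095F9A, 0x66),
]


def ends_on_page_boundary(addr, length, page_size=PAGE_SIZE):
    """True if the first byte after the segment is the start of a new page."""
    return (addr + length) % page_size == 0


def bytes_left_in_page(addr, length, page_size=PAGE_SIZE):
    """Number of bytes between the end of the segment and the end of its page."""
    return (-(addr + length)) % page_size


for addr, length in SEGMENTS:
    end = addr + length
    slack = bytes_left_in_page(addr, length)
    flag = "  <-- ends on page boundary" if ends_on_page_boundary(addr, length) else ""
    print(f"{addr:08x}/{length:x}: end={end:08x} slack={slack:#x}{flag}")
```

Only the last segment has zero slack, so any fetch past its end lands in the following page. The earlier segment at the same address with length 0x42 still leaves 0x24 bytes of room, which would fit with it not triggering the fault.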