Re: sparc64 reproducable panic on 5.9

Fred Thu, 31 Mar 2016 14:01:48 -0700

On 03/31/16 09:27, Mark Kettenis wrote:

Date: Wed, 30 Mar 2016 09:10:04 +0200
From: Stefan Sperling <s...@stsp.name>


On Wed, Mar 30, 2016 at 07:54:31AM +0100, Fred wrote:

This looks similar to the issues in this thread:

http://marc.info/?t=143466115000001

I'm not sure a definitive solution was found - but having dc NICs was part
of the issue.


It's not driver dependent.

On my blade100 a similar crash is happening with ral(4).
http://marc.info/?l=openbsd-bugs&m=144802152407599&w=2


I made dc(4) print the DMA addresses and lengths of the segments it
puts on the tx ring (values in hex):

X 60095868/86
X 6009479a/42
X 60090768/8e
X 60095f9a/42
X 60095f9a/66

The crash happens immediately after that last one.  Note that

   0x60095f9a + 0x66 = 0x60096000

In other words, this is a segment that is aligned with the end of a
page.  In all likelihood the NIC's DMA engine is overfetching and runs
into the next page.  This page isn't mapped into the IOMMU which
triggers the fault.
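
The arithmetic above can be checked mechanically.  What follows is only
a minimal sketch, not code from dc(4): the helper name
seg_ends_on_page_boundary and the PAGE_SIZE_ASSUMED constant are
hypothetical, and the 8 KB page size is an assumption (the base page
size on sparc64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8 KB pages, the base page size on sparc64. */
#define PAGE_SIZE_ASSUMED UINT64_C(0x2000)

/*
 * Hypothetical helper: returns true if the DMA segment
 * [addr, addr + len) ends exactly on a page boundary, i.e. the first
 * byte past the segment is the first byte of the next page.
 */
bool
seg_ends_on_page_boundary(uint64_t addr, uint64_t len, uint64_t page_size)
{
	/* The mask test below is only valid for power-of-two page sizes. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	return len != 0 && ((addr + len) & (page_size - 1)) == 0;
}
```

Of the five segments listed, only the last one (0x60095f9a/66) makes
this check return true with PAGE_SIZE_ASSUMED; any read past its end
lands in the following page.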

In the past, the mbuf pool used in-page pool page headers.  This would
hide the issue since the pool page header would "misalign" the pool
items and induce some trailing unused space in the page.
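
To illustrate why an in-page header would hide the problem, here is a
deliberately simplified model.  It is not the subr_pool.c logic: the
function name last_item_end is hypothetical, and it assumes items are
packed from the start of the page with the header carved out of the
same page.  The sizes used in the note after it are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: offset within the page at which the last pool
 * item ends, given that header_size bytes of the page are reserved for
 * an in-page header (0 means the header lives off-page).
 */
size_t
last_item_end(size_t page_size, size_t item_size, size_t header_size)
{
	assert(item_size != 0 && header_size < page_size);

	/* Only whole items fit in the space left over by the header. */
	size_t nitems = (page_size - header_size) / item_size;

	return nitems * item_size;
}
```

With an 8192-byte page and 256-byte items, an off-page header
(header_size 0) gives 32 items and the last one ends at offset 8192,
exactly at the end of the page.  Reserving, say, 64 bytes for an
in-page header leaves room for 31 items, so the last one ends at offset
7936 and an overfetch past it stays inside the same mapped page.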

Not sure how to solve this yet; I'll be looking at Solaris for
inspiration.  Fairly certain that ral(4) has a similar issue.

Am I right in thinking, then, that Steve's [1] approach of changing the else if in subr_pool.c to:

} else if (256 > size) {
is just masking the issue?

[1] http://marc.info/?l=openbsd-bugs&m=144962985826087

Cheers

Fred
