Hello.  Knowing that a warm reset doesn't come up clean is useful.
Looking at mfii.c, there appear to be two possible sources of the
problem.  The first is the one I mentioned earlier: interrupt handling
somehow gets mangled during operation, and interrupts stop being
received from the Perc controller.  The second is that the Perc
controller itself gets into a weird state that causes its firmware to
stop completing requests.

I'm not sure which source to look at first, so here are some suggestions.

1.  Before the problem occurs, can you capture some dmesg output showing
how the mfii devices attach and which interrupts they're using?

2.  What does the output of vmstat -i look like when things are working?

3.  Have you brought up the Perc's RAID configuration menu to confirm
the RAID sets are healthy and that you're not getting any disk errors
which might be masked from NetBSD itself?  I've seen this sort of
behavior when a disk is throwing errors; the Perc firmware is so busy
dealing with the problem disk that it stops responding to the mfii(4)
driver.  Unfortunately, the NetBSD driver isn't very good about
reporting these kinds of errors; I'm not sure whether that's a problem
with the mfii(4) driver or with the firmware on the Perc itself.
Because the errors happen at random intervals after the machine boots,
it's possible the issue is a good old-fashioned failing disk.

        I do realize you see the errors on two separate controllers,
which is why I'm leaning toward an interrupt issue, but it would be good
to verify your disks are good.
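
If it's useful for comparing vmstat -i output between a working and a
hung state, here is a rough Python sketch.  It assumes each data line
ends in two numeric columns (total and rate) preceded by the interrupt
name; adjust the parsing if your output differs.  The function names and
the "msi3 vec 0" string in the usage note are my own placeholders, not
anything from the driver; use whatever name the mfii attach line in
dmesg actually reports.

```python
import subprocess


def snapshot():
    """Run `vmstat -i` and return its raw text output."""
    result = subprocess.run(["vmstat", "-i"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_vmstat_i(text):
    """Map interrupt name -> total count.

    Assumes each data line looks like: <name words...> <total> <rate>.
    Lines without two trailing integers (header, blanks) are skipped.
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue
        name = " ".join(fields[:-2])
        totals[name] = int(fields[-2])
    return totals


def stalled(before, after, match):
    """Return names containing `match` whose total did not increase."""
    return sorted(name for name, count in after.items()
                  if match in name and count <= before.get(name, 0))
```

Typical use would be to call parse_vmstat_i(snapshot()) once, wait a few
seconds while there is disk I/O outstanding, call it again, and pass both
results to stalled() with the interrupt name from dmesg, for example
stalled(before, after, "msi3 vec 0").  A counter that stops moving
doesn't by itself say which of the two causes is at fault, but it gives
a baseline to compare against the working case.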

Hope that helps.
-Brian
