On Fri, Nov 14, 2014 at 4:52 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> causes Tony's MCE stress test to fail, presumably when some CPU either
>> becomes permanently non-interruptible or otherwise wanders off into
>> the weeds.
>
> It might be that recent "improvements" I made to my test harness have
> messed things up.  I trimmed one delay (between injection and consumption),
> but it turns out the other delay in the code never gets executed (because we
> take a SIGBUS on consumption and then longjmp).  So my test, which used
> to pause a bit between iterations, was running consumption and injection
> of the next error almost back to back.
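
For anyone following along, the harness bug described above can be sketched in a few lines of userspace C.  This is not Tony's actual test code: consume_poison(), pause_ms() and run_iterations() are made-up names, and raise(SIGBUS) stands in for the SIGBUS the kernel would deliver on real error consumption.  It only shows why a delay placed after the consuming access never runs once the handler longjmps away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static sigjmp_buf recover_point;   /* where the SIGBUS handler jumps back to */

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);  /* never returns to the faulting code */
}

/* Sleep for the given number of milliseconds. */
static void pause_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for consuming an injected error.  A real harness
 * would touch poisoned memory; this sketch just raises SIGBUS. */
static void consume_poison(void)
{
    raise(SIGBUS);
}

/*
 * Run `iterations` consume cycles and return how many times the
 * inter-iteration delay actually ran.  With delay_after_recovery == 0 the
 * delay sits right after consume_poison() and is skipped by siglongjmp.
 */
int run_iterations(int iterations, int delay_after_recovery, long delay_ms)
{
    struct sigaction sa;
    volatile int delays_run = 0;
    volatile int i;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGBUS, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    for (i = 0; i < iterations; i++) {
        if (sigsetjmp(recover_point, 1) == 0) {
            consume_poison();
            /* Buggy placement: control never gets here. */
            if (!delay_after_recovery) {
                pause_ms(delay_ms);
                delays_run++;
            }
        }
        /* Working placement: runs on the recovery path too. */
        if (delay_after_recovery) {
            pause_ms(delay_ms);
            delays_run++;
        }
    }
    return delays_run;
}
```

Calling run_iterations(n, 0, 600) returns 0 delays, i.e. iterations run back to back, while run_iterations(n, 1, 600) pauses once per iteration.  Passing 1 as the second argument to sigsetjmp() saves the signal mask, so SIGBUS is unblocked again after each jump out of the handler.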

Hmm.

Am I right that the timeout code in mce.c is overly aggressive, too?

>
> This meant the serial console was a huge bottleneck (especially as my
> development BIOS is also kicking its own debug junk onto the same port).
> Some of the errors pointed obliquely at the console.
>
> I've slowed things back down to where they used to be, and things are
> ticking along nicely (with a 0.6 second delay between iterations).  Just
> passed the 2800 mark and still going.  I'm leaving it running over the
> weekend - if it makes it into the 50k level I'm willing to call it good.
>

Phew :)

FWIW, I've confirmed that my code survives int3 from userspace, int3
from normal kernel code, and int3 from kernel with user gs.  I'm not
completely thrilled with what it does to double_fault, though.  If we
somehow get a double fault caused by an interrupt hitting userspace
with a bad kernel_stack, then we'll end up page faulting in the
double_fault prologue.  I'm not convinced that this is worth worrying
about.  It would be easy enough to fix, though, even if it would
further uglify the code.
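
As an illustration of the first case only (int3 from userspace), a check along these lines would do.  This is a rough sketch, not the test that was actually run; count_survived_int3() and do_breakpoint() are hypothetical names.  The kernel-mode cases (int3 from normal kernel code, or from the kernel with user gs) cannot be reached from a plain userspace program and are not covered here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t traps_seen;   /* SIGTRAPs delivered so far */

static void on_sigtrap(int sig)
{
    (void)sig;
    traps_seen++;
}

/* Execute one breakpoint trap.  On x86 this is a literal int3; elsewhere
 * the sketch falls back to raise(SIGTRAP), which does not exercise the
 * kernel's int3 entry path at all. */
static void do_breakpoint(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int3");
#else
    raise(SIGTRAP);
#endif
}

/* Hit n breakpoints in a row and return how many SIGTRAPs were handled.
 * Getting n back means every trap was delivered and execution resumed
 * at the instruction after the int3 each time. */
int count_survived_int3(int n)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGTRAP, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    traps_seen = 0;
    for (int i = 0; i < n; i++)
        do_breakpoint();
    return traps_seen;
}
```

Run outside a debugger, count_survived_int3(1000) should return 1000.  If the process is killed or hangs instead, something is wrong on the trap entry or return path.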

--Andy