Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

Andy Lutomirski Mon, 17 Nov 2014 16:07:59 -0800

On Mon, Nov 17, 2014 at 3:16 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> I still wonder whether the timeout code is the real culprit.  My patch
>> will slow down entry into do_machine_check by tens of cycles, several
>> cachelines, and possibly a couple of TLB misses.  Given that the
>> timing seemed marginal to me, it's possible (albeit not that likely)
>> that it pushed the time needed for synchronization into the range of
>> unreliability.
>
> I think we have very conservative timeouts.  Here are the significant bits 
> from mce
>
> First SPINUNIT ... how many nanoseconds to spin "ndelay(SPINUNIT)"  before 
> pounding on an atomic op to see if we are done:
> #define SPINUNIT 100    /* 100ns */
>
> Initialization of our timeout - comment above this claims we are aiming for 
> "one second"
>
>         cfg->monarch_timeout = USEC_PER_SEC;
>
> Inside mce_start() - set up our timeout. Math says this will be 1e9
>
>         u64 timeout = (u64)mca_cfg.monarch_timeout * NSEC_PER_USEC;


Aha.  I missed the extra multiplication.

>
> Now the actual loop:
>         while (atomic_read(&mce_callin) != cpus) {
>                 if (mce_timed_out(&timeout)) {
>                         atomic_set(&global_nwo, 0);
>                         return -1;
>                 }
>                 ndelay(SPINUNIT);
>         }

And I missed the ndelay.

*sigh* reading comprehension wasn't so good there.

>From a debugging POV, it would be kind of nice to know which of the
four mce_timed_out calls is reporting the failure.  IIUC, the first
one would mean that a cpu didn't make it in to do_machine_check at
all, whereas the second one would mean that a cpu got lost after
entering do_machine_check.  Also, the second one presumably knows how
far it got (i.e. mce_executing and order).

It could also be interesting to tweak mce_panic to not actually panic
the machine but to try to return and stop the test instead.  Then real
debugging could be possible :)

I still wish I could test this.  The mce-inject code is completely
useless here, since it calls do_machine_check directly instead of
going through the MCE code.

>
> And the inside of mce_timed_out() ... we decrement our 1e9 by 100 for each 
> call:
>
>         if ((s64)*t < SPINUNIT) {
>                 if (mca_cfg.tolerant <= 1)
>                         mce_panic("Timeout synchronizing machine check over 
> CPUs",
>                                   NULL, NULL);
>                 cpu_missing = 1;
>                 return 1;
>         }
>         *t -= SPINUNIT;
>
> Not sure what the worst case is for waiting for other processors to show up. 
> Perhaps it is
> when the other cpu is in some deep C-state and we have to wait for it to 
> power the core
> back on and resync clock.  Maybe it is when a cpu has just started a "wbinvd" 
> instruction
> with a cache entirely full of dirty lines that belong on some other processor 
> so have to
> be written back across QPI (may when QPI links are congested with 10gbit 
> network traffic?
>
> But a full second seems like a lot ... you adding a handful of cache and TLB 
> misses shouldn't
> push us over the edge.

Agreed.

>
> -Tony
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

Reply via email to