> On Feb 7, 2019, at 9:57 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> 
>> On Thu, Feb 07, 2019 at 09:36:00AM -0800, Luck, Tony wrote:
>>> On Thu, Feb 07, 2019 at 03:01:31PM +0100, Peter Zijlstra wrote:
>>>> On Thu, Feb 07, 2019 at 11:50:52AM +0000, Linus Torvalds wrote:
>>>> If you re-generate the canonical address in __cpa_addr(), now we'll
>>>> actually have the real virtual address around for a lot of code-paths
>>>> (pte lookup etc), which was what people wanted to avoid in the first
>>>> place.
>>> 
>>> Note that it's an 'unsigned long' address, not an actual pointer, and
>>> (afaict) none of the code paths use it as a pointer. This _should_ keep
>>> the CPU from following said pointer and dereferencing it.
>> 
>> The type doesn't matter. You want to avoid having the
>> true value in a register for as long as possible. The ideal
>> spot would be the instruction just before the TLB is flushed.
>> 
>> The speculative issue is that any branch you encounter
>> while you have the address in a register may be mispredicted.
>> You might also get a bogus hit in the branch target cache
>> and speculatively jump into the weeds. While there you
>> could find an instruction that loads using that register, and
>> even though it is speculative and the instruction won't
>> retire, a machine check log will be created in a bank (no
>> machine check is signalled).
>> 
>> Once the TLB is updated, you are safe. A speculative
>> access to an uncached address will not load or log anything.
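
For reference, the approach being defended here amounts to handing
set_memory_uc() a deliberately non-canonical "decoy" address. A minimal
sketch of the caller side, with illustrative names and assuming the cpa
code is taught to cope with the munged value:

    #include <linux/mm.h>
    #include <linux/bits.h>
    #include <asm/set_memory.h>

    /*
     * Sketch: mark a poisoned page UC via a non-canonical decoy address.
     * Flipping bit 63 of the 1:1 mapping address makes it non-canonical,
     * so a speculative dereference faults instead of touching the poisoned
     * cache line, while the low bits still identify the page to the cpa
     * code (which only ever treats the value as an unsigned long).
     */
    static int mark_poisoned_uc(struct page *page)
    {
            unsigned long decoy_addr;

            decoy_addr = (unsigned long)page_address(page) ^ BIT_ULL(63);
            return set_memory_uc(decoy_addr, 1);
    }
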
> 
> Something like so then? AFAICT CLFLUSH will also #GP if we feed it crap.
> 
> 
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 4f8972311a77..d3ae92ad72a6 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -230,6 +230,28 @@ static bool __cpa_pfn_in_highmap(unsigned long pfn)
> 
> #endif
> 
> +/*
> + * Machine check recovery code needs to change cache mode of poisoned
> + * pages to UC to avoid speculative access logging another error. But
> + * passing the address of the 1:1 mapping to set_memory_uc() is a fine
> + * way to encourage a speculative access. So we cheat and flip the top
> + * bit of the address. This works fine for the code that updates the
> + * page tables. But at the end of the process we need to flush the cache
> + * and the non-canonical address causes a #GP fault when used by the
> + * CLFLUSH instruction.
> + *
> + * But in the common case we already have a canonical address. This code
> + * will fix the top bit if needed and is a no-op otherwise.
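
The code this comment introduces (its body is trimmed from the quote)
fixes the top bit as described. A sketch of such a fix-up, assuming only
the top bit may have been flipped (the name is illustrative):

    /*
     * Sketch: restore a canonical address before handing it to CLFLUSH.
     * Shifting left by one and arithmetic-shifting back copies bit 62
     * into bit 63, which undoes a flipped top bit and leaves an
     * already-canonical address unchanged.
     */
    static inline unsigned long canonicalize_addr(unsigned long addr)
    {
            return (long)(addr << 1) >> 1;
    }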

Joining this thread late...

This is all IMO rather crazy.  How about we fiddle with CR0 to turn off the 
cache, then fiddle with page tables, then turn caching on?  Or, heck, see if 
there’s some chicken bit we can set to improve the situation while we’re in the 
MCE handler.
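
For concreteness, the CR0 idea would look roughly like the following on
each CPU; a sketch only, which glosses over the hard parts (doing it on
every CPU at once, CR0.NW, and the usual MTRR-style caveats):

    #include <asm/special_insns.h>
    #include <asm/processor-flags.h>

    /* Sketch: disable caching on this CPU, then flush and invalidate. */
    static void cache_off_local(void)
    {
            write_cr0(read_cr0() | X86_CR0_CD);
            wbinvd();
    }

    /* ... update the page attributes here ... */

    /* Sketch: re-enable caching on this CPU. */
    static void cache_on_local(void)
    {
            write_cr0(read_cr0() & ~X86_CR0_CD);
    }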

Also, since I don’t really want to dig into the code to answer this, how 
exactly do we do a broadcast TLB flush from MCE context?  We’re 
super-duper-atomic, and locks might be held on various CPUs.  Shouldn’t we be 
telling the cpa code to skip the flush and then just have the MCE code do a 
full flush manually?  The MCE code has already taken over all CPUs on non-LMCE 
systems.
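
In other words, conceptually something like this in the MCE path; a
sketch in which set_memory_uc_noflush() is hypothetical (not an existing
interface), and the local flush would run on each CPU before it leaves
the MCE rendezvous:

    /* Hypothetical: change the attribute without the broadcast flush IPI. */
    set_memory_uc_noflush(decoy_addr, 1);

    /* Each CPU then flushes its own TLB locally before resuming. */
    __flush_tlb_all();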

Or, better yet, get Intel to fix the hardware. A failed speculative access 
while already in MCE context should just be ignored.
