On Tue, Jun 2, 2020 at 4:01 PM Singh, Balbir <sbl...@amazon.com> wrote: > > > (c) and if I read the code correctly, trying to flush the L1D$ on > > non-intel without the HW support, it causes a WARN_ON_ONCE()! WTF? > > That is not correct, the function only complains if we do a software fallback > flush without allocating the flush pages.
Right. And if you're not on Intel, then that allocation would never have been done, since the allocation function returns an error for non-intel systems. > That function is not exposed without > the user using the prctl() API, which allocates those flush pages. See above: it doesn't actually allocate those pages on anything but intel CPU's. That said, looking deeper, it then does look like a l1d_flush_init_once() failure will also cause the code to avoid setting the TIF_SPEC_L1D_FLUSH bit, so non-intel CPU's will never call the actual flushing routines, and thus never hit the WARN_ON. Ok. > > (2) the HW case is done for any vendor, if it reports the "I have the MSR" > > No l1d_flush_init_once() fails for users opting in via the prctl(), it > succeeds for users of L1TF. Yeah, again it looks like this all is basically just a hack for Intel CPU's. It should never have been conditional on "do this on Intel". It should have been conditional on the L1TF bug. Yes, there's certainly overlap there, but it's not complete. > > (3) the VMX support certainly has various sanity checks like "oh, CPU > > doesn't have X86_BUG_L1TF, then I won't do this even if there was some > > kernel command line to say I should". But the new prctrl doesn't have > > anything like that. It just enables that L1D$ thing mindlessly, > > thinking that user-land software somehow knows what it's doing. BS. > > So you'd like to see a double opt-in? I'd like it to be gated on being sane by default, together with some system option like we have for pretty much all the mitigations. > Unforunately there is no gating > of the bug and I tried to make it generic - clearly calling it opt-in > flushing for the paranoid, for those who really care about CVE-2020-0550. No, you didn't make it generic at all - you made it depend on X86_VENDOR_INTEL instead. So now the logic is "on Intel, do this thing whether it makes sense or not, on other vendors, never do it whether it _would_ make sense or not". That to me is not sensible. I just don't see the logic. This feature should never be enabled unless X86_BUG_L1TF is on, as far as I can tell. And it should never be enabled if SMT is on. At that point, it at least starts making sense. Maybe we don't need any further admin options at that point. > Would this make you happier? > > 1. Remove SW fallback flush > 2. Implement a double opt-in (CAP_SYS_ADMIN for the prctl or a > system wide disable)? > 3. Ensure the flush happens only when the current core has > SMT disabled I think that (3) case should basically be "X86_BUG_L1TF && !SMT". That should basically be the default setting for this. The (2) thing I would prefer to just be the same kind of thing we do for all the other mitigations: have a kernel command line to override the defaults. The SW fallback right now feels wrong to me. It does seem to be very microarchitecture-specific and I'd really like to understand the reason for the magic TLB filling. At the same time, if the feature is at least enabled under sane and understandable circumstances, and people have a way to turn it off, maybe I don't care too much. Linus