On 10/07/18 12:21, Richard Biener wrote: > On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists) > <richard.earns...@arm.com> wrote: >> >> On 10/07/18 11:10, Richard Biener wrote: >>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) >>> <richard.earns...@arm.com> wrote: >>>> >>>> On 10/07/18 08:19, Richard Biener wrote: >>>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw >>>>> <richard.earns...@arm.com> wrote: >>>>>> >>>>>> >>>>>> The patches I posted earlier this year for mitigating against >>>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>>>> which it became obvious that a rethink was needed. This mail, and the >>>>>> following patches attempt to address that feedback and present a new >>>>>> approach to mitigating against this form of attack surface. >>>>>> >>>>>> There were two major issues with the original approach: >>>>>> >>>>>> - The speculation bounds were too tightly constrained - essentially >>>>>> they had to represent and upper and lower bound on a pointer, or a >>>>>> pointer offset. >>>>>> - The speculation constraints could only cover the immediately preceding >>>>>> branch, which often did not fit well with the structure of the existing >>>>>> code. >>>>>> >>>>>> An additional criticism was that the shape of the intrinsic did not >>>>>> fit particularly well with systems that used a single speculation >>>>>> barrier that essentially had to wait until all preceding speculation >>>>>> had to be resolved. >>>>>> >>>>>> To address all of the above, these patches adopt a new approach, based >>>>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>>>> but which we have extended to deal with inter-function speculation. >>>>>> The patches divide the problem into two halves. >>>>>> >>>>>> The first half is some target-specific code to track the speculation >>>>>> condition through the generated code to provide an internal variable >>>>>> which can tell us whether or not the CPU's control flow speculation >>>>>> matches the data flow calculations. The idea is that the internal >>>>>> variable starts with the value TRUE and if the CPU's control flow >>>>>> speculation ever causes a jump to the wrong block of code the variable >>>>>> becomes false until such time as the incorrect control flow >>>>>> speculation gets unwound. >>>>>> >>>>>> The second half is that a new intrinsic function is introduced that is >>>>>> much simpler than we had before. The basic version of the intrinsic >>>>>> is now simply: >>>>>> >>>>>> T var = __builtin_speculation_safe_value (T unsafe_var); >>>>>> >>>>>> Full details of the syntax can be found in the documentation patch, in >>>>>> patch 1. In summary, when not speculating the intrinsic returns >>>>>> unsafe_var; when speculating then if it can be shown that the >>>>>> speculative flow has diverged from the intended control flow then zero >>>>>> is returned. An optional second argument can be used to return an >>>>>> alternative value to zero. The builtin may cause execution to pause >>>>>> until the speculation state is resolved. >>>>> >>>>> So a trivial target implementation would be to emit a barrier and then >>>>> it would always return unsafe_var and never zero. What I don't understand >>>>> fully is what users should do here, thus what the value of ever returning >>>>> "unsafe" is. Also I wonder why the API is forcing you to single-out a >>>>> special value instead of doing >>>>> >>>>> bool safe = __builtin_speculation_safe_value_p (T unsafe_value); >>>>> if (!safe) >>>>> /* what now? */ >>>>> >>>>> I'm only guessing that the correct way to handle "unsafe" is basically >>>>> >>>>> while (__builtin_speculation_safe_value (val) == 0) >>>>> ; >>>>> >>>>> use val, it's now safe >>>> >>>> No, a safe version of val is returned, not a bool telling you it is now >>>> safe to use the original. >>> >>> OK, so making the old value dead is required to preserve the desired >>> dataflow. >>> >>> But how should I use the special value that signaled "failure"? >>> >>> Obviously the user isn't supposed to simply replace 'val' with >>> >>> val = __builtin_speculation_safe_value (val); >>> >>> to make it speculation-proof. So - how should the user _use_ this >>> builtin? The docs do not say anything about this but says the >>> very confusing >>> >>> +The function may use target-dependent speculation tracking state to cause >>> +@var{failval} to be returned when it is known that speculative >>> +execution has incorrectly predicted a conditional branch operation. >>> >>> because speculation is about executing instructions as if they were >>> supposed to be executed. Once it is known the prediciton was wrong >>> no more "wrong" instructions will be executed but a previously >>> speculated instruction cannot know it was "falsely" speculated. >>> >>> Does the above try to say that the function may return failval if the >>> instruction is currently executed speculatively instead? That would >>> make sense to me. And return failval independent of if the speculation >>> later turns out to be correct or not. >>> >>>> You must use the sanitized version in future, >>>> not the unprotected version. >>>> >>>> >>>> So the usage is going to be more like: >>>> >>>> val = __builtin_speculation_safe_value (val); // Overwrite val with a >>>> sanitized version. >>>> >>>> You have to use the cleaned up version, the unclean version is still >>>> vulnerable to incorrect speculation. >>> >>> But then where does failval come into play? The above cannot be correct >>> if failval matters. >> >> failval only comes into play if the CPU is speculating incorrectly. >> It's just a safe value that some CPUs might use until such time as the >> speculation gets unwound. In most cases it can be zero. Perhaps a more >> concrete example would be something like. >> >> >> void **mem; >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return mem[untrusted]; >> return NULL; >> } >> >> This can be rewritten as either: >> >> void *mem; >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return mem[__builtin_speculation_safe_value (untrusted)]; >> return NULL; >> } >> >> if leaking mem[0] is not a problem; or if you're really paranoid: >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return *__builtin_speculation_safe_value (mem + untrusted); >> return NULL; >> } >> >> which becomes a NULL pointer dereference if the speculation is >> incorrect. The programmer doesn't need to test this code (any test >> would likely be useless anyway as the CPU would predict the outcome and >> speculate based on such a prediction). They do need to think a bit >> about what might leak if the CPU does miss-speculate, hence the two >> examples above. >> >> A more complex case, where you might want a different failure value can >> come up if you have a mask operation. A slightly convoluted example >> (derived from a real example we found in the Linux kernel) might be >> >> if (mode32) >> mask_from = 0x100000000; >> else >> mask_from = 0; >> >> ... // some code >> >> return mem[untrusted_val & (mask_from - 1)]; >> >> It would probably be better to place the speculation cleanup nearer the >> initialization of mask_from, especially if mask_from could be used more >> than once; but in that case 0 is less safe than any other (since 0-1 is >> all bits 1). In this case you might write: >> >> if (mode32) >> mask_from = 0x100000000; >> else >> mask_from = 0; >> >> mask_from = __builtin_speculation_safe_value (mask_from, 1); >> >> ... // some code >> >> return mem[untrusted_val & (mask_from - 1)]; >> >> And in this case if the speculation is incorrect the (mask_from - 1) >> expression becomes 0 and the whole range is protected. >> >> Note that speculation is fine if it is correct, we don't want to hold >> things up in that case since it will happen anyway. It's only a problem >> if the speculation is incorrect :-) > > Ok, please amend at least the user-level documentation with one or two > examples like above. > >> >>> >>> Does the builtin try to say that iff the current instruction is speculated >>> then it will return failval, otherwise it will return val and thus following >>> code needs to handle 'failval' in a way that is safe? >> >> No. From an abstract machine sense the builtin only ever returns its >> first argument. The fail value only appears if the CPU guesses the >> direction of a branch and guesses *incorrectly* (incorrect speculation) > > I understand how it works when the implementation issues a fence. > How does the actual implementation (assembly) look like that makes > the CPU know whether it speculated wrongly or not at the time the > value is needed for further speculation given the *__bsfs (mem + offset) > instruction _is_ speculated as well. >
If you've not already read it, then I suggest you read https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper and the link I posted in the original mail to Chandler's posting on this subject. In essence, we start with a register (tracker) initialized to ~0. At every conditional branch we change b.cond tgt ... tgt: ... into b.cond tgt // tracker == ~cond ? tracker, 0 csel tracker, tracker, 0, ~cond ... tgt: // tracker == cond ? tracker, 0 csel tracker, tracker, 0, cond Now when we need to inhibit speculation we emit and safe_val, unsafe_val, tracker CSDB Assuming we want the default result of 0 if speculating unsafely. CSDB is a light-weight barrier that ensures that all preceding /data-processing/ instructions are using real results from earlier instructions: it will resolve any data-flow speculation, but not necessarily any control-flow speculation. Obviously the inserted instructions only appear on the edges between the blocks, but we take care of that during the insertion process. R. > Richard. > >>> >>> Again, the wording "when it is known that speculative execution has >>> incorrectly predicted a conditional branch operation" is very confusing... >>> >>> Richard. >>> >>>> R. >>>> >>>>> >>>>> that is, the return value is only interesting in sofar as to whether it >>>>> is equal >>>>> to val or the special value? >>>>> >>>>> That said, I wonder why we don't hide that details from the user and >>>>> provide a predicate instead. >>>>> >>>>> Richard. >>>>> >>>>>> There are seven patches in this set, as follows. >>>>>> >>>>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. >>>>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. >>>>>> 3) Adds a basic hard barrier implementation for AArch64 state. >>>>>> 4) Adds a new command-line option -mtrack-speculation (currently a >>>>>> no-op). >>>>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. >>>>>> 6) Adds the new speculation tracking pass for AArch64 >>>>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier >>>>>> sequences >>>>>> >>>>>> I haven't added a speculation-tracking pass for AArch32 at this time. >>>>>> It is possible to do this, but would require quite a lot of rework for >>>>>> the arm backend due to the limited number of registers that are >>>>>> available. >>>>>> >>>>>> Although patch 6 is AArch64 specific, I'd appreciate a review from >>>>>> someone more familiar with the branch edge code than myself. There >>>>>> appear to be a number of tricky issues with more complex edges so I'd >>>>>> like a second opinion on that code in case I've missed an important >>>>>> case. >>>>>> >>>>>> R. >>>>>> >>>>>> >>>>>> >>>>>> Richard Earnshaw (7): >>>>>> Add __builtin_speculation_safe_value >>>>>> Arm - add speculation_barrier pattern >>>>>> AArch64 - add speculation barrier >>>>>> AArch64 - Add new option -mtrack-speculation >>>>>> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation >>>>>> AArch64 - new pass to add conditional-branch speculation tracking >>>>>> AArch64 - use CSDB based sequences if speculation tracking is enabled >>>>>> >>>>>> gcc/builtin-types.def | 6 + >>>>>> gcc/builtins.c | 57 ++++ >>>>>> gcc/builtins.def | 20 ++ >>>>>> gcc/c-family/c-common.c | 143 +++++++++ >>>>>> gcc/c-family/c-cppbuiltin.c | 5 +- >>>>>> gcc/config.gcc | 2 +- >>>>>> gcc/config/aarch64/aarch64-passes.def | 1 + >>>>>> gcc/config/aarch64/aarch64-protos.h | 3 +- >>>>>> gcc/config/aarch64/aarch64-speculation.cc | 494 >>>>>> ++++++++++++++++++++++++++++++ >>>>>> gcc/config/aarch64/aarch64.c | 88 +++++- >>>>>> gcc/config/aarch64/aarch64.md | 140 ++++++++- >>>>>> gcc/config/aarch64/aarch64.opt | 4 + >>>>>> gcc/config/aarch64/iterators.md | 3 + >>>>>> gcc/config/aarch64/t-aarch64 | 10 + >>>>>> gcc/config/arm/arm.md | 21 ++ >>>>>> gcc/config/arm/unspecs.md | 1 + >>>>>> gcc/doc/cpp.texi | 4 + >>>>>> gcc/doc/extend.texi | 29 ++ >>>>>> gcc/doc/invoke.texi | 10 +- >>>>>> gcc/doc/md.texi | 15 + >>>>>> gcc/doc/tm.texi | 20 ++ >>>>>> gcc/doc/tm.texi.in | 2 + >>>>>> gcc/target.def | 23 ++ >>>>>> gcc/targhooks.c | 27 ++ >>>>>> gcc/targhooks.h | 2 + >>>>>> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ >>>>>> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ >>>>>> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + >>>>>> 28 files changed, 1191 insertions(+), 11 deletions(-) >>>>>> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c >>>> >>