Re: Spectre V1 diagnostic / mitigation

Richard Biener Wed, 19 Dec 2018 09:20:54 -0800

On Wed, 19 Dec 2018, Richard Earnshaw (lists) wrote:

> On 19/12/2018 11:25, Richard Biener wrote:
> > On Tue, 18 Dec 2018, Richard Earnshaw (lists) wrote:
> > 
> >> On 18/12/2018 15:36, Richard Biener wrote:
> >>>
> >>> Hi,
> >>>
> >>> in the past weeks I've been looking into prototyping both spectre V1 
> >>> (speculative array bound bypass) diagnostics and mitigation in an
> >>> architecture independent manner to assess feasibility and some kind
> >>> of upper bound on the performance impact one can expect.
> >>> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> >>> an interesting read in this context as well.
> >>
> >> Interesting, thanks for posting this.
> >>
> >>>
> >>> For simplicity I have implemented mitigation on GIMPLE right before
> >>> RTL expansion and have chosen TLS to do mitigation across function
> >>> boundaries.  Diagnostics sit in the same place but both are not in
> >>> any way dependent on each other.
> >>
> >> We considered using TLS for propagating the state across call-boundaries
> >> on AArch64, but rejected it for several reasons.
> >>
> >> - It's quite expensive to have to set up the TLS state in every function;
> >> - It requires some global code to initialize the state variable - that's
> >> kind of ABI;
> > 
> > The cost is probably target dependent - on x86 it's simply a %fs-based
> > load/store.  For initialization a static initializer seemed to work
> > for me (but honestly I didn't do any testing besides running the
> > testsuite for correctness - so at least the mask wasn't zero initialized).
> > Note the LLVM people use an inverted mask and cancel values by
> > OR-ing -1 instead of AND-ing 0.  At least default zero-initialization
> > should be possible with TLS vars.
> > 
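To make the two mask conventions concrete, here is a minimal C sketch.
The function and variable names are made up for illustration and are not
taken from the patch or from LLVM.  It only shows why the OR convention
tolerates default zero-initialization of a TLS variable while the AND
convention needs an explicit all-ones initializer.

```c
#include <stdint.h>

/* AND convention (this prototype): the state is all-ones on the
   architecturally correct path and zero under misspeculation.  */
static inline uintptr_t
cancel_with_and (uintptr_t value, uintptr_t mask)
{
  return value & mask;
}

/* OR convention (as described for LLVM above): the state is zero on
   the correct path and all-ones under misspeculation.  */
static inline uintptr_t
cancel_with_or (uintptr_t value, uintptr_t poison)
{
  return value | poison;
}

/* Hypothetical per-thread state variables, for illustration only.  */
_Thread_local uintptr_t and_state = ~(uintptr_t) 0; /* needs an initializer */
_Thread_local uintptr_t or_state;                   /* zero-init is enough */

uintptr_t
guard_and (uintptr_t value)
{
  return cancel_with_and (value, and_state);
}

uintptr_t
guard_or (uintptr_t value)
{
  return cancel_with_or (value, or_state);
}
```

With the initial states above both guards return their argument
unchanged, which is the "not speculating" case.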
> > That said, my choice of TLS was to make this trivially work across
> > targets - if a target can do better then it should.  And of course
> > the target may not have any TLS support besides emultls which would
> > be prohibitively expensive.
> > 
> >> - It also seems likely to be vulnerable to Spectre variant 4 - unless
> >> the CPU can always correctly store-to-load forward the speculation
> >> state, then you have the situation where the load may see an old value
> >> of the state - and that's almost certain to say "we're not speculating".
> >>
> >> The last one is really the killer here.
> > 
> > Hmm, as far as I understood v4 only happens when store-forwarding
> > doesn't work.  And I hope it doesn't fail "randomly" but works
> > reliably when all accesses to the memory are aligned and have
> > the same size as is the case with these compiler-generated TLS
> > accesses.  But yes, if that's not guaranteed then using memory
> > doesn't work at all.  
> 
> The problem is that you can't prove this through realistic testing.
> Architecturally, the result has to come out the same in the end in that
> if the load does bypass the store, eventually the hardware has to replay
> the instruction with the correct data and cancel any operations that
> were dependent on the earlier execution.  Only side-channel data will be
> left after that.
> 
> > Not sure what else target independent there
> > is though that doesn't break the ABI like simply adding another
> > parameter.  And even adding a parameter might not work in case
> > there's only stack passing and V4 happens on the stack accesses...
> 
> Yep, exactly.
> 
> > 
> >>>
> >>> The mitigation strategy chosen is that of tracking speculation
> >>> state via a mask that can be used to zero parts of the addresses
> >>> that leak the actual data.  That's similar to what aarch64 does
> >>> with -mtrack-speculation (but oddly there's no mitigation there).
> >>
> >> We rely on the user inserting the new builtin, which we can more
> >> effectively optimize if the compiler is generating speculation state
> >> tracking data.  That doesn't preclude a full solution at a later date,
> >> but it looked like it was likely overkill for protecting every load and
> >> safely pruning the loads is not an easy problem to solve.  Of course,
> >> the builtin does require the programmer to do some work to identify
> >> which memory accesses might be vulnerable.
> > 
> > My main question was how on earth the -mtrack-speculation overhead
> > is reasonable for the very few expected explicit builtin uses...
> 
> Ultimately that will depend on what the user wants and the level of
> protection needed.  The builtin gives the choice: get a hard barrier if
> tracking has not been enabled, with a very high hit at the point of
> execution; or take a much lower hit at that point if tracking has been
> enabled.  That's a trade-off between how often you hit the barrier vs
> how much you hit the tracking events to no benefit.
> 
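For readers who have not seen the builtin, a minimal sketch of an
explicit use follows.  It assumes the builtin meant here is
__builtin_speculation_safe_value (in GCC from version 9 on); the helper
name load_checked and the SPEC_SAFE wrapper are invented for this
example.  Whether it expands to a hard barrier or to the cheaper tracked
form is up to the target and its options.

```c
#include <stddef.h>

/* Use the builtin where we know it exists (GCC 9 and later); elsewhere
   this sketch falls back to the unprotected value.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 9
# define SPEC_SAFE(x) __builtin_speculation_safe_value (x)
#else
# define SPEC_SAFE(x) (x)
#endif

/* Hypothetical helper: a bounds-checked load where the programmer has
   decided that this particular index needs protection.  */
int
load_checked (const int *table, size_t n, size_t i)
{
  if (i < n)
    return table[SPEC_SAFE (i)];
  return 0;
}
```

The programmer still has to identify the vulnerable access, which is the
trade-off discussed above.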
> Your code, however, doesn't work at present.  This example shows that
> the mitigation code is just optimized away by the rtl passes, at least
> for -fspectre-v1=2.
> 
> int f (int a, int b, int c, char *d)
> {
>   if (a > 10)
>     return 0;
> 
>   if (b > 64)
>     return 0;
> 
>   if (c > 96)
>     return 0;
> 
>   return d[a] + d[b] + d[c];
> }
> 
> It works ok at level 3 because then the compiler can't prove the logical
> truth of the speculation variable on the path from TLS memory and that's
> sufficient to defeat the optimizers.


That was expected; I just hadn't found a simple example myself, so thanks
for providing one ;)  The above is "mis-"optimized by combine, which
seems to have code to track nonzero bits across conditionals
(most likely; I haven't found that part of the code yet).

Mitigation against this is moving the whole thing to RTL or, as you
say, hiding the -1 initialization from it via some volatile stuff.
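To illustrate the data flow (and only that), here is a rough C model of
function-local instrumentation of the example above with the -1 seed
read from a volatile.  All names are made up.  Note that written at the
source level like this the compiler is free to fold the mask updates,
which is exactly the kind of "mis-"optimization discussed; the prototype
inserts the equivalent late on GIMPLE, so this is not a working
mitigation, just a sketch.

```c
#include <stdint.h>

/* Hypothetical all-ones seed.  Being volatile, its value cannot be
   assumed by later passes.  */
static volatile intptr_t spec_mask_seed = -1;

int
f_masked (int a, int b, int c, const char *d)
{
  intptr_t mask = spec_mask_seed;

  if (a > 10)
    return 0;
  /* Architecturally a <= 10 here, so the mask is kept; if we got here
     by misspeculation the mask is zeroed.  */
  mask &= (intptr_t) (a > 10) - 1;

  if (b > 64)
    return 0;
  mask &= (intptr_t) (b > 64) - 1;

  if (c > 96)
    return 0;
  mask &= (intptr_t) (c > 96) - 1;

  /* Each index is ANDed with the mask before use as an address part.  */
  return d[a & mask] + d[b & mask] + d[c & mask];
}
```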

Richard.

> R.
> 
> > 
> > Richard.
> > 
> >> R.
> >>
> >>
> >>>
> >>> I've optimized things to the point that is reasonable when working
> >>> target independent on GIMPLE but I've only looked at x86 assembly
> >>> and performance.  I expect any "final" mitigation if we choose to
> >>> implement and integrate such would be after RTL expansion since
> >>> RTL expansion can end up introducing quite some control flow whose
> >>> speculation state is not properly tracked by the prototype.
> >>>
> >>> I'm cut&pasting single runs of SPEC INT 2006/2017 here; the runs
> >>> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> >>> mitigation and =3 does global mitigation, passing the state
> >>> via TLS memory.
> >>>
> >>> The following was measured on a Haswell desktop CPU:
> >>>
> >>>   -O2 vs. -O2 -fspectre-v1=2
> >>>
> >>>                                   Estimated                       Estimated
> >>>                 Base     Base       Base        Peak     Peak       Peak
> >>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> >>> -------------- ------  ---------  ---------    ------  ---------  ---------
> >>> 400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
> >>> 401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
> >>> 403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
> >>> 429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
> >>> 445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
> >>> 456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
> >>> 458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
> >>> 462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
> >>> 464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
> >>> 471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
> >>> 473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
> >>> 483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
> >>>  Est. SPECint_base2006                   --
> >>>  Est. SPECint2006                                                        --
> >>>
> >>>    -O2 -fspectre-v1=3
> >>>
> >>>                                   Estimated                       Estimated
> >>>                 Base     Base       Base        Peak     Peak       Peak
> >>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> >>> -------------- ------  ---------  ---------    ------  ---------  ---------
> >>> 400.perlbench                                    9770        497       19.6 *  203%
> >>> 401.bzip2                                        9650        772       12.5 *  204%
> >>> 403.gcc                                          8050        427       18.9 *  181%
> >>> 429.mcf                                          9120        696       13.1 *  312%
> >>> 445.gobmk                                       10490        726       14.4 *  181%
> >>> 456.hmmer                                        9330        537       17.4 *  138%
> >>> 458.sjeng                                       12100        721       16.8 *  165%
> >>> 462.libquantum                                  20720        446       46.4 *  149%
> >>> 464.h264ref                                     22130        613       36.1 *  136%
> >>> 471.omnetpp                                      6250        471       13.3 *  162%
> >>> 473.astar                                        7020        579       12.1 *  173%
> >>> 483.xalancbmk                                    6900        350       19.7 *  192%
> >>>  Est. SPECint(R)_base2006           Not Run
> >>>  Est. SPECint2006                                                        --
> >>>
> >>>
> >>> While the following was measured on a Zen Epyc server:
> >>>
> >>> -O2 vs -O2 -fspectre-v1=2
> >>>
> >>>                        Estimated                       Estimated
> >>>                  Base     Base        Base        Peak     Peak        Peak
> >>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> >>> --------------- -------  ---------  ---------    -------  ---------  ---------
> >>> 500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
> >>> 502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
> >>> 505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
> >>> 520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
> >>> 523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
> >>> 525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
> >>> 531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
> >>> 541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
> >>> 548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
> >>> 557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
> >>>  Est. SPECrate2017_int_base              3.55
> >>>  Est. SPECrate2017_int_peak                                               2.56    72%
> >>>
> >>> -O2 -fspectre-v1=3
> >>>
> >>>                        Estimated                       Estimated
> >>>                  Base     Base        Base        Peak     Peak        Peak
> >>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> >>> --------------- -------  ---------  ---------    -------  ---------  ---------
> >>> 500.perlbench_r                               NR       1        700       2.27  * 140%
> >>> 502.gcc_r                                     NR       1        485       2.92  * 170%
> >>> 505.mcf_r                                     NR       1        596       2.71  * 180%
> >>> 520.omnetpp_r                                 NR       1        604       2.17  * 133%
> >>> 523.xalancbmk_r                               NR       1        643       1.64  * 196%
> >>> 525.x264_r                                    NR       1        797       2.20  * 154%
> >>> 531.deepsjeng_r                               NR       1        542       2.12  * 149%
> >>> 541.leela_r                                   NR       1        872       1.90  * 146%
> >>> 548.exchange2_r                               NR       1        761       3.44  * 165%
> >>> 557.xz_r                                      NR       1        595       1.81  * 148%
> >>>  Est. SPECrate2017_int_base           Not Run
> >>>  Est. SPECrate2017_int_peak                                               2.26    64%
> >>>
> >>>
> >>>
> >>> you can see, even though we're comparing apples and oranges, that the 
> >>> performance impact is quite dependent on the microarchitecture.
> >>>
> >>> Similarly interesting as performance is the effect on text size which is
> >>> surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
> >>> bytes per instrumented memory access).
> >>>
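As a reading aid for the numbers in this mail, here is a small C sketch.
It assumes the trailing percentage on each row is peak relative to base,
rounded to the nearest percent (that matches the rows I checked), and it
turns the best-case byte counts quoted above into a lower bound.  Both
function names are made up.

```c
/* Peak relative to base in percent, rounded to nearest.  Works for the
   run-time columns and for the text sizes alike.  */
unsigned long long
percent_of_base (unsigned long long peak, unsigned long long base)
{
  return (peak * 100 + base / 2) / base;
}

/* Best-case lower bound on added text bytes: 13 bytes per conditional
   branch plus 3 bytes per instrumented memory access.  */
unsigned long long
min_text_growth (unsigned long long cond_branches,
                 unsigned long long instrumented_accesses)
{
  return 13 * cond_branches + 3 * instrumented_accesses;
}
```

For example, percent_of_base (452, 245) reproduces the 184% shown for
400.perlbench on Haswell.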
> >>> CPU2006:
> >>>    BASE  -O2
> >>>    text      data     bss     dec     hex filename
> >>> 1117726     20928   12704 1151358  11917e 400.perlbench
> >>>   56568      3800    4416   64784    fd10 401.bzip2
> >>> 3419568      7912  751520 4179000  3fc438 403.gcc
> >>>   12212       712   11984   24908    614c 429.mcf
> >>> 1460694   2081772 2330096 5872562  599bb2 445.gobmk
> >>>  284929      5956   82040  372925   5b0bd 456.hmmer
> >>>  130782      2152 2576896 2709830  295946 458.sjeng
> >>>   41915       764      96   42775    a717 462.libquantum
> >>>  505452     11220  372320  888992   d90a0 464.h264ref
> >>>  638188      9584   14664  662436   a1ba4 471.omnetpp
> >>>   38859       900    5216   44975    afaf 473.astar
> >>> 4033878    140248   12168 4186294  3fe0b6 483.xalancbmk
> >>>    PEAK -O2 -fspectre-v1=2
> >>>    text      data     bss     dec     hex filename
> >>> 1508032     20928   12704 1541664  178620 400.perlbench   135%
> >>>   76098      3800    4416   84314   1495a 401.bzip2       135%
> >>> 4483530      7912  751520 5242962  500052 403.gcc         131%
> >>>   16006       712   11984   28702    701e 429.mcf         131%
> >>> 1647384   2081772 2330096 6059252  5c74f4 445.gobmk       112%
> >>>  377259      5956   82040  465255   71967 456.hmmer       132%
> >>>  164672      2152 2576896 2743720  29dda8 458.sjeng       126%
> >>>   47901       764      96   48761    be79 462.libquantum  114%
> >>>  649854     11220  372320 1033394   fc4b2 464.h264ref     129%
> >>>  706908      9584   14664  731156   b2814 471.omnetpp     111%
> >>>   48493       900    5216   54609    d551 473.astar       125%
> >>> 4862056    140248   12168 5014472  4c83c8 483.xalancbmk   121%
> >>>    PEAK -O2 -fspectre-v1=3
> >>>    text      data     bss     dec     hex filename
> >>> 1742008     20936   12704 1775648  1b1820 400.perlbench   156%
> >>>   83338      3808    4416   91562   165aa 401.bzip2       147%
> >>> 5219850      7920  751520 5979290  5b3c9a 403.gcc         153%
> >>>   17422       720   11984   30126    75ae 429.mcf         143%
> >>> 1801688   2081780 2330096 6213564  5ecfbc 445.gobmk       123%
> >>>  431827      5964   82040  519831   7ee97 456.hmmer       152%
> >>>  182200      2160 2576896 2761256  2a2228 458.sjeng       139%
> >>>   53773       772      96   54641    d571 462.libquantum  128%
> >>>  691798     11228  372320 1075346  106892 464.h264ref     137%
> >>>  976692      9592   14664 1000948   f45f4 471.omnetpp     153%
> >>>   54525       908    5216   60649    ece9 473.astar       140%
> >>> 5808306    140256   12168 5960730  5af41a 483.xalancbmk   144%
> >>>
> >>> CPU2017:
> >>>    BASE -O2 -g
> >>>    text    data     bss     dec     hex filename
> >>> 2209713    8576    9080 2227369  21fca9 500.perlbench_r
> >>> 9295702   37432 1150664 10483798 9ff856 502.gcc_r
> >>>   21795     712     744   23251    5ad3 505.mcf_r
> >>> 2067560    8984   46888 2123432  2066a8 520.omnetpp_r
> >>> 5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
> >>>  508402    6102   29592  544096   84d60 525.x264_r
> >>>   84222     784 12138360 12223366 ba8386 531.deepsjeng_r
> >>>  223480    8544   30072  262096   3ffd0 541.leela_r
> >>>   70554     864    6384   77802   12fea 548.exchange2_r
> >>>  180640     884   17704  199228   30a3c 557.xz_r
> >>>    PEAK -fspectre-v1=2
> >>>    text    data     bss     dec     hex filename
> >>> 2991161    8576    9080 3008817  2de931 500.perlbench_r   135%
> >>> 12244886  37432 1150664 13432982 ccf896 502.gcc_r         132%
> >>>   28475     712     744   29931    74eb 505.mcf_r         131%
> >>> 2397026    8984   46888 2452898  256da2 520.omnetpp_r     116%
> >>> 6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r   119%
> >>>  645730    6102   29592  681424   a65d0 525.x264_r        127%
> >>>  111166     784 12138360 12250310 baecc6 531.deepsjeng_r  132%
> >>>  260835    8544   30072  299451   491bb 541.leela_r       117%
> >>>   96874     864    6384  104122   196ba 548.exchange2_r   137%
> >>>  215288     884   17704  233876   39194 557.xz_r          119%
> >>>    PEAK -fspectre-v1=3
> >>>    text    data     bss     dec     hex filename
> >>> 3365945    8584    9080 3383609  33a139 500.perlbench_r   152%
> >>> 14790638  37440 1150664 15978742 f3d0f6 502.gcc_r         159%
> >>>   31419     720     744   32883    8073 505.mcf_r         144%
> >>> 2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r     139%
> >>> 8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r   142%
> >>>  697434    6110   29592  733136   b2fd0 525.x264_r        137%
> >>>  123638     792 12138360 12262790 bb1d86 531.deepsjeng_r  147%
> >>>  315347    8552   30072  353971   566b3 541.leela_r       141%
> >>>   98578     872    6384  105834   19d6a 548.exchange2_r   140%
> >>>  239144     892   17704  257740   3eecc 557.xz_r          133%
> >>>
> >>>
> >>> The patch relies heavily on RTL optimizations for DCE purposes.  At the
> >>> same time we rely on RTL not statically computing the mask (RTL has no
> >>> conditional constant propagation).  Full instrumentation of the classic
> >>> Spectre V1 testcase
> >>>
> >>> char a[1024];
> >>> int b[1024];
> >>> int foo (int i, int bound)
> >>> {
> >>>   if (i < bound)
> >>>     return b[a[i]];
> >>> }
> >>>
> >>> is the following:
> >>>
> >>> foo:
> >>> .LFB0:  
> >>>         .cfi_startproc
> >>>         xorl    %eax, %eax
> >>>         cmpl    %esi, %edi
> >>>         setge   %al
> >>>         subq    $1, %rax
> >>>         jne     .L4
> >>>         ret
> >>>         .p2align 4,,10
> >>>         .p2align 3
> >>> .L4:
> >>>         andl    %eax, %edi
> >>>         movslq  %edi, %rdi
> >>>         movsbq  a(%rdi), %rax
> >>>         movl    b(,%rax,4), %eax
> >>>         ret
> >>>
> >>> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> >>>
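Read back into C, the assembly above computes roughly the following.
This is a sketch for reading the asm, not what the pass emits; at the
source level a compiler may simplify the mask away.  The name foo_masked,
the unsigned char cast and the explicit "return 0" on the fall-through
path are my additions, to keep the example well defined.

```c
#include <stdint.h>

char a[1024];
int b[1024];

int
foo_masked (int i, int bound)
{
  /* xorl/cmpl/setge/subq: 0 when i >= bound, all-ones otherwise.  */
  intptr_t mask = (intptr_t) (i >= bound) - 1;

  if (mask != 0)                        /* jne .L4 */
    {
      /* andl %eax, %edi: if the branch was mispredicted the mask is
         zero, so the speculative load reads a[0] instead of a[i].  */
      i &= (int) mask;
      return b[(unsigned char) a[i]];   /* movsbq / movl */
    }
  return 0;
}
```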
> >>> Patch below for reference (and your own testing in case you are curious).
> >>> I do not plan to pursue this further at this point.
> >>>
> >>> Richard.
> >>>
> >>> From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
> >>> From: Richard Guenther <rguent...@suse.de>
> >>> Date: Wed, 5 Dec 2018 13:17:02 +0100
> >>> Subject: [PATCH] warn-spectrev1
> >>>
> >>>
> >>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> >>> index 7960cace16a..64d472d7fa0 100644
> >>> --- a/gcc/Makefile.in
> >>> +++ b/gcc/Makefile.in
> >>> @@ -1334,6 +1334,7 @@ OBJS = \
> >>>   gimple-ssa-sprintf.o \
> >>>   gimple-ssa-warn-alloca.o \
> >>>   gimple-ssa-warn-restrict.o \
> >>> + gimple-ssa-spectrev1.o \
> >>>   gimple-streamer-in.o \
> >>>   gimple-streamer-out.o \
> >>>   gimple-walk.o \
> >>> diff --git a/gcc/common.opt b/gcc/common.opt
> >>> index 45d7f6189e5..1ae7fcfe177 100644
> >>> --- a/gcc/common.opt
> >>> +++ b/gcc/common.opt
> >>> @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
> >>>  Wshadow-compatible-local
> >>>  Common Warning Undocumented Alias(Wshadow=compatible-local)
> >>>  
> >>> +Wspectre-v1
> >>> +Common Var(warn_spectrev1) Warning
> >>> +Warn about code susceptible to spectre v1 style attacks.
> >>> +
> >>>  Wstack-protector
> >>>  Common Var(warn_stack_protect) Warning
> >>>  Warn when not issuing stack smashing protection for some reason.
> >>> @@ -2406,6 +2410,14 @@ fsingle-precision-constant
> >>>  Common Report Var(flag_single_precision_constant) Optimization
> >>>  Convert floating point constants to single precision constants.
> >>>  
> >>> +fspectre-v1
> >>> +Common Alias(fspectre-v1=, 2, 0)
> >>> +Insert code to mitigate spectre v1 style attacks.
> >>> +
> >>> +fspectre-v1=
> >>> +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
> >>> +Insert code to mitigate spectre v1 style attacks.
> >>> +
> >>>  fsplit-ivs-in-unroller
> >>>  Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
> >>>  Split lifetimes of induction variables when loops are unrolled.
> >>> diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
> >>> new file mode 100644
> >>> index 00000000000..c2a5dc95324
> >>> --- /dev/null
> >>> +++ b/gcc/gimple-ssa-spectrev1.cc
> >>> @@ -0,0 +1,824 @@
> >>> +/* Spectre V1 diagnostics and mitigation.
> >>> +   Copyright (C) 2017-2018 Free Software Foundation, Inc.
> >>> +   Contributed by ARM Ltd.
> >>> +
> >>> +This file is part of GCC.
> >>> +
> >>> +GCC is free software; you can redistribute it and/or modify it
> >>> +under the terms of the GNU General Public License as published by the
> >>> +Free Software Foundation; either version 3, or (at your option) any
> >>> +later version.
> >>> +
> >>> +GCC is distributed in the hope that it will be useful, but WITHOUT
> >>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> >>> +for more details.
> >>> +
> >>> +You should have received a copy of the GNU General Public License
> >>> +along with GCC; see the file COPYING3.  If not see
> >>> +<http://www.gnu.org/licenses/>.  */
> >>> +
> >>> +#include "config.h"
> >>> +#include "system.h"
> >>> +#include "coretypes.h"
> >>> +#include "backend.h"
> >>> +#include "is-a.h"
> >>> +#include "tree.h"
> >>> +#include "gimple.h"
> >>> +#include "tree-pass.h"
> >>> +#include "ssa.h"
> >>> +#include "gimple-pretty-print.h"
> >>> +#include "gimple-iterator.h"
> >>> +#include "params.h"
> >>> +#include "tree-ssa.h"
> >>> +#include "cfganal.h"
> >>> +#include "gimple-walk.h"
> >>> +#include "tree-ssa-loop.h"
> >>> +#include "tree-dfa.h"
> >>> +#include "tree-cfg.h"
> >>> +#include "fold-const.h"
> >>> +#include "builtins.h"
> >>> +#include "alias.h"
> >>> +#include "cfgloop.h"
> >>> +#include "varasm.h"
> >>> +#include "cgraph.h"
> >>> +#include "gimple-fold.h"
> >>> +#include "diagnostic.h"
> >>> +
> >>> +/* The Spectre V1 situation is as follows:
> >>> +
> >>> +      if (attacker_controlled_idx < bound)  // speculated as true but is false
> >>> +        {
> >>> +          // out-of-bound access, returns value interesting to attacker
> >>> +          val = mem[attacker_controlled_idx];
> >>> +          // access that causes a cache-line to be brought in - canary
> >>> +          ... = attacker_controlled_mem[val];
> >>> +        }
> >>> +
> >>> +   The last load provides the side-channel.  The pattern can be split
> >>> +   into multiple functions or translation units.  Conservatively we'd
> >>> +   have to warn about
> >>> +
> >>> +      int foo (int *a) {  return *a; }
> >>> +
> >>> +   thus any indirect (or indexed) memory access.  That's obviously
> >>> +   not useful.
> >>> +
> >>> +   The next level would be to warn only when we see load of val as
> >>> +   well.  That then misses cases like
> >>> +
> >>> +      int foo (int *a, int *b)
> >>> +      {
> >>> +        int idx = load_it (a);
> >>> +        return load_it (&b[idx]);
> >>> +      }
> >>> +
> >>> +   Still we'd warn about cases like
> >>> +
> >>> +      struct Foo { int *a; };
> >>> +      int foo (struct Foo *a) { return *a->a; }
> >>> +
> >>> +   though dereferencing VAL isn't really an interesting case.  It's
> >>> +   hard to exclude this conservatively so the obvious solution is
> >>> +   to restrict the kind of loads that produce val, for example based
> >>> +   on its type or its number of bits.  It's tempting to do this at
> >>> +   the point of the load producing val but in the end what matters
> >>> +   is the number of bits that reach the second loads [as index] given
> >>> +   there are practical limits on the size of the canary.  For this
> >>> +   we have to consider
> >>> +
> >>> +      int foo (struct Foo *a, int *b)
> >>> +      {
> >>> +        int *c = a->a;
> >>> +        int idx = *b;
> >>> +        return *(c + idx);
> >>> +      }
> >>> +
> >>> +   where idx has too many bits to be an interesting attack vector(?).
> >>> + */
> >>> +
> >>> +/* The pass does two things, first it performs data flow analysis
> >>> +   to be able to warn about the second load.  This is controlled
> >>> +   via -Wspectre-v1.
> >>> +
> >>> +   Second it instruments control flow in the program to track a
> >>> +   mask which is all-ones but all-zeroes if the CPU speculated
> >>> +   a branch in the wrong direction.  This mask is then used to
> >>> +   mask the address[-part(s)] of loads with non-invariant addresses,
> >>> +   effectively mitigating the attack.  This is controlled by
> >>> +   -fspectre-v1[=N] where N is default 2 and
> >>> +     1  optimistically omit some instrumentations (currently
> >>> +        backedge control flow instructions do not update the
> >>> +        speculation mask)
> >>> +     2  instrument conservatively using a function-local speculation
> >>> +        mask
> >>> +     3  instrument conservatively using a global (TLS) speculation
> >>> +        mask.  This adds TLS loads/stores of the speculation mask
> >>> +        at function boundaries and before and after calls.
> >>> + */
> >>> +
> >>> +/* We annotate statements whose defs cannot be used to leak data
> >>> +   speculatively via loads with SV1_SAFE.  This is used to optimize
> >>> +   masking of indices where masked indices (and derived by constant
> >>> +   ones) are not masked again.  Note this works only up to the points
> >>> +   that possibly change the speculation mask value.  */
> >>> +#define SV1_SAFE GF_PLF_1
> >>> +
> >>> +namespace {
> >>> +
> >>> +const pass_data pass_data_spectrev1 =
> >>> +{
> >>> +  GIMPLE_PASS, /* type */
> >>> +  "spectrev1", /* name */
> >>> +  OPTGROUP_NONE, /* optinfo_flags */
> >>> +  TV_NONE, /* tv_id */
> >>> +  PROP_cfg|PROP_ssa, /* properties_required */
> >>> +  0, /* properties_provided */
> >>> +  0, /* properties_destroyed */
> >>> +  0, /* todo_flags_start */
> >>> +  TODO_update_ssa, /* todo_flags_finish */
> >>> +};
> >>> +
> >>> +class pass_spectrev1 : public gimple_opt_pass
> >>> +{
> >>> +public:
> >>> +  pass_spectrev1 (gcc::context *ctxt)
> >>> +    : gimple_opt_pass (pass_data_spectrev1, ctxt)
> >>> +  {}
> >>> +
> >>> +  /* opt_pass methods: */
> >>> +  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
> >>> +  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
> >>> +  virtual unsigned int execute (function *);
> >>> +
> >>> +  static bool stmt_is_indexed_load (gimple *);
> >>> +  static bool stmt_mangles_index (gimple *, tree);
> >>> +  static bool find_value_dependent_guard (gimple *, tree);
> >>> +  static void mark_influencing_outgoing_flow (basic_block, tree);
> >>> +  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
> >>> +}; // class pass_spectrev1
> >>> +
> >>> +bitmap_head *influencing_outgoing_flow;
> >>> +
> >>> +static bool
> >>> +call_between (gimple *first, gimple *second)
> >>> +{
> >>> +  gcc_assert (gimple_bb (first) == gimple_bb (second));
> >>> +  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
> >>> +     unique IDs to stmts belonging to groups with the same speculation
> >>> +     mask state.  */
> >>> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
> >>> +       gsi_stmt (gsi) != second; gsi_next (&gsi))
> >>> +    if (is_gimple_call (gsi_stmt (gsi)))
> >>> +      return true;
> >>> +  return false;
> >>> +}
> >>> +
> >>> +basic_block ctx_bb;
> >>> +gimple *ctx_stmt;
> >>> +static bool
> >>> +gather_indexes (tree, tree *idx, void *data)
> >>> +{
> >>> +  vec<tree *> *indexes = (vec<tree *> *)data;
> >>> +  if (TREE_CODE (*idx) != SSA_NAME)
> >>> +    return true;
> >>> +  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
> >>> +      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
> >>> +      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
> >>> +      && (flag_spectrev1 < 3
> >>> +          || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
> >>> +    return true;
> >>> +  if (indexes->is_empty ())
> >>> +    indexes->safe_push (idx);
> >>> +  else if (*(*indexes)[0] == *idx)
> >>> +    indexes->safe_push (idx);
> >>> +  else
> >>> +    return false;
> >>> +  return true;
> >>> +}
> >>> +
> >>> +tree
> >>> +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
> >>> +{
> >>> +  /* First try to see if we can find a single index we can zero which
> >>> +     has the chance of repeating in other loads and also avoids separate
> >>> +     LEA and memory references decreasing code size and AGU occupancy.  */
> >>> +  auto_vec<tree *, 8> indexes;
> >>> +  ctx_bb = gsi_bb (*gsi);
> >>> +  ctx_stmt = gsi_stmt (*gsi);
> >>> +  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
> >>> +      && for_each_index (&mem, gather_indexes, (void *)&indexes))
> >>> +    {
> >>> +      /* All indices are safe.  */
> >>> +      if (indexes.is_empty ())
> >>> +        return mem;
> >>> +      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
> >>> +          <= TYPE_PRECISION (TREE_TYPE (mask)))
> >>> +        {
> >>> +          tree idx = *indexes[0];
> >>> +          gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
> >>> +                      || POINTER_TYPE_P (TREE_TYPE (idx)));
> >>> +          /* Instead of instrumenting IDX directly we could look at
> >>> +             definitions with a single SSA use and instrument that
> >>> +             instead.  But we have to do some work to make SV1_SAFE
> >>> +             propagation updated then - this would really ask to first
> >>> +             gather all indexes of all refs we want to instrument and
> >>> +             compute some optimal set of instrumentations.  */
> >>> +          gimple_seq seq = NULL;
> >>> +          tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
> >>> +          tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
> >>> +                                          TREE_TYPE (idx), idx, idx_mask);
> >>> +          /* Mark the instrumentation sequence as visited.  */
> >>> +          for (gimple_stmt_iterator si = gsi_start (seq);
> >>> +               !gsi_end_p (si); gsi_next (&si))
> >>> +            gimple_set_visited (gsi_stmt (si), true);
> >>> +          gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> >>> +          gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
> >>> +          /* Replace downstream users in the BB which reduces register pressure
> >>> +             and allows SV1_SAFE propagation to work (which stops at call/BB
> >>> +             boundaries though).
> >>> +             ???  This is really reg-pressure vs. dependence chains so not
> >>> +             a generally easy thing.  Making the following propagate into
> >>> +             all uses dominated by the insert slows down 429.mcf even more.
> >>> +             ???  We can actually track SV1_SAFE across PHIs but then we
> >>> +             have to propagate into PHIs here.  */
> >>> +          gimple *use_stmt;
> >>> +          use_operand_p use_p;
> >>> +          imm_use_iterator iter;
> >>> +          FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
> >>> +            if (gimple_bb (use_stmt) == gsi_bb (*gsi)
> >>> +                && gimple_code (use_stmt) != GIMPLE_PHI
> >>> +                && !gimple_visited_p (use_stmt))
> >>> +              {
> >>> +                FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> >>> +                  SET_USE (use_p, masked_idx);
> >>> +                update_stmt (use_stmt);
> >>> +              }
> >>> +          /* Modify MEM in place...  (our stmt is already marked visited).  */
> >>> +          for (unsigned i = 0; i < indexes.length (); ++i)
> >>> +            *indexes[i] = masked_idx;
> >>> +          return mem;
> >>> +        }
> >>> +    }
> >>> +
> >>> +  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
> >>> +     handle BIT_FIELD_REFs.  */
> >>> +
> >>> +  /* Strip a bitfield reference to re-apply it at the end.  */
> >>> +  tree bitfield = NULL_TREE;
> >>> +  tree bitfield_off = NULL_TREE;
> >>> +  if (TREE_CODE (mem) == COMPONENT_REF
> >>> +      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
> >>> +    {
> >>> +      bitfield = TREE_OPERAND (mem, 1);
> >>> +      bitfield_off = TREE_OPERAND (mem, 2);
> >>> +      mem = TREE_OPERAND (mem, 0);
> >>> +    }
> >>> +
> >>> +  tree ptr_base = mem;
> >>> +  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
> >>> +     into the MEM_REF we create.  */
> >>> +  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
> >>> +    ptr_base = TREE_OPERAND (ptr_base, 0);
> >>> +
> >>> +  tree ptr = make_ssa_name (ptr_type_node);
> >>> +  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
> >>> +  gimple_set_visited (new_stmt, true);
> >>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> >>> +  ptr = make_ssa_name (ptr_type_node);
> >>> +  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
> >>> +                           gimple_assign_lhs (new_stmt), mask);
> >>> +  gimple_set_visited (new_stmt, true);
> >>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> >>> +  tree type = TREE_TYPE (mem);
> >>> +  unsigned align = get_object_alignment (mem);
> >>> +  if (align != TYPE_ALIGN (type))
> >>> +    type = build_aligned_type (type, align);
> >>> +
> >>> +  tree new_mem = build2 (MEM_REF, type, ptr,
> >>> +                  build_int_cst (reference_alias_ptr_type (mem), 0));
> >>> +  if (bitfield)
> >>> +    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
> >>> +               bitfield, bitfield_off);
> >>> +  return new_mem;
> >>> +}
> >>> +
> >>> +bool
> >>> +check_spectrev1_2nd_load (tree, tree *idx, void *data)
> >>> +{
> >>> +  sbitmap value_from_indexed_load = (sbitmap)data;
> >>> +  if (TREE_CODE (*idx) == SSA_NAME
> >>> +      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
> >>> +    return false;
> >>> +  return true;
> >>> +}
> >>> +
> >>> +bool
> >>> +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
> >>> +{
> >>> +  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
> >>> +}
> >>> +
> >>> +void
> >>> +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
> >>> +{
> >>> +  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> >>> +                bb->index))
> >>> +    return;
> >>> +
> >>> +  /* Note we deliberately (and non-conservatively) stop at call and
> >>> +     memory boundaries here, expecting earlier optimization to expose
> >>> +     value dependences via SSA chains.  */
> >>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> >>> +  if (gimple_vuse (def_stmt)
> >>> +      || !is_gimple_assign (def_stmt))
> >>> +    return;
> >>> +
> >>> +  ssa_op_iter i;
> >>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
> >>> +    mark_influencing_outgoing_flow (bb, op);
> >>> +}
> >>> +
> >>> +bool
> >>> +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
> >>> +{
> >>> +  bitmap_iterator bi;
> >>> +  unsigned i;
> >>> +  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> >>> +                     0, i, bi)
> >>> +    /* ???  If control-dependent on.
> >>> +       ???  Make bits in influencing_outgoing_flow the index of the BB
> >>> +       in RPO order so we could walk bits from STMT "upwards" finding
> >>> +       the nearest one.  */
> >>> +    if (dominated_by_p (CDI_DOMINATORS,
> >>> +                 gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
> >>> +      {
> >>> + if (dump_enabled_p ())
> >>> +   dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
> >>> +                    "is related to indexes used in %G\n",
> >>> +                    last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
> >>> +                    i, stmt);
> >>> + return true;
> >>> +      }
> >>> +
> >>> +  /* Note we deliberately (and non-conservatively) stop at call and
> >>> +     memory boundaries here, expecting earlier optimization to expose
> >>> +     value dependences via SSA chains.  */
> >>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> >>> +  if (gimple_vuse (def_stmt)
> >>> +      || !is_gimple_assign (def_stmt))
> >>> +    return false;
> >>> +
> >>> +  ssa_op_iter it;
> >>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
> >>> +    if (find_value_dependent_guard (stmt, op))
> >>> +      /* Others may be "nearer".  */
> >>> +      return true;
> >>> +
> >>> +  return false;
> >>> +}
> >>> +
> >>> +bool
> >>> +pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
> >>> +{
> >>> +  /* Given we ignore the function boundary for incoming parameters
> >>> +     let's ignore return values of calls as well for the purpose
> >>> +     of being the first indexed load (also ignore inline-asms).  */
> >>> +  if (!gimple_assign_load_p (stmt))
> >>> +    return false;
> >>> +
> >>> +  /* Exclude esp. pointers from the index load itself (but also floats,
> >>> +     vectors, etc. - quite a bit handwaving here).  */
> >>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> >>> +    return false;
> >>> +
> >>> +  /* If we do not have any SSA uses the load cannot be one indexed
> >>> +     by an attacker controlled value.  */
> >>> +  if (zero_ssa_operands (stmt, SSA_OP_USE))
> >>> +    return false;
> >>> +
> >>> +  return true;
> >>> +}
> >>> +
> >>> +/* Return true whether the index in the use operand OP in STMT is
> >>> +   not transferred to STMT's defs.  */
> >>> +
> >>> +bool
> >>> +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
> >>> +{
> >>> +  if (gimple_assign_load_p (stmt))
> >>> +    return true;
> >>> +  if (gassign *ass = dyn_cast <gassign *> (stmt))
> >>> +    {
> >>> +      enum tree_code code = gimple_assign_rhs_code (ass);
> >>> +      switch (code)
> >>> + {
> >>> + case TRUNC_DIV_EXPR:
> >>> + case CEIL_DIV_EXPR:
> >>> + case FLOOR_DIV_EXPR:
> >>> + case ROUND_DIV_EXPR:
> >>> + case EXACT_DIV_EXPR:
> >>> + case RDIV_EXPR:
> >>> + case TRUNC_MOD_EXPR:
> >>> + case CEIL_MOD_EXPR:
> >>> + case FLOOR_MOD_EXPR:
> >>> + case ROUND_MOD_EXPR:
> >>> + case LSHIFT_EXPR:
> >>> + case RSHIFT_EXPR:
> >>> + case LROTATE_EXPR:
> >>> + case RROTATE_EXPR:
> >>> +   /* Division, modulus or shifts by the index do not produce
> >>> +      something useful for the attacker.  */
> >>> +   if (gimple_assign_rhs2 (ass) == op)
> >>> +     return true;
> >>> +   break;
> >>> + default:;
> >>> +   /* Comparisons do not produce an index value.  */
> >>> +   if (TREE_CODE_CLASS (code) == tcc_comparison)
> >>> +     return true;
> >>> + }
> >>> +    }
> >>> +  /* ???  We could handle builtins here.  */
> >>> +  return false;
> >>> +}
> >>> +
> >>> +static GTY(()) tree spectrev1_tls_mask_decl;
> >>> +
> >>> +/* Main entry for spectrev1 pass.  */
> >>> +
> >>> +unsigned int
> >>> +pass_spectrev1::execute (function *fn)
> >>> +{
> >>> +  calculate_dominance_info (CDI_DOMINATORS);
> >>> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> >>> +
> >>> +  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >>> +  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
> >>> +
> >>> +  /* We track for each SSA name whether its value (may) depend(s) on
> >>> +     the result of an indexed load.
> >>> +     A set of operations will kill a value (enough).  */
> >>> +  auto_sbitmap value_from_indexed_load (num_ssa_names);
> >>> +  bitmap_clear (value_from_indexed_load);
> >>> +
> >>> +  unsigned orig_num_ssa_names = num_ssa_names;
> >>> +  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
> >>> +  for (unsigned i = 1; i < num_ssa_names; ++i)
> >>> +    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
> >>> +
> >>> +
> >>> +  /* Diagnosis.  */
> >>> +
> >>> +  /* Function arguments are not indexed loads unless we want to
> >>> +     be conservative to a level no longer useful.  */
> >>> +
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +    !gsi_end_p (gpi); gsi_next (&gpi))
> >>> + {
> >>> +   gphi *phi = gpi.phi ();
> >>> +   bool value_from_indexed_load_p = false;
> >>> +   use_operand_p arg_p;
> >>> +   ssa_op_iter it;
> >>> +   FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
> >>> +     {
> >>> +       tree arg = USE_FROM_PTR (arg_p);
> >>> +       if (TREE_CODE (arg) == SSA_NAME
> >>> +           && bitmap_bit_p (value_from_indexed_load,
> >>> +                            SSA_NAME_VERSION (arg)))
> >>> +         value_from_indexed_load_p = true;
> >>> +     }
> >>> +   if (value_from_indexed_load_p)
> >>> +     bitmap_set_bit (value_from_indexed_load,
> >>> +                     SSA_NAME_VERSION (PHI_RESULT (phi)));
> >>> + }
> >>> +
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +    !gsi_end_p (gsi); gsi_next (&gsi))
> >>> + {
> >>> +   gimple *stmt = gsi_stmt (gsi);
> >>> +   if (is_gimple_debug (stmt))
> >>> +     continue;
> >>> +
> >>> +   if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
> >>> +                                 check_spectrev1_2nd_load,
> >>> +                                 check_spectrev1_2nd_load))
> >>> +     warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
> >>> +                 stmt);
> >>> +
> >>> +   bool value_from_indexed_load_p = false;
> >>> +   if (stmt_is_indexed_load (stmt))
> >>> +     {
> >>> +       /* We are interested in indexes to later loads so ultimately
> >>> +          register values that all happen to separate SSA defs.
> >>> +          Interesting aggregates will be decomposed by later loads
> >>> +          which we then mark as producing an index.  Simply mark
> >>> +          all SSA defs as coming from an indexed load.  */
> >>> +       /* We are handling a single load in STMT right now.  */
> >>> +       ssa_op_iter it;
> >>> +       tree op;
> >>> +       FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +         if (find_value_dependent_guard (stmt, op))
> >>> +           {
> >>> +             /* ???  Somehow record the dependence to point to it in
> >>> +                diagnostics.  */
> >>> +             value_from_indexed_load_p = true;
> >>> +             break;
> >>> +           }
> >>> +     }
> >>> +
> >>> +   tree op;
> >>> +   ssa_op_iter it;
> >>> +   FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +     if (bitmap_bit_p (value_from_indexed_load,
> >>> +                       SSA_NAME_VERSION (op))
> >>> +         && !stmt_mangles_index (stmt, op))
> >>> +       {
> >>> +         value_from_indexed_load_p = true;
> >>> +         break;
> >>> +       }
> >>> +
> >>> +   if (value_from_indexed_load_p)
> >>> +     FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
> >>> +       /* ???  We could cut off single-bit values from the chain
> >>> +          here or pretend that float loads will never be turned
> >>> +          into integer indices, etc.  */
> >>> +       bitmap_set_bit (value_from_indexed_load,
> >>> +                       SSA_NAME_VERSION (op));
> >>> + }
> >>> +
> >>> +      if (EDGE_COUNT (bb->succs) > 1)
> >>> + {
> >>> +   gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> >>> +   /* ???  What about switches?  What about badly speculated EH?  */
> >>> +   if (!stmt)
> >>> +     continue;
> >>> +   /* We could constrain conditions here to those more likely
> >>> +      being "bounds checks".  For example common guards for
> >>> +      indirect accesses are NULL pointer checks.
> >>> +      ???  This isn't fully safe, but it drops the number of
> >>> +      spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
> >>> +   if ((gimple_cond_code (stmt) == EQ_EXPR
> >>> +        || gimple_cond_code (stmt) == NE_EXPR)
> >>> +       && integer_zerop (gimple_cond_rhs (stmt))
> >>> +       && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
> >>> +     ;
> >>> +   else
> >>> +     {
> >>> +       ssa_op_iter it;
> >>> +       tree op;
> >>> +       FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +         mark_influencing_outgoing_flow (bb, op);
> >>> +     }
> >>> + }
> >>> +    }
> >>> +
> >>> +  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
> >>> +    bitmap_release (&influencing_outgoing_flow[i]);
> >>> +  XDELETEVEC (influencing_outgoing_flow);
> >>> +
> >>> +
> >>> +
> >>> +  /* Instrumentation.  */
> >>> +  if (!flag_spectrev1)
> >>> +    return 0;
> >>> +
> >>> +  /* Create the default all-ones mask.  When doing IPA instrumentation
> >>> +     this should initialize the mask from TLS memory and outgoing edges
> >>> +     need to save the mask to TLS memory.  */
> >>> +  gimple *new_stmt;
> >>> +  if (!spectrev1_tls_mask_decl
> >>> +      && flag_spectrev1 >= 3)
> >>> +    {
> >>> +      /* Use a smaller variable in case sign-extending loads are
> >>> +  available?  */
> >>> +      spectrev1_tls_mask_decl
> >>> +   = build_decl (BUILTINS_LOCATION,
> >>> +                 VAR_DECL, NULL_TREE, ptr_type_node);
> >>> +      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
> >>> +      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
> >>> +      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_INITIAL (spectrev1_tls_mask_decl)
> >>> +   = build_all_ones_cst (ptr_type_node);
> >>> +      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
> >>> +      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
> >>> +      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
> >>> +      make_decl_one_only (spectrev1_tls_mask_decl,
> >>> +                   DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
> >>> +      set_decl_tls_model (spectrev1_tls_mask_decl,
> >>> +                   decl_default_tls_model (spectrev1_tls_mask_decl));
> >>> +    }
> >>> +
> >>> +  /* We let the SSA rewriter cope with rewriting mask into SSA and
> >>> +     inserting PHI nodes.  */
> >>> +  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
> >>> +  new_stmt = gimple_build_assign (mask,
> >>> +                           flag_spectrev1 >= 3
> >>> +                           ? spectrev1_tls_mask_decl
> >>> +                           : build_all_ones_cst (ptr_type_node));
> >>> +  gimple_stmt_iterator gsi
> >>> +      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
> >>> +  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
> >>> +
> >>> +  /* We are using the visited flag to track stmts downstream in a BB.  */
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +    !gsi_end_p (gpi); gsi_next (&gpi))
> >>> + gimple_set_visited (gpi.phi (), false);
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +    !gsi_end_p (gsi); gsi_next (&gsi))
> >>> + gimple_set_visited (gsi_stmt (gsi), false);
> >>> +    }
> >>> +
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +    !gsi_end_p (gpi); gsi_next (&gpi))
> >>> + {
> >>> +   gphi *phi = gpi.phi ();
> >>> +   /* ???  We can merge SAFE state across BB boundaries in
> >>> +      some cases, like when edges are not critical and the
> >>> +      state was made SAFE in the tail of the predecessors
> >>> +      and not invalidated by calls.   */
> >>> +   gimple_set_plf (phi, SV1_SAFE, false);
> >>> + }
> >>> +
> >>> +      bool instrumented_call_p = false;
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +    !gsi_end_p (gsi); gsi_next (&gsi))
> >>> + {
> >>> +   gimple *stmt = gsi_stmt (gsi);
> >>> +   gimple_set_visited (stmt, true);
> >>> +   if (is_gimple_debug (stmt))
> >>> +     continue;
> >>> +
> >>> +   tree op;
> >>> +   ssa_op_iter it;
> >>> +   bool safe = is_gimple_assign (stmt);
> >>> +   if (safe)
> >>> +     FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +       {
> >>> +         if (safe
> >>> +             && (SSA_NAME_IS_DEFAULT_DEF (op)
> >>> +                 || !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
> >>> +                 /* Once mask can have changed we cannot further
> >>> +                    propagate safe state.  */
> >>> +                 || gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
> >>> +                 /* That includes calls if we have instrumented one
> >>> +                    in this block.  */
> >>> +                 || (instrumented_call_p
> >>> +                     && call_between (SSA_NAME_DEF_STMT (op), stmt))))
> >>> +           {
> >>> +             safe = false;
> >>> +             break;
> >>> +           }
> >>> +       }
> >>> +   gimple_set_plf (stmt, SV1_SAFE, safe);
> >>> +
> >>> +   /* Instrument bounded loads.
> >>> +      We instrument non-aggregate loads with non-invariant address.
> >>> +      The idea is to reliably instrument the bounded load while
> >>> +      leaving the canary, be it load or store, aggregate or
> >>> +      non-aggregate, alone.  */
> >>> +   if (gimple_assign_single_p (stmt)
> >>> +       && gimple_vuse (stmt)
> >>> +       && !gimple_vdef (stmt)
> >>> +       && !zero_ssa_operands (stmt, SSA_OP_USE))
> >>> +     {
> >>> +       tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
> >>> +                                      mask);
> >>> +       gimple_assign_set_rhs1 (stmt, new_mem);
> >>> +       update_stmt (stmt);
> >>> +       /* The value loaded by a masked load is "safe".  */
> >>> +       gimple_set_plf (stmt, SV1_SAFE, true);
> >>> +     }
> >>> +
> >>> +   /* Instrument return store to TLS mask.  */
> >>> +   if (flag_spectrev1 >= 3
> >>> +       && gimple_code (stmt) == GIMPLE_RETURN)
> >>> +     {
> >>> +       new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> >>> +       gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +     }
> >>> +   /* Instrument calls with store/load to/from TLS mask.
> >>> +      ???  Placement of the stores/loads can be optimized in a LCM
> >>> +      way.  */
> >>> +   else if (flag_spectrev1 >= 3
> >>> +            && is_gimple_call (stmt)
> >>> +            && gimple_vuse (stmt))
> >>> +     {
> >>> +       new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> >>> +       gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +       if (!stmt_ends_bb_p (stmt))
> >>> +         {
> >>> +           new_stmt = gimple_build_assign (mask,
> >>> +                                           spectrev1_tls_mask_decl);
> >>> +           gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> >>> +         }
> >>> +       else
> >>> +         {
> >>> +           edge_iterator ei;
> >>> +           edge e;
> >>> +           FOR_EACH_EDGE (e, ei, bb->succs)
> >>> +             {
> >>> +               if (e->flags & EDGE_ABNORMAL)
> >>> +                 continue;
> >>> +               new_stmt = gimple_build_assign (mask,
> >>> +                                               spectrev1_tls_mask_decl);
> >>> +               gsi_insert_on_edge (e, new_stmt);
> >>> +             }
> >>> +         }
> >>> +       instrumented_call_p = true;
> >>> +     }
> >>> + }
> >>> +
> >>> +      if (EDGE_COUNT (bb->succs) > 1)
> >>> + {
> >>> +   gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> >>> +   /* ???  What about switches?  What about badly speculated EH?  */
> >>> +   if (!stmt)
> >>> +     continue;
> >>> +
> >>> +   /* Instrument conditional branches to track mis-speculation
> >>> +      via a pointer-sized mask.
> >>> +      ???  We could restrict to instrumenting those conditions
> >>> +      that control interesting loads or apply simple heuristics
> >>> +      like not instrumenting FP compares or equality compares
> >>> +      which are unlikely bounds checks.  But we have to instrument
> >>> +      bool != 0 because multiple conditions might have been
> >>> +      combined.  */
> >>> +   edge truee, falsee;
> >>> +   extract_true_false_edges_from_block (bb, &truee, &falsee);
> >>> +   /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
> >>> +   if (flag_spectrev1 >= 2
> >>> +       || !loop_exits_from_bb_p (bb->loop_father, bb))
> >>> +     {
> >>> +       gimple_stmt_iterator gsi = gsi_last_bb (bb);
> >>> +
> >>> +       /* Instrument
> >>> +            if (a_1 > b_2)
> >>> +          as
> >>> +            tem_mask_3 = a_1 > b_2 ? -1 : 0;
> >>> +            if (tem_mask_3 != 0)
> >>> +          this will result in a
> >>> +            xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
> >>> +          sequence which is faster in practice than when retaining
> >>> +          the original jump condition.  This is 10 bytes overhead
> >>> +          on x86_64 plus 3 bytes for an and on the true path and
> >>> +          5 bytes for an and and not on the false path.  */
> >>> +       tree tem_mask = make_ssa_name (ptr_type_node);
> >>> +       new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
> >>> +                                       build2 (gimple_cond_code (stmt),
> >>> +                                               boolean_type_node,
> >>> +                                               gimple_cond_lhs (stmt),
> >>> +                                               gimple_cond_rhs (stmt)),
> >>> +                                       build_all_ones_cst (ptr_type_node),
> >>> +                                       build_zero_cst (ptr_type_node));
> >>> +       gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +       gimple_cond_set_code (stmt, NE_EXPR);
> >>> +       gimple_cond_set_lhs (stmt, tem_mask);
> >>> +       gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
> >>> +       update_stmt (stmt);
> >>> +
> >>> +       /* On the false edge
> >>> +            mask = mask & ~tem_mask_3;  */
> >>> +       gimple_seq tems = NULL;
> >>> +       tree tem_mask2 = make_ssa_name (ptr_type_node);
> >>> +       new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
> >>> +                                       tem_mask);
> >>> +       gimple_seq_add_stmt_without_update (&tems, new_stmt);
> >>> +       new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> >>> +                                       mask, tem_mask2);
> >>> +       gimple_seq_add_stmt_without_update (&tems, new_stmt);
> >>> +       gsi_insert_seq_on_edge (falsee, tems);
> >>> +
> >>> +       /* On the true edge
> >>> +            mask = mask & tem_mask_3;  */
> >>> +       new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> >>> +                                       mask, tem_mask);
> >>> +       gsi_insert_on_edge (truee, new_stmt);
> >>> +     }
> >>> + }
> >>> +    }
> >>> +
> >>> +  gsi_commit_edge_inserts ();
> >>> +
> >>> +  return 0;
> >>> +}
> >>> +
> >>> +} // anon namespace
> >>> +
> >>> +gimple_opt_pass *
> >>> +make_pass_spectrev1 (gcc::context *ctxt)
> >>> +{
> >>> +  return new pass_spectrev1 (ctxt);
> >>> +}
> >>> diff --git a/gcc/params.def b/gcc/params.def
> >>> index 6f98fccd291..19f7dbf4dad 100644
> >>> --- a/gcc/params.def
> >>> +++ b/gcc/params.def
> >>> @@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
> >>>    " loops.",
> >>>    100, 0, 0)
> >>>  
> >>> +DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
> >>> +  "spectre-v1-max-instrument-indices",
> >>> +  "Maximum number of indices to instrument before instrumenting the whole address.",
> >>> +  1, 0, 0)
> >>> +
> >>>  /*
> >>>  
> >>>  Local variables:
> >>> diff --git a/gcc/passes.def b/gcc/passes.def
> >>> index 144df4fa417..2fe0cdcfa7e 100644
> >>> --- a/gcc/passes.def
> >>> +++ b/gcc/passes.def
> >>> @@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
> >>>    NEXT_PASS (pass_lower_resx);
> >>>    NEXT_PASS (pass_nrv);
> >>>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >>> +  NEXT_PASS (pass_spectrev1);
> >>>    NEXT_PASS (pass_warn_function_noreturn);
> >>>    NEXT_PASS (pass_gen_hsail);
> >>>  
> >>> diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> >>> new file mode 100644
> >>> index 00000000000..3ac647e72fd
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> >>> @@ -0,0 +1,10 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-Wspectre-v1" } */
> >>> +
> >>> +unsigned char a[1024];
> >>> +int b[256];
> >>> +int foo (int i, int bound)
> >>> +{
> >>> +  if (i < bound)
> >>> +    return b[a[i]];  /* { dg-warning "spectrev1" } */
> >>> +}
> >>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> >>> index 9f9d85fdbc3..f5c164f465f 100644
> >>> --- a/gcc/tree-pass.h
> >>> +++ b/gcc/tree-pass.h
> >>> @@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
> >>> +extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
> >>>  
> >>>  /* Current optimization pass.  */
> >>>  extern opt_pass *current_pass;
> >>>
> >>
> >>
> >>
> > 
> 
> 
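To make the branch and index masking in the quoted patch concrete, here
is a minimal hand-written C sketch of roughly what the instrumentation
amounts to at the source level, using the arrays from the testcase.
This is an illustration only and not output of the pass: the name
foo_masked and the use of uintptr_t/size_t are assumptions made for the
example, and an optimizing compiler is free to fold such a source-level
mask away, which is one reason the prototype works late on GIMPLE.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned char a[1024];
static int b[256];

/* Hypothetical source-level equivalent of the instrumented foo ().
   MASK is all-ones while the architecturally taken path and the
   evaluated conditions agree, and becomes zero otherwise.  */
int
foo_masked (size_t i, size_t bound)
{
  uintptr_t mask = ~(uintptr_t) 0;

  /* if (i < bound) is rewritten as tem_mask = cond ? -1 : 0 followed
     by if (tem_mask != 0).  */
  uintptr_t tem_mask = (i < bound) ? ~(uintptr_t) 0 : 0;
  if (tem_mask != 0)
    {
      /* True edge: mask = mask & tem_mask.  */
      mask &= tem_mask;
      /* The index of the bounded load is ANDed with the mask, so a
         mis-speculated execution would load a[0] instead of a[i].  */
      size_t masked_i = i & (size_t) mask;
      return b[a[masked_i]];
    }

  /* False edge: mask = mask & ~tem_mask.  */
  mask &= ~tem_mask;
  (void) mask;
  return 0;
}
```

Only the first load is masked here; the second load takes its index
from a value produced by a masked load, which the patch treats as safe.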

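For the cross-call part (the -fspectre-v1=3 path in the quoted patch),
the following is a rough C sketch of how the mask travels through a TLS
variable: stored before a call and before return, loaded at function
entry and again after the call.  The names sv1_mask, callee and caller
are hypothetical, and _Thread_local merely stands in for the __SV1MSK
decl the patch builds; placement of the stores and loads is simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the TLS mask variable, statically initialized to
   all-ones (assumed name).  */
static _Thread_local uintptr_t sv1_mask = ~(uintptr_t) 0;
static int table[16];

static int
callee (size_t i)
{
  uintptr_t mask = sv1_mask;        /* Load the mask at function entry.  */
  int v = table[i & (size_t) mask]; /* Masked index.  */
  sv1_mask = mask;                  /* Store the mask before returning.  */
  return v;
}

int
caller (size_t i, size_t n)
{
  uintptr_t mask = sv1_mask;
  uintptr_t tem_mask = (i < n) ? ~(uintptr_t) 0 : 0;
  int v = 0;
  if (tem_mask != 0)
    {
      mask &= tem_mask;
      sv1_mask = mask;              /* Store before the call.  */
      v = callee (i);
      mask = sv1_mask;              /* Reload after the call.  */
    }
  else
    mask &= ~tem_mask;
  sv1_mask = mask;                  /* Store before returning.  */
  return v;
}
```

On a correctly predicted path the mask stays all-ones throughout, so
the TLS traffic is pure overhead there; that is the cost being
discussed above.
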
-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)
