On 2020-01-21, Szabolcs Nagy wrote:
On 21/01/2020 11:34, Mark Rutland wrote:
Hi Szabolcs,

Answers from a linux dev perspective below.

thanks.

On Mon, Jan 20, 2020 at 10:53:33AM +0000, Szabolcs Nagy wrote:
i have to ask some linux developers which way they prefer:

e.g. -fpatchable-function-entry=3,1 is

 .section __patchable_function_entries
 .8byte .Lpatch
 .text
.Lpatch:
  nop
func:
  nop
  nop
  ...

with bti the code will be emitted as:

.Lpatch:
  nop
func:
  bti c
  nop
  nop
  ...

That looks good to me.

but e.g. -fpatchable-function-entry=2,0 has two reasonable
approaches with bti:

(a)

func:
.Lpatch:
  bti c
  nop
  nop
  ...

(b)

func:
  bti c
.Lpatch:
  nop
  nop
  ...

I had assumed (b); that means that .Lpatch consistently points to the
first NOP. To my mental model, that seems more consistent than (a).

However, I can work with either so long as it's consistent.
...
As above, my weak preference is (b), but I can work with either. I just
need the behaviour to be consistent.

Was there a rationale for LLVM choosing (a) rather than (b), e.g. was
that also ease of implementation? If there isn't a rationale otherwise,
and if LLVM could also do (b), that would be nice from my PoV.

How big is "a bit more changes" for GCC?

".Lpatch: nop; nop" is generated by generic code now which
a backend can override, but then i have to copy that logic
(~30 lines) to add something after the .Lpatch, or i have
to change generic code to have a new hook after the .Lpatch:.

to provide more argument for (b) and expand on consistency:

right now bti is not emitted if a function is not called
indirectly (compiler knows this for some local functions)
or bti can be turned off by a target attribute per function.

so a mix of bti and non-bti functions are possible, so
the user patching code has to deal with this (e.g. read
the instructions before patching etc)

in the M>0, M!=N case bti is emitted in the middle of a
patch area (e.g. 3,1 case above), but some functions will
not have bti. i can make it so that if patchable function
entry is requested then bti is always added, irrespective
of any other option or attribute, but that seems to be a
big hammer, without such big hammer solution users will
have a difficult time to deal with the additional bti.

if the patch area is entirely before the function label,
then no changes are needed (M=N), if patch area is after
the function label (M=0) then using fix (b) would allow
simple user code. in other cases user code will be
complicated no matter what. so if linux uses M=0 then i
think (b) is preferable.

hope this helps.

The Clang inconvenience is in the other way...

(My previous Clang patch series do not implement M>0.
 https://reviews.llvm.org/D73070 will add support for M>0.)

AsmPrinter (assembly printer/object file emitter) does the following:

1. Emit data before the function entry
2. Emit the function entry label and the label for __patchable_function_entries
3. Emit the function body

The function body has already been constructed before AsmPrinter. Among
late-stage machine code passes, -fpatchable-function-entry=,
-mbranch-protection= (AArch64 BTI) and -fcf-protection= (Intel Indirect
Branch Tracking) are implemented as passes which can prepend
instructions to the function body.

To do (b), step 2 needs to be split. The code generator should somehow
know the label for __patchable_function_entries is after bti
c/endbr32/endbr64.


Before forming a decision, I think we should also consider whether M>0
will be possible in the future. Taking M>0 into consideration, is (a) or
(b) more consistent?

Linux/arch/arm64/Kconfig currently uses -fpatchable-function-entry=2,0
but M>0 could be useful for other architectures (arch/parisc already uses 1,1).

Reply via email to