On 2020-01-21, Szabolcs Nagy wrote:
On 21/01/2020 11:34, Mark Rutland wrote:
Hi Szabolcs,
Answers from a linux dev perspective below.
thanks.
On Mon, Jan 20, 2020 at 10:53:33AM +0000, Szabolcs Nagy wrote:
i have to ask some linux developers which way they prefer:
e.g. -fpatchable-function-entry=3,1 is
.section __patchable_function_entries
.8byte .Lpatch
.text
.Lpatch:
nop
func:
nop
nop
...
with bti the code will be emitted as:
.Lpatch:
nop
func:
bti c
nop
nop
...
That looks good to me.
but e.g. -fpatchable-function-entry=2,0 has two reasonable
approaches with bti:
(a)
func:
.Lpatch:
bti c
nop
nop
...
(b)
func:
bti c
.Lpatch:
nop
nop
...
I had assumed (b); that means that .Lpatch consistently points to the
first NOP. To my mental model, that seems more consistent than (a).
However, I can work with either so long as it's consistent.
...
As above, my weak preference is (b), but I can work with either. I just
need the behaviour to be consistent.
Was there a rationale for LLVM choosing (a) rather than (b), e.g. was
that also ease of implementation? If there isn't a rationale otherwise,
and if LLVM could also do (b), that would be nice from my PoV.
How big is "a bit more changes" for GCC?
".Lpatch: nop; nop" is generated by generic code now which
a backend can override, but then i have to copy that logic
(~30 lines) to add something after the .Lpatch, or i have
to change generic code to have a new hook after the .Lpatch:.
to provide more argument for (b) and expand on consistency:
right now bti is not emitted if a function is not called
indirectly (compiler knows this for some local functions)
or bti can be turned off by a target attribute per function.
so a mix of bti and non-bti functions are possible, so
the user patching code has to deal with this (e.g. read
the instructions before patching etc)
in the M>0, M!=N case bti is emitted in the middle of a
patch area (e.g. 3,1 case above), but some functions will
not have bti. i can make it so that if patchable function
entry is requested then bti is always added, irrespective
of any other option or attribute, but that seems to be a
big hammer, without such big hammer solution users will
have a difficult time to deal with the additional bti.
if the patch area is entirely before the function label,
then no changes are needed (M=N), if patch area is after
the function label (M=0) then using fix (b) would allow
simple user code. in other cases user code will be
complicated no matter what. so if linux uses M=0 then i
think (b) is preferable.
hope this helps.
The Clang inconvenience is in the other way...
(My previous Clang patch series do not implement M>0.
https://reviews.llvm.org/D73070 will add support for M>0.)
AsmPrinter (assembly printer/object file emitter) does the following:
1. Emit data before the function entry
2. Emit the function entry label and the label for __patchable_function_entries
3. Emit the function body
The function body has already been constructed before AsmPrinter. Among
late-stage machine code passes, -fpatchable-function-entry=,
-mbranch-protection= (AArch64 BTI) and -fcf-protection= (Intel Indirect
Branch Tracking) are implemented as passes which can prepend
instructions to the function body.
To do (b), step 2 needs to be split. The code generator should somehow
know the label for __patchable_function_entries is after bti
c/endbr32/endbr64.
Before forming a decision, I think we should also consider whether M>0
will be possible in the future. Taking M>0 into consideration, is (a) or
(b) more consistent?
Linux/arch/arm64/Kconfig currently uses -fpatchable-function-entry=2,0
but M>0 could be useful for other architectures (arch/parisc already uses 1,1).