[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #19 from jvoisin ---
> That's not a good reason to weaken the security of the generated code.

Having BTI with more valid targets is still better than no BTI at all, and it
would still be better than what clang is doing.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #18 from Mark Brown --- It's section placement stuff that's triggering this. You will also be able to build a larger kernel if you try, though I'm not sure that's practical.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #17 from Wilco ---
(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated.  The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

Is the kernel already larger than 128 MBytes .text? Or do people do weird stuff
with section placement that causes branches to be out of range?
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #16 from Richard Earnshaw ---
(In reply to Mark Brown from comment #15)
> The kernel module loader simply does not insert veneers at present, and
> there were some implementation concerns IIRC.

That's not a good reason to weaken the security of the generated code.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #15 from Mark Brown --- The kernel module loader simply does not insert veneers at present, and there were some implementation concerns IIRC.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #14 from Richard Earnshaw ---
(In reply to Mark Brown from comment #13)
> The kernel hasn't got any problem with BTI as far as I am aware - when built
> with clang we run the kernel with BTI enabled since clang does just insert a
> BTI C at the start of every function, and GCC works fine so long as we don't
> get any out of range jumps being generated.  The issue is that we don't have
> anything to insert veneers in the case where section placement puts static
> functions into a distant enough part of memory to need an indirect jump but
> GCC has decided to omit the landing pad.

The linker has to insert the veneers.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #13 from Mark Brown --- The kernel hasn't got any problem with BTI as far as I am aware - when built with clang we run the kernel with BTI enabled since clang does just insert a BTI C at the start of every function, and GCC works fine so long as we don't get any out of range jumps being generated. The issue is that we don't have anything to insert veneers in the case where section placement puts static functions into a distant enough part of memory to need an indirect jump but GCC has decided to omit the landing pad.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #12 from nsz at gcc dot gnu.org ---
(In reply to Jiangning Liu from comment #11)
> Hi Wilco,
>
> > "it means we will need a linker optimization to remove those redundant BTIs
> > (eg. by changing them into NOPs)"
>
> It will be only for performance optimization, right? If we don't care about
> performance, the linker doesn't need to optimize it to be NOP, right? It
> could still be useful if we only do this operation for a specific module.

no, this is a security feature, we want as few BTI c in an executable segment
as possible.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #11 from Jiangning Liu ---
Hi Wilco,

> "it means we will need a linker optimization to remove those redundant BTIs
> (eg. by changing them into NOPs)"

It will be only for performance optimization, right? If we don't care about
performance, the linker doesn't need to optimize it to be NOP, right? It could
still be useful if we only do this operation for a specific module.

Thanks,
-Jiangning
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

Wilco changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #10 from Wilco ---
(In reply to Feng Xue from comment #9)
> On some occasions, we may not use the new ld, the kernel-building relies on
> its own runtime linker which is used for kernel modules. So I created a
> patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html),
> and this provides user another option that could be done at the compiler
> side.

Reducing BTI is important for security. With LTO a binary should only have BTI
on functions that are indirectly called. So I don't like the idea of adding
more BTI with a new option - it means we will need a linker optimization to
remove those redundant BTIs (eg. by changing them into NOPs).

Note that branch offsets up to 256MB don't need special veneer handling: one
should place a direct branch about halfway to the destination.

Does Linux do any weird hacks in -fpatchable-function-entry that makes it hard
to use BTI?
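
The range arithmetic behind the "halfway" remark can be sketched as follows. This is only an illustration, assuming the architectural reach of an AArch64 B/BL instruction (a signed 26-bit word offset, so roughly +/-128 MiB); the helper names are hypothetical and are not taken from GCC, binutils, or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: B/BL encode a signed 26-bit word offset, so the reachable
   byte offsets are [-128 MiB, +128 MiB).  */
#define B_RANGE (INT64_C(128) * 1024 * 1024)

/* Hypothetical helper: can a single direct branch at FROM reach TO?  */
static bool direct_branch_reaches(uint64_t from, uint64_t to)
{
    int64_t off = (int64_t)(to - from);
    return off >= -B_RANGE && off < B_RANGE;
}

/* Hypothetical helper: can FROM reach TO through one extra direct branch
   placed about halfway (rounded down to a 4-byte instruction boundary)?  */
static bool midpoint_hop_reaches(uint64_t from, uint64_t to)
{
    int64_t off = (int64_t)(to - from);
    uint64_t mid = (from + (uint64_t)(off / 2)) & ~UINT64_C(3);
    return direct_branch_reaches(from, mid) && direct_branch_reaches(mid, to);
}
```

With these assumptions a 200 MiB forward call is out of range for one branch but in range for two direct hops, so no indirect branch (and hence no BTI landing pad) is involved. Whether a suitable halfway location actually exists depends on the section layout, which is the open question in this thread.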
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

Feng Xue changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #9 from Feng Xue ---
On some occasions we may not be able to use the new ld: the kernel build relies
on its own runtime linker, which is used for kernel modules. So I created a
patch (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626084.html),
which gives users another option that can be applied on the compiler side.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #8 from Mark Brown --- Note that the issue was found in the Linux kernel - we were expecting to see the BTI Cs there, it's certainly a lot simpler to work with.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org
             Status|NEW                         |WAITING

--- Comment #7 from nsz at gcc dot gnu.org ---
fixed in bfd ld 2.41
see https://sourceware.org/bugzilla/show_bug.cgi?id=30076

we can also fix gcc to work with older ld (emit bti c in local functions),
but i don't plan to do that unless there is a reason to do so.

(it increases the emitted bti c considerably in some workloads, e.g. linux
kernel, while the linker fix is less intrusive in the common case with small
binaries and no weird section hacks).
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #6 from Richard Earnshaw ---
(In reply to Richard Earnshaw from comment #5)
> (In reply to D Scott Phillips from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > Shouldn't the linker add the BTI inside the ___veneer instead?
> >
> > The bti instruction has to be placed at the target of the indirect branch
> > (at the top of `func` in this case) so I don't think it would be possible to
> > work around this just within the veneer.
>
> The veneer has to be placed 'near' the target and then end with a direct
> branch instruction.  The linker should be able to work this out.

This might, of course, mean that two veneers are needed in this case, one that
can be reached from the initial branch, and one that can reach the final
target. A direct branch will jump to the first and the second one will be
reached by an indirect jump (needing a BTI at the start).
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #5 from Richard Earnshaw ---
(In reply to D Scott Phillips from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > Shouldn't the linker add the BTI inside the ___veneer instead?
>
> The bti instruction has to be placed at the target of the indirect branch
> (at the top of `func` in this case) so I don't think it would be possible to
> work around this just within the veneer.

The veneer has to be placed 'near' the target and then end with a direct
branch instruction. The linker should be able to work this out.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #4 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> Basically:
> void
> aarch64_print_patchable_function_entry (FILE *file,
>                                         unsigned HOST_WIDE_INT patch_area_size,
>                                         bool record_p)
> {
>   if (cfun->machine->label_is_assembled
>       && aarch64_bti_enabled ()
>       && !cgraph_node::get (cfun->decl)->only_called_directly_p ())
>
>
> That last check just needs to be removed as there is no way to know if the
> linker will output a veneer.

That only fixes the -fpatchable-function-entry= case.
aarch64-bti-insert.cc needs to be fixed too:

      /* Since a Branch Target Exception can only be triggered by an indirect
         call, we exempt function that are only called directly.  We also
         exempt functions that are already protected by Return Address Signing
         (PACIASP/ PACIBSP).  For all other cases insert a BTI C at the
         beginning of the function.  */
      if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
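
The difference between the behaviour described in the quoted source comment and the change suggested in comments #3 and #4 can be shown with a toy model. This is a sketch only, not GCC code: the struct, its fields, and both function names are hypothetical stand-ins for `only_called_directly_p ()` and the PACIASP/PACIBSP exemption mentioned in that comment.

```c
#include <stdbool.h>

/* Hypothetical summary of what the compiler knows about one function.  */
struct fn_info {
    bool only_called_directly; /* stands in for only_called_directly_p () */
    bool starts_with_pac;      /* function already begins with PACIASP/PACIBSP */
};

/* Behaviour as described in the quoted comment: functions that are only
   called directly get no BTI C.  */
static bool needs_bti_c_current(bool bti_enabled, const struct fn_info *fn)
{
    if (!bti_enabled || fn->only_called_directly)
        return false;
    return !fn->starts_with_pac;
}

/* Suggested change: drop the direct-call exemption, because the compiler
   cannot know whether the linker will route the call through a veneer.  */
static bool needs_bti_c_proposed(bool bti_enabled, const struct fn_info *fn)
{
    if (!bti_enabled)
        return false;
    return !fn->starts_with_pac;
}
```

The two models differ only for a static function that is never called indirectly in its own translation unit, which is the case that breaks when section placement forces an indirect veneer.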
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-08-17
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Andrew Pinski ---
Basically:
void
aarch64_print_patchable_function_entry (FILE *file,
                                        unsigned HOST_WIDE_INT patch_area_size,
                                        bool record_p)
{
  if (cfun->machine->label_is_assembled
      && aarch64_bti_enabled ()
      && !cgraph_node::get (cfun->decl)->only_called_directly_p ())


That last check just needs to be removed as there is no way to know if the
linker will output a veneer.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #2 from D Scott Phillips ---
(In reply to Andrew Pinski from comment #1)
> Shouldn't the linker add the BTI inside the ___veneer instead?

The bti instruction has to be placed at the target of the indirect branch (at
the top of `func` in this case) so I don't think it would be possible to work
around this just within the veneer.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 --- Comment #1 from Andrew Pinski --- Shouldn't the linker add the BTI inside the ___veneer instead?