Hi, all,

Sorry for bothering.

I'm trying to commit aarch64 scs code to the gcc and there is an issue
that I'm not sure about, could someone give me some suggestions?
(To avoid noise, I did't cc PING^3 [1] to the kernel mail list :) )

When clang enables scs, the following instructions are usually generated:

str     x30, [x18], 8
ldp     x29, x30, [sp], 16
......
ldp     x29, x30, [sp], 16              ## x30 pop
ldr     x30, [x18, -8]!                 ## x30 pop again
ret

The x30 register is popped twice here, Richard suggested that we can
omit the first x30 pop here.

AFAICT, it seems fine and also safe for SCS. But I'm not sure if I'm
missing something with the kernel, could someone give some suggestions?

The previous discussion can be found here [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589257.html

Thanks a lot!
Dan

On 1/25/22 22:51, Dan Li wrote:


On 1/25/22 02:19, Richard Sandiford wrote:
Dan Li <ashim...@linux.alibaba.com> writes:
+
      if (flag_stack_usage_info)
        current_function_static_stack_size = constant_lower_bound (frame_size);
@@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall)
          RTX_FRAME_RELATED_P (insn) = 1;
        }
+  /* Pop return address from shadow call stack.  */
+  if (aarch64_shadow_call_stack_enabled ())
+    emit_insn (gen_scs_pop ());
+

This looks correct, but following on from the above, I guess we continue
to restore x30 from the frame in the traditional way first, and then
overwrite it with the SCS value.  I think the output code would be
slightly neater if we suppressed the first restore of x30.

Yes, the current epilogue is like:
      .......
      ldp     x29, x30, [sp], #16
+   ldr     x30, [x18, #-8]!

I think may be we can keep the two x30 pops here, for two reasons:
1) The implementation of scs in clang is to pop x30 twice, it might be
better to be consistent with clang here[1].

This doesn't seem a very compelling reason on its own, unless it's
explicitly meant to be observable behaviour.  (But then, observed how?)


Well, probably sticking to pop x30 twice is not a good idea.
AFAICT, there doesn't seem to be an explicit requirement that
this behavior must be followed.

BTW:
Do we still need to emit the .cfi_restore 30 directive here if we
want to save a pop x30? (Sorry I'm not familiar enough with DWARF.)

Since the aarch64 linux kernel always enables -fno-omit-frame-pointer,
the generated assembly code may be as follows:

str     x30, [x18], 8
ldp     x29, x30, [sp], 16
......
ldr     x29, [sp], 16
                         ## Do we still need a .cfi_restore 30 here
.cfi_restore 29
.cfi_def_cfa_offset 0
ldr     x30, [x18, -8]!
                         ## Or may be a non-CFA based directive here
ret

2) If we keep the first restore of x30, it may provide some flexibility
for other scenarios. For example, we can directly patch the scs_push/pop
insns in the binary to turn it into a binary without scs;

Is that a supported (and ideally documented) use case?  If it is,
then it becomes important for correctness that the compiler emits
exactly the opcodes in the original patch.  If it isn't, then the
compiler can emit other instructions that have the same effect.


Oh, no, this is just a little trick that can be used for compatibility.
(Maybe some scenarios such as our company sometimes need to be
compatible with some non-open source binaries from different
manufacturers, two pops could make life easier :). )

If this is not a consideration for our community, please ignore
this request.

I'd like a definitive ruling on this from the kernel folks before
the patch goes in.


Ok, I'll cc some kernel folks to make sure I didn't miss something.

If binary patching is supposed to be possible then scs_push and
scs_pop *do* need to be separate define_insns.  But they also need
to have some magic unspec that differentiates them from normal
pushes and pops, e.g.:

   (set ...mem...
        (unspec:DI [...reg...] UNSPEC_SCS_PUSH))

so that there is no chance that the pattern would be treated as
a normal move and optimised accordingly.


Yeah, this template looks more appropriate if it is to be treated
as a special directive.

Thanks for your suggestions,
Dan

Reply via email to