> On 28 Jul 2025, at 17:10, Remi Machet <rmac...@nvidia.com> wrote:
> 
> 
> On 7/28/25 17:02, Kyrylo Tkachov wrote:
>> External email: Use caution opening links or attachments
>> 
>> 
>> Hi Spencer,
>> 
>>> On 28 Jul 2025, at 16:25, Spencer Abson <spencer.ab...@arm.com> wrote:
>>> 
>>> Streaming-compatible functions can be compiled without SME enabled, but need
>>> to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming
>>> mode of a callee.  These switches are conditional on the current mode being
>>> opposite to the target mode, so no SME instructions are executed if SME is 
>>> not
>>> available.
>>> 
>>> However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme.  A call
>>> from a streaming-compatible function, compiled without SME enabled, to a non
>>> -streaming function will be rejected as:
>> I guess this is an argument for allowing SMTART/SMSTOP in gas 
>> unconditionally?
>> Though it would need to be consistent with LLVM and we’d still need a 
>> workaround for older binutils, so okay...
>> 
>> 
>>> Error: selected processor does not support `smstop sm'..
>>> 
>>> To work around this, we make use of the .inst directive to insert the 
>>> literal
>>> encodings of "SMSTART SM" and "SMSTOP SM".
>>> 
>>> gcc/ChangeLog:
>>> PR target/121028
>>> * config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst
>>> directive if !TARGET_SME.
>>> (aarch64_smstop_sm): Likewise.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> PR target/121028
>>> * gcc.target/aarch64/sme/call_sm_switch_1.c: Tell check-function
>>> -bodies not to ignore .inst directives, and replace the test for
>>> "smstart sm" with one for it's encoding.
>>> * gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
>>> * gcc.target/aarch64/sme/pr121028.c: New test.
>>> ---
>>> gcc/config/aarch64/aarch64-sme.md             | 12 ++++-
>>> .../gcc.target/aarch64/sme/call_sm_switch_1.c |  4 +-
>>> .../aarch64/sme/call_sm_switch_11.c           |  5 +-
>>> .../gcc.target/aarch64/sme/pr121028.c         | 46 +++++++++++++++++++
>>> 4 files changed, 61 insertions(+), 6 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/pr121028.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-sme.md 
>>> b/gcc/config/aarch64/aarch64-sme.md
>>> index 6b3f4390943..5264bf75c0b 100644
>>> --- a/gcc/config/aarch64/aarch64-sme.md
>>> +++ b/gcc/config/aarch64/aarch64-sme.md
>>> @@ -62,6 +62,10 @@
>>> ;; (b) they are sometimes used conditionally, particularly in streaming-
>>> ;; compatible code.
>>> ;;
>>> +;; To prevent the latter from upsetting the assembler, we emit the literal
>>> +;; encodings of "SMSTART SM" and "SMSTOP SM" when compiling without
>>> +;; TARGET_SME.
>>> +;;
>>> ;; =========================================================================
>>> 
>>> ;; -------------------------------------------------------------------------
>>> @@ -161,7 +165,9 @@
>>>    (clobber (reg:VNx16BI P14_REGNUM))
>>>    (clobber (reg:VNx16BI P15_REGNUM))]
>>>   ""
>>> -  "smstart\tsm"
>>> +  {
>>> +    return TARGET_SME ? "smstart\tsm" : ".inst 0xd503437f";
>>> +  }
>> Usually when we emit such .inst directives we also emit the mnemonic as an 
>> assembly comment to aid human assembly debugging.
>> Ok by me with these changes, but maybe Richard is best-placed to comment on 
>> the SME parts.
>> Thanks,
>> Kyrill
>> 
>> 
>>> )
>>> 
>>> ;; Turn off streaming mode.  This clobbers all SVE state.
>>> @@ -196,7 +202,9 @@
>>>    (clobber (reg:VNx16BI P14_REGNUM))
>>>    (clobber (reg:VNx16BI P15_REGNUM))]
>>>   ""
>>> -  "smstop\tsm"
>>> +  {
>>> +    return TARGET_SME ? "smstop\tsm" : ".inst 0xd503427f";
>>> +  }
>>> )
>>> 
>>> ;; -------------------------------------------------------------------------
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
>>> index 98922aaeae0..7d514bd46c1 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
>>> @@ -1,5 +1,5 @@
>>> // { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls 
>>> -funwind-tables" }
>>> -// { dg-final { check-function-bodies "**" "" } }
>>> +// { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } 
>>> {\t\.inst} } }
>>> 
>>> void ns_callee ();
>>>  void s_callee () [[arm::streaming]];
>>> @@ -218,7 +218,7 @@ sc_caller_x1 (int *ptr, int a) 
>>> [[arm::streaming_compatible]]
>>> ** bl ns_callee_stack
>>> ** ldr x16, \[x29, #?16\]
>>> ** tbz x16, 0, .*
>>> -** smstart sm
>>> +** .inst 0xd503437f
>>> ** ...
>>> */
>>> void
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c
>>> index ee6f98737d9..7f1eab716eb 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_11.c
>>> @@ -1,5 +1,6 @@
>>> // { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls 
>>> -funwind-tables -mtrack-speculation" }
>>> -// { dg-final { check-function-bodies "**" "" } }
>>> +// { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } 
>>> {\t\.inst} } }
>>> +
>>> 
>>> void ns_callee ();
>>>  void s_callee () [[arm::streaming]];
>>> @@ -196,7 +197,7 @@ sc_caller_x1 (int *ptr, int a) 
>>> [[arm::streaming_compatible]]
>>> ** tst x16, #?1
>>> ** beq [^\n]*
>>> ** csel x15, x15, xzr, ne
>>> -** smstart sm
>>> +** .inst 0xd503437f
>>> ** ...
>>> */
>>> void
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/pr121028.c 
>>> b/gcc/testsuite/gcc.target/aarch64/sme/pr121028.c
>>> new file mode 100644
>>> index 00000000000..fa36d4ec170
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sme/pr121028.c
>>> @@ -0,0 +1,46 @@
>>> +// PR121028
>>> +// { dg-do assemble { target aarch64_asm_sme_ok } }
>>> +// { dg-options "-O --save-temps" }
>>> +// { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } 
>>> {\t\.inst} } }
>>> +
>>> +void ns_callee ();
>>> +
>>> +/*
>>> +** sc_caller_sme:
>>> +** ...
>>> +** mrs x16, svcr
>>> +** str x16, \[x29, #?16\]
>>> +** ldr x16, \[x29, #?16\]
>>> +** tbz x16, 0, .*
>>> +** smstop sm
>>> +** bl ns_callee
>>> +** ldr x16, \[x29, #?16\]
>>> +** tbz x16, 0, .*
>>> +** smstart sm
>>> +** ...
>>> +*/
>>> +void sc_caller_sme() __arm_streaming_compatible
>>> +{
>>> +    ns_callee ();
>>> +}
>>> +
>>> +#pragma GCC target "+nosme"
>>> +
>>> +/*
>>> +** sc_caller_nosme:
>>> +** ...
>>> +** bl __arm_sme_state
>>> +** str x0, \[x29, #?16\]
>>> +** ldr x16, \[x29, #?16\]
>>> +** tbz x16, 0, .*
>>> +** .inst 0xd503427f
>>> +** bl ns_callee
>>> +** ldr x16, \[x29, #?16\]
>>> +** tbz x16, 0, .*
>>> +** .inst 0xd503437f
>>> +** ...
>>> +*/
>>> +void sc_caller_nosme() __arm_streaming_compatible
>>> +{
>>> +    ns_callee ();
>>> +}
>>> --
>>> 2.34.1
>>> 
> Hi Spencer,
> 
> I don't see a signed-off-by, did I miss it?

AFAIK Spencer and other @arm.com people contribute under a copyright assignment 
to the FSF scheme, which doesn’t require a signed-off-by tag.
The tag is only needed for people who want to contribute their code under DCO: 
https://gcc.gnu.org/dco.html
Thanks,
Kyrill


> 
> Remi
> 

Reply via email to