On 3/12/2026 11:46 AM, Vineet Gupta wrote:
Hi,
I wanted some insight/clarity on subreg promotion at expand time
following a promote_function_mode () call.
Apologies to Roger and HJ for the explicit CC, but it seems you touched
the same general area in 2021 and 2026 respectively.
Here's my understanding of the various pieces and the problem I'm running into.
1a. PROMOTE_MODE has no direct ABI implications on its own, but it could
have them indirectly if TARGET_PROMOTE_FUNCTION_MODE uses the
always-promote default, which in turn uses this macro.
Right. PROMOTE_MODE is primarily to deal with targets that don't have
subword arithmetic and logicals. Now those targets typically also
define how function arguments and return values get extended, so it's
easy to conflate the two.
1b. According to the docs [1], ISAs supporting only 64-bit registers
would define PROMOTE_MODE to word_mode (64). I'm further inferring that
for ISAs supporting both, it should be OK to define either 32 or 64,
although 32 might be preferable, if only so that codegen fiddles with
fewer bits.
"On most RISC machines, which only have operations that operate on a
full register, define this macro to set m to word_mode if m is an
integer mode narrower than BITS_PER_WORD...."
The RISC-V implementation is textbook perfect: for rv64 it promotes
anything smaller than DI to DI (since that's how wide the container
itself is), and for SI it also clears the unsigned flag, since most ALU
operations sign-extend the 32-bit result to 64 bits.
Right. And it's worth remembering that rv64 doesn't really have 32-bit
ops; there's an implicit 32->64 sign extension with those "w" forms.
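For illustration only, here is a minimal C model of what an rv64 "w" op such as addw computes: a 32-bit wraparound add whose result is then sign-extended into the full 64-bit register, so bits 63..32 of the inputs never matter. model_addw is a hypothetical name, not a GCC or RISC-V API, and the model assumes GCC's documented modulo behaviour when converting out-of-range values to int32_t.

```c
#include <stdint.h>

/* Hypothetical model of rv64 "addw": add the low 32 bits of both
   operands, wrap modulo 2^32, then sign-extend bit 31 into the full
   64-bit destination register.  */
static int64_t model_addw(int64_t rs1, int64_t rs2)
{
    /* Unsigned 32-bit add: wraps modulo 2^32 without undefined behaviour.  */
    uint32_t sum32 = (uint32_t)rs1 + (uint32_t)rs2;
    /* Conversion to int32_t of values above INT32_MAX is
       implementation-defined; GCC reduces modulo 2^32.  */
    return (int64_t)(int32_t)sum32;
}
```

E.g. model_addw (0x7fffffff, 1) yields -2147483648 rather than 0x80000000, which is why treating SI as signed matches what the hardware leaves in the register.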
BPF currently defines it to promote anything smaller than DI to DI:
that might be a bit conservative and lead to fewer 32-bit-only insns
being used. A future, separate change to promote anything smaller than
SI to SI can be done later and would not be wrong.
Understood. That should be a safe definition.
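A minimal C sketch of the three policies discussed so far, assuming the rules exactly as stated in this thread (rv64, current BPF, and the possible SI-based BPF variant). The enum, struct and function names are hypothetical stand-ins rather than GCC's real machine_mode handling, and leaving signedness unchanged in the two BPF variants is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical integer "modes", valued by their width in bits.  */
enum int_mode { QI = 8, HI = 16, SI = 32, DI = 64 };

struct promoted {
    enum int_mode mode; /* Mode after promotion.  */
    bool unsignedp;     /* Signedness of the extension after promotion.  */
};

/* rv64 rule as described: anything narrower than DI goes to DI, and SI
   is forced to signed because the "w" ops sign-extend their results.  */
static struct promoted promote_rv64(enum int_mode m, bool unsignedp)
{
    struct promoted p = { m, unsignedp };
    if (m < DI) {
        if (m == SI)
            p.unsignedp = false;
        p.mode = DI;
    }
    return p;
}

/* Current BPF rule: anything narrower than DI goes to DI.  */
static struct promoted promote_bpf_di(enum int_mode m, bool unsignedp)
{
    struct promoted p = { m < DI ? DI : m, unsignedp };
    return p;
}

/* Possible future BPF rule: anything narrower than SI goes to SI.  */
static struct promoted promote_bpf_si(enum int_mode m, bool unsignedp)
{
    struct promoted p = { m < SI ? SI : m, unsignedp };
    return p;
}
```

The SI variant leaves SI and DI values untouched, which is what would let 32-bit-only insns be used without an extension after every operation.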
2a. Does setting SUBREG_PROMOTED_VAR_P imply that the rest of the pass
pipeline assumes the value is already promoted (thus potentially eliding
any subsequent zero/sign extensions), or does it ensure that such
extensions will always be generated on any moves? The documentation
[2] seems to suggest the latter, although usage in the code looks more
like the former.
It means that the subreg in that spot was created from a properly
extended wider object. That information can be used in various ways
when computing nonzero_bits, num_sign_bit_copies, etc. I.e., the object
is already promoted, and we can use that to eliminate subsequent
extensions or for other simplifications.
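To illustrate what "already promoted" buys: if a 64-bit register is known to hold the sign extension of its low 32 bits, sign-extending those bits again is the identity, so the extension insn can be dropped. The C sketch below is a hypothetical model of that reasoning (roughly the kind of fact num_sign_bit_copies lets GCC derive), not GCC code; it assumes GCC's modulo conversion to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit register value.  */
static int64_t sext32(int64_t x)
{
    return (int64_t)(int32_t)(uint32_t)x;
}

/* True if bits 63..31 of X are all copies of bit 31, i.e. X is already
   a properly sign-extended 32-bit value (what a signed promoted subreg
   promises about its inner register).  */
static bool is_promoted_sext32(int64_t x)
{
    return sext32(x) == x;
}

/* An extension step that is skipped when the input is known promoted.
   Returns true if an extension actually had to be emitted.  */
static bool extend_if_needed(int64_t *reg, bool known_promoted)
{
    if (known_promoted)
        return false;   /* Upper bits already correct: no insn needed.  */
    *reg = sext32(*reg);
    return true;
}
```

Note that the skip is only safe because the flag is a promise about the inner register; the function itself does not re-check it.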
FWIW my patch [3] removed code which was clearing
SUBREG_PROMOTED_VAR_P, as it was leading to extraneous sign extensions
on RISC-V. So I tend to think that keeping the subreg promoted prevents
subsequent generation of extensions.
Correct.
2b. expand_call (), depending on the modes of @target and @rettype,
calls ABI promotion for the return value, wraps target in a subreg with
the new mode and sets SUBREG_PROMOTED_VAR_P eagerly.
store_expr
  |- expand_call
       if (REG_P (target) && GET_MODE (target) != TYPE_MODE (rettype))
         ...
         pmode = promote_function_mode (type, ret_mode,
                                        &unsignedp, funtype, 1);
         target = gen_lowpart_SUBREG (ret_mode, target);
         SUBREG_PROMOTED_VAR_P (target) = 1;
         SUBREG_PROMOTED_SET (target, unsignedp);
The following convert_move (with @from as the subreg) just strips off
the subreg:
  ...
  |- convert_move
       if (GET_CODE (from) == SUBREG
           && SUBREG_PROMOTED_VAR_P (from)
           ...
         from = gen_lowpart (to_int_mode, SUBREG_REG (from));
This supposedly ensures that extensions won't be generated? But between
setting the subreg as promoted and stripping the outer subreg, no
extension was generated for the return value anyway, so what am I
missing?
I'd have to see the full context and probably throw it under a debugger.
But on a local basis that seems correct. You can avoid generation of
an explicit extension if the extended object is a properly promoted
subreg. A properly promoted subreg can also trigger extension
elimination in various passes. Also note that combine knows a bit about
ABI guarantees and can use those to eliminate extensions as well. You
have to follow things from expansion through to codegen to really know
what's going on.
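As a toy restatement of the convert_move shortcut discussed above, the C model below shows only the decision itself: when the source is a promoted subreg whose recorded signedness matches the requested extension, the wide inner register is copied through and no extension is emitted. All names are hypothetical, and the real convert_move checks considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a narrow subreg of a 64-bit pseudo.  */
struct toy_subreg {
    int64_t inner;          /* Value of the wide inner register.  */
    unsigned width;         /* Width of the subreg in bits (8, 16, 32).  */
    bool promoted;          /* Models SUBREG_PROMOTED_VAR_P.  */
    bool promoted_unsigned; /* Models the recorded promotion signedness.  */
};

/* Widen FROM to 64 bits, zero- or sign-extending per UNSIGNEDP.  Sets
   *EMITTED to whether an explicit extension was needed.  */
static int64_t toy_convert_move(struct toy_subreg from, bool unsignedp,
                                bool *emitted)
{
    if (from.promoted && from.promoted_unsigned == unsignedp) {
        /* Shortcut: trust the promotion and copy the inner register.  */
        *emitted = false;
        return from.inner;
    }
    *emitted = true;
    uint64_t mask = (UINT64_C(1) << from.width) - 1;
    uint64_t low = (uint64_t)from.inner & mask;
    if (unsignedp)
        return (int64_t)low;
    /* xor/subtract sign extension of a WIDTH-bit field; the final
       conversion to int64_t wraps modulo 2^64 under GCC.  */
    uint64_t sign = UINT64_C(1) << (from.width - 1);
    return (int64_t)((low ^ sign) - sign);
}
```

In this model the shortcut is only as correct as the guarantee behind the flag: a subreg marked promoted whose inner register still has garbage above bit 7 passes that garbage straight through.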
3. The reason for the questions above is PR/124171 [4], where we need
to change the GCC BPF function ABI so that arguments as well as return
values are both promoted in the callee. I'm guessing the last part is
atypical, as args promoted in the caller would imply the return value
promoted in the callee, but BPF code can be called from, as well as call
into, code using other ABIs, such as x86 kernel code, and thus needs to
ensure sanity in either direction.
To implement this:
* I'm setting TARGET_PROMOTE_FUNCTION_MODE to the always-promote
  default, so both args and retval will be promoted.
* BPF PROMOTE_MODE currently promotes anything smaller than DI to DI.
  The ISA does have insns for SI-mode-only ops, so this could be changed
  to that effect later, separately, but that is not needed for the ABI
  change.
This works for the most part, except for a single weird test which
fails to promote a bool return value in the caller.
_Bool bar_bool(void);
int foo_bool_ne1(void) {
if (bar_bool() != 1) return 0; else return 1;
}
On trunk this generates
foo_bool_ne1:
call bar_bool
r0 &= 0xff
exit
Presumably the masking is to ensure that the single-bit bool doesn't
have any garbage bits on. Bools are a bit special and I wouldn't be at
all surprised if they're not handled all that well.
With TARGET_PROMOTE_FUNCTION_MODE set to the always-promote default
(and PROMOTE_MODE unchanged: sub-DI to DI), it generates:
foo_bool_ne1:
call bar_bool
r0 = (s32) 0xff
exit
The (s32) conversion doesn't seem right, as it needs to clamp the value
to 8 bits (ideally 1, but bool on most targets is implemented as QI).
Guessing you meant &= not =?
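To make the concern concrete, the C sketch below models r0 as a uint64_t whose low byte holds a valid bool but whose upper bits are garbage. Only the 8-bit mask recovers the bool; a sign extension from 32 bits keeps bits 8..31. The helper names are made up, and bpf_movsx32 assumes the (s32) in the patched output denotes BPF's sign-extending move of the low 32 bits.

```c
#include <stdint.h>

/* Trunk output: r0 &= 0xff keeps only the QImode bool.  */
static uint64_t bpf_and_0xff(uint64_t r0)
{
    return r0 & 0xff;
}

/* Assumed semantics of an (s32) move: sign-extend the low 32 bits, so
   bits 8..31 survive and bit 31 is copied into bits 32..63.  */
static uint64_t bpf_movsx32(uint64_t r0)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)r0;
}
```

For r0 = 0xdeadbeef12345601, bpf_and_0xff returns 1, while bpf_movsx32 returns 0x12345601, which would compare unequal to 1.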
The GIMPLE output is the same for both trunk and the patch.
As I would generally expect. Those things are much more relevant to the
RTL space.
RTL expansion is obviously different.
For the trunk version we get a zero_extend:
(insn 7 6 8 2 (set (reg:QI 23)
(reg:QI 22))
(nil))
(insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
(zero_extend:DI (reg:QI 23))) <----
(nil))
(insn 9 8 10 2 (set (reg:SI 24 [ _5 ])
(subreg/s/v:SI (reg:DI 19 [ _1 ]) 0))
(nil))
The zero_extend happens because, in the store_expr () -> expand_call
stack shown above, no subreg is created, since @target and @rettype
both have the same QI mode.
Seems sensible to me. What you'd be looking for is combine or some
other RTL pass to come along and remove insn 8 at some point.
With the patch, the check above is true, as @target is DI while
@rettype is QI, so a subreg is created.
Subsequently convert_move (to=DI, from=(subreg/s/v:QI (reg:DI 22) 0))
gets called. But it strips off the QI subreg, skips any extension
generation, and finally ends up with a slightly different subreg: outer
SI (not QI) with inner DI.
(insn 7 6 8 2 (set (reg:DI 22)
(reg:DI 21))
(nil))
(insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
(reg:DI 22))
(nil))
(insn 9 8 10 2 (set (reg:SI 23 [ _5 ])
(subreg/s/v:SI (reg:DI 19 [ _1 ]) 0))
(nil))
A sign extension is subsequently generated for the return value, which
is probably due to the function promote mode being "always", but IMO
it's not right (SI to DI vs. QI to DI) and is done for the wrong reason
(the outgoing function return value, not the incoming call return value).
Which seems roughly sensible to me as well.
(insn 10 9 11 2 (set (reg:DI 24) <-- sign extension
(ashift:DI (subreg:DI (reg:SI 23 [ _5 ]) 0)
(const_int 32 [0x20])))
(nil))
(insn 11 10 12 2 (set (reg:DI 24)
(ashiftrt:DI (reg:DI 24)
(const_int 32 [0x20])))
(nil))
I would really like to solve this problem. One approach would be to
*not* set SUBREG_PROMOTED_VAR_P unconditionally in expand_call at the
time of subreg creation. I tried a hack to always clear it, and it does
fix my failing test, generating the missing &= 0xff.
No, that would be a mistake, I strongly suspect. In general, if we know
a subreg has a particular promotion state, then we should be seeing
those SUBREG_PROMOTED things unconditionally. There *are* cases where
promotion can cause missed optimizations, largely in cases where we have
sub-word objects as args/return values and promotion hides the fact that
those upper bits are don't cares and that can't always be recovered.
You can see this in some bit twiddling cases on rv64.
I think you're barking up the wrong tree.
jeff