On 8/5/23 01:46, Xiao Zeng wrote:
The operands to the comparison need to be in DImode for rv64 and SImode
for rv32. That's the X iterator.
After analyzing the rtl log, I can't agree more with this sentence.
Note the mode of the comparison
operands may be different than the mode of the destination. ie, we
might have a 64bit comparison and produce a 32bit sign extended result
much like the setcc insns support.
This patch changes the 6 zicond patterns to use the X iterator on the
comparison inputs. That at least makes the patterns correct and fixes
this particular testcase. There's a few other lurking problems that
I'll address in additional patches.
Committed to the trunk,
Jeff
1 In any case, jeff's analysis is convincing, here I will add a little bit of
my own analysis.
2 First, for the test cases:
foo(long long int x, long long int y) {
if (((int)x | (int)y) != 0)
return 6;
return x + y;
}
look directly at the compared assembly code. This allows people to quickly
realize where the error occurred.
X_mode.s(right)
ANYI_mode.S(error)
10 foo: 10 foo:
11 or a5,a0,a1 11
or a5,a0,a1
12 sext.w a5,a5 12
li a4,6
13 addw a0,a0,a1 13
addw a0,a0,a1
14 li a4,6
15 czero.nez a1,a0,a5 14
czero.nez a1,a0,a5
16 czero.eqz a0,a4,a5 15
czero.eqz a0,a4,a5
17 or a0,a0,a1 16
or a0,a0,a1
18 ret
17 ret
3 You will find that the correct assembly is just one more assembly
instruction: sext.w a5,a5, the rest of the two are exactly the same.
4 From the perspective of assembly instructions, the a5 value obtained
by sext.w a5, a5 may be different from the original a5 value, which leads
to errors in the test case.
Exactly.
5 However, it is difficult to directly see that an error has occurred
from the rtl log of gcc's passe.
Yes. It's pretty subtle, but par for the course once subreg expressions
are involved. subregs are a major wart in the design/implementation of
RTL due to them having fairly obscure semantics that are dependent on
whether they are widening or narrowing subregs, if the inner object is a
mem or something else, etc and how they interact with
WORD_REGISTER_OPERATIONS.
6 I'm wondering about transforms like this:
In test.c.c.301r.ira
(insn 36 34 42 2 (set (reg:DI 153)
(if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0)
(const_int 0 [0]))
(reg:DI 149)
(const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
(expr_list:REG_DEAD (reg:DI 149)
(nil)))
But it's already broken in this form. There was a sign extension in
the IL up through the combine pass. In fact, I nearly made the argument
that the bug was in combine's simplification of (sign_extend (subreg
...)) to (subreg ...).
But ultimately the hardware semantics of risc-v are that comparisons
look at the full register, plain and simple. So a narrowing subreg to a
sub-word mode is simply wrong i this context. This is also consistent
with the implementation of conditional branches and scc insns in the
risc-v backend.
In test.c.c.302r.reload it becomes
(insn 36 34 42 2 (set (reg:DI 11 a1 [153])
(if_then_else:DI (eq:DI (reg:SI 15 a5 [145])
(const_int 0 [0]))
(reg:DI 10 a0 [149])
(const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
(nil))
Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145])
after
passing through reload pass. This conversion is wrong, why did gcc not warn?
The only reason subregs exist is because pseudo registers have a single
mode -- which derives from the fact that there is precisely once
instance of any given pseudo register. See RTL sharing section of the
internals manual.
A subreg expression allows us to look at a pseudo in a mode other than
its natural mode. Essentially its the same bits reinterpreted in
another mode. I'm over-simplifying a lot. The full details of subreg
semantics are documented in the RTL section of the internals manual.
WORD_REGISTER_OPERATIONS is a property of the target that (again
over-simplifying) says that an operation such as an add may have a given
size (say SImode) in the RTL, but the hardware performs the operation in
its native word size (DImode for rv64).
While WORD_REGISTER_OPERATIONS has been around since the 90s, its
semantics have always been a bit vague. It's seen as a wart and I'd
very much like to get away from using it on the RISC-V port.
Register allocation assigns hard registers to the pseudo registers and
the RTL is changed to reflect that register allocation. At that point
we have no more pseudos in the IL as they have been replaced by hard
registers. Register allocation is split into two distinct phases. IRA
and LRA. It's an over-simplification, but IRA is the main register
allocator and LRA is primarily tasked with making each operand match its
constraints -- ie spilling and reloading.
Hard registers are not shared objects, meaning we can have multiple
instances of the same hard register. So near the end of register
allocation we transform all subregs into an appropriate reg expression.
This simplifies the implementation of various passes that run after
register allocation/reloading.
So the transformation (subreg:SI (reg:DI a5)) into (reg:SI a5) by the
register allocation is valid and normal. That subreg expression should
never have shown up in a comparison to begin with.
7 I'm not very familiar with reload pass, maybe someone can give me a
brief introduction, or tell me where to find relevant information? Thanks.
There is some discussion of reload in the internals manual which covers
some of the basics. The implementation details are incredibly complex.
Additionally there are two completely distinct implementations. There
is "reload" which was the original implementation from the late 1980s
and is still used by some targets, though it's considered deprecated at
this point. And there's the LRA implementation which was added in the
2000s.
The RISC-V port is an LRA port. The best documentation of LRA is the
code itself (lra-*.cc) and the papers that have presented at the various
GNU Cauldron and GCC Summit conferences.
Jeff