On 8/5/23 01:46, Xiao Zeng wrote:


The operands to the comparison need to be in DImode for rv64 and SImode
for rv32.  That's the X iterator.
After analyzing the rtl log, I can't agree more with this sentence.

Note the mode of the comparison
operands may be different than the mode of the destination.  ie, we
might have a 64bit comparison and produce a 32bit sign extended result
much like the setcc insns support.

This patch changes the 6 zicond patterns to use the X iterator on the
comparison inputs.  That at least makes the patterns correct and fixes
this particular testcase.   There's a few other lurking problems that
I'll address in additional patches.

Committed to the trunk,
Jeff

1 In any case, jeff's analysis is convincing, here I will add a little bit of 
my own analysis.

2 First, for the test cases:

foo(long long int x, long long int y) {
   if (((int)x | (int)y) != 0)
     return 6;
   return x + y;
}

look directly at the compared assembly code. This allows people to quickly
realize where the error occurred.

X_mode.s(right)                                                    
ANYI_mode.S(error)
10 foo:                                                                  10 foo:
       11         or      a5,a0,a1                                          11  
      or      a5,a0,a1
       12         sext.w  a5,a5                                            12   
     li      a4,6
       13         addw    a0,a0,a1                                      13      
  addw    a0,a0,a1
       14         li      a4,6
       15         czero.nez       a1,a0,a5                             14       
 czero.nez       a1,a0,a5
       16         czero.eqz       a0,a4,a5                             15       
 czero.eqz       a0,a4,a5
       17         or      a0,a0,a1                                          16  
      or      a0,a0,a1
       18         ret                                                           
 17        ret

3 You will find that the correct assembly is just one more assembly
instruction: sext.w  a5,a5, the rest of the two are exactly the same.

4 From the perspective of assembly instructions, the a5 value obtained
by sext.w a5, a5 may be different from the original a5 value, which leads
to errors in the test case.
Exactly.


5 However, it is difficult to directly see that an error has occurred
from the rtl log of gcc's passe.
Yes. It's pretty subtle, but par for the course once subreg expressions are involved. subregs are a major wart in the design/implementation of RTL due to them having fairly obscure semantics that are dependent on whether they are widening or narrowing subregs, if the inner object is a mem or something else, etc and how they interact with WORD_REGISTER_OPERATIONS.



6 I'm wondering about transforms like this:

In test.c.c.301r.ira
(insn 36 34 42 2 (set (reg:DI 153)
         (if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0)
                 (const_int 0 [0]))
             (reg:DI 149)
             (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
      (expr_list:REG_DEAD (reg:DI 149)
         (nil)))
But it's already broken in this form. There was a sign extension in the IL up through the combine pass. In fact, I nearly made the argument that the bug was in combine's simplification of (sign_extend (subreg ...)) to (subreg ...).

But ultimately the hardware semantics of risc-v are that comparisons look at the full register, plain and simple. So a narrowing subreg to a sub-word mode is simply wrong i this context. This is also consistent with the implementation of conditional branches and scc insns in the risc-v backend.


In test.c.c.302r.reload it becomes
(insn 36 34 42 2 (set (reg:DI 11 a1 [153])
         (if_then_else:DI (eq:DI (reg:SI 15 a5 [145])
                 (const_int 0 [0]))
             (reg:DI 10 a0 [149])
             (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
      (nil))

Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145]) 
after
passing through reload pass. This conversion is wrong, why did gcc not warn?
The only reason subregs exist is because pseudo registers have a single mode -- which derives from the fact that there is precisely once instance of any given pseudo register. See RTL sharing section of the internals manual.

A subreg expression allows us to look at a pseudo in a mode other than its natural mode. Essentially its the same bits reinterpreted in another mode. I'm over-simplifying a lot. The full details of subreg semantics are documented in the RTL section of the internals manual.

WORD_REGISTER_OPERATIONS is a property of the target that (again over-simplifying) says that an operation such as an add may have a given size (say SImode) in the RTL, but the hardware performs the operation in its native word size (DImode for rv64).

While WORD_REGISTER_OPERATIONS has been around since the 90s, its semantics have always been a bit vague. It's seen as a wart and I'd very much like to get away from using it on the RISC-V port.

Register allocation assigns hard registers to the pseudo registers and the RTL is changed to reflect that register allocation. At that point we have no more pseudos in the IL as they have been replaced by hard registers. Register allocation is split into two distinct phases. IRA and LRA. It's an over-simplification, but IRA is the main register allocator and LRA is primarily tasked with making each operand match its constraints -- ie spilling and reloading.

Hard registers are not shared objects, meaning we can have multiple instances of the same hard register. So near the end of register allocation we transform all subregs into an appropriate reg expression. This simplifies the implementation of various passes that run after register allocation/reloading.

So the transformation (subreg:SI (reg:DI a5)) into (reg:SI a5) by the register allocation is valid and normal. That subreg expression should never have shown up in a comparison to begin with.



7 I'm not very familiar with reload pass, maybe someone can give me a
brief introduction, or tell me where to find relevant information? Thanks.
There is some discussion of reload in the internals manual which covers some of the basics. The implementation details are incredibly complex. Additionally there are two completely distinct implementations. There is "reload" which was the original implementation from the late 1980s and is still used by some targets, though it's considered deprecated at this point. And there's the LRA implementation which was added in the 2000s.

The RISC-V port is an LRA port. The best documentation of LRA is the code itself (lra-*.cc) and the papers that have presented at the various GNU Cauldron and GCC Summit conferences.

Jeff

Reply via email to