On 07/30/16 13:39, Segher Boessenkool wrote: > On Sat, Jul 30, 2016 at 08:17:59AM +0000, Bernd Edlinger wrote: >> Ok, I the incorrect REG_POINTER is done here: >> >> cse_main -> reg_scan -> reg_scan_mark_refs -> set_reg_attrs_from_value >> >> and here I see a bug, because if POINTERS_EXTEND_UNSIGNED >> can_be_reg_pointer is only set to false if x is SIGN_EXTEND but not >> if x is a SUBREG as in this case. >> >> So I think that should be fixed this way: >> >> Index: emit-rtl.c >> =================================================================== >> --- emit-rtl.c (revision 238891) >> +++ emit-rtl.c (working copy) >> @@ -1155,7 +1155,7 @@ set_reg_attrs_from_value (rtx reg, rtx x) >> || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x))) >> { >> #if defined(POINTERS_EXTEND_UNSIGNED) >> - if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) >> + if (((GET_CODE (x) != ZERO_EXTEND && POINTERS_EXTEND_UNSIGNED) >> || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) >> && !targetm.have_ptr_extend ()) >> can_be_reg_pointer = false; >> >> >> What do you think does this look like the right fix? > > There also is (from rtl.texi), for paradoxical subregs: > > """ > @item @code{subreg} of @code{reg}s > The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true. > @code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold. > Such subregs usually represent local variables, register variables > and parameter pseudo variables that have been promoted to a wider mode. > """ > > so you might want to test for these as well. >
like this? Index: emit-rtl.c =================================================================== --- emit-rtl.c (revision 238891) +++ emit-rtl.c (working copy) @@ -1156,7 +1156,11 @@ { #if defined(POINTERS_EXTEND_UNSIGNED) if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED) - || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED)) + || (GET_CODE (x) == ZERO_EXTEND && ! POINTERS_EXTEND_UNSIGNED) + || (paradoxical_subreg_p (x) + && ! (SUBREG_PROMOTED_VAR_P (x) + && SUBREG_CHECK_PROMOTED_SIGN (x, + POINTERS_EXTEND_UNSIGNED)))) && !targetm.have_ptr_extend ()) can_be_reg_pointer = false; #endif In the test case of pr71779 the subreg is no promoted var, so this has no influence at this time. Also I have not POINTERS_EXTEND_SIGNED target, but for the symmetry it ought to check explicitly for ZERO_EXTEND as well, and allow the pointer value to pass thru a TRUNCATE. >> With this patch the code the reg/f:DI 481 does no longer appear, >> and also the invalid combine does no longer happen. > > Good :-) > >> However the test case from pr70903 does not get fixed by this. >> >> But when I look at the dumps, I see again the first incorrect >> transformation in cse2 (again cse!): >> >> (insn 20 19 21 2 (set (reg:DI 94) >> (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} >> (expr_list:REG_EQUAL (const_int 255 [0xff]) >> (expr_list:REG_DEAD (reg:QI 93) >> (nil)))) >> >> but that is simply wrong, because later optimization passes >> expect reg 94 to be 0x000000ff but the upper bits are unspecified, >> so that REG_EQUAL should better not exist. > > Agreed. So where does that come from? > >> When I looked at cse.c I saw a comment in #if 0, which exactly >> describes the problem that we have with paradoxical subreg here: > > <snip> > >> I am pretty sure, that this has to be reverted, and that it can >> generate less efficient code. >> >> What do you think? > > I think this pessimises the generated code too much; there must be a > better solution. > I debugged the cse again, to see how it works and why it mis-compiles this example. I found out that the trouble starts one instruction earlier: (insn 19 40 20 2 (set (subreg:DI (reg:QI 93) 0) ) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:DI 110 [ D.3037 ]) (nil))) cse_main sees the constant value and maps: (reg:QI 93) => (const_int 255 [0xff]) plus (I mean that is wrong): (subreg:DI (reg:QI 93) 0) => (const_int 255 [0xff]) When the next insn is scanned (insn 20 19 21 2 (set (reg:DI 94) (subreg:DI (reg:QI 93) 0)) pr.c:8 50 {*movdi_aarch64} (expr_list:REG_DEAD (reg:QI 93) (nil))) I see fold_rtx (subreg:DI (reg:QI 93) 0)) return (const_int 255 [0xff]) which is wrong. now cse maps: (reg:DI 94) => (const_int 255 [0xff]) which is also not guaranteed to be correct, but the REG_EQUAL at insn 20 is probably only a symptom, suppressing only that does not fix the test case, because later we have: (insn 25 24 26 2 (set (reg:DI 97) (const_int 255 [0xff])) pr.c:9 50 {*movdi_aarch64} (nil)) (insn 26 25 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ]) (const_int 32 [0x20]) (const_int 32 [0x20])) (reg:DI 97)) pr.c:9 691 {*insv_regdi} (expr_list:REG_DEAD (reg:DI 97) (nil))) get replaced to (insn 26 24 48 2 (set (zero_extract:DI (reg:DI 102 [ D.3038 ]) (const_int 32 [0x20]) (const_int 32 [0x20])) (reg:DI 94)) pr.c:9 691 {*insv_regdi} (expr_list:REG_DEAD (reg:DI 97) (nil))) because when insn 25 is scanned it the constant 255 is already mapped to reg 94 at insn 20. now cse maps: (reg:DI 97) => (reg:DI 94) therefore canon_reg (reg:DI 97) returns (reg:DI 94) which makes canonicalize_insn do the wrong transformation at insn 26. Now I think I found a better place for a patch, where the first bogus mapping is recorded: Index: cse.c =================================================================== --- cse.c (revision 238891) +++ cse.c (working copy) @@ -5898,15 +5898,7 @@ cse_insn (rtx_insn *insn) || GET_MODE (dest) == BLKmode /* If we didn't put a REG_EQUAL value or a source into the hash table, there is no point is recording DEST. */ - || sets[i].src_elt == 0 - /* If DEST is a paradoxical SUBREG and SRC is a ZERO_EXTEND - or SIGN_EXTEND, don't record DEST since it can cause - some tracking to be wrong. - - ??? Think about this more later. */ - || (paradoxical_subreg_p (dest) - && (GET_CODE (sets[i].src) == SIGN_EXTEND - || GET_CODE (sets[i].src) == ZERO_EXTEND))) + || sets[i].src_elt == 0) continue; /* STRICT_LOW_PART isn't part of the value BEING set, @@ -5925,6 +5917,11 @@ cse_insn (rtx_insn *insn) sets[i].dest_hash = HASH (dest, GET_MODE (dest)); } + /* If DEST is a paradoxical SUBREG, don't record DEST since it can + cause some tracking to be wrong. */ + if (paradoxical_subreg_p (dest)) + continue; + elt = insert (dest, sets[i].src_elt, sets[i].dest_hash, GET_MODE (dest)); So apparently there was already an attempt of a fix for a similar bug, and svn blame points to: svn log -v -r8354 ------------------------------------------------------------------------ r8354 | kenner | 1994-10-28 23:55:05 +0100 (Fri, 28 Oct 1994) | 3 lines Changed paths: M /trunk/gcc/cse.c (cse_insn): Don't record a DEST a paradoxical SUBREG and SRC is a SIGN_EXTEND or ZERO_EXTEND. ------------------------------------------------------------------------ This way we can still map the underlying QI register to 255 but not the SUBREG if it is a paradoxical subreg. In the test case this patch still works (output code does not change). What do you think? Bernd.