Hi, ping... for the documentation part.
Thanks, Bernd. On 09/06/17 17:31, Kyrill Tkachov wrote: > > On 06/09/17 14:17, Bernd Edlinger wrote: >> On 09/06/17 14:51, Richard Earnshaw (lists) wrote: >>> On 06/09/17 13:44, Bernd Edlinger wrote: >>>> On 09/04/17 21:54, Bernd Edlinger wrote: >>>>> Hi Kyrill, >>>>> >>>>> Thanks for your review! >>>>> >>>>> >>>>> On 09/04/17 15:55, Kyrill Tkachov wrote: >>>>>> Hi Bernd, >>>>>> >>>>>> On 18/01/17 15:36, Bernd Edlinger wrote: >>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote: >>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote: >>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> this is related to PR77308, the follow-up patch will depend >>>>>>>>>>> on this >>>>>>>>>>> one. >>>>>>>>>>> >>>>>>>>>>> When trying the split the *arm_cmpdi_insn and >>>>>>>>>>> *arm_cmpdi_unsigned >>>>>>>>>>> before reload, a mis-compilation in libgcc function >>>>>>>>>>> __gnu_satfractdasq >>>>>>>>>>> was discovered, see [1] for more details. >>>>>>>>>>> >>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly >>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split >>>>>>>>>>> up into this: >>>>>>>>>>> >>>>>>>>>>> [(set (reg:CC CC_REGNUM) >>>>>>>>>>> (compare:CC (match_dup 0) (match_dup 1))) >>>>>>>>>>> (parallel [(set (reg:CC CC_REGNUM) >>>>>>>>>>> (compare:CC (match_dup 3) (match_dup 4))) >>>>>>>>>>> (set (match_dup 2) >>>>>>>>>>> (minus:SI (match_dup 5) >>>>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) >>>>>>>>>>> (const_int >>>>>>>>>>> 0))))])] >>>>>>>>>>> >>>>>>>>>>> [(set (reg:CC CC_REGNUM) >>>>>>>>>>> (compare:CC (match_dup 2) (match_dup 3))) >>>>>>>>>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) >>>>>>>>>>> (set (reg:CC CC_REGNUM) >>>>>>>>>>> (compare:CC (match_dup 0) (match_dup >>>>>>>>>>> 1))))] >>>>>>>>>>> >>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare >>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the >>>>>>>>>>> reg:CC >>>>>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be >>>>>>>>>>> redundant and thus got removed, because the data values are >>>>>>>>>>> identical. >>>>>>>>>>> >>>>>>>>>>> I think that applies to a number of similar pattern where data >>>>>>>>>>> flow is happening through the CC reg. >>>>>>>>>>> >>>>>>>>>>> So this is a kind of correctness issue, and should be fixed >>>>>>>>>>> independently from the optimization issue PR77308. >>>>>>>>>>> >>>>>>>>>>> Therefore I think the patterns need to specify the true >>>>>>>>>>> value that will be in the CC reg, in order for cse to >>>>>>>>>>> know what the instructions are really doing. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>> >>>>>>>>>> I agree you've found a valid problem here, but I have some issues >>>>>>>>>> with >>>>>>>>>> the patch itself. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (define_insn_and_split "subdi3_compare1" >>>>>>>>>> [(set (reg:CC_NCV CC_REGNUM) >>>>>>>>>> (compare:CC_NCV >>>>>>>>>> (match_operand:DI 1 "register_operand" "r") >>>>>>>>>> (match_operand:DI 2 "register_operand" "r"))) >>>>>>>>>> (set (match_operand:DI 0 "register_operand" "=&r") >>>>>>>>>> (minus:DI (match_dup 1) (match_dup 2)))] >>>>>>>>>> "TARGET_32BIT" >>>>>>>>>> "#" >>>>>>>>>> "&& reload_completed" >>>>>>>>>> [(parallel [(set (reg:CC CC_REGNUM) >>>>>>>>>> (compare:CC (match_dup 1) (match_dup 2))) >>>>>>>>>> (set (match_dup 0) (minus:SI (match_dup 1) >>>>>>>>>> (match_dup >>>>>>>>>> 2)))]) >>>>>>>>>> (parallel [(set (reg:CC_C CC_REGNUM) >>>>>>>>>> (compare:CC_C >>>>>>>>>> (zero_extend:DI (match_dup 4)) >>>>>>>>>> (plus:DI (zero_extend:DI (match_dup 5)) >>>>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int >>>>>>>>>> 0))))) >>>>>>>>>> (set (match_dup 3) >>>>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int >>>>>>>>>> 0))))])] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This pattern is now no-longer self consistent in that before the >>>>>>>>>> split >>>>>>>>>> the overall result for the condition register is in mode >>>>>>>>>> CC_NCV, but >>>>>>>>>> afterwards it is just CC_C. >>>>>>>>>> >>>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly >>>>>>>>>> reflect the result of the 64-bit comparison), but that then >>>>>>>>>> implies that >>>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and >>>>>>>>>> should in >>>>>>>>>> fact also be CC_NCV. Thinking about this pattern, I'm >>>>>>>>>> inclined to >>>>>>>>>> agree >>>>>>>>>> that CC_NCV is the correct mode for this operation >>>>>>>>>> >>>>>>>>>> I'm not sure if there are other consequences that will fall >>>>>>>>>> out from >>>>>>>>>> fixing this (it's possible that we might need a change to >>>>>>>>>> select_cc_mode >>>>>>>>>> as well). >>>>>>>>>> >>>>>>>>> Yes, this is still a bit awkward... >>>>>>>>> >>>>>>>>> The N and V bit will be the correct result for the subdi3_compare1 >>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI >>>>>>>>> ...) >>>>>>>>> only gets the C bit correct, the expression for N and V is a >>>>>>>>> different >>>>>>>>> one. >>>>>>>>> >>>>>>>>> It probably works, because the subsi3_carryin_compare >>>>>>>>> instruction sets >>>>>>>>> more CC bits than the pattern does explicitly specify the value. >>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but >>>>>>>>> it is >>>>>>>>> hard to write down the correct rtl expression for it. >>>>>>>>> >>>>>>>>> In theory the pattern should describe everything correctly, >>>>>>>>> maybe, like: >>>>>>>>> >>>>>>>>> set (reg:CC_C CC_REGNUM) >>>>>>>>> (compare:CC_C >>>>>>>>> (zero_extend:DI (match_dup 4)) >>>>>>>>> (plus:DI (zero_extend:DI (match_dup 5)) >>>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>>>>> set (reg:CC_NV CC_REGNUM) >>>>>>>>> (compare:CC_NV >>>>>>>>> (match_dup 4)) >>>>>>>>> (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) >>>>>>>>> (const_int 0))) >>>>>>>>> set (match_dup 3) >>>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>>>>> >>>>>>>>> >>>>>>>>> But I doubt that will work to set CC_REGNUM with two different >>>>>>>>> modes >>>>>>>>> in parallel? >>>>>>>>> >>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that >>>>>>>>> implicitly >>>>>>>>> defines C from the DImode result, and NV from the SImode result, >>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what >>>>>>>>> bits it really defines? >>>>>>>>> >>>>>>>>> >>>>>>>>> What do you think? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Bernd. >>>>>>>> I think maybe the right solution is to invent a new CCmode >>>>>>>> that defines C as if the comparison is done in DImode >>>>>>>> but N and V as if the comparison is done in SImode. >>>>>>>> >>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = >>>>>>>> Carry-In-Compare), >>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because >>>>>>>> only N and Z are set correctly), but in a different patch of >>>>>>>> course. >>>>>>>> >>>>>>>> Attached is a new version that implements the new CCmode. >>>>>>>> >>>>>>>> How do you like this new version? >>>>>>>> >>>>>>>> It seems to be able to build a cross-compiler at least. >>>>>>>> >>>>>>>> I will start a new bootstrap with this new patch, but that can take >>>>>>>> some >>>>>>>> time until I have definitive results. >>>>>>>> >>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait >>>>>>>> for the next stage1? >>>>>>>> >>>>>>>> Thanks >>>>>>>> Bernd. >>>>>>> I thought I should also look at where the subdi_compare1 amd the >>>>>>> negdi2_compare patterns are used, and look if the caller is fine >>>>>>> with >>>>>>> not having all CC bits available. >>>>>>> >>>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it >>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch >>>>>>> (LTU, >>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses >>>>>>> CCmode. >>>>>>> >>>>>>> To correct this, the branch should use CC_Cmode which is always >>>>>>> defined. >>>>>>> >>>>>>> So I tried to test this pattern, with the following test programs, >>>>>>> and found that the code actually improves when the branch uses >>>>>>> CC_Cmode >>>>>>> instead of CCmode, both for SImode and DImode data, which was a bit >>>>>>> surprising. >>>>>>> >>>>>>> I used this test program to see how the usubv<mode>4 pattern works: >>>>>>> >>>>>>> cat test.c (DImode) >>>>>>> unsigned long long x, y, z; >>>>>>> int b; >>>>>>> void test() >>>>>>> { >>>>>>> b = __builtin_sub_overflow (y,z, &x); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> unpatched code used 8 byte more stack than patched, >>>>>>> because the DImode subtraction is effectively done twice. >>>>>>> >>>>>>> cat test1.c (SImode) >>>>>>> unsigned long x, y, z; >>>>>>> int b; >>>>>>> void test() >>>>>>> { >>>>>>> b = __builtin_sub_overflow (y,z, &x); >>>>>>> } >>>>>>> >>>>>>> which generates (unpatched): >>>>>>> cmp r3, r0 >>>>>>> sub ip, r3, r0 >>>>>>> >>>>>>> instead of expected (patched): >>>>>>> subs r3, r3, r2 >>>>>>> >>>>>>> >>>>>>> The condition is extracted by ifconversion and/or combine >>>>>>> and complicates the resulting code instead of simplifying. >>>>>>> >>>>>>> I think this happens only when the branch and the subsi/di3_compare1 >>>>>>> is using the same CC mode. >>>>>>> >>>>>>> That does not happen when the CC modes disagree, as with the >>>>>>> proposed patch. All other uses of the pattern are already using >>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change. >>>>>>> >>>>>>> Attached is an updated version of the patch, that happens to >>>>>>> improve the code generation of the usubsi4 and usubdi4 pattern, >>>>>>> as a side effect. >>>>>>> >>>>>>> >>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>>>>>> Is it OK for trunk? >>>>>> I'm very sorry it has taken so long to review. >>>>>> I've been ramping up on the context recently now so I'll try to move >>>>>> this along... >>>>>> >>>>>> This patch looks mostly ok to me from reading the patterns and the >>>>>> discussion around it. >>>>>> I have one concern: >>>>>> >>>>>> >>>>>> (define_insn_and_split "negdi2_compare" >>>>>> - [(set (reg:CC CC_REGNUM) >>>>>> - (compare:CC >>>>>> + [(set (reg:CC_NCV CC_REGNUM) >>>>>> + (compare:CC_NCV >>>>>> (const_int 0) >>>>>> (match_operand:DI 1 "register_operand" "0,r"))) >>>>>> (set (match_operand:DI 0 "register_operand" "=r,&r") >>>>>> @@ -4647,8 +4650,12 @@ >>>>>> (compare:CC (const_int 0) (match_dup 1))) >>>>>> (set (match_dup 0) (minus:SI (const_int 0) >>>>>> (match_dup 1)))]) >>>>>> - (parallel [(set (reg:CC CC_REGNUM) >>>>>> - (compare:CC (const_int 0) (match_dup 3))) >>>>>> + (parallel [(set (reg:CC_NCV_CIC CC_REGNUM) >>>>>> + (compare:CC_NCV_CIC >>>>>> + (const_int 0) >>>>>> + (plus:DI >>>>>> + (zero_extend:DI (match_dup 3)) >>>>>> + (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>> (set (match_dup 2) >>>>>> (minus:SI >>>>>> (minus:SI (const_int 0) (match_dup 3)) >>>>>> >>>>>> >>>>>> I was somewhat concerned with having the first operand of the COMPARE >>>>>> being a const_int 0 and the second being >>>>>> a complex expression as the RTL canonicalization rules usually >>>>>> require >>>>>> the complex operand going first if possible. >>>>>> Reading the RTL rules in rtl.texi I see it says this: >>>>>> "If one of the operands is a constant, it should be placed in the >>>>>> second operand and the comparison code adjusted as appropriate." >>>>>> So it seems that the pre-existing pattern that puts const_int 0 as >>>>>> the >>>>>> first operand already breaks that rule. >>>>>> I think we should fix that and update the use of condition code to a >>>>>> GEU rather than LTU as well. >>>>>> >>>> Well, the sentence before that one is even more explicit: >>>> >>>> "Normally, @var{x} and @var{y} must have the same mode. Otherwise, >>>> @code{compare} is valid only if the mode of @var{x} is in class >>>> @code{MODE_INT} and @var{y} is a @code{const_int} or >>>> @code{const_double} with mode @code{VOIDmode}." >>>> >>>> So because the const_int 0 has VOIDmode the comparison is done >>>> in y-mode not x-mode. >>>> >>>> But unfortunately I see no way how to accomplish this, >>>> because this assumes that the compare can be easily swapped >>>> if the conditional instruction just uses one of GT/GE/LE/LT >>>> or GTU/GEU/LEU/LTU. But that is only the case for plain CCmode. >>>> >>>> And in this example we ask for "overflow", but while 0-X can >>>> overflow X-0 simply can't. And moreover there are non-symmetric >>>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not >>>> the swapped conditions GT/LE/GTU/LEU. >>>> >>>> I think the only solution would be to adjust the spec to >>>> reflect the implementation: >>>> >>>> Index: rtl.texi >>>> =================================================================== >>>> --- rtl.texi (revision 251752) >>>> +++ rtl.texi (working copy) >>>> @@ -2252,6 +2252,13 @@ >>>> If one of the operands is a constant, it should be placed in the >>>> second operand and the comparison code adjusted as appropriate. >>>> >>>> +There may be exceptions from this rule if the mode @var{m} carries >>>> +not enough information for the swapped comparison operator, or >>> There may be exceptions _to_ ... if mode @var{m} does not carry >>> enough... >>> >>>> +if we ask for overflow from the subtraction. >>> Aren't we really trying to 'detect overflow' rather than 'ask' for it? >>> >>>> That means, while >>>> +0-X may overfow X-0 can never overflow. Under these conditions >>>> +a compare may have the constant expression at the left side. >>> In these circumstances the constant will be in the first operand . >>> >>> (left and right don't really make sense for RTL). >>>> +Examples are the ARM negdi2_compare pattern and similar. >>>> + >>>> A @code{compare} specifying two @code{VOIDmode} constants is not >>>> valid >>>> since there is no way to know in what mode the comparison is to be >>>> performed; the comparison must either be folded during the >>>> compilation >>>> >>>> >>>> >>>> Please advise. >>>> >>>> Thanks >>>> Bernd. >>>> >>>> >>>>> Hmmm... >>>>> >>>>> I think the compare is not a commutative operation, and swapping >>>>> the arguments will imply a different value in the flags. >>>>> >>>>> So if I write >>>>> (set (reg:CC_NCV CC_REGNUM) >>>>> (compare:CC_NCV >>>>> (const_int 0) >>>>> (reg:DI 123))) >>>>> >>>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or >>>>> GEU, >>>>> N,V = usable for LT, GE >>>>> >>>>> But if I write >>>>> (set (reg:CC_NCV CC_REGNUM) >>>>> (compare:CC_NCV >>>>> (reg:DI 123) >>>>> (const_int 0))) >>>>> >>>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays >>>>> the same and the actual value in the flags is defined by the >>>>> expansion. >>>>> Of course there exists probably no matching expansion for that. >>>>> >>>>> Note that both LTU in the above hunk are in a parallel-stmt and >>>>> operate >>>>> on the flags from the previous pattern, so changing these to GEU >>>>> will probably be wrong. >>>>> >>>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare >>>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC >>>>> (const_int 0) (match_dup 1)). >>>>> >>>>> One use of the resulting flags (I know of) is in negvdi3 where we >>>>> have: >>>>> >>>>> emit_insn (gen_negdi2_compare (operands[0], operands[1])); >>>>> arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]); >>>>> >>>>> I think only 0-x can overflow while x-0 can never overflow. >>>>> >>>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare >>>>> a lot and I guess if this pattern is created by a splitter, this can >>>>> only be expanded by an exactly matching pattern, there is (hopefully) >>>>> no way how combine could mess with this pattern due to the exotic >>>>> CCmode. So while I think it would work to swap only the notation of >>>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and >>>>> the >>>>> consuming statements, that would make it quite hard to follow for the >>>>> human reader at least. >>>>> >>>>> What do you think? > > I agree on the readability point. It's already very hard to follow the > CCmode stuff as it is :( > My concern was on whether some RTL pass (combine in particular) would > trip up on this > nominally non-canonical RTL. As long as nothing manipulates these RTXes > and they match > the precise single pattern they're expected to match we should be ok I > think. One wonders > whether we would be better off using unspecs if the compare operator is > not exactly what we need, > but I think we can get away with COMPARE for now. > > >>>>> >>>>> Bernd. >> Attached is the patch with an update to the rtl.texi documentation. >> The code does not change, so I did no new bootstrap. >> >> >> Is it OK for trunk? > > This looks ok to me, but I'd feel more comfortable if someone more > knowledgeable on RTL could > ok the documentation part (I guess it's outside my remit anyway) as the > overflow case you pointed out > does indeed make the compare operator not commutative. > > So the arm parts are ok assuming the documentation change is ok'd. > Richard, would you be okay with the suggested documentation change? > > Thanks, > Kyrill > >> >> Thanks >> Bernd. >