On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

> On 10/17/2013 04:46 AM, Richard Biener wrote:
> > the case actually comes up on the ppc because they do a lot of 128 bit
> > math.  I think I got through the x86-64 without noticing this.
> > > Well, it'd be suspicious if we're directly using 128-bit numbers
> > > in addr_wide_int.  The justification for the assertion was that we
> > > should explicitly truncate to addr_wide_int when deliberately
> > > ignoring upper bits, beyond bit or byte address width.  128 bits
> > > definitely falls into that category on powerpc.
> > My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
> > wide-int value if it has precision 16.  AFAIK that is what the
> > code produces, but now Kenny says this is only for some kind
> > of wide-ints but not all?  That is, why is
>
> The issue is not that the rules are different between the different
> flavors of wide int, it is that the circumstances are different.  The
> rule is that bits above the precision exist only if the precision is
> not a multiple of HOST_BITS_PER_WIDE_INT.  In that case, those bits
> are always an extension of the sign bit.
>
> max_wide_int and addr_wide_int have large enough precisions that it is
> impossible to ever generate an unsigned number on the target that is
> large enough to run up against the precision.  However, for the
> fixed-precision case, you can, and at least on the ppc, you do,
> generate unsigned numbers that are big enough to have to be
> recanonicalized like this.
>
> > inline wi::storage_ref
> > wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
> > 					unsigned int precision, const_tree x)
> > {
> >   unsigned int len = TREE_INT_CST_NUNITS (x);
> >   const HOST_WIDE_INT *val
> >     = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
> >   return wi::storage_ref (val, len, precision);
> > }
> >
> > not a valid implementation together with making sure that the
> > INTEGER_CST tree rep has that extra word of zeros if required?
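[Editor's sketch.] Kenny's rule can be illustrated with a toy model. The names below (`canonical_p`, `BLOCK_BITS`) are illustrative, not the real wide-int API, and an 8-bit block stands in for a HOST_WIDE_INT so that Richard's 0x00 0xff 0xff / precision-16 example is easy to write down:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the canonical representation rule, NOT the real wide-int
// API: blocks are 8-bit signed values, least-significant block first,
// standing in for HOST_WIDE_INTs.
static const unsigned BLOCK_BITS = 8;  // stands in for HOST_BITS_PER_WIDE_INT

// Is BLOCKS a canonical representation at the given PRECISION?
// Under the rule Kenny states: the value occupies at most
// ceil (precision / BLOCK_BITS) blocks, and if precision is not a
// multiple of BLOCK_BITS, the excess bits of the top block must be a
// sign extension of bit (precision - 1).
bool canonical_p (const std::vector<int8_t> &blocks, unsigned precision)
{
  unsigned blocks_needed = (precision + BLOCK_BITS - 1) / BLOCK_BITS;
  if (blocks.size () > blocks_needed)
    return false;  // e.g. 0x00 0xff 0xff at precision 16: one block too many
  unsigned small_prec = precision % BLOCK_BITS;
  if (small_prec && blocks.size () == blocks_needed)
    {
      // Sign-extend bit (precision - 1) through the excess bits and
      // check that the stored top block already looks like that.
      int8_t top = blocks.back ();
      int8_t ext = (int8_t) ((uint8_t) top << (BLOCK_BITS - small_prec));
      ext = (int8_t) (ext >> (BLOCK_BITS - small_prec));
      return top == ext;
    }
  return true;
}
```

On this reading, the zero-extended three-block form Richard asks about is not canonical at precision 16 (it needs a third block although the precision is a multiple of the block size), while the two-block sign-extended form is.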
> > I would like to see us move in that direction (given that the tree
> > rep of INTEGER_CST has transitioned to variable-length already).
> >
> > Btw, code such as
> >
> > tree
> > wide_int_to_tree (tree type, const wide_int_ref &pcst)
> > {
> > ...
> >   unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
> >   bool recanonize = sgn == UNSIGNED
> >     && small_prec
> >     && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;
> >
> > definitely needs a comment.  I would have thought _all_ unsigned
> > numbers need re-canonicalization.  Well, maybe only if we're
> > forcing the extra zeros.
>
> It will get a comment.
>
> > [This function shows another optimization issue:
> >
> >     case BOOLEAN_TYPE:
> >       /* Cache false or true.  */
> >       limit = 2;
> >       if (wi::leu_p (cst, 1))
> >         ix = cst.to_uhwi ();
> >
> > I would have expected cst <= 1 to be optimized to cst.len == 1
> > && cst.val[0] <= 1.  It expands to
> >
> > <L27>:
> >   MEM[(long int *)&D.50698 + 16B] = 1;
> >   MEM[(struct wide_int_ref_storage *)&D.50698]
> >     = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch;
> >   MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
> >   MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
> >   _277 = MEM[(const struct wide_int_storage *)&cst + 260B];
> >   if (_277 <= 64)
> >     goto <bb 42>;
> >   else
> >     goto <bb 43>;
> >
> > <bb 42>:
> >   xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
> >   _494 = MEM[(const long int *)&cst];
> >   _495 = (long unsigned int) _494;
> >   yl_496 = zext_hwi (_495, _277);
> >   _497 = xl_491 < yl_496;
> >   goto <bb 44>;
> >
> > <bb 43>:
> >   _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage
> >       *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage
> >       *)&cst].val, len_274, _277);
> >
> > this keeps D.50698 and cst un-SRAable; inline storage is problematic
> > for this reason.
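[Editor's sketch.] The `recanonize` condition Richard quotes can be restated in a small, self-contained form. The names (`needs_recanonicalization`, `HBPWI`) are illustrative; HOST_BITS_PER_WIDE_INT is modelled as 64. The predicate fires exactly when the value is unsigned, the precision has a partial top block, and the value actually occupies all of the blocks the precision needs, because only then can the top block hold zero-extended rather than sign-extended excess bits:

```cpp
#include <cassert>

// Illustrative restatement of the recanonize predicate from
// wide_int_to_tree, with HOST_BITS_PER_WIDE_INT modelled as 64.
static const unsigned HBPWI = 64;

bool needs_recanonicalization (bool is_unsigned, unsigned prec, unsigned len)
{
  unsigned small_prec = prec & (HBPWI - 1);            // partial top block?
  unsigned blocks_needed = (prec + HBPWI - 1) / HBPWI; // ceil (prec / HBPWI)
  // Unsigned, a partial top block exists, and the value reaches into it.
  return is_unsigned && small_prec && blocks_needed == len;
}
```

This is also why not _all_ unsigned numbers need the treatment: a shorter value (len < blocks_needed) never sets bits in the partial top block, and a precision that is a multiple of HBPWI has no excess bits at all.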
> > But the representation should guarantee that the compare with a
> > low-precision (32 bit) constant is evaluatable at compile-time if
> > len of the larger value is > 1, no?
> >
> > <bb 44>:
> >   # _504 = PHI <_497(42), _503(43)>
> >   D.50698 ={v} {CLOBBER};
> >   if (_504 != 0)
> >     goto <bb 45>;
> >   else
> >     goto <bb 46>;
> >
> > <bb 45>:
> >   pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B];
> >   goto <bb 229> (<L131>);
> >
> > <bb 46>:
> >   _65 = generic_wide_int<wide_int_storage>::to_uhwi (&cst, 0);
> >   ix_66 = (int) _65;
> >   goto <bb 91>;
> >
> > The question is whether we should try to optimize wide-int for
> > such cases or simply not use wi::leu_p (cst, 1) but rather
> >
> >   if (cst.fits_uhwi_p () && cst.to_uhwi () <= 1)
> >
> > ?
>
> I find this ugly, but I see where you are coming from.  The problem is
> that both you and I know that the len has to be 1, but the optimizer
> does not.  This is a case where I think we made a mistake getting rid
> of wi::one_p, wi::zero_p and wi::minus_one_p.  The internals of one_p
> were return (len == 1 && val[0] == 1), and I think that is much nicer
> than what you put there.  On the other hand, it seems that a person
> more skilled than I am with C++ could specialize the comparisons with
> an integer constant, since I believe that that constant must fit in
> one HWI (I am a little concerned about large unsigned constants).
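[Editor's sketch.] The len == 1 fast path behind one_p generalizes to a compare against any small constant, which is the specialization Kenny speculates about. The helper below is hypothetical (not the wide-int API) and assumes the canonical, sign-extended representation plus a constant whose top bit is clear:

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper, NOT the wide-int API.  BLOCKS is a canonical
// (sign-extended) value, least-significant 64-bit block first; C is a
// small non-negative constant with the top bit clear.
//
// In the canonical form, any value <= C is stored with len == 1, and a
// len == 1 block whose unsigned reading is huge is really a
// sign-extended "negative" pattern, i.e. a large unsigned value.  So an
// unsigned <= compare against C only ever needs the single-block case.
bool leu_small_const (std::initializer_list<int64_t> blocks, uint64_t c)
{
  return blocks.size () == 1 && (uint64_t) *blocks.begin () <= c;
}
```

Unlike one_p/zero_p, this pattern also catches Richard's "compare with 3" case, while still exposing the len == 1 test to the optimizer.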
one_p and zero_p wouldn't catch a compare with 3 ;)  So yes, if we
cannot trick the optimizers into doing the right thing, it looks like
we need something along the lines of

inline bool
wi::lts_p (const wide_int_ref &x, const wide_int_ref &y)
{
  if (x.precision <= HOST_BITS_PER_WIDE_INT
      && y.precision <= HOST_BITS_PER_WIDE_INT)
    return x.slow () < y.slow ();
  else if (x.precision <= HOST_BITS_PER_WIDE_INT)
    // why not len == 1?
    ... optimize ...
  else if (y.precision <= HOST_BITS_PER_WIDE_INT)
    ... optimize ...
  else
    return lts_p_large (x.val, x.len, x.precision,
			y.val, y.len, y.precision);
}

which of course pessimizes the case where the precisions are not known
at compile-time.  So maybe the above should always be

  if (__builtin_constant_p (x.precision)
      && x.precision <= HOST_BITS_PER_WIDE_INT)
    ....

and lts_p_large should handle the small-precision case efficiently
(but out-of-line?).

Richard.
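[Editor's sketch.] The dispatch pattern Richard proposes can be shown in isolation. `__builtin_constant_p` is a GCC/Clang extension that is true only when its argument folds to a compile-time constant, so the guarded fast path disappears entirely when the precision is unknown and folds to a plain compare when it is known. All names here (`lts_p`, `lts_fallback`, `HBPWI`) are illustrative, and the fallback stands in for lts_p_large:

```cpp
#include <cassert>
#include <cstdint>

static const unsigned HBPWI = 64;  // stands in for HOST_BITS_PER_WIDE_INT

// Out-of-line general case (stands in for lts_p_large); in Richard's
// proposal this would also handle small precisions efficiently.
__attribute__ ((noinline))
bool lts_fallback (int64_t x, int64_t y)
{
  return x < y;
}

// Signed less-than for single-block values at PRECISION bits.  When
// PRECISION is a compile-time constant <= HBPWI, the whole condition
// folds and only the inline compare remains; otherwise the branch
// itself folds away and we call the out-of-line routine directly.
inline bool lts_p (int64_t x, int64_t y, unsigned precision)
{
  if (__builtin_constant_p (precision) && precision <= HBPWI)
    return x < y;                // single-HWI fast path
  return lts_fallback (x, y);    // out-of-line general case
}
```

Both paths compute the same answer; the pattern only changes which one the compiler can inline and fold, which is why it avoids the pessimization of always testing the precision at run time.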