On 11/04/2013 04:12 AM, Richard Biener wrote:
On Fri, 1 Nov 2013, Richard Sandiford wrote:

I'm building one target for each supported CPU and comparing the wide-int
assembly output of gcc.c-torture, gcc.dg and g++.dg with the corresponding
output from the merge point.  This patch removes all the differences I saw
for alpha-linux-gnu in gcc.c-torture.

Hunk 1: Preserve the current trunk behaviour in which the shift count
is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.  This was by
inspection after hunk 5.

Hunks 2 and 3: Two cases where force_fit_to_type could extend to wider
types and where we weren't extending according to the sign of the source.
We should probably assert that the input is at least as wide as the type...

Hunk 4: The "&" in:

              if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
                  && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)

had got dropped during the conversion.

Hunk 5: The old code was:

          if (shift > 0)
            {
              *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
              *val = r1val.llshift (shift, TYPE_PRECISION (type));
            }
          else if (shift < 0)
            {
              shift = -shift;
              *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
              *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
            }

and these precision arguments had two purposes: to control which
bits of the first argument were shifted, and to control the truncation
mask for SHIFT_TRUNCATED.  We need to pass a width to the shift functions
for the second.

(BTW, I'm running the comparisons with CONST_WIDE_INT locally moved to the
end of the !GENERATOR_FILE list in rtl.def, since the current position caused
some spurious differences.  The "problem" AFAICT is that hash_rtx hashes
on code, RTL PRE creates registers in the hash order of the associated
expressions, RA uses register numbers as a tie-breaker during ordering,
and so the order of rtx_code can indirectly influence register allocation.
First time I'd realised that could happen, so just thought I'd mention it.
I think we should keep rtl.def in the current (logical) order though.)

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for wide-int?
Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
from double-int.c on the trunk?  If not then putting this at the
callers of wi::rshift and friends is clearly asking for future
mistakes.

Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
than in machine instruction patterns was a mistake in the past.
I disagree. The programmer has the right to expect that the optimizer transforms the program in a manner that is consistent, but faster, with the underlying machine. Saying you are going to do it in the patterns is not realistic.

I should point out that i had originally put the shift count truncated inside wide-int and you (richi) told me to push it back out to the callers. I personally think that it should be returned to wide-int to assure consistent behavior.

However, i would also say that it would be nice if were actually done right. There are (as far as i know) actually 3 ways handling shift amounts that have been used in the past:

(1) mask the shift amount by (bitsize - 1). This is the most popular and is what is done if you set SHIFT_COUNT_TRUNCATED. (2) mask the shift amount by ((2*bitsize) -1). There are a couple of architectures that do this including the ppc. However, gcc has never done it correctly for these platforms. (3) assume that the shift amount can be a big number. The only machine that i know of that ever did this was the Knuth's mmix, and AFAIK, it has never "known" silicon. (it turns out that this is almost impossible to build efficiently). And yet, we support this if you do not set SHIFT_COUNT_TRUNCATED. I will, if you like, fix this properly. But i want to see a plan that at least richi and richard are happy with first. My gut feeling is to provide a target hook that takes a bitsize and a shift value as wide-int (or possibly a hwi) and returns a machine specific shift amount. Then i would put that inside of the shift routines of wide-int.

kenny





Reply via email to