The case comes from spec2006 403.gcc (or old GCC itself).
for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
{
vd->e[i].mode = VOIDmode;
vd->e[i].oldest_regno = i;
vd->e[i].next_regno = INVALID_REGNUM;
}
It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.
Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.
First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:
d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)
Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)
Below is rtl dump of buggy case:
(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
(const_vector:V4SI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])) ../regrename.c:1171 -1
(nil))
(insn 206 205 208 (set (reg:V4SI 769)
(mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3 S16
A128])) ../regrename.c:1171 -1
(expr_list:REG_EQUAL (const_vector:V4SI [
(const_int 32 [0x20])
(const_int 33 [0x21])
(const_int 34 [0x22])
(const_int 35 [0x23])
])
(nil)))
(insn 208 206 207 (set (reg:V4SI 770)
(vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
(reg:V4SI 769))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) ../regrename.c:1171 -1
(nil))
(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 770)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 2 [0x2])
]))) ../regrename.c:1171 -1
(nil))
(insn 209 207 210 (set (reg:V4SI 771)
(const_vector:V4SI [
(const_int -1 [0xffffffffffffffff])
(const_int -1 [0xffffffffffffffff])
(const_int -1 [0xffffffffffffffff])
(const_int -1 [0xffffffffffffffff])
])) ../regrename.c:1171 -1
(nil))
(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
(vec_select:V4SI (reg:V4SI 771)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 0 [0])
(const_int 3 [0x3])
]))) ../regrename.c:1171 -1
(nil))
(insn 211 210 212 (set (reg:V8HI 772)
(vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
(const_int 48 [0x30]))) ../regrename.c:1171 -1
(nil))
...
On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <[email protected]> wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <[email protected]> wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09 Evgeny Stupachenko <[email protected]>
>>>
>>> gcc/
>>> * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>> expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d
>>> *d)
>>> dcopy.op0 = dcopy.op1 = d->op1;
>>> else
>>> dcopy.op0 = dcopy.op1 = d->op0;
>>> + dcopy.target = gen_reg_rtx (vmode);
>>> dcopy.one_operand_p = true;
>>>
>>> for (i = 0; i < nelt; ++i)