[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-29 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #33 from Richard Henderson  ---
Author: rth
Date: Thu Oct 29 18:36:39 2015
New Revision: 229550

URL: https://gcc.gnu.org/viewcvs?rev=229550=gcc=rev
Log:
Fix target/68124

PR target/68124
PR rtl-opt/67609
* config/i386/i386.c (ix86_cannot_change_mode_class): Tighten
sse check to the exact conditions of PR 67609.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #31 from Richard Biener  ---
(In reply to Richard Henderson from comment #28)
> (In reply to Richard Biener from comment #26)
> > Hmm, I don't see this documented anywhere.  In fact there is no such
> > thing as a "vector register", there are only vector modes.  And we
> > are using %xmm for plain SF/DFmode all over the place.
> > 
> > Note that in the particular case the mode we subreg is TImode,
> > not a vector mode.
> 
> You're right, my language was sloppy.  The problem I describe is going
> to be true for any register whose reg_raw_mode is larger than word_mode.
> 
> The assumption is that assignment to a word_mode subreg both (1) cannot
> affect bits outside the word_mode and (2) can be simplified to a plain
> hard register post-reload.
> 
> Part deux is impossible when reg_raw_mode is larger than word_mode,
> because the subreg-y assignment result is indistinguishable from a
> normal word_mode assignment.
> 
> > That may be a workaround for x86 but I don't see how this fixes the
> > issue in general given that targets may have general registers larger
> > than word_mode
> 
> It doesn't fix other targets, true.  But I don't see how to do that
> without changes to each target.

Drop (2)?  But that requires to touch every target as well I guess
(and can't be done incrementally).

If we go with your fix can you please amend the documentation to mention
this problem and suggest what targets need (not) to do to not run into
this problem?

> > (is x32 TARGET_64BIT?).
> 
> Yes.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-27 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #32 from Richard Henderson  ---
Author: rth
Date: Tue Oct 27 19:59:41 2015
New Revision: 229458

URL: https://gcc.gnu.org/viewcvs?rev=229458=gcc=rev
Log:
PR rtl-opt/67609

* config/i386/i386.c (ix86_cannot_change_mode_class): Disallow
narrowing subregs on SSE and MMX registers.
* doc/tm.texi.in (CANNOT_CHANGE_MODE_CLASS): Clarify when subregs that
appear to be sub-words of multi-register pseudos must be rejected.
* doc/tm.texi: Regenerate.
testsuite/
* gcc.target/i386/pr67609-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr67609-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/doc/tm.texi
trunk/gcc/doc/tm.texi.in
trunk/gcc/testsuite/ChangeLog


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-26 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #29 from Richard Henderson  ---
(In reply to Vladimir Makarov from comment #27)
> (In reply to Vladimir Makarov from comment #25)
> > So it would be nice to benchmark it.  I'll try to do this on
> > Friday.
> 
> Practically every SPEC2000 benchmark failed to compile with this patch.  GCC
> crashes in split2 pass with messages like this
> 
> Error: unrecognizable insn:
> (insn 381 345 67 3 (set (subreg:V2DF (reg:DF 22 xmm1 [287]) 0)
> (xor:V2DF (subreg:V2DF (reg:DF 22 xmm1 [287]) 0)
> (mem/u/c:V2DF (symbol_ref/u:DI ("*.LC6") [flags 0x2]) [2  S16
> A128])))

Ho hum.  Sorry, Vlad, if I'd bothered bootstrapping I'd have seen this myself.
Please change != to < in the patch to re-try.  (That is, allow the TO mode to
be wider than the FROM mode in order to support the paradoxical subregs seen
above.)


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-26 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #30 from Vladimir Makarov  ---
(In reply to Richard Henderson from comment #29)
>
> 
> Ho hum.  Sorry, Vlad, if I'd bothered bootstrapping I'd have seen this
> myself.
> Please change != to < in the patch to re-try.  (That is, allow the TO mode to
> be wider than the FROM mode in order to support the paradoxical subregs seen
> above.)

Thanks.  I've checked the patch on Haswell x86-64 SPEC2000 using -O3
-mtune=corei7.  The patch changes code of 3 out of 12 SPECInt tests and 9 of 14
SPECFP tests.  The code size always increases.  No surprise.  But the change is
insignificant (about 0.001%).  The biggest SPEC rate decrease is achieved on
SPECFP and is only about 0.15%.

So I guess the patch is ok to fix the problem.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-23 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #28 from Richard Henderson  ---
(In reply to Richard Biener from comment #26)
> Hmm, I don't see this documented anywhere.  In fact there is no such
> thing as a "vector register", there are only vector modes.  And we
> are using %xmm for plain SF/DFmode all over the place.
> 
> Note that in the particular case the mode we subreg is TImode,
> not a vector mode.

You're right, my language was sloppy.  The problem I describe is going
to be true for any register whose reg_raw_mode is larger than word_mode.

The assumption is that assignment to a word_mode subreg both (1) cannot
affect bits outside the word_mode and (2) can be simplified to a plain
hard register post-reload.

Part deux is impossible when reg_raw_mode is larger than word_mode,
because the subreg-y assignment result is indistinguishable from a
normal word_mode assignment.

> That may be a workaround for x86 but I don't see how this fixes the
> issue in general given that targets may have general registers larger
> than word_mode

It doesn't fix other targets, true.  But I don't see how to do that
without changes to each target.

> (is x32 TARGET_64BIT?).

Yes.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-23 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #27 from Vladimir Makarov  ---
(In reply to Vladimir Makarov from comment #25)
> So it would be nice to benchmark it.  I'll try to do this on
> Friday.

Practically every SPEC2000 benchmark failed to compile with this patch.  GCC
crashes in split2 pass with messages like this

Error: unrecognizable insn:
(insn 381 345 67 3 (set (subreg:V2DF (reg:DF 22 xmm1 [287]) 0)
(xor:V2DF (subreg:V2DF (reg:DF 22 xmm1 [287]) 0)
(mem/u/c:V2DF (symbol_ref/u:DI ("*.LC6") [flags 0x2]) [2  S16
A128]))) apsi.f:3324 -1
 (nil))
apsi.f:3336:0: internal compiler error: in extract_insn, at recog.c:2297
0x9b9848 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/rtl-error.c:109
0x9b9879 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/rtl-error.c:117
0x988907 extract_insn(rtx_insn*)
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:2297
0x988991 extract_insn_cached(rtx_insn*)
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:2188
0x7e3f5d cleanup_subreg_operands(rtx_insn*)
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/final.c:3112
0x98638c split_insn
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:2910
0x98ab77 split_all_insns()
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:2964
0x98abd8 rest_of_handle_split_after_reload
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:3902
0x98abd8 execute
/home/cygnus/vmakarov/build1/trunk3/gcc/gcc/recog.c:3931
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #26 from Richard Biener  ---
(In reply to Richard Henderson from comment #22)
> (In reply to Jeffrey A. Law from comment #21)
> > So going back to the original problem, for a subreg of a multi-word reg, 
> > why can't we simplify that down to a suitably sized reg?
> 
> Because we're dealing with registers of different sizes.
> 
> Assigning to a subreg as the low-part of a multi-word pseudo only
> makes sense when talking about general registers, which is the only
> place that "word_mode" applies.

Hmm, I don't see this documented anywhere.  In fact there is no such
thing as a "vector register", there are only vector modes.  And we
are using %xmm for plain SF/DFmode all over the place.

Note that in the particular case the mode we subreg is TImode,
not a vector mode.

> When talking about vector registers, which are universally larger
> than word-mode, we cannot simply assign to a subreg.
> 
> There is a vec_set named pattern that can perform an insertion into
> a vector element, like what's being demonstrated in the test source
> here.  Ideally that's how we'd have expanded this originally.

Indeed, if expand can see we are setting the low part of a vector
then it should try using vec_set.  Auditing of other targets might
be necessary here though.  And of course the i386 backend might
end up choosing movdf for this operation anyway...

> We already have ix86_cannot_change_mode_class to avoid the cases
> that we knew we couldn't support, e.g. QI and HImode loads/stores.
> But perhaps we should prevent all size-changing mode changes for
> the vector registers.

That may be a workaround for x86 but I don't see how this fixes the
issue in general given that targets may have general registers larger
than word_mode (is x32 TARGET_64BIT?).


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #21 from Jeffrey A. Law  ---
On 10/22/2015 07:40 AM, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609
>
> --- Comment #20 from Richard Biener  ---
> (In reply to Jeffrey A. Law from comment #19)
>> Preserving the upper part when setting the low part would be expressed via
>> STRICT_LOW_PART.  At least that's what I'd expect to see.
>
> According to documentation this is only needed for smaller-than-word subregs:
Yea, you're probably right.  Sigh.

So going back to the original problem, for a subreg of a multi-word reg, 
why can't we simplify that down to a suitably sized reg?

Jeff


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #18 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #16)
> reload has traditionally removed subregs of hardregs and passes after reload
> have depended on that behaviour.  Doing something similar in lra is
> obviously necessary.  In fact, subregs of multi-word hard regs isn't ever
> supposed to appear in the except during allocation & reloading.

But how do you preserve semantics of preserving the upper part then?
Do recognized insns never change after reload and thus the target needs
to make sure to have insns "separated" enough that the property is preserved
solely by means of the selected insn (we don't record the chosen alternatives
in the insns, do we?)?

> I'm not sure why final has another call to cleanup_subreg_operands.  While
> git blame blames me, I was just refactoring existing code code back in '98.
> 
> Does that shed any light on what the right behaviour for LRA ought to be?


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #22 from Richard Henderson  ---
(In reply to Jeffrey A. Law from comment #21)
> So going back to the original problem, for a subreg of a multi-word reg, 
> why can't we simplify that down to a suitably sized reg?

Because we're dealing with registers of different sizes.

Assigning to a subreg as the low-part of a multi-word pseudo only
makes sense when talking about general registers, which is the only
place that "word_mode" applies.

When talking about vector registers, which are universally larger
than word-mode, we cannot simply assign to a subreg.

There is a vec_set named pattern that can perform an insertion into
a vector element, like what's being demonstrated in the test source
here.  Ideally that's how we'd have expanded this originally.

We already have ix86_cannot_change_mode_class to avoid the cases
that we knew we couldn't support, e.g. QI and HImode loads/stores.
But perhaps we should prevent all size-changing mode changes for
the vector registers.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread rth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #23 from Richard Henderson  ---
Created attachment 36563
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36563=edit
possible patch

Certainly this fixes the executable test case from #c13.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #25 from Vladimir Makarov  ---
(In reply to Vladimir Makarov from comment #24)
> (In reply to Richard Henderson from comment #23)
> > Created attachment 36563 [details]
> > possible patch
> > 
> > Certainly this fixes the executable test case from #c13.
> 

> The patch will force any pseudo which is accessed as a subregister of DImode
> to be spilled.

Probably, I exaggerated with "any", but still the patch will result in more
loads/stores.  So it would be nice to benchmark it.  I'll try to do this on
Friday.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #24 from Vladimir Makarov  ---
(In reply to Richard Henderson from comment #23)
> Created attachment 36563 [details]
> possible patch
> 
> Certainly this fixes the executable test case from #c13.

This patch could be a solution to generate a correct code.  Unfortunately, it
generates ineffective code:

movdqa  reg(%rip), %xmm1
movaps  %xmm1, -24(%rsp)
movsd   %xmm0, -24(%rsp)
movapd  -24(%rsp), %xmm2
movaps  %xmm2, reg(%rip)

instead of better code which would be expected for this case

movdqa  reg(%rip), %xmm1
movlpd  %xmm0, %xmm1
movaps  %xmm1, reg(%rip)

The patch will force any pseudo which is accessed as a subregister of DImode to
be spilled.  Although I don't know how it will affect performance of SPEC
benchmarks.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #20 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #19)
> Preserving the upper part when setting the low part would be expressed via
> STRICT_LOW_PART.  At least that's what I'd expect to see.

According to documentation this is only needed for smaller-than-word subregs:

"When used as an lvalue, @code{subreg} is a word-based accessor.
Storing to a @code{subreg} modifies all the words of @var{reg} that
overlap the @code{subreg}, but it leaves the other words of @var{reg}
alone.

for the case in question, while strict-low-part:

"When storing to a normal @code{subreg} that is smaller than a word,
the other bits of the referenced word are usually left in an undefined
state.  This laxity makes it easier to generate efficient code for
such instructions.  To represent an instruction that preserves all the
bits outside of those in the @code{subreg}, use @code{strict_low_part}
or @code{zero_extract} around the @code{subreg}.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-22 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #19 from Jeffrey A. Law  ---
Preserving the upper part when setting the low part would be expressed via
STRICT_LOW_PART.  At least that's what I'd expect to see.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-21 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #14 from Vladimir Makarov  ---
(In reply to Uroš Bizjak from comment #13)
> The runtime version of the test still fails:
> 
> 
> gcc -O2 -pr67609.c
> 
> $ ./a.out
> Aborted
> 
> set_lower:
> .LFB518:
> movdqa  reg(%rip), %xmm1# 6 *movti_internal/4
> >>  movapd  %xmm0, %xmm1# 7 *movdf_internal/14
> movaps  %xmm1, reg(%rip)# 8 *movv2df_internal/3
> ret # 14simple_return_internal
> 
> Marked insn moves the whole register and clobbers the high word of the xmm1.

Yes, right.  The bug is not fixed yet.  Although it is not RA problem anymore,
I believe.

Before final we have

(insn:TI 7 6 8 2 (set (subreg:DF (reg/v:TI 22 xmm1 [orig:89 v ] [89]) 0)
(reg/v:DF 21 xmm0 [orig:90 b ] [90])) b3.c:7 126 {*movdf_internal}
 (expr_list:REG_DEAD (reg/v:DF 21 xmm0 [orig:90 b ] [90])
(nil)))

How can the final pass use movapd for this?  RTL semantics say high part of
xmm1 should be not changed.

I only can guess.  That final pass removes the subreg first as it was done in
LRA earlier.

I see two solutions.  Prevent the final remove the subreg first and generate
corresponding insn (most probably needs some additions to i386.md).  Or prevent
use movapd in *movdf_internal using movlpd instead (by the way it will also
solve other bugs like 67124).

The first solution is less safe as it may affect all targets.  Although it
could be implemented in a safe way: remove the subreg only if there is no insn
definition with subreg.  But I am not a specialist in writing md files to be
sure (e.g. how to treat insn as *movdf_internal on all passes and only as insn
with subreg on the final pass).

[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-21 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #17 from Vladimir Makarov  ---
(In reply to Jeffrey A. Law from comment #16)
> reload has traditionally removed subregs of hardregs and passes after reload
> have depended on that behaviour.  Doing something similar in lra is
> obviously necessary.  In fact, subregs of multi-word hard regs isn't ever
> supposed to appear in the except during allocation & reloading.
> 
> I'm not sure why final has another call to cleanup_subreg_operands.  While
> git blame blames me, I was just refactoring existing code code back in '98.
> 
> Does that shed any light on what the right behaviour for LRA ought to be?

LRA before the patch did the same as reload.  This case shows that still some
subregs should stay after reload otherwise we have wrong transformations (as
making insn setting high part of the register a dead insn).

Of course, some optimizations can not deal with subregs of multi-regs.  The
patch avoids keeping such subregs.  I also found x86 reg-stack pass can not
deal with any subregisters of stack fp regs.

In any case I spent a lot of time for this small patch which works for at least
5 tested targets.

I guess what we need more is to make final pass (at least for x86-64) to deal
with the rest of subregs in a right way.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

Uroš Bizjak  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #15 from Uroš Bizjak  ---
(In reply to Vladimir Makarov from comment #14)

> The first solution is less safe as it may affect all targets.  Although it
> could be implemented in a safe way: remove the subreg only if there is no
> insn definition with subreg.  But I am not a specialist in writing md files
> to be sure (e.g. how to treat insn as *movdf_internal on all passes and only
> as insn with subreg on the final pass).

I think that RTL infrastructure should be fixed/enhanced first to allow proper
handling of subregs through all passes in a consistent way. There is no point
in a special workaround, applicable to only one target, as the same problem
will trigger also for other targets.

Adding Jeff to CC.

[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-21 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #16 from Jeffrey A. Law  ---
reload has traditionally removed subregs of hardregs and passes after reload
have depended on that behaviour.  Doing something similar in lra is obviously
necessary.  In fact, subregs of multi-word hard regs isn't ever supposed to
appear in the except during allocation & reloading.

I'm not sure why final has another call to cleanup_subreg_operands.  While git
blame blames me, I was just refactoring existing code code back in '98.

Does that shed any light on what the right behaviour for LRA ought to be?


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #13 from Uroš Bizjak  ---
The runtime version of the test still fails:

--cut here--
#include 

__m128d reg = { 2.0, 4.0 };

void
__attribute__((noinline))
set_lower (double b)
{
  double v[2];
  _mm_store_pd(v, reg);
  v[0] = b;
  reg = _mm_load_pd(v);
}

int
main ()
{
  set_lower (6.0);

  if (reg[1] != 4.0)
abort ();

  return 0;
}
--cut here--

gcc -O2 -pr67609.c

$ ./a.out
Aborted

set_lower:
.LFB518:
movdqa  reg(%rip), %xmm1# 6 *movti_internal/4
>>  movapd  %xmm0, %xmm1# 7 *movdf_internal/14
movaps  %xmm1, reg(%rip)# 8 *movv2df_internal/3
ret # 14simple_return_internal

Marked insn moves the whole register and clobbers the high word of the xmm1.

[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #12 from Uroš Bizjak  ---
Unfortunately, the patch doesn't fix similar PR67124 and (dup) PR68011.

[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-20 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #10 from Vladimir Makarov  ---
Author: vmakarov
Date: Tue Oct 20 16:26:05 2015
New Revision: 229087

URL: https://gcc.gnu.org/viewcvs?rev=229087=gcc=rev
Log:
2015-10-20  Vladimir Makarov  

PR rtl-optimization/67609
* lra-splill.c (lra_final_code_change): Don't remove all
sub-registers.

2015-10-20  Vladimir Makarov  

PR rtl-optimization/67609
* gcc.target/i386/pr67609.c: New.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr67609.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-spills.c
trunk/gcc/testsuite/ChangeLog


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-20 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #11 from Vladimir Makarov  ---
I've committed the patch into the trunk.  As the patch is not trivial, I'd wait
for a week before committing it into gcc-5-branch to see how it is doing on the
trunk first.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-16 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #8 from Vladimir Makarov  ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Richard Biener from comment #4)
> > (In reply to Uroš Bizjak from comment #3)
> > > The doc says:
> > > 
> > >   When used as an lvalue, 'subreg' is a word-based accessor.
> > >   Storing to a 'subreg' modifies all the words of REG that
> > >   overlap the 'subreg', but it leaves the other words of REG
> > >   alone.
> > 
> > But UNITS_PER_WORD is 8 so (subreg:DF (TI)) should leave the upper half
> > of the TImode register unchanged.
> 
> Indeed, and -m32 creates correct code. So, it is register allocator that
> fails.
> 
> Reconfirmed as rtl-optimization problem.

It is a quite interesting PR which reveals a long lasting latent bug in GCC.

Basically we have before LRA

2: r90:DF=xmm0:DF
  REG_DEAD xmm0:DF
3: NOTE_INSN_FUNCTION_BEG
6: r89:TI=[`reg']
7: r89:TI#0=r90:DF
  REG_DEAD r90:DF
8: [`reg']=r89:TI#0

LRA and reload pass produces

6: xmm1:TI=[`reg']
7: xmm1:DF=xmm0:DF
8: [`reg']=xmm1:V2DF

They does not do any transformations except transforming subreg of hard
register in insn #7.  And after that insn #6 is removed as a dead one by
subsequent optimizations.  In order to avoid removing insn #6 we need to keep
the subreg until the final pass:

7: xmm1:TI#0=xmm0:DF

Why do LRA and reload remove subregs of hard registers? That is because some
subsequent optimizations can handle them.

Last two days I've been struggling to find solution which involves only LRA
(partial removing subreg of hard regs) but still failing.

In any case, even if I find such solution in LRA, it needs extensive testing on
other targets and probably it will be ready next week at the best.

[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-10-16 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

--- Comment #9 from Vladimir Makarov  ---
(In reply to Vladimir Makarov from comment #8)
> 
> Why do LRA and reload remove subregs of hard registers? That is because some
> subsequent optimizations can handle them.
> 

Sorry, it should be *can not handle*.

Basically keeping subregisters crashes other passes on GCC bootstrap.


[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd

2015-09-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

Uroš Bizjak  changed:

   What|Removed |Added

  Component|target  |rtl-optimization

--- Comment #7 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #4)
> (In reply to Uroš Bizjak from comment #3)
> > The doc says:
> > 
> >   When used as an lvalue, 'subreg' is a word-based accessor.
> >   Storing to a 'subreg' modifies all the words of REG that
> >   overlap the 'subreg', but it leaves the other words of REG
> >   alone.
> 
> But UNITS_PER_WORD is 8 so (subreg:DF (TI)) should leave the upper half
> of the TImode register unchanged.

Indeed, and -m32 creates correct code. So, it is register allocator that fails.

Reconfirmed as rtl-optimization problem.