[Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE

2009-03-08 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2009-03-08 14:25 ---
This is a known problem... Indeed when Zdenek introduced predictive-commoning
there was a discussion on whether to schedule it before or after vectorization.
AFAIR, it ended up getting scheduled before the vectorizer just because this
happened to be what Zdenek tested/experimented with, and he didn't have a
problem with scheduling it after vectorization as long as it didn't hurt
performance (of mgrid in particular). Here are related threads:
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01383.html
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg00555.html
http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00571.html

Regardless of whether we schedule predcom after vectorization, it will still
be useful to teach the vectorizer to handle such dependence patterns, as they
may (and do) appear in the source code.
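
To make the dependence pattern concrete, here is a minimal sketch in C. It is
not the testcase from this PR; the function names and the loop are
hypothetical. The first function loads b[i] and b[i + 1] independently, the
second carries the value in a scalar across iterations, which is roughly the
shape predictive commoning produces (and which people also write by hand).

```c
#include <assert.h>
#include <stddef.h>

/* Original form: each iteration loads b[i] and b[i + 1] on its own.
   b must have at least n + 1 elements. */
void sum_neighbours(const int *b, int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + b[i + 1];
}

/* Commoned form: the value loaded as b[i + 1] is kept in a scalar and
   reused as b[i] in the next iteration.  `prev` is now a loop-carried
   scalar dependence that the vectorizer has to understand. */
void sum_neighbours_predcom(const int *b, int *a, size_t n)
{
    if (n == 0)
        return;
    int prev = b[0];
    for (size_t i = 0; i < n; i++) {
        int next = b[i + 1];
        a[i] = prev + next;
        prev = next;
    }
}

/* Both forms compute the same result on a small input. */
void check_predcom_forms(void)
{
    const int b[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int a1[8], a2[8];
    sum_neighbours(b, a1, 8);
    sum_neighbours_predcom(b, a2, 8);
    for (size_t i = 0; i < 8; i++)
        assert(a1[i] == a2[i]);
}
```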


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300



[Bug tree-optimization/39068] signed short plus and signed char plus not vectorized

2009-02-01 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2009-02-01 21:06 ---
(This reminds me of a couple of missed-optimization PRs where vectorization
also fails due to casts, PR31873 and PR26128. I don't know if this is related.)
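
For reference, a small sketch of where casts come from in a plain short
addition. This assumes the casts in question are the implicit integer
promotions of C; the function names are made up for illustration and the PR's
actual testcase is not reproduced here.

```c
#include <assert.h>

/* What the programmer writes. */
void add_shorts(const short *a, const short *b, short *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* What the promotion rules turn it into: both operands are widened to
   int, added as int, and the result is narrowed back to short. */
void add_shorts_explicit(const short *a, const short *b, short *c, int n)
{
    for (int i = 0; i < n; i++) {
        int wide = (int)a[i] + (int)b[i];
        c[i] = (short)wide;
    }
}

/* Both forms agree for values whose sum fits in a short. */
void check_add_shorts(void)
{
    const short a[3] = {100, -200, 300};
    const short b[3] = {1, 2, 3};
    short c1[3], c2[3];
    add_shorts(a, b, c1, 3);
    add_shorts_explicit(a, b, c2, 3);
    for (int i = 0; i < 3; i++)
        assert(c1[i] == c2[i]);
}
```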


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068



[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized

2009-01-27 Thread dorit at gcc dot gnu dot org


--- Comment #9 from dorit at gcc dot gnu dot org  2009-01-27 12:40 ---
(In reply to comment #4)
 The testcase should be
 subroutine to_product_of(self,a,b,a1,a2)
   complex(kind=8) :: self (:)
   complex(kind=8), intent(in) :: a(:,:)
   complex(kind=8), intent(in) :: b(:)
   integer a1,a2
   do i = 1,a1
     do j = 1,a2
       self(i) = self(i) + a(j,i)*b(j)
     end do
   end do
 end subroutine
 to be meaningful - otherwise we are accessing a in non-continuous ways in the
 inner loop which would prevent vectorization.

This change from a(i,j) to a(j,i) is not required if we try to vectorize the
outer-loop, where the stride is 1. That is also a better way to vectorize the
reduction. There are a few limitations along the way, though:

1) We somehow need to keep gcc from creating guard code around the innermost
loop to check that it executes more than zero iterations, because this creates
a complicated control flow structure within the outer-loop. For now you have to
have a constant number of iterations for the inner-loop because of that, or
insert a statement like
if (a2=0) return; before the loop...

2) Use -fno-tree-sink, because otherwise it moves the loop iv increment to the
latch block, and the vectorizer likes to have the latch block empty...

(see also PR33113 for related reference).
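
A rough C rendering of the loop nest with the early return from (1), just to
show the shape. This is a sketch under assumptions: it uses real doubles
instead of complex values, the names n_outer/n_inner are mine, the guard is
written as "<= 0", and whether the guard code really disappears depends on
the compiler version.

```c
#include <assert.h>

/* self[i] += sum over j of a(i,j) * b[j].  The array a is laid out so that
   a(i,j) lives at a[j * n_outer + i], i.e. consecutive values of the
   outer index i are adjacent in memory (stride 1 for the outer loop). */
void to_product_of(double *self, const double *a, const double *b,
                   int n_outer, int n_inner)
{
    /* Early exit: after this the inner loop is known to run at least
       once, so no zero-trip guard is needed inside the outer loop. */
    if (n_inner <= 0)
        return;

    for (int i = 0; i < n_outer; i++) {
        double acc = self[i];
        for (int j = 0; j < n_inner; j++)
            acc += a[j * n_outer + i] * b[j];
        self[i] = acc;
    }
}

/* Small self-check with exactly representable values. */
void check_to_product_of(void)
{
    double self[2] = {1.0, 2.0};
    const double a[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    const double b[3] = {1.0, 1.0, 1.0};
    to_product_of(self, a, b, 2, 3);
    assert(self[0] == 10.0);   /* 1 + (1 + 3 + 5) */
    assert(self[1] == 14.0);   /* 2 + (2 + 4 + 6) */
}
```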


 With the versioning for stride == 1 I get then
 .L13:
     movupd    16(%rax), %xmm1
     movupd    (%rax), %xmm3
     incl      %ecx
     movupd    (%rdx), %xmm4
     addq      $32, %rax
     movapd    %xmm3, %xmm0
     unpckhpd  %xmm1, %xmm3
     unpcklpd  %xmm1, %xmm0
     movupd    16(%rdx), %xmm1
     movapd    %xmm4, %xmm2
     addq      $32, %rdx
     movapd    %xmm3, %xmm9
     cmpl      %ecx, %r8d
     unpcklpd  %xmm1, %xmm2
     unpckhpd  %xmm1, %xmm4
     movapd    %xmm4, %xmm1
     movapd    %xmm2, %xmm4
     mulpd     %xmm1, %xmm9
     mulpd     %xmm0, %xmm4
     mulpd     %xmm3, %xmm2
     mulpd     %xmm1, %xmm0
     subpd     %xmm9, %xmm4
     addpd     %xmm2, %xmm0
     addpd     %xmm4, %xmm6
     addpd     %xmm0, %xmm5
     ja        .L13
     haddpd    %xmm5, %xmm5
     cmpl      %r15d, %edi
     movl      -4(%rsp), %ecx
     haddpd    %xmm6, %xmm6
     addsd     %xmm5, %xmm8
     addsd     %xmm6, %xmm7
     jne       .L12
     jmp       .L14
 for the innermost loop, followed by a tail loop (peel for niters).  This is
 about 15% faster on AMD K10 than the non-vectorized loop (if you disable
 the cost-model and make sure to have enough iterations in the inner loop
 to pay back for the extra guarding conditions).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021



[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant

2009-01-27 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2009-01-27 12:46 ---
related testcase/PR: PR37021 
and related discussion: http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01322.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113



[Bug tree-optimization/37692] New: [alias-improvements-branch] can't alias fortran function arguments

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in testcases gfortran.dg/vect/vect-[2,3,4].f90 - 
On the alias branch we can't tell that subroutine arguments don't alias. e.g.,
X,Y in SUBROUTINE SAXPY(X,Y,A).
As a result the vectorizer applies loop-versioning with runtime aliasing test,
which also means it will handle misalignment using versioning instead of
peeling:


versioning for alias required: can't determine dependence between
(*x_32(D))[D.1518_28] and (*y_29(D))[D.1518_28]
vect-3.f90:6: note: mark for run-time aliasing test between
(*x_32(D))[D.1518_28] and (*y_29(D))[D.1518_28]
...
vect-3.f90:6: note: Alignment of access forced using versioning.
vect-3.f90:6: note: Versioning for alignment will be applied.
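
For readers unfamiliar with versioning for alias, here is a hand-written C
sketch of what the transformation amounts to. It is not the code gcc
generates; the helper ranges_disjoint, the choice of x as the written array,
and the assumption n >= 0 are all mine.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime aliasing test: returns 1 when the n-element ranges starting at
   x and y do not overlap.  Assumes n >= 0. */
int ranges_disjoint(const float *x, const float *y, int n)
{
    uintptr_t xs = (uintptr_t)x;
    uintptr_t ys = (uintptr_t)y;
    uintptr_t bytes = (uintptr_t)n * sizeof(float);
    return xs + bytes <= ys || ys + bytes <= xs;
}

/* Two versions of the same loop, selected by the runtime test. */
void saxpy_versioned(float *x, const float *y, float a, int n)
{
    if (ranges_disjoint(x, y, n)) {
        /* No overlap: this copy is the one that can be vectorized. */
        for (int i = 0; i < n; i++)
            x[i] = x[i] + a * y[i];
    } else {
        /* Possible overlap: this copy stays scalar and in order. */
        for (int i = 0; i < n; i++)
            x[i] = x[i] + a * y[i];
    }
}

/* Small self-check with exactly representable values. */
void check_saxpy_versioned(void)
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float y[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    saxpy_versioned(x, y, 2.0f, 4);
    assert(x[0] == 3.0f && x[3] == 6.0f);
}
```

If the compiler knows the arguments cannot alias (as it should for Fortran
dummy arguments), the test and the second copy of the loop are unnecessary.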



-- 
           Summary: [alias-improvements-branch] can't alias fortran function
                    arguments
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: i386-linux
  GCC host triplet: i386-linux
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37692



[Bug tree-optimization/37693] New: [alias-improvements-branch] can't prove non-zero number of iterations

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in testcase gfortran.dg/vect/pr32377.f90:
On the alias branch we can't prove that the number of iterations is non-zero:

Analyzing # of iterations of loop 1
  exit condition [2, + , 1](no_overflow)  D.1554_60
  bounds on difference of bases: -2147483650 ... 2147483645
  result:
zero if D.1554_60 = 1
# of iterations (character(kind=4)) D.1554_60 + 0x0fffe, bounded by
2147483645
  (set_nb_iterations_in_loop = scev_not_known))
(get_loop_exit_condition
  if (D.1554_60 = S.10_78)
)

pr32377.f90:9: note: not vectorized: number of iterations cannot be computed.
pr32377.f90:9: note: bad loop form.
pr32377.f90:4: note: vectorized 0 loops in function.

Using mainline we have:

Analyzing # of iterations of loop 1
  exit condition [2, + , 1](no_overflow)  D.1416_112
  bounds on difference of bases: 0 ... 2147483645
  result:
# of iterations (character(kind=4)) D.1416_112 + 0x0fffe, bounded by
2147483645
  (set_nb_iterations_in_loop = (character(kind=4)) D.1416_112 + 0x0fffe))

pr32377.f90:9: note: == get_loop_niters:(character(kind=4)) D.1416_112 +
0x0(get_loop_exit_condition
  if (S.10_78 = D.1416_112)
)

pr32377.f90:9: note: Symbolic number of iterations is (character(kind=4))
D.1416_112 + 0x0


-- 
           Summary: [alias-improvements-branch] can't prove non-zero number
                    of iterations
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37693



[Bug tree-optimization/37694] New: [alias-improvements-branch] can't alias (restrict) function-pointer (read) and local array (write)

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in testcases gcc.dg/vect/no-scevccp-outer-6.c and
gcc.dg/vect/vect-multitypes-6.c:

On the alias branch we can't tell that a read through a (restrict) pointer
(which is a function argument) does not overlap with a write to a local array.
As a result we try to vectorize the loop using loop-versioning controlled by a
run-time aliasing test.

In no-scevccp-outer-6.c this capability is not yet supported for outer-loops so
we can't vectorize the outer-loop (the inner loop does get vectorized).

In vect-multitypes-6.c there are too many runtime checks required, so we bail
out:
 
 === vect_prune_runtime_alias_test_list ===
 vect-multitypes-6.c:34: note: disable versioning for alias - max number of
 generated checks exceeded.
 vect-multitypes-6.c:34: note: too long list of versioning for alias
 run-time tests.
 
(with --param vect-max-version-for-alias-checks=20 we do vectorize the loop).


-- 
           Summary: [alias-improvements-branch] can't alias (restrict)
                    function-pointer (read) and local array (write)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37694



[Bug tree-optimization/37695] New: [alias-improvements-branch] can't alias a restrict pointer write and a local array read

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in gcc.dg/vect/vect-42.c:

On the alias branch we can't tell that a write through a restrict pointer
(which is a function argument) does not overlap with reads from local
arrays. As a result we vectorize using loop-versioning controlled by a
run-time aliasing test. This in turn forces us to handle misalignment using
loop-versioning (rather than peeling, because right now we don't support
peeling combined with versioning, and these are the only ways we currently
support misaligned stores). Without the aliasing problem, the loop is
vectorized using peeling to align the store.
  
 === vect_analyze_dependences ===
 vect-42.c:36: note: versioning for alias required: can't determine
 dependence between pb[i_59] and *D.2074_6
 vect-42.c:36: note: mark for run-time aliasing test between pb[i_59] and
 *D.2074_6
 vect-42.c:36: note: versioning for alias required: can't determine
 dependence between pc[i_59] and *D.2074_6
 vect-42.c:36: note: mark for run-time aliasing test between pc[i_59] and
 *D.2074_6
 ...
 vect-42.c:36: note: === vect_enhance_data_refs_alignment ===
 vect-42.c:36: note: Unknown misalignment, is_packed = 0
 vect-42.c:36: note: Alignment of access forced using versioning.
 vect-42.c:36: note: Versioning for alignment will be applied.
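
For comparison, peeling for alignment looks roughly like the hand-written
sketch below. It is only an illustration, not the vectorizer's output: the
16-byte vector size, 4-byte int, the helper peel_count and the function names
are assumptions, and the "main loop" is left scalar here.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BYTES 16   /* assumed vector size in bytes */

/* Number of scalar iterations to peel so that &p[peel] is aligned to
   VEC_BYTES.  Assumes p is int-aligned and sizeof(int) == 4. */
int peel_count(const int *p, int n)
{
    uintptr_t mis = (uintptr_t)p % VEC_BYTES;
    int peel = mis ? (int)((VEC_BYTES - mis) / sizeof(int)) : 0;
    return peel < n ? peel : n;
}

void add_peeled(int *restrict pa, const int *pb, const int *pc, int n)
{
    int i = 0;
    int peel = peel_count(pa, n);

    /* Prologue: scalar iterations until the store to pa[i] is aligned. */
    for (; i < peel; i++)
        pa[i] = pb[i] + pc[i];

    /* Main loop: the store address is aligned from here on; this is the
       part that would become aligned vector stores. */
    for (; i < n; i++)
        pa[i] = pb[i] + pc[i];
}

/* Small self-check: the result does not depend on the peel count. */
void check_add_peeled(void)
{
    int out[9] = {0};
    const int pb[4] = {1, 2, 3, 4};
    const int pc[4] = {10, 20, 30, 40};
    add_peeled(out + 1, pb, pc, 4);
    assert(out[1] == 11 && out[2] == 22 && out[3] == 33 && out[4] == 44);
}
```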
 


-- 
           Summary: [alias-improvements-branch] can't alias a restrict
                    pointer write and a local array read
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37695



[Bug tree-optimization/37698] New: [alias-improvements-branch] pre makes latch-block non-empty

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in testcase gcc.dg/vect/vect-62.c:

It looks like pre is more powerful on the alias branch: it moves the load into
the latch block. As a result the latch block is not empty, and we fail to
vectorize (with -fno-tree-pre vectorization succeeds).

Related non-empty-latch PRs that prevent vectorization: PR28643, PR33447
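
To show what "empty latch" means, here is a sketch with the control flow
spelled out using gotos. It is not vect-62.c and not real GIMPLE; the
function names and the trivial copy loop are made up, and the second function
only mimics the shape of the code after the load has been moved.

```c
#include <assert.h>

/* Empty latch: the block on the back edge does nothing but jump back. */
void copy_empty_latch(const int *in, int *out, int n)
{
    int i = 0;
    if (n <= 0)
        return;
header:
    out[i] = in[i];      /* load and store both live in the loop body */
    i++;
    if (i >= n)
        goto done;
    goto header;         /* latch: only the jump */
done:
    return;
}

/* Non-empty latch: the load for the next iteration now sits on the back
   edge, which is the shape the vectorizer rejects. */
void copy_load_in_latch(const int *in, int *out, int n)
{
    int i = 0;
    int x;
    if (n <= 0)
        return;
    x = in[0];           /* first load hoisted in front of the loop */
header:
    out[i] = x;
    i++;
    if (i >= n)
        goto done;
    x = in[i];           /* latch: load for the next iteration */
    goto header;
done:
    return;
}

/* Both shapes compute the same result. */
void check_latch_forms(void)
{
    const int in[3] = {5, 6, 7};
    int o1[3], o2[3];
    copy_empty_latch(in, o1, 3);
    copy_load_in_latch(in, o2, 3);
    for (int i = 0; i < 3; i++)
        assert(o1[i] == o2[i]);
}
```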


-- 
           Summary: [alias-improvements-branch] pre makes latch-block
                    non-empty
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37698



[Bug tree-optimization/37699] New: [alias-improvements-branch] can't alias ptr and local array

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in gcc.dg/vect/vect-96.c and gcc.dg/vect/no-vfa-vect-43.c.

In the first, we can't distinguish between a write through a (local) pointer to
a global array (which is a field in a struct), and a read from a local array.
As a result we vectorize the loop using loop-versioning controlled by a
run-time aliasing test, which also means we'll use versioning instead of
peeling to align a misaligned store.

In the second, we can't tell that reads through a pointer (which is a function
argument) do not overlap with a write to a local array. As a result we try to
vectorize the loop using loop-versioning controlled by a run-time aliasing
test; however, this testcase does not allow that (--param
vect-max-version-for-alias-checks=0), so vectorization fails.


-- 
           Summary: [alias-improvements-branch] can't alias ptr and local
                    array
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37699



[Bug tree-optimization/37700] New: [alias-improvements-branch] redundant load doesn't get eliminated

2008-10-01 Thread dorit at gcc dot gnu dot org
This happens in testcase gcc.dg/vect/slp-19.c:

The problem is with the loop at line 17: with trunk we detect that one of
the elements of array 'in' is read twice, so we generate overall 8 loads
(reusing one of them). On the alias branch we do not eliminate the extra
load. All the reads and writes are from/to local arrays, by the way. This
results in 9 loads, which the vectorizer interprets as a complicated SLP
permutation, so instead it is vectorized across iterations rather than
using SLP:
 
 slp-19.c:17: note: Load permutation 0 1 2 4 5 6 7 8
 slp-19.c:17: note: Build SLP failed: unsupported load permutation out
 [D.2646_11] = D.2647_12;
 


-- 
           Summary: [alias-improvements-branch] redundant load doesn't get
                    eliminated
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dorit at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37700



[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC

2008-09-26 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2008-09-26 06:29 ---
Subject: Bug 37574

Author: dorit
Date: Fri Sep 26 06:28:01 2008
New Revision: 140685

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140685
Log:
PR tree-optimization/37574
* tree-vectorizer.c (vect_is_simple_use): Fix indentation.
* tree-vect-transform.c (vect_get_constant_vectors): Use vectype
instead of vector_type for constants. Take computation out of loop.
(vect_get_vec_def_for_operand): Use only vectype for constant case,
and use only vector_type for invariant case.
(get_initial_def_for_reduction): Use vectype instead of vector_type.


Added:
trunk/gcc/testsuite/gcc.dg/vect/ggc-pr37574.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/vect/vect.exp
trunk/gcc/tree-vect-transform.c
trunk/gcc/tree-vectorizer.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574



[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC

2008-09-21 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-09-19 14:12:43         |2008-09-21 13:17:55
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574



[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC

2008-09-21 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2008-09-21 13:18 ---
happens during outer-loop vectorization. I'm looking into it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574



[Bug tree-optimization/37194] Autovectorization of small constant iteration loop degrades performance

2008-08-22 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2008-08-22 13:31 ---
(In reply to comment #2)
 The x86_64 generated code looks like
...
 I wonder why we do not use movups instead.
 t.i:3: note: Alignment of access forced using peeling.
 t.i:3: note: Peeling for alignment will be applied.

because the vectorizer doesn't support misaligned stores. I think it should be
easy to add - see this old patch:
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00604.html (and also on
http://gcc.gnu.org/wiki/VectorizationTasks, under todo). 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194



[Bug bootstrap/37152] tree-vect-transform.c: use of = where == may have been intended

2008-08-19 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2008-08-19 07:15 ---
Subject: Bug 37152

Author: dorit
Date: Tue Aug 19 07:14:26 2008
New Revision: 139224

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139224
Log:
PR bootstrap/37152
* tree-vect-transform.c (vect_create_epilog_for_reduction): Change =
to == in assert statement.
(vectorizable_reduction): Fix typo.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37152



[Bug bootstrap/37152] tree-vect-transform.c: use of = where == may have been intended

2008-08-18 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2008-08-18 20:11 ---
(In reply to comment #0)
 I just tried to compile GNU CC version 4.4 snapshot 20080815 with the
 Intel C compiler and it said
 gcc/tree-vect-transform.c(2488): warning #187: use of = where == may have
 been intended
 The source code is
   gcc_assert (ncopies = 1);
 Perhaps 
   gcc_assert (ncopies == 1);
 was intended ?

no... thanks for the catch, I'll commit a fix 
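
For the record, a minimal sketch of why the = form is harmful inside an
assertion. Plain int tests stand in for gcc_assert here, and the function
names are hypothetical.

```c
#include <assert.h>

/* Buggy form: `=` assigns 1, so the check always passes and the
   original value of ncopies is overwritten. */
int check_buggy(int ncopies)
{
    int passed = ((ncopies = 1) != 0);
    return passed ? ncopies : -1;
}

/* Intended form: `==` compares and leaves ncopies untouched. */
int check_fixed(int ncopies)
{
    int passed = (ncopies == 1);
    return passed ? ncopies : -1;
}

/* With ncopies == 4 the buggy check passes and silently returns 1,
   while the fixed check reports the failure as -1. */
void demo_assert_typo(void)
{
    assert(check_buggy(4) == 1);
    assert(check_fixed(4) == -1);
    assert(check_fixed(1) == 1);
}
```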


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|tree-vect-transform.c: use  |tree-vect-transform.c: use
                   |of = where == may have      |of = where == may have
                   |been intended               |been intended


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37152



[Bug tree-optimization/36844] Vectorizer doesn't support INT-FP conversions with different size

2008-07-22 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2008-07-22 10:39 ---
(In reply to comment #1)
 One problem is vectorizable_conversion. Is there a way to support
 V4DF/V4DI - D4SI/V4SF
 V8SI - V8SF 

With the current framework, the only way to support 
V8SI - V8SF
is to implement the TARGET_VECTORIZE_BUILTIN_CONVERSION for these modes. 

There's no way in the current framework to support  
V4DF - V4SI
V4DI - V4SF
because of the single-vector-size assumption. These however would be supported:
V4DF - V8SI
V4DI - V8SF
by modeling the idioms unpack[u/s]_float_[lo/hi] and vec_pack_[u/s]fix_trunc
for the respective modes.

I think that in order to really support AVX the vectorizer would need to be
extended to consider multiple vector sizes (which would probably involve more
than just extending the support for conversions).
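
As an illustration of the unpack/pack idioms mentioned above, here is a
scalar model in C of how one vector of 8 ints maps to two vectors of 4
doubles and back. The function names are hypothetical stand-ins for the
idioms, and which half counts as "lo" is an assumption (it depends on the
target's endianness).

```c
#include <assert.h>

/* Model of the unpack idiom, low half: ints 0..3 become 4 doubles. */
void unpack_float_lo(const int v8si[8], double v4df[4])
{
    for (int i = 0; i < 4; i++)
        v4df[i] = (double)v8si[i];
}

/* Model of the unpack idiom, high half: ints 4..7 become 4 doubles. */
void unpack_float_hi(const int v8si[8], double v4df[4])
{
    for (int i = 0; i < 4; i++)
        v4df[i] = (double)v8si[i + 4];
}

/* Model of the pack idiom: two vectors of 4 doubles are truncated to
   int and packed into one vector of 8 ints. */
void pack_fix_trunc(const double lo[4], const double hi[4], int v8si[8])
{
    for (int i = 0; i < 4; i++) {
        v8si[i] = (int)lo[i];
        v8si[i + 4] = (int)hi[i];
    }
}

/* Round trip: unpack to doubles, pack back, get the same ints. */
void check_unpack_pack(void)
{
    const int v[8] = {1, -2, 3, -4, 5, -6, 7, -8};
    double lo[4], hi[4];
    int back[8];
    unpack_float_lo(v, lo);
    unpack_float_hi(v, hi);
    pack_fix_trunc(lo, hi, back);
    for (int i = 0; i < 8; i++)
        assert(back[i] == v[i]);
}
```

The point is that the element count per vector changes (8 versus 4) while the
vector size in bytes stays the same, which is what the single-vector-size
assumption allows.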


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36844



[Bug middle-end/35343] Sum-reduction loop not recognized

2008-02-25 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2008-02-25 10:21 ---
(In reply to comment #0)
 It is beneficial to unroll reduction loop (and split the reduction target) to
 reduce dependence height due to recurrence, but GCC does not perform such
 optimization (-O3 -fno-tree-vectorize)

it does, if you use -fvariable-expansion-in-unroller -funroll-loops
(this splits the reduction target into 2 accumulators. For more aggressive
splitting you can use --param max-variable-expansions-in-unroller=[n])
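
Written by hand, the split into 2 accumulators looks roughly like this. It is
a sketch of the idea, not the unroller's output; the function names are made
up, and integer sums are used so both versions give exactly the same result.

```c
#include <assert.h>

/* Single accumulator: every add depends on the previous one. */
long sum_one_acc(const int *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 2 with the reduction target split: the two chains of
   adds are independent, which shortens the dependence height. */
long sum_two_acc(const int *v, int n)
{
    long s0 = 0, s1 = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)           /* leftover element when n is odd */
        s0 += v[i];
    return s0 + s1;
}

/* Both versions agree, including for an odd element count. */
void check_sum_accs(void)
{
    const int v[7] = {1, 2, 3, 4, 5, 6, 7};
    assert(sum_one_acc(v, 7) == 28);
    assert(sum_two_acc(v, 7) == 28);
    assert(sum_two_acc(v, 6) == 21);
}
```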


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35343



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2008-01-28 Thread dorit at gcc dot gnu dot org


--- Comment #19 from dorit at gcc dot gnu dot org  2008-01-28 13:20 ---
 Fixed?

In a way, yes. The problem is avoided by generating too conservative code.
AFAIU, a better solution may be expected in 4.4 from the stack alignment
branch. In any case this segfault PR can be closed, and instead a missed
optimization PR could be opened.


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2008-01-03 Thread dorit at gcc dot gnu dot org


--- Comment #6 from dorit at gcc dot gnu dot org  2008-01-03 10:08 ---
(In reply to comment #5)
 I can confirm that pulseaudio 0.9.8 sources which caused the crash, compile
 fine now with the latest gcc 4.3 snapshot.

thanks. (I usually prefer to wait for the person who reported the bug to
confirm that it can be closed)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591



[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2008-01-03 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2008-01-03 10:17 ---
fixed


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591



[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2007-12-27 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-12-27 19:14 ---
Subject: Bug 34591

Author: dorit
Date: Thu Dec 27 19:14:17 2007
New Revision: 131206

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131206
Log:
PR tree-optimization/34591
* tree-vect-trasnform.c (vect_estimate_min_profitable_iters): Skip
stmts (including reduction stmts) that are not live.


Added:
trunk/gcc/testsuite/gcc.dg/vect/pr34591.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591



[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2007-12-26 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-12-26 13:55:29         |2007-12-26 15:29:56
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591



[Bug tree-optimization/34330] -ftree-parallelize-loops=4 ICE with the vectorizer also

2007-12-19 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-12-19 09:38 ---
 This is a vectorizer vs not being able to run may_alias after it

can you please remind me why we can't run may_alias after the vectorizer? (and
what do you think can be done about it?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34330



[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2007-12-17 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-12-17 11:14 ---
Subject: Bug 34445

Author: dorit
Date: Mon Dec 17 11:13:56 2007
New Revision: 131006

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131006
Log:
PR tree-optimization/34445
* tree-vect-trasnform.c (vect_estimate_min_profitable_iters): Skip
stmts (including live stmts) that are not relevant.


Added:
trunk/gcc/testsuite/gfortran.dg/vect/cost-model-pr34445.f
trunk/gcc/testsuite/gfortran.dg/vect/cost-model-pr34445a.f
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gfortran.dg/vect/vect.exp
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445



[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2007-12-16 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-12-16 13:06 ---
testing this patch: 

*** tree-vect-transform.c   (revision 130987)
--- tree-vect-transform.c   (working copy)
*************** vect_estimate_min_profitable_iters (loop
*** 197,214 ****
        factor = 1;

        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (si))
! {
!   tree stmt = bsi_stmt (si);
!   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
!   if (!STMT_VINFO_RELEVANT_P (stmt_info)
!       && !STMT_VINFO_LIVE_P (stmt_info))
!     continue;
!   scalar_single_iter_cost += cost_for_stmt (stmt) * factor;
!   vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
    /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
       some of the outside costs are generated inside the outer-loop.  */
!   vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
! }
      }

  /* Add additional cost for the peeled instructions in prologue and epilogue
--- 197,215 ----
        factor = 1;

        for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (si))
!   {
!     tree stmt = bsi_stmt (si);
!     stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
!     /* Skip stmts that are not vectorized inside the loop.  */
!     if (!STMT_VINFO_RELEVANT_P (stmt_info)
!         && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
!       continue;
!     scalar_single_iter_cost += cost_for_stmt (stmt) * factor;
!     vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
      /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
         some of the outside costs are generated inside the outer-loop.  */
!     vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
!   }
      }

  /* Add additional cost for the peeled instructions in prologue and epilogue


(It fixes both testcases)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445



[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98

2007-12-15 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-12-12 20:00:40         |2007-12-15 20:50:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445



[Bug tree-optimization/33319] [4.2 regression] g++.dg/tree-ssa/pr27549.C ICE with vectorization

2007-11-22 Thread dorit at gcc dot gnu dot org


--- Comment #14 from dorit at gcc dot gnu dot org  2007-11-22 15:22 ---
closed, given recent feedback


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319



[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)

2007-11-22 Thread dorit at gcc dot gnu dot org


--- Comment #15 from dorit at gcc dot gnu dot org  2007-11-22 15:17 ---
(In reply to comment #12)
...
  Richard, is this related to the issue you reported in 
  http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01127.html
  (looks like the same error)?
...
 Yes, these are likely similar problems.  The only difference I see is
 that this one doesn't involve unions?

Richard, any chance you could take a look? (I'm asking just cause it sounds
like you've had recent experience at looking at potentially exactly this kind
of problem...)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869



[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)

2007-11-22 Thread dorit at gcc dot gnu dot org


--- Comment #14 from dorit at gcc dot gnu dot org  2007-11-22 15:14 ---
(In reply to comment #13)
 Dorit, can you please take a look again?

I will not be able to look into this in the next couple of weeks, sorry.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869



[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503

2007-11-13 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2007-11-13 13:29 ---
fixed


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860



[Bug rtl-optimization/34011] Memory load is not eliminated from tight vectorized loop

2007-11-07 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-11-07 18:06 ---
(In reply to comment #0)
 Following testcase exposes optimization problem with current SVN gcc:
...
 the same address
 is accessed with unaligned access (3) as well as aligned access.

This is a missed-optimization in the vectorizer - we use loop-versioning to
deal with the fact that we don't yet support misaligned stores; so the
vectorized version of the loop is guarded by a runtime test that checks that
the address of the store is aligned. However, we don't use the information that
there's a load from the same address that is therefore also guaranteed to be
aligned. 

We actually have this information (we detect DRs that have the same alignment
and collect them in STMT_VINFO_SAME_ALIGN_REFS), but we don't use it when we do
the versioning. We *do* use this information when, instead of versioning the
loop, we peel the loop to make the store aligned. In this case we also mark the
relevant SAME_ALIGN_REFS as aligned and generate aligned accesses for them.

(By the way, the reason we decide to use loop-versioning and not loop-peeling
is that we can't determine at compile time whether the pointers overlap. So
we have to use runtime dependence testing (i.e. versioning for aliasing), and
since we currently don't support both versioning and peeling together, this
dictates that we will use runtime alignment testing instead of peeling.)
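
The missed opportunity can be sketched in C as follows. This is a hand-written
illustration, not the PR's testcase: the function names, the 16-byte
alignment, and the shift-and-or loop body are assumptions loosely modeled on
the statement in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime alignment test that guards the vectorized version. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

void shift_or_into(unsigned *dst, const unsigned *src, unsigned shift, int n)
{
    if (is_aligned16(dst)) {
        /* Versioned loop: the store to dst[i] is known to be aligned.
           The load of dst[i] uses the same address, so it is aligned
           too; only the load of src[i] may still be misaligned. */
        for (int i = 0; i < n; i++)
            dst[i] = (src[i] >> shift) | dst[i];
    } else {
        /* Fallback: scalar loop, no alignment assumptions. */
        for (int i = 0; i < n; i++)
            dst[i] = (src[i] >> shift) | dst[i];
    }
}

/* The result is the same whichever version is taken. */
void check_shift_or_into(void)
{
    unsigned dst[4] = {1u, 2u, 3u, 4u};
    const unsigned src[4] = {0x10u, 0x20u, 0x40u, 0x80u};
    shift_or_into(dst, src, 1u, 4);
    assert(dst[0] == 9u && dst[1] == 18u && dst[2] == 35u && dst[3] == 68u);
}
```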

Here is how it looks in the vectorizer dump file:


pr34011.c:14: note: === vect_analyze_dependences ===
pr34011.c:14: note: dependence distance  = 0.
pr34011.c:14: note: accesses have the same alignment.
pr34011.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and
*D.1529_9
pr34011.c:14: note: versioning for alias required: can't determine dependence
between *D.1531_14 and *D.1529_9
pr34011.c:14: note: mark for run-time aliasing test between *D.1531_14 and
*D.1529_9
...
pr34011.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011.c:14: note: Unknown misalignment, is_packed = 0
pr34011.c:14: note: Alignment of access forced using versioning.
pr34011.c:14: note: Versioning for alignment will be applied.
pr34011.c:14: note: Vectorizing an unaligned access.
pr34011.c:14: note: Vectorizing an unaligned access.


Instead, if I add __restrict__ qualifiers to the pointer arguments, we get
this:


pr34011b.c:14: note: === vect_analyze_dependences ===
pr34011b.c:14: note: dependence distance  = 0.
pr34011b.c:14: note: accesses have the same alignment.
pr34011b.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and
*D.1529_9
...
pr34011b.c:14: note: === vect_enhance_data_refs_alignment ===
pr34011b.c:14: note: Unknown misalignment, is_packed = 0
...
pr34011b.c:14: note: Alignment of access forced using peeling.
pr34011b.c:14: note: Peeling for alignment will be applied.
pr34011b.c:14: note: Vectorizing an unaligned access.


i.e. we don't need to use runtime dependence testing and version the loop, so
we can use peeling to align the store along with anything that has the same
alignment as the store:

bb 6:
  MEM[base: D.1676, index: ivtmp.142] = M*(vect_p.111 +
ivtmp.142){misalignment: 0}  srcshift | MEM[base: D.1676, index: ivtmp.142];

...
 Missing IV elimination could be attributed to tree loop optimizations, but
 others are IMO RTL optimization problems, 

(except for the misaligned access, which the vectorizer can avoid).

 because we enter RTL generation with:
 bad:
 bb 4:
   MEM[index: ivtmp.127] = M*(vector int *) ivtmp.130{misalignment: 0} 
 srcshift.3 | M*(vector int *) ivtmp.127{misalignment: 0};


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011



[Bug tree-optimization/34005] [4.3 Regression] ICE: verify_ssa failed (expected an SSA_NAME object)

2007-11-06 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-11-06 18:11 ---
I don't think these are related to PR33680. Sounds like we may be generating a
stmt with a cond_expr at the rhs. The data-reference analysis results in:

base_address: blocks
offset from base address: k_4(D) == 0 ? 8 : 0
constant offset from base address: 0
step: 1
aligned to: 8
base_object: blocks[0][0]
symbol tag: blocks

(Note the cond_expr used to represent the offset).

We probably need to call the gimplifier (if we don't already) and also apply
Zdenek's patch that allows gimplifying rhs cond_exprs -
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html.


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-06 18:11:35
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34005



[Bug tree-optimization/34005] [4.3 Regression] ICE: verify_ssa failed (expected an SSA_NAME object)

2007-11-06 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-11-06 18:29 ---
> We probably need to call the gimplifier (if we don't already) and also apply
> Zdenek's patch that allows gimplifying rhs cond_exprs -
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html.

Yep - I just tried applying Zdenek's patch to the gimplifier, and it indeed
solves the ICE in both tests. I'll go back and propose this patch for mainline
again.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34005



[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2

2007-11-03 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-11-04 03:49 ---
Subject: Bug 33987

Author: dorit
Date: Sun Nov  4 03:48:58 2007
New Revision: 129880

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129880
Log:
PR tree-optimization/33987
* tree-vect-transform.c (get_initial_def_for_reduction): Fix assert.
Fix indentation.
(vectorizable_reduction): Add type check.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987



[Bug tree-optimization/33319] [4.2 regression] g++.dg/tree-ssa/pr27549.C ICE with vectorization

2007-11-03 Thread dorit at gcc dot gnu dot org


--- Comment #11 from dorit at gcc dot gnu dot org  2007-11-04 04:09 ---
(In reply to comment #10)
> Doesn't fail on trunk since r129797:
> 2007-10-31  Sebastian Pop  [EMAIL PROTECTED]
> PR tree-optimization/32377
> ...
> before:
> pr27549.C:58: note: create runtime check for data references *D.2383_45 and
> *D.2381_41
> pr27549.C:58: note: LOOP VECTORIZED.
> after:
> pr27549.C:58: note: not vectorized, possible dependence between data-refs
> *D.2383_45 and *D.2381_41

the assumption was that the ICE was related to versioning-for-aliasing
(run-time dependence testing), which, now that the dependence-tester was fixed,
is not required anymore, but:

> Still fails on 4.2 branch.

...we don't have versioning-for-aliasing in 4.2, so this loop could not be
vectorized with 4.2 (unless our dependence tester in 4.2 is able to determine
the dependence without this fix?). Interesting.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319



[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2

2007-11-02 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-11-03 03:35:30
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987



[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2

2007-11-02 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-11-03 04:06 ---
testing this fix:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c   (revision 129763)
--- tree-vect-transform.c   (working copy)
*** get_initial_def_for_reduction (tree stmt
*** 2107,2113 
tree vector_type;
bool nested_in_vect_loop = false;

!   gcc_assert (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
if (nested_in_vect_loop_p (loop, stmt))
  nested_in_vect_loop = true;
else
--- 2107,2113 
tree vector_type;
bool nested_in_vect_loop = false;

!   gcc_assert (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type) ||
SCALAR_FLOAT_TYPE_P (type));
if (nested_in_vect_loop_p (loop, stmt))
  nested_in_vect_loop = true;
else
*** get_initial_def_for_reduction (tree stmt
*** 2120,2136 
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
!   if (nested_in_vect_loop)
!   *adjustment_def = vecdef;
!   else
!   *adjustment_def = init_val;
! /* Create a vector of zeros for init_def.  */
! if (INTEGRAL_TYPE_P (type))
!   def_for_init = build_int_cst (type, 0);
  else
def_for_init = build_real (type, dconst0);
!   for (i = nunits - 1; i >= 0; --i)
! t = tree_cons (NULL_TREE, def_for_init, t);
  vector_type = get_vectype_for_scalar_type (TREE_TYPE (def_for_init));
  gcc_assert (vector_type);
  init_def = build_vector (vector_type, t);
--- 2120,2136 
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! if (nested_in_vect_loop)
!   *adjustment_def = vecdef;
  else
+   *adjustment_def = init_val;
+ /* Create a vector of zeros for init_def.  */
+ if (SCALAR_FLOAT_TYPE_P (type))
def_for_init = build_real (type, dconst0);
! else
!   def_for_init = build_int_cst (type, 0);
! for (i = nunits - 1; i >= 0; --i)
!   t = tree_cons (NULL_TREE, def_for_init, t);
  vector_type = get_vectype_for_scalar_type (TREE_TYPE (def_for_init));
  gcc_assert (vector_type);
  init_def = build_vector (vector_type, t);
*** vectorizable_reduction (tree stmt, block
*** 2716,2721 
--- 2716,2724 
  return false;
scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
scalar_type = TREE_TYPE (scalar_dest);
+   if (!POINTER_TYPE_P (scalar_type) && !INTEGRAL_TYPE_P (scalar_type)
+       && !SCALAR_FLOAT_TYPE_P (scalar_type))
+ return false;

/* All uses but the last are expected to be defined in the loop.
   The last use is the reduction variable.  */
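
To make the POINTER_TYPE_P case concrete, here is a hypothetical loop whose loop-carried accumulator has pointer type rather than integer or floating-point type. It is an illustration only: it is not the testcase from this PR, and whether the vectorizer of that era classified exactly this loop as a reduction has not been verified. The name `skip_records` is made up.

```c
#include <stddef.h>

/* Advances a cursor over n variable-length records.  The accumulated value
   (cursor) is a pointer, so a check that only accepts integral or float
   types would not cover it.  */
const char *skip_records (const char *cursor, const unsigned char *lens,
                          size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    cursor += lens[i];
  return cursor;
}
```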


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987



[Bug target/33958] Using -ftree-vectorize , creates an illegal movaps instruction

2007-10-31 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-10-31 17:46 ---
(In reply to comment #2)
> Works for me.  Try a newer 4.2.x release.

I wonder if the fix for PR25413 fixed this problem - it went into 4.2 on July
25th, just shortly after 4.2.1 was released :-( but should be in 4.2.2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33958



[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant

2007-10-31 Thread dorit at gcc dot gnu dot org


--- Comment #6 from dorit at gcc dot gnu dot org  2007-11-01 00:55 ---
thanks!

> but the problem is that in the vectorizer, DR_STEP has to be an
> INTEGER_CST: for instance,
>   step = TREE_INT_CST_LOW (DR_STEP (dra));
> ...
>   || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb)))
> and plenty of other places will ICE if we feed them with symbolic
> strides.

This can be fixed. I'll try to look into that.
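
For illustration, the difference between a constant and a symbolic stride at the source level could look like the sketch below. The function names are hypothetical and the loops are not taken from the PR's testcase.

```c
#include <stddef.h>

/* Constant stride: the step between accesses (2 * sizeof (int) bytes) is
   known at compile time, so it can be represented as an integer constant.  */
long sum_every_other (const int *a, size_t n)
{
  long s = 0;
  size_t i;
  for (i = 0; i < n; i++)
    s += a[2 * i];
  return s;
}

/* Symbolic stride: the step is stride * sizeof (int) bytes and is only
   known at run time, so it cannot be read out as a constant.  */
long sum_strided (const int *a, size_t n, size_t stride)
{
  long s = 0;
  size_t i;
  for (i = 0; i < n; i++)
    s += a[stride * i];
  return s;
}
```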


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-10-29 Thread dorit at gcc dot gnu dot org


--- Comment #17 from dorit at gcc dot gnu dot org  2007-10-30 05:25 ---
Subject: Bug 32893

Author: dorit
Date: Tue Oct 30 05:25:10 2007
New Revision: 129764

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764
Log:
PR tree-optimization/32893
* tree-vectorizer.c (vect_can_force_dr_alignment_p): Check
STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY.


Added:
trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
trunk/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
trunk/gcc/testsuite/gcc.dg/vect/vect-77-global.c
trunk/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
trunk/gcc/testsuite/gcc.dg/vect/vect-78-global.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
trunk/gcc/testsuite/gcc.dg/vect/slp-25.c
trunk/gcc/testsuite/gcc.dg/vect/vect-13.c
trunk/gcc/testsuite/gcc.dg/vect/vect-17.c
trunk/gcc/testsuite/gcc.dg/vect/vect-18.c
trunk/gcc/testsuite/gcc.dg/vect/vect-19.c
trunk/gcc/testsuite/gcc.dg/vect/vect-2.c
trunk/gcc/testsuite/gcc.dg/vect/vect-20.c
trunk/gcc/testsuite/gcc.dg/vect/vect-21.c
trunk/gcc/testsuite/gcc.dg/vect/vect-22.c
trunk/gcc/testsuite/gcc.dg/vect/vect-27.c
trunk/gcc/testsuite/gcc.dg/vect/vect-29.c
trunk/gcc/testsuite/gcc.dg/vect/vect-3.c
trunk/gcc/testsuite/gcc.dg/vect/vect-31.c
trunk/gcc/testsuite/gcc.dg/vect/vect-34.c
trunk/gcc/testsuite/gcc.dg/vect/vect-36.c
trunk/gcc/testsuite/gcc.dg/vect/vect-4.c
trunk/gcc/testsuite/gcc.dg/vect/vect-5.c
trunk/gcc/testsuite/gcc.dg/vect/vect-6.c
trunk/gcc/testsuite/gcc.dg/vect/vect-64.c
trunk/gcc/testsuite/gcc.dg/vect/vect-65.c
trunk/gcc/testsuite/gcc.dg/vect/vect-66.c
trunk/gcc/testsuite/gcc.dg/vect/vect-68.c
trunk/gcc/testsuite/gcc.dg/vect/vect-7.c
trunk/gcc/testsuite/gcc.dg/vect/vect-72.c
trunk/gcc/testsuite/gcc.dg/vect/vect-73.c
trunk/gcc/testsuite/gcc.dg/vect/vect-76.c
trunk/gcc/testsuite/gcc.dg/vect/vect-77.c
trunk/gcc/testsuite/gcc.dg/vect/vect-78.c
trunk/gcc/testsuite/gcc.dg/vect/vect-86.c
trunk/gcc/testsuite/gcc.dg/vect/vect-all.c
trunk/gcc/testsuite/gcc.dg/vect/vect.exp
trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
trunk/gcc/testsuite/lib/target-supports.exp
trunk/gcc/tree-vectorizer.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503

2007-10-23 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-10-23 19:50 ---
Subject: Bug 33860

Author: dorit
Date: Tue Oct 23 19:50:18 2007
New Revision: 129587

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129587
Log:
PR tree-optimization/33860
* tree-vect-analyze.c (vect_analyze_data_ref_access): Don't allow
interleaved accesses in case the dr is inside the inner-loop during
outer-loop vectorization.


Added:
trunk/gcc/testsuite/g++.dg/vect/pr33860.cc
trunk/gcc/testsuite/g++.dg/vect/pr33860a.cc
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860



[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503

2007-10-22 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-10-22 22:54 ---
There's some bad interaction here between the data-interleaving support and the
outer-loop support: the two are not yet supported together, but this case still
slipped through the checks during the analysis phase. This patch fixes that by
not allowing us to detect interleaved accesses in the inner-loop during
outer-loop vectorization:

--- tree-vect-analyze.c 2007-10-22 08:34:45.0 +0200
+++ tree-vect-analyze.dn.c  2007-10-22 22:23:01.0 +0200
@@ -2321,6 +2321,10 @@

   if (nested_in_vect_loop_p (loop, stmt))
 {
+  /* Interleaved accesses are not yet supported within outer-loop
+vectorization for references in the inner-loop.  */
+  DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) = NULL_TREE;
+
   /* For the rest of the analysis we use the outer-loop step.  */
   step = STMT_VINFO_DR_STEP (stmt_info);
   dr_step = TREE_INT_CST_LOW (step);

(yet to be bootstrapped etc.)

By the way, on powerpc-linux, this testcase gets vectorized with this fix
(after changing the doubles to floats, and forcing alignment of the data array
with attribute aligned), without taking advantage of the fact that the two
loads are interleaved. 

I also suspect that the vectorized code here is considerably worse than the
original scalar code;
instead of: (ld,ld,add,store) * 16
we have: (vload,realign,splat,vload,realign,splat,vadd,vstore) * 4
with additional overhead outside the loop.
After the ICE is fixed we should probably add this as a missed-optimization PR
(both in terms of the cost model, and in terms of exploiting the data reuse of
the interleaved accesses).
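
For readers unfamiliar with the term, "interleaved" loads are loads whose addresses alternate within one contiguous region. A minimal sketch follows; the loop and the name `add_pairs` are hypothetical and not taken from the PR's testcase.

```c
/* pairs[2*k] and pairs[2*k + 1] are two stride-2 loads that together cover
   a contiguous block, so one wide load could in principle feed both.  */
void add_pairs (float *out, const float *pairs, int n)
{
  int k;
  for (k = 0; k < n; k++)
    out[k] = pairs[2 * k] + pairs[2 * k + 1];
}
```

Treating the two loads as independent strided accesses misses the data reuse that the comment above refers to.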


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-10-22 22:54:32
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860



[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829

2007-10-22 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2007-10-23 03:24 ---
Subject: Bug 33834

Author: dorit
Date: Tue Oct 23 03:24:06 2007
New Revision: 129571

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129571
Log:
PR tree-optimization/33834
PR tree-optimization/33835
* tree-vect-analyze.c (vect_analyze_operations): RELEVANT and LIVE
stmts need to be checked for success separately.
* tree-vect-transform.c (vectorizable_call, vectorizable_conversion):
Remove the check that stmt is not LIVE.
(vectorizable_assignment, vectorizable_induction): Likewise.
(vectorizable_operation, vectorizable_type_demotion): Likewise.
(vectorizable_type_promotion, vectorizable_load, vectorizable_store):
Likewise.
(vectorizable_live_operation): Check that op is not NULL.


Added:
trunk/gcc/testsuite/g++.dg/vect/pr33834_1.cc
trunk/gcc/testsuite/g++.dg/vect/pr33834_2.cc
trunk/gcc/testsuite/g++.dg/vect/pr33835.cc
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33834



[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use

2007-10-22 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-10-23 03:24 ---
Subject: Bug 33835

Author: dorit
Date: Tue Oct 23 03:24:06 2007
New Revision: 129571

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129571
Log:
PR tree-optimization/33834
PR tree-optimization/33835
* tree-vect-analyze.c (vect_analyze_operations): RELEVANT and LIVE
stmts need to be checked for success separately.
* tree-vect-transform.c (vectorizable_call, vectorizable_conversion):
Remove the check that stmt is not LIVE.
(vectorizable_assignment, vectorizable_induction): Likewise.
(vectorizable_operation, vectorizable_type_demotion): Likewise.
(vectorizable_type_promotion, vectorizable_load, vectorizable_store):
Likewise.
(vectorizable_live_operation): Check that op is not NULL.


Added:
trunk/gcc/testsuite/g++.dg/vect/pr33834_1.cc
trunk/gcc/testsuite/g++.dg/vect/pr33834_2.cc
trunk/gcc/testsuite/g++.dg/vect/pr33835.cc
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835



[Bug tree-optimization/33833] [4.3 Regression] ICE in build2_stat, at tree.c:3110 at -O3, tree-vectorizer

2007-10-21 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-10-21 06:39 ---
I was able to reproduce this on i386-linux.

Looks like it's related to PLUS_EXPR vs. POINTER_PLUS_EXPR. The following patch
fixes this testcase:

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 129521)
--- tree-vect-analyze.c (working copy)
*** vect_analyze_data_refs (loop_vec_info lo
*** 3249,3255 
 inner-loop: *(BASE+INIT). (The first location is actually
 BASE+INIT+OFFSET, but we add OFFSET separately later.  */
  tree inner_base = build_fold_indirect_ref
!   (fold_build2 (PLUS_EXPR, TREE_TYPE (base),
base, init));

  if (vect_print_dump_info (REPORT_DETAILS))
{
--- 3249,3256 
 inner-loop: *(BASE+INIT). (The first location is actually
 BASE+INIT+OFFSET, but we add OFFSET separately later.  */
  tree inner_base = build_fold_indirect_ref
!   (fold_build2 (POINTER_PLUS_EXPR,
! TREE_TYPE (base), base, init));

  if (vect_print_dump_info (REPORT_DETAILS))
{


... but breaks some of the current vectorizer testcases: 

WARNING: gcc.dg/vect/vect-62.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-63.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-64.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-65.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-66.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-67.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-70.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-align-2.c compilation failed to produce executable
WARNING: gcc.dg/vect/no-section-anchors-vect-69.c compilation failed to produce
executable
WARNING: gcc.dg/vect/no-scevccp-slp-30.c compilation failed to produce
executable

I looked into one of these failures, and it fails with:

/home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-62.c:10: internal
compiler error: in build2_stat, at tree.c:3115 

Looks like it doesn't like the POINTER_PLUS_EXPR in this case because arg1 is
not compatible with sizetype:

Breakpoint 4, useless_type_conversion_p (outer_type=0xb7cbf000,
inner_type=0xb7cc521c)
at ../../gcc/gcc/tree-ssa.c:1074
1074{
(gdb) p debug_tree(outer_type)
 <integer_type 0xb7cbf000 unsigned int public unsigned sizetype SI
    size <integer_cst 0xb7cb2658 type <integer_type 0xb7cbf06c bit_size_type>
constant invariant 32>
    unit size <integer_cst 0xb7cb2444 type <integer_type 0xb7cbf000 unsigned
int> constant invariant 4>
    align 32 symtab -1210758772 alias set -1 canonical type 0xb7cc50d8
precision 32 min <integer_cst 0xb7cb2674 0> max <integer_cst 0xb7cb2c08 -1>>
$8 = void
(gdb) p debug_tree(inner_type)
 <integer_type 0xb7cc521c public sizetype SI
    size <integer_cst 0xb7cb2658 type <integer_type 0xb7cbf06c bit_size_type>
constant invariant 32>
    unit size <integer_cst 0xb7cb2444 type <integer_type 0xb7cbf000 unsigned
int> constant invariant 4>
    align 32 symtab 0 alias set -1 canonical type 0xb7cc521c precision 32 min
<integer_cst 0xb7cb2b98 -2147483648> max <integer_cst 0xb7cb2bb4 2147483647>>
$9 = void


I think POINTER_PLUS_EXPR makes sense here - need to check why we have this
mismatch between unsigned and signed sizetypes (and if that's also the problem
in the other testcases).


-- 

dorit at gcc dot gnu dot org changed:

   What|Removed |Added

             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-10-21 06:39:08
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33833



[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829

2007-10-21 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-10-21 07:14 ---
This patch fixes it:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c   (revision 129521)
--- tree-vect-transform.c   (working copy)
*** vectorizable_live_operation (tree stmt,
*** 5870,5875 
--- 5870,5878 

gcc_assert (STMT_VINFO_LIVE_P (stmt_info));

+   if (STMT_VINFO_RELEVANT_P (stmt_info))
+ return false;
+
if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
  return false;


(but doesn't allow vectorization. I may try a different fix that does allow
vectorization)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33834



[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use

2007-10-21 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-10-21 08:07 ---
The proposed fix/work-around for PR33834 also happens to fix this PR. But the
real problem is that we try to access a NULL argument (operand 2 of a CALL_EXPR
may be NULL). So we should probably at least add something like this:

*** vectorizable_live_operation (tree stmt,
*** 5893,5899 
for (i = 0; i < op_type; i++)
  {
op = TREE_OPERAND (operation, i);
!   if (!vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt))
  {
if (vect_print_dump_info (REPORT_DETAILS))
  fprintf (vect_dump, "use not simple.");
--- 5896,5902 
for (i = 0; i < op_type; i++)
  {
op = TREE_OPERAND (operation, i);
!   if (op && !vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt))
  {
if (vect_print_dump_info (REPORT_DETAILS))
  fprintf (vect_dump, "use not simple.");


This would help us pass the analysis stage, but we would later fail in the
transform stage just like in PR33834. So this PR would require the same fix as
PR33834. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835



[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829

2007-10-21 Thread dorit at gcc dot gnu dot org


--- Comment #6 from dorit at gcc dot gnu dot org  2007-10-22 04:28 ---
I'm testing this patch. It fixes the two testcases, while allowing the first
testcase to get vectorized. (the last bit in the patch is the fix for PR33835):

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 129521)
--- tree-vect-analyze.c (working copy)
*** vect_analyze_operations (loop_vec_info l
*** 481,487 
  need_to_vectorize = true;
}

! ok = (vectorizable_type_promotion (stmt, NULL, NULL)
|| vectorizable_type_demotion (stmt, NULL, NULL)
|| vectorizable_conversion (stmt, NULL, NULL, NULL)
|| vectorizable_operation (stmt, NULL, NULL, NULL)
--- 481,489 
  need_to_vectorize = true;
}

! if (STMT_VINFO_RELEVANT_P (stmt_info)
! || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
!   ok = (vectorizable_type_promotion (stmt, NULL, NULL)
|| vectorizable_type_demotion (stmt, NULL, NULL)
|| vectorizable_conversion (stmt, NULL, NULL, NULL)
|| vectorizable_operation (stmt, NULL, NULL, NULL)
*** vect_analyze_operations (loop_vec_info l
*** 492,508 
|| vectorizable_condition (stmt, NULL, NULL)
|| vectorizable_reduction (stmt, NULL, NULL));

  /* Stmts that are (also) live (i.e. - that are used out of the
loop)
 need extra handling, except for vectorizable reductions.  */
  if (STMT_VINFO_LIVE_P (stmt_info)
   && STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
!   ok |= vectorizable_live_operation (stmt, NULL, NULL);

  if (!ok)
{
  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
! fprintf (vect_dump, "not vectorized: stmt not supported: ");
  print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
  return false;
--- 494,522 
|| vectorizable_condition (stmt, NULL, NULL)
|| vectorizable_reduction (stmt, NULL, NULL));

+ if (!ok)
+   {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+   {
+ fprintf (vect_dump, "not vectorized: relevant stmt not ");
+ fprintf (vect_dump, "supported: ");
+ print_generic_expr (vect_dump, stmt, TDF_SLIM);
+   }
+ return false;
+   }
+ 
  /* Stmts that are (also) live (i.e. - that are used out of the
loop)
 need extra handling, except for vectorizable reductions.  */
  if (STMT_VINFO_LIVE_P (stmt_info)
   && STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type)
!   ok = vectorizable_live_operation (stmt, NULL, NULL);

  if (!ok)
{
  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
! fprintf (vect_dump, "not vectorized: live stmt not ");
! fprintf (vect_dump, "supported: ");
  print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
  return false;
Index: tree-vect-transform.c
===
*** tree-vect-transform.c   (revision 129521)
--- tree-vect-transform.c   (working copy)
*** vectorizable_call (tree stmt, block_stmt
*** 2961,2974 
if (STMT_SLP_TYPE (stmt_info))
  return false;

-   /* FORNOW: not yet supported.  */
-   if (STMT_VINFO_LIVE_P (stmt_info))
- {
-   if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "value used after loop.");
-   return false;
- }
-
/* Is STMT a vectorizable call?   */
if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
  return false;
--- 2961,2966 
*** vectorizable_conversion (tree stmt, bloc
*** 3307,3320 
if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def)
  return false;

-   if (STMT_VINFO_LIVE_P (stmt_info))
- {
-   /* FORNOW: not yet supported.  */
-   if (vect_print_dump_info (REPORT_DETAILS))
-   fprintf (vect_dump, "value used after loop.");
-   return false;
- }
- 
if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
  return false;

--- 3299,3304 
*** vectorizable_assignment (tree stmt, bloc
*** 3585,3598 
if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def)
  return false;

-   /* FORNOW: not yet supported.  */
-   if (STMT_VINFO_LIVE_P (stmt_info))
- {
-   if (vect_print_dump_info (REPORT_DETAILS))
- fprintf (vect_dump, "value used after loop.");
-   return false;
- }
- 
/* Is vectorizable assignment?  */
if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
  return false;
--- 3569,3574 
*** vectorizable_induction (tree

[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use

2007-10-21 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-10-22 04:37 ---
I'm testing a patch that would fix both this PR and PR33834 (posted it under
the PR33834 entry). By the way, this testcase does not get vectorized with
current mainline (an Oct21 snapshot) because the call to cos is not taken out
of the inner-loop, although it's invariant; it was taken out of the loop with
an older snapshot (Sept10). Another data point - in the testcase in PR33834 the
call to cos is taken out of the inner-loop with current snapshot. 
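
To illustrate what "taken out of the inner-loop" means here, the sketch below hoists an invariant call by hand. It uses a made-up stand-in (`pure_fn`) instead of cos so that the number of calls can be counted; the counter is instrumentation for the example only, and none of these names come from the testcase.

```c
/* Call counter, for demonstration purposes only.  */
int pure_fn_calls;

/* Stand-in for an expensive call whose result depends only on x.  */
double pure_fn (double x)
{
  pure_fn_calls++;
  return x * 0.5;
}

/* Invariant call left inside the inner loop: rows * cols calls.  */
void scale_naive (double *a, int rows, int cols, double x)
{
  int i, j;
  for (i = 0; i < rows; i++)
    for (j = 0; j < cols; j++)
      a[i * cols + j] *= pure_fn (x);
}

/* Invariant call hoisted out of both loops: one call.  */
void scale_hoisted (double *a, int rows, int cols, double x)
{
  int i, j;
  double c = pure_fn (x);
  for (i = 0; i < rows; i++)
    for (j = 0; j < cols; j++)
      a[i * cols + j] *= c;
}
```

In the hoisted form the inner loop contains only a load, a multiply by a loop-invariant scalar, and a store.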


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835



[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC

2007-10-14 Thread dorit at gcc dot gnu dot org


--- Comment #35 from dorit at gcc dot gnu dot org  2007-10-15 05:52 ---
bootstrap with vectorization enabled with your patch applied passes for me on
ppc64-linux. thanks!!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-10-03 Thread dorit at gcc dot gnu dot org


--- Comment #16 from dorit at gcc dot gnu dot org  2007-10-03 18:52 ---
Ryan, thanks a lot for the info. FYI, I started a discussion about this here:
http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00202.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-09-19 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2007-09-19 14:28 ---
(In reply to comment #6)
> It looks like 
> zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package
> gcc-4.1.2-17) 
> has same problem.
> In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like
> below:
> Program received signal SIGSEGV, Segmentation fault.
> 0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19,
> table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108
> 108 count[len] = 0;

could you please provide a complete (reduced...) testcase that could be used to
reproduce this? 
In the meantime, other things that may help:
- could you please try to add __attribute__ ((__aligned__(16))) to the
definition of count, as suggested in comment 5?
- could you please show the relevant generated assembly up to the offending
insn?  (with and without the attribute aligned)? could you also check (with
gdb) what is the address accessed and what is the address of the stack pointer?
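
A minimal sketch of the two checks requested above, assuming a GCC-compatible compiler. The array below is a hypothetical stand-in for `count` (its size is an assumption), and the helper names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Misalignment of p, in bytes, relative to a 16-byte boundary.  */
unsigned misalignment16 (const void *p)
{
  return (unsigned) ((uintptr_t) p % 16);
}

unsigned count_misalignment (void)
{
  /* Hypothetical local array with forced 16-byte alignment.  */
  unsigned short count[16] __attribute__ ((__aligned__ (16)));

  memset (count, 0, sizeof count);
  return misalignment16 (count);
}
```

If the attribute is honored, `count_misalignment ()` returns 0; a nonzero result would mean an aligned vector store to the array could fault.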


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug bootstrap/21335] [meta-bug] bootstrap fails with -ftree-vectorize

2007-09-14 Thread dorit at gcc dot gnu dot org


--- Comment #7 from dorit at gcc dot gnu dot org  2007-09-14 18:49 ---
(In reply to comment #6)
> I can bootstrap current trunk (r128479) with -ftree-vectorize on
> x86_64-unknown-linux-gnu for some time now, and, according to
> http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00327.html, this problem is gone
> on powerpc64 too.

actually the link you give above explicitly states that we don't pass
bootstrap with vectorization enabled on powerpc-linux, but there's already a
PR for that (PR32582) so it's ok to close this one

> So closing as fixed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21335



[Bug tree-optimization/33373] [4.3 Regression] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098

2007-09-14 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-09-14 20:53 ---
(In reply to comment #4)
> Very similar testcase with the difference that it is not fixed by r128415 and
> makes current trunk segfault in VEC_tree_base_pop():
> void f (unsigned int *d, unsigned int *s, int w)
> {
>   int i;
>   for (i = 0; i < w; ++i)
>     d [i] = s [i] * (unsigned short) (~d [i] >> 24);
> }

this should fix it:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c   (revision 128501)
--- tree-vect-transform.c   (working copy)
*** vect_get_vec_defs_for_stmt_copy (enum ve
*** 1938,1944 
vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[0], vec_oprnd);
VEC_quick_push (tree, *vec_oprnds0, vec_oprnd);

!   if (vec_oprnds1)
  {
vec_oprnd = VEC_pop (tree, *vec_oprnds1);
vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd);
--- 1938,1944 
vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[0], vec_oprnd);
VEC_quick_push (tree, *vec_oprnds0, vec_oprnd);

!   if (vec_oprnds1 && *vec_oprnds1)
  {
vec_oprnd = VEC_pop (tree, *vec_oprnds1);
vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd);

(and by the way, I think this is a totally different issue than what this PR
was originally opened for, and should be a separate PR. I think this regression
is due to r128289)
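
The pattern behind the fix, checking both the slot pointer and the vector it points to before popping, can be sketched in plain C as follows. `struct vec` and `pop_second_operand` are toy stand-ins, not GCC's VEC API, and the extra length check is an addition made for this example.

```c
#include <stddef.h>

/* Toy stand-in for a vector of operands.  */
struct vec
{
  size_t len;
  int data[8];
};

/* Pops the last element of *v1 into *out.  Returns 0 without touching
   anything when no slot was passed, when the slot holds no vector, or
   when the vector is empty.  */
int pop_second_operand (struct vec **v1, int *out)
{
  if (v1 && *v1 && (*v1)->len > 0)
    {
      (*v1)->len--;
      *out = (*v1)->data[(*v1)->len];
      return 1;
    }
  return 0;
}
```

Checking only `v1` would let a valid slot that holds a NULL vector reach the pop, which matches the segfault described above.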


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373



[Bug tree-optimization/33373] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098

2007-09-12 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-09-12 07:10 ---
Subject: Bug 33373

Author: dorit
Date: Wed Sep 12 07:09:38 2007
New Revision: 128415

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128415
Log:
PR tree-optimization/33373
* tree-vect-analyze.c (vect_determine_vectorization_factor): Call
TREE_INT_CST_LOW when comparing TYPE_SIZE_UNIT.


Added:
trunk/gcc/testsuite/gcc.dg/vect/pr33373.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373



[Bug tree-optimization/33373] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098

2007-09-10 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-09-10 09:08 ---
Testing this patch (it's a bug in the fix for PR33301. I accidentally treated
TYPE_SIZE_UNIT as a constant, whereas it's really a tree...):

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 128322)
--- tree-vect-analyze.c (working copy)
*** vect_determine_vectorization_factor (loo
*** 242,252 
  operation = GIMPLE_STMT_OPERAND (stmt, 1);
  if (TREE_CODE (operation) == NOP_EXPR
  || TREE_CODE (operation) == CONVERT_EXPR
! || TREE_CODE (operation) ==  WIDEN_MULT_EXPR)
{
  tree rhs_type = TREE_TYPE (TREE_OPERAND (operation, 0));
! if (TYPE_SIZE_UNIT (rhs_type) < TYPE_SIZE_UNIT (scalar_type))
!   scalar_type = TREE_TYPE (TREE_OPERAND (operation, 0));
}

  if (vect_print_dump_info (REPORT_DETAILS))
--- 242,253 
  operation = GIMPLE_STMT_OPERAND (stmt, 1);
  if (TREE_CODE (operation) == NOP_EXPR
  || TREE_CODE (operation) == CONVERT_EXPR
! || TREE_CODE (operation) == WIDEN_MULT_EXPR)
{
  tree rhs_type = TREE_TYPE (TREE_OPERAND (operation, 0));
! if (TREE_INT_CST_LOW (TYPE_SIZE_UNIT (rhs_type)) <
! TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type)))
!   scalar_type = rhs_type;
}

  if (vect_print_dump_info (REPORT_DETAILS))
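
To illustrate the kind of mistake this patch fixes, here is a minimal stand-alone sketch (not GCC code). The type `size_node` and the helper `node_value` are hypothetical stand-ins for a `tree` holding an integer constant and for `TREE_INT_CST_LOW`. Comparing the nodes themselves compares addresses, which says nothing about the sizes they hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree node that holds an integer constant.  */
struct size_node { unsigned long value; };

/* Hypothetical analogue of TREE_INT_CST_LOW: extract the stored value.  */
unsigned long
node_value (const struct size_node *n)
{
  return n->value;
}

/* The mistake: orders the nodes by address, not by the size they hold.  */
bool
smaller_by_address (const struct size_node *a, const struct size_node *b)
{
  return (uintptr_t) a < (uintptr_t) b;
}

/* The fix: extract the values first, then compare them.  */
bool
smaller_by_value (const struct size_node *a, const struct size_node *b)
{
  return node_value (a) < node_value (b);
}
```

Only `smaller_by_value` gives an answer that depends on the sizes; the result of `smaller_by_address` depends on where the nodes happen to be allocated.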


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373



[Bug tree-optimization/33301] wrong vectorization factor due to an invariant type-promotion in the loop

2007-09-08 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-09-08 09:19 ---
Subject: Bug 33301

Author: dorit
Date: Sat Sep  8 09:19:39 2007
New Revision: 128265

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128265
Log:
PR tree-optimization/33301
* tree-vect-analyze (analyze_operations): Look at the type of the rhs
when relevant.


Added:
trunk/gcc/testsuite/gfortran.dg/vect/pr33301.f
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301



[Bug tree-optimization/33301] wrong vectorization factor due to an invariant type-promotion in the loop

2007-09-08 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-09-08 09:23 ---
fix committed


-- 

dorit at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-08 Thread dorit at gcc dot gnu dot org


--- Comment #6 from dorit at gcc dot gnu dot org  2007-09-08 09:24 ---
fix committed


-- 

dorit at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33320] ICE with vectorization in the testsuite during dataref analysis

2007-09-08 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-09-08 09:42 ---
(In reply to comment #1)
> (In reply to comment #0)
> > When the testcase gcc.dg/tree-ssa/predcom-3.c is compiled with
> > vectorization it
> > ICEs in the dataref analysis called from the vectorizer:
> I can't get the compiler (current mainline) to segfault with the compile flags
> from the description on x86_64-pc-linux-gnu or i686-pc-linux-gnu (with and
> without -msse2).

I can't reproduce it anymore either... I actually opened this PR at least a
week after I saw this failure, so maybe it got fixed in the meantime? I guess
I'll close it then.


-- 

dorit at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||WORKSFORME


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33320



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-07 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-09-07 15:00 ---
Subject: Bug 33299

Author: dorit
Date: Fri Sep  7 15:00:11 2007
New Revision: 128242

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128242
Log:
PR tree-optimization/33299
* tree-vect-transform.c (vect_create_epilog_for_reduction): Update uses
for all relevant loop-exit phis, not just the first.


Added:
trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr33299.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gfortran.dg/vect/vect.exp
trunk/gcc/tree-vect-transform.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33319] New: ICE with vectorization in

2007-09-06 Thread dorit at gcc dot gnu dot org
when the testcase g++.dg/tree-ssa/pr27549.C is compiled with -ftree-vectorize
it ICEs with: 

Unable to coalesce ssa_names 141 and 280 which are marked as MUST COALESCE.
s$b_141(ab) and  s$b_280(ab)
/Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C: In function
'const char* foo()':
/Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C:72: internal
compiler error: SSA corruption

The testcase is vectorized using versioning-for-aliasing, although it is known
at compile time that there is a dependence for sure, so there's no point in
testing this at runtime (as pointed out here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg01211.html). With --param
vect-max-version-for-alias-checks=0 the testcase doesn't get vectorized and
doesn't ICE. So, disabling versioning-for-aliasing when it's redundant (like in
the above case) would avoid the ICE, but we should probably figure out what is
really causing the ICE.
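
For reference, versioning-for-aliasing amounts to roughly the following at the source level. This is a hand-written sketch under assumptions, not vectorizer output; the function names `ranges_disjoint` and `add_one_versioned` are made up for illustration. When the dependence is known at compile time, as in this testcase, the runtime check can never select the fast version, so emitting it is pointless.

```c
#include <stddef.h>
#include <stdint.h>

/* Runtime alias check: nonzero if the two n-element ranges are disjoint.  */
int
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  uintptr_t bytes = (uintptr_t) (n * sizeof (int));
  return pa + bytes <= pb || pb + bytes <= pa;
}

/* Computes dst[i] = src[i] + 1, versioned on the alias check.  */
void
add_one_versioned (int *dst, const int *src, size_t n)
{
  size_t i = 0;

  if (ranges_disjoint (dst, src, n))
    {
      /* "Vector" version: load four elements, then store four.  This is
         only valid when no store can feed a later load.  */
      for (; i + 4 <= n; i += 4)
        {
          int t0 = src[i], t1 = src[i + 1], t2 = src[i + 2], t3 = src[i + 3];
          dst[i] = t0 + 1;
          dst[i + 1] = t1 + 1;
          dst[i + 2] = t2 + 1;
          dst[i + 3] = t3 + 1;
        }
    }

  /* Scalar version; it also serves as the epilogue of the vector version.  */
  for (; i < n; i++)
    dst[i] = src[i] + 1;
}
```

Calling `add_one_versioned (buf + 1, buf, 7)` always takes the scalar path, because each store feeds the next load.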

(this is how the testcase is compiled:
/Develop/mainline-dn/build1/gcc/testsuite/g++/../../g++
-B/Develop/mainline-dn/build1/gcc/testsuite/g++/../../
/Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C -nostdinc++
-I/Develop/mainline-dn/build1/powerpc64-unknown-linux-gnu/libstdc++-v3/include/powerpc64-unknown-linux-gnu
-I/Develop/mainline-dn/build1/powerpc64-unknown-linux-gnu/libstdc++-v3/include
-I/Develop/mainline-dn/gcc/libstdc++-v3/libsupc++
-I/Develop/mainline-dn/gcc/libstdc++-v3/include/backward
-I/Develop/mainline-dn/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -O2
-S -m64 -O2 -ftree-vectorize -maltivec -fdump-tree-vect-details -o pr27549.s)


-- 
   Summary: ICE with vectorization in
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: powerpc-linux
  GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319



[Bug tree-optimization/33320] New: ICE with vectorization in the testsuite during dataref analysis

2007-09-06 Thread dorit at gcc dot gnu dot org
When the testcase gcc.dg/tree-ssa/predcom-3.c is compiled with vectorization it
ICEs in the dataref analysis called from the vectorizer:

/home/dorit/mainline/build2/gcc/xgcc -B/home/dorit/mainline/build2/gcc/
/home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c -O2
-fpredictive-commoning -fdump-tree-pcom-details -fno-show-column -S -O2
-ftree-vectorize -fdump-tree-vect-details -o predcom-3.s
/home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: In function
'test':
/home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c:7: internal
compiler error: Segmentation fault


-- 
   Summary: ICE with vectorization in the testsuite during dataref
analysis
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: i386-linux
  GCC host triplet: i386-linux
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33320



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-04 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-09-04 11:44 ---
(In reply to comment #1)
> Confirmed.  It looks like the vectorizer forgets to update the PHI node for
> stmp_var:

Yes. I suspect I didn't expect at the time that there would be two
loop-closed-ssa-form phi-nodes at the loop exit for s_3, so I probably updated
just one of them (s_10) and not the other (s_4). This is how it looks before
vectorization:

<bb 7>:
  # s_4 = PHI <s_3(3)>
  # s_10 = PHI <s_3(3)>
  D.1368_15 = *x_14(D);
  if (D.1368_15  0.0)
    goto <bb 8>;
  else
    goto <bb 9>;

<bb 8>:
  s_16 = -s_10;

<bb 9>:
  # s_1 = PHI <s_4(7), s_16(8)>
  return s_1;

I'll prepare a fix.
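
The difference between updating one exit phi and updating all of them can be sketched in isolation as follows. This is only an illustration under assumptions: `struct exit_use` and both functions are hypothetical and do not correspond to the vectorizer's data structures. The first function stops at the first use outside the loop; the second visits every such use.

```c
#include <stddef.h>

/* Hypothetical model of one use of the scalar reduction result.  */
struct exit_use
{
  int inside_loop;   /* nonzero for the use in the loop latch */
  int value;         /* the value the use currently refers to */
};

/* Buggy pattern: stop after the first use found outside the loop.  */
size_t
update_first_exit_use (struct exit_use *uses, size_t n, int new_value)
{
  size_t updated = 0;
  for (size_t i = 0; i < n; i++)
    if (!uses[i].inside_loop)
      {
        uses[i].value = new_value;
        updated++;
        break;
      }
  return updated;
}

/* Fixed pattern: update every use found outside the loop.  */
size_t
update_all_exit_uses (struct exit_use *uses, size_t n, int new_value)
{
  size_t updated = 0;
  for (size_t i = 0; i < n; i++)
    if (!uses[i].inside_loop)
      {
        uses[i].value = new_value;
        updated++;
      }
  return updated;
}
```

With one latch use and two exit uses (the analogue of s_4 and s_10 above), the first function leaves one exit use pointing at the stale scalar value.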


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-04 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
   |dot org |
 Status|NEW |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33301] New: wrong vectorization factor due to an invariant type-promotion in the loop

2007-09-04 Thread dorit at gcc dot gnu dot org
 (operation, 0));
+ if (TYPE_SIZE_UNIT (rhs_type) < TYPE_SIZE_UNIT (scalar_type))
+   scalar_type = TREE_TYPE (TREE_OPERAND (operation, 0));
+   }
+
  if (vect_print_dump_info (REPORT_DETAILS))
{
  fprintf (vect_dump, "get vectype for scalar type:  ");


-- 
   Summary: wrong vectorization factor due to an invariant type-
promotion in the loop
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: dorit at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: i386-linux
  GCC host triplet: i386-linux
GCC target triplet: i386-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-04 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-09-04 19:11 ---
I'm testing this patch:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c   (revision 128037)
--- tree-vect-transform.c   (working copy)
*** vect_create_epilog_for_reduction (tree v
*** 1964,1969 
--- 1964,1971 
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
bool nested_in_vect_loop = false;
int op_type;
+   VEC(tree,heap) *phis = NULL;
+   int i;

if (nested_in_vect_loop_p (loop, stmt))
  {
*** vect_finalize_reduction:
*** 2260,2270 
epilog_stmt = build_gimple_modify_stmt (new_dest, expr);
new_temp = make_ssa_name (new_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
- #if 0
-   bsi_insert_after (exit_bsi, epilog_stmt, BSI_NEW_STMT);
- #else
bsi_insert_before (exit_bsi, epilog_stmt, BSI_SAME_STMT);
- #endif
  }


--- 2262,2268 
*** vect_finalize_reduction:
*** 2274,2318 
   Find the loop-closed-use at the loop exit of the original scalar result.
   (The reduction result is expected to have two immediate uses - one at
the
   latch block, and one at the loop exit).  */
!   exit_phi = NULL;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
  {
if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p))))
{
  exit_phi = USE_STMT (use_p);
! break;
}
  }
/* We expect to have found an exit_phi because of loop-closed-ssa form.  */
!   gcc_assert (exit_phi);

!   if (nested_in_vect_loop)
  {
!   stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);

!   /* FORNOW. Currently not supporting the case that an inner-loop
reduction
!is not used in the outer-loop (but only outside the outer-loop).  */
!   gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) &&
!  !STMT_VINFO_LIVE_P (stmt_vinfo));
! 
!   epilog_stmt = adjustment_def ? epilog_stmt :  new_phi;
!   STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt;
!   set_stmt_info (get_stmt_ann (epilog_stmt),
!  new_stmt_vec_info (epilog_stmt, loop_vinfo));

!   if (vect_print_dump_info (REPORT_DETAILS))
! {
!   fprintf (vect_dump, "vector of partial results after inner-loop:");
!   print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM);
! }
!   return;
  }
- 
-   /* Replace the uses:  */
-   orig_name = PHI_RESULT (exit_phi);
-   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
- FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-   SET_USE (use_p, new_temp);
  } 


--- 2272,2313 
   Find the loop-closed-use at the loop exit of the original scalar result.
   (The reduction result is expected to have two immediate uses - one at
the 
   latch block, and one at the loop exit).  */
!   phis = VEC_alloc (tree, heap, 10);
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
  {
if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p))))
{
  exit_phi = USE_STMT (use_p);
! VEC_quick_push (tree, phis, exit_phi);
}
  }
/* We expect to have found an exit_phi because of loop-closed-ssa form.  */
!   gcc_assert (!VEC_empty (tree, phis));

!   for (i = 0; VEC_iterate (tree, phis, i, exit_phi); i++)
  {
!   if (nested_in_vect_loop)
!   {
! stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);

! /* FORNOW. Currently not supporting the case that an inner-loop
reduction
!is not used in the outer-loop (but only outside the outer-loop). 
*/
! gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) &&
!  !STMT_VINFO_LIVE_P (stmt_vinfo));
!
! epilog_stmt = adjustment_def ? epilog_stmt :  new_phi;
! STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt;
! set_stmt_info (get_stmt_ann (epilog_stmt),
! new_stmt_vec_info (epilog_stmt, loop_vinfo));
! continue;
!   }

!   /* Replace the uses:  */
!   orig_name = PHI_RESULT (exit_phi);
!   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
!   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
! SET_USE (use_p, new_temp);
  }
  } 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize

2007-09-04 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-09-04 19:14 ---
(by the way, fast-math should not be required here, but that's a different
bug... will fix that soonish)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299



[Bug tree-optimization/33245] Missed opportunities for vectorization due to invariant condition

2007-08-31 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-08-31 13:39 ---
(In reply to comment #0)
> The innermost loop in j cannot be vectorized because of the
> irregular code in that loop, i.e. the condition IF ( l.NE.k ).  But
> the cond expression is invariant in that loop, so the whole condition
> can be hoisted outside that loop, versioning the loop, and potentially
> allowing the vectorization of the innermost loop.

if you use -O3 the condition *is* taken out of the loop by loop-unswitch (at
least that's what I see with revision 127623).

>       SUBROUTINE DGEFA(A,Lda,N,Ipvt,Info)
>       INTEGER Lda , N , Ipvt(*) , Info
>       DOUBLE PRECISION A(Lda,*)
>       DOUBLE PRECISION t
>       INTEGER IDAMAX , j , k , kp1 , l , nm1
>       Info = 0
>       nm1 = N - 1
>       IF ( nm1.GE.1 ) THEN
>          DO k = 1 , nm1
>             kp1 = k + 1
>             l = IDAMAX(N-k+1,A(k,k),1) + k - 1
>             Ipvt(k) = l
>             IF ( A(l,k).EQ.0.0D0 ) THEN
>                Info = k
>             ELSE
>                IF ( l.NE.k ) THEN
>                   t = A(l,k)
>                   A(l,k) = A(k,k)
>                   A(k,k) = t
>                ENDIF
>                t = -1.0D0/A(k,k)
>                CALL DSCAL(N-k,t,A(k+1,k),1)
>                DO j = kp1 , N
>                   t = A(l,j)
>                   IF ( l.NE.k ) THEN
>                      A(l,j) = A(k,j)
>                      A(k,j) = t
>                   ENDIF
>                   CALL DAXPY(N-k,t,A(k+1,k),1,A(k+1,j),1)
>                ENDDO
>             ENDIF
>          ENDDO
>       ENDIF
>       Ipvt(N) = N
>       IF ( A(N,N).EQ.0.0D0 ) Info = N
>       CONTINUE
>       END
> The result of the vectorizer on this testcase is:
> /home/seb/ex/linpk.f90:24: note: not vectorized: too many BBs in loop.
> /home/seb/ex/linpk.f90:24: note: bad loop form.
> /home/seb/ex/linpk.f90:1: note: vectorized 0 loops in function.
> Okay, if I'm versioning that loop by hand, I get the same error due to
> the PRE as for capacita.f90: the PRE inserts in the loop-latch block
> some code:
>   <bb 11>:
>   # VUSE <PARM_NOALIAS.16_252> { PARM_NOALIAS.16 }
>   pretmp.47_297 = *n_13(D);
>   goto <bb 10>;
Looks like -fno-tree-pre is not enough, because if PRE doesn't do it, then sink
does it. When I use -O3 -ftree-vectorize -msse2 -fno-tree-pre -fno-tree-sink
I get the dataref problem you report below, without manual modifications to the
code

> And with PRE disabled, the fail occurs in the data ref analysis:
> ./linpk_corrected.f90:26: note: not vectorized: data ref analysis failed
> t.8_70
> = (*a_25(D))[D.1406_69]
> ./linpk_corrected.f90:26: note: bad data references.

Just for the record, this is the dataref problem that the dataref analyzer
reports:

Creating dr for t
analyze_innermost: (analyze_scalar_evolution
  (loop_nb = 3)
  (scalar = t)
(get_scalar_evolution
  (scalar = t)
  (scalar_evolution = ))
)
success.
base_address: t
offset from base address: 0
constant offset from base address: 0
step: 0
aligned to: 128
base_object: t
symbol tag: t
FAILED as dr address is invariant


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33245



[Bug tree-optimization/33246] Missed opportunities for vectorization due to data ref analysis

2007-08-31 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-08-31 13:57 ---
...
> This is due to data ref analysis problems:
> ./fatigue.f90:14: note: not vectorized: data ref analysis failed
> (*stress_tensor.0_16)[D.1508_168] = D.1513_173
> ./fatigue.f90:14: note: bad data references.
> and
> ./fatigue.f90:14: note: not vectorized: data ref analysis failed D.1489_133 =
> (*strain_tensor.0_41)[D.1488_132]
> ./fatigue.f90:14: note: bad data references.

The data-ref analyzer reports:
   failed: evolution of offset is not affine.

As a result, the DR fields that represent the access relative to the inner-most
loop are almost all empty:

base_address:
offset from base address:
constant offset from base address:
step:
aligned to:
base_object: (*(real8[0:D.1433] *) D.1437_15)[0]
symbol tag: SMT.79

However note that the DR fields relative to the outer-loop are computable:

outer base_address: A.23
outer offset from base address: 0
outer constant offset from base address: 0
outer step: 24
outer aligned to: 128

If the data-ref analyzer can return the expression for the evolution in the
inner-loop, instead of failing, we would at least have a chance to do
outer-loop vectorization. 

This is a duplicate of PR33113.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33246



[Bug tree-optimization/33245] Missed opportunities for vectorization due to invariant condition

2007-08-31 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-08-31 14:18 ---
(In reply to comment #2)
> Subject: Re:  Missed opportunities for vectorization due to invariant
> condition
> > Looks like -fno-tree-pre is not enough, because if PRE doesn't do it, then
> > sink
> > does it. When I use -O3 -ftree-vectorize -msse2 -fno-tree-pre
> > -fno-tree-sink
> > I get the dataref problem you report below, without manual modifications to
> > the
> > code
>
> Apparently this is sink is triggered on -O3, Daniel also warned yesterday
> about the fact that it's not PRE specific.
> Actually, can't we move that code back in the loop body when
> the vectorizer detects that code in the latch bb?

here's a related discussion from a couple years ago:
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02045.html

(and also a somewhat related PR - PR28643) 

> I'm thinking that it is not really difficult to consider these scalars
> as arrays with a single element, and then just pass these to the rest
> of data deps.  I'll try to figure out a patch for this problem that would
> bring us more vectorized cases.

great.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33245



[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)

2007-08-30 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-08-30 08:12 ---
(In reply to comment #2)
> I suspect this might be due to not updating the rd information after
> unrolling.
> Can you check if
> analyze_insns_in_loop() (which calls df_analyze()) is being called just before
> the problematic unrolling ?

It looks like it's called just before the unroller actually transforms
something, but not before the (failing) analysis. But when I add a call to it in
decide_peel_completely the analysis still fails.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224



[Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type

2007-08-30 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-08-30 10:12 ---
> There are two time consuming routines in air.f90 of the Polyhedron
> benchmark that are not vectorized: lines 1328 and 1354.  These appear
> in the top counting of execution time with oprofile:
>
>       SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M)
>       IMPLICIT REAL*8(A-H,O-Z)
>       PARAMETER (NX=150,NY=150)
>       DIMENSION D(NY,33) , U(NX,NY) , Uy(NX,NY) , Al(30) , Np(30)
>       DO jm = 1 , M
>          jmax = 0
>          jmin = 1
>          DO i = 1 , Nd
>             jmax = jmax + Np(i) + 1
>             DO j = jmin , jmax
>                uyt = 0.
>                DO k = 0 , Np(i)
>                   uyt = uyt + D(j,k+1)*U(jm,jmin+k)
>                ENDDO
>                Uy(jm,j) = uyt*Al(i)
>             ENDDO
>             jmin = jmin + Np(i) + 1
>          ENDDO
>       ENDDO
>       CONTINUE
>       END
>
> ./poly_air_1354.f90:12: note: def_stmt: uyt_1 = PHI <0.0(9), uyt_42(11)>
> ./poly_air_1354.f90:12: note: Unsupported pattern.
> ./poly_air_1354.f90:12: note: not vectorized: unsupported use in stmt.
> ./poly_air_1354.f90:12: note: unexpected pattern.
> ./poly_air_1354.f90:1: note: vectorized 0 loops in function.
>
> This is due to an unsupported type, real_type, for the reduction variable uyt:
> (this is on an i686-linux machine)

There is no unhandled real_type problem, you just need to use -ffast-math to
allow vectorization of summation of fp types (or the new reassociation flag):
pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI <0.0(9), uyt_42(11)>
pr33243b.f90:12: note: reduction: unsafe fp math optimization: D.1386_41 +
uyt_1
pr33243b.f90:12: note: Unknown def-use cycle pattern.

If you use -ffast-math the reduction is detected:
pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI <0.0(9), uyt_42(11)>
pr33243b.f90:12: note: detected reduction:D.1386_41 + uyt_1
pr33243b.f90:12: note: Detected reduction.
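
The reason a flag is needed at all is that vectorizing a summation reorders the additions, and fp addition is not associative. A small sketch of the effect, assuming IEEE double arithmetic; the function names are made up, and the two-accumulator version only imitates what a vectorization factor of 2 would do:

```c
/* Sum strictly left to right, as the scalar loop does.  The volatile
   accumulator forces each partial sum to be rounded to double.  */
double
sum_in_order (const double *x, int n)
{
  volatile double s = 0.0;
  for (int i = 0; i < n; i++)
    s = s + x[i];
  return s;
}

/* Sum with two interleaved partial accumulators that are combined at
   the end, imitating a vectorized reduction with two lanes.  */
double
sum_two_lanes (const double *x, int n)
{
  volatile double s0 = 0.0, s1 = 0.0;
  int i;
  for (i = 0; i + 1 < n; i += 2)
    {
      s0 = s0 + x[i];
      s1 = s1 + x[i + 1];
    }
  if (i < n)
    s0 = s0 + x[i];
  return s0 + s1;
}
```

For the input { 1e30, 1.0, -1e30, 1.0 } the in-order sum is 1.0 (the first 1.0 is absorbed by 1e30), whereas the two-lane sum is 2.0, so the compiler may only make this transformation when told that such differences are acceptable.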

However, the loop will still not get vectorized because there is a
non-consecutive access in the loop: 
pr33243b.f90:12: note: === vect_analyze_data_ref_accesses ===
pr33243b.f90:12: note: not consecutive access
pr33243b.f90:12: note: not vectorized: complicated access pattern.

This is because the stride of the accesses to D(j,k+1) and U(jm,jmin+k) in the
inner-loop (k-loop) between inner-loop iterations is 1200B: 

DO j = jmin , jmax
   uyt = 0.
   DO k = 0 , NP(i)
  uyt = uyt + D(j,k+1)*U(jm,jmin+k)
   ENDDO
   Uy(jm,j) = uyt*Al(i)
ENDDO
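
The 1200B figure follows from the column-major layout: both D(NY,33) and U(NX,NY) have a leading dimension of 150, and REAL*8 elements are 8 bytes, so stepping the second index by one moves 150 * 8 bytes. A small sketch of that arithmetic (the helper names are made up for illustration):

```c
#include <stddef.h>

/* Byte offset of A(i,j) in a column-major array with `rows` rows,
   using 1-based Fortran-style indices.  */
size_t
colmajor_offset_bytes (size_t rows, size_t elem_size, size_t i, size_t j)
{
  return ((j - 1) * rows + (i - 1)) * elem_size;
}

/* Byte distance between A(i,j) and A(i,j+1).  */
size_t
column_step_bytes (size_t rows, size_t elem_size)
{
  return rows * elem_size;
}
```

Here `column_step_bytes (150, 8)` gives 1200, while stepping the first index, as the j-loop does for D(j,k+1), moves only 8 bytes.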

In the outer-loop (j-loop) these accesses are consecutive, and also you don't
need to use the -ffast-math flag. However there are other problems: 
1) the compiler creates a guard to control whether to enter the inner-loop or
not (because it may execute 0 times). This creates a more involved control-flow
than the outer-loop vectorizer is willing to work with. A solution would be to
create this guard outside the outer-loop (in case it is invariant, as is the
case here), which is like versioning the loop (or unswitching the loop).
2) if you change the loop count to something constant (just to bypass the above
problem), then indeed no guard code is generated, but there is a computation
(advancing an iv) in the latch block of the outer-loop (so it is not empty, and
we are not willing to work with such loops). We need to clean that away.
3) After these problems are solved, we still need to deal with a
non-consecutive access in the outer-loop - the store to Uy(jm,j). AFAICS, this
requires either transposing the Uy array in advance, or teaching the vectorizer
to scatter the results to the non-adjacent locations (which would be quite
expensive, but we could give it a try).

Alternatively, vectorizing the inner-loop would require transposing the D and U
matrices.

Another option is to interchange the jm loop with the j loop - I think this way
all accesses would be consecutive, and we could vectorize the jm loop (which
would now be a doubly-nested loop that the outer-loop vectorizer could handle).
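
A rough C rendering of that interchange, as a sketch only: the array bounds are shrunk to small hypothetical values, the macros emulate Fortran's column-major 1-based indexing, and all function names are made up. With jm moved inside the j loop, U(jm,...) and Uy(jm,j) are unit-stride in jm and D(j,k+1) is invariant in jm.

```c
enum { NX = 8, NY = 8, NK = 4 };   /* small hypothetical array bounds */

/* Column-major, 1-based accessors mimicking the Fortran arrays.  */
#define D_(j, k)    d[((k) - 1) * NY + ((j) - 1)]
#define U_(jm, j)   u[((j) - 1) * NX + ((jm) - 1)]
#define UY_(jm, j)  uy[((j) - 1) * NX + ((jm) - 1)]

/* Original nest: jm is the outermost loop.  */
void
derivy_original (const double *d, const double *u, double *uy,
                 const double *al, const int *np, int nd, int m)
{
  for (int jm = 1; jm <= m; jm++)
    {
      int jmax = 0, jmin = 1;
      for (int i = 1; i <= nd; i++)
        {
          jmax += np[i - 1] + 1;
          for (int j = jmin; j <= jmax; j++)
            {
              double uyt = 0.0;
              for (int k = 0; k <= np[i - 1]; k++)
                uyt += D_(j, k + 1) * U_(jm, jmin + k);
              UY_(jm, j) = uyt * al[i - 1];
            }
          jmin += np[i - 1] + 1;
        }
    }
}

/* Interchanged nest: the jm loop now runs inside the j loop.  */
void
derivy_interchanged (const double *d, const double *u, double *uy,
                     const double *al, const int *np, int nd, int m)
{
  int jmax = 0, jmin = 1;
  for (int i = 1; i <= nd; i++)
    {
      jmax += np[i - 1] + 1;
      for (int j = jmin; j <= jmax; j++)
        for (int jm = 1; jm <= m; jm++)
          {
            double uyt = 0.0;
            for (int k = 0; k <= np[i - 1]; k++)
              uyt += D_(j, k + 1) * U_(jm, jmin + k);
            UY_(jm, j) = uyt * al[i - 1];
          }
      jmin += np[i - 1] + 1;
    }
}

/* Runs both nests on small integer-valued data; returns 1 if they agree.  */
int
derivy_orders_agree (void)
{
  double d[NY * NK], u[NX * NY], uy1[NX * NY] = { 0 }, uy2[NX * NY] = { 0 };
  const double al[2] = { 2.0, 3.0 };
  const int np[2] = { 1, 2 };

  for (int i = 0; i < NY * NK; i++)
    d[i] = (double) (i % 5 + 1);
  for (int i = 0; i < NX * NY; i++)
    u[i] = (double) (i % 7 - 3);

  derivy_original (d, u, uy1, al, np, 2, NX);
  derivy_interchanged (d, u, uy2, al, np, 2, NX);

  for (int i = 0; i < NX * NY; i++)
    if (uy1[i] != uy2[i])
      return 0;
  return 1;
}
```

The interchange does not change the order of the additions for any single Uy(jm,j), so the results are identical without any fast-math flag.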

So, the PR for this testcase would be better classified under one of the above
problems/missed-optimizations rather than unhandled real_type. 

 
 Another similar routine that also appears in the top ranked and not
 vectorized due to the same unsupported real_type reasons is in air.f90:1181
 
 
   SUBROUTINE FVSPLTX2
   IMPLICIT REAL*8(A-H,O-Z)
   PARAMETER (NX=150,NY=150)
   DIMENSION DX(NX,33) , ALX(30) , NPX(30)
   DIMENSION FP1(NX,NY) , FM1(NX,NY) , FP1x(30,NX) , FM1x(30,NX)
   DIMENSION FP2(NX,NY) , FM2(NX,NY) , FP2x(30,NX) , FM2x(30,NX)
   DIMENSION FP3(NX,NY) , FM3(NX,NY) , FP3x(30,NX) , FM3x(30,NX)
   DIMENSION FP4(NX,NY) , FM4(NX,NY) , FP4x(30,NX) , FM4x(30,NX)
   DIMENSION FV2(NX,NY) , DXP2(30,NX) , DXM2(30,NX)
   DIMENSION FV3(NX,NY) , DXP3(30,NX) , DXM3(30,NX)
   DIMENSION FV4(NX,NY) , DXP4(30,NX) , DXM4(30,NX)
   COMMON /XD1   / FP1 , FM1 , FP2 , FM2 , FP3 , FM3 , FP4 , FM4

[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)

2007-08-30 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-08-30 16:29 ---
> dorit,
> i am having trouble exactly reproducing this example because you did not
> give the svn revision and so all of the numbers are a little bit
> different.

it's revision 127623

> However, I am going to submit a patch which improves the dump
> information a lot for these passes and we should talk about it after we
> can get on the same page.

I applied your patch, and I'll send you the dump shortly.

> However, from looking at your posting, there are some issues that you
> may want to look at before we talk:
> The reaching defs problem makes a scan for all of the defs in the blocks
> in the region.  Once all of the defs are found, they are sorted where
> the primary key is the regno.
> The id's (DF_REF_ID) are then assigned based on this sorting.  The
> reaching defs problem actually depends on all of the defs for a regno to
> be contiguous.
> The DF_REF_IDs are not stable between calls to df_set_blocks and any def
> outside of the region has an undefined DF_REF_ID.
> In your posting you have:
> > Below is the output of df_ref_debug for adef in each iteration of the loop
> > in
> > latch_dominating_def:
> > d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0)
> > chain { }
> > d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain
> > { }
> The number after the first d is the DF_REF_ID.  Note that they are not
> contiguous.
> Given the sorting that occurred, they must be contiguous.  I assume from this
> that
> someone is holding on to old id's.  This is not correct.
> If you are going to play the game with df_set_blocks, you are allowed to hold
> onto a
> def, but not the DF_REF_ID, you cannot look at the DF_REF_ID for a def
> that is not in the blocks set by df_set_blocks.

are you saying it's safer not to call df_set_blocks in iv_analysis_loop_init?
(iv-analysis still fails when I do that, but maybe that in turn requires other
changes?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224



[Bug rtl-optimization/33222] New: failing rtl iv analysis (maybe due to df)

2007-08-29 Thread dorit at gcc dot gnu dot org
In the testcase below, after the inner-loop gets completely unrolled, the
enclosing i-loop does not get unrolled because of failure to analyze the loop
iv, possibly due to a bug in df:

#define N 40
#define M 10
float in[N+M], coeff[M], out[N];
void fir (){
  int i,j,k;
  float diff;
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }
}

Compiler options used:
/Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops
vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param
max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize

(without -ftree-vectorize the i-loop does get unrolled).

Detailed description and discussion here: 
http://gcc.gnu.org/ml/gcc/2007-08/msg00482.html

Here are the relevant pieces from the RTL dump (at loop3_unroll):

bb2:
(insn 40 39 41 2 vect-outer-fir2-kernel.c:38 (set (reg:DI 187 [ ivtmp.59 ])
(mem/u/c:DI (plus:DI (reg:DI 2 2)
(const:DI (minus:DI (symbol_ref/u:DI (*.LC4) [flags 0x2])
(symbol_ref:DI (*.LCTOC1) [7 S8 A8])) 344
{*movdi_internal64} (expr_list:REG_EQUAL (symbol_ref:DI (fir_out) [flags
0x80] var_decl 0xf7d571c0 fir_out)
(nil)))

...
(insn 289 288 68 2 (set (reg/f:DI 319)
(plus:DI (reg:DI 187 [ ivtmp.59 ])
(const_int 160 [0xa0]))) 80 {*adddi3_internal1} (expr_list:REG_DEAD
(reg:DI 2 2)
(expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (fir_out)
[flags 0x80] var_decl 0xf7d571c0 fir_out)
(const_int 160 [0xa0])))
(nil
...

loop:
bb3 (loop-header):
...
(insn 255 254 256 3 vect-outer-fir2-kernel.c:47 (set (reg:DI 187 [ ivtmp.59 ])
(plus:DI (reg:DI 187 [ ivtmp.59 ])
(const_int 16 [0x10]))) 80 {*adddi3_internal1} (nil))
...
(insn 265 263 266 3 vect-outer-fir2-kernel.c:47 (set (reg:CC 316)
(compare:CC (reg:DI 187 [ ivtmp.59 ])
(reg/f:DI 319))) 459 {*cmpdi_internal1} (expr_list:REG_EQUAL
(compare:CC (reg:DI 187 [ ivtmp.59 ])
(const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80]
var_decl 0xf7d571c0 fir_out)
(const_int 160 [0xa0]
(nil)))


Below is the output of df_ref_debug for adef in each iteration of the loop in
latch_dominating_def:

d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain {
}
d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { }

For both the bitmap is set.


-- 
   Summary: failing rtl iv analysis (maybe due to df)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33222



[Bug rtl-optimization/33224] New: failing rtl iv analysis (maybe due to df)

2007-08-29 Thread dorit at gcc dot gnu dot org
In the testcase below, after the inner-loop gets completely unrolled, the
enclosing i-loop does not get unrolled because of failure to analyze the loop
iv, possibly due to a bug in df:

#define N 40
#define M 10
float in[N+M], coeff[M], out[N];
void fir (){
  int i,j,k;
  float diff;
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }
}

Compiler options used:
/Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops
vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param
max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize

(without -ftree-vectorize the i-loop does get unrolled).

Detailed description and discussion here: 
http://gcc.gnu.org/ml/gcc/2007-08/msg00482.html

Here are the relevant pieces from the RTL dump (at loop3_unroll):

bb2:
(insn 40 39 41 2 vect-outer-fir2-kernel.c:38 (set (reg:DI 187 [ ivtmp.59 ])
(mem/u/c:DI (plus:DI (reg:DI 2 2)
(const:DI (minus:DI (symbol_ref/u:DI (*.LC4) [flags 0x2])
(symbol_ref:DI (*.LCTOC1) [7 S8 A8])) 344
{*movdi_internal64} (expr_list:REG_EQUAL (symbol_ref:DI (fir_out) [flags
0x80] var_decl 0xf7d571c0 fir_out)
(nil)))

...
(insn 289 288 68 2 (set (reg/f:DI 319)
(plus:DI (reg:DI 187 [ ivtmp.59 ])
(const_int 160 [0xa0]))) 80 {*adddi3_internal1} (expr_list:REG_DEAD
(reg:DI 2 2)
(expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (fir_out)
[flags 0x80] var_decl 0xf7d571c0 fir_out)
(const_int 160 [0xa0])))
(nil
...

loop:
bb3 (loop-header):
...
(insn 255 254 256 3 vect-outer-fir2-kernel.c:47 (set (reg:DI 187 [ ivtmp.59 ])
(plus:DI (reg:DI 187 [ ivtmp.59 ])
(const_int 16 [0x10]))) 80 {*adddi3_internal1} (nil))
...
(insn 265 263 266 3 vect-outer-fir2-kernel.c:47 (set (reg:CC 316)
(compare:CC (reg:DI 187 [ ivtmp.59 ])
(reg/f:DI 319))) 459 {*cmpdi_internal1} (expr_list:REG_EQUAL
(compare:CC (reg:DI 187 [ ivtmp.59 ])
(const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80]
var_decl 0xf7d571c0 fir_out)
(const_int 160 [0xa0]
(nil)))


Below is the output of df_ref_debug for adef in each iteration of the loop in
latch_dominating_def:

d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain {
}
d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { }

For both the bitmap is set.


-- 
   Summary: failing rtl iv analysis (maybe due to df)
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224



[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)

2007-08-29 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-08-29 09:04 ---
> In the testcase below, after the inner-loop gets completely unrolled, the
> enclosing i-loop does not get unrolled because of failure to analyze the loop
> iv, possibly due to a bug in df:
...
> Compiler options used:
> /Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops
> vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param
> max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize
> (without -ftree-vectorize the i-loop does get unrolled).

(It could of course be a result of something the vectorizer does; for example,
maybe the vectorizer is not updating the dominance information correctly. But
I'd think most such information would be recomputed and verified between
vectorization and RTL unrolling? Anyhow, verify_dominance seems to pass.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224



[Bug rtl-optimization/33222] failing rtl iv analysis (maybe due to df)

2007-08-29 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-08-29 09:08 ---
I accidentally entered this bug twice. I'm closing this one, and will use
PR33224 instead.


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33222



[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)

2007-08-29 Thread dorit at gcc dot gnu dot org


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |dorit at gcc dot gnu dot org
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2007-08-29 09:13:05
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224



[Bug target/28629] [4.1] Segfault with --march=pentium-m -O2 when compiling faac

2007-08-26 Thread dorit at gcc dot gnu dot org


--- Comment #10 from dorit at gcc dot gnu dot org  2007-08-26 07:49 ---
(In reply to comment #9)
> I've confirmed that the problem is caused by '-ftree-vectorize' passed to
> compile gcc. More precisely, a 'movdqa' instruction in constraint_operands()
> accessed an unaligned memory.

since this is reported to work on 4.2 and 4.3, I wonder if it's related to the
fix for PR25413 (which was committed to 4.2 and 4.3).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28629



[Bug tree-optimization/33113] New: Failing to represent the stride of a dataref when it is not a constant

2007-08-19 Thread dorit at gcc dot gnu dot org
In the following testcase:

subroutine sub(aa,bb,n,m)
  implicit none
  integer, intent(in) :: n,m
  real, intent(inout) :: aa(n,m)
  real, intent(in)    :: bb(n,m)
  integer :: i,j
  do i = 1,m
    do j = 2,n
      aa(i,j) = aa(i,j-1)+bb(i,j-1)
    enddo
  enddo
end subroutine
end

The stride of the accesses in the inner-loop is a parameter (m is not a
compile-time known constant). As a result the dataref analyzer reports:

failed: evolution of offset is not affine
...
base_address:
offset from base address:
constant offset from base address:
step:
aligned to:
base_object: (*aa_54(D))[0]
symbol tag: SMT.25


Any chance that the dataref analysis can return an (invariant) expression in
step, so that further analysis could continue? (for example, the access in
the outer-loop is consecutive, so if we had an expression to represent the
inner-loop stride, we could vectorize the outer-loop).
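
As a rough C analogue of the access pattern described above (a sketch only,
not the testcase from this PR; the function name column_recurrence, the
parameter ld and the flat a[i + j * ld] layout are assumptions made for
illustration), the inner loop below steps through memory with a stride that
is loop-invariant but only known at run time, while the outer loop touches
consecutive elements:

```c
#include <stddef.h>

/* Hypothetical analogue of the Fortran loop nest: element (i, j) lives at
   a[i + j * ld], so the inner j-loop has stride ld, which is invariant in
   the loop but not a compile-time constant.  */
void
column_recurrence (float *a, const float *b, size_t ld,
                   size_t rows, size_t cols)
{
  for (size_t i = 0; i < rows; i++)      /* consecutive accesses across i */
    for (size_t j = 1; j < cols; j++)    /* stride-ld accesses across j */
      a[i + j * ld] = a[i + (j - 1) * ld] + b[i + (j - 1) * ld];
}
```

If the step could be represented as the invariant expression ld rather than
being dropped, later analysis would at least have something to work with for
the outer loop.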


-- 
   Summary: Failing to represent the stride of a dataref when it is
not a constant
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113



[Bug tree-optimization/32378] can't determine dependence (distinct sections of an array)

2007-08-19 Thread dorit at gcc dot gnu dot org


--- Comment #6 from dorit at gcc dot gnu dot org  2007-08-19 13:47 ---
 Sebastian - any thoughts/plans?

Here's another testcase:

subroutine sub(aa,bb,n,m)
  implicit none
  integer, intent(in) :: n,m
  real, intent(inout) :: aa(n,m)
  real, intent(in)    :: bb(n,m)
  integer :: i,j
  do j = 2,n
    do i = 1,m
      aa(i,j) = aa(i,j-1)+bb(i,j-1)
    enddo
  enddo
end subroutine
end

Here too we get:

(compute_affine_dependence
  (stmt_a =
D.1385_55 = (*aa_54(D))[D.1384_53])
  (stmt_b =
(*aa_54(D))[D.1380_49] = D.1390_62)
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {pretmp.34_76 + 1, +, 1}_2)
  (chrec_b = {pretmp.34_32 + 1, +, 1}_2)
(analyze_siv_subscript
siv test failed: unimplemented.
)
  (overlap_iterations_a = not known
)
  (overlap_iterations_b = not known
)
)
(dependence classified: scev_not_known)
)
)
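
For reference, when both access functions have the same step and their bases
are known integers, the textbook strong SIV test decides the dependence
directly. The following is a minimal sketch of that test under those
assumptions; the function name and signature are hypothetical and this is not
the code in GCC's data dependence analyzer. In the dump above the two bases
are distinct symbols (pretmp.34_76 and pretmp.34_32), so their difference is
not available as a constant and a test of this shape cannot be applied as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Strong SIV test for accesses A = {base_a, +, step} and
   B = {base_b, +, step} in a loop of `niters` iterations.
   Returns false if the accesses can never touch the same element.
   Otherwise returns true and sets *distance so that the element A touches
   at iteration i is the one B touches at iteration i + *distance.  */
bool
strong_siv_test (long base_a, long base_b, long step, long niters,
                 long *distance)
{
  assert (step != 0);           /* a zero step is outside this sketch */

  long diff = base_a - base_b;
  if (diff % step != 0)
    return false;               /* subscripts never coincide */

  long d = diff / step;
  if (labs (d) >= niters)
    return false;               /* would coincide only outside the loop */

  *distance = d;
  return true;
}
```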


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32378



[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant

2007-08-19 Thread dorit at gcc dot gnu dot org


--- Comment #2 from dorit at gcc dot gnu dot org  2007-08-20 05:55 ---
> Making us return symbolic stride would not be hard.  The problem is that data
> dependence analysis would fail anyway,

sometimes (not in this testcase) there won't be a need for dependence testing
- e.g. a reduction computation where there are no stores, or initialization
with a constant (i.e. a store and no loads), so there's already a value in
doing this.

> since we cannot tell whether n is zero.

can we do the data-dependence analysis conditioned on a maybe_zero (like the
number-of-iterations analysis)? (by the way, I was told that ifort vectorizes
this. I think we'd need loop reversal to vectorize the inner-loop though. on
top of overcoming the unknown-stride issue in the DR and DDR analysis)
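
The two cases mentioned above that need no dependence testing can be pictured
in C as follows. This is an illustrative sketch; the function names and the
`stride` parameter are made up for the example, and nothing here is taken
from a testcase in this PR:

```c
#include <stddef.h>

/* Reduction: loads only, no stores, so there is no pair of references to
   test for dependence even though the stride is a run-time value.  */
float
strided_sum (const float *a, size_t n, size_t stride)
{
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++)
    sum += a[i * stride];
  return sum;
}

/* Initialization with a constant: a store and no loads.  */
void
strided_fill (float *a, size_t n, size_t stride, float value)
{
  for (size_t i = 0; i < n; i++)
    a[i * stride] = value;
}
```

In both loops the only obstacle is describing the access with a symbolic
step, which is why returning an invariant expression for the stride already
has some value on its own.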


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113



[Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)

2007-08-14 Thread dorit at gcc dot gnu dot org


--- Comment #9 from dorit at gcc dot gnu dot org  2007-08-14 20:17 ---
PR32824 discusses a similar issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621



[Bug tree-optimization/32824] Missed reduction vectorizer after store to global is LIM'd

2007-08-14 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-08-14 20:47 ---
Additional testcases:

(1) see the loop in lines 23 and 32 in
http://gcc.gnu.org/ml/gcc-help/2007-08/msg00171.html

(2)
   SUBROUTINE SUSCEP(L,Iz)
   IMPLICIT NONE
   INTEGER L , Iz(L,L) , iznum, ix, iy
   iznum = 0
   DO ix = 1 , L
      DO iy = 1 , L
         iznum = iznum + Iz(iy,ix)
      ENDDO
   ENDDO
   PRINT*, iznum
   END subroutine
   end
 

The above is a slightly modified testcase taken from Polyhedron test suite
(ac.f90).
We get:
b.f90:6: note: Analyze phi: iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4),
iznum_lsm.74_12(6)>
b.f90:6: note: reduction: not commutative/associative: iznum.10_37
tobias2b.f90:6: note: Unknown def-use cycle pattern.
...
b.f90:6: note: worklist: examine stmt: iznum.9_36 = iznum_lsm.74_31
b.f90:6: note: vect_is_simple_use: operand iznum_lsm.74_31
b.f90:6: note: def_stmt: iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4),
iznum_lsm.74_12(6)>
b.f90:6: note: Unsupported pattern.
b.f90:6: note: not vectorized: unsupported use in stmt.
2b.f90:6: note: unexpected pattern.

This happens because we get the following pattern:
  # iznum_lsm.74_31 = PHI <iznum_lsm.74_32(4), iznum_lsm.74_12(6)>
  ...
  iznum.9_36 = iznum_lsm.74_31;
  iznum.10_37 = D.1420_35 + iznum.9_36;
  iznum_lsm.74_12 = iznum.10_37;
  ...
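
In C terms, the kind of source that can lead to this shape looks roughly like
the first function below, which accumulates into a variable that lives in
memory; the second function is the plain local-accumulator form of the same
sum. This is a hypothetical sketch (the names total, sum_into_global and
sum_into_local are invented for the example) and it makes no claim about
which form any particular GCC version vectorizes:

```c
/* Hypothetical accumulator that lives in memory.  */
int total;

/* Accumulates through the global.  Moving the load and store of `total`
   out of the loop leaves a temporary that is copied to and from the
   running sum, similar to the iznum_lsm copies in the dump above.  */
void
sum_into_global (const int *v, int n)
{
  total = 0;
  for (int i = 0; i < n; i++)
    total += v[i];
}

/* The same reduction written with a local accumulator, where the addition
   feeds the loop-carried value directly with no copies in between.  */
int
sum_into_local (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += v[i];
  return sum;
}
```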


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32824



[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC

2007-08-01 Thread dorit at gcc dot gnu dot org


--- Comment #24 from dorit at gcc dot gnu dot org  2007-08-01 10:08 ---
> I do; however, I got stuck with another bootstrap problem at the moment
> (vectorization changes alignment of variables, which causes a
> misscompilation of crtend.o on my machine;

I wonder if this is related to PR32893?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-08-01 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-08-01 11:57 ---
Ryan, I wonder what happens if you force alignment in the source code, like so:

unsigned short count[MAXBITS+1] __attribute__ ((__aligned__(16))) ;

In this case the vectorizer does not change the alignment of the array. I
wonder if the compiler honors the alignment attribute when the user asks for
it, rather than the vectorizer. 
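
A small way to observe whether the attribute is honored for a stack array is
sketched below. The helper names are invented for the example, and the value
of MAXBITS is an assumption made only so the snippet is self-contained (zlib
defines its own). Note that an optimizing compiler may fold the check to a
constant, so inspecting the generated code is still the more reliable test:

```c
#include <stddef.h>
#include <stdint.h>

#define MAXBITS 15   /* assumed for this sketch; zlib defines its own value */

/* Returns nonzero if p is a multiple of `align` bytes.  */
static int
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p % align) == 0;
}

/* Declares the array with the user-requested alignment and reports whether
   the address the compiler chose for it is 16-byte aligned.  */
int
count_is_16_byte_aligned (void)
{
  unsigned short count[MAXBITS + 1] __attribute__ ((__aligned__ (16)));
  count[0] = 0;
  return is_aligned (count, 16);
}
```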


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-08-01 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-08-01 11:36 ---
Also just for the record - the testcase for this PR is here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c14


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC

2007-07-28 Thread dorit at gcc dot gnu dot org


--- Comment #8 from dorit at gcc dot gnu dot org  2007-07-28 19:20 ---
> v0 (and v10 are scratch registers and not saved.

so does it look like a register allocation bug then? 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-07-28 Thread dorit at gcc dot gnu dot org


--- Comment #3 from dorit at gcc dot gnu dot org  2007-07-28 21:03 ---
(In reply to comment #2)
>> Andrew, makes sense to you?
> I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY
> which is why it does not work but I have not looked into it at all.

I see references in the patch to both PREFERRED_STACK_BOUNDARY and
STACK_BOUNDARY. Could you please check which of these needs to be fixed? (cause
I think your fix is the more desirable one). (just for the record, the link to
the patch in question is here: 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c21)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2007-07-25 Thread dorit at gcc dot gnu dot org


--- Comment #17 from dorit at gcc dot gnu dot org  2007-07-25 08:40 ---
This looks like an unrelated problem - the vectorizer does not perform loop
peeling here so it's not an issue of natural alignment. Let's open a separate PR
for this one, unless there's already one open. In the meantime, would you
please try this patch?:

Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c   (revision 126902)
--- tree-vectorizer.c   (working copy)
*************** vect_can_force_dr_alignment_p (tree decl
*** 1527,1533 ****
         PREFERRED_STACK_BOUNDARY is honored by all translation units.
         However, until someone implements forced stack alignment, SSE
         isn't really usable without this.  */
!     return (alignment <= PREFERRED_STACK_BOUNDARY);
  }


--- 1527,1533 ----
         PREFERRED_STACK_BOUNDARY is honored by all translation units.
         However, until someone implements forced stack alignment, SSE
         isn't really usable without this.  */
!     return (alignment <= STACK_BOUNDARY);
  }
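
The effect of swapping the bound can be seen in isolation with a small
stand-alone sketch. The two SKETCH_ constants are stand-in values chosen for
the example, not any target's real settings, and the function below is a
hypothetical illustration rather than the vectorizer's own check:

```c
#include <stdbool.h>

/* Stand-in values, in bits; real targets define their own.  */
#define SKETCH_STACK_BOUNDARY            32
#define SKETCH_PREFERRED_STACK_BOUNDARY 128

/* Returns true if a stack variable may be given `alignment_bits` of
   alignment.  With trust_preferred set, the check relies on the preferred
   boundary being honored everywhere; otherwise it uses the guaranteed one.  */
bool
can_force_stack_alignment (unsigned alignment_bits, bool trust_preferred)
{
  unsigned limit = trust_preferred ? SKETCH_PREFERRED_STACK_BOUNDARY
                                   : SKETCH_STACK_BOUNDARY;
  return alignment_bits <= limit;
}
```

With these stand-in values, a request for 128-bit alignment is accepted under
the preferred bound and rejected under the guaranteed one, so the vectorizer
would no longer assume it can force that alignment on a stack variable.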


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2007-07-25 Thread dorit at gcc dot gnu dot org


--- Comment #18 from dorit at gcc dot gnu dot org  2007-07-25 08:51 ---
Subject: Bug 25413

Author: dorit
Date: Wed Jul 25 08:51:12 2007
New Revision: 126904

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126904
Log:
2007-07-25  Dorit Nuzman  [EMAIL PROTECTED]
Devang Patel  [EMAIL PROTECTED]

PR tree-optimization/25413
* targhooks.c (default_builtin_vector_alignment_reachable): New.
* targhooks.h (default_builtin_vector_alignment_reachable): New.
* tree.h (contains_packed_reference): New.
* expr.c (contains_packed_reference): New.
* tree-vect-analyze.c (vector_alignment_reachable_p): New.
(vect_enhance_data_refs_alignment): Call
vector_alignment_reachable_p.
* target.h (vector_alignment_reachable): New builtin.
* target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New.
* config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New.
(TARGET_VECTOR_ALIGNMENT_REACHABLE): Define.

2007-07-25  Dorit Nuzman  [EMAIL PROTECTED]
Devang Patel  [EMAIL PROTECTED]
Uros Bizjak  [EMAIL PROTECTED]

PR tree-optimization/25413
* lib/target-supports.exp (check_effective_target_vect_aligned_arrays):
New procedure to check if arrays are naturally aligned to the vector
alignment boundary.
* gcc.dg/vect/vect-align-1.c: New.
* gcc.dg/vect/vect-align-2.c: New.
* gcc.dg/vect/pr25413.c: New.
* gcc.dg/vect/pr25413a.c: New.


Added:
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413.c
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413a.c
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-1.c
branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-2.c
Modified:
branches/gcc-4_2-branch/gcc/ChangeLog
branches/gcc-4_2-branch/gcc/config/rs6000/rs6000.c
branches/gcc-4_2-branch/gcc/expr.c
branches/gcc-4_2-branch/gcc/target-def.h
branches/gcc-4_2-branch/gcc/target.h
branches/gcc-4_2-branch/gcc/targhooks.c
branches/gcc-4_2-branch/gcc/targhooks.h
branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
branches/gcc-4_2-branch/gcc/testsuite/lib/target-supports.exp
branches/gcc-4_2-branch/gcc/tree-vect-analyze.c
branches/gcc-4_2-branch/gcc/tree.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2007-07-25 Thread dorit at gcc dot gnu dot org


--- Comment #19 from dorit at gcc dot gnu dot org  2007-07-25 08:52 ---
problem fixed.


-- 

dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE

2007-07-25 Thread dorit at gcc dot gnu dot org


--- Comment #21 from dorit at gcc dot gnu dot org  2007-07-25 11:11 ---
> Of course after my patch for PR 16660, the patch here should be
> changed to just return true always.

In this case, Ryan, could you please also try to see if Andrew's patch
(http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00177.html) fixes the problem?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413



[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize

2007-07-25 Thread dorit at gcc dot gnu dot org


--- Comment #1 from dorit at gcc dot gnu dot org  2007-07-25 20:43 ---
thanks a lot for checking both patches!

> With this patch zlib appears to compile successfully.  The loop is
> vectorized with an alignment of access forced using peeling note and linked
> apps no longer segfault.

I'd like to try to verify if the problem is indeed related to the
STACK_BOUNDARY, or whether this has to do with some weird interplay with the
compilation of some other function, possibly after inlining (i.e. something
like what we had in PR27770). I'm not sure how to suggest to check that...

> I also tested using Andrew's patch from bug #16660 and always returning true
> in vect_can_force_dr_alignment_p but it does not fix this error.

Andrew, makes sense to you?

> Let me know if I can provide any other info that would be useful to you.

thanks, I'll think about it...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893



[Bug bootstrap/31776] Bootstrap fails with error: conflicting types for strsignal

2007-07-24 Thread dorit at gcc dot gnu dot org


--- Comment #8 from dorit at gcc dot gnu dot org  2007-07-24 07:50 ---
Subject: Bug 31776

Author: dorit
Date: Tue Jul 24 07:50:10 2007
New Revision: 126868

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126868
Log:
2007-07-23  Dorit Nuzman  [EMAIL PROTECTED]

merge revision 124373 from trunk:
2007-05-02  Brooks Moses  [EMAIL PROTECTED]

PR bootstrap/31776
* system.h: Remove inclusion of double-int.h
* tree.h: Include double-int.h
* gengtype.c: Likewise
* cfgloop.h: Likewise
* Makefile.in: Adjust dependencies on double-int.h


Modified:
branches/autovect-branch/   (props changed)
branches/autovect-branch/gcc/ChangeLog.autovect
branches/autovect-branch/gcc/Makefile.in
branches/autovect-branch/gcc/cfgloop.h
branches/autovect-branch/gcc/gengtype.c
branches/autovect-branch/gcc/system.h
branches/autovect-branch/gcc/tree.h

Propchange: branches/autovect-branch/
('svnmerge-integrated' modified)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31776



[Bug tree-optimization/32093] BOOT_CFLAGS=-O2 -g -msse2 -ftree-vectorize causes dfp tests to fail

2007-07-24 Thread dorit at gcc dot gnu dot org


--- Comment #4 from dorit at gcc dot gnu dot org  2007-07-24 08:50 ---
> i'm wondering if this could be related to a problem we're seeing with
> segfaults caused by misaligned movdqa instructions in zlib compiled with
> -ftree-vectorize.

A fix for PR25413 was committed to mainline. 
Ryan, could you please check if it solves the zlib miscompilation? 
Andrew, would you please check if it solves the libgcc miscompilation that you
are seeing?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32093



[Bug target/32218] [4.2/4.3 Regression] segfault with -O1 -ftree-vectorize

2007-07-24 Thread dorit at gcc dot gnu dot org


--- Comment #5 from dorit at gcc dot gnu dot org  2007-07-24 08:53 ---
(In reply to comment #4)
> I just tried to reproduce this bug on IA64 Linux (and HP-UX) with ToT sources
> (version 126242) and was not able to.  Can anyone else reproduce this with ToT
> sources?

does the fact that no one has responded yet mean that this failure cannot be
reproduced anymore?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32218


