[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx

2015-02-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167

--- Comment #6 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Ilya Enkovich from comment #4)
> 
> > +  if (TARGET_MPX && BND_REGNO_P (regno))
> 
> No need for TARGET_MPX check, there will be no bnd regs when this flag is
> cleared.

__builtin_apply_args stores all registers that might be used to pass arguments
to a function.  Without the target check it would always try to store bounds
registers, even when there are no instructions available to do that.

[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx

2015-02-24 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
For call arguments we usually first store the bounds passed in bounds tables
and then fill the bounds passed in registers.  But with -fschedule-insns the
order is changed and all hard registers are filled with values before BNDSTX.
This is not a nice schedule because it requires additional spills.  It seems
LRA fails to spill a register when all of them are used to pass args.  This
situation didn't happen before because bounds registers are the first case
where we use all registers to pass args.  Should LRA be able to spill/fill an
initialized hard reg?  Can it be fixed, or should we rather avoid such
scheduling?


[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx

2015-02-24 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167

--- Comment #4 from Ilya Enkovich enkovich.gnu at gmail dot com ---
ix86_function_arg_regno_p doesn't recognize bnd registers as args. Also
avoid_func_arg_motion doesn't work for BNDSTX because it is not a single set.  

This patch works for the reproducer:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 71a5b22..acbe25f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6068,6 +6068,9 @@ ix86_function_arg_regno_p (int regno)
   int i;
   const int *parm_regs;

+  if (TARGET_MPX && BND_REGNO_P (regno))
+    return true;
+
   if (!TARGET_64BIT)
     {
       if (TARGET_MACHO)
@@ -26846,6 +26849,16 @@ avoid_func_arg_motion (rtx_insn *first_arg, rtx_insn *insn)
   rtx set;
   rtx tmp;

+  /* Add anti dependencies for bounds stores.  */
+  if (INSN_P (insn)
+      && GET_CODE (PATTERN (insn)) == PARALLEL
+      && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == UNSPEC
+      && XINT (XVECEXP (PATTERN (insn), 0, 0), 1) == UNSPEC_BNDSTX)
+    {
+      add_dependence (first_arg, insn, REG_DEP_ANTI);
+      return;
+    }
+
   set = single_set (insn);
   if (!set)
     return;


I will run testing for it.


[Bug target/65103] New: [i386] GOTOFF relocation is not propagated into address expression

2015-02-18 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65103

Bug ID: 65103
   Summary: [i386] GOTOFF relocation is not propagated into
address expression
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

In PIC code there are multiple cases where a GOTOFF relocation is put into a
register and then used in an address expression, instead of the relocation
being used directly in the address expression.  Here is an example:

cat test.c
typedef struct S
{
  int a;
  int sum;
  int delta;
} S;

S gs;
int global_opt (int max)
{
  while (gs.sum < max)
    gs.sum += gs.delta;
  return gs.a;
}
gcc test.c -m32 -O2 -fPIE -S
cat test.s
...
        pushl   %esi
        leal    gs@GOTOFF, %esi
        pushl   %ebx
        call    __x86.get_pc_thunk.bx
        addl    $_GLOBAL_OFFSET_TABLE_, %ebx
        movl    12(%esp), %edx
        movl    4(%esi,%ebx), %eax
        cmpl    %eax, %edx
        jle     .L4
        movl    8(%esi,%ebx), %ecx
.L3:
        addl    %ecx, %eax
        cmpl    %eax, %edx
        jg      .L3
        movl    %eax, 4(%esi,%ebx)
.L4:
        movl    gs@GOTOFF(%ebx), %eax
        popl    %ebx
        popl    %esi
        ret

A separate instruction to get gs@GOTOFF is generated at expand.  Later fwprop
propagates this constant only into memory references with zero offset and
leaves the register in use in all others.

Used compiler:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-languages=c,c++,fortran
--disable-bootstrap --prefix=/export/users/ienkovic/ --disable-libsanitizer
Thread model: posix
gcc version 5.0.0 20150217 (experimental) (GCC)


[Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target

2015-02-18 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com ---
For this test I see 'plus' and 'minus' ops have DI mode until RA and get GPR
pairs:

(insn 12 35 13 2 (parallel [
(set (reg:DI 0 ax [orig:98 D.1945 ] [98])
(plus:DI (reg:DI 0 ax [orig:97 D.1945 ] [97])
(reg:DI 2 cx [orig:96 D.1945 ] [96])))
(clobber (reg:CC 17 flags))
]) test.c:4 215 {*adddi3_doubleword}
 (nil))
(insn 13 12 18 2 (parallel [
(set (reg:DI 0 ax [orig:95 D.1945 ] [95])
(minus:DI (reg:DI 0 ax [orig:98 D.1945 ] [98])
(reg/v:DI 4 si [orig:94 z ] [94])))
(clobber (reg:CC 17 flags))
]) test.c:4 259 {*subdi3_doubleword}
 (nil))

'ior' and 'and' use SI mode and subregs starting from expand.


[Bug target/65105] New: [i386] XMM registers are not used for 64bit computations on 32bit target

2015-02-18 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105

Bug ID: 65105
   Summary: [i386] XMM registers are not used for 64bit
computations on 32bit target
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

XMM registers may be used for 64bit operations on 32bit target.  It should make
code faster and free some GPRs.

Here is an example test where GCC doesn't use XMM registers and possible code
with XMM usage:

cat test.c
long long
test1 (long long x, long long y, long long z)
{
  return ((x | z) + (y & z) - z);
}
cat test_xmm.s
.file "test.c"
.text
.globl test1
test1:
movq  4(%esp), %xmm2
movq  20(%esp), %xmm1
movq  12(%esp), %xmm0
por   %xmm1, %xmm2
pand  %xmm1, %xmm0
paddq %xmm0, %xmm2
psubq %xmm1, %xmm2
movd  %xmm2, %eax
psrlq $32, %xmm2
movd  %xmm2, %edx
ret


[Bug target/65044] ICE: SIGSEGV in contains_struct_check with -fsanitize=address -fcheck-pointer-bounds

2015-02-13 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65044

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
ICE occurs due to NULL field attached to a constructor element used for
initialization of internal asan structure.

Overall I don't think we should allow simultaneous usage of Pointer Bounds
Checker and Address Sanitizer.  It was never investigated how they may
conflict.  There should be at least a problem with static objects where each
instrumentation creates static objects to describe existing ones, newly created
objects are then also described by each other etc.

I will prepare a patch to prevent checker usage with sanitizers.


[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register

2015-02-13 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #11 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Vladimir Makarov from comment #10)
> I guess it is easy to check by preventing pic pseudo generation.

The i386 back end doesn't support a fixed PIC register any more.  This test
case demonstrates a performance regression in some EEMBC 1.1 tests caused by
the introduction of the pseudo PIC register.

It is unclear why RA decided to spill the PIC register.  If we look at the
loop's code, we see the PIC register is used in each line of code and seems to
be the most used register.

It also seems weird to me that the code for the first loop becomes much better
(with no PIC reg fills) if we restrict inlining for the other one.  How does
the second loop affect allocation in the first one?


[Bug tree-optimization/65002] [5 Regression] ICE: Segmentation fault

2015-02-11 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65002

--- Comment #6 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Is this actually an ICE on valid code?  'const' attribute seems incorrect here
similar to what we had in PR64353.

The problem comes from SSA inconsistency caused by the wrong attribute. 
Probably just ignore such cases in SRA as was previously proposed for PR64353?

Here is a possible patch (SSA update at fixup_cfg start may be removed then):

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index ad9584e..7f78e68 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -4890,6 +4890,20 @@ some_callers_have_mismatched_arguments_p (struct cgraph_node *node,
   return false;
 }

+/* Return false if all callers have vuse attached to a call statement.  */
+
+static bool
+some_callers_have_no_vuse_p (struct cgraph_node *node,
+                             void *data ATTRIBUTE_UNUSED)
+{
+  struct cgraph_edge *cs;
+  for (cs = node->callers; cs; cs = cs->next_caller)
+    if (!cs->call_stmt || !gimple_vuse (cs->call_stmt))
+      return true;
+
+  return false;
+}
+
 /* Convert all callers of NODE.  */

 static bool
@@ -5116,6 +5130,15 @@ ipa_early_sra (void)
       goto simple_out;
     }

+  if (node->call_for_symbol_thunks_and_aliases
+      (some_callers_have_no_vuse_p, NULL, true))
+    {
+      if (dump_file)
+        fprintf (dump_file, "There are callers with no VUSE attached "
+                 "to a call stmt.\n");
+      goto simple_out;
+    }
+
   bb_dereferences = XCNEWVEC (HOST_WIDE_INT,
                                  func_param_count
                                  * last_basic_block_for_fn (cfun));


[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register

2015-02-06 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Jakub Jelinek from comment #4)
> Does #c2 fix this, or is #c3 an unrelated bugreport that still needs fixing?

The problem is still seen after the fix.  I put the test here because it shows
the same symptom.  Should I open a new one?


[Bug rtl-optimization/64960] New: Inefficient address pre-computation in PIC mode

2015-02-06 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64960

Bug ID: 64960
   Summary: Inefficient address pre-computation in PIC mode
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

After EBX was unfixed in the i386 PIC target, we may see that addresses of
static objects are loaded from the GOT and placed on the stack for later use.
This allows the PIC register to be reused for other purposes.  But in cases
where the PIC register is still used (e.g. for calls), it may cause
inefficiency in the produced code.  Here is an example:

cat test.c
void f (int);
int val1, *val2, val3;
int test (int max)
{
  int i;
  for (i = 0; i < max; i++)
    {
      val1 += val2[i];
      f (val3);
    }
}
gcc test.c -O2 -fPIE -S -m32 -ffixed-esi -ffixed-edi -ffixed-edx
cat test.s
...
        movl    val1@GOT(%ebx), %eax      <-- may be removed
        xorl    %ebp, %ebp
        movl    %eax, 4(%esp)             <-- may be removed
        movl    val2@GOT(%ebx), %eax      <-- may be removed
        movl    %eax, 8(%esp)             <-- may be removed
        movl    val3@GOT(%ebx), %eax      <-- may be removed
        movl    %eax, 12(%esp)            <-- may be removed
.L3:
        movl    8(%esp), %eax             <-- equal to movl val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    16(%esp), %eax            <-- equal to movl val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    24(%esp), %eax            <-- equal to movl val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    %ebp, 32(%esp)
        jne     .L3
...

Also, storing the value on the stack does not benefit from the static objects
optimization performed by the linker, which transforms movl symbol@GOTPIC into
an lea instruction.  It would be useful to avoid early address computation in
case the PIC register is available at the point of address use.

Here is the code generated by GCC 4.9:

        xorl    %ebp, %ebp
.L2:
        movl    val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    16(%esp), %ebp
        jne     .L2

Used gcc (GCC) 5.0.0 20150205 (experimental).


[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register

2015-02-05 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #3 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 34675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34675&action=edit
Another reproducer

I found one more reproducer for the problem.  Generated code for multiple
'output' calls inlined into 'test':

...
.L5:
        movl    28(%esp), %edx            <-- load PIC reg
        movl    %edi, (%ecx)
        leal    (%ebx,%eax), %ecx
        cmpl    %ecx, %ebp
        movl    %ebx, out_pos@GOTOFF(%edx)
        movl    val3@GOTOFF(%edx), %edi
        jnb     .L6
        movl    28(%esp), %ebx            <-- have value in EDX
        movl    out@GOTOFF(%ebx), %ebx
        leal    (%ebx,%eax), %ecx
.L6:
        movl    28(%esp), %edx            <-- NOP
        movl    %edi, (%ebx)
        leal    (%ecx,%eax), %ebx
        cmpl    %ebx, %ebp
        movl    %ecx, out_pos@GOTOFF(%edx)
        movl    val4@GOTOFF(%edx), %edi
        jnb     .L7
        movl    28(%esp), %ecx            <-- have value in EDX
        movl    out@GOTOFF(%ecx), %ecx
        leal    (%ecx,%eax), %ebx
.L7:
        movl    28(%esp), %edx            <-- NOP
        movl    %edi, (%ecx)
...

BTW if I put __attribute__((noinline)) on the crc32 function, then the
mentioned code becomes better and we don't have these two useless instructions
in each function instance.

Compilation string:

gcc -Ofast -funroll-loops -m32 -march=slm -fPIE test.i -S

Used compiler:

gcc version 5.0.0 20150203 (experimental) (GCC)


[Bug middle-end/64805] Specific use of __attribute ((always_inline)) breaks MPX functionality with -fcheck-pointer-bounds -mmpx

2015-01-27 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64805

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com ---
This might have been introduced by the recent changes in always_inline
functions instrumentation.  Now we keep them alive longer and therefore get the
original functionA inlined into the original functionB.  This causes an error
in the verifier because the inliner clears all references and then calls cgraph
node verification, which expects an IPA_REF_CHKP reference to the instrumented
functionB.

I would like to keep the IPA_REF_CHKP check in the verifier because this ref is
important for reachability analysis.  Thus we probably should rebuild the
IPA_REF_CHKP reference in the inliner.  I will test this patch:

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index c0ff329..d341619 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2464,6 +2464,13 @@ early_inliner (function *fun)
 #endif
   node->remove_all_references ();

+  /* Rebuild this reference because it doesn't depend on
+ function's body and it's required to pass cgraph_node
+ verification.  */
+  if (node->instrumented_version
+      && !node->instrumentation_clone)
+    node->create_reference (node->instrumented_version, IPA_REF_CHKP, NULL);
+
   /* Even when not optimizing or not inlining inline always-inline
  functions.  */
   inlined = inline_always_inline_functions (node);


[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning array subscript is above array bounds

2015-01-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277

--- Comment #12 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Richard Biener from comment #10)
> Ick - that will also paper over good warnings so I'd rather not do that.

I'm also worried about removing possibly good warnings.  Thus I disable them
only in case cunroll speculates about the number of iterations, and never
disable them for the first loop iteration.

I agree that disabling warnings looks like a workaround.  But it doesn't seem
correct to complain about code generated by the compiler and probably never
executed.  Each time maxiter is used for complete unroll, following
optimizations may improve the maxiter estimation, and thus we get
compiler-generated dead code which still may produce warnings.


[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning array subscript is above array bounds

2015-01-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 34569
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34569&action=edit
patch to disable warnings for array references generated by cunroll


[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning array subscript is above array bounds

2015-01-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277

--- Comment #13 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Ranges have to be used for maxiter computations to have consistent analysis in
complete unroll and vrp.  The following patch allows refining the maxiter
estimation using ranges and avoids the warnings.

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 919f5c0..14cce2a 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,
 {
   tree niter_bound, extreme, delta;
   tree type = TREE_TYPE (base), unsigned_type;
+  tree orig_base = base;

   if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
     return;
@@ -2777,7 +2778,14 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,

   if (tree_int_cst_sign_bit (step))
     {
+      wide_int min, max, highwi = high;
       extreme = fold_convert (unsigned_type, low);
+      if (TREE_CODE (orig_base) == SSA_NAME
+          && !POINTER_TYPE_P (TREE_TYPE (orig_base))
+          && SSA_NAME_RANGE_INFO (orig_base)
+          && get_range_info (orig_base, &min, &max) == VR_RANGE
+          && wi::gts_p (highwi, max))
+        base = wide_int_to_tree (unsigned_type, max);
       if (TREE_CODE (base) != INTEGER_CST)
         base = fold_convert (unsigned_type, high);
       delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme);
@@ -2785,8 +2793,15 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,
     }
   else
     {
+      wide_int min, max, lowwi = low;
       extreme = fold_convert (unsigned_type, high);
-      if (TREE_CODE (base) != INTEGER_CST)
+      if (TREE_CODE (orig_base) == SSA_NAME
+          && !POINTER_TYPE_P (TREE_TYPE (orig_base))
+          && SSA_NAME_RANGE_INFO (orig_base)
+          && get_range_info (orig_base, &min, &max) == VR_RANGE
+          && wi::gts_p (min, lowwi))
+        base = wide_int_to_tree (unsigned_type, min);
+      else if (TREE_CODE (base) != INTEGER_CST)
         base = fold_convert (unsigned_type, low);
       delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base);
     }
diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c
new file mode 100644
index 0000000..0d5ef11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64277.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/64277 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wall -Werror" } */
+
+
+int f1[10];
+void test1 (short a[], short m, unsigned short l)
+{
+  int i = l;
+  for (i = i + 5; i < m; i++)
+    f1[i] = a[i]++;
+}
+
+void test2 (short a[], short m, short l)
+{
+  int i;
+  if (m > 5)
+    m = 5;
+  for (i = m; i > l; i--)
+    f1[i] = a[i]++;
+}


[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning array subscript is above array bounds

2015-01-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277

--- Comment #9 from Ilya Enkovich enkovich.gnu at gmail dot com ---
A nice solution for this problem would be to have a better estimation of the
maximum number of loop iterations.  Currently array size and index step are
used to get the maximum, ignoring the value range of the starting index.

Another way to solve the problem is to disable warnings for code generated by
cunroll in case it cannot compute the exact number of iterations.  I attach a
patch which does that.

This bug is hit multiple times in an Android build with GCC 4.9.  With this fix
we have a clean Android build with GCC 4.9.


[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register

2015-01-23 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722

--- Comment #14 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to David Malcolm from comment #13)
> 
> Ilya: I can't speak to the correctness of the above code or patch, but
> r220044 fixes the original issue I ran into.  Do you want me to keep this
> bug open, or should we track the above in a separate PR?

I think you may close this tracker if your issue is resolved.  The change I
mentioned is a minor fix I would like to have installed, but I'll handle it
separately.


[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning array subscript is above array bounds

2015-01-22 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Here is a reduced test case:

cat test.c
int f1[10];
void foo(short a[], short m, unsigned short l)
{
  int i = l;
  for (i = i + 5; i < m; i++)
    f1[i] = a[i]++;
}
gcc test.c -O3 -c -Wall
test.c: In function 'foo':
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
 f1[i] = a[i]++;
   ^
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]

Here we have complete unroll of the loop by 10 due to the size of f1.  Later
vrp complains about the last five generated iterations, which access memory
above the array bounds.

Used GCC 5.0.


[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register

2015-01-22 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #4 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Jakub Jelinek from comment #3)
> But then wonder if/how target_reinit works for i?86 32-bit.
> Perhaps pic_offset_table_rtx should be cleared in init_emit_regs before
> computing it?
>   pic_offset_table_rtx = NULL_RTX;
>   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
>     pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);

Clearing pic_offset_table_rtx here would mean PIC_OFFSET_TABLE_REGNUM
transforms into EBX and pic_offset_table_rtx is initialized with EBX, which is
not what we want.  Probably we just shouldn't try to initialize
pic_offset_table_rtx with a hard reg in case the target assumes a pseudo PIC
reg?

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index df85366..51ef3a5 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -5872,7 +5872,8 @@ init_emit_regs (void)
 = gen_raw_REG (Pmode, RETURN_ADDRESS_POINTER_REGNUM);
 #endif

-  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
+  if (!targetm.use_pseudo_pic_reg ()
+      && (unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
     pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
   else
 pic_offset_table_rtx = NULL_RTX;


[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register

2015-01-22 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722

--- Comment #11 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to David Malcolm from comment #10)
> which led to investigating this code in ix86_conditional_register_usage:
> 4394  j = PIC_OFFSET_TABLE_REGNUM;
> 4395  if (j != INVALID_REGNUM)
> 4396    fixed_regs[j] = call_used_regs[j] = 1;
> and line 4396 is bizarrely only called on the 2nd iteration, not the 1st,
> which led me to investigate PIC_OFFSET_TABLE_REGNUM, and discover what
> appears to be the root cause, as described in comment #1.

Now I see.  The problem is also in ix86_conditional_register_usage, which
relies on the pic_offset_table_rtx value.  As I said, the EBX value is used
only to estimate costs for the middle-end.  Thus we shouldn't fix the reg here
if a pseudo PIC register is used, and the correct code would be:

@@ -4388,7 +4388,7 @@ ix86_conditional_register_usage (void)

   /* The PIC register, if it exists, is fixed.  */
   j = PIC_OFFSET_TABLE_REGNUM;
-  if (j != INVALID_REGNUM)
+  if (j != INVALID_REGNUM && !ix86_use_pseudo_pic_reg ())
     fixed_regs[j] = call_used_regs[j] = 1;

   /* For 32-bit targets, squash the REX registers.  */


[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register

2015-01-22 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Jakub Jelinek from comment #5)
> Can you explain it?  Usually when this function is called,
> pic_offset_table_rtx is NULL and your i386.h macro relies on that.
> When initializing default target during initialization it is NULL of course,
> and apparently even in target_reinit, when it is called with freshly cleared
> heap object for the non-default target.
> It is just when jit calls the initialization again without clearing all the
> variables...
> So I believe my proposed change is correct.
> 
> In any case, perhaps jit shouldn't reinitialize everything all the time, at
> least if the compilation options don't change.

I misunderstood the places where init_emit_regs is called, and my fix is
incorrect.

It is still unclear to me how this initialization affects generated code.  IIRC
we let pic_offset_table_rtx be EBX only because of the middle-end, which calls
target hooks for code cost estimations.  In this case we needed some valid PIC
reg to generate RTL for its estimation, and EBX was used.  But in target code
pic_offset_table_rtx is initialized with a pseudo register, and the value set
in init_emit_regs shouldn't matter.


[Bug target/64691] New: Suboptimal register allocation for bytes comparison on i386

2015-01-20 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691

Bug ID: 64691
   Summary: Suboptimal register allocation for bytes comparison on
i386
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

This problem was actually found in the 256.bzip2 benchmark compiled by GCC 5.0
at -O2.  There is a small loop with a byte comparison which turned out to be
inefficient because the compared values were not allocated to registers
allowing byte access.  That caused additional copies and, as a result, a
significant loop slowdown.

The situation may be simulated on a small test if we restrict register usage.

cat test.c
void test (unsigned char *p, unsigned char val)
{
  unsigned char tmp1, tmp2;
  int i;

  i = 0;
  tmp1 = p[0];
  while (val != tmp1)
    {
      i++;
      tmp2 = tmp1;
      tmp1 = p[i];
      p[i] = tmp2;
    }
  p[0] = tmp1;
}
gcc -O2 -m32 -ffixed-ebx test.c -S

Here is a loop:

.L3:
        movzbl  (%eax), %ebp
        movl    %esi, %ecx
        movb    %dl, (%eax)
        addl    $1, %eax
        movl    %ebp, %edx
        cmpb    %dl, %cl
        jne     .L3

We have an extra register copy esi->ecx to perform the comparison.

I suppose the easiest way to get better register allocation here would be to
transform the QI comparison into an SI one to relax register constraints.


[Bug target/64363] Unresolved labels with -fcheck-pointer-bounds and -mmpx

2015-01-15 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64363

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
We copy the function to instrument it, but the static var is initialized using
labels from the original function and thus we get unresolved links.  I suppose
we would get the same problem with non-local gotos.

I suppose it would be safer to just not instrument such functions for now and
get back to it at the next stage 1.


[Bug middle-end/64353] [5 Regression] ICE: in execute_todo, at passes.c:1986

2015-01-14 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64353

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Right, the wrong const attribute causes calls to the function to have no VUSE,
which leads to # VUSE <.MEM> being generated for the added loads and requires
an SSA update.

We may actually call update_ssa only in case of a missing VUSE, still allowing
the optimization for functions wrongly marked as const.


[Bug middle-end/64353] [5 Regression] ICE: in execute_todo, at passes.c:1986

2015-01-12 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64353

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com ---
When we process function C::xx, the early_ipa_sra pass performs a modification
of C::i in ipa_modify_call_arguments, adding load statements.  It marks
function C::i as requiring SSA renaming for vops.  Later we start processing
C::i and get an ICE at execute_todo (pass->todo_flags_start) because it expects
update-SSA flags for functions requiring such an update.

Before r217125 it worked because C::i was not in SSA form at the moment of load
insertion.

To fix it we may either call update_ssa from ipa_modify_call_arguments or add
the update into todo_flags_start of fixup_cfg (we run it at the beginning of
all gimple pass lists anyway).

Possible fix (helps for the test, not fully tested):

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 01f4111..533dcfe 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4054,6 +4054,8 @@ ipa_modify_call_arguments (struct cgraph_edge *cs, gcall *stmt,
expr = create_tmp_reg (TREE_TYPE (expr));
  gimple_assign_set_lhs (tem, expr);
  gsi_insert_before (gsi, tem, GSI_SAME_STMT);
+ if (gimple_in_ssa_p (cfun))
+   update_ssa (TODO_update_ssa_only_virtuals);
}
}
  else


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-12-09 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #12 from Ilya Enkovich enkovich.gnu at gmail dot com ---
For r218506 bootstrap with BOOT_CFLAGS="-O2 -g -fcheck-pointer-bounds -mmpx" on
x86_64-unknown-linux-gnu is OK.


[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md

2014-12-05 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003

--- Comment #21 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Jeffrey A. Law from comment #20)
> Ilya, it's the function call in this code I think:
> 
>    (cond [(eq_attr "length_nobnd" "!0")
>             (plus (symbol_ref ("ix86_bnd_prefixed_insn_p (insn)"))
>                   (attr "length_nobnd"))
> 
> You're calling out to ix86_bnd_prefixed_insn_p, and that's problematical for
> branch shortening if I'm understanding Joern's comments here and David's
> comments in the PA port correctly.

Then we have three problematic patterns, and the easiest way to handle it is to
get rid of the ix86_bnd_prefixed_insn_p call in the length computation for
them.  I think the simplest way to do that is to have separate bnd and nobnd
patterns for these instructions.  The attached patch helps me to resolve the
valgrind error.  Is such an approach fine?


[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md

2014-12-05 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003

--- Comment #22 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 34195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34195&action=edit
Proposed patch


[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md

2014-12-05 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003

--- Comment #26 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to rsand...@gcc.gnu.org from comment #25)
 
 If all you want to do is add 1 byte to the length to account for a prefix
 then it might be cleaner to use ADJUST_INSN_LENGTH.  You could then keep
 the single nobnd patterns.

Currently the i386 target doesn't define ADJUST_INSN_LENGTH, so I prefer to
keep it that way and have all length definitions explicit in the md file.


[Bug tree-optimization/64183] New: [5.0 Regression] Complete unroll doesn't happen for a while-loop

2014-12-04 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64183

Bug ID: 64183
   Summary: [5.0 Regression] Complete unroll doesn't happen for a
while-loop
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

Created attachment 34189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34189&action=edit
Reproducer

There is a performance regression in DENMark after r218142.  Regression happens
because complete unroll computes max number of iterations for a while-loop in a
different way.

Reduced reproducer:

int bits;
unsigned int size;
int max_code;

void
test ()
{
 int code = 0;

 while (code < max_code)
   code |= ((unsigned int) (size >> (--bits)));

 while (bits < (unsigned int)25)
   bits += 8;
}

Compilation string:

gcc -std=c90 -m32 -O3 test.c -c -fdump-tree-cunroll-details

Dump before r218142:

Analyzing # of iterations of loop 2
  exit condition [(unsigned int) (prephitmp_33 + 8), + , 8] <= 24
  bounds on difference of bases: -4294967271 ... 24
...
Loop 2 iterates at most 4 times.

Dump after r218142:

Analyzing # of iterations of loop 2
  exit condition [(unsigned int) (prephitmp_36 + 8), + , 8] <= 24
  bounds on difference of bases: -4294967271 ... 24
...
Loop 2 iterates at most 536870911 times.

The while-loop condition is a signed/unsigned comparison, but I believe the
original estimate of 4 iterations is correct.
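
The bound of 4 can be checked by hand with a small sketch.  The helper names
below are hypothetical, and the sketch assumes the second loop of the
reproducer compares with "<" after converting bits to unsigned int, which is
what the "<= 24" exit condition in the dump suggests.

```c
/* Hypothetical helper: runs the second loop of the reproducer starting
   from BITS and returns how many times the body executes.  BITS is
   converted to unsigned int for the comparison, as the usual arithmetic
   conversions do for "bits < (unsigned int) 25".  */
unsigned
count_second_loop (int bits)
{
  unsigned iterations = 0;

  while ((unsigned int) bits < 25u)
    {
      bits += 8;
      iterations++;
    }
  return iterations;
}

/* Largest iteration count over every start value that can enter the
   loop.  Negative values convert to large unsigned numbers and never
   enter it, so only 0..24 need to be tried.  */
unsigned
max_second_loop_iterations (void)
{
  unsigned worst = 0;
  int start;

  for (start = 0; start < 25; start++)
    {
      unsigned n = count_second_loop (start);
      if (n > worst)
        worst = n;
    }
  return worst;
}
```

Starting from 0 the body runs for 0, 8, 16 and 24, and any other start value
in range leaves the loop sooner.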


[Bug tree-optimization/64183] [5.0 Regression] Complete unroll doesn't happen for a while-loop

2014-12-04 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64183

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Richard Biener from comment #1)
 It works correctly for
 
 int bits;
 
 void
 test ()
 {
   while (bits < (unsigned int)25)
 bits += 8;
 }

Right.  But the shift operator in the attached testcase somehow breaks it now
that r218142 adds a conversion to unsigned type for the second shift operand.


[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md

2014-12-04 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003

--- Comment #17 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Jorn Wolfgang Rennecke from comment #13)
  
 AFAICS, the length attribute was broken in r217125
 https://gcc.gnu.org/ml/gcc-cvs/2014-11/msg00133.html

If I understand the problem correctly, its root is the attempt to get the
length of following instructions while computing the length of a forward jump
instruction.  How is r217125 guilty of that?  It doesn't introduce such
computations; it just renames the length attribute to length_nobnd for the
mentioned jump patterns.  Am I missing something here?


[Bug target/64055] [5 regression] gnat.dg/derived_aggregate.adb FAILs on 32-bit i386

2014-11-28 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64055

--- Comment #6 from Ilya Enkovich enkovich.gnu at gmail dot com ---
TREE_INT_CST_LOW (maxval) assumes integer constant anyway.  Therefore we may
use simpler check.  It fixes gnat.dg/derived_aggregate.adb.

diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 0fb78cc..84886da 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -1568,7 +1568,9 @@ chkp_find_bound_slots_1 (const_tree type, bitmap have_bound,
   HOST_WIDE_INT esize = TREE_INT_CST_LOW (TYPE_SIZE (etype));
   unsigned HOST_WIDE_INT cur;

-  if (!maxval || integer_minus_onep (maxval))
+  if (!maxval
+ || TREE_CODE (maxval) != INTEGER_CST
+ || integer_minus_onep (maxval))
return;

   for (cur = 0; cur <= TREE_INT_CST_LOW (maxval); cur++)


[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx

2014-11-27 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to rguent...@suse.de from comment #6)
 I see.  I mainly wonder because of LTO which can combine TUs from C
 and Ada and because for example both Fortran and Ada define
 interoperability with C.  All languages also share the common
 C runtime builtins.
 
 Richard.

It should be OK to mix instrumented and non-instrumented code.
Instrumentation happens in early passes, before LTO streams out.  Therefore we
can compile a C file with '-fcheck-pointer-bounds -mmpx -flto -c', then compile
a Fortran (or any other) file with '-c -flto', and finally pass the generated
objects to LTO.  It may be inconvenient to avoid '-fcheck-pointer-bounds' for
non-C files when you work with mixed code.  To handle that I may use langhooks
and ignore '-fcheck-pointer-bounds' when it is not supported for the language
in use.


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com ---
With both patches applied bootstrap is OK


[Bug lto/64075] [5 Regression] ICE: in bp_pack_value, at data-streamer.h:106

2014-11-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64075

--- Comment #4 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to H.J. Lu from comment #3)
 It was caused by r217655.
The problem was introduced earlier, when the function_code field in
tree_function_decl was extended to 12 bits.  The LTO streamers were not updated
accordingly.  r217655 increased the BUILT_IN_COMPLEX_MUL_MIN value and pushed it
beyond 11 bits, which revealed the problem.

With this patch it compiles OK:

diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index 99448dd..eb205ed 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -333,7 +333,7 @@ unpack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
   if (DECL_BUILT_IN_CLASS (expr) != NOT_BUILT_IN)
     {
       DECL_FUNCTION_CODE (expr) = (enum built_in_function) bp_unpack_value (bp,
-                                                                            11);
+                                                                            12);
       if (DECL_BUILT_IN_CLASS (expr) == BUILT_IN_NORMAL
           && DECL_FUNCTION_CODE (expr) >= END_BUILTINS)
         fatal_error ("machine independent builtin code out of range");
diff --git a/gcc/tree-streamer-out.c b/gcc/tree-streamer-out.c
index ad58b84..0d87cff 100644
--- a/gcc/tree-streamer-out.c
+++ b/gcc/tree-streamer-out.c
@@ -300,7 +300,7 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
   bp_pack_value (bp, DECL_PURE_P (expr), 1);
   bp_pack_value (bp, DECL_LOOPING_CONST_OR_PURE_P (expr), 1);
   if (DECL_BUILT_IN_CLASS (expr) != NOT_BUILT_IN)
-bp_pack_value (bp, DECL_FUNCTION_CODE (expr), 11);
+bp_pack_value (bp, DECL_FUNCTION_CODE (expr), 12);
 }
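
The width mismatch can be illustrated with a minimal sketch.  The functions
below are hypothetical stand-ins, not the bitpack_d API, and they truncate
silently where the real streamer, judging by the ICE location in the bug
title, aborts on a check in bp_pack_value.

```c
/* Hypothetical stand-in for a fixed-width field: keeps only the low
   NBITS bits of VALUE, which is all a reader unpacking NBITS bits can
   recover.  NBITS is assumed to be smaller than the width of unsigned.  */
unsigned
pack_then_unpack (unsigned value, unsigned nbits)
{
  unsigned mask = (1u << nbits) - 1u;

  return value & mask;
}

/* Nonzero when VALUE survives a round trip through an NBITS-wide field.  */
int
fits_in_field (unsigned value, unsigned nbits)
{
  return pack_then_unpack (value, nbits) == value;
}
```

Any code of 2048 or more needs the 12th bit, so it round-trips through a
12-bit field but not through an 11-bit one.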


[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx

2014-11-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994

--- Comment #3 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to rguent...@suse.de from comment #2)
 
  TARGET_CFLAGS=-O2 -g -mmpx -fcheck-pointer-bounds TARGET_CXXFLAGS=-O2 
 -g -mmpx -fcheck-pointer-bounds BOOT_CFLAGS=-O2 -g -mmpx 
 -fcheck-pointer-bounds /space/rguenther/src/svn/trunk/configure 
 --enable-languages=all,obj-c++,ada,go
  make -j12  TARGET_CFLAGS=-O2 -g -mmpx -fcheck-pointer-bounds 
 TARGET_CXXFLAGS=-O2 -g -mmpx -fcheck-pointer-bounds BOOT_CFLAGS=-O2 -g 
 -mmpx -fcheck-pointer-bounds

Building with these options, I see the Ada compiler is called with
-fcheck-pointer-bounds.  The option is in c-family/c.opt and shouldn't be
passed to the Ada compiler.

We should either not pass CFLAGS to Ada during the build or filter by language
in the compiler.


[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx

2014-11-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to rguent...@suse.de from comment #4)
 Any reason why non-C-family languages cannot use MPX?
 
 Richard.

There is no fundamental restriction.  Anyone who wants to implement Pointer
Bounds Checker for some language needs to define how it instruments programs
in that language and implement that in the compiler.  Currently it is defined
and implemented for C-family languages only.


[Bug lto/64075] [5 Regression] ICE: in bp_pack_value, at data-streamer.h:106

2014-11-26 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64075

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Dmitry Gorbachev from comment #6)
 The patch works, thanks! But the committed test is incorrect, because the
 original, unpatched compiler, does not fail on it. It failed on functions
 __mulsc3, __muldc3, __mulxc3, __multc3, __divsc3, __divdc3, __divxc3, and
 __divtc3.

Committed test is what you attached as a reproducer with function renamed to
'test'.  Why shouldn't it work? I used it to reproduce and debug the issue on
today's trunk compiler.


[Bug target/64056] [5 Regression] gcc.target/i386/chkp-strlen-4.c etc. FAIL

2014-11-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64056

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
I sent a patch (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03097.html) to
add checks for mempcpy availability.


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #3 from Ilya Enkovich enkovich.gnu at gmail dot com ---
A patch removing duplicate bounds symbols is in review.  With this patch
applied, bootstrap runs to completion, but there are lots of stage2/stage3
comparison errors.  I looked into one of them, and the difference is caused by
the '-gtoggle' option being used for the stage2 build but not for the stage3
build.


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 34112
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34112&action=edit
-g0 problem reproducer


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #6 from Ilya Enkovich enkovich.gnu at gmail dot com ---
For attached -g0 problem reproducer:

gcc pr63995-2.c -c -O2 -mmpx -fcheck-pointer-bounds -g -o 1.o
gcc pr63995-2.c -c -O2 -mmpx -fcheck-pointer-bounds -g0 -o 2.o
objdump_pl -d 1.o > 1.dump
objdump_pl -d 2.o > 2.dump
diff 1.dump 2.dump
2c2
< 1.o: file format elf64-x86-64
---
> 2.o: file format elf64-x86-64
19,22c19,22
<   2b: b8 03 00 00 00          mov    $0x3,%eax
<   30: f3 0f 1b 1c 07          bndmk  (%rdi,%rax,1),%bnd3
<   35: c7 44 24 10 ff ff ff    movl   $0xffffffff,0x10(%rsp)
<   3c: ff
---
>   2b: c7 44 24 10 ff ff ff    movl   $0xffffffff,0x10(%rsp)
>   32: ff
>   33: b8 03 00 00 00          mov    $0x3,%eax
>   38: f3 0f 1b 1c 07          bndmk  (%rdi,%rax,1),%bnd3

The different instruction order is caused by a different GIMPLE statement order
after the chkpopt pass.  I will prepare a fix for that.


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-25 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com ---
In the chkpopt pass, calls to bndmk are moved down to their uses to decrease
register pressure.  Debug info introduces new uses and therefore affects the
position where the bndmk calls end up.

-g0 case:

  <bb 4>:
  r.field = -1;
  __bound_tmp.1_13 = __builtin_ia32_bndmk (&r, 4);
  test2.chkp (&r, __bound_tmp.1_13);

-g case:

  <bb 4>:
  # DEBUG c => &r
  __bound_tmp.1_13 = __builtin_ia32_bndmk (&r, 4);
  # DEBUG __chkp_bounds_of_c => NULL
  r.field = -1;
  test2.chkp (&r, __bound_tmp.1_13);

Will ignore debug statements when computing a new position for bounds
load/creation (BTW debug statement seems to be damaged by gsi_move_before
called for bndmk).  Testing following fix:

diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
index ff390d7..b8d5d0b 100644
--- a/gcc/tree-chkp-opt.c
+++ b/gcc/tree-chkp-opt.c
@@ -1175,7 +1175,9 @@ chkp_reduce_bounds_lifetime (void)

   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op)
{
- if (dom_bb &&
+ if (is_gimple_debug (use_stmt))
+   continue;
+ else if (dom_bb &&
  dominated_by_p (CDI_DOMINATORS,
  dom_bb, gimple_bb (use_stmt)))
{
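
The idea behind the fix can be sketched in isolation.  This is a simplified
model that uses linear statement positions instead of the dominator
computation in chkp_reduce_bounds_lifetime, and the struct and function names
are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of one use of a bounds value: its position in the
   block (assumed non-negative) and whether it is a debug statement.  */
struct use
{
  int position;
  int is_debug;
};

/* Returns the position of the earliest non-debug use, or -1 if there is
   none.  Debug uses are skipped so that builds with and without debug
   info pick the same position.  */
int
earliest_real_use (const struct use *uses, size_t n)
{
  int best = -1;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (uses[i].is_debug)
        continue;
      if (best < 0 || uses[i].position < best)
        best = uses[i].position;
    }
  return best;
}
```

With the debug uses filtered out, a use list that differs only in debug
statements yields the same target position.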


[Bug other/63992] fcheck-pointer-bounds and friends are undocumented

2014-11-20 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63992

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
It is part of an already approved patch
(https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02317.html) that is waiting for
the MPX runtime to be approved.


[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx

2014-11-20 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
What does bootstrap with -fcheck-pointer-bounds -mmpx mean?  Any instruction
on how to reproduce?


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-20 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #1 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 34052
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34052&action=edit
reproducer


[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds

2014-11-20 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com ---
I had a successful bootstrap with instrumentation some time ago but it's not
performed regularly.

We are extending regression testing for instrumentation now and coverage should
become better.

This particular problem may be caused by multiple varpool_nodes for the same
var.  I will check it.


[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578

2014-11-18 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com ---
I forgot to mention PR in a ChangeLog.  Patch is in trunk:
https://gcc.gnu.org/ml/gcc-cvs/2014-11/msg00707.html


[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578

2014-11-07 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com ---
The problem is caused by the fact that all functions now reach local
optimizations in SSA form.  This affects inline parameter computation and
therefore inlining order.

During early SRA we call convert_callers_for_node, which recomputes inline
parameters for functions in SSA form.  Previously that happened only for
functions already processed by all early local passes.  Now all functions are
in SSA form, which means we may recompute inline parameters for a function not
yet processed by local optimizations.

In this test we have a function marked as inlinable that has not yet been
processed in do_per_function_toporder called for local_optimization_passes.
This allows the function to be inlined and removed before it is actually
processed (while it still sits in the order vector).  Another cgraph_node
created by SRA is allocated in the same slot as the removed one, and thus the
same function is processed twice, which causes an ICE in the profiling pass.

The solution would be either to use another condition for inline_parameters
recomputation or to handle node removal in do_per_function_toporder by
registering a proper node removal hook.  I suppose the latter is better because
it allows more early inlining.

Here is a possible fix (works for reproducer, not fully tested):

diff --git a/gcc/passes.c b/gcc/passes.c
index 5e91a79..4799efa 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1609,6 +1609,19 @@ do_per_function (void (*callback) (function *, void *data), void *data)
 static int nnodes;
 static GTY ((length (nnodes))) cgraph_node **order;

+static void
+remove_cgraph_node_from_order (cgraph_node *node, void *)
+{
+  int i;
+
+  for (i = 0; i < nnodes; i++)
+if (order[i] == node)
+  {
+   order[i] = NULL;
+   return;
+  }
+}
+
 /* If we are in IPA mode (i.e., current_function_decl is NULL), call
function CALLBACK for every function in the call graph.  Otherwise,
call CALLBACK on the current function.
@@ -1622,13 +1635,20 @@ do_per_function_toporder (void (*callback) (function *, void *data), void *data)
 callback (cfun, data);
   else
 {
+  cgraph_node_hook_list *hook;
   gcc_assert (!order);
   order = ggc_vec_alloc<cgraph_node *> (symtab->cgraph_count);
   nnodes = ipa_reverse_postorder (order);
   for (i = nnodes - 1; i >= 0; i--)
 order[i]->process = 1;
+  hook = symtab->add_cgraph_removal_hook (remove_cgraph_node_from_order,
+ NULL);
   for (i = nnodes - 1; i >= 0; i--)
{
+ /* Function could be inlined and removed as unreachable.  */
+ if (!order[i])
+   continue;
+
  struct cgraph_node *node = order[i];

  /* Allow possibly removed nodes to be garbage collected.  */
@@ -1637,6 +1657,7 @@ do_per_function_toporder (void (*callback) (function *, void *data), void *data)
   if (node->has_gimple_body_p ())
 callback (DECL_STRUCT_FUNCTION (node->decl), data);
 }
+  symtab->remove_cgraph_removal_hook (hook);
 }
   ggc_free (order);
   order = NULL;


[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578

2014-11-07 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #4 from Ilya Enkovich enkovich.gnu at gmail dot com ---
(In reply to Richard Biener from comment #3)
 
 That's quadratic in the number of nodes and thus a no-go.  Why not delay
 removing of unreachable nodes instead?  If you go with the above then
 you need to change that data-structure used.
 

Delaying removal of unreachable nodes would mean we perform all early
optimization passes for nodes we will later remove.  That should be much more
expensive than having a hook iterate over the nodes vector.

I may also add an order_idx field to the cgraph_node structure or create a
local hash to map nodes to indexes.
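
A rough sketch of the order_idx variant, assuming each node stores its slot in
the order vector.  The struct and function names are hypothetical and reduced
to the one field under discussion; they are not the cgraph_node API.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced node: only the proposed index field.  */
struct node
{
  int order_idx;   /* slot in the order vector, or -1 if not in it */
};

/* Fills ORDER with pointers to the N nodes and records each node's slot,
   so that a later removal needs no search.  */
void
fill_order (struct node **order, struct node *nodes, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      order[i] = &nodes[i];
      nodes[i].order_idx = i;
    }
}

/* Removal hook: clears the node's slot in constant time.  Calling it
   again for the same node is harmless.  */
void
remove_from_order (struct node **order, struct node *node)
{
  if (node->order_idx >= 0)
    {
      order[node->order_idx] = NULL;
      node->order_idx = -1;
    }
}
```

Each removal then costs constant time, so the hook no longer makes the walk
quadratic in the number of nodes.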


[Bug rtl-optimization/63620] RELOAD lost SET_GOT dependency on Darwin

2014-10-29 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63620

Ilya Enkovich enkovich.gnu at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |enkovich.gnu at gmail dot com

--- Comment #13 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Created attachment 33841
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33841&action=edit
Reproducer for Linux


[Bug ipa/63664] New: ipa-icf pass fails with segfault

2014-10-28 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63664

Bug ID: 63664
   Summary: ipa-icf pass fails with segfault
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

Created attachment 33825
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33825&action=edit
Reproducer

There is a segfault in ipa-icf pass.

g++ test.cpp -O2 -c

test.cpp:40:1: internal compiler error: Segmentation fault
 }
 ^
0xe87262 crash_signal
../../gcc-pl-ref/gcc/toplev.c:349
0x17000ad ipa_icf_gimple::func_checker::compatible_types_p(tree_node*,
tree_node*, bool, bool)
../../gcc-pl-ref/gcc/ipa-icf-gimple.c:172
0x170035c ipa_icf_gimple::func_checker::compare_operand(tree_node*, tree_node*)
../../gcc-pl-ref/gcc/ipa-icf-gimple.c:220
0x17020a2 ipa_icf_gimple::func_checker::compare_tree_ssa_label(tree_node*,
tree_node*)
../../gcc-pl-ref/gcc/ipa-icf-gimple.c:737
0x1702187
ipa_icf_gimple::func_checker::compare_gimple_label(gimple_statement_base*,
gimple_statement_base*)
../../gcc-pl-ref/gcc/ipa-icf-gimple.c:755
0x1701c29 ipa_icf_gimple::func_checker::compare_bb(ipa_icf_gimple::sem_bb*,
ipa_icf_gimple::sem_bb*)
../../gcc-pl-ref/gcc/ipa-icf-gimple.c:604
0x16f463b ipa_icf::sem_function::equals_private(ipa_icf::sem_item*,
hash_map<symtab_node*, ipa_icf::sem_item*, default_hashmap_traits>&)
../../gcc-pl-ref/gcc/ipa-icf.c:455
0x16f3e74 ipa_icf::sem_function::equals(ipa_icf::sem_item*,
hash_map<symtab_node*, ipa_icf::sem_item*, default_hashmap_traits>&)
../../gcc-pl-ref/gcc/ipa-icf.c:355
0x16f8687 ipa_icf::sem_item_optimizer::subdivide_classes_by_equality(bool)
../../gcc-pl-ref/gcc/ipa-icf.c:1771
0x16f7d93 ipa_icf::sem_item_optimizer::execute()
../../gcc-pl-ref/gcc/ipa-icf.c:1590
0x16fa221 ipa_icf_driver
../../gcc-pl-ref/gcc/ipa-icf.c:2320
0x16fa736 ipa_icf::pass_ipa_icf::execute(function*)
../../gcc-pl-ref/gcc/ipa-icf.c:2368
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

g++ -v
Using built-in specs.
COLLECT_GCC=../../../gcc-ref-build/bin/g++
COLLECT_LTO_WRAPPER=/export/users/ienkovic/gcc-ref-build/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-ref/configure
--prefix=/export/users/ienkovic/gcc-ref-build --enable-languages=c,c++,fortran
--disable-bootstrap
Thread model: posix
gcc version 5.0.0 20141024 (experimental) (GCC)


The problem is that compared labels have null types and
ipa_icf_gimple::func_checker::compatible_types_p doesn't check for null types. 
Possible patch:

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 1369b74..afc0eeb 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -169,6 +169,11 @@ bool func_checker::compatible_types_p (tree t1, tree t2,
   bool compare_polymorphic,
   bool first_argument)
 {
+  if (!t1 && !t2)
+return true;
+  else if (!t1 || !t2)
+return false;
+
   if (TREE_CODE (t1) != TREE_CODE (t2))
 return return_false_with_msg ("different tree types");



If we don't want labels to have null type then start_preparsed_function
(decl.c:13607) has to be fixed.
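
The null handling in the patch follows a common pattern, sketched here on a
hypothetical reduced type.  The struct and function are stand-ins for
illustration, not the tree or func_checker interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type node; only a code is compared.  */
struct type_node
{
  int code;
};

/* Two missing types are compatible, exactly one missing type is not,
   and otherwise the codes must match.  */
int
compatible_types (const struct type_node *t1, const struct type_node *t2)
{
  if (!t1 && !t2)
    return 1;
  if (!t1 || !t2)
    return 0;
  return t1->code == t2->code;
}
```

Checking for null first means the later field accesses never dereference a
null pointer.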


[Bug lto/63555] New: ICE compiling simple test with SDB debugging info

2014-10-16 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63555

Bug ID: 63555
   Summary: ICE compiling simple test with SDB debugging info
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

I see an ICE when I try to compile a small test with -gcoff.  The problem
appears when a structure field and a static variable have the same name.  Here
is a reproducer:

typedef struct {
  int *next;
} list;

int *next;

int main(int argc, char **argv)
{
  return 0;
}

gcc -m64 -c -gcoff short.c
cc1: internal compiler error: in needed_p, at cgraphunit.c:237
0x7c1a8c symtab_node::needed_p()
../../gcc-pl/gcc/cgraphunit.c:236
0x7c3933 analyze_functions
../../gcc-pl/gcc/cgraphunit.c:936
0x7c7579 symbol_table::finalize_compilation_unit()
../../gcc-pl/gcc/cgraphunit.c:2288
0x627b77 c_write_global_declarations()
../../gcc-pl/gcc/c/c-decl.c:10431
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.

Here is the failing assert:

  /* Double check that no one output the function into assembly file
     early.  */
  gcc_checking_assert (!DECL_ASSEMBLER_NAME_SET_P (decl)
                       || !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)));

During file parsing we have a call to sdbout_symbol for the structure type.  It
causes output of the structure's field, and the field's name is marked as
referenced.  Later, variable analysis hits the assert because the variable's
assembler name is shared with the structure field.


[Bug lto/62034] New: ICE for big statically initialized arrays compiled with LTO

2014-08-06 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62034

Bug ID: 62034
   Summary: ICE for big statically initialized arrays compiled
with LTO
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

Created attachment 33259
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33259&action=edit
Reproducer

I get an ICE when I try to compile tests with a large amount of statically
initialized data.

gcc --version
gcc (GCC) 4.10.0 20140806 (experimental)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc -flto test.c
gcc: internal compiler error: Segmentation fault (program lto1)
0x405c80 execute
../../gcc-ref/gcc/gcc.c:2900
0x409fe9 do_spec_1
../../gcc-ref/gcc/gcc.c:4704
0x40d475 process_brace_body
../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
../../gcc-ref/gcc/gcc.c:5358
0x40d475 process_brace_body
../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
../../gcc-ref/gcc/gcc.c:5358
0x40c38c do_spec_1
../../gcc-ref/gcc/gcc.c:5473
0x40d475 process_brace_body
../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
../../gcc-ref/gcc/gcc.c:5358
0x409664 do_spec_2
../../gcc-ref/gcc/gcc.c:4405
0x409582 do_spec(char const*)
../../gcc-ref/gcc/gcc.c:4372
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: fatal error: gcc-ref-build/bin/gcc returned 4 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

The debugger shows that the problem appears when lto_input_tree tries to dig
through a bunch of SCC entries in the input stream.  Each SCC entry adds two
new frames (lto_input_tree and lto_input_tree_1) to the call stack.  With many
consecutive SCC entries the stack may grow too much (in my case the compiler
segfaulted with ~600 000 entries in the call stack).

The attached test has a statically initialized array with a million elements.
A bigger data set may be required to break the compiler if you use an increased
stack size.

Problem appeared after this commit:
https://gcc.gnu.org/ml/gcc-cvs/2014-07/msg00291.html

Following patch removing recursion helps me to compile my tests:

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 698f926..25657da 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -1345,7 +1345,16 @@ lto_input_tree_1 (struct lto_input_block *ib, struct data_in *data_in,
 tree
 lto_input_tree (struct lto_input_block *ib, struct data_in *data_in)
 {
-  return lto_input_tree_1 (ib, data_in, streamer_read_record_start (ib), 0);
+  enum LTO_tags tag;
+
+  /* Skip SCC entries.  */
+  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
+{
+  unsigned len, entry_len;
+  lto_input_scc (ib, data_in, &len, &entry_len);
+}
+
+  return lto_input_tree_1 (ib, data_in, tag, 0);
 }

Did not fully test this patch yet.
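
The shape of the change, a loop in place of mutual recursion, can be sketched
on a toy stream.  The tag enum, struct and function below are hypothetical and
far simpler than the real LTO input block; they only show why stack use stays
constant.

```c
#include <stddef.h>

/* Hypothetical record tags; the real stream has many more.  */
enum tag { TAG_SCC, TAG_TREE };

/* Hypothetical input block: an array of tags and a read cursor.  */
struct input
{
  const enum tag *tags;
  size_t len;
  size_t pos;
};

/* Consumes leading SCC records in a loop and returns the index of the
   first non-SCC record, or LEN if the stream ends first.  No call frame
   is added per SCC record, however many precede the tree.  */
size_t
skip_scc_entries (struct input *in)
{
  while (in->pos < in->len && in->tags[in->pos] == TAG_SCC)
    in->pos++;
  return in->pos;
}
```

A run of a hundred thousand SCC records is consumed by the same single frame,
where the recursive version would add two frames per record.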


[Bug middle-end/61734] [4.10 Regression] Regression in ABS_EXPR recognition

2014-07-29 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734

--- Comment #10 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Thanks for the fix!

Is there any reason why ABS_EXPR detection does not work on a 64-bit target for
the same test?  The only difference should be the long long type size.  How
does it affect the optimizations?


[Bug middle-end/61734] [4.10 Regression] Regression in ABS_EXPR recognition

2014-07-29 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734

--- Comment #12 from Ilya Enkovich enkovich.gnu at gmail dot com ---
Before your last fix both 32-bit and 64-bit versions of .original look similar
except for the condition.  We have (a - b > 0) for 64-bit and (a > b) for 32-bit.

64bit version (before and after the patch)

{
  sum = ((int) a - (int) b > 0 ? (long unsigned int) ((int) a - (int) b) :
(long unsigned int) ((int) b - (int) a)) + sum;
  return sum;
}

32bit version (before the patch):

{
  sum = ((int) a > (int) b ? (long unsigned int) ((int) a - (int) b) : (long
unsigned int) ((int) b - (int) a)) + sum;
  return sum;
}

It is not clear why such a difference exists, though.


[Bug tree-optimization/61734] New: Regression in ABS_EXPR recognition

2014-07-07 Thread enkovich.gnu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734

Bug ID: 61734
   Summary: Regression in ABS_EXPR recognition
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

Recently a performance regression occurred in tests heavily using ABS
computation (observed on x86 and ARM targets).  It is caused by missing
ABS_EXPR recognition which results in sub-optimal code.

Problem appeared after this commit:

commit 32ce9a5c4208411361402f60e672c4830da0bc8f
Author: ebotcazou <ebotcazou@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue May 27 19:54:46 2014 +

* fold-const.c (fold_comparison): Clean up and extend X +- C1 CMP C2
to X CMP C2 -+ C1 transformation to EQ_EXPR/NE_EXPR.
Add X - Y CMP 0 to X CMP Y transformation.
(fold_binary_loc) <EQ_EXPR/NE_EXPR>: Remove same transformations.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@210979
138bc75d-0d04-0410-961f-82ee72b054a4


Here is a simple test (tested on linux-x86_64):
cat test.i
unsigned long test (unsigned char a, unsigned char b, unsigned long sum)
{
  sum += ((a - b) > 0 ? (a - b) : -(a - b));
  return sum;
}
gcc-exp-build/bin/gcc test.i -m32 -O2 -fdump-tree-gimple -c
cat test.i.004t.gimple
test (unsigned char a, unsigned char b, long unsigned int sum)
{
  long unsigned int iftmp.0;
  int D.1720;
  int D.1721;
  int D.1724;
  int D.1726;
  long unsigned int D.1727;

  D.1720 = (int) a;
  D.1721 = (int) b;
  if (D.1720 > D.1721) goto <D.1722>; else goto <D.1723>;
  <D.1722>:
  D.1720 = (int) a;
  D.1721 = (int) b;
  D.1724 = D.1720 - D.1721;
  iftmp.0 = (long unsigned int) D.1724;
  goto <D.1725>;
  <D.1723>:
  D.1721 = (int) b;
  D.1720 = (int) a;
  D.1726 = D.1721 - D.1720;
  iftmp.0 = (long unsigned int) D.1726;
  <D.1725>:
  sum = iftmp.0 + sum;
  D.1727 = sum;
  return D.1727;
}


With the older compiler I get:

gcc-ref-build/bin/gcc test.i -m32 -O2 -fdump-tree-gimple -c
cat test.i.004t.gimple
test (unsigned char a, unsigned char b, long unsigned int sum)
{
  int D.1719;
  int D.1720;
  int D.1721;
  int D.1722;
  long unsigned int D.1723;
  long unsigned int D.1724;

  D.1719 = (int) a;
  D.1720 = (int) b;
  D.1721 = D.1719 - D.1720;
  D.1722 = ABS_EXPR D.1721;
  D.1723 = (long unsigned int) D.1722;
  sum = D.1723 + sum;
  D.1724 = sum;
  return D.1724;
}


BTW both compilers generate ABS_EXPR when -O0 is used instead of -O2.  Both
compilers fail to generate ABS_EXPR when -m64 is used instead of -m32.
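
For reference, the source-level pattern and the form the older compiler
recognizes can be compared side by side.  This is only a minimal sketch: the
names sad_cond and sad_abs are hypothetical and do not appear in the report,
and the code shows that the two forms compute the same value, not what either
compiler emits.

```c
#include <stdlib.h>

/* Conditional form of |a - b|, as written in the test from the report.
   Both operands are promoted to int, so a - b lies in [-255, 255].  */
unsigned long sad_cond (unsigned char a, unsigned char b, unsigned long sum)
{
  sum += ((a - b) > 0 ? (a - b) : -(a - b));
  return sum;
}

/* Hypothetical hand-written equivalent using abs(), mirroring the
   ABS_EXPR form seen in the older compiler's dump.  */
unsigned long sad_abs (unsigned char a, unsigned char b, unsigned long sum)
{
  sum += abs (a - b);
  return sum;
}
```

Comparing the -fdump-tree-gimple output of the two functions should make it
easy to see whether ABS_EXPR is recognized for the conditional form.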


[Bug tree-optimization/60559] New: g++.dg/vect/pr60023.cc fails with -fno-tree-dce (ICE)

2014-03-18 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60559

Bug ID: 60559
   Summary: g++.dg/vect/pr60023.cc fails with -fno-tree-dce (ICE)
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: enkovich.gnu at gmail dot com

Test gcc/testsuite/g++.dg/vect/pr60023.cc fails with an ICE if executed with
the additional -fno-tree-dce flag.

As far as I can see, the problem is in the generated mask load, which operates
on integer types:

  int * _13;
  int _14;
  ...
  _14 = MASK_LOAD (_13, 0B, _ifc__37);

With DCE, the LHS of this call is removed and the statement is then ignored in
expand_MASK_LOAD.  Without DCE we get an ICE because there is no proper code
in the optab.

I use gcc (GCC) 4.9.0 20140317 (experimental).

gcc -O2 -ftree-vectorize -fno-vect-cost-model -msse2 -fdump-tree-vect-details
-O3 -std=c++11 -fnon-call-exceptions -mavx2 -S -o pr60023.s pr60023.cc
-fno-tree-dce

/export/users/ienkovic/gcc/gcc/testsuite/g++.dg/vect/pr60023.cc: In function
'void f1(int*, int*, int*)':
/export/users/ienkovic/gcc/gcc/testsuite/g++.dg/vect/pr60023.cc:14:17: internal
compiler error: in maybe_gen_insn, at optabs.c:8250
   p[i] = q[i] + 1;
 ^
0xc421d0 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8250
0xc42629 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8294
0xc426bd expand_insn(insn_code, unsigned int, expand_operand*)
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8325
0xb27d95 expand_MASK_LOAD
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/internal-fn.c:837
0xb2807f expand_internal_call(gimple_statement_base*)
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/internal-fn.c:886
0x8f483f expand_call_stmt
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:2190
0x8f815a expand_gimple_stmt_1
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:3160
0x8f87a4 expand_gimple_stmt
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:3312
0x8febbd expand_gimple_basic_block
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5152
0x9006a5 gimple_expand_cfg
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5731
0x900d20 execute
/gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5951


[Bug middle-end/57055] New: Incorrect CFG after transactional memory passes

2013-04-24 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57055

 Bug #: 57055
   Summary: Incorrect CFG after transactional memory passes
Classification: Unclassified
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich@gmail.com


Transactional passes do not set cfun->calls_setjmp to true and do not fix CFG
accordingly after adding __builtin__ITM_beginTransaction call having
ECF_RETURNS_TWICE flag set.

It leads to inconsistency which may be revealed with special calls flags
recomputation.

If I add DCE pass after transactional memory then flags are recomputed and CFG
check fails because of call statements in the middle of basic block. Thus DCE
pass after transactional memory causes ~250 new fails in 'make check'.


Tried on 'gcc version 4.9.0 20130422 (experimental) (GCC)'


[Bug target/50962] Additional opportunity for AGU stall avoidance optimization for Atom processor

2011-11-02 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50962

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-11-02 
13:05:46 UTC ---
Created attachment 25689
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25689
Proposed patch


[Bug target/50962] Additional opportunity for AGU stall avoidance optimization for Atom processor

2011-11-02 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50962

Ilya Enkovich enkovich.gnu at gmail dot com changed:

   What|Removed |Added

 CC||enkovich.gnu at gmail dot
   ||com

--- Comment #3 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-11-02 
13:06:07 UTC ---
The current optimization uses only splits to transform arithmetic into lea and
vice versa. This does not work for a move because the corresponding lea
template would be identical. We can check whether lea is required when the
instruction is emitted.

I have a patch to fix it. Bootstrap and make check passed. I'm currently
checking performance changes.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-10-28 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

Ilya Enkovich enkovich.gnu at gmail dot com changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution||FIXED

--- Comment #11 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-10-28 
10:10:42 UTC ---
Initially the problem was caused by the movzbl cost value for Atom. The low
cost of movzbl made IRA keep a frequently used byte value on the stack and
assign a register to an int value. Changing the cost model resolves the
problem, and it has been fixed in revision 17.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-30 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

Ilya Enkovich enkovich.gnu at gmail dot com changed:

   What|Removed |Added

  Attachment #25083|0   |1
is obsolete||

--- Comment #7 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-30 
10:40:50 UTC ---
Created attachment 25138
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25138
Fixed reproducer


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-30 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-30 
10:50:44 UTC ---
I attached a fixed reproducer. It is closer to the original test and has higher
register pressure than the previous version. It has the same problem as the
first reproducer.

Reproduced with GCC 4.7.0 20110828 and options -O2 -m32 -march=atom. Code
becomes faster on both Atom (~10%) and Core (~35%) if I use just -O2 -m32.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-29 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

Ilya Enkovich enkovich.gnu at gmail dot com changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||FIXED

--- Comment #5 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-29 
07:01:38 UTC ---
(In reply to comment #4)
> On Atom with -m32 -O2 -march=atom,
>
> 1. GCC 4.6.1:
>
> ./4.6  64.16s user 0.01s system 99% cpu 1:04.18 total
>
> 2. GCC 4.7.0 20110819:
>
> ./0819  69.73s user 0.01s system 99% cpu 1:09.76 total
>
> 3. GCC 4.7.0 20110826:
>
> ./0826  64.30s user 0.02s system 99% cpu 1:04.33 total
>
> Has this problem been fixed?

Confirmed. The problem is gone.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-29 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

Ilya Enkovich enkovich.gnu at gmail dot com changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |

--- Comment #6 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-29 
07:39:09 UTC ---
It turned out that the problem was fixed in the reproducer but not in the
original test case. I'll prepare a fixed reproducer.


[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-25 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-25 
09:31:29 UTC ---
(In reply to comment #1)
> Yesterday I sent a patch
> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01954.html which most probably
> solved the problem.
>
> Now I have code size 419 (gcc 4.6) vs 411 (gcc as of Aug 24) bytes for the
> test.

I tried it, but unfortunately it did not solve the regression. We still have xk
on the stack and 1.5x more memory accesses in the GCC 4.7 assembly for the
mentioned code part. GCC 4.6 produces bigger but faster code.

The problem somehow appears only when -march=atom is used. There is no
degradation if the generic arch is used. I compared GCC 4.7 dumps for -O2 -m32
and -O2 -m32 -march=atom and found that the RTL is the same before IRA and
differs after IRA.

How does -march=atom affect register allocation?


[Bug target/50164] New: [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count

2011-08-23 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164

 Bug #: 50164
   Summary: [IRA, 4.7 Regression] Performance degradation due to
increased memory instructions count
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich@gmail.com


Created attachment 25083
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25083
Reproducer

Problem occurs with -march=atom option used on the following part of test case:

  xc = (e_u8) (xc - xk);
  xm = (e_u8) (xm - xk);
  xy = (e_u8) (xy - xk);

  *EritePtr++ = xc;
  *EritePtr++ = xm;
  *EritePtr++ = xy;
  *EritePtr++ = xk;

xk has the most uses here. GCC 4.6 keeps it in a register, but GCC 4.7 keeps
it on the stack, which leads to an increased number of memory instructions for
that code. On Core i7, GCC 4.7 generates code 1.5x slower than GCC 4.6. On Atom
it is ~10% slower.

GCC 4.6 info:
Configured with: /export/users/mstester/stability/svn/gcc-4_6-branch/configure
--with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --enable-cloog-backend=isl
--with-fpmath=sse
--prefix=/export/users/mstester/stability/work/gcc-4_6-branch/64/install
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.6.2 20110822 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-march=atom' '-m32' '-o' 'test.4.6' '-v'

/nfs/ims/proj/icl/gcc/gnu/compilers/gcc/gcc-4_6-branch/64/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.2/cc1
-quiet -v -imultilib 32 -iprefix
/nfs/ims/proj/icl/gcc/gnu/compilers/gcc/gcc-4_6-branch/64/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.2/
test.c -quiet -dumpbase test.c -march=atom -m32 -auxbase test -O2 -version -o
/tmp/ccM2NIHU.s
GNU C (GCC) version 4.6.2 20110822 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.6.2 20110822 (prerelease), GMP version
4.3.1, MPFR version 2.4.2, MPC version 0.8.1

GCC 4.7 info:
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-master/configure --prefix=/export/gcc-master-build
Thread model: posix
gcc version 4.7.0 20110822 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-march=atom' '-m32' '-o' 'test.4.7' '-v'
 /export/gcc-master-build/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet
-v -imultilib 32 test.c -quiet -dumpbase test.c -march=atom -m32 -auxbase test
-O2 -version -o /tmp/cc5DRHOU.s
GNU C (GCC) version 4.7.0 20110822 (experimental) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.7.0 20110822 (experimental), GMP version
4.3.2, MPFR version 3.0.0, MPC version 0.8.3-dev


[Bug rtl-optimization/50088] movzbl is generated instead of movl

2011-08-17 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

--- Comment #13 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-17 
09:07:20 UTC ---
(In reply to comment #12)
> Created attachment 25025 [details]
> A patch to use the same mode for shift count
>
> This is an untested patch to use the same mode for shift count.

We should find a solution for the general problem, not for its specific
appearance in the reproducer.

We may have the same issue for any other instruction consuming a byte register,
and it is better to fix the source of the problem (which I suppose is in IRA)
than to introduce a workaround for each such instruction.

BTW I think you should not increase the size of immediate operands in your
patch.


[Bug rtl-optimization/50088] movzbl is generated instead of movl

2011-08-17 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

--- Comment #15 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-17 
14:16:27 UTC ---
(In reply to comment #14)
 
> I think this problem is unique to x86 since some instructions have
> different sizes in register operands.  In this example, shift count
> is CL regardless the source operand size. I am not sure how much RA
> can help here. By making register operands in shift instructions to
> have the same size (32bit or less), it may work for most cases.
 
We have a problem due to the different sizes of the spill and the load
generated by IRA for the same var. I'm not sure that patching shift
instructions covers all cases where IRA may do that.


[Bug rtl-optimization/50088] movzbl is generated instead of movl

2011-08-16 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-16 
06:55:34 UTC ---
(In reply to comment #4)
 
> Well, yes, I think the proposal was to spill/load the full SImode instead
> which would avoid both the partial dependency and the mismatched load/store
> size.  No?

Yes, I think we should generate full SImode spill/load.


[Bug rtl-optimization/50088] movzbl is generated instead of movl

2011-08-16 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

--- Comment #9 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-16 
07:28:33 UTC ---
(In reply to comment #5)
 
> It is for movqi.  We can only safely replace movzbl with movl if
> the source is 4byte aligned.  It should be a new backend option.

That should work. 

A better solution here would be to not generate movqi at all. But probably it
was performed intentionally and is profitable for some platforms. In this case
we should choose movl generation for movqi.


[Bug rtl-optimization/50037] Unroll factor exceeds max trip count

2011-08-15 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #8 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-15 
09:06:18 UTC ---
This patch did not work for me. I tried it on the following loop (-O2
-funroll-loops):

  for ( count = ((*(hdrptr)) & 0x7); count > 0; count--, addr++ )
    sum += *addr;

There is no multiplication by 2, but the loop is still unrolled the same way.

I was also hoping this patch would prevent unrolling of the prologue loop
generated by the vectorizer. It uses an '& 7' expression to compute the
iteration count, but this loop also uses a MIN expression to limit the number
of iterations and is still unrolled.


[Bug rtl-optimization/50088] New: movzbl is generated instead of movl

2011-08-15 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

 Bug #: 50088
   Summary: movzbl is generated instead of movl
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich@gmail.com


Created attachment 25016
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25016
Reproducer

When a spilled register is going to be used in a subreg expression, a short
load is generated to fill the register.

Example:
movl  %edx, 0x34(%esp)
jz 0x1498 Block 54
  Block 34:
movzxb  0x34(%esp), %ecx
shl %cl, %eax

It is correct but may cause performance problems. I doubt there are situations
where a zero-extended load is better than a natural one.

On Atom processors (and probably some others) such situations cause stalls
because store forwarding does not work for store/load pair using different
access sizes.

For example EEMBC 2.0/huffde has ~6% performance improvement on Atom if we
replace such movzbl with movl.

Attached reproducer demonstrates fills performed via movzbl.
Used compiler and options:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/users/gcc-perf/install
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-m32' '-S' '-v' '-mtune=generic' '-march=x86-64'
 /export/users/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1
-quiet -v -imultilib 32 test_movzbl.c -quiet -dumpbase test_movzbl.c -m32
-mtune=generic -march=x86-64 -auxbase test_movzbl -O2 -version -o test_movzbl.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096


[Bug rtl-optimization/50088] movzbl is generated instead of movl

2011-08-15 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-15 
13:24:05 UTC ---
Actually we do not need any zero extensions here. Zero extended load appears
only after IRA if we have to spill/fill register.

Here is the C code from the reproducer:

  n1 = (n1 + 1) & 15;
  s += arr[i] << n1;

RTL before IRA:

(insn 67 66 68 4 (parallel [
(set (reg/v:SI 97 [ n1 ])
(plus:SI (reg/v:SI 97 [ n1 ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) test_movzbl.c:18 249 {*addsi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

(insn 68 67 70 4 (parallel [
(set (reg/v:SI 97 [ n1 ])
(and:SI (reg/v:SI 97 [ n1 ])
(const_int 15 [0xf])))
(clobber (reg:CC 17 flags))
]) test_movzbl.c:18 385 {*andsi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

(insn 70 68 71 4 (set (reg:SI 262)
(mem:SI (reg:SI 224 [ ivtmp.52 ]) [2 MEM[base: D.2889_232, offset:
0B]+0 S4 A32])) test_movzbl.c:20 64 {*movsi_internal}
 (nil))

(insn 71 70 72 4 (parallel [
(set (reg:SI 262)
(ashift:SI (reg:SI 262)
(subreg:QI (reg/v:SI 97 [ n1 ]) 0)))
(clobber (reg:CC 17 flags))
]) test_movzbl.c:20 502 {*ashlsi3_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUAL (ashift:SI (mem:SI (reg:SI 224 [ ivtmp.52 ]) [2
MEM[base: D.2889_232, offset: 0B]+0 S4 A32])
(subreg:QI (reg/v:SI 97 [ n1 ]) 0))
(nil))))

IRA then introduces a fill for the shift instruction and uses a byte load for it:

(insn 155 70 71 4 (set (reg:QI 2 cx)
(mem/c:QI (reg/f:SI 7 sp) [4 %sfp+-28 S1 A32])) test_movzbl.c:20 66
{*movqi_internal}
 (nil))

(insn 71 155 72 4 (parallel [
(set (reg:SI 5 di [262])
(ashift:SI (reg:SI 5 di [262])
(reg:QI 2 cx)))
(clobber (reg:CC 17 flags))
]) test_movzbl.c:20 502 {*ashlsi3_1}
 (expr_list:REG_EQUAL (ashift:SI (mem:SI (reg:SI 0 ax [orig:224 ivtmp.52 ]
[224]) [2 MEM[base: D.2889_232, offset: 0B]+0 S4 A32])
(subreg:QI (mem/c:SI (reg/f:SI 7 sp) [4 %sfp+-28 S4 A32]) 0))
(nil)))

The load for the shift is then emitted as movzbl.
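
To make the two quoted statements easier to experiment with, they can be
placed in a small loop.  This is only a minimal sketch: the function
shift_sum, its parameters and the unsigned types are hypothetical and are not
taken from the attached reproducer, and a function this small most likely
does not have enough register pressure to force the spill discussed above.

```c
/* Hypothetical wrapper around the two statements from the reproducer.
   n1 cycles through 1, 2, ..., 15, 0 and is used only as a shift count,
   which is where the QImode subreg of the SImode value comes from.  */
unsigned int shift_sum (const unsigned int *arr, int n)
{
  unsigned int n1 = 0;
  unsigned int s = 0;
  int i;

  for (i = 0; i < n; i++)
    {
      n1 = (n1 + 1) & 15;
      s += arr[i] << n1;
    }
  return s;
}
```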


[Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count

2011-08-10 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

 Bug #: 50037
   Summary: Unroll factor exceeds max trip count
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich@gmail.com


Created attachment 24971
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24971
Reproducer

Here is a small loop which GCC unrolls inefficiently:

for ( count = ((*(hdrptr)) & 0xf) * 2; count > 0; count--, addr++ )
    sum += *addr;

This loop has at most 30 iterations. If we use -O3 then this loop is
vectorized. The resulting loop has at most 30 / 8 = 3 iterations. The
vectorizer also generates prologue and epilogue loops. Each of them has at
most 7 iterations.

If we add the -funroll-loops option then each of the 3 loops generated by the
vectorizer is unrolled with unroll factor 8. This creates a lot of code which
is never executed and also decreases performance due to additional checks and
branches.
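
The bound on the trip count can be checked directly.  This is a minimal
sketch under two assumptions: the wrapper function sum_block and its iters
out-parameter are hypothetical, and the operators in the loop header are
taken to be '&' and '>', which matches the stated maximum of 30 iterations.

```c
/* Hypothetical wrapper around the loop from the report.  The mask 0xf
   limits the initial count to 15 * 2 = 30, so the body can never run
   more than 30 times.  *iters receives the actual trip count.  */
unsigned int sum_block (const unsigned char *hdrptr,
                        const unsigned char *addr,
                        unsigned int *iters)
{
  unsigned int sum = 0;
  unsigned int count;

  *iters = 0;
  for (count = ((*(hdrptr)) & 0xf) * 2; count > 0; count--, addr++)
    {
      sum += *addr;
      (*iters)++;
    }
  return sum;
}
```

Whatever value the header byte holds, iters never exceeds 30, so an unroll
factor of 8 applied to loops with at most 3 or 7 iterations mostly produces
code that cannot execute.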

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/gcc-perf/install
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O3' '-funroll-loops' '-S' '-v' '-mtune=generic'
'-march=x86-64'
 /export/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet
-v unroll_test.c -quiet -dumpbase unroll_test.c -mtune=generic -march=x86-64
-auxbase unroll_test -O3 -version -funroll-loops -o unroll_test.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096


[Bug rtl-optimization/50037] Unroll factor exceeds max trip count

2011-08-10 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037

--- Comment #2 from Ilya Enkovich enkovich.gnu at gmail dot com 2011-08-10 
15:33:22 UTC ---
I wouldn't blame the vectorizer here. The following loop is unrolled with
unroll factor 8 even if the vectorizer is disabled:

for ( count = ((*(hdrptr)) & 0x3) * 2; count > 0; count--, addr++ )
    sum += *addr;

BTW prologue loops generated by the vectorizer also compute the iteration count
using an 'AND' expression. Therefore we may frequently get prologue loops
unrolled, which is never profitable with such a huge unroll factor.


[Bug middle-end/49959] New: ABS pattern is not recognized

2011-08-03 Thread enkovich.gnu at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49959

   Summary: ABS pattern is not recognized
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: enkovich@gmail.com


Created attachment 24900
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24900
Simple test where ABS pattern is not recognized

Here is an optimization opportunity for the ABS pattern recognizer, which does
not catch all cases.

Here is a simple test for ABS computation:

#define ABS(X)(((X)>0)?(X):-(X))
int
test_abs(int *cur)
{
  unsigned long sad = 0;
  sad = ABS(cur[0]);
  return sad;
}

GIMPLE for the test is good (phase optimized):

test_abs (int * cur)
{
  int D.2783;
  int D.2782;

<bb 2>:
  D.2782_3 = *cur_2(D);
  D.2783_4 = ABS_EXPR D.2782_3;
  return D.2783_4;
}

Now try to make a minor change in the test:
#define ABS(X)(((X)>0)?(X):-(X))
int
test_abs(int *cur)
{
  unsigned long sad = 0;
  sad += ABS(cur[0]);
  return sad;
}

GIMPLE becomes worse:

test_abs (int * cur)
{
  int D.2788;
  int D.2787;
  int D.2783;
  long unsigned int iftmp.0;

<bb 2>:
  D.2783_4 = *cur_3(D);
  if (D.2783_4 > 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  iftmp.0_6 = (long unsigned int) D.2783_4;
  goto <bb 5>;

<bb 4>:
  D.2787_8 = -D.2783_4;
  iftmp.0_9 = (long unsigned int) D.2787_8;

<bb 5>:
  # iftmp.0_1 = PHI <iftmp.0_6(3), iftmp.0_9(4)>
  D.2788_11 = (int) iftmp.0_1;
  return D.2788_11;
}
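
The two variants can be kept in one file so that their dumps are easy to
compare.  This is a minimal sketch: the function names abs_assign and
abs_accum are hypothetical (the report uses test_abs for both versions), and
the macro assumes the stripped comparison operator is '>'.

```c
#define ABS(X) (((X) > 0) ? (X) : -(X))

/* Plain assignment: the form reported to be recognized as ABS_EXPR.  */
int abs_assign (int *cur)
{
  unsigned long sad = 0;
  sad = ABS (cur[0]);
  return sad;
}

/* Accumulation into an unsigned long: the form reported to be left as
   a conditional with a conversion on each arm.  */
int abs_accum (int *cur)
{
  unsigned long sad = 0;
  sad += ABS (cur[0]);
  return sad;
}
```

Both functions return the same value for any input other than INT_MIN (where
the negation overflows), so the difference in the GIMPLE is purely a missed
optimization.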

Compiler used for tests:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/export/gcc-build
--enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110707 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-S' '-mtune=generic'
'-march=x86-64'