Question about find modifiable mems

2015-06-02 Thread shmeel gutl
find_modifiable_mems was introduced to gcc 4.8 in september 2012. Is 
there any documentation as to how it is supposed to help the haifa 
scheduler?


In my private port of gcc it make the following type of transformations

from
a= *(b+20)
b+=30

to
b+=30
a=*(b-10)

Although this is functionally correct, it has changed an ANTI_DEP into a 
TRUE_DEP and thus introduced stalls. If it went the other way, that 
would be good. Any pointers?


Thanks,
Shmeel



Re: Is it safe to use _Bool as asm statement outputs on x86?

2015-06-02 Thread H. Peter Anvin
On 06/02/2015 11:23 PM, H.J. Lu wrote:
> Trampling code for nested functions puts code on stack.

This is an issue for the CS ≠ DS case, as opposed to _Bool, I assume.
In other words, we are okay if and only if we can run with an NX stack?

-hpa



Re: Is it safe to use _Bool as asm statement outputs on x86?

2015-06-02 Thread H. Peter Anvin
On 06/02/2015 06:02 PM, Richard Henderson wrote:
> On 06/02/2015 04:46 PM, H. Peter Anvin wrote:
>> For the x86 backend explicitly, is doing something like:
>>
>> _Bool x;
>>
>> asm("blah ; setc %0" : "=qm" (x));
>>
>> ... guaranteed to be safe for older versions of gcc?
> 
> I believe so, for the restricted set of conditions I expect you're asking.
> In particular:
> 
>  (1) Linux has always defined _Bool as a byte (indeed, afaik only Darwin
>  has ever done otherwise).
> 
>  (2) You must really produce 0/1 from the asm; the compiler doesn't re-do
>  the canonicalization afterward, and afaik we do rely on that in the
>  optimizers.  But certainly that's true for any version of GCC.
> 

That is all good as far as I am concerned.

-hpa




RE: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE

2015-06-02 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Tuesday, June 02, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with 
Conditional IF-THEN-ELSE

On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:
>>> I don't offhand know if any of the benchmarks you cite above are 
>>> free-enough to derive a testcase from.  But one trick many of us use 
>>> is to instrument the >>pass and compile some known free software 
>>> (often gcc
>>> itself) to find triggering code and use that to generate tests for the new 
>>> transformation.
>
> I will add tests in the suite. I could see many existing tests in the suite 
> also get triggered with this optimization.
>>Thanks.  For cases in the existing testsuite where you need to change the 
>>expected output, it's useful to note why the expected output was changed.  
Sometimes a test is compromised by a new optimization, sometimes the 
>>expected output is changed and is papering over a problem, etc so it's 
>>something >>we look at reasonably closely.

Thanks. I will modify accordingly.

>
>>
>> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c index 9faa339..559ca96 
>> 100644
>> --- a/gcc/cfghooks.c
>> +++ b/gcc/cfghooks.c
>> @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)
>>
>>  /* If we remove the header or the latch of a loop, mark the loop for
>>   removal.  */
>> -  if (loop->latch == bb
>> +  if (loop && loop->latch == bb
>>|| loop->header == bb)
>>  mark_loop_for_removal (loop);
>>> So what caused you to add this additional test?  In general loop 
>>> structures are supposed to always be available.  The change here 
>>> implies that the loop structures were gone at some point.  That 
>>> seems at first glance a mistake.
>
> I was using the gimple_duplicate_bb which will not add the duplicate 
> basic block inside the current_loops. That's why the above Condition 
> is required. I am using duplicate_block instead of gimple_duplicate_bb. With 
> this change the above check with loop Is not required as it adds the 
> duplicate basic block inside the loops.
>>OK.  Good to hear it's not required anymore.


>
>>
>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index aed5254..b25e409 
>> 100644
>> --- a/gcc/tree-cfg.c
>> +++ b/gcc/tree-cfg.c
>> @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
>>}
>>}
>>
>> +void
>> +gimple_threaded_merge_blocks (basic_block a, basic_block b)
> If we keep this function it will need a block comment.
>
>>> I say "if" for a couple reasons.  First we already have support 
>>> routines that know how to merge blocks.  If you really need to merge 
>>> blocks you should try to use them.
>
>>> Second, I'm not sure that you really need to worry about block 
>>> merging in this pass.  Just create the duplicates, wire them into 
>>> the CFG and let the existing block merging support handle this problem.
>
> The above routine is not merging but duplicates the join nodes into 
> its predecessors. If I change the name of the above Function to the 
> gimple_threaded_duplicating_join_node it should be fine.
>>But you don't need to duplicate into the predecessors.  If you create the 
>>duplicates and wire them into the CFG properly the existing code in 
>>cfgcleanup >>should take care of this for you.

Certainly I will do it.
>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 4303a18..2c7d36d 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
>>  return 0;
>>}
>>
>> +static void
>> +replace_threaded_uses (basic_block a,basic_block b)
>>> If you keep this function, then it'll need a function comment.
>
>>> It looks like this is just doing const/copy propagation.  I think a
>>> better structure is to implement your optimization as a distinct pass,
>>> then rely on existing passes such as update_ssa, DOM, CCP to handle
>>> updating the SSA graph and propagation opportunities exposed by your
>>> transformation.
>
>
>>> Similarly for the other replace_ functions.
>
> I think these replace_ functions are required as existing Dom, CCP and 
> propagation opportunities doesn't transform these
> Propagation given below.
>
> :
> xk_124 = MIN_EXPR ;
> xc_126 = xc_121 - xk_6;
> xm_127 = xm_122 - xk_6;
> xy_128 = xy_123 - xk_6;
> *EritePtr_14 = xc_126;
> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
> EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
> i_137 = i_4 + 1;
> goto ;
>
> :
> xk_125 = MIN_EXPR ;
> xc_165 = xc_121 - xk_6;
> xm_166 = xm_122 - xk_6;
> xy_167 = xy_123 - xk_6;
> *EritePtr_14 = xc_126;
> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
> EritePtr_171 = &MEM[(void *)Eri

Re: Better info for combine results in worse code generated

2015-06-02 Thread Richard Henderson

On 05/31/2015 07:03 PM, Alan Modra wrote:

I agree.  Do you intend to get rid of WORD_REGISTER_OPERATIONS,
POINTERS_EXTEND_UNSIGNED, PUSH_ROUNDING, SHORT_IMMEDIATES_SIGN_EXTEND,
and LOAD_EXTEND_OP?  ;-)


Oh yes, please.  ;-)

Although for the specific case of this thread, W_R_O, S_I_S_E, and L_E_O are 
really relevant.  P_E_U and P_R are completely different hacks.



r~


PS: I'd forgotten S_I_S_E existed.  I love the comment:

   ??? For 2.5, try to tighten up the MD files in this regard instead of this
   kludge.


Re: Is it safe to use _Bool as asm statement outputs on x86?

2015-06-02 Thread Richard Henderson

On 06/02/2015 04:46 PM, H. Peter Anvin wrote:

For the x86 backend explicitly, is doing something like:

_Bool x;

asm("blah ; setc %0" : "=qm" (x));

... guaranteed to be safe for older versions of gcc?


I believe so, for the restricted set of conditions I expect you're asking.
In particular:

 (1) Linux has always defined _Bool as a byte (indeed, afaik only Darwin
 has ever done otherwise).

 (2) You must really produce 0/1 from the asm; the compiler doesn't re-do
 the canonicalization afterward, and afaik we do rely on that in the
 optimizers.  But certainly that's true for any version of GCC.



r~


Re: Better info for combine results in worse code generated

2015-06-02 Thread Alan Modra
On Tue, Jun 02, 2015 at 11:28:09AM -0500, Segher Boessenkool wrote:
> On Tue, Jun 02, 2015 at 08:49:37AM +0930, Alan Modra wrote:
> > but and64_2_operand doesn't include all of and_operand!
> 
> Maybe I'm slow today, but I don't see it?  Do you have an example?

I need to get new glasses.  That's the best excuse I can come up with
at short notice. :)  mask64_2_operand, used by and64_2_operand,
does indeed cover all of mask_operand and mask64_operand.  Even so,
the predicate deserves to die.

> > > > get rid of WORD_REGISTER_OPERATIONS,
> > > 
> > > rs6000 should not define it.  What e.g. does it mean for mullw?  Or,
> > > worse, mulhw?  Pretty much anything with "w" in its name is problematic.
> > 
> > In many places WORD_REGISTER_OPERATIONS is used, it is saying "don't
> > trust the high bits".  At the moment we definitely do need it defined!
> 
> I don't see that either; do you have a pointer for me?

The first occurrence in combine.c looks like such a place to me.  Also
the first one in rtlanal.c:nonzero_bits1.

-- 
Alan Modra
Australia Development Lab, IBM


Is it safe to use _Bool as asm statement outputs on x86?

2015-06-02 Thread H. Peter Anvin
For the x86 backend explicitly, is doing something like:

_Bool x;

asm("blah ; setc %0" : "=qm" (x));

... guaranteed to be safe for older versions of gcc?

-hpa


Re: s390: SImode pointers vs LR

2015-06-02 Thread Richard Henderson

On 06/02/2015 08:32 AM, Andreas Krebbel wrote:

-(define_insn "*3"
+(define_insn "*3_reg"
[(set (match_operand:GPR 0 "register_operand" "=d")
  (SHIFT:GPR (match_operand:GPR 1 "register_operand" "")
-   (match_operand:SI 2 "shift_count_or_setmem_operand" "Y")))]
+   (match_operand:SI 2 "register_operand" "a")))]
""
-  "sl\t%0,<1>%Y2"
+  "sl\t%0,<1>%2"
+  [(set_attr "op_type"  "RS")
+   (set_attr "atype""reg")])
+
+(define_insn "*3_imm"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+(SHIFT:GPR (match_operand:GPR 1 "register_operand" "")
+   (match_operand 2 "immediate_operand" "J")))]
+  ""
+  "sl\t%0,<1>%2"
+  [(set_attr "op_type"  "RS")
+   (set_attr "atype""reg")])


These two ought not be split apart.  They're simple alternatives.  And why 
SImode?  You also shouldn't drop the mode on immediate operands.


r~


+
+(define_insn "*3_immreg"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+(SHIFT:GPR (match_operand:GPR 1 "register_operand" "")
+  (plus:SI
+   (match_operand:SI 2 "register_operand""a")
+   (match_operand3 "immediate_operand"   "J"]
+  ""
+  "sl\t%0,<1>%2(%3)"
[(set_attr "op_type"  "RS")
 (set_attr "atype""reg")])





gcc-5-20150602 is now available

2015-06-02 Thread gccadmin
Snapshot gcc-5-20150602 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150602/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 224047

You'll find:

 gcc-5-20150602.tar.bz2   Complete GCC

  MD5=f60f5874c24b8fd5cba9bf29d23b8722
  SHA1=ad565796d084aefc9bb872360c002d363c9c36a6

Diffs from 5-20150526 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: target attributes/pragmas changing vector instruction availability and custom types

2015-06-02 Thread Michael Meissner
On Tue, May 19, 2015 at 03:17:26PM +0100, Kyrill Tkachov wrote:
> Hi all,

Sorry, it took me awhile to comment.

> I'm working on enabling target attributes and pragmas on aarch64 and I'm 
> stuck on a particular issue.
> I want to be able to use a target pragma to enable SIMD support in a SIMD 
> intrinsics header file.
> So it will look like this:
> 
> $ cat simd_header.h
> #pragma GCC push_options
> #pragma GCC target ("arch=armv8-a+simd")
> 
> #pragma GCC pop_options
> 
> I would then include it in a file with a function tagged with a simd target 
> attribute:
> 
> $ cat foo.c
> #inlcude "simd_header.h"
> 
> __attribute__((target("arch=armv8-a+simd")))
> uint32x4_t
> foo (uint32x4_t a)
> {
>   return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and 
> implemented by a target builtin
> }
> 
> This works fine for me. But if I try to compile this without SIMD support, 
> say: aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c
> I get an ICE during builtin expansion time.
> 
> I think I've tracked it down to the problem that the type uint32x4_t is a 
> builtin type that we define in the backend
> (with add_builtin_type) during the target builtins initialisation code.

We ran into this on the PowerPC.  If you compile for 32-bit, the Altivec ABI is
not enabled by default.  The compiler won't let you add VSX/Altivec target
options unless you compiled it with -mabi=altivec.

> From what I can see, this code gets called early on after the command line 
> options have been processed,
> but before target pragmas or attributes are processed, so the builtin types 
> are laid out assuming that no SIMD is available,
> as per the command line option -march=armv8-a+nosimd, but later while 
> expanding the builtin in simd_intrinsic with SIMD available
> the ICE occurs. I think that is because the types were not re-laid out.
> 
> I'm somewhat stumped on ideas to work around this issue.
> I notice that rs6000 also defines custom builtin vector types.
> Michael, did you notice any issue similar to what I described above?
> 
> Would re-laying the builtin vector type on target attribute changes be a 
> valid way to go forward here?

You might look at the x86_64.  When I was updating the target support for the
last global change affecting target support, I was pointed to mods that the
x86_64 folk had done.  One of their mods is that they don't define all of the
built-in functions at compiler startup.  Instead, they only add the built-in
functions that are allowed for the current set of switches.  When a target
attribute or pragma adds new features, they go through the built-ins and add
any functions that were previously not added, but can now be added due to
having the appropriate target options enabled.

It helps avoid the big memory sync to create all ten million tree nodes for the
built-in functions for programs that don't use target attributes or pragmas.
It is something I thought about doing when I was working at AMD, and it looks
like they've done it now.

So if you are going to define all of the functions at compiler startup time
(like the PowerPC does), you have to define all of the types used, even the
current switches don't allow use of the types.  Or you can only define what you
need, and when you change options, you go through and define any stuff that
wasn't previously defined that you can now use (like the current x86_64).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: s390: SImode pointers vs LR

2015-06-02 Thread Jeff Law

On 06/02/2015 09:32 AM, Andreas Krebbel wrote:

Bootstrap failed with:

/home/andreas/clean/../gcc/gcc/dwarf2out.c: In function ‘constant_size’:
/home/andreas/clean/../gcc/gcc/dwarf2out.c:6572: error: insn does not satisfy 
its constraints:
(insn 39 38 66 6 /home/andreas/clean/../gcc/gcc/toplev.h:176 (set (reg:SI 2 %r2 
[orig:44
prephitmp.4958 ] [44])
 (ashift:SI (reg:SI 2 %r2 [orig:44 prephitmp.4958 ] [44])
 (plus:SI (reg:SI 12 %r12 [49])
 (reg:SI 2 %r2 [orig:44 prephitmp.4958 ] [44] 360 
{*ashlsi3} (nil)
 (nil))
/home/andreas/clean/../gcc/gcc/dwarf2out.c:6572: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:393

This is the result of reload reloading the constant in:

(insn 58 57 59 6 (set (reg/v:SI 48 [ log ])
 (ashift:SI (reg:SI 66)
 (plus:SI (reg:SI 2 %r2 [+4 ])
 (const_int 1 [0x1] 360 {*ashlsi3} (insn_list:REG_DEP_TRUE 
53
(insn_list:REG_DEP_TRUE 57 (nil)))
 (expr_list:REG_DEAD (reg:DI 2 %r2)
 (expr_list:REG_DEAD (reg:SI 66)
 (nil

Reloads for insn # 58
Reload 0: reload_in (SI) = (const_int 1 [0x1])
 ADDR_REGS, RELOAD_FOR_INPUT (opnum = 2)
 reload_in_reg: (const_int 1 [0x1])
 reload_reg_rtx: (reg:SI 1 %r1 [66])
Reload 1: reload_in (SI) = (reg:SI 1 %r1 [66])
 reload_out (SI) = (reg/v:SI 9 %r9 [orig:48 log ] [48])
 GENERAL_REGS, RELOAD_OTHER (opnum = 0)
 reload_in_reg: (reg:SI 1 %r1 [66])
 reload_out_reg: (reg/v:SI 9 %r9 [orig:48 log ] [48])
 reload_reg_rtx: (reg/v:SI 9 %r9 [orig:48 log ] [48])
But shouldn't we be reloading the (plus (reg) (const_int 1)) part as a 
whole rather than sub-components?





Without accepting SImode in s390_decompose_address
find_reloads_address reloads const_int 1 into a reg.  Unfortunately it
assumes that this makes a valid address because double_reg_address_ok
is true on our target. But of course the shift instruction cannot deal
with an index register.
But isn't that 3 registers used in the address computation if the 
(const_int 1) gets reloaded?  one of the value shifted, two for the 
shift count?  I'm not familiar with the s390, so if you can handle that 
kind of insn, then, umm, cool.






However, the problem to me appears to be that s390_decompose_address
is involved for shift count operands at all.  We have a separate
function for this (s390_decompose_shift_count) which is invoked by the
relevant predicate and constraint. s390_decompose_address is only
called because the Y constraint letter is marked as
EXTRA_ADDRESS_CONSTRAINT.
Note that the EXTRA_ADDRESS_CONSTRAINT stuff all changed a little while 
back, not sure if it makes any significant difference in the analysis 
though.


The only other thing that comes immediately to mind would be secondary 
reloads.  But I always hate suggesting them.



Jeff


Re: ifcvt limitations?

2015-06-02 Thread Jeff Law

On 06/02/2015 09:57 AM, Kyrill Tkachov wrote:

I'm stuck on noce_process_if_block (in ifcvt.c) and what I think is a
restriction that the THEN-block contents have to be only a single set
insn. This fails on aarch64 because we get an extra zero_extend.

In particular, the following check in noce_process_if_block triggers:
   insn_a = first_active_insn (then_bb);
   if (! insn_a
   || insn_a != last_active_insn (then_bb, FALSE)
   || (set_a = single_set (insn_a)) == NULL_RTX)
 return FALSE;

Is there any particular reason why the code shouldn't be able to handle
arbitrarily large contents
in then_bb (within a sane limit)?
It's just never been implemented or tested per this comment in 
noce_process_if_block.


  /* We're looking for patterns of the form

 (1) if (...) x = a; else x = b;
 (2) x = b; if (...) x = a;
 (3) if (...) x = a;   // as if with an initial x = x.

 The later patterns require jumps to be more expensive.

 ??? For future expansion, look for multiple X in such patterns.  */

I think folks would look favorably upon removing that limitation, 
obviously with some kind of cost checking.


Jeff


Re: Better info for combine results in worse code generated

2015-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2015 at 08:49:37AM +0930, Alan Modra wrote:
> > > In and3 expander I think you want the following since
> > > and64_2_operand covers the extra double-rotate cases, not all DImode.
> > > 
> > > -  if ((mode == DImode && !and64_2_operand (operands[2], 
> > > mode))
> > > -  || (mode != DImode && !and_operand (operands[2], 
> > > mode)))
> > > +  if (!and_operand (operands[2], mode)
> > > +  && (mode != DImode || !and64_2_operand (operands[2], 
> > > mode)))
> > 
> > and64_2_operand includes all of and_operand.  I agree it is a mess.
> 
> but and64_2_operand doesn't include all of and_operand!

Maybe I'm slow today, but I don't see it?  Do you have an example?

> > > get rid of WORD_REGISTER_OPERATIONS,
> > 
> > rs6000 should not define it.  What e.g. does it mean for mullw?  Or,
> > worse, mulhw?  Pretty much anything with "w" in its name is problematic.
> 
> In many places WORD_REGISTER_OPERATIONS is used, it is saying "don't
> trust the high bits".  At the moment we definitely do need it defined!

I don't see that either; do you have a pointer for me?

I've run some bootstraps and tests with WORD_REGISTER_OPERATIONS
undefined: 32-bit code becomes very slightly bigger, and 64-bit code
becomes slightly smaller.  No change in testresults.


Segher


ifcvt limitations?

2015-06-02 Thread Kyrill Tkachov

Hi all,

I'm looking at a case on aarch64 that's not if-converted to use conditional 
moves:


typedef unsigned char uint8_t;
typedef unsigned int uint16_t;

uint8_t foo(const uint8_t byte, const uint16_t generator)
{
  if (byte & 0x80) {
return (byte << 1) ^ (generator & 0xff);
  } else {
return byte << 1;
  }
}

For aarch64 we fail to if-convert and generate:
foo:
uxtbw2, w0
lsl w3, w2, 1
uxtbw0, w3
tbnzx2, 7, .L5
ret
.p2align 3
.L5:
eor w0, w3, w1
uxtbw0, w0
ret


whereas on x86 we if convert successfully and use a conditional move/select:
leal(%rdi,%rdi), %eax
xorl%eax, %esi
testb   %dil, %dil
cmovs   %esi, %eax
ret



After fixing some of the branch costs in aarch64 and a bogus cost calculation 
in cheap_bb_rtx_cost_p
I'm stuck on noce_process_if_block (in ifcvt.c) and what I think is a 
restriction that the THEN-block contents have to be only a single set insn. 
This fails on aarch64 because we get an extra zero_extend.

In particular, the following check in noce_process_if_block triggers:
  insn_a = first_active_insn (then_bb);
  if (! insn_a
  || insn_a != last_active_insn (then_bb, FALSE)
  || (set_a = single_set (insn_a)) == NULL_RTX)
return FALSE;

Is there any particular reason why the code shouldn't be able to handle 
arbitrarily large contents
in then_bb (within a sane limit)?

Thanks,
Kyrill


Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE

2015-06-02 Thread Jeff Law

On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:

I don't offhand know if any of the benchmarks you cite above are free-enough to 
derive a testcase from.  But one trick many of us use is to instrument the 
>>pass and compile some known free software (often gcc
itself) to find triggering code and use that to generate tests for the new 
transformation.


I will add tests in the suite. I could see many existing tests in the suite 
also get triggered with this optimization.
Thanks.  For cases in the existing testsuite where you need to change 
the expected output, it's useful to note why the expected output was 
changed.  Sometimes a test is compromised by a new optimization, 
sometimes the expected output is changed and is papering over a problem, 
etc so it's something we look at reasonably closely.






diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 9faa339..559ca96 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)

 /* If we remove the header or the latch of a loop, mark the loop for
 removal.  */
-  if (loop->latch == bb
+  if (loop && loop->latch == bb
  || loop->header == bb)
mark_loop_for_removal (loop);

So what caused you to add this additional test?  In general loop
structures are supposed to always be available.  The change here implies
that the loop structures were gone at some point.  That seems at first
glance a mistake.


I was using the gimple_duplicate_bb which will not add the duplicate basic 
block inside the current_loops. That's why the above
Condition is required. I am using duplicate_block instead of 
gimple_duplicate_bb. With this change the above check with loop
Is not required as it adds the duplicate basic block inside the loops.

OK.  Good to hear it's not required anymore.






diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index aed5254..b25e409 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
   }
   }

+void
+gimple_threaded_merge_blocks (basic_block a, basic_block b)

If we keep this function it will need a block comment.


I say "if" for a couple reasons.  First we already have support routines
that know how to merge blocks.  If you really need to merge blocks you
should try to use them.



Second, I'm not sure that you really need to worry about block merging
in this pass.  Just create the duplicates, wire them into the CFG and
let the existing block merging support handle this problem.


The above routine is not merging but duplicates the join nodes into its 
predecessors. If I change the name of the above
Function to the gimple_threaded_duplicating_join_node it should be fine.
But you don't need to duplicate into the predecessors.  If you create 
the duplicates and wire them into the CFG properly the existing code in 
cfgcleanup should take care of this for you.





diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 4303a18..2c7d36d 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
 return 0;
   }

+static void
+replace_threaded_uses (basic_block a,basic_block b)

If you keep this function, then it'll need a function comment.



It looks like this is just doing const/copy propagation.  I think a
better structure is to implement your optimization as a distinct pass,
then rely on existing passes such as update_ssa, DOM, CCP to handle
updating the SSA graph and propagation opportunities exposed by your
transformation.




Similarly for the other replace_ functions.


I think these replace_ functions are required as existing Dom, CCP and 
propagation opportunities doesn't transform these
Propagation given below.

:
xk_124 = MIN_EXPR ;
xc_126 = xc_121 - xk_6;
xm_127 = xm_122 - xk_6;
xy_128 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_137 = i_4 + 1;
goto ;

:
xk_125 = MIN_EXPR ;
xc_165 = xc_121 - xk_6;
xm_166 = xm_122 - xk_6;
xy_167 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_173 = i_4 + 1;

 and  are the predecessors of the join node. After the 
duplication of join node and the duplicating the
Join node into the predecessors < bb 29> and , the above is the gimple 
representations.
}

But the < bb 30 > should be the following after the transformation.

:
xk_125 = MIN_EXPR ;
xc_165 = xc_121 - xk_125;
xm_166 = xm_122 - xk_125;
xy_167 = xy_123 - xk_125;
*EritePtr_14 = xc_165;
MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
i_173 = i_4 + 1;

The phi-only cprop will  replace th

Re: s390: SImode pointers vs LR

2015-06-02 Thread Andreas Krebbel
On 06/01/2015 05:06 PM, Jeff Law wrote:
> On 06/01/2015 06:23 AM, Andreas Krebbel wrote:
>> On 05/30/2015 02:57 AM, DJ Delorie wrote:
>>> In config/s390/s390.c we accept addresses that are SImode:
>>>
>>>if (!REG_P (base)
>>>   || (GET_MODE (base) != SImode
>>>   && GET_MODE (base) != Pmode))
>>> return false;
>>>
>>> However, there doesn't seem to be anything in the s390's opcodes that
>>> masks the top half of address registers in 64-bit mode, the SImode
>>> convention seems to just be a convention for addresses in the first
>>> 4Gb.
>>>
>>> So... what happens if gcc uses a subreg to load the lower half of a
>>> register (via LR), leaving the upper half with random bits in it, then
>>> uses that register as an address?  I could see no code that checked
>>> for this, and I have a fairly large and ungainly test case that says
>>> it breaks :-(
>>>
>>> My local solution was to just disallow "SImode" as an address in
>>> s390_decompose_address, which forces gcc to do an explicit SI->DI
>>> conversion to clear the upper bits, and it seems to work, but I wonder
>>> if it's the ideal solution...
>>
>> We also have address style operands which are used as shift count operands 
>> but never as addresses.
>> Accepting SImode there was necessary to prevent reload from messing things 
>> up:
>> https://gcc.gnu.org/ml/gcc-patches/2006-03/msg01495.html
>>
>> I don't know if this is still necessary though.
> Looks like a hack to me.  Sadly, no testcase in that BZ, though 
> presumably since it was a bootstrap failure one could checkout a 
> development tree from that era and see the failure.

I've checked r112357 (https://gcc.gnu.org/ml/gcc-patches/2006-03/msg01495.html)
with the SImode check part reverted:

Bootstrap failed with:

/home/andreas/clean/../gcc/gcc/dwarf2out.c: In function ‘constant_size’:
/home/andreas/clean/../gcc/gcc/dwarf2out.c:6572: error: insn does not satisfy 
its constraints:
(insn 39 38 66 6 /home/andreas/clean/../gcc/gcc/toplev.h:176 (set (reg:SI 2 %r2 
[orig:44
prephitmp.4958 ] [44])
(ashift:SI (reg:SI 2 %r2 [orig:44 prephitmp.4958 ] [44])
(plus:SI (reg:SI 12 %r12 [49])
(reg:SI 2 %r2 [orig:44 prephitmp.4958 ] [44] 360 {*ashlsi3} 
(nil)
(nil))
/home/andreas/clean/../gcc/gcc/dwarf2out.c:6572: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:393

This is the result of reload reloading the constant in:

(insn 58 57 59 6 (set (reg/v:SI 48 [ log ])
(ashift:SI (reg:SI 66)
(plus:SI (reg:SI 2 %r2 [+4 ])
(const_int 1 [0x1] 360 {*ashlsi3} (insn_list:REG_DEP_TRUE 53
(insn_list:REG_DEP_TRUE 57 (nil)))
(expr_list:REG_DEAD (reg:DI 2 %r2)
(expr_list:REG_DEAD (reg:SI 66)
(nil

Reloads for insn # 58
Reload 0: reload_in (SI) = (const_int 1 [0x1])
ADDR_REGS, RELOAD_FOR_INPUT (opnum = 2)
reload_in_reg: (const_int 1 [0x1])
reload_reg_rtx: (reg:SI 1 %r1 [66])
Reload 1: reload_in (SI) = (reg:SI 1 %r1 [66])
reload_out (SI) = (reg/v:SI 9 %r9 [orig:48 log ] [48])
GENERAL_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg:SI 1 %r1 [66])
reload_out_reg: (reg/v:SI 9 %r9 [orig:48 log ] [48])
reload_reg_rtx: (reg/v:SI 9 %r9 [orig:48 log ] [48])

Without accepting SImode in s390_decompose_address
find_reloads_address reloads const_int 1 into a reg.  Unfortunately it
assumes that this makes a valid address because double_reg_address_ok
is true on our target. But of course the shift instruction cannot deal
with an index register.

However, the problem to me appears to be that s390_decompose_address
is involved for shift count operands at all.  We have a separate
function for this (s390_decompose_shift_count) which is invoked by the
relevant predicate and constraint. s390_decompose_address is only
called because the Y constraint letter is marked as
EXTRA_ADDRESS_CONSTRAINT.

So the obvious solution appears to be to remove 'Y' from the
EXTRA_ADDRESS_CONSTRAINTS.  But this is not going to work that easily
since we rely on reload picking a valid base register. As with
addresses we have to avoid r0 here since it stands for an immediate 0
instead of the content of r0.  The legitimate_address_p hook together
with find_reloads_address takes care of this for us.

Since the operand here is not simply a register we cannot easily make
reload to pick one from the ADDR_REGS register class via constraint.

The only way to address this while still exploiting the various
formats of the shift count operand is probably to split the patterns
up into three different variants.  This then would look like the
attached patch where I did it for only one of the instructions:

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 0d03fa6..c5e70ee 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1617,9 +1617,7 @@ s390_decompose_address (rtx addr, struct s390_address 
*out)
return false;
 

C++ Instantiation Problem

2015-06-02 Thread Yongwei Wu
I encountered this C++ instantiation problem. (Please guide me, if
this is not the correct list.)

The minimal test case I have is as follows:

#include 

template 
auto compose(Fn fn)
{
return [fn](auto&&... x) -> decltype(auto)
{
return fn(std::forward(x)...);
};
}

template 
auto sum(T1 x, T2 y)
{
return x + y;
}

int main()
{
//(void)sum;  // GCC requires this instantiation
auto sum_composed = compose(sum);
std::cout << sum_composed(1, 2) << std::endl;
}

The code cannot compile (using -std=c++14, under GCC 4.9 and 5.1),
unless I uncomment the first line in main(). The reported error is:

test.cpp: In function 'int main()':
test.cpp:21:46: error: no matching function for call to
'compose()'
 auto sum_composed = compose(sum);
  ^
test.cpp:4:6: note: candidate: template auto compose(Fn)
 auto compose(Fn fn)
  ^
test.cpp:4:6: note:   template argument deduction/substitution failed:
test.cpp:21:46: note:   couldn't deduce template parameter 'Fn'
 auto sum_composed = compose(sum);
  ^

Clang does not complain at the code, and I cannot see why the
instantiation has to be done manually.

Is it a GCC bug? Or is the code not conformant to the C++14 standard in any way?

Thanks and best regards,

Yongwei

-- 
Wu Yongwei
URL: http://wyw.dcweb.cn/


Re: Is the check performed on the return value from force_const_mem in the plus_constant function correct?

2015-06-02 Thread Jeff Law

On 06/02/2015 05:22 AM, Andrew Bennett wrote:

Hi,

I am debugging a segmentation fault when compiling some code for MIPS.  The
segmentation fault occurs in the plus_constant function in explow.c when it
tries to update a constant pool value.  The offending code is below:

case MEM:
   /* If this is a reference to the constant pool, try replacing it with
  a reference to a new constant.  If the resulting address isn't
  valid, don't return it because we have no way to validize it.  */
   if (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
   && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
 {
   tem = plus_constant (mode, get_pool_constant (XEXP (x, 0)), c);
   tem = force_const_mem (GET_MODE (x), tem);
   if (memory_address_p (GET_MODE (tem), XEXP (tem, 0)))
 return tem;
 }


The code fails for MIPS because its cannot_force_const_mem target function does
not allow constants (so that the move expanders can deal with them later on),
this then causes the force_const_mem function to return a NULL_RTX.

What I can't understand is why the memory_address_p function is used to check
the result from force_const_mem?  From looking at the return values from
force_const_mem the only possibilies are what looks like a valid memory rtx,
or a NULL_RTX.  Also there is no other instances in the code where this happens.
I was therefore wondering is anyone knew why the check is done in this manner?

Right, force_const_mem will return a (mem ...)) or NULL_RTX.

In the case of (mem ...), the validity of the memory address is not 
checked by force_const_mem.  That's what the call to memory_address_p 
you cited above does.  If the address is valid for the target, then we 
can return the newly created (mem ...) from plus_constant.




I think the fix should be to just check that the tem variable is not a NULL_RTX.
I was also wondering if anyone had any issues/objections to checking the result
in this manner?

Something like this seems right to me

/* Targets may disallow some constants in the constant pool, thus
   force_const_mem may return NULL_RTX.  */
if (tem && memory_address_p (...))

Jeff


Re: Strange insn rtx is emitted in a custom backend

2015-06-02 Thread Lev Yudalevich
Thanks a lot, Jeff!
I was missing some stuff in TARGET_LEGITIMATE_CONSTANT_P.

Lev.

On Mon, Jun 1, 2015 at 11:35 PM, Jeff Law  wrote:
> On 06/01/2015 12:51 PM, Lev Yudalevich wrote:
>>
>> While working on a custom backend for quite a standard RISC-like
>> architecture, I defined 'high'/'lo_sum' patterns as follows:
>>
>> (define_insn "mov_high"
>>[(set (match_operand:SI 0 "register_operand" "=r")
>>(high:SI (match_operand:SI 1 "immediate_operand" "i")))]
>>""
>>"mvup #high(%1), %0"
>>[(set_attr "insn" "alu")
>> (set_attr "length" "4")]
>> )
>>
>> (define_insn "add_low"
>>[(set (match_operand:SI 0 "register_operand" "=r")
>>(lo_sum:SI (match_operand:SI 1 "register_operand" "0")
>> (match_operand:SI 2 "symbolic_operand" "")))]
>> (match_operand:SI 2 "immediate_operand"
>> "i")))]
>>""
>>"add3 %0, #low(%2), %0"
>>[(set_attr "insn" "alu")
>> (set_attr "length" "4")]
>> )
>>
>>
>> Despite having the patters above (along with usual 'plus' patterns)
>> I'm getting the following rtx emitted at the expansion stage:
>>
>> (insn 53 52 63 2 (set (reg:SI 62)
>> (const:SI (plus:SI (symbol_ref:SI [flags
>> 0x6])
>>(const_int
>> 64 [0x40] -1
>>  (nil))
>>
>> which can not be recognized unless I define a pattern like this:
>
> You should debug precisely where that came from.  You're missing something
> in your backend to tell GCC that such constants are not valid.
>
> You should probably look at how expansion occurs for your tests on other
> ports that make heavy use of high/lo_sum.  sparc & hppa come immediately to
> mind, but I'm sure there's others.
>
> jeff
>


Is the check performed on the return value from force_const_mem in the plus_constant function correct?

2015-06-02 Thread Andrew Bennett
Hi,

I am debugging a segmentation fault when compiling some code for MIPS.  The
segmentation fault occurs in the plus_constant function in explow.c when it
tries to update a constant pool value.  The offending code is below:

case MEM:
  /* If this is a reference to the constant pool, try replacing it with
 a reference to a new constant.  If the resulting address isn't
 valid, don't return it because we have no way to validize it.  */
  if (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
  && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)))
{
  tem = plus_constant (mode, get_pool_constant (XEXP (x, 0)), c);
  tem = force_const_mem (GET_MODE (x), tem);
  if (memory_address_p (GET_MODE (tem), XEXP (tem, 0)))
return tem;
}


The code fails for MIPS because its cannot_force_const_mem target function does
not allow constants (so that the move expanders can deal with them later on),
this then causes the force_const_mem function to return a NULL_RTX.

What I can't understand is why the memory_address_p function is used to check
the result from force_const_mem?  From looking at the return values from
force_const_mem the only possibilies are what looks like a valid memory rtx,
or a NULL_RTX.  Also there is no other instances in the code where this happens.
I was therefore wondering is anyone knew why the check is done in this manner?

I think the fix should be to just check that the tem variable is not a NULL_RTX.
I was also wondering if anyone had any issues/objections to checking the result 
in this manner?


Many thanks,


Andrew

Documentation complex logarithm https://gcc.gnu.org/onlinedocs/gfortran/LOG.html

2015-06-02 Thread C . Friedrich

The LOG function returns the principal value of the complex logarithm
whose imaginary part omega must be in the range -pi < omega <= pi.

GFORTRAN does it correctly but the online documentation
https://gcc.gnu.org/onlinedocs/gfortran/LOG.html gives -pi <= omega <=
pi. Perhaps this should be corrected.

Christoph





Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt





Re: g++ regression: template function accessing a temporary through a lambda

2015-06-02 Thread Jonathan Wakely
On 2 June 2015 at 06:28,   wrote:
> Hi folks,
> the following C++ snippet used to compile, but doesn't anymore with current 
> gcc trunk.

Bug reports belong in Bugzilla, not on this mailing list, and this is
already in Bugzilla
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66374)