date:20140724

Re: [PATCH 1/4] Add an abstract incremental hash data type

2014-07-24 Thread Oleg Endo


On 24 Jul 2014, at 03:48, Trevor Saunders  wrote:

> On Thu, Jul 24, 2014 at 02:00:24AM +0200, Oleg Endo wrote:
>> On Wed, 2014-07-23 at 11:37 +0200, Richard Biener wrote:
>>> On Fri, Jul 18, 2014 at 3:08 AM, Trevor Saunders  
>>> wrote:
 On Thu, Jul 17, 2014 at 06:36:31AM +0200, Andi Kleen wrote:
> On Wed, Jul 16, 2014 at 10:40:53PM -0400, Trevor Saunders wrote:
>> 
>>> + public:
>>> +
>>> +  /* Start incremential hashing, optionally with SEED.  */
>>> +  void begin (hashval_t seed = 0)
>>> +  {
>>> +val = seed;
>> 
>> why isn't this the ctor?
> 
> It's standard for hash classes to have explicit begin()/end().
> All the existing ones I've seen work this way.
 
 I only know of one vaguely similar thing
 http://mxr.mozilla.org/mozilla-central/source/mfbt/SHA1.h#37  which
 doesn't do that, and a bunch of people doing something doesn't
 necessarily mean it makes sense.  Now there may be a good reason it
 does make sense, but unless these other people need begin() to be
 fallible I don't see it.
>>> 
>>> I agree with Trevor here.  Please make begin() the constructor.
>>> Btw, what will be the way to plug in an alternative hash function?
>>> That is, there doesn't seem to be a separation of interface
>>> and implementation in your patch (like with a template or a base-class
>>> you inherit from).
>> 
>> Isn't that what boost / std hash is actually doing?  Maybe something
>> like the attached example could be an improvement?
> 
> I don't really see why that makes it any easier to plug in a different
> hash algorithm

typedef incremental_hash my_inc_hash;
 ^^^
  replace with a different class

True, it's not any 'easier' than replacing any other name in the code.
It makes it easier to mix different type dependent hash functions/
algorithms, though.  The way I understood it, Richard asked how to replace
the hash function for the incremental hash class.
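
Here is a minimal sketch of that separation. All names are hypothetical (incremental_hash_t, mix_mult, mix_fnv, a 32-bit hashval_t), and this is not the inchash API from the patch. The mixing step is a template parameter. The seed is taken by the constructor rather than by begin(), as Richard asked. Switching algorithms touches only the typedef:

```cpp
#include <cstdint>

typedef std::uint32_t hashval_t;  // assumption: 32-bit hash values

// Two interchangeable, hypothetical mixing policies.
struct mix_mult {
  static hashval_t mix (hashval_t h, hashval_t v) { return h * 31u + v; }
};
struct mix_fnv {
  static hashval_t mix (hashval_t h, hashval_t v) { return (h ^ v) * 16777619u; }
};

// The incremental interface; the algorithm is supplied by Mix.
template<typename Mix>
class incremental_hash_t
{
public:
  // Seeding happens here instead of in a separate begin () call.
  explicit incremental_hash_t (hashval_t seed = 0) : val (seed) {}

  void add_int (std::uint32_t v) { val = Mix::mix (val, v); }
  hashval_t end () const { return val; }

private:
  hashval_t val;
};

// Plugging in a different algorithm means editing only this line.
typedef incremental_hash_t<mix_mult> my_inc_hash;
```

For example, my_inc_hash h (7); h.add_int (1); h.add_int (2); yields 7*31+1 = 218, then 218*31+2 = 6760.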

> 
>> As with boost / std hash, the hash function is a template functor that
>> can be specialized on various types.  The 'incremental_hash' simply
>> plugs together a hash value combiner and a hash function.  Types that
>> are not implemented by the hash function (in this example) are caught at
>> compile time, since implicit conversions do not take place.  However, it
>> should also be possible to write hash functions with a bunch of operator ()
>> overloads to utilize implicit conversions.
> 
> at that point it seems like it's simpler and equivalent to just overload
> a global function.

AFAIR, there are some cases which are difficult to handle with overloads
(implicit conversions etc).  It could be a freestanding template
function that is specialized for the types in question.
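
Here is a sketch of the freestanding-template variant. The names (hash_value, a 32-bit hashval_t) are assumed, and the mixing functions are arbitrary examples. The primary template is deleted, so hashing a type that has no specialization is rejected at compile time and no implicit conversion is attempted:

```cpp
#include <cstdint>

typedef std::uint32_t hashval_t;  // assumption: 32-bit hash values

// Primary template is deleted: an unsupported type is a compile error.
template<typename T> hashval_t hash_value (const T &) = delete;

// Explicit specializations for the supported types only.
template<> inline hashval_t hash_value<std::uint32_t> (const std::uint32_t &v)
{
  return v * 2654435761u;  // Knuth-style multiplicative hashing
}

template<> inline hashval_t hash_value<const char *> (const char *const &s)
{
  hashval_t h = 5381;
  for (const char *p = s; *p; ++p)
    h = h * 33u + static_cast<unsigned char> (*p);  // djb2-style
  return h;
}
```

A call such as hash_value (5) does not compile, because T deduces to int and only the uint32_t specialization exists. A set of plain overloads would accept the same call through an implicit conversion, which is the trade-off discussed above. A string literal deduces T as an array type, so pass a const char * variable instead.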


> 
>> One advantage of this construct is that new types can be added along
>> with hash function specializations, without the need to touch any of the
>> existing hashing facilities.
>> 
>> I don't know how/whether this would fit with the already existing hash
>> map stuff (hash-table.h, hash-map.h).  It seems those don't support user
>> specified hash functions.
> 
> they do, see e.g. graphite-htab.h

Aha.  I see.


Cheers,
Oleg

Re: Migrating gcc.c-torture

2014-07-24 Thread Jakub Jelinek

On Wed, Jul 23, 2014 at 04:52:23PM -0700, Andrew Pinski wrote:
> > Comments, objections? Ok to apply the preliminary patch?
> 
> Yes, what if you don't move the tests but just change how the .exp to
> use the same infrastructure as gcc.dg/torture instead?

Yeah.  I believe gcc.c-torture/compile/ has been converted already,
so it is just about gcc.c-torture/execute/.  Each of these tests has
its own default idioms, e.g. -w is used by default in gcc.c-torture/.
So, please just tweak execute.exp, so that it does what it did until now
(perhaps with the exception of *.x files support) in dg framework, and
convert *.exp files into dg-* directives in the testcases.

Jakub

Re: gcov name cleanup

2014-07-24 Thread Xinliang David Li

On Wed, Jul 23, 2014 at 11:45 PM, Nathan Sidwell  wrote:
> This second patch cleans up some more global names.  The three functions
> should not be visible to the user.  I'm not sure if they're attempting to
> export an interface to the user but they're (a) undocumented and (b) don't
> follow existing gcov naming.
>
> I'm a little confused as to why the gcov library was split into
> libgcov-driver and libgcov-interface, as each subsection's protected by an
> appropriate L_ #ifdef, this appears to serve no purpose.
>

libgcov-interface.c defines public APIs which users can invoke
programmatically in their programs. One example use case is through
http handlers for server programs.  On the other hand,
libgcov-driver.c defines the internal functions and the actual implementations.

Before the refactoring, libgcov.c was huge and hard to read/maintain.

> I have another cleanup coming, and then I'll consider David's patch -- I can
> see how to do what it wants without adding another interface (hurrah!)

Ok. For cleanup code, Teresa can help with review.

thanks,

David
>
> nathan

Re: Migrating gcc.c-torture

2014-07-24 Thread Andreas Schwab

Thomas Schwinge  writes:

> Hi!
>
> On Thu, 24 Jul 2014 01:47:09 +0200, Bernd Schmidt  
> wrote:
>> (git doesn't seem to produce something 
>> nice for the renames unfortunately)
>
> Are you maybe looking for the -M or -C options to certain commands
> (diff, show, ...)?

Or set diff.renames in your .gitconfig.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] Move Asan instrumentation to sanopt pass

2014-07-24 Thread Jakub Jelinek

On Thu, Jul 24, 2014 at 10:21:14AM +0400, Yury Gribov wrote:
> On 07/24/2014 12:09 AM, Jakub Jelinek wrote:
> >>Ah internal fns.  Those cannot have attributes indeed (technical
> >>limitation).
> >>Martin was working on putting those flags elsewhere (cgraph, though internal
> >>functions don't have cgraph nodes either ...).  Maybe it was a bad idea to use
> >>internal functions for ASAN.
> >
> >For internal-fn, we already support ECF* constants, guess either we could
> >add support for EAF* too, through internal-fn.def,
> 
> Just hack in EAFs or support full-featured declarations in internals?
> The latter looks more appropriate but would increase size of
> internal function calls by one word (namely
> gimple_statement_call::internal_fn).

Internal functions have internal-fn-id instead of a declaration, the whole
point of them is that they don't have function types etc. and it is solely
about the types used for arguments and return value in the IL.
So, either support for just EAF*, or perhaps support for DECL_ATTRIBUTES
for internal-fns, say by having some tree array where you'd store what you
stick into DECL_ATTRIBUTES normally.
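
A rough illustration of the side-table idea follows. Every name in it is hypothetical: these are not GCC's internal-fn.def entries or EAF_* values. The call keeps only its id, and per-argument flags are looked up in a static table indexed by that id, so the call statement does not grow by a word:

```cpp
#include <cstddef>

// Hypothetical internal-function ids; a call statement stores only these.
enum internal_fn_id { IFN_CHECK_LOAD, IFN_CHECK_STORE, IFN_LAST };

// Hypothetical per-argument flags in the spirit of the EAF_* bits.
enum arg_flag : unsigned { ARG_NOCLOBBER = 1u, ARG_NOESCAPE = 2u };

const std::size_t MAX_IFN_ARGS = 4;

// One entry per id: plays the role DECL_ATTRIBUTES plays for a decl.
struct ifn_attrs { unsigned arg_flags[MAX_IFN_ARGS]; };

static const ifn_attrs ifn_attr_table[IFN_LAST] = {
  /* IFN_CHECK_LOAD  */ { { 0u, ARG_NOCLOBBER | ARG_NOESCAPE, 0u, 0u } },
  /* IFN_CHECK_STORE */ { { 0u, ARG_NOESCAPE, 0u, 0u } },
};

// Flags for argument ARGNO of internal function FN (0 if out of range).
inline unsigned ifn_arg_flags (internal_fn_id fn, std::size_t argno)
{
  return argno < MAX_IFN_ARGS ? ifn_attr_table[fn].arg_flags[argno] : 0u;
}
```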

Jakub

Re: Strengthen assumption about dynamic type changes (placement new)

2014-07-24 Thread Richard Biener

On Wed, Jul 23, 2014 at 6:29 PM, Jan Hubicka  wrote:
>> On July 23, 2014 4:42:22 PM CEST, Jan Hubicka  wrote:
>> >> On Tue, Jul 22, 2014 at 5:17 PM, Jan Hubicka  wrote:
>> >> >> I don't see why
>> >> >>
>> >> >> long x[1024];
>> >> >>
>> >> >> Q *q = new (x) Q;
>> >> >> q->~Q ();
>> >> >> new (x) T;
>> >> >>
>> >> >> would be invalid.  I also don't see why
>> >> >>
>> >> >> Q q;
>> >> >> q.~Q ();
>> >> >> new (&q) T;
>> >> >>
>> >> >> would be.  Object lifetime is precisely specified and I don't see where it is
>> >> >> tied to (static) storage lifetime.
>> >> >
>> >> > This is precisely the testcase I posted at the beginning of this thread.
>> >> >
>> >> > I do not see how the testcases can work with aliasing rules in the case Q's and T's
>> >> > memory is known to not alias.
>> >>
>> >> It works because of the well-defined memory model (with regard to
>> >> TBAA) in the middle-end.  Every store changes the dynamic type of
>> >> a memory location which means that you can only use TBAA for
>> >> true-dependence checks (not anti-dependence or write-dependence
>> >> checks).
>> >
>> >I see, I did not notice this change - it seems like quite a big hammer though,
>> >limiting scheduling (and loop opts) quite noticeably for all languages.  Are
>> >there any other motivations for this besides placement new?
>>
>> Aggregate copies and memcpy transferring the dynamic type for example.  
>> Being able to tbaa union accesses for another.  And yes, placement new.
>>
>> It's not so much an optimization preventing thing as you still can move 
>> loads up and stores down with the help of tbaa.
>
> well, but you lose extra parallelism like
>
>  *shortptr = exp
>  
>  var = *shortptr
>  *intptr = exp
>  
>  var = *intptr

Yes (that is, you can't hoist the *intptr = exp store above the var = *shortptr
load with TBAA only).  You can probably still hoist the , it's not clear from your example.

That said, being able to optimize union accesses with TBAA at all
is still nice (esp. for GCC).  Now, the C frontend still forces alias-set zero
for this case because of the RTL alias oracle dysfunctionality which doesn't
treat a must-alias as an alias if it can TBAA disambiguate.
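
Below is a self-contained version of the storage-reuse sequence quoted at the top of this message: end one object's lifetime, then construct an unrelated type in the same bytes. The struct members and the unsigned char buffer are assumptions added so that it compiles. Under the model Richard describes, the store done by T's constructor may not be hoisted above the load of q->q on TBAA grounds alone:

```cpp
#include <new>

struct Q { int q; Q () : q (1) {} ~Q () {} };
struct T { short t; T () : t (2) {} };

// Reuse one buffer for two unrelated types in sequence.
inline int reuse_storage ()
{
  alignas (Q) alignas (T) unsigned char
    buf[sizeof (Q) > sizeof (T) ? sizeof (Q) : sizeof (T)];

  Q *q = new (buf) Q;   // first object lives in buf
  int first = q->q;     // load through a Q lvalue
  q->~Q ();             // its lifetime ends; the storage is reused

  T *t = new (buf) T;   // store through a T lvalue into the same bytes
  int second = t->t;
  t->~T ();
  return first * 10 + second;
}
```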

Richard.

> Honza

[PATCH revert]Revert r212892 and r212893

2014-07-24 Thread Bin Cheng

Hi,
I reverted my patches for inlining memset on ARM, since glibc can't be built
properly with them.  I will try to resolve the issue.
Sorry for the inconvenience.

Revert r212893:
PR target/55701
* config/arm/arm.md (setmem): New pattern.
* config/arm/arm-protos.h (struct tune_params): New fields.
(arm_gen_setmem): New prototype.
* config/arm/arm.c (arm_slowmul_tune): Initialize new fields.
(arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
(arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
(arm_cortex_a8_tune, arm_cortex_a7_tune): Ditto.
(arm_cortex_a15_tune, arm_cortex_a53_tune): Ditto.
(arm_cortex_a57_tune, arm_cortex_a5_tune): Ditto.
(arm_cortex_a9_tune, arm_cortex_a12_tune): Ditto.
(arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune): Ditto.
(arm_const_inline_cost): New function.
(arm_block_set_max_insns): New function.
(arm_block_set_non_vect_profit_p): New function.
(arm_block_set_vect_profit_p): New function.
(arm_block_set_unaligned_vect): New function.
(arm_block_set_aligned_vect): New function.
(arm_block_set_unaligned_non_vect): New function.
(arm_block_set_aligned_non_vect): New function.
(arm_block_set_vect, arm_gen_setmem): New functions.

PR target/55701
* gcc.target/arm/memset-inline-1.c: New test.
* gcc.target/arm/memset-inline-2.c: New test.
* gcc.target/arm/memset-inline-3.c: New test.
* gcc.target/arm/memset-inline-4.c: New test.
* gcc.target/arm/memset-inline-5.c: New test.
* gcc.target/arm/memset-inline-6.c: New test.
* gcc.target/arm/memset-inline-7.c: New test.
* gcc.target/arm/memset-inline-8.c: New test.
* gcc.target/arm/memset-inline-9.c: New test.

Revert r212892:
* config/arm/arm.c (output_move_neon): Handle REG explicitly.

Thanks,
bin

Re: [3/5] gomp-3_0-branch merge to trunk - OpenMP specific compiler changes

2014-07-24 Thread Thomas Schwinge

Hi!

On Thu, 5 Jun 2008 12:04:30 -0400, Jakub Jelinek  wrote:
> --- gcc/omp-low.c (.../trunk) (revision 136314)
> +++ gcc/omp-low.c (.../branches/gomp-3_0-branch)  (revision 136352)
> @@ -223,20 +222,223 @@ extract_omp_for_data (tree for_stmt, str
>   fd->sched_kind = OMP_CLAUSE_SCHEDULE_KIND (t);
>   fd->chunk_size = OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t);
>   break;
> +  case OMP_CLAUSE_COLLAPSE:
> + if (fd->collapse > 1)
> +   {
> + collapse_iter = &OMP_CLAUSE_COLLAPSE_ITERVAR (t);
> + collapse_count = &OMP_CLAUSE_COLLAPSE_COUNT (t);
> +   }
>default:
>   break;
>}

Not wrong per se, but it becomes a bug once you add additional cases.  Committed in
r212971:

commit 757abb462ac4cb6542b3172fc8285173bfe5b3f7
Author: tschwinge 
Date:   Thu Jul 24 08:27:34 2014 +

Add missing break statement.

gcc/
* omp-low.c (extract_omp_for_data): Add missing break statement.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@212971 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog | 4 
 gcc/omp-low.c | 1 +
 2 files changed, 5 insertions(+)

diff --git gcc/ChangeLog gcc/ChangeLog
index b954245..b436332 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,7 @@
+2014-07-24  Thomas Schwinge  
+
+   * omp-low.c (extract_omp_for_data): Add missing break statement.
+
 2014-07-24  Richard Biener  
 
* tree-inline.h (estimate_move_cost): Add speed_p parameter.
diff --git gcc/omp-low.c gcc/omp-low.c
index 1b1cf2c..1ee1c3a 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -341,6 +341,7 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data 
*fd,
collapse_iter = &OMP_CLAUSE_COLLAPSE_ITERVAR (t);
collapse_count = &OMP_CLAUSE_COLLAPSE_COUNT (t);
  }
+   break;
   default:
break;
   }
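
A toy switch shows why the fall-through was harmless until another case is added. The clause kinds are hypothetical stand-ins for the OMP_CLAUSE_* codes:

```cpp
// Hypothetical stand-ins for OMP_CLAUSE_* codes.
enum clause_kind { CLAUSE_COLLAPSE, CLAUSE_NEW, CLAUSE_OTHER };

// Original shape: COLLAPSE falls into "default: break;", which does
// nothing, so the missing break has no visible effect yet.
inline int original_shape (clause_kind k)
{
  int actions = 0;
  switch (k)
    {
    case CLAUSE_COLLAPSE:
      actions += 1;
      // no break: falls into default
    default:
      break;
    }
  return actions;
}

// A case added later after COLLAPSE now also runs for COLLAPSE.
inline int without_break (clause_kind k)
{
  int actions = 0;
  switch (k)
    {
    case CLAUSE_COLLAPSE:
      actions += 1;
      // no break: falls into CLAUSE_NEW
    case CLAUSE_NEW:
      actions += 10;
      break;
    default:
      break;
    }
  return actions;
}

// With the break in place, each case does only its own work.
inline int with_break (clause_kind k)
{
  int actions = 0;
  switch (k)
    {
    case CLAUSE_COLLAPSE:
      actions += 1;
      break;
    case CLAUSE_NEW:
      actions += 10;
      break;
    default:
      break;
    }
  return actions;
}
```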


Grüße,
 Thomas



Re: [PATCH, 4.9/4.10] Profile based option tuning

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 3:52 AM, Pengfei Yuan <0xcool...@gmail.com> wrote:
> There are more.
>
> In toplev.c:
>   /* One region RA really helps to decrease the code size.  */
>   if (flag_ira_region == IRA_REGION_AUTODETECT)
> flag_ira_region
>   = optimize_size || !optimize ? IRA_REGION_ONE : IRA_REGION_MIXED;

This could be fixed by moving this to ira.c.

> In config/i386/i386.c:
>   * Assignment of ix86_cost
>   * Decision of alignment

True, I didn't grep backends.

Did you investigate where the savings come from?  I meanwhile fixed
the estimate_move_cost bit.

Thanks,
Richard.

> 2014-07-23 19:32 GMT+08:00 Richard Biener :
>> On Wed, Jul 23, 2014 at 1:04 PM, Pengfei Yuan <0xcool...@gmail.com> wrote:
>>> I guess some optimizations are controlled only by "optimize_size", not
>>> by the profile.
>>
>> I only see tree-inline.c:estimate_move_cost which we should indeed fix,
>> it could make a significant difference.
>>
>> There is one other use in tree-ssa-phiopt.c, but it probably doesn't really matter.
>>
>> Richard.

Re: Ping: PR61629 (was Re: Delay RTL initialization until it is really needed)

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 8:57 AM, Richard Sandiford
 wrote:
> Ping.  Original message was here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01113.html

Ok.

Thanks,
Richard.

> Richard Sandiford  writes:
>> Richard Sandiford  writes:
>>> Jan Hubicka  writes:
 Hi,
>>>
 IRA initialization shows high in profiles even when building lto
 objects.  This patch simply delays RTL backend initialization until we
 really decide to output a function.  In some cases this avoids the
 initialization completely (like in the case of LTO but also user
 target attributes) and there is some hope for better cache locality.

 Basic idea is to have two flags saying whether lang and target
 dependent bits need initialization and check them when starting
 function codegen.
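
The two-flag scheme can be sketched as follows. The names rtl_state, initialize_rtl_sketch and target_reinit_sketch are hypothetical, and the counters stand in for the real initialization routines. Following the ChangeLog, the reinit hook clears only the lang-dependent flag:

```cpp
// Hypothetical stand-in for the per-target RTL state the patch extends.
struct rtl_state
{
  bool target_specific_initialized = false;
  bool lang_dependent_initialized = false;
  int target_inits = 0;   // counts runs of the target-specific init
  int lang_inits = 0;     // counts runs of the lang-dependent init
};

// Called when codegen of a function starts: expensive work happens only
// once, and never if no function is ever output (e.g. an LTO compile).
inline void initialize_rtl_sketch (rtl_state &s)
{
  if (!s.target_specific_initialized)
    {
      ++s.target_inits;   // stands in for backend_init_target ()
      s.target_specific_initialized = true;
    }
  if (!s.lang_dependent_initialized)
    {
      ++s.lang_inits;     // stands in for the rtl-generating init code
      s.lang_dependent_initialized = true;
    }
}

// Reinitialization invalidates the lang-dependent part only.
inline void target_reinit_sketch (rtl_state &s)
{
  s.lang_dependent_initialized = false;
}
```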

 Bootstrapped/regtested x86_64-linux, testing also at AIX. Ok if it passes?

 Honza

 * toplev.c (backend_init_target): Move init_emit_regs and init_regs 
 to...
 (backend_init) ... here; skip ira_init_once and backend_init_target.
 (target_reinit) ... and here; clear 
 this_target_rtl->lang_dependent_initialized.
 (lang_dependent_init_target): Clear 
 this_target_rtl->lang_dependent_initialized;
 break out rtl initialization to ...
 (initialize_rtl): ... here; call also backend_init_target and 
 ira_init_once.
 * toplev.h (initialize_rtl): New function.
 * function.c: Include toplev.h
 (init_function_start): Call initialize_rtl.
 * rtl.h (target_rtl): Add target_specific_initialized,
 lang_dependent_initialized.
 Index: toplev.c
 ===
 --- toplev.c(revision 211837)
 +++ toplev.c(working copy)
 @@ -1583,14 +1583,6 @@ backend_init_target (void)
/* Initialize alignment variables.  */
init_alignments ();

 -  /* This reinitializes hard_frame_pointer, and calls 
 init_reg_modes_target()
 - to initialize reg_raw_mode[].  */
 -  init_emit_regs ();
 -
 -  /* This invokes target hooks to set fixed_reg[] etc, which is
 - mode-dependent.  */
 -  init_regs ();
 -
/* This depends on stack_pointer_rtx.  */
init_fake_stack_mems ();

 @@ -1632,9 +1624,13 @@ backend_init (void)
init_varasm_once ();
save_register_info ();

 -  /* Initialize the target-specific back end pieces.  */
 -  ira_init_once ();
 -  backend_init_target ();
 +  /* Middle end needs this initialization for default mem attributes
 + used by early calls to make_decl_rtl.  */
 +  init_emit_regs ();
 +
 +  /* Middle end needs this initialization for mode tables used to assign
 + modes to vector variables.  */
 +  init_regs ();
>>>
>>> This causes a segfault on gcc.target/mips/umips-store16-1.c.  The register
>>> asm:
>>>
>>> register unsigned int global asm ("$16");
>>>
>>> causes us to globalise $16 and call reinit_regs.  reinit_regs in turn
>>> calls ira_init, but IRA hasn't been initialised at this point and
>>> prerequisites like init_fake_stack_mems haven't yet been called.
>>>
>>> Does the patch below look OK?
>>>
 @@ -1686,6 +1682,31 @@ lang_dependent_init_target (void)
   front end is initialized.  It also depends on the HAVE_xxx macros
   generated from the target machine description.  */
init_optabs ();
 +  this_target_rtl->lang_dependent_initialized = false;
 +}
 +
 +/* Perform initializations that are lang-dependent or target-dependent.
 +   but matters only for late optimizations and RTL generation.  */
 +
 +void
 +initialize_rtl (void)
 +{
 +  static int initialized_once;
 +
 +  /* Initialization done just once per compilation, but delayed
 + till code generation.  */
 +  if (!initialized_once)
 +ira_init_once ();
 +  initialized_once = true;
 +
 +  /* Target specific RTL backend initialization.  */
 +  if (!this_target_rtl->target_specific_initialized)
 +backend_init_target ();
 +  this_target_rtl->target_specific_initialized = true;
 +
 +  if (this_target_rtl->lang_dependent_initialized)
 +return;
 +  this_target_rtl->lang_dependent_initialized = true;

/* The following initialization functions need to generate rtl, so
   provide a dummy function context for them.  */
>>>
>>> Why do you need both these flags?  We only call this function once
>>> the language has been initialised, so we should always be initialising
>>> both sets of information (backend_init_target and the stuff after
>>> the comment above, from the old lang_dependent_init_target).
>>>
>>> How about the second patch below, still under testing?  The new assert
>>> is OK for target_reinit because it has:
>>>
>>>   this_target_rtl->target_specific_initia

Re: PR 61628: Invalid sharing of DECL_INCOMING_RTL

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 8:58 AM, Richard Sandiford
 wrote:
> Ping.  Original message was here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01267.html

Ok.

Thanks,
Richard.

> Richard Sandiford  writes:
>> My patch to reduce the amount of rtx garbage created:
>>
>> 2014-05-17  Richard Sandiford  
>>
>>   * emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an
>>   inplace argument.  Store the new address in the original MEM when true.
>>   * emit-rtl.c (change_address_1): Likewise.
>>   (adjust_address_1, adjust_automodify_address_1, offset_address):
>>   Update accordingly.
>>   * rtl.h (plus_constant): Add an inplace argument.
>>   * explow.c (plus_constant): Likewise.  Try to reuse the original PLUS
>>   when true.  Avoid generating (plus X (const_int 0)).
>>   * function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
>>   in-place.  Pass true to plus_constant.
>>   (instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address.
>>
>> exposed a case where a DECL_INCOMING_RTL MEM rtx was being reused in insn
>> patterns without being copied.  This meant that instantiating virtual
>> registers changed the DECL_INCOMING_RTL too.
>>
>> The patch fixes this by adding the missing copy_rtxes.  However,
>> validize_mem has a bit of an awkward interface as far as sharing goes.
>> If the MEM is already valid, validize_mem returns the original rtx,
>> but if the MEM is not valid it creates a new one.  This means that if
>> you copy first you create garbage rtl if the MEM was invalid, whereas if
>> you don't copy first you get invalid sharing if the MEM was valid.
>>
>> Obviously we need to err on the side of copying first, to avoid the
>> invalid sharing.  The patch therefore changes the interface so that
>> validize_mem can modify the MEM in-place.
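
A toy model contrasts the two contracts. The names mem_rtx, validize_returning and validize_in_place are hypothetical, and the model ignores everything about real rtxes except ownership and sharing:

```cpp
#include <memory>
#include <vector>

// Toy MEM: an address plus whether the target accepts it as-is.
struct mem_rtx { int addr; bool valid; };

typedef std::vector<std::unique_ptr<mem_rtx> > mem_pool;

// Old-style contract: return the argument itself if it is valid,
// otherwise allocate a fresh, valid replacement.
inline mem_rtx *validize_returning (mem_rtx *m, mem_pool &pool)
{
  if (m->valid)
    return m;   // the caller now shares M with every other holder
  pool.push_back (std::unique_ptr<mem_rtx> (new mem_rtx{m->addr, true}));
  return pool.back ().get ();
}

// In-place contract: fix up M itself.  Callers pass a copy, so the
// original (e.g. DECL_INCOMING_RTL) is neither shared nor modified.
inline mem_rtx *validize_in_place (mem_rtx *m)
{
  m->valid = true;   // stands in for legitimizing the address
  return m;
}
```

With the old contract, copying first wastes the copy whenever the MEM was invalid, and not copying shares a valid MEM. With the in-place contract, the copy is the only object created.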
>>
>> I went through all calls to validize_mem to try to find cases where
>> this might cause a problem.  The patch fixes up the ones I could see.
>> Most callers already copy first, so as well as fixing the bug, this seems
>> to reduce the amount of garbage created.
>>
>> Tested on x86_64-linux-gnu, sparc-sun-solaris2.1? and
>> powerpc64-unknown-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>>   PR middle-end/61268
>>   * function.c (assign_parm_setup_reg): Prevent invalid sharing of
>>   DECL_INCOMING_RTL and entry_parm.
>>   (get_arg_pointer_save_area): Likewise arg_pointer_save_area.
>>   * calls.c (load_register_parameters): Likewise argument values.
>>   (emit_library_call_value_1, store_one_arg): Likewise argument
>>   save areas.
>>   * config/i386/i386.c (assign_386_stack_local): Likewise the local
>>   stack slot.
>>   * explow.c (validize_mem): Modify the argument in-place.
>>
>> Index: gcc/function.c
>> ===
>> --- gcc/function.c2014-07-11 11:55:10.495121493 +0100
>> +++ gcc/function.c2014-07-18 08:57:07.047215306 +0100
>> @@ -2662,13 +2662,14 @@ assign_parm_adjust_entry_rtl (struct ass
>>/* Handle calls that pass values in multiple non-contiguous
>>locations.  The Irix 6 ABI has examples of this.  */
>>if (GET_CODE (entry_parm) == PARALLEL)
>> - emit_group_store (validize_mem (stack_parm), entry_parm,
>> + emit_group_store (validize_mem (copy_rtx (stack_parm)), entry_parm,
>> data->passed_type,
>> int_size_in_bytes (data->passed_type));
>>else
>>   {
>> gcc_assert (data->partial % UNITS_PER_WORD == 0);
>> -   move_block_from_reg (REGNO (entry_parm), validize_mem (stack_parm),
>> +   move_block_from_reg (REGNO (entry_parm),
>> +validize_mem (copy_rtx (stack_parm)),
>>  data->partial / UNITS_PER_WORD);
>>   }
>>
>> @@ -2837,7 +2838,7 @@ assign_parm_setup_block (struct assign_p
>>else
>>   gcc_assert (!size || !(PARM_BOUNDARY % BITS_PER_WORD));
>>
>> -  mem = validize_mem (stack_parm);
>> +  mem = validize_mem (copy_rtx (stack_parm));
>>
>>/* Handle values in multiple non-contiguous locations.  */
>>if (GET_CODE (entry_parm) == PARALLEL)
>> @@ -2972,7 +2973,7 @@ assign_parm_setup_reg (struct assign_par
>>   assign_parm_find_data_types and expand_expr_real_1.  */
>>
>>equiv_stack_parm = data->stack_parm;
>> -  validated_mem = validize_mem (data->entry_parm);
>> +  validated_mem = validize_mem (copy_rtx (data->entry_parm));
>>
>>need_conversion = (data->nominal_mode != data->passed_mode
>>|| promoted_nominal_mode != data->promoted_mode);
>> @@ -3228,7 +3229,7 @@ assign_parm_setup_stack (struct assign_p
>>/* Conversion is required.  */
>>rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>
>> -  emit_move_insn (tempreg, validize_mem (data->entry_parm));
>> +  emit_move_insn (tempreg, validize_me

Re: [PATCH 1/4] Add an abstract incremental hash data type

2014-07-24 Thread Richard Biener

On Wed, Jul 23, 2014 at 9:56 PM, Andi Kleen  wrote:
>> > Why didn't you replace the tree.c uses BTW?
>>
>> Patches were already quite big, but I'll add it.
>
> Actually I handled them all in tree.c. Did you
> mean something else?

I meant transform the iterative_hash_* _implementations_ to
use the interface.  But maybe I missed sth in the patch(es).

> I didn't convert all of tree-ssa-* and dwarf* so far,
> and a few other places.  This can be done step by step.

Sure.

Can you re-post the adjusted patch introducing inchash.[ch]?

Thanks,
Richard.

> -Andi

Re: testsuite allocators patch

2014-07-24 Thread Jonathan Wakely


On 23/07/14 22:33 +0200, François Dumont wrote:
   I have a small question regarding some code next to the one I am 
modifying in this patch. I can see lines like:


 propagating_allocator() noexcept = default;

   When using a default implementation shouldn't we let the compiler 
decide if it should be noexcept or not depending on the member fields 
or base class default constructors ?


Stating it explicitly means you get an error if the default
implementation is not noexcept. That can be useful, to ensure you
don't silently start getting a throwing constructor by mistake because
of a change to a base class.
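
Later standards relaxed the C++11 rule for a mismatched noexcept on a defaulted constructor, so newer compilers may accept the mismatch without a diagnostic. To keep the guarantee regardless of language version, the property can also be pinned with a static_assert. The types below are hypothetical and are not the testsuite's propagating_allocator:

```cpp
#include <type_traits>

struct nothrow_base { nothrow_base () noexcept {} };
struct throwing_base { throwing_base () {} };  // not declared noexcept

// Defaulted with no exception specification: noexcept-ness is deduced
// from bases and members, so it silently follows Base.
template<typename Base>
struct deduced_alloc : Base
{
  deduced_alloc () = default;
};

// Defaulted with an explicit noexcept, and pinned with a static_assert
// so a change in the base shows up as a compile-time error.
struct pinned_alloc : nothrow_base
{
  pinned_alloc () noexcept = default;
};
static_assert (std::is_nothrow_default_constructible<pinned_alloc>::value,
               "pinned_alloc must stay nothrow default constructible");
```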

I'm not sure if I added the noexcept above, but if I did that might
have been what I was intending it to do. I don't remember.

I'll review the rest of the patch ASAP. Did you test it with no other
changes in your tree, and run the entire testsuite?

Re: [PATCH][gcc-4.9.0] gcc/Makefile.in: fix parallel building failure

2014-07-24 Thread Hongxu Jia


On 07/24/2014 12:17 PM, Andrew Pinski wrote:

On Wed, Jul 23, 2014 at 9:10 PM, Hongxu Jia  wrote:

1. How to reproduce the issue:

1) manually modify gcc/Makefile.in to delay the generation of config.h:
...
diff --git gcc-4.9.0/gcc/Makefile.in gcc-4.9.0/gcc/Makefile.in
--- gcc-4.9.0/gcc/Makefile.in
+++ gcc-4.9.0/gcc/Makefile.in
@@ -1622,9 +1622,12 @@ tm.h: cs-tm.h ; @true
  tm_p.h: cs-tm_p.h ; @true

  cs-config.h: Makefile
+   @echo "start to generate config.h `date`"
+   sleep 10
 TARGET_CPU_DEFAULT="" \
 HEADERS="$(host_xm_include_list)" DEFINES="$(host_xm_defines)" \
 $(SHELL) $(srcdir)/mkconfig.sh config.h
+   @echo "config.h generated `date`"

  cs-bconfig.h: Makefile
 TARGET_CPU_DEFAULT="" \
...

2) compiling gcc

2. Analysis

Most C source files included config.h which was generated by a rule.
But no related prerequisite was added to the C source compiling rule.
This could cause build failures when make runs jobs in parallel.


Why not update the dependencies instead of this hack?
For 4.10/5.0 this is already fixed by changing how dependencies work.


Hi Andrew,

I tested on 4.10 (gcc-4.10-20140720), and this issue still exists.

After a grep in the gcc subdir, there are roughly 703 C source files that
include config.h.


How do I add the dependencies? I am not familiar with this.

Thanks,
Hongxu



Thanks,
Andrew


The C source compiling rule used suffix rule '.c.o', but the suffix
rule doesn't support prerequisites.
https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html

We use the pattern rule '%.o : %.c' instead, and add config.h
as its prerequisite.

We also moved the '%.o : %.c' rule below the 'build/%.o :' rule, so
that the '%.o : %.c' rule doesn't override 'build/%.o :'.

For more detail:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=6568

Signed-off-by: Hongxu Jia 
---
  gcc/Makefile.in | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git gcc-4.9.0/gcc/Makefile.in gcc-4.9.0/gcc/Makefile.in
index 6475cba..04889fe 100644
--- gcc-4.9.0/gcc/Makefile.in
+++ gcc-4.9.0/gcc/Makefile.in
@@ -1054,10 +1054,6 @@ COMPILE = source='$<' object='$@' libtool=no \
  POSTCOMPILE =
  endif

-.cc.o .c.o:
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
  #
  # Support for additional languages (other than C).
  # C can be supported this way too (leave for later).
@@ -2342,6 +2338,14 @@ build/%.o :  # dependencies provided by explicit rule 
later
 $(COMPILER_FOR_BUILD) -c $(BUILD_COMPILERFLAGS) $(BUILD_CPPFLAGS) \
 -o $@ $<

+%.o: %.c $(CONFIG_H)
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
+%.o: %.cc $(CONFIG_H)
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
  ## build/version.o is compiled by the $(COMPILER_FOR_BUILD) but needs
  ## several C macro definitions, just like version.o
  build/version.o:  version.c version.h \
--
1.8.1.2

Commit: RX:

2014-07-24 Thread Nick Clifton

Hi Guys,

  I am checking in the patch below to fix a small DWARF generation
  problem with the RX backend.  The stack_push pattern contains two
  separate operations that act in parallel, but they were written as if
  they happened in sequence, which meant that the DWARF generated to
  show where a register was pushed onto the stack was off by 4 bytes.
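
Here is a toy model of the difference. It assumes, as the corrected pattern implies, that push.l decrements SP by 4 and stores the register at the new SP. In a PARALLEL every operand is computed from the state before the insn, so the old pattern described a store to the old SP:

```cpp
#include <cstdint>

struct push_effect
{
  std::uint32_t new_sp;      // value written to SP
  std::uint32_t store_addr;  // address the register is stored to
};

// Both sets read SP as it was before the insn; neither sees the other.

// Old RTL: (parallel [(set sp (minus sp 4))
//                     (set (mem sp) reg)])
inline push_effect old_stack_push (std::uint32_t sp_before)
{
  return { sp_before - 4u, sp_before };       // store at the OLD sp
}

// New RTL: (parallel [(set sp (minus sp 4))
//                     (set (mem (minus sp 4)) reg)])
inline push_effect new_stack_push (std::uint32_t sp_before)
{
  return { sp_before - 4u, sp_before - 4u };  // store at the new top of stack
}
```

Both versions agree on the new SP, but their store addresses differ by exactly 4 bytes, which matches the DWARF offset error.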

Cheers
  Nick
  
gcc/ChangeLog
2014-07-24  Nick Clifton  

* config/rx/rx.md (stack_push): Adjust RTL to account for the fact
that operations are taking place in parallel.
* config/rx/rx.h (FRAME_POINTER_CFA_OFFSET): Delete.

Index: gcc/config/rx/rx.h
===
--- gcc/config/rx/rx.h  (revision 212971)
+++ gcc/config/rx/rx.h  (working copy)
@@ -645,7 +645,6 @@
 
 #define INCOMING_FRAME_SP_OFFSET   4
 #define ARG_POINTER_CFA_OFFSET(FNDECL) 4
-#define FRAME_POINTER_CFA_OFFSET(FNDECL)   4
 
 #define TARGET_USE_FPU (! TARGET_NO_USE_FPU)
 
Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md (revision 212971)
+++ gcc/config/rx/rx.md (working copy)
@@ -617,7 +617,7 @@
   [(set (reg:SI SP_REG)
(minus:SI (reg:SI SP_REG)
  (const_int 4)))
-   (set (mem:SI (reg:SI SP_REG))
+   (set (mem:SI (minus:SI (reg:SI SP_REG) (const_int 4)))
(match_operand:SI 0 "register_operand" "r"))]
   ""
   "push.l\t%0"

Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns

2014-07-24 Thread Kyrill Tkachov


Ping.
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00958.html

Kyrill

On 14/07/14 11:01, Kyrill Tkachov wrote:

On 11/07/14 14:20, Alexander Monakov wrote:

On Fri, 11 Jul 2014, Kyrill Tkachov wrote:

On 10/07/14 22:53, Maxim Kuvyrkov wrote:

The patch looks good to me, but some cleanup suggestions below.

Thanks, here's an updated patch.
How's this?

You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the
first if branch, keeping only one SCHED_GROUP_P assignment at the end of the
function.

Alexander

Thanks for the pointer, I had hurried a bit.
Here is the updated patch.

Kyrill

2014-07-14  Ramana Radhakrishnan 
  Kyrylo Tkachov  

  * sched-deps.c (try_group_insn): Generalise macro fusion hook usage
  to any two insns.  Update comment.  Rename to sched_macro_fuse_insns.
  (sched_analyze_insn): Update use of try_group_insn to
  sched_macro_fuse_insns.
  * config/i386/i386.c (ix86_macro_fusion_pair_p): Reject 2nd arguments
  that are not conditional jumps.

Re: Strengthen assumption about dynamic type changes (placement new)

2014-07-24 Thread Jan Hubicka

> >  *shortptr = exp
> >  
> >  var = *shortptr
> >  *intptr = exp
> >  
> >  var = *intptr
> 
> Yes (that is, you can't hoist the *intptr = exp store above the var = 
> *shortptr
> load with TBAA only).  You can probably still hoist the  chain with intptr>, it's not clear from your example.

Well, this is the placement new version of it, where of course the movement is
not desirable.
Obviously the chains cannot overlap:
#include <new>
#include <stdio.h>
struct A {short a; short b; A(){a=1;}};
struct B {int a; B(){a=2;}};

struct A a;
struct A *pa = &a;
struct B *pb = reinterpret_cast<struct B *>(&a);
int
main()
{
  int sum;
  struct A *ppa = pa;
  struct B *ppb = pb;
  if (!pa || !pb)
return 1;
  ppa->~A();
  new (ppa) A();
  asm ("#asm1":"=m"(ppa->a):"m"(ppa->a));
  sum = ppa->a*11;
  new (ppb) B();
  
  asm ("#asm2":"=m"(ppb->a):"m"(ppb->a));
  sum += ppb->a*11;
  printf ("%i\n",sum);
  return 0;
}

Of course, it also makes us, e.g. in

void
t(short *a, short *b, int *c)
{
  int i;
  for (i=0;i<100;i++)
    c[i]=a[i]+b[i];
}

generate the fallback path for the case where c overlaps a or b when
vectorizing, while clang doesn't.
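
Conceptually, that fallback comes from loop versioning. TBAA alone cannot prove that int *c does not overlap short *a or *b, so the vectorized loop is guarded by a runtime overlap test, and the scalar loop is kept for the overlapping case. The hand-written sketch below only illustrates that shape. It is not GCC's generated code, and the four-element blocks merely stand in for vector instructions:

```cpp
#include <cstddef>
#include <cstdint>

// True if byte ranges [p, p + pbytes) and [q, q + qbytes) overlap.
// Addresses are compared as integers, as a runtime alias check would.
inline bool ranges_overlap (const void *p, std::size_t pbytes,
                            const void *q, std::size_t qbytes)
{
  std::uintptr_t x = reinterpret_cast<std::uintptr_t> (p);
  std::uintptr_t y = reinterpret_cast<std::uintptr_t> (q);
  return x < y + qbytes && y < x + pbytes;
}

// Versioned analogue of t (): returns true if the blocked path ran.
inline bool t_versioned (const short *a, const short *b, int *c, std::size_t n)
{
  const std::size_t cbytes = n * sizeof (int), sbytes = n * sizeof (short);
  const bool disjoint = !ranges_overlap (c, cbytes, a, sbytes)
                        && !ranges_overlap (c, cbytes, b, sbytes);
  std::size_t i = 0;
  if (disjoint)
    for (; i + 4 <= n; i += 4)
      {
        int lanes[4];                        // load all lanes first...
        for (std::size_t l = 0; l < 4; ++l)
          lanes[l] = a[i + l] + b[i + l];
        for (std::size_t l = 0; l < 4; ++l)  // ...then store them
          c[i + l] = lanes[l];
      }
  for (; i < n; ++i)                         // epilogue, or the whole loop
    c[i] = a[i] + b[i];                      // when the ranges overlap
  return disjoint;
}
```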

Honza
> 
> That said, being able to optimize union accesses with TBAA at all
> is still nice (esp. for GCC).  Now, the C frontend still forces alias-set zero
> for this case because of the RTL alias oracle dysfunctionality which doesn't
> treat a must-alias as an alias if it can TBAA disambiguate.
> 
> Richard.
> 
> > Honza

[PATCH][RFC] Fix part of -ftrapv (PR52478)

2014-07-24 Thread Richard Biener


The following fixes one of the most annoying parts of non-working -ftrapv,
namely that we only support >= word_mode trappings (quite annoying on
64bit archs where 'int' is not handled).  At least on x86_64 libgcc
has all the libfuncs available for SImode so the following patch
arranges for them to be used.  RFC because I don't know whether
they are there by accident... (and thus the patch adds a requirement
that is not met by other targets - but a link error is better than
-ftrapv failure?)
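For readers new to -ftrapv: a trapping signed add must detect overflow and trap instead of wrapping around. The sketch below is a minimal, portable C model of what an int-sized (SImode) routine such as libgcc's __addvsi3 has to compute; the function names are hypothetical and this is not libgcc's source. The overflow test is done before the addition, so the model itself never performs an overflowing signed add.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* True if A + B would overflow int.  The check is done before adding,
   so this function never performs an overflowing signed add.  */
static bool
int_add_overflows (int a, int b)
{
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Hypothetical model of a trapping int (SImode) add: abort on
   overflow, otherwise return the exact sum.  */
static int
model_addv_int (int a, int b)
{
  if (int_add_overflows (a, b))
    abort ();
  return a + b;
}
```

The patch's testcase adds __INT_MAX__ and 1, which is a case int_add_overflows reports as overflowing, and expects the child process to terminate abnormally rather than exit normally.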

The testcase relies on fork() to be able to capture both inline
and out-of-line trapv sequences.  dg-require-fork is unused but
present, so I use it.  I suppose we can restrict the testcase
to a few targets manually as well - not sure, any preferences?

At least the obvious testcase from PR52478 now works (until
you hit constant folding ... see PR61893 I just opened).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok
for trunk?

Thanks,
Richard.


2014-07-24  Richard Biener  

PR middle-end/52478
* optabs.c (gen_int_libfunc): For -ftrapv libfuncs make
sure to register SImode ones, not only >= word_mode ones.
* expr.c (expand_expr_real_2): When expanding -ftrapv
binops do not use OPTAB_LIB_WIDEN.

* gcc.dg/torture/ftrapv-1.c: New testcase.
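The size test that the optabs.c hunk changes in gen_int_libfunc can be restated as a small predicate. This is a sketch for reasoning only: the three size macros are given assumed values for an LP64 target such as x86_64 (in GCC they come from the target configuration), and the trapv_optab flag stands in for the trapv_binoptab_p/trapv_unoptab_p test; wants_int_libfunc is a hypothetical name.

```c
#include <stdbool.h>

/* Assumed values for a typical LP64 target; not taken from GCC.  */
#define BITS_PER_WORD 64
#define INT_TYPE_SIZE 32
#define LONG_LONG_TYPE_SIZE 64

/* Hypothetical restatement of the patched size check: should an
   integer libfunc for a MODE_BITS-wide mode be registered?  */
static bool
wants_int_libfunc (int mode_bits, bool trapv_optab)
{
  int maxsize = 2 * BITS_PER_WORD;
  int minsize = BITS_PER_WORD;

  if (maxsize < LONG_LONG_TYPE_SIZE)
    maxsize = LONG_LONG_TYPE_SIZE;
  /* The change: -ftrapv optabs also get int-sized (SImode) libfuncs.  */
  if (minsize > INT_TYPE_SIZE && trapv_optab)
    minsize = INT_TYPE_SIZE;
  return mode_bits >= minsize && mode_bits <= maxsize;
}
```

With these values a 32-bit libfunc is registered only for the trapping optabs; all other optabs still start at word_mode.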

Index: gcc/optabs.c
===
*** gcc/optabs.c(revision 212970)
--- gcc/optabs.c(working copy)
*** gen_int_libfunc (optab optable, const ch
*** 5559,5571 
 enum machine_mode mode)
  {
int maxsize = 2 * BITS_PER_WORD;
  
if (GET_MODE_CLASS (mode) != MODE_INT)
  return;
if (maxsize < LONG_LONG_TYPE_SIZE)
  maxsize = LONG_LONG_TYPE_SIZE;
!   if (GET_MODE_CLASS (mode) != MODE_INT
!   || GET_MODE_BITSIZE (mode) < BITS_PER_WORD
|| GET_MODE_BITSIZE (mode) > maxsize)
  return;
gen_libfunc (optable, opname, suffix, mode);
--- 5559,5575 
 enum machine_mode mode)
  {
int maxsize = 2 * BITS_PER_WORD;
+   int minsize = BITS_PER_WORD;
  
if (GET_MODE_CLASS (mode) != MODE_INT)
  return;
if (maxsize < LONG_LONG_TYPE_SIZE)
  maxsize = LONG_LONG_TYPE_SIZE;
!   if (minsize > INT_TYPE_SIZE
!   && (trapv_binoptab_p (optable)
! || trapv_unoptab_p (optable)))
! minsize = INT_TYPE_SIZE;
!   if (GET_MODE_BITSIZE (mode) < minsize
|| GET_MODE_BITSIZE (mode) > maxsize)
  return;
gen_libfunc (optable, opname, suffix, mode);
Index: gcc/expr.c
===
*** gcc/expr.c  (revision 212970)
--- gcc/expr.c  (working copy)
*** expand_expr_real_2 (sepops ops, rtx targ
*** 9212,9218 
if (modifier == EXPAND_STACK_PARM)
  target = 0;
temp = expand_binop (mode, this_optab, op0, op1, target,
!  unsignedp, OPTAB_LIB_WIDEN);
gcc_assert (temp);
/* Bitwise operations do not need bitfield reduction as we expect their
   operands being properly truncated.  */
--- 9212,9220 
if (modifier == EXPAND_STACK_PARM)
  target = 0;
temp = expand_binop (mode, this_optab, op0, op1, target,
!  unsignedp,
!  trapv_binoptab_p (this_optab)
!  ? OPTAB_LIB : OPTAB_LIB_WIDEN);
gcc_assert (temp);
/* Bitwise operations do not need bitfield reduction as we expect their
   operands being properly truncated.  */
Index: gcc/testsuite/gcc.dg/torture/ftrapv-1.c
===
*** gcc/testsuite/gcc.dg/torture/ftrapv-1.c (revision 0)
--- gcc/testsuite/gcc.dg/torture/ftrapv-1.c (working copy)
***
*** 0 
--- 1,37 
+ /* { dg-do run } */
+ /* { dg-additional-options "-ftrapv" } */
+ /* { dg-require-effective-target trapping } */
+ /* { dg-require-fork } */
+ 
+ #include <stdlib.h>
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ #include <unistd.h>
+ 
+ /* Verify SImode operations properly trap.  PR middle-end/52478  */
+ 
+ /* Disallow inlining/cloning which would constant propagate and trigger
+    unrelated bugs.  */
+ 
+ int __attribute__((noinline,noclone))
+ iaddv (int a, int b)
+ {
+   return a + b;
+ }
+ 
+ int main(void)
+ {
+   pid_t child = fork ();
+   int status = 0;
+   if (child == 0)
+     {
+       volatile int x = iaddv (__INT_MAX__, 1);
+       exit (0);
+     }
+   else if (child == -1)
+     return 0;
+   if (wait (&status) == child
+       && status == 0)
+     abort ();
+   return 0;
+ }

Re: Strenghten assumption about dynamic type changes (placement new)

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 11:23 AM, Jan Hubicka  wrote:
>> >  *shortptr = exp
>> >  
>> >  var = *shortptr
>> >  *intptr = exp
>> >  
>> >  var = *intptr
>>
>> Yes (that is, you can't hoist the *intptr = exp store above the var = 
>> *shortptr
>> load with TBAA only).  You can probably still hoist the > chain with intptr>, it's not clear from your example.
>
> Well, this is placement new version of this, where of course the movement is 
> not desirable.
> Obviously the chains can not overlap:
> #include <stdio.h>
> #include <new>
> struct A {short a; short b; A(){a=1;}};
> struct B {int a; B(){a=2;}};
>
> struct A a;
> struct A *pa = &a;
> struct B *pb = reinterpret_cast<struct B *>(&a);
> int
> main()
> {
>   int sum;
>   struct A *ppa = pa;
>   struct B *ppb = pb;
>   if (!pa || !pb)
>     return 1;
>   ppa->~A();
>   new (ppa) A();
>   asm ("#asm1":"=m"(ppa->a):"m"(ppa->a));
>   sum = ppa->a*11;
>   new (ppb) B();
>
>   asm ("#asm2":"=m"(ppb->a):"m"(ppb->a));
>   sum += ppb->a*11;
>   printf ("%i\n",sum);
>   return 0;
> }
>
> Of course, it makes us, e.g. in
> void
> t (short *a, short *b, int *c)
> {
>   int i;
>   for (i = 0; i < 100; i++)
>     c[i] = a[i] + b[i];
> }
>
> generate the fallback case when vectorizing where c is overlapping with a or 
> b, while clang doesn't.

Yep.  I bet clang gets it wrong with placement new (but then for
PODs simply writing to a storage location ends lifetime of the old
object in there and starts lifetime of a new object, so placement new
is not needed to make an overlap valid as far as I read the standard(s)).

So I don't think omitting the runtime alias check is valid for the above
case.

Richard.

> Honza
>>
>> That said, being able to optimize union accesses with TBAA at all
>> is still nice (esp. for GCC).  Now, the C frontend still forces alias-set 
>> zero
>> for this case because of the RTL alias oracle dysfunctionality, which doesn't
>> treat a must-alias as an alias if it can TBAA disambiguate.
>>
>> Richard.
>>
>> > Honza

Re: Strenghten assumption about dynamic type changes (placement new)

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 11:53 AM, Richard Biener
 wrote:
> On Thu, Jul 24, 2014 at 11:23 AM, Jan Hubicka  wrote:
>>> >  *shortptr = exp
>>> >  
>>> >  var = *shortptr
>>> >  *intptr = exp
>>> >  
>>> >  var = *intptr
>>>
>>> Yes (that is, you can't hoist the *intptr = exp store above the var = 
>>> *shortptr
>>> load with TBAA only).  You can probably still hoist the >> chain with intptr>, it's not clear from your example.
>>
>> Well, this is placement new version of this, where of course the movement is 
>> not desirable.
>> Obviously the chains can not overlap:
>> #include <stdio.h>
>> #include <new>
>> struct A {short a; short b; A(){a=1;}};
>> struct B {int a; B(){a=2;}};
>>
>> struct A a;
>> struct A *pa = &a;
>> struct B *pb = reinterpret_cast<struct B *>(&a);
>> int
>> main()
>> {
>>   int sum;
>>   struct A *ppa = pa;
>>   struct B *ppb = pb;
>>   if (!pa || !pb)
>>     return 1;
>>   ppa->~A();
>>   new (ppa) A();
>>   asm ("#asm1":"=m"(ppa->a):"m"(ppa->a));
>>   sum = ppa->a*11;
>>   new (ppb) B();
>>
>>   asm ("#asm2":"=m"(ppb->a):"m"(ppb->a));
>>   sum += ppb->a*11;
>>   printf ("%i\n",sum);
>>   return 0;
>> }
>>
>> Of course, it makes us, e.g. in
>> void
>> t (short *a, short *b, int *c)
>> {
>>   int i;
>>   for (i = 0; i < 100; i++)
>>     c[i] = a[i] + b[i];
>> }
>>
>> generate the fallback case when vectorizing where c is overlapping with a or 
>> b, while clang doesn't.
>
> Yep.  I bet clang gets it wrong with placement new (but then for
> PODs simply writing to a storage location ends lifetime of the old
> object in there and starts lifetime of a new object, so placement new
> is not needed to make an overlap valid as far as I read the standard(s)).
>
> So I don't think omitting the runtime alias check is valid for the above
> case.

Btw, we don't create a runtime test here.  The argument is simply that
overlap may only happen during iteration 0; otherwise you have
an invalid read via short of something stored via int.

Thus if you are in loops you are usually fine to use TBAA.
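Richard's argument can be checked with byte-offset arithmetic alone. The sketch below performs no aliasing accesses: it models the loop in t() with assumed 2-byte short and 4-byte int, places c at byte 0 and a at byte offset a_off, and reports the first iteration whose a[j] load would read bytes that an earlier iteration stored through c. Such a load reads via short something stored via int, which TBAA lets the compiler assume never happens. Overlaps in which every load precedes the conflicting store are not modelled, and the function name is hypothetical.

```c
/* Byte-level model of
     for (i = 0; i < n; i++) c[i] = a[i] + b[i];
   with 2-byte shorts and 4-byte ints, c at byte 0 and a at byte A_OFF
   (possibly negative).  Returns the first iteration J whose load of
   a[J] touches bytes stored as int by an earlier iteration I < J,
   or -1 if there is no such read-after-write.  */
static int
first_short_read_of_int_store (long a_off, int n)
{
  for (int j = 0; j < n; j++)
    {
      long lo = a_off + 2L * j;          /* a[j] covers [lo, lo + 2).  */
      for (int i = 0; i < j; i++)
        {
          long c_lo = 4L * i;            /* c[i] covers [c_lo, c_lo + 4).  */
          if (lo < c_lo + 4 && c_lo < lo + 2)
            return j;
        }
    }
  return -1;
}
```

For a == c (a_off == 0) the offending load already happens in iteration 1, so only iteration 0 can observe the overlap without an invalid short read of int-stored data.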

Richard.

> Richard.
>
>> Honza
>>>
>>> That said, being able to optimize union accesses with TBAA at all
>>> is still nice (esp. for GCC).  Now, the C frontend still forces alias-set 
>>> zero
>>> for this case because of the RTL alias oracle dysfunctionality, which doesn't
>>> treat a must-alias as an alias if it can TBAA disambiguate.
>>>
>>> Richard.
>>>
>>> > Honza

Re: [PATCH, Pointer Bounds Checker 9/x] Cgraph extension

2014-07-24 Thread Ilya Enkovich

On 22 Jul 21:56, Jeff Law wrote:
> On 04/16/14 08:03, Ilya Enkovich wrote:
> >Hi,
> >
> >This patch introduces changes in call graph for Pointer Bounds Checker.
> >
> >New fields instrumented_version, instrumentation_clone and orig_decl are 
> >added for cgraph_node:
> >  - instrumentation_clone field is 1 for nodes created for instrumented 
> > version of functions
> >  - instrumented_version points to instrumented/original node
> >  - orig_decl holds original function declaration for instrumented nodes in 
> > case original node is removed
> >
> >IPA_REF_CHKP reference type is introduced for nodes to reference 
> >instrumented function versions from originals.  It is used to have proper 
> >reachability analysis.
> >
> >When original function bodies are not needed anymore, functions are 
> >transformed into thunks having a call edge to the instrumented function.
> >Therefore a new field is added to cgraph_thunk_info to mark such thunks.
> >
> >Does it look OK?
> >
> >Bootstrapped and tested on linux-x86_64.
> >
> >Thanks,
> >Ilya
> >--
> >gcc/
> >
> >2014-04-16  Ilya Enkovich  
> >
> > * cgraph.h (cgraph_thunk_info): Add add_pointer_bounds_args
> > field.
> > (cgraph_node): Add instrumented_version, orig_decl and
> > instrumentation_clone fields.
> > (symtab_alias_target): Allow IPA_REF_CHKP reference.
> > * cgraph.c (cgraph_remove_node): Fix instrumented_version
> > of the referenced node if any.
> > (dump_cgraph_node): Dump instrumentation_clone and
> > instrumented_version fields.
> > (verify_cgraph_node): Check correctness of IPA_REF_CHKP
> > references and instrumentation thunks.
> > * cgraphbuild.c (rebuild_cgraph_edges): Rebuild IPA_REF_CHKP
> > reference.
> > (cgraph_rebuild_references): Likewise.
> > * cgraphunit.c (assemble_thunks_and_aliases): Skip thunks
> > calling instrumented function version.
> > * ipa-ref.h (ipa_ref_use): Add IPA_REF_CHKP.
> > (ipa_ref): Increase size of use field.
> > * ipa-ref.c (ipa_ref_use_name): Add element for IPA_REF_CHKP.
> > * lto-cgraph.c (lto_output_node): Output instrumentation_clone,
> > thunk.add_pointer_bounds_args and orig_decl field.
> > (lto_output_ref): Adjust to new ipa_ref::use field size.
> > (input_overwrite_node): Read instrumentation_clone field.
> > (input_node): Read thunk.add_pointer_bounds_args and orig_decl
> > fields.
> > (input_ref): Adjust to new ipa_ref::use field size.
> > (input_cgraph_1): Compute instrumented_version fields and restore
> > IDENTIFIER_TRANSPARENT_ALIAS chains.
> > * lto-streamer.h (LTO_minor_version): Change minor version from
> > 0 to 1.
> > * ipa.c (symtab_remove_unreachable_nodes): Consider instrumented
> > clone as address taken if the original one is address taken.
> > (cgraph_externally_visible_p): Mark instrumented 'main' as
> > externally visible.
> > (function_and_variable_visibility): Filter instrumentation
> > thunks.
> >
> >
> >diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> >index be3661a..6210c68 100644
> >--- a/gcc/cgraph.c
> >+++ b/gcc/cgraph.c
> >@@ -2850,7 +2861,9 @@ verify_cgraph_node (struct cgraph_node *node)
> > }
> >for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list,
> >   i, ref); i++)
> >-if (ref->use != IPA_REF_ALIAS)
> >+if (ref->use == IPA_REF_CHKP)
> >+  ;
> >+else if (ref->use != IPA_REF_ALIAS)
> >   {
> > error ("Alias has non-alias reference");
> > error_found = true;
> Is there any checking you can/should be doing here?  And I'm
> asking because I'm pretty sure there's something you ought to be
> checking here :-)
> 
> There's a general desire for key datastructures to sanity check them
> as much as possible.

Thanks for the comments!  I added an additional check for chkp references.
It is performed later because this piece of code handles aliases only.

> 
> >+  /* If instrumentation_clone is 1 then instrumented_version points
> >+ to the original function used to make instrumented version.
> >+ Otherwise points to instrumented version of the function.  */
> >+  struct cgraph_node *instrumented_version;
> >+  /* If instrumentation_clone is 1 then orig_decl is the original
> >+ function declaration.  */
> >+  tree orig_decl;
> So I don't see anything which checks these two invariants.
> 
> Mostly it looks good.  I do want to look at it again once the
> verification stuff is beefed up.
> 
> 
> Jeff
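Jeff's request can be illustrated with a minimal sketch of the kind of invariant check meant here. It uses a hypothetical, heavily simplified node type rather than GCC's cgraph_node, and it assumes (from the field comments and from cgraph_remove_node fixing up the referenced node's instrumented_version) that an original and its instrumented clone point at each other; it is not the check Ilya added.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cgraph_node: only the fields added by the
   patch are modelled; orig_decl is an opaque pointer, not a tree.  */
struct node
{
  struct node *instrumented_version;
  const void *orig_decl;
  unsigned int instrumentation_clone : 1;
};

/* Return true if NODE agrees with the assumed meaning of the fields.  */
static bool
verify_instrumentation_fields (const struct node *node)
{
  /* A clone must remember the original declaration, because the
     original node may be removed.  */
  if (node->instrumentation_clone && node->orig_decl == NULL)
    return false;

  const struct node *other = node->instrumented_version;
  if (other == NULL)
    return true;

  /* The link must be symmetric...  */
  if (other->instrumented_version != node)
    return false;
  /* ...and exactly one side of the pair is the instrumentation clone.  */
  if (other->instrumentation_clone == node->instrumentation_clone)
    return false;
  return true;
}
```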

Added checks for all new fields.  Below is a new patch version.

Thanks,
Ilya
--
2014-07-24  Ilya Enkovich  

* cgraph.h (cgraph_thunk_info): Add add_pointer_bounds_args
field.
(cgraph_node): Add instrumented_version, orig_decl and
instrumentation_clone fields.
(symtab_alias_target): Allow IPA_REF_CHKP reference.
* cgraph.c (cgraph_remove_node): Fix instrumented_version
of the referenced node if

Re: [GSoC] Handling of isl_ast_op_pdiv_q and isl_ast_op_pdiv_r

2014-07-24 Thread Roman Gareev

> Any reason you don't make 'k' a function parameter?

Yes. If 'k' is a function parameter, ISL will generate the following code:

for (int c1 = 0; c1 <= 24; c1 += 1)
  S_3(c1);

However, we could use -fno-ipa-cp to get an ISL AST which contains
isl_ast_op_pdiv_r. What do you think about this?
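For reference, isl documents isl_ast_op_pdiv_q and isl_ast_op_pdiv_r as the quotient and remainder of an integer division whose dividend is known to be non-negative. Assuming additionally a positive divisor (an assumption here), C's truncating / and % already coincide with floor division, which suggests these ops can be mapped onto GCC's truncating TRUNC_DIV_EXPR and TRUNC_MOD_EXPR. The helper names below are hypothetical; fdiv_q is included only for contrast with isl_ast_op_fdiv_q, which rounds toward negative infinity for any signs.

```c
#include <assert.h>

/* pdiv_q / pdiv_r semantics under the stated preconditions: with a
   non-negative dividend and positive divisor, truncation == floor.  */
static long
pdiv_q (long a, long b)
{
  assert (a >= 0 && b > 0);
  return a / b;
}

static long
pdiv_r (long a, long b)
{
  assert (a >= 0 && b > 0);
  return a % b;
}

/* Floor division for arbitrary signs: truncation rounds toward zero,
   so step down when the signs differ and there is a remainder.  */
static long
fdiv_q (long a, long b)
{
  long q = a / b;
  if (a % b != 0 && ((a < 0) != (b < 0)))
    q--;
  return q;
}
```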

> Can you initialize res outside of the loop?

Yes, I've implemented this in the improved version.

--
   Cheers, Roman Gareev.

[GSoC] generation of Gimple code from isl_ast_node_if

2014-07-24 Thread Roman Gareev

I've attached the patch, which contains generation of Gimple code from
isl_ast_node_if.

However, I've found a problem. When I try to generate Gimple
code from, for example, the following ISL AST:

{
  for (int c1 = 0; c1 <= 49; c1 += 1) {
    S_6(c1);
    if (c1 <= 48) {
      S_3(c1);
      if (c1 >= 24)
        S_4(c1);
      S_5(c1);
    }
  }
  S_7();
}

the pointer to the Gimple basic block of S_3's poly basic block is NULL.

Could you please suggest possible reasons for this issue?

The source code of the example:

int
foo ()
{
  int i, res = 0;

  for (i = 0; i < 50; i++)
    {
      if (i >= 25)
        res += i;
    }

  return res;
}

extern void abort ();

int
main (void)
{
  int res = foo ();

  if (res != 925)
    abort ();

  return 0;
}

--
   Cheers, Roman Gareev.
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 212922)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -636,6 +636,42 @@
   return next_e;
 }
 
+/* Creates a new if region corresponding to ISL's cond.  */
+
+static edge
+graphite_create_new_guard (edge entry_edge, __isl_take isl_ast_expr *if_cond,
+  ivs_params &ip)
+{
+  tree type =
+build_nonstandard_integer_type (graphite_expression_type_precision, 0);
+  tree cond_expr = gcc_expression_from_isl_expression (type, if_cond, ip);
+  edge exit_edge = create_empty_if_region_on_edge (entry_edge, cond_expr);
+  return exit_edge;
+}
+
+/* Translates an isl_ast_node_if to Gimple.  */
+
+static edge
+translate_isl_ast_node_if (loop_p context_loop,
+  __isl_keep isl_ast_node *node,
+  edge next_e, ivs_params &ip)
+{
+  gcc_assert (isl_ast_node_get_type (node) == isl_ast_node_if);
+  isl_ast_expr *if_cond = isl_ast_node_if_get_cond (node);
+  edge last_e = graphite_create_new_guard (next_e, if_cond, ip);
+
+  edge true_e = get_true_edge_from_guard_bb (next_e->dest);
+  isl_ast_node *then_node = isl_ast_node_if_get_then (node);
+  translate_isl_ast (context_loop, then_node, true_e, ip);
+  isl_ast_node_free (then_node);
+
+  edge false_e = get_false_edge_from_guard_bb (next_e->dest);
+  isl_ast_node *else_node = isl_ast_node_if_get_else (node);
+  translate_isl_ast (context_loop, else_node, false_e, ip);
+  isl_ast_node_free (else_node);
+  return last_e;
+}
+
 /* Translates an ISL AST node NODE to GCC representation in the
context of a SESE.  */
 
@@ -653,7 +689,8 @@
 next_e, ip);
 
 case isl_ast_node_if:
-  return next_e;
+  return translate_isl_ast_node_if (context_loop, node,
+   next_e, ip);
 
 case isl_ast_node_user:
   return translate_isl_ast_node_user (node, next_e, ip);

Re: [GSoC] A bug related to induction variables and blocks

2014-07-24 Thread Roman Gareev

> Is there a reason you have those global values? To my understanding they
> could possibly just be function parameters?

Yes, it would be fine for this test case. I've implemented this in the
improved version.

--
   Cheers, Roman Gareev.
2014-07-23  Roman Gareev  

[gcc/]

* graphite-isl-ast-to-gimple.c (graphite_create_new_loop): Call
isl_id_free to properly decrement reference counts.

[gcc/testsuite]

* gcc.dg/graphite/isl-ast-gen-blocks-4.c: New testcase.
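The reference-count fix can be illustrated with a toy model. It assumes, in line with the isl_id_free the patch adds, that isl_ast_expr_get_id hands back a new reference to a uniqued isl_id; when that id is already a key in ip, std::map keeps its existing key on assignment, so the fresh reference must be released. Everything below (toy_id, toy_map, their functions) is hypothetical and only mimics the shape of the fix.

```c
#include <stdlib.h>

/* Toy stand-in for isl_id: uniqued and reference counted.  */
struct toy_id
{
  int ref;
};

static struct toy_id *
toy_id_copy (struct toy_id *id)
{
  id->ref++;
  return id;
}

static void
toy_id_free (struct toy_id *id)
{
  if (--id->ref == 0)
    free (id);
}

/* Toy one-entry stand-in for the ivs_params map; it owns one
   reference to its key.  */
struct toy_map
{
  struct toy_id *key;
  int value;
};

/* Store VALUE under ID, consuming the caller's reference to ID.  If ID
   is already the key, the map keeps its existing reference, so the new
   one is dropped; without that, each repeated store would leak one.  */
static void
toy_map_set (struct toy_map *m, struct toy_id *id, int value)
{
  if (m->key == id)
    toy_id_free (id);
  else
    {
      if (m->key != NULL)
        toy_id_free (m->key);
      m->key = id;
    }
  m->value = value;
}
```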
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 212922)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -383,6 +383,10 @@
 
   isl_ast_expr *for_iterator = isl_ast_node_for_get_iterator (node_for);
   isl_id *id = isl_ast_expr_get_id (for_iterator);
+  std::map<isl_id *, tree>::iterator res;
+  res = ip.find (id);
+  if (ip.count (id))
+    isl_id_free (res->first);
   ip[id] = iv;
   isl_ast_expr_free (for_iterator);
   return loop;
Index: gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-4.c
===
--- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-4.c(revision 0)
+++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-4.c(working copy)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fgraphite-identity -fgraphite-code-generator=isl" } */
+
+static int __attribute__((noinline))
+foo (int k, int n1, int n2, int n3)
+{
+  int j, res = 0;
+  for (j = 0; j < k; j++)
+    {
+      int i;
+      for (i = 0; i < n1; i++)
+        res += i;
+      for (i = 0; i < n2; i++)
+        res += i;
+      for (i = 0; i < n3; i++)
+        res += i;
+    }
+
+  return res;
+}
+
+extern void abort ();
+
+int
+main (void)
+{ 
+  int res = foo (4, 50, 50, 50);
+  if (res != 14700)
+    abort ();
+
+  return 0;
+}

Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns

2014-07-24 Thread Maxim Kuvyrkov


On Jul 24, 2014, at 10:11 AM, Kyrill Tkachov  wrote:

> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00958.html
> 
> Kyrill
> 
> On 14/07/14 11:01, Kyrill Tkachov wrote:
>> On 11/07/14 14:20, Alexander Monakov wrote:
>>> On Fri, 11 Jul 2014, Kyrill Tkachov wrote:
 On 10/07/14 22:53, Maxim Kuvyrkov wrote:
> The patch looks good to me, but some cleanup suggestions below.
 Thanks, here's an updated patch.
 How's this?
>>> You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the
>>> first if branch, keeping only one SCHED_GROUP_P assignment at the end of the
>>> function.
>>> 
>>> Alexander
>> Thanks for the pointer, I had hurried a bit.
>> Here is the updated patch.
>> 

Hi Kyrill,

I have reviewed the latest version of your patch and it is perfectly fine with 
me.  You need to wait for an ack from the official maintainer to commit your 
patch.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

>> Kyrill
>> 
>> 2014-07-14  Ramana Radhakrishnan 
>>  Kyrylo Tkachov  
>> 
>>  * sched-deps.c (try_group_insn): Generalise macro fusion hook usage
>>  to any two insns.  Update comment.  Rename to sched_macro_fuse_insns.
>>  (sched_analyze_insn): Update use of try_group_insn to
>>  sched_macro_fuse_insns.
>>  * config/i386/i386.c (ix86_macro_fusion_pair_p): Reject 2nd arguments
>>  that are not conditional jumps.
> 
>

[PATCH AArch64] Rename [u]int32x1_t to [u]int32_t (resp 16x1, 8x1) in arm_neon.h

2014-07-24 Thread Alan Lawrence

The ACLE spec does not mention the int32x1_t, uint32x1_t, int16x1_t, uint16x1_t, 
int8x1_t or uint8x1_t types currently in arm_neon.h, but just 'standard' types 
int32_t, int16_t, etc. This patch is a global search-and-replace across 
arm_neon.h (and the tests that depend on it).


Regression-tested (check-gcc and check-g++) on aarch64-none-elf.

The question of backporting to 4.9 has been raised internally. There is no ABI 
issue, as int32x1_t was merely a typedef to int32_t (etc.). However there is a 
source code compatibility issue; code mentioning the 32x1 types, i.e. not 
conforming to the ACLE spec, which previously compiled, will no longer do so. My 
personal feeling is therefore not to backport this, but I would welcome input 
from maintainers (and others)...?
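To make the source-compatibility point concrete: the removed typedefs were plain aliases, so user code that still names e.g. int32x1_t could be kept compiling with a local shim, while the scalar intrinsics themselves now take and return the standard types. The sketch below is hypothetical user code, not part of the patch; model_vqadds_s32 is a portable reference model of the signed saturating add that vqadds_s32 performs, not arm_neon.h's implementation.

```c
#include <stdint.h>

/* Hypothetical user-side stopgap recreating the removed names.  They
   were plain typedefs, so this is ABI-neutral.  Repeating an identical
   typedef is valid in C11 and C++ but not in C99, which matters if this
   is combined with an older arm_neon.h that still defines them.  */
typedef int32_t int32x1_t;
typedef int16_t int16x1_t;
typedef int8_t int8x1_t;
typedef uint32_t uint32x1_t;
typedef uint16_t uint16x1_t;
typedef uint8_t uint8x1_t;

/* Reference model of a scalar signed saturating 32-bit add, using the
   ACLE-style signature int32_t (int32_t, int32_t).  */
static int32_t
model_vqadds_s32 (int32_t a, int32_t b)
{
  int64_t sum = (int64_t) a + (int64_t) b;
  if (sum > INT32_MAX)
    return INT32_MAX;
  if (sum < INT32_MIN)
    return INT32_MIN;
  return (int32_t) sum;
}
```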


Cheers, Alan

gcc/ChangeLog:

* config/aarch64/arm_neon.h (int32x1_t, int16x1_t, int8x1_t,
uint32x1_t, uint16x1_t, uint8x1_t): Remove typedefs.

(vqabsb_s8, vqabsh_s16, vqabss_s32, vqaddb_s8, vqaddh_s16, vqadds_s32,
vqaddb_u8, vqaddh_u16, vqadds_u32, vqdmlalh_s16, vqdmlalh_lane_s16,
vqdmlals_s32, vqdmlals_lane_s32, vqdmlslh_s16, vqdmlslh_lane_s16,
vqdmlsls_s32, vqdmlsls_lane_s32, vqdmulhh_s16, vqdmulhh_lane_s16,
vqdmulhs_s32, vqdmulhs_lane_s32, vqdmullh_s16, vqdmullh_lane_s16,
vqdmulls_s32, vqdmulls_lane_s32, vqmovnh_s16, vqmovns_s32,
vqmovnh_u16, vqmovns_u32, vqmovunh_s16, vqmovuns_s32, vqnegb_s8,
vqnegh_s16, vqnegs_s32, vqrdmulhh_s16, vqrdmulhh_lane_s16,
vqrdmulhs_s32, vqrdmulhs_lane_s32, vqrshlb_s8, vqrshlh_s16,
vqrshls_s32, vqrshlb_u8, vqrshlh_u16, vqrshls_u32, vqrshrnh_n_s16,
vqrshrns_n_s32, vqrshrnh_n_u16, vqrshrns_n_u32, vqrshrunh_n_s16,
vqrshruns_n_s32, vqshlb_s8, vqshlh_s16, vqshls_s32, vqshlb_u8,
vqshlh_u16, vqshls_u32, vqshlb_n_s8, vqshlh_n_s16, vqshls_n_s32,
vqshlb_n_u8, vqshlh_n_u16, vqshls_n_u32, vqshlub_n_s8, vqshluh_n_s16,
vqshlus_n_s32, vqshrnh_n_s16, vqshrns_n_s32, vqshrnh_n_u16,
vqshrns_n_u32, vqshrunh_n_s16, vqshruns_n_s32, vqsubb_s8, vqsubh_s16,
vqsubs_s32, vqsubb_u8, vqsubh_u16, vqsubs_u32, vsqaddb_u8,
vsqaddh_u16, vsqadds_u32, vuqaddb_s8, vuqaddh_s16, vuqadds_s32):
Replace all int{32,16,8}x1_t with int{32,16,8}_t.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/scalar_intrinsics.c (*): Replace all
int{32,16,8}x1_t with int{32,16,8}_t.
* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 83ac5e96d422ceccadcb212ec792665b78c03fae..2e54c01aa068c1680b068592bbdfd011ee4b7cf8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -40,9 +40,6 @@ typedef __builtin_aarch64_simd_si int32x2_t
   __attribute__ ((__vector_size__ (8)));
 typedef __builtin_aarch64_simd_di int64x1_t
   __attribute__ ((__vector_size__ (8)));
-typedef int32_t int32x1_t;
-typedef int16_t int16x1_t;
-typedef int8_t int8x1_t;
 typedef __builtin_aarch64_simd_df float64x1_t
   __attribute__ ((__vector_size__ (8)));
 typedef __builtin_aarch64_simd_sf float32x2_t
@@ -59,9 +56,6 @@ typedef __builtin_aarch64_simd_usi uint32x2_t
   __attribute__ ((__vector_size__ (8)));
 typedef __builtin_aarch64_simd_udi uint64x1_t
   __attribute__ ((__vector_size__ (8)));
-typedef uint32_t uint32x1_t;
-typedef uint16_t uint16x1_t;
-typedef uint8_t uint8x1_t;
 typedef __builtin_aarch64_simd_qi int8x16_t
   __attribute__ ((__vector_size__ (16)));
 typedef __builtin_aarch64_simd_hi int16x8_t
@@ -19203,22 +19197,22 @@ vqabsq_s64 (int64x2_t __a)
   return (int64x2_t) __builtin_aarch64_sqabsv2di (__a);
 }
 
-__extension__ static __inline int8x1_t __attribute__ ((__always_inline__))
-vqabsb_s8 (int8x1_t __a)
+__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+vqabsb_s8 (int8_t __a)
 {
-  return (int8x1_t) __builtin_aarch64_sqabsqi (__a);
+  return (int8_t) __builtin_aarch64_sqabsqi (__a);
 }
 
-__extension__ static __inline int16x1_t __attribute__ ((__always_inline__))
-vqabsh_s16 (int16x1_t __a)
+__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+vqabsh_s16 (int16_t __a)
 {
-  return (int16x1_t) __builtin_aarch64_sqabshi (__a);
+  return (int16_t) __builtin_aarch64_sqabshi (__a);
 }
 
-__extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqabss_s32 (int32x1_t __a)
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vqabss_s32 (int32_t __a)
 {
-  return (int32x1_t) __builtin_aarch64_sqabssi (__a);
+  return (int32_t) __builtin_aarch64_sqabssi (__a);
 }
 
 __extension__ static __inline

Re: [GSoC] Handling of isl_ast_op_pdiv_q and isl_ast_op_pdiv_r

2014-07-24 Thread Tobias Grosser


On 24/07/2014 12:09, Roman Gareev wrote:

Any reason you don't make 'k' a function parameter?


Yes, ISL'll generate the following code, if 'k' a function parameter:

for (int c1 = 0; c1 <= 24; c1 += 1)
   S_3(c1);

However, we could use  -fno-ipa-cp to get the ISL AST, which contains
isl_ast_op_pdiv_r. What do you think about this?


I see. Just add a comment that we use globals to avoid ipa-cp.


Can you initialize res outside of the loop?


Yes, I've implemented this in the improved version.


Nice. Feel free to commit.

Tobias

Re: [GSoC] A bug related to induction variables and blocks

2014-07-24 Thread Tobias Grosser


On 24/07/2014 12:09, Roman Gareev wrote:

Is there a reason you have those global values? To my understanding they
could possibly just be function parameters?


Yes, it would be fine for this test case. I've implemented this in the
improved version.


LGTM.

Tobias

Re: [GSoC] generation of Gimple code from isl_ast_node_if

2014-07-24 Thread Tobias Grosser


On 24/07/2014 12:09, Roman Gareev wrote:

I've attached the patch, which contains generation of Gimple code from
isl_ast_node_if.


Nice.


However, I've found out a problem. When I'm trying to generate Gimple
code from, for example, the following ISL AST:

{
  for (int c1 = 0; c1 <= 49; c1 += 1) {
    S_6(c1);
    if (c1 <= 48) {
      S_3(c1);
      if (c1 >= 24)
        S_4(c1);
      S_5(c1);
    }
  }
  S_7();
}

the pointer to Gimple basic block of S_3's poly basic block is NULL.

Could you please advise me possible reasons of this issue?


I have no idea. Is the Gimple basic block of S_3 never set, or is it set 
and deleted on the way? What code does S_3 correspond to?


The code itself looks good. Let's get back to this once we have understood
this bug.


Cheers,
Tobias

Re: Strenghten assumption about dynamic type changes (placement new)

2014-07-24 Thread Jan Hubicka

> 
> Aggregate copies and memcpy transferring the dynamic type for example.  Being 
> able to tbaa union accesses for another.  And yes, placement new.

I see that if we previously dropped all union accesses to alias set 0, the
current scheme is a nice improvement.  But it seems to me it may only be in
use when one of the accesses is through a union.

How does the memcpy case work? I always thought that memcpy does its reads
and writes in alias set 0, which is what introduces the necessary conflicts.

Similarly, can't we emit an alias-set-0 clobber of the memory retyped by
placement new?  If the clobber is hidden in an external function call, we
still have it as a side
effect of the call. It would have to survive all the way down to RTL...
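The alias-set-0 mechanism Jan refers to is what makes byte copies safe at the source level as well. A small C illustration, assuming 32-bit IEEE 754 float: memcpy copies the object representation as characters, and character-type accesses may alias any object (GCC gives them alias set 0), so the copy below is well defined, whereas dereferencing a uint32_t * that points at a float is not. float_bits is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Well-defined type punning: memcpy moves the bytes as characters,
   which may alias any object (alias set 0 in GCC).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  _Static_assert (sizeof u == sizeof f, "float must be 32 bits");
  memcpy (&u, &f, sizeof u);
  return u;
}

/* By contrast, *(uint32_t *) &f reads a float through an incompatible
   type; TBAA may then reorder or drop the access.  */
```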

Honza

Re: [PATCH][ARM] Enable arm target in ira-shrinkwrap-prep* testcases

2014-07-24 Thread Andreas Schwab

Jiong Wang  writes:

> gcc/testsuite/
>   * gcc.dg/ira-shrinkwrap-prep-1.c (target): Add arm_nothumb
>   * gcc.dg/ira-shrinkwrap-prep-2.c (target): Add arm_nothumb
>   * gcc.dg/pr10474.c (target): Add arm_nothumb

arm_nothumb doesn't check for __arm__, so this enables it everywhere.
Installed as obvious.

Andreas.

* lib/target-supports.exp (check_effective_target_arm_nothumb):
Also check for __arm__.
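check_no_compiler_messages reports an effective target as present when its probe snippet compiles without diagnostics, so arm_nothumb holds exactly when the #error condition is false. Spelling out that negation in C shows the bug: without the __arm__ test, a non-ARM compiler defines neither Thumb macro, so the probe compiled cleanly and the target appeared supported. ARM_NOTHUMB and arm_nothumb_p are hypothetical names used only for illustration.

```c
/* Negation of the fixed probe's #error condition
     !defined(__arm__) || defined(__thumb__) || defined(__thumb2__)
   i.e. true only for ARM code generated in ARM (non-Thumb) state.  */
#if defined(__arm__) && !defined(__thumb__) && !defined(__thumb2__)
# define ARM_NOTHUMB 1
#else
# define ARM_NOTHUMB 0
#endif

static int
arm_nothumb_p (void)
{
  return ARM_NOTHUMB;
}
```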

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 33672f2..1dc0f44 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2262,7 +2262,7 @@ proc check_effective_target_aarch64_little_endian { } {
 }]
 }
 
-# Return 1 is this is an arm target using 32-bit instructions
+# Return 1 if this is an arm target using 32-bit instructions
 proc check_effective_target_arm32 { } {
 return [check_no_compiler_messages arm32 assembly {
#if !defined(__arm__) || (defined(__thumb__) && !defined(__thumb2__))
@@ -2271,10 +2271,10 @@ proc check_effective_target_arm32 { } {
 }]
 }
 
-# Return 1 is this is an arm target not using Thumb
+# Return 1 if this is an arm target not using Thumb
 proc check_effective_target_arm_nothumb { } {
 return [check_no_compiler_messages arm_nothumb assembly {
-   #if (defined(__thumb__) || defined(__thumb2__))
+   #if !defined(__arm__) || (defined(__thumb__) || defined(__thumb2__))
#error FOO
#endif
 }]
-- 
2.0.2

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: werror fallout for cross-builds (was: Re: [BUILDROBOT][PATCH] Fix mmix (unused variable))

2014-07-24 Thread Maciej W. Rozycki

On Tue, 22 Jul 2014, Mike Stump wrote:

> Then I’m shadow boxing.  I assumed that people wanted to turn it on by 
> default.  I’m all for that, I think it is a good idea and a fine 
> direction.  :-)  The only limitation is whitelisting exactly when it 
> pops on and preflighting those at least once to ensure they are clean.

 I think at the very least the thing should be on whenever GCC is built with
itself, that is, when the build compiler's version is the same as the version
of the compiler being built.  That can probably reasonably be broadened to
any compiler bearing the same major.minor version (or just major if we
switch to the two-part versioning scheme recently proposed).

 The point is to get the code base to always compile without warnings
and then not to let it regress.  Warnings are too easy to miss and
sometimes are a symptom of actual breakage rather than just harmless
noise, i.e. code builds and runs, but does something silly.  I have already
wasted hours of debugging time chasing such problems.

  Maciej

Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Marcus Shawcroft

On 22 July 2014 15:52, Jiong Wang  wrote:

> gcc/
>   * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
>   outgoing area size when restore stack_pointer_rtx.
>
> gcc/testsuite/
>   * gcc.target/aarch64/test_frame_12.c: Match optimized instruction
> sequences.

OK and committed.

/Marcus

Re: [PATCH, Pointer Bounds Checker 9/x] Cgraph extension

2014-07-24 Thread Jan Hubicka

Hello,

> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index a6a51cf..5e702a7 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -191,6 +191,7 @@ struct GTY(()) cgraph_thunk_info {
>tree alias;
>bool this_adjusting;
>bool virtual_offset_p;
> +  bool add_pointer_bounds_args;
>/* Set to true when alias node is thunk.  */
>bool thunk_p;
>  };
> @@ -373,6 +374,13 @@ public:
>struct cgraph_node *prev_sibling_clone;
>struct cgraph_node *clones;
>struct cgraph_node *clone_of;
> +  /* If instrumentation_clone is 1 then instrumented_version points
> + to the original function used to make instrumented version.
> + Otherwise points to instrumented version of the function.  */
> +  struct cgraph_node *instrumented_version;
> +  /* If instrumentation_clone is 1 then orig_decl is the original
> + function declaration.  */
> +  tree orig_decl;

So the patch is introducing yet another notion of clone (in addition to the
existing virtual clones and function versions used by ifunc) and you add a
new type of reference (CHKP) to link the original and the clone.

Why do you need to link things in 3 different ways? (i.e. instrumented_version
points to the same place as CHKP and as orig_decl, right?)

I would prefer if this could be put into the existing clone mechanism.  The
virtual clones can have quite generic transformations done on them, and they
do perform all the necessary links back and forth.

I will look into the rest of the changes; is there some overview?

Honza


>/* For functions with many calls sites it holds map from call expression
>   to the edge to speed up cgraph_edge function.  */
>htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
> @@ -433,6 +441,9 @@ public:
>/* True if this decl calls a COMDAT-local function.  This is set up in
>   compute_inline_parameters and inline_call.  */
>unsigned calls_comdat_local : 1;
> +  /* True when function is clone created for Pointer Bounds Checker
> + instrumentation.  */
> +  unsigned instrumentation_clone : 1;
>  };
>  
>  
> @@ -1412,6 +1423,8 @@ symtab_alias_target (symtab_node *n)
>  {
>struct ipa_ref *ref;
>ipa_ref_list_reference_iterate (&n->ref_list, 0, ref);
> +  if (ref->use == IPA_REF_CHKP)
> +ipa_ref_list_reference_iterate (&n->ref_list, 1, ref);
>gcc_checking_assert (ref->use == IPA_REF_ALIAS);
>return ref->referred;
>  }
> diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
> index 19961e2..a2b2106 100644
> --- a/gcc/cgraphbuild.c
> +++ b/gcc/cgraphbuild.c
> @@ -481,6 +481,10 @@ rebuild_cgraph_edges (void)
>record_eh_tables (node, cfun);
>gcc_assert (!node->global.inlined_to);
>  
> +  if (node->instrumented_version
> +  && !node->instrumentation_clone)
> +ipa_record_reference (node, node->instrumented_version, IPA_REF_CHKP, NULL);
> +
>return 0;
>  }
>  
> @@ -513,6 +517,11 @@ cgraph_rebuild_references (void)
>   ipa_record_stmt_references (node, gsi_stmt (gsi));
>  }
>record_eh_tables (node, cfun);
> +
> +
> +  if (node->instrumented_version
> +  && !node->instrumentation_clone)
> +ipa_record_reference (node, node->instrumented_version, IPA_REF_CHKP, NULL);
>  }
>  
>  namespace {
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 06283fc..ceb4060 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1702,7 +1702,8 @@ assemble_thunks_and_aliases (struct cgraph_node *node)
>struct ipa_ref *ref;
>  
>for (e = node->callers; e;)
> -if (e->caller->thunk.thunk_p)
> +if (e->caller->thunk.thunk_p
> + && !e->caller->thunk.add_pointer_bounds_args)
>{
>   struct cgraph_node *thunk = e->caller;
>  
> diff --git a/gcc/ipa-ref.c b/gcc/ipa-ref.c
> index 6aa41e6..3a055d9 100644
> --- a/gcc/ipa-ref.c
> +++ b/gcc/ipa-ref.c
> @@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cgraph.h"
>  #include "ipa-utils.h"
>  
> -static const char *ipa_ref_use_name[] = {"read","write","addr","alias"};
> +static const char *ipa_ref_use_name[] = {"read","write","addr","alias","chkp"};
>  
>  /* Return ipa reference from REFERING_NODE or REFERING_VARPOOL_NODE
> to REFERED_NODE or REFERED_VARPOOL_NODE. USE_TYPE specify type
> diff --git a/gcc/ipa-ref.h b/gcc/ipa-ref.h
> index 4ce5f8d..d0df0bf 100644
> --- a/gcc/ipa-ref.h
> +++ b/gcc/ipa-ref.h
> @@ -29,7 +29,8 @@ enum GTY(()) ipa_ref_use
>IPA_REF_LOAD,
>IPA_REF_STORE,
>IPA_REF_ADDR,
> -  IPA_REF_ALIAS
> +  IPA_REF_ALIAS,
> +  IPA_REF_CHKP
>  };
>  
>  /* Record of reference in callgraph or varpool.  */
> @@ -40,7 +41,7 @@ struct GTY(()) ipa_ref
>gimple stmt;
>unsigned int lto_stmt_uid;
>unsigned int referred_index;
> -  ENUM_BITFIELD (ipa_ref_use) use:2;
> +  ENUM_BITFIELD (ipa_ref_use) use:3;
>unsigned int speculative:1;
>  };
>  
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 5ab3aed..1d7fa35 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -508,6 +508,12 @@ symtab_remove_unreachable_nodes
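
The ipa-ref.h hunk widens the `use` bit-field from 2 to 3 bits because `IPA_REF_CHKP` is the fifth enumerator (value 4), and two bits can only hold 0 to 3. The standalone sketch below mirrors the enumerator names from the hunk but is not GCC code; the field is modelled as a plain `unsigned int` bit-field, and the helper names are hypothetical. It shows what the old width would do: storing 4 into an unsigned 2-bit field keeps only the low bits and reads back as 0.

```c
#include <assert.h>

/* Standalone mirror of the patched ipa_ref_use enumerators (not GCC code).  */
enum ipa_ref_use_model
{
  IPA_REF_LOAD,   /* 0 */
  IPA_REF_STORE,  /* 1 */
  IPA_REF_ADDR,   /* 2 */
  IPA_REF_ALIAS,  /* 3: largest value before the patch */
  IPA_REF_CHKP    /* 4: needs a third bit */
};

/* Smallest bit-field width able to hold the values 0 .. COUNT - 1.  */
static unsigned
bits_needed (unsigned count)
{
  unsigned bits = 0;
  while ((1u << bits) < count)
    bits++;
  return bits;
}

/* Old and new widths of the 'use' field.  */
struct ref_use_2bit { unsigned int use : 2; };
struct ref_use_3bit { unsigned int use : 3; };

/* Store V and read it back; an unsigned bit-field keeps V modulo 2^width.  */
static unsigned
round_trip_2bit (unsigned v)
{
  struct ref_use_2bit r;
  r.use = v;
  return r.use;
}

static unsigned
round_trip_3bit (unsigned v)
{
  struct ref_use_3bit r;
  r.use = v;
  return r.use;
}
```

With the old 2-bit field, `IPA_REF_CHKP` would silently read back as `IPA_REF_LOAD`; the matching `"chkp"` entry added to `ipa_ref_use_name` keeps the name table indexable by the new value.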

Re: Strengthen assumption about dynamic type changes (placement new)

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 12:46 PM, Jan Hubicka  wrote:
>>
>> Aggregate copies and memcpy transferring the dynamic type for example.  
>> Being able to tbaa union accesses for another.  And yes, placement new.
>
> I see that if we previously dropped all union accesses to 0, the current scheme
> is a nice improvement.  But it seems to me it may be in use only when one of
> the accesses is through a union.
>
> How does the memcpy case work? I always thought that memcpy does reads & writes
> in set 0, which introduces the necessary conflicts.

Yes, that's possible now (with MEM_REF), previously it was not
and the memory model "fixed" it.

> Similarly can't we make set 0 clobber of the memory retyped by placement new?

We don't have a way to do that, but yes, we could.  But as said, for PODs
you don't even need placement new.  You can just store with a new type.

Richard.

> If the clobber is hidden in external function call, we still have it as a side
> effect of the call. It would have to survive all the way down to RTL...
>
> Honza
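
Richard's point that for PODs "you can just store with a new type" has a direct counterpart in C's effective-type rule for storage with no declared type (C11 6.5p6): a plain store through an lvalue of a new type makes that type the effective type of the storage for the reads that follow. The sketch below, with a hypothetical `reuse_as_float` helper, reuses one `malloc` buffer first as an `int` and then as a `float` without any placement-new analogue. It illustrates the language rule only, not how GCC's alias oracle implements it.

```c
#include <assert.h>
#include <stdlib.h>

/* Reuse one heap buffer as an int and then as a float.  Heap storage has
   no declared type, so each store sets the effective type used by the
   reads that follow it.  */
float
reuse_as_float (float value)
{
  size_t size = sizeof (int) > sizeof (float) ? sizeof (int) : sizeof (float);
  void *buf = malloc (size);
  if (buf == NULL)
    abort ();

  int *ip = buf;
  *ip = 42;             /* effective type is now int */
  assert (*ip == 42);   /* reading back as int is fine */

  float *fp = buf;
  *fp = value;          /* plain store retypes the storage as float */
  float result = *fp;   /* reading as float is fine; reading *ip now would not be */

  free (buf);
  return result;
}
```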

Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Richard Earnshaw

On 22/07/14 15:52, Jiong Wang wrote:
> Currently we generate a sub-optimal epilogue when there is a frame pointer
> and an outgoing area.
>  
> Take gcc.target/aarch64/test_frame_12.c as an example:
>  
> the epilogue for test_12 is:
>  
>.L12:
>sub sp, x29, #16
>ldp x29, x30, [sp, 16]
>add sp, sp, 432
>ret
>  
> while the optimized version should be:
>  
>.L12:
>add sp, x29, 0
>ldp x29, x30, [sp], 416
>ret

Even better would be

ldp x29, x30, [x29]
add sp, sp, #432

since now the two instructions can dual-issue.

R.

>  
> When there is a frame pointer, the prologue sets it up to point to the base
> address of our register save area, so in the epilogue we can use it to skip
> the outgoing area, if any, and therefore always use load write-back for the
> stack adjustment when there is a frame pointer.
> 
> ok to install?
> 
> thanks.
> 
> gcc/
>* config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
>outgoing area size when restore stack_pointer_rtx.
> 
> gcc/testsuite/
>* gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
> 
> 
> 0014-AArch64-GCC-15-20-Optimize-epilogue-when-there-is-fr.patch
> 
> 
> From 9d8cbfa071df773ef5edfed499c0dc90be8eebfa Mon Sep 17 00:00:00 2001
> From: Jiong Wang 
> Date: Tue, 17 Jun 2014 22:19:33 +0100
> Subject: [PATCH 14/19] [AArch64/GCC][15/20] Optimize epilogue when there is
>  frame pointer
> 
> currently we are generating sub-optimal epilogue when there
> is frame pointer and there is outgoing area.
> 
> take gcc.target/aarch64/test_frame_12.c for example:
> 
> the epilogue for test_12 is:
> 
> .L12:
> sub sp, x29, #16
> ldp x29, x30, [sp, 16]
> add sp, sp, 432
> ret
> 
> while the optimized version should be:
> 
> .L12:
> add sp, x29, 0
> ldp x29, x30, [sp], 416
> ret
> 
> when there is frame pointer, it is set up to point to base address of our
> reg save area in prologue, so in epilogue we could utilize this feature,
> and skip outgoing if there is, thus we could always utilize load write-back
> for stack adjustment when there is frame pointer.
> 
> 2014-06-16  Jiong Wang 
>   Marcus Shawcroft  
> gcc/
>   * config/aarch64/aarch64.c (aarch64_expand_epilogue): Don't subtract
>   outgoing area size when restore stack_pointer_rtx.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/test_frame_12.c: Match optimized instruction sequences.
> ---
>  gcc/config/aarch64/aarch64.c |   24 +++---
>  gcc/testsuite/gcc.target/aarch64/test_frame_12.c |4 
>  2 files changed, 11 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 425c865..65a84e8 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2360,7 +2360,8 @@ aarch64_expand_epilogue (bool for_sibcall)
>  {
>insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
>  hard_frame_pointer_rtx,
> -GEN_INT (- fp_offset)));
> +GEN_INT (0)));
> +  offset = offset - fp_offset;
>RTX_FRAME_RELATED_P (insn) = 1;
>/* As SP is set to (FP - fp_offset), according to the rules in
>dwarf2cfi.c:dwarf2out_frame_debug_expr, CFA should be calculated
> @@ -2368,27 +2369,16 @@ aarch64_expand_epilogue (bool for_sibcall)
>cfa_reg = stack_pointer_rtx;
>  }
>  
> -  aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM);
> +  aarch64_restore_callee_saves (DFmode, frame_pointer_needed ? 0 : fp_offset,
> + V0_REGNUM, V31_REGNUM);
>  
>if (offset > 0)
>  {
>if (frame_pointer_needed)
>   {
> -   if (fp_offset)
> - {
> -   aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
> - R30_REGNUM);
> -   insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
> -GEN_INT (offset)));
> -   RTX_FRAME_RELATED_P (insn) = 1;
> - }
> -   else
> - {
> -   aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM,
> - R28_REGNUM);
> -   aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
> -   cfa_reg);
> - }
> +   aarch64_restore_callee_saves (DImode, 0, R0_REGNUM, R28_REGNUM);
> +   aarch64_popwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset,
> +   cfa_reg);
>   }
>else
>   {
> diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
> index 3649527..81f0070 100644
> --- a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
> +++ b/gcc/testsuite/gcc.
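
The three epilogue sequences discussed in this thread can be checked by hand: each must reload x29/x30 from the frame record at x29 and leave SP at the same final value. The sketch below models registers as plain integers, and all names in it are hypothetical. The 16-byte outgoing area and the 432-byte adjustment are read off the listings above. `suggested_epilogue` additionally assumes that SP already equals x29 - 16 when `.L12` is reached (no dynamic stack allocation), because it adjusts SP without recomputing it from x29.

```c
#include <assert.h>
#include <stdint.h>

/* Values read off the test_frame_12.c listings (assumed for this model).  */
#define OUTGOING_SIZE 16u   /* SP sits 16 bytes below x29 at .L12 */
#define TOTAL_ADJUST 432u   /* bytes the original epilogue adds to SP */

struct epilogue_effect
{
  uint64_t load_addr;  /* address x29/x30 are reloaded from */
  uint64_t final_sp;   /* SP just before "ret" */
};

/* sub sp, x29, #16 ; ldp x29, x30, [sp, 16] ; add sp, sp, 432  */
static struct epilogue_effect
original_epilogue (uint64_t x29)
{
  struct epilogue_effect e;
  uint64_t sp = x29 - OUTGOING_SIZE;
  e.load_addr = sp + 16;
  e.final_sp = sp + TOTAL_ADJUST;
  return e;
}

/* add sp, x29, 0 ; ldp x29, x30, [sp], 416  (post-index write-back)  */
static struct epilogue_effect
patched_epilogue (uint64_t x29)
{
  struct epilogue_effect e;
  uint64_t sp = x29 + 0;
  e.load_addr = sp;                                  /* load from SP first ... */
  e.final_sp = sp + (TOTAL_ADJUST - OUTGOING_SIZE);  /* ... then SP += 416 */
  return e;
}

/* ldp x29, x30, [x29] ; add sp, sp, #432
   Neither instruction reads a register the other writes.  This model
   assumes SP == x29 - 16 on entry, since SP is not recomputed from x29.  */
static struct epilogue_effect
suggested_epilogue (uint64_t x29, uint64_t sp)
{
  struct epilogue_effect e;
  assert (sp == x29 - OUTGOING_SIZE);
  e.load_addr = x29;
  e.final_sp = sp + TOTAL_ADJUST;
  return e;
}
```

All three end with SP = x29 + 416. In the patched form the `ldp` still depends on the preceding `add` through SP, while in the suggested form the two instructions are independent, which is what allows them to dual-issue.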

[PATCHv2] Fix vector tests on ARM platforms with disabled unaligned accesses

2014-07-24 Thread Marat Zakirov



On 07/23/2014 06:23 PM, Marat Zakirov wrote:

Hi there!

I made a patch which fixes regressions on ARM platforms with disabled
unaligned accesses. The problem is that the 'arm_vect_no_misalign'
predicate does not check the 'unaligned_access' global variable to determine
whether unaligned accesses to vectors are allowed. This leads to spurious
vect.exp test failures when GCC is configured with
--with-specs=%{!munaligned-access:-mno-unaligned-access}.


The attached patch fixes the ARM predicate and several tests so that they
handle the issue correctly.


The patch was regression-tested on multiple targets (ARM, Thumb-1,
Thumb-2, x86, x86_64), with and without -mno-unaligned-access. Analysis
showed that the patch affects only vect.exp tests, so only vect.exp was
tested.


For x86, x86_64, ARM without -mno-unaligned-access, Thumb-2 without
-mno-unaligned-access, and Thumb-1, no regressions occurred. For
ARM/Thumb-2 with -mno-unaligned-access, the patch fixed most of the
failures but exposed some problems (see attached log) in the current
vect.exp tests:

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61887
2) Some XPASSes due to unexpected loop versioning (e.g.
gcc.dg/vect/pr33804.c).
3) After the predicate fix, some passing tests which require unaligned
vector support became NA (this was expected).


Here is a new version of the patch and the regression log. On the current
trunk, results are slightly different due to patches from Richard Biener
(no UNRESOLVED fails), but some PASS->XPASS regressions still remain (see
attachment):


PASS->XPASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c scan-tree-dump-times vect "vectorized 1 loops" 1
PASS->XPASS: gcc.dg/vect/pr33804.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1

etc.

These XPASSes are due to code versioning: current GCC creates two
versions of the loop, aligned and misaligned. It looks like these tests'
expectations are slightly out of date, at least for ARM.

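Versioning for alignment, as described above, means the vectorizer emits a runtime alignment check that selects between a loop compiled under the assumption of aligned accesses and a fallback for misaligned pointers. A minimal hand-written sketch of that control flow follows. The 16-byte `VEC_ALIGN` and the function names are assumptions for illustration; both loop bodies are plain C, and in GCC's output only the guarded branch would contain the vector loop. The return value just reports which branch ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector alignment, e.g. for a 128-bit vector register.  */
#define VEC_ALIGN 16u

static int
is_vec_aligned (const void *p)
{
  return ((uintptr_t) p % VEC_ALIGN) == 0;
}

/* a[i] += b[i] for 0 <= i < n, versioned on alignment.
   Returns 1 if the "aligned" version ran, 0 for the fallback.  */
int
add_arrays_versioned (int *restrict a, const int *restrict b, size_t n)
{
  if (is_vec_aligned (a) && is_vec_aligned (b))
    {
      /* Aligned version: where the vectorizer would use aligned vector
	 loads and stores.  */
      for (size_t i = 0; i < n; i++)
	a[i] += b[i];
      return 1;
    }

  /* Misaligned fallback: same computation, no alignment assumption.  */
  for (size_t i = 0; i < n; i++)
    a[i] += b[i];
  return 0;
}
```

Because both versions exist at run time, a loop that a test expects to stay unvectorized on a target without misaligned vector access can still be reported as vectorized, which is consistent with the XPASSes above.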




gcc/testsuite/ChangeLog:

2014-07-23  Marat Zakirov  

	* gcc.dg/vect/vect-109.c: Skip predicate added.
	* gcc.dg/vect/vect-93.c: Test check fixed.
	* gcc.dg/vect/bb-slp-10.c: Likewise.
	* lib/target-supports.exp (check_effective_target_arm_vect_no_misalign):
	Check unaligned feature.

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
index a1850ed..0090a4b 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
@@ -49,7 +49,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "unsupported alignment in basic block." 1 "slp2" { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump "unsupported alignment in basic block." 1 "slp2" { xfail vect_element_align } } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" { target vect_element_align } } } */
 /* { dg-final { cleanup-tree-dump "slp2" } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index 854c970..c671175 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { vect_no_align } } */
 /* { dg-require-effective-target vect_int } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
index 65403eb..1065a6e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-93.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
@@ -79,7 +79,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_no_align } } } */
 
 /* in main: */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_no_align } } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } */
 
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index db65ebe..42251e8 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2293,7 +2293,8 @@ proc check_effective_target_arm_little_endian { } {
 proc check_effective_target_arm_vect_no_misalign { } {
 return [check_no_compiler_messages arm_vect_no_misalign assembly {
 	#if !defined(__arm__) \
-	|| (defined(__ARMEL__) \
+	|| (defined(__ARM_FEATURE_UNALIGNED) \
+	&& defined(__ARMEL__) \
 	&& (!defined(__thumb__) || defined(__thumb2__)))
 	#error FOO
 	#endif
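
The patched effective-target check compiles a small program and succeeds only if the compiler prints nothing, so `arm_vect_no_misalign` holds exactly when the `#error` does not fire. The sketch below restates the `#if` as a boolean over a hypothetical `arm_macros` struct, one flag per predefined macro. It shows that, after the patch, a little-endian ARM target without `__ARM_FEATURE_UNALIGNED` (for example one built with -mno-unaligned-access) is reported as having no misaligned vector support.

```c
#include <assert.h>
#include <stdbool.h>

/* Which predefined macros are set for a configuration (hypothetical).  */
struct arm_macros
{
  bool arm;                /* __arm__ */
  bool armel;              /* __ARMEL__ (little-endian) */
  bool thumb;              /* __thumb__ (Thumb-1 or Thumb-2) */
  bool thumb2;             /* __thumb2__ */
  bool feature_unaligned;  /* __ARM_FEATURE_UNALIGNED */
};

/* The patched #if condition: true means "#error FOO" fires.  */
static bool
error_fires (struct arm_macros m)
{
  assert (!m.thumb2 || m.thumb);  /* __thumb2__ implies __thumb__ */
  return !m.arm
	 || (m.feature_unaligned && m.armel && (!m.thumb || m.thumb2));
}

/* check_no_compiler_messages succeeds only for a silent compile.  */
static bool
arm_vect_no_misalign (struct arm_macros m)
{
  return !error_fires (m);
}
```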
dg-cmp-results.sh: Verbosity is 2, Variant is "target-sim"

Older log file: ./gcc/testsuite/gcc/gcc.sum
Test Run By mzakirov on Thu Jul 24 11:47:42 2014
Target is arm-v7a15v5r2-linux-gnueabi
Host   is x86_64-pc-linux-gnu

Newer log file: /home/mzakirov/proj/gcc_unalign/build.arm.cortex-a15/obj/gcc_final/./gcc/testsuite/gcc/gcc.sum
Test Run By mzakirov on Thu Jul 24 13:09:25 2014
Target is arm-v7a15v5r2-linux-gnueabi
Host   is x86_64-pc-linux-gnu

FAIL->NA: gcc.dg/vect/

[PATCH] PR 61876: Do not convert cast + __builtin_round into __builtin_lround unless -fno-math-errno is used

2014-07-24 Thread Kyrill Tkachov


Hi all,

This fixes PR 61876 by not converting round + cast into lround
unless -fno-math-errno is specified.
This is because lround can potentially set errno, whereas round +
cast doesn't, so the transformation isn't universally valid.


This will cause the tests:
gcc.target/aarch64/fcvt_double_long.c
gcc.target/aarch64/fcvt_double_ulong.c

to start passing on aarch64-linux.

aarch64 and x86 bootstrap and regtest look fine.

Ok for trunk?

2014-06-23  Kyrylo Tkachov  

	PR middle-end/61876
	* convert.c (convert_to_integer): Do not convert BUILT_IN_ROUND and cast
	when flag_errno_math is on.

diff --git a/gcc/convert.c b/gcc/convert.c
index 09bc555..8dbf3cb 100644
--- a/gcc/convert.c
+++ b/gcc/convert.c
@@ -456,8 +456,8 @@ convert_to_integer (tree type, tree expr)
 	  break;
 
 	CASE_FLT_FN (BUILT_IN_ROUND):
-	  /* Only convert in ISO C99 mode.  */
-	  if (!targetm.libc_has_function (function_c99_misc))
+	  /* Only convert in ISO C99 mode and with -fno-math-errno.  */
+	  if (!targetm.libc_has_function (function_c99_misc) || flag_errno_math)
 	break;
 	  if (outprec < TYPE_PRECISION (integer_type_node)
 	  || (outprec == TYPE_PRECISION (integer_type_node)
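
The new guard in the BUILT_IN_ROUND case can be read as a two-input predicate. In the sketch below, `may_fold_round_to_lround` is a hypothetical name and its parameters stand in for `targetm.libc_has_function (function_c99_misc)` and `flag_errno_math`; the precision checks that follow in the real function are omitted. With -fmath-errno in effect, the fold is now refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Whether (long) round (x) may be folded to lround (x), per the patched
   guard.  Before the patch only the C99 libc check was made; now
   -fmath-errno also blocks the fold, since lround may set errno where
   round + cast does not.  */
static bool
may_fold_round_to_lround (bool libc_has_c99_misc, bool errno_math)
{
  if (!libc_has_c99_misc || errno_math)
    return false;
  return true;
}
```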

Re: Regimplification enhancements 3/3

2014-07-24 Thread Martin Jambor

Hi,

sorry for the late reply; I've been on vacation and then preparing for
Cauldron.  Anyway...

On Mon, Jun 30, 2014 at 05:13:13PM +0200, Bernd Schmidt wrote:
> On 06/17/2014 04:54 PM, Martin Jambor wrote:
> >Weird... does the following (untested) patch help?
> >
> >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >index 0afa197..747b1b6 100644
> >--- a/gcc/tree-sra.c
> >+++ b/gcc/tree-sra.c
> >@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
> >
> >if (modify_this_stmt
> >|| gimple_has_volatile_ops (*stmt)
> >+  || is_gimple_reg (lhs)
> >+  || is_gimple_reg (rhs)
> >|| contains_vce_or_bfcref_p (rhs)
> >|| contains_vce_or_bfcref_p (lhs)
> >|| stmt_ends_bb_p (*stmt))
> 
> Unfortunately not.
> 
> >It is just a quick thought though.  If it does not, could you post the
> >access trees dumped by -fdump-tree-esra-details or
> >-fdump-tree-sra-details (depending on whether this is early or late
> >SRA)?  Or is it simple to set it up locally?
> 
> Not really. It needs a whole patch tree for the ptx port. I'm
> attaching the last two dump files.
> 
> 
> Bernd
> 

> 
> ;; Function bar (bar, funcdef_no=0, decl_uid=1376, symbol_order=0)
> 
> 
> Pass statistics:
> 
> 
> 
> Pass statistics:
> 
> 
> bar (struct S xD.1375)
> {
>   struct S D.1385;
>   struct S aD.1378;
>   struct S D.1379;
>   struct S D.1381;
> 
> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> ;;prev block 0, next block 1, flags: (NEW, REACHABLE)
> ;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
>   # .MEM_2 = VDEF <.MEM_1(D)>
>   aD.1378 = xD.1375;
>   # .MEM_3 = VDEF <.MEM_2>
>   # USE = nonlocal 
>   # CLB = nonlocal 
>   _6 = fooD.1374 (aD.1378);
>   # .MEM_7 = VDEF <.MEM_3>
>   D.1379 = _6;

This seems to be the statement which has its RHS converted to a
MEM_REF[&_6], am I right?  I wonder whether it is correct input
though, because it looks like it has mismatched types.  The LHS is
clearly an aggregate of type struct S while the RHS is an SSA name,
meaning it cannot be of an aggregate type.  Does this pass gimple
checking?  What creates that statement?

Thanks,

Martin


>   # .MEM_4 = VDEF <.MEM_7>
>   aD.1378 ={v} {CLOBBER};
>   # .MEM_5 = VDEF <.MEM_4>
>   D.1381 = D.1379;
>   # VUSE <.MEM_5>
>   return D.1381;
> ;;succ:   EXIT [100.0%] 
> 
> }
> 
> 

> 
> ;; Function bar (bar, funcdef_no=0, decl_uid=1376, symbol_order=0)
> 
> 
> Pass statistics:
> 
> 
> Candidate (1375): x
> Candidate (1385): D.1385
> Candidate (1378): a
> Candidate (1379): D.1379
> Candidate (1381): D.1381
> Will attempt to totally scalarize D.1379 (UID: 1379): 
> ! Disqualifying D.1385 - No or inhibitingly overlapping accesses.
> ! Disqualifying x - No scalar replacements to be created.
> ! Disqualifying a - No scalar replacements to be created.
> Created a replacement for D.1379 offset: 0, size: 32: SR$2
> 
> Access trees for D.1379 (UID: 1379): 
> access { base = (1379)'D.1379', offset = 0, size = 32, expr = D.1379.len, 
> type = unsigned int, grp_read = 1, grp_write = 1, grp_assignment_read = 1, 
> grp_assignment_write = 1, grp_scalar_read = 1, grp_scalar_write = 0, 
> grp_total_scalarization = 1, grp_hint = 1, grp_covered = 1, 
> grp_unscalarizable_region = 0, grp_unscalarized_data = 0, grp_partial_lhs = 
> 0, grp_to_be_replaced = 1, grp_to_be_debug_replaced = 0, grp_maybe_modified = 
> 0, grp_not_necessarilly_dereferenced = 0
> 
> ! Disqualifying D.1381 - No scalar replacements to be created.
> 
> Pass statistics:
> 
> Scalarized aggregates: 1
> Modified expressions: 2
> Separate LHS and RHS handling: 2
> Scalar replacements created: 1
> 
> 
> Updating SSA:
> Registering new PHI nodes in block #0
> Registering new PHI nodes in block #2
> Updating SSA information for statement SR$2 = MEM[(struct S *)&_6];
> Updating SSA information for statement MEM[(struct S *)&D.1381] = SR$2;
> 
> DFA Statistics for bar
> 
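> ---------------------------------------------------------
>                                  Number of      Memory
>                                  instances      used
> ---------------------------------------------------------
> USE operands                             1          8b
> DEF operands                             2         16b
> VUSE operands                            6         48b
> VDEF operands                            4         32b
> PHI nodes                                0          0b
> PHI arguments                            0          0b
> ---------------------------------------------------------
> Total memory used by DFA/SSA data                 104b
> ---------------------------------------------------------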
> 
> 
> 
> Hash table statistics:
> var_infos:   size 61, 1 elements, 0.00 collision/search ratio
> 
> 
> Symbols to be put in SSA form
> { D.1387 }
> Incremental SSA update started at block: 0
> Number of blocks in CFG: 3
> Number

Re: [AArch64/GCC][14/N] Optimize epilogue when there is frame pointer

2014-07-24 Thread Jiong Wang



On 24/07/14 13:25, Richard Earnshaw wrote:

> On 22/07/14 15:52, Jiong Wang wrote:
>> currently we are generating sub-optimal epilogue when there
>> is frame pointer and there is outgoing area.
>>
>> take gcc.target/aarch64/test_frame_12.c for example:
>>
>> the epilogue for test_12 is:
>>
>> .L12:
>> sub sp, x29, #16
>> ldp x29, x30, [sp, 16]
>> add sp, sp, 432
>> ret
>>
>> while the optimized version should be:
>>
>> .L12:
>> add sp, x29, 0
>> ldp x29, x30, [sp], 416
>> ret
>
> Even better would be
>
> ldp x29, x30, [x29]
> add sp, sp, #432
>
> since now the two instructions can dual-issue.


Thanks for pointing this out.

I will investigate this after the current stack patch set is installed.

-- Jiong




[PATCH][Ping v5] Add patch for debugging compiler ICEs

2014-07-24 Thread Maxim Ostapenko


Ping.

 Original Message 
Subject:[PATCH][Ping v4] Add patch for debugging compiler ICEs
Date:   Fri, 11 Jul 2014 17:44:28 +0400
From:   Maxim Ostapenko 
To: GCC Patches 
CC: 	Yury Gribov , Slava Garbuzov 
, Jakub Jelinek , 
tsaund...@mozilla.com, Maxim Ostapenko 




Ping. Added small changes based on the previous discussion in the community.


 Original Message 
Subject:[PATCH][Ping v3] Add patch for debugging compiler ICEs
Date:   Fri, 04 Jul 2014 18:32:44 +0400
From:   Maxim Ostapenko 
To: GCC Patches 
CC: Yury Gribov , Slava Garbuzov
, Jakub Jelinek ,
tsaund...@mozilla.com, Maxim Ostapenko 



Ping.


 Original Message 
Subject:[PATCH][Ping v2] Add patch for debugging compiler ICEs
Date:   Thu, 26 Jun 2014 19:46:08 +0400
From:   Maxim Ostapenko 
To: GCC Patches 
CC: Yury Gribov , Slava Garbuzov
, Jakub Jelinek ,
tsaund...@mozilla.com, Maxim Ostapenko 



Ping.


 Original Message 
Subject:[PATCH][Ping] Add patch for debugging compiler ICEs
Date:   Wed, 11 Jun 2014 18:15:27 +0400
From:   Maxim Ostapenko 
To: GCC Patches 
CC: Yury Gribov , Slava Garbuzov
, Jakub Jelinek ,
tsaund...@mozilla.com, chefm...@gmail.com



Ping.


 Original Message 
Subject:[PATCH] Add patch for debugging compiler ICEs
Date:   Mon, 02 Jun 2014 19:21:14 +0400
From:   Maxim Ostapenko 
To: GCC Patches 
CC: Yury Gribov , Slava Garbuzov
, Jakub Jelinek ,
tsaund...@mozilla.com, chefm...@gmail.com



Hi,

Years ago there was a discussion
(https://gcc.gnu.org/ml/gcc-patches/2004-01/msg02437.html) about debugging
compiler ICEs that resulted in a patch from Jakub, which dumps useful
information into a temporary file, but for some reason this patch wasn't
applied to trunk.

This is the resurrected patch, with GCC version information added to the
generated repro file.

-Maxim


2014-06-02  Jakub Jelinek  
	Max Ostapenko  

	* diagnostic.c (diagnostic_action_after_output): Exit with
	ICE_EXIT_CODE instead of FATAL_EXIT_CODE.
	* gcc.c (execute): Don't free first string early, but at the end
	of the function.  Call retry_ice if compiler exited with
	ICE_EXIT_CODE.
	(main): Factor out common code.
	(print_configuration): New function.
	(try_fork): Likewise.
	(redirect_stdout_stderr): Likewise.
	(files_equal_p): Likewise.
	(check_repro): Likewise.
	(run_attempt): Likewise.
	(generate_preprocessed_code): Likewise.
	(append_text): Likewise.
	(try_generate_repro): Likewise.

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 0cc7593..67b8c5b 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -492,7 +492,7 @@ diagnostic_action_after_output (diagnostic_context *context,
 	real_abort ();
   diagnostic_finish (context);
   fnotice (stderr, "compilation terminated.\n");
-  exit (FATAL_EXIT_CODE);
+  exit (ICE_EXIT_CODE);
 
 default:
   gcc_unreachable ();
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 6cd08ea..045363c 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -43,6 +43,13 @@ compilation is specified by a string called a "spec".  */
 #include "params.h"
 #include "vec.h"
 #include "filenames.h"
+#ifdef HAVE_UNISTD_H
+#include 
+#endif
+
+#if !(defined (__MSDOS__) || defined (OS2) || defined (VMS))
+#define RETRY_ICE_SUPPORTED
+#endif
 
 /* By default there is no special suffix for target executables.  */
 /* FIXME: when autoconf is fixed, remove the host check - dj */
@@ -253,6 +260,9 @@ static void init_gcc_specs (struct obstack *, const char *, const char *,
 static const char *convert_filename (const char *, int, int);
 #endif
 
+#ifdef RETRY_ICE_SUPPORTED
+static void try_generate_repro (const char *prog, const char **argv);
+#endif
 static const char *getenv_spec_function (int, const char **);
 static const char *if_exists_spec_function (int, const char **);
 static const char *if_exists_else_spec_function (int, const char **);
@@ -2850,7 +2860,7 @@ execute (void)
 	}
 	}
 
-  if (string != commands[i].prog)
+  if (i && string != commands[i].prog)
 	free (CONST_CAST (char *, string));
 }
 
@@ -2903,6 +2913,16 @@ execute (void)
 	else if (WIFEXITED (status)
 		 && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
 	  {
+#ifdef RETRY_ICE_SUPPORTED
+	/* For ICEs in cc1, cc1obj, cc1plus see if it is
+	   reproducible or not.  */
+	const char *p;
+	if (WEXITSTATUS (status) == ICE_EXIT_CODE
+		&& i == 0
+		&& (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
+		&& ! strncmp (p + 1, "cc1", 3))
+	  try_generate_repro (commands[0].prog, commands[0].argv);
+#endif
 	if (WEXITSTATUS (status) > greatest_status)
 	  greatest_status = WEXITSTATUS (status);
 	ret_code = -1;
@@ -2960,6 +2980,9 @@ execute (void)
 	  }
   }
 
+   if (commands[0].argv[0] != commands[0].prog)
+ free (CONST_CAST (char *, commands[0].argv[0]));
+
 return ret_code;
   }
 }
@@ -6151,6 +6174,341 @@ give_switch (int switchnum, int omit_first_word)
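
The retry condition added to `execute ()` combines three tests: the ICE exit status, the first command in the pipeline, and a program basename starting with "cc1". The sketch below restates it in isolation. `TOY_DIR_SEPARATOR` assumes a POSIX host, `TOY_ICE_EXIT_CODE` is an assumed stand-in for GCC's `ICE_EXIT_CODE`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_DIR_SEPARATOR '/'  /* assumption: POSIX-style host */
#define TOY_ICE_EXIT_CODE 4    /* assumed stand-in for ICE_EXIT_CODE */

/* Basename of ARGV0 starts with "cc1" (cc1, cc1plus, cc1obj, ...).
   Like the hunk, this requires ARGV0 to contain a directory separator.  */
static bool
looks_like_cc1 (const char *argv0)
{
  const char *p = strrchr (argv0, TOY_DIR_SEPARATOR);
  return p != NULL && strncmp (p + 1, "cc1", 3) == 0;
}

/* Whether the driver would try to build a reproducer for this failure.  */
static bool
should_try_repro (int exit_status, int command_index, const char *argv0)
{
  return exit_status == TOY_ICE_EXIT_CODE
	 && command_index == 0
	 && looks_like_cc1 (argv0);
}
```

Because the hunk requires `strrchr` to find a separator, a bare `cc1` with no directory component is not matched.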

[AArch64/GCC][17/N] Optimize prologue when there is no frame pointer

2014-07-24 Thread Jiong Wang


Under the new prologue/epilogue code, we can also use store write-back to
optimize the stack adjustment when there is no frame pointer:

* if there is a candidate register pair and the adjustment amount is less
  than 512, we can use aarch64's paired store write-back;
* if there is only a single candidate register and the adjustment amount is
  less than 256, we can use aarch64's single store write-back;
* otherwise, use an explicit subtraction to finish the stack adjustment.

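The three rules above amount to a small decision procedure. The sketch below restates them with hypothetical names; it does not reproduce the patch's `aarch64_expand_prologue` logic. The bounds are consistent with the AArch64 pre-index immediates: a 64-bit STP takes a signed 7-bit offset scaled by 8 (-512 to 504), and a single STR takes a signed 9-bit unscaled offset (-256 to 255).

```c
#include <assert.h>

/* Hypothetical labels for the three strategies.  */
enum sp_adjust_kind
{
  ADJUST_PAIR_WRITEBACK,    /* stp reg1, reg2, [sp, #-adj]! */
  ADJUST_SINGLE_WRITEBACK,  /* str reg, [sp, #-adj]! */
  ADJUST_EXPLICIT_SUB       /* sub sp, sp, #adj, then plain stores */
};

/* CANDIDATE_REGS is the number of callee-saved registers available for
   the write-back store; ADJUSTMENT is the number of bytes allocated.  */
static enum sp_adjust_kind
choose_prologue_adjust (int candidate_regs, long adjustment)
{
  assert (adjustment > 0);
  if (candidate_regs >= 2 && adjustment < 512)
    return ADJUST_PAIR_WRITEBACK;
  if (candidate_regs == 1 && adjustment < 256)
    return ADJUST_SINGLE_WRITEBACK;
  return ADJUST_EXPLICIT_SUB;
}
```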
  Improved testcases:

gcc.target/aarch64/test_frame_1.c
gcc.target/aarch64/test_frame_10.c
gcc.target/aarch64/test_frame_2.c
gcc.target/aarch64/test_frame_4.c
gcc.target/aarch64/test_frame_6.c
gcc.target/aarch64/test_frame_7.c
gcc.target/aarch64/test_frame_8.c
gcc.target/aarch64/test_fp_attribute_1.c

ok for install?

gcc/
  * config/aarch64/aarch64.c (aarch64_pushwb_single_reg): New function.
  (aarch64_expand_prologue): Optimize prologue when !frame_pointer_needed.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
  * gcc.target/aarch64/test_frame_10.c: Likewise.
  * gcc.target/aarch64/test_frame_2.c: Likewise.
  * gcc.target/aarch64/test_frame_4.c: Likewise.
  * gcc.target/aarch64/test_frame_6.c: Likewise.
  * gcc.target/aarch64/test_frame_7.c: Likewise.
  * gcc.target/aarch64/test_frame_8.c: Likewise.
  * gcc.target/aarch64/test_fp_attribute_1.c: Likewise.
From e3ab087747c2f4ddeef0482983b2ebc3bbdc131f Mon Sep 17 00:00:00 2001
From: Jiong Wang 
Date: Tue, 17 Jun 2014 22:24:44 +0100
Subject: [PATCH 17/19] [AArch64/GCC][18/20] Optimize prologue when there is
 no frame pointer

Under new pro/epi code, we could also utilize our store write-back to optimize
stack adjustment when there is no frame pointer.

* if there is candidate reg pair and adjustment amount is less than 512 then we
  could use aarch64's paired store write-back.
* if there is only a single candidate reg and adjustment amount is less than
  256, we could use aarch64's single store write-back.
* otherwise use an explicit subtraction to finish the stack adjustment.

  Improved testcases:

  gcc.target/aarch64/test_frame_1.c
  gcc.target/aarch64/test_frame_10.c
  gcc.target/aarch64/test_frame_2.c
  gcc.target/aarch64/test_frame_4.c
  gcc.target/aarch64/test_frame_6.c
  gcc.target/aarch64/test_frame_7.c
  gcc.target/aarch64/test_frame_8.c
  gcc.target/aarch64/test_fp_attribute_1.c

2014-06-16  Jiong Wang 
	Marcus Shawcroft  

gcc/
  * config/aarch64/aarch64.c (aarch64_pushwb_single_reg): New function.
  (aarch64_expand_prologue): Optimize prologue when !frame_pointer_needed.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
  * gcc.target/aarch64/test_frame_10.c: Likewise.
  * gcc.target/aarch64/test_frame_2.c: Likewise.
  * gcc.target/aarch64/test_frame_4.c: Likewise.
  * gcc.target/aarch64/test_frame_6.c: Likewise.
  * gcc.target/aarch64/test_frame_7.c: Likewise.
  * gcc.target/aarch64/test_frame_8.c: Likewise.
  * gcc.target/aarch64/test_fp_attribute_1.c: Likewise.
---
 gcc/config/aarch64/aarch64.c   |   58 +++-
 .../gcc.target/aarch64/test_fp_attribute_1.c   |2 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_1.c|5 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_10.c   |5 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_2.c|6 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_4.c|5 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_6.c|5 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_7.c|5 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_8.c|5 +-
 9 files changed, 74 insertions(+), 22 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 26d5fba..365fdd4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1927,6 +1927,22 @@ aarch64_next_callee_save (unsigned regno, unsigned limit)
   return regno;
 }
 
+static void
+aarch64_pushwb_single_reg (enum machine_mode mode, unsigned regno,
+			   HOST_WIDE_INT adjustment)
+ {
+  rtx base_rtx = stack_pointer_rtx;
+  rtx insn, reg, mem;
+
+  reg = gen_rtx_REG (mode, regno);
+  mem = gen_rtx_PRE_MODIFY (Pmode, base_rtx,
+			plus_constant (Pmode, base_rtx, -adjustment));
+  mem = gen_rtx_MEM (mode, mem);
+
+  insn = emit_move_insn (mem, reg);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
 static rtx
 aarch64_gen_storewb_pair (enum machine_mode mode, rtx base, rtx reg, rtx reg2,
 			  HOST_WIDE_INT adjustment)
@@ -2276,11 +2292,10 @@ aarch64_expand_prologue (void)
 {
   bool skip_wb = false;
 
-  /* Save the frame pointer and lr if the frame pointer is needed
-	 first.  Make the frame pointer point to the location of the
-	 old frame pointer on the stack.  */
   if (frame_pointer_needed)
 	{
+	  skip_wb = true;
+
 	  if (fp_offset)
 	{
 	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
@@ -2288,12 +2303,11 @@ aarch64_expand_prologue (void)
 	  RTX_FR

[AArch64/GCC][18/N] Optimize epilogue when there is no frame pointer

2014-07-24 Thread Jiong Wang


Like [17/N], this applies the same optimization to the epilogue when !frame_pointer_needed.
 
This is the last patch in this AArch64 stack patch set.
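The epilogue mirrors the prologue: aarch64_pushwb_single_reg builds a PRE_MODIFY address (decrement SP, then store), and aarch64_popwb_single_reg builds a POST_MODIFY address (load, then increment SP). A minimal C simulation of that pairing, assuming a byte-addressed, downward-growing stack; struct sim_stack, push_wb and pop_wb are hypothetical names used only for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a downward-growing stack; SP is a byte index.  */
struct sim_stack
{
  unsigned char mem[1024];
  long sp;
};

/* Pre-index store with write-back, like "str x19, [sp, -N]!":
   SP is decremented first, then VALUE is stored at the new SP.  */
static void
push_wb (struct sim_stack *s, uint64_t value, long adjustment)
{
  s->sp -= adjustment;
  memcpy (&s->mem[s->sp], &value, sizeof value);
}

/* Post-index load with write-back, like "ldr x19, [sp], N":
   the value is loaded from the current SP, then SP is incremented.  */
static uint64_t
pop_wb (struct sim_stack *s, long adjustment)
{
  uint64_t value;
  memcpy (&value, &s->mem[s->sp], sizeof value);
  s->sp += adjustment;
  return value;
}
```

Using the same adjustment on both sides restores SP exactly, which is why no separate "add sp, sp, N" is needed when the restore can use write-back.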


  Improved testcases:

gcc.target/aarch64/test_frame_1.c
gcc.target/aarch64/test_frame_2.c
gcc.target/aarch64/test_frame_4.c
gcc.target/aarch64/test_frame_6.c
gcc.target/aarch64/test_frame_7.c
gcc.target/aarch64/test_frame_8.c
gcc.target/aarch64/test_frame_10.c


ok for install?

thanks.

gcc/
  * config/aarch64/aarch64.c (aarch64_popwb_single_reg): New function.
  (aarch64_expand_epilogue): Optimize epilogue when !frame_pointer_needed.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
  * gcc.target/aarch64/test_frame_2.c: Likewise.
  * gcc.target/aarch64/test_frame_4.c: Likewise.
  * gcc.target/aarch64/test_frame_6.c: Likewise.
  * gcc.target/aarch64/test_frame_7.c: Likewise.
  * gcc.target/aarch64/test_frame_8.c: Likewise.
  * gcc.target/aarch64/test_frame_10.c: Likewise.

[PATCH] Fix PRs 61762 and 61894 - native_encode of pieces

2014-07-24 Thread Richard Biener


The following fixes PR61762 and PR61894 by making folding from
const aggregates use native_encode/native_interpret when reaching
a tcc_constant initializer.  To make this work efficiently,
the patch introduces the ability to encode only part of
a tree expression, namely the bytes in [off, off + len) when off is
not -1 (when off is -1, the old semantics apply).
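The partial-encoding contract can be sketched outside of GCC. This is a simplified model under stated assumptions: it takes a plain unsigned integer instead of a tree, handles little-endian byte order only, and encode_le_partial is a hypothetical name, not a function in fold-const.c. It mirrors the checks in the native_encode_int hunk: off == -1 keeps the all-or-nothing behaviour, otherwise only bytes in [off, off + len) are written and MIN (len, total_bytes - off) is returned.

```c
#include <limits.h>

/* Encode the low TOTAL_BYTES bytes of VALUE (little-endian) into PTR.
   OFF == -1: the whole value must fit into LEN bytes (old semantics).
   Otherwise: write only bytes [OFF, OFF + LEN) of the encoding.
   Returns the number of bytes written, or 0 on failure.  */
static int
encode_le_partial (unsigned long long value, int total_bytes,
                   unsigned char *ptr, int len, int off)
{
  if (total_bytes <= 0 || total_bytes > (int) sizeof value)
    return 0;
  if ((off == -1 && total_bytes > len) || off >= total_bytes)
    return 0;
  if (off == -1)
    off = 0;

  for (int byte = 0; byte < total_bytes; byte++)
    {
      unsigned char b = (unsigned char) (value >> (byte * CHAR_BIT));
      /* Keep only the bytes that fall inside the requested window.  */
      if (byte >= off && byte - off < len)
        ptr[byte - off] = b;
    }
  return len < total_bytes - off ? len : total_bytes - off;
}
```

For example, reading 2 bytes at offset 1 of the 4-byte value 0x04030201 yields { 0x02, 0x03 }, which is the kind of piece a fold of a narrower read from a constant initializer needs.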

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-07-24  Richard Biener  

PR middle-end/61762
PR middle-end/61894
* fold-const.c (native_encode_int): Add and handle offset
parameter to do partial encodings of expr.
(native_encode_fixed): Likewise.
(native_encode_real): Likewise.
(native_encode_complex): Likewise.
(native_encode_vector): Likewise.
(native_encode_string): Likewise.
(native_encode_expr): Likewise.
* fold-const.c (native_encode_expr): Add offset parameter
defaulting to -1.
* gimple-fold.c (fold_string_cst_ctor_reference): Remove.
(fold_ctor_reference): Handle all reads from tcc_constant
ctors.
  
* gcc.dg/pr61762.c: New testcase.
* gcc.dg/fold-cstring.c: Likewise.
* gcc.dg/fold-cvect.c: Likewise.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-07-16 09:48:06.089217352 +0200
--- gcc/fold-const.c2014-07-24 12:54:11.483860342 +0200
*** fold_plusminus_mult_expr (location_t loc
*** 7240,7254 
 upon failure.  */
  
  static int
! native_encode_int (const_tree expr, unsigned char *ptr, int len)
  {
tree type = TREE_TYPE (expr);
int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
int byte, offset, word, words;
unsigned char value;
  
!   if (total_bytes > len)
  return 0;
words = total_bytes / UNITS_PER_WORD;
  
for (byte = 0; byte < total_bytes; byte++)
--- 7240,7257 
 upon failure.  */
  
  static int
! native_encode_int (const_tree expr, unsigned char *ptr, int len, int off)
  {
tree type = TREE_TYPE (expr);
int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
int byte, offset, word, words;
unsigned char value;
  
!   if ((off == -1 && total_bytes > len)
!   || off >= total_bytes)
  return 0;
+   if (off == -1)
+ off = 0;
words = total_bytes / UNITS_PER_WORD;
  
for (byte = 0; byte < total_bytes; byte++)
*** native_encode_int (const_tree expr, unsi
*** 7271,7279 
}
else
offset = BYTES_BIG_ENDIAN ? (total_bytes - 1) - byte : byte;
!   ptr[offset] = value;
  }
!   return total_bytes;
  }
  
  
--- 7274,7284 
}
else
offset = BYTES_BIG_ENDIAN ? (total_bytes - 1) - byte : byte;
!   if (offset >= off
! && offset - off < len)
!   ptr[offset - off] = value;
  }
!   return MIN (len, total_bytes - off);
  }
  
  
*** native_encode_int (const_tree expr, unsi
*** 7283,7289 
 upon failure.  */
  
  static int
! native_encode_fixed (const_tree expr, unsigned char *ptr, int len)
  {
tree type = TREE_TYPE (expr);
enum machine_mode mode = TYPE_MODE (type);
--- 7288,7294 
 upon failure.  */
  
  static int
! native_encode_fixed (const_tree expr, unsigned char *ptr, int len, int off)
  {
tree type = TREE_TYPE (expr);
enum machine_mode mode = TYPE_MODE (type);
*** native_encode_fixed (const_tree expr, un
*** 7303,7309 
value = TREE_FIXED_CST (expr);
i_value = double_int_to_tree (i_type, value.data);
  
!   return native_encode_int (i_value, ptr, len);
  }
  
  
--- 7308,7314 
value = TREE_FIXED_CST (expr);
i_value = double_int_to_tree (i_type, value.data);
  
!   return native_encode_int (i_value, ptr, len, off);
  }
  
  
*** native_encode_fixed (const_tree expr, un
*** 7313,7319 
 upon failure.  */
  
  static int
! native_encode_real (const_tree expr, unsigned char *ptr, int len)
  {
tree type = TREE_TYPE (expr);
int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
--- 7318,7324 
 upon failure.  */
  
  static int
! native_encode_real (const_tree expr, unsigned char *ptr, int len, int off)
  {
tree type = TREE_TYPE (expr);
int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
*** native_encode_real (const_tree expr, uns
*** 7325,7332 
   up to 192 bits.  */
long tmp[6];
  
!   if (total_bytes > len)
  return 0;
words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
  
real_to_target (tmp, TREE_REAL_CST_PTR (expr), TYPE_MODE (type));
--- 7330,7340 
   up to 192 bits.  */
long tmp[6];
  
!   if ((off == -1 && total_bytes > len)
!   || off >= total_bytes)
  return 0;
+   if (off == -1)
+ off = 0;
words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
  
real_to_target (tmp, TREE_REAL_CST_PTR (expr), TYPE_MODE (type));
*** native_encode_real (const_tree expr,

Re: [AArch64/GCC][18/N] Optimize epilogue when there is no frame pointer

2014-07-24 Thread Jiong Wang


Sorry, attaching the patch.

ok for install?

thanks.

gcc/
   * config/aarch64/aarch64.c (aarch64_popwb_single_reg): New function.
   (aarch64_expand_epilogue): Optimize epilogue when !frame_pointer_needed.

gcc/testsuite/
   * gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
   * gcc.target/aarch64/test_frame_2.c: Likewise.
   * gcc.target/aarch64/test_frame_4.c: Likewise.
   * gcc.target/aarch64/test_frame_6.c: Likewise.
   * gcc.target/aarch64/test_frame_7.c: Likewise.
   * gcc.target/aarch64/test_frame_8.c: Likewise.
   * gcc.target/aarch64/test_frame_10.c: Likewise.


On 24/07/14 13:46, Jiong Wang wrote:

Like [17/N], we do same optimization on epilogue for !frame_pointer_needed.
   
this is the last of this aarch64 stack patch set.


Improved testcases:
  
  gcc.target/aarch64/test_frame_1.c

  gcc.target/aarch64/test_frame_2.c
  gcc.target/aarch64/test_frame_4.c
  gcc.target/aarch64/test_frame_6.c
  gcc.target/aarch64/test_frame_7.c
  gcc.target/aarch64/test_frame_8.c
  gcc.target/aarch64/test_frame_10.c


ok for install?

thanks.

gcc/
* config/aarch64/aarch64.c (aarch64_popwb_single_reg): New function.
(aarch64_expand_epilogue): Optimize epilogue when !frame_pointer_needed.

gcc/testsuite/
* gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
* gcc.target/aarch64/test_frame_2.c: Likewise.
* gcc.target/aarch64/test_frame_4.c: Likewise.
* gcc.target/aarch64/test_frame_6.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_8.c: Likewise.
* gcc.target/aarch64/test_frame_10.c: Likewise.





From d1dc1b7e1c25b37aecf28967b4368422a9378d56 Mon Sep 17 00:00:00 2001
From: Jiong Wang 
Date: Tue, 17 Jun 2014 22:27:13 +0100
Subject: [PATCH 18/19] [AArch64/GCC][19/20] Optimize epilogue when there is
 no frame pointer

Like the previous patch, this applies the same optimization to the epilogue
when !frame_pointer_needed.

  Improved testcases:

  gcc.target/aarch64/test_frame_1.c
  gcc.target/aarch64/test_frame_2.c
  gcc.target/aarch64/test_frame_4.c
  gcc.target/aarch64/test_frame_6.c
  gcc.target/aarch64/test_frame_7.c
  gcc.target/aarch64/test_frame_8.c
  gcc.target/aarch64/test_frame_10.c

2014-06-16  Jiong Wang 
	Marcus Shawcroft  

gcc/
  * config/aarch64/aarch64.c (aarch64_popwb_single_reg): New function.
  (aarch64_expand_epilogue): Optimize epilogue when !frame_pointer_needed.

gcc/testsuite/
  * gcc.target/aarch64/test_frame_1.c: Match optimized instruction sequences.
  * gcc.target/aarch64/test_frame_2.c: Likewise.
  * gcc.target/aarch64/test_frame_4.c: Likewise.
  * gcc.target/aarch64/test_frame_6.c: Likewise.
  * gcc.target/aarch64/test_frame_7.c: Likewise.
  * gcc.target/aarch64/test_frame_8.c: Likewise.
  * gcc.target/aarch64/test_frame_10.c: Likewise.
---
 gcc/config/aarch64/aarch64.c |   55 ++
 gcc/testsuite/gcc.target/aarch64/test_frame_1.c  |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_10.c |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_2.c  |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_4.c  |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_6.c  |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_7.c  |2 +
 gcc/testsuite/gcc.target/aarch64/test_frame_8.c  |2 +
 8 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 365fdd4..b884685 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1943,6 +1943,23 @@ aarch64_pushwb_single_reg (enum machine_mode mode, unsigned regno,
   RTX_FRAME_RELATED_P (insn) = 1;
 }
 
+static void
+aarch64_popwb_single_reg (enum machine_mode mode, unsigned regno,
+			  HOST_WIDE_INT adjustment)
+{
+  rtx base_rtx = stack_pointer_rtx;
+  rtx insn, reg, mem;
+
+  reg = gen_rtx_REG (mode, regno);
+  mem = gen_rtx_POST_MODIFY (Pmode, base_rtx,
+			 plus_constant (Pmode, base_rtx, adjustment));
+  mem = gen_rtx_MEM (mode, mem);
+
+  insn = emit_move_insn (reg, mem);
+  add_reg_note (insn, REG_CFA_RESTORE, reg);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
 static rtx
 aarch64_gen_storewb_pair (enum machine_mode mode, rtx base, rtx reg, rtx reg2,
 			  HOST_WIDE_INT adjustment)
@@ -2381,7 +2398,6 @@ aarch64_expand_epilogue (bool for_sibcall)
   HOST_WIDE_INT fp_offset;
   rtx insn;
   rtx cfa_reg;
-  bool skip_wb = false;
 
   aarch64_layout_frame ();
 
@@ -2429,22 +2445,41 @@ aarch64_expand_epilogue (bool for_sibcall)
   cfa_reg = stack_pointer_rtx;
 }
 
-  aarch64_restore_callee_saves (DFmode, frame_pointer_needed ? 0 : fp_offset,
-V0_REGNUM, V31_REGNUM, skip_wb);
-
   if (offset > 0)
 {
+  unsigned reg1 = cfun->machine->frame.wb_candidate1;
+  unsigned reg2 = cfun->machine->frame.wb_candidate2;
+  bool skip_wb = true;
+
   if (frame_pointer_needed)
+	fp_offset = 0;
+  else if (fp_offset
+	   || reg1 == FIRST_PSEUDO_REGISTER

Re: [PATCH] PR 61876: Do not convert cast + __builtin_round into __builtin_lround unless -fno-math-errno is used

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 2:36 PM, Kyrill Tkachov  wrote:
> Hi all,
>
> This fixes PR 61876 by not converting the round + cast into an lround unless
> -fno-math-errno is specified.
> This is because lround can potentially set math errno whereas round + cast
> doesn't, so the transformation isn't universally valid.
>
> This will cause the tests:
> gcc.target/aarch64/fcvt_double_long.c
> gcc.target/aarch64/fcvt_double_ulong.c
>
> to start passing on aarch64-linux.
>
> aarch64 and x86 bootstrap and regtest looks fine.
>
> Ok for trunk?

Ok.  Does this really only apply to the round() case and not to all
the others (floor, ceil, rint) as well?

Thanks,
Richard.

> 2014-06-23  Kyrylo Tkachov  
>
> PR middle-end/61876
> * convert.c (convert_to_integer): Do not convert BUILT_IN_ROUND and cast
> when flag_errno_math is on.

Re: Regimplification enhancements 3/3

2014-07-24 Thread Richard Biener

On Thu, Jul 24, 2014 at 2:38 PM, Martin Jambor  wrote:
> Hi,
>
> sorry for late reply, I've been on vacation and then preparing for
> Cauldron.  Anyway...
>
> On Mon, Jun 30, 2014 at 05:13:13PM +0200, Bernd Schmidt wrote:
>> On 06/17/2014 04:54 PM, Martin Jambor wrote:
>> >Weird... does the following (untested) patch help?
>> >
>> >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> >index 0afa197..747b1b6 100644
>> >--- a/gcc/tree-sra.c
>> >+++ b/gcc/tree-sra.c
>> >@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
>> >*gsi)
>> >
>> >if (modify_this_stmt
>> >|| gimple_has_volatile_ops (*stmt)
>> >+  || is_gimple_reg (lhs)
>> >+  || is_gimple_reg (rhs)
>> >|| contains_vce_or_bfcref_p (rhs)
>> >|| contains_vce_or_bfcref_p (lhs)
>> >|| stmt_ends_bb_p (*stmt))
>>
>> Unfortunately not.
>>
>> >It is just a quick thought though.  If it does not, could you post the
>> >access trees dumped by -fdump-tree-esra-details or
>> >-fdump-tree-sra-details (depending on whether this is early or late
>> >SRA)?  Or is it simple to set it up locally?
>>
>> Not really. It needs a whole patch tree for the ptx port. I'm
>> attaching the last two dump files.
>>
>>
>> Bernd
>>
>
>>
>> ;; Function bar (bar, funcdef_no=0, decl_uid=1376, symbol_order=0)
>>
>>
>> Pass statistics:
>> 
>>
>>
>> Pass statistics:
>> 
>>
>> bar (struct S xD.1375)
>> {
>>   struct S D.1385;
>>   struct S aD.1378;
>>   struct S D.1379;
>>   struct S D.1381;
>>
>> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
>> ;;prev block 0, next block 1, flags: (NEW, REACHABLE)
>> ;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
>>   # .MEM_2 = VDEF <.MEM_1(D)>
>>   aD.1378 = xD.1375;
>>   # .MEM_3 = VDEF <.MEM_2>
>>   # USE = nonlocal
>>   # CLB = nonlocal
>>   _6 = fooD.1374 (aD.1378);
>>   # .MEM_7 = VDEF <.MEM_3>
>>   D.1379 = _6;
>
> This seems to be the statement which has its RHS converted to to a
> MEM_REF[&_6], am I right?  I wonder whether it is correct input
> though, because it looks like it has mismatched types.  The LHS is
> clearly an aggregate of type struct S while the RHS is an SSA name,
> meaning it cannot be of an aggregate type.  Does this pass gimple
> checking?  What creates that statement?

Yeah, looks clearly invalid.  MEM_REF[&_6] is not valid even if
the types were correct (taking the address of an SSA name).

Richard.

> Thanks,
>
> Martin
>
>
>>   # .MEM_4 = VDEF <.MEM_7>
>>   aD.1378 ={v} {CLOBBER};
>>   # .MEM_5 = VDEF <.MEM_4>
>>   D.1381 = D.1379;
>>   # VUSE <.MEM_5>
>>   return D.1381;
>> ;;succ:   EXIT [100.0%]
>>
>> }
>>
>>
>
>>
>> ;; Function bar (bar, funcdef_no=0, decl_uid=1376, symbol_order=0)
>>
>>
>> Pass statistics:
>> 
>>
>> Candidate (1375): x
>> Candidate (1385): D.1385
>> Candidate (1378): a
>> Candidate (1379): D.1379
>> Candidate (1381): D.1381
>> Will attempt to totally scalarize D.1379 (UID: 1379):
>> ! Disqualifying D.1385 - No or inhibitingly overlapping accesses.
>> ! Disqualifying x - No scalar replacements to be created.
>> ! Disqualifying a - No scalar replacements to be created.
>> Created a replacement for D.1379 offset: 0, size: 32: SR$2
>>
>> Access trees for D.1379 (UID: 1379):
>> access { base = (1379)'D.1379', offset = 0, size = 32, expr = D.1379.len, 
>> type = unsigned int, grp_read = 1, grp_write = 1, grp_assignment_read = 1, 
>> grp_assignment_write = 1, grp_scalar_read = 1, grp_scalar_write = 0, 
>> grp_total_scalarization = 1, grp_hint = 1, grp_covered = 1, 
>> grp_unscalarizable_region = 0, grp_unscalarized_data = 0, grp_partial_lhs = 
>> 0, grp_to_be_replaced = 1, grp_to_be_debug_replaced = 0, grp_maybe_modified 
>> = 0, grp_not_necessarilly_dereferenced = 0
>>
>> ! Disqualifying D.1381 - No scalar replacements to be created.
>>
>> Pass statistics:
>> 
>> Scalarized aggregates: 1
>> Modified expressions: 2
>> Separate LHS and RHS handling: 2
>> Scalar replacements created: 1
>>
>>
>> Updating SSA:
>> Registering new PHI nodes in block #0
>> Registering new PHI nodes in block #2
>> Updating SSA information for statement SR$2 = MEM[(struct S *)&_6];
>> Updating SSA information for statement MEM[(struct S *)&D.1381] = SR$2;
>>
>> DFA Statistics for bar
>>
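>> ---------------------------------------------------------
>>                                  Number of        Memory
>>                                  instances          used
>> ---------------------------------------------------------
>> USE operands                             1            8b
>> DEF operands                             2           16b
>> VUSE operands                            6           48b
>> VDEF operands                            4           32b
>> PHI nodes                                0            0b
>> PHI arguments                            0            0b
>> ---------------------------------------------------------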
>> Total memory used by DFA/SSA data

Re: update address taken: don't drop clobbers

2014-07-24 Thread Richard Biener

On Sat, Jul 12, 2014 at 8:15 AM, Marc Glisse  wrote:
> On Thu, 10 Jul 2014, Richard Biener wrote:
>
>>> --- gcc/tree-into-ssa.c (revision 212109)
>>> +++ gcc/tree-into-ssa.c (working copy)
>>> @@ -1831,26 +1831,38 @@ maybe_register_def (def_operand_p def_p,
>>>  {
>>>tree def = DEF_FROM_PTR (def_p);
>>>tree sym = DECL_P (def) ? def : SSA_NAME_VAR (def);
>>>
>>>/* If DEF is a naked symbol that needs renaming, create a new
>>>   name for it.  */
>>>if (marked_for_renaming (sym))
>>>  {
>>>if (DECL_P (def))
>>> {
>>> - tree tracked_var;
>>> -
>>> - def = make_ssa_name (def, stmt);
>>> + if (gimple_clobber_p (stmt) && is_gimple_reg (sym))
>>
>>
>> sym should always be a gimple reg here (it's marked for renaming).
>>
>>> +   {
>>> + /* Replace clobber stmts with a default def.  Create a new
>>> +variable so we don't later think we must coalesce, which
>>> would
>>> +fail with some ada abnormal PHIs.  Still, we try to keep
>>> a
>>> +similar name so error messages make sense.  */
>>> + unlink_stmt_vdef (stmt);
>>
>>
>> I think that's redundant with gsi_replace (note that using gsi_replace
>> looks dangerous here as it calls update_stmt during SSA rewrite...
>> that might open a can of worms).
>
>
> IIRC it was failing without unlink_stmt_vdef (maybe that was in a different
> version of the patch not using gsi_replace, but I don't think so). I was
> hoping that a clobber had little enough effects that update_stmt was
> unlikely to break anything. Anyway it doesn't matter if I use your
> suggestion below.

The important part is that it not ICE (of course) and that it doesn't
trigger useless SSA renaming of .MEM - gsi_replace does
update_stmt which does the unlink_stmt_vdef for you if the .MEM
is no longer necessary.

Richard.

>
>>> + gsi_replace (&gsi, gimple_build_nop (), true);
>>> + tree id = DECL_NAME (sym);
>>> + const char* name = id ? IDENTIFIER_POINTER (id) : 0;
>>> + tree newvar = create_tmp_var (TREE_TYPE (sym), name);
>>> + def = get_or_create_ssa_default_def (cfun, newvar);
>>
>>
>> So - can't you simply do
>>
>>gimple_assign_set_rhs_from_tree (&gsi,
>> get_or_create_ssa_default_def (cfun, sym));
>>
>> ?  Thus replace x = CLOBBER; with x_3 = x_2(D);
>>
>>> +   }
>>> + else
>>
>>
>> and of course still rewrite the DEF then.  IMHO the copy-propagation
>> you do is premature optimization.
>
>
> I'll try that. I was trying to remain as close as possible to what you wrote
> in rewrite_stmt:
>
> if (gimple_clobber_p (stmt)
> && is_gimple_reg (var))
>   {
> /* If we rewrite a DECL into SSA form then drop its
>clobber stmts and replace uses with a new default def.  */
> gcc_checking_assert (TREE_CODE (var) == VAR_DECL
>  && !gimple_vdef (stmt));
> gsi_replace (si, gimple_build_nop (), true);
> register_new_def (get_or_create_ssa_default_def (cfun, var),
> var);
> break;
>   }
>
>
> I'll be away next week, but I'll re-read all the replies carefully when I
> come back.
>
> --
> Marc Glisse

Re: [PATCH] PR 61876: Do not convert cast + __builtin_round into __builtin_lround unless -fno-math-errno is used

2014-07-24 Thread Kyrill Tkachov



On 24/07/14 13:51, Richard Biener wrote:

On Thu, Jul 24, 2014 at 2:36 PM, Kyrill Tkachov  wrote:

Hi all,

This fixes PR 61876 by not converting the round + cast into an lround unless
-fno-math-errno is specified.
This is because lround can potentially set math errno whereas round + cast
doesn't, so the transformation isn't universally valid.

This will cause the tests:
gcc.target/aarch64/fcvt_double_long.c
gcc.target/aarch64/fcvt_double_ulong.c

to start passing on aarch64-linux.

aarch64 and x86 bootstrap and regtest looks fine.

Ok for trunk?

Ok.  Does this really only apply to the round() case and not to all
the others (floor, ceil, rint) as well?

Thanks for the review,

From what I understand, only lround and lrint are defined in the C standard.
There is no lfloor, for example; the lfloor builtin in GCC is just an
extension.

Do we have defined semantics for lfloor somewhere?

The lrint case seems to be similar to the lround case (the documentation
says the same thing in the Errors section). I can whip up a patch to
guard that transformation as well...


Kyrill



Thanks,
Richard.


2014-06-23  Kyrylo Tkachov  

 PR middle-end/61876
 * convert.c (convert_to_integer): Do not convert BUILT_IN_ROUND and cast
 when flag_errno_math is on.

Re: [PATCH, testcase, committed] Exit with zero status from g++.dg/ipa/pr61160-3.C

2014-07-24 Thread Martin Jambor

On Tue, Jul 22, 2014 at 06:31:32PM +0200, Martin Jambor wrote:
> Hi,
> 
> in order to avoid spurious testsuite failures, I've checked in the
> following obvious patch so that the testcase always returns zero.  I
> have verified it still properly tests for non-existence of the bug.

...and here goes the same thing for g++.dg/ipa/pr61160-2.C, which I
forgot is also a run-time test.  I will commit these changes to both
testcases on the 4.9 branch shortly.  Hopefully that will really be it.

Martin

2014-07-24  Martin Jambor  

   PR ipa/61160
   * g++.dg/ipa/pr61160-2.C (main): Always return zero.

Index: g++.dg/ipa/pr61160-2.C
===
--- g++.dg/ipa/pr61160-2.C  (revision 212986)
+++ g++.dg/ipa/pr61160-2.C  (working copy)
@@ -39,5 +39,6 @@ void *test (MMixin & anExample)
 int main ()
 {
   CExample c;
-  return (test (c) != &c);
+  test (c);
+  return 0;
 }

Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-07-24 Thread Thomas Schwinge

Hi Cesar!

On Wed, 23 Jul 2014 17:42:32 -0700, Cesar Philippidis  
wrote:
> On 07/11/2014 03:29 AM, Jakub Jelinek wrote:
> > On Fri, Jul 11, 2014 at 12:11:10PM +0200, Thomas Schwinge wrote:
> >> To avoid duplication of work: with Jakub's Fortran OpenMP 4 target
> >> changes recently committed to trunk, and now merged into gomp-4_0-branch,
> >> I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4
> >> target changes removed, and TODO markers added to integrate into that.
> > 
> > Resolving the TODO markers would be nice, indeed.
> 
> This patch has the openacc data clauses use the new openmp maps. In the
> process of doing so, I removed a lot of the old OMP_LIST_ enums and
> added a few OMP_MAP enums to match what the c frontend currently supports.

Thanks!

> Thomas, is this OK for gomp-4_0-branch? There are no new regressions.

A few comments.  Also copying Tobias in case he has any additional
comments on the Fortran front end changes.

OMP_LIST_DEVICEPTR remains to be converted, which can be done as a later
follow-up patch.

> 2014-07-23  Cesar Philippidis  
>   Thomas Schwinge  
>   Ilmir Usmanov  
> 
>   gcc/fortran/
>   * gfortran.h (gfc_omp_map_op): Add OMP_MAP_TOFROM,

OMP_MAP_TOFROM is already present:

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -,7 +,13 @@ typedef enum
>OMP_MAP_ALLOC,
>OMP_MAP_TO,
>OMP_MAP_FROM,
> -  OMP_MAP_TOFROM
> +  OMP_MAP_TOFROM,
> +  OMP_MAP_FORCE_ALLOC,
> +[...]

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -448,18 +448,177 @@ match_oacc_clause_gang (gfc_omp_clauses *cp)
>  #define OMP_CLAUSE_DEVICE_RESIDENT   (1ULL << 51)
>  #define OMP_CLAUSE_HOST  (1ULL << 52)
>  #define OMP_CLAUSE_OACC_DEVICE   (1ULL << 53)
> -#define OMP_CLAUSE_OACC_COPYIN   (1ULL << 54)

> +/* Helper function for OpenACC and OpenMP clauses involving memory
> +   mapping.  */
> +
> +static bool
> +gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
> +{
> +  gfc_omp_namelist **head = NULL;
> +  if (gfc_match_omp_variable_list ("", list, false, NULL, &head, true)
> +  == MATCH_YES)
> +{
> +  gfc_omp_namelist *n;
> +  for (n = *head; n; n = n->next)
> + n->u.map_op = map_op;
> +  return true;
> +}
> +
> +  return false;
> +}
> +
> +/* Match OpenACC data clauses.  */
> +
> +static bool
> +gfc_match_oacc_data_clauses (unsigned long long mask, gfc_omp_clauses *c)
> +{
> +  if ((mask & OMP_CLAUSE_COPYIN)
> +[...]
> +}
> +
> +/* Match OpenMP data clauses.  */
> +
> +static bool
> +gfc_match_omp_data_clauses (unsigned long long mask, gfc_omp_clauses *c)
> +{
> +  if ((mask & OMP_CLAUSE_COPYIN)
> +  && gfc_match_omp_variable_list ("copyin (",
> +   &c->lists[OMP_LIST_COPYIN], true)
> +  == MATCH_YES)
> +return true;
> +  if ((mask & OMP_CLAUSE_COPY)
> +  && gfc_match_omp_variable_list ("copy (",
> +   &c->lists[OMP_LIST_COPY], true)
> +  == MATCH_YES)
> +return true;

It's a bit surprising to see these two (and only these two) handled here
under the moniker OpenMP data clauses.

> +  if (mask & OMP_CLAUSE_COPYOUT)
> +gfc_error ("Invalid OpenMP clause COPYOUT");
> +  if (mask & OMP_CLAUSE_CREATE)
> +gfc_error ("Invalid OpenMP clause CREATE");
> +  if (mask & OMP_CLAUSE_DELETE)
> +gfc_error ("Invalid OpenMP clause DELETE");
> +  if (mask & OMP_CLAUSE_PRESENT)
> +gfc_error ("Invalid OpenMP clause PRESENT");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPY)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPY");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPY)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPY");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPYIN)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPYIN");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPYIN)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPYIN");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPYOUT)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPYOUT");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_COPYOUT)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_COPYOUT");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_CREATE)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_CREATE");
> +  if (mask & OMP_CLAUSE_PRESENT_OR_CREATE)
> +gfc_error ("Invalid OpenMP clause PRESENT_OR_CREATE");

Aren't all these in fact unreachable?

> +
> +  return false;
> +}

I'd suggest to continue to handle all the data clauses...

>  
>  /* Match OpenMP and OpenACC directive clauses. MASK is a bitmask of
> clauses that are allowed for a particular directive.  */
>  
>  static match
>  gfc_match_omp_clauses (gfc_omp_clauses **cp, unsigned long long mask,
> -bool first = true, bool needs_space = true)
> +bool first = true, bool needs_space = true,
> +bool openacc = false)
>  {
>gfc_omp_clauses *c = gfc_

Re: [AArch64] Make sure start callee-save offset for D registers aligned

2014-07-24 Thread Maxim Kuvyrkov

On Jun 5, 2014, at 3:04 PM, Jiong Wang  wrote:

> For AArch64, there may have been an odd num core registers need to be saved.
> 
> This small patch ensure we remain 16 byte aligned for subsequent STP writes 
> of D registers.
> 
> OK for trunk?

Hi Jiong,

How did you test the patch?  You need to run GCC testsuites, and, ideally, 
bootstrap on aarch64-linux-gnu.

Is this patch to fix a correctness problem?  Can you provide a testcase that 
breaks without this patch?

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index aada704..c4abf1e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1793,6 +1793,10 @@ aarch64_layout_frame (void)
>   offset += UNITS_PER_WORD;
>}
>  
> +  /* Align offset to 16-bytes.
> + There may have been an odd num core registers. Ensure we remain
> + 16 byte aligned for subsequent STP writes of D registers.  */
> +  offset = AARCH64_ROUND_UP (offset, 2 * UNITS_PER_WORD);
>for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
>  if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
>{

It seems that we need to align the offset to 16 bytes only when the V0_REGNUM 
loop stores a register pair, not a single register.  Since a non-pair store 
occurs only when the V0_REGNUM loop saves a single register (a comparatively 
rare case), it should be fine to align the offset unconditionally.
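The effect of the one-line change can be sketched in isolation. This is a simplified model, assuming each saved core register takes one 8-byte slot and the save area starts at offset 0; round_up and d_reg_save_start are hypothetical helpers, with round_up written in the usual power-of-two form that AARCH64_ROUND_UP is used for here.

```c
/* Stands in for GCC's UNITS_PER_WORD, which is 8 on AArch64.  */
#define SKETCH_UNITS_PER_WORD 8

/* Round X up to a multiple of ALIGNMENT (a power of two).  */
static long
round_up (long x, long alignment)
{
  return (x + alignment - 1) & ~(alignment - 1);
}

/* Offset at which the D-register saves start after NUM_CORE core
   registers have been given 8-byte slots: always 16-byte aligned, so
   a following STP of two D registers stays aligned.  */
static long
d_reg_save_start (int num_core)
{
  long offset = (long) num_core * SKETCH_UNITS_PER_WORD;
  return round_up (offset, 2 * SKETCH_UNITS_PER_WORD);
}
```

With an odd number of core registers (for example 3, ending at offset 24) the D-register area starts at 32 instead of 24; with an even number the rounding is a no-op, so aligning unconditionally costs at most one 8-byte slot.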

The patch seems OK to me (provided no regressions on the testsuite), but you 
need to get an ACK from an official maintainer.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH, Pointer Bounds Checker 9/x] Cgraph extension

2014-07-24 Thread Ilya Enkovich

2014-07-24 15:38 GMT+04:00 Jan Hubicka :
> Hello,
>
>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index a6a51cf..5e702a7 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -191,6 +191,7 @@ struct GTY(()) cgraph_thunk_info {
>>tree alias;
>>bool this_adjusting;
>>bool virtual_offset_p;
>> +  bool add_pointer_bounds_args;
>>/* Set to true when alias node is thunk.  */
>>bool thunk_p;
>>  };
>> @@ -373,6 +374,13 @@ public:
>>struct cgraph_node *prev_sibling_clone;
>>struct cgraph_node *clones;
>>struct cgraph_node *clone_of;
>> +  /* If instrumentation_clone is 1 then instrumented_version points
>> + to the original function used to make instrumented version.
>> + Otherwise points to instrumented version of the function.  */
>> +  struct cgraph_node *instrumented_version;
>> +  /* If instrumentation_clone is 1 then orig_decl is the original
>> + function declaration.  */
>> +  tree orig_decl;
>
> So the patch is introducing yet another notion of clone (in addition to 
> existing virtual clones
> and function versions used by ifun) and you add a new type of reference 
> (CHKP) to link the
> original and the clone.
>
> Why do you need to link things in 3 different ways? (i.e. 
> instrumented_version points to the
> same place as CHKP and as orig_decl, right?).

The CHKP reference is required to keep the reachability algorithms working
correctly and to avoid removing required instrumented nodes.  References
are rebuilt from time to time, and instrumented_version is used to rebuild
the CHKP reference.  orig_decl is required because the original function node
may be removed as unreachable.
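The interplay between instrumented_version and the CHKP reference can be modelled in a few lines. This is a heavily simplified, hypothetical sketch: struct node, record_reference, rebuild_references and mark_reachable are invented for illustration and only echo the rule in the rebuild_cgraph_edges/cgraph_rebuild_references hunks below, where the reference is re-recorded from the original node to its instrumented version, never the reverse.

```c
#include <stdbool.h>

#define MAX_REFS 4

/* Hypothetical stand-in for cgraph_node.  */
struct node
{
  struct node *instrumented_version; /* the other half of the pair */
  bool instrumentation_clone;        /* true for the instrumented copy */
  struct node *refs[MAX_REFS];       /* outgoing references */
  int n_refs;
  bool reachable;
};

static void
record_reference (struct node *from, struct node *to)
{
  if (from->n_refs < MAX_REFS)
    from->refs[from->n_refs++] = to;
}

/* References are dropped and rebuilt; the persistent instrumented_version
   field is what allows the CHKP-style reference to be re-created.  (Real
   rebuilding also re-scans statements; that part is omitted here.)  */
static void
rebuild_references (struct node *n)
{
  n->n_refs = 0;
  if (n->instrumented_version && !n->instrumentation_clone)
    record_reference (n, n->instrumented_version);
}

/* Mark everything reachable from N by following references.  */
static void
mark_reachable (struct node *n)
{
  if (n->reachable)
    return;
  n->reachable = true;
  for (int i = 0; i < n->n_refs; i++)
    mark_reachable (n->refs[i]);
}
```

Without the re-created reference, walking from the original node would never reach the clone, and a reachability-based cleanup would remove it.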

>
> I would preffer if this can be put into the existing clone mechanizm. The 
> virtual clones can
> have quite generic transformations done on them and the do perform all the 
> necessary links
> back and forth.

I suppose virtual clones are useful when we may delay their
materialization, i.e. for IPA passes.  For the checker, instrumentation
follows clone creation almost immediately.
Instrumentation is a GIMPLE pass and we have to materialize clones to
have bodies to instrument.  After materialization there is no link to
the original node anymore, which means we would still require all the new
fields in the cgraph_node structure.

>
> I will look into the rest of changes, is there some overview?

I have a short overview of how it works on a wiki page:
https://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler#Instrumentation_clones

Thanks,
Ilya

>
> Honza
>
>
>>/* For functions with many calls sites it holds map from call expression
>>   to the edge to speed up cgraph_edge function.  */
>>htab_t GTY((param_is (struct cgraph_edge))) call_site_hash;
>> @@ -433,6 +441,9 @@ public:
>>/* True if this decl calls a COMDAT-local function.  This is set up in
>>   compute_inline_parameters and inline_call.  */
>>unsigned calls_comdat_local : 1;
>> +  /* True when function is clone created for Pointer Bounds Checker
>> + instrumentation.  */
>> +  unsigned instrumentation_clone : 1;
>>  };
>>
>>
>> @@ -1412,6 +1423,8 @@ symtab_alias_target (symtab_node *n)
>>  {
>>struct ipa_ref *ref;
>>ipa_ref_list_reference_iterate (&n->ref_list, 0, ref);
>> +  if (ref->use == IPA_REF_CHKP)
>> +ipa_ref_list_reference_iterate (&n->ref_list, 1, ref);
>>gcc_checking_assert (ref->use == IPA_REF_ALIAS);
>>return ref->referred;
>>  }
>> diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
>> index 19961e2..a2b2106 100644
>> --- a/gcc/cgraphbuild.c
>> +++ b/gcc/cgraphbuild.c
>> @@ -481,6 +481,10 @@ rebuild_cgraph_edges (void)
>>record_eh_tables (node, cfun);
>>gcc_assert (!node->global.inlined_to);
>>
>> +  if (node->instrumented_version
>> +      && !node->instrumentation_clone)
>> +    ipa_record_reference (node, node->instrumented_version, IPA_REF_CHKP,
>> +                          NULL);
>> +
>>return 0;
>>  }
>>
>> @@ -513,6 +517,11 @@ cgraph_rebuild_references (void)
>>   ipa_record_stmt_references (node, gsi_stmt (gsi));
>>  }
>>record_eh_tables (node, cfun);
>> +
>> +
>> +  if (node->instrumented_version
>> +      && !node->instrumentation_clone)
>> +    ipa_record_reference (node, node->instrumented_version, IPA_REF_CHKP,
>> +                          NULL);
>>  }
>>
>>  namespace {
>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>> index 06283fc..ceb4060 100644
>> --- a/gcc/cgraphunit.c
>> +++ b/gcc/cgraphunit.c
>> @@ -1702,7 +1702,8 @@ assemble_thunks_and_aliases (struct cgraph_node *node)
>>struct ipa_ref *ref;
>>
>>for (e = node->callers; e;)
>> -if (e->caller->thunk.thunk_p)
>> +if (e->caller->thunk.thunk_p
>> + && !e->caller->thunk.add_pointer_bounds_args)
>>{
>>   struct cgraph_node *thunk = e->caller;
>>
>> diff --git a/gcc/ipa-ref.c b/gcc/ipa-ref.c
>> index 6aa41e6..3a055d9 100644
>> --- a/gcc/ipa-ref.c
>> +++ b/gcc/ipa-ref.c
>> @@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "cgraph.h"
>>  #include "ipa-utils.h"
>>
>> -

Re: [AArch64] Make sure start callee-save offset for D registers aligned

2014-07-24 Thread Jiong Wang


Hi Maxim,

On 24/07/14 14:16, Maxim Kuvyrkov wrote:

On Jun 5, 2014, at 3:04 PM, Jiong Wang  wrote:


For AArch64, there may be an odd number of core registers that need to be saved.

This small patch ensures we remain 16-byte aligned for subsequent STP writes of
D registers.

OK for trunk?

Hi Jiong,

How did you test the patch?  You need to run GCC testsuites, and, ideally, 
bootstrap on aarch64-linux-gnu.

Thanks for the review.

Sorry for not making this clearer: this patch has passed the full AArch64
bare-metal test suite with no regressions.



Is this patch to fix a correctness problem?  Can you provide a testcase that 
breaks without this patch?


This patch was trying to fix a hidden performance problem, but I later
found that there may be something wrong with it.


I'll re-investigate and rebase this patch, then update it with a testcase.

Thanks.

-- Jiong




diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aada704..c4abf1e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1793,6 +1793,10 @@ aarch64_layout_frame (void)
offset += UNITS_PER_WORD;
}
  
+  /* Align offset to 16-bytes.
+     There may have been an odd num core registers. Ensure we remain
+     16 byte aligned for subsequent STP writes of D registers.  */
+  offset = AARCH64_ROUND_UP (offset, 2 * UNITS_PER_WORD);
for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
  if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
{

It seems that we need to align the offset to 16 bytes only when the V0_REGNUM
loop stores a register pair, not a single register.  Since non-pair stores
would occur only when a single register is stored in the V0_REGNUM loop (i.e., a
comparatively rare occurrence), we should be OK to align the offset anyway.
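Here is a minimal sketch of the rounding the patch performs. It assumes 8-byte core-register slots starting at offset 0, while the real aarch64_layout_frame tracks other slots too. `round_up_pow2` and `d_reg_start_offset` are hypothetical helpers standing in for AARCH64_ROUND_UP and the surrounding loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for AARCH64_ROUND_UP: round X up to a
   multiple of ALIGN, which must be a power of two.  */
static uint64_t
round_up_pow2 (uint64_t x, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

/* Offset of the first saved D register after NCORE 8-byte core
   registers, rounded to 2 * UNITS_PER_WORD as in the patch.  */
static uint64_t
d_reg_start_offset (unsigned ncore)
{
  const uint64_t units_per_word = 8;  /* UNITS_PER_WORD on AArch64 */
  return round_up_pow2 (ncore * units_per_word, 2 * units_per_word);
}
```

With three saved core registers the offset is 24, which rounds up to 32. The D registers therefore start on a 16-byte boundary at the cost of an 8-byte gap. With an even count, nothing changes.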

The patch seems OK to me (provided no regressions on the testsuite), but you 
need to get an ACK from an official maintainer.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and patch reverted

2014-07-24 Thread Yvan Roux

Hi all,

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to
revision 212635 (4.9.1 release) as r212977, and reverted the backport of
revision 211129 (committed as r212685) as r212866.

This will be part of a respun 2014.07 release.

Thanks,
Yvan

Re: Regimplification enhancements 3/3

2014-07-24 Thread Bernd Schmidt


On 07/24/2014 02:38 PM, Martin Jambor wrote:

This seems to be the statement which has its RHS converted to a
MEM_REF[&_6], am I right?  I wonder whether it is correct input
though, because it looks like it has mismatched types.  The LHS is
clearly an aggregate of type struct S while the RHS is an SSA name,
meaning it cannot be of an aggregate type.  Does this pass gimple
checking?  What creates that statement?


It is created by the code in gimplify-me that I was proposing to remove.
I guess I'll just commit that patch.



Bernd

Re: [PATCH, Pointer Bounds Checker 9/x] Cgraph extension

2014-07-24 Thread Jan Hubicka

> > So the patch is introducing yet another notion of clone (in addition to
> > existing virtual clones and function versions used by ifunc), and you
> > add a new type of reference (CHKP) to link the original and the clone.
> >
> > Why do you need to link things in 3 different ways? (i.e.
> > instrumented_version points to the same place as CHKP and as
> > orig_decl, right?).
> 
> The CHKP reference is required so that the reachability algorithms work
> correctly and do not remove required instrumented nodes.  References
> are rebuilt from time to time, and instrumented_version is used to rebuild
> the CHKP reference.  orig_decl is required because the original function
> node may be removed as unreachable.
> 
> >
> > I would prefer if this could be put into the existing clone mechanism.
> > The virtual clones can have quite generic transformations done on them,
> > and they do perform all the necessary links back and forth.
> 
> I suppose virtual clones are useful when we may delay their
> materialization, i.e. for IPA passes. For checker we have
> instrumentation almost immediately following clone creation.
> Instrumentation is a GIMPLE pass and we have to materialize clones to
> have bodies to instrument. After materialization there is no link to
> original node anymore and it means we would still require all new
> fields in cgraph_node structure.
> 
> >
> > I will look into the rest of the changes; is there some overview?
> 
> I have a short overview of how it works on a wiki page:
> https://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler#Instrumentation_clones

Thanks, I will take a deeper look.  I am just somewhat concerned that you seem
to be duplicating a lot of logic that is already present in the other cloning
schemes we have (i.e. arranging sane partitioning, keeping clones linked with
their original, etc.).  We may want to generalize the current mechanism rather
than implement something similar in parallel...

Sorry for ignoring the patches for so long - I seem to have missed my CC in the
original thread.  I would welcome it if you CC'd hubi...@ucw.cz on cgraph/ipa
related patches.

Honza

Fix -flto failures at ARM

2014-07-24 Thread Jan Hubicka

Hi,
this patch fixes ARM LTO failures caused by my ctor offlining patch.
The problem is that error_mark_node, used to mark offlined ctors, is also
used specially throughout varasm.c, mostly as an alternative to "no constructor".
I am not quite sure this code is needed anyway, as we don't output variables
in programs with errors, but below is a safe fix that disables this path
for LTO.

It also makes get_variable_section stream in the constructor so that local
flags are computed correctly.

Bootstrapped/regtested x86_64-linux and tested to fix ARM failures.

Committed.
Honza

PR lto/61802
* varasm.c (bss_initializer_p): Handle offlined ctors.
(align_variable, get_variable_align): Likewise.
(make_decl_one_only): Likewise.
(default_binds_local_p_1): Likewise.
(decl_binds_to_current_def_p): Likewise.
(get_variable_section): Get constructor if it is offlined.
(assemble_variable_contents): Sanity check that the caller
streamed in the ctor in LTO.
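The dual meaning of the sentinel can be shown with a toy model. It is not GCC's tree API: a unique object plays the role of error_mark_node, and `toy_get_constructor` is a hypothetical lazy loader in the spirit of varpool_node::get_constructor. Outside LTO the sentinel means "no usable constructor"; in LTO it means "constructor not streamed in yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A unique sentinel object stands in for error_mark_node.  */
static const int error_mark_object;
#define ERROR_MARK ((const void *) &error_mark_object)

static bool in_lto_p;   /* true when running as the LTO back end */

struct toy_decl
{
  const void *initial;  /* stands in for DECL_INITIAL */
};

/* Mirrors the patched tests: only outside LTO does ERROR_MARK count
   as "no constructor".  */
static bool
has_no_constructor (const struct toy_decl *d)
{
  return d->initial == NULL
         || (!in_lto_p && d->initial == ERROR_MARK);
}

/* Hypothetical lazy loader: in LTO, replace the placeholder with the
   streamed-in body before anyone inspects it.  */
static const void *
toy_get_constructor (struct toy_decl *d, const void *streamed_body)
{
  assert (d != NULL);
  if (in_lto_p && d->initial == ERROR_MARK)
    d->initial = streamed_body;
  return d->initial;
}
```

This is why get_variable_section fetches the constructor before computing the reloc flag. It also explains why assemble_variable_contents asserts that no placeholder is left in LTO.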
Index: varasm.c
===
--- varasm.c(revision 212984)
+++ varasm.c(working copy)
@@ -956,7 +956,10 @@ bool
 bss_initializer_p (const_tree decl)
 {
   return (DECL_INITIAL (decl) == NULL
- || DECL_INITIAL (decl) == error_mark_node
+ /* In LTO we have no errors in program; error_mark_node is used
+to mark offlined constructors.  */
+ || (DECL_INITIAL (decl) == error_mark_node
+ && !in_lto_p)
  || (flag_zero_initialized_in_bss
  /* Leave constant zeroes in .rodata so they
 can be shared.  */
@@ -1017,7 +1020,9 @@ align_variable (tree decl, bool dont_out
 #endif
 #ifdef CONSTANT_ALIGNMENT
  if (DECL_INITIAL (decl) != 0
- && DECL_INITIAL (decl) != error_mark_node)
+ /* In LTO we have no errors in program; error_mark_node is used
+to mark offlined constructors.  */
+ && (in_lto_p || DECL_INITIAL (decl) != error_mark_node))
{
  unsigned int const_align
= CONSTANT_ALIGNMENT (DECL_INITIAL (decl), align);
@@ -1068,7 +1073,10 @@ get_variable_align (tree decl)
align = data_align;
 #endif
 #ifdef CONSTANT_ALIGNMENT
-  if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) != error_mark_node)
+  if (DECL_INITIAL (decl) != 0
+ /* In LTO we have no errors in program; error_mark_node is used
+to mark offlined constructors.  */
+ && (in_lto_p || DECL_INITIAL (decl) != error_mark_node))
{
  unsigned int const_align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl),
 align);
@@ -1092,13 +1100,20 @@ get_variable_section (tree decl, bool pr
 {
   addr_space_t as = ADDR_SPACE_GENERIC;
   int reloc;
-  symtab_node *snode = symtab_node::get (decl);
-  if (snode)
-decl = snode->ultimate_alias_target ()->decl;
+  varpool_node *vnode = varpool_node::get (decl);
+  if (vnode)
+{
+  vnode = vnode->ultimate_alias_target ();
+  decl = vnode->decl;
+}
 
   if (TREE_TYPE (decl) != error_mark_node)
 as = TYPE_ADDR_SPACE (TREE_TYPE (decl));
 
+  /* We need the constructor to figure out reloc flag.  */
+  if (vnode)
+vnode->get_constructor ();
+
   if (DECL_COMMON (decl))
 {
   /* If the decl has been given an explicit section name, or it resides
@@ -1963,6 +1978,9 @@ assemble_variable_contents (tree decl, c
 
   if (!dont_output_data)
 {
+  /* Caller is supposed to use varpool_get_constructor when it wants
+to output the body.  */
+  gcc_assert (!in_lto_p || DECL_INITIAL (decl) != error_mark_node);
   if (DECL_INITIAL (decl)
  && DECL_INITIAL (decl) != error_mark_node
  && !initializer_zerop (DECL_INITIAL (decl)))
@@ -5890,7 +5908,8 @@ make_decl_one_only (tree decl, tree comd
   symbol->set_comdat_group (comdat_group);
 }
   else if (TREE_CODE (decl) == VAR_DECL
-  && (DECL_INITIAL (decl) == 0 || DECL_INITIAL (decl) == error_mark_node))
+   && (DECL_INITIAL (decl) == 0
+  || (!in_lto_p && DECL_INITIAL (decl) == error_mark_node)))
 DECL_COMMON (decl) = 1;
   else
 {
@@ -6752,7 +6771,7 @@ default_binds_local_p_1 (const_tree exp,
   else if (DECL_COMMON (exp)
   && !resolved_locally
   && (DECL_INITIAL (exp) == NULL
-  || DECL_INITIAL (exp) == error_mark_node))
+  || (!in_lto_p && DECL_INITIAL (exp) == error_mark_node)))
 local_p = false;
   /* Otherwise we're left with initialized (or non-common) global data
  which is of necessity defined locally.  */
@@ -6807,7 +6826,7 @@ decl_binds_to_current_def_p (const_tree
 return false;
   if (DECL_COMMON (decl)
   && (DECL_INITIAL (decl) == NULL
- || DECL_INITIAL (decl) == error_mark_node))
+ || (!in_lto_p && DECL_INITIAL (decl) == error_mark_node)))
 return false;

Re: [PATCH][gcc-4.9.0] gcc/Makefile.in: fix parallel building failure

2014-07-24 Thread Tom Tromey

> "Hongxu" == Hongxu Jia  writes:

Hongxu> I tested on 4.10 (gcc-4.10-20140720), and this issue still exists.

It seems to me that the "generated_files" code should handle this:

generated_files = config.h tm.h $(TM_P_H) $(TM_H) multilib.h \
[...]
$(ALL_HOST_OBJS) : | $(generated_files)

That's the way it is supposed to work anyhow.

Tom

Re: [AArch64/GCC][15/N] Add two new frame fields

2014-07-24 Thread Marcus Shawcroft

On 22 July 2014 15:52, Jiong Wang  wrote:

> gcc/
>   * config/aarch64/aarch64.h (frame): New fields "wb_candidate1" and
> "wb_candidate2".
>   * config/aarch64/aarch64.c (aarch64_layout_frame): Calculate newly added
> fields.

OK and applied.
/Marcus

Re: [AArch64/GCC][16/N] New parameter 'skip_wb' for 'aarch64_save/restore_callee_save_common'

2014-07-24 Thread Marcus Shawcroft

On 22 July 2014 15:52, Jiong Wang  wrote:

> gcc/
>   * config/aarch64/aarch64.c (aarch64_save_callee_save_common): New
> parameter "skip_wb".
>   (aarch64_restore_callee_save_common): Likewise.
>   (aarch64_expand_prologue): Update call site.
>   (aarch64_expand_epilogue): Likewise.

The function names in the proposed ChangeLog don't match the code.
Otherwise OK and applied with fixed ChangeLog.
/Marcus

Re: [AArch64/GCC][17/N] Optimize prologue when there is no frame pointer

2014-07-24 Thread Marcus Shawcroft

On 24 July 2014 13:43, Jiong Wang  wrote:
> Under the new prologue/epilogue code, we can also use store write-back
> to optimize the stack adjustment when there is no frame pointer.
>
> * If there is a candidate register pair and the adjustment amount is less
>   than 512, we can use AArch64's paired store with write-back.
> * If there is only a single candidate register and the adjustment amount
>   is less than 256, we can use AArch64's single store with write-back.
> * Otherwise, use an explicit subtraction to perform the stack adjustment.
>   Improved testcases:
> gcc.target/aarch64/test_frame_1.c
> gcc.target/aarch64/test_frame_10.c
> gcc.target/aarch64/test_frame_2.c
> gcc.target/aarch64/test_frame_4.c
> gcc.target/aarch64/test_frame_6.c
> gcc.target/aarch64/test_frame_7.c
> gcc.target/aarch64/test_frame_8.c
> gcc.target/aarch64/test_fp_attribute_1.c
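The three cases can be sketched as a selection function. The enum and function names are hypothetical, and the thresholds are simply those quoted above. They are in line with the immediate ranges of the pre-indexed forms: a scaled signed 7-bit offset for STP of X registers, and an unscaled signed 9-bit offset for STR.

```c
#include <assert.h>

/* Hypothetical labels for the three strategies; not GCC's names.  */
enum sp_adjust_kind
{
  ADJUST_STP_WRITEBACK,  /* stp xA, xB, [sp, #-N]! */
  ADJUST_STR_WRITEBACK,  /* str xA, [sp, #-N]! */
  ADJUST_EXPLICIT_SUB    /* sub sp, sp, #N, then plain stores */
};

/* NCANDIDATES is the number of write-back candidate registers (0, 1
   or 2); ADJUST is the positive number of bytes to allocate.  */
static enum sp_adjust_kind
choose_sp_adjust (int ncandidates, long adjust)
{
  assert (ncandidates >= 0 && ncandidates <= 2 && adjust > 0);
  if (ncandidates == 2 && adjust < 512)
    return ADJUST_STP_WRITEBACK;
  if (ncandidates == 1 && adjust < 256)
    return ADJUST_STR_WRITEBACK;
  return ADJUST_EXPLICIT_SUB;
}
```

The write-back forms fold the stack-pointer update into the first store, saving one instruction compared with the explicit subtraction.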

OK and committed.
/Marcus

Re: [AArch64/GCC][18/N] Optimize epilogue when there is no frame pointer

2014-07-24 Thread Marcus Shawcroft

On 24 July 2014 13:48, Jiong Wang  wrote:

> gcc/
>* config/aarch64/aarch64.c (aarch64_popwb_single_reg): New function.
>(aarch64_expand_epilogue): Optimize epilogue when !frame_pointer_needed.
>
> gcc/testsuite/
>* gcc.target/aarch64/test_frame_1.c: Match optimized instruction
> sequences.
>* gcc.target/aarch64/test_frame_2.c: Likewise.
>* gcc.target/aarch64/test_frame_4.c: Likewise.
>* gcc.target/aarch64/test_frame_6.c: Likewise.
>* gcc.target/aarch64/test_frame_7.c: Likewise.
>* gcc.target/aarch64/test_frame_8.c: Likewise.
>* gcc.target/aarch64/test_frame_10.c: Likewise.

OK and committed.
/Marcus

Re: [PATCH] Fix vector tests on ARM platforms with disabled unaligned accesses

2014-07-24 Thread Ramana Radhakrishnan

>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
> index 854c970..fb87e2c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-109.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
> @@ -1,4 +1,4 @@
> -/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target { vect_int && {! vect_no_align } } } */
>
>  #include 
>  #include "tree-vect.h"
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
> index 65403eb..1065a6e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-93.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
> @@ -79,7 +79,7 @@ int main (void)
>  /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_no_align } } } */
>
>  /* in main: */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target vect_no_align } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_no_align } } } */
>  /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } */
>
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 5290a55..190483c 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2293,7 +2293,8 @@ proc check_effective_target_arm_little_endian { } {
>  proc check_effective_target_arm_vect_no_misalign { } {
>  return [check_no_compiler_messages arm_vect_no_misalign assembly {
> #if !defined(__arm__) \

This is redundant.

> -   || (defined(__ARMEL__) \
> +   || (defined(__ARM_FEATURE_UNALIGNED) \
> +   && defined(__ARMEL__) \


> && (!defined(__thumb__) || defined(__thumb2__)))

As is this line.

I think you can restrict the check to defined(__ARM_FEATURE_UNALIGNED)
&& defined(__ARMEL__)

 __ARM_FEATURE_UNALIGNED should tell you whether unaligned access is
allowed or not, therefore you should no longer require any specific
"architectural" checks.


> #error FOO
> #endif
>


I'm not sure about the original intent of the tests right now.

Ramana

[PATCH] gcc/gcc.c: XNEWVEC enough space for 'saved_suffix' using

2014-07-24 Thread Chen Gang

strlen() returns the string length excluding the terminating '\0', but
strcpy() appends '\0' at the end, so XNEWVEC needs to allocate one more
byte; otherwise the buffer overflows.

Signed-off-by: Chen Gang 
---
 gcc/gcc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 6cd08ea..8ea46ec 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -4895,7 +4895,7 @@ do_spec_1 (const char *spec, int inswitch, const char *soft_matched_part)
  {
saved_suffix
  = XNEWVEC (char, suffix_length
-+ strlen (TARGET_OBJECT_SUFFIX));
++ strlen (TARGET_OBJECT_SUFFIX) + 1);
strncpy (saved_suffix, suffix, suffix_length);
strcpy (saved_suffix + suffix_length,
TARGET_OBJECT_SUFFIX);
-- 
1.7.11.7
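The sizing rule behind this fix can be shown with a minimal standalone sketch. It uses plain malloc instead of libiberty's XNEWVEC, and `concat_with_object_suffix` is a hypothetical name. The buffer must hold the copied prefix, the object suffix, and the terminating NUL that strcpy writes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first SUFFIX_LENGTH bytes of SUFFIX followed by OBJ_SUFFIX
   into a fresh buffer.  strlen () excludes the terminating NUL that
   strcpy () writes, hence the "+ 1".  The caller frees the result.  */
static char *
concat_with_object_suffix (const char *suffix, size_t suffix_length,
                           const char *obj_suffix)
{
  assert (strlen (suffix) >= suffix_length);
  char *buf = malloc (suffix_length + strlen (obj_suffix) + 1);
  if (buf == NULL)
    return NULL;
  memcpy (buf, suffix, suffix_length);
  strcpy (buf + suffix_length, obj_suffix);
  return buf;
}
```

Without the `+ 1`, the NUL written by strcpy lands one byte past the end of the allocation.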

Re: Implement N4051 - Allow typename in a template template parameter

2014-07-24 Thread Jason Merrill


On 07/23/2014 10:31 PM, Ed Smith-Rowland wrote:

+pedwarn (token->location, OPT_Wpedantic,
+"ISO C++ forbids typename key in template template parameter");


This should mention -std=c++1z.


+  if (tag_type == none_type)
+cp_parser_error (parser, "expected type-parameter-key");

...

+ case RT_TYPE_PARAMETER_KEY:
+   cp_parser_error (parser, "expected % or %");


It seems unfortunate to have this diagnostic in two places.  I think 
let's not use cp_parser_require here.


Jason

[PATCHv3] Fix vector tests on ARM platforms with disabled unaligned accesses

2014-07-24 Thread Marat Zakirov



On 07/24/2014 04:27 PM, Marat Zakirov wrote:


On 07/23/2014 06:23 PM, Marat Zakirov wrote:

Hi there!

I made a patch which fixes regressions on ARM platforms with unaligned
accesses disabled. The problem is that the 'arm_vect_no_misalign'
predicate does not check the 'unaligned_access' global variable to
determine whether unaligned vector accesses are allowed. This leads
to spurious vect.exp test failures when GCC is configured with
--with-specs=%{!munaligned-access:-mno-unaligned-access}.


The attached patch fixes the ARM predicate and several tests so that they
handle the issue correctly.


The patch was regression-tested on multiple targets (ARM, Thumb-1,
Thumb-2, x86, x86_64) with and without -mno-unaligned-access.  Analysis
showed that the patch affects only vect.exp tests, so only vect.exp was
run.


For x86, x86_64, ARM without -mno-unaligned-access, Thumb-2 without
-mno-unaligned-access and Thumb-1, no regressions occurred. For
ARM/Thumb-2 with -mno-unaligned-access, the patch fixed most of the
failures but exposed some problems (see attached log) in the current
vect.exp tests:

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61887
2) Some XPASS'es due to unexpected loop versioning (e.g. 
gcc.dg/vect/pr33804.c).
3) After the predicate fix, some passing tests that require unaligned
vector support became NA (this was expected).


Here is a new version of the patch and the regression log. On current
trunk the results are slightly different due to patches from Richard
Biener (no UNRESOLVED failures), but some PASS->XPASS regressions still
remain (see attachment):


PASS->XPASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c 
scan-tree-dump-times vect "vectorized 1 loops" 1
PASS->XPASS: gcc.dg/vect/pr33804.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "vectorized 1 loops" 1

etc.

These XPASSes are due to code versioning: current GCC creates two
versions of the loop, aligned and misaligned. It looks like these tests
are slightly out of date, at least for ARM.




On 07/24/2014 06:50 PM, Ramana Radhakrishnan wrote:

This is redundant.


-   || (defined(__ARMEL__) \
+   || (defined(__ARM_FEATURE_UNALIGNED) \
+   && defined(__ARMEL__) \
 && (!defined(__thumb__) || defined(__thumb2__)))

As is this line.

I think you can restrict the check to defined(__ARM_FEATURE_UNALIGNED)
&& defined(__ARMEL__)

  __ARM_FEATURE_UNALIGNED should tell you whether unaligned access is
allowed or not, therefore you should no longer require any specific
"architectural" checks.



 #error FOO
 #endif


I'm not sure about the original intent of the tests right now.

Ramana



Thank you Ramana!

--Marat
dg-cmp-results.sh: Verbosity is 2, Variant is "target-sim"

Older log file: ./gcc/testsuite/gcc/gcc.sum
Test Run By mzakirov on Thu Jul 24 11:47:42 2014
Target is arm-v7a15v5r2-linux-gnueabi
Host   is x86_64-pc-linux-gnu

Newer log file: /home/mzakirov/proj/gcc_unalign/build.arm.cortex-a15/obj/gcc_final/./gcc/testsuite/gcc/gcc.sum
Test Run By mzakirov on Thu Jul 24 19:17:05 2014
Target is arm-v7a15v5r2-linux-gnueabi
Host   is x86_64-pc-linux-gnu

NA->PASS: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump slp2 "unsupported alignment in basic block."
FAIL->NA: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "basic block vectorized" 1
XFAIL->NA: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "unsupported alignment in basic block." 1
NA->PASS: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported alignment in basic block."
FAIL->NA: gcc.dg/vect/bb-slp-10.c scan-tree-dump-times slp2 "basic block vectorized" 1
XFAIL->NA: gcc.dg/vect/bb-slp-10.c scan-tree-dump-times slp2 "unsupported alignment in basic block." 1
FAIL->NA: gcc.dg/vect/bb-slp-24.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-24.c scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-25.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-25.c scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-29.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-29.c scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->XFAIL: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2 "vectorization is not profitable"
FAIL->XFAIL: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
FAIL->XFAIL: gcc.dg/vect/bb-slp-9.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "basic block vectorized" 1
FAIL->XFAIL: gcc.dg/vect/bb-slp-9.c scan-tree-dump-times slp2 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block vectorized" 1
FAIL->NA: gcc.dg/vect/bb-slp-pattern-2.c -flto -ffat-lto

RFA: Add a common tls_referenced_p function

2014-07-24 Thread Richard Sandiford

Three targets had the same for_each_rtx function to check for a TLS symbol.
This patch adds a generic version instead.

Some other targets have a variation that checks for target-specific
UNSPEC sequences too so I've left those alone.  They're all prefixed
by the target name so there's no name clash or ambiguity.
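The generic helper follows the usual for_each_rtx pattern: a pre-order walk that stops as soon as the callback returns nonzero. Below is a toy sketch of that pattern on a hypothetical mini expression tree, not GCC's rtx. The real for_each_rtx additionally treats a callback result of -1 as "do not descend into this sub-expression", which the toy omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree; all names are hypothetical.  */
enum toy_code { TOY_CONST_INT, TOY_SYMBOL_REF, TOY_PLUS, TOY_CONST };

struct toy_rtx
{
  enum toy_code code;
  int tls_model;            /* nonzero for thread-local symbols */
  struct toy_rtx *op[2];    /* operands, NULL if absent */
};

typedef int (*toy_callback) (struct toy_rtx *, void *);

/* Call FN on X and its sub-expressions in pre-order, stopping as
   soon as FN returns nonzero.  */
static int
toy_for_each (struct toy_rtx *x, toy_callback fn, void *data)
{
  assert (fn != NULL);
  if (x == NULL)
    return 0;
  int r = fn (x, data);
  if (r != 0)
    return r;
  for (int i = 0; i < 2; i++)
    {
      r = toy_for_each (x->op[i], fn, data);
      if (r != 0)
        return r;
    }
  return 0;
}

/* Callback: is this node a thread-local symbol?  */
static int
toy_tls_symbol_p (struct toy_rtx *x, void *data)
{
  (void) data;
  return x->code == TOY_SYMBOL_REF && x->tls_model != 0;
}

/* Analogue of tls_referenced_p, with HAVE_TLS standing in for
   targetm.have_tls.  */
static bool
toy_tls_referenced_p (struct toy_rtx *x, bool have_tls)
{
  if (!have_tls)
    return false;
  return toy_for_each (x, toy_tls_symbol_p, NULL) != 0;
}
```

The early `have_tls` check mirrors the patch: on targets without TLS support the walk is skipped entirely.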

Tested on mips64-linux-gnu and via a cross-compiler for
powerpc64-linux-gnu and hppa64-hp-hpux11.23.  OK to install?

Thanks,
Richard


gcc/
* rtl.h (tls_referenced_p): Declare.
* rtlanal.c (tls_referenced_p_1, tls_referenced_p): New functions.
* config/mips/mips.c (mips_tls_symbol_ref_1): Delete.
(mips_cannot_force_const_mem): Use tls_referenced_p.
* config/pa/pa-protos.h (pa_tls_referenced_p): Delete.
* config/pa/pa.h (CONSTANT_ADDRESS_P): Use tls_referenced_p
instead of pa_tls_referenced_p.
* config/pa/pa.c (hppa_legitimize_address, pa_cannot_force_const_mem)
(pa_emit_move_sequence, pa_emit_move_sequence): Likewise.
(pa_legitimate_constant_p): Likewise.
(pa_tls_symbol_ref_1, pa_tls_referenced_p): Delete.
* config/rs6000/rs6000.c (rs6000_tls_referenced_p): Delete.
(rs6000_cannot_force_const_mem, rs6000_emit_move)
(rs6000_address_for_altivec): Use tls_referenced_p instead of
rs6000_tls_referenced_p.
(rs6000_tls_symbol_ref_1): Delete.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-07-24 16:17:57.804445472 +0100
+++ gcc/rtl.h   2014-07-24 16:42:55.788314641 +0100
@@ -2292,6 +2292,7 @@ extern int replace_label (rtx *, void *)
 extern int rtx_referenced_p (rtx, rtx);
 extern bool tablejump_p (const_rtx, rtx *, rtx *);
 extern int computed_jump_p (const_rtx);
+extern bool tls_referenced_p (rtx);
 
 typedef int (*rtx_function) (rtx *, void *);
 extern int for_each_rtx (rtx *, rtx_function, void *);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2014-07-24 16:11:17.367474535 +0100
+++ gcc/rtlanal.c   2014-07-24 16:42:55.789314647 +0100
@@ -5960,3 +5960,22 @@ get_index_code (const struct address_inf
 
   return SCRATCH;
 }
+
+/* Return 1 if *X is a thread-local symbol.  */
+
+static int
+tls_referenced_p_1 (rtx *x, void *)
+{
+  return GET_CODE (*x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (*x) != 0;
+}
+
+/* Return true if X contains a thread-local symbol.  */
+
+bool
+tls_referenced_p (rtx x)
+{
+  if (!targetm.have_tls)
+return false;
+
+  return for_each_rtx (&x, &tls_referenced_p_1, 0);
+}
Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  2014-07-24 16:12:51.943409858 +0100
+++ gcc/config/mips/mips.c  2014-07-24 16:42:55.779314588 +0100
@@ -2171,15 +2171,6 @@ mips_symbol_insns (enum mips_symbol_type
   return mips_symbol_insns_1 (type, mode) * (TARGET_MIPS16 ? 2 : 1);
 }
 
-/* A for_each_rtx callback.  Stop the search if *X references a
-   thread-local symbol.  */
-
-static int
-mips_tls_symbol_ref_1 (rtx *x, void *data ATTRIBUTE_UNUSED)
-{
-  return mips_tls_symbol_p (*x);
-}
-
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -2223,7 +2214,7 @@ mips_cannot_force_const_mem (enum machin
 }
 
   /* TLS symbols must be computed by mips_legitimize_move.  */
-  if (for_each_rtx (&x, &mips_tls_symbol_ref_1, NULL))
+  if (tls_referenced_p (x))
 return true;
 
   return false;
Index: gcc/config/pa/pa-protos.h
===
--- gcc/config/pa/pa-protos.h   2014-07-24 16:11:17.367474535 +0100
+++ gcc/config/pa/pa-protos.h   2014-07-24 16:42:55.780314594 +0100
@@ -54,7 +54,6 @@ extern void pa_output_global_address (FI
 extern void pa_print_operand (FILE *, rtx, int);
 extern void pa_encode_label (rtx);
 extern int pa_symbolic_expression_p (rtx);
-extern bool pa_tls_referenced_p (rtx);
 extern int pa_adjust_insn_length (rtx, int);
 extern int pa_fmpyaddoperands (rtx *);
 extern int pa_fmpysuboperands (rtx *);
Index: gcc/config/pa/pa.h
===
--- gcc/config/pa/pa.h  2014-07-24 16:11:17.367474535 +0100
+++ gcc/config/pa/pa.h  2014-07-24 16:42:55.782314606 +0100
@@ -797,7 +797,7 @@ #define CONSTANT_ADDRESS_P(X) \
   ((GET_CODE (X) == LABEL_REF  \
    || (GET_CODE (X) == SYMBOL_REF && !SYMBOL_REF_TLS_MODEL (X))        \
    || GET_CODE (X) == CONST_INT                                       \
-   || (GET_CODE (X) == CONST && !pa_tls_referenced_p (X))  \
+   || (GET_CODE (X) == CONST && !tls_referenced_p (X)) \
|| GET_CODE (X) == HIGH)\
&& (reload_in_progress || reload_completed  \
|| ! pa_symbolic_expression_p (X)))
Index: gcc/config/pa/pa.c
==

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-07-24 Thread Ilya Verbin

In gcc/configure.ac:

AC_SUBST(enable_accelerator)
offload_targets=`echo $offload_targets | sed -e 's#,#:#'`
AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",

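Looks like it should be: sed -e 's#,#:#g'
Without the g flag, sed replaces only the first comma on each line, so a
list such as a,b,c would become a:b,c rather than a:b:c.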

  -- Ilya

Re: [cygming] make sure libgcc logic is consistent

2014-07-24 Thread Kai Tietz

Hello DJ,

Sorry for the late reply.  The patch is OK. Please apply.

Thanks,
Kai

- Original Message -
> 
> This patch changes the logic in "crtbegin" that looks for libgcc.dll
> such that the test is only done once, guaranteeing consistent results
> between the register and deregister cases.
> 
> Previously, a crash occurred if the application (directly or
> indirectly) caused libgcc.dll to load after main() was called
> (i.e. dlopen'd a DLL which required libgcc).  In that case, the
> register test would return "no libgcc" and register with the local
> copy, yet the deregister test would return "yes libgcc" and try to
> deregister something with libgcc.dll that hadn't been registered.
> 
> Index: cygming-crtbegin.c
> ===
> --- cygming-crtbegin.c(revision 212915)
> +++ cygming-crtbegin.c(working copy)
> @@ -99,12 +99,13 @@ static EH_FRAME_SECTION_CONST char __EH_
>= { };
>  
>  static struct object obj;
>  
>  /* Handle of libgcc's DLL reference.  */
>  HANDLE hmod_libgcc;
> +static void *  (*deregister_frame_fn) (const void *) = NULL;
>  #endif
>  
>  #if TARGET_USE_JCR_SECTION
>  static void *__JCR_LIST__[]
>__attribute__ ((used, section(JCR_SECTION_NAME), aligned(4)))
>= { };
> @@ -130,15 +131,20 @@ __gcc_register_frame (void)
>if (h)
>  {
>/* Increasing the load-count of LIBGCC_SONAME DLL.  */
>hmod_libgcc = LoadLibrary (LIBGCC_SONAME);
>register_frame_fn = (void (*) (const void *, struct object *))
> GetProcAddress (h, "__register_frame_info");
> +  deregister_frame_fn = (void* (*) (const void *))
> + GetProcAddress (h, "__deregister_frame_info");
> +}
> +  else
> +{
> +  register_frame_fn = __register_frame_info;
> +  deregister_frame_fn = __deregister_frame_info;
>  }
> -  else
> -register_frame_fn = __register_frame_info;
>if (register_frame_fn)
>   register_frame_fn (__EH_FRAME_BEGIN__, &obj);
>  #endif
>  
>  #if TARGET_USE_JCR_SECTION
>if (__JCR_LIST__[0])
> @@ -158,19 +164,12 @@ __gcc_register_frame (void)
>  }
>  
>  void
>  __gcc_deregister_frame (void)
>  {
>  #if DWARF2_UNWIND_INFO
> -  void *  (*deregister_frame_fn) (const void *);
> -  HANDLE h = GetModuleHandle (LIBGCC_SONAME);
> -  if (h)
> -deregister_frame_fn = (void* (*) (const void *))
> -   GetProcAddress (h, "__deregister_frame_info");
> -  else
> -deregister_frame_fn = __deregister_frame_info;
>if (deregister_frame_fn)
>   deregister_frame_fn (__EH_FRAME_BEGIN__);
>if (hmod_libgcc)
>  FreeLibrary (hmod_libgcc);
>  #endif
>  }
>

Re: [PATCH] [gomp4] Initial support of OpenACC loop directive in C front-end.

2014-07-24 Thread Thomas Schwinge

Hi!

On Thu, 20 Mar 2014 15:42:48 +0100, I wrote:
> On Tue, 18 Mar 2014 14:50:44 +0100, I wrote:
> > On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov  
> > wrote:
> > > This patch introduces support of OpenACC loop directive (and combined 
> > > directives) in C front-end up to GENERIC. Currently no clause is allowed.
> > 
> > Thanks!  I had worked on a simpler patch, not yet dealing with combined
> > clauses.  Also, I have some work for the GIMPLE level, namely building on
> > GIMPLE_OMP_FOR, adding a new GF_OMP_FOR_KIND_OACC_LOOP.  I'll post this
> > soon.
> 
> Here are the patches, committed in r208702..4 to gomp-4_0-branch.

> commit f1d39706db8dccbc988e2c66552511cd54632257
> Author: tschwinge 
> Date:   Thu Mar 20 14:40:01 2014 +
> 
> Continue implementation of OpenACC loop construct.

For loop scheduling, this is currently using
expand_omp_for_static_nochunk.  For a loop iterating through [0; 100) on
32 threads, this gives us the following schedule:

0   0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
32  9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19
64  20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30
96  30 31 31 31

..., that is, several consecutive loop iterations are executed on the
same thread.  This isn't ideal for GPUs: for a group of "threads"
executing in parallel, we'd like them together to execute one "bucket"
of consecutive loop iterations, and then the whole group to move on to
the next "bucket".  So we'd like a schedule as follows:

0   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
32  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
64  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
96  0 1 2 3

Here, "buckets" of 32 iterations are being executed by 32 threads, then
the next 32 iterations, and so on.  (This is actually one of the OpenACC
parallelism concepts, vector parallelism, mapped to the "warp size" of an
Nvidia GPU.)
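The two schedules can be reproduced with a small model.  This is a sketch, not GCC's code: the helper names static_nochunk_thread, static_chunk_thread and print_schedule are hypothetical, and the no-chunk formula assumes the usual static split in which each thread gets one contiguous block and the first n % nthreads threads get one extra iteration.  With n = 100 and 32 threads that gives blocks of 4 iterations to threads 0..3 and blocks of 3 to the rest, whereas a chunk size of one simply deals the iterations out round-robin.

```cpp
#include <cassert>
#include <cstdio>

// Thread that runs iteration I of an N-iteration loop on NTHREADS
// threads under a static schedule without a chunk size (model only).
int
static_nochunk_thread (int i, int n, int nthreads)
{
  assert (nthreads > 0 && 0 <= i && i < n);
  int q = n / nthreads;   // iterations per thread, rounded down
  int tt = n % nthreads;  // the first TT threads run Q + 1 iterations
  if (i < tt * (q + 1))
    return i / (q + 1);
  // Only reached when q > 0, so the division is safe.
  return tt + (i - tt * (q + 1)) / q;
}

// Same loop with chunk size CHUNK: chunks are handed out round-robin.
int
static_chunk_thread (int i, int chunk, int nthreads)
{
  assert (chunk > 0 && nthreads > 0 && i >= 0);
  return (i / chunk) % nthreads;
}

// Print the thread of every iteration, 32 per row, each row prefixed
// by its first iteration number.  CHUNK == 0 selects the no-chunk model.
void
print_schedule (int n, int nthreads, int chunk)
{
  for (int i = 0; i < n; i++)
    {
      if (i % 32 == 0)
        {
          if (i != 0)
            std::printf ("\n");
          std::printf ("%-3d", i);
        }
      int t = chunk > 0 ? static_chunk_thread (i, chunk, nthreads)
                        : static_nochunk_thread (i, n, nthreads);
      std::printf (" %d", t);
    }
  std::printf ("\n");
}
```

Calling print_schedule (100, 32, 0) and print_schedule (100, 32, 1) should print the same rows as the two tables above, give or take whitespace.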

In r213006, I committed the following hack to use
expand_omp_for_static_chunk instead of expand_omp_for_static_nochunk, by
specifying a chunk_size of one to implement the desired scheduling.

commit 9a545f89fbb1b361286005ceb68e154d0afc84bd
Author: tschwinge 
Date:   Thu Jul 24 15:55:49 2014 +

Force OpenACC loop to use a chunk size of one.

gcc/
* omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
chunk size of one.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213006 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c  | 10 ++
 2 files changed, 13 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f8a9d74..cc9b06c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-07-24  Thomas Schwinge  

+   * omp-low.c (extract_omp_for_data): Force OpenACC loop to use a
+   chunk size of one.
+
* omp-low.c (expand_omp_for_static_chunk): Merge changes
previously applied to expand_omp_for_static_nochunk.

diff --git gcc/omp-low.c gcc/omp-low.c
index 2799638..b188e2d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -619,6 +619,16 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->loop.step = build_int_cst (TREE_TYPE (fd->loop.v), 1);
   fd->loop.cond_code = LT_EXPR;
 }
+
+  //TODO
+  /* For OpenACC loops, force a chunk size of one, as this avoids the default
+scheduling where several subsequent iterations are being executed by the
+same thread.  */
+  if (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
+{
+  gcc_assert (fd->chunk_size == NULL_TREE);
+  fd->chunk_size = build_int_cst (TREE_TYPE (fd->loop.v), 1);
+}
 }

In r213005, I committed changes to expand_omp_for_static_chunk that are
just what has previously been applied to expand_omp_for_static_nochunk.
(Internally, we have builtins to query the real nthreads and threadid,
instead of the dummy values, one and zero, that I'm using here.)

commit 6c07d1bd13f6ceef80beb3c62cd25c3aaa397f1b
Author: tschwinge 
Date:   Thu Jul 24 15:55:39 2014 +

Make expand_omp_for_static_chunk usable for OpenACC.

gcc/
* omp-low.c (expand_omp_for_static_chunk): Merge changes
previously applied to expand_omp_for_static_nochunk.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@213005 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +
 gcc/omp-low.c  | 19 +--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index adfae10..f8a9d74 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-07-24  Thomas Schwinge  
+

[PATCH, alpha]: Define TARGET_UNWIND_TABLES_DEFAULT

2014-07-24 Thread Uros Bizjak

Hello!

Adding -g to the compile flags breaks unwinding on alpha due to
missing FDE entries. This is illustrated by compiling a hello.c
program:

w/o -g:

$ gcc -O2 -c hello.c

$ objdump --dwarf=frames-interp hello.o

hello.o: file format elf64-alpha

Contents of the .eh_frame section:

 0010  CIE "zR" cf=4 df=-8 ra=26
   LOC   CFA
 r30+0

0014 0018 0018 FDE cie=
pc=..003c
   LOC   CFA  ra
 r30+0u
001c r30+16   c-16

w/ -g:

$ gcc -O2 -g -c hello.c

$ objdump --dwarf=frames-interp hello.o

hello.o: file format elf64-alpha

Contents of the .debug_frame section:

 000c  CIE "" cf=4 df=-8 ra=26
   LOC   CFA
 r30+0

0010 0024  FDE cie=
pc=..003c
   LOC   CFA  ra
 r30+0u
000c r30+16   u
001c r30+16   c-16
0038 r30+0u

Note that the FDE moved from the .eh_frame section to .debug_frame.

The assembler generates .eh_frame data by itself (please see
gas/config/tc-alpha.c, alpha_elf_md_end), unless there is existing
unwind info, or there are .cfi directives:

--cut here--
  /* If someone has generated the unwind info themselves, great.  */
  if (bfd_get_section_by_name (stdoutput, ".eh_frame") != NULL)
return;

  /* ??? In theory we could look for functions for which we have
 generated unwind info via CFI directives, and those we have not.
 Those we have not could still get their unwind info from here.
 For now, do nothing if we've seen any CFI directives.  Note that
 the above test will not trigger, as we've not emitted data yet.  */
  if (all_fde_data != NULL)
return;
--cut here--

However, when -g is in effect, .cfi_sections switches to .debug_frame.
Emitted .cfi directives go to .debug_frame, and nobody generates
.eh_frame in this case.

This problem can be solved by defining TARGET_UNWIND_TABLES_DEFAULT to
true. This way, the compiler will always generate precise unwind info
by itself, with and without -g. Since the assembler always emits
unwind data, setting TARGET_UNWIND_TABLES_DEFAULT also seems correct.

The patch also fixes following go testsuite failure:

FAIL: go.test/test/nilptr2.go execution,  -O2 -g

where a missing FDE for libc's _wordcopy_fwd_aligned (in wordcopy.c) broke
unwinding.

2014-07-24  Uros Bizjak  

* config/alpha/elf.h: Define TARGET_UNWIND_TABLES_DEFAULT.

Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu.

Patch was committed to mainline SVN and will be backported to release branches.

Uros.
Index: config/alpha/elf.h
===
--- config/alpha/elf.h  (revision 212920)
+++ config/alpha/elf.h  (working copy)
@@ -126,6 +126,10 @@
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s"
 
+/* This variable should be set to 'true' if the target ABI requires
+   unwinding tables even when exceptions are not used.  */
+#define TARGET_UNWIND_TABLES_DEFAULT true
+
 /* Select a format to encode pointers in exception handling data.  CODE
is 0 for data, 1 for code labels, 2 for function pointers.  GLOBAL is
true if the symbol may be affected by dynamic relocations.

Re: [cygming] make sure libgcc logic is consistent

2014-07-24 Thread DJ Delorie


> sorry for late reply.  Patch is ok. Please apply.

Applied.  Thanks!

Re: [RFC: Patch, PR 60102] [4.9/4.10 Regression] powerpc fp-bit ices@dwf_regno

2014-07-24 Thread Ulrich Weigand

Rohit wrote:

> This is related to the following bug:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60102
> 
> I have tried to fix the e500v2 build on GCC v4.9.0 with the attached patch.
> Can you please review and comment on the changes, especially the
> DWARF_FRAME_REGNUM and DWARF_REG_TO_UNWIND_COLUMN definitions?

David asked me to comment on the use of DWARF register numbers in this patch.

There are a number of register number "address spaces" in play here:

(A) GCC hard register numbers
(B) DWARF register numbers used in .debug_info etc.
(C) DWARF CFI register numbers (GCC internal)
(D) DWARF CFI register numbers as used in .debug_frame
(E) DWARF CFI register numbers as used in .eh_frame
(F) DWARF CFI unwind column numbers

There are a number of macros to convert between them:

DBX_REGISTER_NUMBER: (A) -> (B)
DWARF_FRAME_REGNUM: (A) -> (C)
DWARF2_FRAME_REG_OUT: (C) -> (D) / (E)
DWARF_REG_TO_UNWIND_COLUMN: (E) -> (F)

Note that some of these seem to be used incorrectly in current rs6000.c:

  for (i = FIRST_ALTIVEC_REGNO; i < LAST_ALTIVEC_REGNO+1; i++)
{
  int column = DWARF_REG_TO_UNWIND_COLUMN (i);
  HOST_WIDE_INT offset
= DWARF_FRAME_REGNUM (column) * GET_MODE_SIZE (mode);

This should rather be

  int column = DWARF_REG_TO_UNWIND_COLUMN (DWARF_FRAME_REGNUM (i));
  HOST_WIDE_INT offset = column * GET_MODE_SIZE (mode);

which doesn't show up as problem currently since DWARF_FRAME_REGNUM
is defined as the identity mapping, but will show up once you have to
actually define a nontrivial mapping in DWARF_FRAME_REGNUM.

[ To be fully correct, I guess it actually should be

  int column = DWARF_REG_TO_UNWIND_COLUMN
(DWARF2_FRAME_REG_OUT (DWARF_FRAME_REGNUM (i), true));

  but DWARF2_FRAME_REG_OUT (..., true) is the identity map as well ... ]

Now, if I understand the SPE situation correctly, you had previously:

- no GCC hard register numbers
  (however, rs6000_dwarf_register_span, which is supposed to return a
  hard register number, returned numbers in the 1200..1231 range)
- used the 1200..1231 range for (B), (C), (D), and (E)
- used the 113..145 range for (F)

Now, you need to introduce new GCC hard register numbers (A).  However, in
order to preserve compatibility with DWARF info in existing binaries, none
of (B), (D), (E) or (F) is allowed to change.  [ (C) could change in theory,
but it's probably best not to change it either.  ]

Your patch now defines the new GCC hard register numbers in the 117..149 range,
which seems reasonable.  However, you ought to leave the other mappings
unchanged.  For (B) this looks OK due to the rs6000_dbx_register_number change.

However (C), (D), and (E) *do* change with your patch:

> -#define DWARF_FRAME_REGNUM(REGNO) (REGNO)
> +#define DWARF_FRAME_REGNUM(REGNO) \
> +  ((REGNO) >= 1200 ? ((REGNO) - 1200 + (DWARF_FRAME_REGISTERS - 32)) : (REGNO))

This isn't OK; the input to DWARF_FRAME_REGNUM is a GCC hard register number,
which will never be in the 1200... range.

On the other hand, you can now get hard register numbers in the 117..149 range,
which you need to map *back* to the 1200..1231 range, or else CFI register
numbers will be wrong.  So you should have something like:

#define DWARF_FRAME_REGNUM(REGNO) \
  (SPE_HIGH_REGNO_P(REGNO)? ((REGNO) - FIRST_SPE_HIGH_REGNO + 1200) : (REGNO))

On the other hand, the DWARF_REG_TO_UNWIND_COLUMN macro needs to map that
1200..1231 range back to the 113..145 range, so it should just stay as-is.

Note that (F) ends up being OK with your patch as-is, since the two bugs
in DWARF_FRAME_REGNUM and DWARF_REG_TO_UNWIND_COLUMN cancel each other out.
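Why the order of application matters can be illustrated with a small model of the two macros.  This is a sketch under stated assumptions: the constants are illustrative values loosely matching the ranges discussed above (32 SPE high registers, new hard register numbers starting at 117, CFI numbers 1200..1231, unwind columns starting at 113), not the real definitions from rs6000.h, and dwarf_frame_regnum and dwarf_reg_to_unwind_column are hypothetical stand-ins for DWARF_FRAME_REGNUM and DWARF_REG_TO_UNWIND_COLUMN.  Only the SPE range is modelled; every other number maps to itself.

```cpp
#include <cassert>

// Illustrative constants only (assumed, not taken from rs6000.h).
const int FIRST_SPE_HIGH_REGNO = 117;  // (A) new hard regnos
const int N_SPE_HIGH = 32;
const int SPE_DWARF_FIRST = 1200;      // (C)/(D)/(E), must not change
const int SPE_UNWIND_FIRST = 113;      // (F), must not change

bool
spe_high_regno_p (int regno)
{
  return regno >= FIRST_SPE_HIGH_REGNO
         && regno < FIRST_SPE_HIGH_REGNO + N_SPE_HIGH;
}

// Model of DWARF_FRAME_REGNUM: (A) -> (C).
int
dwarf_frame_regnum (int regno)
{
  return spe_high_regno_p (regno)
         ? regno - FIRST_SPE_HIGH_REGNO + SPE_DWARF_FIRST : regno;
}

// Model of DWARF_REG_TO_UNWIND_COLUMN: (E) -> (F).
int
dwarf_reg_to_unwind_column (int r)
{
  return (r >= SPE_DWARF_FIRST && r < SPE_DWARF_FIRST + N_SPE_HIGH)
         ? r - SPE_DWARF_FIRST + SPE_UNWIND_FIRST : r;
}

// Correct order: hard regno -> CFI regno -> unwind column.
int
unwind_column (int hard_regno)
{
  return dwarf_reg_to_unwind_column (dwarf_frame_regnum (hard_regno));
}

// Order used in the quoted rs6000.c loop; equivalent only while
// DWARF_FRAME_REGNUM is the identity mapping.
int
unwind_column_wrong_order (int hard_regno)
{
  return dwarf_frame_regnum (dwarf_reg_to_unwind_column (hard_regno));
}
```

With these values, the first SPE high register maps to CFI number 1200 and unwind column 113 in the correct order, but the swapped order yields 1200 as the "column".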

A couple of further comments on the patch:

> Index: libgcc/config/rs6000/linux-unwind.h
> ===
> --- libgcc/config/rs6000/linux-unwind.h   (revision 212339)
> +++ libgcc/config/rs6000/linux-unwind.h   (working copy)
> @@ -274,8 +274,8 @@ ppc_fallback_frame_state (struct _Unwind
>  #ifdef __SPE__
>for (i = 14; i < 32; i++)
>  {
> -  fs->regs.reg[i + FIRST_PSEUDO_REGISTER - 1].how = REG_SAVED_OFFSET;
> -  fs->regs.reg[i + FIRST_PSEUDO_REGISTER - 1].loc.offset
> +  fs->regs.reg[i + FIRST_SPE_HIGH_REGNO - 4].how = REG_SAVED_OFFSET;
> +  fs->regs.reg[i + FIRST_SPE_HIGH_REGNO - 4].loc.offset

This is a change to current behaviour, but that was probably intended
since the old behaviour seems broken (apparently wasn't updated after
the introduction of the three HTM registers).

> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 212339)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -30956,7 +30956,7 @@ rs6000_init_dwarf_reg_sizes_extra (tree 
>rtx mem = gen_rtx_MEM (BLKmode, addr);
>rtx value = gen_int_mode (4, mode);
>  
> -  for (i = 1201; i < 1232; i++)
> +  for (i = FIRST_SPE_HIGH_

Re: Does anyone use Ada on Alpha?

2014-07-24 Thread Alan Lawrence


Well, I was lucky enough to gain access to an alpha pca56 for a day (I say
lucky, this may not be repeatable!). However I was not able to build the Ada
frontend, due (AFAICT) to the image being too big for relocations. (Moreover, my 
understanding is that the default memory model for Alpha is the largest memory
model.) I used pristine 4.9.1 sources, host compiler


$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/alpha-linux-gnu/4.9/lto-wrapper
Target: alpha-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.1-1'
--with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.9 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libssp
--disable-libmudflap --disable-libitm --disable-libsanitizer
--disable-libquadmath --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-alpha/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-alpha
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-alpha
--with-arch-directory=alpha --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-long-double-128
--enable-checking=release --build=alpha-linux-gnu --host=alpha-linux-gnu
--target=alpha-linux-gnu
Thread model: posix
gcc version 4.9.1 (Debian 4.9.1-1)

and gnat-4.3 (as per http://archive.debian.net/lenny/alpha/gnat-4.3/download), 
and performed


$ ../gcc-4.9.1/configure --enable-languages=c,ada --disable-bootstrap
--enable-libada --prefix=/home/alan/install --disable-nls --disable-threads
--disable-tls

(config.log attached) then

$ make all-gcc

which after approx 8 hours, finally died with the error message below. Given 
no-one has responded to my previous message, I'm left wondering if Ada still 
builds on Alpha, and if not, then should we be worrying about code generation 
bugs in it? (even, dare I say it, hypothetical code generation bugs?)


--Alan


-
g++ -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W
-Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o gnat1 ada/adadecode.o
ada/adaint.o ada/argv.o ada/cio.o ada/cstreams.o ada/env.o ada/init.o
ada/initialize.o ada/raise.o ada/seh_init.o ada/targext.o ada/cuintp.o
ada/decl.o ada/misc.o ada/utils.o ada/utils2.o ada/trans.o ada/targtyps.o
ada/a-charac.o ada/a-chlat1.o ada/a-elchha.o ada/a-except.o ada/a-ioexce.o
ada/ada.o ada/spark_xrefs.o ada/ali.o ada/alloc.o ada/aspects.o ada/atree.o
ada/butil.o ada/casing.o ada/checks.o ada/comperr.o ada/csets.o ada/cstand.o
ada/debug.o ada/debug_a.o ada/einfo.o ada/elists.o ada/err_vars.o ada/errout.o
ada/erroutc.o ada/eval_fat.o ada/exp_aggr.o ada/exp_spark.o ada/exp_atag.o
ada/exp_attr.o ada/exp_cg.o ada/exp_ch11.o ada/exp_ch12.o ada/exp_ch13.o
ada/exp_ch2.o ada/exp_ch3.o ada/exp_ch4.o ada/exp_ch5.o ada/exp_ch6.o
ada/exp_ch7.o ada/exp_ch8.o ada/exp_ch9.o ada/exp_code.o ada/exp_dbug.o
ada/exp_disp.o ada/exp_dist.o ada/exp_fixd.o ada/exp_imgv.o ada/exp_intr.o
ada/exp_pakd.o ada/exp_prag.o ada/exp_sel.o ada/exp_smem.o ada/exp_strm.o
ada/exp_tss.o ada/exp_util.o ada/exp_vfpt.o ada/expander.o ada/fmap.o
ada/fname-uf.o ada/fname.o ada/freeze.o ada/frontend.o ada/g-byorma.o
ada/g-hesora.o ada/g-htable.o ada/g-spchge.o ada/g-speche.o ada/g-u3spch.o
ada/get_spark_xrefs.o ada/get_targ.o ada/gnat.o ada/gnatvsn.o ada/hostparm.o
ada/impunit.o ada/inline.o ada/interfac.o ada/itypes.o ada/krunch.o ada/layout.o
ada/lib-load.o ada/lib-util.o ada/lib-writ.o ada/lib-xref.o ada/lib.o ada/live.o
ada/namet-sp.o ada/namet.o ada/nlists.o ada/nmake.o ada/opt.o ada/osint-c.o
ada/osint.o ada/output.o ada/par.o ada/par_sco.o ada/prep.o ada/prepcomp.o
ada/put_spark_xrefs.o ada/put_scos.o ada/repinfo.o ada/restrict.o ada/rident.o
ada/rtsfind.o ada/s-addope.o ada/s-assert.o ada/s-bitops.o ada/s-carun8.o
ada/s-casuti.o ada/s-conca2.o ada/s-conca3.o ada/s-conca4.o ada/s-conca5.o
ada/s-conca6.o ada/s-conca7.o ada/s-conca8.o ada/s-conca9.o ada/s-crc32.o
ada/s-crtl.o ada/s-excdeb.o ada/s-except.o ada/s-exctab.o ada/s-htable.o
ada/s-imenne.o ada/s-imgenu.o ada/s-mastop.o ada/s-memory.o ada/s-os_lib.o
ada/s-parame.o ada/s-purexc.o ada/s-restri.o ada/s-secsta.o ada/s-soflin.o
ada/s-sopco3.o ada/s-sopco4.o ada/s-sopco5.o ada/s-stache.o ada/s-stalib.o
ada/s-stoele.o ada/s-strcom.o ada/s-strhas.o ada/s-string.o ada/s-strops.o
ada/s-traent.o ada/s-unstyp.o ada/s-utf_32.o ada/s-valint.o ada/s-valuns.o
ada/s-valuti.o ada/s-wchcnv.o ada/s-wc

Re: Migrating gcc.c-torture

2014-07-24 Thread Mike Stump

On Jul 24, 2014, at 12:06 AM, Jakub Jelinek  wrote:
> On Wed, Jul 23, 2014 at 04:52:23PM -0700, Andrew Pinski wrote:
>>> Comments, objections? Ok to apply the preliminary patch?
>> 
>> Yes, what if you don't move the tests but just change how the .exp to
>> use the same infrastructure as gcc.dg/torture instead?
> 
> Yeah.

I too support upgrade in place.

Re: Does anyone use Ada on Alpha?

2014-07-24 Thread Uros Bizjak

Hello!

> Well, I was lucky enough to gain access to an alpha pca56 for a day (I say
> lucky, this may not be repeatable!). However I was not able to build the Ada
> frontend, due (AFAICT) to the image being too big for relocations. (Moreover,
> my understanding is that the default memory model for Alpha is the largest
> memory model.) I used pristine 4.9.1 sources, host compiler

You will need attached patch.

Uros.
Index: configure.ac
===
--- configure.ac(revision 173233)
+++ configure.ac(working copy)
@@ -1100,6 +1100,9 @@
   *-interix*)
 host_makefile_frag="config/mh-interix"
 ;;
+  alpha*-*-linux*)
+host_makefile_frag="config/mh-alpha-linux"
+;;
   hppa*-hp-hpux10*)
 host_makefile_frag="config/mh-pa-hpux10"
 ;;
Index: configure
===
--- configure   (revision 173233)
+++ configure   (working copy)
@@ -3672,6 +3672,9 @@
   *-interix*)
 host_makefile_frag="config/mh-interix"
 ;;
+  alpha*-*-linux*)
+host_makefile_frag="config/mh-alpha-linux"
+;;
   hppa*-hp-hpux10*)
 host_makefile_frag="config/mh-pa-hpux10"
 ;;
Index: config/mh-alpha-linux
===
--- config/mh-alpha-linux   (revision 0)
+++ config/mh-alpha-linux   (revision 0)
@@ -0,0 +1,3 @@
+# Prevent GPREL16 relocation truncation
+LDFLAGS += -Wl,--no-relax
+BOOT_LDFLAGS += -Wl,--no-relax

Re: [PATCH, rs6000, v2] Fix ELFv2 homogeneous float aggregate ABI bug

2014-07-24 Thread Ulrich Weigand

David Edelsohn wrote:
> On Mon, Jul 14, 2014 at 2:52 PM, Ulrich Weigand  wrote:
> > gcc/ChangeLog:
> >
> > * config/rs6000/rs6000.c (rs6000_function_arg): If a float argument
> > does not fit fully into floating-point registers, and there is still
> > space in the register parameter area, use GPRs to pass those parts
> > of the argument.  Issue -Wpsabi note if any parameter is now treated
> > differently than before.
> > (rs6000_arg_partial_bytes): Update.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/powerpc/ppc64-abi-warn-2.c: New test.
> 
> This patch is okay.

I've now checked in all three ABI patches plus the compat testsuite patch
to mainline, 4.9 branch, and 4.8 branch.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH, go]: Restore bootstrap

2014-07-24 Thread Uros Bizjak

Hello!

2014-07-24  Uros Bizjak  

* go/go-gcc.cc (Gcc_backend::global_variable_set_init): Rename
symtab_get_node to symtab_node::get.

Tested on x86_64-linux-gnu and committed to mainline SVN.

Uros.
Index: go-gcc.cc
===
--- go-gcc.cc   (revision 213019)
+++ go-gcc.cc   (working copy)
@@ -2374,8 +2374,8 @@ Gcc_backend::global_variable_set_init(Bvariable* v
 
   // If this variable goes in a unique section, it may need to go into
   // a different one now that DECL_INITIAL is set.
-  if (symtab_get_node(var_decl)
-  && symtab_get_node(var_decl)->implicit_section)
+  if (symtab_node::get(var_decl)
+  && symtab_node::get(var_decl)->implicit_section)
 {
   set_decl_section_name (var_decl, NULL);
   resolve_unique_section (var_decl,

[PATCH, fortran]: Move menu entry to avoid intrinsic.texi warnings

2014-07-24 Thread Uros Bizjak

Hello!

Attached patch avoids a bunch of:

gcc/fortran/intrinsic.texi:1665: warning: node next `ATOMIC_CAS' in
menu `ATOMIC_FETCH_ADD' and in sectioning `ATOMIC_DEFINE' differ

warnings.

2014-07-24  Uros Bizjak  

* intrinsic.texi (Intrinsic Procedures) : Move to
correct menu position to match sectioning.

Tested on x86_64-linux-gnu with Makeinfo 5.2 and committed as obvious.

Uros.
Index: intrinsic.texi
===
--- intrinsic.texi  (revision 213019)
+++ intrinsic.texi  (working copy)
@@ -63,12 +63,12 @@
 * @code{ATOMIC_ADD}:ATOMIC_ADD, Atomic ADD operation
 * @code{ATOMIC_AND}:ATOMIC_AND, Atomic bitwise AND operation
 * @code{ATOMIC_CAS}:ATOMIC_CAS, Atomic compare and swap
+* @code{ATOMIC_DEFINE}: ATOMIC_DEFINE, Setting a variable atomically
 * @code{ATOMIC_FETCH_ADD}: ATOMIC_FETCH_ADD, Atomic ADD operation with prior fetch
 * @code{ATOMIC_FETCH_AND}: ATOMIC_FETCH_AND, Atomic bitwise AND operation with prior fetch
 * @code{ATOMIC_FETCH_OR}: ATOMIC_FETCH_OR, Atomic bitwise OR operation with prior fetch
 * @code{ATOMIC_FETCH_XOR}: ATOMIC_FETCH_XOR, Atomic bitwise XOR operation with prior fetch
 * @code{ATOMIC_OR}: ATOMIC_OR, Atomic bitwise OR operation
-* @code{ATOMIC_DEFINE}: ATOMIC_DEFINE, Setting a variable atomically
 * @code{ATOMIC_REF}:ATOMIC_REF, Obtaining the value of a variable atomically
 * @code{ATOMIC_XOR}:ATOMIC_XOR, Atomic bitwise OR operation
 * @code{BACKTRACE}: BACKTRACE, Show a backtrace

Re: [Patch, avr] Add device name to cpp_builtins

2014-07-24 Thread Denis Chertykov

2014-07-23 12:04 GMT+04:00 Senthil Kumar Selvaraj
:
> The below patch adds a new preprocessor define for the device name
> (__AVR_DEVICE_NAME__) that was passed to the compiler.
>
> While the device name macro (say __AVR_ATmega128__) can be used to
> check for a specific device, there is no way right now for code
> to get the device name it is being compiled against (without checking
> for every possible device).
>
> This patch is groundwork for embedding device information in a note
> section (see binutils ml discussion
> https://www.sourceware.org/ml/binutils/2014-07/msg00146.html), so that
> utilities that operate on the ELF file do not have to hardcode
> device information themselves.
>
> If ok, could someone apply please? I don't have commit access.
>
> Regards
> Senthil
>
> 2014-07-23  Senthil Kumar Selvaraj  
>
> * config/avr/avr-c.c (avr_cpu_cpp_builtins): Add __AVR_DEVICE_NAME__.
>

Committed.

Denis.
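As an illustration of what the new macro enables, here is a sketch of how code might turn it into a string.  It assumes, as the GCC AVR documentation describes, that __AVR_DEVICE_NAME__ expands to the bare device name given to -mmcu= (for example atmega8) rather than to a string literal, so the usual two-level stringification is needed; DEVICE_STR, avr_device_name and compiled_for_device_p are hypothetical names.  When the macro is not predefined (e.g. on a non-AVR host) the fallback is used.

```cpp
#include <cassert>
#include <cstring>

// Two levels so that the argument is macro-expanded before '#' applies.
#define DEVICE_STR_(x) #x
#define DEVICE_STR(x) DEVICE_STR_ (x)

// Device name passed via -mmcu=, or "unknown" if the compiler does not
// predefine __AVR_DEVICE_NAME__ (e.g. when not targeting AVR).
inline const char *
avr_device_name ()
{
#ifdef __AVR_DEVICE_NAME__
  return DEVICE_STR (__AVR_DEVICE_NAME__);
#else
  return "unknown";
#endif
}

// True if this translation unit was compiled for device NAME.
inline bool
compiled_for_device_p (const char *name)
{
  return std::strcmp (avr_device_name (), name) == 0;
}
```

Under that assumption, building with -mmcu=atmega128 would make avr_device_name () return "atmega128" without testing each per-device macro.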

Re: [PATCH, go]: Restore bootstrap

2014-07-24 Thread Martin Liška


On 07/24/2014 07:57 PM, Uros Bizjak wrote:

Hello!

2014-07-24  Uros Bizjak  

 * go/go-gcc.cc (Gcc_backend::global_variable_set_init): Rename
 symtab_get_node to symtab_node::get.

Tested on x86_64-linux-gnu and committed to mainline SVN.

Uros.

Hello,
Thank you for your fix. My configure output is:

$ ../configure --disable-bootstrap --enable-checking=release --enable-languages=all --disable-multilib


The following languages will be built: c,c++,fortran,java,lto,objc
*** This configuration is not supported in the following subdirectories:
 gnattools target-libada target-libgo target-libbacktrace
(Any other directories should still work fine.)

I thought 'all' covered all front ends. Is it really intended
behavior that Go is disabled by 'all'?


Thank you,
Martin

Re: Strenghten assumption about dynamic type changes (placement new)

2014-07-24 Thread Jason Merrill


On 07/23/2014 07:29 AM, Richard Biener wrote:

On Wed, Jul 23, 2014 at 12:44 PM, Jason Merrill  wrote:

On 07/22/2014 02:34 PM, Richard Biener wrote:


As discussed during the Cauldron, keeping some builtin doesn't help because
you are not forced to access the newly created object via the pointer
returned by the placement new.  That is,

template <typename T>
struct Storage {
  char x[sizeof(T)];
  Storage() { new (x) T; }
  T& get() { return reinterpret_cast<T&> (x); }
};

is valid


Yes.


(and used in this way in Boost - with a type different from 'char'
to force bigger alignment).


But I don't think that should be valid, unless the type contains a char
array at offset 0, as {std,boost}::aligned_storage; the C++ standard needs
improvement in this area.


Why especially at offset 0?  I'm constructing in the place of 'x', not
'this'.


Right, and I'm talking about the type of 'x', not the type of *this.


Do you say that

template <typename T>
struct Storage {
   T& get(int i) { return *new (x + sizeof (T) * i) T; }
   Storage (int n_) : n (n_) {}
   int n;
   char x[sizeof (T)];
};

and doing

   Storage *s = new (malloc (sizeof (int)  * 4)) Storage (4);
   s->get (2);

isn't valid?


That's fine.


Looks like the small buffer optimization in boost::spirit::hold_any would
need to be tweaked, as it uses a void* to store anything the same size or
smaller, but that's the only dodgy case I see.


I've seen other odd cases in GCC bugreports ultimately coming from
Boost & friends (mpl or whatnot).  Very likely older Boost versions
of course.

Btw, is there any reason why the standard treats 'char' and 'unsigned char'
specially but not 'signed char'?


I think we'd prefer to only treat unsigned char specially, but plain 
char is also allowed for historical reasons.



That said, as a matter of QOI I think only special-casing character
types would be a bad thing (see your hold_any example).


Well, there's a tradeoff between expressiveness and optimization.  But 
perhaps you have a better sense of that than I.


Jason
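For comparison, here is a sketch of the conservative form of the pattern discussed in this thread: the buffer is an unsigned char array (one of the character types the standard treats specially), it is aligned for T with alignas, and the object is only ever accessed through the pointer that placement new returned, never by casting the buffer back to T&.  The name holder is hypothetical; this is not Boost's or libstdc++'s implementation, and it requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

template <typename T>
class holder
{
  // Raw storage, suitably sized and aligned for one T.
  alignas (T) unsigned char buf_[sizeof (T)];
  // The pointer returned by placement new; all accesses go through it.
  T *obj_;

public:
  explicit holder (const T &value)
    : obj_ (::new (static_cast<void *> (buf_)) T (value))
  {}

  ~holder () { obj_->~T (); }

  // Copying would duplicate obj_, which points into this->buf_.
  holder (const holder &) = delete;
  holder &operator= (const holder &) = delete;

  T &get () { return *obj_; }
};

// Usage: construct a std::string in place and modify it through get ().
inline std::size_t
holder_demo ()
{
  holder<std::string> h ("hello");
  h.get () += ", world";
  assert (h.get () == "hello, world");
  return h.get ().size ();
}
```

Keeping the returned pointer costs one extra word per object; that is the expressiveness/optimization tradeoff in miniature.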

Re: testsuite allocators patch

2014-07-24 Thread François Dumont


On 24/07/2014 10:55, Jonathan Wakely wrote:

On 23/07/14 22:33 +0200, François Dumont wrote:
   I have a small question regarding some code next to the one I am 
modifying in this patch. I can see lines like:


 propagating_allocator() noexcept = default;

   When using a default implementation, shouldn't we let the compiler
decide whether it should be noexcept or not, depending on the member fields
or base class default constructors?


Stating it explicitly means you get an error if the default
implementation is not noexcept. That can be useful, to ensure you
don't silently start getting a throwing constructor by mistake because
of a change to a base class.

I'm not sure if I added the noexcept above, but if I did that might
have been what I was intending it to do. I don't remember.
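A minimal sketch of the trade-off, with hypothetical type names: when a defaulted constructor has no noexcept, its exception specification is deduced from the members (and bases), so it quietly changes when they change; writing noexcept states the intent.  What compilers do when an explicitly noexcept defaulted constructor cannot keep that promise has varied between C++ standard revisions, so this sketch does not instantiate that case and instead checks the property directly with std::is_nothrow_default_constructible, which is available from C++11 on.

```cpp
#include <type_traits>

// Hypothetical member types: one whose default constructor is not
// declared noexcept (so it may throw), one whose constructor cannot.
struct may_throw_member { may_throw_member () {} };
struct nothrow_member { nothrow_member () noexcept {} };

// No explicit noexcept: deduced from the members.  The member is
// private, so the class is never an aggregate.
template <typename Member>
class deduced_alloc
{
  Member state_;

public:
  deduced_alloc () = default;
};

// Explicit noexcept, as in the testsuite allocator quoted above.
template <typename Member>
class pinned_alloc
{
  Member state_;

public:
  pinned_alloc () noexcept = default;
};

static_assert (std::is_nothrow_default_constructible<
                 deduced_alloc<nothrow_member>>::value,
               "deduced: noexcept because the member is");
static_assert (!std::is_nothrow_default_constructible<
                 deduced_alloc<may_throw_member>>::value,
               "deduced: silently became potentially-throwing");
static_assert (std::is_nothrow_default_constructible<
                 pinned_alloc<nothrow_member>>::value,
               "explicit noexcept");
```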

I'll review the rest of the patch ASAP. Did you test it with no other
changes in your tree, and run the entire testsuite?


Ok, thanks for the explanation, it is clear now.

Yes, I have tested with no other changes in my tree and got only these
pretty-printer errors, which I think are unrelated:


Python Exception  iter() returned non-iterator of 
type '_contained':

$2 = std::experimental::optional [no contained value]
skipping: Python Exception  iter() returned 
non-iterator of type '_contained':

got: $2 = std::experimental::optional [no contained value]
PASS: libstdc++-prettyprinters/libfundts.cc print o
Python Exception  iter() returned non-iterator of 
type '_contained':

$3 = std::experimental::optional
skipping: Python Exception  iter() returned 
non-iterator of type '_contained':

got: $3 = std::experimental::optional
FAIL: libstdc++-prettyprinters/libfundts.cc print ob
Python Exception  iter() returned non-iterator of 
type '_contained':

$4 = std::experimental::optional
skipping: Python Exception  iter() returned 
non-iterator of type '_contained':

got: $4 = std::experimental::optional
FAIL: libstdc++-prettyprinters/libfundts.cc print oi

François

Re: werror fallout for cross-builds (was: Re: [BUILDROBOT][PATCH] Fix mmix (unused variable))

2014-07-24 Thread Hans-Peter Nilsson

On Thu, 24 Jul 2014, Jan-Benedict Glaw wrote:
> On Tue, 2014-07-22 16:40:31 -0400, Hans-Peter Nilsson  
> wrote:
> > Jan-Benedict, which host gcc version do you use when getting
> > most targets to build with config-list.mk?  Maybe we can just
> > set the initial version to that instead of 4.4.4.
>
> darkeye   gcc (Debian 4.8.1-7) 4.8.1
> gccbuild  gcc (Debian 4.8.1-7) 4.8.1
> pluto gcc (Debian 4.9.1-1) 4.9.1
> gcc20 gcc (Debian 4.4.5-8) 4.4.5
> gcc76 gcc (Debian 4.4.5-8) 4.4.5
> gcc110gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8)
> gcc111gcc (GCC) 4.8.1
>   XL 12.1.0.0 (if I ever get that properly working...)

I tried to repeat that, for the CFarm hosts.  On gcc111 trying
config-list.mk on the 0720 snapshot (and with mpc, mpfr and gmp
in-tree) gives me: "configure: error: GNAT is required to build
ada" already for aarch64-unknown-elf.  Somewhat expected, as I
don't think many of the targets in the config-list.mk LIST have
Ada bits ported, but maybe there are no Ada-specific bits needed
to build the GNAT compiler proper, just a host GNAT.

On gcc110 which *has* gnat, I get:

/gcc/o/aarch64-elf/./mpfr -I/home/hp/gcc/gcc/mpfr -I/opt/cfarm/mpc/include  
-I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../../gcc/gcc/../libbacktrace-o dwarf2out.o -MT 
dwarf2out.o -MMD -MP -MF ./.deps/dwarf2out.TPo ../../../gcc/gcc/dwarf2out.c
In file included from ../../../gcc/gcc/real.h:25:0,
 from ../../../gcc/gcc/rtl.h:27,
 from ../../../gcc/gcc/dwarf2out.c:62:
../../../gcc/gcc/wide-int.h: In function 'void insert_wide_int(const wide_int&, unsigned char*, int)':
../../../gcc/gcc/wide-int.h:800:48: error: array subscript is above array bounds [-Werror=array-bounds]
cc1plus: all warnings being treated as errors
gmake[2]: *** [dwarf2out.o] Error 1
gmake[2]: Leaving directory `/home/hp/gcc/o/aarch64-elf/gcc'

By that list, did you really mean that you got even 4.4.5 to
work on an unmodified config-list.mk?

Perhaps you have local patches or did you call config-list.mk
with some kind of options?  Maybe you didn't actually use
config-list.mk?  Or just looked to see whether the first failure
for each target was on a target-specific file or the (same)
middle-end bits?  Ok, I'm out of guesses. :)

> > For reference, the patch, which works as intended (-Werror in
> > the gcc build directory for cross-builds by default, not
> > affecting native builds and not at all for gcc < 4.4.4).  (Vax
> > is excepted, see J-B's previous post.)  I'd ask for approval
>
> VAX should work; it's pdp11 that I said wouldn't.

Sorry I misremembered.

brgds, H-P

Re: Patch for constexpr variable templates

2014-07-24 Thread Jason Merrill


First of all, thanks a lot for taking this on!  A few nitpicks:

On 07/21/2014 11:06 PM, Braden Obrzut wrote:

 grokvardecl (tree type,
 tree name,
+tree orig_declarator,
 const cp_decl_specifier_seq *declspecs,
 int initialized,
 int constp,
+int template_count,


Indentation mismatch.


+  if (orig_declarator && (processing_template_decl
+  || TREE_CODE (orig_declarator) == TEMPLATE_ID_EXPR))


The indentation is wrong, since the || is part of the inner 
parenthesized expression.



+  /* Variable templates will need to have the class context.  */
+  if (VAR_P (value))
+DECL_CONTEXT (value) = current_class_type;


Why isn't this covered by grokvardecl?


-  if (!is_overloaded_fn (fns))
+  bool var_templ = variable_template_p (fns);
+  if (!is_overloaded_fn (fns) && !var_templ)
 {
   error ("%qD is not a function template", fns);


I think here we should check whether 'fns' and 'decl' are both variables 
or both functions so that we can give a better diagnostic.



+  if (cxx_dialect < cxx1y)
+permerror (DECL_SOURCE_LOCATION (decl),
+   "%qD is not a static data member of a class template", 
decl);


It's customary to use pedwarn and mention the relevant -std= flag.


+  if ((specialization || member_specialization)
+ /* Variable templates don't apply.  */
+ && (TREE_CODE (TREE_TYPE (decl)) == FUNCTION_TYPE
+ || TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE))


Indentation mismatch.


+  // Namespace scope variable templates should have a template header.


"Namespace-scope"


+ if (!variable_template_p (tmpl)
+ && DECL_STATIC_FUNCTION_P (tmpl)

...

+ if (!variable_template_p (tmpl))
+   copy_default_args_to_explicit_spec (decl);


I'd prefer to check for a function template here rather than "not a 
variable template".



+  else if (VAR_P (decl))
+{
+  if (!DECL_DECLARED_CONSTEXPR_P (decl))
+error ("template declaration of non-constexpr variable %qD", decl);
+}


As Ed and Andrew pointed out, non-constexpr variable templates are fine.


+  bool var_templ = DECL_TEMPLATE_INFO (decl)
+   && variable_template_p (DECL_TI_TEMPLATE (decl));


An expression on multiple lines should be parenthesized to preserve 
indentation.



+ bool enter_context = CLASS_TYPE_P (DECL_CONTEXT (d));


You can use DECL_CLASS_SCOPE_P.


+  return instantiate_template (TREE_OPERAND (var, 0), TREE_OPERAND (var, 1), tf_error);


Line too long.

Jason

Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-24 Thread Ulrich Drepper

On Wed, Jul 23, 2014 at 6:29 AM, Jonathan Wakely  wrote:
> As an aside, we already have divide-by-zero bugs in , it
> would be nice if someone could look at that.

I'll take a look at this soon.

Re: C++ PATCH for c++/61687 (extra errors with -O2)

2014-07-24 Thread Jason Merrill


On 07/17/2014 08:23 AM, Jan Hubicka wrote:

> Given my experience about numbers of functions that become reachable when
> you stream all virtuals into LTO, I wonder if we don't want to use
> possible_polymorphic_call_targets within the front-end to avoid
> instantiating those that can't be called?


Yes, I think we need to do that.  I'll look into it.


> I think it should not be too hard - all we need is to populate the type
> inheritance graph from FE and then for each polymorphic call produce the
> list to mark possible targets are reachable.


Hmm, do you think it's reasonable to call build_type_inheritance_graph 
from the FE?  The FE doesn't currently track all types derived from a 
particular base.


Jason

Re: [PATCH 2/3] PR other/61321 - demangler crash on casts in template parameters

2014-07-24 Thread Cary Coutant

> It seems that the problem here is more general; a template argument list is
> not in scope within that same template argument list.  Can't we fix that
> without special-casing conversion ops?

I think conversion ops really are a special case. It's the only case
where the template parameters refer to the template argument list from
the cast operator's enclosing template. In a cast expression, like
anywhere else you might have a template parameter, the template
parameter refers to the template argument list of the immediately
enclosing template.

I think this note from Section 5.1.3 (Operator Encodings) of the ABI
is what makes this a special case (it's an informative comment in the
document, but seems to me to be normative):

"For a user-defined conversion operator the result type (i.e., the
type to which the operator converts) is part of the mangled name of
the function. If the conversion operator is a member template, the
result type will appear before the template parameters. There may be
forward references in the result type to the template parameters."
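The note is easiest to picture with a concrete member-template conversion operator. The sketch below is illustrative only (Wrapper, as_double and as_long are made-up names): each conversion instantiates a specialization of the operator whose result type is the operator's own template parameter T, which is exactly the situation where, per the quoted note, the mangled result type appears before the template arguments and refers forward to them.

```cpp
#include <cassert>

// A class with a member-template conversion operator.  The result type
// of every instantiation is spelled in terms of the operator's own
// template parameter T, so its mangled name contains a forward
// reference to the template argument list that follows.
struct Wrapper
{
  int value;

  template <typename T>
  operator T () const { return static_cast<T> (value); }
};

// Each function instantiates a different specialization of
// Wrapper::operator T, with T deduced from the target type.
inline double as_double (const Wrapper &w) { return w; }
inline long as_long (const Wrapper &w) { return w; }
```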

-cary

Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-07-24 Thread Cesar Philippidis

On 07/24/2014 06:11 AM, Thomas Schwinge wrote:

> OMP_LIST_DEVICEPTR remains to be converted, which can be done as a later
> follow-up patch.

Yes, that's the plan.

> I'd suggest to continue to handle all the data clauses...
> 
>>  
>>  /* Match OpenMP and OpenACC directive clauses. MASK is a bitmask of
>> clauses that are allowed for a particular directive.  */
>>  
>>  static match
>>  gfc_match_omp_clauses (gfc_omp_clauses **cp, unsigned long long mask,
>> -   bool first = true, bool needs_space = true)
>> +   bool first = true, bool needs_space = true,
>> +   bool openacc = false)
>>  {
>>gfc_omp_clauses *c = gfc_get_omp_clauses ();
>>locus old_loc;
>> @@ -533,181 +692,109 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, unsigned long long mask,
>>if ((mask & OMP_CLAUSE_NUM_THREADS) && c->num_threads == NULL
>>&& gfc_match ("num_threads ( %e )", &c->num_threads) == MATCH_YES)
>>  continue;
>> +  if ((mask & OMP_CLAUSE_NUM_GANGS) && c->num_gangs_expr == NULL
>> +  && gfc_match ("num_gangs ( %e )", &c->num_gangs_expr) == MATCH_YES)
>> +continue;
>> +  if ((mask & OMP_CLAUSE_NUM_WORKERS) && c->num_workers_expr == NULL
>> +  && gfc_match ("num_workers ( %e )", &c->num_workers_expr)
>> +  == MATCH_YES)
>> +continue;
>> +  if ((mask & OMP_CLAUSE_TILE)
>> +  && match_oacc_expr_list ("tile (", &c->tile_list, true) == MATCH_YES)
>> +continue;
>> +  if ((mask & OMP_CLAUSE_SEQ) && !c->seq
>> +  && gfc_match ("seq") == MATCH_YES)
>> +{
>> +  c->seq = true;
>> +  needs_space = true;
>> +  continue;
>> +}
>> +  if ((mask & OMP_CLAUSE_INDEPENDENT) && !c->independent
>> +  && gfc_match ("independent") == MATCH_YES)
>> +{
>> +  c->independent = true;
>> +  needs_space = true;
>> +  continue;
>> +}
>> +  if ((mask & OMP_CLAUSE_AUTO) && !c->par_auto
>> +&& gfc_match ("auto") == MATCH_YES)
>> +{
>> +  c->par_auto = true;
>> +  needs_space = true;
>> +  continue;
>> +}
>> +  if ((mask & OMP_CLAUSE_WAIT) && !c->wait
>> +&& gfc_match ("wait") == MATCH_YES)
>> +{
>> +  c->wait = true;
>> +  match_oacc_expr_list (" (", &c->wait_list, false);
>> +  continue;
>> +}
>> +  /* Common, in the sense that no special handling is required,
>> + OpenACC and OpenMP data clauses.  */
>>if ((mask & OMP_CLAUSE_PRIVATE)
>>&& gfc_match_omp_variable_list ("private (",
>>&c->lists[OMP_LIST_PRIVATE], true)
>> - == MATCH_YES)
>> +  == MATCH_YES)
>>  continue;
>>if ((mask & OMP_CLAUSE_FIRSTPRIVATE)
>>&& gfc_match_omp_variable_list ("firstprivate (",
>>&c->lists[OMP_LIST_FIRSTPRIVATE],
>>true)
>> - == MATCH_YES)
>> +  == MATCH_YES)
>>  continue;
>>if ((mask & OMP_CLAUSE_LASTPRIVATE)
>>&& gfc_match_omp_variable_list ("lastprivate (",
>>&c->lists[OMP_LIST_LASTPRIVATE],
>>true)
>> - == MATCH_YES)
>> +  == MATCH_YES)
>>  continue;
>>if ((mask & OMP_CLAUSE_COPYPRIVATE)
>>&& gfc_match_omp_variable_list ("copyprivate (",
>>&c->lists[OMP_LIST_COPYPRIVATE],
>>true)
>> - == MATCH_YES)
>> +  == MATCH_YES)
>>  continue;
>>if ((mask & OMP_CLAUSE_SHARED)
>>&& gfc_match_omp_variable_list ("shared (",
>>&c->lists[OMP_LIST_SHARED], true)
>> - == MATCH_YES)
>> -continue;
>> -  if ((mask & OMP_CLAUSE_COPYIN)
>> -  && gfc_match_omp_variable_list ("copyin (",
>> -  &c->lists[OMP_LIST_COPYIN], true)
>> - == MATCH_YES)
>> -continue;
>> -  if ((mask & OMP_CLAUSE_NUM_GANGS) && c->num_gangs_expr == NULL
>> -  && gfc_match ("num_gangs ( %e )", &c->num_gangs_expr) == MATCH_YES)
>> -continue;
>> -  if ((mask & OMP_CLAUSE_NUM_WORKERS) && c->num_workers_expr == NULL
>> -  && gfc_match ("num_workers ( %e )", &c->num_workers_expr)
>>== MATCH_YES)
>>  continue;
>> -  if ((mask & OMP_CLAUSE_COPY)
>> -  && gfc_match_omp_variable_list ("copy (",
>> -  &c->lists[OMP_LIST_COPY], true)
>> - == MATCH_YES)
>> -continue;
>> -  if ((mask & OMP_CLAUSE_OACC_COPYIN)
>> -  && gfc_match_omp_variable_list ("copyin (",
>> -  &c->lists[OMP_LIST_OACC_COPYIN], true)
>> - == MATCH_YES)
>> -continue;
>> -  if ((mask & OMP_CLAUSE_COPYOUT)
>> -  && gfc_match_omp_variable_list ("copyout (",
>> -  &c->lists[OMP_LIST_COPYOUT], true)
>> - ==

Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-24 Thread Jonathan Wakely

On 24 July 2014 22:15, Ulrich Drepper wrote:
> On Wed, Jul 23, 2014 at 6:29 AM, Jonathan Wakely  wrote:
>> As an aside, we already have divide-by-zero bugs in , it
>> would be nice if someone could look at that.
>
> I'll take a look at this soon.

That would be great, thanks!

Re: testsuite allocators patch

2014-07-24 Thread Jonathan Wakely

On 24 July 2014 21:11, François Dumont wrote:
>
> Yes I have tested with no other changes in my tree and got only those pretty
> printers errors which are unrelated I think:
>
> Python Exception  iter() returned non-iterator of type
> '_contained':
> $2 = std::experimental::optional [no contained value]

I haven't seen these; I'll fix them on Monday, thanks.

Re: C++ PATCH for c++/61687 (extra errors with -O2)

2014-07-24 Thread Jan Hubicka

> >Given my experience about numbers of functions that become reachable when 
> >you stream all virtuals into LTO,
> >I wonder if we don't want to use possible_polymorphic_call_targets within 
> >the front-end to avoid instantiating
> >those that can't be called?
> 
> Yes, I think we need to do that.  I'll look into it.
> 
> >I think it should not be too hard - all we need is to populate the type 
> >inheritance graph from FE and then
> >for each polymorphic call produce the list to mark possible targets are 
> >reachable.
> 
> Hmm, do you think it's reasonable to call
> build_type_inheritance_graph from the FE?  The FE doesn't currently
> track all types derived from a particular base.

ipa-devirt has code to do the tracking for you.  All that is needed is to call
get_odr_type for every polymorphic type that we care about.
build_type_inheritance_graph just walks all virtual methods to register all
polymorphic types that matter (i.e. have methods associated with them) into
the data structure.  I think within the C++ FE we can just register all
polymorphic types we produce.

I see that new types may be instantiated during the reachability walk.  This
may extend target lists visited earlier; I guess we need to add some way to
track this?
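A hypothetical, much-simplified model of the idea: register each polymorphic type with its base and the virtual methods it defines, then take the possible targets of a virtual call to be the implementation the static type inherits plus every override in a type derived from it. TypeGraph, add_type and possible_targets are invented names, not ipa-devirt's API (get_odr_type and possible_polymorphic_call_targets are the real entry points discussed here and differ considerably in detail), and only single inheritance is modelled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical single-inheritance type graph; not GCC code.
class TypeGraph
{
 public:
  // Register TYPE deriving from BASE ("" for a root type) together with
  // the virtual methods TYPE itself defines or overrides.
  void add_type (const std::string &type, const std::string &base,
                 const std::set<std::string> &methods)
  {
    base_[type] = base;
    methods_[type] = methods;
    if (!base.empty ())
      derived_[base].push_back (type);
  }

  // Every "Type::method" a virtual call of METHOD through an object whose
  // static type is STATIC_TYPE could dispatch to.
  std::set<std::string> possible_targets (const std::string &static_type,
                                          const std::string &method) const
  {
    std::set<std::string> targets;
    // The implementation STATIC_TYPE itself uses: the nearest definer
    // found by walking up the base chain.
    for (std::string t = static_type; !t.empty (); t = base_.at (t))
      if (methods_.at (t).count (method))
        {
          targets.insert (t + "::" + method);
          break;
        }
    // Plus any override in a (transitively) derived type.
    collect_overrides (static_type, method, targets);
    return targets;
  }

 private:
  void collect_overrides (const std::string &type, const std::string &method,
                          std::set<std::string> &targets) const
  {
    auto it = derived_.find (type);
    if (it == derived_.end ())
      return;
    for (const std::string &d : it->second)
      {
        if (methods_.at (d).count (method))
          targets.insert (d + "::" + method);
        collect_overrides (d, method, targets);
      }
  }

  std::map<std::string, std::string> base_;
  std::map<std::string, std::set<std::string>> methods_;
  std::map<std::string, std::vector<std::string>> derived_;
};
```

Registering a derived type after a target list has been computed (as when the reachability walk instantiates new types) enlarges the answer for its base, which is the staleness problem raised in the paragraph above.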

Honza
> 
> Jason

Updated incremental hash patchkit

2014-07-24 Thread Andi Kleen

This version addresses the review feedback. begin is gone now.
add_flag is in the class.  The changes in tree.c are nearer
the original code now. Some other minor cleanups.

Passed bootstrap and test on x86_64-linux. Ok to commit
now?

Thanks,
-Andi

[PATCH 2/4] Convert LTO type hashing to the new inchash interface

2014-07-24 Thread Andi Kleen

From: Andi Kleen 

It should not really change any behavior; it's just a more abstract
interface over the same underlying hash functions.

lto/:

2014-07-24  Andi Kleen  

* lto.c (hash_canonical_type): Convert to inchash.
(iterative_hash_canonical_type): Ditto.
---
 gcc/lto/lto.c | 54 --
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9781653..48fb78e 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -267,7 +267,7 @@ static hash_map *canonical_type_hash_cache;
 static unsigned long num_canonical_type_hash_entries;
 static unsigned long num_canonical_type_hash_queries;
 
-static hashval_t iterative_hash_canonical_type (tree type, hashval_t val);
+static void iterative_hash_canonical_type (tree type, inchash &hstate);
 static hashval_t gimple_canonical_type_hash (const void *p);
 static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
 
@@ -279,14 +279,14 @@ static void gimple_register_canonical_type_1 (tree t, hashval_t hash);
 static hashval_t
 hash_canonical_type (tree type)
 {
-  hashval_t v;
+  inchash hstate;
 
   /* Combine a few common features of types so that types are grouped into
  smaller sets; when searching for existing matching types to merge,
  only existing types having the same features as the new type will be
  checked.  */
-  v = iterative_hash_hashval_t (TREE_CODE (type), 0);
-  v = iterative_hash_hashval_t (TYPE_MODE (type), v);
+  hstate.add_int (TREE_CODE (type));
+  hstate.add_int (TYPE_MODE (type));
 
   /* Incorporate common features of numerical types.  */
   if (INTEGRAL_TYPE_P (type)
@@ -295,48 +295,50 @@ hash_canonical_type (tree type)
   || TREE_CODE (type) == OFFSET_TYPE
   || POINTER_TYPE_P (type))
 {
-  v = iterative_hash_hashval_t (TYPE_PRECISION (type), v);
-  v = iterative_hash_hashval_t (TYPE_UNSIGNED (type), v);
+  hstate.add_int (TYPE_UNSIGNED (type));
+  hstate.add_int (TYPE_PRECISION (type));
 }
 
   if (VECTOR_TYPE_P (type))
 {
-  v = iterative_hash_hashval_t (TYPE_VECTOR_SUBPARTS (type), v);
-  v = iterative_hash_hashval_t (TYPE_UNSIGNED (type), v);
+  hstate.add_int (TYPE_VECTOR_SUBPARTS (type));
+  hstate.add_int (TYPE_UNSIGNED (type));
 }
 
   if (TREE_CODE (type) == COMPLEX_TYPE)
-v = iterative_hash_hashval_t (TYPE_UNSIGNED (type), v);
+hstate.add_int (TYPE_UNSIGNED (type));
 
   /* For pointer and reference types, fold in information about the type
  pointed to but do not recurse to the pointed-to type.  */
   if (POINTER_TYPE_P (type))
 {
-  v = iterative_hash_hashval_t (TYPE_ADDR_SPACE (TREE_TYPE (type)), v);
-  v = iterative_hash_hashval_t (TREE_CODE (TREE_TYPE (type)), v);
+  hstate.add_int (TYPE_ADDR_SPACE (TREE_TYPE (type)));
+  hstate.add_int (TREE_CODE (TREE_TYPE (type)));
 }
 
   /* For integer types hash only the string flag.  */
   if (TREE_CODE (type) == INTEGER_TYPE)
-v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v);
+hstate.add_int (TYPE_STRING_FLAG (type));
 
   /* For array types hash the domain bounds and the string flag.  */
   if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
 {
-  v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v);
+  hstate.add_int (TYPE_STRING_FLAG (type));
   /* OMP lowering can introduce error_mark_node in place of
 random local decls in types.  */
   if (TYPE_MIN_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
-   v = iterative_hash_expr (TYPE_MIN_VALUE (TYPE_DOMAIN (type)), v);
+   hstate.add_int (iterative_hash_expr (TYPE_MIN_VALUE (
+   TYPE_DOMAIN (type)), 0));
   if (TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
-   v = iterative_hash_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), v);
+   hstate.add_int (iterative_hash_expr (TYPE_MAX_VALUE (
+   TYPE_DOMAIN (type)), 0));
 }
 
   /* Recurse for aggregates with a single element type.  */
   if (TREE_CODE (type) == ARRAY_TYPE
   || TREE_CODE (type) == COMPLEX_TYPE
   || TREE_CODE (type) == VECTOR_TYPE)
-v = iterative_hash_canonical_type (TREE_TYPE (type), v);
+iterative_hash_canonical_type (TREE_TYPE (type), hstate);
 
   /* Incorporate function return and argument types.  */
   if (TREE_CODE (type) == FUNCTION_TYPE || TREE_CODE (type) == METHOD_TYPE)
@@ -346,17 +348,17 @@ hash_canonical_type (tree type)
 
   /* For method types also incorporate their parent class.  */
   if (TREE_CODE (type) == METHOD_TYPE)
-   v = iterative_hash_canonical_type (TYPE_METHOD_BASETYPE (type), v);
+   iterative_hash_canonical_type (TYPE_METHOD_BASETYPE (type), hstate);
 
-  v = iterative_hash_canonical_type (TREE_TYPE (type), v);
+  iterative_hash_canonical_type (TREE_TYPE (type), hstate);
 
   for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TR

[PATCH 1/4] Add an abstract incremental hash data type

2014-07-24 Thread Andi Kleen

From: Andi Kleen 

Some files in gcc, like lto or tree, do large-scale incremental hashing.
The current jhash implementation of this could likely be improved
by using an incremental hash that does not do a full rehashing
for every new value added.

This patch adds a new "inchash" class that abstracts the internal
state of the hash. This makes it easier to plug in new hashes
and also cleans up the code a bit.

Right now it is implemented in the same way as the old
iterative hash in tree.c. The previous iterative hash code
from tree.c has moved into a new separate file, and all
users have been fixed up to include the new header.
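For reference, the old iterative hash folds one value into a running hash with Bob Jenkins' mix step, the same macro that appears in the new inchash.c further down. A standalone sketch, assuming a 32-bit hashval_t (as in libiberty's hashtab.h) and that the combiner seeds the extra state word with the golden-ratio constant and returns the mixed third word; jenkins_mix is a name chosen here, and the body is a reconstruction rather than a verbatim copy of tree.c.

```cpp
#include <cassert>

// Assumption: hashval_t is a 32-bit unsigned int.
typedef unsigned int hashval_t;

// Jenkins' mix step as a function instead of a macro.  The
// & 0xffffffff masks are no-ops for a 32-bit unsigned int but keep the
// arithmetic correct if the type were wider.
static inline void
jenkins_mix (hashval_t &a, hashval_t &b, hashval_t &c)
{
  a -= b; a -= c; a ^= (c >> 13);
  b -= c; b -= a; b ^= (a << 8);
  c -= a; c -= b; c ^= ((b & 0xffffffff) >> 13);
  a -= b; a -= c; a ^= ((c & 0xffffffff) >> 12);
  b -= c; b -= a; b = (b ^ (a << 16)) & 0xffffffff;
  c -= a; c -= b; c = (c ^ (b >> 5)) & 0xffffffff;
  a -= b; a -= c; a = (a ^ (c >> 3)) & 0xffffffff;
  b -= c; b -= a; b = (b ^ (a << 10)) & 0xffffffff;
  c -= a; c -= b; c = (c ^ (b >> 15)) & 0xffffffff;
}

// Fold VAL into the running hash VAL2 and return the new running hash.
hashval_t
iterative_hash_hashval_t (hashval_t val, hashval_t val2)
{
  hashval_t a = 0x9e3779b9;  // the golden ratio; an arbitrary value
  jenkins_mix (a, val, val2);
  return val2;
}
```

Callers chain it as v = iterative_hash_hashval_t (x, v); that explicit threading of v is the pattern the later patches replace with an inchash object.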

It should not significantly change any hashing by itself;
it's mostly a cleanup at this point.

v2: Remove begin. Add commutative interface.
Add merge hash interface.  Add add_flag.
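To make the shape of the v2 interface concrete, here is a hypothetical class using the method names that appear in the patches (add_int, add_wide_int, add, merge, add_flag, commit_flag, end). The bodies are assumptions: the mixing step is a placeholder multiply-xorshift, not the jhash mix in inchash.c, and seeding through the constructor reflects the review request to drop begin().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef unsigned int hashval_t;

// Hypothetical incremental hash object; not GCC's inchash.
class inchash_sketch
{
 public:
  // Construction starts hashing (replaces an explicit begin()).
  explicit inchash_sketch (hashval_t seed = 0) : val (seed), bits (0), nbits (0) {}

  // Fold one integer into the state.
  void add_int (unsigned v) { val = combine (val, v); }

  // Fold a 64-bit value in as two 32-bit halves.
  void add_wide_int (std::uint64_t v)
  {
    add_int (static_cast<unsigned> (v));
    add_int (static_cast<unsigned> (v >> 32));
  }

  // Fold in LEN raw bytes starting at DATA, one byte at a time.
  void add (const void *data, std::size_t len)
  {
    const unsigned char *p = static_cast<const unsigned char *> (data);
    for (std::size_t i = 0; i < len; i++)
      add_int (p[i]);
  }

  // Fold in the final value of another hash.
  void merge (const inchash_sketch &other) { add_int (other.end ()); }

  // Queue one boolean; queued flags are packed into a single word.
  void add_flag (bool flag)
  {
    assert (nbits < 32);
    bits |= static_cast<unsigned> (flag) << nbits++;
  }

  // Hash all queued flags as one integer and reset the queue.
  void commit_flag ()
  {
    add_int (bits);
    bits = 0;
    nbits = 0;
  }

  hashval_t end () const { return val; }

 private:
  // Placeholder mixing step (multiply-xorshift on 64 bits).
  static hashval_t combine (hashval_t h, unsigned v)
  {
    std::uint64_t x = (static_cast<std::uint64_t> (h) << 32) | v;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return static_cast<hashval_t> (x);
  }

  hashval_t val;
  unsigned bits;
  unsigned nbits;
};
```

Because the state lives in the object, callers no longer thread a hashval_t through every call, and swapping the hash algorithm only touches combine().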

gcc/:

2014-07-24  Andi Kleen  

* Makefile.in (OBJS): Add inchash.o.
(PLUGIN_HEADERS): Add inchash.h.
* ipa-devirt.c: Include inchash.h.
* lto-streamer-out.c: Ditto.
* tree-ssa-dom.c: Ditto.
* tree-ssa-pre.c: Ditto.
* tree-ssa-sccvn.c: Ditto.
* tree-ssa-tail-merge.c: Ditto.
* asan.c: Ditto.
* tree.c (iterative_hash_hashval_t): Move to ...
(iterative_hash_host_wide_int): Move to ...
* inchash.c: Here. New file.
* tree.h (iterative_hash_hashval_t): Move to ...
(iterative_hash_host_wide_int): Move to ...
* inchash.h: Here. New file.

gcc/lto/:

2014-07-10  Andi Kleen  

* lto.c: Include inchash.h.
---
 gcc/Makefile.in   |   3 +-
 gcc/asan.c|   1 +
 gcc/inchash.c |  75 +++
 gcc/inchash.h | 129 ++
 gcc/ipa-devirt.c  |   1 +
 gcc/lto-streamer-out.c|   1 +
 gcc/lto/lto.c |   1 +
 gcc/tree-ssa-dom.c|   1 +
 gcc/tree-ssa-pre.c|   1 +
 gcc/tree-ssa-sccvn.c  |   1 +
 gcc/tree-ssa-tail-merge.c |   1 +
 gcc/tree.c|  51 +-
 gcc/tree.h|   3 --
 13 files changed, 215 insertions(+), 54 deletions(-)
 create mode 100644 gcc/inchash.c
 create mode 100644 gcc/inchash.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 187e6b6..4c578b3 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1268,6 +1268,7 @@ OBJS = \
hwint.o \
ifcvt.o \
ree.o \
+   inchash.o \
incpath.o \
init-regs.o \
internal-fn.o \
@@ -3162,7 +3163,7 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   tree-parloops.h tree-ssa-address.h tree-ssa-coalesce.h tree-ssa-dom.h \
   tree-ssa-loop.h tree-ssa-loop-ivopts.h tree-ssa-loop-manip.h \
   tree-ssa-loop-niter.h tree-ssa-ter.h tree-ssa-threadedge.h \
-  tree-ssa-threadupdate.h
+  tree-ssa-threadupdate.h inchash.h
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile
diff --git a/gcc/asan.c b/gcc/asan.c
index 0d78634..fe9a2d5 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "gimple-expr.h"
 #include "is-a.h"
+#include "inchash.h"
 #include "gimple.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
diff --git a/gcc/inchash.c b/gcc/inchash.c
new file mode 100644
index 000..0f8583e
--- /dev/null
+++ b/gcc/inchash.c
@@ -0,0 +1,75 @@
+/* Incremental hashing for jhash.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "hashtab.h"
+#include "inchash.h"
+
+/* Borrowed from hashtab.c iterative_hash implementation.  */
+#define mix(a,b,c) \
+{ \
+  a -= b; a -= c; a ^= (c>>13); \
+  b -= c; b -= a; b ^= (a<< 8); \
+  c -= a; c -= b; c ^= ((b&0xffffffff)>>13); \
+  a -= b; a -= c; a ^= ((c&0xffffffff)>>12); \
+  b -= c; b -= a; b = (b ^ (a<<16)) & 0xffffffff; \
+  c -= a; c -= b; c = (c ^ (b>> 5)) & 0xffffffff; \
+  a -= b; a -= c; a = (a ^ (c>> 3)) & 0xffffffff; \
+  b -= c; b -= a; b = (b ^ (a<<10)) & 0xffffffff; \
+  c -= a; c -= b; c = (c ^ (b>>15)) & 0xffffffff; \
+}
+
+
+/* Produce good hash value combining VAL and VAL2.  */
+hashval_t
+iterative_hash_hashval_t (hashval_t val, hashval_t val2)
+{
+  /* the golden ratio; an arbitrary value.  */
+  hashval_t a = 0x9e3779b9;

[PATCH 4/4] Convert lto streamer out hashing to inchash

2014-07-24 Thread Andi Kleen

From: Andi Kleen 

No substantial changes, although the hash values will be slightly
different.
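One reason the values shift: the old code packed groups of flags into several words by hand and hashed each word, whereas with add_flag every flag is queued and a single commit_flag hashes them as one word. A hypothetical illustration of the words that reach the hash function for six TS_BASE-style flags (the flags struct and both functions are invented for this sketch; the real hash_tree also adds some flags conditionally).

```cpp
#include <cassert>
#include <vector>

// Six boolean tree flags, grouped as in the removed lines of hash_tree.
struct flags
{
  bool side_effects, constant, readonly, is_public, addressable, this_volatile;
};

// Old style: each group is packed by hand and hashed as its own word.
std::vector<unsigned>
old_words (const flags &f)
{
  return { unsigned (f.side_effects) | unsigned (f.constant) << 1
             | unsigned (f.readonly) << 2 | unsigned (f.is_public) << 3,
           unsigned (f.addressable) | unsigned (f.this_volatile) << 1 };
}

// New style: every flag is queued (add_flag) and one commit_flag
// hashes all of them as a single word.
std::vector<unsigned>
new_words (const flags &f)
{
  const bool seq[] = { f.side_effects, f.constant, f.readonly,
                       f.is_public, f.addressable, f.this_volatile };
  unsigned word = 0, n = 0;
  for (bool b : seq)
    word |= unsigned (b) << n++;
  return { word };
}
```

The same flag values feed a different sequence of words into the mix, so the final hash differs even though no information is lost.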

v2: add_flag moved to inchash. Some minor changes.

gcc/:

2014-07-24  Andi Kleen  

* lto-streamer-out.c (hash_tree): Convert to inchash.
---
 gcc/lto-streamer-out.c | 251 +
 1 file changed, 130 insertions(+), 121 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cf2e9a8..21b6e07 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -692,207 +692,213 @@ DFS_write_tree_body (struct output_block *ob,
 static hashval_t
 hash_tree (struct streamer_tree_cache_d *cache, tree t)
 {
+  inchash hstate;
+
 #define visit(SIBLING) \
   do { \
 unsigned ix; \
 if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
-  v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \
+  hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
   } while (0)
 
   /* Hash TS_BASE.  */
   enum tree_code code = TREE_CODE (t);
-  hashval_t v = iterative_hash_host_wide_int (code, 0);
+  hstate.add_int (code);
   if (!TYPE_P (t))
 {
-  v = iterative_hash_host_wide_int (TREE_SIDE_EFFECTS (t)
-   | (TREE_CONSTANT (t) << 1)
-   | (TREE_READONLY (t) << 2)
-   | (TREE_PUBLIC (t) << 3), v);
+  hstate.add_flag (TREE_SIDE_EFFECTS (t));
+  hstate.add_flag (TREE_CONSTANT (t));
+  hstate.add_flag (TREE_READONLY (t));
+  hstate.add_flag (TREE_PUBLIC (t));
 }
-  v = iterative_hash_host_wide_int (TREE_ADDRESSABLE (t)
-   | (TREE_THIS_VOLATILE (t) << 1), v);
+  hstate.add_flag (TREE_ADDRESSABLE (t));
+  hstate.add_flag (TREE_THIS_VOLATILE (t));
   if (DECL_P (t))
-v = iterative_hash_host_wide_int (DECL_UNSIGNED (t), v);
+hstate.add_flag (DECL_UNSIGNED (t));
   else if (TYPE_P (t))
-v = iterative_hash_host_wide_int (TYPE_UNSIGNED (t), v);
+hstate.add_flag (TYPE_UNSIGNED (t));
   if (TYPE_P (t))
-v = iterative_hash_host_wide_int (TYPE_ARTIFICIAL (t), v);
+hstate.add_flag (TYPE_ARTIFICIAL (t));
   else
-v = iterative_hash_host_wide_int (TREE_NO_WARNING (t), v);
-  v = iterative_hash_host_wide_int (TREE_NOTHROW (t)
-   | (TREE_STATIC (t) << 1)
-   | (TREE_PROTECTED (t) << 2)
-   | (TREE_DEPRECATED (t) << 3), v);
+hstate.add_flag (TREE_NO_WARNING (t));
+  hstate.add_flag (TREE_NOTHROW (t));
+  hstate.add_flag (TREE_STATIC (t));
+  hstate.add_flag (TREE_PROTECTED (t));
+  hstate.add_flag (TREE_DEPRECATED (t));
   if (code != TREE_BINFO)
-v = iterative_hash_host_wide_int (TREE_PRIVATE (t), v);
+hstate.add_flag (TREE_PRIVATE (t));
   if (TYPE_P (t))
-v = iterative_hash_host_wide_int (TYPE_SATURATING (t)
- | (TYPE_ADDR_SPACE (t) << 1), v);
+{
+  hstate.add_flag (TYPE_SATURATING (t));
+  hstate.add_flag (TYPE_ADDR_SPACE (t));
+}
   else if (code == SSA_NAME)
-v = iterative_hash_host_wide_int (SSA_NAME_IS_DEFAULT_DEF (t), v);
+hstate.add_flag (SSA_NAME_IS_DEFAULT_DEF (t));
+  hstate.commit_flag ();
 
   if (CODE_CONTAINS_STRUCT (code, TS_INT_CST))
 {
   int i;
-  v = iterative_hash_host_wide_int (TREE_INT_CST_NUNITS (t), v);
-  v = iterative_hash_host_wide_int (TREE_INT_CST_EXT_NUNITS (t), v);
+  hstate.add_wide_int (TREE_INT_CST_NUNITS (t));
+  hstate.add_wide_int (TREE_INT_CST_EXT_NUNITS (t));
   for (i = 0; i < TREE_INT_CST_NUNITS (t); i++)
-   v = iterative_hash_host_wide_int (TREE_INT_CST_ELT (t, i), v);
+   hstate.add_wide_int (TREE_INT_CST_ELT (t, i));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_REAL_CST))
 {
   REAL_VALUE_TYPE r = TREE_REAL_CST (t);
-  v = iterative_hash_host_wide_int (r.cl, v);
-  v = iterative_hash_host_wide_int (r.decimal
-   | (r.sign << 1)
-   | (r.signalling << 2)
-   | (r.canonical << 3), v);
-  v = iterative_hash_host_wide_int (r.uexp, v);
-  for (unsigned i = 0; i < SIGSZ; ++i)
-   v = iterative_hash_host_wide_int (r.sig[i], v);
+  hstate.add_flag (r.cl);
+  hstate.add_flag (r.sign);
+  hstate.add_flag (r.signalling);
+  hstate.add_flag (r.canonical);
+  hstate.commit_flag ();
+  hstate.add_int (r.uexp);
+  hstate.add (r.sig, sizeof (r.sig));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_FIXED_CST))
 {
   FIXED_VALUE_TYPE f = TREE_FIXED_CST (t);
-  v = iterative_hash_host_wide_int (f.mode, v);
-  v = iterative_hash_host_wide_int (f.data.low, v);
-  v = iterative_hash_host_wide_int (f.data.high, v);
+  hstate.add_int (f.mode);
+  hstate.add_int (f.data.low);
+  hstate.add_int (f.data.high)

[PATCH 3/4] Convert the tree.c type hashing over to inchash

2014-07-24 Thread Andi Kleen

From: Andi Kleen 

v2: Use commutative interface. Be much nearer to the old
code.
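The commutative interface matters for operands whose order is irrelevant, e.g. the two operands of a PLUS_EXPR in iterative_hash_expr. A sketch, assuming the helper hashes each operand separately and merges the two sub-hashes in a canonical (smaller-first) order; hasher and hash_binary are hypothetical names and the mixing step is a placeholder.

```cpp
#include <cassert>

typedef unsigned int hashval_t;

// Minimal hypothetical incremental hash with a placeholder mixing step.
class hasher
{
 public:
  void add_int (unsigned v) { val = (val ^ v) * 16777619u + 0x9e3779b9u; }
  void merge (const hasher &other) { add_int (other.end ()); }
  hashval_t end () const { return val; }

  // Merge A and B so that swapping them yields the same state:
  // always fold in the smaller final value first.
  void add_commutative (const hasher &a, const hasher &b)
  {
    if (a.end () > b.end ())
      {
        merge (b);
        merge (a);
      }
    else
      {
        merge (a);
        merge (b);
      }
  }

 private:
  hashval_t val = 0;
};

// Hash a binary expression CODE (LHS, RHS); the operands are treated
// as unordered when COMMUTATIVE is true (PLUS, but not MINUS).
hashval_t
hash_binary (unsigned code, unsigned lhs, unsigned rhs, bool commutative)
{
  hasher h, a, b;
  h.add_int (code);
  a.add_int (lhs);
  b.add_int (rhs);
  if (commutative)
    h.add_commutative (a, b);
  else
    {
      h.merge (a);
      h.merge (b);
    }
  return h.end ();
}
```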

gcc/:

2014-07-24  Andi Kleen  

* tree.c (build_type_attribute_qual_variant): Use inchash.
(type_hash_list): Ditto.
(attribute_hash_list): Ditto.
(iterative_hstate_expr): Ditto.
(iterative_hash_expr): Ditto.
(build_range_type_1): Ditto.
(build_array_type_1): Ditto.
(build_function_type): Ditto.
(build_method_type_directly): Ditto.
(build_offset_type): Ditto.
(build_complex_type): Ditto.
(make_vector_type): Ditto.
* tree.h (iterative_hash_expr): Add compat wrapper.
(iterative_hstate_expr): Add.

lto/:

2014-07-24  Andi Kleen  

* lto.c (hash_canonical_type): Call iterative_hstate_expr.
---
 gcc/lto/lto.c |   6 +-
 gcc/tree.c| 182 --
 gcc/tree.h|  13 -
 3 files changed, 102 insertions(+), 99 deletions(-)

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 48fb78e..40bf2ea 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -327,11 +327,9 @@ hash_canonical_type (tree type)
   /* OMP lowering can introduce error_mark_node in place of
 random local decls in types.  */
   if (TYPE_MIN_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
-   hstate.add_int (iterative_hash_expr (TYPE_MIN_VALUE (
-   TYPE_DOMAIN (type)), 0));
+   iterative_hstate_expr (TYPE_MIN_VALUE (TYPE_DOMAIN (type)), hstate);
   if (TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
-   hstate.add_int (iterative_hash_expr (TYPE_MAX_VALUE (
-   TYPE_DOMAIN (type)), 0));
+   iterative_hstate_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
 }
 
   /* Recurse for aggregates with a single element type.  */
diff --git a/gcc/tree.c b/gcc/tree.c
index e46..e2b43bf 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -231,8 +231,8 @@ static void print_type_hash_statistics (void);
 static void print_debug_expr_statistics (void);
 static void print_value_expr_statistics (void);
 static int type_hash_marked_p (const void *);
-static unsigned int type_hash_list (const_tree, hashval_t);
-static unsigned int attribute_hash_list (const_tree, hashval_t);
+static void type_hash_list (const_tree, inchash &);
+static void attribute_hash_list (const_tree, inchash &);
 
 tree global_trees[TI_MAX];
 tree integer_types[itk_none];
@@ -4593,7 +4593,7 @@ build_type_attribute_qual_variant (tree ttype, tree attribute, int quals)
 {
   if (! attribute_list_equal (TYPE_ATTRIBUTES (ttype), attribute))
 {
-  hashval_t hashcode = 0;
+  inchash hstate;
   tree ntype;
   int i;
   tree t;
@@ -4621,39 +4621,37 @@ build_type_attribute_qual_variant (tree ttype, tree attribute, int quals)
 
   TYPE_ATTRIBUTES (ntype) = attribute;
 
-  hashcode = iterative_hash_object (code, hashcode);
+  hstate.add_int (code);
   if (TREE_TYPE (ntype))
-   hashcode = iterative_hash_object (TYPE_HASH (TREE_TYPE (ntype)),
- hashcode);
-  hashcode = attribute_hash_list (attribute, hashcode);
+   hstate.add_object (TYPE_HASH (TREE_TYPE (ntype)));
+  attribute_hash_list (attribute, hstate);
 
   switch (TREE_CODE (ntype))
{
case FUNCTION_TYPE:
- hashcode = type_hash_list (TYPE_ARG_TYPES (ntype), hashcode);
+ type_hash_list (TYPE_ARG_TYPES (ntype), hstate);
  break;
case ARRAY_TYPE:
  if (TYPE_DOMAIN (ntype))
-   hashcode = iterative_hash_object (TYPE_HASH (TYPE_DOMAIN (ntype)),
- hashcode);
+   hstate.add_object (TYPE_HASH (TYPE_DOMAIN (ntype)));
  break;
case INTEGER_TYPE:
  t = TYPE_MAX_VALUE (ntype);
  for (i = 0; i < TREE_INT_CST_NUNITS (t); i++)
-   hashcode = iterative_hash_object (TREE_INT_CST_ELT (t, i), hashcode);
+   hstate.add_object (TREE_INT_CST_ELT (t, i));
  break;
case REAL_TYPE:
case FIXED_POINT_TYPE:
  {
unsigned int precision = TYPE_PRECISION (ntype);
-   hashcode = iterative_hash_object (precision, hashcode);
+   hstate.add_object (precision);
  }
  break;
default:
  break;
}
 
-  ntype = type_hash_canon (hashcode, ntype);
+  ntype = type_hash_canon (hstate.end(), ntype);
 
   /* If the target-dependent attributes make NTYPE different from
 its canonical type, we will need to use structural equality
@@ -6632,17 +6630,14 @@ decl_debug_args_insert (tree from)
with types in the TREE_VALUE slots), by adding the hash codes
of the individual types.  */
 
-static unsigned int
-type_hash_list (const_tree list, hashval_t hashcode)
+static void
+type_hash_list (const_tree list, inchash &hstate)
 {
   const_tree tail;
 
   for (tail = list; tail; ta

Re: [PATCH, 4.9/4.10] Profile based option tuning

2014-07-24 Thread Pengfei Yuan

No, I didn't.

2014-07-24 16:50 GMT+08:00 Richard Biener :
> On Thu, Jul 24, 2014 at 3:52 AM, Pengfei Yuan <0xcool...@gmail.com> wrote:
>> There are more.
>>
>> In toplev.c:
>>   /* One region RA really helps to decrease the code size.  */
>>   if (flag_ira_region == IRA_REGION_AUTODETECT)
>> flag_ira_region
>>   = optimize_size || !optimize ? IRA_REGION_ONE : IRA_REGION_MIXED;
>
> This could be fixed by moving this to ira.c
>
>> In config/i386/i386.c:
>>   * Assignment of ix86_cost
>>   * Decision of alignment
>
> True, I didn't grep backends.
>
> Did you investigate where the savings come from?  I meanwhile fixed
> the estimate_move_cost bit.
>
> Thanks,
> Richard.
