date:20141112

RE: Small multiplier support in Cortex-M0/1/+

2014-11-12 Thread Hale Wang


> -Original Message-
> From: Hale Wang [mailto:hale.w...@arm.com]
> Sent: Thursday, November 13, 2014 2:16 PM
> To: 'Christophe Lyon'
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: Small multiplier support in Cortex-M0/1/+
> 
> > -Original Message-
> > From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> > Sent: Wednesday, November 12, 2014 9:49 PM
> > To: Hale Wang
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: Small multiplier support in Cortex-M0/1/+
> >
> > On 21 October 2014 12:01, Hale Wang  wrote:
> > > Hi,
> > >
> > > Some configurations of the Cortex-M0 and Cortex-M1 come with a high
> > > latency multiplier. This patch adds support for such configurations.
> > >
> > > Small multiplier means using add/sub/shift instructions to replace
> > > the mul instruction for the MCU that has no fast multiplier.
> > >
> > > The following strategies are adopted in this patch:
> > > 1. Define new CPUs as
> > > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,cortex-m1.
> > > small-
> > > multiply to support small multiplier.
> > > 2. -Os means size is preferred. A threshold of 5 is set which means
> > > it will prevent spliting if ending up with more than 5 instructions.
> > > As for non-OS, there will be no such a limit.
> > >
> > > Some test cases are also added in the testsuite to verify this function.
> > >
> > > Is it ok for trunk?
> > >
> > > Thanks and Best Regards,
> > > Hale Wang
> > >
> > > gcc/ChangeLog:
> > >
> > > 2014-08-29  Hale Wang  
> > >
> > > * config/arm/arm-cores.def: Add support for
> > > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > > cortex-m1.small-multiply.
> > > * config/arm/arm-tables.opt: Regenerate.
> > > * config/arm/arm-tune.md: Regenerate.
> > > * config/arm/arm.c: Update the rtx-costs for MUL.
> > > * config/arm/bpabi.h: Handle
> > > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > > cortex-m1.small-multiply.
> > > * doc/invoke.texi: Document
> > > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > > cortex-m1.small-multiply.
> > > * testsuite/gcc.target/arm/small-multiply-m0-1.c: New test case.
> > > * testsuite/gcc.target/arm/small-multiply-m0-2.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m0-3.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m0plus-1.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m0plus-2.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m0plus-3.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m1-1.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m1-2.c: Likewise.
> > > * testsuite/gcc.target/arm/small-multiply-m1-3.c: Likewise.
> > >
> > >
> >
> ==
> > =
> > > diff --git a/gcc/config/arm/arm-cores.def
> > > b/gcc/config/arm/arm-cores.def index a830a83..af4b373 100644
> > > --- a/gcc/config/arm/arm-cores.def
> > > +++ b/gcc/config/arm/arm-cores.def
> > > @@ -137,6 +137,11 @@ ARM_CORE("cortex-m1",  cortexm1,
> > cortexm1,
> > > 6M, FL_LDSCHED, v6m)
> > >  ARM_CORE("cortex-m0",  cortexm0, cortexm0, 6M,
> > > FL_LDSCHED, v6m)
> > >  ARM_CORE("cortex-m0plus",  cortexm0plus, cortexm0plus, 6M,
> > > FL_LDSCHED, v6m)
> > >
> > > +/* V6M Architecture Processors for small-multiply implementations.  */
> > > +ARM_CORE("cortex-m1.small-multiply",   cortexm1smallmultiply,
> > cortexm1,
> > > 6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > > +ARM_CORE("cortex-m0.small-multiply",   cortexm0smallmultiply,
> > cortexm0,
> > > 6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > > +ARM_CORE("cortex-m0plus.small-multiply",cortexm0plussmallmultiply,
> > > cortexm0plus,6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > > +
> > >  /* V7 Architecture Processors */
> > >  ARM_CORE("generic-armv7-a",genericv7a, genericv7a, 7A,
> > > FL_LDSCHED, cortex)
> > >  ARM_CORE("cortex-a5",  cortexa5, cortexa5, 7A,
> > > FL_LDSCHED, cortex_a5)
> > > diff --git a/gcc/config/arm/arm-tables.opt
> > > b/gcc/config/arm/arm-tables.opt index bc046a0..bd65bd2 100644
> > > --- a/gcc/config/arm/arm-tables.opt
> > > +++ b/gcc/config/arm/arm-tables.opt
> > > @@ -241,6 +241,15 @@ EnumValue
> > >  Enum(processor_type) String(cortex-m0plus) Value(cortexm0plus)
> > >
> > >  EnumValue
> > > +Enum(processor_type) String(cortex-m1.small-multiply)
> > > Value(cortexm1smallmultiply)
> > > +
> > > +EnumValue
> > > +Enum(processor_type) String(cortex-m0.small-multiply)
> > > Value(cortexm0smallmultiply)
> > > +
> > > +EnumValue
> > > +Enum(processor_type) String(cortex-m0plus.small-multiply)
> > > Value(cortexm0plussmallmultiply)
> > > +
> > > +EnumValue
> > >  Enum(processor_type) String(generic-armv7-a) Value(genericv7a)
> > >
> > >  EnumValue
> > > diff --git a/gcc/config/arm/arm-tune.md

Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-12 Thread Richard Henderson

On 11/13/2014 08:49 AM, Zhenqiang Chen wrote:
> After adding HAVE_cbranchcc4, we can just use HAVE_cbranchcc4. No need to
> add a local variable allow_cc_mode.
> 
> Here is the updated patch.

This is ok.

Since I've already approved Ulrich's s390 fix, there should not be a
problem there for long.

r~

RE: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-12 Thread Zhenqiang Chen


> -Original Message-
> From: Richard Henderson [mailto:r...@redhat.com]
> Sent: Thursday, November 06, 2014 4:23 PM
> To: Zhenqiang Chen; 'Jan-Benedict Glaw'; Hartmut Penner; Ulrich Weigand;
> Andreas Krebbel
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390
build)
> 
> On 11/06/2014 08:44 AM, Zhenqiang Chen wrote:
> > Hi,
> >
> > The patch add runtime check to fix s390 build fail
> > (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00050.html).
> >
> > And there is additional code to workaround s390 cstorecc4 issue.
> >
> > Bootstrap and no make check regression on X86-64.
> > Build s390-linux-gnu and s390x-linux-gnu.
> >
> > I do not have env to run full s390 tests. Would anyone in the TO list
> > help to run the s390 tests?
> >
> > Thanks!
> > -Zhenqiang
> >
> > ChangeLog:
> > 2014-11-06  Zhenqiang Chen  
> >
> > * ifcvt.c (noce_emit_cmove, noce_get_alt_condition,
> > noce_get_condition):
> > Allow CC mode if HAVE_cbranchcc4.
> > (noce_emit_store_flag): Change CC REG as cond_complex.
> >
> > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index f4002f9..7f7b7c1 100644
> > --- a/gcc/ifcvt.c
> > +++ b/gcc/ifcvt.c
> > @@ -849,7 +849,10 @@ noce_emit_store_flag (struct noce_if_info
> > *if_info, rtx x, int reversep,
> >enum rtx_code code;
> >
> >cond_complex = (! general_operand (XEXP (cond, 0), VOIDmode)
> > - || ! general_operand (XEXP (cond, 1), VOIDmode));
> > + || ! general_operand (XEXP (cond, 1), VOIDmode)
> > +/* For s390, CC REG is general_operand.  But cstorecc4
> > only
> > +   handles CCZ1, which can not handle others like CCU.
> > */
> > + || GET_MODE_CLASS (GET_MODE (XEXP (cond, 0))) ==
> MODE_CC);
> >
> 
> I'd like to know more about this.  This seems like a mistake in the
backend.
> 
> > +#ifdef HAVE_cbranchcc4
> > +  if (GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
> > + || cmp_b != const0_rtx
> > + || !(HAVE_cbranchcc4))
> > +#endif
> 
> Please add
> 
> #ifndef HAVE_cbranchcc4
> # define HAVE_cbranchcc4
> #endif
> 
> at the top of ifcvt.c along with all of the others.  Then use normal C
tests
> instead of preprocessor tests.

Thanks for the comment.
 
> > +  int allow_cc_mode = false;
> > +#ifdef HAVE_cbranchcc4
> > +  allow_cc_mode = HAVE_cbranchcc4;
> > +#endif
> 
> E.g. this becomes
> 
>   bool allow_cc_mode = HAVE_cbranchcc4;

After adding HAVE_cbranchcc4, we can just use HAVE_cbranchcc4. No need to
add a local variable allow_cc_mode.

Here is the updated patch.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f4002f9..21f08c2 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -75,6 +75,10 @@
+ 1)
 #endif
 
+#ifndef HAVE_cbranchcc4
+#define HAVE_cbranchcc4 0
+#endif
+
 #define IFCVT_MULTIPLE_DUMPS 1
 
 #define NULL_BLOCK ((basic_block) NULL)
@@ -1459,10 +1463,16 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx
x, enum rtx_code code,
   end_sequence ();
 }
 
-  /* Don't even try if the comparison operands are weird.  */
+  /* Don't even try if the comparison operands are weird
+ except that the target supports cbranchcc4.  */
   if (! general_operand (cmp_a, GET_MODE (cmp_a))
   || ! general_operand (cmp_b, GET_MODE (cmp_b)))
-return NULL_RTX;
+{
+  if (!(HAVE_cbranchcc4)
+ || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
+ || cmp_b != const0_rtx)
+   return NULL_RTX;
+}
 
 #if HAVE_conditional_move
   unsignedp = (code == LTU || code == GEU
@@ -1909,7 +1919,7 @@ noce_get_alt_condition (struct noce_if_info *if_info,
rtx target,
 }
 
   cond = canonicalize_condition (if_info->jump, cond, reverse,
-earliest, target, false, true);
+earliest, target, HAVE_cbranchcc4, true);
   if (! cond || ! reg_mentioned_p (target, cond))
 return NULL;
 
@@ -2401,7 +2411,7 @@ noce_get_condition (rtx_insn *jump, rtx_insn
**earliest, bool then_else_reversed
   /* Otherwise, fall back on canonicalize_condition to do the dirty
  work of manipulating MODE_CC values and COMPARE rtx codes.  */
   tmp = canonicalize_condition (jump, cond, reverse, earliest,
-   NULL_RTX, false, true);
+   NULL_RTX, HAVE_cbranchcc4, true);
 
   /* We don't handle side-effects in the condition, like handling
  REG_INC notes and making sure no duplicate conditions are emitted.  */

Re: [Patch, libstdc++/63775] Fix regex bracket expression parsing

2014-11-12 Thread Tim Shen

On Mon, Nov 10, 2014 at 10:32 AM, Jonathan Wakely  wrote:
> OK for trunk with the small change above - thanks!

Committed with comment fix and slight change on testcase
(VERIFY(false) at end of the try block -- must throw).


-- 
Regards,
Tim Shen
commit 63544c468fa784a354def7bc240b2cd93210f6d8
Author: timshen 
Date:   Sat Nov 8 18:19:56 2014 -0800

PR libstdc++/63775
* include/bits/regex_compiler.h (_Compiler<>::_M_expression_term,
_BracketMatcher<>::_M_make_range): Throw regex_erorr on invalid range
like [z-a]. Change _M_expression_term interface.
* include/bits/regex_compiler.tcc (
_Compiler<>::_M_insert_bracket_matcher,
_Compiler<>::_M_expression_term): Rewrite bracket expression parsing.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
Add testcases and move file out of extended.

diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 1bbc09d..d8880cc 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -118,7 +118,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
void
-   _M_expression_term(_BracketMatcher<_TraitsT, __icase, __collate>&
+   _M_expression_term(pair& __last_char,
+  _BracketMatcher<_TraitsT, __icase, __collate>&
   __matcher);
 
   int
@@ -390,6 +391,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   _M_make_range(_CharT __l, _CharT __r)
   {
+   if (__l > __r)
+ __throw_regex_error(regex_constants::error_range);
_M_range_set.push_back(make_pair(_M_translator._M_transform(__l),
 _M_translator._M_transform(__r)));
 #ifdef _GLIBCXX_DEBUG
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc 
b/libstdc++-v3/include/bits/regex_compiler.tcc
index 349d92a..f959884 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -415,8 +415,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_insert_bracket_matcher(bool __neg)
 {
   _BracketMatcher<_TraitsT, __icase, __collate> __matcher(__neg, 
_M_traits);
+  pair __last_char; // Optional<_CharT>
+  __last_char.first = false;
+  if (!(_M_flags & regex_constants::ECMAScript))
+   if (_M_try_char())
+ {
+   __matcher._M_add_char(_M_value[0]);
+   __last_char.first = true;
+   __last_char.second = _M_value[0];
+ }
   while (!_M_match_token(_ScannerT::_S_token_bracket_end))
-   _M_expression_term(__matcher);
+   _M_expression_term(__last_char, __matcher);
   __matcher._M_ready();
   _M_stack.push(_StateSeqT(
  *_M_nfa,
@@ -427,7 +436,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 void
 _Compiler<_TraitsT>::
-_M_expression_term(_BracketMatcher<_TraitsT, __icase, __collate>& 
__matcher)
+_M_expression_term(pair& __last_char,
+  _BracketMatcher<_TraitsT, __icase, __collate>& __matcher)
 {
   if (_M_match_token(_ScannerT::_S_token_collsymbol))
__matcher._M_add_collating_element(_M_value);
@@ -435,27 +445,50 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__matcher._M_add_equivalence_class(_M_value);
   else if (_M_match_token(_ScannerT::_S_token_char_class_name))
__matcher._M_add_character_class(_M_value, false);
-  else if (_M_try_char()) // [a
+  // POSIX doesn't permit '-' as a start-range char (say [a-z--0]),
+  // except when the '-' is the first character in the bracket expression
+  // ([--0]). ECMAScript treats all '-' after a range as a normal 
character.
+  // Also see above, where _M_expression_term gets called.
+  //
+  // As a result, POSIX rejects [-], but ECMAScript doesn't.
+  // Boost (1.57.0) always uses POSIX style even in its ECMAScript syntax.
+  // Clang (3.5) always uses ECMAScript style even in its POSIX syntax.
+  //
+  // It turns out that no one reads BNFs ;)
+  else if (_M_try_char())
{
- auto __ch = _M_value[0];
- if (_M_try_char())
+ if (!__last_char.first)
+   {
+ if (_M_value[0] == '-'
+ && !(_M_flags & regex_constants::ECMAScript))
+   __throw_regex_error(regex_constants::error_range);
+ __matcher._M_add_char(_M_value[0]);
+ __last_char.first = true;
+ __last_char.second = _M_value[0];
+   }
+ else
{
- if (_M_value[0] == '-') // [a-
+ if (_M_value[0] == '-')
{
- if (_M_try_char()) // [a-z]
+ if (_M_try_char())
{
- __matcher._M_make_range(__ch, _M_value[0]);
- return;
+ __matcher._M_make_range(__last_char.second , _M_value[0]);

Re: [PATCH] Propagate nonfreeing_call_p using ipa-pure-const

2014-11-12 Thread Jakub Jelinek

On Thu, Nov 13, 2014 at 12:11:08AM +0100, Jan Hubicka wrote:
> > +  else if (gimple_call_internal_p (call) && !nonfreeing_call_p (call))
> > +local->can_free = true;
> 
> Actually I think you want to do this for can_throw, too.
> We probably do not have throwing internal calls, but it is better to be safe.

The only border case internal function is ABNORMAL_DISPATCHER, the only
internal function right now that is not ECF_LEAF.  It doesn't throw, but is
used for sjlj EH.
That said, I don't see any special handling of if (callee_t) for can_throw
(for internal functions it will be always false), only for
looping/pure_const_state, but that seems to be only about doing something
for selected functions (builtin or specially named - setjmp*).
Or do you mean pretend that callee_t is non-NULL for internal functions?

> > +/* Produce transitive closure over the callgraph and compute can_free
> > +   attributes.  */
> > +
> > +static void
> > +propagate_can_free (void)
> > +{
> > +  struct cgraph_node *node;
> > +  struct cgraph_node *w;
> > +  struct cgraph_node **order
> > += XCNEWVEC (struct cgraph_node *, symtab->cgraph_count);
> > +  int order_pos;
> > +  int i;
> > +  struct ipa_dfs_info *w_info;
> > +
> > +  order_pos = ipa_reduced_postorder (order, true, false, NULL);
> > +  if (dump_file)
> > +{
> > +  cgraph_node::dump_cgraph (dump_file);
> > +  ipa_print_order (dump_file, "reduced", order, order_pos);
> > +}
> 
> The propagation seems fine, but I wonder if we won't get better memory 
> locality doing this
> during the propagation of pure/const?

Ok, will try to put it there.

Jakub

Re: [PATCH, Libatomic, Darwin] Initial libatomic port for darwin.

2014-11-12 Thread Richard Henderson

On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>   * config/darwin/host-config.h New.
>   * config/darwin/lock.c New.
>   * configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

Looks pretty good.

> +#  ifndef USE_ATOMIC
> +#define USE_ATOMIC 1
> +#  endif

Why would USE_ATOMIC be defined previously?

> +inline static void LockUnlock(uint32_t *l) {
> +  __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
> +}

Gnu coding style, please.  All through the file here.

> +#  define LOCK_SIZE sizeof(uint32_t)
> +#  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
> +static uint32_t locks[NLOCKS];

Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
target region that's at issue, not the size of the lock itself.

> +#if USE_ATOMIC
> +  LockLock (&locks[addr_hash (ptr, 1)]);
> +#else
> +  OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
> +#endif

Better to #define LockLock  OSSpinLockLock within the area above, so as to
avoid the ifdefs here.

r~

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Bin.Cheng

On Thu, Nov 13, 2014 at 3:20 PM, Yangfei (Felix)  wrote:
>> >> > Just curious about this.  Can we make sure that FPAdvSIMD are
>> >> > always
>> >> enabled?
>> >> >
>> >>
>> >> I am not sure I understand correct, do you mean enable options like
>> >> below default?
>> >> --with-fpu=crypto-neon-fp-armv8/--with-float=hard
>> >>
>> >> Thanks,
>> >> bin
>> >
>> >
>> > I mean the NEON hardware.  The processor checks if the FPAdvSIMD is
>> enabled when executing an NEON instruction.
>> >
>> > aarch64/exceptions/traps/functions/AArch64.CheckFPAdvSIMDEnabled
>> > // AArch64.CheckFPAdvSIMDEnabled()
>> > // ===
>> > // Check against CPACR_EL1.
>> > AArch64.CheckFPAdvSIMDEnabled()
>> > if PSTATE.EL IN {EL0, EL1} then
>> > // Check if access disabled in CPACR_EL1 case CPACR_EL1.FPEN of when
>> > 'x0' disabled = TRUE; when '01' disabled = PSTATE.EL == EL0; when '11'
>> > disabled = FALSE; if disabled then AArch64.AdvSIMDFPAccessTrap(EL1);
>> > AArch64.CheckFPAdvSIMDTrap(); // Also check against CPTR_EL2 and
>> > CPTR_EL3
>> >
>>
>> Oh,  this is a question that compiler/software engineers don't have the 
>> answer.
>> Also it's inappropriate for gcc mailing list.  Sorry.
>>
>> Thanks,
>> bin
>
>
> But what if the compiler generates NEON instructions for stringops when the 
> NEON hardware is disabled?
> So this kind of optimizations depend on the command line option such as 
> --with-float=hard, right?

Yes, see arm.c/arm.h, neon instructions in this case are only
generated when TARGET_NEON is true (and some other conditions), which
is defined as below.

#define TARGET_NEON (TARGET_32BIT && TARGET_HARD_FLOAT \
&& TARGET_VFP && arm_fpu_desc->neon)

Thanks,
bin

Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-12 Thread Richard Henderson

On 11/12/2014 09:41 PM, Ulrich Weigand wrote:
>   * optabs.c (prepare_operand): Gracefully fail if the mode of X
>   does not match the operand mode expected by the insn pattern.

This is ok.

Another solution I'd thought about involved accepting the mode
with the predicate, but FAILing in the expander.

I wondered whether s390 benefit from following i386 in allowing
mov<>cc to accept immediate operands pre-reload.  This could then
be re-used by cstorecc4 to allow any CC mode to expand to LOCR
when available.

But certainly preventing a generic crash in prepare_operand is
good regardless of whatever else happens in the backend.

r~

Re: [PATCH][AArch64] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-12 Thread Andrew Pinski

On Tue, Nov 11, 2014 at 3:55 AM, Kyrill Tkachov  wrote:
> Hi all,
>
> This is the aarch64 implementation of the macro fusion hook, used to fuse
> mov+movk instructions together.
>
> A new field is declared in the tuning struct and as we add more fuseable ops
> in the future we will fill in more bits in the fuseable_ops field.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?

I have a few comments about this patch as I have a patch which depends
on this for ThunderX.

Each new function should have a comment before it saying what the function does.
The checks inside aarch_macro_fusion_pair_p for single_set and
any_condjump_p seems a little too restrictive to add more fusion
without major changes to the code.
In the fusion case I am adding is about fusing any single cycle
arithmetic instruction (on ThunderX) that produces flags and the
branch that consumes it.

Thanks,
Andrew

>
> 2014-11-11  Kyrylo Tkachov  
>
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> fuseable_ops field.
> * config/aarch64/aarch64.c (generic_tunings): Specify fuseable_ops.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (thunderx_tunings): Likewise.
> (aarch64_macro_fusion_p): New function.
> (aarch_macro_fusion_pair_p): Likewise.
> (TARGET_SCHED_MACRO_FUSION_P): Define.
> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
> (AARCH64_FUSE_MOV_MOVK): Likewise.
> (AARCH64_FUSE_NOTHING): Likewise.

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Yangfei (Felix)

> >> > Just curious about this.  Can we make sure that FPAdvSIMD are
> >> > always
> >> enabled?
> >> >
> >>
> >> I am not sure I understand correct, do you mean enable options like
> >> below default?
> >> --with-fpu=crypto-neon-fp-armv8/--with-float=hard
> >>
> >> Thanks,
> >> bin
> >
> >
> > I mean the NEON hardware.  The processor checks if the FPAdvSIMD is
> enabled when executing an NEON instruction.
> >
> > aarch64/exceptions/traps/functions/AArch64.CheckFPAdvSIMDEnabled
> > // AArch64.CheckFPAdvSIMDEnabled()
> > // ===
> > // Check against CPACR_EL1.
> > AArch64.CheckFPAdvSIMDEnabled()
> > if PSTATE.EL IN {EL0, EL1} then
> > // Check if access disabled in CPACR_EL1 case CPACR_EL1.FPEN of when
> > 'x0' disabled = TRUE; when '01' disabled = PSTATE.EL == EL0; when '11'
> > disabled = FALSE; if disabled then AArch64.AdvSIMDFPAccessTrap(EL1);
> > AArch64.CheckFPAdvSIMDTrap(); // Also check against CPTR_EL2 and
> > CPTR_EL3
> >
> 
> Oh,  this is a question that compiler/software engineers don't have the 
> answer.
> Also it's inappropriate for gcc mailing list.  Sorry.
> 
> Thanks,
> bin


But what if the compiler generates NEON instructions for stringops when the 
NEON hardware is disabled? 
So this kind of optimizations depend on the command line option such as 
--with-float=hard, right? 
Thanks

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Bin.Cheng

On Thu, Nov 13, 2014 at 3:04 PM, Yangfei (Felix)  wrote:
>> On Thu, Nov 13, 2014 at 2:33 PM, Yangfei (Felix) 
>> wrote:
>> >> Hi,
>> >> As commented at
>> >> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
>> >> this is a simple patch enabling neon memset inlining on
>> >> cortex-a53/cortex-a57 in AArch32 mode.
>> >>
>> >> Test on
>> >> arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon
>> >> -fp-ar
>> >> m
>> >> v8/--with-float=hard.  I will further collect benchmark data, see if
>> >> there is regression.
>> >>
>> >> Is it ok if benchmark results are good?
>> >>
>> >> 2014-11-13  Bin Cheng  
>> >>
>> >>   * config/arm/arm.c (arm_cortex_a53_tune, arm_cortex_a57_tune):
>> >> Prefer
>> >>   neon for stringops on cortex-a53/a57 in AArch32 mode.
>> >
>> >
>> > Just curious about this.  Can we make sure that FPAdvSIMD are always
>> enabled?
>> >
>>
>> I am not sure I understand correct, do you mean enable options like below
>> default?
>> --with-fpu=crypto-neon-fp-armv8/--with-float=hard
>>
>> Thanks,
>> bin
>
>
> I mean the NEON hardware.  The processor checks if the FPAdvSIMD is enabled 
> when executing an NEON instruction.
>
> aarch64/exceptions/traps/functions/AArch64.CheckFPAdvSIMDEnabled
> // AArch64.CheckFPAdvSIMDEnabled()
> // ===
> // Check against CPACR_EL1.
> AArch64.CheckFPAdvSIMDEnabled()
> if PSTATE.EL IN {EL0, EL1} then
> // Check if access disabled in CPACR_EL1
> case CPACR_EL1.FPEN of
> when 'x0' disabled = TRUE;
> when '01' disabled = PSTATE.EL == EL0;
> when '11' disabled = FALSE;
> if disabled then AArch64.AdvSIMDFPAccessTrap(EL1);
> AArch64.CheckFPAdvSIMDTrap(); // Also check against CPTR_EL2 and CPTR_EL3
>

Oh,  this is a question that compiler/software engineers don't have
the answer.  Also it's inappropriate for gcc mailing list.  Sorry.

Thanks,
bin

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Yangfei (Felix)

> On Thu, Nov 13, 2014 at 2:33 PM, Yangfei (Felix) 
> wrote:
> >> Hi,
> >> As commented at
> >> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
> >> this is a simple patch enabling neon memset inlining on
> >> cortex-a53/cortex-a57 in AArch32 mode.
> >>
> >> Test on
> >> arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon
> >> -fp-ar
> >> m
> >> v8/--with-float=hard.  I will further collect benchmark data, see if
> >> there is regression.
> >>
> >> Is it ok if benchmark results are good?
> >>
> >> 2014-11-13  Bin Cheng  
> >>
> >>   * config/arm/arm.c (arm_cortex_a53_tune, arm_cortex_a57_tune):
> >> Prefer
> >>   neon for stringops on cortex-a53/a57 in AArch32 mode.
> >
> >
> > Just curious about this.  Can we make sure that FPAdvSIMD are always
> enabled?
> >
> 
> I am not sure I understand correct, do you mean enable options like below
> default?
> --with-fpu=crypto-neon-fp-armv8/--with-float=hard
> 
> Thanks,
> bin

I mean the NEON hardware.  The processor checks if the FPAdvSIMD is enabled 
when executing an NEON instruction. 

aarch64/exceptions/traps/functions/AArch64.CheckFPAdvSIMDEnabled
// AArch64.CheckFPAdvSIMDEnabled()
// ===
// Check against CPACR_EL1.
AArch64.CheckFPAdvSIMDEnabled()
if PSTATE.EL IN {EL0, EL1} then
// Check if access disabled in CPACR_EL1
case CPACR_EL1.FPEN of
when 'x0' disabled = TRUE;
when '01' disabled = PSTATE.EL == EL0;
when '11' disabled = FALSE;
if disabled then AArch64.AdvSIMDFPAccessTrap(EL1);
AArch64.CheckFPAdvSIMDTrap(); // Also check against CPTR_EL2 and CPTR_EL3

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Bin.Cheng

On Thu, Nov 13, 2014 at 2:33 PM, Yangfei (Felix)  wrote:
>> Hi,
>> As commented at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
>> this is a simple patch enabling neon memset inlining on
>> cortex-a53/cortex-a57 in AArch32 mode.
>>
>> Test on
>> arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon-fp-ar
>> m
>> v8/--with-float=hard.  I will further collect benchmark data, see if there is
>> regression.
>>
>> Is it ok if benchmark results are good?
>>
>> 2014-11-13  Bin Cheng  
>>
>>   * config/arm/arm.c (arm_cortex_a53_tune, arm_cortex_a57_tune):
>> Prefer
>>   neon for stringops on cortex-a53/a57 in AArch32 mode.
>
>
> Just curious about this.  Can we make sure that FPAdvSIMD are always enabled?
>

I am not sure I understand correct, do you mean enable options like
below default?
--with-fpu=crypto-neon-fp-armv8/--with-float=hard

Thanks,
bin

Re: [PATCH] Make -fdiagnostics-color= default configurable, default to =auto

2014-11-12 Thread Jakub Jelinek

On Thu, Nov 13, 2014 at 01:27:29AM +0100, Manuel López-Ibáñez wrote:
> I think it is great that =auto becomes the default. However, adding a
> configure option for this seems overkill. If anyone really really
> wishes to change the default, they can simply edit a single-line in
> common.opt.

Configure option is what Richard asked me to do when changing the default,
and I think it is reasonable to give users a choice other than patching
the compiler (which is e.g. what I've been using in Fedora/RHEL for a year
or so now).

> Apart from that, I really cannot understand why someone would want the
> options none, auto or always: none is equivalent to auto-if-env but
> without the "if-env" escape route, and neither auto nor always allow
> disabling the coloring globally (which is precisely what users hating

That is not true.  GCC_COLORS= in the environment (present and empty)
always disables colors, no matter what the -fdiagnostics-color= option
says (explicit or implicit).  info gcc/man gcc documents this:
"Setting GCC_COLORS to the empty string disables colors."

auto-if-env is "never" if GCC_COLORS isn't in the environment, "auto"
if it is set to non-empty string, say
GCC_COLORS='error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01'
but even
GCC_COLORS=' '
will do and with GCC_COLORS= also disabled.

Jakub

Re: PATCH GCC5.0: conditionally skip gcc_version in plugin-version.h

2014-11-12 Thread Basile Starynkevitch

On Wed, 2014-11-12 at 14:36 +0100, Basile Starynkevitch wrote:
> On Wed, Nov 12, 2014 at 02:29:13PM +0100, Jakub Jelinek wrote:
> > On Wed, Nov 12, 2014 at 02:20:22PM +0100, Basile Starynkevitch wrote:
> > > Most plugin don't need any configure, because they are installed in 
> > > a version specific directory (like 
> > > /usr/lib/gcc/x86_64-linux-gnu/4.9/plugin 
> > > for example). I don't think it is wise to require plugin to be 
> > > autoconf-configurable. Their Makefile simply uses 
> > > $(shell gcc -print-file-name=plugin), there is no need to complex
> > > autoconf machinery.
> > 
> > If you use $(shell gcc -print-file-name=plugin), there is no point
> > to include plugin-version.h, just use __GNUC__/__GNUC_MINOR__ ?
> 
> 
> I could compile a plugin (notably for a GCC cross-compiler) with a GCC version
> different of the GCC targetting the plugin. I could also compile a 
> plugin with Clang or some other non-GCC compiler. In both cases
> plugin-version.h is needed with its GCCPLUGIN_VERSION.

I'm trying to patch GCC to get a plugin-version.c file generated, but I
can't get that work. Here is attached a buggy patch against trunk svn
r217460 which does not work. Could any one help me to catch my mistake
please?


Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Index: fixincludes/fixincl.x
===
--- fixincludes/fixincl.x	(revision 217460)
+++ fixincludes/fixincl.x	(working copy)
@@ -1,12 +1,12 @@
 /*  -*- buffer-read-only: t -*- vi: set ro:
- * 
+ *
  * DO NOT EDIT THIS FILE   (fixincl.x)
- * 
- * It has been AutoGen-ed  October 21, 2014 at 10:18:16 AM by AutoGen 5.16.2
+ *
+ * It has been AutoGen-ed
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Tue Oct 21 10:18:17 CEST 2014
+/* DO NOT SVN-MERGE THIS FILE, EITHER Thu Nov 13 07:50:38 MET 2014
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 217460)
+++ gcc/Makefile.in	(working copy)
@@ -1324,6 +1324,7 @@
 	opts-global.o \
 	passes.o \
 	plugin.o \
+	plugin-version.o \
 	postreload-gcse.o \
 	postreload.o \
 	predict.o \
Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 217460)
+++ gcc/configure.ac	(working copy)
@@ -1656,7 +1656,25 @@
 else
 gcc_REVISION=""
 fi
+
+cat > plugin-version.c < plugin-version.h <

Re: [PATCH 2/5] remove the remaining uses of if_marked

2014-11-12 Thread Marek Polacek

On Thu, Nov 13, 2014 at 12:55:40AM -0500, tsaund...@mozilla.com wrote:
> From: Trevor Saunders 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index b70c56c..227509a 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -14032,14 +14032,34 @@ legitimize_tls_address (rtx x, enum tls_model 
> model, bool for_mov)
> to symbol DECL if BEIMPORT is true.  Otherwise create or return the
> unique refptr-DECL symbol corresponding to symbol DECL.  */
>  
> -static GTY((if_marked ("tree_map_marked_p"), param_is (struct tree_map)))
> -  htab_t dllimport_map;
> +struct dllimport_hasher : ggc_cache_hasher
> +{
> +  static inline hashval_t hash (tree_map *m) { return m->hash; }
> +  static inline bool
> +  equal (tree_map *a, tree_map *b)
> +  {
> +return a->base.from == b->base.from;
> +  }
> +
> +  static void
> +handle_cache_entry (tree_map *&m)
> +  {
> + extern void gt_ggc_mx (tree_map *&);
> + if (m == HTAB_EMPTY_ENTRY || m == HTAB_DELETED_ENTRY)
> +   return;
> + else if (ggc_marked_p (m->base.from))
> +   gt_ggc_mx (m);
> + else
> +   m = static_cast (HTAB_DELETED_ENTRY);
> +  }
> +};
> +

Note that the formatting is a bit off here (and in ubsan.c too).

Marek

Re: [PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Yangfei (Felix)

> Hi,
> As commented at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
> this is a simple patch enabling neon memset inlining on
> cortex-a53/cortex-a57 in AArch32 mode.
> 
> Test on
> arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon-fp-ar
> m
> v8/--with-float=hard.  I will further collect benchmark data, see if there is
> regression.
> 
> Is it ok if benchmark results are good?
> 
> 2014-11-13  Bin Cheng  
> 
>   * config/arm/arm.c (arm_cortex_a53_tune, arm_cortex_a57_tune):
> Prefer
>   neon for stringops on cortex-a53/a57 in AArch32 mode.


Just curious about this.  Can we make sure that FPAdvSIMD are always enabled?

RE: Small multiplier support in Cortex-M0/1/+

2014-11-12 Thread Hale Wang

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Wednesday, November 12, 2014 9:49 PM
> To: Hale Wang
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: Small multiplier support in Cortex-M0/1/+
> 
> On 21 October 2014 12:01, Hale Wang  wrote:
> > Hi,
> >
> > Some configurations of the Cortex-M0 and Cortex-M1 come with a high
> > latency multiplier. This patch adds support for such configurations.
> >
> > Small multiplier means using add/sub/shift instructions to replace the
> > mul instruction for the MCU that has no fast multiplier.
> >
> > The following strategies are adopted in this patch:
> > 1. Define new CPUs as
> > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,cortex-m1.
> > small-
> > multiply to support small multiplier.
> > 2. -Os means size is preferred. A threshold of 5 is set which means it
> > will prevent spliting if ending up with more than 5 instructions. As
> > for non-OS, there will be no such a limit.
> >
> > Some test cases are also added in the testsuite to verify this function.
> >
> > Is it ok for trunk?
> >
> > Thanks and Best Regards,
> > Hale Wang
> >
> > gcc/ChangeLog:
> >
> > 2014-08-29  Hale Wang  
> >
> > * config/arm/arm-cores.def: Add support for
> > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > cortex-m1.small-multiply.
> > * config/arm/arm-tables.opt: Regenerate.
> > * config/arm/arm-tune.md: Regenerate.
> > * config/arm/arm.c: Update the rtx-costs for MUL.
> > * config/arm/bpabi.h: Handle
> > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > cortex-m1.small-multiply.
> > * doc/invoke.texi: Document
> > -mcpu=cortex-m0.small-multiply,cortex-m0plus.small-multiply,
> > cortex-m1.small-multiply.
> > * testsuite/gcc.target/arm/small-multiply-m0-1.c: New test case.
> > * testsuite/gcc.target/arm/small-multiply-m0-2.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m0-3.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m0plus-1.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m0plus-2.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m0plus-3.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m1-1.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m1-2.c: Likewise.
> > * testsuite/gcc.target/arm/small-multiply-m1-3.c: Likewise.
> >
> >
> ==
> =
> > diff --git a/gcc/config/arm/arm-cores.def
> > b/gcc/config/arm/arm-cores.def index a830a83..af4b373 100644
> > --- a/gcc/config/arm/arm-cores.def
> > +++ b/gcc/config/arm/arm-cores.def
> > @@ -137,6 +137,11 @@ ARM_CORE("cortex-m1",  cortexm1,
> cortexm1,
> > 6M, FL_LDSCHED, v6m)
> >  ARM_CORE("cortex-m0",  cortexm0, cortexm0, 6M,
> > FL_LDSCHED, v6m)
> >  ARM_CORE("cortex-m0plus",  cortexm0plus, cortexm0plus, 6M,
> > FL_LDSCHED, v6m)
> >
> > +/* V6M Architecture Processors for small-multiply implementations.  */
> > +ARM_CORE("cortex-m1.small-multiply",   cortexm1smallmultiply,
> cortexm1,
> > 6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > +ARM_CORE("cortex-m0.small-multiply",   cortexm0smallmultiply,
> cortexm0,
> > 6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > +ARM_CORE("cortex-m0plus.small-multiply",cortexm0plussmallmultiply,
> > cortexm0plus,6M, FL_LDSCHED | FL_SMALLMUL, v6m)
> > +
> >  /* V7 Architecture Processors */
> >  ARM_CORE("generic-armv7-a",genericv7a, genericv7a, 7A,
> > FL_LDSCHED, cortex)
> >  ARM_CORE("cortex-a5",  cortexa5, cortexa5, 7A,
> > FL_LDSCHED, cortex_a5)
> > diff --git a/gcc/config/arm/arm-tables.opt
> > b/gcc/config/arm/arm-tables.opt index bc046a0..bd65bd2 100644
> > --- a/gcc/config/arm/arm-tables.opt
> > +++ b/gcc/config/arm/arm-tables.opt
> > @@ -241,6 +241,15 @@ EnumValue
> >  Enum(processor_type) String(cortex-m0plus) Value(cortexm0plus)
> >
> >  EnumValue
> > +Enum(processor_type) String(cortex-m1.small-multiply)
> > Value(cortexm1smallmultiply)
> > +
> > +EnumValue
> > +Enum(processor_type) String(cortex-m0.small-multiply)
> > Value(cortexm0smallmultiply)
> > +
> > +EnumValue
> > +Enum(processor_type) String(cortex-m0plus.small-multiply)
> > Value(cortexm0plussmallmultiply)
> > +
> > +EnumValue
> >  Enum(processor_type) String(generic-armv7-a) Value(genericv7a)
> >
> >  EnumValue
> > diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
> > index 954cab8..8b5c778 100644
> > --- a/gcc/config/arm/arm-tune.md
> > +++ b/gcc/config/arm/arm-tune.md
> > @@ -25,6 +25,7 @@
> > arm1176jzs,arm1176jzfs,mpcorenovfp,
> > mpcore,arm1156t2s,arm1156t2fs,
> > cortexm1,cortexm0,cortexm0plus,
> > +
> > cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply,
> > genericv7a,cortexa5,cortexa7,
> > cortexa8,cortexa9,cortexa12,

[PATCH, trivial][AArch64] Fix mode iterator for *aarch64_simd_ld1r pattern

2014-11-12 Thread Yangfei (Felix)

Hi,

  We find that the VALLDI mode iterator used in *aarch64_simd_ld1r 
pattern is not appropriate. 
  The reason is that it's impossible to get a new operand of DImode by 
vec_duplicating an operand of the same mode. 
  So this patch just excludes the DImode and uses VALL instead. 
  Reg-tested for aarch64-linux-gnu with QEMU.  OK for the trunk? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2014-11-13  Felix Yang  
+   Jiji Jiang  
+
+   * config/aarch64/aarch64-simd.md (*aarch64_simd_ld1r): Use
+   VALL mode iterator instead of VALLDI.
+
 2014-11-11  Andrew Pinski  
 
Bug target/61997
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  (revision 217394)
+++ gcc/config/aarch64/aarch64-simd.md  (working copy)
@@ -4936,8 +4936,8 @@
 })
 
 (define_insn "*aarch64_simd_ld1r"
-  [(set (match_operand:VALLDI 0 "register_operand" "=w")
-   (vec_duplicate:VALLDI
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+   (vec_duplicate:VALL
  (match_operand: 1 "aarch64_simd_struct_operand" "Utv")))]
   "TARGET_SIMD"
   "ld1r\\t{%0.}, %1"


aarch64-fix-mode-iterator-v1.diff
Description: aarch64-fix-mode-iterator-v1.diff

[PATCH 3/5] fix hash_table when empty elements are not 0

2014-11-12 Thread tsaunders

From: Trevor Saunders 

hi,

The problem here is that hash_table used to zero element storage, but if the
empty element is not 0 then all elements appear to be in use.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Trev


gcc/ChangeLog:

2014-11-13  Trevor Saunders  

* hash-table.h (hash_table::hash_table): Call alloc_entries.
(hash_table::alloc_entries): new method.
(hash_table::expand): Call alloc_entries.
(hash_table::empty): Likewise.


diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 2802e0a..7a4d609 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -1201,6 +1201,7 @@ private:
   template friend void gt_pch_nx (hash_table *,
  gt_pointer_operator, void *);
 
+  value_type *alloc_entries (size_t n) const;
   value_type *find_empty_slot_for_expand (hashval_t);
   void expand ();
   static bool is_deleted (value_type &v)
@@ -1259,12 +1260,7 @@ hash_table::hash_table 
(size_t size, bool ggc) :
   size_prime_index = hash_table_higher_prime_index (size);
   size = prime_tab[size_prime_index].prime;
 
-  if (!m_ggc)
-m_entries = Allocator  ::data_alloc (size);
-  else
-m_entries = ggc_cleared_vec_alloc (size);
-
-  gcc_assert (m_entries != NULL);
+  m_entries = alloc_entries (size);
   m_size = size;
   m_size_prime_index = size_prime_index;
 }
@@ -1282,6 +1278,26 @@ hash_table::~hash_table ()
 ggc_free (m_entries);
 }
 
+/* This function returns an array of empty hash table elements.  */
+
+template class Allocator>
+inline typename hash_table::value_type *
+hash_table::alloc_entries (size_t n) const
+{
+  value_type *nentries;
+
+  if (!m_ggc)
+nentries = Allocator  ::data_alloc (n);
+  else
+nentries = ::ggc_cleared_vec_alloc (n);
+
+  gcc_assert (nentries != NULL);
+  for (size_t i = 0; i < n; i++)
+mark_empty (nentries[i]);
+
+  return nentries;
+}
+
 /* Similar to find_slot, but without several unwanted side effects:
 - Does not call equal when it finds an existing entry.
 - Does not change the count of elements/searches/collisions in the
@@ -1351,13 +1367,7 @@ hash_table::expand ()
   nsize = osize;
 }
 
-  value_type *nentries;
-  if (!m_ggc)
-nentries = Allocator  ::data_alloc (nsize);
-  else
-nentries = ggc_cleared_vec_alloc (nsize);
-
-  gcc_assert (nentries != NULL);
+  value_type *nentries = alloc_entries (nsize);
   m_entries = nentries;
   m_size = nsize;
   m_size_prime_index = nindex;
@@ -1405,16 +1415,11 @@ hash_table::empty ()
   int nsize = prime_tab[nindex].prime;
 
   if (!m_ggc)
-   {
- Allocator  ::data_free (m_entries);
- m_entries = Allocator  ::data_alloc (nsize);
-   }
+   Allocator  ::data_free (m_entries);
   else
-   {
- ggc_free (m_entries);
- m_entries = ggc_cleared_vec_alloc (nsize);
-   }
+   ggc_free (m_entries);
 
+  m_entries = alloc_entries (nsize);
   m_size = nsize;
   m_size_prime_index = nindex;
 }
-- 
2.1.3

[PATCH 4/5] remove param$N_is usage

2014-11-12 Thread tsaunders

From: Trevor Saunders 

Hi,

The only user of this is splay_tree.  We only have a couple splay trees in ggc
memory, and it wasn't clear to me any of them were tree based instead of hash
based for performance reasons, so I chose to just convert them to hash_map
rather than writing a templated splay tree.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

trev

gcc/

* hash-map.h (hash_map::iterator): New class.
(hash_map::begin): New method.
(hash_map::end): Likewise.
* alias.c, config/alpha/alpha.c, dwarf2asm.c, omp-low.c, tree.h:
replace splay_tree with hash_map.


diff --git a/gcc/alias.c b/gcc/alias.c
index aa6ae41..fe27ce8 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "cselib.h"
 #include "splay-tree.h"
+#include "hash-map.h"
 #include "langhooks.h"
 #include "timevar.h"
 #include "dumpfile.h"
@@ -139,6 +140,32 @@ along with GCC; see the file COPYING3.  If not see
However, this is no actual entry for alias set zero.  It is an
error to attempt to explicitly construct a subset of zero.  */
 
+struct alias_set_traits : default_hashmap_traits
+{
+  template
+  static bool
+  is_empty (T &e)
+  {
+return e.m_key == INT_MIN;
+  }
+
+  template
+  static bool
+  is_deleted (T &e)
+  {
+return e.m_key == (INT_MIN + 1);
+  }
+
+  template static void mark_empty (T &e) { e.m_key = INT_MIN; }
+
+  template
+  static void
+  mark_deleted (T &e)
+  {
+e.m_key = INT_MIN + 1;
+  }
+};
+
 struct GTY(()) alias_set_entry_d {
   /* The alias set number, as stored in MEM_ALIAS_SET.  */
   alias_set_type alias_set;
@@ -154,7 +181,7 @@ struct GTY(()) alias_set_entry_d {
 
  continuing our example above, the children here will be all of
  `int', `double', `float', and `struct S'.  */
-  splay_tree GTY((param1_is (int), param2_is (int))) children;
+  hash_map *children;
 };
 typedef struct alias_set_entry_d *alias_set_entry;
 
@@ -165,7 +192,6 @@ static int base_alias_check (rtx, rtx, rtx, rtx, 
machine_mode,
 machine_mode);
 static rtx find_base_value (rtx);
 static int mems_in_disjoint_alias_sets_p (const_rtx, const_rtx);
-static int insert_subset_children (splay_tree_node, void*);
 static alias_set_entry get_alias_set_entry (alias_set_type);
 static tree decl_for_component_ref (tree);
 static int write_dependence_p (const_rtx,
@@ -405,17 +431,6 @@ mems_in_disjoint_alias_sets_p (const_rtx mem1, const_rtx 
mem2)
   return ! alias_sets_conflict_p (MEM_ALIAS_SET (mem1), MEM_ALIAS_SET (mem2));
 }
 
-/* Insert the NODE into the splay tree given by DATA.  Used by
-   record_alias_subset via splay_tree_foreach.  */
-
-static int
-insert_subset_children (splay_tree_node node, void *data)
-{
-  splay_tree_insert ((splay_tree) data, node->key, node->value);
-
-  return 0;
-}
-
 /* Return true if the first alias set is a subset of the second.  */
 
 bool
@@ -431,8 +446,7 @@ alias_set_subset_of (alias_set_type set1, alias_set_type 
set2)
   ase = get_alias_set_entry (set2);
   if (ase != 0
   && (ase->has_zero_child
- || splay_tree_lookup (ase->children,
-   (splay_tree_key) set1)))
+ || ase->children->get (set1)))
 return true;
   return false;
 }
@@ -452,16 +466,14 @@ alias_sets_conflict_p (alias_set_type set1, 
alias_set_type set2)
   ase = get_alias_set_entry (set1);
   if (ase != 0
   && (ase->has_zero_child
- || splay_tree_lookup (ase->children,
-   (splay_tree_key) set2)))
+ || ase->children->get (set2)))
 return 1;
 
   /* Now do the same, but with the alias sets reversed.  */
   ase = get_alias_set_entry (set2);
   if (ase != 0
   && (ase->has_zero_child
- || splay_tree_lookup (ase->children,
-   (splay_tree_key) set1)))
+ || ase->children->get (set1)))
 return 1;
 
   /* The two alias sets are distinct and neither one is the
@@ -956,9 +968,7 @@ record_alias_subset (alias_set_type superset, 
alias_set_type subset)
   superset_entry = ggc_cleared_alloc ();
   superset_entry->alias_set = superset;
   superset_entry->children
-   = splay_tree_new_ggc (splay_tree_compare_ints,
- ggc_alloc_splay_tree_scalar_scalar_splay_tree_s,
- 
ggc_alloc_splay_tree_scalar_scalar_splay_tree_node_s);
+   = hash_map::create_ggc (64);
   superset_entry->has_zero_child = 0;
   (*alias_sets)[superset] = superset_entry;
 }
@@ -975,13 +985,14 @@ record_alias_subset (alias_set_type superset, 
alias_set_type subset)
  if (subset_entry->has_zero_child)
superset_entry->has_zero_child = 1;
 
- splay_tree_foreach (subset_entry->children, insert_subset_children,
- superset_entry->children);
+ hash_map::iterator iter
+   = subset_entry->children->

[PATCH 5/5] use vec in lto_tree_ref_table

2014-11-12 Thread tsaunders

From: Trevor Saunders 

Hi,

gengtype fails to create valid user marking functions for this type, which is
fixed by using vec here (which seems cleaner anyway).

bootstrapped + regtested powerpc64-linux (gcc 110 since gcc20 died) ok?

Trev


gcc/ChangeLog:

2014-11-13  Trevor Saunders  

* lto-section-in.c (lto_delete_in_decl_state): Adjust.
(lto_free_function_in_decl_state): Likewise.
* lto-streamer-out.c (copy_function_or_variable): Likewise.
* lto-streamer.h (lto_file_decl_data_get_ ## name): Likewise.
(lto_file_decl_data_num_ ## name ## s): Likewise.
(struct lto_tree_ref_table): Remove.
(struct lto_in_decl_state): Replace lto_tree_ref_table with vec.

gcc/lto/ChangeLog:

2014-11-13  Trevor Saunders  

* lto.c (lto_read_in_decl_state): Adjust.
(lto_fixup_state): Likewise.


diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 042dd99..75f394d 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -379,8 +379,7 @@ lto_delete_in_decl_state (struct lto_in_decl_state *state)
   int i;
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
-if (state->streams[i].trees)
-  ggc_free (state->streams[i].trees);
+vec_free (state->streams[i]);
   ggc_free (state);
 }
 
@@ -429,7 +428,7 @@ lto_free_function_in_decl_state (struct lto_in_decl_state 
*state)
 {
   int i;
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
-ggc_free (state->streams[i].trees);
+vec_free (state->streams[i]);
   ggc_free (state);
 }
 
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index dc406da..bc18a9c 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -2186,8 +2186,8 @@ copy_function_or_variable (struct symtab_node *node)
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
 {
-  size_t n = in_state->streams[i].size;
-  tree *trees = in_state->streams[i].trees;
+  size_t n = vec_safe_length (in_state->streams[i]);
+  vec *trees = in_state->streams[i];
   struct lto_tree_ref_encoder *encoder = &(out_state->streams[i]);
 
   /* The out state must have the same indices and the in state.
@@ -2196,7 +2196,7 @@ copy_function_or_variable (struct symtab_node *node)
   gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
   encoder->trees.reserve_exact (n);
   for (j = 0; j < n; j++)
-   encoder->trees.safe_push (trees[j]);
+   encoder->trees.safe_push ((*trees)[j]);
 }
 
   lto_free_section_data (file_data, LTO_section_function_body, name,
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 4b875a2..9d6d7a0 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -274,15 +274,14 @@ lto_file_decl_data_get_ ## name (struct 
lto_file_decl_data *data, \
 unsigned int idx) \
 { \
   struct lto_in_decl_state *state = data->current_decl_state; \
-  gcc_assert (idx < state->streams[LTO_DECL_STREAM_## UPPER_NAME].size); \
-  return state->streams[LTO_DECL_STREAM_## UPPER_NAME].trees[idx]; \
+   return (*state->streams[LTO_DECL_STREAM_## UPPER_NAME])[idx]; \
 } \
 \
 static inline unsigned int \
 lto_file_decl_data_num_ ## name ## s (struct lto_file_decl_data *data) \
 { \
   struct lto_in_decl_state *state = data->current_decl_state; \
-  return state->streams[LTO_DECL_STREAM_## UPPER_NAME].size; \
+  return vec_safe_length (state->streams[LTO_DECL_STREAM_## UPPER_NAME]); \
 }
 
 
@@ -420,18 +419,6 @@ struct lto_symtab_encoder_iterator
 
 
 
-
-/* Mapping from indices to trees.  */
-struct GTY(()) lto_tree_ref_table
-{
-  /* Array of referenced trees . */
-  tree * GTY((length ("%h.size"))) trees;
-
-  /* Size of array. */
-  unsigned int size;
-};
-
-
 /* The lto_tree_ref_encoder struct is used to encode trees into indices. */
 
 struct lto_tree_ref_encoder
@@ -445,7 +432,7 @@ struct lto_tree_ref_encoder
 struct GTY(()) lto_in_decl_state
 {
   /* Array of lto_in_decl_buffers to store type and decls streams. */
-  struct lto_tree_ref_table streams[LTO_N_DECL_STREAMS];
+  vec *streams[LTO_N_DECL_STREAMS];
 
   /* If this in-decl state is associated with a function. FN_DECL
  point to the FUNCTION_DECL. */
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index d8519d9..cdd2331 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -260,13 +260,15 @@ lto_read_in_decl_state (struct data_in *data_in, const 
uint32_t *data,
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
 {
   uint32_t size = *data++;
-  tree *decls = ggc_vec_alloc (size);
+  vec *decls = NULL;
+  vec_alloc (decls, size);
 
   for (j = 0; j < size; j++)
-   decls[j] = streamer_tree_cache_get_tree (data_in->reader_cache, 
data[j]);
+   vec_safe_push (decls,
+  streamer_tree_cache_get_tree (data_in->reader_cache,
+data[j]));
 
-  state->streams[i].size = size;
-  state->streams[i].trees = decls;
+  state->streams[i] = decls;
   data += size;
 }
 
@@ -2806,20 +2808,19 @@ sta

[PATCH 2/5] remove the remaining uses of if_marked

2014-11-12 Thread tsaunders

From: Trevor Saunders 

Hi,

$subject.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Trev


ada/

* gcc-interface/decl.c, gcc-interface/utils.c: replace htab with
hash_table.

cp/

* cp-objcp-common.c: Use hash_table instead of htab.

gcc/

* config/i386/i386.c, function.c, trans-mem.c, tree-core.h,
tree.c, tree.h, ubsan.c, varasm.c: Use hash_table instead of htab.



diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 2ed68d4..c133a22 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -128,8 +128,35 @@ typedef struct variant_desc_d {
 
 
 /* A hash table used to cache the result of annotate_value.  */
-static GTY ((if_marked ("tree_int_map_marked_p"),
-param_is (struct tree_int_map))) htab_t annotate_value_cache;
+
+struct value_annotation_hasher : ggc_cache_hasher
+{
+  static inline hashval_t
+  hash (tree_int_map *m)
+  {
+return htab_hash_pointer (m->base.from);
+  }
+
+  static inline bool
+  equal (tree_int_map *a, tree_int_map *b)
+  {
+return a->base.from == b->base.from;
+  }
+
+  static void
+  handle_cache_entry (tree_int_map *&m)
+  {
+extern void gt_ggc_mx (tree_int_map *&);
+if (m == HTAB_EMPTY_ENTRY || m == HTAB_DELETED_ENTRY)
+  return;
+else if (ggc_marked_p (m->base.from))
+  gt_ggc_mx (m);
+else
+  m = static_cast (HTAB_DELETED_ENTRY);
+  }
+};
+
+static GTY ((cache)) hash_table *annotate_value_cache;
 
 static bool allocatable_size_p (tree, bool);
 static void prepend_one_attribute (struct attrib **,
@@ -7362,7 +7389,7 @@ annotate_value (tree gnu_size)
   struct tree_int_map *e;
 
   in.base.from = gnu_size;
-  e = (struct tree_int_map *) htab_find (annotate_value_cache, &in);
+  e = annotate_value_cache->find (&in);
 
   if (e)
return (Node_Ref_Or_Val) e->to;
@@ -7491,8 +7518,7 @@ annotate_value (tree gnu_size)
 look up, so we have to search again.  Allocating and inserting an
 entry at that point would be an alternative, but then we'd better
 discard the entry if we decided not to cache it.  */
-  h = (struct tree_int_map **)
-   htab_find_slot (annotate_value_cache, &in, INSERT);
+  h = annotate_value_cache->find_slot (&in, INSERT);
   gcc_assert (!*h);
   *h = ggc_alloc ();
   (*h)->base.from = gnu_size;
@@ -8840,8 +8866,7 @@ void
 init_gnat_decl (void)
 {
   /* Initialize the cache of annotated values.  */
-  annotate_value_cache
-= htab_create_ggc (512, tree_int_map_hash, tree_int_map_eq, 0);
+  annotate_value_cache = hash_table::create_ggc (512);
 }
 
 /* Destroy data structures of the decl.c module.  */
@@ -8850,7 +8875,7 @@ void
 destroy_gnat_decl (void)
 {
   /* Destroy the cache of annotated values.  */
-  htab_delete (annotate_value_cache);
+  annotate_value_cache->empty ();
   annotate_value_cache = NULL;
 }
 
diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 4d35060..32f0012 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -233,20 +233,23 @@ static GTY(()) vec *global_renaming_pointers;
 /* A chain of unused BLOCK nodes. */
 static GTY((deletable)) tree free_block_chain;
 
-static int pad_type_hash_marked_p (const void *p);
-static hashval_t pad_type_hash_hash (const void *p);
-static int pad_type_hash_eq (const void *p1, const void *p2);
-
 /* A hash table of padded types.  It is modelled on the generic type
hash table in tree.c, which must thus be used as a reference.  */
-struct GTY(()) pad_type_hash {
+
+struct GTY((for_user)) pad_type_hash {
   unsigned long hash;
   tree type;
 };
 
-static GTY ((if_marked ("pad_type_hash_marked_p"),
-param_is (struct pad_type_hash)))
-  htab_t pad_type_hash_table;
+struct pad_type_hasher : ggc_cache_hasher
+{
+  static inline hashval_t hash (pad_type_hash *t) { return t->hash; }
+  static bool equal (pad_type_hash *a, pad_type_hash *b);
+  static void handle_cache_entry (pad_type_hash *&);
+};
+
+static GTY ((cache))
+  hash_table *pad_type_hash_table;
 
 static tree merge_sizes (tree, tree, tree, bool, bool);
 static tree compute_related_constant (tree, tree);
@@ -294,8 +297,7 @@ init_gnat_utils (void)
   dummy_node_table = ggc_cleared_vec_alloc (max_gnat_nodes);
 
   /* Initialize the hash table of padded types.  */
-  pad_type_hash_table
-= htab_create_ggc (512, pad_type_hash_hash, pad_type_hash_eq, 0);
+  pad_type_hash_table = hash_table::create_ggc (512);
 }
 
 /* Destroy data structures of the utils.c module.  */
@@ -312,7 +314,7 @@ destroy_gnat_utils (void)
   dummy_node_table = NULL;
 
   /* Destroy the hash table of padded types.  */
-  htab_delete (pad_type_hash_table);
+  pad_type_hash_table->empty ();
   pad_type_hash_table = NULL;
 
   /* Invalidate the global renaming pointers.   */
@@ -1155,29 +1157,23 @@ make_type_from_size (tree type, tree size_tree, bool 
for_biased)
 
 /* See if the data pointed

[PATCH 1/5] add an alternative to if_marked using hash_table

2014-11-12 Thread tsaunders

From: Trevor Saunders 

Hi,

 This adds a gty cache attribute that calls user code after marking and before
sweeping allowing user code to mark more objects or clear caches as
appropriate.  User code for hash_table is set up to work similarly to if_marked
for htab.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Trev


gcc/ChangeLog:

2014-11-13  Trevor Saunders  

* doc/gty.texi: Document the new cache gty attribute.
* gengtype.c (finish_cache_funcs): New function.
(write_roots): Call gt_clear_cache on global variables with the cache
gty attribute.
* ggc-common.c (ggc_mark_roots): Call gt_clear_caches.
* ggc.h (gt_clear_caches): New declaration.
* hash-table.h (struct ggc_cache_hasher): New hasher for caches in gc
memory.
(gt_cleare_cache): New function.
* emit-rtl.c, rtl.h, tree.c: Use hash_table instead of htab.


diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 609cfce..e4d2b60 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -293,6 +293,14 @@ the pointed-to structure should use the same parameters as 
the outer
 structure.  This is done by marking the pointer with the
 @code{use_params} option.
 
+@findex cache
+@item cache
+
+When the @code{cache} option is applied to a global variable gt_clear_cache is
+called on that variable between the mark and sweep phases of garbage
+collection.  The gt_clear_cache function is free to mark blocks as used, or to
+clear pointers in the variable.
+
 @findex deletable
 @item deletable
 
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 04f677e..1226aad 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -131,23 +131,50 @@ rtx cc0_rtx;
 /* A hash table storing CONST_INTs whose absolute value is greater
than MAX_SAVED_CONST_INT.  */
 
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
- htab_t const_int_htab;
+struct const_int_hasher : ggc_cache_hasher
+{
+  typedef HOST_WIDE_INT compare_type;
+
+  static hashval_t hash (rtx i);
+  static bool equal (rtx i, HOST_WIDE_INT h);
+};
 
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
- htab_t const_wide_int_htab;
+static GTY ((cache)) hash_table *const_int_htab;
+
+struct const_wide_int_hasher : ggc_cache_hasher
+{
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table *const_wide_int_htab;
 
 /* A hash table storing register attribute structures.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct reg_attrs)))
- htab_t reg_attrs_htab;
+struct reg_attr_hasher : ggc_cache_hasher
+{
+  static hashval_t hash (reg_attrs *x);
+  static bool equal (reg_attrs *a, reg_attrs *b);
+};
+
+static GTY ((cache)) hash_table *reg_attrs_htab;
 
 /* A hash table storing all CONST_DOUBLEs.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
- htab_t const_double_htab;
+struct const_double_hasher : ggc_cache_hasher
+{
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table *const_double_htab;
 
 /* A hash table storing all CONST_FIXEDs.  */
-static GTY ((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
- htab_t const_fixed_htab;
+struct const_fixed_hasher : ggc_cache_hasher
+{
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table *const_fixed_htab;
 
 #define cur_insn_uid (crtl->emit.x_cur_insn_uid)
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
@@ -155,21 +182,11 @@ static GTY ((if_marked ("ggc_marked_p"), param_is (struct 
rtx_def)))
 
 static void set_used_decls (tree);
 static void mark_label_nuses (rtx);
-static hashval_t const_int_htab_hash (const void *);
-static int const_int_htab_eq (const void *, const void *);
 #if TARGET_SUPPORTS_WIDE_INT
-static hashval_t const_wide_int_htab_hash (const void *);
-static int const_wide_int_htab_eq (const void *, const void *);
 static rtx lookup_const_wide_int (rtx);
 #endif
-static hashval_t const_double_htab_hash (const void *);
-static int const_double_htab_eq (const void *, const void *);
 static rtx lookup_const_double (rtx);
-static hashval_t const_fixed_htab_hash (const void *);
-static int const_fixed_htab_eq (const void *, const void *);
 static rtx lookup_const_fixed (rtx);
-static hashval_t reg_attrs_htab_hash (const void *);
-static int reg_attrs_htab_eq (const void *, const void *);
 static reg_attrs *get_reg_attrs (tree, int);
 static rtx gen_const_vector (machine_mode, int);
 static void copy_rtx_if_shared_1 (rtx *orig);
@@ -180,31 +197,31 @@ int split_branch_probability = -1;
 
 /* Returns a hash code for X (which is a really a CONST_INT).  */
 
-static hashval_t
-const_int_htab_hash (const void *x)
+hashval_t
+const_int_hasher::hash (rtx x)
 {
-  return (hashval_t) INTVAL ((const_rtx) x);
+  return (hashval_t) INTVAL (x);
 }
 
 /* Returns nonzero if the value represented by X (which is really a

[PATCH ARM]Prefer neon for stringops on a53/a57 in AArch32 mode

2014-11-12 Thread Bin Cheng

Hi,
As commented at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
this is a simple patch enabling neon memset inlining on
cortex-a53/cortex-a57 in AArch32 mode.

Test on
arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon-fp-arm
v8/--with-float=hard.  I will further collect benchmark data, see if there
is regression.

Is it ok if benchmark results are good?

2014-11-13  Bin Cheng  

* config/arm/arm.c (arm_cortex_a53_tune, arm_cortex_a57_tune):
Prefer
neon for stringops on cortex-a53/a57 in AArch32 mode.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 215108)
+++ gcc/config/arm/arm.c(working copy)
@@ -1893,7 +1893,7 @@ const struct tune_params arm_cortex_a53_tune =
   &arm_default_vec_cost,   /* Vectorizer costs.  */
   false,   /* Prefer Neon for 64-bits 
bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
-  false,   /* Prefer Neon for stringops.  
*/
+  true,/* Prefer Neon for 
stringops.  */
   8/* Maximum insns to inline 
memset.  */
 };
 
@@ -1912,7 +1912,7 @@ const struct tune_params arm_cortex_a57_tune =
   &arm_default_vec_cost,   /* Vectorizer costs.  */
   false,   /* Prefer Neon for 64-bits 
bitops.  */
   true, true,  /* Prefer 32-bit encodings.  */
-  false,   /* Prefer Neon for stringops.  
*/
+  true,/* Prefer Neon for 
stringops.  */
   8/* Maximum insns to inline 
memset.  */
 };

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Andrew Pinski

On Wed, Nov 12, 2014 at 9:38 PM, Teresa Johnson  wrote:
> On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski  wrote:
>> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson  wrote:
>>> Added testcase. Here is the new patch:
>>>
>>> 2014-11-12
>>>
>>> gcc:
>>> PR tree-optimization/63841
>>> * tree.c (initializer_zerop): A constructor with no elements
>>> does not zero initialize.
>>
>> Actually an empty constructor does zero initialize.  A clobber does not.
>
> Ok, thanks, I wasn't sure.
>
>>
>> The check you want is TREE_CLOBBER_P instead.
>
> So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would
> return false, right? I will make that change and retest.

Yes that is correct.  Note TREE_CLOBBER_P already checks if it is an
constructor.

Thanks,
Andrew

>
> Thanks,
> Teresa
>
>>
>> Thanks,
>> Andrew
>>
>>
>>>
>>> gcc/testsuite:
>>> * g++.dg/tree-ssa/pr63841.C: New test.
>>>
>>> Index: tree.c
>>> ===
>>> --- tree.c  (revision 217190)
>>> +++ tree.c  (working copy)
>>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>>{
>>> unsigned HOST_WIDE_INT idx;
>>>
>>> +if (!CONSTRUCTOR_NELTS (init))
>>> +  return false;
>>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>>   if (!initializer_zerop (elt))
>>> return false;
>>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>>> ===
>>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +std::string __attribute__ ((noinline)) comp_test_write() {
>>> +  std::string data;
>>> +
>>> +  for (int i = 0; i < 2; ++i) {
>>> +char b = 1 >> (i * 8);
>>> +data.append(&b, 1);
>>> +  }
>>> +
>>> +  return data;
>>> +}
>>> +
>>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>>> +  std::string data;
>>> +
>>> +  char b;
>>> +  for (int i = 0; i < 2; ++i) {
>>> +b = 1 >> (i * 8);
>>> +data.append(&b, 1);
>>> +  }
>>> +
>>> +  return data;
>>> +}
>>> +
>>> +int main() {
>>> +  std::string good = comp_test_write_good();
>>> +  printf("expected: %hx\n", *(short*)good.c_str());
>>> +
>>> +  std::string bad = comp_test_write();
>>> +  printf("got: %hx\n", *(short*)bad.c_str());
>>> +
>>> +  return good != bad;
>>> +}
>>>
>>> On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li  
>>> wrote:
 missing test case?

 David

 On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  
 wrote:
> This patch fixes an issue where tree-strlen was incorrectly removing a
> store of 0 into a string because it thought a prior CLOBBER (which is
> an empty constructor with no elements) was zero-initializing the
> string.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?
>
> Thanks,
> Teresa
>
> 2014-11-12
>
> PR tree-optimization/63841
> * tree.c (initializer_zerop): A constructor with no elements
> does not zero initialize.
>
> Index: tree.c
> ===
> --- tree.c  (revision 217190)
> +++ tree.c  (working copy)
> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>{
> unsigned HOST_WIDE_INT idx;
>
> +if (!CONSTRUCTOR_NELTS (init))
> +  return false;
> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>   if (!initializer_zerop (elt))
> return false;
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Teresa Johnson

On Wed, Nov 12, 2014 at 9:30 PM, Andrew Pinski  wrote:
> On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson  wrote:
>> Added testcase. Here is the new patch:
>>
>> 2014-11-12
>>
>> gcc:
>> PR tree-optimization/63841
>> * tree.c (initializer_zerop): A constructor with no elements
>> does not zero initialize.
>
> Actually an empty constructor does zero initialize.  A clobber does not.

Ok, thanks, I wasn't sure.

>
> The check you want is TREE_CLOBBER_P instead.

So in initializer_zerop, a constructor that is a TREE_CLOBBER_P would
return false, right? I will make that change and retest.

Thanks,
Teresa

>
> Thanks,
> Andrew
>
>
>>
>> gcc/testsuite:
>> * g++.dg/tree-ssa/pr63841.C: New test.
>>
>> Index: tree.c
>> ===
>> --- tree.c  (revision 217190)
>> +++ tree.c  (working copy)
>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>{
>> unsigned HOST_WIDE_INT idx;
>>
>> +if (!CONSTRUCTOR_NELTS (init))
>> +  return false;
>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>   if (!initializer_zerop (elt))
>> return false;
>> Index: testsuite/g++.dg/tree-ssa/pr63841.C
>> ===
>> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
>> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
>> @@ -0,0 +1,38 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include 
>> +#include 
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write() {
>> +  std::string data;
>> +
>> +  for (int i = 0; i < 2; ++i) {
>> +char b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +std::string __attribute__ ((noinline)) comp_test_write_good() {
>> +  std::string data;
>> +
>> +  char b;
>> +  for (int i = 0; i < 2; ++i) {
>> +b = 1 >> (i * 8);
>> +data.append(&b, 1);
>> +  }
>> +
>> +  return data;
>> +}
>> +
>> +int main() {
>> +  std::string good = comp_test_write_good();
>> +  printf("expected: %hx\n", *(short*)good.c_str());
>> +
>> +  std::string bad = comp_test_write();
>> +  printf("got: %hx\n", *(short*)bad.c_str());
>> +
>> +  return good != bad;
>> +}
>>
>> On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li  
>> wrote:
>>> missing test case?
>>>
>>> David
>>>
>>> On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  
>>> wrote:
 This patch fixes an issue where tree-strlen was incorrectly removing a
 store of 0 into a string because it thought a prior CLOBBER (which is
 an empty constructor with no elements) was zero-initializing the
 string.

 Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

 Thanks,
 Teresa

 2014-11-12

 PR tree-optimization/63841
 * tree.c (initializer_zerop): A constructor with no elements
 does not zero initialize.

 Index: tree.c
 ===
 --- tree.c  (revision 217190)
 +++ tree.c  (working copy)
 @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
{
 unsigned HOST_WIDE_INT idx;

 +if (!CONSTRUCTOR_NELTS (init))
 +  return false;
 FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
   if (!initializer_zerop (elt))
 return false;


 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

[gomp4] fix omp_clause_code_name

2014-11-12 Thread Cesar Philippidis

debug_tree was throwing an ICE when I was debugging scan_sharing_clauses
in omp-low.c. The problem turned out to be a missing comma. I've applied
this fix to gomp-4_0-branch.

Cesar
2014-11-12  Cesar Philippidis  

	gcc/
	* tree.c (omp_clause_code_name): Add missing comma
	after "_Cilk_for_count_". 


Index: gcc/tree.c
===
--- gcc/tree.c	(revision 217459)
+++ gcc/tree.c	(working copy)
@@ -358,7 +358,7 @@ const char * const omp_clause_code_name[
   "sections",
   "taskgroup",
   "_simduid_",
-  "_Cilk_for_count_"
+  "_Cilk_for_count_",
   "independent",
   "worker",
   "vector",

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Andrew Pinski

On Wed, Nov 12, 2014 at 9:25 PM, Teresa Johnson  wrote:
> Added testcase. Here is the new patch:
>
> 2014-11-12
>
> gcc:
> PR tree-optimization/63841
> * tree.c (initializer_zerop): A constructor with no elements
> does not zero initialize.

Actually an empty constructor does zero initialize.  A clobber does not.

The check you want is TREE_CLOBBER_P instead.

Thanks,
Andrew


>
> gcc/testsuite:
> * g++.dg/tree-ssa/pr63841.C: New test.
>
> Index: tree.c
> ===
> --- tree.c  (revision 217190)
> +++ tree.c  (working copy)
> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>{
> unsigned HOST_WIDE_INT idx;
>
> +if (!CONSTRUCTOR_NELTS (init))
> +  return false;
> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>   if (!initializer_zerop (elt))
> return false;
> Index: testsuite/g++.dg/tree-ssa/pr63841.C
> ===
> --- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
> +++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +#include 
> +
> +std::string __attribute__ ((noinline)) comp_test_write() {
> +  std::string data;
> +
> +  for (int i = 0; i < 2; ++i) {
> +char b = 1 >> (i * 8);
> +data.append(&b, 1);
> +  }
> +
> +  return data;
> +}
> +
> +std::string __attribute__ ((noinline)) comp_test_write_good() {
> +  std::string data;
> +
> +  char b;
> +  for (int i = 0; i < 2; ++i) {
> +b = 1 >> (i * 8);
> +data.append(&b, 1);
> +  }
> +
> +  return data;
> +}
> +
> +int main() {
> +  std::string good = comp_test_write_good();
> +  printf("expected: %hx\n", *(short*)good.c_str());
> +
> +  std::string bad = comp_test_write();
> +  printf("got: %hx\n", *(short*)bad.c_str());
> +
> +  return good != bad;
> +}
>
> On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li  wrote:
>> missing test case?
>>
>> David
>>
>> On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  wrote:
>>> This patch fixes an issue where tree-strlen was incorrectly removing a
>>> store of 0 into a string because it thought a prior CLOBBER (which is
>>> an empty constructor with no elements) was zero-initializing the
>>> string.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2014-11-12
>>>
>>> PR tree-optimization/63841
>>> * tree.c (initializer_zerop): A constructor with no elements
>>> does not zero initialize.
>>>
>>> Index: tree.c
>>> ===
>>> --- tree.c  (revision 217190)
>>> +++ tree.c  (working copy)
>>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>>{
>>> unsigned HOST_WIDE_INT idx;
>>>
>>> +if (!CONSTRUCTOR_NELTS (init))
>>> +  return false;
>>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>>   if (!initializer_zerop (elt))
>>> return false;
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Teresa Johnson

Added testcase. Here is the new patch:

2014-11-12

gcc:
PR tree-optimization/63841
* tree.c (initializer_zerop): A constructor with no elements
does not zero initialize.

gcc/testsuite:
* g++.dg/tree-ssa/pr63841.C: New test.

Index: tree.c
===
--- tree.c  (revision 217190)
+++ tree.c  (working copy)
@@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
   {
unsigned HOST_WIDE_INT idx;

+if (!CONSTRUCTOR_NELTS (init))
+  return false;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
  if (!initializer_zerop (elt))
return false;
Index: testsuite/g++.dg/tree-ssa/pr63841.C
===
--- testsuite/g++.dg/tree-ssa/pr63841.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/pr63841.C (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+std::string __attribute__ ((noinline)) comp_test_write() {
+  std::string data;
+
+  for (int i = 0; i < 2; ++i) {
+char b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+std::string __attribute__ ((noinline)) comp_test_write_good() {
+  std::string data;
+
+  char b;
+  for (int i = 0; i < 2; ++i) {
+b = 1 >> (i * 8);
+data.append(&b, 1);
+  }
+
+  return data;
+}
+
+int main() {
+  std::string good = comp_test_write_good();
+  printf("expected: %hx\n", *(short*)good.c_str());
+
+  std::string bad = comp_test_write();
+  printf("got: %hx\n", *(short*)bad.c_str());
+
+  return good != bad;
+}

On Wed, Nov 12, 2014 at 2:17 PM, Xinliang David Li  wrote:
> missing test case?
>
> David
>
> On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  wrote:
>> This patch fixes an issue where tree-strlen was incorrectly removing a
>> store of 0 into a string because it thought a prior CLOBBER (which is
>> an empty constructor with no elements) was zero-initializing the
>> string.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2014-11-12
>>
>> PR tree-optimization/63841
>> * tree.c (initializer_zerop): A constructor with no elements
>> does not zero initialize.
>>
>> Index: tree.c
>> ===
>> --- tree.c  (revision 217190)
>> +++ tree.c  (working copy)
>> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>>{
>> unsigned HOST_WIDE_INT idx;
>>
>> +if (!CONSTRUCTOR_NELTS (init))
>> +  return false;
>> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>>   if (!initializer_zerop (elt))
>> return false;
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: Updated LRA rematerialization patch has been committed

2014-11-12 Thread Vladimir Makarov


On 2014-11-12 11:06 PM, H.J. Lu wrote:


Unfortunately, it breaks bootstrap on Linux/ia32:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00288.html

../../../src-trunk/libgcc/config/libbid/bid_round.c: In function
â__bid_round128_19_38â:
../../../src-trunk/libgcc/config/libbid/bid_round.c:391:1: internal
compiler error: Segmentation fault
  }
  ^
/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc
-B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/
-B/usr/5.0.0/i686-linux/bin/ -B/usr/5.0.0/i686-linux/lib/ -isystem
/usr/5.0.0/i686-linux/include -isystem
/usr/5.0.0/i686-linux/sys-include-g -O2 -O2  -g -O2 -DIN_GCC-W
-Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include
-fpic -mlong-double-80 -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector   -fpic -mlong-double-80 -I. -I. -I../.././gcc
-I../../../src-trunk/libgcc -I../../../src-trunk/libgcc/.
-I../../../src-trunk/libgcc/../gcc
-I../../../src-trunk/libgcc/../include
-I../../../src-trunk/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT
-DHAVE_CC_TLS  -DUSE_TLS -o bid128.o -MT bid128.o -MD -MP -MF
bid128.dep -c ../../../src-trunk/libgcc/config/libbid/bid128.c
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[6]: *** [bid_round.o] Error 1

when configured with

--with-arch=corei7 --with-cpu=corei7 --prefix=/usr/5.0.0
--enable-clocale=gnu --with-system-zlib --enable-shared
--with-demangler-in-ld i686-linux --with-fpmath=sse
--enable-languages=c,c++,fortran,java,lto,objc



Thanks, H.J.  I'll investigate it.  Meanwhile, I switched the pass off 
temporarily.


Index: ChangeLog
===
--- ChangeLog   (revision 217458)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2014-11-13  Vladimir Makarov  
+
+   * lra.c (lra): Switch off rematerialization pass.
+
 2014-11-12  Vladimir Makarov  

* common.opt (flra-remat): New.
Index: lra.c
===
--- lra.c   (revision 217458)
+++ lra.c   (working copy)
@@ -2349,7 +2349,7 @@
}
   /* Now we know what pseudos should be spilled.  Try to
 rematerialize them first.  */
-  if (lra_remat ())
+  if (0 && lra_remat ())
{
  /* We need full live info -- see the comment above.  */
  lra_create_live_ranges (lra_reg_spill_p);

LTO streaming of TARGET_OPTIMIZE_NODE

2014-11-12 Thread Jan Hubicka

Hi,
this patch adds infrastructure for proper streaming and merging of
TREE_TARGET_OPTION.  The catch is that TREE_TARGET_OPTION is autogenerated
structure.  For x86_64 it looks as follows:

/* Structure to save/restore selected target specific options.  */
struct GTY(()) cl_target_option
{
  HOST_WIDE_INT x_ix86_isa_flags_explicit;
  HOST_WIDE_INT x_ix86_target_flags_explicit;
  const char *x_ix86_arch_string;
  const char *x_ix86_recip_name;
  const char *x_ix86_tune_ctrl_string;
  const char *x_ix86_tune_memcpy_strategy;
  const char *x_ix86_tune_memset_strategy;
  const char *x_ix86_tune_string;
  HOST_WIDE_INT x_ix86_isa_flags;
  enum fpmath_unit x_ix86_fpmath;
  enum asm_dialect x_ix86_asm_dialect;
  ...
  enum stack_protector_guard x_ix86_stack_protector_guard;
  enum stringop_alg x_ix86_stringop_alg;
  enum tls_dialect x_ix86_tls_dialect;
  int x_ix86_branch_cost;
  int x_ix86_dump_tunes;
  
  int x_recip_mask;
  int x_target_flags;
  unsigned char arch;
  unsigned char arch_specified;
  unsigned char branch_cost;
  unsigned char schedule;
  unsigned char tune;
  unsigned char tune_defaulted;
};

The problem is that the strings prevents us from streaming the structure as a 
binary
blob like we do for optimization nodes (and we should not do that to make LTO 
streaming
stable across attribute tweaks in minor releases).

This patch extends the AWK generators to produce four functions:

/* Compare two target options  */
bool
cl_target_option_eq (struct cl_target_option const *ptr1,
 struct cl_target_option const *ptr2)
{
  if (ptr1->x_ix86_arch_string != ptr2->x_ix86_arch_string
  && (!ptr1->x_ix86_arch_string || !ptr2->x_ix86_arch_string
  || strcmp (ptr1->x_ix86_arch_string, ptr2->x_ix86_arch_string)))
return false;

if (ptr1->x_ix86_isa_flags != ptr2->x_ix86_isa_flags)
return false;
  if (ptr1->x_ix86_fpmath != ptr2->x_ix86_fpmath)
return false;
  return true;
}

/* Hash target options  */
hashval_t
cl_target_option_hash (struct cl_target_option const *ptr)
{ 
  inchash::hash hstate;
  if (ptr->x_ix86_arch_string)
hstate.add (ptr->x_ix86_arch_string, strlen (ptr->x_ix86_arch_string));
  else
hstate.add_int (0);
...
  hstate.add_wide_int (ptr->x_ix86_fpmath);
  return hstate.end ();
}

/* Stream out target options  */
void
cl_target_option_stream_out (struct output_block *ob, struct cl_target_option 
*ptr)
{ 
  streamer_write_string (ob, ob->main_stream, ptr->x_ix86_arch_string, true);
...
  streamer_write_uhwi (ob, ptr->x_ix86_isa_flags);
  streamer_write_uhwi (ob, ptr->x_ix86_fpmath);
}

/* Stream in target options  */
void
cl_target_option_stream_in (struct lto_input_block *ib, struct data_in 
*data_in, struct cl_target_option *ptr)
{ 
  ptr->x_ix86_arch_string = strdup (streamer_read_string (data_in, ib));
...
  ptr->x_ix86_fpmath = (enum fpmath_unit ) streamer_read_uhwi (ib);
}

These allows us to properly stream, compare and hash the values.

There is some implementation lazyness - for example I stream all scalars as
unsigned HOST_WIDE_INT that seems to be the largest type used for variables
across the targets.

If other types are used, the type matching (I stole from other part of the awk
scripts) will need to be extended; for now it knows that const char * is a
string and needs special care but nothing else.

I have tested this with firefox and additional change to tree.c to always
initialize TARGET_OPTION_NODE for defined functions (so target units are
peorperly presered across units). This seems to work as intended - i.e. the
functions gets link-time optimized according to -march settings passed at
compile time.

I noticed that memory use is now 3.8GB instead of previous 2.8GB - this may be
tree merging problem caused by mismatching target settings but most probably it
is an unrelated stuff. I will look into it tomorrow.  I also plan to test this
on PPC (that currently fails to bootstrap).

Incrementally I would like to handle optimizations nodes same way and change
streaming format to be a set of assignments field_name=value.
This way we could behave sanely when some field is introduced or removed by
either defaulting it or ignoring it.

Bootstrapped/regtested x86_64-linux, seems sane?

* optc-save-gen.awk: Output cl_target_option_eq,
cl_target_option_hash, cl_target_option_stream_out,
cl_target_option_stream_in functions.
* opth-gen.awk: Output prototypes for
cl_target_option_eq and cl_target_option_hash.
* lto-streamer.h (cl_target_option_stream_out,
cl_target_option_stream_in): Declare.
* tree.c (cl_option_hash_hash): Use cl_target_option_hash.
(cl_option_hash_eq): Use cl_target_option_eq.
* tree-streamer-in.c (unpack_value_fields): Stream in
TREE_TARGET_OPTION.
* lto-streamer-out.c (DFS::DFS_write_tree_body): Follow
DECL_FUNCTION_SPECIFIC_TARGET.
(hash_tree): Hash TREE_TARGET_OPTION; visit
DECL_FUNCTION_

Re: Updated LRA rematerialization patch has been committed

2014-11-12 Thread H.J. Lu

On Wed, Nov 12, 2014 at 7:08 PM, Vladimir Makarov  wrote:
>   After submitting LRA rematerialization patch, I got a lot of
> feedback.  Some people reported performance degradation and pointed me
> out the most important problem which looks like
>
>   p0 <- p1 + p2  p0 <- p1 + p2
>   spilled_pseudo <- p0   spilled_pseudo <- p0
>
>... some code =>
>
>   p3 <- spilled_pseudop3 <- p1 + p2
>
>   The first 2 insns were not removed and the second one became a dead
> store.  It was hard to fix as LRA (and reload pass) does mostly local
> transformations (in BB or EBB).  It could be fixed in BB or EBB.  But
> some important cases (e.g. the code between is a loop) still will be
> missed and will result in the same problem.  To fix this in right way,
> we needed to update the global live info.
>
>  A recent Intel project on reuse of pic register came to problems
> which need a global live analysis too in LRA to fix them.  For
> rematerialization, it is a matter of better code generation but for
> reuse of pic register project it is a matter of correct code
> generation.
>
>   So the last two weeks I worked on global live analysis in LRA and
> submitted the patch 3 days ago.  The rematerialization patch here is
> mostly the same I sent month ago.  I added only small changes to adapt
> it to global live analysis and fix some tests failures I found on
> ppc64.
>
>   The patch with live analysis generates smaller and a better code
> than before.  Last time I reported only 1% SPECFP2000 improvement on
> ARM.  Now I see about 0.4% SPECFP2000 improvement on x86-64 too.
>
>   So I've committed the rematerialization patch to the trunk as rev. 217458.
>
>   As I wrote its initial version of rematerialziation.  Other
> people and me proposed several ideas how to improve it in the future.
>
> 2014-11-12  Vladimir Makarov  
>
> * common.opt (flra-remat): New.
> * opts.c (default_options_table): Add entry for flra_remat.
> * timevar_def (TV_LRA_REMAT): New.
> * doc/invoke.texi (-flra-remat): Add description of the new
> option.
> * doc/passes.texi (-flra-remat): Remove lra-equivs.c and
> lra-saves.c.  Add lra-remat.c.
> * Makefile.in (OBJS): Add lra-remat.o.
> * lra-remat.c: New file.
> * lra.c: Add info about the rematerialization pass in the top
> comment.
> (collect_non_operand_hard_regs, add_regs_to_insn_regno_info):
> Process unallocatable regs too.
> (lra_constraint_new_insn_uid_start): Remove.
> (lra): Add code for calling rematerialization sub-pass.
> * lra-int.h (lra_constraint_new_insn_uid_start): Remove.
> (lra_constrain_insn, lra_remat): New prototypes.
> (lra_eliminate_regs_1): Add parameter.
> * lra-lives.c (make_hard_regno_born, make_hard_regno_dead):
> Process unallocatable hard regs too.
> (process_bb_lives): Ditto.
> * lra-spills.c (remove_pseudos): Add argument to
> lra_eliminate_regs_1 call.
> * lra-eliminations.c (lra_eliminate_regs_1): Add parameter.  Use it
> for sp offset calculation.
> (lra_eliminate_regs): Add argument for lra_eliminate_regs_1 call.
> (eliminate_regs_in_insn): Add parameter.  Use it for sp offset
> calculation.
> (process_insn_for_elimination): Add argument for
> eliminate_regs_in_insn call.
> * lra-constraints.c (get_equiv_with_elimination):  Add argument
> for lra_eliminate_regs_1 call.
> (process_addr_reg): Add parameter.  Use it.
> (process_address_1): Ditto.  Add argument for process_addr_reg
> call.
> (process_address): Ditto.
> (curr_insn_transform): Add parameter.  Use it.  Add argument for
> process_address calls.
> (lra_constrain_insn): New function.
> (lra_constraints): Add argument for curr_insn_transform call.
>

Unfortunately, it breaks bootstrap on Linux/ia32:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00288.html

../../../src-trunk/libgcc/config/libbid/bid_round.c: In function
â__bid_round128_19_38â:
../../../src-trunk/libgcc/config/libbid/bid_round.c:391:1: internal
compiler error: Segmentation fault
 }
 ^
/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/xgcc
-B/export/gnu/import/git/gcc-test-ia32corei7/bld/./gcc/
-B/usr/5.0.0/i686-linux/bin/ -B/usr/5.0.0/i686-linux/lib/ -isystem
/usr/5.0.0/i686-linux/include -isystem
/usr/5.0.0/i686-linux/sys-include-g -O2 -O2  -g -O2 -DIN_GCC-W
-Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include
-fpic -mlong-double-80 -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector   -fpic -mlong-double-80 -I. -I. -I../.././gcc
-I../../../src-trunk/libgcc -I../../../src-trunk/libgcc/.
-I../../../src-trunk/libgcc/../gcc
-I../../../src-trunk/libgcc/../include
-I../../.

Re: [RTL, Patch] Int div by constant compilation enhancement

2014-11-12 Thread H.J. Lu

On Mon, Nov 3, 2014 at 9:01 AM, Alex Velenko  wrote:
> Hi,
> This patch adds a mid-end check to catch division by
> constant case and optimize it to generate one shift,
> instead of two.
>
> A testacase to check the correct codegeneration for aarch64
> is added. This check is not made generic, because the optimisation
> implemented is not used by all targets.
>
> Is it ok?
>
> Thanks,
> Alex
>
> gcc/
>
> 2014-11-03  Alex Velenko  
>
> * simplify-rtx.c (simplify_binary_operation_1): Div check added.
> * rtl.h (SUBREG_P): New macro added.
>
> gcc/testsuite/
>
> 2014-11-03  Alex Velenko  
>
> * gcc.dg/asr-div1.c : New testcase.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63843

-- 
H.J.

Updated LRA rematerialization patch has been committed

2014-11-12 Thread Vladimir Makarov


  After submitting LRA rematerialization patch, I got a lot of
feedback.  Some people reported performance degradation and pointed me
out the most important problem which looks like

  p0 <- p1 + p2  p0 <- p1 + p2
  spilled_pseudo <- p0   spilled_pseudo <- p0

   ... some code =>

  p3 <- spilled_pseudop3 <- p1 + p2

  The first 2 insns were not removed and the second one became a dead
store.  It was hard to fix as LRA (and reload pass) does mostly local
transformations (in BB or EBB).  It could be fixed in BB or EBB.  But
some important cases (e.g. the code between is a loop) still will be
missed and will result in the same problem.  To fix this in right way,
we needed to update the global live info.

 A recent Intel project on reuse of pic register came to problems
which need a global live analysis too in LRA to fix them.  For
rematerialization, it is a matter of better code generation but for
reuse of pic register project it is a matter of correct code
generation.

  So the last two weeks I worked on global live analysis in LRA and
submitted the patch 3 days ago.  The rematerialization patch here is
mostly the same I sent month ago.  I added only small changes to adapt
it to global live analysis and fix some tests failures I found on
ppc64.

  The patch with live analysis generates smaller and a better code
than before.  Last time I reported only 1% SPECFP2000 improvement on
ARM.  Now I see about 0.4% SPECFP2000 improvement on x86-64 too.

  So I've committed the rematerialization patch to the trunk as rev. 
217458.


  As I wrote its initial version of rematerialziation.  Other
people and me proposed several ideas how to improve it in the future.

2014-11-12  Vladimir Makarov  

* common.opt (flra-remat): New.
* opts.c (default_options_table): Add entry for flra_remat.
* timevar_def (TV_LRA_REMAT): New.
* doc/invoke.texi (-flra-remat): Add description of the new
option.
* doc/passes.texi (-flra-remat): Remove lra-equivs.c and
lra-saves.c.  Add lra-remat.c.
* Makefile.in (OBJS): Add lra-remat.o.
* lra-remat.c: New file.
* lra.c: Add info about the rematerialization pass in the top
comment.
(collect_non_operand_hard_regs, add_regs_to_insn_regno_info):
Process unallocatable regs too.
(lra_constraint_new_insn_uid_start): Remove.
(lra): Add code for calling rematerialization sub-pass.
* lra-int.h (lra_constraint_new_insn_uid_start): Remove.
(lra_constrain_insn, lra_remat): New prototypes.
(lra_eliminate_regs_1): Add parameter.
* lra-lives.c (make_hard_regno_born, make_hard_regno_dead):
Process unallocatable hard regs too.
(process_bb_lives): Ditto.
* lra-spills.c (remove_pseudos): Add argument to
lra_eliminate_regs_1 call.
* lra-eliminations.c (lra_eliminate_regs_1): Add parameter.  Use it
for sp offset calculation.
(lra_eliminate_regs): Add argument for lra_eliminate_regs_1 call.
(eliminate_regs_in_insn): Add parameter.  Use it for sp offset
calculation.
(process_insn_for_elimination): Add argument for
eliminate_regs_in_insn call.
* lra-constraints.c (get_equiv_with_elimination):  Add argument
for lra_eliminate_regs_1 call.
(process_addr_reg): Add parameter.  Use it.
(process_address_1): Ditto.  Add argument for process_addr_reg
call.
(process_address): Ditto.
(curr_insn_transform): Add parameter.  Use it.  Add argument for
process_address calls.
(lra_constrain_insn): New function.
(lra_constraints): Add argument for curr_insn_transform call.



Index: Makefile.in
===
--- Makefile.in (revision 217448)
+++ Makefile.in (working copy)
@@ -1304,6 +1304,7 @@
lra-constraints.o \
lra-eliminations.o \
lra-lives.o \
+   lra-remat.o \
lra-spills.o \
lto-cgraph.o \
lto-streamer.o \
Index: common.opt
===
--- common.opt  (revision 217448)
+++ common.opt  (working copy)
@@ -1551,6 +1551,10 @@
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+flra-remat
+Common Report Var(flag_lra_remat) Optimization
+Do CFG-sensitive rematerialization in LRA
+
 flto
 Common
 Enable link-time optimization.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 217448)
+++ doc/invoke.texi (working copy)
@@ -392,7 +392,7 @@
 -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts -flive-range-shrinkage 
@gol
 -floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize @gol
--floop-parallelize-all -flto -

Re: PATCH [3 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-12 Thread David Edelsohn

On Tue, Nov 11, 2014 at 8:02 PM, Michael Meissner
 wrote:
> Last year, in preparation for the upper regs patch, I went through the
> instructions, and changed the iterators, so the vector support was in vsx.md,
> while the scalar support was in rs6000.md.  I missed the int<->fp conversion
> operators.  This patch moves the scalar support into rs6000.md.  Note, this
> patch will generate the non-VSX form of the conversion instrution if the
> registers are both traditional floating point (in GCC 4.x, it would generate
> the VSX scalar operation).  The upper register support will not be enabled
> until a future patch.  Is this patch acceptable to be checked in when the
> PowerPC bootstraps?
>
> I also fixed up the tests that were affected by these changes.
>
> [gcc, patch]
> 2014-11-11  Michael Meissner  
>
> * config/rs6000/vsx.md (vsx_float2): Only provide the
> vector forms of the instructions.  Move VSX scalar forms to
> rs6000.md, and add support for -mupper-regs-sf.
> (vsx_floatuns2): Likewise.
> (vsx_fix_trunc2): Likewise.
> (vsx_fixuns_trunc2): Likewise.
> (vsx_float_fix_2): Delete DF version, rename to
> vsx_float_fix_v2df2.
> (vsx_float_fix_v2df2): Likewise.
>
> * config/rs6000/rs6000.md (Fa): New mode attribute to give
> constraint for the Altivec registers for a type.
> (extendsfdf2_fpr): Use correct constraint.
> (copysign3_fcpsgn): For SFmode, use correct xscpsgndp
> instruction.
> (floatsi2_lfiwax): Add support for -mupper-regs-{sf,df}.
> Generate the non-VSX instruction if all registers were FPRs.  Do
> not use the patterns in vsx.md for scalar operations.
> (floatsi2_lfiwax_mem): Likewise.
> (floatunssi2_lfiwzx): Likewise.
> (floatunssi2_lfiwzx_mem): Likewise.
> (fix_truncdi2_fctidz): Likewise.
> (fixuns_truncdi2_fctiduz): Likewise.
> (fctiwz_): Likewise.
> (fctiwuz_): Likewise.
> (friz): Likewise.
> (floatdidf2_fpr): Likewise.
> (floatdidf2_mem): Likewise.
> (floatunsdidf2): Likewise.
> (floatunsdidf2_fcfidu): Likewise.
> (floatunsdidf2_mem): Likewise.
> (floatdisf2_fcfids): Likewise.
> (floatdisf2_mem): Likewise.
> (floatdisf2_internal1): Add explicit test for not FCFIDS to make
> it more obvious that the code is for pre-ISA 2.06 machines.
> (floatdisf2_internal2): Likewise.
> (floatunsdisf2_fcfidus): Add support for -mupper-regs-{sf,df}.
> Generate the non-VSX instruction if all registers were FPRs.  Do
> not use the patterns in vsx.md for scalar operations.
> (floatunsdisf2_mem): Likewise.
>
> [gcc/testsuite]
> 2014-11-11  Michael Meissner  
>
> * gcc.target/powerpc/ppc-fpconv-1.c: Adjust for -mupper-regs-df
> changes.
> * gcc.target/powerpc/ppc-fpconv-2.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-3.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-4.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-5.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-6.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-7.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-8.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-9.c: Likewise.
> * gcc.target/powerpc/ppc-fpconv-10.c: Likewise.
> * gcc.target/powerpc/ppc-round.c: Likewise.

Okay.

Thanks, David

Re: PATCH [2 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-12 Thread David Edelsohn

On Tue, Nov 11, 2014 at 7:56 PM, Michael Meissner
 wrote:
> When I did the original power7 work, I put the reload handlers into vector.md,
> since they were only used for vector types.  Since they are now more general, 
> I
> am moving these insns from vector.md to rs6000.md.  Is this patch acceptable 
> to
> be checked in once the PowerPC boostraps again.
>
> 2014-11-11  Michael Meissner  
>
> * config/rs6000/vector.md (VEC_R): Move secondary reload support
> insns to rs6000.md from vector.md.
> (reload___store): Likewise.
> (reload___load): Likewise.
> (vec_reload_and_plus_): Likewise.
>
> * config/rs6000/rs6000.md (RELOAD): New mode iterator for all of
> the types that have secondary reload address support to load up a
> base register.
> (reload___store): Move the reload
> handlers here from vector.md, and expand the types we generate
> reload handlers for.
> (reload___load): Likewise.
> (vec_reload_and_plus_): Likewise.

Okay.

Thanks, David

Re: PATCH [patch 1 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-12 Thread David Edelsohn

On Tue, Nov 11, 2014 at 7:53 PM, Michael Meissner
 wrote:
> At one stage in working on the patches, I was editing the easy_fp_constant
> predicate, and I noticed there was a test for the constant 0.0 at the top of
> the function, as well as redundant tests in SFmode/DFmode.  I deleted these
> redundant tests.  Is the patch acceptable to check in once the PowerPC
> bootstraps again:
>
> 2014-11-11  Michael Meissner  
>
> * config/rs6000/predicates.md (easy_fp_constant): Delete redunant
> tests for 0.0.

This is okay.

Thanks, David

Re: [PR c/52952] More precise locations within format strings

2014-11-12 Thread Manuel López-Ibáñez

On 13 November 2014 03:07, Mike Stump  wrote:
> On Nov 12, 2014, at 3:52 PM, Manuel López-Ibáñez  
> wrote:
>> On 12 November 2014 22:41, Jakub Jelinek  wrote:
>>> I think we had discussions on this topic, the important thing is that we
>>> don't want to generate different warnings based on whether -save-temps has
>>> been used, there could be just comment in between ) and ; etc.
>>
>> How can --save-temps influence whether ";" is in the same line as the
>> 'if' or not?
>
> He was worried about:
>
> int main() {
> if (1)/*hi*/;
> }
> mrs@alcmene:~/work1/gcc-c2e/gcc$ gcc -E tt.c
> int main() {
>  if (1) ;
> }
>
> but, no need to worry, there is a space in there.  People that get this wrong 
> do );, without a space.  They also don't have a comment in there either.  The 
> eye disappears the ; when right next to ), but, if there is a space, the eye 
> can find it pretty easy.

But that has nothing to do with the change I proposed, which is to not
warn if ";" is on a different line. The actual heuristics of Clang are
more complicated and avoid even more false positives, but our location
info is not precise enough yet to implement them.

Cheers,

Manuel.

Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-12 Thread David Malcolm

On Tue, 2014-11-11 at 08:26 +0100, Jakub Jelinek wrote:
> On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
> > On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
> > > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
> > > > To be constructive here - the above case is from within a
> > > > GIMPLE_ASSIGN case label
> > > > and thus I'd have expected
> > > > 
> > > > case GIMPLE_ASSIGN:
> > > >   {
> > > > gassign *a1 = as_a  (s1);
> > > > gassign *a2 = as_a  (s2);
> > > >   lhs1 = gimple_assign_lhs (a1);
> > > >   lhs2 = gimple_assign_lhs (a2);
> > > >   if (TREE_CODE (lhs1) != SSA_NAME
> > > >   && TREE_CODE (lhs2) != SSA_NAME)
> > > > return (operand_equal_p (lhs1, lhs2, 0)
> > > > && gimple_operand_equal_value_p (gimple_assign_rhs1 
> > > > (a1),
> > > >  gimple_assign_rhs1 
> > > > (a2)));
> > > >   else if (TREE_CODE (lhs1) == SSA_NAME
> > > >&& TREE_CODE (lhs2) == SSA_NAME)
> > > > return vn_valueize (lhs1) == vn_valueize (lhs2);
> > > >   return false;
> > > >   }
> > > > 
> > > > instead.  That's the kind of changes I have expected and have approved 
> > > > of.
> > > 
> > > But even that looks like just adding extra work for all developers, with 
> > > no
> > > gain.  You only have to add extra code and extra temporaries, in switches
> > > typically also have to add {} because of the temporaries and thus extra
> > > indentation level, and it doesn't simplify anything in the code.
> > 
> > The branch attempts to use the C++ typesystem to capture information
> > about the kinds of gimple statement we expect, both:
> >   (A) so that the compiler can detect type errors, and
> >   (B) as a comprehension aid to the human reader of the code
> > 
> > The ideal here is when function params and struct field can be
> > strengthened from "gimple" to a subclass ptr.  This captures the
> > knowledge that every use of a function or within a struct has a given
> > gimple code.
> 
> I just don't like all the as_a/is_a stuff enforced everywhere,
> it means more typing, more temporaries, more indentation.
> So, as I view it, instead of the checks being done cheaply (yes, I think
> the gimple checking as we have right now is very cheap) under the
> hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
> put the burden on the developers, who has to check that manually through
> the as_a/is_a stuff everywhere, more typing and uglier syntax.
> I just don't see that as a step forward, instead a huge step backwards.
> But perhaps I'm alone with this.

I agree with you w.r.t. my later patches.   I like my initial set of 89
patches, but I overdid things with the attempt to convert all of the
gimple_assign_ accessors.

> Can you e.g. compare the size of - lines in your patchset combined, and
> size of + lines in your patchset?  As in, if your changes lead to less
> typing or more.

I don't know if this data matches the proposed interpretation
(autoindentation in emacs is wonderful), but here goes:

Diff of my current working copy vs last merge point, ignoring ChangeLog
additions, obtaining lines starting "-", piping into wc:

$ git diff fddbd0194b01f44c5b5f16379fd5405dcf6d71c0 $(git diff
fddbd0194b01f44c5b5f16379fd5405dcf6d71c0 | diffstat -lp1 | grep -v
ChangeLog) | grep -e "^-" | wc
   6169   31032  272916

Likewise for lines starting with "+":

$ git diff fddbd0194b01f44c5b5f16379fd5405dcf6d71c0 $(git diff
fddbd0194b01f44c5b5f16379fd5405dcf6d71c0 | diffstat -lp1 | grep -v
ChangeLog) | grep -e "^+" | wc
   7295   36566  309380

so the + lines have 13% more bytes than the - lines


Dave

Re: [PR c/52952] More precise locations within format strings

2014-11-12 Thread Mike Stump

On Nov 12, 2014, at 3:52 PM, Manuel López-Ibáñez  wrote:
> On 12 November 2014 22:41, Jakub Jelinek  wrote:
>> I think we had discussions on this topic, the important thing is that we
>> don't want to generate different warnings based on whether -save-temps has
>> been used, there could be just comment in between ) and ; etc.
> 
> How can --save-temps influence whether ";" is in the same line as the
> 'if' or not?

He was worried about:

int main() {
if (1)/*hi*/;
}
mrs@alcmene:~/work1/gcc-c2e/gcc$ gcc -E tt.c
int main() {
 if (1) ;
}

but, no need to worry, there is a space in there.  People that get this wrong 
do );, without a space.  They also don’t have a comment in there either.  The 
eye disappears the ; when right next to ), but, if there is a space, the eye 
can find it pretty easy.

Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-12 Thread David Malcolm

On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  wrote:
> > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
> >> > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
> >> > > To be constructive here - the above case is from within a
> >> > > GIMPLE_ASSIGN case label
> >> > > and thus I'd have expected
> >> > >
> >> > > case GIMPLE_ASSIGN:
> >> > >   {
> >> > > gassign *a1 = as_a  (s1);
> >> > > gassign *a2 = as_a  (s2);
> >> > >   lhs1 = gimple_assign_lhs (a1);
> >> > >   lhs2 = gimple_assign_lhs (a2);
> >> > >   if (TREE_CODE (lhs1) != SSA_NAME
> >> > >   && TREE_CODE (lhs2) != SSA_NAME)
> >> > > return (operand_equal_p (lhs1, lhs2, 0)
> >> > > && gimple_operand_equal_value_p (gimple_assign_rhs1 
> >> > > (a1),
> >> > >  gimple_assign_rhs1 
> >> > > (a2)));
> >> > >   else if (TREE_CODE (lhs1) == SSA_NAME
> >> > >&& TREE_CODE (lhs2) == SSA_NAME)
> >> > > return vn_valueize (lhs1) == vn_valueize (lhs2);
> >> > >   return false;
> >> > >   }
> >> > >
> >> > > instead.  That's the kind of changes I have expected and have approved 
> >> > > of.
> >> >
> >> > But even that looks like just adding extra work for all developers, with 
> >> > no
> >> > gain.  You only have to add extra code and extra temporaries, in switches
> >> > typically also have to add {} because of the temporaries and thus extra
> >> > indentation level, and it doesn't simplify anything in the code.
> >>
> >> The branch attempts to use the C++ typesystem to capture information
> >> about the kinds of gimple statement we expect, both:
> >>   (A) so that the compiler can detect type errors, and
> >>   (B) as a comprehension aid to the human reader of the code
> >>
> >> The ideal here is when function params and struct field can be
> >> strengthened from "gimple" to a subclass ptr.  This captures the
> >> knowledge that every use of a function or within a struct has a given
> >> gimple code.
> >
> > I just don't like all the as_a/is_a stuff enforced everywhere,
> > it means more typing, more temporaries, more indentation.
> > So, as I view it, instead of the checks being done cheaply (yes, I think
> > the gimple checking as we have right now is very cheap) under the
> > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
> > put the burden on the developers, who has to check that manually through
> > the as_a/is_a stuff everywhere, more typing and uglier syntax.
> > I just don't see that as a step forward, instead a huge step backwards.
> > But perhaps I'm alone with this.
> > Can you e.g. compare the size of - lines in your patchset combined, and
> > size of + lines in your patchset?  As in, if your changes lead to less
> > typing or more.
> 
> I see two ways out here.  One is to add overloads to all the functions
> taking the special types like
> 
> tree
> gimple_assign_rhs1 (gimple *);
> 
> or simply add
> 
> gassign *operator ()(gimple *g) { return as_a  (g); }
> 
> into a gimple-compat.h header which you include in places that
> are not converted "nicely".

Thanks for the suggestions.

Am I missing something, or is the gimple-compat.h idea above not valid C
++?

Note that "gimple" is still a typedef to
  gimple_statement_base *
(as noted before, the gimple -> gimple * change would break everyone
else's patches, so we talked about that as a followup patch for early
stage3).

Given that, if I try to create an "operator ()" outside of a class, I
get this error:

‘gassign* operator()(gimple)’ must be a nonstatic member function

which is emitted from cp/decl.c's grok_op_properties:
  /* An operator function must either be a non-static member function
 or have at least one parameter of a class, a reference to a class,
 an enumeration, or a reference to an enumeration.  13.4.0.6 */

I tried making it a member function of gimple_statement_base, but that
doesn't work either: we want a conversion
  from a gimple_statement_base * to a gassign *, not
  from a gimple_statement_base   to a gassign *.

Is there some syntactic trick here that I'm missing?  Sorry if I'm being
dumb (I can imagine there's a way of doing it by making "gimple" become
some kind of wrapped ptr class, but that way lies madness, surely).

> Both avoid manually making the compiler happy (which the
> explicit as_a<> stuff is!  It doesn't add any "checking" - it's
> just placing the as_a<> at the callers and thus make the
> runtine ICE fire there).
> 
> As much as I don't like "global" conversion operators I don't
> like adding overloads to all of the accessor functions even more.

(nods)

Some other options:

Option 3: only convert the "easy" accessors: the ones I already did in
the /89 patch kit, as reviewed by Jeff, and rebased by me recently,
which

libsanitizer merge from upstream r221802

2014-11-12 Thread Konstantin Serebryany

Hi,

Here is one more merge of libsanitizer (last one was in Sept).

Tested on x86_64 Ubuntu 14.04 like this:
rm -rf */{*/,}libsanitizer && make -j 50
make -j 40 -C gcc check-g{cc,++}
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp' && \
make -j 40 -C gcc check-g{cc,++}
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} tsan.exp' && \
make -j 40 -C gcc check
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp' && \
echo PASS

Expected ChangeLog entry:

2014-11-12  Kostya Serebryany  

* All source files: Merge from upstream r221802.
* sanitizer_common/sanitizer_symbolizer_libbacktrace.cc
  (LibbacktraceSymbolizer::SymbolizeData): replace 'address'
  with 'start' to follow the new interface.
* asan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* interception/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* libbacktrace/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* lsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* sanitizer_common/Makefile.am (sanitizer_common_files): Added new
  files.
  (AM_CXXFLAGS): added -std=c++11.
* tsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* ubsan/Makefile.am (AM_CXXFLAGS): added -std=c++11.
* asan/Makefile.in: Regenerate.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.in: Regenerate.


libsanitizer-221802.patch.gz
Description: GNU Zip compressed data

[PATCH obvious] Fix quoting in several diagnostics

2014-11-12 Thread Manuel López-Ibáñez

Committed as obvious (after bootstrapping®test).

https://gcc.gnu.org/r217457


2014-11-13  Manuel López-Ibáñez  

* opts-global.c (postpone_unknown_option_warning): Fix spelling.
(print_ignored_options): Fix quoting.
* opts.c (common_handle_option): Likewise.
(set_debug_level): Likewise.
* toplev.c (process_options): Likewise.



Index: gcc/opts-global.c
===
--- gcc/opts-global.c(revision 217382)
+++ gcc/opts-global.c(working copy)
@@ -132,7 +132,7 @@
we only complain about unknown -Wno-* options if they may have
prevented a diagnostic. Otherwise, we just ignore them.  Note that
if we do complain, it is only as a warning, not an error; passing
-   the compiler an unrecognised -Wno-* option should never change
+   the compiler an unrecognized -Wno-* option should never change
whether the compilation succeeds or fails.  */

 static void
@@ -152,7 +152,7 @@

   opt = ignored_options.pop ();
   warning_at (UNKNOWN_LOCATION, 0,
-  "unrecognized command line option \"%s\"", opt);
+  "unrecognized command line option %qs", opt);
 }
 }

Index: gcc/opts.c
===
--- gcc/opts.c(revision 217382)
+++ gcc/opts.c(working copy)
@@ -1923,7 +1923,7 @@
  ? STATIC_BUILTIN_STACK_CHECK
  : GENERIC_STACK_CHECK;
   else
-warning_at (loc, 0, "unknown stack check parameter \"%s\"", arg);
+warning_at (loc, 0, "unknown stack check parameter %qs", arg);
   break;

 case OPT_fstack_limit:
@@ -2199,7 +2199,7 @@
   if (opts_set->x_write_symbols != NO_DEBUG
   && opts->x_write_symbols != NO_DEBUG
   && type != opts->x_write_symbols)
-error_at (loc, "debug format \"%s\" conflicts with prior selection",
+error_at (loc, "debug format %qs conflicts with prior selection",
   debug_type_names[type]);
   opts->x_write_symbols = type;
   opts_set->x_write_symbols = type;
@@ -2217,9 +2217,9 @@
 {
   int argval = integral_argument (arg);
   if (argval == -1)
-error_at (loc, "unrecognised debug output level \"%s\"", arg);
+error_at (loc, "unrecognised debug output level %qs", arg);
   else if (argval > 3)
-error_at (loc, "debug output level %s is too high", arg);
+error_at (loc, "debug output level %qs is too high", arg);
   else
 opts->x_debug_info_level = (enum debug_info_levels) argval;
 }
Index: gcc/toplev.c
===
--- gcc/toplev.c(revision 217382)
+++ gcc/toplev.c(working copy)
@@ -1463,7 +1463,7 @@
 debug_hooks = &vmsdbg_debug_hooks;
 #endif
   else
-error ("target system does not support the \"%s\" debug format",
+error ("target system does not support the %qs debug format",
debug_type_names[write_symbols]);

   /* We know which debug output will be used so we can set flag_var_tracking

Re: [PATCH] Make -fdiagnostics-color= default configurable, default to =auto

2014-11-12 Thread Manuel López-Ibáñez

I think it is great that =auto becomes the default. However, adding a
configure option for this seems overkill. If anyone really really
wishes to change the default, they can simply edit a single-line in
common.opt.

Apart from that, I really cannot understand why someone would want the
options none, auto or always: none is equivalent to auto-if-env but
without the "if-env" escape route, and neither auto nor always allow
disabling the coloring globally (which is precisely what users hating
coloring are going to be asking for and they won't be happy if we tell
them to recompile gcc).

I think there are only two defaults that matter:
auto-if-env-otherwise-never (the current one) and
never-if-env-otherwise-auto. I wish we had chosen the second one
earlier, but this is not such a critical feature to not change the
default now. With never-if-even-otherwise-auto, an empty GCC_COLORS
(or a value ="never") could disable coloring. Currently,
GCC_COLORS="never" actually enables coloring (well, any value does).

Cheers,

Manuel.

Go patch committed: Report unused packages correctly for ambiguous lookups

2014-11-12 Thread Ian Taylor

This patch from Chris Manghane fixes the Go frontend to correctly
report unused packages when there are ambiguous lookups.  Previously
some cases involving composite literals could cause a package to be
considered as used, when it was really not used.  That is, in some
cases, the compiler did not report an error that it should have
reported.  This is http://golang.org/issue/6427 .

Bootstrapped and ran testsuite on x86_64-unknown-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 217402)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -12791,6 +12791,16 @@ Composite_literal_expression::lower_stru
{
case EXPRESSION_UNKNOWN_REFERENCE:
  name = name_expr->unknown_expression()->name();
+ if (type->named_type() != NULL)
+   {
+ // If the named object found for this field name comes from a
+ // different package than the struct it is a part of, do not count
+ // this incorrect lookup as a usage of the object's package.
+ no = name_expr->unknown_expression()->named_object();
+ if (no->package() != NULL
+ && no->package() != 
type->named_type()->named_object()->package())
+   no->package()->forget_usage(name_expr);
+   }
  break;
 
case EXPRESSION_CONST_REFERENCE:
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 217402)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -1412,7 +1412,7 @@ Gogo::lookup(const std::string& name, Na
   if (ret != NULL)
{
  if (ret->package() != NULL)
-   ret->package()->set_used();
+   ret->package()->note_usage();
  return ret;
}
 }
@@ -7426,6 +7426,36 @@ Package::set_priority(int priority)
 this->priority_ = priority;
 }
 
+// Forget a given usage.  If forgetting this usage means this package becomes
+// unused, report that error.
+
+void
+Package::forget_usage(Expression* usage) const
+{
+  if (this->fake_uses_.empty())
+return;
+
+  std::set::iterator p = this->fake_uses_.find(usage);
+  go_assert(p != this->fake_uses_.end());
+  this->fake_uses_.erase(p);
+
+  if (this->fake_uses_.empty())
+error_at(this->location(), "imported and not used: %s",
+Gogo::message_name(this->package_name()).c_str());
+}
+
+// Clear the used field for the next file.  If the only usages of this package
+// are possibly fake, keep the fake usages for lowering.
+
+void
+Package::clear_used()
+{
+  if (this->used_ > this->fake_uses_.size())
+this->fake_uses_.clear();
+
+  this->used_ = 0;
+}
+
 // Determine types of constants.  Everything else in a package
 // (variables, function declarations) should already have a fixed
 // type.  Constants may have abstract types.
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h(revision 217402)
+++ gcc/go/gofrontend/gogo.h(working copy)
@@ -2645,17 +2645,25 @@ class Package
   // Whether some symbol from the package was used.
   bool
   used() const
-  { return this->used_; }
+  { return this->used_ > 0; }
 
   // Note that some symbol from this package was used.
   void
-  set_used() const
-  { this->used_ = true; }
+  note_usage() const
+  { this->used_++; }
+
+  // Note that USAGE might be a fake usage of this package.
+  void
+  note_fake_usage(Expression* usage) const
+  { this->fake_uses_.insert(usage); }
+
+  // Forget a given USAGE of this package.
+  void
+  forget_usage(Expression* usage) const;
 
   // Clear the used field for the next file.
   void
-  clear_used()
-  { this->used_ = false; }
+  clear_used();
 
   // Whether this package was imported in the current file.
   bool
@@ -2749,10 +2757,12 @@ class Package
   int priority_;
   // The location of the import statement.
   Location location_;
-  // True if some name from this package was used.  This is mutable
-  // because we can use a package even if we have a const pointer to
-  // it.
-  mutable bool used_;
+  // The amount of times some name from this package was used.  This is mutable
+  // because we can use a package even if we have a const pointer to it.
+  mutable size_t used_;
+  // A set of possibly fake uses of this package.  This is mutable because we
+  // can track fake uses of a package even if we have a const pointer to it.
+  mutable std::set fake_uses_;
   // True if this package was imported in the current file.
   bool is_imported_;
   // True if this package was imported with a name of "_".
Index: gcc/go/gofrontend/parse.cc
===
--- gcc/go/gofrontend/parse.cc  (revision 217402)
+++ gcc/go/gofrontend/parse.cc  (working copy)
@@ -199,7 +199,7 @@ Parse::qu

Re: [patch] libstdc++/57250 shared_ptr atomic operations

2014-11-12 Thread Jonathan Wakely


On 17/10/14 19:27 +0100, Jonathan Wakely wrote:

I'm not very proud of this solution, but unless anyone has a better
ideas this is how I plan to add the atomic operations for shared_ptr.

I used __gnu_cxx::__mutex instead of std::mutex because it has fewer
dependencies, so the atomic operations should always be available for
shared_ptr even when the target doesn't support C++11 threads.

The operations are also defined for the __shared_ptr class template.

Tested x86_64-linux.


And also tested on powerpc64-linux. Committed to trunk.
commit a4a18a1822e9215c141f205ed0d2f16e88f0138e
Author: Jonathan Wakely 
Date:   Fri Oct 17 18:56:46 2014 +0100

std::shared_ptr atomic operations

	PR libstdc++/57250
	* config/abi/pre/gnu.ver: Export new symbols.
	* include/Makefile.am: Add new header.
	* include/Makefile.in: Regenerate.
	* include/bits/shared_ptr_atomic.h: Define atomic access functions.
	* include/std/memory: Include new header.
	* src/c++11/shared_ptr.cc (_Sp_locker): Define and use mutex pool.
	* testsuite/20_util/shared_ptr/atomic/1.cc: New.
	* testsuite/20_util/shared_ptr/atomic/2.cc: New.
	* testsuite/20_util/shared_ptr/atomic/3.cc: New.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 4c6d994..bd44bcc 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1476,6 +1476,9 @@ GLIBCXX_3.4.21 {
 # std::ctype_base::blank
 _ZNSt10ctype_base5blankE;
 
+# std::_Sp_locker::*
+_ZNSt10_Sp_locker[CD]*;
+
 } GLIBCXX_3.4.20;
 
 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 5fa243b..e6edc73 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -143,6 +143,7 @@ bits_headers = \
 	${bits_srcdir}/stream_iterator.h \
 	${bits_srcdir}/streambuf_iterator.h \
 	${bits_srcdir}/shared_ptr.h \
+	${bits_srcdir}/shared_ptr_atomic.h \
 	${bits_srcdir}/shared_ptr_base.h \
 	${bits_srcdir}/slice_array.h \
 	${bits_srcdir}/sstream.tcc \
diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h b/libstdc++-v3/include/bits/shared_ptr_atomic.h
new file mode 100644
index 000..79e35ec
--- /dev/null
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -0,0 +1,330 @@
+// shared_ptr atomic access -*- C++ -*-
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/shared_ptr_atomic.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{memory}
+ */
+
+#ifndef _SHARED_PTR_ATOMIC_H
+#define _SHARED_PTR_ATOMIC_H 1
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /**
+   * @addtogroup pointer_abstractions
+   * @{
+   */
+
+  struct _Sp_locker
+  {
+_Sp_locker(const _Sp_locker&) = delete;
+_Sp_locker& operator=(const _Sp_locker&) = delete;
+
+#ifdef __GTHREADS
+explicit
+_Sp_locker(const void*) noexcept;
+_Sp_locker(const void*, const void*) noexcept;
+~_Sp_locker();
+
+  private:
+unsigned char _M_key1;
+unsigned char _M_key2;
+#else
+explicit _Sp_locker(const void*, const void* = nullptr) { }
+#endif
+  };
+
+  /**
+   *  @brief  Report whether shared_ptr atomic operations are lock-free.
+   *  @param  __p A non-null pointer to a shared_ptr object.
+   *  @return True if atomic access to @c *__p is lock-free, false otherwise.
+   *  @{
+  */
+  template
+inline bool
+atomic_is_lock_free(const __shared_ptr<_Tp, _Lp>* __p)
+{
+#ifdef __GTHREADS
+  return __gthread_active_p() == 0;
+#else
+  return true;
+#endif
+}
+
+  template
+inline bool
+atomic_is_lock_free(const shared_ptr<_Tp>* __p)
+{ return std::atomic_is_lock_free<_Tp, __default_lock_policy>(__p); }
+
+  // @}
+
+  /**
+   *  @brief  Atomic load for shared_ptr objects.
+   *  @param  __p A non-null pointer to a shared_ptr object.
+   *  @return @c *__p
+   *
+   *

Re: [PR c/52952] More precise locations within format strings

2014-11-12 Thread Manuel López-Ibáñez

On 12 November 2014 22:41, Jakub Jelinek  wrote:
> I think we had discussions on this topic, the important thing is that we
> don't want to generate different warnings based on whether -save-temps has
> been used, there could be just comment in between ) and ; etc.

How can --save-temps influence whether ";" is in the same line as the
'if' or not?

Cheers,

Manuel.

Re: [GRAPHITE, PATCH] Loop unroll and jam optimization

2014-11-12 Thread Mircea Namolaru


> 
> > At this point pbb->transformed has two basic maps, one is the mapping for
> > unroll and jam,
> > and one for the full tile for the striped dimension. Introduce a check that
> > differentiate
> > between them as the image of one maps should be included in the other.
> 
> I didn't do a full review (and I won't have time for it either),
> but at a quick glance,
> you still seem to be assuming that if you take the union of two
> basic maps, that you can extract the original two basic maps from the union.
> In general, you can't.  At least you shouldn't assume that you can.
> Besides, your updated code is also pretty convoluted,
> with very little documentation.
> 

Many thanks. To considerably ease the computation of separation class, needed 
to transmit information
available at scheduling computation time to AST computation time. I didn't want 
to
introduce new fields in graphite global structures, and tried to trick 
pbb->transformed
to contain the unroll and jam mapping as well as a mapping use to compute the 
separation
class. It wasn't good idea (for the mappings used by unroll and jam my version 
of isl preserve the basic mappings afteer union), but of course my code should 
not be based on
assumptions not guaranteed by isl semantics.

So, in this updated patch I go the other way and keep the additional 
information required in the
pbb structure. This has the advantages that removes the most convoluted part of 
my code dealing 
with the two basic maps kept in pbb->transformed. As before in 
graphite-optimize_isl,c, 
beside the map for unroll and jam, another map needed for the separation class 
defined by
full tiles is computed. In graphite-isl-ast-to-gimple.c, the separation class 
option is set (using 
the auxiliary map computed previously) and is added to the other AST build 
options. 

Mircea

2014-11-12  Mircea Namolaru  

* common.opt (flag_loop_unroll_and_jam): New flag for unroll and jam.
* params.def (PARAM_LOOP_UNROLL_JAM_SIZE)
(PARAM_LOOP_UNROLL_JAM_DEPTH): Parameters for unroll and jam flag.
* graphite-poly.h (struct poly_bb:map_sepclass): New field
* graphite-poly.c (new_poly_bb): New field intialization.
* graphite-isl-ast-to-gimple.c (generate_luj_sepclass_opt,
generate_luj_sepclass,generate_luj_options):
New functions to set AST options for unroll and jam.
* graphite-isl-ast-to-gimple.c (set_options,scop_to_isl_ast):
Added support for unroll and jam options.
* graphite-optimize-isl.c (getPrevectorMap_full): New function for
sepating class map.
* graphite-optimize-isl.c (optimize_isl,apply_schedule_map_to_scop,
getScheduleMap,getScheduleForBand,getScheduleForBandList):
Added support for the separating class map.
* graphite.c: Support for unroll and jam flag.
* graphite-poly.c: Likewise.
* toplev.c: Likewise.

Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 217013)
+++ gcc/toplev.c	(working copy)
@@ -1302,11 +1302,12 @@
   || flag_loop_block
   || flag_loop_interchange
   || flag_loop_strip_mine
-  || flag_loop_parallelize_all)
+  || flag_loop_parallelize_all
+  || flag_loop_unroll_jam)
 sorry ("Graphite loop optimizations cannot be used (ISL is not available)" 
 	   "(-fgraphite, -fgraphite-identity, -floop-block, "
 	   "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, "
-	   "and -ftree-loop-linear)");
+	   "-floop-unroll-and-jam, and -ftree-loop-linear)");
 #endif
 
   /* One region RA really helps to decrease the code size.  */
Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c	(revision 217013)
+++ gcc/graphite-optimize-isl.c	(working copy)
@@ -186,7 +186,7 @@
   PartialSchedule = isl_band_get_partial_schedule (Band);
   *Dimensions = isl_band_n_member (Band);
 
-  if (DisableTiling)
+  if (DisableTiling || flag_loop_unroll_jam)
 return PartialSchedule;
 
   /* It does not make any sense to tile a band with just one dimension.  */
@@ -241,7 +241,9 @@
constant number of iterations, if the number of loop iterations at
DimToVectorize can be devided by VectorWidth. The default VectorWidth is
currently constant and not yet target specific. This function does not reason
-   about parallelism.  */
+   about parallelism.
+
+  */
 static isl_map *
 getPrevectorMap (isl_ctx *ctx, int DimToVectorize,
 		 int ScheduleDimensions,
@@ -305,8 +307,89 @@
   isl_constraint_set_constant_si (c, VectorWidth - 1);
   TilingMap = isl_map_add_constraint (TilingMap, c);
 
-  isl_map_dump (TilingMap);
+  return TilingMap;
+}
 
+/* Compute an auxiliary map to getPrevectorMap, for computing the separating 
+   class defined by full tiles.  Used in graphite_isl_ast_to_gimple.c to set the 
+   corresponding option for AST build.
+ */ 
+static isl_map

Re: [PATCH][wwwdocs] Update changes.html with Cortex-A53 erratum workaround options

2014-11-12 Thread Gerald Pfeifer


On Wednesday 2014-11-12 16:31, Kyrill Tkachov wrote:
This patch adds the Cortex-A53 erratum workaround options item to the 
changes page (and adds the AArch64 section of that page in the 
process) Ok?


Looks perfect to me.

Thanks,
Gerald

Re: [PATCH][wwwdocs] Update changes.html with Cortex-A53 erratum workaround options

2014-11-12 Thread Jeff Law


On 11/12/14 09:31, Kyrill Tkachov wrote:

Hi all,

This patch adds the Cortex-A53 erratum workaround options item to the
changes page
(and adds the AArch64 section of that page in the process)
Ok?

OK
jeff

Re: [PATCH][12/n] Merge from match-and-simplify, pointer-plus patterns and forwprop re-org

2014-11-12 Thread H.J. Lu

On Fri, Nov 7, 2014 at 12:53 AM, Richard Biener  wrote:
>
> This interleaves stmt folding and manual simplifications done in
> forwprop into a single loop over all basic-blocks.  It somewhat
> complicates things as we need to make sure the lattice stays
> valid when releasing SSA names from old code or when purging
> dead EH edges (which we now delay).  But this ensures we don't
> regress by exposing dependences between the transforms still
> in forwprop and those we moved to patterns.
>
> This patch also goes forward and implements the POINTER_PLUS_EXPR
> patterns from tree-ssa-forwprop.c as patterns.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2014-11-07  Richard Biener  
>
> * match.pd: Add patterns for POINTER_PLUS_EXPR association
> and special patterns from tree-ssa-forwprop.c
> * fold-const.c (fold_binary_loc): Remove them here.
> * tree-ssa-forwprop.c (to_purge): New global bitmap.
> (fwprop_set_lattice_val): New function.
> (fwprop_invalidate_lattice): Likewise.
> (remove_prop_source_from_use): Instead of purging dead EH
> edges record blocks to do that in to_purge.
> (tidy_after_forward_propagate_addr): Likewise.
> (forward_propagate_addr_expr): Invalidate the lattice for
> SSA names we release.
> (simplify_conversion_from_bitmask): Likewise.
> (simplify_builtin_call): Likewise.
> (associate_pointerplus_align): Remove.
> (associate_pointerplus_diff): Likewise.
> (associate_pointerplus): Likewise.
> (fold_all_stmts): Merge with ...
> (pass_forwprop::execute): ... the original loop over all
> basic-blocks.  Delay purging dead EH edges and invalidate
> the lattice for SSA names we release.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63839

Re: [PATCH][sched-deps] Remove needless check for modified_in_p when trying to fuse two non-conditional jump insns

2014-11-12 Thread Jeff Law


On 11/12/14 09:23, Kyrill Tkachov wrote:


Hi all,

I noticed that the check for modified_in_p in sched-deps is not needed
and needlessly restricts the insn pairs that we can check for macro
fusion in the backend hooks.
This patch removes the check. Currently that execution path is only used
in arm and aarch64.
This enables the backend hooks to catch more cases that I'm working on.

Bootstrapped on aarch64, x86, arm. Tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2014-11-12  Kyrylo Tkachov  

 * sched-deps.c (sched_macro_fuse_insns): Do not check modified_in_p
 in the not condtional jump case.

Typo in ChangeLog "condtional"

Can you please include a testcase which shows fusion occurring after 
this patch where it does not occur before this patch.


I think you probably need to include some kind of doc change around this 
change since you're effectively pushing responsibility for checking this 
into the ports.


With the testcase and doc change, this will probably be OK.  Please take 
care of those, repost for a review of the test & doc changes.


jeff

Re: Concepts code review

2014-11-12 Thread Andrew Sutton

>> Agreed. I'll probably start looking at this on Friday morning.
>
>
> Note that end of stage 1 is Saturday, as I just realized today.  So the
> sooner the better.  :)

Ouch. Good thing my merge with trunk broke in unexpected ways this
morning -- minimally, something in cgraph ended up missing a #include
in the merge and I don't know how to fix it :/

Andrew

Re: [PATCH] Propagate nonfreeing_call_p using ipa-pure-const

2014-11-12 Thread Jan Hubicka

> On Wed, Nov 12, 2014 at 02:49:30PM +0400, Maxim Ostapenko wrote:
> > We used this code for IPA propagation of nonfreeing_call_p. It implemented
> > with a separate pass, but it probably could be propagated in some existing
> > one. This analysis doesn't seem to be costly thought, we didn't see any
> > significant slowdown compiling big files.
> 
> Here it is rewritten using ipa-pure-const which is where Richard/Honza
> suggested it should be done in.
> 
> I wonder if the nonfreeing_call_p function shouldn't be moved elsewhere
> though (suggestion where), so that gimple.c doesn't need the cgraph
> includes.
> 
> In any case, bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> 2014-11-12  Jakub Jelinek  
> 
>   * ipa-pure-const.c (struct funct_state_d): Add can_free field.
>   (varying_state): Add true for can_free.
>   (check_call): For builtin or internal !nonfreeing_call_p set
>   local->can_free.
>   (check_stmt): For asm volatile and asm with "memory" set
>   local->can_free.
>   (analyze_function): Clear local->can_free initially, continue
>   calling check_stmt until all flags are computed, dump can_free
>   flag.
>   (pure_const_write_summary): Write can_free flag.
>   (pure_const_read_summary): Read it back.
>   (propagate_can_free): New function.
>   (pass_ipa_pure_const::execute): Call it.
>   * cgraph.h (cgraph_node): Add nonfreeing_fn member.
>   * gimple.c: Include ipa-ref.h, lto-streamer.h and cgraph.h.
>   (nonfreeing_call_p): Return cgraph nonfreeing_fn flag if set.
>   * cgraph.c (cgraph_node::dump): Dump nonfreeing_fn flag.
>   * lto-cgraph.c (lto_output_node): Write nonfreeing_fn flag.
>   (input_overwrite_node): Read it back.
> 
> --- gcc/ipa-pure-const.c.jj   2014-11-12 18:32:56.351139726 +0100
> +++ gcc/ipa-pure-const.c  2014-11-12 21:11:08.574354600 +0100
> @@ -112,11 +112,15 @@ struct funct_state_d
>bool looping;
>  
>bool can_throw;
> +
> +  /* If function can call free, munmap or otherwise make previously
> + non-trapping memory accesses trapping.  */
> +  bool can_free;
>  };
>  
>  /* State used when we know nothing about function.  */
>  static struct funct_state_d varying_state
> -   = { IPA_NEITHER, IPA_NEITHER, true, true, true };
> +   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
>  
>  
>  typedef struct funct_state_d * funct_state;
> @@ -559,6 +563,10 @@ check_call (funct_state local, gimple ca
>enum pure_const_state_e call_state;
>bool call_looping;
>  
> +  if (gimple_call_builtin_p (call, BUILT_IN_NORMAL)
> +   && !nonfreeing_call_p (call))
> + local->can_free = true;
> +
>if (special_builtin_state (&call_state, &call_looping, callee_t))
>   {
> worse_state (&local->pure_const_state, &local->looping,
> @@ -589,6 +597,8 @@ check_call (funct_state local, gimple ca
>   break;
> }
>  }
> +  else if (gimple_call_internal_p (call) && !nonfreeing_call_p (call))
> +local->can_free = true;

Actually I think you want to do this for can_throw, too.
We probably do not have throwing internal calls, but it is better to be safe.
> +/* Produce transitive closure over the callgraph and compute can_free
> +   attributes.  */
> +
> +static void
> +propagate_can_free (void)
> +{
> +  struct cgraph_node *node;
> +  struct cgraph_node *w;
> +  struct cgraph_node **order
> += XCNEWVEC (struct cgraph_node *, symtab->cgraph_count);
> +  int order_pos;
> +  int i;
> +  struct ipa_dfs_info *w_info;
> +
> +  order_pos = ipa_reduced_postorder (order, true, false, NULL);
> +  if (dump_file)
> +{
> +  cgraph_node::dump_cgraph (dump_file);
> +  ipa_print_order (dump_file, "reduced", order, order_pos);
> +}

The propagation seems fine, but I wonder if we won't get better memory locality 
doing this
during the propagation of pure/const?

nothrow flag goes in separate loop because knowledge of nothrow helps 
pure/const to work better.
Also one can ignore call edges that are !can_thros_externally to get fewer 
cycles, but apparently
this got never implemented.
>  && gimple_call_flags (call) & ECF_LEAF)
>  return true;
>  
> -  return false;
> +  tree fndecl = gimple_call_fndecl (call);
> +  if (!fndecl)
> +return false;
> +  struct cgraph_node *n = cgraph_node::get (fndecl);

You want to walk aliases,

  cgraph_node::get (fndecl)->function_symbol (&availability)

> +  if (!n || n->get_availability () <= AVAIL_INTERPOSABLE)

and use availability here.

OK with this change.

Honza

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-12 Thread Joseph Myers

On Thu, 13 Nov 2014, Ilya Enkovich wrote:

> It's hard to decide which of runtime functionality should be
> considered as basic and how it should be used.  We may say that the
> only basic thing is hardware enabling which is enable_mpx and stop
> here.  But then you get minimal but quite useless library.  Yes, it
> can enable MPX and thus make bounds violation to interrupt a program.
> But users cannot enable/disable MPX dinamycally then.  Also they
> cannot configure it.  Thus either control via environmental variables
> appears in this core library or we transform initialization function
> from constructor to interface function and use it from another
> extended MPX library which support environment variables, logging etc.
> But the core library will only be used by this extended MPX library
> and nothing else. So why not to leave it as a single library as it is?

You can leave it as a single library - it's just that imposes libgcc-like 
constraints on what the library does and how it does things, so as to be 
usable for arbitrary programs built with MPX (e.g. using 
reserved-namespace names such as __write when available - direct syscalls 
may also be a possibility in some cases).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix ipa-pure-const can_throw propagation

2014-11-12 Thread Jan Hubicka

> Hi!
> 
> The following testcase is miscompiled starting with 4.6.
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> For 4.9/4.8, I'd prefer to use a one-liner instead:
> -  for (ie = node->indirect_calls; ie; ie = ie->next_callee)
> +  for (ie = w->indirect_calls; ie; ie = ie->next_callee)

OK for branches.
> 
> 2014-11-12  Jakub Jelinek  
> 
>   PR ipa/63838
>   * ipa-pure-const.c (propagate_nothrow): Walk w->indirect_calls
>   chain instead of node->indirect_calls.  Put !can_throw into
>   conditions of all the loops.
> 
>   * g++.dg/ipa/pr63838.C: New test.

OK for mainline,
thanks!
Honza

RE: [PATCHv4][MIPS] Implement O32 ABI extensions (GCC)

2014-11-12 Thread Matthew Fortune

> > Moore, Catherine  writes:
> > > The patch looks good.   Please fix up these couple of nits prior to
> > > committing.
> >
> > OK, thanks for the second read through. One further amendment below,
> > I'll aim to commit later today.
> >
> 
> Yes, that's better.

Committed as r217446

Fingers crossed there will be no fallout from it but if there is I'll deal
with it promptly.

Matthew

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-12 Thread Ilya Enkovich

2014-11-13 0:40 GMT+03:00 Joseph Myers :
> On Wed, 12 Nov 2014, Ilya Enkovich wrote:
>
>> MPX runtime needs to be linked with programs using MPX because it
>> initializes hardware.  Without it all MPX instructions are just NOPs.
>> Thus it's not an extra functionality, but is for basic MPX
>> functionality.
>
> So what if you just have the constructor that initializes the hardware -
> could everything else (handling environment variables, printing more
> detailed diagnostics, etc.) be separate?  How much is basic MPX
> functionality, how much is extra?  Basic functionality should arguably be
> like libgcc in namespace terms (so not calling any libc functions outside
> of ISO C90 namespace, using reserved-namespace versions such as __write
> instead if necessary and if such versions are available); extra
> functionality need not be so restricted.

It's hard to decide which of runtime functionality should be
considered as basic and how it should be used.  We may say that the
only basic thing is hardware enabling which is enable_mpx and stop
here.  But then you get minimal but quite useless library.  Yes, it
can enable MPX and thus make bounds violation to interrupt a program.
But users cannot enable/disable MPX dinamycally then.  Also they
cannot configure it.  Thus either control via environmental variables
appears in this core library or we transform initialization function
from constructor to interface function and use it from another
extended MPX library which support environment variables, logging etc.
But the core library will only be used by this extended MPX library
and nothing else. So why not to leave it as a single library as it is?

>
>> I also fixed other issues you mentioned in your previous comments.
>> Below is a new version.  Does it look better?
>
> I don't see anything obvious to do symbol versioning.
>
> If this isn't providing interfaces for the user program or
> compiler-generated code to call, then the symbol versioning could hide all
> symbols (so there are no exported symbols at all and the library operates
> purely through its initialization constructor).  Or build with
> -fvisibility=hidden to hide the symbols that way.  Either way, the dynamic
> symbol table of the shared library would end up with just undefined
> symbols, nothing exported by the library.

Currently it's just constructors.  I'll hide functions.

>
> The issue with signal handlers calling inappropriate functions still
> applies - you're calling vfprintf from a signal handler.  Using
> pthread_mutex_lock around it doesn't help at all; vfprintf can call
> malloc, and the signal may have interrupted malloc, for example.  You
> really can't use stdio at all inside signal handlers - if you want to do
> I/O in them, you have to use write ().  (There's an argument that dprintf
> or an snprintf/write combination ought to be AS-safe, but currently they
> aren't; see ;
> snprintf is probably safe than dprintf in practice.)

I'll get rid of printf there. Will use write + own int to string
convert to avoid snprintf.

>
> I mentioned the question of static writable variables.  If those variables
> are only ever modified at startup, before any threads have been created,
> could you add comments to that effect (and otherwise ensure comments on
> the variables explain how you ensure they are accessed in a thread-safe
> way)?

Almost all varibles are written at process initialization only.
[out|err]_file_dirty may be written by different threads in
__mpxrt_print but the same value is always written and value is read
in process finalizer only.  Since __mpxrt_print has mutex lock anyway,
I may move the rest of fucntion body under the lock.  Remaining var is
bounds violation counter num_bnd_chk which is written in signal
handler only and read in the process finalizer.  So I believe all vars
are thread safe.  Will add appropriate comments in the code.

Thanks a lot for detailed review!
Ilya

>
> --
> Joseph S. Myers
> jos...@codesourcery.com

[PATCH] Propagate nonfreeing_call_p using ipa-pure-const

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 02:49:30PM +0400, Maxim Ostapenko wrote:
> We used this code for IPA propagation of nonfreeing_call_p. It implemented
> with a separate pass, but it probably could be propagated in some existing
> one. This analysis doesn't seem to be costly thought, we didn't see any
> significant slowdown compiling big files.

Here it is rewritten using ipa-pure-const which is where Richard/Honza
suggested it should be done in.

I wonder if the nonfreeing_call_p function shouldn't be moved elsewhere
though (suggestion where), so that gimple.c doesn't need the cgraph
includes.

In any case, bootstrapped/regtested on x86_64-linux and i686-linux.

2014-11-12  Jakub Jelinek  

* ipa-pure-const.c (struct funct_state_d): Add can_free field.
(varying_state): Add true for can_free.
(check_call): For builtin or internal !nonfreeing_call_p set
local->can_free.
(check_stmt): For asm volatile and asm with "memory" set
local->can_free.
(analyze_function): Clear local->can_free initially, continue
calling check_stmt until all flags are computed, dump can_free
flag.
(pure_const_write_summary): Write can_free flag.
(pure_const_read_summary): Read it back.
(propagate_can_free): New function.
(pass_ipa_pure_const::execute): Call it.
* cgraph.h (cgraph_node): Add nonfreeing_fn member.
* gimple.c: Include ipa-ref.h, lto-streamer.h and cgraph.h.
(nonfreeing_call_p): Return cgraph nonfreeing_fn flag if set.
* cgraph.c (cgraph_node::dump): Dump nonfreeing_fn flag.
* lto-cgraph.c (lto_output_node): Write nonfreeing_fn flag.
(input_overwrite_node): Read it back.

--- gcc/ipa-pure-const.c.jj 2014-11-12 18:32:56.351139726 +0100
+++ gcc/ipa-pure-const.c2014-11-12 21:11:08.574354600 +0100
@@ -112,11 +112,15 @@ struct funct_state_d
   bool looping;
 
   bool can_throw;
+
+  /* If function can call free, munmap or otherwise make previously
+ non-trapping memory accesses trapping.  */
+  bool can_free;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
 
 
 typedef struct funct_state_d * funct_state;
@@ -559,6 +563,10 @@ check_call (funct_state local, gimple ca
   enum pure_const_state_e call_state;
   bool call_looping;
 
+  if (gimple_call_builtin_p (call, BUILT_IN_NORMAL)
+ && !nonfreeing_call_p (call))
+   local->can_free = true;
+
   if (special_builtin_state (&call_state, &call_looping, callee_t))
{
  worse_state (&local->pure_const_state, &local->looping,
@@ -589,6 +597,8 @@ check_call (funct_state local, gimple ca
break;
  }
 }
+  else if (gimple_call_internal_p (call) && !nonfreeing_call_p (call))
+local->can_free = true;
 
   /* When not in IPA mode, we can still handle self recursion.  */
   if (!ipa && callee_t
@@ -753,6 +763,7 @@ check_stmt (gimple_stmt_iterator *gsip,
fprintf (dump_file, "memory asm clobber is not const/pure\n");
  /* Abandon all hope, ye who enter here. */
  local->pure_const_state = IPA_NEITHER;
+ local->can_free = true;
}
   if (gimple_asm_volatile_p (stmt))
{
@@ -761,6 +772,7 @@ check_stmt (gimple_stmt_iterator *gsip,
  /* Abandon all hope, ye who enter here. */
  local->pure_const_state = IPA_NEITHER;
   local->looping = true;
+ local->can_free = true;
}
   return;
 default:
@@ -785,6 +797,7 @@ analyze_function (struct cgraph_node *fn
   l->looping_previously_known = true;
   l->looping = false;
   l->can_throw = false;
+  l->can_free = false;
   state_from_flags (&l->state_previously_known, &l->looping_previously_known,
flags_from_decl_or_type (fn->decl),
fn->cannot_return_p ());
@@ -815,7 +828,10 @@ analyze_function (struct cgraph_node *fn
   gsi_next (&gsi))
{
  check_stmt (&gsi, l, ipa);
- if (l->pure_const_state == IPA_NEITHER && l->looping && l->can_throw)
+ if (l->pure_const_state == IPA_NEITHER
+ && l->looping
+ && l->can_throw
+ && l->can_free)
goto end;
}
 }
@@ -882,6 +898,8 @@ end:
 fprintf (dump_file, "Function is locally const.\n");
   if (l->pure_const_state == IPA_PURE)
 fprintf (dump_file, "Function is locally pure.\n");
+  if (l->can_free)
+   fprintf (dump_file, "Function can locally free.\n");
 }
   return l;
 }
@@ -1021,6 +1039,7 @@ pure_const_write_summary (void)
  bp_pack_value (&bp, fs->looping_previously_known, 1);
  bp_pack_value (&bp, fs->looping, 1);
  bp_pack_value (&bp, fs->can_throw, 1);
+ bp_pack_value (&bp, fs->can_free, 1);
  streamer_write_bitpac

[PATCH] Optimize ASAN_CHECK checks

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 02:09:59PM +0300, Yury Gribov wrote:
> >For asan you're right, we can have addresses of decls there etc.
> >If you have spare cycles, feel free to take over the patch and adjust it.
> 
> I guess I'd wait when this gets to trunk?

Ok, in that case I've bootstrapped/regtested on x86_64-linux/i686-linux what I 
have with
the ASAN_CHECK_NON_ZERO_LEN stuff removed from it (all non-INTEGER_CST
lengths ignored).  Dodji, is this ok for trunk?

2014-11-12  Jakub Jelinek  
Marek Polacek  

* sanopt.c: Include tree-ssa-operands.h.
(struct sanopt_info): Add has_freeing_call_p,
has_freeing_call_computed_p, imm_dom_path_with_freeing_call_p,
imm_dom_path_with_freeing_call_computed_p, freeing_call_events,
being_visited_p fields.
(struct sanopt_ctx): Add asan_check_map field.
(imm_dom_path_with_freeing_call, maybe_optimize_ubsan_null_ifn,
maybe_optimize_asan_check_ifn): New functions.
(sanopt_optimize_walker): Use them, optimize even ASAN_CHECK
internal calls.
(pass_sanopt::execute): Call sanopt_optimize even for
-fsanitize=address.
* gimple.c (nonfreeing_call_p): Return true for non-ECF_LEAF
internal calls.

--- gcc/sanopt.c.jj 2014-11-12 08:06:51.497182216 +0100
+++ gcc/sanopt.c2014-11-12 21:04:50.007325020 +0100
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
 #include "langhooks.h"
 #include "ubsan.h"
 #include "params.h"
+#include "tree-ssa-operands.h"
 
 
 /* This is used to carry information about basic blocks.  It is
@@ -56,7 +57,29 @@ along with GCC; see the file COPYING3.
 
 struct sanopt_info
 {
-  /* True if this BB has been visited.  */
+  /* True if this BB might call (directly or indirectly) free/munmap
+ or similar operation.  */
+  bool has_freeing_call_p;
+
+  /* True if HAS_FREEING_CALL_P flag has been computed.  */
+  bool has_freeing_call_computed_p;
+
+  /* True if there is a block with HAS_FREEING_CALL_P flag set
+ on any a path between imm(BB) and BB.  */
+  bool imm_dom_path_with_freeing_call_p;
+
+  /* True if IMM_DOM_PATH_WITH_FREEING_CALL_P has been computed.  */
+  bool imm_dom_path_with_freeing_call_computed_p;
+
+  /* Number of possibly freeing calls encountered in this bb
+ (so far).  */
+  uint64_t freeing_call_events;
+
+  /* True if BB is currently being visited during computation
+ of IMM_DOM_PATH_WITH_FREEING_CALL_P flag.  */
+  bool being_visited_p;
+
+  /* True if this BB has been visited in the dominator walk.  */
   bool visited_p;
 };
 
@@ -69,11 +92,301 @@ struct sanopt_ctx
  a vector of UBSAN_NULL call statements that check this pointer.  */
   hash_map > null_check_map;
 
+  /* This map maps a pointer (the second argument of ASAN_CHECK) to
+ a vector of ASAN_CHECK call statements that check the access.  */
+  hash_map > asan_check_map;
+
   /* Number of IFN_ASAN_CHECK statements.  */
   int asan_num_accesses;
 };
 
 
+/* Return true if there might be any call to free/munmap operation
+   on any path in between DOM (which should be imm(BB)) and BB.  */
+
+static bool
+imm_dom_path_with_freeing_call (basic_block bb, basic_block dom)
+{
+  sanopt_info *info = (sanopt_info *) bb->aux;
+  edge e;
+  edge_iterator ei;
+
+  if (info->imm_dom_path_with_freeing_call_computed_p)
+return info->imm_dom_path_with_freeing_call_p;
+
+  info->being_visited_p = true;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  sanopt_info *pred_info = (sanopt_info *) e->src->aux;
+
+  if (e->src == dom)
+   continue;
+
+  if ((pred_info->imm_dom_path_with_freeing_call_computed_p
+ && pred_info->imm_dom_path_with_freeing_call_p)
+ || (pred_info->has_freeing_call_computed_p
+ && pred_info->has_freeing_call_p))
+   {
+ info->imm_dom_path_with_freeing_call_computed_p = true;
+ info->imm_dom_path_with_freeing_call_p = true;
+ info->being_visited_p = false;
+ return true;
+   }
+}
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  sanopt_info *pred_info = (sanopt_info *) e->src->aux;
+
+  if (e->src == dom)
+   continue;
+
+  if (pred_info->has_freeing_call_computed_p)
+   continue;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_bb (e->src); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple stmt = gsi_stmt (gsi);
+
+ if (is_gimple_call (stmt) && !nonfreeing_call_p (stmt))
+   {
+ pred_info->has_freeing_call_p = true;
+ break;
+   }
+   }
+
+  pred_info->has_freeing_call_computed_p = true;
+  if (pred_info->has_freeing_call_p)
+   {
+ info->imm_dom_path_with_freeing_call_computed_p = true;
+ info->imm_dom_path_with_freeing_call_p = true;
+ info->being_visited_p = false;
+ return true;
+   }
+}
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  if (e->src == dom)
+   continue;
+
+

[PATCH] Fix ipa-pure-const can_throw propagation

2014-11-12 Thread Jakub Jelinek

Hi!

The following testcase is miscompiled starting with 4.6.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

For 4.9/4.8, I'd prefer to use a one-liner instead:
-  for (ie = node->indirect_calls; ie; ie = ie->next_callee)
+  for (ie = w->indirect_calls; ie; ie = ie->next_callee)

2014-11-12  Jakub Jelinek  

PR ipa/63838
* ipa-pure-const.c (propagate_nothrow): Walk w->indirect_calls
chain instead of node->indirect_calls.  Put !can_throw into
conditions of all the loops.

* g++.dg/ipa/pr63838.C: New test.

--- gcc/ipa-pure-const.c.jj 2014-11-11 00:06:11.0 +0100
+++ gcc/ipa-pure-const.c2014-11-12 18:32:56.351139726 +0100
@@ -1448,7 +1448,7 @@ propagate_nothrow (void)
 
   /* Find the worst state for any node in the cycle.  */
   w = node;
-  while (w)
+  while (w && !can_throw)
{
  struct cgraph_edge *e, *ie;
  funct_state w_l = get_function_state (w);
@@ -1457,10 +1457,7 @@ propagate_nothrow (void)
  || w->get_availability () == AVAIL_INTERPOSABLE)
can_throw = true;
 
- if (can_throw)
-   break;
-
- for (e = w->callees; e; e = e->next_callee)
+ for (e = w->callees; e && !can_throw; e = e->next_callee)
{
  enum availability avail;
  struct cgraph_node *y = e->callee->function_symbol (&avail);
@@ -1469,8 +1466,6 @@ propagate_nothrow (void)
{
  funct_state y_l = get_function_state (y);
 
- if (can_throw)
-   break;
  if (y_l->can_throw && !TREE_NOTHROW (w->decl)
  && e->can_throw_external)
can_throw = true;
@@ -1478,12 +1473,9 @@ propagate_nothrow (void)
  else if (e->can_throw_external && !TREE_NOTHROW (y->decl))
can_throw = true;
}
-  for (ie = node->indirect_calls; ie; ie = ie->next_callee)
+  for (ie = w->indirect_calls; ie && !can_throw; ie = ie->next_callee)
if (ie->can_throw_external)
- {
-   can_throw = true;
-   break;
- }
+ can_throw = true;
  w_info = (struct ipa_dfs_info *) w->aux;
  w = w_info->next_cycle;
}
@@ -1794,5 +1786,3 @@ make_pass_warn_function_noreturn (gcc::c
 {
   return new pass_warn_function_noreturn (ctxt);
 }
-
-
--- gcc/testsuite/g++.dg/ipa/pr63838.C.jj   2014-11-12 19:48:50.403403040 
+0100
+++ gcc/testsuite/g++.dg/ipa/pr63838.C  2014-11-12 19:48:59.650247907 +0100
@@ -0,0 +1,56 @@
+// PR ipa/63838
+// { dg-do run }
+// { dg-options "-O2 -fdump-ipa-pure-const" }
+// { dg-final { scan-ipa-dump-not "Function found to be nothrow: void foo" 
"pure-const" } }
+// { dg-final { scan-ipa-dump-not "Function found to be nothrow: void bar" 
"pure-const" } }
+// { dg-final { cleanup-ipa-dump "pure-const" } }
+
+__attribute__((noinline, noclone)) static void bar (int);
+volatile int v;
+void (*fn) ();
+struct S { S () { v++; } ~S () { v++; } };
+
+__attribute__((noinline, noclone)) static void
+foo (int x)
+{
+  v++;
+  if (x == 5)
+bar (x);
+}
+
+__attribute__((noinline, noclone)) static void
+bar (int x)
+{
+  v++;
+  if (x == 6)
+foo (x);
+  else if (x == 5)
+fn ();
+}
+
+__attribute__((noinline, noclone)) int
+baz (int x)
+{
+  S s;
+  foo (x);
+}
+
+void
+throw0 ()
+{
+  throw 0;
+}
+
+int
+main ()
+{
+  fn = throw0;
+  asm volatile ("" : : : "memory");
+  try
+{
+  baz (5);
+}
+  catch (int)
+{
+}
+}

Jakub

[PATCH] Make -fdiagnostics-color= default configurable, default to =auto

2014-11-12 Thread Jakub Jelinek

Hi!

This patch makes the -fdiagnostics-color= default configurable, and
changes the default (if no configure option is specified for it)
to --with-diagnostics-color=auto.  The previous behavior can be
restored with --with-diagnostics-color=auto-if-env , the 4.8
behavior (never coloring anything) with --with-diagnostics-color=never .
Bootstrapped/regtested on x86_64-linux and i686-linux and tested with
all 5 different configure options (the four explicit one and without).
Ok for trunk?

2014-11-12  Jakub Jelinek  

* configure.ac (--with-diagnostics-color): New configure
option, default to --with-diagnostics-color=auto.
* toplev.c (process_options): Use DIAGNOSTICS_COLOR_DEFAULT
to determine -fdiagnostics-color= option default.
* doc/invoke.texi (-fdiagnostics-color=): Document new
default.
* configure: Regenerated.
* config.in: Regenerated.

--- gcc/configure.ac.jj 2014-11-12 08:06:56.0 +0100
+++ gcc/configure.ac2014-11-12 20:28:32.379823251 +0100
@@ -5608,6 +5608,34 @@ if test x"${LINKER_HASH_STYLE}" != x; th
  [The linker hash style])
 fi
 
+# Specify what should be the default of -fdiagnostics-color option.
+AC_ARG_WITH([diagnostics-color],
+[AC_HELP_STRING([--with-diagnostics-color={never,auto,auto-if-env,always}],
+[specify the default of -fdiagnostics-color option
+ auto-if-env stands for -fdiagnostics-color=auto if
+ GCC_COLOR environment variable is present and
+ -fdiagnostics-color=never otherwise])],
+[case x"$withval" in
+   xnever)
+ DIAGNOSTICS_COLOR_DEFAULT=DIAGNOSTICS_COLOR_NO
+ ;;
+   xauto)
+ DIAGNOSTICS_COLOR_DEFAULT=DIAGNOSTICS_COLOR_AUTO
+ ;;
+   xauto-if-env)
+ DIAGNOSTICS_COLOR_DEFAULT=-1
+ ;;
+   xalways)
+ DIAGNOSTICS_COLOR_DEFAULT=DIAGNOSTICS_COLOR_YES
+ ;;
+   *)
+ AC_MSG_ERROR([$withval is an invalid option to --with-diagnostics-color])
+ ;;
+ esac],
+[DIAGNOSTICS_COLOR_DEFAULT=DIAGNOSTICS_COLOR_AUTO])
+AC_DEFINE_UNQUOTED(DIAGNOSTICS_COLOR_DEFAULT, $DIAGNOSTICS_COLOR_DEFAULT,
+  [The default for -fdiagnostics-color option])
+
 # Generate gcc-driver-name.h containing GCC_DRIVER_NAME for the benefit
 # of jit/jit-playback.c.
 cat > gcc-driver-name.h printer)
+ = colorize_init (DIAGNOSTICS_COLOR_AUTO);
+   break;
+  case DIAGNOSTICS_COLOR_YES:
+   pp_show_color (global_dc->printer)
+ = colorize_init (DIAGNOSTICS_COLOR_YES);
+   break;
+  default:
+   break;
+  }
 
   /* Allow the front end to perform consistency checks and do further
  initialization based on the command line options.  This hook also
--- gcc/doc/invoke.texi.jj  2014-11-12 08:06:54.0 +0100
+++ gcc/doc/invoke.texi 2014-11-12 20:20:58.802642925 +0100
@@ -3103,8 +3103,10 @@ a message which is too long to fit on a
 @cindex highlight, color, colour
 @vindex GCC_COLORS @r{environment variable}
 Use color in diagnostics.  @var{WHEN} is @samp{never}, @samp{always},
-or @samp{auto}.  The default is @samp{never} if @env{GCC_COLORS} environment
-variable isn't present in the environment, and @samp{auto} otherwise.
+or @samp{auto}.  The default depends on how the compiler has been configured,
+it can be any of the above @var{WHEN} options or also @samp{never}
+if @env{GCC_COLORS} environment variable isn't present in the environment,
+and @samp{auto} otherwise.
 @samp{auto} means to use color only when the standard error is a terminal.
 The forms @option{-fdiagnostics-color} and @option{-fno-diagnostics-color} are
 aliases for @option{-fdiagnostics-color=always} and
--- gcc/configure.jj2014-11-12 08:06:54.0 +0100
+++ gcc/configure   2014-11-12 20:28:49.364032302 +0100
@@ -935,6 +935,7 @@ enable_plugin
 enable_host_shared
 enable_libquadmath_support
 with_linker_hash_style
+with_diagnostics_color
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1709,6 +1710,11 @@ Optional Packages:
   --with-system-zlib  use installed libz
   --with-linker-hash-style={sysv,gnu,both}
   specify the linker hash style
+  --with-diagnostics-color={never,auto,auto-if-env,always}
+  specify the default of -fdiagnostics-color option
+  auto-if-env stands for -fdiagnostics-color=auto

Re: [PATCH, Libatomic, Darwin] Initial libatomic port for darwin.

2014-11-12 Thread Joseph Myers

On Wed, 12 Nov 2014, Iain Sandoe wrote:

> libatomic:

PR target/59305

(at least, I presume this fixes that bug; I don't know if there are any 
other relevant bugs in Bugzilla).

>   * config/darwin/host-config.h New.
>   * config/darwin/lock.c New.
>   * configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Xinliang David Li

missing test case?

David

On Wed, Nov 12, 2014 at 2:13 PM, Teresa Johnson  wrote:
> This patch fixes an issue where tree-strlen was incorrectly removing a
> store of 0 into a string because it thought a prior CLOBBER (which is
> an empty constructor with no elements) was zero-initializing the
> string.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?
>
> Thanks,
> Teresa
>
> 2014-11-12
>
> PR tree-optimization/63841
> * tree.c (initializer_zerop): A constructor with no elements
> does not zero initialize.
>
> Index: tree.c
> ===
> --- tree.c  (revision 217190)
> +++ tree.c  (working copy)
> @@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
>{
> unsigned HOST_WIDE_INT idx;
>
> +if (!CONSTRUCTOR_NELTS (init))
> +  return false;
> FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
>   if (!initializer_zerop (elt))
> return false;
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

[PATCH] Fix PR/63841: empty constructor doesn't zero-initialize

2014-11-12 Thread Teresa Johnson

This patch fixes an issue where tree-strlen was incorrectly removing a
store of 0 into a string because it thought a prior CLOBBER (which is
an empty constructor with no elements) was zero-initializing the
string.

Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

Thanks,
Teresa

2014-11-12

PR tree-optimization/63841
* tree.c (initializer_zerop): A constructor with no elements
does not zero initialize.

Index: tree.c
===
--- tree.c  (revision 217190)
+++ tree.c  (working copy)
@@ -10330,6 +10330,8 @@ initializer_zerop (const_tree init)
   {
unsigned HOST_WIDE_INT idx;

+if (!CONSTRUCTOR_NELTS (init))
+  return false;
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (init), idx, elt)
  if (!initializer_zerop (elt))
return false;


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH, RFC, C] Add -fno-float to forbid floating point data types

2014-11-12 Thread Joseph Myers

On Wed, 12 Nov 2014, Thomas Preud'homme wrote:

> In several occasions (see [1][2] people requested a switch to tell GCC 
> that a given compilation unit should not contain any float and that GCC 
> should warn about any violation of this assumption. Such a switch would 
> ensure that no softfloat library is going to be pulled in by the linker 
> and would also allow to notify the linker that the resulting file is 
> float ABI independent (a feature useful for [3]). A previous patch was 
> posted here [4] and this patch tries to address all the comments that 
> were done.
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60070
> [2] https://answers.launchpad.net/gcc-arm-embedded/+question/254345
> [3] http://jira.arm.com/browse/GCC32RM-276
> [4] https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01099.html
> 
> This patch modifies the C parser to give an error if:
>   - any variable or function parameter is declared with a float type or 
> a type containing a float (prototype are ignored)

But if you ignore prototypes at the time of declaration, don't you need to 
diagnose if a function with a floating-point parameter or return type gets 
called?  I don't see anything to do that.  (This includes the 
__builtin_sqrt case from the previous discussion.)

> specified by user and a float litteral is found.

"literal" (throughout).

> @@ -1874,6 +1925,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
> fndef_ok,
>   }
> else if (c_parser_next_token_is (parser, CPP_SEMICOLON))
>   {
> +   /* Forbidden declaration of floating point variable.  */
> +   if (flag_no_float && declarator->kind != cdk_function
> + && contains_floating_point_type (specs->type))
> + error_at (specs->locations[cdw_typespec],
> +   "use of floating point data types forbidden in this "
> +   "translation unit");
> c_parser_consume_token (parser);
> return;
>   }

No, this is wrong.  (a) By tying this to CPP_SEMICOLON you'll only catch 
it if the variable is last in the list of declarators (consider "float f, 
g (void);", where what comes before the semicolon is a function 
declaration); better to check on each declarator, not just the last.  (b) 
declarator->kind reflects the outermost declarator, but what determines 
whether it's a function declaration is the innermost declarator (other 
than cdk_id or cdk_attrs declarators), so this looks like it would give an 
error for "float *f(void);" (wrongly treating it as a variable because the 
outermost declarator is cdk_pointer), but not for "float (*f)(void);" 
(wrongly treating it as a function because the outermost declarator is 
cdk_function ... you could of course decide that function pointers 
involving floating-point types are OK if you want).  (c) specs->type only 
covers the type specifiers, so if you want to diagnose function pointer 
variables you need to allow for "int (*f)(float);" where the declaration's 
type involves floating point but the type specifiers don't.  (d) What do 
you want to do with typedef declarations (right now it looks like they'll 
be handled as variables, but your testcases don't consider them)?

I'd also suggest some refactoring: have a function that takes as arguments 
a location and a type, and does the

if (flag_no_float && contains_floating_point_type (type))
  error_at (loc, ...);

to avoid repeating that pattern in several places.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Add a way to mark regions of code which assume that the GC won't run

2014-11-12 Thread David Malcolm

On Wed, 2014-11-12 at 13:16 -0500, David Malcolm wrote:
> We make assumptions in the codebase about when the GC can run, and when
> it can't (e.g. within numerous passes) but these aren't captured in a way
> that's verifiable.
> 
> This patch introduces macros GGC_{BEGIN|END}_ASSERT_NO_COLLECTIONS for
> marking regions of code where we assume that a GC can't happen, together
> with an assert within ggc_collect to verify that we're not within such
> a region.
> 
> This lets us both
> (A) document code regions where a GC must not happen and
> (B) verify in a checked build that these assumptions hold
> 
> The patch also adds an example of using the macros, to the JIT.
> 
> It all compiles away in a release build.
> 
> I'm not fond of the names of the macros, but I can't think of better ones
> (suggestions?)
> 
> Successfully bootstrapped and regrtested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   * ggc-page.c (ggc_count_of_collection_blocking_assertions): New
>   variable.
>   (ggc_collect): Assert that we're not within code regions that are
>   assuming that the GC isn't going to run.
>   * ggc.h (GGC_BEGIN_ASSERT_NO_COLLECTIONS): New macro.
>   (GGC_END_ASSERT_NO_COLLECTIONS): New macro.
>   (ggc_count_of_collection_blocking_assertions): New variable.
> 
> gcc/jit/ChangeLog:
>   * jit-playback.c (gcc::jit::playback::context::replay): Add
>   uses of GGC_{BEGIN|END}_ASSERT_NO_COLLECTIONS to clearly
>   delimit the region of code in which the GC must not run.
> ---
>  gcc/ggc-page.c | 13 +
>  gcc/ggc.h  | 39 ++-
>  gcc/jit/jit-playback.c | 10 +-
>  3 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index f55c4e9..fd185e4 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -2151,11 +2151,24 @@ validate_free_objects (void)
>  #define validate_free_objects()
>  #endif
>  
> +#ifdef ENABLE_CHECKING
> +int ggc_count_of_collection_blocking_assertions;
> +#endif
> +
>  /* Top level mark-and-sweep routine.  */
>  
>  void
>  ggc_collect (void)
>  {
> +  /* Ensure that we don't try to garbage-collect in a code region that
> + assumes the GC won't run (as guarded by
> + GGC_BEGIN_ASSERT_NO_COLLECTIONS).
> +
> + If this assertion fails, then either ggc_collect was called in a
> + region that assumed no GC could occur, or we don't have matching pairs
> + of GGC_{BEGIN|END}_ASSERT_NO_COLLECTIONS macros.  */
> +  gcc_checking_assert (ggc_count_of_collection_blocking_assertions == 0);
> +
>/* Avoid frequent unnecessary work by skipping collection if the
>   total allocations haven't expanded much since the last
>   collection.  */
> diff --git a/gcc/ggc.h b/gcc/ggc.h
> index dc21520..827a8a5 100644
> --- a/gcc/ggc.h
> +++ b/gcc/ggc.h
> @@ -363,4 +363,41 @@ gt_pch_nx (unsigned int)
>  {
>  }
>  
> -#endif
> +/* We don't attempt to track references to GC-allocated entities
> +   that are on the stack, so the garbage collectior can only run at
> +   specific times.
> +
> +   The following machinery is available for recording assumptions about
> +   code regions in which the GC is expected *not* to collect.  */
> +
> +#if defined ENABLE_CHECKING
> +
> +/* Begin a region in which it's a bug for a ggc_collect to occur.
> +   Such regions can be nested.
> +   Each such region must be terminated with a matching
> +   GGC_END_ASSERT_NO_COLLECTIONS.  */
> +
> +#define GGC_BEGIN_ASSERT_NO_COLLECTIONS() \
> +  do { ggc_count_of_collection_blocking_assertions++; } while (0)
> +
> +/* Terminate a region in which ggc_collect is forbidden.  */
> +
> +#define GGC_END_ASSERT_NO_COLLECTIONS()  \
> +  do { \
> +gcc_assert (ggc_count_of_collection_blocking_assertions > 0); \
> +ggc_count_of_collection_blocking_assertions--; \
> +  } while (0)
> +
> +/* How many such assertions are active?  */
> +
> +extern int ggc_count_of_collection_blocking_assertions;
> +
> +#else /* not defined ENABLE_CHECKING */
> +
> +/* Do no checking in a release build. */
> +#define GGC_BEGIN_ASSERT_NO_COLLECTIONS()
> +#define GGC_END_ASSERT_NO_COLLECTIONS()
> +
> +#endif /* not defined ENABLE_CHECKING */
> +
> +#endif /* #ifndef GCC_GGC_H */
> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index 285a3ef..2998631 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -1723,6 +1723,8 @@ void
>  playback::context::
>  replay ()
>  {
> +  GGC_BEGIN_ASSERT_NO_COLLECTIONS ();
> +
>/* Adapted from c-common.c:c_common_nodes_and_builtins.  */
>tree array_domain_type = build_index_type (size_int (200));
>m_char_array_type_node
> @@ -1745,7 +1747,11 @@ replay ()
>  
>timevar_pop (TV_JIT_REPLAY);
>  
> -  if (!errors_occurred ())
> +  if (errors_occurred ())
> +{
> +  GGC_END_ASSERT_NO_COLLECTIONS ()

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread H.J. Lu

On Wed, Nov 12, 2014 at 2:00 PM, Jakub Jelinek  wrote:
> On Wed, Nov 12, 2014 at 01:51:01PM -0800, H.J. Lu wrote:
>> > So, all in all, SET_REGNO sounds inappropriate to me, everywhere else
>> > gen_raw_REG or gen_rtx_REG is used instead, so IMNSHO you should do the
>> > same.
>> >
>> > Jakub
>>
>> It makes sense.  I checked in the following patch.
>
> gen_raw_REG would be better, the difference is that gen_rtx_REG might
> in theory return you pic_offset_table_rtx again.

The whole  x86_output_mi_thunk function uses gen_rtx_REG.  Using
gen_raw_REG is out of space.

> Anyway, I don't think anything is needed for release branches,
> pic_offset_table_rtx should be initialized there just once and not changed
> afterwards.

Try this with released GCC:

[hjl@gnu-6 tmp]$ cat foo.h
struct ICCStringClass
{
  virtual int CreateString (int) = 0;
};

struct AGSCCDynamicObject
{
  virtual void Unserialize () = 0;
};


struct ScriptString:AGSCCDynamicObject, ICCStringClass
{
  virtual int CreateString (int);
  virtual void Unserialize ();
};
[hjl@gnu-6 tmp]$ cat foo1.cc
#include "foo.h"

int
__attribute__ ((noinline))
CreateNewScriptString (int fromText, bool reAllocate = true)
{
  return fromText;
}

int
__attribute__ ((noinline))
ScriptString::CreateString (int fromText)
{
  return CreateNewScriptString (fromText);
}

void
__attribute__ ((noinline))
ScriptString::Unserialize ()
{
}
[hjl@gnu-6 tmp]$ cat foo2.cc
#include "foo.h"

int
main ()
{
  ICCStringClass *x = new ScriptString;
  if (x->CreateString (1) != 1)
__builtin_abort ();
  return 0;
}
[hjl@gnu-6 tmp]$ g++ -shared -fpic -O2 -mcmodel=large -o libfoo.so foo1.cc
[hjl@gnu-6 tmp]$ g++ foo2.cc ./libfoo.so
[hjl@gnu-6 tmp]$ ./a.out
Segmentation fault
[hjl@gnu-6 tmp]$




-- 
H.J.

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread Uros Bizjak

On Wed, Nov 12, 2014 at 11:00 PM, Jakub Jelinek  wrote:
> On Wed, Nov 12, 2014 at 01:51:01PM -0800, H.J. Lu wrote:
>> > So, all in all, SET_REGNO sounds inappropriate to me, everywhere else
>> > gen_raw_REG or gen_rtx_REG is used instead, so IMNSHO you should do the
>> > same.
>> >
>> > Jakub
>>
>> It makes sense.  I checked in the following patch.
>
> gen_raw_REG would be better, the difference is that gen_rtx_REG might
> in theory return you pic_offset_table_rtx again.
>
> Anyway, I don't think anything is needed for release branches,
> pic_offset_table_rtx should be initialized there just once and not changed
> afterwards.

Please see the test(s) from [1]. The executable segfaults when
compiled with gcc-4.9.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63815#c10

Uros.

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 01:51:01PM -0800, H.J. Lu wrote:
> > So, all in all, SET_REGNO sounds inappropriate to me, everywhere else
> > gen_raw_REG or gen_rtx_REG is used instead, so IMNSHO you should do the
> > same.
> >
> > Jakub
> 
> It makes sense.  I checked in the following patch.

gen_raw_REG would be better, the difference is that gen_rtx_REG might
in theory return you pic_offset_table_rtx again.

Anyway, I don't think anything is needed for release branches,
pic_offset_table_rtx should be initialized there just once and not changed
afterwards.

Jakub

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread H.J. Lu

On Wed, Nov 12, 2014 at 1:39 PM, Jakub Jelinek  wrote:
> On Wed, Nov 12, 2014 at 01:11:40PM -0800, H.J. Lu wrote:
>> On Wed, Nov 12, 2014 at 1:02 PM, Jakub Jelinek  wrote:
>> > On Wed, Nov 12, 2014 at 12:43:17PM -0800, H.J. Lu wrote:
>> >> @@ -42686,8 +42692,12 @@ x86_output_mi_thunk (FILE *file, tree, 
>> >> HOST_WIDE_INT delta,
>> >>else
>> >>  {
>> >>if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
>> >> - fnaddr = legitimize_pic_address (fnaddr,
>> >> -  gen_rtx_REG (Pmode, tmp_regno));
>> >> + {
>> >> +   SET_REGNO (pic_offset_table_rtx, R11_REG);
>> >
>> > If pic_offset_table_rtx has never been initialized, how you can use
>>
>> It is a pseudo PIC register which is uninitialized:
>>
>> (gdb) call debug_rtx (this_target_rtl->x_pic_offset_table_rtx)
>> (reg:DI 89)
>> (gdb)
>>
>> > SET_REGNO on it?  Shouldn't that be pic_offset_table_rtx = gen_raw_REG 
>> > (Pmode, R11_REG);
>>
>> I added the following comments and am checking it into trunk:
>>
>>   // CM_LARGE_PIC always uses pseudo PIC register which is
>>   // uninitialized.  Since FUNCTION is local and calling it
>>   // doesn't go through PLT, we use scratch register %r11 as
>>   // PIC register and initialize it here.
>
> From what I can see, it probably in this case should be always non-NULL,
> as it is initialized in init_emit_regs:
> 5847  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
> 5848pic_offset_table_rtx = gen_raw_REG (Pmode, 
> PIC_OFFSET_TABLE_REGNUM);
> 5849  else
> 5850pic_offset_table_rtx = NULL_RTX;
> and as pic_offset_table_rtx is NULL there, PIC_OFFSET_TABLE_REGNUM is
> BX_REG.
> Later on, assign_params will change that to:
> 3684  if (targetm.use_pseudo_pic_reg ())
> 3685pic_offset_table_rtx = gen_reg_rtx (Pmode);
> and do_reload, after reload temporarily changing the REGNO, will overwrite
> it again to the chosen pseudo:
> 5470  if (pic_offset_table_regno != INVALID_REGNUM)
> 5471pic_offset_table_rtx = gen_rtx_REG (Pmode, 
> pic_offset_table_regno);
>
> So, all in all, SET_REGNO sounds inappropriate to me, everywhere else
> gen_raw_REG or gen_rtx_REG is used instead, so IMNSHO you should do the
> same.
>
> Jakub

It makes sense.  I checked in the following patch.

Thanks.

-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 217445)
+++ ChangeLog (working copy)
@@ -1,5 +1,10 @@
 2014-11-12  H.J. Lu  

+ * config/i386/i386.c (x86_output_mi_thunk): Set gen_rtx_REG to
+ set pic_offset_table_rtx.
+
+2014-11-12  H.J. Lu  
+
  PR target/63815
  * config/i386/i386.c (ix86_init_large_pic_reg): New.  Extracted
  from ...
Index: config/i386/i386.c
===
--- config/i386/i386.c (revision 217445)
+++ config/i386/i386.c (working copy)
@@ -42697,7 +42697,7 @@ x86_output_mi_thunk (FILE *file, tree, H
   // uninitialized.  Since FUNCTION is local and calling it
   // doesn't go through PLT, we use scratch register %r11 as
   // PIC register and initialize it here.
-  SET_REGNO (pic_offset_table_rtx, R11_REG);
+  pic_offset_table_rtx = gen_rtx_REG (Pmode, R11_REG);
   ix86_init_large_pic_reg (tmp_regno);
   fnaddr = legitimize_pic_address (fnaddr,
gen_rtx_REG (Pmode, tmp_regno));

Re: [RFC testsuite] Allow wrappers to choose file names based on pids

2014-11-12 Thread Mike Stump

On Nov 12, 2014, at 8:53 AM, Ramana Radhakrishnan 
 wrote:
>   One of the problems we appear to face with multilib testing especially 
> with conflicting ABIs in the ARM world is the occasional case where a 
> testglue file is built for one ABI but gets linked against a multilib test 
> invocation with another target.

Yeah, this is a natural consequence of doing more things in parallel.  I’d let 
Jakub think about it for a minute and see if he sees a better (more invisible 
way) to fix this.  If not, Ok (with what ever little clean-ups you might want 
to do).

Re: Concepts code review

2014-11-12 Thread Jason Merrill


On 11/12/2014 10:27 AM, Andrew Sutton wrote:

Agreed. I'll probably start looking at this on Friday morning.


Note that end of stage 1 is Saturday, as I just realized today.  So the 
sooner the better.  :)



deduce_concept_introduction (tree expr)



Do you still need this coerce_template_parms now that I've added a call to
lookup_template_variable?  Well, once my change is merged onto the branch or
the branch onto trunk.


Maybe? I'm not sure what the call to lookup_template_variable is going
to do :) I think we still need to instantiate default arguments in
order to perform the match.


I meant that lookup_template_variable now calls coerce_template_parms 
(on trunk), so you shouldn't need to do it again here.



I'm nervous about misusing these fields this way.  Why is a constrained
parameter represented as a TYPE_DECL?


I think it was because the callers of cp_parser_nonclass_name (?)
expect to get back some kind of type or type-decl. This was the most
direct way to make constrained-type-specifiers and
constrained-parameters work (I did this at the meeting in Issaquah as
a proof of concept).

I don't especially like it either. Any recommendations?


Maybe another new tree code for constrained-type-specifier?


There are cases where trying to be proactive about diagnosing those
lookups broke a lot of existing parses, specifically the
postfix-expression for constructor calls. For example, in C(args), if
T finds

template
   concept bool C() { return true; }
template
   int C(int n) { return n; }

We to select C(int).

The check for introductions might be a special case.


Probably.

Jason

Re: [PR c/52952] More precise locations within format strings

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 01:24:18PM -0800, Mike Stump wrote:
> On Nov 12, 2014, at 7:04 AM, Manuel López-Ibáñez  
> wrote:
> > And I think GCC is wrong to wan here. The point of the Wempty-body
> > warning is to catch things like:
> > 
> > if(a);
> >  return 2;
> > return 3;
> > 
> > However,
> > 
> > if(a)
> >  ;
> > return 2;
> 
> In the olden days, we didn’t have enough information to do the warnings
> well.  clang did better, cause it always had the information necessary.  I
> think if (); should warn, and if () ; should not, neither should if () \n
> ;, if we have the information.

I think we had discussions on this topic, the important thing is that we
don't want to generate different warnings based on whether -save-temps has
been used, there could be just comment in between ) and ; etc.

Jakub

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-12 Thread Joseph Myers

On Wed, 12 Nov 2014, Ilya Enkovich wrote:

> MPX runtime needs to be linked with programs using MPX because it 
> initializes hardware.  Without it all MPX instructions are just NOPs.  
> Thus it's not an extra functionality, but is for basic MPX 
> functionality.

So what if you just have the constructor that initializes the hardware - 
could everything else (handling environment variables, printing more 
detailed diagnostics, etc.) be separate?  How much is basic MPX 
functionality, how much is extra?  Basic functionality should arguably be 
like libgcc in namespace terms (so not calling any libc functions outside 
of ISO C90 namespace, using reserved-namespace versions such as __write 
instead if necessary and if such versions are available); extra 
functionality need not be so restricted.

> I also fixed other issues you mentioned in your previous comments.  
> Below is a new version.  Does it look better?

I don't see anything obvious to do symbol versioning.

If this isn't providing interfaces for the user program or 
compiler-generated code to call, then the symbol versioning could hide all 
symbols (so there are no exported symbols at all and the library operates 
purely through its initialization constructor).  Or build with 
-fvisibility=hidden to hide the symbols that way.  Either way, the dynamic 
symbol table of the shared library would end up with just undefined 
symbols, nothing exported by the library.

The issue with signal handlers calling inappropriate functions still 
applies - you're calling vfprintf from a signal handler.  Using 
pthread_mutex_lock around it doesn't help at all; vfprintf can call 
malloc, and the signal may have interrupted malloc, for example.  You 
really can't use stdio at all inside signal handlers - if you want to do 
I/O in them, you have to use write ().  (There's an argument that dprintf 
or an snprintf/write combination ought to be AS-safe, but currently they 
aren't; see ; 
snprintf is probably safe than dprintf in practice.)

I mentioned the question of static writable variables.  If those variables 
are only ever modified at startup, before any threads have been created, 
could you add comments to that effect (and otherwise ensure comments on 
the variables explain how you ensure they are accessed in a thread-safe 
way)?

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 01:11:40PM -0800, H.J. Lu wrote:
> On Wed, Nov 12, 2014 at 1:02 PM, Jakub Jelinek  wrote:
> > On Wed, Nov 12, 2014 at 12:43:17PM -0800, H.J. Lu wrote:
> >> @@ -42686,8 +42692,12 @@ x86_output_mi_thunk (FILE *file, tree, 
> >> HOST_WIDE_INT delta,
> >>else
> >>  {
> >>if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
> >> - fnaddr = legitimize_pic_address (fnaddr,
> >> -  gen_rtx_REG (Pmode, tmp_regno));
> >> + {
> >> +   SET_REGNO (pic_offset_table_rtx, R11_REG);
> >
> > If pic_offset_table_rtx has never been initialized, how you can use
> 
> It is a pseudo PIC register which is uninitialized:
> 
> (gdb) call debug_rtx (this_target_rtl->x_pic_offset_table_rtx)
> (reg:DI 89)
> (gdb)
> 
> > SET_REGNO on it?  Shouldn't that be pic_offset_table_rtx = gen_raw_REG 
> > (Pmode, R11_REG);
> 
> I added the following comments and am checking it into trunk:
> 
>   // CM_LARGE_PIC always uses pseudo PIC register which is
>   // uninitialized.  Since FUNCTION is local and calling it
>   // doesn't go through PLT, we use scratch register %r11 as
>   // PIC register and initialize it here.

>From what I can see, it probably in this case should be always non-NULL,
as it is initialized in init_emit_regs:
5847  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
5848pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
5849  else
5850pic_offset_table_rtx = NULL_RTX;
and as pic_offset_table_rtx is NULL there, PIC_OFFSET_TABLE_REGNUM is
BX_REG.
Later on, assign_params will change that to:
3684  if (targetm.use_pseudo_pic_reg ())
3685pic_offset_table_rtx = gen_reg_rtx (Pmode);
and do_reload, after reload temporarily changing the REGNO, will overwrite
it again to the chosen pseudo:
5470  if (pic_offset_table_regno != INVALID_REGNUM)
5471pic_offset_table_rtx = gen_rtx_REG (Pmode, pic_offset_table_regno);

So, all in all, SET_REGNO sounds inappropriate to me, everywhere else
gen_raw_REG or gen_rtx_REG is used instead, so IMNSHO you should do the
same.

Jakub

Re: The nvptx port [0/11+]

2014-11-12 Thread Jeff Law


On 11/12/14 05:34, Richard Biener wrote:



Now that this has been committed - I notice that there is no entry
in MAINTAINERS for the port.  I propose Bernd.
Well, ahead of you there.   I proposed Bernd to the steering committee 
as the maintainer a little while ago.  I need to go back and count votes :-)


jeff

Re: [PR c/52952] More precise locations within format strings

2014-11-12 Thread Mike Stump

On Nov 12, 2014, at 7:04 AM, Manuel López-Ibáñez  wrote:
> And I think GCC is wrong to wan here. The point of the Wempty-body
> warning is to catch things like:
> 
> if(a);
>  return 2;
> return 3;
> 
> However,
> 
> if(a)
>  ;
> return 2;

In the olden days, we didn’t have enough information to do the warnings well.  
clang did better, cause it always had the information necessary.  I think if 
(); should warn, and if () ; should not, neither should if () \n ;, if we have 
the information.

[PATCH, Libatomic, Darwin] Initial libatomic port for darwin.

2014-11-12 Thread Iain Sandoe

Hi Richard,

The posix fallback for libatomic locks on unsupported item sizes (using 
pthreads) might be reliable, but is (not surprisingly) somewhat slow.

Whereas the built-in testsuite from libatomic passes ..
..  every Darwin platform from powerpc-darwin9 (on a G5) .. through 
x86-64-darwin14 (on Haswell) fails the gcc/exceptions.exp suite with timeouts.

So here's a platform port
(we don't [yet] have ifuncs).



The patch does the following:

1) When 4byte atomic operations are supported (will always be the case when the 
lib is built as part of GCC) these are used to guard accesses.

2) When atomic ops are not available, OSSpinLocks are used (available since 
forever on the OS).  These are also 4byte locks on all current platforms.

3) Some heuristic fiddling with the algorithm for hashing addresses - to try 
and avoid cache turbulence when items are close-ish together (not unlikely in a 
typical code).  This might yet bear some more tweaking, but seems OK enough for 
a first go.

4) We use low-level Mach interfaces to give up our timeslice when blocked, to 
keep overheads to a minimum.

5) We allow the port to specify the minimum processor that will be available.
   E.G. for x86 darwin defaults to core2, which means that we don't start 
guarding small items which we could lock natively.



Tested across the patch for a while on darwin9..13 (and by Dominique on 
darwin14)
I jammed the atomic version off to test the Spinlock-based timings.

Typical runtime results  for gcc: atomic.exp 

[all == timeout with trunk implementation]

Port VersionSpinlockAtomic
===
powerpc-darwin9 ~30mins ~15mins 2G5 G5
x86-64-darwin{12,13}~15mins ~6mins  2G8 Xeon, 2G6 Ivy bridge
x86-64-darwin14 --- ~3mins  (Haswell, AFAIK)

This (for x86-64, Xeon) is about the same as I see on our Linux machines.

OK for trunk?

This is also functional on gcc-4.8 and gcc-4.9...
.. what would the feeling be about back-porting?

Iain

libatomic:

* config/darwin/host-config.h New.
* config/darwin/lock.c New.
* configure.tgt (DEFAULT_X86_CPU): New, (target): New entry for darwin.

From fdbf91b9fb20992231a370f0e5cd803085b4f69e Mon Sep 17 00:00:00 2001
From: Iain Sandoe 
Date: Wed, 15 Oct 2014 10:49:40 +0100
Subject: [PATCH] Initial draft of a Darwin port for libatomic

---
 libatomic/config/darwin/host-config.h |  55 ++
 libatomic/config/darwin/lock.c| 187 ++
 libatomic/configure.tgt   |  21 +++-
 3 files changed, 260 insertions(+), 3 deletions(-)
 create mode 100644 libatomic/config/darwin/host-config.h
 create mode 100644 libatomic/config/darwin/lock.c

diff --git a/libatomic/config/darwin/host-config.h 
b/libatomic/config/darwin/host-config.h
new file mode 100644
index 000..db55d34
--- /dev/null
+++ b/libatomic/config/darwin/host-config.h
@@ -0,0 +1,55 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   Contributed by Richard Henderson .
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Included after all more target-specific host-config.h.  */
+
+
+#ifndef protect_start_end
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility push(hidden)
+# endif
+
+void libat_lock_1 (void *ptr);
+void libat_unlock_1 (void *ptr);
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  libat_lock_1 (ptr);
+  return 0;
+}
+
+static inline void
+protect_end (void *ptr, UWORD dummy UNUSED)
+{
+  libat_unlock_1 (ptr);
+}
+
+# define protect_start_end 1
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility pop
+# endif
+#endif /* protect_start_end */
+
+#include_next 
diff --git a/libatomic/config/darwin/lock.c b/libatomic/config/darwin/lock.c
new file mode 100644
index 000..286b9df
--- /dev/null
+++ b/libatomic/config/darwin/lock.c
@@ -0,0 +1,187 @@
+/* Copyright (C) 2014 Free Software Foundation, I

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread H.J. Lu

On Wed, Nov 12, 2014 at 1:02 PM, Jakub Jelinek  wrote:
> On Wed, Nov 12, 2014 at 12:43:17PM -0800, H.J. Lu wrote:
>> @@ -42686,8 +42692,12 @@ x86_output_mi_thunk (FILE *file, tree, 
>> HOST_WIDE_INT delta,
>>else
>>  {
>>if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
>> - fnaddr = legitimize_pic_address (fnaddr,
>> -  gen_rtx_REG (Pmode, tmp_regno));
>> + {
>> +   SET_REGNO (pic_offset_table_rtx, R11_REG);
>
> If pic_offset_table_rtx has never been initialized, how you can use

It is a pseudo PIC register which is uninitialized:

(gdb) call debug_rtx (this_target_rtl->x_pic_offset_table_rtx)
(reg:DI 89)
(gdb)

> SET_REGNO on it?  Shouldn't that be pic_offset_table_rtx = gen_raw_REG 
> (Pmode, R11_REG);

I added the following comments and am checking it into trunk:

  // CM_LARGE_PIC always uses pseudo PIC register which is
  // uninitialized.  Since FUNCTION is local and calling it
  // doesn't go through PLT, we use scratch register %r11 as
  // PIC register and initialize it here.

-- 
H.J.
-

Re: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost)

2014-11-12 Thread Mike Stump

On Nov 12, 2014, at 2:55 AM, Thomas Schwinge  wrote:
> There is no mechanism in DejaGnu to pass environment variables to remote
> boards (which we're using in internal testing), and we currently use that
> to circle through available accelerators/libgomp plugins

So, two thoughts come to mind.  In my target, I wire up env and in my 
simulator, I push all of the environment from the host into the target.  Works 
very nicely, I can bootstrap gcc and binutils on my target.  Total line count 
to do this is fairly trivial.  Once done, then all the usual environment 
adjusting that one can do in tcl is reflected on the target.

Or, you can put in code that will communicate the bits over to the target.  
When commands are executed, usually we are pushing them into shells, and in 
shells, you can do:

  env1=value1 env2=value2 command arg1 arg2

and push as many variables over that you want.  Just need an api to add/manage 
what variables you want to actively push.  If you don’t have a shell, you still 
need some way to communicate the variables over.  Ultimately this will have to 
be put into the board file and then the abi can use that interface to move the 
variables.  I don’ think there are existing ways to do this.

I like the first, and that’s fine for testing on software simulators with lots 
of memory.  On actual target hardware with limited memory, it would be less 
appropriate.

The second, might require a dejagnu update to it work.

Another thought, if you have an argument to the program you want to run, —env, 
that can place environment variables into that program, you can then just add 
—env args to the command line too, and it will put them into the environment 
directly.  Benefit, no dejagnu mods or updates.  Easy to understand and audit 
(env variables otherwise kinda disappear, making flaws in them harder to see).

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 12:43:17PM -0800, H.J. Lu wrote:
> @@ -42686,8 +42692,12 @@ x86_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT 
> delta,
>else
>  {
>if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
> - fnaddr = legitimize_pic_address (fnaddr,
> -  gen_rtx_REG (Pmode, tmp_regno));
> + {
> +   SET_REGNO (pic_offset_table_rtx, R11_REG);

If pic_offset_table_rtx has never been initialized, how you can use
SET_REGNO on it?  Shouldn't that be pic_offset_table_rtx = gen_raw_REG (Pmode, 
R11_REG);
or similar?  Or is it initialized from some earlier function emitted, just
with a different reg?

Jakub

Re: [RFC testsuite] Allow wrappers to choose file names based on pids

2014-11-12 Thread Jeff Law


On 11/12/14 09:53, Ramana Radhakrishnan wrote:

Hi,

 One of the problems we appear to face with multilib testing
especially with conflicting ABIs in the ARM world is the occasional case
where a testglue file is built for one ABI but gets linked against a
multilib test invocation with another target.

 We've tried to ameliorate this in some of our builds with something
like this attached patch and we've been carrying this internally for
quite some time. In general this appears to help in quite a few cases
but there are still clashes that we get in our testing environment once
a while that we are unable to remove or get to the bottom of. (My
suspicion is around mov_if_change gcc-testglue{pid}.o gcc-tg.o but I'm
not yet a 100% sure.)

This appears to reduce the number of clashes we have with such
conflicting ABIs and their testing thereof. I don't know if other folks
see such issues but it's worth checking if this is of use somewhere else.

At the minute this is just an RFC to see what other folks think of this
and whether others face such an issue.

Tested in the past for arm-none-eabi regularly built with
--with-multilib-list=aprofile with a set of conflicting multilib ABI tests.

I'm happy to retest and resubmit if folks think this is a good idea.
It's been a very long time since I had to deal with wrapping and 
multilib testing.  *But* I can recall stumbling over this kind of 
problem in the past.


Is it an issue with running the tests in parallel, or is it specific 
tests that are calling for non-default ABI and picking up a testglue 
built for the some other ABI?


Jeff

Re: PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread Uros Bizjak

On Wed, Nov 12, 2014 at 9:43 PM, H.J. Lu  wrote:
> Hi,
>
> r216154 exposed an x86 backend bug.  For large PIC mode thunk, there
> are
>
>   if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
> fnaddr = legitimize_pic_address (fnaddr,
>  gen_rtx_REG (Pmode, tmp_regno));
>
> and legitimize_pic_address does:
>
>   if (reg != 0)
> {
>   new_rtx = expand_simple_binop (Pmode, PLUS, reg, 
> pic_offset_table_rtx,
>  tmpreg, 1, OPTAB_DIRECT);
>   new_rtx = reg;
> }
>
> However, pic_offset_table_rtx was never initialized.  Befor r216154,
> we generated thunk with random value in hardcoded PIC register.  After
> r216154, compiler crashes.  This patch sets PIC register to %r11 and
> initialize it.  Tested on Linux/x86-64.  OK for trunk and backport
> to 4.8/4.9 branches?
>
> 2014-11-12  H.J. Lu  
>
> PR target/63815
> * config/i386/i386.c (ix86_init_large_pic_reg): New.  Extracted
> from ...
> (ix86_init_pic_reg): Here.  Use ix86_init_large_pic_reg.
> (x86_output_mi_thunk): Set PIC register to %r11.  Call
> ix86_init_large_pic_reg to initialize PIC register.
>
> gcc/testsuite/
>
> 2014-11-12  H.J. Lu  
>
> PR target/63815
> * g++.dg/other/pr63815.C: New test.

OK for mainline with a short comment addition.

Backports to release branches need RM's approval.

Thanks,
Uros.

>if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
> -   fnaddr = legitimize_pic_address (fnaddr,
> -gen_rtx_REG (Pmode, tmp_regno));
> +   {

Please put a short comment why we need hard reg here.

> + SET_REGNO (pic_offset_table_rtx, R11_REG);

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-12 Thread Jeff Law


On 11/11/14 13:22, Ilya Enkovich wrote:



This library (some version of it) was only published in a binary form
some time ago.  AFAIK currently it is not a part of any product.
Sources were never made public before.  Sources go under the same
license as Cilk runtime and therefore require approval from steering
committee.  Should I do something to start that process?

Just want to make sure I understand the issues in play.

It sounds like GCC should be considered the canonical repository.

The code has the same copyright as Cilk+ runtime, which has already been 
OK'd.


The code is still copyrighted by Intel

Assuming I've got those key items correct, I'll raise it with the 
steering committee.  We've dealt with all the kind of stuff before, so 
it ought to be something we can work through.


jeff

Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))

2014-11-12 Thread David Malcolm

On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > renaming it to
> > > > 
> > > >"GNU Offloading and Multi Processing library"
> > > > 
> > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > filenames/SONAME etc.
> > > 
> > > I think this is fine, can you change it both in libgomp/configure.ac
> > > and texi docs?
> > 
> > Am attaching a patch that does so, though I suspect the wording in the
> > texi may need some more work (not my area of expertise).
> 
> Oops, I didn't mean by "you" above you, but the OpenACC folks, sorry for
> confusion.  Anyway, your patch is ok for trunk.  Thanks.

Ah, ok :)   Presumably this is conditional on the rest of the OpenACC
work merging?  (AIUI the OpenACC work is not yet on trunk, right?)

If so, perhaps the OpenACC people can adopt the patch and apply it (or
in a modified form) when they merge their work?

Sorry if I'm stepping on any toes here
Dave


> > >From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
> > From: David Malcolm 
> > Date: Wed, 12 Nov 2014 12:25:25 -0500
> > Subject: [PATCH] Change "human" name of libgomp
> > 
> > libgomp/ChangeLog:
> > * configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
> > to "GNU Offloading and Multi Processing Runtime Library".
> > * libgomp.texi (direntry): Likewise.  Reword to refer to both
> > OpenMP and OpenACC.
> > (Introduction): Reword.
> > (Runtime Library Routines): Reword.
> > ---
> >  libgomp/configure.ac |  2 +-
> >  libgomp/libgomp.texi | 14 --
> >  2 files changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/libgomp/configure.ac b/libgomp/configure.ac
> > index 84d250f..1a70058 100644
> > --- a/libgomp/configure.ac
> > +++ b/libgomp/configure.ac
> > @@ -2,7 +2,7 @@
> >  # aclocal -I ../config && autoconf && autoheader && automake
> >  
> >  AC_PREREQ(2.64)
> > -AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> > +AC_INIT([GNU Offloading and Multi Processing Runtime Library], 
> > 1.0,,[libgomp])
> >  AC_CONFIG_HEADER(config.h)
> >  
> >  # ---
> > diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> > index 254be57..78e8404 100644
> > --- a/libgomp/libgomp.texi
> > +++ b/libgomp/libgomp.texi
> > @@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover 
> > Texts being (b)
> >  @ifinfo
> >  @dircategory GNU Libraries
> >  @direntry
> > -* libgomp: (libgomp).GNU OpenMP runtime library
> > +* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
> >  @end direntry
> >  
> > -This manual documents the GNU implementation of the OpenMP API for 
> > -multi-platform shared-memory parallel programming in C/C++ and Fortran.
> > +This manual documents libgomp, the GNU Offloading and Multi
> > +Processing Runtime library.  This is the GNU implementation of the OpenMP
> > +and OpenACC APIs for parallel programming in C/C++ and Fortran.
> >  
> >  Published by the Free Software Foundation
> >  51 Franklin Street, Fifth Floor
> > @@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
> >  @top Introduction
> >  @cindex Introduction
> >  
> > -This manual documents the usage of libgomp, the GNU implementation of the 
> > +This manual documents the usage of libgomp, the GNU Offloading and Multi
> > +Processing Runtime library.  This is the GNU implementation of the
> >  @uref{http://www.openmp.org, OpenMP} Application Programming Interface 
> > (API)
> >  for multi-platform shared-memory parallel programming in C/C++ and Fortran.
> >  
> > @@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in 
> > C/C++ and Fortran.
> >  @comment
> >  @menu
> >  * Enabling OpenMP::How to enable OpenMP for your applications.
> > -* Runtime Library Routines::   The OpenMP runtime application programming 
> > -   interface.
> > +* Runtime Library Routines::   The offloading and multiprocessing runtime
> > +   application programming interface.
> >  * Environment Variables::  Influencing runtime behavior with 
> > environment 
> > variables.
> >  * The libgomp ABI::Notes on the external ABI presented by 
> > libgomp.
> > -- 
> > 1.8.5.3
> > 
> 
> 
>   Jakub

PATCH: PR target/63815: [5 Regression] g++.dg/other/pr53811.C fails with -mcmodel=large -fpic

2014-11-12 Thread H.J. Lu

Hi,

r216154 exposed an x86 backend bug.  For large PIC mode thunk, there
are

  if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
fnaddr = legitimize_pic_address (fnaddr,
 gen_rtx_REG (Pmode, tmp_regno));

and legitimize_pic_address does:

  if (reg != 0)
{
  new_rtx = expand_simple_binop (Pmode, PLUS, reg, pic_offset_table_rtx,
 tmpreg, 1, OPTAB_DIRECT);
  new_rtx = reg;
}

However, pic_offset_table_rtx was never initialized.  Befor r216154,
we generated thunk with random value in hardcoded PIC register.  After
r216154, compiler crashes.  This patch sets PIC register to %r11 and
initialize it.  Tested on Linux/x86-64.  OK for trunk and backport 
to 4.8/4.9 branches?

Thanks.


H.J.
---
gcc/

2014-11-12  H.J. Lu  

PR target/63815
* config/i386/i386.c (ix86_init_large_pic_reg): New.  Extracted
from ...
(ix86_init_pic_reg): Here.  Use ix86_init_large_pic_reg.
(x86_output_mi_thunk): Set PIC register to %r11.  Call
ix86_init_large_pic_reg to initialize PIC register.

gcc/testsuite/

2014-11-12  H.J. Lu  

PR target/63815
* g++.dg/other/pr63815.C: New test.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2866900..0bd3ff3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6195,6 +6195,27 @@ ix86_use_pseudo_pic_reg (void)
   return true;
 }
 
+/* Initialize large model PIC register.  */
+
+static void
+ix86_init_large_pic_reg (unsigned int tmp_regno)
+{
+  rtx_code_label *label;
+  rtx tmp_reg;
+
+  gcc_assert (Pmode == DImode);
+  label = gen_label_rtx ();
+  emit_label (label);
+  LABEL_PRESERVE_P (label) = 1;
+  tmp_reg = gen_rtx_REG (Pmode, tmp_regno);
+  gcc_assert (REGNO (pic_offset_table_rtx) != tmp_regno);
+  emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
+   label));
+  emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
+  emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
+   pic_offset_table_rtx, tmp_reg));
+}
+
 /* Create and initialize PIC register if required.  */
 static void
 ix86_init_pic_reg (void)
@@ -6210,22 +6231,7 @@ ix86_init_pic_reg (void)
   if (TARGET_64BIT)
 {
   if (ix86_cmodel == CM_LARGE_PIC)
-   {
- rtx_code_label *label;
- rtx tmp_reg;
-
- gcc_assert (Pmode == DImode);
- label = gen_label_rtx ();
- emit_label (label);
- LABEL_PRESERVE_P (label) = 1;
- tmp_reg = gen_rtx_REG (Pmode, R11_REG);
- gcc_assert (REGNO (pic_offset_table_rtx) != REGNO (tmp_reg));
- emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
-   label));
- emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
- emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
-   pic_offset_table_rtx, tmp_reg));
-   }
+   ix86_init_large_pic_reg (R11_REG);
   else
emit_insn (gen_set_got_rex64 (pic_offset_table_rtx));
 }
@@ -42686,8 +42692,12 @@ x86_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT 
delta,
   else
 {
   if (ix86_cmodel == CM_LARGE_PIC && SYMBOLIC_CONST (fnaddr))
-   fnaddr = legitimize_pic_address (fnaddr,
-gen_rtx_REG (Pmode, tmp_regno));
+   {
+ SET_REGNO (pic_offset_table_rtx, R11_REG);
+ ix86_init_large_pic_reg (tmp_regno);
+ fnaddr = legitimize_pic_address (fnaddr,
+  gen_rtx_REG (Pmode, tmp_regno));
+   }
 
   if (!sibcall_insn_operand (fnaddr, word_mode))
{
diff --git a/gcc/testsuite/g++.dg/other/pr63815.C 
b/gcc/testsuite/g++.dg/other/pr63815.C
new file mode 100644
index 000..fce6226
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/pr63815.C
@@ -0,0 +1,50 @@
+// PR target/63815
+// { dg-do run { target { { i?86-*-linux* x86_64-*-linux* } && lp64 } } }
+// { dg-options "-mcmodel=large" }
+// { dg-additional-options "-fpic" { target fpic } }
+
+struct ICCStringClass
+{
+  virtual int CreateString (int) = 0;
+};
+
+struct AGSCCDynamicObject
+{
+  virtual void Unserialize () = 0;
+};
+
+struct ScriptString:AGSCCDynamicObject, ICCStringClass
+{
+  virtual int CreateString (int);
+  virtual void Unserialize ();
+};
+
+int
+__attribute__ ((noinline))
+CreateNewScriptString (int fromText, bool reAllocate = true)
+{
+  return fromText;
+}
+
+int
+__attribute__ ((noinline))
+ScriptString::CreateString (int fromText)
+{
+  return CreateNewScriptString (fromText);
+}
+
+void
+__attribute__ ((noinline))
+ScriptString::Unserialize ()
+{
+}
+
+int
+main ()
+{
+  ICCStringClass *x = new ScriptString;
+
+  if (x->CreateString (1) != 1)
+__builtin_abort ();
+  return 0;
+}

Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-12 Thread Ulrich Weigand

Zhenqiang Chen wrote:

> Function noce_emit_store_flag tries to generate instruction to store flag by
> emit_store_flag for general_operand. For s390, CCU is a general _operand,
> but can not match cstorecc4, then it tries to generate a register move
> instruction from CCU to CCZ1, which will trigger an ICE.

OK, I think I see the problem now.  emit_store_flag calls emit_cstore, which
calls prepare_operand, which does:

  if (!insn_operand_matches (icode, opnum, x))
{
  if (reload_completed)
return NULL_RTX;
  x = copy_to_mode_reg (insn_data[(int) icode].operand[opnum].mode, x);
}

The insn_operand_matches call will indeed fail since x has CCUmode, but
the operand only accepts CCZ1mode for this insn.

However, it is a bug to use copy_to_mode_reg with a mode different
from the mode of the RTX to be loaded (unless that RTX is a constant
of VOIDmode); copy_to_mode_reg will simply abort in that case.

Usually, this doesn't happen since prepare_operand is called on insns that
were already selected according to the operand mode.  However, this is
weakened for MODE_CC modes in emit_store_flag:

 machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
 icode = optab_handler (cstore_optab, optab_mode);

So the assumption seems to be that a cstorecc4 insn must accept *any*
MODE_CC mode, or else prepare_operand will abort.  I don't think this
requirement is really useful; on s390 there are good reasons to only
accept certain MODE_CC modes and ask the middle-end to fall back to
the generic implemention for others.

Thus I'd propose to add a mode check to prepare_operand and simply fail
instead of aborting if the mode doesn't match what the insn expects.
The appended patch does this; it fixes the ICEs when adding your patch.

Tested on s390x-ibm-linux.

Richard, does this look reasonable to you?

Bye,
Ulrich

ChangeLog:

* optabs.c (prepare_operand): Gracefully fail if the mode of X
does not match the operand mode expected by the insn pattern.

Index: gcc/optabs.c
===
*** gcc/optabs.c(revision 217416)
--- gcc/optabs.c(working copy)
*** prepare_operand (enum insn_code icode, r
*** 4308,4316 

if (!insn_operand_matches (icode, opnum, x))
  {
if (reload_completed)
return NULL_RTX;
!   x = copy_to_mode_reg (insn_data[(int) icode].operand[opnum].mode, x);
  }

return x;
--- 4308,4319 

if (!insn_operand_matches (icode, opnum, x))
  {
+   machine_mode op_mode = insn_data[(int) icode].operand[opnum].mode;
if (reload_completed)
return NULL_RTX;
!   if (GET_MODE (x) != op_mode && GET_MODE (x) != VOIDmode)
!   return NULL_RTX;
!   x = copy_to_mode_reg (op_mode, x);
  }

return x;

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))

2014-11-12 Thread Jakub Jelinek

On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > renaming it to
> > > 
> > >"GNU Offloading and Multi Processing library"
> > > 
> > > would allow a backronym of "libgomp", thus preserving the existing
> > > filenames/SONAME etc.
> > 
> > I think this is fine, can you change it both in libgomp/configure.ac
> > and texi docs?
> 
> Am attaching a patch that does so, though I suspect the wording in the
> texi may need some more work (not my area of expertise).

Oops, I didn't mean by "you" above you, but the OpenACC folks, sorry for
confusion.  Anyway, your patch is ok for trunk.  Thanks.

> >From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
> From: David Malcolm 
> Date: Wed, 12 Nov 2014 12:25:25 -0500
> Subject: [PATCH] Change "human" name of libgomp
> 
> libgomp/ChangeLog:
>   * configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
>   to "GNU Offloading and Multi Processing Runtime Library".
>   * libgomp.texi (direntry): Likewise.  Reword to refer to both
>   OpenMP and OpenACC.
>   (Introduction): Reword.
>   (Runtime Library Routines): Reword.
> ---
>  libgomp/configure.ac |  2 +-
>  libgomp/libgomp.texi | 14 --
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/libgomp/configure.ac b/libgomp/configure.ac
> index 84d250f..1a70058 100644
> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac
> @@ -2,7 +2,7 @@
>  # aclocal -I ../config && autoconf && autoheader && automake
>  
>  AC_PREREQ(2.64)
> -AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
> +AC_INIT([GNU Offloading and Multi Processing Runtime Library], 
> 1.0,,[libgomp])
>  AC_CONFIG_HEADER(config.h)
>  
>  # ---
> diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> index 254be57..78e8404 100644
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover 
> Texts being (b)
>  @ifinfo
>  @dircategory GNU Libraries
>  @direntry
> -* libgomp: (libgomp).GNU OpenMP runtime library
> +* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
>  @end direntry
>  
> -This manual documents the GNU implementation of the OpenMP API for 
> -multi-platform shared-memory parallel programming in C/C++ and Fortran.
> +This manual documents libgomp, the GNU Offloading and Multi
> +Processing Runtime library.  This is the GNU implementation of the OpenMP
> +and OpenACC APIs for parallel programming in C/C++ and Fortran.
>  
>  Published by the Free Software Foundation
>  51 Franklin Street, Fifth Floor
> @@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
>  @top Introduction
>  @cindex Introduction
>  
> -This manual documents the usage of libgomp, the GNU implementation of the 
> +This manual documents the usage of libgomp, the GNU Offloading and Multi
> +Processing Runtime library.  This is the GNU implementation of the
>  @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
>  for multi-platform shared-memory parallel programming in C/C++ and Fortran.
>  
> @@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in 
> C/C++ and Fortran.
>  @comment
>  @menu
>  * Enabling OpenMP::How to enable OpenMP for your applications.
> -* Runtime Library Routines::   The OpenMP runtime application programming 
> -   interface.
> +* Runtime Library Routines::   The offloading and multiprocessing runtime
> +   application programming interface.
>  * Environment Variables::  Influencing runtime behavior with environment 
> variables.
>  * The libgomp ABI::Notes on the external ABI presented by 
> libgomp.
> -- 
> 1.8.5.3
> 


Jakub

Re: libgomp: "GNU OpenMP Runtime Library" (was: [PATCH 1/5] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin (repost))

2014-11-12 Thread David Malcolm

On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > renaming it to
> > 
> >"GNU Offloading and Multi Processing library"
> > 
> > would allow a backronym of "libgomp", thus preserving the existing
> > filenames/SONAME etc.
> 
> I think this is fine, can you change it both in libgomp/configure.ac
> and texi docs?

Am attaching a patch that does so, though I suspect the wording in the
texi may need some more work (not my area of expertise).

Bootstrapped on x86_64-unknown-linux-gnu (Fedora 20); regrtest ongoing.

Dave
>From f52f7d0e2115d3f88e8662cab650f8746a2c147d Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Wed, 12 Nov 2014 12:25:25 -0500
Subject: [PATCH] Change "human" name of libgomp

libgomp/ChangeLog:
	* configure.ac (AC_INIT): Rename from "GNU OpenMP Runtime Library"
	to "GNU Offloading and Multi Processing Runtime Library".
	* libgomp.texi (direntry): Likewise.  Reword to refer to both
	OpenMP and OpenACC.
	(Introduction): Reword.
	(Runtime Library Routines): Reword.
---
 libgomp/configure.ac |  2 +-
 libgomp/libgomp.texi | 14 --
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 84d250f..1a70058 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -2,7 +2,7 @@
 # aclocal -I ../config && autoconf && autoheader && automake
 
 AC_PREREQ(2.64)
-AC_INIT([GNU OpenMP Runtime Library], 1.0,,[libgomp])
+AC_INIT([GNU Offloading and Multi Processing Runtime Library], 1.0,,[libgomp])
 AC_CONFIG_HEADER(config.h)
 
 # ---
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..78e8404 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,11 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).   GNU Offloading and Multi Processing Runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for 
-multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the OpenMP
+and OpenACC APIs for parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
 51 Franklin Street, Fifth Floor
@@ -69,7 +70,8 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the 
+This manual documents the usage of libgomp, the GNU Offloading and Multi
+Processing Runtime library.  This is the GNU implementation of the
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -82,8 +84,8 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment
 @menu
 * Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
-   interface.
+* Runtime Library Routines::   The offloading and multiprocessing runtime
+   application programming interface.
 * Environment Variables::  Influencing runtime behavior with environment 
variables.
 * The libgomp ABI::Notes on the external ABI presented by libgomp.
-- 
1.8.5.3

Re: Follow-up to PR51471

2014-11-12 Thread Jeff Law


On 11/11/14 15:28, Eric Botcazou wrote:

In https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00363.html, Tom reported an
ICE in the DWARF CFI pass because a frame-related insn was speculated by the
reorg pass and, therefore, disabled the optimization.  It turns out that the
same code (when condition == const_true_rtx) can also duplicate frame-related
insns, leading to the same ICE in the DWARF CFI pass; we have run into that
building newlib for a port we are about to contribute.

Hence the attached patch, which disables the optimization in this case too and
adds a comment for both.  Tested on SPARC/Solaris, applied on the mainline.


2014-11-11  Eric Botcazou  

* reorg.c (fill_slots_from_thread): Do not copy frame-related insns.
I wonder how many other problems of this nature are lurking in reorg.c. 
 For example steal_delay_list_from_{target,fallthrough} or the code 
which searches for arithmetic at the branch target, and puts the 
opposite insn in a delay slot.


In fact, I really wonder if we should be allowing anything frame related 
outside fill_simple_delay_slots.


jeff

[patch] Apply LWG 2399, 2400 and 2401 to std::shared_ptr and std::function

2014-11-12 Thread Jonathan Wakely


These issues just became DRs in Urbana.

Tested x86_64-linux, committed to trunk.

The other new DRs we need to make changes for are 2212 (to shuffle
some code between headers), 2217 (which I sent an email about earlier)
and 2354 (adding some new insert overloads in RB trees).

commit 5930336c20503e26ed22f8ab5fe81146b31a60e5
Author: Jonathan Wakely 
Date:   Wed Nov 12 14:21:41 2014 +

Implement resolutions of LWG 2399, 2400 and 2401.

	* include/bits/shared_ptr.h (shared_ptr, weak_ptr): Define
	_Convertible alias template to simplify constraints.
	(shared_ptr(unique_ptr&&)): Constrain (LWG 2399).
	* include/bits/shared_ptr_base.h: Likewise.
	(_Sp_counted_deleter::_M_get_deleter()): Use addressof (LWG 2400).
	* include/std/functional (function::operator=(nullptr_t)): Add
	noexcept (LWG 2401).
	* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust dg-error.
	* testsuite/20_util/shared_ptr/cons/void_neg.cc: Adjust dg-error.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index c2d56eb..22cb58a 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -92,6 +92,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class shared_ptr : public __shared_ptr<_Tp>
 {
+  template
+	using _Convertible
+	  = typename enable_if::value>::type;
+
 public:
   /**
*  @brief  Construct an empty %shared_ptr.
@@ -213,8 +217,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __r  A %shared_ptr.
*  @post   get() == __r.get() && use_count() == __r.use_count()
*/
-  template::value>::type>
+  template>
 	shared_ptr(const shared_ptr<_Tp1>& __r) noexcept
 : __shared_ptr<_Tp>(__r) { }
 
@@ -231,8 +234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __r  A %shared_ptr rvalue.
*  @post   *this contains the old value of @a __r, @a __r is empty.
*/
-  template::value>::type>
+  template>
 	shared_ptr(shared_ptr<_Tp1>&& __r) noexcept
 	: __shared_ptr<_Tp>(std::move(__r)) { }
 
@@ -253,7 +255,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	shared_ptr(std::auto_ptr<_Tp1>&& __r);
 #endif
 
-  template
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2399. shared_ptr's constructor from unique_ptr should be constrained
+  template::pointer>>
 	shared_ptr(std::unique_ptr<_Tp1, _Del>&& __r)
 	: __shared_ptr<_Tp>(std::move(__r)) { }
 
@@ -464,25 +469,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class weak_ptr : public __weak_ptr<_Tp>
 {
+  template
+	using _Convertible
+	  = typename enable_if::value>::type;
+
 public:
   constexpr weak_ptr() noexcept = default;
 
-  template::value>::type>
+  template>
 	weak_ptr(const shared_ptr<_Tp1>& __r) noexcept
 	: __weak_ptr<_Tp>(__r) { }
 
   weak_ptr(const weak_ptr&) noexcept = default;
 
-  template::value>::type>
+  template>
 	weak_ptr(const weak_ptr<_Tp1>& __r) noexcept
 	: __weak_ptr<_Tp>(__r) { }
 
   weak_ptr(weak_ptr&&) noexcept = default;
 
-  template::value>::type>
+  template>
 	weak_ptr(weak_ptr<_Tp1>&& __r) noexcept
 	: __weak_ptr<_Tp>(std::move(__r)) { }
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index ea74000..fe397d0 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -477,7 +477,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_get_deleter(const std::type_info& __ti) noexcept
   {
 #ifdef __GXX_RTTI
-return __ti == typeid(_Deleter) ? &_M_impl._M_del() : nullptr;
+	// _GLIBCXX_RESOLVE_LIB_DEFECTS
+	// 2400. shared_ptr's get_deleter() should use addressof()
+return __ti == typeid(_Deleter)
+	  ? std::__addressof(_M_impl._M_del())
+	  : nullptr;
 #else
 return nullptr;
 #endif
@@ -862,6 +866,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class __shared_ptr
 {
+  template
+	using _Convertible
+	  = typename enable_if::value>::type;
+
 public:
   typedef _Tp   element_type;
 
@@ -916,8 +924,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __shared_ptr& operator=(const __shared_ptr&) noexcept = default;
   ~__shared_ptr() = default;
 
-  template::value>::type>
+  template>
 	__shared_ptr(const __shared_ptr<_Tp1, _Lp>& __r) noexcept
 	: _M_ptr(__r._M_ptr), _M_refcount(__r._M_refcount)
 	{ }
@@ -929,8 +936,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__r._M_ptr = 0;
   }
 
-  template::value>::type>
+  template>
 	__shared_ptr(__shared_ptr<_Tp1, _Lp>&& __r) noexcept
 	: _M_ptr(__r._M_ptr), _M_refcount()
 	{
@@ -950,7 +956,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   // If an exception is thrown this constructor has no effect.
-  template
+  template::pointer>>
 	__shared_ptr(std::unique_ptr<_Tp1, _Del>&& __r)
 	: _M_ptr(__r.get()), _M_refcount()
 	{
@@ -1331,6 +1338,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

RE: [PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR

2014-11-12 Thread Matthew Fortune

> (for MIPS) https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01481.html,
> although I have not been able to test this as there doesn't seem to be
> any working MIPS/Loongson hardware in the Compile Farm;

I will post a patch to remove vec_shl and only support vec_shr for
little endian. This is on the basis that loongson only supports
little endian anyway.

I believe this is a safe thing to do regardless of whether your change
is in place or not.

Matthew

RE: [PATCHv4][MIPS] Implement O32 ABI extensions (GCC)

2014-11-12 Thread Moore, Catherine



> -Original Message-
> From: Matthew Fortune [mailto:matthew.fort...@imgtec.com]
> Sent: Wednesday, November 12, 2014 1:59 PM
> To: Moore, Catherine; 'gcc-patches@gcc.gnu.org' (gcc-
> patc...@gcc.gnu.org); Eric Christopher (echri...@gmail.com)
> Cc: Richard Sandiford; Rich Fuhler; Rozycki, Maciej; Myers, Joseph
> Subject: RE: [PATCHv4][MIPS] Implement O32 ABI extensions (GCC)
> 
> Moore, Catherine  writes:
> > The patch looks good.   Please fix up these couple of nits prior to
> > committing.
> 
> OK, thanks for the second read through. One further amendment below, I'll
> aim to commit later today.
> 

Yes, that's better.

> > Index: gcc/config/mips/mips.c
> >
> ==
> =
> > --- gcc/config/mips/mips.c  (revision 217363)
> > +++ gcc/config/mips/mips.c  (working copy)
> > @@ -18824,6 +19000,21 @@ mips_expand_vec_minmax (rtx target, rtx
> op0,
> > rtx o
> >emit_insn (gen_rtx_SET (VOIDmode, target, x));  }
> >
> > +/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
> > +
> > +machine_mode
> > +mips_hard_regno_caller_save_mode (unsigned int regno,
> > + unsigned int nregs,
> > + machine_mode mode) {
> > +  /* For performance, to avoid saving/restoring upper parts of a
> > register,
> > + we return MODE as save mode when MODE is not VOIDmode.  */
> >
> > s/performance, to/performance, /
> >
> 
> The second part of this sentence will need to change too I think:
> 
> For performance, avoid saving/restoring upper parts of a register by
> returning MODE as save mode when the mode is known.
> 
> Thanks,
> Matthew

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Michael Meissner

On Wed, Nov 12, 2014 at 03:26:35AM -0600, Segher Boessenkool wrote:
> On Tue, Nov 11, 2014 at 08:27:22PM -0500, Michael Meissner wrote:
> > > Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
> > > the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
> > > add, the latter a float add.  And it uses the same pseudoregister for the
> > > accumulator throughout.  IRA decides a register is more expensive than
> > > memory for this, I suppose because it wants both V2DF and DF?  It doesn't
> > > seem to like the subreg very much.
> > 
> > I haven't looked into in detail (I've been a little busy with th upper regs
> > patch), but I suspect the problem is that 128-bit and 64-bit types cannot
> > overlap (i.e. rs6000_cannot_change_mode_class returns true).  This is due to
> > the fact that scalars in VSX registers occupy the upper 64-bits, which would
> > not match the compiler's notion of that it should be in the bottom 64-bits.
> 
> You suspect correctly.  Hacking around that in cannot_change_mode_class
> doesn't help, subreg_get_info disallows it next.
> 
> Changing the pattern so it does two extracts instead of an extract and
> a subreg works (you get an fmr for the high part though, register alloc
> doesn't know dest=src is for free here).
> 
> _Should_ the subreg thing work?  Or should the patterns be fixed?

As I said, we cannot allow CANNOT_CHANGE_MODE_CLASS to return false for this
case, because the hardware just does not agree with what GCC believes is the
natural placement for smaller values inside of larger register fields.  I
suspect even if you add new target support macros to fix it, it will be a game
of whack-a-mole to find all of the places where there are hidden asumptions in
the compiler about subreg ordering.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

RE: [PATCHv4][MIPS] Implement O32 ABI extensions (GCC)

2014-11-12 Thread Matthew Fortune

Moore, Catherine  writes: 
> The patch looks good.   Please fix up these couple of nits prior to
> committing.

OK, thanks for the second read through. One further amendment below, I'll
aim to commit later today.

> Index: gcc/config/mips/mips.c
> ===
> --- gcc/config/mips/mips.c  (revision 217363)
> +++ gcc/config/mips/mips.c  (working copy)
> @@ -18824,6 +19000,21 @@ mips_expand_vec_minmax (rtx target, rtx op0, rtx
> o
>emit_insn (gen_rtx_SET (VOIDmode, target, x));  }
> 
> +/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
> +
> +machine_mode
> +mips_hard_regno_caller_save_mode (unsigned int regno,
> + unsigned int nregs,
> + machine_mode mode) {
> +  /* For performance, to avoid saving/restoring upper parts of a
> register,
> + we return MODE as save mode when MODE is not VOIDmode.  */
> 
> s/performance, to/performance, /
> 

The second part of this sentence will need to change too I think:

For performance, avoid saving/restoring upper parts of a register
by returning MODE as save mode when the mode is known.

Thanks,
Matthew

Re: [PATCH][17/n] Merge from match-and-simplify, plus/minus association patterns

2014-11-12 Thread H.J. Lu

On Wed, Nov 12, 2014 at 9:55 AM, H.J. Lu  wrote:
> On Tue, Nov 11, 2014 at 5:13 AM, Richard Biener  wrote:
>>
>> This merges patterns from associate_plusminus and adjusts them with
>> details from their fold-const.c pendants.  It also fixes missing
>> flag_sanitize checks on negate contraction on the way.
>>
>> This shows places where folds STRIP_NOPs was important (but also
>> shows where it may create wrong code - sth the patch doesn't fix
>> yet).  Without the conditonal convert handling on the negate
>> contraction we regress quite a few GENERIC folding testcases.
>>
>> Note that the other explicit reassocation patterns are handled
>> by folds associate: piece which I am sure we don't implement
>> fully by the few patterns (OTOH on GIMPLE we have a reassoc
>> pass for that anyway).  So not too many patterns were removed
>> from fold.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>>
>> Richard.
>>
>> 2014-11-11  Richard Biener  
>>
>> * match.pd: Implement patterns from associate_plusminus
>> and factor in differences from the fold-const.c implementation.
>> * fold-const.c (fold_binary_loc): Remove patterns here.
>> * tree-ssa-forwprop.c (associate_plusminus): Remove.
>> (pass_forwprop::execute): Don't call it.
>> * tree.c (tree_nop_conversion_p): New function, factored
>> from tree_nop_conversion.
>> * tree.h (tree_nop_conversion_p): Declare.
>>
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63836
>

It is dup of

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63821
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63835

I checked in this patch and closed the PR.

-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 217440)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2014-11-12  H.J. Lu  
+
+ PR tree-optimization/63835
+ * gcc.dg/pr63835.c: New test.
+
 2014-11-12  Alan Lawrence  

  * gcc.target/aarch64/simd/vqdmlal_high_lane_s16_indices_1.c: New test.
Index: gcc.dg/pr63835.c
===
--- gcc.dg/pr63835.c (revision 0)
+++ gcc.dg/pr63835.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+int a;
+
+int
+fn1 (int p)
+{
+  return -p;
+}
+
+void
+fn2 ()
+{
+  fn1 (-(unsigned int) a);
+}

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-12 Thread Alan Lawrence

Pushed as r217440, also with Charles' whitespace fixes ('' -> tab) - 
good spot!


Cheers, Alan

Marcus Shawcroft wrote:

On 6 November 2014 10:19, Alan Lawrence  wrote:

This generates out-of-range errors at compile- (rather than assemble-)time
for the vqdm*_lane intrinsics, and also provides a single place to do
bigendian lane-swapping for all those intrinsics (and others to follow in
later patches). This allows us to remove many define_expands that just do a
range-check and endian-swap before outputting the RTL for a corresponding
"_internal" insn.

Changes to aarch64-simd.md are not as big as they look, they are highly
repetitive, like the code they are removing! Testcases are also repetitive,
as unfortunately dg-error doesn't care *how many* errors there were matching
it's pattern, as long as at least 1, hence having to separate each into own
file - the last "0" in the dg-error disables the line-number checking, as
the line numbers in our error messages refer to lines within arm_neon.h
rather than within the test case. (They do at least mention the user
function containing the call to the intrinsic.)

Ok for trunk?


OK, but make sure the committed patch addresses Mike's comment. /Marcus

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Alan Lawrence

Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using 
this snippet on top of original patch; no regressions.


Alan Lawrence wrote:
So I'm no expert on RS6000 here, but following on from Segher's observation 
about the change in pattern...so the difference in 'expand' is exactly that, a 
vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a 
vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combining 
the two previous insns.


However, inspecting the logs from -fdump-rtl-combine-all, *without* my patch, 
when the combiner tries to put those two together, I see:


Trying 30 -> 31:
Failed to match this instruction:
(set (reg:DF 179 [ stmp_s_5.7D.2196 ])
 (vec_select:DF (plus:V2DF (vec_select:V2DF (reg:V2DF 173 [ 
vect_s_5.6D.2195 ])
 (parallel [
 (const_int 1 [0x1])
 (const_int 0 [0])
 ]))
 (reg:V2DF 173 [ vect_s_5.6D.2195 ]))
 (parallel [
 (const_int 1 [0x1])
 ])))

That is, it looks like combine_simplify_rtx has transformed the (vec_concat 
(vec_select ... 1) (vec_select ... 0)) from the vsx_reduc_plus_v2df insn, into a 
single vec_select, which does not match the vsx_reduc_plus_v2df_scalar insn.


So despite the comment (in vsx.md):

;; Combiner patterns with the vector reduction patterns that knows we can get
;; to the top element of the V2DF array without doing an extract.

It looks like the code generation prior to my patch, considered better, was 
because the combiner didn't actually use the pattern?


In that case whilst you may want to dig into register allocation, 
cannot_change_mode_class, etc., for other reasons, I think the best fix for 
migrating to reduc_plus_scal... is simply to avoid using the "Combiner" patterns 
and just emit two insns, the old pattern followed by a vec_extract. The attached 
snippet does this (I won't call it a patch yet, and it applies on top of the 
previous patch - I went the route of calling the two gen functions rather than 
copying their RTL sequences, but could do the latter if that were 
preferable???), and restores code generation to the original form on your 
example above; it bootstraps OK but I'm still running check-gcc on the Compile 
Farm...


However, again on your example above, I note that if I *remove* the 
reduc_plus_scal_v2df pattern altogether, I get:


.sum:
 li 10,512# 52   *movdi_internal64/4 [length = 4]
 ld 9,.LC2@toc(2) # 20   *movdi_internal64/2 [length = 4]
 xxlxor 0,0,0 # 17   *vsx_movv2df/12 [length = 4]
 mtctr 10 # 48   *movdi_internal64/11[length = 4]
 .align 4
.L2:
 lxvd2x 12,0,9# 23   *vsx_movv2df/2  [length = 4]
 addi 9,9,16  # 25   *adddi3_internal1/2 [length = 4]
 xvadddp 0,0,12   # 24   *vsx_addv2df3/1 [length = 4]
 bdnz .L2 # 47   *ctrdi_internal1/1  [length = 4]
 xxsldwi 12,0,0,2 # 30   vsx_xxsldwi_v2df[length = 4]
 xvadddp 1,0,12   # 31   *vsx_addv2df3/1 [length = 4]
 nop  # 37   *vsx_extract_v2df_internal2/1   [length = 4]
 blr  # 55   return  [length = 4]

this is presumably using gcc's scalar reduction code, but (to my untrained eye 
on powerpc!) it looks even better than the first form above (the same in the 
loop, and in the reduction, an xxpermdi is replaced by a nop !)...


--Alan


Segher Boessenkool wrote:

On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:

However, the double pattern is completely broken.  This cannot go in.

[snip]


It is unacceptable to have to do the inner loop doing a load, vector add, and
store in the loop.

Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
add, the latter a float add.  And it uses the same pseudoregister for the
accumulator throughout.  IRA decides a register is more expensive than
memory for this, I suppose because it wants both V2DF and DF?  It doesn't
seem to like the subreg very much.

The new code does look nicer otherwise :-)


Segher

1 2 3 >

1 - 100 of 275 matches

Mail list logo