Re: [patch] Reduce memory overhead for large functions

2012-08-13 Thread Richard Guenther
On Sun, Aug 12, 2012 at 11:49 PM, Steven Bosscher  wrote:
> Hello,
>
> This patch tries to use non-clearing memory allocation where possible.
> This is especially important for very large functions, when arrays of
> size in the order of n_basic_blocks or num_ssa_names are allocated to
> hold sparse data sets. For such cases the overhead of memset becomes
> measurable (and even dominant for the time spent in a pass in some
> cases, such as the one I recently fixed in ifcvt.c).
>
> This cuts off ~20% of the compile time for the test case of PR54146 at
> -O1. Not bad for a patch that basically only removes a bunch of
> memsets.
>
> I got another 5% for the changes in tree-ssa-loop-manip.c. A loop over
> an array with num_ssa_names there is expensive and unnecessary, and it
> helps to stuff all bitmaps together on a single obstack if you intend
> to blow them all away at the end (this could be done in a number of
> other places in the compiler). Clearing livein at the end of
> add_exit_phis_var also reduces peak memory by ~250MB at that point
> in the passes pipeline (only to blow up from ~1.5GB peak memory in the
> GIMPLE optimizers to ~3.6 GB in expand, and to ~8.6GB in IRA, but hey,
> who's counting? :-)
>
> Actually, the worst cases are not fixed with this patch. That'd be IRA
> (which consumes ~5GB on the test case, out of 8GB total), and
> tree-PRE.
>
> The IRA case looks like it may be hard to fix: Allocating multiple
> arrays of size O(max_regno) for every loop in init_loop_tree_node.
>
> The tree-PRE case is one where the avail arrays are allocated and
> cleared for every PRE candidate. This looks like a place where a
> pointer_map should be used instead. I'll tackle that later, when I've
> addressed more pressing problems in the compilation of the PR54146
> test case.

Hmm, or easier, use a vector of size (num_bb_preds) and index it by
edge index.

> This patch was bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for 
> trunk?

Ok with adjusting the PRE comments according to the above.

Thanks,
Richard.

> Kudos to the compile farm people, without them I couldn't even hope to
> get any of this work done!

> Ciao!
> Steven


[PATCH] Fix PR54200

2012-08-13 Thread Richard Guenther

This fixes one issue with copyrename, that it "leaks" names backward
through a PHI node because it treats a PHI node

 _1 = PHI <_2, _3, _4>

as

 _1 = _2;
 _1 = _3;
 _1 = _4;

at the point of the PHI node which is certainly not what it is
(the assigns exist, one each, on the incoming edges only).

The following fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  The guality
test fails for -Os on x86_64 and for -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects on i586, a single FAIL is much better than
another 7 XPASSes, so I left it as FAIL.  On x86_64
gcc.target/i386/pad-10.c now also FAILs, because apparently whatever is
responsible for not issuing padding nops is confused by the
coalescing that no longer occurs at out-of-SSA time.  Thus it is clearly
a pre-existing bug and simply a weak testcase.

Applied.

Richard.

2012-08-13  Richard Guenther  

PR tree-optimization/54200
* tree-ssa-copyrename.c (rename_ssa_copies): Do not add
PHI results to another partition if not all PHI arguments
have the same partition.

* gcc.dg/guality/pr54200.c: New testcase.
* gcc.dg/tree-ssa/slsr-8.c: Adjust.

Index: gcc/tree-ssa-copyrename.c
===
*** gcc/tree-ssa-copyrename.c.orig  2012-08-10 15:51:19.0 +0200
--- gcc/tree-ssa-copyrename.c   2012-08-10 15:51:29.000638547 +0200
*** rename_ssa_copies (void)
*** 348,362 
  res = gimple_phi_result (phi);
  
  /* Do not process virtual SSA_NAMES.  */
! if (!is_gimple_reg (res))
continue;
  
!   for (i = 0; i < gimple_phi_num_args (phi); i++)
! {
!   tree arg = gimple_phi_arg (phi, i)->def;
!   if (TREE_CODE (arg) == SSA_NAME)
!   updated |= copy_rename_partition_coalesce (map, res, arg, 
debug);
! }
  }
  }
  
--- 348,400 
  res = gimple_phi_result (phi);
  
  /* Do not process virtual SSA_NAMES.  */
! if (virtual_operand_p (res))
continue;
  
! /* Make sure to only use the same partition for an argument
!as the result but never the other way around.  */
! if (SSA_NAME_VAR (res)
! && !DECL_IGNORED_P (SSA_NAME_VAR (res)))
!   for (i = 0; i < gimple_phi_num_args (phi); i++)
! {
!   tree arg = PHI_ARG_DEF (phi, i);
!   if (TREE_CODE (arg) == SSA_NAME)
! updated |= copy_rename_partition_coalesce (map, res, arg,
!debug);
! }
! /* Else if all arguments are in the same partition try to merge
!it with the result.  */
! else
!   {
! int all_p_same = -1;
! int p = -1;
! for (i = 0; i < gimple_phi_num_args (phi); i++)
!   {
! tree arg = PHI_ARG_DEF (phi, i);
! if (TREE_CODE (arg) != SSA_NAME)
!   {
! all_p_same = 0;
! break;
!   }
! else if (all_p_same == -1)
!   {
! p = partition_find (map->var_partition,
! SSA_NAME_VERSION (arg));
! all_p_same = 1;
!   }
! else if (all_p_same == 1
!  && p != partition_find (map->var_partition,
!  SSA_NAME_VERSION (arg)))
!   {
! all_p_same = 0;
! break;
!   }
!   }
! if (all_p_same == 1)
!   updated |= copy_rename_partition_coalesce (map, res,
!  PHI_ARG_DEF (phi, 0),
!  debug);
!   }
  }
  }
  
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c.orig 2012-08-10 15:51:19.0 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c  2012-08-10 15:51:29.000638547 
+0200
*** f (int s, int *c)
*** 17,23 
return x1 ? x2 : x3;
  }
  
! /* There are 2 ' * ' instances in the decls (since "int * x3;" is
!optimized out), 1 parm, 2 in the code.  */
! /* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
  /* { dg-final { cleanup-tree-dump "optimized" } } */
--- 17,23 
return x1 ? x2 : x3;
  }
  
! /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
!added), 1 parm, 2 in the code.  */
! /* { dg-final { scan-tree-dump-times " \\* " 7 "optimized" } } */
  /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/guality/pr54200.c
===

Re: [Patch,AVR] PR54222: Add fixed point support

2012-08-13 Thread Georg-Johann Lay
Denis Chertykov wrote:
> 2012/8/11 Georg-Johann Lay :
>> Weddington, Eric schrieb:
 From: Georg-Johann Lay


 The first step would be to bisect and find the patch that lead to
 PR53923.  It was not a change in the avr BE, so the question goes
 to the authors of the respective patch.

 Up to now I didn't even try to bisect; that would take years on the
 host that I have available...

> My only real concern is that this is a major feature addition and
> the AVR port is currently broken.

 I don't know if it's the avr port or some parts of the middle end that
 don't cooperate with avr.
>>>
>>> I would really, really love to see fixed point support added in,
>>> especially since I know that Sean has worked on it for quite a while,
>>> and you've also done a lot of work in getting the patches in shape to
>>> get them committed.
>>>
>>> But, if the AVR port is currently broken (by whomever, and whatever
>>> patch) and a major feature like this can't be tested to make sure it
>>> doesn't break anything else in the AVR backend, then I'm hesitant to
>>> approve (even though I really want to approve).
>>
>> I don't understand enough of DF to fix PR53923.  The insn that leads
>> to the ICE is (in df-problems.c:dead_debug_insert_temp):
>>
> 
> Today I have updated GCC svn tree and successfully compiled avr-gcc.
> The libgcc2-mulsc3.c from PR53923 also compiled without bugs.
> 
> Denis.
> 
> PS: Maybe I'm doing something wrong?  (I had too long a vacation)

I am configuring with --target=avr --disable-nls --with-dwarf2
--enable-languages=c,c++ --enable-target-optspace=yes --enable-checking=yes,rtl

Build GCC is "gcc version 4.3.2".
Build and host are i686-pc-linux-gnu.

Maybe it's different on a 64-bit computer, but I only have 32-bit host.

Johann


[PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread Gopalasubramanian, Ganesh
Hello,

To get the prefetchw CPUID flag, CPUID
function 0x80000001 needs to be called.
Before this patch, function 0x7 was called.

Bootstrapping and "make -k check" passes without failures.
Ok for trunk?

Regards
Ganesh

2012-08-13 Ganesh Gopalasubramanian  

PR driver/54210
* config/i386/driver-i386.c (host_detect_local_cpu): Call
cpuid function 0x80000001 to get the prfchw cpuid flag.

Index: gcc/config/i386/driver-i386.c
===
--- gcc/config/i386/driver-i386.c   (revision 189996)
+++ gcc/config/i386/driver-i386.c   (working copy)
@@ -467,7 +467,6 @@
   has_bmi2 = ebx & bit_BMI2;
   has_fsgsbase = ebx & bit_FSGSBASE;
   has_rdseed = ebx & bit_RDSEED;
-  has_prfchw = ecx & bit_PRFCHW;
 }

   /* Check cpuid level of extended features.  */
@@ -491,6 +490,7 @@
   has_longmode = edx & bit_LM;
   has_3dnowp = edx & bit_3DNOWP;
   has_3dnow = edx & bit_3DNOW;
+  has_prfchw = ecx & bit_PRFCHW;
 }

   if (!arch



Re: Merge C++ conversion into trunk (1/6 - Configury)

2012-08-13 Thread Diego Novillo

On 12-08-13 02:51 , Steven Bosscher wrote:

On Mon, Aug 13, 2012 at 2:34 AM, Diego Novillo  wrote:

On 12-08-12 16:16 , Steven Bosscher wrote:


On Sun, Aug 12, 2012 at 10:09 PM, Diego Novillo 
wrote:


This patch implements the configuration changes needed to
bootstrap with a C++ compiler by default.



Hi,

Is it possible to add -fno-rtti to the default CXX_FLAGS, and remove
it if necessary?



I suppose, but I don't recall what the consensus was with rtti.  I did not
follow the coding conventions discussion very closely.


The coding conventions say:
"Run-time type information (RTTI) is permitted when certain
non-default --enable-checking options are enabled, so as to allow
checkers to report dynamic types. However, by default, RTTI is not
permitted and the compiler must build cleanly with -fno-rtti."


Thanks.  We actually already build with -fno-rtti.  So, no changes are 
needed.



Diego.


Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-13 Thread Richard Guenther
On Sun, Aug 12, 2012 at 10:04 PM, Diego Novillo  wrote:
> I will be sending 6 patches that implement all the changes we
> have been making on the cxx-conversion branch.  As described in
> http://gcc.gnu.org/ml/gcc/2012-08/msg00015.html, these patches
> change the default bootstrap process so that stage 1 always
> builds with a C++ compiler.
>
> Other than the bootstrap change, the patches make no functional
> changes to the compiler.  Everything should build as it does now
> in trunk.
>
> I have split the merge in 6 main patches.  I will send these
> patches to the respective maintainers and gcc-patches.
> Please remember that the patches conform to the new C++ coding
> guidelines (http://gcc.gnu.org/codingconventions.html#Cxx_Conventions):
>
> 1- Configuration changes.
> 2- Re-write of VEC.
> 3- Re-write of gengtype to support C++ templates and
>user-provided marking functions.
> 4- New hash table class.
> 5- Re-write double_int.
> 6- Implement tree macros as inline functions so they can be
>called from gdb.
>
> As discussed before, several of these patches do not fully change
> the call sites to use the new APIs.  We will do this change once
> the branch has been merged into trunk.  Otherwise, the branch
> becomes a maintenance nightmare (despite not having changed many
> caller sites we were already starting to run into maintenance
> problems).

As I understand only 1. to 3. were kind-of required for the merge, all
other changes are a bonus at this time and should be delayed IMHO
(thus not merged with this batch).

I also understand that you will, quickly after merging 1. to 3. convert
all VEC users and remove the old interface.  This should be done
before any of 4. - 6. is merged as generally we don't want the
"half-converted" to persist, nor have multiple such half-conversions
at the same time.

I also understand that the merge of 1. to 3. will be followed by the promised
gengtype improvements and re-organizations.

Thus, please do the first C++ things very well.

Thanks,
Richard.

> For those who would like to build the conversion, you can either
> checkout the branch from SVN
> (svn://gcc.gnu.org/gcc/branches/cxx-conversion) or get the merged
> trunk I have in the git repo (branch dnovillo/cxx-conversion).
>
> The bootstrap changes have already been tested on a wide range of
> targets (http://gcc.gnu.org/wiki/CppBuildStatus).  Additionally,
> I have tested the merged trunk on: x86_64-unknown-linux-gnu,
> mips64el-unknown-linux-gnu, powerpc64-unknown-linux-gnu,
> i686-pc-linux-gnu, and ia64-unknown-linux-gnu.
>
>
> Thanks.  Diego.


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Richard Guenther
On Sun, Aug 12, 2012 at 11:30 PM, Marc Glisse  wrote:
> On Sun, 12 Aug 2012, Diego Novillo wrote:
>
>> This implements the double_int rewrite.
>>
>> See http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00711.html for
>> details.
>>
>> Diego.
>
>
> I am taking it as a chance to ask a couple questions about the coding
> conventions.
>
>
>> 2012-08-12   Lawrence Crowl  
>>
>> * hash-table.h
>> (typedef double_int): Change to struct (POD).
>> (double_int::make): New overloads for int to double-int
>> conversion.
>
>
> Isn't that double_int::from_* now?
>
>> +typedef struct double_int
>> {
>
> [...]
>>
>> } double_int;
>
>
> Does the coding convention say something about this verbosity?
>
>
>> +  HOST_WIDE_INT to_signed () const;
>> +  unsigned HOST_WIDE_INT to_unsigned () const;
>> +
>> +  /* Conversion query functions.  */
>> +
>> +  bool fits_unsigned() const;
>> +  bool fits_signed() const;
>
>
> Space before the parentheses or not?
>
>
>> +inline double_int &
>> +double_int::operator ++ ()
>> +{
>> +  *this + double_int_one;
>
>
> *this += double_int_one;
> would be less confusing.

Increment/decrement operations did not exist, please do not add them
at this point.

Richard.

>> +  return *this;
>> +}
>
>
> --
> Marc Glisse


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Jakub Jelinek
On Sun, Aug 12, 2012 at 11:30:59PM +0200, Marc Glisse wrote:
> >+inline double_int &
> >+double_int::operator ++ ()
> >+{
> >+  *this + double_int_one;
> 
> *this += double_int_one;
> would be less confusing.

Do you mean that *this + double_int_one; alone also works, just is
confusing?  That would mean operator+ has side-effects, right?

Jakub


Re: [PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread Jakub Jelinek
On Mon, Aug 13, 2012 at 09:29:45AM +, Gopalasubramanian, Ganesh wrote:
> To get the prefetchw cpuid flag, cpuid
> function 0x8001 needs to be called.
> Previous to patch, function 0x7 is called.
> 
> Bootstrapping and "make -k check" passes without failures.
> Ok for trunk?

IMHO you moved it to the wrong spot; the ecx bits of CPUID 0x80000001 are tested
earlier.

So I think you want this instead (bootstrap/regtest in progress):

2012-08-13  Ganesh Gopalasubramanian  
Jakub Jelinek  

PR driver/54210
* config/i386/driver-i386.c (host_detect_local_cpu): Test bit_PRFCHW
bit of CPUID 0x80000001 %ecx instead of CPUID 7 %ecx.
* config/i386/cpuid.h (bit_PRFCHW): Move definition to CPUID
0x80000001 %ecx flags.

--- gcc/config/i386/driver-i386.c.jj2012-08-10 15:49:25.0 +0200
+++ gcc/config/i386/driver-i386.c   2012-08-13 11:30:14.570494736 +0200
@@ -467,7 +467,6 @@ const char *host_detect_local_cpu (int a
   has_bmi2 = ebx & bit_BMI2;
   has_fsgsbase = ebx & bit_FSGSBASE;
   has_rdseed = ebx & bit_RDSEED;
-  has_prfchw = ecx & bit_PRFCHW;
   has_adx = ebx & bit_ADX;
 }
 
@@ -488,6 +487,7 @@ const char *host_detect_local_cpu (int a
   has_xop = ecx & bit_XOP;
   has_tbm = ecx & bit_TBM;
   has_lzcnt = ecx & bit_LZCNT;
+  has_prfchw = ecx & bit_PRFCHW;
 
   has_longmode = edx & bit_LM;
   has_3dnowp = edx & bit_3DNOWP;
--- gcc/config/i386/cpuid.h.jj  2012-08-10 15:49:25.0 +0200
+++ gcc/config/i386/cpuid.h 2012-08-13 11:31:30.346494092 +0200
@@ -52,6 +52,7 @@
 #define bit_LAHF_LM(1 << 0)
 #define bit_ABM(1 << 5)
 #define bit_SSE4a  (1 << 6)
+#define bit_PRFCHW (1 << 8)
 #define bit_XOP (1 << 11)
 #define bit_LWP(1 << 15)
 #define bit_FMA4(1 << 16)
@@ -69,7 +70,6 @@
 #define bit_HLE(1 << 4)
 #define bit_AVX2   (1 << 5)
 #define bit_BMI2   (1 << 8)
-#define bit_PRFCHW (1 << 8)
 #define bit_RTM(1 << 11)
 #define bit_RDSEED (1 << 18)
 #define bit_ADX(1 << 19)


Jakub


Backported patch to 4.7 branch

2012-08-13 Thread Jakub Jelinek
Hi!

I've bootstrapped/regtested on x86_64-linux and i686-linux and committed the
following backport to 4.7 branch.

2012-08-13  Jakub Jelinek  

Backported from trunk
2012-07-19  Jakub Jelinek  

PR rtl-optimization/53942
* function.c (assign_parm_setup_reg): Avoid zero/sign extension
directly from likely spilled non-fixed hard registers, move them
to pseudo first.

* gcc.dg/pr53942.c: New test.

--- gcc/function.c  (revision 189680)
+++ gcc/function.c  (revision 189681)
@@ -2987,11 +2987,26 @@ assign_parm_setup_reg (struct assign_par
  && insn_operand_matches (icode, 1, op1))
{
  enum rtx_code code = unsignedp ? ZERO_EXTEND : SIGN_EXTEND;
- rtx insn, insns;
+ rtx insn, insns, t = op1;
  HARD_REG_SET hardregs;
 
  start_sequence ();
- insn = gen_extend_insn (op0, op1, promoted_nominal_mode,
+ /* If op1 is a hard register that is likely spilled, first
+force it into a pseudo, otherwise combiner might extend
+its lifetime too much.  */
+ if (GET_CODE (t) == SUBREG)
+   t = SUBREG_REG (t);
+ if (REG_P (t)
+ && HARD_REGISTER_P (t)
+ && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (t))
+ && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (t
+   {
+ t = gen_reg_rtx (GET_MODE (op1));
+ emit_move_insn (t, op1);
+   }
+ else
+   t = op1;
+ insn = gen_extend_insn (op0, t, promoted_nominal_mode,
  data->passed_mode, unsignedp);
  emit_insn (insn);
  insns = get_insns ();
--- gcc/testsuite/gcc.dg/pr53942.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr53942.c  (revision 189681)
@@ -0,0 +1,34 @@
+/* PR rtl-optimization/53942 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mtune=pentium2" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
+
+struct S
+{
+  unsigned short w[3];
+  unsigned int x, y;
+};
+
+struct S *baz (void);
+
+__attribute__ ((noinline))
+static unsigned char
+foo (struct S *x, unsigned char y)
+{
+  unsigned char c = 0;
+  unsigned char v = x->w[0];
+  c |= v;
+  v = ((x->w[1]) & (1 << y)) ? 1 : 0;
+  c |= v << 1;
+  v = ((x->w[2]) & 0xff) & (1 << y);
+  c |= v << 2;
+  return c;
+}
+
+void
+bar (void)
+{
+  struct S *s = baz ();
+  s->x = foo (s, 6);
+  s->y = foo (s, 7);
+}

Jakub


[PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-08-13 Thread Bin Cheng
Hi,
For Thumb-1, arm-gcc intentionally rewrites move insns into subtracts of
zero in the peephole2 pass, then executes
pass_if_after_reload/pass_regrename/pass_cprop_hardreg sequentially.

In this scenario, copy-propagation opportunities are missed because:
  1. the move insns have been rewritten.
  2. pass_cprop_hardreg currently doesn't notice the subtract of zero.

This patch fixes the problem and the logic is:
  1. notice the plus/subtract of ZERO in pass_cprop_hardreg.
  2. if the last insn providing information about condition codes is in
the form of "dest_reg = src_reg - 0", record the src_reg in the newly added
field thumb1_cc_op0_src of structure machine_function.
  3. in pattern "cbranchsi4_insn", check thumb1_cc_op0_src along with
thumb1_cc_op0 to save one comparison insn.

I measured the patch on CSiBE, about 600 bytes are saved for both O2 and Os
on cortex-m0 without any regression.

I also tested the patch on
arm-none-eabi+cortex-m0/arm-none-eabi+cortex-m3/i686-pc-linux and no
regressions introduced.

So is it OK?

Thanks

2012-08-13  Bin Cheng  

* regcprop.c (copyprop_hardreg_forward_1): Notice copies in the
form of subtract of zero.
* config/arm/arm.h (thumb1_cc_op0_src): New field.
* config/arm/arm.c (thumb1_final_prescan_insn): Record
thumb1_cc_op0_src.
* config/arm/arm.md (cbranchsi4_insn): Check thumb1_cc_op0_src
along with thumb1_cc_op0.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 189835)
+++ gcc/config/arm/arm.c(working copy)
@@ -21663,6 +21663,8 @@
   if (cfun->machine->thumb1_cc_insn)
{
  if (modified_in_p (cfun->machine->thumb1_cc_op0, insn)
+ || (cfun->machine->thumb1_cc_op0_src != NULL_RTX
+ && modified_in_p (cfun->machine->thumb1_cc_op0_src, insn))
  || modified_in_p (cfun->machine->thumb1_cc_op1, insn))
CC_STATUS_INIT;
}
@@ -21672,13 +21674,18 @@
  rtx set = single_set (insn);
  cfun->machine->thumb1_cc_insn = insn;
  cfun->machine->thumb1_cc_op0 = SET_DEST (set);
+ cfun->machine->thumb1_cc_op0_src = NULL_RTX;
  cfun->machine->thumb1_cc_op1 = const0_rtx;
  cfun->machine->thumb1_cc_mode = CC_NOOVmode;
  if (INSN_CODE (insn) == CODE_FOR_thumb1_subsi3_insn)
{
  rtx src1 = XEXP (SET_SRC (set), 1);
  if (src1 == const0_rtx)
-   cfun->machine->thumb1_cc_mode = CCmode;
+   {
+ cfun->machine->thumb1_cc_mode = CCmode;
+ /* Record the minuend in thumb1_subsi3_insn pattern.  */
+ cfun->machine->thumb1_cc_op0_src = XEXP (SET_SRC (set), 0);
+   }
}
}
   else if (conds != CONDS_NOCOND)
Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h(revision 189835)
+++ gcc/config/arm/arm.h(working copy)
@@ -1459,9 +1459,17 @@
  is not needed.  */
   int return_used_this_function;
   /* When outputting Thumb-1 code, record the last insn that provides
- information about condition codes, and the comparison operands.  */
+ information about condition codes, and the comparison operands.
+
+ If the last insn that provides information about condition codes
+ is in the form of "dest_reg = src_reg - 0", record the src_reg in
+ thumb1_cc_op0_src, and for following insn sequence:
+dest_reg = src_reg - 0;
+if (src_reg ?= 0) goto label;
+ the comparison insn can also be saved.  */
   rtx thumb1_cc_insn;
   rtx thumb1_cc_op0;
+  rtx thumb1_cc_op0_src;
   rtx thumb1_cc_op1;
   /* Also record the CC mode that is supported.  */
   enum machine_mode thumb1_cc_mode;
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 189835)
+++ gcc/config/arm/arm.md   (working copy)
@@ -7018,7 +7018,8 @@
   rtx t = cfun->machine->thumb1_cc_insn;
   if (t != NULL_RTX)
 {
-  if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
+  if ((!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
+  && !rtx_equal_p (cfun->machine->thumb1_cc_op0_src, operands[1]))
  || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
t = NULL_RTX;
   if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
@@ -7034,6 +7035,7 @@
   output_asm_insn ("cmp\t%1, %2", operands);
   cfun->machine->thumb1_cc_insn = insn;
   cfun->machine->thumb1_cc_op0 = operands[1];
+  cfun->machine->thumb1_cc_op0_src = NULL_RTX;
   cfun->machine->thumb1_cc_op1 = operands[2];
   cfun->machine->thumb1_cc_mode = CCmode;
 }
Index: gcc/regcprop.c
===
--- gcc/regcprop.c  (revision 189835)
+++ gcc/regcprop.c  (working copy)
@

Ping: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c

2012-08-13 Thread Bin Cheng
Ping.

> -Original Message-
> From: Richard Earnshaw
> Sent: Thursday, July 26, 2012 9:19 PM
> To: Andrew Pinski
> Cc: Bin Cheng; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
> 
> On 26/07/12 11:27, Andrew Pinski wrote:
> > On Thu, Jul 26, 2012 at 3:20 AM, Bin Cheng  wrote:
> >> Hi,
> >> This patch removes the duplicate check on BRANCH_COST in
fold_truth_andor.
> >> The BRANCH_COST condition removed is a duplicate of the default
> >> definition of LOGICAL_OP_NON_SHORT_CIRCUIT.
> >> All current targets (mips and rs6000) that provide non-default
> >> definitions of LOGICAL_OP_SHORT_CIRCUIT set it to 0, so this patch is
> >> therefore just a code cleanup and does not change behaviour in the
compiler.
> >>
> >> I built mipsel-elf cross compiler and compared newlib/libstdc++
> >> compiled by the patched/original compilers.
> >>
> >> Is it OK?
> >
> > Just some history here on this.  The BRANCH COST check was there
> > before LOGICAL_OP_NON_SHORT_CIRCUIT was added.  I will be submitting a
> > patch which changes the MIPS definition soon but it will not be based
> > on the branch cost but rather than another option.  So in the end it
> > might not be redundant as it is currently.
> >
> > Thanks,
> > Andrew
> >
> 
> You can always factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT (as
the
> default currently does), so there's no loss of functionality from removing
> this currently redundant check.  However, the current definition is broken
in
> that it makes it impossible to force the compiler to use this optimization
> when the branch cost is low.
> 

Is it OK?

Thanks





Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Jakub Jelinek wrote:


On Sun, Aug 12, 2012 at 11:30:59PM +0200, Marc Glisse wrote:

+inline double_int &
+double_int::operator ++ ()
+{
+  *this + double_int_one;


*this += double_int_one;
would be less confusing.


Do you mean that *this + double_int_one; alone also works, just is
confusing?  That would mean operator+ has side-effects, right?


It "works" in that it compiles. It is confusing because the addition is 
dead code and thus operator++ is a nop. Sorry for my confusing euphemism, 
I should have called it a bug. operator+ has no side-effects AFAICS.


Note that there are many obvious places where this operator can be used:

varasm.c:  i = double_int_add (i, double_int_one);
tree-vrp.c: prod2h = double_int_add (prod2h, double_int_one);
tree-ssa-loop-niter.c:bound = double_int_add (bound, double_int_one);
tree-ssa-loop-niter.c:  *nit = double_int_add (*nit, double_int_one);
tree-ssa-loop-ivopts.c:max_niter = double_int_add (max_niter, 
double_int_one);
gimple-fold.c:index = double_int_add (index, double_int_one);

etc.

As a side note, I don't usually like making operator+ a member function. 
It doesn't matter when there is no implicit conversion, but if we ever add 
them, it will make addition non-symmetric.


--
Marc Glisse


Scheduler: Save state at the end of a block

2012-08-13 Thread Bernd Schmidt
This is a small patch for sched-rgn that attempts to save DFA state at
the end of a basic block and re-use it in successor blocks. This was a
customer-requested optimization; I've not seen it make much of a
difference in any macro benchmarks.

Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK?


Bernd
	* sched-int.h (schedule_block): Adjust declaration.
	* sched-rgn.c (bb_state_array, bb_state): New static variables.
	(sched_rgn_init): Initialize them.
	(sched_rgn_free): Free them.
	(schedule_region): Save scheduling state for future blocks, and
	pass such state to schedule_block.
	* params.def (PARAM_SCHED_STATE_EDGE_PROB_CUTOFF): New.
	* doc/invoke.texi (--param): Document it.
	* haifa-sched.c (schedule_block): New arg init_state.  Use it to
	initialize state if nonnull.  All callers changed.
	Call advance_one_cycle after scheduling.

Index: gcc/sched-ebb.c
===
--- gcc/sched-ebb.c	(revision 189284)
+++ gcc/sched-ebb.c	(working copy)
@@ -544,7 +544,7 @@ schedule_ebb (rtx head, rtx tail, bool m
 
   /* Make ready list big enough to hold all the instructions from the ebb.  */
   sched_extend_ready_list (rgn_n_insns);
-  success = schedule_block (&target_bb);
+  success = schedule_block (&target_bb, NULL);
   gcc_assert (success || modulo_scheduling);
 
   /* Free ready list.  */
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 189284)
+++ gcc/doc/invoke.texi	(working copy)
@@ -9101,6 +9101,11 @@ The minimal probability of speculation s
 speculative insns are scheduled.
 The default value is 40.
 
+@item sched-spec-state-edge-prob-cutoff
+The minimum probability an edge must have for the scheduler to save its
+state across it.
+The default value is 10.
+
 @item sched-mem-true-dep-cost
 Minimal distance (in CPU cycles) between store and load targeting same
 memory locations.  The default value is 1.
Index: gcc/haifa-sched.c
===
--- gcc/haifa-sched.c	(revision 189284)
+++ gcc/haifa-sched.c	(working copy)
@@ -5542,7 +5542,7 @@ verify_shadows (void)
region.  */
 
 bool
-schedule_block (basic_block *target_bb)
+schedule_block (basic_block *target_bb, state_t init_state)
 {
   int i;
   bool success = modulo_ii == 0;
@@ -5573,7 +5573,10 @@ schedule_block (basic_block *target_bb)
   if (sched_verbose)
 dump_new_block_header (0, *target_bb, head, tail);
 
-  state_reset (curr_state);
+  if (init_state == NULL)
+state_reset (curr_state);
+  else
+memcpy (curr_state, init_state, dfa_state_size);
 
   /* Clear the ready list.  */
   ready.first = ready.veclen - 1;
@@ -6029,7 +6032,9 @@ schedule_block (basic_block *target_bb)
 }
   if (ls.modulo_epilogue)
 success = true;
+
  end_schedule:
+  advance_one_cycle ();
   if (modulo_ii > 0)
 {
   /* Once again, debug insn suckiness: they can be on the ready list
Index: gcc/sched-int.h
===
--- gcc/sched-int.h	(revision 189284)
+++ gcc/sched-int.h	(working copy)
@@ -1291,7 +1291,7 @@ extern int dep_cost (dep_t);
 extern int set_priorities (rtx, rtx);
 
 extern void sched_setup_bb_reg_pressure_info (basic_block, rtx);
-extern bool schedule_block (basic_block *);
+extern bool schedule_block (basic_block *, state_t);
 
 extern int cycle_issued_insns;
 extern int issue_rate;
Index: gcc/sched-rgn.c
===
--- gcc/sched-rgn.c	(revision 189284)
+++ gcc/sched-rgn.c	(working copy)
@@ -125,6 +125,9 @@ int current_blocks;
 static basic_block *bblst_table;
 static int bblst_size, bblst_last;
 
+static char *bb_state_array;
+static state_t *bb_state;
+
 /* Target info declarations.
 
The block currently being scheduled is referred to as the "target" block,
@@ -2974,9 +2977,21 @@ schedule_region (int rgn)
   curr_bb = first_bb;
   if (dbg_cnt (sched_block))
 {
-  schedule_block (&curr_bb);
+	  edge f;
+
+  schedule_block (&curr_bb, bb_state[first_bb->index]);
   gcc_assert (EBB_FIRST_BB (bb) == first_bb);
   sched_rgn_n_insns += sched_n_insns;
+	  f = find_fallthru_edge (last_bb->succs);
+	  if (f && f->probability * 100 / REG_BR_PROB_BASE >=
+	  PARAM_VALUE (PARAM_SCHED_STATE_EDGE_PROB_CUTOFF))
+	{
+	  memcpy (bb_state[f->dest->index], curr_state,
+		  dfa_state_size);
+	  if (sched_verbose >= 5)
+		fprintf (sched_dump, "saving state for edge %d->%d\n",
+			 f->src->index, f->dest->index);
+	}
 }
   else
 {
@@ -3009,6 +3024,8 @@ schedule_region (int rgn)
 void
 sched_rgn_init (bool single_blocks_p)
 {
+  int i;
+
   min_spec_prob = ((PARAM_VALUE (PARAM_MIN_SPEC_PROB) * REG_BR_PROB_BASE)
 		/ 100);
 
@@ -3020,6 +3037,23 @@ sched_rgn_init (bool single_blocks_p)
   CONTAINING_RGN (ENTRY_BLOCK) = -1;
   CONTAINING_RG

RE: [PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread Gopalasubramanian, Ganesh
Yes! Thanks Jakub.

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: Monday, August 13, 2012 3:16 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH,i386] cpuid function for prefetchw

On Mon, Aug 13, 2012 at 09:29:45AM +, Gopalasubramanian, Ganesh wrote:
> To get the prefetchw cpuid flag, cpuid function 0x80000001 needs to be 
> called.
> Previous to patch, function 0x7 is called.
> 
> Bootstrapping and "make -k check" passes without failures.
> Ok for trunk?

IMHO you moved it to the wrong spot; the ecx bits of CPUID 0x80000001 are tested 
earlier.

So I think you want this instead (bootstrap/regtest in progress):

2012-08-13  Ganesh Gopalasubramanian  
Jakub Jelinek  

PR driver/54210
* config/i386/driver-i386.c (host_detect_local_cpu): Test bit_PRFCHW
bit of CPUID 0x80000001 %ecx instead of CPUID 7 %ecx.
* config/i386/cpuid.h (bit_PRFCHW): Move definition to CPUID
0x80000001 %ecx flags.

--- gcc/config/i386/driver-i386.c.jj2012-08-10 15:49:25.0 +0200
+++ gcc/config/i386/driver-i386.c   2012-08-13 11:30:14.570494736 +0200
@@ -467,7 +467,6 @@ const char *host_detect_local_cpu (int a
   has_bmi2 = ebx & bit_BMI2;
   has_fsgsbase = ebx & bit_FSGSBASE;
   has_rdseed = ebx & bit_RDSEED;
-  has_prfchw = ecx & bit_PRFCHW;
   has_adx = ebx & bit_ADX;
 }
 
@@ -488,6 +487,7 @@ const char *host_detect_local_cpu (int a
   has_xop = ecx & bit_XOP;
   has_tbm = ecx & bit_TBM;
   has_lzcnt = ecx & bit_LZCNT;
+  has_prfchw = ecx & bit_PRFCHW;
 
   has_longmode = edx & bit_LM;
   has_3dnowp = edx & bit_3DNOWP;
--- gcc/config/i386/cpuid.h.jj  2012-08-10 15:49:25.0 +0200
+++ gcc/config/i386/cpuid.h 2012-08-13 11:31:30.346494092 +0200
@@ -52,6 +52,7 @@
 #define bit_LAHF_LM	(1 << 0)
 #define bit_ABM	(1 << 5)
 #define bit_SSE4a	(1 << 6)
+#define bit_PRFCHW	(1 << 8)
 #define bit_XOP	(1 << 11)
 #define bit_LWP	(1 << 15)
 #define bit_FMA4	(1 << 16)
@@ -69,7 +70,6 @@
 #define bit_HLE	(1 << 4)
 #define bit_AVX2	(1 << 5)
 #define bit_BMI2	(1 << 8)
-#define bit_PRFCHW	(1 << 8)
 #define bit_RTM	(1 << 11)
 #define bit_RDSEED	(1 << 18)
 #define bit_ADX	(1 << 19)


Jakub




Re: complex.h

2012-08-13 Thread Gabriel Dos Reis
On Fri, Aug 10, 2012 at 7:00 PM, Jonathan Wakely  wrote:
> Let's CC Gaby, who likes to keep an eye on patches involving <complex.h>

Thanks Jonathan.

The patch is OK -- though I suspect we should have a documentation note
about the extension of allowing other C99 complex functions in <complex.h>.

-- Gaby

>
>
> On 10 August 2012 20:17, Marc Glisse wrote:
>> Ping
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01440.html
>>
>>
>> On Sat, 28 Jul 2012, Marc Glisse wrote:
>>
>>> Hello,
>>>
>>> here is a patch for PR54112. It does 2 things:
>>> * #undef complex after including the system's complex.h
>>> * in C++11, still include the system's complex.h
>>>
>>> The C++11 standard says that including complex.h is equivalent to
>>> including <complex>, with the rationale that a C++ compiler can't parse a
>>> C99 complex.h header. However, g++, as an extension, handles _Complex, so it
>>> makes sense to also provide the prototypes for cacos and other C99 complex
>>> math functions.
>>>
>>> Tested, no regression.
>>>
>>> (Cc: Benjamin, who wrote the header)
>>>
>>> 2012-07-28  Marc Glisse  
>>>
>>>PR libstdc++/54112
>>> * include/c_compatibility/complex.h: Undefine complex, always
>>> include system's complex.h if present.
>>> * testsuite/26_numerics/complex/c99.cc: New testcase.
>>> * testsuite/17_intro/headers/c++1998/complex.cc: Likewise.
>>
>>
>> --
>> Marc Glisse


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Gabriel Dos Reis
On Mon, Aug 13, 2012 at 5:32 AM, Marc Glisse  wrote:
> On Mon, 13 Aug 2012, Jakub Jelinek wrote:
>
>> On Sun, Aug 12, 2012 at 11:30:59PM +0200, Marc Glisse wrote:

 +inline double_int &
 +double_int::operator ++ ()
 +{
 +  *this + double_int_one;
>>>
>>>
>>> *this += double_int_one;
>>> would be less confusing.
>>
>>
>> Do you mean that *this + double_int_one; alone also works, just is
>> confusing?  That would mean operator+ has side-effects, right?
>
>
> It "works" in that it compiles. It is confusing because the addition is dead
> code and thus operator++ is a nop. Sorry for my confusing euphemism, I
> should have called it a bug. operator+ has no side-effects AFAICS.

yes, it is just as confusing and a bug as

   2.3 + 1;

is in plain C.

>
> Note that there are many obvious places where this operator can be used:
>
> varasm.c:  i = double_int_add (i, double_int_one);
> tree-vrp.c: prod2h = double_int_add (prod2h, double_int_one);
> tree-ssa-loop-niter.c:bound = double_int_add (bound, double_int_one);
> tree-ssa-loop-niter.c:  *nit = double_int_add (*nit, double_int_one);
> tree-ssa-loop-ivopts.c:max_niter = double_int_add (max_niter,
> double_int_one);
> gimple-fold.c:index = double_int_add (index, double_int_one);
>
> etc.
>
> As a side note, I don't usually like making operator+ a member function. It
> doesn't matter when there is no implicit conversion, but if we ever add
> them, it will make addition non-symmetric.

As not everybody is familiar with C++ litotes, let me expand on this.  I believe
you are not opposing overloading operator+ on double_int.  You are objecting
to its implementation being defined as a member function.  That is you would be
perfectly fine with operator+ defined as a free function, e.g. not a
member function.

-- Gaby


Re: complex.h

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Gabriel Dos Reis wrote:


On Fri, Aug 10, 2012 at 7:00 PM, Jonathan Wakely  wrote:

Let's CC Gaby, who likes to keep an eye on patches involving <complex.h>


Thanks Jonathan.

The patch is OK -- though I suspect we should have a documentation note
about the extension of allowing other C99 complex functions in <complex.h>.


Thanks.

What about adding at the end of this page:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/numerics.html#numerics.complex.processing

"As an extension to C++11 and for increased compatibility with C, 
<complex.h> includes both <complex> and the C99 <complex.h> (if the C 
library provides it)."


(there may be better places and better explanations...)

--
Marc Glisse


Re: complex.h

2012-08-13 Thread Gabriel Dos Reis
On Mon, Aug 13, 2012 at 6:17 AM, Marc Glisse  wrote:
> On Mon, 13 Aug 2012, Gabriel Dos Reis wrote:
>
>> On Fri, Aug 10, 2012 at 7:00 PM, Jonathan Wakely 
>> wrote:
>>>
>>> Let's CC Gaby, who likes to keep an eye on patches involving <complex.h>
>>
>>
>> Thanks Jonathan.
>>
>> The patch is OK -- though I suspect we should have a documentation note
>> about the extension of allowing other C99 complex functions in
>> <complex.h>.
>
>
> Thanks.
>
> What about adding at the end of this page:
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/numerics.html#numerics.complex.processing
>
> "As an extension to C++11 and for increased compatibility with C,
> <complex.h> includes both <complex> and the C99 <complex.h> (if the C
> library provides it)."
>
> (there may be better places and better explanations...)
>
> --
> Marc Glisse

That is fine.

Thanks,

-- Gaby


Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-13 Thread Diego Novillo

On 12-08-13 05:37 , Richard Guenther wrote:

On Sun, Aug 12, 2012 at 10:04 PM, Diego Novillo  wrote:

I will be sending 6 patches that implement all the changes we
have been making on the cxx-conversion branch.  As described in
http://gcc.gnu.org/ml/gcc/2012-08/msg00015.html, these patches
change the default bootstrap process so that stage 1 always
builds with a C++ compiler.

Other than the bootstrap change, the patches make no functional
changes to the compiler.  Everything should build as it does now
in trunk.

I have split the merge in 6 main patches.  I will send these
patches to the respective maintainers and gcc-patches.
Please remember that the patches conform to the new C++ coding
guidelines (http://gcc.gnu.org/codingconventions.html#Cxx_Conventions):

1- Configuration changes.
2- Re-write of VEC.
3- Re-write of gengtype to support C++ templates and
user-provided marking functions.
4- New hash table class.
5- Re-write double_int.
6- Implement tree macros as inline functions so they can be
called from gdb.

As discussed before, several of these patches do not fully change
the call sites to use the new APIs.  We will do this change once
the branch has been merged into trunk.  Otherwise, the branch
becomes a maintenance nightmare (despite not having changed many
caller sites we were already starting to run into maintenance
problems).


As I understand only 1. to 3. were kind-of required for the merge, all
other changes are a bonus at this time and should be delayed IMHO
(thus not merged with this batch).


Both #4 and #5 have the same issues as #3 (VEC).  What remains to be 
done is update a whole swath of user code.  This is hard to maintain in 
the branch and needs to be done during stage 1.  The change in #6 is 
similarly ready, so delaying it makes no sense.


What I can do is merge #1-#3 as one rev and merge the others as 3 
separate revisions.



I also understand that you will, quickly after merging 1. to 3. convert
all VEC users and remove the old interface.  This should be done
before any of 4. - 6. is merged as generally we don't want the
"half-converted" to persist, nor have multiple such half-conversions
at the same time.


Yes, there will be no half conversions.  We are committed to continue 
making changes in this area.



Diego.


Re: Merge C++ conversion into trunk (2/6 - VEC rewrite)

2012-08-13 Thread Diego Novillo

On 12-08-13 05:39 , Richard Guenther wrote:


It's an odd thing that you need to touch code replacing -> with . (yes, it's
due to the use of references) but not at the same time convert those places
to the new VEC interface.


Yes.  I hated this aspect of the initial conversion.  It caused many 
merge problems.  Unfortunately, I could not escape it.



Diego.


Re: complex.h

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Gabriel Dos Reis wrote:


On Mon, Aug 13, 2012 at 6:17 AM, Marc Glisse  wrote:

On Mon, 13 Aug 2012, Gabriel Dos Reis wrote:


On Fri, Aug 10, 2012 at 7:00 PM, Jonathan Wakely 
wrote:


Let's CC Gaby, who likes to keep an eye on patches involving <complex.h>



Thanks Jonathan.

The patch is OK -- though I suspect we should have a documentation note
about the extension of allowing other C99 complex functions in
<complex.h>.



Thanks.

What about adding at the end of this page:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/numerics.html#numerics.complex.processing

"As an extension to C++11 and for increased compatibility with C,
<complex.h> includes both <complex> and the C99 <complex.h> (if the C
library provides it)."

(there may be better places and better explanations...)

--
Marc Glisse


That is fine.

Thanks,


I only modified the xml version. I expect the html version will be updated 
the next time someone who knows what they are doing touches the doc...


--
Marc Glisse


[PATCH] Fix some undefined behavior spots in gcc sources (PR c/53968)

2012-08-13 Thread Jakub Jelinek
Hi!

John Regehr discovered a couple of spots in GCC sources that invoke
undefined behavior during bootstrap/regtest, the following patch fixes most
of them.  I couldn't reproduce the diagnostic.c failure and would like to
leave the ipa hunk to Honza, I think the probability/frequency code often
might go out of the expected limits and then invoke undefined signed
overflow.

The double_int_mask change is because e.g. with -E, ptr_mode is VOIDmode and
so double_int_mask is called with 0 precision during compiler
initialization.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-08-13  Jakub Jelinek  

PR c/53968
* tree.c (integer_pow2p): Avoid undefined signed overflows.
* simplify-rtx.c (neg_const_int): Likewise.
* expr.c (fixup_args_size_notes): Likewise.
* stor-layout.c (set_min_and_max_values_for_integral_type): Likewise.
* double-int.c (mul_double_wide_with_sign): Likewise.
(double_int_mask): Likewise.
* tree-ssa-loop-ivopts.c (get_address_cost): Likewise.

--- gcc/tree.c.jj   2012-08-10 15:48:53.0 +0200
+++ gcc/tree.c  2012-08-13 10:25:46.610502929 +0200
@@ -1849,7 +1849,7 @@ int
 integer_pow2p (const_tree expr)
 {
   int prec;
-  HOST_WIDE_INT high, low;
+  unsigned HOST_WIDE_INT high, low;
 
   STRIP_NOPS (expr);
 
--- gcc/simplify-rtx.c.jj   2012-08-10 15:49:20.0 +0200
+++ gcc/simplify-rtx.c  2012-08-13 09:51:43.628508537 +0200
@@ -66,7 +66,7 @@ static rtx simplify_binary_operation_1 (
 static rtx
 neg_const_int (enum machine_mode mode, const_rtx i)
 {
-  return gen_int_mode (- INTVAL (i), mode);
+  return gen_int_mode (-(unsigned HOST_WIDE_INT) INTVAL (i), mode);
 }
 
 /* Test whether expression, X, is an immediate constant that represents
--- gcc/expr.c.jj   2012-08-10 15:49:07.0 +0200
+++ gcc/expr.c  2012-08-13 10:40:01.182501639 +0200
@@ -3828,7 +3828,7 @@ fixup_args_size_notes (rtx prev, rtx las
 
   add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (args_size));
 #ifdef STACK_GROWS_DOWNWARD
-  this_delta = -this_delta;
+  this_delta = -(unsigned HOST_WIDE_INT) this_delta;
 #endif
   args_size -= this_delta;
 }
--- gcc/stor-layout.c.jj	2012-08-10 15:49:20.0 +0200
+++ gcc/stor-layout.c   2012-08-13 10:14:14.388505253 +0200
@@ -2568,10 +2568,14 @@ set_min_and_max_values_for_integral_type
= build_int_cst_wide (type,
  (precision - HOST_BITS_PER_WIDE_INT > 0
   ? -1
-  : ((HOST_WIDE_INT) 1 << (precision - 1)) - 1),
+  : (HOST_WIDE_INT)
+(((unsigned HOST_WIDE_INT) 1
+  << (precision - 1)) - 1)),
  (precision - HOST_BITS_PER_WIDE_INT - 1 > 0
-  ? (((HOST_WIDE_INT) 1
-  << (precision - HOST_BITS_PER_WIDE_INT - 
1))) - 1
+  ? (HOST_WIDE_INT)
+		 ((((unsigned HOST_WIDE_INT) 1
+   << (precision - HOST_BITS_PER_WIDE_INT
+   - 1))) - 1)
   : 0));
 }
 
--- gcc/double-int.c.jj 2012-08-10 15:49:07.0 +0200
+++ gcc/double-int.c	2012-08-13 11:24:17.816495757 +0200
@@ -170,7 +170,7 @@ mul_double_wide_with_sign (unsigned HOST
{
  k = i + j;
  /* This product is <= 0xFFFE0001, the sum <= 0x.  */
- carry += arg1[i] * arg2[j];
+ carry += (unsigned HOST_WIDE_INT) arg1[i] * arg2[j];
  /* Since prod[p] < 0x, this sum <= 0x.  */
  carry += prod[k];
  prod[k] = LOWPART (carry);
@@ -625,7 +625,7 @@ double_int_mask (unsigned prec)
   else
 {
   mask.high = 0;
-  mask.low = ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1;
+  mask.low = prec ? ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1 : 0;
 }
 
   return mask;
--- gcc/tree-ssa-loop-ivopts.c.jj   2012-08-10 15:49:07.0 +0200
+++ gcc/tree-ssa-loop-ivopts.c  2012-08-13 10:17:51.227504425 +0200
@@ -3173,7 +3173,7 @@ get_address_cost (bool symbol_present, b
 
   for (i = width; i >= 0; i--)
{
- off = -((HOST_WIDE_INT) 1 << i);
+ off = -((unsigned HOST_WIDE_INT) 1 << i);
  XEXP (addr, 1) = gen_int_mode (off, address_mode);
  if (memory_address_addr_space_p (mem_mode, addr, as))
break;
@@ -3182,7 +3182,7 @@ get_address_cost (bool symbol_present, b
 
   for (i = width; i >= 0; i--)
{
- off = ((HOST_WIDE_INT) 1 << i) - 1;
+ off = ((unsigned HOST_WIDE_INT) 1 << i) - 1;
  XEXP (addr, 1) = gen_int_mode (off, address_mode);
  if (memory_address_addr_space_p (mem_mode, addr, as))
break;

Jakub


Re: Value type of map need not be default copyable

2012-08-13 Thread Paolo Carlini

On 08/12/2012 10:00 PM, François Dumont wrote:

On 08/11/2012 03:47 PM, Marc Glisse wrote:

On Sat, 11 Aug 2012, François Dumont wrote:

   Your remark on using std::move rather than std::forward, Marc, made 
sense but didn't work. I don't understand why, but the new test 
shows that std::forward works. If anyone can explain why std::move 
doesn't work, I am interested.


What testcase failed? I just tried the 2.cc file you added with the 
patch, and replacing forward(__k) with move(__k) compiled 
fine.




You are right, I replaced std::forward<_Kt> with 
std::move<_Kt>, forcing a wrong type deduction in std::move. With 
a simple std::move() it works fine. So here is the patch again.


2012-08-10  François Dumont  
Ollie Wild  

* include/bits/hashtable.h
(_Hashtable<>::_M_insert_multi_node(hash_code, node_type*)): New.
(_Hashtable<>::_M_insert(_Args&&, false_type)): Use latter.
(_Hashtable<>::_M_emplace(false_type, _Args&&...)): Likewise.
(_Hashtable<>::_M_insert_bucket): Replace by ...
(_Hashtable<>::_M_insert_unique_node(size_type, hash_code, 
node_type*)):

... this, new.
(_Hashtable<>::_M_insert(_Args&&, true_type)): Use latter.
(_Hashtable<>::_M_emplace(true_type, _Args&&...)): Likewise.
* include/bits/hashtable_policy.h (_Map_base<>::operator[]): Use
latter, emplace the value_type rather than insert.
* include/std/unordered_map: Include tuple.
* include/std/unordered_set: Likewise.
* testsuite/util/testsuite_counter_type.h: New.
* testsuite/23_containers/unordered_map/operators/2.cc: New.

Tested under Linux x86_64.

Ok for trunk ?

Ok, thanks!

Paolo.

PS: you may want to remove the trailing blank line of 
testsuite_counter_type.h


Re: [PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread Jakub Jelinek
On Mon, Aug 13, 2012 at 11:45:36AM +0200, Jakub Jelinek wrote:
> On Mon, Aug 13, 2012 at 09:29:45AM +, Gopalasubramanian, Ganesh wrote:
> > To get the prefetchw cpuid flag, cpuid
> > function 0x8001 needs to be called.
> > Previous to patch, function 0x7 is called.
> > 
> > Bootstrapping and "make -k check" passes without failures.
> > Ok for trunk?
> 
> IMHO you move it to a wrong spot, ecx bits of CPUID 0x8001 are tested
> earlier.
> 
> So I think you want this instead (bootstrap/regtest in progress):
> 
> 2012-08-13  Ganesh Gopalasubramanian  
>   Jakub Jelinek  
> 
>   PR driver/54210
>   * config/i386/driver-i386.c (host_detect_local_cpu): Test bit_PRFCHW
>   bit of CPUID 0x8001 %ecx instead of CPUID 7 %ecx.
>   * config/i386/cpuid.h (bits_PRFCHW): Move definition to CPUID
>   0x8001 %ecx flags.

Now bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Jakub


Re: combine permutations in gimple

2012-08-13 Thread Richard Guenther
On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse  wrote:
> On Fri, 10 Aug 2012, Marc Glisse wrote:
>
>> this patch detects permutations of permutations and merges them. It also
>> canonicalizes permutations a bit more.
>>
>> There are several issues with this patch:
>>
>> 1) I am not sure we always want to combine permutations. Indeed, someone
>> (user? vectorizer?) may have written 2 permutations to help the backend
>> generate optimal code, whereas it will do a bad job on the complicated
>> combined permutation. At tree level, I am not sure what can be done except
>> possibly restrict the optimization to the case where the combined
>> permutation is the identity. At the RTL level, we could compare what insns
>> are generated, but that's going to be painful (doing anything with
>> permutations at the RTL level is painful).
>>
>> 2) I don't understand how the cleanup works in tree-ssa-forwprop.c. I
>> copied some fold/update/remove lines from other simplifications, but I don't
>> know if they are right.
>>
>> 3) In a first version of the patch, where I had SSA_NAME_DEF_STMT instead
>> of get_prop_source_stmt, I got an ICE in one of the torture vectorization
>> testcases. It happened in expand_expr_real_2, because it asserts that
>> expand_vec_perm will never fail. However, expand_vec_perm does return
>> NULL_RTX sometimes. Here it was in V16QImode, with the permutation
>> {0,0,2,2,4,...,14,14}. maybe_expand_insn can't handle it directly (that's ok
>> I guess), but then expand_vec_perm sees VOIDmode and exits instead of trying
>> other ways. I don't know if there is a latent bug or if (more likely) my
>> patch may produce trees with wrong modes.
>>
>> 4) Is this the right place?
>>
>> This isn't the transformation I am most interested in, but it is a first
>> step to see if the direction is right.
>
>
> Hello,
>
> there was yet another issue with the version I posted: the testcase was
> trivial so I didn't notice that it didn't perform the optimization at all...
>
> Here is a new version. It seems a bit strange to me that there are plenty of
> functions that check for single-use variables, but there isn't any that
> checks for variables used in a single statement (but possibly twice). So I
> may be doing something strange, but it seems to be the natural test here.
>
> Happily passed bootstrap+regtest.
>
> 2012-08-11  Marc Glisse  
>
>
> gcc/
> * fold-const.c (fold_ternary_loc): Detect identity permutations.
> Canonicalize permutations more.
> * tree-ssa-forwprop.c (is_only_used_in): New function.
> (simplify_permutation): Likewise.
>
> (ssa_forward_propagate_and_combine): Call it.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/forwprop-19.c: New testcase.
>
> --
> Marc Glisse
>
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c (revision 190311)
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -2569,20 +2569,97 @@ combine_conversions (gimple_stmt_iterato
>   gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
>   update_stmt (stmt);
>   return remove_prop_source_from_use (op0) ? 2 : 1;
> }
> }
>  }
>
>return 0;
>  }
>
> +/* Return true if VAR has no nondebug use but in stmt.  */
> +static bool
> +is_only_used_in (const_tree var, const_gimple stmt)
> +{
> +  const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var));
> +  const ssa_use_operand_t *ptr = ptr0->next;
> +
> +  for (; ptr != ptr0; ptr = ptr->next)
> +{
> +  const_gimple use_stmt = USE_STMT (ptr);
> +  if (is_gimple_debug (use_stmt))
> +   continue;
> +  if (use_stmt != stmt)
> +   return false;
> +}
> +  return true;
> +}

if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt)

instead of the above.

> +/* Combine two shuffles in a row.  Returns 1 if there were any changes
> +   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
> +
> +static int
> +simplify_permutation (gimple_stmt_iterator *gsi)
> +{
> +  gimple stmt = gsi_stmt (*gsi);
> +  gimple def_stmt;
> +  tree op0, op1, op2, op3, mask;
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  enum tree_code code2;
> +  location_t loc = gimple_location (stmt);
> +
> +  gcc_checking_assert (code == VEC_PERM_EXPR);
> +
> +  op0 = gimple_assign_rhs1 (stmt);
> +  op1 = gimple_assign_rhs2 (stmt);
> +  op2 = gimple_assign_rhs3 (stmt);
> +
> +  if (TREE_CODE (op0) != SSA_NAME)
> +return 0;
> +
> +  if (TREE_CODE (op2) != VECTOR_CST)
> +return 0;
> +
> +  def_stmt = SSA_NAME_DEF_STMT (op0);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +  || !can_propagate_from (def_stmt)
> +  || !is_only_used_in (op0, stmt))

Or rather than the above, simply check

   || !has_single_use (op0)

here.

> +return 0;
> +
> +  /* Check that it is only used here. We cannot use has_single_use
> + since the expression is using it twice 

Re: [PATCH] Fix some undefined behavior spots in gcc sources (PR c/53968)

2012-08-13 Thread Richard Guenther
On Mon, Aug 13, 2012 at 2:01 PM, Jakub Jelinek  wrote:
> Hi!
>
> John Regehr discovered a couple of spots in GCC sources that invoke
> undefined behavior during bootstrap/regtest, the following patch fixes most
> of them.  I couldn't reproduce the diagnostic.c failure and would like to
> leave the ipa hunk to Honza, I think the probability/frequency code often
> might go out of the expected limits and then invoke undefined signed
> overflow.
>
> The double_int_mask change is because e.g. with -E, ptr_mode is VOIDmode and
> so double_int_mask is called with 0 precision during compiler
> initialization.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2012-08-13  Jakub Jelinek  
>
> PR c/53968
> * tree.c (integer_pow2p): Avoid undefined signed overflows.
> * simplify-rtx.c (neg_const_int): Likewise.
> * expr.c (fixup_args_size_notes): Likewise.
> * stor-layout.c (set_min_and_max_values_for_integral_type): Likewise.
> * double-int.c (mul_double_wide_with_sign): Likewise.
> (double_int_mask): Likewise.
> * tree-ssa-loop-ivopts.c (get_address_cost): Likewise.
>
> --- gcc/tree.c.jj   2012-08-10 15:48:53.0 +0200
> +++ gcc/tree.c  2012-08-13 10:25:46.610502929 +0200
> @@ -1849,7 +1849,7 @@ int
>  integer_pow2p (const_tree expr)
>  {
>int prec;
> -  HOST_WIDE_INT high, low;
> +  unsigned HOST_WIDE_INT high, low;
>
>STRIP_NOPS (expr);
>
> --- gcc/simplify-rtx.c.jj   2012-08-10 15:49:20.0 +0200
> +++ gcc/simplify-rtx.c  2012-08-13 09:51:43.628508537 +0200
> @@ -66,7 +66,7 @@ static rtx simplify_binary_operation_1 (
>  static rtx
>  neg_const_int (enum machine_mode mode, const_rtx i)
>  {
> -  return gen_int_mode (- INTVAL (i), mode);
> +  return gen_int_mode (-(unsigned HOST_WIDE_INT) INTVAL (i), mode);
>  }
>
>  /* Test whether expression, X, is an immediate constant that represents
> --- gcc/expr.c.jj   2012-08-10 15:49:07.0 +0200
> +++ gcc/expr.c  2012-08-13 10:40:01.182501639 +0200
> @@ -3828,7 +3828,7 @@ fixup_args_size_notes (rtx prev, rtx las
>
>add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (args_size));
>  #ifdef STACK_GROWS_DOWNWARD
> -  this_delta = -this_delta;
> +  this_delta = -(unsigned HOST_WIDE_INT) this_delta;
>  #endif
>args_size -= this_delta;
>  }
> --- gcc/stor-layout.c.jj	2012-08-10 15:49:20.0 +0200
> +++ gcc/stor-layout.c   2012-08-13 10:14:14.388505253 +0200
> @@ -2568,10 +2568,14 @@ set_min_and_max_values_for_integral_type
> = build_int_cst_wide (type,
>   (precision - HOST_BITS_PER_WIDE_INT > 0
>? -1
> -  : ((HOST_WIDE_INT) 1 << (precision - 1)) - 1),
> +  : (HOST_WIDE_INT)
> +(((unsigned HOST_WIDE_INT) 1
> +  << (precision - 1)) - 1)),
>   (precision - HOST_BITS_PER_WIDE_INT - 1 > 0
> -  ? (((HOST_WIDE_INT) 1
> -  << (precision - HOST_BITS_PER_WIDE_INT - 
> 1))) - 1
> +  ? (HOST_WIDE_INT)
> +		 ((((unsigned HOST_WIDE_INT) 1
> +   << (precision - HOST_BITS_PER_WIDE_INT
> +   - 1))) - 1)
>: 0));
>  }
>
> --- gcc/double-int.c.jj 2012-08-10 15:49:07.0 +0200
> +++ gcc/double-int.c	2012-08-13 11:24:17.816495757 +0200
> @@ -170,7 +170,7 @@ mul_double_wide_with_sign (unsigned HOST
> {
>   k = i + j;
>   /* This product is <= 0xFFFE0001, the sum <= 0x.  */
> - carry += arg1[i] * arg2[j];
> + carry += (unsigned HOST_WIDE_INT) arg1[i] * arg2[j];
>   /* Since prod[p] < 0x, this sum <= 0x.  */
>   carry += prod[k];
>   prod[k] = LOWPART (carry);
> @@ -625,7 +625,7 @@ double_int_mask (unsigned prec)
>else
>  {
>mask.high = 0;
> -  mask.low = ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1;
> +  mask.low = prec ? ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1 : 0;
>  }
>
>return mask;
> --- gcc/tree-ssa-loop-ivopts.c.jj   2012-08-10 15:49:07.0 +0200
> +++ gcc/tree-ssa-loop-ivopts.c  2012-08-13 10:17:51.227504425 +0200
> @@ -3173,7 +3173,7 @@ get_address_cost (bool symbol_present, b
>
>for (i = width; i >= 0; i--)
> {
> - off = -((HOST_WIDE_INT) 1 << i);
> + off = -((unsigned HOST_WIDE_INT) 1 << i);
>   XEXP (addr, 1) = gen_int_mode (off, address_mode);
>   if (memory_address_addr_space_p (mem_mode, addr, as))
> break;
> @@ -3182,7 +3182,7 @@ get_address_cost (bool symbol_present, b
>
>for (i = width; i >= 0; i--)
> {
> - off = ((HOST_WIDE_INT) 1 << i) - 1;
> + 

Re: complex.h

2012-08-13 Thread Rainer Orth
Paolo Carlini  writes:

> Marc Glisse  wrote:
>
>>To be honest, I only checked the patch on linux/glibc, so there is a
>>real 
>>risk on other platforms (which I don't have access to). I also did a
>>quick 
>>sanity check on freebsd (not a true test).
>
> Ok, conditioning the small change on glibc would not be a big deal, in
> case. Let's ask Rainer to double check Solaris, anyway.

The testcase from the PR fails on i386-pc-solaris2.11 with g++ 4.7.0,
but works with #undef complex added, for all of -std=c++98/03/11.  I
don't have an installed version of mainline, but will keep an eye for
failures during the next round of bootstraps.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Scheduler: Save state at the end of a block

2012-08-13 Thread Andrey Belevantsev

On 13.08.2012 14:32, Bernd Schmidt wrote:

This is a small patch for sched-rgn that attempts to save DFA state at
the end of a basic block and re-use it in successor blocks. This was a
customer-requested optimization; I've not seen it make much of a
difference in any macro benchmarks.


FWIW, this was definitely helpful for sel-sched on ia64, as far as I 
recall, and likewise on some of the smaller tests.


Andrey



Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK?


Bernd





[AArch64] Merge from upstream trunk r190154

2012-08-13 Thread Sofiane Naci
Hi,

I've just merged upstream trunk on the aarch64-branch up to r190335.

Thanks
Sofiane






Re: [PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread Uros Bizjak
On Mon, Aug 13, 2012 at 2:10 PM, Jakub Jelinek  wrote:

>> > To get the prefetchw cpuid flag, cpuid
>> > function 0x8001 needs to be called.
>> > Previous to patch, function 0x7 is called.
>> >
>> > Bootstrapping and "make -k check" passes without failures.
>> > Ok for trunk?
>>
>> IMHO you move it to a wrong spot, ecx bits of CPUID 0x8001 are tested
>> earlier.
>>
>> So I think you want this instead (bootstrap/regtest in progress):
>>
>> 2012-08-13  Ganesh Gopalasubramanian  
>>   Jakub Jelinek  
>>
>>   PR driver/54210
>>   * config/i386/driver-i386.c (host_detect_local_cpu): Test bit_PRFCHW
>>   bit of CPUID 0x8001 %ecx instead of CPUID 7 %ecx.
>>   * config/i386/cpuid.h (bits_PRFCHW): Move definition to CPUID
>>   0x8001 %ecx flags.
>
> Now bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, but I didn't find PRFCHW in the cpuid documentation to confirm the change.

Thanks,
Uros.


[PATCH] Remove basic_block->loop_depth

2012-08-13 Thread Richard Guenther

Accessing loop_depth (bb->loop_father) isn't very expensive.  The
following removes the duplicate info in basic-blocks which is not
properly kept up-to-date at the moment.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-08-13  Richard Guenther  

* basic-block.h (struct basic_block): Remove loop_depth
member, move flags and index members next to each other.
* cfgloop.h (bb_loop_depth): New inline function.
* cfghooks.c (split_block): Do not set loop_depth.
(duplicate_block): Likewise.
* cfgloop.c (flow_loop_nodes_find): Likewise.
(flow_loops_find): Likewise.
(add_bb_to_loop): Likewise.
(remove_bb_from_loops): Likewise.
* cfgrtl.c (force_nonfallthru_and_redirect): Likewise.
* gimple-streamer-in.c (input_bb): Do not stream loop_depth.
* gimple-streamer-out.c (output_bb): Likewise.
* bt-load.c: Include cfgloop.h.
(migrate_btr_defs): Use bb_loop_depth.
* cfg.c (dump_bb_info): Likewise.
* final.c (compute_alignments): Likewise.
* ira.c (update_equiv_regs): Likewise.
* tree-ssa-copy.c (init_copy_prop): Likewise.
* tree-ssa-dom.c (loop_depth_of_name): Likewise.
* tree-ssa-forwprop.c: Include cfgloop.h.
(forward_propagate_addr_expr): Use bb_loop_depth.
* tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
* tree-ssa-sink.c (select_best_block): Likewise.
* ipa-inline-analysis.c: Include cfgloop.h.
(estimate_function_body_sizes): Use bb_loop_depth.
* Makefile.in (tree-ssa-forwprop.o): Depend on $(CFGLOOP_H).
(ipa-inline-analysis.o): Likewise.
(bt-load.o): Likewise.

Index: trunk/gcc/basic-block.h
===
*** trunk.orig/gcc/basic-block.h	2012-08-13 13:58:13.0 +0200
--- trunk/gcc/basic-block.h 2012-08-13 14:03:09.438889233 +0200
*** struct GTY((chain_next ("%h.next_bb"), c
*** 160,173 
} GTY ((tag ("1"))) x;
  } GTY ((desc ("((%1.flags & BB_RTL) != 0)"))) il;
  
!   /* Expected number of executions: calculated in profile.c.  */
!   gcov_type count;
  
/* The index of this block.  */
int index;
  
!   /* The loop depth of this block.  */
!   int loop_depth;
  
/* Expected frequency.  Normalized to be in range 0 to BB_FREQ_MAX.  */
int frequency;
--- 160,173 
} GTY ((tag ("1"))) x;
  } GTY ((desc ("((%1.flags & BB_RTL) != 0)"))) il;
  
!   /* Various flags.  See cfg-flags.def.  */
!   int flags;
  
/* The index of this block.  */
int index;
  
!   /* Expected number of executions: calculated in profile.c.  */
!   gcov_type count;
  
/* Expected frequency.  Normalized to be in range 0 to BB_FREQ_MAX.  */
int frequency;
*** struct GTY((chain_next ("%h.next_bb"), c
*** 176,184 
   among several basic blocks that share a common locus, allowing for
   more accurate sample-based profiling.  */
int discriminator;
- 
-   /* Various flags.  See cfg-flags.def.  */
-   int flags;
  };
  
  /* This ensures that struct gimple_bb_info is smaller than
--- 176,181 
Index: trunk/gcc/cfghooks.c
===
*** trunk.orig/gcc/cfghooks.c   2012-08-13 13:58:13.0 +0200
--- trunk/gcc/cfghooks.c2012-08-13 14:03:09.438889233 +0200
*** split_block (basic_block bb, void *i)
*** 462,468 
  
new_bb->count = bb->count;
new_bb->frequency = bb->frequency;
-   new_bb->loop_depth = bb->loop_depth;
new_bb->discriminator = bb->discriminator;
  
if (dom_info_available_p (CDI_DOMINATORS))
--- 462,467 
*** duplicate_block (basic_block bb, edge e,
*** 985,991 
if (after)
  move_block_after (new_bb, after);
  
-   new_bb->loop_depth = bb->loop_depth;
new_bb->flags = bb->flags;
FOR_EACH_EDGE (s, ei, bb->succs)
  {
--- 984,989 
Index: trunk/gcc/cfgloop.c
===
*** trunk.orig/gcc/cfgloop.c	2012-08-13 13:58:13.0 +0200
--- trunk/gcc/cfgloop.c 2012-08-13 14:11:57.325870997 +0200
*** flow_loop_nodes_find (basic_block header
*** 229,238 
int num_nodes = 1;
edge latch;
edge_iterator latch_ei;
-   unsigned depth = loop_depth (loop);
  
header->loop_father = loop;
-   header->loop_depth = depth;
  
FOR_EACH_EDGE (latch, latch_ei, loop->header->preds)
  {
--- 229,236 
*** flow_loop_nodes_find (basic_block header
*** 243,249 
num_nodes++;
VEC_safe_push (basic_block, heap, stack, latch->src);
latch->src->loop_father = loop;
-   latch->src->loop_depth = depth;
  
while (!VEC_empty (basic_block, stack))
{
--- 241,246 
*** flow_loop_nodes_find (basic_block header
*** 260,266 
  if (ancest

[PATCH] Fix loop dumping ICEs

2012-08-13 Thread Richard Guenther

Bootstrapped on x86_64-unknown-linux-gnu, committed.

Richard.

2012-08-13  Richard Guenther  

* tree-cfg.c (print_loop): Avoid ICEing for loops marked for
removal and loops with multiple latches.

Index: gcc/tree-cfg.c
===================================================================
*** gcc/tree-cfg.c  (revision 190339)
--- gcc/tree-cfg.c  (working copy)
*** print_loop (FILE *file, struct loop *loo
*** 6870,6877 
s_indent[indent] = '\0';
  
/* Print loop's header.  */
!   fprintf (file, "%sloop_%d (header = %d, latch = %d", s_indent,
!  loop->num, loop->header->index, loop->latch->index);
fprintf (file, ", niter = ");
print_generic_expr (file, loop->nb_iterations, 0);
  
--- 6969,6986 
s_indent[indent] = '\0';
  
/* Print loop's header.  */
!   fprintf (file, "%sloop_%d (", s_indent, loop->num);
!   if (loop->header)
! fprintf (file, "header = %d", loop->header->index);
!   else
! {
!   fprintf (file, "deleted)\n");
!   return;
! }
!   if (loop->latch)
! fprintf (file, ", latch = %d", loop->latch->index);
!   else
! fprintf (file, ", multiple latches");
fprintf (file, ", niter = ");
print_generic_expr (file, loop->nb_iterations, 0);
  


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Richard Guenther wrote:


On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse  wrote:

On Fri, 10 Aug 2012, Marc Glisse wrote:


1) I am not sure we always want to combine permutations. Indeed, someone
(user? vectorizer?) may have written 2 permutations to help the backend
generate optimal code, whereas it will do a bad job on the complicated
combined permutation. At tree level, I am not sure what can be done except
possibly restrict the optimization to the case where the combined
permutation is the identity. At the RTL level, we could compare what insns
are generated, but that's going to be painful (doing anything with
permutations at the RTL level is painful).


I guess people will complain soon enough if this causes horrible 
performance regressions in vectorized code.



+/* Return true if VAR has no nondebug use but in stmt.  */
+static bool
+is_only_used_in (const_tree var, const_gimple stmt)
+{
+  const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var));
+  const ssa_use_operand_t *ptr = ptr0->next;
+
+  for (; ptr != ptr0; ptr = ptr->next)
+{
+  const_gimple use_stmt = USE_STMT (ptr);
+  if (is_gimple_debug (use_stmt))
+   continue;
+  if (use_stmt != stmt)
+   return false;
+}
+  return true;
+}


if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt)

instead of the above.


I don't think that works with statements that use a variable twice.


+/* Combine two shuffles in a row.  Returns 1 if there were any changes
+   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
+
+static int
+simplify_permutation (gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  gimple def_stmt;
+  tree op0, op1, op2, op3, mask;
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  enum tree_code code2;
+  location_t loc = gimple_location (stmt);
+
+  gcc_checking_assert (code == VEC_PERM_EXPR);
+
+  op0 = gimple_assign_rhs1 (stmt);
+  op1 = gimple_assign_rhs2 (stmt);
+  op2 = gimple_assign_rhs3 (stmt);
+
+  if (TREE_CODE (op0) != SSA_NAME)
+return 0;
+
+  if (TREE_CODE (op2) != VECTOR_CST)
+return 0;
+
+  def_stmt = SSA_NAME_DEF_STMT (op0);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+  || !can_propagate_from (def_stmt)
+  || !is_only_used_in (op0, stmt))


Or rather than the above, simply check

  || !has_single_use (op0)

here.


Then there was my previous (non-working) patch that used 
get_prop_source_stmt.



+return 0;
+
+  /* Check that it is only used here. We cannot use has_single_use
+ since the expression is using it twice itself...  */


Ah ... so then

  || num_imm_uses (op0) != 2


Ah, ok, that's simpler indeed, but there were such dire warnings to never 
use that evil function unless absolutely necessary that I didn't dare use 
it... Thanks for the permission.



+  code2 = gimple_assign_rhs_code (def_stmt);
+
+  /* Two consecutive shuffles.  */
+  if (code2 == VEC_PERM_EXPR)
+{
+  op3 = gimple_assign_rhs3 (def_stmt);
+  if (TREE_CODE (op3) != VECTOR_CST)
+   return 0;
+  mask = fold_build3_loc (loc, VEC_PERM_EXPR, TREE_TYPE (op3),
+ op3, op3, op2);
+  gcc_assert (TREE_CODE (mask) == VECTOR_CST);


which means you can use

 mask = fold_ternary (loc, ...);

Ok with those changes.


Thanks a lot, I'll do that.

Next step should be either BIT_FIELD_REF(VEC_PERM_EXPR) or 
VEC_PERM_EXPR(CONSTRUCTOR). Is there a good way to determine whether this 
kind of combination should be done forwards or backwards (i.e. start from 
VEC_PERM_EXPR and look if its single use is in BIT_FIELD_REF, or start 
from BIT_FIELD_REF and look if its argument is a VEC_PERM_EXPR only used 
here)?


--
Marc Glisse


Re: complex.h

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Rainer Orth wrote:


Paolo Carlini  writes:


Marc Glisse  ha scritto:


To be honest, I only checked the patch on linux/glibc, so there is a
real risk on other platforms (which I don't have access to). I also
did a quick sanity check on freebsd (not a true test).


Ok, conditioning the small change on glibc would not be a big deal,
if needed. Let's ask Rainer to double check Solaris, anyway.


The testcase from the PR fails on i386-pc-solaris2.11 with g++ 4.7.0,
but works with #undef complex added, for all of -std=c++98/03/11.


That should be fine then.

I don't have an installed version of mainline, but will keep an eye for 
failures during the next round of bootstraps.


Thank you,

--
Marc Glisse


Re: [PATCH,i386] cpuid function for prefetchw

2012-08-13 Thread H.J. Lu
On Mon, Aug 13, 2012 at 5:59 AM, Uros Bizjak  wrote:
> On Mon, Aug 13, 2012 at 2:10 PM, Jakub Jelinek  wrote:
>
>>> > To get the prefetchw cpuid flag, cpuid
>>> > function 0x80000001 needs to be called.
>>> > Previous to patch, function 0x7 is called.
>>> >
>>> > Bootstrapping and "make -k check" passes without failures.
>>> > Ok for trunk?
>>>
>>> IMHO you move it to the wrong spot, ecx bits of CPUID 0x80000001 are tested
>>> earlier.
>>>
>>> So I think you want this instead (bootstrap/regtest in progress):
>>>
>>> 2012-08-13  Ganesh Gopalasubramanian  
>>>   Jakub Jelinek  
>>>
>>>   PR driver/54210
>>>   * config/i386/driver-i386.c (host_detect_local_cpu): Test bit_PRFCHW
>>>   bit of CPUID 0x80000001 %ecx instead of CPUID 7 %ecx.
>>>   * config/i386/cpuid.h (bits_PRFCHW): Move definition to CPUID
>>>   0x80000001 %ecx flags.
>>
>> Now bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> OK, but I didn't find PRFCHW in the cpuid documentation to confirm the change.
>
> Thanks,
> Uros.

It is in Intel AVX spec:

http://software.intel.com/en-us/avx/

-- 
H.J.


Re: combine permutations in gimple

2012-08-13 Thread Richard Guenther
On Mon, Aug 13, 2012 at 3:03 PM, Marc Glisse  wrote:
> On Mon, 13 Aug 2012, Richard Guenther wrote:
>
>> On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse  wrote:
>>>
>>> On Fri, 10 Aug 2012, Marc Glisse wrote:
>>>
 1) I am not sure we always want to combine permutations. Indeed, someone
 (user? vectorizer?) may have written 2 permutations to help the backend
 generate optimal code, whereas it will do a bad job on the complicated
 combined permutation. At tree level, I am not sure what can be done
 except
 possibly restrict the optimization to the case where the combined
 permutation is the identity. At the RTL level, we could compare what
 insns
 are generated, but that's going to be painful (doing anything with
 permutations at the RTL level is painful).
>
>
> I guess people will complain soon enough if this causes horrible performance
> regressions in vectorized code.
>
>
>>> +/* Return true if VAR has no nondebug use but in stmt.  */
>>> +static bool
>>> +is_only_used_in (const_tree var, const_gimple stmt)
>>> +{
>>> +  const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var));
>>> +  const ssa_use_operand_t *ptr = ptr0->next;
>>> +
>>> +  for (; ptr != ptr0; ptr = ptr->next)
>>> +{
>>> +  const_gimple use_stmt = USE_STMT (ptr);
>>> +  if (is_gimple_debug (use_stmt))
>>> +   continue;
>>> +  if (use_stmt != stmt)
>>> +   return false;
>>> +}
>>> +  return true;
>>> +}
>>
>>
>> if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt)
>>
>> instead of the above.
>
>
> I don't think that works with statements that use a variable twice.
>
>
>>> +/* Combine two shuffles in a row.  Returns 1 if there were any changes
>>> +   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
>>> +
>>> +static int
>>> +simplify_permutation (gimple_stmt_iterator *gsi)
>>> +{
>>> +  gimple stmt = gsi_stmt (*gsi);
>>> +  gimple def_stmt;
>>> +  tree op0, op1, op2, op3, mask;
>>> +  enum tree_code code = gimple_assign_rhs_code (stmt);
>>> +  enum tree_code code2;
>>> +  location_t loc = gimple_location (stmt);
>>> +
>>> +  gcc_checking_assert (code == VEC_PERM_EXPR);
>>> +
>>> +  op0 = gimple_assign_rhs1 (stmt);
>>> +  op1 = gimple_assign_rhs2 (stmt);
>>> +  op2 = gimple_assign_rhs3 (stmt);
>>> +
>>> +  if (TREE_CODE (op0) != SSA_NAME)
>>> +return 0;
>>> +
>>> +  if (TREE_CODE (op2) != VECTOR_CST)
>>> +return 0;
>>> +
>>> +  def_stmt = SSA_NAME_DEF_STMT (op0);
>>> +  if (!def_stmt || !is_gimple_assign (def_stmt)
>>> +  || !can_propagate_from (def_stmt)
>>> +  || !is_only_used_in (op0, stmt))
>>
>>
>> Or rather than the above, simply check
>>
>>   || !has_single_use (op0)
>>
>> here.
>
>
> Then there was my previous (non-working) patch that used
> get_prop_source_stmt.
>
>
>>> +return 0;
>>> +
>>> +  /* Check that it is only used here. We cannot use has_single_use
>>> + since the expression is using it twice itself...  */
>>
>>
>> Ah ... so then
>>
>>   || num_imm_uses (op0) != 2
>
>
> Ah, ok, that's simpler indeed, but there were such dire warnings to never
> use that evil function unless absolutely necessary that I didn't dare use
> it... Thanks for the permission.

If your new predicate would match more places (can you do a quick search?)
then we want it in tree-flow-inline.h instead of in
tree-ssa-forwprop.c.  But yes,
num_imm_uses can be expensive.   For now just stick with the above.

>>> +  code2 = gimple_assign_rhs_code (def_stmt);
>>> +
>>> +  /* Two consecutive shuffles.  */
>>> +  if (code2 == VEC_PERM_EXPR)
>>> +{
>>> +  op3 = gimple_assign_rhs3 (def_stmt);
>>> +  if (TREE_CODE (op3) != VECTOR_CST)
>>> +   return 0;
>>> +  mask = fold_build3_loc (loc, VEC_PERM_EXPR, TREE_TYPE (op3),
>>> + op3, op3, op2);
>>> +  gcc_assert (TREE_CODE (mask) == VECTOR_CST);
>>
>>
>> which means you can use
>>
>>  mask = fold_ternary (loc, ...);
>>
>> Ok with that changes.
>
>
> Thanks a lot, I'll do that.
>
> Next step should be either BIT_FIELD_REF(VEC_PERM_EXPR) or
> VEC_PERM_EXPR(CONSTRUCTOR). Is there a good way to determine whether this
> kind of combination should be done forwards or backwards (i.e. start from
> VEC_PERM_EXPR and look if its single use is in BIT_FIELD_REF, or start from
> BIT_FIELD_REF and look if its argument is a VEC_PERM_EXPR only used here?

The natural SSA (and forwprop) way is to look for BIT_FIELD_REF and see
if the def stmt is a VEC_PERM_EXPR.

Richard.

> --
> Marc Glisse


Re: combine permutations in gimple

2012-08-13 Thread Ramana Radhakrishnan
>
> I guess people will complain soon enough if this causes horrible performance
> regressions in vectorized code.

Not having looked at your patch in great detail, surely what we don't
want is a situation where 2 constant permutations are converted into
one generic permute. Based on a quick read of your patch I couldn't
work that out.  It might be that 2 constant permutes are cheaper than
a generic permute. Have you looked at any examples in that space?  I
surely wouldn't like to see a sequence of interleave / transpose
change into a generic permute operation on Neon, as that would be far
more expensive than this.  It surely needs more testing than just
this bit before going in, the reason being that this would likely take
more registers and indeed produce loads from a constant pool for the new
mask.

regards,
Ramana


Re: combine permutations in gimple

2012-08-13 Thread Richard Guenther
On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
 wrote:
>>
>> I guess people will complain soon enough if this causes horrible performance
>> regressions in vectorized code.
>
> Not having looked at your patch in great detail,. surely what we don't
> want is a situation where 2 constant permutations are converted into
> one generic permute. Based on a quick read of your patch I couldn't
> work that out.  It might be that 2 constant  permutes are cheaper than
> a generic permute. Have you looked at any examples in that space . I
> surely wouldn't like to see a sequence of interleave / transpose
> change into a generic permute operation on Neon as that would be far
> more expensive than this.  It surely needs more testting than just
> this bit before going in. The reason being that this would likely take
> more registers and indeed produce loads of a constant pool for the new
> mask.

The patch does not do that.  It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.

Richard.

> regards,
> Ramana


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Richard Guenther wrote:


On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
 wrote:


I guess people will complain soon enough if this causes horrible performance
regressions in vectorized code.


Not having looked at your patch in great detail,. surely what we don't
want is a situation where 2 constant permutations are converted into
one generic permute. Based on a quick read of your patch I couldn't
work that out.  It might be that 2 constant  permutes are cheaper than
a generic permute. Have you looked at any examples in that space . I
surely wouldn't like to see a sequence of interleave / transpose
change into a generic permute operation on Neon as that would be far
more expensive than this.  It surely needs more testting than just
this bit before going in. The reason being that this would likely take
more registers and indeed produce loads of a constant pool for the new
mask.


What do you call constant / non-constant? The combined permutation is 
still constant, although the expansion (in the back-end) might fail to 
expand it efficiently and fall back to the generic permutation 
expansion...



The patch does not do that.  It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.


Which, to be honest, is false on all platforms I know, although I did 
contribute some minor enhancements for x86.


--
Marc Glisse


Re: combine permutations in gimple

2012-08-13 Thread Jakub Jelinek
On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:
> On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
>  wrote:
> >>
> >> I guess people will complain soon enough if this causes horrible 
> >> performance
> >> regressions in vectorized code.
> >
> > Not having looked at your patch in great detail,. surely what we don't
> > want is a situation where 2 constant permutations are converted into
> > one generic permute. Based on a quick read of your patch I couldn't
> > work that out.  It might be that 2 constant  permutes are cheaper than
> > a generic permute. Have you looked at any examples in that space . I
> > surely wouldn't like to see a sequence of interleave / transpose
> > change into a generic permute operation on Neon as that would be far
> > more expensive than this.  It surely needs more testting than just
> > this bit before going in. The reason being that this would likely take
> > more registers and indeed produce loads of a constant pool for the new
> > mask.
> 
> The patch does not do that.  It merely assumes that the target knows
> how to perform an optimal constant permute and that two constant
> permutes never generate better code than a single one.

Still, the patch should do some tests whether it is beneficial.
At least a can_vec_perm_p (mode, false, sel) test of the resulting
permutation if both the original permutations pass that test, and perhaps
additionally if targetm.vectorize.vec_perm_const_ok is non-NULL and
passes for both the original permutations then it should also pass
for the resulting permutation.

Jakub


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Richard Guenther wrote:


+  /* Check that it is only used here. We cannot use has_single_use
+ since the expression is using it twice itself...  */


Ah ... so then

  || num_imm_uses (op0) != 2


Ah, ok, that's simpler indeed, but there were such dire warnings to never
use that evil function unless absolutely necessary that I didn't dare use
it... Thanks for the permission.


If your new predicate would match more places (can you do a quick search?)


You mean: if there are more optimizations that either already check for 
double use in the same statement, or could benefit from doing so? I'll 
take a look.



then we want it in tree-flow-inline.h instead of in
tree-ssa-forwprop.c.  But yes,
num_imm_uses can be expensive.   For now just stick with the above.


I assume "the above" means "num_imm_uses (op0) != 2", since both versions 
are above ;-)



The natural SSA (and forwprop) way is to look for BIT_FIELD_REF and see
if the def stmt is a VEC_PERM_EXPR.


Thanks again,

--
Marc Glisse


Re: RFC: fix std::unique_ptr pretty-printer

2012-08-13 Thread Tom Tromey
> "Jonathan" == Jonathan Wakely  writes:

>> $11 = std::unique_ptr containing (datum *) 0x6067d0

Jonathan> It's inconsistent with the other printers in that it prints
Jonathan> the stored type, unlike e.g. std::vector which just says
Jonathan> "std::vector of length ..." but I think that's an improvement.

Yeah... without this bit it was just printing

$11 = std::unique_ptr containing 0x6067d0

Ordinarily, gdb will print the type here; but it doesn't when called
from Python.  I thought the typical output was easier to read.

Jonathan> Personally I'd prefer the element_type as part of the type, e.g.
Jonathan> "std::unique_ptr = 0x6067d0" but that would be even more
Jonathan> inconsistent!

I can make that change if you'd prefer.
I don't know why, but I didn't even think of it.

Tom


[Patch, fortran] PR46897 - [OOP] type-bound defined ASSIGNMENT(=) not used for derived type component in intrinsic assign

2012-08-13 Thread Paul Richard Thomas
Dear All,

Please find attached a patch and testcase for the above PR.  The
comment before generate_component_assignments explains the need for
the patch, which itself is fairly self explanatory.

Bootstrapped and regtested on Fc9/x86_64 - OK for trunk?

Best regards

Paul and Alessandro.

2012-08-13   Alessandro Fanfarillo 
 Paul Thomas  

PR fortran/46897
* resolve.c (add_comp_ref): New function.
(generate_component_assignments): New function that calls
add_comp_ref.
(resolve_code): Call generate_component_assignments.

2012-08-13   Alessandro Fanfarillo 
 Paul Thomas  

PR fortran/46897
* gfortran.dg/defined_assignment_1.f90: New test.
Index: gcc/fortran/resolve.c
===================================================================
*** gcc/fortran/resolve.c	(revision 190338)
--- gcc/fortran/resolve.c	(working copy)
*** resolve_ordinary_assign (gfc_code *code,
*** 9485,9490 
--- 9485,9614 
  }
  
  
+ /* Add a component reference onto an expression.  */
+ 
+ static void
+ add_comp_ref (gfc_expr *e, gfc_component *c)
+ {
+   gfc_ref **ref;
+   ref = &(e->ref);
+   while (*ref)
+ ref = &((*ref)->next);
+   *ref = gfc_get_ref();
+   (*ref)->type = REF_COMPONENT;
+   (*ref)->u.c.sym = c->ts.u.derived;
+   (*ref)->u.c.component = c;
+   e->ts = c->ts;
+ }
+ 
+ 
+ /* Implement 7.2.1.3 of the F08 standard:
+"An intrinsic assignment where the variable is of derived type is
+performed as if each component of the variable were assigned from the
+corresponding component of expr using pointer assignment (7.2.2) for
+each pointer component, defined assignment for each nonpointer
+nonallocatable component of a type that has a type-bound defined
+assignment consistent with the component, intrinsic assignment for
+each other nonpointer nonallocatable component, ..." 
+ 
+The pointer assignments are taken care of by the intrinsic
+assignment of the structure itself.  This function recursively adds
+defined assignments where required.  */
+ 
+ static void
+ generate_component_assignments (gfc_code **code, gfc_namespace *ns)
+ {
+   gfc_component *comp1, *comp2;
+   gfc_code *this_code, *next, *root, *previous;
+ 
+   /* Filter out continuing processing after an error.  */
+   if ((*code)->expr1->ts.type != BT_DERIVED
+   || (*code)->expr2->ts.type != BT_DERIVED)
+ return;
+ 
+   comp1 = (*code)->expr1->ts.u.derived->components;
+   comp2 = (*code)->expr2->ts.u.derived->components;
+ 
+   for (; comp1; comp1 = comp1->next, comp2 = comp2->next)
+ {
+   if (comp1->ts.type != BT_DERIVED
+ 	  || comp1->ts.u.derived == NULL
+ 	  || (comp1->attr.pointer || comp1->attr.allocatable)
+ 	  || (*code)->expr1->ts.u.derived == comp1->ts.u.derived)
+ 	continue;
+ 
+   /* Make an assigment for this component.  */
+   this_code = gfc_get_code ();
+   this_code->op = EXEC_ASSIGN;
+   this_code->next = NULL;
+   this_code->expr1 = gfc_copy_expr ((*code)->expr1);
+   this_code->expr2 = gfc_copy_expr ((*code)->expr2);
+ 
+   add_comp_ref (this_code->expr1, comp1);
+   add_comp_ref (this_code->expr2, comp2);
+ 
+   root = this_code;
+ 
+   /* Convert the assignment if there is a defined assignment for
+ 	 this type.  Otherwise, recurse into its components.  */
+   if (resolve_ordinary_assign (this_code, ns)
+ 	  && this_code->op == EXEC_COMPCALL)
+ 	resolve_typebound_subroutine (this_code);
+   else if (this_code && this_code->op == EXEC_ASSIGN)
+ 	generate_component_assignments (&this_code, ns);
+ 
+   previous = NULL;
+   this_code = root;
+ 
+   /* Go through the code chain eliminating all but calls to
+ 	 typebound procedures. Since we have been through
+ 	 resolve_typebound_subroutine. */
+   for (; this_code; this_code = this_code->next)
+ 	{
+ 	  if (this_code->op == EXEC_ASSIGN_CALL)
+ 	{
+ 	  gfc_symbol *fsym = this_code->symtree->n.sym->formal->sym;
+ 	  /* Check that there is a defined assignment.  If so, then
+ 	 resolve the call.  */
+ 	  if (fsym->ts.type == BT_CLASS
+ 		  && CLASS_DATA (fsym)->ts.u.derived->f2k_derived
+ 		  && CLASS_DATA (fsym)->ts.u.derived->f2k_derived
+ 			->tb_op[INTRINSIC_ASSIGN])
+ 		{
+ 		  resolve_call (this_code);
+ 		  goto next;
+ 		}
+ 	}
+ 
+ 	  next = this_code->next;
+ 	  if (this_code == root)
+ 	root = next;
+ 	  else
+ 	previous->next = next;
+ 
+ 	  next = this_code;
+ 	  next->next = NULL;
+ 	  gfc_free_statements (next);
+ 	next:
+ 	  previous = this_code;
+ 	}
+ 
+   /* Now attach the remaining code chain to the input code. Step on
+ 	 to the end of the new code since resolution is complete.  */
+   if (root)
+ 	{
+ 	  next = (*code)->next;
+ 	  (*code)->next = root;
+ 	  for (;root; root = root->next)
+ 	if (!root->next)
+ 	  break;
+ 	  root->next = next;
+ 	  *code = root;
+ 	}
+}
+ }
+ 
+ 
  /* Given a block of co

Re: [cxx-conversion] Support garbage-collected C++ templates

2012-08-13 Thread Diego Novillo
On Sun, Aug 12, 2012 at 11:28 PM, Laurynas Biveinis
 wrote:

> I'm referring to the very first part of gty.texi, section 22 before
> the subsection table of contents:
> http://gcc.gnu.org/onlinedocs/gccint/Type-Information.html#Type-Information.
> It talks about C, structs and unions and it will need updating about
> C++ support in gengtype.

Ah, thanks.  I've added this patch to the merge image.  I will update
the documentation to include 'class' when I update gengtype later on.


Diego.

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 3754b75..614dae3 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -13,17 +13,17 @@ involve determining information about GCC's data
structures from GCC's
 source code and using this information to perform garbage collection and
 implement precompiled headers.

-A full C parser would be too complicated for this task, so a limited
-subset of C is interpreted and special markers are used to determine
-what parts of the source to look at.  All @code{struct} and
-@code{union} declarations that define data structures that are
-allocated under control of the garbage collector must be marked.  All
-global variables that hold pointers to garbage-collected memory must
-also be marked.  Finally, all global variables that need to be saved
-and restored by a precompiled header must be marked.  (The precompiled
-header mechanism can only save static variables if they're scalar.
-Complex data structures must be allocated in garbage-collected memory
-to be saved in a precompiled header.)
+A full C++ parser would be too complicated for this task, so a limited
+subset of C++ is interpreted and special markers are used to determine
+what parts of the source to look at.  All @code{struct}, @code{union}
+and @code{template} structure declarations that define data structures
+that are allocated under control of the garbage collector must be
+marked.  All global variables that hold pointers to garbage-collected
+memory must also be marked.  Finally, all global variables that need
+to be saved and restored by a precompiled header must be marked.  (The
+precompiled header mechanism can only save static variables if they're
+scalar. Complex data structures must be allocated in garbage-collected
+memory to be saved in a precompiled header.)

 The full format of a marker is
 @smallexample


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Jakub Jelinek wrote:


On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:

The patch does not do that.  It merely assumes that the target knows
how to perform an optimal constant permute and that two constant
permutes never generate better code than a single one.


Still, the patch should do some tests whether it is beneficial.
At least a can_vec_perm_p (mode, false, sel) test of the resulting
permutation if both the original permutations pass that test,


Sounds good. The last argument should be the constant permutation vector, 
presented as an array of indices stored in unsigned char? I hadn't 
realized we already had access to backend information that early in the 
compilation. It doesn't give the cost though.



and perhaps
additionally if targetm.vectorize.vec_perm_const_ok is non-NULL and
passes for both the original permutations then it should also pass
for the resulting permutation.


Isn't that implied by the previous test (I see calls to vec_perm_const_ok 
in there)? Or maybe not quite?


--
Marc Glisse


Re: combine permutations in gimple

2012-08-13 Thread Jakub Jelinek
On Mon, Aug 13, 2012 at 03:45:00PM +0200, Marc Glisse wrote:
> On Mon, 13 Aug 2012, Jakub Jelinek wrote:
> 
> >On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:
> >>The patch does not do that.  It merely assumes that the target knows
> >>how to perform an optimal constant permute and that two constant
> >>permutes never generate better code than a single one.
> >
> >Still, the patch should do some tests whether it is beneficial.
> >At least a can_vec_perm_p (mode, false, sel) test of the resulting
> >permutation if both the original permutations pass that test,
> 
> Sounds good. The last argument should be the constant permutation
> vector, presented as an array of indices stored in unsigned char? I
> hadn't realized we already had access to backend information that
> early in the compilation. It doesn't give the cost though.

Yeah, it doesn't give the cost, just whether it is supported at all.

> >and perhaps
> >additionally if targetm.vectorize.vec_perm_const_ok is non-NULL and
> >passes for both the original permutations then it should also pass
> >for the resulting permutation.
> 
> Isn't that implied by the previous test (I see calls to
> vec_perm_const_ok in there)? Or maybe not quite?

can_vec_perm_p can also return true if e.g. the target supports generic
variable permutation for the mode in question.
So the targetm.vectorize.vec_perm_const_ok check is stricter than that,
it tells you whether it can be supported by some constant permutation (or a
sequence of them, you know how the i386 code for that is complicated).

Jakub


Re: combine permutations in gimple

2012-08-13 Thread Ramana Radhakrishnan
On 13 August 2012 14:21, Marc Glisse  wrote:
> On Mon, 13 Aug 2012, Richard Guenther wrote:
>
>> On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
>>  wrote:


 I guess people will complain soon enough if this causes horrible
 performance
 regressions in vectorized code.
>>>
>>>
>>> Not having looked at your patch in great detail,. surely what we don't
>>> want is a situation where 2 constant permutations are converted into
>>> one generic permute. Based on a quick read of your patch I couldn't
>>> work that out.  It might be that 2 constant  permutes are cheaper than
>>> a generic permute. Have you looked at any examples in that space . I
>>> surely wouldn't like to see a sequence of interleave / transpose
>>> change into a generic permute operation on Neon as that would be far
>>> more expensive than this.  It surely needs more testting than just
>>> this bit before going in. The reason being that this would likely take
>>> more registers and indeed produce loads of a constant pool for the new
>>> mask.
>
>
> What do you call constant / non-constant? The combined permutation is still
> constant, although the expansion (in the back-end) might fail to expand it
> efficiently and fall back to the generic permutation expansion...


That is exactly what I was worried about. By constant I would expect
something that is expanded as a constant permute by the backend - an
interleave operation or a transpose operation or indeed some of the
funky operations as below in ARM / Neon, which is the SIMD architecture
I'm most familiar with.


If you had something that did the following :


uint8x8_t
tst_vrev642_u8 (uint8x8_t __a)
{
  uint8x8_t __rv;
  uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0};
  uint8x8_t __mask2 =  { 0, 8, 2, 10, 4, 12, 6, 14 };
  return __builtin_shuffle ( __builtin_shuffle ( __a, __mask1), __mask2) ;
}



I would expect these instructions

vrev64.8  d0, d0
vmov  d16, d0  @ v8qi
vtrn.8  d0, d16
bx  lr

With your patch I see

tst_vrev642_u8:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fldd  d16, .L2
vtbl.8  d0, {d0}, d16
bx  lr
.L3:
.align  3
.L2:
.byte   7
.byte   7
.byte   5
.byte   5
.byte   3
.byte   3
.byte   1
.byte   1

It might be that the backend predicates need tightening in this case,
but why not try to get cost info about such combinations rather than
just doing this gratuitously?  In this case the ARM port might be
wrong, but in general when the vector permute rewrites were done
we chose to go ahead and keep the generic constant permutes in the
backend as the last resort to fall back to. However, if fwprop starts
making such transformations, really one ought to get relative costs for
each of these operations rather than allowing gratuitous replacements.



Thanks,
Ramana


Re: combine permutations in gimple

2012-08-13 Thread Ramana Radhakrishnan
On 13 August 2012 14:54, Jakub Jelinek  wrote:
> On Mon, Aug 13, 2012 at 03:45:00PM +0200, Marc Glisse wrote:
>> On Mon, 13 Aug 2012, Jakub Jelinek wrote:
>>
>> >On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:
>> >>The patch does not do that.  It merely assumes that the target knows
>> >>how to perform an optimal constant permute and that two constant
>> >>permutes never generate better code than a single one.
>> >
>> >Still, the patch should do some tests whether it is beneficial.
>> >At least a can_vec_perm_p (mode, false, sel) test of the resulting
>> >permutation if both the original permutations pass that test,
>>
>> Sounds good. The last argument should be the constant permutation
>> vector, presented as an array of indices stored in unsigned char? I
>> hadn't realized we already had access to backend information that
>> early in the compilation. It doesn't give the cost though.
>
> Yeah, it doesn't give the cost, just whether it is supported at all.


Maybe we need an interface of that form. A generic permute operation
used for constant permute operations is going to be more expensive
than more specialized constant permute operations. It might be that
this cost gets amortized over a large number of operations at which
point it makes sense but the compiler should make this transformation
based on cost rather than just whether something is supported or not.

Ramana


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Ramana Radhakrishnan wrote:


On 13 August 2012 14:21, Marc Glisse  wrote:

On Mon, 13 Aug 2012, Richard Guenther wrote:


On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
 wrote:



I guess people will complain soon enough if this causes horrible
performance
regressions in vectorized code.



Not having looked at your patch in great detail, surely what we don't
want is a situation where 2 constant permutations are converted into
one generic permute. Based on a quick read of your patch I couldn't
work that out.  It might be that 2 constant permutes are cheaper than
a generic permute. Have you looked at any examples in that space?  I
surely wouldn't like to see a sequence of interleave / transpose
change into a generic permute operation on Neon, as that would be far
more expensive than this.  It surely needs more testing than just
this bit before going in, the reason being that this would likely take
more registers and indeed produce loads from a constant pool for the new
mask.



What do you call constant / non-constant? The combined permutation is still
constant, although the expansion (in the back-end) might fail to expand it
efficiently and fall back to the generic permutation expansion...



That is exactly what I was worried about. By constant I would expect
something that is expanded as a constant permute by the backend - an
interleave operation or a transpose operation or indeed some of the
funky operations as below in ARM / Neon, which is the SIMD architecture
I'm most familiar with.


If you had something that did the following :


uint8x8_t
tst_vrev642_u8 (uint8x8_t __a)
{
 uint8x8_t __rv;
 uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0};
 uint8x8_t __mask2 =  { 0, 8, 2, 10, 4, 12, 6, 14 };
 return __builtin_shuffle ( __builtin_shuffle ( __a, __mask1), __mask2) ;
}



I would expect these instructions

vrev64.8    d0, d0
vmov        d16, d0  @ v8qi
vtrn.8      d0, d16
bx          lr

With your patch I see

tst_vrev642_u8:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
fldd    d16, .L2
vtbl.8  d0, {d0}, d16
bx  lr
.L3:
.align  3
.L2:
.byte   7
.byte   7
.byte   5
.byte   5
.byte   3
.byte   3
.byte   1
.byte   1


Seems to be one instruction shorter at least ;-) Yes, there can be much 
worse regressions than that because of the patch (like 40 instructions 
instead of 4, in the x86 backend).



It might be that the backend predicates need tightening in this case,
but why not try to get cost info about such combinations rather than
just doing this gratuitously?


I don't think the word gratuitous is appropriate. The middle-end replaces 
a + -b with a - b without first asking the backend whether it might be more 
efficient. One permutation is better than two. It just happens that the 
range of possible permutations is too large (and the basic instructions 
are too strange) for backends to do a good job on them, and thus keeping 
the toplevel input as a hint is important.



In this case the ARM port might
be wrong, but in general, when the vector permute rewrites were done,
we chose to go ahead and keep the generic constant permutes in the
backend as the last resort to fall back to. However, if fwprop starts
making such transformations, one really ought to get relative costs for
each of these operations rather than allowing gratuitous replacements.


Indeed, having costs would help, but that's a lot of work.

As you can see from the original email, I am willing to limit the 
optimization to the case where the combined permutation is the identity 
(yes, that's quite drastic...). I should probably implement that special 
case anyway, because it doesn't require its argument to have a single use 
for the optimization to make sense.
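
That identity special case can be sketched as follows (a hypothetical helper illustrating the restricted optimization, not the actual patch code): the combination is applied only when the two masks cancel out.

```cpp
#include <cassert>
#include <cstddef>

// Return true iff shuffling by mask1 and then by mask2 is the identity,
// i.e. mask1[mask2[i] % n] == i for every lane.  Under that restriction
// the pair of shuffles can be deleted outright, with no new mask needed.
static bool
combined_is_identity (const unsigned char *mask1, const unsigned char *mask2,
                      std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    if (mask1[mask2[i] % n] != i)
      return false;
  return true;
}
```

For example, two successive reversals ({7,...,0} twice) pass the check, while the vrev64/vtrn pair above does not.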


--
Marc Glisse


[Patch, fortran] PR 47586 Missing deep copy when assigning from a function returning a pointer.

2012-08-13 Thread Mikael Morin
Hello,

here is a fix for PR47586: missing deep copy for the case:

dt_w_alloc = ptr_func(arg)

where dt_w_alloc is of derived type with allocatable components, and
ptr_func returns a data pointer.
The fix tweaks expr_is_variable so that gfc_trans_scalar_assign is
called with the flag enabling deep copy set.

I added a few fixes loosely related before, so that the patches are as
follows:

1/4: gfc_is_proc_ptr_comp interface change,
2/4: gfc_is_scalar_ptr deep_copy flag lengthy explanation,
3/4: regression fix,
4/4: patch fixing the PR.

Regression-tested on x86_64-unknown-linux-gnu. OK for trunk?

Mikael

gfc_is_proc_ptr_comp has a side effect: if the expression references
a procedure pointer component, it returns true and assigns the component
to its second argument.

As I don't like side effects, this patch removes the second argument and
replaces the cases where it is useful by a call to (the new function)
gfc_get_proc_ptr_comp.

This is optional: I can adjust the patch depending on it (patch 4) to do
it the old way if it's preferred.

OK?

2012-08-13  Mikael Morin  

	* gfortran.h (gfc_get_proc_ptr_comp): New prototype.
	(gfc_is_proc_ptr_comp): Update prototype.
	* expr.c (gfc_get_proc_ptr_comp): New function based on the old
	gfc_is_proc_ptr_comp.
	(gfc_is_proc_ptr_comp): Call gfc_get_proc_ptr_comp.
	(gfc_specification_expr, gfc_check_pointer_assign): Use
	gfc_get_proc_ptr_comp.
	* trans-array.c (gfc_walk_function_expr): Likewise.
	* resolve.c (resolve_structure_cons, update_ppc_arglist,
	resolve_ppc_call, resolve_expr_ppc): Likewise.
	(resolve_function): Update call to gfc_is_proc_ptr_comp.
	* dump-parse-tree.c (show_expr): Likewise.
	* interface.c (compare_actual_formal): Likewise.
	* match.c (gfc_match_pointer_assignment): Likewise.
	* primary.c (gfc_match_varspec): Likewise.
	* trans-io.c (gfc_trans_transfer): Likewise.
	* trans-expr.c (gfc_conv_variable, conv_function_val,
	conv_isocbinding_procedure, gfc_conv_procedure_call,
	gfc_trans_pointer_assignment): Likewise.
	(gfc_conv_procedure_call, gfc_trans_array_func_assign):
	Use gfc_get_proc_ptr_comp.

diff --git a/dump-parse-tree.c b/dump-parse-tree.c
index 681dc8d..cb8fab4 100644
--- a/dump-parse-tree.c
+++ b/dump-parse-tree.c
@@ -569,7 +569,7 @@ show_expr (gfc_expr *p)
   if (p->value.function.name == NULL)
 	{
 	  fprintf (dumpfile, "%s", p->symtree->n.sym->name);
-	  if (gfc_is_proc_ptr_comp (p, NULL))
+	  if (gfc_is_proc_ptr_comp (p))
 	show_ref (p->ref);
 	  fputc ('[', dumpfile);
 	  show_actual_arglist (p->value.function.actual);
@@ -578,7 +578,7 @@ show_expr (gfc_expr *p)
   else
 	{
 	  fprintf (dumpfile, "%s", p->value.function.name);
-	  if (gfc_is_proc_ptr_comp (p, NULL))
+	  if (gfc_is_proc_ptr_comp (p))
 	show_ref (p->ref);
 	  fputc ('[', dumpfile);
 	  fputc ('[', dumpfile);
diff --git a/expr.c b/expr.c
index cb5e1c6..18e8b5b 100644
--- a/expr.c
+++ b/expr.c
@@ -2965,12 +2965,12 @@ gfc_specification_expr (gfc_expr *e)
   return FAILURE;
 }
 
+  comp = gfc_get_proc_ptr_comp (e);
   if (e->expr_type == EXPR_FUNCTION
-	  && !e->value.function.isym
-	  && !e->value.function.esym
-	  && !gfc_pure (e->symtree->n.sym)
-	  && (!gfc_is_proc_ptr_comp (e, &comp)
-	  || !comp->attr.pure))
+  && !e->value.function.isym
+  && !e->value.function.esym
+  && !gfc_pure (e->symtree->n.sym)
+  && (!comp || !comp->attr.pure))
 {
   gfc_error ("Function '%s' at %L must be PURE",
 		 e->symtree->n.sym->name, &e->where);
@@ -3478,12 +3478,14 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue)
 	}
 	}
 
-  if (gfc_is_proc_ptr_comp (lvalue, &comp))
+  comp = gfc_get_proc_ptr_comp (lvalue);
+  if (comp)
 	s1 = comp->ts.interface;
   else
 	s1 = lvalue->symtree->n.sym;
 
-  if (gfc_is_proc_ptr_comp (rvalue, &comp))
+  comp = gfc_get_proc_ptr_comp (rvalue);
+  if (comp)
 	{
 	  s2 = comp->ts.interface;
 	  name = comp->name;
@@ -4058,31 +4060,35 @@ gfc_expr_set_symbols_referenced (gfc_expr *expr)
 }
 
 
-/* Determine if an expression is a procedure pointer component. If yes, the
-   argument 'comp' will point to the component (provided that 'comp' was
-   provided).  */
+/* Determine if an expression is a procedure pointer component and return
+   the component in that case.  Otherwise return NULL.  */
 
-bool
-gfc_is_proc_ptr_comp (gfc_expr *expr, gfc_component **comp)
+gfc_component *
+gfc_get_proc_ptr_comp (gfc_expr *expr)
 {
   gfc_ref *ref;
-  bool ppc = false;
 
   if (!expr || !expr->ref)
-return false;
+return NULL;
 
   ref = expr->ref;
   while (ref->next)
 ref = ref->next;
 
-  if (ref->type == REF_COMPONENT)
-{
-  ppc = ref->u.c.component->attr.proc_pointer;
-  if (ppc && comp)
-	*comp = ref->u.c.component;
-}
+  if (ref->type == REF_COMPONENT
+  && ref->u.c.component->attr.proc_pointer)
+return ref->u.c.component;
+
+  return NULL;
+}
+
 
-  return ppc;
+/* Determine if an expression is a procedure pointer componen

Re: combine permutations in gimple

2012-08-13 Thread Richard Guenther
On Mon, Aug 13, 2012 at 4:12 PM, Ramana Radhakrishnan
 wrote:
> On 13 August 2012 14:54, Jakub Jelinek  wrote:
>> On Mon, Aug 13, 2012 at 03:45:00PM +0200, Marc Glisse wrote:
>>> On Mon, 13 Aug 2012, Jakub Jelinek wrote:
>>>
>>> >On Mon, Aug 13, 2012 at 03:13:26PM +0200, Richard Guenther wrote:
>>> >>The patch does not do that.  It merely assumes that the target knows
>>> >>how to perform an optimal constant permute and that two constant
>>> >>permutes never generate better code than a single one.
>>> >
>>> >Still, the patch should do some tests whether it is beneficial.
>>> >At least a can_vec_perm_p (mode, false, sel) test of the resulting
>>> >permutation if both the original permutations pass that test,
>>>
>>> Sounds good. The last argument should be the constant permutation
>>> vector, presented as an array of indices stored in unsigned char? I
>>> hadn't realized we already had access to backend information that
>>> early in the compilation. It doesn't give the cost though.
>>
>> Yeah, it doesn't give the cost, just whether it is supported at all.
>
>
> Maybe we need an interface of that form. A generic permute operation
> used for constant permute operations is going to be more expensive
> than more specialized constant permute operations. It might be that
> this cost gets amortized over a large number of operations at which
> point it makes sense but the compiler should make this transformation
> based on cost rather than just whether something is supported or not.

Well.  I think the middle-end can reasonably assume that backends know
how to most efficiently perform any constant permute.  We do not limit
converting

 a = a + 5;
 a = a + 254;

to

 a = a + 259;

either, though a target may generate better code for the first case.

Which means that we should go without a target test and take regressions
as they appear as an opportunity to improve targets instead.

Richard.

> Ramana


LEA-splitting improvement patch.

2012-08-13 Thread Yuri Rumyantsev
Hi all,

It is known that LEA splitting is one of the most critical problems
for Atom processors, and these changes try to improve it through:
1.   More aggressive LEA splitting – perform splitting unless the
split cost exceeds the AGU stall.
2.   Reordering the split instructions to get better scheduling –
use the farthest defined register for the SET instruction, then add the
constant offset if any, and finally generate the add instruction. This
gives a +0.5% speedup in geomean for the eembc2.0 suite on Atom.
All required testing was done – bootstraps for Atom & Core2, make check.
Note that this fix affects only Atom processors.

ChangeLog:
2012-08-08  Yuri Rumyantsev  yuri.s.rumyant...@intel.com

* config/i386/i386-protos.h (ix86_split_lea_for_addr): Add additional argument.
* config/i386/i386.md (ix86_split_lea_for_addr): Add additional
argument curr_insn.
* config/i386/i386.c (find_nearest_reg_def): New function.  Find
the nearest register definition used in the address.
(ix86_split_lea_for_addr): Do more aggressive LEA splitting and
instruction reordering to get opportunities for better scheduling.

Is it OK for trunk?


lea_improvements.diff
Description: Binary data


Re: Scheduler: Save state at the end of a block

2012-08-13 Thread Vladimir Makarov

On 08/13/2012 06:32 AM, Bernd Schmidt wrote:

This is a small patch for sched-rgn that attempts to save DFA state at
the end of a basic block and re-use it in successor blocks. This was a
customer-requested optimization; I've not seen it make much of a
difference in any macro benchmarks.
Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK?




Yes.  Thanks for the patch, Bernd.


Re: combine permutations in gimple

2012-08-13 Thread Marc Glisse

On Mon, 13 Aug 2012, Marc Glisse wrote:


On Mon, 13 Aug 2012, Richard Guenther wrote:


If your new predicate would match more places (can you do a quick search?)


You mean: if there are more optimizations that either already check for 
double use in the same statement, or could benefit from doing so? I'll take a 
look.


I couldn't find anything obvious (not that I understood a significant 
percentage of what I saw...).


Note that the comments are efficient since the only current use of 
num_imm_uses is in a debug printf call.



I'll give it a few more days for the conversation to settle, so I know 
what I should do between:

- the barely modified patch you accepted,
- the check asked by Jakub,
- the restriction to identity that prevents any regression (well...),
- something else?

--
Marc Glisse


Re: [v3] improve exception text when threads not enabled

2012-08-13 Thread Joe Buck
On Sun, Aug 12, 2012 at 08:02:30PM +0100, Jonathan Wakely wrote:
> This improves the fairly uninformative "Operation not supported"
> message given when std::thread is used without linking to libpthread.
> 
> Now you get:
> 
> terminate called after throwing an instance of 'std::system_error'
>   what():  Enable multithreading to use std::thread: Operation not permitted
> Aborted

The new message still seems deficient.  The issue is that the executable
does not contain any thread support; "not permitted" usually suggests a
permission violation (like trying to write a read-only file).  Perhaps "no
thread support found" should be used instead of "Operation not permitted".


Re: [C++ Pubnames Patch] Anonymous namespaces enclosed in named namespaces. (issue6343052)

2012-08-13 Thread Sterling Augustine
On Sun, Aug 12, 2012 at 12:46 PM, Jack Howarth  wrote:
> This patch introduces the regressions...
>
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubnames
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler
> "_GLOBAL__sub_I__ZN3one3c1vE0"+[ \t]+[#;]+[ \t]+external name
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubtypes
>
> at -m32/-m64 on x86_64-apple-darwin12...
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54230
>
> I have attached the -m32 assembly generated for the 
> g++.dg/debug/dwarf2/pubnames-2.C
> to PR54230 but haven't been able to add Sterling to the PR as none of his 
> email
> addresses are recognized by bugzilla.

I will look at this and figure it out. The problem is quite likely
that the test is erroneous, rather than the patch.

Sterling


[PATCH, i386]: Rewrite ix86_conditional_register_usage

2012-08-13 Thread Uros Bizjak
Hello!

This patch rewrites ix86_conditional_register_usage to improve following things:

- Do not mark REX registers in FIXED_REGISTERS. We know that no 32bit
target supports them, so we can disable REX registers at
ix86_conditional_register_usage as well.
- Use bitmaps in CALL_USED_REGISTERS to conditionally mark used
registers for different ABIs, while still allowing -fcall-used-REG and
-fcall-saved-REG options to be used. This brings TARGET_64BIT_MS_ABI
to the same level as other ABIs and allows future ABIs to be handled.
- Calculate CLOBBERED_REGS for 32bit targets in the same way as for
64bit targets, removing another 32bit/64bit difference. The regclass
definition has to be moved just before GENERAL_REGS, since it is
derived from this register class. We depend on fixed register
definition to avoid allocation of ESP and REX registers for
CLOBBERED_REGS class. This is the same approach as 64bit targets have
to avoid ESP, and the same approach as 32bit targets have to avoid REX
registers in GENERAL_REGS class.
- Some trivial code reorderings, so we don't process non-existing registers.
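
The bitmap scheme in the second bullet can be sketched like this (the enum names and example values are illustrative, not the actual i386.h entries; bit positions follow the c_mask computation in the patch):

```cpp
#include <cassert>

// Each CALL_USED_REGISTERS initializer value > 1 is treated as a bit
// mask saying under which ABIs the register is call-used; 0 and 1 stay
// unconditional (and remain overridable by -fcall-used-REG and
// -fcall-saved-REG).  Bit positions mirror the patch's c_mask:
// bit 1 = 32-bit, bit 2 = 64-bit SysV, bit 3 = 64-bit MS ABI.
enum abi_bit { ABI_32 = 1 << 1, ABI_64 = 1 << 2, ABI_64_MS = 1 << 3 };

static int
decode_call_used (int init_value, int c_mask)
{
  // Mirrors: if (call_used_regs[i] > 1)
  //            call_used_regs[i] = !!(call_used_regs[i] & c_mask);
  return init_value > 1 ? !!(init_value & c_mask) : init_value;
}
```

For example, a register call-used under the 32-bit and 64-bit SysV ABIs but call-saved under the MS ABI would use the initializer value `ABI_32 | ABI_64`.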

2012-08-13  Uros Bizjak  

* config/i386/i386.h (FIXED_REGISTERS): Do not mark REX registers here.
(CALL_USED_REGISTERS): Use bitmaps to mark call-used registers
for different ABIs.
(enum reg_class): Move CLOBBERED_REGS just before GENERAL_REGS.
(REG_CLASS_NAMES): Update.
(REG_CLASS_CONTENTS): Update.  Clear CLOBBERED_REGS members.
* config/i386/i386.c (ix86_conditional_register_usage): Disable
REX registers on 32bit targets.  Handle bitmaps from
CALL_USED_REGISTERS initializer.  Calculate CLOBBERED_REGS register
set from GENERAL_REGS also for 32bit targets.  Do not change call
used register set for TARGET_64BIT_MS_ABI separately.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and i686-pc-linux-gnu.

Patch was committed to mainline SVN.

Uros.
Index: i386.c
===
--- i386.c  (revision 190347)
+++ i386.c  (working copy)
@@ -4135,43 +4135,42 @@ ix86_option_override (void)
 static void
 ix86_conditional_register_usage (void)
 {
-  int i;
+  int i, c_mask;
   unsigned int j;
 
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-{
-  if (fixed_regs[i] > 1)
-   fixed_regs[i] = (fixed_regs[i] == (TARGET_64BIT ? 3 : 2));
-  if (call_used_regs[i] > 1)
-   call_used_regs[i] = (call_used_regs[i] == (TARGET_64BIT ? 3 : 2));
-}
-
   /* The PIC register, if it exists, is fixed.  */
   j = PIC_OFFSET_TABLE_REGNUM;
   if (j != INVALID_REGNUM)
 fixed_regs[j] = call_used_regs[j] = 1;
 
-  /* The 64-bit MS_ABI changes the set of call-used registers.  */
-  if (TARGET_64BIT_MS_ABI)
+  /* For 32-bit targets, squash the REX registers.  */
+  if (! TARGET_64BIT)
 {
-  call_used_regs[SI_REG] = 0;
-  call_used_regs[DI_REG] = 0;
-  call_used_regs[XMM6_REG] = 0;
-  call_used_regs[XMM7_REG] = 0;
+  for (i = FIRST_REX_INT_REG; i <= LAST_REX_INT_REG; i++)
+   fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
   for (i = FIRST_REX_SSE_REG; i <= LAST_REX_SSE_REG; i++)
-   call_used_regs[i] = 0;
+   fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
 }
 
-  /* The default setting of CLOBBERED_REGS is for 32-bit; add in the
- other call-clobbered regs for 64-bit.  */
-  if (TARGET_64BIT)
+  /*  See the definition of CALL_USED_REGISTERS in i386.h.  */
+  c_mask = (TARGET_64BIT_MS_ABI ? (1 << 3)
+   : TARGET_64BIT ? (1 << 2)
+   : (1 << 1));
+  
+  CLEAR_HARD_REG_SET (reg_class_contents[(int)CLOBBERED_REGS]);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 {
-  CLEAR_HARD_REG_SET (reg_class_contents[(int)CLOBBERED_REGS]);
+  /* Set/reset conditionally defined registers from
+CALL_USED_REGISTERS initializer.  */
+  if (call_used_regs[i] > 1)
+   call_used_regs[i] = !!(call_used_regs[i] & c_mask);
 
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-   if (TEST_HARD_REG_BIT (reg_class_contents[(int)GENERAL_REGS], i)
-   && call_used_regs[i])
- SET_HARD_REG_BIT (reg_class_contents[(int)CLOBBERED_REGS], i);
+  /* Calculate registers of CLOBBERED_REGS register set
+as call used registers from GENERAL_REGS register set.  */
+  if (TEST_HARD_REG_BIT (reg_class_contents[(int)GENERAL_REGS], i)
+ && call_used_regs[i])
+   SET_HARD_REG_BIT (reg_class_contents[(int)CLOBBERED_REGS], i);
 }
 
   /* If MMX is disabled, squash the registers.  */
@@ -4191,15 +4190,6 @@ ix86_conditional_register_usage (void)
 for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (TEST_HARD_REG_BIT (reg_class_contents[(int)FLOAT_REGS], i))
fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
-
-  /* If 32-bit, squash the 64-bit registers.  */
-  if (! TARGET_64BIT)
-{
-  for (i = FIRST_REX_INT_REG

Re: [PATCH 0/7] s390 improvements with r[ioxn]sbg

2012-08-13 Thread Ulrich Weigand
Richard Henderson wrote:
> Only "tested" visually, by examining assembly diffs of the
> runtime libraries between successive patches.  All told it
> would appear to be some remarkable code size improvements.

Thanks for having a look at this!

> Please test.

Unfortunately GCC crashes during bootstrap, probably because of one
of the issues below.

A couple of comments to the patches:

>   s390: Constraints, predicates, and op letters for contiguous bitmasks
>   s390: Only use lhs zero_extract in word_mode
>   s390: Use risbgz for AND.
>   s390: Add mode attribute for mode bitsize

These look all good to me.

>   s390: Implement extzv for z10

> +/* Check whether a rotate of ROTL followed by an AND of CONTIG is equivalent
> +   to a shift followed by the AND.  In particular, CONTIG should not overlap
> +   the (rotated) bit 0/bit 63 gap.  */
> +
> +bool
> +s390_extzv_shift_ok (int bitsize, int rotl, unsigned HOST_WIDE_INT contig)
> +{
> +  int pos, len;
> +  bool ok;
> +
> +  ok = s390_contiguous_bitmask_p (contig, bitsize, &pos, &len);
> +  gcc_assert (ok);
> +
> +  return (rotl <= pos || rotl >= pos + len + (64 - bitsize));
> +}

I don't quite see how this can work correctly for both left and right shifts.
E.g. for bitsize == 64, rotl == 1, contig == 1, a left shift by 1 followed
by the AND is always zero, and certainly not equal to the rotate.  But the
routine would return true.  (Note that the same routine would be called for
a *right* shift by 63, in which case the "true" result is actually correct.)

> +(define_insn "extzv"
> +  [(set (match_operand:DI 0 "register_operand" "=d")
> +   (zero_extract:DI
> + (match_operand:DI 1 "register_operand" "d")
> + (match_operand 2 "const_int_operand" "")
> + (match_operand 3 "const_int_operand" "")))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_Z10"
> +  "risbg\t%0,%1,63-%3-%2,128+63,63-%3-%2"

This doesn't look right for bits-big-endian order.  Shouldn't we have a
rotate count of %3+%2, and a start bit position of 64-%2 ?

> -(define_insn_and_split "*extv"
> +(define_insn_and_split "*pre_z10_extv"
>[(set (match_operand:GPR 0 "register_operand" "=d")
> (sign_extract:GPR (match_operand:QI 1 "s_operand" "QS")
> - (match_operand 2 "const_int_operand" "n")
> + (match_operand 2 "nonzero_shift_count_operand" "")
>   (const_int 0)))
> (clobber (reg:CC CC_REGNUM))]
> -  "INTVAL (operands[2]) > 0
> -   && INTVAL (operands[2]) <= GET_MODE_BITSIZE (SImode)"
> +  "!TARGET_Z10"
>"#"
>"&& reload_completed"

Why disable this for pre-z10?

> +(define_insn "*extzv<mode>_srl"
> +  [(set (match_operand:DSI 0 "register_operand" "=d")
> +   (and:DSI (lshiftrt:DSI
> +  (match_operand:DSI 1 "register_operand" "d")
> +  (match_operand:DSI 2 "nonzero_shift_count_operand" ""))
> +   (match_operand:DSI 3 "contiguous_bitmask_operand" "")))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_Z10

While those are equivalent (given that TARGET_Z10 implies TARGET_ZARCH),
it would seem more in line with many other patterns to use GPR instead of DSI.

>   s390: Generate rxsbg, and shifted forms of rosbg

This looks OK.

>   s390: Generate rnsbg

> +(define_insn "*insv_rnsbg_srl"
> +  [(set (zero_extract:DI
> + (match_operand:DI 0 "nonimmediate_operand" "+d")
> + (match_operand 1 "const_int_operand" "")
> + (match_operand 2 "const_int_operand" ""))
> +   (and:DI
> + (lshiftrt:DI
> +   (match_dup 0)
> +   (match_operand 3 "const_int_operand" ""))
> + (match_operand:DI 4 "nonimmediate_operand" "d")))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_Z10
> +   && INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL 
> (operands[2])"
> +  "rnsbg\t%0,%4,%2,%2+%1-1,64-%2,%1"

I guess the last "," is supposed to be a "-".  (Then we might
as well use %3 instead of 64-%2-%1.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [PATCH 0/7] s390 improvements with r[ioxn]sbg

2012-08-13 Thread Richard Henderson
On 08/13/2012 10:07 AM, Ulrich Weigand wrote:
>> +/* Check whether a rotate of ROTL followed by an AND of CONTIG is equivalent
>> +   to a shift followed by the AND.  In particular, CONTIG should not overlap
>> +   the (rotated) bit 0/bit 63 gap.  */
>> +
>> +bool
>> +s390_extzv_shift_ok (int bitsize, int rotl, unsigned HOST_WIDE_INT contig)
>> +{
>> +  int pos, len;
>> +  bool ok;
>> +
>> +  ok = s390_contiguous_bitmask_p (contig, bitsize, &pos, &len);
>> +  gcc_assert (ok);
>> +
>> +  return (rotl <= pos || rotl >= pos + len + (64 - bitsize));
>> +}
> 
> I don't quite see how this can work correctly for both left and right shifts.
> E.g. for bitsize == 64, rotl == 1, contig == 1, a left shift by 1 followed
> by the AND is always zero, and certainly not equal to the rotate.  But the
> routine would return true.  (Note that the same routine would be called for
> a *right* shift by 63, in which case the "true" result is actually correct.)

Absolutely correct; I hadn't considered that.

>> +(define_insn "extzv"
>> +  [(set (match_operand:DI 0 "register_operand" "=d")
>> +   (zero_extract:DI
>> + (match_operand:DI 1 "register_operand" "d")
>> + (match_operand 2 "const_int_operand" "")
>> + (match_operand 3 "const_int_operand" "")))
>> +   (clobber (reg:CC CC_REGNUM))]
>> +  "TARGET_Z10"
>> +  "risbg\t%0,%1,63-%3-%2,128+63,63-%3-%2"
> 
> This doesn't look right for bits-big-endian order.  Shouldn't we have a
> rotate count of %3+%2, and a start bit position of 64-%2 ?

Yes.

>> -(define_insn_and_split "*extv"
>> +(define_insn_and_split "*pre_z10_extv"
>>[(set (match_operand:GPR 0 "register_operand" "=d")
>> (sign_extract:GPR (match_operand:QI 1 "s_operand" "QS")
>> - (match_operand 2 "const_int_operand" "n")
>> + (match_operand 2 "nonzero_shift_count_operand" "")
>>   (const_int 0)))
>> (clobber (reg:CC CC_REGNUM))]
>> -  "INTVAL (operands[2]) > 0
>> -   && INTVAL (operands[2]) <= GET_MODE_BITSIZE (SImode)"
>> +  "!TARGET_Z10"
>>"#"
>>"&& reload_completed"
> 
> Why disable this for pre-z10?

I blatantly assumed that L+RISBGZ was better than ICM+SRL.  I probably
shouldn't have also disabled the sign_extract version as well.

> While those are equivalent (given that TARGET_Z10 implies TARGET_ZARCH),
> it would seem more in line with many other patterns to use GPR instead of DSI.

Ok.

>> +   && INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL 
>> (operands[2])"
>> +  "rnsbg\t%0,%4,%2,%2+%1-1,64-%2,%1"
> 
> I guess the last "," is supposed to be a "-".  (Then we might
> as well use %3 instead of 64-%2-%1.)

Good catch.


r~


Re: [v3] improve exception text when threads not enabled

2012-08-13 Thread Jonathan Wakely
On 13 August 2012 16:47, Joe Buck  wrote:
> On Sun, Aug 12, 2012 at 08:02:30PM +0100, Jonathan Wakely wrote:
>> This improves the fairly uninformative "Operation not supported"
>> message given when std::thread is used without linking to libpthread.
>>
>> Now you get:
>>
>> terminate called after throwing an instance of 'std::system_error'
>>   what():  Enable multithreading to use std::thread: Operation not permitted
>> Aborted
>
> The new message still seems deficient.  The issue is that the executable
> does not contain any thread support; "not permitted" usually suggests a
> permission violation (like trying to write a read-only file).  Perhaps "no
> thread support found" should be used instead of "Operation not permitted".

"Operation not permitted" is what it has always printed, it's the
strerror text corresponding to EPERM, which is the error code we throw
in the std::system_error exception when thread support is not present.

std::system_error::what() is required to return a string
"incorporating the arguments supplied in the constructor" so it should
print a string corresponding to some standard error number. In
libstdc++ those strings are the same ones as returned by strerror.

std::thread is documented as throwing a std::system_error exception
with the error code EAGAIN if a thread couldn't be created, but
"Resource temporarily unavailable" would be misleading since no matter
how many times you try no thread can ever be created.  Instead we make
it throw a std::system_error with EPERM.

I suppose EOPNOTSUPP might be better, if it's supported everywhere.
EPERM has the advantage of being a documented error for
pthread_create.

In short, if you want to replace the "Operation not permitted" string you
need to pick an errno value.  The "Enable multithreading to use
std::thread" part is free text that can be replaced or extended, but I
didn't want the message to be too long.
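
The scheme described above can be sketched as follows (an illustration of the idea only; the actual libstdc++ code differs):

```cpp
#include <cerrno>
#include <system_error>

// Build a std::system_error carrying EPERM plus free text.  what() is
// required to incorporate the constructor arguments, so it combines the
// free text with the strerror-style string for EPERM ("Operation not
// permitted" on glibc).
static std::system_error
make_no_threads_error ()
{
  return std::system_error (EPERM, std::generic_category (),
                            "Enable multithreading to use std::thread");
}
```

Catching code can still test `e.code().value() == EPERM` programmatically while users see the combined message.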


Re: RFC: fix std::unique_ptr pretty-printer

2012-08-13 Thread Jonathan Wakely
On 13 August 2012 14:31, Tom Tromey wrote:
>> "Jonathan" == Jonathan Wakely  writes:
>
>>> $11 = std::unique_ptr containing (datum *) 0x6067d0
>
> Jonathan> It's inconsistent with the other printers in that it prints
> Jonathan> the stored type, unlike e.g. std::vector which just says
> Jonathan> "std::vector of length ..." but I think that's an improvement.
>
> Yeah... without this bit it was just printing
>
> $11 = std::unique_ptr containing 0x6067d0
>
> Ordinarily, gdb will print the type here; but it doesn't when called
> from Python.  I thought the typical output was easier to read.
>
> Jonathan> Personally I'd prefer the element_type as part of the type, e.g.
> Jonathan> "std::unique_ptr<datum> = 0x6067d0" but that would be even more
> Jonathan> inconsistent!
>
> I can make that change if you'd prefer.
> I don't know why, but I didn't even think of it.

I prefer it as unique_ptr<datum> but I'm probably not your typical
user of the pretty printers, so if anyone else has an opinion please
share it.


[google/gcc-4_7] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS

2012-08-13 Thread 沈涵
Hi, the google/gcc-4_7 branch fails to link anything (on x86-generic); by
looking into the specs file, it seems that the 'link_emulation' section is
missing in specs.

The problem is in config/i386/linux.h, SUBTARGET_EXTRA_SPECS (which is
not empty for chrome x86-generic) is overridden by
"LINUX_GRTE_EXTRA_SPECS".

My fix is to prepend LINUX_GRTE_EXTRA_SPECS to SUBTARGET_EXTRA_SPECS in linux.h

Jing, could you take a look at this?

--
Han Shen

2012-08-13 Han Shen  
* gcc/config/i386/linux.h (SUBTARGET_EXTRA_SPECS): Compute the new
value by prepending LINUX_GRTE_EXTRA_SPECS to the original value.
* gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS_STR): Add
new macro to hold the value of SUBTARGET_EXTRA_SPECS so that
SUBTARGET_EXTRA_SPECS can be redefined later in linux.h.

--- a/gcc/config/i386/gnu-user.h
+++ b/gcc/config/i386/gnu-user.h
@@ -92,11 +92,14 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_SPEC \
   "--32 %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"

-#undef  SUBTARGET_EXTRA_SPECS
-#define SUBTARGET_EXTRA_SPECS \
+#undef  SUBTARGET_EXTRA_SPECS_STR
+#define SUBTARGET_EXTRA_SPECS_STR \
   { "link_emulation", GNU_USER_LINK_EMULATION },\
   { "dynamic_linker", GNU_USER_DYNAMIC_LINKER }

+#undef  SUBTARGET_EXTRA_SPECS
+#define SUBTARGET_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR
+
 #undef LINK_SPEC
 #define LINK_SPEC "-m %(link_emulation) %{shared:-shared} \
   %{!shared: \
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -32,5 +32,11 @@ along with GCC; see the file COPYING3.  If not see
 #endif

 #undef  SUBTARGET_EXTRA_SPECS
+#ifndef SUBTARGET_EXTRA_SPECS_STR
 #define SUBTARGET_EXTRA_SPECS \
   LINUX_GRTE_EXTRA_SPECS
+#else
+#define SUBTARGET_EXTRA_SPECS \
+  LINUX_GRTE_EXTRA_SPECS \
+  SUBTARGET_EXTRA_SPECS_STR
+#endif


Re: complex.h

2012-08-13 Thread Jonathan Wakely
On 13 August 2012 12:57, Marc Glisse wrote:
> I only modified the xml version. I expect the html version will be updated
> the next time someone who knows what they are doing touches the doc...

That's no problem, I tend to regenerate the html fairly frequently.

Thanks.


[contrib] Add expiration support for validate_failures.py

2012-08-13 Thread Diego Novillo
I noticed recently that while the validator was accepting the
'expire=YYYYMMDD' attribute, it was not actually doing anything with
it.

This patch fixes the oversight.  Simon, I will be backporting the
patch to google/gcc-4_7.

Committed to trunk.

2012-08-13  Diego Novillo  

* testsuite-management/validate_failures.py: Import datetime.
(TestResult.ExpirationDate): New.
(TestResult.HasExpired): New.
(ParseSummary): Call it.  If it returns True, warn that the
expected failure has expired and do not add it to the set of
expected results.
(GetResults): Clarify documentation.

diff --git a/contrib/testsuite-management/validate_failures.py 
b/contrib/testsuite-management/validate_failures.py
index ef01938..0ac9b15 100755
--- a/contrib/testsuite-management/validate_failures.py
+++ b/contrib/testsuite-management/validate_failures.py
@@ -46,6 +46,7 @@ executed it will:
with exit code 0.  Otherwise, it exits with error code 1.
 """
 
+import datetime
 import optparse
 import os
 import re
@@ -135,6 +136,26 @@ class TestResult(object):
   attrs = '%s | ' % self.attrs
 return '%s%s: %s %s' % (attrs, self.state, self.name, self.description)
 
+  def ExpirationDate(self):
+# Return a datetime.date object with the expiration date for this
+# test result.  Return None if no expiration has been set.
+if re.search(r'expire=', self.attrs):
+  expiration = re.search(r'expire=(\d\d\d\d)(\d\d)(\d\d)', self.attrs)
+  if not expiration:
+Error('Invalid expire= format in "%s".  Must be of the form '
+  '"expire=YYYYMMDD"' % self)
+  return datetime.date(int(expiration.group(1)),
+   int(expiration.group(2)),
+   int(expiration.group(3)))
+return None
+
+  def HasExpired(self):
+# Return True if the expiration date of this result has passed.
+expiration_date = self.ExpirationDate()
+if expiration_date:
+  now = datetime.date.today()
+  return now > expiration_date
+
 
 def GetMakefileValue(makefile_name, value_name):
   if os.path.exists(makefile_name):
@@ -178,7 +199,13 @@ def ParseSummary(sum_fname):
   sum_file = open(sum_fname)
   for line in sum_file:
 if IsInterestingResult(line):
-  result_set.add(TestResult(line))
+  result = TestResult(line)
+  if result.HasExpired():
+# Tests that had an expiration set are not added to the
+# set of expected results.
+print 'WARNING: Expected failure "%s" has expired.' % line.strip()
+continue
+  result_set.add(result)
   sum_file.close()
   return result_set
 
@@ -220,16 +247,20 @@ def GetResults(sum_files):
 
 def CompareResults(manifest, actual):
   """Compare sets of results and return two lists:
- - List of results present in MANIFEST but missing from ACTUAL.
  - List of results present in ACTUAL but missing from MANIFEST.
+ - List of results present in MANIFEST but missing from ACTUAL.
   """
-  # Report all the actual results not present in the manifest.
+  # Collect all the actual results not present in the manifest.
+  # Results in this set will be reported as errors.
   actual_vs_manifest = set()
   for actual_result in actual:
 if actual_result not in manifest:
   actual_vs_manifest.add(actual_result)
 
-  # Simlarly for all the tests in the manifest.
+  # Collect all the tests in the manifest that were not found
+  # in the actual results.
+  # Results in this set will be reported as warnings (since
+  # they are expected failures that are not failing anymore).
   manifest_vs_actual = set()
   for expected_result in manifest:
 # Ignore tests marked flaky.
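The expiration logic added in the diff can be condensed into a standalone sketch. The function names mirror `TestResult.ExpirationDate` and `TestResult.HasExpired`, but the simplified attribute handling here is illustrative, not the script's real API:

```python
import datetime
import re

def expiration_date(attrs):
    """Return the datetime.date encoded in an expire=YYYYMMDD attribute,
    or None when no expiration is set (mirrors TestResult.ExpirationDate)."""
    m = re.search(r'expire=(\d\d\d\d)(\d\d)(\d\d)', attrs)
    if not m:
        return None
    return datetime.date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

def has_expired(attrs, today=None):
    """True when the expiration date has passed (mirrors TestResult.HasExpired)."""
    date = expiration_date(attrs)
    if date is None:
        return False
    today = today or datetime.date.today()
    return today > date

print(has_expired('flaky,expire=20120101', today=datetime.date(2012, 8, 13)))  # True
print(has_expired('flaky', today=datetime.date(2012, 8, 13)))                  # False
```

An expired entry is simply skipped by `ParseSummary`, which is why the warning is printed instead of adding it to the expected-results set.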


Re: [v3] improve exception text when threads not enabled

2012-08-13 Thread Jonathan Wakely
On 13 August 2012 18:49, Jonathan Wakely wrote:
> I suppose EOPNOTSUPP might be better, if it's supported everywhere.
> EPERM has the advantage of being a documented error for
> pthread_create.

We do define std::errc::operation_not_supported unconditionally on
most platforms, but not mingw or djgpp.  They don't necessarily
support std::errc::operation_not_permitted either, and so far no one
has made std::thread work on those targets anyway, so that's not a
showstopper.

e.g.

terminate called after throwing an instance of 'std::system_error'
  what():  Enable multithreading to use std::thread: Operation not supported
Aborted


Re: [Patch, fortran] PR46897 - [OOP] type-bound defined ASSIGNMENT(=) not used for derived type component in intrinsic assign

2012-08-13 Thread Mikael Morin
Hello Paul,

I think there are a couple of bugs not triggered by the single component
types in the test. See below.

On 13/08/2012 15:37, Paul Richard Thomas wrote:
> + 
> +   /* Go through the code chain eliminating all but calls to
> +  typebound procedures. Since we have been through
> +  resolve_typebound_subroutine. */
> +   for (; this_code; this_code = this_code->next)
> + {
> +   if (this_code->op == EXEC_ASSIGN_CALL)
> + {
> +   gfc_symbol *fsym = this_code->symtree->n.sym->formal->sym;
> +   /* Check that there is a defined assignment.  If so, then
> +  resolve the call.  */
> +   if (fsym->ts.type == BT_CLASS
> +   && CLASS_DATA (fsym)->ts.u.derived->f2k_derived
> +   && CLASS_DATA (fsym)->ts.u.derived->f2k_derived
> + ->tb_op[INTRINSIC_ASSIGN])
> + {
> +   resolve_call (this_code);
> +   goto next;
> + }
> + }
> + 
> +   next = this_code->next;
> +   if (this_code == root)
> + root = next;
> +   else
> + previous->next = next;
> + 
> +   next = this_code;
> +   next->next = NULL;
> +   gfc_free_statements (next);
This frees `this_code', but `this_code' is used to iterate the loop and
below.

> + next:
> +   previous = this_code;
This could be moved to the only next caller (`previous' doesn't need to
be updated if `this_code' is removed) to fix one usage of `this_code' :-).

> + }
> + 
> +   /* Now attach the remaining code chain to the input code. Step on
> +  to the end of the new code since resolution is complete.  */
This tells me that you know what you are doing...

> +   if (root)
> + {
> +   next = (*code)->next;
> +   (*code)->next = root;
> +   for (;root; root = root->next)
> + if (!root->next)
> +   break;
> +   root->next = next;
> +   *code = root;
> + }
... but I have the feeling that this makes (*code) unreachable and that
that's wrong. Shouldn't it be "root->next = *code;" ?
Maybe you want to remove (*code) at the first iteration (as it contains
the whole structure assignment), but in the next iteration, it contains
the first typebound call, etc, doesn't it?

By the way I'm not sure we can keep the whole structure assignment to
handle default assignment:
if we do it after the typebound calls, we overwrite their job so we have
to do it before.
However, if we do it before, we also overwrite components to be assigned
with a typebound call, and this can have some side effects as the LHS's
argument can be INTENT(INOUT).

Thoughts?
Mikael


Re: [PATCH] Combine location with block using block_locations

2012-08-13 Thread Dodji Seketeli
Hello Dehao,

I have mostly cosmetic comments to make about the libcpp parts.

Dehao Chen  writes:

> Index: libcpp/include/line-map.h
> ===
> *** libcpp/include/line-map.h (revision 189835)
> --- libcpp/include/line-map.h (working copy)
> *** struct GTY(()) line_map_ordinary {
> *** 89,95 
>

Just to sum things up, here is what I understand from the libcpp
changes.

The integer space is now segmented like this:

X              A              B              C
[--------------[--------------[--------------]  <--- integer space of source_location

C is the smallest possible source_location and X is the biggest
possible.

From X to A, we have instances of source_location encoded in an ad-hoc
client specific (from the point of view of libcpp/line-map) map.

From A to B, we have instances of source_location encoded in macro maps.

From B to C, we have instances of source_location encoded in ordinary
maps.

As of this patch, MAX_SOURCE_LOCATION is A.  X is 0xFFFFFFFF and is not
represented by any particular constant declaration.

From the point of view of libcpp, the goal of this patch is to

1/ Provide functions to encode a pair of 
   {non-ad-hoc source_location, client specific data} and yield an
   integer for that (a new instance of source_location).

2/ modify the lookup routines of line-map to have them recognize
   instances of source_location encoded in the ad-hoc map.

If this description is correct, I think a high level comment like this
should be added to line-map.c or line-map.h to help people understand
this in the future, a bit like what has been done for ordinary and macro
maps.

[...]

> + extern void *get_block_from_location (source_location);

I'd call this function get_block_from_ad_hoc_location instead.  Or
something like that, to hint at the fact that the parameter is not an
actual source location.

> + extern source_location get_locus_from_location (source_location);

Likewise, I'd call this get_source_location_from_ad_hoc_loc.

> +
> + #define COMBINE_LOCATION(LOC, BLOCK) \
> +   ((BLOCK) ? get_combine_location ((LOC), (BLOCK)) : (LOC))
> + #define IS_COMBINED_LOCATION(LOC) (((LOC) & MAX_SOURCE_LOCATION) != (LOC))
> +
>   /* Initialize a line map set.  */
>   extern void linemap_init (struct line_maps *);
>
> *** typedef struct
> *** 594,599 
> --- 604,611 
>
> int column;
>
> +   void *block;
> +

I'd just call this 'data' or something like that, and add a comment
explaining that it's client-specific data that is not manipulated by
libcpp.

> /* In a system header?. */
> bool sysp;
>   } expanded_location;


> --- libcpp/line-map.c (working copy)

[...]


> + struct location_block {
> +   source_location locus;
> +   void *block;
> + };

I think we should have a more general name than "block" here.  I am
thinking that other client code might be willing to associate entities
other than blocks to a given source_location in a scheme similar to this
one.  Also, it seems surprising to have the line-maps library deal with
blocks specifically.

So maybe something like this instead?:

/* Data structure to associate an arbitrary data to a source location.  */
struct location_ad_hoc_data {
  source_location locus;
  void *data;
};


Subsequently, if you agree with this, all the occurrences of 'block' in
the new code (function, types, and variable names) you introduced should
be replaced with 'ad_hoc_data'.

> + source_location
> + get_combine_location (source_location locus, void *block)

Shouldn't this be get_combined_location instead?  Note the 'd' after the
combine.

> --- gcc/input.h   (working copy)

[...]

> + #define LOCATION_LOCUS(LOC) \
> +   ((IS_COMBINED_LOCATION(LOC)) ? get_locus_from_location (LOC) : (LOC))

I think this name sounds confusing.  Maybe something like
SOURCE_LOCATION_FROM_AD_HOC_LOC instead?  And please add a comment to
explain a little bit about this business of ad hoc location that
associates a block with an actual source location.

> + #define LOCATION_BLOCK(LOC) \
> +   ((tree) ((IS_COMBINED_LOCATION (LOC)) ? get_block_from_location (LOC) \
> +   : NULL))

And the name of this macro would then be something like BLOCK_FROM_AD_HOC_LOC.

> + #define IS_UNKNOWN_LOCATION(LOC) \
> +   ((IS_COMBINED_LOCATION (LOC)) ? get_locus_from_location (LOC) == 0 \
> +   : (LOC) == 0)

-- 
Dodji


[PATCH] Fix up an IRA ICE (PR middle-end/53411, rtl-optimization/53495)

2012-08-13 Thread Jakub Jelinek
Hi!

move_unallocated_pseudos apparently relies on no insns being deleted
in between find_moveable_pseudos and itself, which can happen when
delete_trivially_dead_insns removes dead insns and insns that feed them.

This can be fixed either by moving the delete_trivially_dead_insns
call earlier in the ira function (is there anything in between those lines
that could create trivially dead insns?), as done in this patch, or, as done
in the first patch in the PR, by making move_unallocated_pseudos more
tolerant of insns being removed.

I've bootstrapped/regtested this on x86_64-linux and i686-linux, ok for
trunk (or is the patch in the PR preferred, or something else)?

2012-08-13  Jakub Jelinek  

PR middle-end/53411
PR rtl-optimization/53495
* ira.c (ira): Move delete_trivially_dead_insns call before
find_moveable_pseudos call.

* gcc.c-torture/compile/pr53411.c: New test.
* gcc.c-torture/compile/pr53495.c: New test.

--- gcc/ira.c.jj2012-08-10 12:57:39.0 +0200
+++ gcc/ira.c   2012-08-13 13:23:08.137339807 +0200
@@ -4206,6 +4206,9 @@ ira (FILE *f)
 
   allocated_reg_info_size = max_reg_num ();
 
+  if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
+df_analyze ();
+
   /* It is not worth to do such improvement when we use a simple
  allocation because of -O0 usage or because the function is too
  big.  */
@@ -4288,9 +4291,6 @@ ira (FILE *f)
 check_allocation ();
 #endif
 
-  if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
-df_analyze ();
-
   if (max_regno != max_regno_before_ira)
 {
   regstat_free_n_sets_and_refs ();
--- gcc/testsuite/gcc.c-torture/compile/pr53411.c.jj2012-08-13 
12:53:23.153131907 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr53411.c   2012-08-13 
12:53:01.0 +0200
@@ -0,0 +1,33 @@
+/* PR middle-end/53411 */
+
+int a, b, c, d, e, f, g, h;
+void fn1 (void);
+int fn2 (void);
+
+int
+fn3 (x)
+ int x;
+{
+  return a ? 0 : x;
+}
+
+void
+fn4 (char x)
+{
+  int i, j, k;
+  for (; e; e++)
+if (fn2 ())
+  {
+   f = 1;
+   k = 0;
+   for (; k <= 1; k++)
+ {
+   j = ~x;
+   i = f * j;
+   h = (fn3 (i | 0 <= c ^ 9L) != b | d) & 8;
+   g = x | 1;
+   fn1 ();
+ }
+  }
+  c = x;
+}
--- gcc/testsuite/gcc.c-torture/compile/pr53495.c.jj2012-08-13 
12:28:27.311493428 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr53495.c   2012-08-13 
12:27:57.0 +0200
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/53495 */
+
+int a, b, c, d, e, g;
+static char
+fn1 (char p1, int p2)
+{
+  return p1 || p2 < 0 || p2 >= 1 || 1 >> p2 ? p1 : 0;
+}
+
+static long long fn2 (int *, int);
+static int fn3 ();
+void
+fn4 ()
+{
+  fn3 ();
+  fn2 (&a, d);
+}
+
+long long
+fn2 (int *p1, int p2)
+{
+  int f = -1L;
+  for (; c <= 1; c++)
+{
+  *p1 = 0;
+  *p1 = fn1 (c, p2 ^ f);
+}
+  a = 0;
+  e = p2;
+  return 0;
+}
+
+int
+fn3 ()
+{
+  b = 3;
+  for (; b; b--)
+c++;
+  g = 0 >= c;
+  return 0;
+}

Jakub


Re: [PATCH,i386] fma,fma4 and xop flags

2012-08-13 Thread Richard Henderson
On 08/10/2012 03:24 PM, Uros Bizjak wrote:
> +  (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
> +  (eq_attr "isa" "fma4")
> +(symbol_ref "TARGET_FMA4 && !TARGET_FMA")

Why the !TARGET_FMA for fma4?

If both ISAs are enabled, I don't see why we couldn't choose from either.
If they really should be mutually exclusive, then that should happen elsewhere.

I do see that fma3 is one byte smaller.  So in the instances where we're
concerned with code size, and we have both isas, and there does happen to
be output overlap with one of the inputs, then we should use fma3.  But
we should also not have reload generate an extra move when fma4 is available.


r~


Re: [PATCH,i386] fma,fma4 and xop flags

2012-08-13 Thread Uros Bizjak
On Mon, Aug 13, 2012 at 9:03 PM, Richard Henderson  wrote:

>> +  (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
>> +  (eq_attr "isa" "fma4")
>> +(symbol_ref "TARGET_FMA4 && !TARGET_FMA")
>
> Why the !TARGET_FMA for fma4?
>
> If both ISAs are enabled, I don't see why we couldn't choose from either.
> If they really should be mutually exclusive, then that should happen 
> elsewhere.
>
> I do see that fma3 is one byte smaller.  So in the instances where we're
> concerned with code size, and we have both isas, and there does happen to
> be output overlap with one of the inputs, then we should use fma3.  But
> we should also not have reload generate an extra move when fma4 is available.

AFAIU fma3 is better than fma4 for bdver2 (the only CPU that
implements both FMA sets). Current description of bdver2 doesn't even
enable fma4 in processor_alias_table due to this fact.

The change you are referring to adds preference for fma3 insn set for
generic code (not FMA4 builtins!), even when fma4 is enabled. So, no
matter which combination and sequence of -mfma, -mfma4 or -mxop the user
passes to the compiler, only fma3 instructions will be generated.

This change also allows -march=bdver2 to use PTA_FMA4 in
processor_alias_table, while still generating fma3 instructions only
for generic code.

Uros.


[PATCH][RFC] Fixing instability of -fschedule-insns for x86

2012-08-13 Thread Igor Zamyatin
Hi all!

The patch aims to fix the instability introduced by the first scheduler on
x86. In particular it targets the following list:

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46843
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
[3] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36680
[4] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42295

The main idea is to give users the possibility of safely turning on the
first scheduler for their code. In some cases this can positively affect
performance, especially on the in-order Atom.

It would be great to hear some feedback from the community about the change.

Thanks in advance,
Igor


first_scheduler_fix.patch
Description: Binary data


Re: [PATCH] Fix some undefined behavior spots in gcc sources (PR c/53968)

2012-08-13 Thread Richard Sandiford
Jakub Jelinek  writes:
> --- gcc/simplify-rtx.c.jj 2012-08-10 15:49:20.0 +0200
> +++ gcc/simplify-rtx.c2012-08-13 09:51:43.628508537 +0200
> @@ -66,7 +66,7 @@ static rtx simplify_binary_operation_1 (
>  static rtx
>  neg_const_int (enum machine_mode mode, const_rtx i)
>  {
> -  return gen_int_mode (- INTVAL (i), mode);
> +  return gen_int_mode (-(unsigned HOST_WIDE_INT) INTVAL (i), mode);

Really minor, but UINTVAL would be nicer.

Richard


Re: Value type of map need not be default copyable

2012-08-13 Thread François Dumont

On 08/13/2012 02:10 PM, Paolo Carlini wrote:

On 08/12/2012 10:00 PM, François Dumont wrote:

Ok for trunk ?

Ok, thanks!

Paolo.

PS: you may want to remove the trailing blank line of 
testsuite_counter_type.h




Attached patch applied.

2012-08-13  François Dumont  
Ollie Wild  

* include/bits/hashtable.h
(_Hashtable<>::_M_insert_multi_node(hash_code, node_type*)): New.
(_Hashtable<>::_M_insert(_Args&&, false_type)): Use latter.
(_Hashtable<>::_M_emplace(false_type, _Args&&...)): Likewise.
(_Hashtable<>::_M_insert_bucket): Replace by ...
(_Hashtable<>::_M_insert_unique_node(size_type, hash_code,
node_type*)): ... this, new.
(_Hashtable<>::_M_insert(_Args&&, true_type)): Use latter.
(_Hashtable<>::_M_emplace(true_type, _Args&&...)): Likewise.
* include/bits/hashtable_policy.h (_Map_base<>::operator[]): Use
latter, emplace the value_type rather than insert.
* include/std/unordered_map: Include tuple.
* include/std/unordered_set: Likewise.
* testsuite/util/testsuite_counter_type.h: New.
* testsuite/23_containers/unordered_map/operators/2.cc: New.

François

Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 190353)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -577,8 +577,13 @@
   __node_type* __p = __h->_M_find_node(__n, __k, __code);
 
   if (!__p)
-	return __h->_M_insert_bucket(std::make_pair(__k, mapped_type()),
- __n, __code)->second;
+	{
+	  __p = __h->_M_allocate_node(std::piecewise_construct,
+  std::tuple<const key_type&>(__k),
+  std::tuple<>());
+	  return __h->_M_insert_unique_node(__n, __code, __p)->second;
+	}
+
   return (__p->_M_v).second;
 }
 
@@ -598,9 +603,13 @@
   __node_type* __p = __h->_M_find_node(__n, __k, __code);
 
   if (!__p)
-	return __h->_M_insert_bucket(std::make_pair(std::move(__k),
-		mapped_type()),
- __n, __code)->second;
+	{
+	  __p = __h->_M_allocate_node(std::piecewise_construct,
+  std::forward_as_tuple(std::move(__k)),
+  std::tuple<>());
+	  return __h->_M_insert_unique_node(__n, __code, __p)->second;
+	}
+
   return (__p->_M_v).second;
 }
 
Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 190353)
+++ include/bits/hashtable.h	(working copy)
@@ -584,10 +584,17 @@
   __node_base*
   _M_get_previous_node(size_type __bkt, __node_base* __n);
 
-  template
-	iterator
-	_M_insert_bucket(_Arg&&, size_type, __hash_code);
+  // Insert node with hash code __code, in bucket bkt if no rehash (assumes
+  // no element with its key already present). Take ownership of the node,
+  // deallocate it on exception.
+  iterator
+  _M_insert_unique_node(size_type __bkt, __hash_code __code,
+			__node_type* __n);
 
+  // Insert node with hash code __code. Take ownership of the node,
+  // deallocate it on exception.
+  iterator
+  _M_insert_multi_node(__hash_code __code, __node_type* __n);
 
   template
 	std::pair
@@ -1214,42 +1221,29 @@
   {
 	// First build the node to get access to the hash code
 	__node_type* __node = _M_allocate_node(std::forward<_Args>(__args)...);
+	const key_type& __k = this->_M_extract()(__node->_M_v);
+	__hash_code __code;
 	__try
 	  {
-	const key_type& __k = this->_M_extract()(__node->_M_v);
-	__hash_code __code = this->_M_hash_code(__k);
-	size_type __bkt = _M_bucket_index(__k, __code);
-
-	if (__node_type* __p = _M_find_node(__bkt, __k, __code))
-	  {
-		// There is already an equivalent node, no insertion
-		_M_deallocate_node(__node);
-		return std::make_pair(iterator(__p), false);
-	  }
-
-	// We are going to insert this node
-	this->_M_store_code(__node, __code);
-	const __rehash_state& __saved_state
-	  = _M_rehash_policy._M_state();
-	std::pair __do_rehash
-	  = _M_rehash_policy._M_need_rehash(_M_bucket_count,
-		_M_element_count, 1);
-
-	if (__do_rehash.first)
-	  {
-		_M_rehash(__do_rehash.second, __saved_state);
-		__bkt = _M_bucket_index(__k, __code);
-	  }
-
-	_M_insert_bucket_begin(__bkt, __node);
-	++_M_element_count;
-	return std::make_pair(iterator(__node), true);
+	__code = this->_M_hash_code(__k);
 	  }
 	__catch(...)
 	  {
 	_M_deallocate_node(__node);
 	__throw_exception_again;
 	  }
+
+	size_type __bkt = _M_bucket_index(__k, __code);
+	if (__node_type* __p = _M_find_node(__bkt, __k, __code))
+	  {
+	// There is already an equivalent node, no insertion
+	_M_deallocate_node(__node);
+	return std::make_pair(iterator(__p), false);
+	  }
+
+	// Insert the node
+	return std::make_pair(_M_insert_unique_node(__bkt, __code, __node),
+			  true);
   }
 
   template::
   _M_emplace(std::false_type, _Args&&... __args)
   {
-	const __rehash_stat

Re: [PATCH,i386] fma,fma4 and xop flags

2012-08-13 Thread Richard Henderson
On 08/13/2012 12:33 PM, Uros Bizjak wrote:
> AFAIU fma3 is better than fma4 for bdver2 (the only CPU that
> implements both FMA sets). Current description of bdver2 doesn't even
> enable fma4 in processor_alias_table due to this fact.
> 
> The change you are referring to adds preference for fma3 insn set for
> generic code (not FMA4 builtins!), even when fma4 is enabled. So, no
> matter which combination and sequence of -mfmfa -mfma4 or -mxop user
> passes to the compiler, only fma3 instructions will be generated.

This rationale needs to appear as a comment above

> +  (eq_attr "isa" "fma4")
> +(symbol_ref "TARGET_FMA4 && !TARGET_FMA")

Longer term we may well require some sort of

  (TARGET_FMA4 && !(TARGET_FMA && TARGET_PREFER_FMA3))

with an appropriate entry in ix86_tune_features to match.


r~


[v3] fix libstdc++/54185

2012-08-13 Thread Jonathan Wakely
This fixes an error I inadvertently introduced a few months ago.

2012-08-13  David Adler 

PR libstdc++/54185
* src/c++11/condition_variable.cc (condition_variable): Always
destroy native type in destructor.
* testsuite/30_threads/condition_variable/54185.cc: New.

Tested x86-64-linux, committed to 4.7 and trunk.

Thanks for the bug report and patch, David.
commit 03a66e46ee35b74372da5c46caa4e7761d10b4c8
Author: Jonathan Wakely 
Date:   Mon Aug 13 19:45:30 2012 +0100

2012-08-13  David Adler  

PR libstdc++/54185
* src/c++11/condition_variable.cc (condition_variable): Always
destroy native type in destructor.
* testsuite/30_threads/condition_variable/54185.cc: New.

diff --git a/libstdc++-v3/src/c++11/condition_variable.cc 
b/libstdc++-v3/src/c++11/condition_variable.cc
index 9cd0763..001d95c 100644
--- a/libstdc++-v3/src/c++11/condition_variable.cc
+++ b/libstdc++-v3/src/c++11/condition_variable.cc
@@ -1,6 +1,6 @@
 // condition_variable -*- C++ -*-
 
-// Copyright (C) 2008, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2008-2012 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -32,12 +32,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef __GTHREAD_COND_INIT
   condition_variable::condition_variable() noexcept = default;
-  condition_variable::~condition_variable() noexcept = default;
 #else
   condition_variable::condition_variable() noexcept
   {
 __GTHREAD_COND_INIT_FUNCTION(&_M_cond);
   }
+#endif
 
   condition_variable::~condition_variable() noexcept
   {
@@ -45,7 +45,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 /* int __e = */ __gthread_cond_destroy(&_M_cond);
 // if __e == EBUSY then blocked
   }
-#endif
 
   void
   condition_variable::wait(unique_lock<mutex>& __lock)
diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc 
b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
new file mode 100644
index 000..5769670
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/54185.cc
@@ -0,0 +1,62 @@
+// { dg-do run { target *-*-freebsd* *-*-netbsd* *-*-linux* *-*-solaris* 
*-*-cygwin *-*-darwin* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++0x -pthread" { target *-*-freebsd* *-*-netbsd* 
*-*-linux* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++0x -pthreads" { target *-*-solaris* } }
+// { dg-options " -std=gnu++0x " { target *-*-cygwin *-*-darwin* } }
+// { dg-require-cstdint "" }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include <condition_variable>
+#include <mutex>
+#include <thread>
+#include <vector>
+
+// PR libstdc++/54185
+
+std::condition_variable* cond = nullptr;
+std::mutex mx;
+int started = 0;
+int constexpr NUM_THREADS = 10;
+
+void do_thread_a()
+{
+  std::unique_lock<std::mutex> lock(mx);
+  if(++started >= NUM_THREADS)
+  {
+cond->notify_all();
+delete cond;
+cond = nullptr;
+  }
+  else
+cond->wait(lock);
+}
+
+int main(){
+  std::vector<std::thread> vec;
+  for(int j = 0; j < 1000; ++j)
+  {
+started = 0;
+cond = new std::condition_variable;
+for (int i = 0; i < NUM_THREADS; ++i)
+  vec.emplace_back(&do_thread_a);
+for (int i = 0; i < NUM_THREADS; ++i)
+  vec[i].join();
+vec.clear();
+  }
+}


[Fortran] PR37336 - FIINAL patch [1/n]: Implement the finalization wrapper subroutine

2012-08-13 Thread Tobias Burnus

Dear all,

Attached is the first part of a patch which will implement finalization 
support and polymorphic freeing in gfortran.



It addresses two needs:

a) For polymorphic ("CLASS") variables, allocatable components have to 
be freed; however, at compile time only the allocatable components of 
the declared type are known – and the dynamic type might have more


b) Fortran 2003 allows finalization subroutines ("FINAL", destructors), 
which can be elemental, scalar or for a given rank (any array type is 
allowed). Those should be called for DEALLOCATE, leaving the scope 
(unless saved), intrinsic assignment and with intent(out).



The finalization is done as follows (F2008, "4.5.6.2 The finalization 
process")


"(1) If the dynamic type of the entity has a final subroutine whose 
dummy argument has the same kind type parameters and rank as the entity 
being finalized, it is called with the entity as an actual argument. 
Otherwise, if there is an elemental final subroutine whose dummy 
argument has the same kind type parameters as the entity being 
finalized, it is called with the entity as an actual argument. 
Otherwise, no subroutine is called at this point.


"(2) All finalizable components that appear in the type definition are 
finalized in a processor-dependent order. If the entity being finalized 
is an array, each finalizable component of each element of that entity 
is finalized separately.


"(3) If the entity is of extended type and the parent type is 
finalizable, the parent component is finalized."
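The three-step order defined by these rules can be modeled in a short sketch. The type names and dictionary layout are invented for illustration; this models the F2008 rules themselves, not gfortran's implementation:

```python
def finalize(entity, log):
    """Model of F2008 4.5.6.2: (1) call the dynamic type's matching final
    subroutine, (2) finalize finalizable components, (3) finalize the
    parent component."""
    t = entity["type"]
    if t.get("final"):                          # step (1)
        log.append("final:" + t["name"])
    for comp in entity.get("components", []):   # step (2)
        finalize(comp, log)
    if entity.get("parent"):                    # step (3)
        finalize(entity["parent"], log)
    return log

# An entity of extended type t2, with a finalizable component of type tt
# and a finalizable parent component of type t.
x = {
    "type": {"name": "t2", "final": True},
    "components": [{"type": {"name": "tt", "final": True}, "components": []}],
    "parent": {"type": {"name": "t", "final": True}, "components": []},
}
print(finalize(x, []))  # ['final:t2', 'final:tt', 'final:t']
```

This is exactly the order the wrapper subroutine below hard-codes: the type's own FINAL (scalarized or rank-specific), then component deallocation, then the call to the parent's wrapper.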



The idea is to create a wrapper function which handles those steps - and 
attach a reference to the dynamic type (i.e. add it via proc-pointer to 
the vtable). Additionally, the wrapper can be directly called for TYPE.



The attached patch implements the generation of the wrapper subroutine; 
it does not yet implement the actual calls. The wrapper is generated on 
Fortran AST level and creates code similar to


subroutine final_wrapper_for_type_t (array)
type(t), intent(inout) :: array(..)
integer, pointer :: ptr
integer(c_intptr_t) :: i, addr

select case (rank (array))
case (3)
call final_rank3 (array)
case default
do i = 0, size (array)-1
addr = transfer (c_loc (array), addr) + i * STORAGE_SIZE (array)
call c_f_pointer (transfer (addr, c_ptr), ptr)
call elemental_final (ptr)
end do
end select

! For all noninherited allocatable components, call
! DEALLOCATE(array(:)%comp, stat=ignore)
! scalarized as above

call final_wrapper_of_parent (array(...)%parent)
end subroutine final_wrapper_for_type_t


Note 1: The call to the parent type requires packing support for 
assumed-rank arrays, which has not yet been implemented (also required 
for TS29113, though not for this usage). That is, without further 
patches, the wrapper will only work for scalars or if the parent has no 
wrapper subroutine.


Note 2: The next step will be to add the calls to the wrapper, starting 
with an explicit DEALLOCATE.



I intend to commit the patch, when approved, without allowing FINAL at
resolution time; that way there is no false impression that finalization 
actually works.


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

* * *

Note: The patch will break gfortran's OOP ABI. It does so by adding 
"_final" to the virtual table (vtab).


I think breaking the ABI for this functionality is unavoidable. The ABI 
change only affects code which uses CLASS (polymorphic variables), and 
the issue only arises if one mixes old with new code for the same 
derived type. However, if one does so (e.g. through an incomplete 
recompilation), segfaults and similar issues will occur. Hence, I am 
considering bumping the .mod version; that will effectively force a 
recompilation and thus avoid the issue. The downside is that it will 
also break packages (e.g. of Linux distributions) which ship .mod files 
(sorry!). What do you think?


I think it could then be combined with Janus' proc-pointer patch, which 
changes the assembler name of (non-Bind(C)) procedure pointers declared 
at module level. Again, by forcing recompilation, the .mod version bump 
should ensure that users don't see the ABI breakage. His patch is at 
http://gcc.gnu.org/ml/fortran/2012-04/msg00033.html (I think it is okay, 
but I believe it has not yet been reviewed.)


Tobias

PS: I used the following test case to check whether the wrapper 
generation and scalarization work; it properly prints 11,22,33,44,55,66, 
and the dump also looks okay for various versions.


The scalarization code should work relatively well; there is only one 
call to an external function: for SIZE, gfortran - for whatever reason - 
doesn't generate inline code but calls libgfortran.



But now the test code:

module m
type tt
end type tt

type t
! type(tt), allocatable :: comp1
integer :: val
contains
final bar1
end type t

type t1t
! type(tt), allocatable :: comp1
integer :: val
!contains
! final bar1
end type t1t

type, extends(t) :: t2
type(tt), allocatable :: comp2
contains

[SH] PR 52933 - Use div0s insn for integer sign comparisons

2012-08-13 Thread Oleg Endo
Hello,

This patch adds basic support for utilizing the SH div0s instruction to
simplify some integer sign comparisons such as '(a < 0) == (b < 0)'.
Tested on rev 190332 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

PR target/52933
* config/sh/sh.md (cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0,
*cmp_div0s_1, *cbranch_div0s, *movsicc_div0s): New insns.
* config/sh/sh.c (sh_rtx_costs): Handle div0s patterns.

testsuite/ChangeLog:

PR target/52933
* gcc.target/sh/pr52933-1.c: New.
* gcc.target/sh/pr52933-2.c: New.
Index: gcc/testsuite/gcc.target/sh/pr52933-1.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr52933-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr52933-1.c	(revision 0)
@@ -0,0 +1,168 @@
+/* Check that the div0s instruction is used for integer sign comparisons.
+   Each test case is expected to emit at least one div0s insn.
+   Problems when combining the div0s comparison result with surrounding
+   logic usually show up as redundant tst insns.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } } */
+/* { dg-final { scan-assembler-times "div0s" 25 } } */
+/* { dg-final { scan-assembler-not "tst" } } */
+
+typedef unsigned char bool;
+
+int other_func_a (int, int);
+int other_func_b (int, int);
+
+bool
+test_00 (int a, int b)
+{
+  return (a ^ b) >= 0;
+}
+
+bool
+test_01 (int a, int b)
+{
+  return (a ^ b) < 0;
+}
+
+int
+test_02 (int a, int b, int c, int d)
+{
+  if ((a ^ b) < 0)
+return other_func_a (a, c);
+  else
+return other_func_b (d, b);
+}
+
+int
+test_03 (int a, int b, int c, int d)
+{
+  if ((a ^ b) >= 0)
+return other_func_a (a, c);
+  else
+return other_func_b (d, b);
+}
+
+int
+test_04 (int a, int b)
+{
+  return (a ^ b) >= 0 ? -20 : -40;
+}
+
+bool
+test_05 (int a, int b)
+{
+  return (a ^ b) < 0;
+}
+
+int
+test_06 (int a, int b)
+{
+  return (a ^ b) < 0 ? -20 : -40;
+}
+
+bool
+test_07 (int a, int b)
+{
+  return (a < 0) == (b < 0);
+}
+
+int
+test_08 (int a, int b)
+{
+  return (a < 0) == (b < 0) ? -20 : -40;
+}
+
+bool
+test_09 (int a, int b)
+{
+  return (a < 0) != (b < 0);
+}
+
+int
+test_10 (int a, int b)
+{
+  return (a < 0) != (b < 0) ? -20 : -40;
+}
+
+bool
+test_11 (int a, int b)
+{
+  return (a >= 0) ^ (b < 0);
+}
+
+int
+test_12 (int a, int b)
+{
+  return (a >= 0) ^ (b < 0) ? -20 : -40;
+}
+
+bool
+test_13 (int a, int b)
+{
+  return !((a >= 0) ^ (b < 0));
+}
+
+int
+test_14 (int a, int b)
+{
+  return !((a >= 0) ^ (b < 0)) ? -20 : -40;
+}
+
+bool
+test_15 (int a, int b)
+{
+ return (a & 0x80000000) == (b & 0x80000000);
+}
+
+int
+test_16 (int a, int b)
+{
+  return (a & 0x80000000) == (b & 0x80000000) ? -20 : -40;
+}
+
+bool
+test_17 (int a, int b)
+{
+  return (a & 0x80000000) != (b & 0x80000000);
+}
+
+int
+test_18 (int a, int b)
+{
+  return (a & 0x80000000) != (b & 0x80000000) ? -20 : -40;
+}
+
+int
+test_19 (unsigned int a, unsigned int b)
+{
+  return (a ^ b) >> 31;
+}
+
+int
+test_20 (unsigned int a, unsigned int b)
+{
+  return (a >> 31) ^ (b >> 31);
+}
+
+int
+test_21 (int a, int b)
+{
+  return ((a & 0x80000000) ^ (b & 0x80000000)) >> 31 ? -30 : -10;
+}
+
+int
+test_22 (int a, int b, int c, int d)
+{
+  if ((a < 0) == (b < 0))
+return other_func_a (a, b);
+  else
+return other_func_b (c, d);
+}
+
+bool
+test_23 (int a, int b, int c, int d)
+{
+  /* Should emit 2x div0s.  */
+  return ((a < 0) == (b < 0)) | ((c < 0) == (d < 0));
+}
Index: gcc/testsuite/gcc.target/sh/pr52933-2.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr52933-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr52933-2.c	(revision 0)
@@ -0,0 +1,12 @@
+/* Check that the div0s instruction is used for integer sign comparisons
+   when -mpretend-cmove is enabled.
+   Each test case is expected to emit at least one div0s insn.
+   Problems when combining the div0s comparison result with surrounding
+   logic usually show up as redundant tst insns.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O2 -mpretend-cmove" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*" } { "" } } */
+/* { dg-final { scan-assembler-times "div0s" 25 } } */
+/* { dg-final { scan-assembler-not "tst" } } */
+
+#include "pr52933-1.c"
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 190332)
+++ gcc/config/sh/sh.md	(working copy)
@@ -801,6 +801,70 @@
   "cmp/pl	%0"
[(set_attr "type" "mt_group")])
 
+;; Some integer sign comparison patterns can be realized with the div0s insn.
+;;	div0s	Rm,Rn		T = (Rm >> 31) ^ (Rn >> 31)
+(define_insn "cmp_div0s_0"
+  [(set (reg:SI T_REG)
+	(lshiftrt:SI (xor:SI (match_operand:SI 0 "arith_reg_operand" "%r")
+

Re: [patch] Fix problems with -fdebug-types-section and local types

2012-08-13 Thread Cary Coutant
> 2012-08-07   Cary Coutant  
>
> gcc/
> * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin
> attribute.
> (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute
> from original DIE.
> (clone_tree_hash): Rename to ...
> (clone_tree_partial): ... this; change callers.  Copy
> DW_TAG_subprogram DIEs as declarations.
>
> gcc/testsuite/
> * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case.
> * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add
> -fdebug-types-section flag.

Ping?

-cary


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Lawrence Crowl
On 8/13/12, Gabriel Dos Reis  wrote:
> On Aug 13, 2012 Marc Glisse  wrote:
> > On Mon, 13 Aug 2012, Jakub Jelinek wrote:
> > > On Sun, Aug 12, 2012 at 11:30:59PM +0200, Marc Glisse wrote:
> > > > > +inline double_int &
> > > > > +double_int::operator ++ ()
> > > > > +{
> > > > > +  *this + double_int_one;
> > > >
> > > > *this += double_int_one;
> > > > would be less confusing.
> > >
> > > Do you mean that *this + double_int_one; alone also works,
> > > just is confusing?  That would mean operator+ has side-effects,
> > > right?
> >
> > It "works" in that it compiles. It is confusing because the
> > addition is dead code and thus operator++ is a nop. Sorry for
> > my confusing euphemism, I should have called it a bug. operator+
> > has no side-effects AFAICS.
>
> yes, it is just as confusing and a bug as
>
> 2.3 + 1;
>
> is in plain C.

Yes, it is a bug.  It's a bit disturbing that it wasn't caught
in bootstrap.

> > Note that there are many obvious places where this operator can be used:
> >
> > varasm.c:  i = double_int_add (i, double_int_one);
> > tree-vrp.c: prod2h = double_int_add (prod2h, double_int_one);
> > tree-ssa-loop-niter.c:bound = double_int_add (bound, double_int_one);
> > tree-ssa-loop-niter.c:  *nit = double_int_add (*nit, double_int_one);
> > tree-ssa-loop-ivopts.c:max_niter = double_int_add (max_niter,
> > double_int_one);
> > gimple-fold.c:index = double_int_add (index, double_int_one);
> >
> > etc.
> >
> > As a side note, I don't usually like making operator+ a
> > member function.  It doesn't matter when there is no implicit
> > conversion, but if we ever add them, it will make addition
> > non-symmetric.
>
> As not everybody is familiar with C++ litotes, let me expand
> on this.  I believe you are not opposing overloading operator+ on
> double_int.  You are objecting to its implementation being defined
> as a member function.  That is you would be perfectly fine with
> operator+ defined as a free function, e.g. not a member function.

In the absence of symmetric overloading, the code is slightly
cleaner with operators as member functions.  It is easy to change
should we need symmetric overloads because, for the most part,
the change would have no effect on client code.

-- 
Lawrence Crowl


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Lawrence Crowl
On 8/12/12, Marc Glisse  wrote:
> On Sun, 12 Aug 2012, Diego Novillo wrote:
> > This implements the double_int rewrite.
> >
> > See http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00711.html for
> > details.
>
> I am taking it as a chance to ask a couple questions about the coding
> conventions.
>
> > 2012-08-12   Lawrence Crowl  
> >
> > * hash-table.h
> > (typedef double_int): Change to struct (POD).
> > (double_int::make): New overloads for int to double-int conversion.
>
> Isn't that double_int::from_* now?

Yes.

> > +typedef struct double_int
> > {
> > [...]
> > } double_int;
>
> Does the coding convention say something about this verbosity?

No.  It helps to have it in code that is compiled by both C and C++.
In this case, it will only be compiled by C++ and the verbosity
is unnecessary.  I left the verbosity as it was to help keep the
diff synchronized.  I certainly don't object to a cleanup pass for
this kind of stuff.

> > +  HOST_WIDE_INT to_signed () const;
> > +  unsigned HOST_WIDE_INT to_unsigned () const;
> > +
> > +  /* Conversion query functions.  */
> > +
> > +  bool fits_unsigned() const;
> > +  bool fits_signed() const;
>
> Space before the parentheses or not?

Space.  Sorry, gcc is the only coding convention I've used that
requires the space.  My fingers sometimes forget.

-- 
Lawrence Crowl


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Lawrence Crowl
On 8/13/12, Richard Guenther  wrote:
> Increment/decrement operations did not exist, please do not add
> them at this point.

Note that I have also added +=, -= and *= operations.  Having them
has three advantages.  First, it matches expectations on what
numeric types allow.  Second, it results in more concise code.
Third, it results in potentially faster code.  I think we should
be able to use those operators.

When I run through changing call sites, I really want to change
the sites to the final form, not do two passes.

-- 
Lawrence Crowl


C++ PR 54197: lifetime of reference not properly extended

2012-08-13 Thread Ollie Wild
This patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54197.

Certain constructs (see bug for examples) cause C++ references to be
initialized with COMPOUND_EXPR's.  The old logic was ignoring these,
causing those temporaries to be prematurely destroyed.

Tested on trunk via full x86_64 bootstrap and testsuite.

Okay for trunk and backport to gcc-4_7-branch?

Ollie


2012-08-13  Ollie Wild  

PR c++/54197
* gcc/cp/call.c (extend_ref_init_temps_1): Handle COMPOUND_EXPR trees.
* gcc/testsuite/g++.dg/init/lifetime3.C: New test.
commit dfd33145e3b32963e03b47bcc89f3eb2912714a6
Author: Ollie Wild 
Date:   Mon Aug 13 15:36:24 2012 -0500

2012-08-13  Ollie Wild  

PR c++/54197
* gcc/cp/call.c (extend_ref_init_temps_1): Handle COMPOUND_EXPR trees.
* gcc/testsuite/g++.dg/init/lifetime3.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 5345f2b..b2fac16 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8924,6 +8924,12 @@ extend_ref_init_temps_1 (tree decl, tree init, 
VEC(tree,gc) **cleanups)
   tree sub = init;
   tree *p;
   STRIP_NOPS (sub);
+  if (TREE_CODE (sub) == COMPOUND_EXPR)
+{
+  TREE_OPERAND(sub, 1) = extend_ref_init_temps_1 (
+   decl, TREE_OPERAND(sub, 1), cleanups);
+  return init;
+}
   if (TREE_CODE (sub) != ADDR_EXPR)
 return init;
   /* Deal with binding to a subobject.  */
diff --git a/gcc/testsuite/g++.dg/init/lifetime3.C 
b/gcc/testsuite/g++.dg/init/lifetime3.C
new file mode 100644
index 000..d099699
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/lifetime3.C
@@ -0,0 +1,37 @@
+// PR c++/26714
+// { dg-do run }
+
+extern "C" void abort();
+
+bool ok = false;
+struct A {
+  A() { }
+  ~A() { if (!ok) abort(); }
+};
+
+struct B {
+  static A foo() { return A(); }
+};
+
+B b_g;
+
+struct scoped_ptr {
+  B* operator->() const { return &b_g; }
+  B* get() const { return &b_g; }
+};
+
+B *get() { return &b_g; }
+
+int main()
+{
+  scoped_ptr f;
+  const A& ref1 = f->foo();
+  const A& ref2 = f.get()->foo();
+  const A& ref3 = get()->foo();
+  const A& ref4 = B::foo();
+  B *pf = f.get();
+  const A& ref5 = pf->foo();
+
+
+  ok = true;
+}


[C++ Patch] for c++/11750

2012-08-13 Thread Fabien Chêne
Hi,

Here, we were setting the LOOKUP_NONVIRTUAL flag wrongly. Actually, we
need to check that the function context is the same as the instance
type -- yes, they can differ in the presence of using-declarations.

It happens that the call worked when invoked through a pointer; that's
because we were failing to determine the dynamic type (in
resolved_fixed_type_p). By contrast, it didn't work when the call was
made through a reference, because there we manage to determine the
dynamic type thanks to a special case in fixed_type_or_null. There is
probably room for improvement here, though I'm not sure the C++ front
end is the best place to devirtualize.

Tested x86_64-unknown-linux-gnu without regressions. OK to commit?

gcc/testsuite/ChangeLog

2012-08-12  Fabien Chêne  

PR c++/11750
* g++.dg/inherit/virtual9.C: New.

gcc/cp/ChangeLog

2012-08-12  Fabien Chêne  

PR c++/11750
* call.c (build_new_method_call_1): Check that the instance type
and the function context are the same before setting the flag
LOOKUP_NONVIRTUAL.


-- 
Fabien


pr11750.patch
Description: Binary data


Re: C++ PR 54197: lifetime of reference not properly extended

2012-08-13 Thread Jakub Jelinek
On Mon, Aug 13, 2012 at 03:47:43PM -0500, Ollie Wild wrote:
> diff --git a/gcc/cp/call.c b/gcc/cp/call.c
> index 5345f2b..b2fac16 100644
> --- a/gcc/cp/call.c
> +++ b/gcc/cp/call.c
> @@ -8924,6 +8924,12 @@ extend_ref_init_temps_1 (tree decl, tree init, 
> VEC(tree,gc) **cleanups)
>tree sub = init;
>tree *p;
>STRIP_NOPS (sub);
> +  if (TREE_CODE (sub) == COMPOUND_EXPR)
> +{
> +  TREE_OPERAND(sub, 1) = extend_ref_init_temps_1 (
> +   decl, TREE_OPERAND(sub, 1), cleanups);
> +  return init;
> +}

The formatting doesn't match GCC coding conventions in several ways.
You don't have spaces before (, and ( shouldn't be at the end of line if
possible.

  TREE_OPERAND (sub, 1)
= extend_ref_init_temps_1 (decl, TREE_OPERAND (sub, 1), cleanups);

is what should be used instead.

Jakub


Re: C++ PR 54197: lifetime of reference not properly extended

2012-08-13 Thread Ollie Wild
On Mon, Aug 13, 2012 at 3:50 PM, Jakub Jelinek  wrote:
>
> The formatting doesn't match GCC coding conventions in several ways.
> You don't have spaces before (, and ( shouldn't be at the end of line if
> possible.

Updated patch attached.

Ollie
commit d023097c555a6f7cb84685fd7befedb550889d2c
Author: Ollie Wild 
Date:   Mon Aug 13 15:36:24 2012 -0500

2012-08-13  Ollie Wild  

PR c++/54197
* gcc/cp/call.c (extend_ref_init_temps_1): Handle COMPOUND_EXPR trees.
* gcc/testsuite/g++.dg/init/lifetime3.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 5345f2b..f3a73af 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8924,6 +8924,12 @@ extend_ref_init_temps_1 (tree decl, tree init, 
VEC(tree,gc) **cleanups)
   tree sub = init;
   tree *p;
   STRIP_NOPS (sub);
+  if (TREE_CODE (sub) == COMPOUND_EXPR)
+{
+  TREE_OPERAND (sub, 1)
+= extend_ref_init_temps_1 (decl, TREE_OPERAND (sub, 1), cleanups);
+  return init;
+}
   if (TREE_CODE (sub) != ADDR_EXPR)
 return init;
   /* Deal with binding to a subobject.  */
diff --git a/gcc/testsuite/g++.dg/init/lifetime3.C 
b/gcc/testsuite/g++.dg/init/lifetime3.C
new file mode 100644
index 000..d099699
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/lifetime3.C
@@ -0,0 +1,37 @@
+// PR c++/26714
+// { dg-do run }
+
+extern "C" void abort();
+
+bool ok = false;
+struct A {
+  A() { }
+  ~A() { if (!ok) abort(); }
+};
+
+struct B {
+  static A foo() { return A(); }
+};
+
+B b_g;
+
+struct scoped_ptr {
+  B* operator->() const { return &b_g; }
+  B* get() const { return &b_g; }
+};
+
+B *get() { return &b_g; }
+
+int main()
+{
+  scoped_ptr f;
+  const A& ref1 = f->foo();
+  const A& ref2 = f.get()->foo();
+  const A& ref3 = get()->foo();
+  const A& ref4 = B::foo();
+  B *pf = f.get();
+  const A& ref5 = pf->foo();
+
+
+  ok = true;
+}


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Diego Novillo
On Mon, Aug 13, 2012 at 5:41 AM, Richard Guenther
 wrote:

>> *this += double_int_one;
>> would be less confusing.
>
> Increment/decrement operations did not exist, please do not add them
> at this point.

But they are going to be used when the call-sites are converted.
There is no point in leaving them out now.


Diego.


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Richard Henderson
On 08/13/2012 01:22 PM, Lawrence Crowl wrote:
>> > yes, it is just as confusing and a bug as
>> >
>> > 2.3 + 1;
>> >
>> > is in plain C.
> Yes, it is a bug.  It's a bit disturbing that it wasn't caught
> in bootstrap.
> 

You'll recall that I pointed it out last time around as well.


r~


Re: Merge C++ conversion into trunk (5/6 - double_int rewrite)

2012-08-13 Thread Lawrence Crowl
On 8/13/12, Richard Henderson  wrote:
> On 08/13/2012 01:22 PM, Lawrence Crowl wrote:
>>> > yes, it is just as confusing and a bug as
>>> >
>>> > 2.3 + 1;
>>> >
>>> > is in plain C.
>> Yes, it is a bug.  It's a bit disturbing that it wasn't caught
>> in bootstrap.
>
> You'll recall that I pointed it out last time around as well.

My memory must be losing bits.

-- 
Lawrence Crowl


Re: C++ PATCH for c++/48707 (c++0x ice on initialization in template)

2012-08-13 Thread H.J. Lu
On Mon, Apr 25, 2011 at 2:53 PM, Jason Merrill  wrote:
> In C++0x we can have an initializer that is potentially constant and yet
> still type-dependent if it involves a call, so we need to handle that.
>
> For 4.7 I'm explicitly testing for type-dependency; for 4.6 I've made a
> smaller change to make value_dependent_expression_p return true for
> type-dependent expressions as well.
>

> commit 6b978a4d2e33771d8fd95f39a301f8df180ac98c
> Author: Jason Merrill 
> Date:   Mon Apr 25 16:44:03 2011 -0400
>
> PR c++/48707
> * pt.c (value_dependent_expression_p): Handle type-dependent
> expression.
>
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index ed48203..fc5177d 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -18068,6 +18068,11 @@ value_dependent_expression_p (tree expression)
>if (DECL_P (expression) && type_dependent_expression_p (expression))
>  return true;
>
> +  /* We shouldn't have gotten here for a type-dependent expression, but
> + let's handle it properly anyway.  */
> +  if (TREE_TYPE (expression) == NULL_TREE)
> +return true;
> +
>switch (TREE_CODE (expression))
>  {
>  case IDENTIFIER_NODE:

Any particular reason why it was only applied to the 4.6 branch?
The same patch also fixes:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53836

on trunk, which is a regression from 4.6.

-- 
H.J.


Re: LEA-splitting improvement patch.

2012-08-13 Thread Uros Bizjak
Hello!

> It is known that LEA splitting is one of the most critical problems
> for Atom processors, and these changes try to improve it through:
> 1.   More aggressive LEA splitting - skip splitting only when the
> split cost exceeds the AGU stall.
> 2.   Reordering the split instructions to get better scheduling -
> use the farthest-defined register for the SET instruction, then add
> the constant offset if any, and finally generate the add instruction.
> This gives +0.5% speedup in geomean for the eembc2.0 suite on Atom.
> All required testing was done - bootstraps for Atom & Core2, make check.
> Note that this fix affects only Atom processors.

IMO, you should test LEA handling changes on x32 Atom, too. With
recent changes, you will get lots of zero_extended addresses through
these functions, so I think it is worth benchmarking on the x32 target.

> ChangeLog:
> 2012-08-08  Yuri Rumyantsev  yuri.s.rumyant...@intel.com
>
> * config/i386/i386-protos.h (ix86_split_lea_for_addr) : Add additional 
> argument.
> * config/i386/i386.md (ix86_splitt_lea_for_addr) : Add additional
> argument curr_insn.
> * config/i386/i386.c (find_nearest_reg-def): New function. Find
> nearest register definition used in address.

(find_nearest_reg_def)

> (ix86_split_lea_for_addr) : Do more aggressive lea splitting and
> instruction reordering to get opportunities for better scheduling.

Please merge entries for ix86_split_lea_for_addr.

@@ -16954,7 +16954,7 @@ ix86_lea_outperforms (rtx insn, unsigned int
regno0, unsigned int regno1,
   /* If there is no use in memory addess then we just check
  that split cost exceeds AGU stall.  */
   if (dist_use < 0)
-return dist_define >= LEA_MAX_STALL;
+return dist_define > LEA_MAX_STALL;

You didn't describe this change in the ChangeLog. Does this also
affect the benchmark speedup?

+/* Return 0 if regno1 def is nearest to insn and 1 otherwise. */

Watch comment formatting and vertical spaces!

+static int
+find_nearest_reg_def (rtx insn, int regno1, int regno2)

IMO, you can return 0, 1 and 2, with 0 when no definition is found
in the BB and 1 or 2 when either of the two regnos is found. Otherwise,
please use bool as the function type.

+   if (insn_defines_reg (regno1, regno1, prev))
+ return 0;
+   else if (insn_defines_reg (regno2, regno2, prev))

Please use INVALID_REGNUM as the second argument in the call to
insn_defines_reg when looking for only one regno definition.

{
- emit_insn (gen_rtx_SET (VOIDmode, target, parts.base));
- tmp = parts.index;
+  rtx tmp1;
+  /* Try to give more opportunities to scheduler -
+ choose operand for move instruction with longer
+ distance from its definition to insn. */

(Hm, I don't think you mean the gcc insn scheduler here.)

+  if (find_nearest_reg_def (insn, regno1, regno2) == 0)
+{
+  tmp = parts.index;  /* choose index for move. */
+  tmp1 = parts.base;
+}
+  else
+   {
+ tmp = parts.base;
+ tmp1 = parts.index;
+   }
+ emit_insn (gen_rtx_SET (VOIDmode, target, tmp));
+  if (parts.disp && parts.disp != const0_rtx)
+ix86_emit_binop (PLUS, mode, target, parts.disp);
+ ix86_emit_binop (PLUS, mode, target, tmp1);
+  return;
}

(Please use tabs instead of spaces in the added code.)

However, this whole new part can be written simply as follows (untested):

{
  rtx tmp1;

  if (find_nearest_reg_def (insn, regno1, regno2) == 0)
tmp1 = parts.base, tmp = parts.index;
  else
tmp1 = parts.index, tmp = parts.base;

  emit_insn (gen_rtx_SET (VOIDmode, target, tmp1));
}

Please see how tmp is handled further down in the function.

- ix86_emit_binop (PLUS, mode, target, tmp);
+ ix86_emit_binop (PLUS, mode, target, tmp);

Please watch accidental whitespace changes.

Uros.


[SH] PR 50751 - Add support for SH2A movu.b and movu.w insns

2012-08-13 Thread Oleg Endo
Hello,

This adds support for the SH2A instructions movu.b and movu.w for
zero-extending mem loads with displacement addressing.
Tested on rev 190332 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

PR target/50751
* config/sh/constraints.md (Sra): New constraint.
* config/sh/predicates.md (simple_mem_operand, 
displacement_mem_operand, zero_extend_movu_operand): New 
predicates.
(zero_extend_operand): Check zero_extend_movu_operand for SH2A.
* config/sh/sh.md (*zero_extendqisi2_disp_mem, 
*zero_extendhisi2_disp_mem): Add new insns and two new related 
peephole2 patterns.

testsuite/ChangeLog:

PR target/50751
* gcc.target/sh/pr50751-8.c: New.
Index: gcc/config/sh/constraints.md
===================================================================
--- gcc/config/sh/constraints.md	(revision 190332)
+++ gcc/config/sh/constraints.md	(working copy)
@@ -49,6 +49,7 @@
 ;;  Sbw: QImode address with 12 bit displacement
 ;;  Snd: address without displacement
 ;;  Sdd: address with displacement
+;;  Sra: simple register address
 ;; W: vector
 ;; Z: zero in any mode
 ;;
@@ -307,3 +308,8 @@
(match_test "GET_MODE (op) == QImode")
(match_test "satisfies_constraint_K12 (XEXP (XEXP (op, 0), 1))")))
 
+(define_memory_constraint "Sra"
+  "A memory reference that uses a simple register addressing."
+  (and (match_test "MEM_P (op)")
+   (match_test "REG_P (XEXP (op, 0))")))
+
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 190332)
+++ gcc/config/sh/sh.md	(working copy)
@@ -4842,6 +4842,88 @@
   "extu.b	%1,%0"
   [(set_attr "type" "arith")])
 
+;; SH2A supports two zero extending load instructions: movu.b and movu.w.
+;; They could also be used for simple memory addresses like @Rn by setting
+;; the displacement value to zero.  However, doing so too early results in
+;; missed opportunities for other optimizations such as post-inc or index
+;; addressing loads.
+;; Although the 'zero_extend_movu_operand' predicate does not allow simple
+;; register addresses (an address without a displacement, index, post-inc),
+;; zero-displacement addresses might be generated during reload, which are
+;; simplified to simple register addresses in turn.  Thus, we have to
+;; provide the Sdd and Sra alternatives in the patterns.
+(define_insn "*zero_extendqisi2_disp_mem"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
+	(zero_extend:SI
+	  (match_operand:QI 1 "zero_extend_movu_operand" "Sdd,Sra")))]
+  "TARGET_SH2A"
+  "@
+	movu.b	%1,%0
+	movu.b	@(0,%t1),%0"
+  [(set_attr "type" "load")
+   (set_attr "length" "4")])
+
+(define_insn "*zero_extendhisi2_disp_mem"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
+	(zero_extend:SI
+	  (match_operand:HI 1 "zero_extend_movu_operand" "Sdd,Sra")))]
+  "TARGET_SH2A"
+  "@
+	movu.w	%1,%0
+	movu.w	@(0,%t1),%0"
+  [(set_attr "type" "load")
+   (set_attr "length" "4")])
+
+;; Convert the zero extending loads in sequences such as:
+;;	movu.b	@(1,r5),r0	movu.w	@(2,r5),r0
+;;	mov.b	r0,@(1,r4)	mov.b	r0,@(1,r4)
+;;
+;; back to sign extending loads like:
+;;	mov.b	@(1,r5),r0	mov.w	@(2,r5),r0
+;;	mov.b	r0,@(1,r4)	mov.b	r0,@(1,r4)
+;;
+;; if the extension type is irrelevant.  The sign extending mov.{b|w} insn
+;; is only 2 bytes in size if the displacement is {K04|K05}.
+;; If the displacement is greater it doesn't matter, so we convert anyways.
+(define_peephole2
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+	(zero_extend:SI (match_operand 1 "displacement_mem_operand" "")))
+   (set (match_operand 2 "general_operand" "")
+	(match_operand 3 "arith_reg_operand" ""))]
+  "TARGET_SH2A
+   && REGNO (operands[0]) == REGNO (operands[3])
+   && peep2_reg_dead_p (2, operands[0])
+   && GET_MODE_SIZE (GET_MODE (operands[2]))
+  <= GET_MODE_SIZE (GET_MODE (operands[1]))"
+  [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
+   (set (match_dup 2) (match_dup 3))])
+
+;; Fold sequences such as
+;;	mov.b	@r3,r7
+;;	extu.b	r7,r7
+;; into
+;;	movu.b	@(0,r3),r7
+;; This does not reduce the code size but the number of instructions is
+;; halved, which results in faster code.
+(define_peephole2
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+	(sign_extend:SI (match_operand 1 "simple_mem_operand" "")))
+   (set (match_operand:SI 2 "arith_reg_dest" "")
+	(zero_extend:SI (match_operand 3 "arith_reg_operand" "")))]
+  "TARGET_SH2A
+   && GET_MODE (operands[1]) == GET_MODE (operands[3])
+   && (GET_MODE (operands[1]) == QImode || GET_MODE (operands[1]) == HImode)
+   && REGNO (operands[0]) == REGNO (operands[3])
+   && (REGNO (operands[2]) == REGNO (operands[0])
+   || peep2_reg_dead_p (2, operands[0]))"
+  [(set (match_dup 2) (zero_extend:SI (match_dup 4)))]
+{
+  operands[4]
+= replace_equiv_address (o

Re: PING [PATCH] Fix PR libstdc++/54036, problem negating DFP NaNs

2012-08-13 Thread Peter Bergner
On 08/03/2012 11:28:57 +0200 Paolo Carlini wrote:
> No problem ;) Patch is Ok, thanks!

Hi Paolo,

I see you committed the patch for me.  Thanks!!  I literally posted
that last patch just minutes before heading out on vacation and didn't
want to commit it just before I left in case it caused any fallout.
Thanks again for your reviews and committing the patch for me!

Peter



[google/gcc-4_7] Backport arm hardfp patch from trunk

2012-08-13 Thread 沈涵
Hi Carrot, could you take a look at this patch? Thanks!

The modification is upstream trunk revision 186859.

The same patch has been backported to google/gcc-4_6
(http://codereview.appspot.com/6206055/); this one is to be applied to
google/gcc-4_7.

Regards,
-Han

2012-08-13  Han Shen  

Backport from mainline.
2012-05-01  Richard Earnshaw  

* arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_DEFAULT): Avoid ifdef
comparing enumeration values.  Update comments.

2012-04-26  Michael Hope  
Richard Earnshaw  

* config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT): Define.
(GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define.
(GLIBC_DYNAMIC_LINKER_DEFAULT): Define.
(GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path.

diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index c0cfde3..142054f 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -32,7 +32,8 @@
   while (false)

 /* We default to a soft-float ABI so that binaries can run on all
-   target hardware.  */
+   target hardware.  If you override this to use the hard-float ABI then
+   change the setting of GLIBC_DYNAMIC_LINKER_DEFAULT as well.  */
 #undef  TARGET_DEFAULT_FLOAT_ABI
 #define TARGET_DEFAULT_FLOAT_ABI ARM_FLOAT_ABI_SOFT

@@ -59,10 +60,25 @@
 #undef  SUBTARGET_EXTRA_LINK_SPEC
 #define SUBTARGET_EXTRA_LINK_SPEC " -m " TARGET_LINKER_EMULATION

-/* Use ld-linux.so.3 so that it will be possible to run "classic"
-   GNU/Linux binaries on an EABI system.  */
+/* GNU/Linux on ARM currently supports three dynamic linkers:
+   - ld-linux.so.2 - for the legacy ABI
+   - ld-linux.so.3 - for the EABI-derived soft-float ABI
+   - ld-linux-armhf.so.3 - for the EABI-derived hard-float ABI.
+   All the dynamic linkers live in /lib.
+   We default to soft-float, but this can be overridden by changing both
+   GLIBC_DYNAMIC_LINKER_DEFAULT and TARGET_DEFAULT_FLOAT_ABI.  */
+
 #undef  GLIBC_DYNAMIC_LINKER
-#define GLIBC_DYNAMIC_LINKER RUNTIME_ROOT_PREFIX "/lib/ld-linux.so.3"
+#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT \
+  RUNTIME_ROOT_PREFIX "/lib/ld-linux.so.3"
+#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT \
+  RUNTIME_ROOT_PREFIX "/lib/ld-linux-armhf.so.3"
+#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT
+
+#define GLIBC_DYNAMIC_LINKER \
+   "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \
+%{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \
+%{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}"

 /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
use the GNU/Linux version, not the generic BPABI version.  */


Re: [C++ Pubnames Patch] Anonymous namespaces enclosed in named namespaces. (issue6343052)

2012-08-13 Thread Sterling Augustine
On Sun, Aug 12, 2012 at 12:46 PM, Jack Howarth  wrote:
> On Sun, Jul 01, 2012 at 09:33:06AM -0500, Gabriel Dos Reis wrote:
>> On Thu, Jun 28, 2012 at 12:50 PM, Sterling Augustine
>>  wrote:
>> > The enclosed patch adds a fix for the pubnames of anonymous namespaces 
>> > contained
>> > within named namespaces, and adds an extensive test for the various 
>> > pubnames.
>> >
>> > The bug is that when printing at verbosity level 1, and lang_decl_name 
>> > sees a
>> > namespace decl in not in the global namespace, it prints the namespace's
>> > enclosing scopes--so far so good. However, the code I added earlier this 
>> > month
>> > to handle anonymous namespaces also prints the enclosing scopes, so one 
>> > would
>> > get foo::foo::(anonymous namespace) instead of foo::(anonymous namespace).
>> >
>> > The solution is to stop the added code from printing the enclosing scope, 
>> > which
>> > is correct for both verbosity levels 0 and 1. Level 2 is handled elsewhere 
>> > and
>> > so not relevant.
>> >
>> > I have formalized the tests I have been using to be sure pubnames are 
>> > correct
>> > and include that in this patch. It is based on ccoutant's 
>> > gdb_index_test.cc from
>> > the gold test suite.
>> >
>> > OK for mainline?
>>
>> OK.
>
> This patch introduces the regressions...
>
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubnames
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler
> "_GLOBAL__sub_I__ZN3one3c1vE0"+[ \t]+[#;]+[ \t]+external name
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubtypes
>
> at -m32/-m64 on x86_64-apple-darwin12...
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54230
>
> I have attached the -m32 assembly generated for the 
> g++.dg/debug/dwarf2/pubnames-2.C
> to PR54230 but haven't been able to add Sterling to the PR as none of his 
> email
> addresses are recognized by bugzilla.
>  Jack

The enclosed patch adjusts the test so it will pass on Darwin. The
issue was that it looked for some ELF-specific assembly directives,
which it shouldn't have.

OK for mainline?

Sterling


2012-08-13  Sterling Augustine  

* g++.dg/debug/dwarf2/pubnames-2.C: Adjust.


darwin.patch
Description: Binary data


Re: [PATCH, MIPS] 74k madd scheduler tweaks

2012-08-13 Thread Maxim Kuvyrkov
On 9/08/2012, at 7:10 AM, Richard Sandiford wrote:

> Hmm, yeah, it does look like they should be using mips_linked_madd_p
> instead, except that mips_linked_madd_p isn't yet wired up to handle
> DSP macs.  Rather than pattern-match them all, the easiest thing would
> probably be to define a new attribute along the lines of:
> 
> (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))
> 
> and use it for the existing imadds too.  E.g.:
> 
> (define_insn "*mul_acc_si"
>  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
>   (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
> (match_operand:SI 2 "register_operand" "d,d"))
>(match_operand:SI 3 "register_operand" "0,d")))
>   (clobber (match_scratch:SI 4 "=X,l"))
>   (clobber (match_scratch:SI 5 "=X,&d"))]
>  "GENERATE_MADD_MSUB && !TARGET_MIPS16"
>  "@
>   madd\t%1,%2
>   #"
>  [(set_attr "type" "imadd")
>   (set_attr "accum_in" "3")
>   (set_attr "mode" "SI")
>   (set_attr "length" "4,8")])
> 
> Then mips_linked_madd_p can use get_attr_accum_in to check for chains.
> 
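The accum_in scheme suggested above can be sketched abstractly. All names here are illustrative stand-ins, not GCC's RTL API; the real check would live in mips.c and use RTL accessors plus `get_attr_accum_in`:

```python
# Abstract model of the accum_in check: a consumer insn is "linked" to a
# producer (forming a madd chain) when the operand named by the consumer's
# accum_in attribute is the register the producer writes.
ACCUM_IN_NONE = None  # corresponds to accum_in "none"

class Insn:
    def __init__(self, dest, operands, accum_in=ACCUM_IN_NONE):
        self.dest = dest            # register written by this insn
        self.operands = operands    # operand list, indexed like match_operand
        self.accum_in = accum_in    # which operand is the accumulator input

def linked_madd_p(producer, consumer):
    """True if consumer's accumulator input is produced by producer."""
    if consumer.accum_in is ACCUM_IN_NONE:
        return False
    return consumer.operands[consumer.accum_in] == producer.dest

# madd chain: lo = a*b + lo, then lo = c*d + lo (operand 3 is the accumulator,
# matching the (set_attr "accum_in" "3") in the *mul_acc_si pattern above).
first = Insn(dest="lo", operands=["lo", "a", "b", "lo"], accum_in=3)
second = Insn(dest="lo", operands=["lo", "c", "d", "lo"], accum_in=3)
print(linked_madd_p(first, second))  # → True
```

The point of the attribute is exactly this: one generic predicate replaces per-pattern RTL pattern-matching for every mac-style instruction, DSP variants included.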

I thought I'd butt in, since I did a very similar thing for sync_memmodel a 
couple of months ago.

Is the attached patch what you have in mind?  It is a standalone change by 
itself and can be checked in if OK.  Sandra can then adjust the DSP patches she's 
working on to use mips_linked_madd_p.

OK to apply if no regressions?

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



accum_in.ChangeLog
Description: Binary data


accum_in.patch
Description: Binary data



Re: [PATCH] Fix up an IRA ICE (PR middle-end/53411, rtl-optimization/53495)

2012-08-13 Thread Vladimir Makarov

On 08/13/2012 02:44 PM, Jakub Jelinek wrote:

> Hi!
>
> move_unallocated_pseudos apparently relies on no insns being deleted
> in between find_moveable_pseudos and itself, which can happen when
> delete_trivially_dead_insns removes dead insns and insns that feed them.
>
> This can be fixed either by moving the delete_trivially_dead_insns
> call earlier in the ira function (is there anything in between those lines
> that could create trivially dead insns?), as done in this patch, or, as
> done in the first patch in the PR, by making move_unallocated_pseudos
> more tolerant of insns being removed.

I cannot imagine what could create trivially dead insns between those
lines, so this solution is OK.

> I've bootstrapped/regtested this on x86_64-linux and i686-linux, ok for
> trunk (or is the patch in the PR preferred, or something else)?

Ok.  Thanks for fixing this bug, Jakub.

2012-08-13  Jakub Jelinek  

PR middle-end/53411
PR rtl-optimization/53495
* ira.c (ira): Move delete_trivially_dead_insns call before
find_moveable_pseudos call.

* gcc.c-torture/compile/pr53411.c: New test.
* gcc.c-torture/compile/pr53495.c: New test.
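The hazard being fixed here is a generic one: a pass that records insn positions breaks if insns are deleted between recording and use. A toy illustration follows; the lists and helpers below are hypothetical stand-ins, not GCC's actual data structures:

```python
# Generic illustration of the PR53411/PR53495 hazard: pass A records
# positions of interesting insns; if dead insns are deleted between
# pass A and pass B, the recorded positions go stale.  Deleting dead
# insns *before* recording (what the patch does) avoids the problem.
def delete_trivially_dead(insns):
    """Drop insns marked dead (stand-in for delete_trivially_dead_insns)."""
    return [i for i in insns if not i.endswith(":dead")]

def record_positions(insns, interesting):
    """Stand-in for find_moveable_pseudos: remember indices of insns."""
    return {i: idx for idx, i in enumerate(insns) if i in interesting}

insns = ["set r1", "use r1:dead", "set r2", "set r3"]
interesting = {"set r2", "set r3"}

# Wrong order: record, then delete -> stale indices.
stale = record_positions(insns, interesting)
cleaned = delete_trivially_dead(insns)
print(cleaned[stale["set r2"]])  # → set r3  (the wrong insn!)

# Fixed order: delete first, then record.
good = record_positions(cleaned, interesting)
print(cleaned[good["set r2"]])  # → set r2
```

The alternative fix in the PR corresponds to making the consumer tolerant of stale records instead of preventing them, which is why either patch resolves the ICE.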






Re: [patch][gcov] Clarify the internals a bit

2012-08-13 Thread Nathan Sidwell

On 07/23/12 02:34, Steven Bosscher wrote:

Hello,

While reading up on how gcov/profiling works, I noticed that there are
a lot of places where the notes file is still referred to as the
"basic block graph" file. Also, the gcov manual has not been updated
for -fprofile-dir. The attached patch addresses these issues, so that
the next gcov newbie hopefully has an easier time understanding how
everything fits together.

Bootstrapped&tested on x86_64-unknown-linux-gnu. OK?


OK. (I've been on vacation and am catching up; sorry for the noise if this has 
already been acked.)


nathan



Re: [EXTERNAL] [Fortran] PR37336 - FIINAL patch [1/n]: Implement the finalization wrapper subroutine

2012-08-13 Thread Rouson, Damian
Hi Tobias,

Thanks for your work on this.  This is a big step.  I would add to your
list the following:

(4) If the entity is of extended type and the parent type has a component
that is finalizable, the parent component's component is finalized.

In ForTrilinos, we need this to happen even when the parent is abstract
but has a finalizable component.  So far, the IBM, NAG, and Cray compilers
support this use case and we've had enough dialogue with committee members
that I'm confident it's required by the standard, although I can't cite
the specific part of the standard that requires it.

Please copy my staff member Karla Morris on any replies.  Thanks again!

Damian


On 8/13/12 1:05 PM, "Tobias Burnus"  wrote:

>Dear all,
>
>Attached is the first part of a patch which will implement finalization
>support and polymorphic freeing in gfortran.
>
>
>It addresses two needs:
>
>a) For polymorphic ("CLASS") variables, allocatable components have to
>be freed; however, at compile time only the allocatable components of
>the declared type are known ­ and the dynamic type might have more
>
>b) Fortran 2003 allows finalization subroutines ("FINAL", destructors),
>which can be elemental, scalar or for a given rank (any array type is
>allowed). Those should be called for DEALLOCATE, leaving the scope
>(unless saved), intrinsic assignment and with intent(out).
>
>
>The finalization is done as follows (F2008, "4.5.6.2 The finalization
>process")
>
>"(1) If the dynamic type of the entity has a final subroutine whose
>dummy argument has the same kind type parameters and rank as the entity
>being finalized, it is called with the entity as an actual argument.
>Otherwise, if there is an elemental final subroutine whose dummy
>argument has the same kind type parameters as the entity being
>finalized, it is called with the entity as an actual argument.
>Otherwise, no subroutine is called at this point.
>
>"(2) All finalizable components that appear in the type definition are
>finalized in a processor-dependent order. If the entity being finalized
>is an array, each finalizable component of each element of that entity
>is finalized separately.
>
>"(3) If the entity is of extended type and the parent type is
>finalizable, the parent component is finalized."
>
>
>The idea is to create a wrapper function which handles those steps - and
>attach a reference to the dynamic type (i.e. add it via proc-pointer to
>the vtable). Additionally, the wrapper can be directly called for TYPE.
>
>
>The attached patch implements the generation of the wrapper subroutine;
>it does not yet implement the actual calls. The wrapper is generated on
>Fortran AST level and creates code similar to
>
>subroutine final_wrapper_for_type_t (array)
>  type(t), intent(inout) :: array(..)
>  integer, pointer :: ptr
>  integer(c_intptr_t) :: i, addr
>
>  select case (rank (array))
>  case (3)
>    call final_rank3 (array)
>  case default
>    do i = 0, size (array) - 1
>      addr = transfer (c_loc (array), addr) + i * STORAGE_SIZE (array)
>      call c_f_pointer (transfer (addr, c_ptr), ptr)
>      call elemental_final (ptr)
>    end do
>  end select
>
>  ! For all noninherited allocatable components, call
>  !   DEALLOCATE(array(:)%comp, stat=ignore)
>  ! scalarized as above
>
>  call final_wrapper_of_parent (array(...)%parent)
>end subroutine final_wrapper_for_type_t
>
>
>Note 1: The call to the parent type requires packing support for
>assumed-rank arrays, which has not yet been implemented (also required
>for TS29113, though not for this usage). That is, without further
>patches, the wrapper will only work for scalars or if the parent has no
>wrapper subroutine.
>
>Note 2: The next step will be to add the calls to the wrapper, starting
>with an explicit DEALLOCATE.
>
>
>I intend to commit the patch, when approved, without allowing FINAL at
>resolution time; that way there is no false impression that finalization
>actually works.
>
>Built and regtested on x86-64-gnu-linux.
>OK for the trunk?
>
>* * *
>
>Note: The patch will break gfortran's OOP ABI. It does so by adding
>"_final" to the virtual table (vtab).
>
>I think breaking the ABI for this functionality is unavoidable. The ABI
>change only affects code which uses the CLASS (polymorphic variables)
>and the issue only raises if one mixes old with new code for the same
>derived type. However, if one does so (e.g. by incomplete
>recompilation), segfaults and similar issues will occur. Hence, I am
>considering bumping the .mod version; that will effectively force a
>recompilation and thus avoid the issue. The down side is that it will
>also break packages (e.g. of Linux distributions) which ship .mod files
>(sorry!). What do you think?
>
>I think it could then be combined with Janus' proc-pointer patch, which
>changes the assembler name of (non-Bind(C)) procedure pointers, declared
>at module level. Again, by forcing recompilation, the .mod version bump
>should ensure that users don't see the ABI breakage. His patch is at
>http://gcc.gnu.org/ml/fortran/2012-04/
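The three-step order in the quoted F2008 4.5.6.2 text (own final subroutine, then finalizable components, then the parent component) can be sketched in a toy model. The type names and the `finalize` driver below are hypothetical and unrelated to gfortran's actual wrapper implementation:

```python
# Toy model of F2008 4.5.6.2: (1) call the entity's own final subroutine,
# (2) finalize finalizable components, (3) finalize the parent component.
class Type:
    def __init__(self, name, final=None, components=(), parent=None):
        self.name = name
        self.final = final              # final subroutine name, if any
        self.components = components    # finalizable component types
        self.parent = parent            # parent type for extended types

def finalize(t, log):
    if t.final:                     # step (1): entity's own final subroutine
        log.append(t.final)
    for comp in t.components:       # step (2): finalizable components
        finalize(comp, log)
    if t.parent:                    # step (3): parent component, last
        finalize(t.parent, log)

leaf = Type("leaf", final="final_leaf")
base = Type("base", final="final_base", components=(leaf,))
ext = Type("ext", final="final_ext", parent=base)

order = []
finalize(ext, order)
print(order)  # → ['final_ext', 'final_base', 'final_leaf']
```

Note that a parent with no final subroutine of its own would still have its components walked in step (2), which is the behavior Damian's point (4) asks for in the abstract-parent case.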

Re: [C++ Pubnames Patch] Anonymous namespaces enclosed in named namespaces. (issue6343052)

2012-08-13 Thread Mike Stump
On Aug 13, 2012, at 4:56 PM, Sterling Augustine wrote:
> The enclosed patch adjusts the test so it will pass on Darwin. The
> issue was that it looked for some ELF-specific assembly directives,
> which it shouldn't have.
> 
> OK for mainline?

Ok.

