date:20140214

Re: [Patch, testsuite]: Allow MicroBlaze .weakext pattern in regex match

2014-02-14 Thread Mike Stump

On Feb 13, 2014, at 10:07 PM, David Holsgrove wrote:
> I've attached a patch to extend the regex pattern to include optional 'ext'
> at the end of '.weak' to match the MicroBlaze weak label '.weakext' in two
> of the g++ test cases.

I don’t feel strongly either way.  I'd like to think weak(_definition)?(ext)?…
is good enough, as this test doesn’t much care beyond that.

spec34 does:

 { dg-final { scan-assembler ".weak(_definition)?\[\t \]*_?_Z2f2IiEvT_"

for example.  Which I think is fairly readable/maintainable.
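
To see what the extended pattern would accept, here is a minimal sketch in C using POSIX extended regular expressions.  It assumes that POSIX ERE behaves like the Tcl regex used by scan-assembler for this simple pattern; the function name matches_weak_directive is made up for illustration and is not part of the testsuite.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains a weak directive for _Z2f2IiEvT_, 0 if it
   does not, and -1 if the pattern fails to compile.  Hypothetical helper,
   for illustration only.  */
int
matches_weak_directive (const char *line)
{
  /* Mirrors the dg-final pattern quoted above plus the optional "ext"
     suffix.  The Tcl-escaped brackets become plain ERE brackets.  */
  static const char pattern[] =
    ".weak(_definition)?(ext)?[\t ]*_?_Z2f2IiEvT_";
  regex_t re;
  int found;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With this pattern, ".weak", ".weak_definition" and ".weakext" lines for the symbol should all match, while an unrelated directive such as ".globl" should not.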

Let’s give others that might disagree with me an opportunity to do so…  I’m 
happy to defer to anyone that has a stronger opinion than mine.  If no one 
steps forward, I’ll ok either way you want to go.

Wearing my hat as darwin/testsuite maintainer.  :-)

[PATCH, nds32] Committed: Minor adjustment to follow GNU coding standards.

2014-02-14 Thread Chung-Ju Wu

Hi,

In the gcc/config/nds32/nds32.c, there is one function definition which
does not follow GNU coding standards:
  http://www.gnu.org/prep/standards/standards.html
  Section "5.1 Formatting Your Source Code"

For a function definition, its function name should start in column one.
Fixed it as obvious, committed as Rev.207774.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 207773)
+++ gcc/ChangeLog   (revision 207774)
@@ -1,3 +1,8 @@
+2014-02-14  Chung-Ju Wu  
+
+   * config/nds32/nds32.c (nds32_naked_function_p): Follow the
+   GNU coding standards.
+
 2014-02-13  Jakub Jelinek  

PR debug/60152


Index: gcc/config/nds32/nds32.c
===
--- gcc/config/nds32/nds32.c(revision 207773)
+++ gcc/config/nds32/nds32.c(revision 207774)
@@ -1445,7 +1445,8 @@
 }

 /* Return true if FUNC is a naked function.  */
-static bool nds32_naked_function_p (tree func)
+static bool
+nds32_naked_function_p (tree func)
 {
   tree t;



Best regards,
jasonwucj

[PATCH, nds32] Committed: Fix typo in comment.

2014-02-14 Thread Chung-Ju Wu

Hi,

There are a few typos in the comments of the nds32 port.
Fixed them as obvious, committed as Rev.207775.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 207774)
+++ gcc/ChangeLog   (revision 207775)
@@ -1,5 +1,10 @@
 2014-02-14  Chung-Ju Wu  

+   * config/nds32/t-mlibs (MULTILIB_OPTIONS): Fix typo in comment.
+   * config/nds32/nds32.c (nds32_merge_decl_attributes): Likewise.
+
+2014-02-14  Chung-Ju Wu  
+
* config/nds32/nds32.c (nds32_naked_function_p): Follow the
GNU coding standards.

Index: gcc/config/nds32/t-mlibs
===
--- gcc/config/nds32/t-mlibs(revision 207774)
+++ gcc/config/nds32/t-mlibs(revision 207775)
@@ -28,7 +28,7 @@
 #   6. -mlittle-endian -mgp-direct
 #   7. -mlittle-endian -mno-gp-direct
 #   8. -mbig-endian -mgp-direct
-#   9. -mlittle-endian -mno-gp-direct
+#   9. -mbig-endian -mno-gp-direct
 #
 # We also define a macro MULTILIB_DEFAULTS in nds32.h that tells the
 # driver program which options are defaults for this target and thus


Index: gcc/config/nds32/nds32.c
===
--- gcc/config/nds32/nds32.c(revision 207774)
+++ gcc/config/nds32/nds32.c(revision 207775)
@@ -3084,7 +3084,7 @@
   combined_attrs = merge_attributes (DECL_ATTRIBUTES (olddecl),
 DECL_ATTRIBUTES (newdecl));

-  /* Sinc newdecl is acutally a duplicate of olddecl,
+  /* Since newdecl is acutally a duplicate of olddecl,
  we can take olddecl for some operations.  */
   if (TREE_CODE (olddecl) == FUNCTION_DECL)
 {


Best regards,
jasonwucj

[PATCH, nds32] Committed: Do not use nreverse() on DECL_ATTRIBUTES(current_function_decl)).

2014-02-14 Thread Chung-Ju Wu

Hi,

I noticed that nreverse() modifies the original tree list in place.
That is undesirable when the argument is the attribute list of the current
function.  Since the only reason for using it in the nds32 port is to display
the attributes in the order the user specified, I think it is fine to stop
using it on the attribute list.
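
The hazard of reversing a shared list in place can be sketched with a plain singly linked list.  This is only an illustration: struct node, reverse_in_place and list_length are hypothetical names, not GCC's tree types or its nreverse() implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a chained list; not GCC's tree type.  */
struct node
{
  int value;
  struct node *next;
};

/* Reverse the list in place by re-pointing each NEXT link and return
   the new head.  No nodes are copied.  */
struct node *
reverse_in_place (struct node *head)
{
  struct node *prev = NULL;
  while (head != NULL)
    {
      struct node *next = head->next;
      head->next = prev;
      prev = head;
      head = next;
    }
  return prev;
}

/* Count the nodes reachable from HEAD.  */
size_t
list_length (const struct node *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}
```

Anyone still holding the old head pointer after the call reaches only a single node, because the old head has become the tail.  That is the kind of side effect one would not want on a list that other code continues to own.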

Bootstrapped and tested on nds32le-elf and nds32be-elf.
Committed as Rev.20.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 207776)
+++ gcc/ChangeLog   (revision 20)
@@ -1,5 +1,10 @@
 2014-02-14  Chung-Ju Wu  

+   * config/nds32/nds32.c (nds32_asm_function_prologue): Do not use
+   nreverse() because it changes the content of original tree list.
+
+2014-02-14  Chung-Ju Wu  
+
* config/nds32/t-mlibs (MULTILIB_OPTIONS): Fix typo in comment.
* config/nds32/nds32.c (nds32_merge_decl_attributes): Likewise.


Index: gcc/config/nds32/nds32.c
===
--- gcc/config/nds32/nds32.c(revision 207776)
+++ gcc/config/nds32/nds32.c(revision 20)
@@ -1960,10 +1960,9 @@

   /* Display the attributes of this function.  */
   fprintf (file, "\t! function attributes: ");
-  /* GCC build attributes list with reverse order,
- so we use nreverse() to make it looks like
- the order that user specifies.  */
-  attrs = nreverse (DECL_ATTRIBUTES (current_function_decl));
+  /* Get the attributes tree list.
+ Note that GCC builds attributes list with reverse order.  */
+  attrs = DECL_ATTRIBUTES (current_function_decl);

   /* If there is no any attribute, print out "None".  */
   if (!attrs)


Best regards,
jasonwucj

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Jakub Jelinek

On Thu, Feb 13, 2014 at 01:22:41PM -0800, Richard Henderson wrote:
> > +/* This is to enable best-effort cleanup after fork.  */
> > +static int gomp_we_are_forked = 0;
> 
> bool, no explicit initialization, possible removal, see below.
> 
> > +static void
> > +gomp_free_thread_pool (int threads_running)
> 
> bool for threads_running.  It looks like a count otherwise.
> 
> > +gomp_after_fork_callback ()
> 
>  (void)
> 
> > +  pthread_atfork (NULL, NULL, &gomp_after_fork_callback);
> 
> & not needed.
> 
> Any reason not to just run gomp_free_thread_pool from gomp_after_fork_callback
> directly?  I see no restrictions on what kind of code is allowed to execute
> during that callback.

Well, fork is an async-signal-safe function, so calling malloc/free or any
kind of synchronization primitive from there is completely unsafe.

The only safe thing might be to atomically OR a bit into some global flag (or
set some TLS flag?) and deal with the freeing the next time you encounter omp
parallel.  But the old thread pool may be left in an inconsistent state.
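
A minimal sketch of the "only set a flag in the child" idea follows.  It is not the libgomp patch: pool_is_stale, mark_pool_stale_in_child, install_fork_hook and consume_stale_flag are hypothetical names, and it assumes atomic_bool is lock-free on the target so that the store is acceptable in an async-signal-safe context.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set in the child after fork, consumed later.  */
static atomic_bool pool_is_stale;

/* Child-side atfork handler.  It only performs an atomic store:
   no malloc/free, no locks.  */
void
mark_pool_stale_in_child (void)
{
  atomic_store_explicit (&pool_is_stale, true, memory_order_relaxed);
}

/* Register the handler.  Returns 0 on success, like pthread_atfork.  */
int
install_fork_hook (void)
{
  return pthread_atfork (NULL, NULL, mark_pool_stale_in_child);
}

/* To be called at a safe point (e.g. before starting a new team):
   returns true once per fork and clears the flag.  */
bool
consume_stale_flag (void)
{
  return atomic_exchange (&pool_is_stale, false);
}
```

The actual cleanup would run where consume_stale_flag returns true.  As noted above, the old pool may still be in an inconsistent state at that point, so this only postpones the problem.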

Jakub

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 09:21:24AM +0100, Jakub Jelinek wrote:
> Well, fork is async signal safe function, so calling malloc/free, or any
> kind of synchronization primitives is completely unsafe there.
> 
> The only safe thing could be to atomically or in some global flag (or set
> some TLS flag?) and deal with the freeing next time you encounter omp
> parallel.  But, the state of the old thread pool may be in some inconsistent
> shape.

BTW, I think a far cleaner solution would be to discuss this on Omp-lang and
add some standard omp_* function that allows throwing away all the cached
OpenMP threads; after calling that function one could not assume that
threadprivate vars (other than in the initial thread) preserve their values.
If this function were only allowed outside of a parallel region (i.e.
if omp_in_parallel () == 0, or even just if omp_get_level () == 0) and
pretended to do
#pragma omp parallel num_threads (1)
;
i.e. something after which threadprivate vars aren't guaranteed to be
preserved, then the library could perform this at a point where it is safe to
do so (of course it wouldn't be an async-signal-safe function) and where it
isn't a performance issue (calling it when you expect to launch another
#pragma omp parallel soon could of course slow things down a lot).

Anything else is going to be either unsafe, or leak memory.

Jakub

Re: Fix PR libffi/60073

2014-02-14 Thread Alan Modra

On Thu, Feb 13, 2014 at 05:18:10PM +0100, Eric Botcazou wrote:
> This adds proper variadic support to the SPARC port of libffi, thus fixing a 
> regression in the testsuite in 64-bit mode, and fixes a small inaccuracy in 
> the documentation.
> 
> Tested on SPARC/Solaris and SPARC64/Solaris, applied on the mainline.

> +ffi_status ffi_prep_cif_machdep(ffi_cif *cif)
> +{
> +  cif->nfixedargs = cif->nargs;
> +  return ffi_prep_cif_machdep_core (cif);
> +}

Eric, sorry to rain on your parade, but you're making the same mistake
I did at first when adding powerpc64le support to libffi.  libffi is
built as a shared library.  You can't add a field to ffi_cif like this
and unconditionally write to it:  An application linked against an older
version of libffi will only allocate the old size ffi_cif.  Your new
shared library will trash some random location in the old user app.
I worked around this problem on powerpc by defining new enum ffi_abi
values so that you can recognize an old app.
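
The workaround can be sketched as follows.  This is an illustration under stated assumptions, not libffi's real layout: demo_abi, demo_cif_v1, demo_cif_v2 and demo_prep_cif are hypothetical names standing in for ffi_abi, the old and new ffi_cif, and the machdep hook.

```c
#include <stddef.h>

/* Hypothetical ABI tags: old applications only ever pass DEMO_ABI_OLD.  */
enum demo_abi { DEMO_ABI_OLD = 1, DEMO_ABI_NEW = 2 };

/* Layout that old applications were compiled against.  */
struct demo_cif_v1
{
  enum demo_abi abi;
  unsigned nargs;
};

/* New layout: the old one plus a trailing field.  */
struct demo_cif_v2
{
  struct demo_cif_v1 base;
  unsigned nfixedargs;
};

/* Library-side hook.  It writes the new field only when the ABI tag
   proves the caller allocated the larger structure.  */
void
demo_prep_cif (struct demo_cif_v1 *cif)
{
  if (cif->abi == DEMO_ABI_NEW)
    {
      struct demo_cif_v2 *ext = (struct demo_cif_v2 *) cif;
      ext->nfixedargs = cif->nargs;
    }
}
```

An old application passes a demo_cif_v1-sized object tagged DEMO_ABI_OLD, so nothing beyond its allocation is touched.  Only callers built against the new header can produce DEMO_ABI_NEW.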

-- 
Alan Modra
Australia Development Lab, IBM

[PATCH] Fix c-c++-common/ubsan/overflow-negate-2.c

2014-02-14 Thread Bernd Edlinger

Hi,

This test case fails on ARM because plain char is unsigned by default on
this target.

Attached please find my proposed (almost obvious) fix for this:
use signed char instead of plain char.
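
The test itself is not shown in this message, so the following is only a sketch of the underlying portability point: whether plain char is signed is implementation-defined, while signed char behaves the same everywhere.  The function names are made up for illustration.

```c
#include <limits.h>

/* Nonzero when plain char is signed on this target, zero when it is
   unsigned (as it is by default on ARM).  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Negate a value that is explicitly signed char.  The operand is
   promoted to int first, so the result is the same on every target.  */
int
negate_signed_char (signed char c)
{
  return -c;
}
```

A test that relies on negative char values would therefore spell the type as signed char rather than depend on the target default.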


Bootstrapped and tested on x86_64 and ARM.


Thanks
Bernd.

patch-overflow-negate-2.diff
Description: Binary data

Re: Fix PR libffi/60073

2014-02-14 Thread Eric Botcazou

> Eric, sorry to rain on your parade, but you're making the same mistake
> I did at first when adding powerpc64le support to libffi.  libffi is
> built as a shared library.  You can't add a field to ffi_cif like this
> and unconditionally write to it:  An application linked against an older
> version of libffi will only allocate the old size ffi_cif.  Your new
> shared library will trash some random location in the old user app.

OK, I didn't realize that libffi was built as a shared library.  That seems a 
little strange and inconvenient for a glue library.

> I worked around this problem on powerpc by defining new enum ffi_abi
> values so that you can recognize an old app.

I see, thanks for the heads up and the hint!

-- 
Eric Botcazou

[PATCH] Fix PR60179 - do not LTO stream DECL_FUNCTION_SPECIFIC_TARGET

2014-02-14 Thread Richard Biener


This removes streaming of cl_target_option (we can't stream
pointers in it).  The info was redundant given that we do stream
the target attribute itself.  So the following patch re-builds
DECL_FUNCTION_SPECIFIC_TARGET at tree loading time (we need it
during WPA inline analysis as well).

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

(and yes, this fixes the libcpp/lex.o miscompare I was seeing
with LTO bootstrap)

The target hook implementations never use the 'name' or 'flags'
arguments so I wonder if we should change its signature.  That
makes the values I pass to those args less arbitrary ;)

Thanks,
Richard.

2014-02-14  Richard Biener  

PR lto/60179
* lto-streamer-out.c (DFS_write_tree_body): Do not follow
DECL_FUNCTION_SPECIFIC_TARGET.
(hash_tree): Do not hash DECL_FUNCTION_SPECIFIC_TARGET.
* tree-streamer-out.c (pack_ts_target_option): Remove.
(streamer_pack_tree_bitfields): Do not stream
TS_TARGET_OPTION.
(write_ts_function_decl_tree_pointers): Do not stream
DECL_FUNCTION_SPECIFIC_TARGET.
* tree-streamer-in.c (unpack_ts_target_option): Remove.
(unpack_value_fields): Do not stream TS_TARGET_OPTION.
(lto_input_ts_function_decl_tree_pointers): Do not stream
DECL_FUNCTION_SPECIFIC_TARGET.

lto/
* lto.c (compare_tree_sccs_1): Do not compare
DECL_FUNCTION_SPECIFIC_TARGET.
(lto_read_decls): Re-build DECL_FUNCTION_SPECIFIC_TARGET.

Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c  (revision 207756)
--- gcc/lto-streamer-out.c  (working copy)
*** DFS_write_tree_body (struct output_block
*** 550,556 
if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
  {
DFS_follow_tree_edge (DECL_FUNCTION_PERSONALITY (expr));
!   DFS_follow_tree_edge (DECL_FUNCTION_SPECIFIC_TARGET (expr));
DFS_follow_tree_edge (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr));
  }
  
--- 550,556 
if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
  {
DFS_follow_tree_edge (DECL_FUNCTION_PERSONALITY (expr));
!   /* Do not DECL_FUNCTION_SPECIFIC_TARGET.  They will be regenerated.  */
DFS_follow_tree_edge (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr));
  }
  
*** hash_tree (struct streamer_tree_cache_d
*** 885,891 
strlen (TRANSLATION_UNIT_LANGUAGE (t)), v);
  
if (CODE_CONTAINS_STRUCT (code, TS_TARGET_OPTION))
! v = iterative_hash (t, sizeof (struct cl_target_option), v);
  
if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
  v = iterative_hash (t, sizeof (struct cl_optimization), v);
--- 885,891 
strlen (TRANSLATION_UNIT_LANGUAGE (t)), v);
  
if (CODE_CONTAINS_STRUCT (code, TS_TARGET_OPTION))
! gcc_unreachable ();
  
if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
  v = iterative_hash (t, sizeof (struct cl_optimization), v);
*** hash_tree (struct streamer_tree_cache_d
*** 986,992 
if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
  {
visit (DECL_FUNCTION_PERSONALITY (t));
!   visit (DECL_FUNCTION_SPECIFIC_TARGET (t));
visit (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (t));
  }
  
--- 986,992 
if (CODE_CONTAINS_STRUCT (code, TS_FUNCTION_DECL))
  {
visit (DECL_FUNCTION_PERSONALITY (t));
!   /* Do not follow DECL_FUNCTION_SPECIFIC_TARGET.  */
visit (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (t));
  }
  
Index: gcc/tree-streamer-out.c
===
*** gcc/tree-streamer-out.c (revision 207756)
--- gcc/tree-streamer-out.c (working copy)
*** pack_ts_translation_unit_decl_value_fiel
*** 353,376 
bp_pack_string (ob, bp, TRANSLATION_UNIT_LANGUAGE (expr), true);
  }
  
- /* Pack a TS_TARGET_OPTION tree in EXPR to BP.  */
- 
- static void
- pack_ts_target_option (struct bitpack_d *bp, tree expr)
- {
-   struct cl_target_option *t = TREE_TARGET_OPTION (expr);
-   unsigned i, len;
- 
-   /* The cl_target_option is target specific and generated by the options
-  awk script, so we just recreate a byte-by-byte copy here. */
- 
-   len = sizeof (struct cl_target_option);
-   for (i = 0; i < len; i++)
- bp_pack_value (bp, ((unsigned char *)t)[i], 8);
-   /* Catch struct size mismatches between reader and writer. */
-   bp_pack_value (bp, 0x12345678, 32);
- }
- 
  /* Pack a TS_OPTIMIZATION tree in EXPR to BP.  */
  
  static void
--- 353,358 
*** streamer_pack_tree_bitfields (struct out
*** 481,487 
  pack_ts_translation_unit_decl_value_fields (ob, bp, expr);
  
if (CODE_CONTAINS_STRUCT (code, TS_TARGET_OPTION))
! pack_ts_target_option (bp, expr);
  
if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
  pack_ts_optimization (bp, expr);
--- 463,469

Re: [PATCH] Fix c-c++-common/ubsan/overflow-negate-2.c

2014-02-14 Thread Richard Earnshaw

On 14/02/14 09:57, Bernd Edlinger wrote:
> Hi,
> 
> this test case fails on ARM, because this target has by default unsigned char 
> type.
> 
> Attached please find my proposed (almost obvious) fix for this,
> by using signed char, instead of char alone.
> 
> 
> Boot-Strapped and tested on X86_64 and ARM.
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-overflow-negate-2.diff
> 

OK.

R.

Re: std::regex_replace behaviour (LWG DR 2213)

2014-02-14 Thread Paolo Carlini

.. I think it would be cleaner to have new, separate testcases, named 
after 2213. This is what we always did in the past when we implemented 
resolutions of DRs.


At minimum, refer to 2213 in a comment.

Paolo.

Re: RFA: one more version of patch for PR59535

2014-02-14 Thread Richard Earnshaw

On 13/02/14 15:10, Richard Earnshaw wrote:
> On 11/02/14 19:43, Vladimir Makarov wrote:
>>   This is one more version of the patch to fix the PR59535
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535
>>
>>   Here are the results of applying the patch:
>>
>>                         Thumb    Thumb2
>>
>> reload                  2626334  2400154
>> lra (before the patch)  2665749  2414926
>> lra (after the patch)   2626334  2397132
>>
>>
>> I already wrote that the change in arm.h is to prevent reloading sp as
>> an address by LRA. Reload has no such problem as it uses legitimate
>> address hook and LRA mostly relies on base_reg_class.
>>
>> Richard, I need an approval for this change.
>>
>> 2014-02-11  Vladimir Makarov  
>>
>> PR rtl-optimization/59535
>> * lra-constraints.c (process_alt_operands): Encourage alternative
>> when unassigned pseudo class is superset of the alternative class.
>> (inherit_reload_reg): Don't inherit when optimizing for code size.
>> * config/arm/arm.h (MODE_BASE_REG_CLASS): Return CORE_REGS for
>> Thumb2 and BASE_REGS for modes not less than 4 for LRA.
> 
> 
>> Index: config/arm/arm.h
>> ===
>> --- config/arm/arm.h (revision 207562)
>> +++ config/arm/arm.h (working copy)
>> @@ -1272,8 +1272,10 @@ enum reg_class
>> when addressing quantities in QI or HI mode; if we don't know the
>> mode, then we must be conservative.  */
>>  #define MODE_BASE_REG_CLASS(MODE)   \
>> -(TARGET_ARM || (TARGET_THUMB2 && !optimize_size) ? CORE_REGS :  \
>> - (((MODE) == SImode) ? BASE_REGS : LO_REGS))
>> +(TARGET_ARM || (TARGET_THUMB2 && (!optimize_size || arm_lra_flag))  \
>> + ? CORE_REGS : ((MODE) == SImode                                    \
>> +|| (arm_lra_flag && GET_MODE_SIZE (MODE) >= 4)  \
>> +? BASE_REGS : LO_REGS))
>>  
>>  /* For Thumb we can not support SP+reg addressing, so we return LO_REGS
>> instead of BASE_REGS.  */
>>
> 
> Awesome.  Thanks, Vladimir.
> 
> I find that while I can't convince myself that the logic in the change
> to MODE_BASE_REG_CLASS is wrong, it's very hard to follow.  Furthermore,
> when we come to rip out the old reload code it will be quite prone to
> getting this wrong.  I think restructuring this along the lines of:
> 
> #define MODE_BASE_REG_CLASS(MODE)
>   (arm_lra_flag
>? (TARGET_32BIT ? CORE_REGS
>   : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS
>   : LO_REGS)
>: ((TARGET_ARM || (TARGET_THUMB2 && !optimize_size)) ? CORE_REGS
>   : ((MODE) == SImode) ? BASE_REGS
>   : LO_REGS))
> 
> Is both easier to understand and easier to simplify later when reload
> goes away.
> 
> I'll run a regression test on this and let you know the results.
> 
> R.
> 

This version of the arm.h patch survives testing.  Please can you use
this in place of your version.

Thanks,

R.
--- arm.h   (revision 207778)
+++ arm.h   (local)
@@ -1272,8 +1272,13 @@ enum reg_class
when addressing quantities in QI or HI mode; if we don't know the
mode, then we must be conservative.  */
 #define MODE_BASE_REG_CLASS(MODE)  \
-(TARGET_ARM || (TARGET_THUMB2 && !optimize_size) ? CORE_REGS :  \
- (((MODE) == SImode) ? BASE_REGS : LO_REGS))
+  (arm_lra_flag                                                \
+   ? (TARGET_32BIT ? CORE_REGS \
+  : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS  \
+  : LO_REGS)   \
+   : ((TARGET_ARM || (TARGET_THUMB2 && !optimize_size)) ? CORE_REGS\
+  : ((MODE) == SImode) ? BASE_REGS \
+  : LO_REGS))
 
 /* For Thumb we can not support SP+reg addressing, so we return LO_REGS
instead of BASE_REGS.  */
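
The two formulations of MODE_BASE_REG_CLASS can be compared with a small stand-alone model.  This is a sketch under assumptions, not GCC code: the enums and function names are hypothetical, TARGET_32BIT is modelled as "ARM or Thumb2", and SImode is modelled as a mode of size 4.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real register classes and targets.  */
enum reg_class_demo { LO_REGS_DEMO, BASE_REGS_DEMO, CORE_REGS_DEMO };
enum target_demo { DEMO_ARM, DEMO_THUMB1, DEMO_THUMB2 };

/* Models the first (flat) version of the macro.  */
enum reg_class_demo
class_flat (enum target_demo t, bool optimize_size, bool lra,
            bool is_simode, int mode_size)
{
  bool arm = t == DEMO_ARM, thumb2 = t == DEMO_THUMB2;
  if (arm || (thumb2 && (!optimize_size || lra)))
    return CORE_REGS_DEMO;
  return (is_simode || (lra && mode_size >= 4))
         ? BASE_REGS_DEMO : LO_REGS_DEMO;
}

/* Models the restructured version, split on the LRA flag first.  */
enum reg_class_demo
class_split (enum target_demo t, bool optimize_size, bool lra,
             bool is_simode, int mode_size)
{
  bool arm = t == DEMO_ARM, thumb2 = t == DEMO_THUMB2;
  bool target_32bit = arm || thumb2;   /* Assumed meaning.  */
  if (lra)
    return target_32bit ? CORE_REGS_DEMO
           : mode_size >= 4 ? BASE_REGS_DEMO : LO_REGS_DEMO;
  return (arm || (thumb2 && !optimize_size)) ? CORE_REGS_DEMO
         : is_simode ? BASE_REGS_DEMO : LO_REGS_DEMO;
}

/* Exhaustively compare both models over the modelled input space.  */
bool
versions_agree (void)
{
  static const int sizes[] = { 1, 2, 4, 8 };
  for (int t = DEMO_ARM; t <= DEMO_THUMB2; t++)
    for (int os = 0; os < 2; os++)
      for (int lra = 0; lra < 2; lra++)
        for (int s = 0; s < 4; s++)
          for (int si = 0; si < 2; si++)
            {
              if (si && sizes[s] != 4)
                continue;   /* SImode is assumed to have size 4.  */
              if (class_flat ((enum target_demo) t, os, lra, si, sizes[s])
                  != class_split ((enum target_demo) t, os, lra, si, sizes[s]))
                return false;
            }
  return true;
}
```

Under those assumptions both forms select the same class for every input, which matches the claim that the restructuring changes readability rather than behaviour.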

Re: [PATCH, testsuite] Fix profile test failures

2014-02-14 Thread Richard Sandiford

Steve Ellcey  writes:
> On Thu, 2014-02-13 at 23:09 +, Joseph S. Myers wrote:
>> On Thu, 13 Feb 2014, Steve Ellcey  wrote:
>> 
>> > While testing the C++ profiling tests in g++.dg/bprob and using the
>> > qemu simulator we discovered that these tests were passing when we ran
>> > the testsuite with no extra options but that if we specified some options
>> > on the testsuite run then the tests would fail with this message in the
>> > c++.log file:
>> > 
>> > rsh: Could not resolve hostname multi-sim/-EL: Name or service not known
>> 
>> That means your board file is buggy.  If rsh is not the right way to 
>> access your target system, you need to implement the board file methods in 
>> some way other than rsh (possibly some operations should be no-ops, or do 
>> something directly on the build system, if you have a shared filesystem).
>
> I thought the bug was that it was using 'multi-sim/-EL' instead of just
> 'multi-sim'.  I.e.  I thought that target was a combination of where the
> test was run and what options were used, whereas host was just going to
> be where the test was run.  I guess I was wrong about that.

Using target in itself should be OK.  The hostname for rsh/ssh should be
[board_info $board hostname] rather than $board itself.  So in this case
[board_info "multi-sim/-EL" hostname] should be multi-sim.  The usual way
to set that up is to put:

set_board_info hostname multi-sim

in multi-sim.exp.

Thanks,
Richard

[patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

Hi,

ChangeLog

2014-02-14  Kai Tietz  

PR target/60193
* config/i386/i386.c (ix86_expand_prologue): Use
rax register as displacement for restoring %r10, %eax.

Regression-tested for x86_64-unknown-linux-gnu, x86_64-w64-mingw32,
and i686-w64-mingw32.  OK to apply?

Regards,
Kai

Index: i386.c
===
--- i386.c    (revision 207686)
+++ i386.c    (working copy)
@@ -11084,17 +11084,20 @@ ix86_expand_prologue (void)
  works for realigned stack, too.  */
   if (r10_live && eax_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  rtx eax = gen_rtx_REG (word_mode, AX_REG);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
   gen_frame_mem (word_mode, t));
-  t = plus_constant (Pmode, stack_pointer_rtx,
- allocate - UNITS_PER_WORD);
+  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
+  emit_move_insn (eax, t);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
   gen_frame_mem (word_mode, t));
 }
   else if (eax_live || r10_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  rtx eax = gen_rtx_REG (word_mode, AX_REG);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode,
(eax_live ? AX_REG : R10_REG)),
   gen_frame_mem (word_mode, t));

Re: [PATCH] Fix PR60179 - do not LTO stream DECL_FUNCTION_SPECIFIC_TARGET

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 11:51:43AM +0100, Richard Biener wrote:
> This removes streaming of cl_target_option (we can't stream
> pointers in it).  The info was redundant given that we do stream
> the target attribute itself.  So the following patch re-builds
> DECL_FUNCTION_SPECIFIC_TARGET at tree loading time (we need it
> during WPA inline analysis as well).
> 
> LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
> 
> (and yes, this fixes the libcpp/lex.o miscompare I was seeing
> with LTO bootstrap)
> 
> The target hook implementations never use the 'name' or 'flags'
> arguments so I wonder if we should change its signature.  That
> makes the values I pass to those args less arbitrary ;)

Looks good.

Jakub

RFA: RL78: Add missing instruction patterns

2014-02-14 Thread Nick Clifton

Hi DJ,

  The patch below adds some missing instruction patterns to the RL78
  backend.  Missing in the sense that gcc generates the RTL even
  if the patterns are not present in the backend and then triggers an
  ICE because they cannot be matched.  It is not clear to me why this
  should be happening, but adding the patterns was the easiest fix.

  Applying the patch resolves these tests in the gcc testsuite, and does
  not introduce any regressions.

gcc.c-torture/execute/20020226-1.c
gcc.c-torture/execute/20020508-1.c
gcc.c-torture/execute/pr57321.c
gcc.c-torture/unsorted/bf.c
gcc.dg/20050922-1.c

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2014-02-14  Nick Clifton  

* config/rl78/rl78-expand.md (xorhi3): New pattern.
* config/rl78/rl78-virt.md (andhi3_virt): New pattern.
(nandhi3_virt): New pattern.
(xorhi3_virt): New pattern.
* config/rl78/rl78-real.md (andhi3_real): New pattern.
(nandhi3_real): New pattern.
(xorhi3_real): New pattern.

Index: gcc/config/rl78/rl78-expand.md
===
--- gcc/config/rl78/rl78-expand.md  (revision 207762)
+++ gcc/config/rl78/rl78-expand.md  (working copy)
@@ -304,3 +304,15 @@
   "1"
   "rl78_expand_compare (operands);"
 )
+
+(define_expand "xorhi3"
+  [(set (match_operand:HI 0 "register_operand")
+   (xor:HI (match_operand:HI 1 "register_operand")
+   (match_operand:HI 2 "nonmemory_operand")))
+   ]
+  ""
+  "if (GET_CODE (operands[2]) == SYMBOL_REF)
+ operands[2] = force_reg (HImode, operands[2]);
+   if (rl78_force_nonfar_3 (operands, gen_xorhi3))
+ DONE;"
+)
Index: gcc/config/rl78/rl78-real.md
===
--- gcc/config/rl78/rl78-real.md(revision 207762)
+++ gcc/config/rl78/rl78-real.md(working copy)
@@ -549,3 +576,34 @@
   [(set (reg:QI A_REG) (and:QI (reg:QI A_REG) (match_dup 1)))]
   )
 
+(define_insn "*andhi3_real"
+  [(set (match_operand:HI 0 "register_operand"  "=Av")
+   (and:HI (match_operand:HI 1 "register_operand"  "0")
+   (match_operand:HI 2 "immediate_operand" "n")))
+   ]
+  "rl78_real_insns_ok ()"
+  "and\t%q0, %q2 \; and\t%Q0, %Q2"
+)
+
+(define_insn "*nandhi3_real"
+  [(set (match_operand:HI 0 "register_operand"  "=A")
+   (and:HI (neg:HI (match_operand:HI 1 "register_operand"  "0"))
+   (match_operand:HI 2 "immediate_operand" "n")))
+   ]
+  "rl78_real_insns_ok ()"
+  "xor a, #0xff @ xch a, x @ xor a, #0xff @ xch a, x @ addw ax, #1 @ and a, %Q2 @ xch a, x @ and a, %q2 @ xch a, x"
+)
+
+;; Necessary because GCC insists upon being able to perform binary
+;; operations upon pointers.  Failure to provide these patterns
+;; results in GCC generating illegal subregs, eg: (SUBREG:QI (REG:HI 33) 1)
+
+(define_insn "*xorhi3_real"
+  [(set (match_operand:HI 0 "register_operand"   "=A")
+   (xor:HI (match_operand:HI 1 "register_operand"   "0")
+   (match_operand:HI 2 "nonmemory_operand"  "ABDTn")))
+   ]
+  "rl78_real_insns_ok ()"
+  "xor a, %Q2 \; xch a, x \; xor a, %q2 \; xch a, x"
+)
+
Index: gcc/config/rl78/rl78-virt.md
===
--- gcc/config/rl78/rl78-virt.md(revision 207762)
+++ gcc/config/rl78/rl78-virt.md(working copy)
@@ -405,3 +405,34 @@
]
   "rl78_setup_peep_movhi (operands);"
   )
+
+(define_insn "*andhi3_virt"
+  [(set (match_operand:HI 0 "register_operand" "=v")
+   (and:HI (match_operand:HI 1 "register_operand"  "0")
+   (match_operand:HI 2 "immediate_operand" "n")))
+   ]
+  "rl78_virt_insns_ok ()"
+  "v.and\t%0, %1, %2"
+)
+
+(define_insn "*nandhi3_virt"
+  [(set (match_operand:HI 0 "register_operand" "=v")
+   (and:HI (neg:HI (match_operand:HI 1 "register_operand"  "0"))
+   (match_operand:HI 2 "immediate_operand" "n")))
+   ]
+  "rl78_virt_insns_ok ()"
+  "v.nand\t%0, %1, %2"
+)
+
+;; Necessary because GCC insists upon being able to perform binary
+;; operations upon pointers.  Failure to provide these patterns
+;; results in GCC generating illegal subregs, eg: (SUBREG:QI (REG:HI 33) 1)
+
+(define_insn "*xorhi3_virt"
+  [(set (match_operand:HI 0 "register_operand" "=v")
+   (xor:HI (match_operand:HI 1 "register_operand"  "0")
+   (match_operand:HI 2 "nonmemory_operand" "vn")))
+   ]
+  "rl78_virt_insns_ok () && GET_CODE (operands[2]) != SYMBOL_REF"
+  "v.xor.hi\t%0, %1, %2"
+)

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Uros Bizjak

Hello!

> 2014-02-14  Kai Tietz  
>
> PR target/60193
> * config/i386/i386.c (ix86_expand_prologue): Use
> rax register as displacement for restoring %r10, %eax.
>
> Regression-tested for x86_64-unknown-linux-gnu, and
> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?

No, you should check whether allocate satisfies x86_64_immediate_operand and
put it into a temporary register only if it does not.  There is no need to
always force the constant into a temporary.
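
For a plain integer constant, the check being asked for essentially comes down to whether the value fits in a sign-extended 32-bit immediate.  The helper below is a simplified, hypothetical sketch of that one case; the real x86_64_immediate_operand predicate also handles other operand kinds and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE can be encoded as a 32-bit immediate that the
   CPU sign-extends to 64 bits.  Hypothetical helper for illustration.  */
bool
fits_sign_extended_imm32 (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}
```

Only when such a check fails would the constant need to be loaded into a temporary register before being added to the stack pointer.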

Uros.

Re: [PATCH, WWW] [AVX-512] Add news about AVX-512.

2014-02-14 Thread Kirill Yukhin

Hello Gerald,
Thanks for your input.
Updated patch at the bottom.

On 12 Feb 01:12, Gerald Pfeifer wrote:
> Is there an option to enable all of them together?
Not yet, since we have no product on the market and
everything may still change.

If there are no objections, I'll commit it tomorrow.

--
Thanks, K

Index: htdocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.903
diff -p -r1.903 index.html
*** htdocs/index.html   3 Feb 2014 10:02:51 -   1.903
--- htdocs/index.html   14 Feb 2014 12:53:55 -
*** mission statement.
*** 53,58 
--- 53,69 

  

+ Intel AVX-512 support
+ [2014-02-14]
+ Intel AVX-512 support was added to GCC.  That includes inline assembly
+   support for inline assembly, new registers and extending existing ones,
+   new intrinsics, and basic autovectorization.
+   Code was contributed by Sergey Guriev, Alexander Ivchenko,
+   Maxim Kuznetsov, Sergey Lega, Anna Tikhonova, Ilya Tocar,
+   Andrey Turetskiy, Ilya Verbin, Kirill Yukhin and
+   Michael Zolotukhin of Intel, Corp. 
+ 
+
  Altera Nios II support
  [2013-12-31]
  A port for Altera Nios II has been contributed by Mentor 
Graphics.
Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.54
diff -p -r1.54 changes.html
*** htdocs/gcc-4.9/changes.html 28 Jan 2014 23:57:49 -  1.54
--- htdocs/gcc-4.9/changes.html 14 Feb 2014 12:53:55 -
*** auto incr = [](auto x) { return x++; };
*** 387,392 
--- 387,401 

  IA-32/x86-64

+ Intel AVX-512 support was added to GCC.  That includes inline assembly
+   support for inline assembly, new registers and extending existing ones,
+   new intrinsics (covered by corresponding testsuite), and basic
+   autovectorization.  AVX-512 instructions are available via
+   the following GCC switches: AVX-512 foundamental instructions:
+   -mavx512f, AVX-512 prefetch instructions: 
-mavx512pf,
+   AVX-512 exponential and reciprocal instructions: 
-mavx512er,
+   AVX-512 conflict detection instructions: -mavx512cd.
+ 
   It is now possible to call x86 intrinsics from select functions in
a file that are tagged with the corresponding target attribute without
having to compile the entire file with the -mxxx option.

[PATCH] Fix PR60183

2014-02-14 Thread Richard Biener


The following avoids speculating loads in phiprop (something it was never
supposed to do).  That fixes the crashes in the PR.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2014-02-14  Richard Biener  

PR tree-optimization/60183
* tree-ssa-phiprop.c (propagate_with_phi): Avoid speculating
loads.
(tree_ssa_phiprop): Calculate and free post-dominators.

* gcc.dg/torture/pr60183.c: New testcase.

Index: gcc/tree-ssa-phiprop.c
===
*** gcc/tree-ssa-phiprop.c  (revision 207757)
--- gcc/tree-ssa-phiprop.c  (working copy)
*** propagate_with_phi (basic_block bb, gimp
*** 309,314 
--- 309,320 
gimple def_stmt;
tree vuse;
  
+   /* Only replace loads in blocks that post-dominate the PHI node.  That
+  makes sure we don't end up speculating loads.  */
+   if (!dominated_by_p (CDI_POST_DOMINATORS,
+  bb, gimple_bb (use_stmt)))
+   continue;
+  
/* Check whether this is a load of *ptr.  */
if (!(is_gimple_assign (use_stmt)
&& TREE_CODE (gimple_assign_lhs (use_stmt)) == SSA_NAME
*** tree_ssa_phiprop (void)
*** 380,385 
--- 386,392 
size_t n;
  
calculate_dominance_info (CDI_DOMINATORS);
+   calculate_dominance_info (CDI_POST_DOMINATORS);
  
n = num_ssa_names;
phivn = XCNEWVEC (struct phiprop_d, n);
*** tree_ssa_phiprop (void)
*** 397,402 
--- 404,411 
bbs.release ();
free (phivn);
  
+   free_dominance_info (CDI_POST_DOMINATORS);
+ 
return 0;
  }
  
Index: gcc/testsuite/gcc.dg/torture/pr60183.c
===
*** gcc/testsuite/gcc.dg/torture/pr60183.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr60183.c  (working copy)
***
*** 0 
--- 1,38 
+ /* { dg-do run } */
+ 
+ /* Large so an out-of-bound read will crash.  */
+ unsigned char c[0x30001] = { 1 };
+ int j = 2;
+ 
+ static void
+ foo (unsigned long *x, unsigned char *y)
+ {
+   int i;
+   unsigned long w = x[0];
+   for (i = 0; i < j; i++)
+ {
+   w += *y;
+   y += 0x1;
+   w += *y;
+   y += 0x1;
+ }
+   x[1] = w;
+ }
+ 
+ __attribute__ ((noinline, noclone)) void
+ bar (unsigned long *x)
+ {
+   foo (x, c);
+ }
+ 
+ int
+ main ()
+ {
+   unsigned long a[2] = { 0, -1UL };
+   asm volatile (""::"r" (c):"memory");
+   c[0] = 0;
+   bar (a);
+   if (a[1] != 0)
+ __builtin_abort ();
+   return 0;
+ }

Re: FRE may run out of memory

2014-02-14 Thread Richard Biener

On Fri, Feb 14, 2014 at 3:50 AM, dxq  wrote:
> Richard Biener-2 wrote
>> On Sat, Feb 8, 2014 at 8:29 AM, dxq <ziyan01@> wrote:
>>> hi all,
>>>
>>> We found that gcc would run out of memory on Windows when compiling a
>>> *big* function (10 lines).
>>>
>>> More investigation shows that gcc crashes in the function
>>> *compute_avail*, in the tree-fre pass.  *compute_avail* collects
>>> information from basic blocks, so memory is allocated to record that
>>> information.
>>> However, if there is a huge number of basic blocks, the memory would be
>>> exhausted and gcc would crash, especially on a Windows PC, which generally
>>> has only 2G or 4G of memory. It's OK on Linux, where *compute_avail*
>>> allocates *2.4G* of memory. I guess some optimization passes in gcc, like
>>> FRE, didn't consider the extreme case.
>>
>> This was fixed for GCC 4.8, FRE no longer uses compute_avail (but PRE
>> still does).
>> Basically GCC 4.8 should (at -O1) compile most extreme cases just fine.
>>
>> Richard.
>
> hi, Richard,
>
> More investigation shows that:
> 1, loop-related passes take more compile time and memory, especially
>   pass_rtl_move_loop_invariants and lim, and at least lim on tree has a
>   large impact on the following passes.
> 2, ira takes more than 20G of memory in the function
>   *create_loop_tree_nodes*, because ira chooses the 'mixed' or 'all'
>   region depending on the optimization level.
> 3, the sms pass always creates ddgs for all loops in the compiled function,
>   then does sms optimization for all loops, and finally frees the ddgs. If
>   there is a huge number of loops, sms may crash when creating the ddgs
>   because it runs out of memory.
>
> Could someone confirm the memory pressure problem in the passes above?

What compiler version did you check?  I think that 4.8 has improvements
for 1. and 2. (SMS is unmaintained).  Note that we only spent time to
make -O1 behave sanely with extremely large functions.

Finally I'd suggest you open a bugreport and attach a testcase to it
that exposes the issues you list.

Richard.

> Thanks for your reply!
>
> danxiaoqiang
>
>
>
> --
> View this message in context: 
> http://gcc.1065356.n5.nabble.com/FRE-may-run-out-of-memory-tp1009578p1011035.html
> Sent from the gcc - patches mailing list archive at Nabble.com.

Re: [PATCH] Fix PCH on AArch64 (PR pch/60010)

2014-02-14 Thread Richard Earnshaw

On 31/01/14 19:59, Kyle McMartin wrote:
> Hi,
> 
> Similar to other architectures, failing to set TRY_EMPTY_VM_SPACE
> results in a Segmentation Fault and ICE in cc1plus when using
> precompiled headers and randomize_va_space is set. This patch fixes the
> issue, and now I can reliably build packages which use pch (wxGTK and
> openjdk in particular would fail every time. wxGTK has survived 30 build
> attempts without failure now.)
> 
> (The exact value is unimportant, as long as it's in an unused area. I
> suspect that the fallback buffer_size code path hasn't been fixed up
> since the exec-shield days and could use a re-think now that mmap
> randomization is upstream. I've been trying to debug exactly why it
> fails for all architectures, so we can remove this, but haven't had much
> luck yet.)
> 
> This is similar to pch/45979, pch/14940, target/25343.
> 
> Bootstrapped and tested on aarch64-linux-gnu.
> 
> regards, Kyle
> 
> 2014-01-31  Kyle McMartin 
> 
>   PR pch/60010
>   * config/host-linux.c (TRY_EMPTY_VM_SPACE): Define for AArch64.
> 

This is OK, subject to RM approval.

R.

> --- a/gcc/config/host-linux.c
> +++ b/gcc/config/host-linux.c
> @@ -86,6 +86,8 @@
>  # define TRY_EMPTY_VM_SPACE  0x6000
>  #elif defined(__mc68000__)
>  # define TRY_EMPTY_VM_SPACE  0x4000
> +#elif defined(__aarch64__)
> +# define TRY_EMPTY_VM_SPACE  0x10
>  #elif defined(__ARM_EABI__)
>  # define TRY_EMPTY_VM_SPACE 0x6000
>  #elif defined(__mips__) && defined(__LP64__)
>
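
For readers unfamiliar with the mechanism: the thread does not show how
host-linux.c uses TRY_EMPTY_VM_SPACE, so the following is only a minimal
sketch of the general idea of asking the kernel for a mapping at a preferred
address without MAP_FIXED. The function name map_at_hint is hypothetical.
The caller compares the returned address with the hint to find out whether
the request was honoured.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Try to map SIZE bytes at HINT without MAP_FIXED.  Returns the address the
   kernel actually chose, or NULL on failure.  A HINT of 0 lets the kernel
   pick any address.  */
void *
map_at_hint (uintptr_t hint, size_t size)
{
  assert (size > 0);
  void *p = mmap ((void *) hint, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```

If the hinted range is already occupied, the kernel is free to return a
different address, which is why the choice of an unused area matters.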

Re: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard

2014-02-14 Thread Richard Earnshaw

On 13/02/14 14:32, Ian Bolton wrote:
> Hi,
> 
> The pr59858.c testcase explicitly sets -msoft-float which is incompatible
> with our -mfloat-abi=hard variant.
> 
> This test therefore should not be run if you have -mfloat-abi=hard.
> 
> Tested with both variations for arm-none-eabi build.
> 
> OK for commit?
> 
> Cheers,
> Ian
> 
> 
> 2014-02-13  Ian Bolton  
> 
> testsuite/
> * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.
> 
> 
> pr59858-skip-if-hard-float-patch-v2.txt
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
> b/gcc/testsuite/gcc.target/arm/pr59858.c
> index 463bd38..1e03203 100644
> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall 
> -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm 
> -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector  -Os -g 
> -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants 
> -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts 
> -fno-strength-reduce -fPIC -w" } */
> +/* { dg-skip-if "Test is not compatible with hard-float" { *-*-* } { 
> "-mfloat-abi=hard" } { "" } } */
>  
>  typedef enum {
>   REG_ENOSYS = -1,
> 

This won't work if hard-float is the default.  Take a look at the way
other tests check for this.
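
The skip directive above matches only against options passed explicitly,
whereas the compiler's predefined macros reflect the ABI actually in effect,
including a configured default. A minimal sketch of that distinction,
assuming the ACLE macro __ARM_PCS_VFP (defined when the hard-float VFP
calling convention is in use); the function name is made up, and this is not
necessarily the check the other tests rely on:

```c
#include <assert.h>
#include <string.h>

/* Report the float ABI in effect, based on predefined macros only.  */
const char *
float_abi_name (void)
{
#if defined (__ARM_PCS_VFP)
  return "hard";
#elif defined (__arm__)
  return "soft or softfp";
#else
  return "not ARM";
#endif
}
```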

Re: [PATCH] Fix PCH on AArch64 (PR pch/60010)

2014-02-14 Thread Richard Biener

On Fri, 14 Feb 2014, Richard Earnshaw wrote:

> On 31/01/14 19:59, Kyle McMartin wrote:
> > Hi,
> > 
> > Similar to other architectures, failing to set TRY_EMPTY_VM_SPACE
> > results in a Segmentation Fault and ICE in cc1plus when using
> > precompiled headers and randomize_va_space is set. This patch fixes the
> > issue, and now I can reliably build packages which use pch (wxGTK and
> > openjdk in particular would fail every time. wxGTK has survived 30 build
> > attempts without failure now.)
> > 
> > (The exact value is unimportant, as long as it's in an unused area. I
> > suspect that the fallback buffer_size code path hasn't been fixed up
> > since the exec-shield days and could use a re-think now that mmap
> > randomization is upstream. I've been trying to debug exactly why it
> > fails for all architectures, so we can remove this, but haven't had much
> > luck yet.)
> > 
> > This is similar to pch/45979, pch/14940, target/25343.
> > 
> > Bootstrapped and tested on aarch64-linux-gnu.
> > 
> > regards, Kyle
> > 
> > 2014-01-31  Kyle McMartin 
> > 
> > PR pch/60010
> > * config/host-linux.c (TRY_EMPTY_VM_SPACE): Define for AArch64.
> > 
> 
> This is OK, subject to RM approval.

Works for me.

Richard.

> R.
> 
> > --- a/gcc/config/host-linux.c
> > +++ b/gcc/config/host-linux.c
> > @@ -86,6 +86,8 @@
> >  # define TRY_EMPTY_VM_SPACE0x6000
> >  #elif defined(__mc68000__)
> >  # define TRY_EMPTY_VM_SPACE0x4000
> > +#elif defined(__aarch64__)
> > +# define TRY_EMPTY_VM_SPACE0x10
> >  #elif defined(__ARM_EABI__)
> >  # define TRY_EMPTY_VM_SPACE 0x6000
> >  #elif defined(__mips__) && defined(__LP64__)
> > 
> 
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [PATCH] Fix PCH on AArch64 (PR pch/60010)

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 01:43:26PM +, Richard Earnshaw wrote:
> On 31/01/14 19:59, Kyle McMartin wrote:
> > 2014-01-31  Kyle McMartin 
> > 
> > PR pch/60010
> > * config/host-linux.c (TRY_EMPTY_VM_SPACE): Define for AArch64.
> > 
> 
> This is OK, subject to RM approval.

Ok.  Kyle, do you have commit access, or Richard, are you going to check it
in for Kyle?

> > --- a/gcc/config/host-linux.c
> > +++ b/gcc/config/host-linux.c
> > @@ -86,6 +86,8 @@
> >  # define TRY_EMPTY_VM_SPACE0x6000
> >  #elif defined(__mc68000__)
> >  # define TRY_EMPTY_VM_SPACE0x4000
> > +#elif defined(__aarch64__)
> > +# define TRY_EMPTY_VM_SPACE0x10
> >  #elif defined(__ARM_EABI__)
> >  # define TRY_EMPTY_VM_SPACE 0x6000
> >  #elif defined(__mips__) && defined(__LP64__)

Jakub

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

2014-02-14 13:55 GMT+01:00 Uros Bizjak :
> Hello!
>
>> 2014-02-14  Kai Tietz  
>>
>> PR target/60193
>> * config/i386/i386.c (ix86_expand_prologue): Use
>> rax register as displacement for restoring %r10, %eax.
>>
>> Regression-tested for x86_64-unknown-linux-gnu, and
>> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?
>
> No, you should check allocate to satisfy x86_64_immediate_operand and
> put it into a temporary register if not. There is no need to always
> force constant into a temporary.

Well, in general I would agree with your statement.  But in this case we
already have the required value loaded in the rax register.  So I don't
see the advantage of using a constant, in the <2^32 case, for those
restore operations.  At least for code-size optimization it looks better
to me, and I am not aware that using a register is more expensive here.
I might be wrong about the latter.

> Uros.

Kai

Re: [PATCH] Fix PCH on AArch64 (PR pch/60010)

2014-02-14 Thread Richard Earnshaw

On 14/02/14 13:47, Jakub Jelinek wrote:
> On Fri, Feb 14, 2014 at 01:43:26PM +, Richard Earnshaw wrote:
>> On 31/01/14 19:59, Kyle McMartin wrote:
>>> 2014-01-31  Kyle McMartin 
>>>
>>> PR pch/60010
>>> * config/host-linux.c (TRY_EMPTY_VM_SPACE): Define for AArch64.
>>>
>>
>> This is OK, subject to RM approval.
> 
> Ok.  Kyle, do you have commit access, or Richard, are you going to check it
> in for Kyle?
> 
>>> --- a/gcc/config/host-linux.c
>>> +++ b/gcc/config/host-linux.c
>>> @@ -86,6 +86,8 @@
>>>  # define TRY_EMPTY_VM_SPACE0x6000
>>>  #elif defined(__mc68000__)
>>>  # define TRY_EMPTY_VM_SPACE0x4000
>>> +#elif defined(__aarch64__)
>>> +# define TRY_EMPTY_VM_SPACE0x10
>>>  #elif defined(__ARM_EABI__)
>>>  # define TRY_EMPTY_VM_SPACE 0x6000
>>>  #elif defined(__mips__) && defined(__LP64__)
> 
>   Jakub
> 

I've put it in.

R.

Re: [PATCH] Fix PCH on AArch64 (PR pch/60010)

2014-02-14 Thread Richard Earnshaw

On 14/02/14 14:14, Richard Earnshaw wrote:
> On 14/02/14 13:47, Jakub Jelinek wrote:
>> On Fri, Feb 14, 2014 at 01:43:26PM +, Richard Earnshaw wrote:
>>> On 31/01/14 19:59, Kyle McMartin wrote:
>>>> 2014-01-31  Kyle McMartin
>>>>
>>>> PR pch/60010
>>>> * config/host-linux.c (TRY_EMPTY_VM_SPACE): Define for AArch64.
>>>>
>>>
>>> This is OK, subject to RM approval.
>>
>> Ok.  Kyle, do you have commit access, or Richard, are you going to check it
>> in for Kyle?
>>
>>>> --- a/gcc/config/host-linux.c
>>>> +++ b/gcc/config/host-linux.c
>>>> @@ -86,6 +86,8 @@
>>>>  # define TRY_EMPTY_VM_SPACE   0x6000
>>>>  #elif defined(__mc68000__)
>>>>  # define TRY_EMPTY_VM_SPACE   0x4000
>>>> +#elif defined(__aarch64__)
>>>> +# define TRY_EMPTY_VM_SPACE   0x10
>>>>  #elif defined(__ARM_EABI__)
>>>>  # define TRY_EMPTY_VM_SPACE 0x6000
>>>>  #elif defined(__mips__) && defined(__LP64__)
>>
>>  Jakub
>>
> 
> I've put it in.
> 
> R.
> 

Kyle, the PR is against 4.8.  Have you tested a back-port?

R.

Re: [RS6000] power8 internal compiler errors

2014-02-14 Thread David Edelsohn

On Fri, Feb 14, 2014 at 2:18 AM, Alan Modra  wrote:
> On Wed, Feb 12, 2014 at 06:47:37PM +0100, Ulrich Weigand wrote:
>> Note that find_replacement itself already recurses into both sides
>> of a PLUS.
>
> Thanks, I missed seeing that.  I'd analysed the bug and knew what
> needed doing from past forays into reload, so went looking for ways to
> get at the reloads, ie. "replacements" at that stage of reload.  Lo
> and behold, there's a function tailor made to do just that!  So I
> plugged in find_replacement() wherever it seemed necessary.
>
>> So it might be
>> easier and cheaper overall to just do a find_replacement within
>> the PRE_MODIFY clause ...
>
> That's a good idea, since PRE_MODIFY doesn't occur that often.
> Here is the revised patch with your recommendations.  Bootstrapped
> and regression tested powerpc64-linux.
>
> PR target/58675
> PR target/57935
> * config/rs6000/rs6000.c (rs6000_secondary_reload_inner): Use
> find_replacement on parts of insn rtl that might be reloaded.

Okay, this is a cleaner solution.

- David

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Uros Bizjak

On Fri, Feb 14, 2014 at 2:48 PM, Kai Tietz  wrote:
> 2014-02-14 13:55 GMT+01:00 Uros Bizjak :
>> Hello!
>>
>>> 2014-02-14  Kai Tietz  
>>>
>>> PR target/60193
>>> * config/i386/i386.c (ix86_expand_prologue): Use
>>> rax register as displacement for restoring %r10, %eax.
>>>
>>> Regression-tested for x86_64-unknown-linux-gnu, and
>>> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?
>>
>> No, you should check allocate to satisfy x86_64_immediate_operand and
>> put it into a temporary register if not. There is no need to always
>> force constant into a temporary.
>
> Well, in general I would agree to your statement.  But in this case we
> have already the required value in rax-register loaded.  So I don't
> see the advantage of using in case of <2^32 constant for those
> restore-operation.  At least for code-size optimization it looks to me
> better and I am not aware that usage of register is here more
> expensive. I might be wrong about later.

Ah, I was not aware of the fact that eax already holds the value.
However, there were some problems with the patch: eax RTX is
unnecessarily regenerated in the wrong mode, UNITS_PER_WORD should be
subtracted instead of added - you can use displacement+offset
addressing instead.

Something like (untested) attached patch.

Uros.
Index: i386.c
===
--- i386.c  (revision 207780)
+++ i386.c  (working copy)
@@ -11023,13 +11023,12 @@
   rtx r10 = NULL;
   rtx (*adjust_stack_insn)(rtx, rtx, rtx);
   const bool sp_is_cfa_reg = (m->fs.cfa_reg == stack_pointer_rtx);
-  bool eax_live = false;
+  bool eax_live = ix86_eax_live_at_start_p ();
   bool r10_live = false;
 
   if (TARGET_64BIT)
 r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);
 
-  eax_live = ix86_eax_live_at_start_p ();
   if (eax_live)
{
  insn = emit_insn (gen_push (eax));
@@ -11084,17 +11083,16 @@
 works for realigned stack, too.  */
   if (r10_live && eax_live)
 {
- t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+ t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
  emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
  gen_frame_mem (word_mode, t));
- t = plus_constant (Pmode, stack_pointer_rtx,
-allocate - UNITS_PER_WORD);
+ t = plus_constant (Pmode, t, -UNITS_PER_WORD);
  emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
  gen_frame_mem (word_mode, t));
}
   else if (eax_live || r10_live)
{
- t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+ t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
  emit_move_insn (gen_rtx_REG (word_mode,
   (eax_live ? AX_REG : R10_REG)),
  gen_frame_mem (word_mode, t));

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

So, here is the requested, more complex variant, which respects the
displacement value range of amd64 instructions and uses constant
offsets instead of register-based displacement addressing where
possible.

ChangeLog

2014-02-14  Kai Tietz  

PR target/60193
* config/i386/i386.c (ix86_expand_prologue): Use
rax register as displacement for restoring %r10, %eax.

Regression-tested for x86_64-unknown-linux-gnu, and
x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?

Regards,
Kai

Index: i386.c
===
--- i386.c(Revision 207686)
+++ i386.c(Arbeitskopie)
@@ -11084,17 +11084,34 @@ ix86_expand_prologue (void)
  works for realigned stack, too.  */
   if (r10_live && eax_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  /* Don't exceed displacement-range for 64-bit.  */
+  if (!TARGET_64BIT || allocate <= (1 << 31))
+t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  else
+t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
   gen_frame_mem (word_mode, t));
-  t = plus_constant (Pmode, stack_pointer_rtx,
- allocate - UNITS_PER_WORD);
+
+  /* Don't exceed displacement-range for 64-bit.  */
+  if (!TARGET_64BIT || (allocate + UNITS_PER_WORD) <= (1 << 31))
+t = plus_constant (Pmode, stack_pointer_rtx,
+   allocate + UNITS_PER_WORD);
+  else
+{
+  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
+  emit_move_insn (eax, t);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
+}
   emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
   gen_frame_mem (word_mode, t));
 }
   else if (eax_live || r10_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  /* Don't exceed displacement-range for 64-bit.  */
+  if (!TARGET_64BIT || allocate <= (1 << 31))
+t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  else
+t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode,
(eax_live ? AX_REG : R10_REG)),
   gen_frame_mem (word_mode, t));

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-14 Thread Ilya Verbin

Hi Bernd and Thomas,

Are you planning to support offloading from DSO in PTX/CUDA
environment?  If yes, how are you going to solve the problem of the
collision of function names from different DSOs?

However, if we decide to use element-wise host-target address mapping,
there are opportunities to make this approach more robust.  E.g. we
can store some hash(name) in the compiler-generated tables along with
the address and size.  When libgomp performs device
initialization, it will compare hashes from the host and target DSOs.
This should reveal possible errors during the initialization, and will
avoid hard-to-debug silent failures.

  -- Ilya
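
A minimal sketch of the hash check described above. Everything here is an
assumption made for illustration: the entry layout, the names offload_entry
and first_mismatch, and the choice of 32-bit FNV-1a as the hash. None of it
is taken from libgomp or from the patches in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: address, size, and a hash of the symbol name.  */
struct offload_entry
{
  const void *addr;
  size_t size;
  uint32_t name_hash;
};

/* 32-bit FNV-1a over a NUL-terminated string.  */
uint32_t
fnv1a (const char *s)
{
  assert (s != NULL);
  uint32_t h = 2166136261u;
  while (*s)
    {
      h ^= (unsigned char) *s++;
      h *= 16777619u;
    }
  return h;
}

/* Compare host and target tables element-wise.  Returns the index of the
   first entry whose hashes differ, or -1 if all N entries agree.  */
long
first_mismatch (const struct offload_entry *host,
		const struct offload_entry *target, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (host[i].name_hash != target[i].name_hash)
      return (long) i;
  return -1;
}
```

A mismatch found during device initialization can then be reported as an
error at once, rather than surfacing later as a wrong address.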

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

2014-02-14 15:40 GMT+01:00 Uros Bizjak :
> On Fri, Feb 14, 2014 at 2:48 PM, Kai Tietz  wrote:
>> 2014-02-14 13:55 GMT+01:00 Uros Bizjak :
>>> Hello!
>>>
>>>> 2014-02-14  Kai Tietz
>>>>
>>>> PR target/60193
>>>> * config/i386/i386.c (ix86_expand_prologue): Use
>>>> rax register as displacement for restoring %r10, %eax.
>>>>
>>>> Regression-tested for x86_64-unknown-linux-gnu, and
>>>> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?
>>>
>>> No, you should check allocate to satisfy x86_64_immediate_operand and
>>> put it into a temporary register if not. There is no need to always
>>> force constant into a temporary.
>>
>> Well, in general I would agree to your statement.  But in this case we
>> have already the required value in rax-register loaded.  So I don't
>> see the advantage of using in case of <2^32 constant for those
>> restore-operation.  At least for code-size optimization it looks to me
>> better and I am not aware that usage of register is here more
>> expensive. I might be wrong about later.
>
> Ah, I was not aware of the fact that eax already holds the value.
> However, there were some problems with the patch: eax RTX is
> unnecessarily regenerated in the wrong mode, UNITS_PER_WORD should be
> subtracted instead of added - you can use displacement+offset
> addressing instead.
>
> Something like (untested) attached patch.
>
> Uros.

No, the patch I attached works fine.  Subtracting UNITS_PER_WORD here
is in fact a bug.  For an explanation, see how we modify allocate on
pushing.

So for an allocation of x * UNITS_PER_WORD with both rax and r10 live,
we will see the following stack layout:

[rax saved]: rsp = -1..-UNITS_PER_WORD;
[r10 saved]: rsp = -UNITS_PER_WORD-1..-2*UNITS_PER_WORD
[reserved-stack]: rsp = -2*UNITS_PER_WORD-1.. -x*UNITS_PER_WORD

So the final rsp is -x * UNITS_PER_WORD and the value of allocate is
(x - 2) * UNITS_PER_WORD.

To restore r10, we can use [rsp+allocate], as (-2 * UNITS_PER_WORD) is
its location.
To restore rax, we need to use [rsp+allocate+UNITS_PER_WORD], as
-UNITS_PER_WORD is its location.

Regards.
Kai
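
The layout above can be checked with a few lines of arithmetic. This is a
sketch only: WORD stands in for UNITS_PER_WORD on x86-64, addresses are
taken relative to rsp on entry, and the names restore_offsets and
compute_restore_offsets are made up for illustration.

```c
#include <assert.h>

enum { WORD = 8 };	/* stands in for UNITS_PER_WORD on x86-64 */

struct restore_offsets
{
  long r10;	/* offset from the final rsp to the saved r10 */
  long rax;	/* offset from the final rsp to the saved rax */
};

/* X is the total size in words, including the two pushed registers.
   rsp on entry is taken as 0.  */
struct restore_offsets
compute_restore_offsets (long x)
{
  assert (x >= 2);
  long rsp = 0;

  rsp -= WORD;				/* push rax */
  long rax_slot = rsp;			/* -WORD */
  rsp -= WORD;				/* push r10 */
  long r10_slot = rsp;			/* -2 * WORD */

  long allocate = (x - 2) * WORD;	/* the pushes are already counted */
  rsp -= allocate;			/* final rsp = -x * WORD */

  struct restore_offsets off = { r10_slot - rsp, rax_slot - rsp };
  return off;
}
```

For x = 10 this gives allocate = 64, so r10 sits at rsp+64 (rsp+allocate)
and rax at rsp+72 (rsp+allocate+WORD), i.e. the word is added, not
subtracted.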

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Uros Bizjak

On Fri, Feb 14, 2014 at 3:50 PM, Kai Tietz  wrote:
> 2014-02-14 15:40 GMT+01:00 Uros Bizjak :
>> On Fri, Feb 14, 2014 at 2:48 PM, Kai Tietz  wrote:
>>> 2014-02-14 13:55 GMT+01:00 Uros Bizjak :
>>>> Hello!
>>>>
>>>>> 2014-02-14  Kai Tietz
>>>>>
>>>>> PR target/60193
>>>>> * config/i386/i386.c (ix86_expand_prologue): Use
>>>>> rax register as displacement for restoring %r10, %eax.
>>>>>
>>>>> Regression-tested for x86_64-unknown-linux-gnu, and
>>>>> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?
>>>>
>>>> No, you should check allocate to satisfy x86_64_immediate_operand and
>>>> put it into a temporary register if not. There is no need to always
>>>> force constant into a temporary.
>>>
>>> Well, in general I would agree to your statement.  But in this case we
>>> have already the required value in rax-register loaded.  So I don't
>>> see the advantage of using in case of <2^32 constant for those
>>> restore-operation.  At least for code-size optimization it looks to me
>>> better and I am not aware that usage of register is here more
>>> expensive. I might be wrong about later.
>>
>> Ah, I was not aware of the fact that eax already holds the value.
>> However, there were some problems with the patch: eax RTX is
>> unnecessarily regenerated in the wrong mode, UNITS_PER_WORD should be
>> subtracted instead of added - you can use displacement+offset
>> addressing instead.
>>
>> Something like (untested) attached patch.
>>
>> Uros.
>
> No, the patch I attached works fine.  To subtract here UNITS_PER_WORD
> is in fact a bug.  As description see how we modify allocate on
> pushing.

This fact was not mentioned in the ChangeLog.

So, simply change

+   t = plus_constant (Pmode, t, -UNITS_PER_WORD);

to

t = plus_constant (Pmode, t, UNITS_PER_WORD);

in my patch, and it should generate correct offset+displacement address.

Please also add the testcase from the PR.

Uros.

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-14 Thread Bernd Schmidt


On 02/14/2014 03:49 PM, Ilya Verbin wrote:

> Hi Bernd and Thomas,
>
> Are you planning to support offloading from DSO in PTX/CUDA
> environment?  If yes, how are you going to solve the problem of the
> collision of function names from different DSOs?


What I'm currently trying to do is to use get_file_function_name, which 
should provide a unique string that can be used to look up an offloaded 
function. That was suggested by Nathan Sidwell, I'd forgotten that such 
a function existed. I haven't actually given thought to whether that'll 
be unique across multiple DSOs, but as long as we also pass a value to 
the libgomp registration function that is unique per DSO, that shouldn't 
really matter - we should be able to reliably look up a function given 
these two keys.


I'm attaching a work-in-progress patch, which is based on patch #2 of 
Michael Zolotukhin's series. Does this look like something you could 
also work with? I should have something a little more complete next week.



Bernd

Index: gomp-4_0-branch/gcc/cgraphunit.c
===
--- gomp-4_0-branch.orig/gcc/cgraphunit.c
+++ gomp-4_0-branch/gcc/cgraphunit.c
@@ -206,6 +206,7 @@ along with GCC; see the file COPYING3.
 #include "pass_manager.h"
 #include "tree-nested.h"
 #include "gimplify.h"
+#include "omp-low.h"
 #include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
@@ -2019,6 +2020,8 @@ ipa_passes (void)
 
   execute_ipa_summary_passes
 	((struct ipa_opt_pass_d *) passes->all_regular_ipa_passes);
+
+  omp_finish_file ();
 }
 
   /* Some targets need to handle LTO assembler output specially.  */
Index: gomp-4_0-branch/gcc/lto-streamer-out.c
===
--- gomp-4_0-branch.orig/gcc/lto-streamer-out.c
+++ gomp-4_0-branch/gcc/lto-streamer-out.c
@@ -498,6 +498,7 @@ DFS_write_tree_body (struct output_block
 	 special handling in LTO, it must be handled by streamer hooks.  */
 
   DFS_follow_tree_edge (DECL_ATTRIBUTES (expr));
+  DFS_follow_tree_edge (DECL_UNIQUE_ID (expr));
 
   /* Do not follow DECL_ABSTRACT_ORIGIN.  We cannot handle debug information
 	 for early inlining so drop it on the floor instead of ICEing in
Index: gomp-4_0-branch/gcc/omp-low.c
===
--- gomp-4_0-branch.orig/gcc/omp-low.c
+++ gomp-4_0-branch/gcc/omp-low.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.
 #include "optabs.h"
 #include "cfgloop.h"
 #include "target.h"
+#include "common/common-target.h"
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "tree-cfgcleanup.h"
@@ -191,7 +192,6 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
-
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
 static int target_nesting_level;
@@ -1889,6 +1889,7 @@ create_omp_child_function (omp_context *
   DECL_EXTERNAL (decl) = 0;
   DECL_CONTEXT (decl) = NULL_TREE;
   DECL_INITIAL (decl) = make_node (BLOCK);
+
   bool target_p = false;
   if (lookup_attribute ("omp declare target",
 			DECL_ATTRIBUTES (current_function_decl)))
@@ -12379,4 +12380,158 @@ make_pass_omp_simd_clone (gcc::context *
   return new pass_omp_simd_clone (ctxt);
 }
 
+struct ctor_elt_data
+{
+  vec *v;
+  char *buffer;
+  HOST_WIDE_INT offset;
+  tree string_decl;
+};
+
+static void
+make_constructor_elts (tree decl, struct ctor_elt_data *d)
+{
+  const char *str = IDENTIFIER_POINTER (DECL_UNIQUE_ID (decl));
+  size_t len = strlen (str) + 1;
+  memcpy (d->buffer + d->offset, str, len);
+  tree str_addr = build_fold_addr_expr (d->string_decl);
+  tree off = build_int_cst (size_type_node, d->offset);
+  d->offset += len;
+
+  CONSTRUCTOR_APPEND_ELT (d->v, NULL_TREE, build_fold_addr_expr (decl));
+  CONSTRUCTOR_APPEND_ELT (d->v, NULL_TREE,
+			  fold_build_pointer_plus (str_addr, off));
+}
+
+static void
+make_unique_name (tree decl)
+{
+  tree name = DECL_NAME (decl);
+  char *p = (char *)alloca (strlen (IDENTIFIER_POINTER (name)) + 3);
+  p[0] = 'O';
+  p[1] = '_';
+  strcpy (p + 2, IDENTIFIER_POINTER (name));
+  tree id = get_file_function_name (p);
+  DECL_UNIQUE_ID (decl) = id;
+}
+
+static size_t
+build_unique_names (size_t *plen)
+{
+  int n = 0;
+  size_t len = 0;
+  /* Collect all omp-target functions.  */
+  struct cgraph_node *node;
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  if (!lookup_attribute ("omp declare target",
+			 DECL_ATTRIBUTES (node->decl))
+	  || !DECL_ARTIFICIAL (node->decl))
+	continue;
+  n++;
+  if (!in_lto_p)
+	make_unique_name (node->decl);
+  else
+	len += strlen (IDENTIFIER_POINTER (DECL_UNIQUE_ID (node->decl))) + 1;
+}
+  /* Collect all omp-target global variables.  */
+  struct varpool_node *vnode;
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+{
+  if (!lookup_attribute ("omp declare target",
+			 DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  ||

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 04:01:46PM +0100, Bernd Schmidt wrote:
> What I'm currently trying to do is to use get_file_function_name,
> which should provide a unique string that can be used to look up an
> offloaded function. That was suggested by Nathan Sidwell, I'd
> forgotten that such a function existed. I haven't actually given
> thought to whether that'll be unique across multiple DSOs, but as
> long as we also pass a value to the libgomp registration function
> that is unique per DSO, that shouldn't really matter - we should be
> able to reliably look up a function given these two keys.

get_file_function_name is very problematic: it is not unique across DSOs,
and it relies on -frandom-seed, which the user can tweak and which can have
collisions even without tweaking.  If we can avoid it at all, we should.
Not to mention that strings, especially get_file_function_name based ones,
aren't really short and will occupy much more space than the tables.

I still don't see what you find wrong with the approach with host/target
address arrays.  If you are afraid something will reorder the arrays
(but what would do that?), one can insert indexes into both arrays as well,
which the linker can fill in and you can then verify.

Jakub

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

Hi,

Adjusted my original patch so that eax isn't redeclared and
shadowed.  Additionally, moved the
initialization of eax_live up.
ChangeLog

2014-02-14  Kai Tietz  

PR target/60193
* config/i386/i386.c (ix86_expand_prologue): Use
rax register as displacement for restoring %r10, %rax.
Additionally fix wrong offset when restoring both registers.

ChangeLog  testsuite

2014-02-14  Kai Tietz  

PR target/60193
* gcc.target/i386/nest-1.c: New testcase.

Regression-tested for x86_64-unknown-linux-gnu, and
x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?

Regards,
Kai

Index: i386.c
===
--- i386.c(Revision 207686)
+++ i386.c(Arbeitskopie)
@@ -11023,13 +11023,12 @@ ix86_expand_prologue (void)
   rtx r10 = NULL;
   rtx (*adjust_stack_insn)(rtx, rtx, rtx);
   const bool sp_is_cfa_reg = (m->fs.cfa_reg == stack_pointer_rtx);
-  bool eax_live = false;
+  bool eax_live = ix86_eax_live_at_start_p ();
   bool r10_live = false;

   if (TARGET_64BIT)
 r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);

-  eax_live = ix86_eax_live_at_start_p ();
   if (eax_live)
 {
   insn = emit_insn (gen_push (eax));
@@ -11084,17 +11083,20 @@ ix86_expand_prologue (void)
  works for realigned stack, too.  */
   if (r10_live && eax_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
   gen_frame_mem (word_mode, t));
-  t = plus_constant (Pmode, stack_pointer_rtx,
- allocate - UNITS_PER_WORD);
+
+  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
+  emit_move_insn (eax, t);
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
   gen_frame_mem (word_mode, t));
 }
   else if (eax_live || r10_live)
 {
-  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+  /* Don't exceed displacement-range for 64-bit.  */
+  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
   emit_move_insn (gen_rtx_REG (word_mode,
(eax_live ? AX_REG : R10_REG)),
   gen_frame_mem (word_mode, t));
Index: nest-1.c
===
--- nest-1.c(Revision 0)
+++ nest-1.c(Arbeitskopie)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+void foo (int i)
+{
+  void nested (void)
+  {
+char arr[(1U << 31) + 4U];
+arr[i] = 0;
+  }
+
+  nested ();
+}
+

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Richard Henderson

On 02/14/2014 06:41 AM, Kai Tietz wrote:
> +  else
> +{
> +  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
> +  emit_move_insn (eax, t);
> +  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
> +}

Uros is right that you don't need the move here: 8(rsp,rax) is a perfectly fine
address.


r~
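
As a worked illustration of the two constraints in play here, the sketch
below checks whether a value fits the signed 32-bit displacement field and
computes a disp(base,index) address with scale 1. The helper names are
hypothetical; in particular fits_disp32 is not GCC's
x86_64_immediate_operand predicate, just the range check behind the
discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if V fits the signed 32-bit displacement field.  */
int
fits_disp32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Effective address for disp(base,index) with scale 1.  */
uint64_t
effective_address (uint64_t base, uint64_t index, int32_t disp)
{
  assert (fits_disp32 (disp));
  return base + index + (uint64_t) (int64_t) disp;
}
```

With the frame size from the testcase, (1U << 31) + 4 does not fit as a
displacement, but it can sit in the index register while the small constant
8 is carried as the displacement.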

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Kai Tietz

2014-02-14 16:20 GMT+01:00 Richard Henderson :
> On 02/14/2014 06:41 AM, Kai Tietz wrote:
>> +  else
>> +{
>> +  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
>> +  emit_move_insn (eax, t);
>> +  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
>> +}
>
> Uros is right that you don't need the move here: 8(rsp,rax) is a perfectly 
> fine
> address.
>
>
> r~

Oh, right.  I missed that.  Is the prior patch OK with that adjustment?

Regards,
Kai

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-14 Thread Bernd Schmidt


On 02/14/2014 04:12 PM, Jakub Jelinek wrote:

> On Fri, Feb 14, 2014 at 04:01:46PM +0100, Bernd Schmidt wrote:
>> What I'm currently trying to do is to use get_file_function_name,
>> which should provide a unique string that can be used to look up an
>> offloaded function. That was suggested by Nathan Sidwell, I'd
>> forgotten that such a function existed. I haven't actually given
>> thought to whether that'll be unique across multiple DSOs, but as
>> long as we also pass a value to the libgomp registration function
>> that is unique per DSO, that shouldn't really matter - we should be
>> able to reliably look up a function given these two keys.
>
> get_file_function_name is very problematic, it is not unique across DSOs,
> and relies on -frandom-seed which user can tweak and even without tweaking
> can have collisions in, if we can avoid it at all, we should.
> Not to mention that strings, especially get_file_function_name based ones,
> aren't really short and will occupy much more space than the tables.


How many offloaded functions do we really expect to have in an 
executable? I don't think that's likely to be a bottleneck.


The use of a random seed is really just a fallback; preferably it uses
the name of the first symbol defined in the current translation unit, which I
think ought to be reliable enough.



I still don't see what you find wrong on the approach with host/target
address arrays, if you are afraid something will reorder the arrays
(but, what would do that), one can insert indexes into both arrays as well,
which the linker can fill in and you can then verify.


It strikes me as really unnecessarily brittle. On the host side we'd 
have multiple objects linked together in some order to produce such a 
table, on the ptx side we'd have to produce the table all in one go. 
Factor in possibilities like function cloning and I just think there are 
too many ways in which this can utterly fail. I'd rather have something 
that is more robust from the start even if it's slightly less efficient.



Bernd
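
The two-key lookup described above could be sketched roughly as follows.  This
is only an illustration under stated assumptions: the struct, its field names
and lookup_offload are hypothetical and do not correspond to any existing
libgomp interface; the only things taken from the mail are the idea of a
per-DSO unique value and a name string as produced by get_file_function_name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one offloaded function in one DSO.  */
struct offload_entry
{
  const void *dso_key;   /* value unique per DSO (assumption) */
  const char *name;      /* string from get_file_function_name */
  void *target_fn;       /* handle for the offloaded function */
};

/* Find the entry matching both keys; return NULL if there is none.  */
void *
lookup_offload (const struct offload_entry *tab, size_t n,
                const void *dso_key, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (tab[i].dso_key == dso_key && strcmp (tab[i].name, name) == 0)
      return tab[i].target_fn;
  return NULL;
}
```

Two DSOs that happen to produce the same name string still resolve to
different entries, because the per-DSO key differs.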

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 04:50:34PM +0100, Bernd Schmidt wrote:
> How many offloaded functions do we really expect to have in an
> executable? I don't think that's likely to be a bottleneck.

First of all, this isn't just about offloaded functions, but also any
global variables that need to be mapped (for OpenMP #pragma omp declare
target surrounded vars).  Like functions, also the vars can be global
(but, with multiple DSOs, even global can be interposed and thus not unique
name), or static, at which point you can have the same name in between
different TUs.

> The use of a random-seed is really just a fallback, preferably it
> uses the name of first symbol defined in the current translation
> unit which I think ought to be reliable enough.

Many TUs don't have any non-weak global symbols at all, if the symbols
they provide are all comdat etc., then you hit the random seed all the time.
Encoding the random seed into data sections of the binary is a problem for
build reproducibility, unless you always supply -frandom-seed=, but
then it isn't really that much random (e.g. the often used
-frandom-seed=$(@) or similar).  If the weirdo names are only used to name
symbols in .symtab, that randomness at least can be stripped off, but not
if it is in data sections.  So, to me this is far less reliable and against
the spirit of static symbols.

> >I still don't see what you find wrong on the approach with host/target
> >address arrays, if you are afraid something will reorder the arrays
> >(but, what would do that), one can insert indexes into both arrays as well,
> >which the linker can fill in and you can then verify.
> 
> It strikes me as really unnecessarily brittle. On the host side we'd
> have multiple objects linked together in some order to produce such
> a table, on the ptx side we'd have to produce the table all in one

Sure.  So, the linker/linker plugin orders the objects in some order
and thus by concatenation of the smaller per-TU tables creates the
host table, then the same linker/linker plugin just creates the to be target
table with the same order, and feeds that to the offloading target
compiler.

> go. Factor in possibilities like function cloning and I just think
> there are too many ways in which this can utterly fail. I'd rather
> have something that is more robust from the start even if it's
> slightly less efficient.

I don't see how function cloning or anything similar can make a difference
here, you have a function which is address taken and its address is tracked
in some array, such function can't be cloned (well, can be cloned for
unrelated callers, but the table still keeps the original function, with the
same public ABI etc.).

Jakub
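
For comparison, the host/target address array approach with the optional index
check might look roughly like this.  It is a sketch only; struct table_entry
and host_to_target are invented names, and the assumption is that both tables
are emitted in the same order with an index stored in each entry so that a
mismatch can be detected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry shared by the host table and the target table.  */
struct table_entry
{
  uint32_t index;    /* position filled in at link time (assumption) */
  uintptr_t addr;    /* address of the function or variable */
};

/* Return the target address paired with HOST_ADDR, or 0 if the address
   is unknown or the two tables disagree about the entry's index.  */
uintptr_t
host_to_target (const struct table_entry *host,
                const struct table_entry *target,
                size_t n, uintptr_t host_addr)
{
  for (size_t i = 0; i < n; i++)
    if (host[i].addr == host_addr)
      return host[i].index == target[i].index ? target[i].addr : 0;
  return 0;
}
```

No names are stored at all in this scheme; correctness rests on both tables
being concatenated in the same order, which the index check verifies.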

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Richard Henderson

On 02/14/2014 07:19 AM, Kai Tietz wrote:
> 2014-02-14  Kai Tietz  
> 
> PR target/60193
> * config/i386/i386.c (ix86_expand_prologue): Use
> rax register as displacement for restoring %r10, %rax.
> Additional fix wrong offset for restoring both-registers.
> 
> ChangeLog  testsuite
> 
> 2014-02-14  Kai Tietz  
> 
> PR target/60193
> * gcc.target/i386/nest-1.c: New testcase.

Ok with ...

> +  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
> +  emit_move_insn (eax, t);
> +  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);

... the move_insn removed.


r~

Re: RFA: one more version of patch for PR59535

2014-02-14 Thread Vladimir Makarov


On 2/14/2014, 6:02 AM, Richard Earnshaw wrote:

On 13/02/14 15:10, Richard Earnshaw wrote:

On 11/02/14 19:43, Vladimir Makarov wrote:

   This is one more version of the patch to fix the PR59535

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

   Here are the results of applying the patch:

                         Thumb    Thumb2

reload                   2626334  2400154
lra (before the patch)   2665749  2414926
lra (after the patch)    2626334  2397132


I already wrote that the change in arm.h is to prevent reloading sp as
an address by LRA. Reload has no such problem as it uses legitimate
address hook and LRA mostly relies on base_reg_class.

Richard, I need an approval for this change.

2014-02-11  Vladimir Makarov  

 PR rtl-optimization/59535
 * lra-constraints.c (process_alt_operands): Encourage alternative
 when unassigned pseudo class is superset of the alternative class.
 (inherit_reload_reg): Don't inherit when optimizing for code size.
 * config/arm/arm.h (MODE_BASE_REG_CLASS): Return CORE_REGS for
 Thumb2 and BASE_REGS for modes not less than 4 for LRA.




Index: config/arm/arm.h
===
--- config/arm/arm.h(revision 207562)
+++ config/arm/arm.h(working copy)
@@ -1272,8 +1272,10 @@ enum reg_class
 when addressing quantities in QI or HI mode; if we don't know the
 mode, then we must be conservative.  */
  #define MODE_BASE_REG_CLASS(MODE) \
-(TARGET_ARM || (TARGET_THUMB2 && !optimize_size) ? CORE_REGS :  \
- (((MODE) == SImode) ? BASE_REGS : LO_REGS))
+(TARGET_ARM || (TARGET_THUMB2 && (!optimize_size || arm_lra_flag)) \
+ ? CORE_REGS : ((MODE) == SImode   \
+|| (arm_lra_flag && GET_MODE_SIZE (MODE) >= 4)  \
+? BASE_REGS : LO_REGS))

  /* For Thumb we can not support SP+reg addressing, so we return LO_REGS
 instead of BASE_REGS.  */



Awesome.  Thanks, Vladimir.

I find that while I can't convince myself that the logic in the change
to MODE_BASE_REG_CLASS is wrong, it's very hard to follow.  Furthermore,
when we come to rip out the old reload code it will be quite prone to
getting this wrong.  I think restructuring this along the lines of:

#define MODE_BASE_REG_CLASS(MODE)
   (arm_lra_flag
? (TARGET_32BIT ? CORE_REGS
   : GET_MODE_SIZE (MODE) >= 4 ? BASE_REGS
   : LO_REGS)
: ((TARGET_ARM || (TARGET_THUMB2 && !optimize_size)) ? CORE_REGS
   : ((MODE) == SImode) ? BASE_REGS
   : LO_REGS))

Is both easier to understand and easier to simplify later when reload
goes away.

I'll run a regression test on this and let you know the results.

R.



This version of the arm.h patch survives testing.  Please can you use
this in place of your version.
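
As a rough cross-check that the restructured macro selects the same class as
the form in the original patch, both can be modelled as plain C functions and
compared over all flag combinations.  This is a sketch under assumptions, not
GCC code: the struct and function names are invented, TARGET_32BIT is modelled
as (arm || thumb2), ARM and Thumb2 are treated as mutually exclusive, and
SImode is taken to have size 4.

```c
#include <stdbool.h>

enum reg_class { LO_REGS, BASE_REGS, CORE_REGS };

/* Stand-ins for the target flags; not GCC's real definitions.  */
struct cfg { bool arm, thumb2, optimize_size, lra; };

/* The form in the original arm.h patch.  */
enum reg_class
original_form (struct cfg c, bool is_simode, int mode_size)
{
  if (c.arm || (c.thumb2 && (!c.optimize_size || c.lra)))
    return CORE_REGS;
  return (is_simode || (c.lra && mode_size >= 4)) ? BASE_REGS : LO_REGS;
}

/* The restructured form, with TARGET_32BIT modelled as arm || thumb2.  */
enum reg_class
restructured_form (struct cfg c, bool is_simode, int mode_size)
{
  if (c.lra)
    {
      if (c.arm || c.thumb2)
        return CORE_REGS;
      return mode_size >= 4 ? BASE_REGS : LO_REGS;
    }
  if (c.arm || (c.thumb2 && !c.optimize_size))
    return CORE_REGS;
  return is_simode ? BASE_REGS : LO_REGS;
}

/* Count the inputs on which the two forms disagree.  */
int
count_mismatches (void)
{
  static const int sizes[] = { 1, 2, 4, 8 };
  int mismatches = 0;

  for (int bits = 0; bits < 16; bits++)
    {
      struct cfg c = { bits & 1, (bits >> 1) & 1,
                       (bits >> 2) & 1, (bits >> 3) & 1 };
      if (c.arm && c.thumb2)
        continue;               /* not a real configuration */
      for (int s = 0; s < 4; s++)
        for (int simode = 0; simode <= 1; simode++)
          {
            if (simode && sizes[s] != 4)
              continue;         /* SImode is assumed to be 4 bytes */
            if (original_form (c, simode, sizes[s])
                != restructured_form (c, simode, sizes[s]))
              mismatches++;
          }
    }
  return mismatches;
}
```

Under those assumptions the two forms agree on every input; the model says
nothing about the real macros beyond that.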



Thanks, Richard.  I've committed the following patch as rev. 207787.

2014-02-14  Vladimir Makarov  
Richard Earnshaw  

PR rtl-optimization/59535
* lra-constraints.c (process_alt_operands): Encourage alternative
when unassigned pseudo class is superset of the alternative class.
(inherit_reload_reg): Don't inherit when optimizing for code size.
* config/arm/arm.h (MODE_BASE_REG_CLASS): Add version for LRA
returning CORE_REGS for anything but Thumb1 and BASE_REGS for
modes not less than 4 for Thumb1.


Index: lra-constraints.c
===
--- lra-constraints.c   (revision 207562)
+++ lra-constraints.c   (working copy)
@@ -2112,6 +2112,21 @@ process_alt_operands (int only_alternati
  goto fail;
}
 
+ /* If not assigned pseudo has a class which a subset of
+required reg class, it is a less costly alternative
+as the pseudo still can get a hard reg of necessary
+class.  */
+ if (! no_regs_p && REG_P (op) && hard_regno[nop] < 0
+ && (cl = get_reg_class (REGNO (op))) != NO_REGS
+ && ira_class_subset_p[this_alternative][cl])
+   {
+ if (lra_dump_file != NULL)
+   fprintf
+ (lra_dump_file,
+  "%d Super set class reg: reject-=3\n", nop);
+ reject -= 3;
+   }
+
  this_alternative_offmemok = offmemok;
  if (this_costly_alternative != NO_REGS)
{
@@ -4391,6 +4406,9 @@ static bool
 inherit_reload_reg (bool def_p, int original_regno,
enum reg_class cl, rtx insn, rtx next_usage_insns)
 {
+  if (optimize_function_for_size_p (cfun))
+return false;
+
   enum reg_class rclass = lra_get_allocno_class (original_regno);
   rtx original_reg = regno_reg_rtx[original_regno];
   rtx new_reg, new_insns, us

RE: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard

2014-02-14 Thread Ian Bolton

> > The pr59858.c testcase explicitly sets -msoft-float which is
> incompatible
> > with our -mfloat-abi=hard variant.
> >
> > This patch therefore should not be run if you have -mfloat-abi=hard.
> >
> > Tested with both variations for arm-none-eabi build.
> >
> > OK for commit?
> >
> > Cheers,
> > Ian
> >
> >
> > 2014-02-13  Ian Bolton  
> >
> > testsuite/
> > * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.
> >
> >
> > pr59858-skip-if-hard-float-patch-v2.txt
> >
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c
> b/gcc/testsuite/gcc.target/arm/pr59858.c
> > index 463bd38..1e03203 100644
> > --- a/gcc/testsuite/gcc.target/arm/pr59858.c
> > +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -
> Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-
> asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-
> protector  -Os -g -feliminate-unused-debug-types -funit-at-a-time -
> fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-
> tree-dominator-opts -fno-strength-reduce -fPIC -w" } */
> > +/* { dg-skip-if "Test is not compatible with hard-float" { *-*-* } {
> "-mfloat-abi=hard" } { "" } } */
> >
> >  typedef enum {
> >   REG_ENOSYS = -1,
> >
> 
> This won't work if hard-float is the default.  Take a look at the way
> other tests check for this.

Hi Richard,

The test does actually pass if it is hard float by default. My comment
on the skip line was misleading, because the precise issue is when
someone specifies -mfloat-abi=hard on the command line.  I've fixed up
that comment in the attached patch now.

I've also reduced the number of command-line options passed (without
affecting the code generated) in the patch and changed -msoft-float
into -mfloat-abi=soft, since the former is deprecated and maps to the
latter anyway.

OK for commit?

Cheers,
Ian

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c
index 463bd38..a944b9a 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall 
-Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm 
-msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector  -Os -g 
-feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants 
-fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts 
-fno-strength-reduce -fPIC -w" } */
+/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
-fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC 
-w" } */
+/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
-mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 
 typedef enum {
  REG_ENOSYS = -1,

Re: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard

2014-02-14 Thread Richard Earnshaw

On 14/02/14 16:34, Ian Bolton wrote:
>>> The pr59858.c testcase explicitly sets -msoft-float which is
>> incompatible
>>> with our -mfloat-abi=hard variant.
>>>
>>> This patch therefore should not be run if you have -mfloat-abi=hard.
>>>
>>> Tested with both variations for arm-none-eabi build.
>>>
>>> OK for commit?
>>>
>>> Cheers,
>>> Ian
>>>
>>>
>>> 2014-02-13  Ian Bolton  
>>>
>>> testsuite/
>>> * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.
>>>
>>>
>>> pr59858-skip-if-hard-float-patch-v2.txt
>>>
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c
>> b/gcc/testsuite/gcc.target/arm/pr59858.c
>>> index 463bd38..1e03203 100644
>>> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
>>> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
>>> @@ -1,5 +1,6 @@
>>>  /* { dg-do compile } */
>>>  /* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -
>> Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-
>> asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-
>> protector  -Os -g -feliminate-unused-debug-types -funit-at-a-time -
>> fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-
>> tree-dominator-opts -fno-strength-reduce -fPIC -w" } */
>>> +/* { dg-skip-if "Test is not compatible with hard-float" { *-*-* } {
>> "-mfloat-abi=hard" } { "" } } */
>>>
>>>  typedef enum {
>>>   REG_ENOSYS = -1,
>>>
>>
>> This won't work if hard-float is the default.  Take a look at the way
>> other tests check for this.
> 
> Hi Richard,
> 
> The test does actually pass if it is hard float by default. My comment
> on the skip line was misleading, because the precise issue is when
> someone specifies -mfloat-abi=hard on the command line.  I've fixed up
> that comment in the attached patch now.
> 
> I've also reduced the number of command-line options passed (without
> affecting the code generated) in the patch and changed -msoft-float
> into -mfloat-abi=soft, since the former is deprecated and maps to the
> latter anyway.
> 
> OK for commit?
> 

Yes, thanks for clarifying.

R.

> Cheers,
> Ian
> 
> 
> pr59858-skip-if-abi-hard-patch-v3.txt
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
> b/gcc/testsuite/gcc.target/arm/pr59858.c
> index 463bd38..a944b9a 100644
> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall 
> -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm 
> -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector  -Os -g 
> -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants 
> -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts 
> -fno-strength-reduce -fPIC -w" } */
> +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
> -fPIC -w" } */
> +/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>  
>  typedef enum {
>   REG_ENOSYS = -1,
>

Re: [patch i386]: Fix PR/60193

2014-02-14 Thread Uros Bizjak

On Fri, Feb 14, 2014 at 4:19 PM, Kai Tietz  wrote:

> Adjusted my original testcase so that eax isn't redeclared and
> shadows.  Additional moved
> initialization of eax_live up.
> ChangeLog
>
> 2014-02-14  Kai Tietz  
>
> PR target/60193
> * config/i386/i386.c (ix86_expand_prologue): Use
> rax register as displacement for restoring %r10, %rax.
> Additional fix wrong offset for restoring both-registers.
>
> ChangeLog  testsuite
>
> 2014-02-14  Kai Tietz  
>
> PR target/60193
> * gcc.target/i386/nest-1.c: New testcase.
>
> Regression-tested for x86_64-unknown-linux-gnu, and
> x86_64-w64-mingw32, and i686-w64-mingw32.  Ok for apply?
>
> Regards,
> Kai
>
> Index: i386.c
> ===
> --- i386.c(Revision 207686)
> +++ i386.c(Arbeitskopie)
> @@ -11023,13 +11023,12 @@ ix86_expand_prologue (void)
>rtx r10 = NULL;
>rtx (*adjust_stack_insn)(rtx, rtx, rtx);
>const bool sp_is_cfa_reg = (m->fs.cfa_reg == stack_pointer_rtx);
> -  bool eax_live = false;
> +  bool eax_live = ix86_eax_live_at_start_p ();
>bool r10_live = false;
>
>if (TARGET_64BIT)
>  r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);
>
> -  eax_live = ix86_eax_live_at_start_p ();
>if (eax_live)
>  {
>insn = emit_insn (gen_push (eax));
> @@ -11084,17 +11083,20 @@ ix86_expand_prologue (void)
>   works for realigned stack, too.  */
>if (r10_live && eax_live)
>  {
> -  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
> +  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);
>emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
>gen_frame_mem (word_mode, t));
> -  t = plus_constant (Pmode, stack_pointer_rtx,
> - allocate - UNITS_PER_WORD);
> +
> +  t = plus_constant (Pmode, eax, UNITS_PER_WORD);
> +  emit_move_insn (eax, t);
> +  t = gen_rtx_PLUS (Pmode, stack_pointer_rtx, eax);

only

 t = plus_constant (Pmode, t, UNITS_PER_WORD);

is enough here.

>emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
>gen_frame_mem (word_mode, t));
>  }
>else if (eax_live || r10_live)
>  {
> -  t = plus_constant (Pmode, stack_pointer_rtx, allocate);
> +  /* Don't exceed displacement-range for 64-bit.  */

Stale comment.

Re: [PATCH, testsuite] Fix profile test failures

2014-02-14 Thread Steve Ellcey

On Fri, 2014-02-14 at 11:12 +, Richard Sandiford wrote:

> Using target in itself should be OK.  The hostname for rsh/ssh should be
> [board_info $board hostname] rather than $board itself.  So in this case
> [board_info "multi-sim/-EL" hostname] should be multi-sim.  The usual way
> to set that up is to put:
> 
> set_board_info hostname multi-sim
> 
> in multi-sim.exp.
> 
> Thanks,
> Richard

Richard and Joseph, thanks for the pointers.  I tried Richard's change
and it partially fixed the problem.  My board started doing an rsh/ssh
to 'multi-sim' instead of 'multi-sim/-EL' but that command times out
because multi-sim is not a machine that I can rsh/ssh to, it is just the
name of my dejagnu baseboard.  Looking around some more I found sim.exp
in dejagnu (which I include in my board) and I see it has overrides for
load, upload, and download but no override for exec.  I added this to
config/sim.exp:

proc sim_exec { dest srcfile args } {
return [remote_exec host $srcfile $args]
}

to match the sim_upload and sim_download definitions and things seem to
work now.  In fact, now that I have this change to sim.exp I no longer
need to set hostname in multi-sim.exp.

I will submit the config/sim.exp patch to the dejagnu mailing list later
today.

Steve Ellcey
sell...@mips.com

Re: [PATCH] Fix Cilk+ ICEs in the alias oracle

2014-02-14 Thread Jeff Law


On 02/13/14 05:47, Richard Biener wrote:

On Thu, 13 Feb 2014, Richard Biener wrote:



Cilk+ builds INDIRECT_REFs when expanding builtins (oops) and thus
those can leak into MEM_EXPRs which will lead to ICEs later.
The following patch properly builds a MEM_REF instead.  Grepping
for INDIRECT_REF I found another suspicious use (just removed,
it cannot have triggered and it looks bogus) and the use of
a langhook instead of proper GIMPLE interfaces (function also
used during expansion).

Bootstrap / testing in progress together with some other stuff.

Ok?


Btw, this exposes that Cilk+ is LTO-ignorant - it doesn't properly
register its global trees (bah, more global trees...).  So
the types_compatible_p call ICEs.  Trying to process them in
lto/lto.c:read_cgraph_and_symbols doesn't seem to work though.

So I'm opting to remove the assert and leave fixing LTO for
somebody who cares about Cilk+.

Simplifies the patch as follows, bootstrapped & tested on
x86_64-unknown-linux-gnu.

Richard.

2014-02-13  Richard Biener  

* cilk-common.c (cilk_arrow): Build a MEM_REF, not an INDIRECT_REF.
(get_frame_arg): Drop the assert with langhook types_compatible_p.
Do not strip INDIRECT_REFs.

FWIW, I see a recurring issue here.  Specifically I'm regularly seeing
cases where submissions are not playing well with LTO.  Speaking
strictly for myself, I'm not LTO-aware enough to spot them in patches as
they fly by.


It's not meant to be a criticism, just noting a recurring issue.

jeff

Re: [testsuite] Don't xfail gcc.dg/binop-xor1.c

2014-02-14 Thread Jeff Law


On 02/13/14 03:54, Richard Sandiford wrote:

Richard Sandiford  writes:

Hans-Peter Nilsson  writes:

On Tue, 4 Feb 2014, Rainer Orth wrote:

AFAICT the gcc.dg/binop-xor1.c test is XPASSing everywhere since about
20131114:


Bah, missing analysis. "Everywhere" does not include cris-elf,
powerpc64-unknown-linux-gnu, m68k-unknown-linux-gnu,
s390x-ibm-linux-gnu, powerpc-ibm-aix7.1.0.0.


Based on this list I'm guessing it's another BRANCH_COST==1


BRANCH_COST==1 || !LOGICAL_OP_NON_SHORT_CIRCUIT

ISTM that we ought to have a dejagnu test which we can use to ignore or
otherwise change the expected output on these targets.


We could try and be clever and determine it from compiler output, or 
somehow arrange for GCC to make that information available to dejagnu. 
But by far the easiest way is just a list of targets.


jeff

Re: [PATCH] x86: Use ud2 assembly mnemonic when available.

2014-02-14 Thread Roland McGrath

On Thu, Feb 13, 2014 at 10:58 PM, Uros Bizjak  wrote:
> You forgot to tell us how the patch tested...

Right.  It's a pretty obviously harmless change.  I tested that the
configure check passes with binutils-2.22, and eyeball'd a -S compile of a
trivial function calling __builtin_trap() to see it uses the mnemonic.  I
don't have ready access to an assembler that does not support the mnemonic,
so I simulated the negative case by momentarily hacking configure to try
'ud2x' instead of 'ud2' and verified that this configure check failed and
that for the same trivial function it then emits '.value 0x0b0f' as before.
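
For reference, the fallback '.value 0x0b0f' produces the same two bytes as the
ud2 mnemonic because the assembler stores the 16-bit value least significant
byte first, and the ud2 opcode is the byte sequence 0F 0B.  A small sketch in C
of that byte order; emit_value16 is a made-up helper, not anything in GCC or
binutils:

```c
#include <stdint.h>

/* Store a 16-bit value the way an x86 assembler emits `.value':
   least significant byte first.  Hypothetical helper.  */
void
emit_value16 (uint16_t value, unsigned char out[2])
{
  out[0] = value & 0xff;
  out[1] = value >> 8;
}
```

Calling emit_value16 (0x0b0f, buf) leaves 0x0f in buf[0] and 0x0b in buf[1].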

> OK for mainline and release branches.

Committed to trunk and 4.8.

Thanks,
Roland

Re: [testsuite] Don't xfail gcc.dg/binop-xor1.c

2014-02-14 Thread Jakub Jelinek

On Fri, Feb 14, 2014 at 10:37:03AM -0700, Jeff Law wrote:
> On 02/13/14 03:54, Richard Sandiford wrote:
> >Richard Sandiford  writes:
> >>Hans-Peter Nilsson  writes:
> >>>On Tue, 4 Feb 2014, Rainer Orth wrote:
> AFAICT the gcc.dg/binop-xor1.c test is XPASSing everywhere since about
> 20131114:
> >>>
> >>>Bah, missing analysis. "Everywhere" does not include cris-elf,
> >>>powerpc64-unknown-linux-gnu, m68k-unknown-linux-gnu,
> >>>s390x-ibm-linux-gnu, powerpc-ibm-aix7.1.0.0.
> >>
> >>Based on this list I'm guessing it's another BRANCH_COST==1
> >
> >BRANCH_COST==1 || !LOGICAL_OP_NON_SHORT_CIRCUIT
> ISTM that we ought to have a dejagnu test which we can use to ignore
> or otherwise change the expected output on these targets.
> 
> We could try and be clever and determine it from compiler output, or
> somehow arrange for GCC to make that information available to
> dejagnu. But by far the easiest way is just a list of targets.

Yeah, the BRANCH_COST and/or LOGICAL_OP_NON_SHORT_CIRCUIT value could e.g.
be emitted in some comment in selected tree dump if details are requested (say
-fdump-tree-gimple-details) and then an effective target can check for that
easily.

Jakub

Re: [Patch, microblaze]: Add optimized lshrsi3

2014-02-14 Thread Michael Eager

On 02/13/14 21:48, David Holsgrove wrote:

Hi Michael,

-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Sunday, 9 February 2014 2:58 am
To: David Holsgrove; gcc-patches@gcc.gnu.org
Cc: Edgar Iglesias; John Williams; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch, microblaze]: Add optimized lshrsi3

On 11/25/13 23:53, David Holsgrove wrote:

Add optimized lshrsi3 instruction, to be used when optimizing for size
with immediate values over 5

Changelog

2013-11-26  Nagaraju Mekala 

   * gcc/config/microblaze/microblaze.md: Add size optimized lshrsi3 insn.

David --

Please put the description of the patch in the text of the email,
rather than hiding it within an attached patch.

The patch describes a very specific situation where this patch
will have an effect.  Please provide a test case.

Updated version of patch attached with testcase. New Changelog entries are;

Changelog

2013-11-26  David Holsgrove 

  * gcc/config/microblaze/microblaze.md: Add size optimized lshrsi3 insn

ChangeLog/testsuite

2014-02-12  David Holsgrove 

  * gcc/testsuite/gcc.target/microblaze/others/lshrsi_Os_1.c: New test.

thanks,
David

Thanks.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: [Patch, microblaze]: Remove SECONDARY_MEMORY_NEEDED

2014-02-14 Thread Michael Eager

On 02/13/14 21:48, David Holsgrove wrote:

Hi Michael, List,

-Original Message-
From: David Holsgrove
Sent: Wednesday, 22 January 2014 1:43 pm
To: 'Michael Eager'; gcc-patches@gcc.gnu.org
Cc: Edgar Iglesias; John Williams; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [Patch, microblaze]: Remove SECONDARY_MEMORY_NEEDED

Hi Michael,

-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Friday, 17 January 2014 4:44 am
To: David Holsgrove; gcc-patches@gcc.gnu.org
Cc: Edgar Iglesias; John Williams; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch, microblaze]: Remove SECONDARY_MEMORY_NEEDED

On 11/25/13 23:51, David Holsgrove wrote:

Hi Michael,

I've attached patch based on latest gcc master. Please let me know if
you need anything further.

thanks,
David

On 15 July 2013 14:44, David Holsgrove  wrote:

Hi Michael,

On 18 March 2013 22:49, David Holsgrove 

wrote:

MicroBlaze doesn't have restrictions that would force us to
reload regs via memory. Don't define SECONDARY_MEMORY_NEEDED.
Fixes an ICE when compiling OpenSSL for linux.

Changelog

2013-03-18  Edgar E. Iglesias 

   * gcc/config/microblaze/microblaze.h: Remove

SECONDARY_MEMORY_NEEDED

 definition.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Peter A. G. Crosthwaite 

Patch remains the same, please apply when ready.

thanks,
David

Hi David --

Is it possible to add a test case which shows the ICE?

I'm afraid I no longer have my test environment for this patch from last
March. I'll attempt to recreate and distil it into a small test case if
possible, based on the error encountered whilst building openssl.

I'll update again when I have some further detail.

I've managed to recreate the original internal compiler error whilst building
openssl with the microblazeel linux toolchain.

I've reduced the error down to the attached testcase.
It is taken directly from openssl (with no dependencies on openssl headers), so
I'm unsure of the suitability of this test, both technically and license-wise,
for inclusion in gcc.

Changelog entry would be;

2013-03-18  Edgar E. Iglesias 

  * gcc/config/microblaze/microblaze.h: Remove SECONDARY_MEMORY_NEEDED
definition.

ChangeLog/testsuite

2014-02-13  David Holsgrove 

  * gcc/testsuite/gcc.target/microblaze/others/mem_reload.c: New test.

thanks,
David

Thanks.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: [Patch, microblaze]: Add TARGET_ASM_OUTPUT_MI_THUNK to support varargs thunk

2014-02-14 Thread Michael Eager

On 02/13/14 21:48, David Holsgrove wrote:

Hi Michael,

-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Sunday, 26 January 2014 1:57 am
To: David Holsgrove
Cc: gcc-patches@gcc.gnu.org; Edgar Iglesias; John Williams; Vinod Kathail;
Vidhumouli Hunsigida; Nagaraju Mekala; Tom Shui
Subject: Re: [Patch, microblaze]: Add TARGET_ASM_OUTPUT_MI_THUNK to
support varargs thunk

On 07/14/13 21:37, David Holsgrove wrote:

Hi Michael,

-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Saturday, 13 July 2013 9:33 am
To: David Holsgrove
Cc: gcc-patches@gcc.gnu.org; Edgar Iglesias; John Williams; Vinod Kathail;
Vidhumouli Hunsigida; Nagaraju Mekala; Tom Shui
Subject: Re: [Patch, microblaze]: Add TARGET_ASM_OUTPUT_MI_THUNK to
support varargs thunk

On 03/18/13 05:49, David Holsgrove wrote:

Changelog

2013-03-18  David Holsgrove 

* gcc/config/microblaze/microblaze.c: Add

microblaze_asm_output_mi_thunk

  and define TARGET_ASM_OUTPUT_MI_THUNK and

TARGET_ASM_CAN_OUTPUT_MI_THUNK

Sorry it has taken so long to review this patch.

[--snip--]

2013-07-15  David Holsgrove 

   * gcc/config/microblaze/microblaze.c: Add microblaze_asm_output_mi_thunk
 and define TARGET_ASM_OUTPUT_MI_THUNK and

TARGET_ASM_CAN_OUTPUT_MI_THUNK

This patch causes a number of regressions in the G++ test suite.
For example, abi/covariant{3,4,5}.C, abi/vcall1.C,
inherit/covariant{1,2,3,4,17,18}.C,
inherit/thunk{7,10}.C and others.

Apologies - this patch was originally written in 2012 and submitted to this
list a year ago.
It has not been reviewed or tested for regressions in 12 months, and it has
taken me a bit of time to go back to the original work and rerun the testsuite
as it stands today.

Please find attached an updated patch which has no regressions. I believe the
testcase which checks the functionality of this patch is
'g++.old-deja/g++.jason/thunk3.C'.

Changelog entry remains the same since March 2013.

Thanks.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: [testsuite] Don't xfail gcc.dg/binop-xor1.c

2014-02-14 Thread Hans-Peter Nilsson

On Fri, 14 Feb 2014, Jakub Jelinek wrote:
> On Fri, Feb 14, 2014 at 10:37:03AM -0700, Jeff Law wrote:
> > On 02/13/14 03:54, Richard Sandiford wrote:
> > >Richard Sandiford  writes:
> > >>Hans-Peter Nilsson  writes:
> > >>>On Tue, 4 Feb 2014, Rainer Orth wrote:
> > AFAICT the gcc.dg/binop-xor1.c test is XPASSing everywhere since about
> > 20131114:
> > >>>
> > >>>Bah, missing analysis. "Everywhere" does not include cris-elf,
> > >>>powerpc64-unknown-linux-gnu, m68k-unknown-linux-gnu,
> > >>>s390x-ibm-linux-gnu, powerpc-ibm-aix7.1.0.0.
> > >>
> > >>Based on this list I'm guessing it's another BRANCH_COST==1
> > >
> > >BRANCH_COST==1 || !LOGICAL_OP_NON_SHORT_CIRCUIT
> > ISTM that we ought to have a dejagnu test which we can use to ignore
> > or otherwise change the expected output on these targets.
> >
> > We could try and be clever and determine it from compiler output, or
> > somehow arrange for GCC to make that information available to
> > dejagnu. But by far the easiest way is just a list of targets.
>
> Yeah, the BRANCH_COST and/or LOGICAL_OP_NON_SHORT_CIRCUIT value could e.g.
> be emitted in some comment in selected tree dump if details are requested (say
> -fdump-tree-gimple-details) and then an effective target can check for that
> easily.

I've been thinking along those lines (though a RTL dump will be
somewhat more appropriate).  A target list will be insufficient
when the branch cost etc. depends on compiler options.

brgds, H-P

Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL

2014-02-14 Thread Jeff Law


On 02/11/14 02:06, Eric Botcazou wrote:

I pondered changing the condition for swapping the insn order, but it
didn't seem worth it.  I doubt we see many 3->2 combinations where I3 is
a JUMP_INSN, the result turns into two simple sets and the insn swapping
code you wrote decides it needs to swap the insns.


I didn't actually write it, just enhanced it recently, that's why I suggested
to do the same here.  It's one line of code and we have an example of valid
simplification at hand so I think we ought to do it.


It seems to me that as long as we're re-using the existing insns to
contain the simple sets that we have to ensure that they're INSNs, not
CALL_INSNs or JUMP_INSNs.


I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a
standard method in the combiner.

So, the code has this structure

if (looks safe)
  emit in existing order
else if (reverse order looks safe)
  emit in reversed order
else
  undo_all


In this specific case, the existing order is never going to look safe
because set1 uses (sp) as an input argument and use_crosses_set_p is
very conservative when the value is the stack pointer on a PUSH_ROUNDING
machine (such as the m68k).


So we could put the verification code that both I3 and I2 are INSNs in
the else if (reverse order looks safe) clause.  That would mean for
this testcase, we ultimately undo_all.  But I consider that reasonable
given the only reason this instance bled into RTL land was -O1 instead
of -O2 compilation.


I already know that variant works as it's what I had before I started 
thinking about what happens if we have a CALL_INSN as I2.


Jeff
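
The three-way structure described in the message above, with the extra
INSN check placed in the reversed-order branch, can be sketched as a toy
model.  This is not GCC code: InsnKind, Candidate, Outcome and
choose_order are hypothetical names made up for illustration, and the two
booleans merely stand in for the real safety checks (use_crosses_set_p
and friends).

```cpp
// Toy model of the ordering decision; all names here are hypothetical.
enum class InsnKind { Insn, JumpInsn, CallInsn };
enum class Outcome { ExistingOrder, ReversedOrder, UndoAll };

struct Candidate {
  InsnKind i2_kind;          // kind of the insn that would hold one set
  InsnKind i3_kind;          // kind of the insn that would hold the other
  bool existing_order_safe;  // stand-in for "looks safe"
  bool reversed_order_safe;  // stand-in for "reverse order looks safe"
};

constexpr Outcome choose_order(const Candidate &c) {
  if (c.existing_order_safe)
    return Outcome::ExistingOrder;
  // Swapping reuses I2 and I3 as containers for the two simple sets, so
  // in this variant the swap is only accepted when both are plain INSNs.
  if (c.reversed_order_safe
      && c.i2_kind == InsnKind::Insn
      && c.i3_kind == InsnKind::Insn)
    return Outcome::ReversedOrder;
  return Outcome::UndoAll;
}
```

With I3 being a JUMP_INSN and only the reversed order looking safe, as in
the m68k testcase discussed here, this model ends in UndoAll.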

Re: [PATCH i386] Enable -freorder-blocks-and-partition

2014-02-14 Thread Teresa Johnson

On Tue, Feb 11, 2014 at 2:21 PM, Teresa Johnson  wrote:
> On Thu, Dec 19, 2013 at 10:19 PM, Teresa Johnson  wrote:
>> On Thu, Dec 12, 2013 at 5:13 PM, Jan Hubicka  wrote:
>>>> On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška wrote:
>>>> > Hello,
>>>> >    I prepared a collection of systemtap graphs for GIMP.
>>>> >
>>>> > 1) just my profile-based function reordering: 550 pages
>>>> > 2) just -freorder-blocks-and-partition: 646 pages
>>>> > 3) just -fno-reorder-blocks-and-partition: 638 pages
>>>> >
>>>> > Please see attached data.
>>>>
>>>> Thanks for the data. A few observations/questions:
>>>>
>>>> With both 1) (your (time-based?) reordering) and 2)
>>>> (-freorder-blocks-and-partition) there are a fair amount of accesses
>>>> out of the cold section. I'm not seeing so many accesses out of the
>>>> cold section in the apps I am looking at with splitting enabled. In
>>>
>>> I see you already committed the patch, so perhaps Martin's measurements
>>> assume the pass is off by default?
>>>
>>> I rebuilt GCC with profiledbootstrap and with the linker script unmapping
>>> text.unlikely.  I get an ICE in:
>>> (gdb) bt
>>> #0  diagnostic_set_caret_max_width(diagnostic_context*, int) () at 
>>> ../../gcc/diagnostic.c:108
>>> #1  0x00f68457 in diagnostic_initialize (context=0x18ae000 
>>> , n_opts=n_opts@entry=1290) at 
>>> ../../gcc/diagnostic.c:135
>>> #2  0x0100050e in general_init (argv0=) at 
>>> ../../gcc/toplev.c:1110
>>> #3  toplev_main(int, char**) () at ../../gcc/toplev.c:1922
>>> #4  0x7774cbe5 in __libc_start_main () from /lib64/libc.so.6
>>> #5  0x00f7898d in _start () at ../sysdeps/x86_64/start.S:122
>>>
>>> That is relatively early in the startup process. The function seems to be
>>> inlined and it fails only on the second invocation. I did not have time to
>>> investigate further yet, while without -fprofile-use it starts...
>>
>> I'll see if I can reproduce this and investigate, although at this
>> point that might have to wait until after my holiday vacation.
>
> I tried the linker script with cpu2006 and got quite a lot of failures
> (using the ref inputs to train, so the behavior should be the same in
> both profile-gen and profile-use). I investigated the one in bzip2 and
> found an issue that may not be easy to fix and is perhaps something it
> is not worth fixing. The case was essentially the following: Function
> foo was called by callsites A and B, with call counts 148122615 and
> 18, respectively.
>
> Within function foo, there was a basic block that had a very low count
> (compared to the entry bb count of 148122633), and therefore a 0
> frequency:
>
> ;;   basic block 6, loop depth 0, count 18, freq 0
>
> The ipa inliner decided to inline into callsite A but not B. Because
> the vast majority of the call count was from callsite A, when we
> performed execute_fixup_cfg after doing the inline transformation, the
> count_scale is 0 and the out-of-line copy of foo's blocks all got
> counts 0. However, most of the bbs still had non-zero frequencies. But
> bb 6 ended up with a count and frequency of 0, leading us to split it
> out. It turns out that at least one of the 18 counts for this block
> was from callsite B, and we ended up trying to execute the split bb
> in the out-of-line copy from that callsite.
>
> I can think of a couple of ways to prevent this from happening (e.g. have
> execute_fixup_cfg give the block a count or frequency of 1 instead of
> 0, or mark the bb somehow as not eligible for splitting due to a low
> confidence in the 0 count/frequency), but they seem a little hacky. I
> am thinking that the splitting here is something we shouldn't worry
> about - it is so close to 0 count that the occasional jump to the
> split section caused by the lack of precision in the frequency is not
> a big deal. Unfortunately, without fixing this I can't use the linker
> script without disabling inlining to avoid this problem.
>
> I reran cpu2006 with the linker script but with -fno-inline and got 6
> more benchmarks to pass. So there are other issues out there. I will
> take a look at another one and see if it is a similar
> scaling/precision issue. I'm thinking that I may just use my heatmap
> scripts (which leverage perf-events profiles) to see if there is any
> significant execution in the split cold sections, since it seems we
> can't realistically prevent any and all execution of the cold split
> sections, and that is more meaningful anyway.
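
The scaling problem described in the quoted text can be reproduced with a
little fixed-point arithmetic.  The sketch below is not GCC code: it only
assumes a scale stored as a rounded ratio in units of 1/10000 (GCC's
REG_BR_PROB_BASE is 10000), and rdiv, compute_scale and apply_scale are
hypothetical helper names.  The counts are the ones quoted above.

```cpp
#include <cstdint>

// Assumed fixed-point base for scales (units of 1/10000).
constexpr std::int64_t kProbBase = 10000;

// Division rounded to nearest, for non-negative operands.
constexpr std::int64_t rdiv(std::int64_t x, std::int64_t y) {
  return (x + y / 2) / y;
}

// Fixed-point ratio num/den, e.g. remaining entry count / old entry count.
constexpr std::int64_t compute_scale(std::int64_t num, std::int64_t den) {
  return den ? rdiv(num * kProbBase, den) : kProbBase;
}

// Rescale a basic-block count by a fixed-point scale.
constexpr std::int64_t apply_scale(std::int64_t count, std::int64_t scale) {
  return rdiv(count * scale, kProbBase);
}

constexpr std::int64_t kEntryCount = 148122633;  // entry bb count of foo
constexpr std::int64_t kCallsiteB = 18;          // calls left out of line

// 18 * 10000 / 148122633 is about 0.0012, which rounds to a scale of 0,
// so every block count in the out-of-line copy becomes 0 as well.
static_assert(compute_scale(kCallsiteB, kEntryCount) == 0,
              "scale for the out-of-line copy rounds to zero");
static_assert(apply_scale(18, compute_scale(kCallsiteB, kEntryCount)) == 0,
              "bb 6 keeps no count even though callsite B reaches it");
```

Under these assumptions the precision is lost in the scale itself, before
any individual block count is touched, which is why giving such blocks a
count or frequency of 1 was floated as a workaround.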

I collected perf cycle profiles for all of the split cpu2006 binaries
(with the patch I sent separately to do more COMDAT profile fixes),
and processed the results to see how many samples were in cold
functions. There were 13 benchmarks that had non-zero samples in split
cold sections, although they were very small as a total percentage of
sampled cycles in the benchmark. Here are the results, sorted in
reverse order of the percentage of total samples in the cold section:

Benchmark   Cold samples   Total samples   Percent c

RE: [PATCH] Fix Cilk+ ICEs in the alias oracle

2014-02-14 Thread Iyer, Balaji V



> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, February 14, 2014 12:34 PM
> To: Richard Biener; gcc-patches@gcc.gnu.org
> Cc: Iyer, Balaji V
> Subject: Re: [PATCH] Fix Cilk+ ICEs in the alias oracle
> 
> On 02/13/14 05:47, Richard Biener wrote:
> > On Thu, 13 Feb 2014, Richard Biener wrote:
> >
> >>
> >> Cilk+ builds INDIRECT_REFs when expanding builtins (oops) and thus
> >> those can leak into MEM_EXPRs which will lead to ICEs later.
> >> The following patch properly builds a MEM_REF instead.  Grepping for
> >> INDIRECT_REF I found another suspicious use (just removed, it cannot
> >> have triggered and it looks bogus) and the use of a langhook instead
> >> of proper GIMPLE interfaces (function also used during expansion).
> >>
> >> Bootstrap / testing in progress together with some other stuff.
> >>
> >> Ok?
> >
> > Btw, this exposes that Cilk+ is LTO-ignorant - it doesn't properly
> > register its global trees (bah, more global trees...).  So the
> > types_compatible_p call ICEs.  Trying to process them in
> > lto/lto.c:read_cgraph_and_symbols doesn't seem to work though.
> >
> > So I'm opting to remove the assert and leave fixing LTO for somebody
> > who cares about Cilk+.
> >
> > Simplifies the patch as follows, bootstrapped & tested on
> > x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2014-02-13  Richard Biener  
> >
> > * cilk-common.c (cilk_arrow): Build a MEM_REF, not an INDIRECT_REF.
> > (get_frame_arg): Drop the assert with langhook types_compatible_p.
> > Do not strip INDIRECT_REFs.
> FWIW, I see a recurring issue here.  Specifically I'm regularly seeing
> cases where submissions are not playing well with LTO.   Speaking
> strictly for myself, I'm not LTO-aware enough to spot them in patches as they
> fly by.

I thought I had handled LTO correctly. I apologize if I made a mistake. I 
assure you that it was not deliberate. I even had my tests use -flto flags to 
make sure it is going through it correctly...

> 
> It's not meant to be a criticism, just noting a recurring issue.
> 
> jeff

Re: RFA: RL78: Add missing instruction patterns

2014-02-14 Thread DJ Delorie


I'm OK with adding patterns in general, but I wonder if gcc would
produce better code if they were split into QImode ops earlier?

+  [(set (match_operand:HI 0 "register_operand"  "=Av")
+   (and:HI (match_operand:HI 1 "register_operand"  "0")
+   (match_operand:HI 2 "immediate_operand" "n")))
+   ]
+  "rl78_real_insns_ok ()"
+  "and\t%q0, %q2 \; and\t%Q0, %Q2"

You can't use %Q with AX and AND.

+;; operations upon pointers.  Failure to provide these patterns
+;; results in GCC generating illegal subregs, eg: (SUBREG:QI (REG:HI 33) 1)

Why is that subreg illegal?

+(define_insn "*nandhi3_virt"

Is this one just an optimization?

Re: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork

2014-02-14 Thread Richard Henderson

On 02/14/2014 12:21 AM, Jakub Jelinek wrote:
>> Any reason not to just run gomp_free_thread_pool from 
>> gomp_after_fork_callback
>> directly?  I see no restrictions on what kind of code is allowed to execute
>> during that callback.
> 
> Well, fork is async signal safe function, so calling malloc/free, or any
> kind of synchronization primitives is completely unsafe there.

That's as may be, but even the opengroup's rationale for pthread_atfork
mentions using locks in the three callbacks.  I strongly suspect that no real
use of pthread_atfork can ever really be async safe.

r~
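
For readers unfamiliar with the pattern referred to above, here is a
minimal sketch of pthread_atfork handlers that take a lock around fork().
It is not libgomp code: pool_lock and all function names are hypothetical,
and the handlers deliberately use a mutex, which is exactly the kind of
operation that is not async-signal-safe.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock standing in for a library's internal state lock.
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

// prepare: runs in the parent before fork(); take the lock so that no
// other thread holds it at the moment the address space is copied.
static void before_fork() { pthread_mutex_lock(&pool_lock); }

// parent: runs in the parent after fork(); release the lock again.
static void after_fork_parent() { pthread_mutex_unlock(&pool_lock); }

// child: runs in the child after fork(); the child inherited the lock in
// the locked state, held by the forking thread, so release it here.
static void after_fork_child() { pthread_mutex_unlock(&pool_lock); }

// Registers the three handlers; returns 0 on success.
int install_fork_handlers() {
  return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

// Forks once, lets the child take and release the lock, and returns the
// child's exit status (0 if the lock was usable), or -1 on error.
int fork_and_check_lock() {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    int rc = pthread_mutex_lock(&pool_lock);
    if (rc == 0)
      rc = pthread_mutex_unlock(&pool_lock);
    _exit(rc == 0 ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling install_fork_handlers() once and then fork_and_check_lock() shows
the child starting with a usable lock.  Whether such handlers are
acceptable when fork() is called from a signal handler is the open
question in this thread.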

pure virtual method called

2014-02-14 Thread Benjamin Redelings


Hi Jan,

I hope to report a bug soon, but in the meantime I wanted to let 
you know that for the last month or so, the 4.9 branch has (I think) a 
bug at O3, where my program gets:


pure virtual method called
terminate called without an active exception
Aborted

4.8 works fine.

I am guessing this is related to your devirt work.  I haven't been
able to produce a reduced testcase yet (sorry!), but here is some code
that illustrates the C++ type for the object whose virtual table is (?)
messed up.  (Note: this code compiles but does NOT crash.)


Again, sorry for not having a testcase.  I'll make one soon. 
Hopefully it is helpful to know that bugs still exist.


-BenRI

#include <vector>

class Object
{
  virtual Object* clone() const =0;
};

template <typename T>
class Box: public Object, public T
{
public:
  Box* clone() const {return new Box(*this);}
};

template <typename T>
using Vector = Box<std::vector<T>>;

int main()
{
  Vector<int> v;
  v.clone();
}

-BenRI

P.S. The bug exists in debian gcc snapshots taken on 2014-02-12 and 
2014-01-22, Linux AMD64.

Re: [RFA] [PATCH] [rtl-optimization/60131] Fix rtl-checking failure in REE

2014-02-14 Thread Richard Henderson

On 02/10/2014 03:06 PM, Jeff Law wrote:
> + PR rtl-optimization/60131
> + * ree.c (get_extended_src_reg): New function.
> + (combine_reaching_defs): Use it rather than assuming location
> + of REG.
> + (find_and_remove_re): Verify first operand of extension is
> + a REG before adding the insns to the copy list.

Ok.


r~

Re: pure virtual method called

2014-02-14 Thread Jan Hubicka

Hi,
the testcase would be wonderful - those bugs are hard to catch. I fixed some
issues recently, so you may try a recent snapshot if you didn't.
You may try -fno-devirtualize to see if the bug goes away (likely it will) and
you may try to look in the -fdump-tree-all -fdump-ipa-all dumps where the
cxa_pure_virtual call appears in the program and send me some context.

Honza
> Hi Jan,
> 
> I hope to report a bug soon, but in the meantime I wanted to let
> you know that for the last month or so, the 4.9 branch has (I think)
> a bug at O3, where my program gets:
> 
> pure virtual method called
> terminate called without an active exception
> Aborted
> 
> 4.8 works fine.
> 
> I am guessing this is related to your devirt work.  I haven't
> been able to produce a reduced testcase yet (sorry!), but here is
> some code that illustrates the C++ type for the object whose virtual
> table is (?) messed up.  (Note: this code compiles but does NOT
> crash.)
> 
> Again, sorry for not having a testcase.  I'll make one soon.
> Hopefully it is helpful to know that bugs still exist.
> 
> -BenRI
> 
> #include <vector>
> 
> class Object
> {
>   virtual Object* clone() const =0;
> };
> 
> template <typename T>
> class Box: public Object, public T
> {
> public:
>   Box* clone() const {return new Box(*this);}
> };
> 
> template <typename T>
> using Vector = Box<std::vector<T>>;
> 
> int main()
> {
>   Vector<int> v;
>   v.clone();
> }
> 
> -BenRI
> 
> P.S. The bug exists in debian gcc snapshots taken on 2014-02-12 and
> 2014-01-22, Linux AMD64.

Re: pure virtual method called

2014-02-14 Thread Benjamin Redelings


On 02/14/2014 03:21 PM, Jan Hubicka wrote:

> Hi,
> the testcase would be wonderful - those bugs are hard to catch.

Yeah - hope to soon.

> I fixed some issues
> recently, so you may try recent snapshot if you didn't.
> You may try -fno-devirtualize to see if the bug goes away (likely it will)
> and you may try to look in -fdump-tree-all -fdump-ipa-all dumps where
> cxa_pure_virtual call appears in the program and send me some context.

Thanks!
-BenRI

[Patch, fortran] PR 59599 ICE on intrinsic ichar

2014-02-14 Thread Mikael Morin

Hello,

this bug is not a regression, but the patch shouldn't wreck the compiler
too much on the other hand.
The problem is a wrong number of arguments while generating code for the
ichar intrinsic.  The correct number is 2 without the kind argument and
3 with it.
The attached patch uses the gfc_intrinsic_argument_list_length function
like it's done for other intrinsics.

Regression tested on x86_64-unknown-linux-gnu. OK for trunk/4.8/4.7?

Mikael
2014-02-14  Mikael Morin  

PR fortran/59599
* trans-intrinsic.c (gfc_conv_intrinsic_ichar): Calculate the
number of arguments.

2014-02-14  Mikael Morin  

PR fortran/59599
* gfortran.dg/ichar_3.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 1eb9490..cff8e89 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -4689,8 +4689,10 @@ static void
 gfc_conv_intrinsic_ichar (gfc_se * se, gfc_expr * expr)
 {
   tree args[2], type, pchartype;
+  int nargs;
 
-  gfc_conv_intrinsic_function_args (se, expr, args, 2);
+  nargs = gfc_intrinsic_argument_list_length (expr);
+  gfc_conv_intrinsic_function_args (se, expr, args, nargs);
   gcc_assert (POINTER_TYPE_P (TREE_TYPE (args[1])));
   pchartype = gfc_get_pchar_type (expr->value.function.actual->expr->ts.kind);
   args[1] = fold_build1_loc (input_location, NOP_EXPR, pchartype, args[1]);

! { dg-do compile }
!
! PR fortran/59599
! The call to ichar was triggering an ICE.
!
! Original testcase from Fran Martinez Fadrique 

character(1) cpk(2)
integer res(2)
cpk = 'a'
res = ichar( cpk, kind=1 )
print *, ichar( cpk, kind=1 )
end

[jit] Add some syntactic sugar to C++ wrapper API

2014-02-14 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit++.h (gccjit::type::zero): New method.
(gccjit::type::one): New method.
(gccjit::function::add_call): New family of overloaded methods.

gcc/testsuite/
* jit.dg/test-operator-overloading.cc (make_test_quadratic): Use
the new "zero" and "one" methods of gccjit::type.
* jit.dg/test-quadratic.cc (make_test_quadratic): Use the new
"add_call" method of gccjit::function.
---
 gcc/jit/ChangeLog.jit |  6 +++
 gcc/jit/libgccjit++.h | 65 +++
 gcc/testsuite/ChangeLog.jit   |  7 +++
 gcc/testsuite/jit.dg/test-operator-overloading.cc |  4 +-
 gcc/testsuite/jit.dg/test-quadratic.cc|  3 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 603dd96..39706f6 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,9 @@
+2014-02-14  David Malcolm  
+
+   * libgccjit++.h (gccjit::type::zero): New method.
+   (gccjit::type::one): New method.
+   (gccjit::function::add_call): New family of overloaded methods.
+
 2014-02-13  David Malcolm  
 
* libgccjit.h (gcc_jit_context_get_builtin_function): New.
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index 897b262..a718d0c 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -259,6 +259,9 @@ namespace gccjit
 
 type get_pointer ();
 
+// Shortcuts for getting values of numeric types:
+rvalue zero ();
+rvalue one ();
  };
 
   class function : public object
@@ -311,6 +314,20 @@ namespace gccjit
 void add_return (rvalue rvalue,
 location loc = location ());
 
+/* A way to add a function call to the body of a function being
+   defined, with various numbers of args.  */
+rvalue add_call (function other,
+location loc = location ());
+rvalue add_call (function other,
+rvalue arg0,
+location loc = location ());
+rvalue add_call (function other,
+rvalue arg0, rvalue arg1,
+location loc = location ());
+rvalue add_call (function other,
+rvalue arg0, rvalue arg1, rvalue arg2,
+location loc = location ());
+
 /* A series of overloaded operator () with various numbers of arguments
for a very terse way of creating a call to this function.  The call
is created within the same context as the function itself, which may
@@ -991,6 +1008,18 @@ type::get_pointer ()
   return type (gcc_jit_type_get_pointer (get_inner_type ()));
 }
 
+inline rvalue
+type::zero ()
+{
+  return get_context ().new_rvalue (*this, 0);
+}
+
+inline rvalue
+type::one ()
+{
+  return get_context ().new_rvalue (*this, 1);
+}
+
 // class function
 inline function::function () : object (NULL) {}
 inline function::function (gcc_jit_function *inner)
@@ -1136,6 +1165,42 @@ function::add_return (rvalue rvalue,
 }
 
 inline rvalue
+function::add_call (function other,
+   location loc)
+{
+  rvalue c = get_context ().new_call (other, loc);
+  add_eval (c);
+  return c;
+}
+inline rvalue
+function::add_call (function other,
+   rvalue arg0,
+   location loc)
+{
+  rvalue c = get_context ().new_call (other, arg0, loc);
+  add_eval (c);
+  return c;
+}
+inline rvalue
+function::add_call (function other,
+   rvalue arg0, rvalue arg1,
+   location loc)
+{
+  rvalue c = get_context ().new_call (other, arg0, arg1, loc);
+  add_eval (c);
+  return c;
+}
+inline rvalue
+function::add_call (function other,
+   rvalue arg0, rvalue arg1, rvalue arg2,
+   location loc)
+{
+  rvalue c = get_context ().new_call (other, arg0, arg1, arg2, loc);
+  add_eval (c);
+  return c;
+}
+
+inline rvalue
 function::operator() (location loc)
 {
   return get_context ().new_call (*this, loc);
diff --git a/gcc/testsuite/ChangeLog.jit b/gcc/testsuite/ChangeLog.jit
index 1aa8082..ec1d76a 100644
--- a/gcc/testsuite/ChangeLog.jit
+++ b/gcc/testsuite/ChangeLog.jit
@@ -1,3 +1,10 @@
+2014-02-14  David Malcolm  
+
+   * jit.dg/test-operator-overloading.cc (make_test_quadratic): Use
+   the new "zero" and "one" methods of gccjit::type.
+   * jit.dg/test-quadratic.cc (make_test_quadratic): Use the new
+   "add_call" method of gccjit::function.
+
 2014-02-13  David Malcolm  
 
* jit.dg/harness.h (CHECK_DOUBLE_VALUE): New macro.
diff --git a/gcc/testsuite/jit.dg/test-operator-overloading.cc b/gcc/testsuite/jit.dg/test-operator-overloading.cc
index 1124d9c..226cb22 100644
--- a/gcc/testsuite/jit.dg/test-operator-overloading.cc
+++ b/gcc/testsuite/jit.dg/test-operator-overloading.cc
@@ -253,11 +253,11 @@ make_test_quadratic (quadratic_test &testcase)
   test_quadratic.add_assignment (

[PATCH] Fix PR 60203: No direct move support for long double/_Decimal128 on powerpc ISA 2.07

2014-02-14 Thread Michael Meissner

When I added direct move support for ISA 2.07 (power8), I did not add direct
move support for long double and _Decimal128 types.  This patch adds the direct
move support for those types when you are running in 64-bit mode.  Now, there
are still the problems raised in PR 25972 on machines without direct move, but
this simple patch does help the machines with direct move.

I bootstrapped the compiler with/without the change, and there were no
regressions in the test suite.  Is it ok to check into the tree?

[gcc]
2014-02-14  Michael Meissner  

PR target/60203
* config/rs6000/rs6000.md (rreg): Add TFmode, TDmode constraints.
(mov_internal, TFmode/TDmode): Split TFmode/TDmode moves
into 64-bit and 32-bit moves.  On 64-bit moves, add support for
using direct move instructions on ISA 2.07.  Also adjust
instruction length for 64-bit.
(mov_64bit, TFmode/TDmode): Likewise.
(mov_32bit, TFmode/TDmode): Likewise.

[gcc/testsuite]
2014-02-14  Michael Meissner  

PR target/60203
* gcc.target/powerpc/pr60203.c: New testsuite.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 207791)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -387,6 +387,8 @@ (define_mode_attr ptrm [(SI "m")
 
 (define_mode_attr rreg [(SF   "f")
(DF   "ws")
+   (TF   "f")
+   (TD   "f")
(V4SF "wf")
(V2DF "wd")])
 
@@ -9524,10 +9526,22 @@ (define_expand "mov"
 ;; It's important to list Y->r and r->Y before r->r because otherwise
 ;; reload, given m->r, will try to pick r->r and reload it, which
 ;; doesn't make progress.
-(define_insn_and_split "*mov_internal"
+(define_insn_and_split "*mov_64bit"
+  [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r,r,wm")
+   (match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r,wm,r"))]
+  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_POWERPC64
+   && (gpc_reg_operand (operands[0], mode)
+   || gpc_reg_operand (operands[1], mode))"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
+  [(set_attr "length" "8,8,8,12,12,8,8,8")])
+
+(define_insn_and_split "*mov_32bit"
   [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
(match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r"))]
-  "TARGET_HARD_FLOAT && TARGET_FPRS
+  "TARGET_HARD_FLOAT && TARGET_FPRS && !TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))"
   "#"

Re: [PATCH] Fix PR 60203: No direct move support for long double/_Decimal128 on powerpc ISA 2.07

2014-02-14 Thread Michael Meissner

I forgot to add the new test to my patches.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
2014-02-14  Michael Meissner  

PR target/60203
* gcc.target/powerpc/pr60203.c: New testsuite.

Index: gcc/testsuite/gcc.target/powerpc/pr60203.c
===
--- gcc/testsuite/gcc.target/powerpc/pr60203.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr60203.c  (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O3" } */
+
+union u_ld { long double ld; double d[2]; };
+
+long double
+pack (double a, double aa)
+{
+  union u_ld u;
+  u.d[0] = a;
+  u.d[1] = aa;
+  return u.ld;
+}
+
+double
+unpack_0 (long double x)
+{
+  union u_ld u;
+  u.ld = x;
+  return u.d[0];
+}
+
+double
+unpack_1 (long double x)
+{
+  union u_ld u;
+  u.ld = x;
+  return u.d[1];
+}
+
+/* { dg-final { scan-assembler-not "stfd"   } } */
+/* { dg-final { scan-assembler-not "lfd"} } */
+/* { dg-final { scan-assembler-not "lxsdx"  } } */
+/* { dg-final { scan-assembler-not "stxsdx" } } */
+/* { dg-final { scan-assembler-not "mfvsrd" } } */
+/* { dg-final { scan-assembler-not "mtvsrd" } } */
+
+

Re: [Patch, fortran] PR 59599 ICE on intrinsic ichar

2014-02-14 Thread Steve Kargl

On Fri, Feb 14, 2014 at 10:51:14PM +0100, Mikael Morin wrote:
> Hello,
> 
> this bug is not a regression, but the patch shouldn't wreck the compiler
> too much on the other hand.
> The problem is a wrong number of arguments while generating code for the
> ichar intrinsic.  The correct number is 2 without the kind argument and
> 3 with it.
> The attached patch uses the gfc_intrinsic_argument_list_length function
> like it's done for other intrinsics.
> 

Once the problem is identified, the patch is almost trivial.
From the Fortran standpoint, it's OK.  Need RM approval.

-- 
Steve

Re: [PATCH] Fix PR 60203: No direct move support for long double/_Decimal128 on powerpc ISA 2.07

2014-02-14 Thread David Edelsohn

On Fri, Feb 14, 2014 at 5:59 PM, Michael Meissner
 wrote:
> When I added direct move support for ISA 2.07 (power8), I did not add direct
> move support for long double and _Decimal128 types.  This patch adds the direct
> move support for those types when you are running in 64-bit mode.  Now, there
> are still the problems raised in PR 25972 on machines without direct move, but
> this simple patch does help the machines with direct move.
>
> I bootstrapped the compiler with/without the change, and there were no
> regressions in the test suite.  Is it ok to check into the tree?
>
> [gcc]
> 2014-02-14  Michael Meissner  
>
> PR target/60203
> * config/rs6000/rs6000.md (rreg): Add TFmode, TDmode constraints.
> (mov_internal, TFmode/TDmode): Split TFmode/TDmode moves
> into 64-bit and 32-bit moves.  On 64-bit moves, add support for
> using direct move instructions on ISA 2.07.  Also adjust
> instruction length for 64-bit.
> (mov_64bit, TFmode/TDmode): Likewise.
> (mov_32bit, TFmode/TDmode): Likewise.
>
> [gcc/testsuite]
> 2014-02-14  Michael Meissner  
>
> PR target/60203
> * gcc.target/powerpc/pr60203.c: New testsuite.

Okay.

Thanks, David
