Re: Fix libgomp semaphores

2011-11-25 Thread Alan Modra
On Fri, Nov 25, 2011 at 08:38:39AM +0100, Jakub Jelinek wrote:
 My preference would be to avoid the abstraction changes though, both
 because it is additional clutter in the changeset and because omp_lock
 and nested lock are part of public ABIs, so if struct is laid out
 differently on some weird architecture, it would be an ABI change.

OK, fair enough.  I didn't consider that structs may be laid out
differently.

 So, if you could keep gomp_mutex_t, omp_lock_t and gomp_sem_t as integers,
 it would be appreciated.
 
 Furthermore, I'd prefer if the patch could be split into smaller parts,
 e.g. for bisecting purposes.  One patch would do the mutex changes
 to use new atomics, remove extra mutex.h headers and start using 0/1/-1
 instead of 0/1/2.  And another patch would rewrite the semaphores.

OK.  I need to do this anyway as I just discovered a regression when
looping on one of the tests.  I suspect the acquire/release mutex
locking may have exposed bugs elsewhere in libgomp that were covered
by the heavyweight locking used by the __sync builtins.
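
For anyone following along, the difference between the two locking styles can be sketched roughly as below. This is only an illustration and not libgomp's code: toy_mutex_t, toy_trylock_sync, toy_trylock_atomic and toy_unlock are made-up names, and a real mutex would block on a futex instead of trying once. The __sync builtin acts as a full barrier, while the __atomic calls only ask for acquire/release ordering, so code that silently relied on the stronger barrier could start to misbehave.

```c
#include <stdbool.h>

/* Hypothetical toy mutex: 0 = unlocked, 1 = locked.  */
typedef int toy_mutex_t;

/* Old style: the __sync builtin implies a full memory barrier.  */
bool
toy_trylock_sync (toy_mutex_t *m)
{
  return __sync_bool_compare_and_swap (m, 0, 1);
}

/* New style: acquire ordering on success, relaxed on failure.  */
bool
toy_trylock_atomic (toy_mutex_t *m)
{
  int expected = 0;
  return __atomic_compare_exchange_n (m, &expected, 1, false,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release ordering publishes the writes made inside the critical
   section before the lock is seen as free.  */
void
toy_unlock (toy_mutex_t *m)
{
  __atomic_store_n (m, 0, __ATOMIC_RELEASE);
}
```

Both trylock variants agree on the lock word itself; they differ only in how much reordering of surrounding memory accesses they forbid.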

-- 
Alan Modra
Australia Development Lab, IBM


Re: Re-merge crtstuff.c from the trans-mem branch

2011-11-25 Thread Rainer Orth
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 While the first patch allows Solaris 8/9 x86 bootstraps to finish
 (testsuite still running), I happened to run a Solaris 10/SPARC
 bootstrap that broke configuring stage 2 libgomp: even trivial
 executables die with a SEGV in _init.

 It turns out (still verifying with a fresh bootstrap) that the
 -fno-inline removal is the culprit.

All bootstraps have now completed without regressions, so the patch is
good to go from a Solaris POV.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch, fortran, RFC] PR 40958 Reduce size of module files

2011-11-25 Thread Janne Blomqvist
Hi,

gfortran has a few long-standing bugs wrt module handling. The more
fundamental, and also more difficult to fix, issue is that we re-read
and re-parse module files every time a USE statement is encountered,
instead of once per translation unit. See PR 25708. Another issue, PR
40958, is that module files can be quite big which exacerbates the PR
25708 issues.

The attached patch fixes PR 40958 by compressing the module files with
zlib and storing them in the gzip format (RFC 1952). I chose zlib
because it's a) ubiquitous and b) there's already a copy of zlib in
the GCC source tree, so this doesn't introduce any further build
dependencies. Since the mod files with the patch are in the gzip
format, one can use tools like zcat, zless, zgrep, zdiff etc. to
inspect the uncompressed contents easily (one can also use gunzip if
one first copies the module file to a temporary file with .gz
extension).

However, there are a couple of issues related to seeking in gzip files
(using gzseek() instead of the fseek() that is currently used). One is
fixed by the patch, the other is a potentially serious performance issue.

First, for a writable gzip file, seeking backwards is not allowed.
Currently when writing a module file, we first write a placeholder for
the MD5, then write the actual module content while updating the MD5
sum in memory as we go, and finally we seek back and write the final
MD5 value. However, the gzip file format contains a solution: a CRC32
checksum of the (uncompressed) content is stored 8 bytes from the end
of the file. So the patch rips out the MD5 machinery, and
instead compares these CRC32 checksums to determine whether to replace
an existing module file or not (from the command line, one can check
the CRC32 with 'zcat -l -v filename'). As a result, the module version
number has been bumped as well.
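
To illustrate the trailer layout, here is a minimal sketch of pulling the CRC32 out of a gzip file image. It assumes a single-member gzip file already read into memory; per RFC 1952 the last 8 bytes are the CRC32 followed by ISIZE, both little-endian. The function names are hypothetical and are not the ones used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from P.  */
uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Fetch the CRC32 stored in the 8-byte gzip trailer (CRC32, then ISIZE).
   Returns 0 on success, -1 if BUF is too short to be a gzip member.  */
int
gzip_trailer_crc32 (const unsigned char *buf, size_t len, uint32_t *crc)
{
  /* A 10-byte fixed header plus the 8-byte trailer is the minimum.  */
  if (len < 18)
    return -1;
  *crc = read_le32 (buf + len - 8);
  return 0;
}

/* Hypothetical policy: rewrite the module file only if the CRCs differ
   or the old file could not be read.  */
int
module_needs_rewrite (const unsigned char *old_buf, size_t old_len,
                      uint32_t new_crc)
{
  uint32_t old_crc;
  if (gzip_trailer_crc32 (old_buf, old_len, &old_crc) != 0)
    return 1;
  return old_crc != new_crc;
}
```

Comparing only the trailer means the old file never has to be inflated to decide whether it is stale.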

The second issue, which the patch doesn't address in any way, is that
while seeking on a gzip file in read mode is allowed, it may be costly.
From zlib.h: "If the file is opened for reading, this function is
emulated but can be extremely slow." Unfortunately, when reading a
module file we do seek
back and forth in it. Based on a brief inspection of the code, most if
not all of these seeks are for a very short distance (typically peek a
few bytes ahead in the stream, then seek back), and if the gzseek()
function is somewhat clever about seeking within the read buffer, this
might not be so slow after all. OTOH, if every gzseek() call means
restarting the inflation from the beginning of the file, the impact
could be quite bad.

The patch passes regression testing except for one failure,
module_md5_1.f90, which should be removed. Based on some quick testing,
the size of module files is reduced by a factor of 5 or thereabouts.
I haven't checked performance, in particular one would need to check
the second issue described above for some of those testcases
generating large module files. I think there was some single-file
version of cp2k somewhere that could be used for this, or are there
other appropriate tests somewhere that aren't too difficult to set up?

So at the moment I'm not proposing this patch for inclusion; consider
it an RFC. Appropriate benchmark results and/or pointers to
easy-to-set-up testcases are especially appreciated.

In case the seeking in read mode is an issue, I suspect it wouldn't be
too hard to fix the parsing to not require it, but I think that would
push the patch more towards 4.8 material.
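
As a rough idea of what "not requiring seeks" could look like, the sketch below peeks a few bytes ahead inside an already-inflated buffer instead of seeking forward and back on the file. This is an assumption about one possible design, not what the patch does; struct peek_reader, peek_at and next_char are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for a decompressed module stream.  */
struct peek_reader
{
  const char *data;   /* Decompressed bytes.  */
  size_t len;         /* Number of valid bytes in DATA.  */
  size_t pos;         /* Index of the next byte to consume.  */
};

/* Return the byte AHEAD positions past the cursor without consuming
   anything, or -1 at end of input.  No seek is needed.  */
int
peek_at (const struct peek_reader *r, size_t ahead)
{
  if (r->pos >= r->len || ahead >= r->len - r->pos)
    return -1;
  return (unsigned char) r->data[r->pos + ahead];
}

/* Consume and return the next byte, or -1 at end of input.  */
int
next_char (struct peek_reader *r)
{
  if (r->pos >= r->len)
    return -1;
  return (unsigned char) r->data[r->pos++];
}
```

With a reader like this, the "peek a few bytes, then seek back" pattern becomes a pure lookup and the cursor never moves backwards.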

-- 
Janne Blomqvist
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 17ebd58..d6152b3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -29,6 +29,9 @@ along with GCC; see the file COPYING3.  If not see
multiple header files.  Besides, Microsoft's winnt.h was 250k last
time I looked, so by comparison this is perfectly reasonable.  */
 
+#include "config.h"
+#include "system.h"
+
 /* Declarations common to the front-end and library are put in
libgfortran/libgfortran_frontend.h  */
 #include "libgfortran.h"
@@ -38,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "input.h"
 #include "splay-tree.h"
+#include zlib.h
 
 /* Major control parameters.  */
 
@@ -2345,7 +2349,8 @@ void gfc_add_include_path (const char *, bool, bool);
 void gfc_add_intrinsic_modules_path (const char *);
 void gfc_release_include_path (void);
 FILE *gfc_open_included_file (const char *, bool, bool);
-FILE *gfc_open_intrinsic_module (const char *);
+gzFile gfc_gzopen_included_file (const char *, bool, bool);
+gzFile gfc_open_intrinsic_module (const char *);
 
 int gfc_at_end (void);
 int gfc_at_eof (void);
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 62f7598..9fa8c97 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -72,15 +72,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "arith.h"
 #include "match.h"
 #include "parse.h" /* FIXME */
-#include "md5.h"
 #include "constructor.h"
 #include "cpp.h"
+#include zlib.h
 
 #define 

Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule

2011-11-25 Thread Richard Sandiford
Hi Revital,

Revital Eres revital.e...@linaro.org writes:
 The attached patch adds register pressure estimation of the partial schedule.

My main comment is that we shouldn't need to track separate liveness
sets for each loop here, since we're only looking at one basic block.
I.e., rather than operate on the per-loop LOOP_DATA (loop)->regs_{ref,live},
we should be able to use a single pair of bitmaps.

Also, the code goes to a lot of trouble over this case:

+  /* Add to the set of out live regs all the registers defined in bb
+ which have uses outside of it (those registers where eliminated in
+ the above calculation).  Eliminate from this set the definitions
+ that exist in the epilog and with no uses inside the basic-block
+ as these definitions will be eliminated from the bb and thus should
+ not be considered for estimating register pressure in the bb.  */

But how often does it occur in practice?  It's not necessarily the case
that the instruction will be eliminated, because things like volatility
might require us to keep it.  It's probably more accurate to say that we
can treat these as unused defs.

There's an argument to say that we should only consider registers
that are used in the loop.  If the pressure is high because of
registers that are live across the loop but not used within it,
then it's reasonable to force code outside the loop to spill some
of those.  That would suggest starting with the intersection of
DF_LR_OUT and DF_LR_BB_INFO (bb)->use.  Starting with that set
also has the advantage of handling the above case for free.

(This occurs often in our friend the popular embedded benchmark, which
often has a single function of the form:

  A: ...set up...
  B: for (i = 0; i < num_runs; i++)
  C:   ...benchmark...
  D: ...record time...

Some values are live from A-D, but those values shouldn't affect
an SMSable loop somewhere in C.)
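
The suggested starting set can be sketched with plain bit masks. This is only an illustration: it assumes a hypothetical target with at most 64 registers and uses a uint64_t in place of GCC's bitmap type, and regset, initial_live_set and regset_count are made-up names.

```c
#include <stdint.h>

/* Hypothetical 64-register target: bit N set means register N is in
   the set.  */
typedef uint64_t regset;

/* Registers worth tracking at the end of the loop body: live on exit
   AND used inside the block.  Values that are merely live across the
   loop drop out of the estimate.  */
regset
initial_live_set (regset lr_out, regset bb_use)
{
  return lr_out & bb_use;
}

/* Number of registers in SET.  */
int
regset_count (regset set)
{
  int n = 0;
  for (; set != 0; set &= set - 1)
    n++;
  return n;
}
```

A register that is live out but never used in the block (the A-D values in the benchmark example) contributes nothing to the count.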

We talked earlier about making the main pressure-estimation code
process the loop twice, but I see instead you've gone for two
separate passes, one to calculate LR out, then the main pass.
I think with the changes above, running the same loop twice is
going to be easier and no less efficient.  We could even add
code to skip the second iteration if it would start with the
same lr_out as the first iteration.

Richard


Re: [PATCH 0/2] Add atomic support to m68k

2011-11-25 Thread Mikael Pettersson
Richard Henderson writes:
  On 11/23/2011 06:46 AM, Mikael Pettersson wrote:
   +FAIL: c-c++-common/gomp/atomic-10.c scan-tree-dump-times ompexp 
   __atomic_fetch_add 4
   +FAIL: c-c++-common/gomp/atomic-3.c scan-tree-dump-times ompexp xyzzy, 4 
   1
   +FAIL: c-c++-common/gomp/atomic-9.c scan-tree-dump-times ompexp 
   __atomic_fetch_add 1
  
  What are these failures?

Executing on host: /mnt/scratch/objdir47/gcc/xgcc -B/mnt/scratch/objdir47/gcc/ 
/mnt/scratch/gcc-4.7-2012/gcc/testsuite/c-c++-common/gomp/atomic-9.c
-fopenmp -fdump-tree-ompexp -S  -o atomic-9.s(timeout = 300)
PASS: c-c++-common/gomp/atomic-9.c (test for excess errors)
FAIL: c-c++-common/gomp/atomic-9.c scan-tree-dump-times ompexp 
__atomic_fetch_add 1

The test case expects

  #pragma omp atomic
  *bar() += 1;

to become __atomic_fetch_add (it does on x86_64), but on m68k-linux with your
patch the assignment is instead bracketed by 
__builtin_GOMP_atomic_{start,end}().

atomic-10.c and atomic-3.c are the same issue.
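
For reference, the form the tests scan for is roughly equivalent to the hand-written call below. This is a sketch, not the compiler's output: atomic_increment is a made-up name, and the relaxed memory order is an assumption on my part. The fallback seen on m68k instead wraps a plain add in GOMP_atomic_start/GOMP_atomic_end calls, which needs libgomp and so is not shown.

```c
/* Roughly what "#pragma omp atomic" followed by "*p += 1;" is expected
   to become when the target provides a suitable atomic add.  Returns
   the value *P held before the add.  */
int
atomic_increment (int *p)
{
  return __atomic_fetch_add (p, 1, __ATOMIC_RELAXED);
}
```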

  Are they fixed if you add m68k-linux to check_effective_target_sync_int_long 
  and check_effective_target_sync_char_short in 
  gcc/testsuite/lib/target-supports.exp?

No.  These tests require cas_int, and the patched gcc does provide that.
I believe the real error is that gomp for some reason doesn't think the target
has gcc atomics, and the tests fail in that case.

/Mikael


Re: [PATCH] Remove dead labels to increase superblock scope

2011-11-25 Thread Tom de Vries
On 21/11/11 17:13, Michael Matz wrote:
 Hi,
 
 On Sat, 19 Nov 2011, Tom de Vries wrote:
 
 On 11/18/2011 10:29 PM, Eric Botcazou wrote:
 For the test-case of PR50764, a dead label is introduced by
 fixup_reorder_chain in cfg_layout_finalize, called from
 pass_reorder_blocks.

 I presume that there is no reasonable way of preventing fixup_reorder_chain 
 from introducing it or of teaching fixup_reorder_chain to remove it?


 This (untested) patch also removes the dead label for the PR, and I 
 think it is safe. ...
 
 cfgrtl.c has already code to delete labels (delete_insn) when appropriate 
 (can_delete_label_p).  Perhaps that can be reused somehow.
 
 Index: cfglayout.c
 ===
 --- cfglayout.c (revision 181377)
 +++ cfglayout.c (working copy)
 @@ -702,6 +702,21 @@ relink_block_chain (bool stay_in_cfglayo
  }
  

 +static bool
 +forced_label_p (rtx label)
 +{
 +  rtx insn, forced_label;
 +  for (insn = forced_labels; insn; insn = XEXP (insn, 1))
 +{
 +  forced_label = XEXP (insn, 0);
 +  if (!LABEL_P (forced_label))
 +continue;
 +  if (forced_label == label)
 +return true;
 +}
 +  return false;
 +}
 
 That's in_expr_list_p().
 
 @@ -857,6 +872,12 @@ fixup_reorder_chain (void)
 (e_taken->src, e_taken->dest));
e_taken->flags |= EDGE_FALLTHRU;
update_br_prob_note (bb);
 +  if (LABEL_NUSES (ret_label) == 0
 
 +   && !LABEL_PRESERVE_P (ret_label)
 +   && LABEL_NAME (ret_label) == NULL
 +   && !forced_label_p (ret_label)
 
 And this is cfgrtl.c:can_delete_label_p.

Ok, using that in the new version.

 Note that you actually 
 can remove labels also if they are !can_delete_label_p, if you use 
 delete_insn (which you do).  It will replace such undeletable labels by a 
 DELETED_LABEL note.


I tried that as well but ran into these errors in rtl_verify_flow_info_1:
...
libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for 
block 6
libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of
basic block 6
libquadmath/printf/cmp.c:56:1: internal compiler error: verify_flow_info failed
a-direct.ads:460:9: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
a-direct.ads:460:9: error: NOTE_INSN_BASIC_BLOCK 25 in middle of basic block 6
+===GNAT BUG DETECTED==+
| 4.7.0 2023 (experimental) (x86_64-unknown-linux-gnu) GCC error:  |
| verify_flow_info failed  |
| Error detected around a-direct.ads:460:9 |
...

Eric,

This new patch was bootstrapped and reg-tested on x86_64.

Is this new patch, or the old patch
(http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01953.html), OK for next stage1?

Thanks,
- Tom

 
 Ciao,
 Michael.

2011-11-25  Tom de Vries  t...@codesourcery.com

* rtl.h (can_delete_label_p): Declare.
* cfgrtl.c (can_delete_label_p): Remove static.
* cfglayout.c (fixup_reorder_chain): Delete unused label if
can_delete_label_p.

* gcc.dg/superblock.c: New test.
Index: gcc/cfglayout.c
===
--- gcc/cfglayout.c (revision 181652)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,10 @@ fixup_reorder_chain (void)
 		     (e_taken->src, e_taken->dest));
 		  e_taken->flags |= EDGE_FALLTHRU;
 		  update_br_prob_note (bb);
+		  if (LABEL_NUSES (ret_label) == 0
+		      && can_delete_label_p (ret_label)
+		      && single_pred_p (e_taken->dest))
+		    delete_insn (ret_label);
 		  continue;
 		}
 	}
Index: gcc/rtl.h
===
--- gcc/rtl.h (revision 181652)
+++ gcc/rtl.h (working copy)
@@ -2482,6 +2482,9 @@ extern void dump_combine_total_stats (FI
 /* In cfgcleanup.c  */
 extern void delete_dead_jumptables (void);
 
+/* In rtlcfg.c  */
+int can_delete_label_p (const_rtx);
+
 /* In sched-vis.c.  */
 extern void debug_bb_n_slim (int);
 extern void debug_bb_slim (struct basic_block_def *);
Index: gcc/cfgrtl.c
===
--- gcc/cfgrtl.c (revision 181652)
+++ gcc/cfgrtl.c (working copy)
@@ -66,7 +66,6 @@ along with GCC; see the file COPYING3.
 #include "df.h"
 
 static int can_delete_note_p (const_rtx);
-static int can_delete_label_p (const_rtx);
 static basic_block rtl_split_edge (edge);
 static bool rtl_move_block_after (basic_block, basic_block);
 static int rtl_verify_flow_info (void);
@@ -102,7 +101,7 @@ can_delete_note_p (const_rtx note)
 
 /* True if a given label can be deleted.  */
 
-static int
+int
 can_delete_label_p (const_rtx label)
 {
   return (!LABEL_PRESERVE_P (label)
Index: gcc/testsuite/gcc.dg/superblock.c
===
--- /dev/null (new file)
+++ 

[Patch, Fortran] PR 50408 [4.6/4.7] ICE related to whole-file processing

2011-11-25 Thread Tobias Burnus
The patch fixes an issue when the backend_decl is reused (-fwhole-file).
The problem is that the ts.u.derived->backend_decl was not always copied
as well. I copied what was done a bit later in the file and extended it
to also include BT_CLASS.
The trans-types.c change is not needed, but I thought it would be a good
optimization; from == to seems to happen quite regularly.


Built and regtested on x86-64-linux.
OK for the trunk and 4.6?

Tobias

PS: It also affects 4.5 if one uses -fwhole-file. However, my impression 
is that no one uses that option with 4.5 and other whole-file bugs have 
only been fixed for 4.6. But if you think one should backport it to 4.5, 
I can surely do so.


2011-11-25  Tobias Burnus  bur...@net-b.de

	PR fortran/50408
	* trans-decl.c (gfc_get_module_backend_decl): Also copy
	ts.u.derived from the gsym if the ts.type is BT_CLASS.
	(gfc_get_extern_function_decl): Copy also the backend_decl
	for the symbol's ts.u.{derived,cl} from the gsym.
	* trans-types.c (gfc_copy_dt_decls_ifequal): Directly
	return if from and to are the same.

2011-11-25  Tobias Burnus  bur...@net-b.de

	PR fortran/50408
	* gfortran.dg/whole_file_35.f90: New.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index fc8a9ed..39ec8cd 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -718,7 +718,7 @@ gfc_get_module_backend_decl (gfc_symbol *sym)
 	}
       else if (s->backend_decl)
 	{
-	  if (sym->ts.type == BT_DERIVED)
+	  if (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)
 	    gfc_copy_dt_decls_ifequal (s->ts.u.derived, sym->ts.u.derived,
 				       true);
 	  else if (sym->ts.type == BT_CHARACTER)
@@ -1670,6 +1670,11 @@ gfc_get_extern_function_decl (gfc_symbol * sym)
       gfc_find_symbol (sym->name, gsym->ns, 0, &s);
       if (s && s->backend_decl)
 	{
+	  if (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)
+	    gfc_copy_dt_decls_ifequal (s->ts.u.derived, sym->ts.u.derived,
+				       true);
+	  else if (sym->ts.type == BT_CHARACTER)
+	    sym->ts.u.cl->backend_decl = s->ts.u.cl->backend_decl;
 	  sym->backend_decl = s->backend_decl;
 	  return sym->backend_decl;
 	}
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 3f4ebd5..d643c2e 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -2188,6 +2188,9 @@ gfc_copy_dt_decls_ifequal (gfc_symbol *from, gfc_symbol *to,
   gfc_component *to_cm;
   gfc_component *from_cm;
 
+  if (from == to)
+return 1;
+
   if (from->backend_decl == NULL
 	|| !gfc_compare_derived_types (from, to))
 return 0;
--- /dev/null	2011-11-22 07:52:35.375586753 +0100
+++ gcc/gcc/testsuite/gfortran.dg/whole_file_35.f90	2011-11-25 09:30:18.0 +0100
@@ -0,0 +1,28 @@
+! { dg-do compile }
+!
+! PR fortran/50408
+!
+! Contributed by Vittorio Zecca
+!
+   module m
+ type int
+   integer  :: val
+ end type int
+ interface ichar
+   module procedure uch
+end interface
+   contains
+ function uch (c)
+   character (len=1), intent (in) :: c
+   type (int) :: uch
+   intrinsic ichar
+   uch%val = 127 - ichar (c)
+ end function uch 
+   end module m
+
+  program p
+use m
+print *,ichar('~') ! must print 1
+  end program p
+
+! { dg-final { cleanup-modules m } }



Re: [Patch,AVR]: Clean up SFR offset usage: %i for CONST_INT

2011-11-25 Thread Georg-Johann Lay
Georg-Johann Lay wrote:
 Denis Chertykov wrote:
 2011/11/20 Georg-Johann Lay .:
 Subtracting 0x20 to get the SFR address from a RAM address is scattered all
 over the backend.  The patch makes - PRINT_OPERAND_PUNCT_VALID_P and uses 
 %- to
 subtract the SFR offset instead of hard coded magic number 0x20 all over the
 place.  The offset is stored in a new field base_arch_s.sfr_offset
 I don't like '%-' as a sequence and I don't like it as a suffix.
 May be a right way is an adding a new prefix '%i' or '%I'.
 I.e.
   %m0 - memory address
   %i0 - io address (equal to %m0 - 0x20)

 Denis.
 
 hmmm. The intention was to be able to specify SFR offset in inline assembly,
 for example. The offset is independent of operands; it is a specific to the
 architecture.
 
 Anyway, here is an updated patch. It's the same as the last except that it
 implements %i instead of %- and avr_out_plus_1 prints constants more
 eye-friendly. And there was a missing return close to the end of 
 out_movqi_mr_r.
 
 Passes test suite.
 
 Ok?
 
 Johann
 
   * config/avr/avr.h (struct base_arch_s): Add field sfr_offset.
   * config/avr/avr-devices.c: Ditto. And initialize it.
   * config/avr/avr-c.c (avr_cpu_cpp_builtins): New built-in define
   __AVR_SFR_OFFSET__.
   * config/avr/avr-protos.h (out_movqi_r_mr, out_movqi_mr_r): Remove.
   (out_movhi_r_mr, out_movhi_mr_r): Remove.
   (out_movsi_r_mr, out_movsi_mr_r): Remove.
   * config/avr/avr.md (*cbi, *sbi): Use %i instead of %m-0x20.
   (*insv.io, *insv.not.io): Ditto.
   * config/avr/avr.c (out_movsi_r_mr, out_movsi_mr_r): Make static.
   (print_operand): Implement %i to print address as I/O address.
   (output_movqi): Clean up call of out_movqi_mr_r.
   (output_movhi): Clean up call of out_movhi_mr_r.
   (avr_file_start): Use avr_current_arch->sfr_offset instead of
   magic -0x20. Use TMP_REGNO, ZERO_REGNO instead of 0, 1.
   (avr_out_sbxx_branch): Use %i instead of %m-0x20.
   (out_movqi_r_mr, out_movqi_mr_r): Ditto. And make static.
   (out_movhi_r_mr, out_movhi_mr_r): Ditto. And use avr_asm_len.
   (out_shift_with_cnt): Clean up code: Use avr_asm_len.
   (output_movsisf): Use output_reload_insisf for all CONSTANT_P sources.
   (avr_out_movpsi): USE avr_out_reload_inpsi for all CONSTANT_P sources.
   Clean up call of avr_out_store_psi.
   (output_reload_in_const): Don't cut symbols longer than 2 bytes.
   (output_reload_insisf): Filter CONST_INT_P or CONST_DOUBLE_P to
   try if setting pre-cleared register is advantageous.
   (avr_out_plus_1): Use gen_int_mode instead of GEN_INT.

This adds %i support for CONST_INT.

It is needed because some insns don't use memory_operand but
 mem:QI (io_address_operand)

%i(mem) just forwards to %i(const_int)
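
In plain C terms, the mapping that %i performs is just a subtraction with a range check. The helper below is a sketch under that assumption and is not code from the patch: ram_to_io_address is a made-up name, and the 0x3F upper bound reflects the 6-bit address field of IN/OUT.

```c
/* Hypothetical helper: map a RAM address to the I/O address used by
   IN/OUT, or return -1 if it is outside the I/O window.  */
int
ram_to_io_address (unsigned ram_addr, unsigned sfr_offset)
{
  /* IN/OUT can encode I/O addresses 0x00..0x3F.  */
  if (ram_addr < sfr_offset || ram_addr - sfr_offset > 0x3F)
    return -1;
  return (int) (ram_addr - sfr_offset);
}
```

For example, with an SFR offset of 0x20, RAM address 0x5F maps to I/O address 0x3F.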

Ok?

Johann

* config/avr/avr.c (print_operand): Support code = 'i' for CONST_INT.

Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 181717)
+++ config/avr/avr.md	(working copy)
@@ -28,8 +28,8 @@
 ;;  j  Branch condition.
 ;;  k  Reverse branch condition.
 ;;..m..Constant Direct Data memory address.
-;;  i  Print the SFR address quivalent of a CONST_INT RAM address.
-;; The resulting addres is suitable to be used in IN/OUT.
+;;  i  Print the SFR address quivalent of a CONST_INT or a CONST_INT
+;; RAM address.  The resulting addres is suitable to be used in IN/OUT.
 ;;  o  Displacement for (mem (plus (reg) (const_int))) operands.
 ;;  p  POST_INC or PRE_DEC address as a pointer (X, Y, Z)
 ;;  r  POST_INC or PRE_DEC address as a register (r26, r28, r30)
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 181717)
+++ config/avr/avr.c	(working copy)
@@ -1822,9 +1822,32 @@ print_operand (FILE *file, rtx x, int co
   else
 	fprintf (file, reg_names[true_regnum (x) + abcd]);
 }
-  else if (GET_CODE (x) == CONST_INT)
-fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x) + abcd);
-  else if (GET_CODE (x) == MEM)
+  else if (CONST_INT_P (x))
+    {
+      HOST_WIDE_INT ival = INTVAL (x);
+
+      if ('i' != code)
+        fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival + abcd);
+      else if (low_io_address_operand (x, VOIDmode)
+               || high_io_address_operand (x, VOIDmode))
+        {
+          switch (ival)
+            {
+            case RAMPZ_ADDR: fprintf (file, "__RAMPZ__"); break;
+            case SREG_ADDR: fprintf (file, "__SREG__"); break;
+            case SP_ADDR:   fprintf (file, "__SP_L__"); break;
+            case SP_ADDR+1: fprintf (file, "__SP_H__"); break;
+
+            default:
+              fprintf (file, HOST_WIDE_INT_PRINT_HEX,
+                       ival - avr_current_arch->sfr_offset);
+              break;
+            }
+        }
+      else
+        fatal_insn ("bad address, not an I/O address:", x);
+    }
+  else if (MEM_P 

Re: [PATCH] Remove dead labels to increase superblock scope

2011-11-25 Thread Michael Matz
Hi,

On Fri, 25 Nov 2011, Tom de Vries wrote:

  Note that you actually can remove labels also if they are 
  !can_delete_label_p, if you use delete_insn (which you do).  It will 
  replace such undeletable labels by a DELETED_LABEL note.
 
 I tried that as well but ran into these errors in rtl_verify_flow_info_1:
 ...
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for 
 block 6
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of
 basic block 6

Hmpf, probably bitrotted over time.  Oh well, so be it.


Ciao,
Michael.


Re: [PATCH] Remove dead labels to increase superblock scope

2011-11-25 Thread Steven Bosscher
On Fri, Nov 25, 2011 at 2:03 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Fri, 25 Nov 2011, Tom de Vries wrote:

  Note that you actually can remove labels also if they are
  !can_delete_label_p, if you use delete_insn (which you do).  It will
  replace such undeletable labels by a DELETED_LABEL note.

 I tried that as well but ran into these errors in rtl_verify_flow_info_1:
 ...
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for 
 block 6
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of
 basic block 6

 Hmpf, probably bitrotted over time.  Oh well, so be it.

No, DELETED_LABEL notes still work just fine. It depends on how you
remove the label and replace it with a note, and Tom isn't showing
what he did, so...

Ciao!
Steven


Re: [Patch,AVR]: Clean up SFR offset usage: %i for CONST_INT

2011-11-25 Thread Denis Chertykov
2011/11/25 Georg-Johann Lay a...@gjlay.de

 Georg-Johann Lay wrote:
  Denis Chertykov wrote:
  2011/11/20 Georg-Johann Lay .:
  Subtracting 0x20 to get the SFR address from a RAM address is scattered 
  all
  over the backend.  The patch makes - PRINT_OPERAND_PUNCT_VALID_P and uses 
  %- to
  subtract the SFR offset instead of hard coded magic number 0x20 all over 
  the
  place.  The offset is stored in a new field base_arch_s.sfr_offset
  I don't like '%-' as a sequence and I don't like it as a suffix.
  May be a right way is an adding a new prefix '%i' or '%I'.
  I.e.
    %m0 - memory address
    %i0 - io address (equal to %m0 - 0x20)
 
  Denis.
 
  hmmm. The intention was to be able to specify SFR offset in inline assembly,
  for example. The offset is independent of operands; it is a specific to the
  architecture.
 
  Anyway, here is an updated patch. It's the same as the last except that it
  implements %i instead of %- and avr_out_plus_1 prints constants more
  eye-friendly. And there was a missing return close to the end of 
  out_movqi_mr_r.
 
  Passes test suite.
 
  Ok?
 
  Johann
 
        * config/avr/avr.h (struct base_arch_s): Add field sfr_offset.
        * config/avr/avr-devices.c: Ditto. And initialize it.
        * config/avr/avr-c.c (avr_cpu_cpp_builtins): New built-in define
        __AVR_SFR_OFFSET__.
        * config/avr/avr-protos.h (out_movqi_r_mr, out_movqi_mr_r): Remove.
        (out_movhi_r_mr, out_movhi_mr_r): Remove.
        (out_movsi_r_mr, out_movsi_mr_r): Remove.
        * config/avr/avr.md (*cbi, *sbi): Use %i instead of %m-0x20.
        (*insv.io, *insv.not.io): Ditto.
        * config/avr/avr.c (out_movsi_r_mr, out_movsi_mr_r): Make static.
        (print_operand): Implement %i to print address as I/O address.
        (output_movqi): Clean up call of out_movqi_mr_r.
        (output_movhi): Clean up call of out_movhi_mr_r.
        (avr_file_start): Use avr_current_arch->sfr_offset instead of
        magic -0x20. Use TMP_REGNO, ZERO_REGNO instead of 0, 1.
        (avr_out_sbxx_branch): Use %i instead of %m-0x20.
        (out_movqi_r_mr, out_movqi_mr_r): Ditto. And make static.
        (out_movhi_r_mr, out_movhi_mr_r): Ditto. And use avr_asm_len.
        (out_shift_with_cnt): Clean up code: Use avr_asm_len.
        (output_movsisf): Use output_reload_insisf for all CONSTANT_P sources.
        (avr_out_movpsi): USE avr_out_reload_inpsi for all CONSTANT_P sources.
        Clean up call of avr_out_store_psi.
        (output_reload_in_const): Don't cut symbols longer than 2 bytes.
        (output_reload_insisf): Filter CONST_INT_P or CONST_DOUBLE_P to
        try if setting pre-cleared register is advantageous.
        (avr_out_plus_1): Use gen_int_mode instead of GEN_INT.

 This adds %i support for CONST_INT.

 It is needed because some insns don't use memory_operand but
  mem:QI (io_address_operand)

 %i(mem) just forwards to %i(const_int)

 Ok?

 Johann

        * config/avr/avr.c (print_operand): Support code = 'i' for CONST_INT.


Ok.

Denis.


Re: [PATCH] Remove dead labels to increase superblock scope

2011-11-25 Thread Tom de Vries
On 25/11/11 14:05, Steven Bosscher wrote:
 On Fri, Nov 25, 2011 at 2:03 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Fri, 25 Nov 2011, Tom de Vries wrote:

 Note that you actually can remove labels also if they are
 !can_delete_label_p, if you use delete_insn (which you do).  It will
 replace such undeletable labels by a DELETED_LABEL note.

 I tried that as well but ran into these errors in rtl_verify_flow_info_1:
 ...
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for 
 block 6
 libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of
 basic block 6

 Hmpf, probably bitrotted over time.  Oh well, so be it.
 
 No, DELETED_LABEL notes still work just fine. It depends on how you
 remove the label and replace it with a note, and Tom isn't showing
 what he did, so...

This is the patch with which I ran into the rtl_verify_flow_info_1 errors:
...
Index: gcc/cfglayout.c
===
--- gcc/cfglayout.c (revision 181172)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,9 @@ fixup_reorder_chain (void)
   (e_taken->src, e_taken->dest));
  e_taken->flags |= EDGE_FALLTHRU;
  update_br_prob_note (bb);
+ if (LABEL_NUSES (ret_label) == 0
+     && single_pred_p (e_taken->dest))
+   delete_insn (ret_label);
  continue;
}
}
...

Thanks,
- Tom

 
 Ciao!
 Steven



Fix doloop bug with maximum-length loops

2011-11-25 Thread Joseph S. Myers
This patch fixes a bug in the RTL doloop pass that showed as timeouts
of gcc.c-torture/execute/961017-1.c execution on slow targets because
a 256-iteration loop was replaced with a 2^32-iteration loop (if the
test did not time out, it would still pass as it didn't contain any
checks on the number of iterations).  The testcases included with the
patch are self-checking testcases that will reliably fail on affected
targets (if the rest of the patch is not applied), aborting if they do
not time out.  Affected targets include sh-linux-gnu and
powerpc-linux-gnu.

The replacement occurs in the RTL doloop pass (loop-doloop.c).  Recall
that RTL CONST_INTs do not have modes.  The number of iterations of
the loop (appropriately defined) is calculated as (const_int -1) -
implicitly QImode.  It might seem appropriate for
loop-iv.c:iv_number_of_iterations, where it does

  if (CONST_INT_P (desc->niter_expr))
    {
      unsigned HOST_WIDEST_INT val = INTVAL (desc->niter_expr);

      desc->const_iter = true;
      desc->niter_max = desc->niter = val & GET_MODE_MASK (desc->mode);
    }

to adjust desc->niter_expr using the mask in the same way (i.e.
desc->niter_expr = GEN_INT (desc->niter);).  But that is neither
necessary nor sufficient to fix the bug.  It changes the number of
iterations to the correct (const_int 255).  But whether the number is
given as 255 or -1, doloop_modify is entered with zero_extend_p ==
true and from_mode == QImode.  The code there then determines that it
needs to increment the count - and does so in QImode, which in either
case produces 0, before then zero-extending to SImode.
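
The arithmetic can be reproduced in plain C, with uint8_t standing in for QImode and uint32_t for SImode. This is only an analogy to the RTL, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Increment in the narrow mode first, then zero-extend:
   255 + 1 wraps to 0 before the extension happens.  */
uint32_t
increment_then_extend (uint8_t niter)
{
  uint8_t narrow = (uint8_t) (niter + 1);
  return (uint32_t) narrow;
}

/* Zero-extend first, then increment in the wide mode:
   255 + 1 gives the intended 256.  */
uint32_t
extend_then_increment (uint8_t niter)
{
  return (uint32_t) niter + 1u;
}
```

A count of 0 fed to a decrement-and-branch loop is what turns the intended 256 iterations into 2^32.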

This code for doing the increment in from_mode comes from the fix for
PR 37451 and the follow-up fix for PR 37782
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html.  As far as
I can tell the idea of those changes - which were an attempt to
improve optimization - is simply broken when the loop might have
maximum length like this (which in the original PR 37451 case it
can't, but telling that in this code would be nontrivial) - including
the case of nonconstant length as well as that of constant length.

So this patch reverts both those previous patches and adds testcases
to demonstrate the problem they caused.  Bootstrapped with no
regressions on powerpc-linux-gnu.  OK to commit?

(If the patch holds up on trunk I'd propose it for 4.6 and 4.5 branches as 
well, as a wrong-code regression fix.)

2011-11-25  Joseph Myers  jos...@codesourcery.com

Revert:

2008-09-18  Andrew Pinski  andrew_pin...@playstation.sony.com

PR rtl-opt/37451
* loop-doloop.c (doloop_modify): New argument zero_extend_p and
zero extend count after the correction to it is done.
(doloop_optimize): Update call to doloop_modify, don't zero extend
count before call.

2008-11-03  Andrew Pinski  andrew_pin...@playstation.sony.com

PR rtl-opt/37782
* loop-doloop.c (doloop_modify): Add from_mode argument that says what
mode count is in.
(doloop_optimize): Update call to doloop_modify.

testsuite:
2011-11-25  Joseph Myers  jos...@codesourcery.com

* gcc.c-torture/execute/doloop-1.c,
gcc.c-torture/execute/doloop-2.c: New tests.

Index: testsuite/gcc.c-torture/execute/doloop-1.c
===
--- testsuite/gcc.c-torture/execute/doloop-1.c  (revision 0)
+++ testsuite/gcc.c-torture/execute/doloop-1.c  (revision 0)
@@ -0,0 +1,18 @@
+#include <limits.h>
+
+extern void exit (int);
+extern void abort (void);
+
+volatile unsigned int i;
+
+int
+main (void)
+{
+  unsigned char z = 0;
+
+  do ++i;
+  while (--z > 0);
+  if (i != UCHAR_MAX + 1U)
+    abort ();
+  exit (0);
+}
Index: testsuite/gcc.c-torture/execute/doloop-2.c
===
--- testsuite/gcc.c-torture/execute/doloop-2.c  (revision 0)
+++ testsuite/gcc.c-torture/execute/doloop-2.c  (revision 0)
@@ -0,0 +1,18 @@
+#include <limits.h>
+
+extern void exit (int);
+extern void abort (void);
+
+volatile unsigned int i;
+
+int
+main (void)
+{
+  unsigned short z = 0;
+
+  do ++i;
+  while (--z > 0);
+  if (i != USHRT_MAX + 1U)
+    abort ();
+  exit (0);
+}
Index: loop-doloop.c
===
--- loop-doloop.c   (revision 181697)
+++ loop-doloop.c   (working copy)
@@ -394,14 +394,11 @@ add_test (rtx cond, edge *e, basic_block
describes the loop, DESC describes the number of iterations of the
loop, and DOLOOP_INSN is the low-overhead looping insn to emit at the
end of the loop.  CONDITION is the condition separated from the
-   DOLOOP_SEQ.  COUNT is the number of iterations of the LOOP.
-   ZERO_EXTEND_P says to zero extend COUNT after the increment of it to
-   word_mode from FROM_MODE.  */
+   DOLOOP_SEQ.  COUNT is the number of iterations of the LOOP.  */
 
 static void
 

Re: Keep static VTA locs in cselib tables only

2011-11-25 Thread Jakub Jelinek
On Wed, Nov 23, 2011 at 08:10:00AM -0200, Alexandre Oliva wrote:
 - compiling stage2 target libs and stage3 host patched sources (with
 both unpatched and patched stage2 compiler) produced cc1plus with 10%
 fewer entry value expressions (a welcome surprise!), 1% fewer call site
 value expressions, an increase of 0.1% in the total number of variables
 with location lists and less than 0.5% decrease in variables with full
 coverage.

The numbers I got with your patch (RTL checking) are below; it seems
the cumulative numbers other than 100% are all bigger with patched stage2,
which unfortunately means debug info quality degradation.  Have you
analysed, at least on some shorter testcases, why that happens?

Otherwise the patch looks good to me.

x86_64 patched stage3 compiled by vanilla stage2
cov%    samples     cumul
0.0     230172/32%  230172/32%
0..10   12267/1%    242439/34%
11..20  10548/1%    252987/35%
21..30  17018/2%    270005/37%
31..40  16374/2%    286379/40%
41..50  17533/2%    303912/42%
51..60  13051/1%    316963/44%
61..70  13946/1%    330909/46%
71..80  19627/2%    350536/49%
81..90  28877/4%    379413/53%
91..99  85086/11%   464499/65%
100     246568/34%  711067/100%

x86_64 patched stage3 compiled by patched stage2
cov%    samples     cumul
0.0     230182/32%  230182/32%
0..10   12319/1%    242501/34%
11..20  10765/1%    253266/35%
21..30  17390/2%    270656/38%
31..40  16745/2%    287401/40%
41..50  17821/2%    305222/42%
51..60  13306/1%    318528/44%
61..70  14104/1%    332632/46%
71..80  19795/2%    352427/49%
81..90  29030/4%    381457/53%
91..99  85171/11%   466628/65%
100     244439/34%  711067/100%

i686 patched stage3 compiled by vanilla stage2
cov%    samples     cumul
0.0     225909/32%  225909/32%
0..10   12420/1%    238329/34%
11..20  10693/1%    249022/35%
21..30  17102/2%    266124/38%
31..40  13529/1%    279653/40%
41..50  17232/2%    296885/42%
51..60  12568/1%    309453/44%
61..70  14769/2%    324222/46%
71..80  14937/2%    339159/48%
81..90  23868/3%    363027/52%
91..99  86306/12%   449333/64%
100     245327/35%  694660/100%

i686 patched stage3 compiled by patched stage2
cov%    samples     cumul
0.0     225917/32%  225917/32%
0..10   12471/1%    238388/34%
11..20  10848/1%    249236/35%
21..30  17292/2%    266528/38%
31..40  13716/1%    280244/40%
41..50  17324/2%    297568/42%
51..60  12673/1%    310241/44%
61..70  14950/2%    325191/46%
71..80  15085/2%    340276/48%
81..90  24019/3%    364295/52%
91..99  86228/12%   450523/64%
100     244137/35%  694660/100%

Jakub


Added myself to MAINTAINERS: write after approval

2011-11-25 Thread Sameera Deshpande
Committed.

-- Index: MAINTAINERS
===
--- MAINTAINERS	(revision 181721)
+++ MAINTAINERS	(working copy)
@@ -345,6 +345,7 @@
 David Daney	david.da...@caviumnetworks.com
 Bud Davis	jmda...@link.com
 Chris Demetriou	c...@google.com
+Sameera Deshpande	sameera.deshpa...@arm.com
 François Dumont	fdum...@gcc.gnu.org
 Benoit Dupont de Dinechin			benoit.dupont-de-dinec...@st.com
 Michael Eager	ea...@eagercon.com

[Patch, Fortran, committed] PR51302 - fix ICE with volatile loop variable

2011-11-25 Thread Tobias Burnus

Fixed the ICE:
internal compiler error: in gfc_add_modify_loc, at fortran/trans.c:161

Built and regtested on x86-64-linux; committed as Rev. 181724.

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(revision 181723)
+++ gcc/fortran/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2011-11-25  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/51302
+	* trans-stmt.c (gfc_trans_simple_do): Add a fold_convert.
+
 2011-11-24  Tobias Burnus  bur...@net-b.de
 
 	PR fortran/51218
Index: gcc/fortran/trans-stmt.c
===
--- gcc/fortran/trans-stmt.c	(revision 181723)
+++ gcc/fortran/trans-stmt.c	(working copy)
@@ -1259,7 +1259,8 @@ gfc_trans_simple_do (gfc_code * code, stmtblock_t
   loc = code->ext.iterator->start->where.lb->location;
 
   /* Initialize the DO variable: dovar = from.  */
-  gfc_add_modify_loc (loc, pblock, dovar, from);
+  gfc_add_modify_loc (loc, pblock, dovar,
+		  fold_convert (TREE_TYPE(dovar), from));
   
   /* Save value for do-tinkering checking. */
   if (gfc_option.rtcheck & GFC_RTCHECK_DO)
Index: gcc/testsuite/gfortran.dg/volatile13.f90
===
--- gcc/testsuite/gfortran.dg/volatile13.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/volatile13.f90	(working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+!
+! PR fortran/51302
+!
+! Volatile DO variable - was ICEing before
+!
+integer, volatile :: i
+integer :: n = 1
+do i = 1, n
+end do
+end
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 181723)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2011-11-25  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/51302
+	* gfortran.dg/volatile13.f90: New.
+
 2011-11-24  Andrew MacLeod  amacl...@redhat.com
 
 	PR c/51256


Re: [Patch, Fortran] PR 50408 [4.6/4.7] ICE related to whole-file processing

2011-11-25 Thread Steve Kargl
On Fri, Nov 25, 2011 at 11:46:37AM +0100, Tobias Burnus wrote:
 The patch fixes an issue when the backend_decl is reused (-fwhole-file). 
 The problem is that not always the ts.u.derived->backend_decl was copied 
 as well. I copied what was done a bit later in the file and extended it 
 to also include BT_CLASS.
 The trans-type.c change is not needed, but I thought it is a good 
 optimization. from == to seems to happen quite regularly.
 
 Build and regtested on x86-64-linux.
 OK for the trunk and 4.6?
 

OK.

I have no issues with committing the fix to 4.5.  It however
may be time to allow 4.5 to ride off into the sunset.


-- 
Steve


Re: Go patch committed: New lock/note implementation

2011-11-25 Thread Rainer Orth
Ian Lance Taylor i...@google.com writes:

 This patch updates the implementations of locks and notes used in libgo
 to use the current version from the master Go library.  This now uses
 futexes when running on GNU/Linux, while still using semaphores on other
 systems.  This implementation should be faster, and does not require
 explicit initialization.  Bootstrapped and ran Go testsuite on
 x86_64-unknown-linux-gnu.  I tested both the futex and the semaphore
 versions.  Committed to mainline.

 +static int32
 +getproccount(void)
 +{
 + int32 fd, rd, cnt, cpustrlen;
 + const byte *cpustr, *pos;
 + byte *bufpos;
 + byte buf[256];
 +
 + fd = open("/proc/stat", O_RDONLY|O_CLOEXEC, 0);

This broke bootstrap on Linux/x86_64 (CentOS 5.5), which lacks
O_CLOEXEC.
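
One possible way to cope with headers that lack O_CLOEXEC is sketched
below.  This is an assumption about how such a fallback could look, not
the fix that went into libgo; open_rdonly_cloexec is a hypothetical
helper name.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the libgo patch): open PATH read-only
   with the close-on-exec flag set, whether or not the system headers
   define O_CLOEXEC.  Returns the descriptor, or -1 on failure.  */
static int
open_rdonly_cloexec (const char *path)
{
  int flags;
  int fd;

#ifdef O_CLOEXEC
  fd = open (path, O_RDONLY | O_CLOEXEC);
#else
  fd = open (path, O_RDONLY);
#endif
  if (fd < 0)
    return -1;

  /* Portable fallback: set FD_CLOEXEC after the fact.  This is harmless
     when O_CLOEXEC already took effect, but it is not atomic with open,
     so another thread could fork and exec in between.  */
  flags = fcntl (fd, F_GETFD);
  if (flags < 0 || fcntl (fd, F_SETFD, flags | FD_CLOEXEC) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}
```

The fcntl step keeps the descriptor from leaking across exec on older
systems, at the cost of a small race window that O_CLOEXEC was designed
to close.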

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH, testsuite]: Introduce sync_int128_runtime and sync_long_long_runtime

2011-11-25 Thread Uros Bizjak
Hello!

The attached patch introduces the sync_int_128_runtime and
sync_long_long_runtime runtime checks to prevent running atomic runtime
tests on targets that don't support them. I also merged the runtime check
for arm*-*-linux-gnueabi with the corresponding arm*-*-* compile-time
check. This change has the nice side effect that the
gcc.dg/di-longlong64-sync-1.c and gcc.dg/di-sync-multithread.c tests now
also run on x86_64.

Regarding arm, I have simply copied the existing runtime check. Various
long-long atomic tests now also run on this target, so perhaps there
will be some fallout on recently introduced tests.

2011-11-25  Uros Bizjak  ubiz...@gmail.com

PR testsuite/51258
* lib/target-supports.exp
(check_effective_target_sync_int_128_runtime): New procedure.
(check_effective_target_sync_long_long_runtime): Ditto.
(check_effective_target_sync_long_long): Add arm*-*-*.
(check_effective_target_sync_longlong): Remove.

* gcc.dg/atomic-op-5.c: Require sync_int_128_runtime effective target.
* gcc.dg/atomic-compare-exchange-5.c: Ditto.
* gcc.dg/atomic-exchange-5.c: Ditto.
* gcc.dg/atomic-load-5.c: Ditto.
* gcc.dg/atomic-store-5.c: Ditto.
* gcc.dg/simulate-thread/atomic-load-int128.c: Ditto.
* gcc.dg/simulate-thread/atomic-other-int128.c: Ditto.
* gcc.dg/atomic-op-4.c: Require sync_long_long_runtime
effective target.
* gcc.dg/atomic-compare-exchange-4.c: Ditto.
* gcc.dg/atomic-exchange-4.c: Ditto.
* gcc.dg/atomic-load-4.c: Ditto.
* gcc.dg/atomic-store-4.c: Ditto.
* gcc.dg/di-longlong64-sync-1.c: Ditto.
* gcc.dg/di-sync-multithread.c: Ditto.
* gcc.dg/simulate-thread/atomic-load-longlong.c: Ditto.
* gcc.dg/simulate-thread/atomic-other-longlong.c: Ditto.

Patch was tested on x86_64-pc-linux-gnu and was committed to mainline SVN.

Uros.
Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 181721)
+++ lib/target-supports.exp (working copy)
@@ -3620,17 +3620,80 @@
 }
 }
 
+# Return 1 if the target supports atomic operations on int_128 values
+# and can execute them.
+
+proc check_effective_target_sync_int_128_runtime { } {
+if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+ && ![is-effective-target ia32] } {
+   return [check_cached_effective_target sync_int_128_available {
+   check_runtime_nocache sync_int_128_available {
+   #include "cpuid.h"
+   int main ()
+   {
+ unsigned int eax, ebx, ecx, edx;
+ if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+   return !(ecx & bit_CMPXCHG16B);
+ return 1;
+   }
+   } ""
+   }]
+} else {
+   return 0
+}
+}
+
 # Return 1 if the target supports atomic operations on long long.
 
 proc check_effective_target_sync_long_long { } {
 if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
- && ![is-effective-target ia32] } {
+ && ![is-effective-target ia32]
+|| [istarget arm*-*-*] } {
return 1
 } else {
return 0
 }
 }
 
+# Return 1 if the target supports atomic operations on long long
+# and can execute them.
+
+proc check_effective_target_sync_long_long_runtime { } {
+if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+ && ![is-effective-target ia32] } {
+   return [check_cached_effective_target sync_long_long_available {
+   check_runtime_nocache sync_long_long_available {
+   #include "cpuid.h"
+   int main ()
+   {
+ unsigned int eax, ebx, ecx, edx;
+ if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+   return !(edx & bit_CMPXCHG8B);
+ return 1;
+   }
+   } ""
+   }]
+} elseif { [istarget arm*-*-linux-gnueabi] } {
+   return [check_runtime sync_longlong_runtime {
+   #include <stdlib.h>
+   int main ()
+   {
+ long long l1;
+
+ if (sizeof (long long) != 8)
+   exit (1);
+
+ /* Just check for native; checking for kernel fallback is tricky.  */
+ asm volatile ("ldrexd r0,r1, [%0]" : : "r" (&l1) : "r0", "r1");
+
+ exit (0);
+   }
+   } "" ]
+} else {
+   return 0
+}
+}
+
 # Return 1 if the target supports atomic operations on int and long.
 
 proc check_effective_target_sync_int_long { } {
@@ -3662,31 +3725,6 @@
 return $et_sync_int_long_saved
 }
 
-# Return 1 if the target supports atomic operations on long long and can
-# execute them
-# So far only put checks in for ARM, others may want to add their own
-proc check_effective_target_sync_longlong { } {
-return [check_runtime sync_longlong_runtime {
-  #include <stdlib.h>
-  int main ()
-  {
-   long long l1;
-
-   if (sizeof 

[PATCH] Ignore EDGE_PRESERVE in flow info verification (PR rtl-optimization/49912)

2011-11-25 Thread Jakub Jelinek
Hi!

The following testcase ICEs during flow verification, because there is
an unconditional branch with EDGE_PRESERVE set on the edge and because of
that bit rtl_verify_flow_info_1 wouldn't count it as n_branch.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

PR rtl-optimization/49912
* cfgrtl.c (rtl_verify_flow_info_1): Ignore also EDGE_PRESERVE bit
when counting n_branch.

* g++.dg/other/pr49912.C: New test.

--- gcc/cfgrtl.c.jj 2011-11-21 16:22:02.0 +0100
+++ gcc/cfgrtl.c2011-11-25 10:29:54.272326735 +0100
@@ -1875,7 +1875,8 @@ rtl_verify_flow_info_1 (void)
| EDGE_CAN_FALLTHRU
| EDGE_IRREDUCIBLE_LOOP
| EDGE_LOOP_EXIT
-   | EDGE_CROSSING)) == 0)
+   | EDGE_CROSSING
+   | EDGE_PRESERVE)) == 0)
n_branch++;
 
  if (e->flags & EDGE_ABNORMAL_CALL)
--- gcc/testsuite/g++.dg/other/pr49912.C.jj 2011-11-25 10:40:27.180613829 +0100
+++ gcc/testsuite/g++.dg/other/pr49912.C2011-11-25 10:40:15.0 +0100
@@ -0,0 +1,38 @@
+// PR rtl-optimization/49912
+// { dg-do compile }
+// { dg-require-effective-target freorder }
+// { dg-options "-O -freorder-blocks-and-partition" }
+
+int foo (int *);
+
+struct S
+{
+  int *m1 ();
+  S (int);
+   ~S () { foo (m1 ()); }
+};
+
+template <int>
+struct V
+{
+  S *v1;
+  void m2 (const S &);
+  S *base ();
+};
+
+template <int N>
+void V<N>::m2 (const S &x)
+{
+  S a = x;
+  S *l = base ();
+  while (l)
+    *v1 = *--l;
+}
+
+V<0> v;
+
+void
+foo ()
+{
+  v.m2 (0);
+}

Jakub


[PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)

2011-11-25 Thread Jakub Jelinek
Hi!

Kirill's recent change to mem_overlaps_already_clobbered_arg_p
resulted in various code quality regressions, many calls that used
to be tail call optimized no longer are.

Here is an attempt to make the check more complete (e.g.
the change wouldn't see overlap if addr was PLUS of two REGs,
where one of the REGs was based on internal_arg_pointer, etc.)
and less pessimistic.  As tree-tailcall.c doesn't allow tail calls
from functions that have address of any of the caller's parameters
taken, IMHO it is enough to look for internal_arg_pointer based
pseudos initialized in the tail call sequence.
This patch scans the tail call sequence and notes which pseudos
are based on internal_arg_pointer (and what offset from
that pointer they have) and uses that in
mem_overlaps_already_clobbered_arg_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested on some
testcases on ia64-linux (as an example of target which doesn't have
reg + disp addressing and thus forces everything into registers).
Ok for trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

PR middle-end/50074
* calls.c (internal_arg_pointer_seq_start,
internal_arg_pointer_cache): New variables.
(internal_arg_pointer_based_reg_1): New function.
(internal_arg_pointer_based_reg): New function.
(mem_overlaps_already_clobbered_arg_p): Use it.
(expand_call): Free internal_arg_pointer_cache vector
and clear internal_arg_pointer_seq_start.

--- gcc/calls.c.jj  2011-11-08 23:35:12.0 +0100
+++ gcc/calls.c 2011-11-25 17:24:52.445878841 +0100
@@ -1658,6 +1658,106 @@ rtx_for_function_call (tree fndecl, tree
   return funexp;
 }
 
+/* Last insn that has been already scanned by internal_arg_pointer_based_reg,
+   or NULL_RTX if none has been scanned yet.  */
+static rtx internal_arg_pointer_seq_start;
+/* Vector indexed by REGNO () - FIRST_PSEUDO_REGISTER, recoding if a pseudo
+   is based on crtl->args.internal_arg_pointer.  It is NULL_RTX if not based
+   on it, some CONST_INT as offset from crtl->args.internal_arg_pointer
+   or PC for unknown offset from it.  */
+static VEC(rtx, heap) *internal_arg_pointer_cache;
+
+static rtx internal_arg_pointer_based_reg (rtx, bool);
+
+/* Helper function for internal_arg_pointer_based_reg, called through
+   for_each_rtx.  Return 1 if a crtl->args.internal_arg_pointer based
+   register is seen anywhere.  */
+
+static int
+internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED)
+{
+  if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX)
+return 1;
+  if (MEM_P (*loc))
+return -1;
+  return 0;
+}
+
+/* If REG is based on crtl->args.internal_arg_pointer, return either
+   a CONST_INT offset from crtl->args.internal_arg_pointer if
+   offset from it is known constant, or PC if the offset is unknown.
+   Return NULL_RTX if REG isn't based on crtl->args.internal_arg_pointer.  */
+
+static rtx
+internal_arg_pointer_based_reg (rtx reg, bool scan)
+{
+  rtx insn;
+
+  if (CONSTANT_P (reg))
+return NULL_RTX;
+
+  if (reg == crtl->args.internal_arg_pointer)
+    return const0_rtx;
+
+  if (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER)
+    return NULL_RTX;
+
+  if (GET_CODE (reg) == PLUS && CONST_INT_P (XEXP (reg, 1)))
+{
+  rtx val = internal_arg_pointer_based_reg (XEXP (reg, 0), scan);
+  if (val == NULL_RTX || val == pc_rtx)
+   return val;
+  return plus_constant (val, INTVAL (XEXP (reg, 1)));
+}
+
+  if (!scan)
+insn = NULL_RTX;
+  else if (internal_arg_pointer_seq_start == NULL_RTX)
+insn = get_insns ();
+  else
+insn = NEXT_INSN (internal_arg_pointer_seq_start);
+  while (insn)
+{
+  rtx set = single_set (insn);
+  if (set
+  && REG_P (SET_DEST (set))
+  && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER)
+   {
+ rtx val = NULL_RTX;
+ unsigned int idx = REGNO (SET_DEST (set)) - FIRST_PSEUDO_REGISTER;
+ /* Punt on pseudos set multiple times.  */
+ if (idx < VEC_length (rtx, internal_arg_pointer_cache)
+  && VEC_index (rtx, internal_arg_pointer_cache, idx)
+!= NULL_RTX)
+   val = pc_rtx;
+ else
+   val = internal_arg_pointer_based_reg (SET_SRC (set), false);
+ if (val != NULL_RTX)
+   {
+ VEC_safe_grow_cleared (rtx, heap, internal_arg_pointer_cache,
+idx + 1);
+ VEC_replace (rtx, internal_arg_pointer_cache, idx, val);
+   }
+   }
+  if (NEXT_INSN (insn) == NULL_RTX)
+   internal_arg_pointer_seq_start = insn;
+  insn = NEXT_INSN (insn);
+}
+
+  if (REG_P (reg))
+{
+  unsigned int idx = REGNO (reg) - FIRST_PSEUDO_REGISTER;
+  if (idx < VEC_length (rtx, internal_arg_pointer_cache))
+   return VEC_index (rtx, internal_arg_pointer_cache, idx);
+  else
+   return NULL_RTX;
+}
+
+  if (for_each_rtx (reg, internal_arg_pointer_based_reg_1, 

Re: Go patch committed: New lock/note implementation

2011-11-25 Thread Rainer Orth
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 This broke bootstrap on Linux/x86_64 (CentOS 5.5), which lacks
 O_CLOEXEC.

... and also Solaris 8 and 9 bootstrap which lack sem_timedwait:

/vol/gcc/src/hg/trunk/local/libgo/runtime/thread-sema.c: In function 'runtime_semasleep':
/vol/gcc/src/hg/trunk/local/libgo/runtime/thread-sema.c:42:7: error: implicit declaration of function 'sem_timedwait' [-Werror=implicit-function-declaration]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH, testsuite]: Enable sync_long_long on 32bit x86 and alpha

2011-11-25 Thread Uros Bizjak
Hello!

Attached patch enables sync_long_long tests on 32bit x86 and alpha.
Enabling the tests for alpha is obvious (it is 64bit-by-default
target, after all), but 32bit x86 needs at least -march=pentium passed
via dg-options. My previous patch checks bit_CMPXCHG8B cpuid bit
before compiling these tests, so passing -march=pentium is safe.

2011-11-25  Uros Bizjak  ubiz...@gmail.com

PR testsuite/51258
* lib/target-supports.exp
(check_effective_target_sync_long_long): Also supported on 32bit
x86 targets.  Add comment about required dg-options.
Add alpha*-*-* targets.
(check_effective_target_sync_long_long_runtime): Ditto.

* gcc.dg/atomic-op-4.c (dg-options): Add -march=pentium for
32bit x86 targets.
* gcc.dg/atomic-compare-exchange-4.c: Ditto.
* gcc.dg/atomic-exchange-4.c: Ditto.
* gcc.dg/atomic-load-4.c: Ditto.
* gcc.dg/atomic-store-4.c: Ditto.
* gcc.dg/di-longlong64-sync-1.c: Ditto.
* gcc.dg/di-sync-multithread.c: Ditto.
* gcc.dg/simulate-thread/atomic-load-longlong.c: Ditto.
* gcc.dg/simulate-thread/atomic-other-longlong.c: Ditto.

Patch was tested on 32bit x86 build and alphaev68-pc-linux-gnu.
Committed to mainline SVN.

However, the patch uncovers certain problems with the existing fild/fistpl
implementation of atomic load/store. It fails in several of the thread
simulation tests, e.g.

FAIL: gcc.dg/simulate-thread/atomic-load-longlong.c  -O0 -g  thread simulation test

with:

1: x/i $pc

=> 0x8048582 <simulate_thread_main+61>: fild   -0x8(%ebp)

0x08048585  104   __atomic_store_n (&result, ret, __ATOMIC_SEQ_CST);

1: x/i $pc

=> 0x8048585 <simulate_thread_main+64>: fistp  0x8049ac0

0x0804858b  104   __atomic_store_n (&result, ret, __ATOMIC_SEQ_CST);

1: x/i $pc

=> 0x804858b <simulate_thread_main+70>: lock orl $0x0,(%esp)

FAIL: Invalid result returned from fetch


I didn't check SSE, but it looks that fild/fistpl combo isn't atomic
or does not obey lock barriers.
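
For reference, the kind of access these tests exercise can be sketched
with the __atomic builtins as below.  This is only an illustration, not
the test source: shared_value, publish, observe and halves_match are
hypothetical names, and the "both halves equal" check is an assumed
stand-in for however the real test detects a torn 64-bit value.  On
32-bit targets without native 64-bit atomics the builtins may need
libatomic.

```c
#include <stdint.h>

/* Hypothetical shared variable, standing in for the test's result.  */
static uint64_t shared_value;

/* Store a 64-bit value with sequentially consistent ordering.  */
static void
publish (uint64_t v)
{
  __atomic_store_n (&shared_value, v, __ATOMIC_SEQ_CST);
}

/* Load the 64-bit value with sequentially consistent ordering.  */
static uint64_t
observe (void)
{
  return __atomic_load_n (&shared_value, __ATOMIC_SEQ_CST);
}

/* If a writer only ever stores values whose upper and lower 32-bit
   halves are equal, a reader that sees differing halves has observed
   a torn (non-atomic) access.  */
static int
halves_match (uint64_t v)
{
  return (uint32_t) (v >> 32) == (uint32_t) v;
}
```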

Uros.
Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 181727)
+++ lib/target-supports.exp (working copy)
@@ -3644,11 +3644,14 @@
 }
 
 # Return 1 if the target supports atomic operations on long long.
+#
+# Note: 32bit x86 targets require -march=pentium in dg-options.
 
 proc check_effective_target_sync_long_long { } {
-if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
- && ![is-effective-target ia32]
-|| [istarget arm*-*-*] } {
+if { [istarget x86_64-*-*]
+|| [istarget i?86-*-*])
+|| [istarget arm*-*-*]
+|| [istarget alpha*-*-*] } {
return 1
 } else {
return 0
@@ -3657,10 +3660,12 @@
 
 # Return 1 if the target supports atomic operations on long long
 # and can execute them.
+#
+# Note: 32bit x86 targets require -march=pentium in dg-options.
 
 proc check_effective_target_sync_long_long_runtime { } {
-if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
- && ![is-effective-target ia32] } {
+if { [istarget x86_64-*-*]
+|| [istarget i?86-*-*] } {
return [check_cached_effective_target sync_long_long_available {
check_runtime_nocache sync_long_long_available {
 #include "cpuid.h"
@@ -3689,6 +3694,8 @@
  exit (0);
}
 } "" ]
+} elseif { [istarget alpha*-*-*] } {
+   return 1
 } else {
return 0
 }
Index: gcc.dg/atomic-compare-exchange-4.c
===
--- gcc.dg/atomic-compare-exchange-4.c  (revision 181727)
+++ gcc.dg/atomic-compare-exchange-4.c  (working copy)
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options "" } */
+/* { dg-options "-march=pentium" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
 
 /* Test the execution of __atomic_compare_exchange_n builtin for a long_long.  */
 
Index: gcc.dg/di-longlong64-sync-1.c
===
--- gcc.dg/di-longlong64-sync-1.c   (revision 181727)
+++ gcc.dg/di-longlong64-sync-1.c   (working copy)
@@ -1,6 +1,8 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options "-std=gnu99" } */
+/* { dg-additional-options "-march=pentium" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
+
 /* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
 /* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
 
Index: gcc.dg/atomic-load-4.c
===
--- gcc.dg/atomic-load-4.c  (revision 181727)
+++ gcc.dg/atomic-load-4.c  (working copy)
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options  

Re: [PATCH, testsuite]: Enable sync_long_long on 32bit x86 and alpha

2011-11-25 Thread Uros Bizjak
On Fri, Nov 25, 2011 at 8:31 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I didn't check SSE, but it looks that fild/fistpl combo isn't atomic
 or does not obey lock barriers.

Adding -msse to failing test works OK.

Uros.


[PATCH] Improve EXPAND_SUM handling in expand_expr_addr_expr* (PR middle-end/50074)

2011-11-25 Thread Jakub Jelinek
Hi!

While looking at this PR, I was first surprised that on i?86
we got pseudo = argp + 4 and mem_overlap* was called with
that pseudo + 4 etc.  I don't see why we should force the address
into register for EXPAND_SUM modifier, with this mem_overlap* sees
argp + 8 etc. directly (on i?86, of course on ia64 it still sees
a register and thus the other patch I've posted is needed).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

PR middle-end/50074
* expr.c (expand_expr_addr_expr_1): Don't call force_operand for
EXPAND_SUM modifier.

--- gcc/expr.c.jj   2011-11-21 16:22:02.0 +0100
+++ gcc/expr.c  2011-11-25 12:46:40.070831662 +0100
@@ -7452,7 +7452,8 @@ expand_expr_addr_expr_1 (tree exp, rtx t
}
 
  if (modifier != EXPAND_INITIALIZER
-  && modifier != EXPAND_CONST_ADDRESS)
+  && modifier != EXPAND_CONST_ADDRESS
+  && modifier != EXPAND_SUM)
result = force_operand (result, target);
  return result;
}

Jakub


Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files

2011-11-25 Thread Mikael Morin
On Friday 25 November 2011 11:10:01 Janne Blomqvist wrote:
 Based on a brief inspection of the code, most if
 not all of these seeks are for a very short distance (typically peek a
 few bytes ahead in the stream, then seek back)
I'm afraid they aren't.
The moves are as follows (-: sequential, x: seek)
-- beginning of file
  - skip operator interfaces
  - skip user operators
  - skip commons, equivalences, and derived type extensions
  - register the offset of each symbol node and skip it
  -   (this is usually
  -the biggest part of the module)
  - read the symtree list and mark the associated symbols as needed (if they
    are wanted)
-- end of file
  x go back to operator interfaces and load them
  - load user operators
  - load commons
  - load equivalences
  xxx now the required symbols are known, so for each one of them seek to
      its offset and load it. This requires a lot of seeks, and if the number
      of symbols, components, etc. is high in the module, they are not
      necessarily short-distance
  x load derived type extensions


We'll see the results from Salvatore, but I'm not very optimistic.

Mikael


Re: Memset/memcpy patch

2011-11-25 Thread Jan Hubicka
 On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin
 michael.v.zolotuk...@gmail.com wrote:
  I found and fixed another problem in the latest memcpy/memset changes
  - with this fix all the failing tests mentioned in #51134 started
  passing. Bootstraps are also ok.
  Though I still see fails in 32-bit make check, so probably, it'd be
  better to revert the changes till these fails are fixed.
 
 
 I will revert it for now.

OK.  I guess I can break out the simple fixes and commit them for 4.7 and we
could revisit this for next stage1. Probably not by adding all the features
together, but extending prologues/epilogues first and adding SSE loops with
the new alignment logic next.

Honza
 
 -- 
 H.J.


RFA: Fix PR middle-end/50074

2011-11-25 Thread Joern Rennecke

On load-store architectures, the function address is generally loaded into
a register before any outgoing arguments are stored in the stack frame
(if any).  Thus, generally allowing memory loads before any arguments of
the sibcall have been stored in the stack frame is effective to make the
sibcall-6.c test work again.
This has been confirmed for Epiphany, x86_64-apple-darwin10 and s390x.

Bootstrapped and regtested on i686-pc-linux-gnu.
2011-11-19  Joern Rennecke  joern.renne...@embecosm.com

PR middle-end/50074
* calls.c (mem_overlaps_already_clobbered_arg_p):
Return false if no outgoing arguments have been stored so far.

Index: calls.c
===
--- calls.c (revision 2195)
+++ calls.c (working copy)
@@ -1668,6 +1668,8 @@ mem_overlaps_already_clobbered_arg_p (rt
 {
   HOST_WIDE_INT i;
 
+  if (sbitmap_empty_p (stored_args_map))
+    return false;
   if (addr == crtl->args.internal_arg_pointer)
 i = 0;
   else if (GET_CODE (addr) == PLUS