Re: Fix libgomp semaphores
On Fri, Nov 25, 2011 at 08:38:39AM +0100, Jakub Jelinek wrote:
> My preference would be to avoid the abstraction changes though, both
> because it is additional clutter in the changeset and because omp_lock
> and nested lock are part of public ABIs, so if struct is laid out
> differently on some weird architecture, it would be an ABI change.

OK, fair enough. I didn't consider that structs may be laid out
differently.

> So, if you could keep gomp_mutex_t, omp_lock_t and gomp_sem_t as
> integers, it would be appreciated.
>
> Furthermore, I'd prefer if the patch could be split into smaller parts,
> e.g. for bisecting purposes. One patch would do the mutex changes
> to use new atomics, remove extra mutex.h headers and start using 0/1/-1
> instead of 0/1/2. And another patch would rewrite the semaphores.

OK. I need to do this anyway as I just discovered a regression when
looping on one of the tests. I suspect the acquire/release mutex
locking may have exposed bugs elsewhere in libgomp that were covered by
the heavyweight locking used by the __sync builtins.

-- 
Alan Modra
Australia Development Lab, IBM
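For readers who have not seen the 0/1/-1 convention mentioned above, the following is a minimal sketch of an integer mutex word driven by GCC's __atomic builtins with acquire/release ordering. It is not libgomp's code: the toy_mutex_* names are hypothetical, and a real implementation would sleep on a futex where this sketch merely reports what the caller should do next.

```c
#include <stdbool.h>

/* Hypothetical states, following the 0/1/-1 convention from the thread:
   0 = unlocked, 1 = locked, -1 = locked and somebody may be waiting.  */
typedef int toy_mutex_t;

/* Fast path: take the lock only if it is free.  Acquire ordering on
   success, so later accesses cannot move before the lock.  */
static bool
toy_mutex_trylock (toy_mutex_t *m)
{
  int expected = 0;
  return __atomic_compare_exchange_n (m, &expected, 1, false,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Slow path: mark the lock as contended.  Returns true if the lock was
   actually free, in which case the caller now owns it (in state -1).
   Otherwise the caller would block here in a real implementation.  */
static bool
toy_mutex_mark_contended (toy_mutex_t *m)
{
  return __atomic_exchange_n (m, -1, __ATOMIC_ACQUIRE) == 0;
}

/* Release the lock.  Returns true if the old state was -1, i.e. a
   waiter may need to be woken.  */
static bool
toy_mutex_unlock (toy_mutex_t *m)
{
  return __atomic_exchange_n (m, 0, __ATOMIC_RELEASE) == -1;
}
```

Because the state is a plain int, a type such as gomp_mutex_t can stay an integer, which is the ABI concern raised above.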
Re: Re-merge crtstuff.c from the trans-mem branch
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

> While the first patch allows Solaris 8/9 x86 bootstraps to finish
> (testsuite still running), I happened to run a Solaris 10/SPARC
> bootstrap that broke configuring stage 2 libgomp: even trivial
> executables die with a SEGV in _init. It turns out (still verifying
> with a fresh bootstrap) that the -fno-inline removal is the culprit.

All bootstraps have now completed without regressions, so the patch is
good to go from a Solaris POV.

Thanks.
        Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
[Patch, fortran, RFC] PR 40958 Reduce size of module files
Hi,

gfortran has a few long-standing bugs wrt module handling. The more
fundamental, and also more difficult to fix, issue is that we re-read
and re-parse module files every time a USE statement is encountered,
instead of once per translation unit. See PR 25708.

Another issue, PR 40958, is that module files can be quite big, which
exacerbates the PR 25708 issues. The attached patch fixes PR 40958 by
compressing the module files with zlib and storing them in the gzip
format (RFC 1952). I chose zlib because a) it's ubiquitous and b)
there's already a copy of zlib in the GCC source tree, so this doesn't
introduce any further build dependencies.

Since the mod files with the patch are in the gzip format, one can use
tools like zcat, zless, zgrep, zdiff etc. to inspect the uncompressed
contents easily (one can also use gunzip if one first copies the module
file to a temporary file with .gz extension).

However, there are a couple of issues related to seeking in gzip files
(gzseek() instead of fseek(), which is currently used). One is fixed by
the patch, the other is a potentially serious performance issue.

First, for a writable gzip file, seeking backwards is not allowed.
Currently when writing a module file, we first write a placeholder for
the MD5, then write the actual module content while updating the MD5
sum in memory as we go, and finally we seek back and write the final
MD5 value. However, the gzip file format contains a solution: 8 bytes
from the end of the file, a CRC32 checksum of the (uncompressed)
content is stored. So the patch rips out the MD5 machinery, and instead
compares these CRC32 checksums to determine whether to replace an
existing module file or not (from the command line, one can check the
CRC32 with 'zcat -l -v filename'). As a result, the module version
number has been bumped as well.
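To make the trailer layout concrete: RFC 1952 ends each gzip member with a 4-byte CRC32 followed by a 4-byte ISIZE, both little-endian, so the CRC32 starts 8 bytes from the end. The sketch below pulls that value out of an in-memory copy of a single-member gzip file. It is only an illustration of the format, not code from the patch; the function names and the 18-byte minimum (10-byte header plus 8-byte trailer) are my own choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value.  */
static uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: fetch the CRC32 stored in the trailer of a
   single-member gzip stream held in BUF (LEN bytes).  Returns 0 on
   success, -1 if the buffer is too short to hold header and trailer.  */
static int
gzip_trailer_crc32 (const unsigned char *buf, size_t len, uint32_t *crc)
{
  if (len < 18)
    return -1;
  *crc = read_le32 (buf + len - 8);   /* CRC32; ISIZE follows at len - 4.  */
  return 0;
}

/* Hypothetical helper: nonzero if both files carry the same trailer CRC32,
   i.e. the existing module file need not be replaced.  */
static int
same_trailer_crc32 (const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
  uint32_t ca, cb;
  if (gzip_trailer_crc32 (a, alen, &ca) != 0
      || gzip_trailer_crc32 (b, blen, &cb) != 0)
    return 0;
  return ca == cb;
}
```

Comparing only the CRC32 mirrors what the message describes; a more cautious comparison could also check the ISIZE field.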
The second issue, which the patch doesn't address in any way, is that
while seeking on a gzip file in read mode is allowed, zlib.h warns: "If
the file is opened for reading, this function is emulated but can be
extremely slow." Unfortunately, when reading a module file we do seek
back and forth in it. Based on a brief inspection of the code, most if
not all of these seeks are for a very short distance (typically peek a
few bytes ahead in the stream, then seek back), and if the gzseek()
function is somewhat clever about seeking within the read buffer, this
might not be so slow after all. OTOH, if every gzseek() call means
restarting the inflation from the beginning of the file, the impact
could be quite bad.

The patch passes regression testing except for one failure,
module_md5_1.f90, which should be removed.

Based on some quick testing, the size of module files is reduced by a
factor of 5 or thereabouts. I haven't checked performance; in
particular, one would need to check the second issue described above
for some of those testcases generating large module files. I think
there was some single-file version of cp2k somewhere that could be used
for this, or are there other appropriate tests somewhere that aren't
too difficult to set up?

So at the moment, I'm not proposing this patch for inclusion; consider
it an RFC. Especially appropriate benchmark results and/or pointers to
easy-to-set-up testcases are appreciated. In case the seeking in read
mode is an issue, I suspect it wouldn't be too hard to fix the parsing
to not require it, but I think that would push the patch more towards
4.8 material.

-- 
Janne Blomqvist

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 17ebd58..d6152b3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -29,6 +29,9 @@ along with GCC; see the file COPYING3. If not see
 multiple header files. Besides, Microsoft's winnt.h was 250k last time I looked, so by comparison this is perfectly reasonable.
*/ +#include config.h +#include system.h + /* Declarations common to the front-end and library are put in libgfortran/libgfortran_frontend.h */ #include libgfortran.h @@ -38,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include coretypes.h #include input.h #include splay-tree.h +#include zlib.h /* Major control parameters. */ @@ -2345,7 +2349,8 @@ void gfc_add_include_path (const char *, bool, bool); void gfc_add_intrinsic_modules_path (const char *); void gfc_release_include_path (void); FILE *gfc_open_included_file (const char *, bool, bool); -FILE *gfc_open_intrinsic_module (const char *); +gzFile gfc_gzopen_included_file (const char *, bool, bool); +gzFile gfc_open_intrinsic_module (const char *); int gfc_at_end (void); int gfc_at_eof (void); diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index 62f7598..9fa8c97 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -72,15 +72,15 @@ along with GCC; see the file COPYING3. If not see #include arith.h #include match.h #include parse.h /* FIXME */ -#include md5.h #include constructor.h #include cpp.h +#include zlib.h #define
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule
Hi Revital,

Revital Eres revital.e...@linaro.org writes:
> The attached patch adds register pressure estimation of the partial
> schedule.

My main comment is that we shouldn't need to track separate liveness
sets for each loop here, since we're only looking at one basic block.
I.e., rather than operate on the per-loop
LOOP_DATA (loop)->regs_{ref,live}, we should be able to use a single
pair of bitmaps.

Also, the code goes to a lot of trouble over this case:

> +  /* Add to the set of out live regs all the registers defined in bb
> +     which have uses outside of it (those registers where eliminated in
> +     the above calculation).  Eliminate from this set the definitions
> +     that exist in the epilog and with no uses inside the basic-block
> +     as these definitions will be eliminated from the bb and thus should
> +     not be considered for estimating register pressure in the bb.  */

But how often does it occur in practice? It's not necessarily the case
that the instruction will be eliminated, because things like volatility
might require us to keep it. It's probably more accurate to say that
we can treat these as unused defs.

There's an argument to say that we should only consider registers that
are used in the loop. If the pressure is high because of registers
that are live across the loop but not used within it, then it's
reasonable to force code outside the loop to spill some of those.
That would suggest starting with the intersection of DF_LR_OUT and
DF_LR_BB_INFO (bb)->use. Starting with that set also has the advantage
of handling the above case for free.

(This occurs often in our friend the popular embedded benchmark, which
often has a single function of the form:

    A: ...set up...
    B: for (i = 0; i < num_runs; i++)
    C:   ...benchmark...
    D: ...record time...

Some values are live from A-D, but those values shouldn't affect an
SMSable loop somewhere in C.)
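As a toy illustration of the suggested starting set, assume register sets small enough to fit in a 64-bit word (GCC itself uses its bitmap type for this; toy_regset and the function names below are hypothetical). The initial set is simply live-out intersected with the registers used in the block, so a value that is live across the loop but never used in it does not count towards pressure.

```c
#include <stdint.h>

/* Toy register set: bit N set means register N is a member.  */
typedef uint64_t toy_regset;

/* Starting set for pressure estimation as suggested in the review:
   registers that are live out of the block AND used inside it.  */
static toy_regset
initial_pressure_set (toy_regset lr_out, toy_regset bb_use)
{
  return lr_out & bb_use;
}

/* Number of registers in SET.  */
static int
toy_regset_count (toy_regset set)
{
  int n = 0;
  for (; set != 0; set &= set - 1)   /* clear the lowest set bit */
    n++;
  return n;
}
```

With live-out {1, 2, 5} and uses {2, 5, 7}, the starting set is {2, 5}: register 1 is live across the block but unused in it, which is the A-to-D situation described above.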
We talked earlier about making the main pressure-estimation code process the loop twice, but I see instead you've gone for two separate passes, one to calculate LR out, then the main pass. I think with the changes above, running the same loop twice is going to be easier and no less efficient. We could even add code to skip the second iteration if it would start with the same lr_out as the first iteration. Richard
Re: [PATCH 0/2] Add atomic support to m68k
Richard Henderson writes:
 > On 11/23/2011 06:46 AM, Mikael Pettersson wrote:
 > > +FAIL: c-c++-common/gomp/atomic-10.c scan-tree-dump-times ompexp __atomic_fetch_add 4
 > > +FAIL: c-c++-common/gomp/atomic-3.c scan-tree-dump-times ompexp xyzzy, 4 1
 > > +FAIL: c-c++-common/gomp/atomic-9.c scan-tree-dump-times ompexp __atomic_fetch_add 1
 >
 > What are these failures?

Executing on host: /mnt/scratch/objdir47/gcc/xgcc -B/mnt/scratch/objdir47/gcc/ /mnt/scratch/gcc-4.7-2012/gcc/testsuite/c-c++-common/gomp/atomic-9.c -fopenmp -fdump-tree-ompexp -S -o atomic-9.s    (timeout = 300)
PASS: c-c++-common/gomp/atomic-9.c (test for excess errors)
FAIL: c-c++-common/gomp/atomic-9.c scan-tree-dump-times ompexp __atomic_fetch_add 1

The test case expects

    #pragma omp atomic
      *bar() += 1;

to become __atomic_fetch_add (it does on x86_64), but on m68k-linux
with your patch the assignment is instead bracketed by
__builtin_GOMP_atomic_{start,end}().

atomic-10.c and atomic-3.c are the same issue.

 > Are they fixed if you add m68k-linux to
 > check_effective_target_sync_int_long and
 > check_effective_target_sync_char_short in
 > gcc/testsuite/lib/target-supports.exp?

No. These tests require cas_int, and the patched gcc does provide that.
I believe the real error is that gomp for some reason doesn't think the
target has gcc atomics, and the tests fail in that case.

/Mikael
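For reference, the lowering the test scans for can be written by hand as below. This is a sketch of the expected shape only: __atomic_fetch_add is the GCC builtin named in the test, while the relaxed memory order, the counter variable and the body of bar() are my assumptions rather than the contents of atomic-9.c.

```c
/* Stand-in for the pointer-returning function used by the test.  */
static int counter;

static int *
bar (void)
{
  return &counter;
}

/* Hand-written equivalent of
       #pragma omp atomic
         *bar() += 1;
   on a target where the compiler knows it has native atomics: a single
   atomic read-modify-write instead of a GOMP_atomic_start/end pair.  */
static void
atomic_increment (void)
{
  __atomic_fetch_add (bar (), 1, __ATOMIC_RELAXED);
}
```

On a target that is not known to provide the operation, the same statement is instead wrapped in the library's global atomic lock, which is what the failing dump shows.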
Re: [PATCH] Remove dead labels to increase superblock scope
On 21/11/11 17:13, Michael Matz wrote:
> Hi,
>
> On Sat, 19 Nov 2011, Tom de Vries wrote:
>
>> On 11/18/2011 10:29 PM, Eric Botcazou wrote:
>>>> For the test-case of PR50764, a dead label is introduced by
>>>> fixup_reorder_chain in cfg_layout_finalize, called from
>>>> pass_reorder_blocks.
>>>
>>> I presume that there is no reasonable way of preventing
>>> fixup_reorder_chain from introducing it or of teaching
>>> fixup_reorder_chain to remove it?
>>
>> This (untested) patch also removes the dead label for the PR, and I
>> think it is safe.
>
> ... cfgrtl.c has already code to delete labels (delete_insn) when
> appropriate (can_delete_label_p). Perhaps that can be reused somehow.
>
>> Index: cfglayout.c
>> ===================================================================
>> --- cfglayout.c (revision 181377)
>> +++ cfglayout.c (working copy)
>> @@ -702,6 +702,21 @@ relink_block_chain (bool stay_in_cfglayo
>>  }
>>
>> +static bool
>> +forced_label_p (rtx label)
>> +{
>> +  rtx insn, forced_label;
>> +  for (insn = forced_labels; insn; insn = XEXP (insn, 1))
>> +    {
>> +      forced_label = XEXP (insn, 0);
>> +      if (!LABEL_P (forced_label))
>> +        continue;
>> +      if (forced_label == label)
>> +        return true;
>> +    }
>> +  return false;
>> +}
>
> That's in_expr_list_p().
>
>> @@ -857,6 +872,12 @@ fixup_reorder_chain (void)
>>                                 (e_taken->src, e_taken->dest));
>>            e_taken->flags |= EDGE_FALLTHRU;
>>            update_br_prob_note (bb);
>> +          if (LABEL_NUSES (ret_label) == 0
>> +              && !LABEL_PRESERVE_P (ret_label)
>> +              && LABEL_NAME (ret_label) == NULL
>> +              && !forced_label_p (ret_label)
>
> And this is cfgrtl.c:can_delete_label_p.

Ok, using that in the new version.

> Note that you actually can remove labels also if they are
> !can_delete_label_p, if you use delete_insn (which you do). It will
> replace such undeletable labels by a DELETED_LABEL note.

I tried that as well but ran into these errors in
rtl_verify_flow_info_1:
...
libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of basic block 6
libquadmath/printf/cmp.c:56:1: internal compiler error: verify_flow_info failed

a-direct.ads:460:9: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
a-direct.ads:460:9: error: NOTE_INSN_BASIC_BLOCK 25 in middle of basic block 6
+===GNAT BUG DETECTED==+
| 4.7.0 2023 (experimental) (x86_64-unknown-linux-gnu) GCC error: |
| verify_flow_info failed |
| Error detected around a-direct.ads:460:9 |
...

Eric,

This new patch was bootstrapped and reg-tested on x86_64.

Is this new patch or the old patch
( http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01953.html ) ok for next
stage1?

Thanks,
- Tom

> Ciao,
> Michael.

2011-11-25  Tom de Vries  t...@codesourcery.com

        * rtl.h (can_delete_label_p): Declare.
        * cfgrtl.c (can_delete_label_p): Remove static.
        * cfglayout.c (fixup_reorder_chain): Delete unused label if
        can_delete_label_p.

        * gcc.dg/superblock.c: New test.

Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c (revision 181652)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,10 @@ fixup_reorder_chain (void)
                                (e_taken->src, e_taken->dest));
           e_taken->flags |= EDGE_FALLTHRU;
           update_br_prob_note (bb);
+          if (LABEL_NUSES (ret_label) == 0
+              && can_delete_label_p (ret_label)
+              && single_pred_p (e_taken->dest))
+            delete_insn (ret_label);
           continue;
         }
     }
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h (revision 181652)
+++ gcc/rtl.h (working copy)
@@ -2482,6 +2482,9 @@ extern void dump_combine_total_stats (FI
 /* In cfgcleanup.c */
 extern void delete_dead_jumptables (void);
 
+/* In rtlcfg.c */
+int can_delete_label_p (const_rtx);
+
 /* In sched-vis.c. */
 extern void debug_bb_n_slim (int);
 extern void debug_bb_slim (struct basic_block_def *);
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c (revision 181652)
+++ gcc/cfgrtl.c (working copy)
@@ -66,7 +66,6 @@ along with GCC; see the file COPYING3.
 #include "df.h"
 
 static int can_delete_note_p (const_rtx);
-static int can_delete_label_p (const_rtx);
 static basic_block rtl_split_edge (edge);
 static bool rtl_move_block_after (basic_block, basic_block);
 static int rtl_verify_flow_info (void);
@@ -102,7 +101,7 @@ can_delete_note_p (const_rtx note)
 
 /* True if a given label can be deleted. */
 
-static int
+int
 can_delete_label_p (const_rtx label)
 {
   return (!LABEL_PRESERVE_P (label)
Index: gcc/testsuite/gcc.dg/superblock.c
===================================================================
--- /dev/null (new file)
+++
[Patch, Fortran] PR 50408 [4.6/4.7] ICE related to whole-file processing
The patch fixes an issue when the backend_decl is reused
(-fwhole-file). The problem is that the ts.u.derived->backend_decl was
not always copied as well. I copied what was done a bit later in the
file and extended it to also include BT_CLASS.

The trans-types.c change is not needed, but I thought it is a good
optimization: from == to seems to happen quite regularly.

Built and regtested on x86-64-linux.
OK for the trunk and 4.6?

Tobias

PS: It also affects 4.5 if one uses -fwhole-file. However, my
impression is that no one uses that option with 4.5 and other
whole-file bugs have only been fixed for 4.6. But if you think one
should backport it to 4.5, I can surely do so.

2011-11-25  Tobias Burnus  bur...@net-b.de

        PR fortran/50408
        * trans-decl.c (gfc_get_module_backend_decl): Also copy
        ts.u.derived from the gsym if the ts.type is BT_CLASS.
        (gfc_get_extern_function_decl): Copy also the backend_decl
        for the symbol's ts.u.{derived,cl} from the gsym.
        * trans-types.c (gfc_copy_dt_decls_ifequal): Directly
        return if from and to are the same.

2011-11-25  Tobias Burnus  bur...@net-b.de

        PR fortran/50408
        * gfortran.dg/whole_file_35.f90: New.
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index fc8a9ed..39ec8cd 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -718,7 +718,7 @@ gfc_get_module_backend_decl (gfc_symbol *sym)
         }
       else if (s->backend_decl)
         {
-          if (sym->ts.type == BT_DERIVED)
+          if (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)
             gfc_copy_dt_decls_ifequal (s->ts.u.derived, sym->ts.u.derived,
                                        true);
           else if (sym->ts.type == BT_CHARACTER)
@@ -1670,6 +1670,11 @@ gfc_get_extern_function_decl (gfc_symbol * sym)
       gfc_find_symbol (sym->name, gsym->ns, 0, &s);
       if (s && s->backend_decl)
         {
+          if (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)
+            gfc_copy_dt_decls_ifequal (s->ts.u.derived, sym->ts.u.derived,
+                                       true);
+          else if (sym->ts.type == BT_CHARACTER)
+            sym->ts.u.cl->backend_decl = s->ts.u.cl->backend_decl;
           sym->backend_decl = s->backend_decl;
           return sym->backend_decl;
         }
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 3f4ebd5..d643c2e 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -2188,6 +2188,9 @@ gfc_copy_dt_decls_ifequal (gfc_symbol *from, gfc_symbol *to,
   gfc_component *to_cm;
   gfc_component *from_cm;
 
+  if (from == to)
+    return 1;
+
   if (from->backend_decl == NULL
       || !gfc_compare_derived_types (from, to))
     return 0;
--- /dev/null 2011-11-22 07:52:35.375586753 +0100
+++ gcc/gcc/testsuite/gfortran.dg/whole_file_35.f90 2011-11-25 09:30:18.0 +0100
@@ -0,0 +1,28 @@
+! { dg-do compile }
+!
+! PR fortran/50408
+!
+! Contributed by Vittorio Zecca
+!
+       module m
+         type int
+           integer  :: val
+         end type int
+         interface ichar
+           module procedure uch
+        end interface
+       contains
+         function uch (c)
+           character (len=1), intent (in) :: c
+           type (int)                     :: uch
+           intrinsic ichar
+           uch%val = 127 - ichar (c)
+         end function uch
+       end module m
+
+      program p
+        use m
+        print *,ichar('~')   ! must print 1
+      end program p
+
+! { dg-final { cleanup-modules m } }
Re: [Patch,AVR]: Clean up SFR offset usage: %i for CONST_INT
Georg-Johann Lay wrote:
> Denis Chertykov wrote:
>> 2011/11/20 Georg-Johann Lay .:
>>> Subtracting 0x20 to get the SFR address from a RAM address is
>>> scattered all over the backend. The patch makes -
>>> PRINT_OPERAND_PUNCT_VALID_P and uses %- to subtract the SFR offset
>>> instead of hard coded magic number 0x20 all over the place. The
>>> offset is stored in a new field base_arch_s.sfr_offset
>>
>> I don't like '%-' as a sequence and I don't like it as a suffix.
>> May be a right way is an adding a new prefix '%i' or '%I'.
>> I.e.
>>   %m0 - memory address
>>   %i0 - io address (equal to %m0 - 0x20)
>>
>> Denis.
>
> hmmm. The intention was to be able to specify SFR offset in inline
> assembly, for example. The offset is independent of operands; it is a
> specific to the architecture.
>
> Anyway, here is a updated patch. Its the same as the last except that
> it implements %i instead of %- and avr_out_plus_1 prints constants
> more eye-friendly. And there was a missing return close to the end of
> out_movqi_mr_r.
>
> Passes test suite.
>
> Ok?
>
> Johann
>
>       * config/avr/avr.h (struct base_arch_s): Add field sfr_offset.
>       * config/avr/avr-devices.c: Ditto. And initialize it.
>       * config/avr/avr-c.c (avr_cpu_cpp_builtins): New built-in define
>       __AVR_SFR_OFFSET__.
>       * config/avr/avr-protos.h (out_movqi_r_mr, out_movqi_mr_r): Remove.
>       (out_movhi_r_mr, out_movhi_mr_r): Remove.
>       (out_movsi_r_mr, out_movsi_mr_r): Remove.
>       * config/avr/avr.md (*cbi, *sbi): Use %i instead of %m-0x20.
>       (*insv.io, *insv.not.io): Ditto.
>       * config/avr/avr.c (out_movsi_r_mr, out_movsi_mr_r): Make static.
>       (print_operand): Implement %i to print address as I/O address.
>       (output_movqi): Clean up call of out_movqi_mr_r.
>       (output_movhi): Clean up call of out_movhi_mr_r.
>       (avr_file_start): Use avr_current_arch->sfr_offset instead of
>       magic -0x20. Use TMP_REGNO, ZERO_REGNO instead of 0, 1.
>       (avr_out_sbxx_branch): Use %i instead of %m-0x20.
>       (out_movqi_r_mr, out_movqi_mr_r): Ditto. And make static.
>       (out_movhi_r_mr, out_movhi_mr_r): Ditto. And use avr_asm_len.
>       (out_shift_with_cnt): Clean up code: Use avr_asm_len.
>       (output_movsisf): Use output_reload_insisf for all CONSTANT_P sources.
>       (avr_out_movpsi): USE avr_out_reload_inpsi for all CONSTANT_P
>       sources. Clean up call of avr_out_store_psi.
>       (output_reload_in_const): Don't cut symbols longer than 2 bytes.
>       (output_reload_insisf): Filter CONST_INT_P or CONST_DOUBLE_P to
>       try if setting pre-cleared register is advantageous.
>       (avr_out_plus_1): Use gen_int_mode instead of GEN_INT.

This adds %i support for CONST_INT.

It is needed because some insns don't use memory_operand but
mem:QI (io_address_operand).

%i(mem) just forwards to %i(const_int).

Ok?

Johann

        * config/avr/avr.c (print_operand): Support code = 'i' for CONST_INT.

Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md (revision 181717)
+++ config/avr/avr.md (working copy)
@@ -28,8 +28,8 @@
 ;;  j  Branch condition.
 ;;  k  Reverse branch condition.
 ;;..m..Constant Direct Data memory address.
-;;  i  Print the SFR address quivalent of a CONST_INT RAM address.
-;;     The resulting addres is suitable to be used in IN/OUT.
+;;  i  Print the SFR address quivalent of a CONST_INT or a CONST_INT
+;;     RAM address.  The resulting addres is suitable to be used in IN/OUT.
 ;;  o  Displacement for (mem (plus (reg) (const_int))) operands.
 ;;  p  POST_INC or PRE_DEC address as a pointer (X, Y, Z)
 ;;  r  POST_INC or PRE_DEC address as a register (r26, r28, r30)
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c (revision 181717)
+++ config/avr/avr.c (working copy)
@@ -1822,9 +1822,32 @@ print_operand (FILE *file, rtx x, int co
       else
         fprintf (file, reg_names[true_regnum (x) + abcd]);
     }
-  else if (GET_CODE (x) == CONST_INT)
-    fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x) + abcd);
-  else if (GET_CODE (x) == MEM)
+  else if (CONST_INT_P (x))
+    {
+      HOST_WIDE_INT ival = INTVAL (x);
+
+      if ('i' != code)
+        fprintf (file, HOST_WIDE_INT_PRINT_DEC, ival + abcd);
+      else if (low_io_address_operand (x, VOIDmode)
+               || high_io_address_operand (x, VOIDmode))
+        {
+          switch (ival)
+            {
+            case RAMPZ_ADDR: fprintf (file, "__RAMPZ__"); break;
+            case SREG_ADDR: fprintf (file, "__SREG__"); break;
+            case SP_ADDR:   fprintf (file, "__SP_L__"); break;
+            case SP_ADDR+1: fprintf (file, "__SP_H__"); break;
+
+            default:
+              fprintf (file, HOST_WIDE_INT_PRINT_HEX,
+                       ival - avr_current_arch->sfr_offset);
+              break;
+            }
+        }
+      else
+        fatal_insn ("bad address, not an I/O address:", x);
+    }
+  else if (MEM_P
Re: [PATCH] Remove dead labels to increase superblock scope
Hi,

On Fri, 25 Nov 2011, Tom de Vries wrote:

> > Note that you actually can remove labels also if they are
> > !can_delete_label_p, if you use delete_insn (which you do). It will
> > replace such undeletable labels by a DELETED_LABEL note.
>
> I tried that as well but ran into these errors in
> rtl_verify_flow_info_1:
> ...
> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of basic block 6

Hmpf, probably bitrotted over time. Oh well, so be it.

Ciao,
Michael.
Re: [PATCH] Remove dead labels to increase superblock scope
On Fri, Nov 25, 2011 at 2:03 PM, Michael Matz m...@suse.de wrote:
> Hi,
>
> On Fri, 25 Nov 2011, Tom de Vries wrote:
>
>> > Note that you actually can remove labels also if they are
>> > !can_delete_label_p, if you use delete_insn (which you do). It will
>> > replace such undeletable labels by a DELETED_LABEL note.
>>
>> I tried that as well but ran into these errors in
>> rtl_verify_flow_info_1:
>> ...
>> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
>> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of basic block 6
>
> Hmpf, probably bitrotted over time. Oh well, so be it.

No, DELETED_LABEL notes still work just fine. It depends on how you
remove the label and replace it with a note, and Tom isn't showing
what he did, so...

Ciao!
Steven
Re: [Patch,AVR]: Clean up SFR offset usage: %i for CONST_INT
2011/11/25 Georg-Johann Lay a...@gjlay.de
> Georg-Johann Lay wrote:
>> Denis Chertykov wrote:
>>> 2011/11/20 Georg-Johann Lay .:
>>>> Subtracting 0x20 to get the SFR address from a RAM address is
>>>> scattered all over the backend. The patch makes -
>>>> PRINT_OPERAND_PUNCT_VALID_P and uses %- to subtract the SFR offset
>>>> instead of hard coded magic number 0x20 all over the place. The
>>>> offset is stored in a new field base_arch_s.sfr_offset
>>>
>>> I don't like '%-' as a sequence and I don't like it as a suffix.
>>> May be a right way is an adding a new prefix '%i' or '%I'.
>>> I.e.
>>>   %m0 - memory address
>>>   %i0 - io address (equal to %m0 - 0x20)
>>>
>>> Denis.
>>
>> hmmm. The intention was to be able to specify SFR offset in inline
>> assembly, for example. The offset is independent of operands; it is a
>> specific to the architecture.
>>
>> Anyway, here is a updated patch. Its the same as the last except that
>> it implements %i instead of %- and avr_out_plus_1 prints constants
>> more eye-friendly. And there was a missing return close to the end of
>> out_movqi_mr_r.
>>
>> Passes test suite.
>>
>> Ok?
>>
>> Johann
>>
>>      * config/avr/avr.h (struct base_arch_s): Add field sfr_offset.
>>      * config/avr/avr-devices.c: Ditto. And initialize it.
>>      * config/avr/avr-c.c (avr_cpu_cpp_builtins): New built-in define
>>      __AVR_SFR_OFFSET__.
>>      * config/avr/avr-protos.h (out_movqi_r_mr, out_movqi_mr_r): Remove.
>>      (out_movhi_r_mr, out_movhi_mr_r): Remove.
>>      (out_movsi_r_mr, out_movsi_mr_r): Remove.
>>      * config/avr/avr.md (*cbi, *sbi): Use %i instead of %m-0x20.
>>      (*insv.io, *insv.not.io): Ditto.
>>      * config/avr/avr.c (out_movsi_r_mr, out_movsi_mr_r): Make static.
>>      (print_operand): Implement %i to print address as I/O address.
>>      (output_movqi): Clean up call of out_movqi_mr_r.
>>      (output_movhi): Clean up call of out_movhi_mr_r.
>>      (avr_file_start): Use avr_current_arch->sfr_offset instead of
>>      magic -0x20. Use TMP_REGNO, ZERO_REGNO instead of 0, 1.
>>      (avr_out_sbxx_branch): Use %i instead of %m-0x20.
>>      (out_movqi_r_mr, out_movqi_mr_r): Ditto. And make static.
>>      (out_movhi_r_mr, out_movhi_mr_r): Ditto. And use avr_asm_len.
>>      (out_shift_with_cnt): Clean up code: Use avr_asm_len.
>>      (output_movsisf): Use output_reload_insisf for all CONSTANT_P sources.
>>      (avr_out_movpsi): USE avr_out_reload_inpsi for all CONSTANT_P
>>      sources. Clean up call of avr_out_store_psi.
>>      (output_reload_in_const): Don't cut symbols longer than 2 bytes.
>>      (output_reload_insisf): Filter CONST_INT_P or CONST_DOUBLE_P to
>>      try if setting pre-cleared register is advantageous.
>>      (avr_out_plus_1): Use gen_int_mode instead of GEN_INT.
>
> This adds %i support for CONST_INT.
>
> It is needed because some insns don't use memory_operand but
> mem:QI (io_address_operand).
>
> %i(mem) just forwards to %i(const_int).
>
> Ok?
>
> Johann
>
>       * config/avr/avr.c (print_operand): Support code = 'i' for CONST_INT.

Ok.

Denis.
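The address arithmetic behind %i, as discussed in this thread, can be sketched in a few lines. This is not the avr.c implementation: the names are hypothetical, the offset 0x20 is the value quoted in the thread, and the accepted range assumes the classic AVR layout in which IN/OUT can reach the 64 I/O addresses 0x00..0x3F (RAM addresses 0x20..0x5F).

```c
#include <stdbool.h>

/* SFR offset as named in the thread; in the patch it lives in
   base_arch_s.sfr_offset rather than in a constant.  */
enum { TOY_SFR_OFFSET = 0x20 };

/* Hypothetical helper: convert a RAM address to the I/O address that
   IN/OUT expect.  Returns false if RAM_ADDR is outside the I/O window,
   which is the case where the patch calls fatal_insn.  */
static bool
toy_ram_addr_to_io (unsigned ram_addr, unsigned *io_addr)
{
  if (ram_addr < TOY_SFR_OFFSET || ram_addr > TOY_SFR_OFFSET + 0x3F)
    return false;
  *io_addr = ram_addr - TOY_SFR_OFFSET;
  return true;
}
```

For example, RAM address 0x5F maps to I/O address 0x3F, while 0x60 is rejected because it lies above the window.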
Re: [PATCH] Remove dead labels to increase superblock scope
On 25/11/11 14:05, Steven Bosscher wrote:
> On Fri, Nov 25, 2011 at 2:03 PM, Michael Matz m...@suse.de wrote:
>> Hi,
>>
>> On Fri, 25 Nov 2011, Tom de Vries wrote:
>>
>>> > Note that you actually can remove labels also if they are
>>> > !can_delete_label_p, if you use delete_insn (which you do). It
>>> > will replace such undeletable labels by a DELETED_LABEL note.
>>>
>>> I tried that as well but ran into these errors in
>>> rtl_verify_flow_info_1:
>>> ...
>>> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 6
>>> libquadmath/printf/cmp.c:56:1: error: NOTE_INSN_BASIC_BLOCK 79 in middle of basic block 6
>>
>> Hmpf, probably bitrotted over time. Oh well, so be it.
>
> No, DELETED_LABEL notes still work just fine. It depends on how you
> remove the label and replace it with a note, and Tom isn't showing
> what he did, so...

This is the patch with which I ran into the rtl_verify_flow_info_1
errors:
...
Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c (revision 181172)
+++ gcc/cfglayout.c (working copy)
@@ -857,6 +857,9 @@ fixup_reorder_chain (void)
                                (e_taken->src, e_taken->dest));
           e_taken->flags |= EDGE_FALLTHRU;
           update_br_prob_note (bb);
+          if (LABEL_NUSES (ret_label) == 0
+              && single_pred_p (e_taken->dest))
+            delete_insn (ret_label);
           continue;
         }
     }
...

Thanks,
- Tom

> Ciao!
> Steven
Fix doloop bug with maximum-length loops
This patch fixes a bug in the RTL doloop pass that showed as timeouts
of gcc.c-torture/execute/961017-1.c execution on slow targets because a
256-iteration loop was replaced with a 2^32-iteration loop (if the test
did not time out, it would still pass as it didn't contain any checks
on the number of iterations). The testcases included with the patch
are self-checking testcases that will reliably fail on affected targets
(if the rest of the patch is not applied), aborting if they do not time
out. Affected targets include sh-linux-gnu and powerpc-linux-gnu.

The replacement occurs in the RTL doloop pass (loop-doloop.c). Recall
that RTL CONST_INTs do not have modes. The number of iterations of the
loop (appropriately defined) is calculated as (const_int -1) -
implicitly QImode. It might seem appropriate for
loop-iv.c:iv_number_of_iterations, where it does

  if (CONST_INT_P (desc->niter_expr))
    {
      unsigned HOST_WIDEST_INT val = INTVAL (desc->niter_expr);

      desc->const_iter = true;
      desc->niter_max = desc->niter = val & GET_MODE_MASK (desc->mode);
    }

to adjust desc->niter_expr using the mask in the same way
(i.e. desc->niter_expr = GEN_INT (desc->niter);). But that is neither
necessary nor sufficient to fix the bug. It changes the number of
iterations to the correct (const_int 255). But whether the number is
given as 255 or -1, doloop_modify is entered with zero_extend_p == true
and from_mode == QImode. The code there then determines that it needs
to increment the count - and does so in QImode, which in either case
produces 0, before then zero-extending to SImode.

This code for doing the increment in from_mode comes from the fix for
PR 37451 and the follow-up fix for PR 37782:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
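The arithmetic at the heart of the bug can be reproduced in plain C, with uint8_t standing in for QImode and uint32_t for SImode. This is only an analogy for the RTL transformation, and the function names are mine.

```c
#include <stdint.h>

/* Order used by the code being reverted: increment in the narrow mode,
   then zero-extend.  For niter == 255 the increment wraps to 0.  */
static uint32_t
count_increment_then_extend (uint8_t niter)
{
  uint8_t narrow = (uint8_t) (niter + 1);
  return (uint32_t) narrow;
}

/* Safe order: zero-extend first, then increment in the wide mode.  */
static uint32_t
count_extend_then_increment (uint8_t niter)
{
  return (uint32_t) niter + 1;
}
```

For niter == 255 the first function yields 0 and the second 256. A down-counting loop that starts from 0 in a 32-bit register only reaches zero again after wrapping, which is the 2^32-iteration loop described above; for any smaller niter the two orders agree, which is why the problem only appears for maximum-length loops.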
As far as I can tell the idea of those changes - which were an attempt
to improve optimization - is simply broken when the loop might have
maximum length like this (which in the original PR 37451 case it can't,
but telling that in this code would be nontrivial) - including the case
of nonconstant length as well as that of constant length. So this
patch reverts both those previous patches and adds testcases to
demonstrate the problem they caused.

Bootstrapped with no regressions on powerpc-linux-gnu. OK to commit?
(If the patch holds up on trunk I'd propose it for 4.6 and 4.5 branches
as well, as a wrong-code regression fix.)

2011-11-25  Joseph Myers  jos...@codesourcery.com

        Revert:

        2008-09-18  Andrew Pinski  andrew_pin...@playstation.sony.com

        PR rtl-opt/37451
        * loop-doloop.c (doloop_modify): New argument zero_extend_p and
        zero extend count after the correction to it is done.
        (doloop_optimize): Update call to doloop_modify, don't zero extend
        count before call.

        2008-11-03  Andrew Pinski  andrew_pin...@playstation.sony.com

        PR rtl-opt/37782
        * loop-doloop.c (doloop_modify): Add from_mode argument that says what
        mode count is in.
        (doloop_optimize): Update call to doloop_modify.

testsuite:
2011-11-25  Joseph Myers  jos...@codesourcery.com

        * gcc.c-torture/execute/doloop-1.c,
        gcc.c-torture/execute/doloop-2.c: New tests.
Index: testsuite/gcc.c-torture/execute/doloop-1.c
===================================================================
--- testsuite/gcc.c-torture/execute/doloop-1.c (revision 0)
+++ testsuite/gcc.c-torture/execute/doloop-1.c (revision 0)
@@ -0,0 +1,18 @@
+#include <limits.h>
+
+extern void exit (int);
+extern void abort (void);
+
+volatile unsigned int i;
+
+int
+main (void)
+{
+  unsigned char z = 0;
+
+  do ++i;
+  while (--z > 0);
+  if (i != UCHAR_MAX + 1U)
+    abort ();
+  exit (0);
+}
Index: testsuite/gcc.c-torture/execute/doloop-2.c
===================================================================
--- testsuite/gcc.c-torture/execute/doloop-2.c (revision 0)
+++ testsuite/gcc.c-torture/execute/doloop-2.c (revision 0)
@@ -0,0 +1,18 @@
+#include <limits.h>
+
+extern void exit (int);
+extern void abort (void);
+
+volatile unsigned int i;
+
+int
+main (void)
+{
+  unsigned short z = 0;
+
+  do ++i;
+  while (--z > 0);
+  if (i != USHRT_MAX + 1U)
+    abort ();
+  exit (0);
+}
Index: loop-doloop.c
===================================================================
--- loop-doloop.c (revision 181697)
+++ loop-doloop.c (working copy)
@@ -394,14 +394,11 @@ add_test (rtx cond, edge *e, basic_block
    describes the loop, DESC describes the number of iterations of the
    loop, and DOLOOP_INSN is the low-overhead looping insn to emit at the
    end of the loop.  CONDITION is the condition separated from the
-   DOLOOP_SEQ.  COUNT is the number of iterations of the LOOP.
-   ZERO_EXTEND_P says to zero extend COUNT after the increment of it to
-   word_mode from FROM_MODE.  */
+   DOLOOP_SEQ.  COUNT is the number of iterations of the LOOP.  */
 
 static void
Re: Keep static VTA locs in cselib tables only
On Wed, Nov 23, 2011 at 08:10:00AM -0200, Alexandre Oliva wrote:

- compiling stage2 target libs and stage3 host patched sources (with both unpatched and patched stage2 compiler) produced cc1plus with 10% fewer entry value expressions (a welcome surprise!), 1% fewer call site value expressions, an increase of 0.1% in the total number of variables with location lists and less than 0.5% decrease in variables with full coverage.

The numbers I got with your patch (RTL checking) are below, seems the cumulative numbers other than 100% are all bigger with patched stage2, which means unfortunately debug info quality degradation. Have you analysed at least on some shorter testcases why does that happen? Otherwise the patch looks good to me.

x86_64 patched stage3 compiled by vanilla stage2
cov%     samples       cumul
0.0      230172/32%    230172/32%
0..10    12267/1%      242439/34%
11..20   10548/1%      252987/35%
21..30   17018/2%      270005/37%
31..40   16374/2%      286379/40%
41..50   17533/2%      303912/42%
51..60   13051/1%      316963/44%
61..70   13946/1%      330909/46%
71..80   19627/2%      350536/49%
81..90   28877/4%      379413/53%
91..99   85086/11%     464499/65%
100      246568/34%    711067/100%

x86_64 patched stage3 compiled by patched stage2
cov%     samples       cumul
0.0      230182/32%    230182/32%
0..10    12319/1%      242501/34%
11..20   10765/1%      253266/35%
21..30   17390/2%      270656/38%
31..40   16745/2%      287401/40%
41..50   17821/2%      305222/42%
51..60   13306/1%      318528/44%
61..70   14104/1%      332632/46%
71..80   19795/2%      352427/49%
81..90   29030/4%      381457/53%
91..99   85171/11%     466628/65%
100      244439/34%    711067/100%

i686 patched stage3 compiled by vanilla stage2
cov%     samples       cumul
0.0      225909/32%    225909/32%
0..10    12420/1%      238329/34%
11..20   10693/1%      249022/35%
21..30   17102/2%      266124/38%
31..40   13529/1%      279653/40%
41..50   17232/2%      296885/42%
51..60   12568/1%      309453/44%
61..70   14769/2%      324222/46%
71..80   14937/2%      339159/48%
81..90   23868/3%      363027/52%
91..99   86306/12%     449333/64%
100      245327/35%    694660/100%

i686 patched stage3 compiled by patched stage2
cov%     samples       cumul
0.0      225917/32%    225917/32%
0..10    12471/1%      238388/34%
11..20   10848/1%      249236/35%
21..30   17292/2%      266528/38%
31..40   13716/1%      280244/40%
41..50   17324/2%      297568/42%
51..60   12673/1%      310241/44%
61..70   14950/2%      325191/46%
71..80   15085/2%      340276/48%
81..90   24019/3%      364295/52%
91..99   86228/12%     450523/64%
100      244137/35%    694660/100%

        Jakub
Added myself to MAINTAINERS: write after approval
Committed.

--
Index: MAINTAINERS
===================================================================
--- MAINTAINERS (revision 181721)
+++ MAINTAINERS (working copy)
@@ -345,6 +345,7 @@
 David Daney                     david.da...@caviumnetworks.com
 Bud Davis                       jmda...@link.com
 Chris Demetriou                 c...@google.com
+Sameera Deshpande               sameera.deshpa...@arm.com
 François Dumont                 fdum...@gcc.gnu.org
 Benoit Dupont de Dinechin       benoit.dupont-de-dinec...@st.com
 Michael Eager                   ea...@eagercon.com
[Patch, Fortran, committed] PR51302 - fix ICE with volatile loop variable
Fixed the ICE:

internal compiler error: in gfc_add_modify_loc, at fortran/trans.c:161

Build, regtested and committed (Rev. 181724) on x86-64-linux.

Tobias

Index: gcc/fortran/ChangeLog
===================================================================
--- gcc/fortran/ChangeLog       (revision 181723)
+++ gcc/fortran/ChangeLog       (working copy)
@@ -1,3 +1,8 @@
+2011-11-25  Tobias Burnus  bur...@net-b.de
+
+       PR fortran/51302
+       * trans-stmt.c (gfc_trans_simple_do): Add a fold_convert.
+
 2011-11-24  Tobias Burnus  bur...@net-b.de
 
        PR fortran/51218
Index: gcc/fortran/trans-stmt.c
===================================================================
--- gcc/fortran/trans-stmt.c    (revision 181723)
+++ gcc/fortran/trans-stmt.c    (working copy)
@@ -1259,7 +1259,8 @@ gfc_trans_simple_do (gfc_code * code, stmtblock_t
   loc = code->ext.iterator->start->where.lb->location;
 
   /* Initialize the DO variable: dovar = from.  */
-  gfc_add_modify_loc (loc, pblock, dovar, from);
+  gfc_add_modify_loc (loc, pblock, dovar,
+                      fold_convert (TREE_TYPE(dovar), from));
 
   /* Save value for do-tinkering checking.  */
   if (gfc_option.rtcheck & GFC_RTCHECK_DO)
Index: gcc/testsuite/gfortran.dg/volatile13.f90
===================================================================
--- gcc/testsuite/gfortran.dg/volatile13.f90    (revision 0)
+++ gcc/testsuite/gfortran.dg/volatile13.f90    (working copy)
@@ -0,0 +1,11 @@
+! { dg-do compile }
+!
+! PR fortran/51302
+!
+! Volatile DO variable - was ICEing before
+!
+integer, volatile :: i
+integer :: n = 1
+do i = 1, n
+end do
+end
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog     (revision 181723)
+++ gcc/testsuite/ChangeLog     (working copy)
@@ -1,3 +1,8 @@
+2011-11-25  Tobias Burnus  bur...@net-b.de
+
+       PR fortran/51302
+       * gfortran.dg/volatile13.f90: New.
+
 2011-11-24  Andrew MacLeod  amacl...@redhat.com
 
        PR c/51256
Re: [Patch, Fortran] PR 50408 [4.6/4.7] ICE related to whole-file processing
On Fri, Nov 25, 2011 at 11:46:37AM +0100, Tobias Burnus wrote:

The patch fixes an issue when the backend_decl is reused (-fwhole-file). The problem is that not always the ts.u.derived->backend_decl was copied as well. I copied what was done a bit later in the file and extended it to also include BT_CLASS. The trans-type.c change is not needed, but I thought it is a good optimization. from == to seems to happen quite regularly.

Build and regtested on x86-64-linux. OK for the trunk and 4.6?

OK. I have no issues with committing the fix to 4.5. It however may be time to allow 4.5 to ride off into the sunset.

--
Steve
Re: Go patch committed: New lock/note implementation
Ian Lance Taylor i...@google.com writes:

This patch updates the implementations of locks and notes used in libgo to use the current version from the master Go library. This now uses futexes when running on GNU/Linux, while still using semaphores on other systems. This implementation should be faster, and does not require explicit initialization. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. I tested both the futex and the semaphore versions. Committed to mainline.

+static int32
+getproccount(void)
+{
+       int32 fd, rd, cnt, cpustrlen;
+       const byte *cpustr, *pos;
+       byte *bufpos;
+       byte buf[256];
+
+       fd = open("/proc/stat", O_RDONLY|O_CLOEXEC, 0);

This broke bootstrap on Linux/x86_64 (CentOS 5.5), which lacks O_CLOEXEC.

        Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
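For readers hitting the same breakage, one common way to cope with headers that lack O_CLOEXEC is sketched below. This is not the fix that went into libgo; `open_cloexec` is a hypothetical helper name, and the fallback path has a known limitation: setting FD_CLOEXEC in a second step is not atomic with the open, so another thread could fork and exec in between.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not libgo code: open PATH with the close-on-exec
   flag set, whether or not the system headers define O_CLOEXEC.  */
int
open_cloexec (const char *path, int flags)
{
#ifdef O_CLOEXEC
  /* The flag is available: request close-on-exec atomically.  */
  return open (path, flags | O_CLOEXEC, 0);
#else
  /* Fallback: open first, then set FD_CLOEXEC with fcntl.  */
  int fd = open (path, flags, 0);
  if (fd >= 0)
    {
      int fdflags = fcntl (fd, F_GETFD);
      if (fdflags < 0 || fcntl (fd, F_SETFD, fdflags | FD_CLOEXEC) < 0)
        {
          close (fd);
          return -1;
        }
    }
  return fd;
#endif
}
```

A call such as `open_cloexec ("/proc/stat", O_RDONLY)` would then stand in for the failing `open` call quoted above.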
[PATCH, testsuite]: Introduce sync_int128_runtime and sync_long_long_runtime
Hello!

Attached patch introduces sync_int128_runtime and sync_long_long_runtime runtime check to prevent running atomic runtime tests on targets that don't support them. I also merged runtime check for arm*-*-linux-gnueabi with corresponding arm*-*-* compile-time check. This change has a nice side effect that gcc.dg/di-longlong64-sync.c and gcc.dg/di-sync-multithread.c tests now also run on x86_64.

Regarding arm, I have simply copied existing runtime check. Various long-long atomic tests now also run on this target, so perhaps there will be some fallout on recently introduced tests.

2011-11-25  Uros Bizjak  ubiz...@gmail.com

        PR testsuite/51258
        * lib/target-supports.exp
        (check_effective_target_sync_int_128_runtime): New procedure.
        (check_effective_target_sync_long_long_runtime): Ditto.
        (check_effective_target_sync_long_long): Add arm*-*-*.
        (check_effective_target_sync_longlong): Remove.
        * gcc.dg/atomic-op-5.c: Require sync_int_128_runtime effective target.
        * gcc.dg/atomic-compare-exchange-5.c: Ditto.
        * gcc.dg/atomic-exchange-5.c: Ditto.
        * gcc.dg/atomic-load-5.c: Ditto.
        * gcc.dg/atomic-store-5.c: Ditto.
        * gcc.dg/simulate-thread/atomic-load-int128.c: Ditto.
        * gcc.dg/simulate-thread/atomic-other-int128.c: Ditto.
        * gcc.dg/atomic-op-4.c: Require sync_long_long_runtime
        effective target.
        * gcc.dg/atomic-compare-exchange-4.c: Ditto.
        * gcc.dg/atomic-exchange-4.c: Ditto.
        * gcc.dg/atomic-load-4.c: Ditto.
        * gcc.dg/atomic-store-4.c: Ditto.
        * gcc.dg/di-longlong64-sync-1.c: Ditto.
        * gcc.dg/di-sync-multithread.c: Ditto.
        * gcc.dg/simulate-thread/atomic-load-longlong.c: Ditto.
        * gcc.dg/simulate-thread/atomic-other-longlong.c: Ditto.

Patch was tested on x86_64-pc-linux-gnu and was committed to mainline SVN.

Uros.

Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp     (revision 181721)
+++ lib/target-supports.exp     (working copy)
@@ -3620,17 +3620,80 @@
     }
 }
 
+# Return 1 if the target supports atomic operations on int_128 values
+# and can execute them.
+
+proc check_effective_target_sync_int_128_runtime { } {
+    if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+         && ![is-effective-target ia32] } {
+        return [check_cached_effective_target sync_int_128_available {
+            check_runtime_nocache sync_int_128_available {
+                #include "cpuid.h"
+                int main ()
+                {
+                  unsigned int eax, ebx, ecx, edx;
+                  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+                    return !(ecx & bit_CMPXCHG16B);
+                  return 1;
+                }
+            }
+        }]
+    } else {
+        return 0
+    }
+}
+
 # Return 1 if the target supports atomic operations on long long.
 
 proc check_effective_target_sync_long_long { } {
     if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
-         && ![is-effective-target ia32] } {
+         && ![is-effective-target ia32]
+         || [istarget arm*-*-*] } {
         return 1
     } else {
         return 0
     }
 }
 
+# Return 1 if the target supports atomic operations on long long
+# and can execute them.
+
+proc check_effective_target_sync_long_long_runtime { } {
+    if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
+         && ![is-effective-target ia32] } {
+        return [check_cached_effective_target sync_long_long_available {
+            check_runtime_nocache sync_long_long_available {
+                #include "cpuid.h"
+                int main ()
+                {
+                  unsigned int eax, ebx, ecx, edx;
+                  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+                    return !(edx & bit_CMPXCHG8B);
+                  return 1;
+                }
+            }
+        }]
+    } elseif { [istarget arm*-*-linux-gnueabi] } {
+        return [check_runtime sync_longlong_runtime {
+            #include <stdlib.h>
+            int main ()
+            {
+              long long l1;
+
+              if (sizeof (long long) != 8)
+                exit (1);
+
+              /* Just check for native; checking for kernel fallback is tricky.  */
+              asm volatile ("ldrexd r0,r1, [%0]" : : "r" (&l1) : "r0", "r1");
+
+              exit (0);
+            }
+        } ]
+    } else {
+        return 0
+    }
+}
+
 # Return 1 if the target supports atomic operations on int and long.
 
 proc check_effective_target_sync_int_long { } {
@@ -3662,31 +3725,6 @@
     return $et_sync_int_long_saved
 }
 
-# Return 1 if the target supports atomic operations on long long and can
-# execute them
-# So far only put checks in for ARM, others may want to add their own
-proc check_effective_target_sync_longlong { } {
-    return [check_runtime sync_longlong_runtime {
-      #include <stdlib.h>
-      int main ()
-      {
-        long long l1;
-
-        if (sizeof
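Outside the DejaGnu harness, the same CPUID probe can be tried as plain C. The sketch below is an illustration rather than part of the patch: the function names are hypothetical, and the `-1` "unknown" return value is my own addition so that the code also compiles on non-x86 hosts. It relies on `__get_cpuid`, `bit_CMPXCHG8B` and `bit_CMPXCHG16B` from the compiler's `cpuid.h`, as the patch does.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Return 1 if CPUID leaf 1 reports CMPXCHG8B, 0 if it does not, and -1
   if that cannot be determined (non-x86 host or leaf 1 unavailable).  */
int
has_cmpxchg8b (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return (edx & bit_CMPXCHG8B) != 0;
#endif
  return -1;
}

/* Same idea for CMPXCHG16B, which is reported in ECX instead of EDX.  */
int
has_cmpxchg16b (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return (ecx & bit_CMPXCHG16B) != 0;
#endif
  return -1;
}
```

Note the inverted sense compared with the patch: the effective-target programs return 0 from `main` on success, whereas these helpers return 1 when the feature is present.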
[PATCH] Ignore EDGE_PRESERVE in flow info verification (PR rtl-optimization/49912)
Hi!

The following testcase ICEs during flow verification, because there is an unconditional branch with EDGE_PRESERVE set on the edge and because of that bit rtl_verify_flow_info_1 wouldn't count it as n_branch.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

        PR rtl-optimization/49912
        * cfgrtl.c (rtl_verify_flow_info_1): Ignore also EDGE_PRESERVE bit
        when counting n_branch.

        * g++.dg/other/pr49912.C: New test.

--- gcc/cfgrtl.c.jj     2011-11-21 16:22:02.0 +0100
+++ gcc/cfgrtl.c        2011-11-25 10:29:54.272326735 +0100
@@ -1875,7 +1875,8 @@ rtl_verify_flow_info_1 (void)
                                | EDGE_CAN_FALLTHRU
                                | EDGE_IRREDUCIBLE_LOOP
                                | EDGE_LOOP_EXIT
-                               | EDGE_CROSSING)) == 0)
+                               | EDGE_CROSSING
+                               | EDGE_PRESERVE)) == 0)
            n_branch++;
 
          if (e->flags & EDGE_ABNORMAL_CALL)
--- gcc/testsuite/g++.dg/other/pr49912.C.jj     2011-11-25 10:40:27.180613829 +0100
+++ gcc/testsuite/g++.dg/other/pr49912.C        2011-11-25 10:40:15.0 +0100
@@ -0,0 +1,38 @@
+// PR rtl-optimization/49912
+// { dg-do compile }
+// { dg-require-effective-target freorder }
+// { dg-options "-O -freorder-blocks-and-partition" }
+
+int foo (int *);
+
+struct S
+{
+  int *m1 ();
+  S (int);
+  ~S () { foo (m1 ()); }
+};
+
+template <int>
+struct V
+{
+  S *v1;
+  void m2 (const S &);
+  S *base ();
+};
+
+template <int N>
+void V<N>::m2 (const S &x)
+{
+  S a = x;
+  S *l = base ();
+  while (l)
+    *v1 = *--l;
+}
+
+V<0> v;
+
+void
+foo ()
+{
+  v.m2 (0);
+}

        Jakub
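The masking idiom in the cfgrtl.c hunk can be sketched in isolation. The flag names and values below are hypothetical (GCC's real EDGE_* bits differ); the point is only that an edge counts as an ordinary branch when no bits remain once the purely informational flags are masked out, so forgetting one such flag makes the verifier undercount branches.

```c
/* Hypothetical flag bits, chosen for illustration only.  */
enum
{
  DEMO_EDGE_FALLTHRU = 1 << 0,
  DEMO_EDGE_LOOP_EXIT = 1 << 1,
  DEMO_EDGE_CROSSING = 1 << 2,
  DEMO_EDGE_PRESERVE = 1 << 3
};

/* Flags that do not change what kind of control transfer an edge is.  */
#define DEMO_IGNORED_FOR_BRANCH_COUNT \
  (DEMO_EDGE_LOOP_EXIT | DEMO_EDGE_CROSSING | DEMO_EDGE_PRESERVE)

/* Return 1 if an edge with FLAGS should be counted as a plain branch,
   i.e. nothing is set besides the ignored bookkeeping flags.  */
int
counts_as_branch (int flags)
{
  return (flags & ~DEMO_IGNORED_FOR_BRANCH_COUNT) == 0;
}
```

If `DEMO_EDGE_PRESERVE` were left out of the ignored mask, an unconditional branch carrying that bit would not be counted, which mirrors the situation the patch describes.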
[PATCH] Make sibcall argument overlap check less pessimistic (PR middle-end/50074)
Hi!

Kirill's recent change to mem_overlaps_already_clobbered_arg_p resulted in various code quality regressions, many calls that used to be tail call optimized no longer are. Here is an attempt to make the check more complete (e.g. the change wouldn't see overlap if addr was PLUS of two REGs, where one of the REGs was based on internal_arg_pointer, etc.) and less pessimistic.

As tree-tailcall.c doesn't allow tail calls from functions that have address of any of the caller's parameters taken, IMHO it is enough to look for internal_arg_pointer based pseudos initialized in the tail call sequence. This patch scans the tail call sequence and notes which pseudos are based on internal_arg_pointer (and what offset from that pointer they have) and uses that in mem_overlaps_already_clobbered_arg_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested on some testcases on ia64-linux (as an example of target which doesn't have reg + disp addressing and thus forces everything into registers). Ok for trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

        PR middle-end/50074
        * calls.c (internal_arg_pointer_seq_start,
        internal_arg_pointer_cache): New variables.
        (internal_arg_pointer_based_reg_1): New function.
        (internal_arg_pointer_based_reg): New function.
        (mem_overlaps_already_clobbered_arg_p): Use it.
        (expand_call): Free internal_arg_pointer_cache vector and
        clear internal_arg_pointer_seq_start.

--- gcc/calls.c.jj      2011-11-08 23:35:12.0 +0100
+++ gcc/calls.c 2011-11-25 17:24:52.445878841 +0100
@@ -1658,6 +1658,106 @@ rtx_for_function_call (tree fndecl, tree
   return funexp;
 }
 
+/* Last insn that has been already scanned by internal_arg_pointer_based_reg,
+   or NULL_RTX if none has been scanned yet.  */
+static rtx internal_arg_pointer_seq_start;
+/* Vector indexed by REGNO () - FIRST_PSEUDO_REGISTER, recoding if a pseudo
+   is based on crtl->args.internal_arg_pointer.  It is NULL_RTX if not based
+   on it, some CONST_INT as offset from crtl->args.internal_arg_pointer
+   or PC for unknown offset from it.  */
+static VEC(rtx, heap) *internal_arg_pointer_cache;
+
+static rtx internal_arg_pointer_based_reg (rtx, bool);
+
+/* Helper function for internal_arg_pointer_based_reg, called through
+   for_each_rtx.  Return 1 if a crtl->args.internal_arg_pointer based
+   register is seen anywhere.  */
+
+static int
+internal_arg_pointer_based_reg_1 (rtx *loc, void *data ATTRIBUTE_UNUSED)
+{
+  if (REG_P (*loc) && internal_arg_pointer_based_reg (*loc, false) != NULL_RTX)
+    return 1;
+  if (MEM_P (*loc))
+    return -1;
+  return 0;
+}
+
+/* If REG is based on crtl->args.internal_arg_pointer, return either
+   a CONST_INT offset from crtl->args.internal_arg_pointer if
+   offset from it is known constant, or PC if the offset is unknown.
+   Return NULL_RTX if REG isn't based on crtl->args.internal_arg_pointer.  */
+
+static rtx
+internal_arg_pointer_based_reg (rtx reg, bool scan)
+{
+  rtx insn;
+
+  if (CONSTANT_P (reg))
+    return NULL_RTX;
+
+  if (reg == crtl->args.internal_arg_pointer)
+    return const0_rtx;
+
+  if (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER)
+    return NULL_RTX;
+
+  if (GET_CODE (reg) == PLUS && CONST_INT_P (XEXP (reg, 1)))
+    {
+      rtx val = internal_arg_pointer_based_reg (XEXP (reg, 0), scan);
+      if (val == NULL_RTX || val == pc_rtx)
+        return val;
+      return plus_constant (val, INTVAL (XEXP (reg, 1)));
+    }
+
+  if (!scan)
+    insn = NULL_RTX;
+  else if (internal_arg_pointer_seq_start == NULL_RTX)
+    insn = get_insns ();
+  else
+    insn = NEXT_INSN (internal_arg_pointer_seq_start);
+  while (insn)
+    {
+      rtx set = single_set (insn);
+      if (set
+          && REG_P (SET_DEST (set))
+          && REGNO (SET_DEST (set)) >= FIRST_PSEUDO_REGISTER)
+        {
+          rtx val = NULL_RTX;
+          unsigned int idx = REGNO (SET_DEST (set)) - FIRST_PSEUDO_REGISTER;
+          /* Punt on pseudos set multiple times.  */
+          if (idx < VEC_length (rtx, internal_arg_pointer_cache)
+              && VEC_index (rtx, internal_arg_pointer_cache, idx)
+                 != NULL_RTX)
+            val = pc_rtx;
+          else
+            val = internal_arg_pointer_based_reg (SET_SRC (set), false);
+          if (val != NULL_RTX)
+            {
+              VEC_safe_grow_cleared (rtx, heap, internal_arg_pointer_cache,
+                                     idx + 1);
+              VEC_replace (rtx, internal_arg_pointer_cache, idx, val);
+            }
+        }
+      if (NEXT_INSN (insn) == NULL_RTX)
+        internal_arg_pointer_seq_start = insn;
+      insn = NEXT_INSN (insn);
+    }
+
+  if (REG_P (reg))
+    {
+      unsigned int idx = REGNO (reg) - FIRST_PSEUDO_REGISTER;
+      if (idx < VEC_length (rtx, internal_arg_pointer_cache))
+        return VEC_index (rtx, internal_arg_pointer_cache, idx);
+      else
+        return NULL_RTX;
+    }
+
+  if (for_each_rtx (reg, internal_arg_pointer_based_reg_1,
Re: Go patch committed: New lock/note implementation
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

This broke bootstrap on Linux/x86_64 (CentOS 5.5), which lacks O_CLOEXEC.

... and also Solaris 8 and 9 bootstrap which lack sem_timedwait:

/vol/gcc/src/hg/trunk/local/libgo/runtime/thread-sema.c: In function 'runtime_semasleep':
/vol/gcc/src/hg/trunk/local/libgo/runtime/thread-sema.c:42:7: error: implicit declaration of function 'sem_timedwait' [-Werror=implicit-function-declaration]

        Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH, testsuite]: Enable sync_long_long on 32bit x86 and alpha
Hello!

Attached patch enables sync_long_long tests on 32bit x86 and alpha. Enabling the tests for alpha is obvious (it is 64bit-by-default target, after all), but 32bit x86 needs at least -march=pentium passed via dg-options. My previous patch checks bit_CMPXCHG8B cpuid bit before compiling these tests, so passing -march=pentium is safe.

2011-11-25  Uros Bizjak  ubiz...@gmail.com

        PR testsuite/51258
        * lib/target-supports.exp (check_effective_target_sync_long_long):
        Also supported on 32bit x86 targets. Add comment about required
        dg-options. Add alpha*-*-* targets.
        (check_effective_target_sync_long_long_runtime): Ditto.
        * gcc.dg/atomic-op-4.c (dg-options): Add -march=pentium for
        32bit x86 targets.
        * gcc.dg/atomic-compare-exchange-4.c: Ditto.
        * gcc.dg/atomic-exchange-4.c: Ditto.
        * gcc.dg/atomic-load-4.c: Ditto.
        * gcc.dg/atomic-store-4.c: Ditto.
        * gcc.dg/di-longlong64-sync-1.c: Ditto.
        * gcc.dg/di-sync-multithread.c: Ditto.
        * gcc.dg/simulate-thread/atomic-load-longlong.c: Ditto.
        * gcc.dg/simulate-thread/atomic-other-longlong.c: Ditto.

Patch was tested on 32bit x86 build and alphaev68-pc-linux-gnu. Committed to mainline SVN.

However, the patch uncovers certain problems with existing fild/fistpl implementation of atomic load/store. It fails in several of thread simulation tests, i.e.

FAIL: gcc.dg/simulate-thread/atomic-load-longlong.c -O0 -g thread simulation test

with:

1: x/i $pc
=> 0x8048582 <simulate_thread_main+61>: fild -0x8(%ebp)
0x08048585 104 __atomic_store_n (&result, ret, __ATOMIC_SEQ_CST);
1: x/i $pc
=> 0x8048585 <simulate_thread_main+64>: fistp 0x8049ac0
0x0804858b 104 __atomic_store_n (&result, ret, __ATOMIC_SEQ_CST);
1: x/i $pc
=> 0x804858b <simulate_thread_main+70>: lock orl $0x0,(%esp)
FAIL: Invalid result returned from fetch

I didn't check SSE, but it looks that fild/fistpl combo isn't atomic or does not obey lock barriers.

Uros.

Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp     (revision 181727)
+++ lib/target-supports.exp     (working copy)
@@ -3644,11 +3644,14 @@
 }
 
 # Return 1 if the target supports atomic operations on long long.
+#
+# Note: 32bit x86 targets require -march=pentium in dg-options.
 
 proc check_effective_target_sync_long_long { } {
-    if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
-         && ![is-effective-target ia32]
-         || [istarget arm*-*-*] } {
+    if { [istarget x86_64-*-*]
+         || [istarget i?86-*-*])
+         || [istarget arm*-*-*]
+         || [istarget alpha*-*-*] } {
         return 1
     } else {
         return 0
@@ -3657,10 +3660,12 @@
 
 # Return 1 if the target supports atomic operations on long long
 # and can execute them.
+#
+# Note: 32bit x86 targets require -march=pentium in dg-options.
 
 proc check_effective_target_sync_long_long_runtime { } {
-    if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
-         && ![is-effective-target ia32] } {
+    if { [istarget x86_64-*-*]
+         || [istarget i?86-*-*] } {
         return [check_cached_effective_target sync_long_long_available {
             check_runtime_nocache sync_long_long_available {
                 #include "cpuid.h"
@@ -3689,6 +3694,8 @@
               exit (0);
             }
         } ]
+    } elseif { [istarget alpha*-*-*] } {
+        return 1
     } else {
         return 0
     }
Index: gcc.dg/atomic-compare-exchange-4.c
===================================================================
--- gcc.dg/atomic-compare-exchange-4.c  (revision 181727)
+++ gcc.dg/atomic-compare-exchange-4.c  (working copy)
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options "" } */
+/* { dg-options "-march=pentium" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
 
 /* Test the execution of __atomic_compare_exchange_n builtin for a long_long.  */
Index: gcc.dg/di-longlong64-sync-1.c
===================================================================
--- gcc.dg/di-longlong64-sync-1.c       (revision 181727)
+++ gcc.dg/di-longlong64-sync-1.c       (working copy)
@@ -1,6 +1,8 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options "-std=gnu99" } */
+/* { dg-additional-options "-march=pentium" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
+
 /* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
 /* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
Index: gcc.dg/atomic-load-4.c
===================================================================
--- gcc.dg/atomic-load-4.c      (revision 181727)
+++ gcc.dg/atomic-load-4.c      (working copy)
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target sync_long_long_runtime } */
 /* { dg-options
Re: [PATCH, testsuite]: Enable sync_long_long on 32bit x86 and alpha
On Fri, Nov 25, 2011 at 8:31 PM, Uros Bizjak ubiz...@gmail.com wrote: I didn't check SSE, but it looks that fild/fistpl combo isn't atomic or does not obey lock barriers. Adding -msse to failing test works OK. Uros.
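For context, the property at stake is that a 64-bit load must never observe half of an old value and half of a new one. The sketch below is not the testsuite code; the names `publish`, `observe` and `is_untorn` are hypothetical. It uses the GCC `__atomic_store_n` and `__atomic_load_n` builtins on a `long long`, and assumes a writer that only ever stores 0 or -1, so any other observed value would indicate a torn access.

```c
/* Shared 64-bit value; a hypothetical writer alternates it between
   0 (all bits clear) and -1 (all bits set).  */
static long long shared_value;

/* Store V with sequentially consistent ordering.  */
void
publish (long long v)
{
  __atomic_store_n (&shared_value, v, __ATOMIC_SEQ_CST);
}

/* Load the current value with sequentially consistent ordering.  */
long long
observe (void)
{
  return __atomic_load_n (&shared_value, __ATOMIC_SEQ_CST);
}

/* Return 1 if V is one of the two values the writer stores.  A value
   such as 0x00000000FFFFFFFF would mean the two 32-bit halves came
   from different stores.  */
int
is_untorn (long long v)
{
  return v == 0LL || v == -1LL;
}
```

On 32bit x86 this may need `-march=pentium` or later (as noted in the patch) for the builtins to be expanded inline; whether a given expansion is actually atomic is exactly what the failing thread simulation tests are probing.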
[PATCH] Improve EXPAND_SUM handling in expand_expr_addr_expr* (PR middle-end/50074)
Hi!

While looking at this PR, I was first surprised that on i?86 we got pseudo = argp + 4 and mem_overlap* was called with that pseudo + 4 etc. I don't see why we should force the address into register for EXPAND_SUM modifier, with this mem_overlap* sees argp + 8 etc. directly (on i?86, of course on ia64 it still sees a register and thus the other patch I've posted is needed).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-11-25  Jakub Jelinek  ja...@redhat.com

        PR middle-end/50074
        * expr.c (expand_expr_addr_expr_1): Don't call force_operand for
        EXPAND_SUM modifier.

--- gcc/expr.c.jj       2011-11-21 16:22:02.0 +0100
+++ gcc/expr.c  2011-11-25 12:46:40.070831662 +0100
@@ -7452,7 +7452,8 @@ expand_expr_addr_expr_1 (tree exp, rtx t
     }
 
   if (modifier != EXPAND_INITIALIZER
-      && modifier != EXPAND_CONST_ADDRESS)
+      && modifier != EXPAND_CONST_ADDRESS
+      && modifier != EXPAND_SUM)
     result = force_operand (result, target);
   return result;
 }

        Jakub
Re: [Patch, fortran, RFC] PR 40958 Reduce size of module files
On Friday 25 November 2011 11:10:01 Janne Blomqvist wrote:

Based on a brief inspection of the code, most if not all of these seeks are for a very short distance (typically peek a few bytes ahead in the stream, then seek back)

I'm afraid they aren't. The moves are as follows (-: sequential, x: seek)

-- beginning of file
- skip operator interfaces
- skip user operators
- skip commons, equivalences, and derived type extensions
- register the offset of each symbol node and skip it (this is usually the biggest part of the module)
- read the symtree list and mark needed the associated symbols (if they are wanted)
-- end of file
x go back to operator interfaces and load them
- load user operators
- load commons
- load equivalences
xxx now the required symbols are known, so for each one of them seek to its offset and load it. This requires a lot of seeks, and if the number of symbols, components etc is high in the module, they are not necessarily short distance
x load derived type extensions

We'll see the results from Salvatore, but I'm not very optimistic.

Mikael
Re: Memset/memcpy patch
On Wed, Nov 23, 2011 at 3:32 PM, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote:

I found and fixed another problem in the latest memcpy/memset changes - with this fix all the failing tests mentioned in #51134 started passing. Bootstraps are also ok. Though I still see fails in 32-bit make check, so probably, it'd be better to revert the changes till these fails are fixed.

I will revert it for now.

OK. I guess I can break out the simple fixes and commit them for 4.7 and we could revisit this for next stage1. Probably not by adding all the features together, but extending prologues/epilogues first and adding SSE loops with the new alignment logic next.

Honza

--
H.J.
RFA: Fix PR middle-end/50074
On load-store architectures, the function address is generally loaded into a register before any outgoing arguments are stored in the stack frame (if any). Thus, generally allowing memory loads before any arguments of the sibcall have been stored in the stack frame is effective to make the sibcall-6.c test work again. This has been confirmed for Epiphany, x86_64-apple-darwin10 and s390x.

Bootstrapped and regtested on i686-pc-linux-gnu.

2011-11-19  Joern Rennecke  joern.renne...@embecosm.com

        PR middle-end/50074
        * calls.c (mem_overlaps_already_clobbered_arg_p):
        Return false if no outgoing arguments have been stored so far.

Index: calls.c
===================================================================
--- calls.c     (revision 2195)
+++ calls.c     (working copy)
@@ -1668,6 +1668,8 @@ mem_overlaps_already_clobbered_arg_p (rt
 {
   HOST_WIDE_INT i;
 
+  if (sbitmap_empty_p (stored_args_map))
+    return false;
   if (addr == crtl->args.internal_arg_pointer)
     i = 0;
   else if (GET_CODE (addr) == PLUS
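The effect of the early return can be sketched with a toy model. Everything below is hypothetical: a byte array stands in for `stored_args_map`, and the `offset_known` parameter models, as I read the thread, the case where the address arrives in a register of unknown origin and the check has to answer conservatively.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for stored_args_map: one flag per byte of the
   outgoing argument area, nonzero once that byte has been stored.  */
static bool
any_arg_stored (const unsigned char *stored, size_t len)
{
  for (size_t k = 0; k < len; k++)
    if (stored[k])
      return true;
  return false;
}

/* Return true if a SIZE-byte load at OFFSET may read an argument slot
   that has already been overwritten for the sibcall.  */
bool
load_overlaps_stored_arg (const unsigned char *stored, size_t len,
                          bool offset_known, size_t offset, size_t size)
{
  /* Nothing stored yet: no load can overlap, even at an unknown address.  */
  if (!any_arg_stored (stored, len))
    return false;
  /* Unknown address: give up and answer conservatively.  */
  if (!offset_known)
    return true;
  for (size_t k = offset; k < offset + size && k < len; k++)
    if (stored[k])
      return true;
  return false;
}
```

Without the first check, a load of the function address through a register would always hit the conservative branch, even though no argument slot has been written yet.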