Re: [PATCH, x86] Use vector moves in memmove expanding
On Fri, Apr 12, 2013 at 01:08:15PM +0400, Michael Zolotukhin wrote:

I did some profiling of the builtin implementation; download it from http://kam.mff.cuni.cz/~ondra/memcpy_profile_builtin.tar.bz2

Nice data, thanks! Could you please describe what memcpy_new_builtin is here? Is it how GCC expanded memcpy with this patch? Is this a comparison between the libcall, the libcall with your version of glibc, and memmove expanded with the implementation from this patch?

I try to make the benchmarks self-contained, so now I measure the libcall, the libcall with my version, and the current builtin expansion. I updated my benchmark; one of the problems of measuring memcpy is that most memory operations happen asynchronously, so this version should capture that. (The padding should now be sufficient, but I have not yet subtracted it from the time.) memcpy_gcc_builtin there now measures the builtin for the first 100 sizes, then switches to my implementation. I added memcpy_new_builtin, which is currently the same as memcpy_gcc_builtin. To add your implementation, compile the variant/builtin.c file into variant/builtin.s, then run ./benchmark.

Ondra

Michael

On 12 April 2013 12:54, Ondřej Bílka nel...@seznam.cz wrote:
On Thu, Apr 11, 2013 at 04:32:30PM +0400, Michael Zolotukhin wrote:

128 is about the upper bound you can expand with SSE moves. The tuning did not take code size into account and measured only when the code is in a tight loop. For GPR moves the limit is around 64.

Thanks for the data - I've not performed measurements with this implementation yet, but we surely should adjust thresholds to avoid performance degradations on small sizes.

I did some profiling of the builtin implementation; download it from http://kam.mff.cuni.cz/~ondra/memcpy_profile_builtin.tar.bz2 and see the files results_rand/result.html and results_rand_noicache/result.html. memcpy_new_builtin calls the builtin for sizes x0, x1, ..., x5 and the new implementation otherwise. I did the same for memcpy_glibc to see the variance. memcpy_new does not call the builtin. To regenerate the graphs on another architecture, run the benchmarks script. To use another builtin, change how the variant/builtin.c file is compiled in the Makefile. Builtins also gain from the function call being inlined; I did not account for that, as I do not have an estimate of this cost.

Michael

On 10 April 2013 22:53, Ondřej Bílka nel...@seznam.cz wrote:
On Wed, Apr 10, 2013 at 09:53:09PM +0400, Michael Zolotukhin wrote:

Hi, I am writing memcpy for libc. It avoids computed jumps and is much faster on small strings (variant for Sandy Bridge attached).

I'm not sure I get what you meant - could you please explain what computed jumps are?

Computed goto. See Duff's device; it works almost exactly the same way.

You must also check performance with a cold instruction cache. Currently memcpy(x,y,128) takes 126 bytes, which is too much. Do not align for small sizes. The dependency caused by this erases any gains that you might get. Keep in mind that in 55% of cases the data are already aligned.

Other algorithms are still available and we can use them for small sizes. E.g. for sizes < 128 we could emit a loop with GPR moves and not use vector instructions in it.

128 is about the upper bound you can expand with SSE moves. The tuning did not take code size into account and measured only when the code is in a tight loop. For GPR moves the limit is around 64. What matters is which code has the best performance/size ratio.

But that's tuning and I haven't worked on it yet - I'm going to measure the performance of all algorithms on all sizes and thus define on which sizes which algorithm is preferable. What I did in this patch is introduce some infrastructure to allow emitting vector moves in movmem expanding - tuning is certainly possible and needed, but that's out of the scope of the patch.

On 10 April 2013 21:43, Ondřej Bílka nel...@seznam.cz wrote:
On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:

Hi, This patch adds a new algorithm for expanding movmem on x86 and slightly refactors the existing implementation. This is a reincarnation of a patch that was sent but not checked in a couple of years ago - now I reworked it from scratch and divided it into several more manageable parts.

Hi, I am writing memcpy for libc. It avoids computed jumps and is much faster on small strings (variant for Sandy Bridge attached).

For now this algorithm isn't used, because the cost models are tuned to use the existing ones. I believe the new algorithm will give better performance, but I'll leave cost-model tuning for a separate patch.

You must also check performance with a cold instruction cache. Currently memcpy(x,y,128) takes 126 bytes, which is too much.

Also, I changed get_mem_align_offset to make it handle MEM_REFs as well. Probably there is another way of getting info about alignment - if so, please let me know.

Do not align for small sizes. Dependency caused by this
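For readers unfamiliar with the term, the sketch below shows a classic Duff's device, where the `switch` jumps into the middle of an unrolled loop (the "computed jump" discussed above, which compilers usually lower to an indirect jump through a table), next to one common jump-free alternative for small sizes: two possibly overlapping 8-byte moves. Both function names are hypothetical, and neither is claimed to be what Ondřej's Sandy Bridge variant or the movmem expansion in this patch actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Duff's device: an 8-way unrolled byte copy whose switch jumps into the
// middle of the loop body to handle count % 8 first. The switch is the
// "computed jump": compilers usually lower it to an indirect jump.
void duff_copy(unsigned char* to, const unsigned char* from, std::size_t count) {
  if (count == 0) return;
  std::size_t passes = (count + 7) / 8;  // number of trips through the loop
  switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
  }
}

// Jump-free copy for 8 <= n <= 16: load the first and the last 8 bytes,
// then store both. The two ranges overlap when n < 16, which is harmless
// because both loads happen before either store.
void copy_8_to_16(unsigned char* dst, const unsigned char* src, std::size_t n) {
  assert(n >= 8 && n <= 16);
  std::uint64_t head, tail;
  std::memcpy(&head, src, 8);
  std::memcpy(&tail, src + n - 8, 8);
  std::memcpy(dst, &head, 8);
  std::memcpy(dst + n - 8, &tail, 8);
}
```

The fixed-size `memcpy` calls on 8-byte temporaries typically compile to single 64-bit loads and stores at -O2, so the second function is straight-line code with no table lookup, at the cost of handling only one size class.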
Re: [patch][DF] do not call df_insn_delete in remove_insn, only unlink the insn
On 13/04/2013 02:02, Steven Bosscher wrote:

	* emit-rtl.c (remove_insn): Do not call df_insn_delete here.
	* cfgrtl.c (delete_insn): Call it here instead.
	* lra-spills.c (lra_final_code_change): Use delete_insn.
	* haifa-sched.c (sched_remove_insn): Likewise.
	* sel-sched-ir.c (return_nop_to_pool): Clear INSN_DELETED_P for nops
	returning to the nop pool.
	(sel_remove_insn): Simplify the only_disconnect case via remove_insn,
	use delete_insn for definitive removal.  Clear BLOCK_FOR_INSN.

Ok, thanks!

Paolo
[Patch, Fortran, OOP] PR 55959: ICE in in gfc_simplify_expr, at fortran/expr.c:1920
Hi all,

here is another trivial ICE-on-invalid fix that I will commit later today as obvious (regtested on x86_64-unknown-linux-gnu).

Cheers,
Janus

2013-04-13  Janus Weil  ja...@gcc.gnu.org

	PR fortran/55959
	* expr.c (gfc_simplify_expr): Branch is not unreachable.

2013-04-13  Janus Weil  ja...@gcc.gnu.org

	PR fortran/55959
	* gfortran.dg/typebound_proc_29.f03: New.

Attachments: pr55959.diff, typebound_proc_29.f90
Re: [patch, fortran, backport, 4.8] PR51825 - Fortran runtime error: Cannot match namelist object name
Hi Tilo,

I would like to backport the fix for PR51825 I posted here http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00316.html to the 4.8 branch.

well, the usual gfortran policy is to backport only regression fixes. In exceptional cases, non-regression fixes can also be backported (and have been in the past), if the bug is extremely severe and/or the fix is extremely simple (IMHO the most severe type of bug is a wrong-code issue, where the user does not see any kind of error message, but just 'silently' gets wrong results). None of these conditions is completely fulfilled, if you ask me (although the patch is indeed relatively simple). Therefore I don't directly see an urgent need for backporting, unless you can convince us why backporting this is extremely important.

Anyway, I don't feel sufficiently familiar with the namelist I/O sector to ok the patch, but maybe someone else can share his opinion ...

Cheers,
Janus
Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec
Hello,

On 12/04/2013 20:38, Janus Weil wrote:
Unless someone has a better idea how to treat this, I will commit the attached patch as obvious.

Not really a better idea, but it seems to me that function calls can have trailing sub-references, so that gfc_match_varspec could be called on them. gfc_match_rvalue has:

  [...]
  switch (sym->attr.flavor)
    {
    [...]
    case FL_UNKNOWN:
      [... try to match a variable ...]
      /* Give up, assume we have a function.  */
      [...]
      e->expr_type = EXPR_FUNCTION;
      [...]
      gfc_match_actual_arglist (...);
      [...]
      /* If our new function returns a character, array or structure
         type, it might have subsequent references.  */
      m = gfc_match_varspec (e, ...);

So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec. And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being parsed as a function. Is this supported?

Mikael
Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec
Hi Mikael,

Unless someone has a better idea how to treat this, I will commit the attached patch as obvious.

Not really a better idea, but it seems to me that function calls can have trailing sub-references, so that gfc_match_varspec could be called on them. gfc_match_rvalue has:

  [...]
  switch (sym->attr.flavor)
    {
    [...]
    case FL_UNKNOWN:
      [... try to match a variable ...]
      /* Give up, assume we have a function.  */
      [...]
      e->expr_type = EXPR_FUNCTION;
      [...]
      gfc_match_actual_arglist (...);
      [...]
      /* If our new function returns a character, array or structure
         type, it might have subsequent references.  */
      m = gfc_match_varspec (e, ...);

So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec. And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being parsed as a function. Is this supported?

I think this is forbidden by the Fortran standard, cf. e.g. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42188

Actually I'm not sure in which context a function call with sub-refs would be valid. One should re-check the standard on this ...

(Btw, I have already committed the patch as r197936.)

Cheers,
Janus
Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics
On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown jul...@codesourcery.com wrote:
On Fri, 12 Apr 2013 15:19:18 +0100
Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics to arm_neon.h through the generator ML scripts, and also adds the built-ins to which the intrinsics map. The generator ML scripts are updated and used to generate the relevant .texi documentation, arm_neon.h and the tests in gcc.target/arm/neon.

FWIW, some of the changes to neon*.ml can be simplified somewhat -- my attempt at an improved version of those bits is attached. I'm still not too happy with mode_suffix, but these new instructions require adding semantics to parts of the generator program which weren't really very well-defined to start with :-). I appreciate that it's a bit of a tangle...

I thought of an improvement to the mode_suffix part from the last version of the patch, so here it is. I'm done fiddling with this now, so back to you!

Cheers,

Julian

Index: neon-gen.ml
===================================================================
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@ let rec signed_ctype = function
   | T_uint16 | T_int16 -> T_intHI
   | T_uint32 | T_int32 -> T_intSI
   | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
@@ -275,8 +276,8 @@ let rec mode_suffix elttype shape =
     let mode = mode_of_elt elttype shape in
     string_of_mode mode
   with MixedMode (dst, src) ->
-    let dstmode = mode_of_elt dst shape
-    and srcmode = mode_of_elt src shape in
+    let dstmode = mode_of_elt ~argpos:0 dst shape
+    and srcmode = mode_of_elt ~argpos:1 src shape in
     string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +292,24 @@ let print_feature_test_start features =
     match List.find (fun feature ->
                        match feature with
                          Requires_feature _ -> true
                        | Requires_arch _ -> true
+                       | Requires_FP_bit _ -> true
                        | _ -> false)
                      features with
-      Requires_feature feature ->
+    Requires_feature feature ->
         Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
     | Requires_arch arch ->
         Format.printf "#if __ARM_ARCH >= %d@\n" arch
+    | Requires_FP_bit bit ->
+        Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+          (1 lsl bit)
     | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-    List.exists (function Requires_feature x -> true
-                          | Requires_arch x -> true
+    List.exists (function Requires_feature _ -> true
+                          | Requires_arch _ -> true
+                          | Requires_FP_bit _ -> true
                           | _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +371,7 @@ let deftypes () =
     "__builtin_neon_hi", "int", 16, 4;
     "__builtin_neon_si", "int", 32, 2;
     "__builtin_neon_di", "int", 64, 1;
+    "__builtin_neon_hf", "float", 16, 4;
     "__builtin_neon_sf", "float", 32, 2;
     "__builtin_neon_poly8", "poly", 8, 8;
     "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===================================================================
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@ http://www.gnu.org/licenses/.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
           | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
           | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@ type vectype = T_int8x8    | T_int8x16
              | T_uint16x4  | T_uint16x8
              | T_uint32x2  | T_uint32x4
              | T_uint64x1  | T_uint64x2
+             | T_float16x4
              | T_float32x2 | T_float32x4
              | T_poly8x8   | T_poly8x16
              | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@ type vectype = T_int8x8    | T_int8x16
              | T_uint8     | T_uint16
              | T_uint32    | T_uint64
              | T_poly8     | T_poly16
-             | T_float32   | T_arrayof of int * vectype
+             | T_float16   | T_float32
+             | T_arrayof of int * vectype
              | T_ptrto of vectype | T_const of vectype
              | T_void      | T_intQI
              | T_intHI     | T_intSI
-             | T_intDI     | T_floatSF
+             | T_intDI     | T_floatHF
+             | T_floatSF
 
 (* The meanings of the following are:
      TImode : Tetra, two registers (four words).
@@ -93,7 +96,7 @@ type arity = Arity0 of vectype
            | Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V2SI | V2SF | DI
-
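The new `Requires_FP_bit` case in print_feature_test_start emits a preprocessor guard that tests a single bit of `__ARM_FP`, with the mask computed as `1 lsl bit`. A minimal C++ sketch of the same string construction follows; `fp_bit_guard` is a hypothetical helper, not part of the generator. The example uses bit 1, which the ARM C Language Extensions assign to half-precision support in `__ARM_FP`; which bit the patch actually passes for the vcvt intrinsics is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the guard emitted for Requires_FP_bit: the mask is 1 << bit,
// printed in upper-case hex as the OCaml format "0x%X" does.
std::string fp_bit_guard(unsigned bit) {
  assert(bit < 32);  // keep the shift well-defined
  char buf[48];
  std::snprintf(buf, sizeof buf, "#if ((__ARM_FP & 0x%X) != 0)", 1u << bit);
  return std::string(buf);
}
```

For bit 1 this yields `#if ((__ARM_FP & 0x2) != 0)`, and print_feature_test_end closes the guard with `#endif`.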
Re: [Patch, fortran] PR 56919 SYSTEM_CLOCK on Windows
On Fri, Apr 12, 2013 at 11:49 PM, Dave Korn dave.korn.cyg...@gmail.com wrote:
On 12/04/2013 19:47, Janne Blomqvist wrote:

As I don't have a Windows system to test on, I would appreciate it if somebody more familiar with that platform could take a quick look. In particular, I *think* it should be Ok to use Win32 API functions on Cygwin (that is, cygwin-gcc ships windows.h and the other necessary headers out of the box?),

Well, after installing the w32api package, but basically yes, that's fine for simple stuff like that. (You shouldn't go doing things like creating threads or synchronisation through the Win32 API, but calling GetTickCount[64] will be fine.)

Ok, thanks.

and that _WIN32 is the correct macro to use to select code which is common to MinGW and Cygwin.

Alas no:

$ gcc-4 -E - < /dev/null -dM | grep WIN
#define __WINT_MAX__ 4294967295U
#define __WINT_MIN__ 0U
#define __SIZEOF_WINT_T__ 4
#define __CYGWIN__ 1
#define __WINT_TYPE__ unsigned int
#define __CYGWIN32__ 1

You should probably use #if defined(__MINGW32__) || defined(__CYGWIN__), since that'll also work on 64-bit Cygwin, as opposed to using __CYGWIN32__. I think __MINGW32__ is defined for 64-bit as well as 32-bit targets.

Ok, I'll do that. Thanks for the info. FWIW, I grepped through the gcc tree and there's quite a lot of

#if defined(_WIN32) && !defined(__CYGWIN__)

and similar, which in the light of the above is pointless. And yes, I also recall that mingw-w64 also defines __MINGW32__.

-- 
Janne Blomqvist
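Following Dave's advice, code shared by MinGW and Cygwin can be selected as in the minimal sketch below; the function name is hypothetical. Testing `_WIN32` alone would miss Cygwin, whose GCC does not predefine it (see the `-dM` output above), while `__MINGW32__` also covers mingw-w64.

```cpp
// True when compiling for MinGW (32- or 64-bit; mingw-w64 also defines
// __MINGW32__) or Cygwin (32- or 64-bit). Deliberately not based on _WIN32,
// which Cygwin's GCC does not predefine.
constexpr bool targets_windows_api() {
#if defined(__MINGW32__) || defined(__CYGWIN__)
  return true;
#else
  return false;
#endif
}
```

Because the function is `constexpr`, the result can also drive `if` statements or `static_assert`s without scattering further `#if` blocks through the code.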
Re: [Patch, fortran] PR 56919 SYSTEM_CLOCK on Windows
On Sat, Apr 13, 2013 at 1:02 AM, Tobias Burnus bur...@net-b.de wrote:
Janne Blomqvist wrote:

the attached patch implements the SYSTEM_CLOCK intrinsic on the MinGW and Cygwin targets using the GetTickCount/GetTickCount64 functions. These should be quite robust monotonic clocks and AFAICS are the best we can do on Windows.

I think using QueryPerformanceCounter is the better approach. It is supported since Windows 2000 and recommended as a high-performance counter: http://msdn.microsoft.com/en-us/library/windows/desktop/ms644900%28v=vs.85%29.aspx

I didn't want to use QPC, as it can apparently be unreliable; see e.g. PEP 418, which I linked to previously. But it seems that the worst issues are caused by old and somewhat rare hardware, or have been fixed in more recent Windows service packs, so maybe it's not worth worrying about.

I really dislike GetTickCount, which overflows after about 50 days - that's not what you want to have. And GetTickCount64 only exists since Vista/2008. By contrast, QueryPerformanceCounter should allow for finer resolution and it is already available since Windows 2000.

Attached is an updated patch which uses GetTickCount for system_clock_4; this should be fine, as system_clock_4 wraps around in ~25 days anyway. For system_clock_8 it uses QueryPerformance{Counter,Frequency}.

Regarding clock_gettime: I really think we should check _POSIX_MONOTONIC_CLOCK as well. Currently, only CLOCK_MONOTONIC is checked, which is always available (on POSIX-conforming systems). See GLIBCXX_ENABLE_LIBSTDCXX_TIME in libstdc++-v3/acinclude.m4 - and in particular ac_has_clock_monotonic.

The patch also adds an additional check for _POSIX_MONOTONIC_CLOCK.

Ok for trunk?

Frontend ChangeLog:

2013-04-13  Janne Blomqvist  j...@gcc.gnu.org

	PR fortran/56919
	* intrinsics.texi (SYSTEM_CLOCK): Update documentation.

libgfortran ChangeLog:

2013-04-13  Janne Blomqvist  j...@gcc.gnu.org

	PR fortran/56919
	* intrinsics/time_1.h: Check __CYGWIN__ in addition to __MINGW32__.
	* intrinsics/system_clock.c (GF_CLOCK_MONOTONIC): Check
	_POSIX_MONOTONIC_CLOCK as well.
	(system_clock_4): Use GetTickCount on Windows.
	(system_clock_8): Use QueryPerformanceCounter and
	QueryPerformanceFrequency on Windows.

-- 
Janne Blomqvist

Attachment: sysclockwin.2.diff
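The clock-source selection discussed in this thread can be sketched as follows: QueryPerformanceCounter/QueryPerformanceFrequency on MinGW and Cygwin, and `clock_gettime(CLOCK_MONOTONIC)` elsewhere only when `_POSIX_MONOTONIC_CLOCK` says the clock may exist. This is an illustration under those assumptions, not the code in sysclockwin.2.diff; `monotonic_ns` and `wrap_days` are hypothetical names. Per POSIX, a positive `_POSIX_MONOTONIC_CLOCK` means the clock is always supported, 0 means support must be checked at run time, and -1 means it is unsupported; the sketch simply treats a failing `clock_gettime` call as "no monotonic clock".

```cpp
#include <cassert>
#include <cstdint>

#if defined(__MINGW32__) || defined(__CYGWIN__)
#include <windows.h>
#else
#include <time.h>
#include <unistd.h>  // defines _POSIX_MONOTONIC_CLOCK where available
#endif

// Nanoseconds since an unspecified monotonic origin, or -1 if no monotonic
// clock is available.
std::int64_t monotonic_ns() {
#if defined(__MINGW32__) || defined(__CYGWIN__)
  LARGE_INTEGER freq, count;
  if (!QueryPerformanceFrequency(&freq) || !QueryPerformanceCounter(&count))
    return -1;
  assert(freq.QuadPart > 0);
  // Split into whole seconds and remainder so count * 1e9 cannot overflow.
  std::int64_t secs = count.QuadPart / freq.QuadPart;
  std::int64_t rem = count.QuadPart % freq.QuadPart;
  return secs * 1000000000LL + rem * 1000000000LL / freq.QuadPart;
#elif defined(_POSIX_MONOTONIC_CLOCK) && _POSIX_MONOTONIC_CLOCK >= 0
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1;
  return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#else
  return -1;
#endif
}

// Days until a millisecond counter with `bits` usable bits wraps around.
// 32 bits gives GetTickCount's ~49.7 days; 31 bits (a signed 32-bit count,
// assuming a millisecond count rate) gives the ~25 days quoted for
// system_clock_4.
constexpr double wrap_days(unsigned bits) {
  return static_cast<double>(1ULL << bits) / (1000.0 * 60 * 60 * 24);
}
```

Splitting the QPC value into seconds and a remainder is one way to avoid overflowing a 64-bit product when the counter has been running for a long time; the remainder is always smaller than the frequency, so its product with 10^9 stays small.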
Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec
On 13/04/2013 16:02, Janus Weil wrote:
Hi Mikael,

So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec. And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being parsed as a function. Is this supported?

I think this is forbidden by the Fortran standard, cf. e.g. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42188

Actually I'm not sure in which context a function call with sub-refs would be valid. One should re-check the standard on this ...

Indeed, that's invalid:

  structure-component is data-ref
  data-ref is part-ref [ % part-ref ] ...
  part-ref is part-name [ ( section-subscript-list ) ] [ image-selector ]
  (R611) The leftmost part-name shall be the name of a data object.

I thought they were allowed for pointer-returning functions.

Mikael
[patch] committed: minor sched-int.h and sched-deps.c fixes
Hello,

In sched-deps.c:deps_analyze_insn there's no need to look for EH_REGION notes; they don't exist until just before final. This assert and some code in alpha.c (PR56858) are the only remaining meaningful references to them outside final.c.

In sched-int.h, the header is only non-empty if INSN_SCHEDULING is defined. After #ifdef INSN_SCHEDULING, the first header included is insn-attr.h - which defines (or not) INSN_SCHEDULING. So move that include outside the #ifdef INSN_SCHEDULING guard.

Bootstrapped and tested on several targets over the past three weeks. Committed.

Ciao!
Steven

	* sched-deps.c (deps_analyze_insn): Do not check for EH_REGION insn
	notes, they are emitted only just before final.
	* sched-int.h: Include insn-attr.h before checking INSN_SCHEDULING.

Index: sched-deps.c
===================================================================
--- sched-deps.c	(revision 197944)
+++ sched-deps.c	(working copy)
@@ -3680,12 +3680,6 @@ deps_analyze_insn (struct deps_desc *deps, rtx ins
   if (sched_deps_info->use_cselib)
     cselib_process_insn (insn);
 
-  /* EH_REGION insn notes can not appear until well after we complete
-     scheduling.  */
-  if (NOTE_P (insn))
-    gcc_assert (NOTE_KIND (insn) != NOTE_INSN_EH_REGION_BEG
-		&& NOTE_KIND (insn) != NOTE_INSN_EH_REGION_END);
-
   if (sched_deps_info->finish_insn)
     sched_deps_info->finish_insn ();
 
Index: sched-int.h
===================================================================
--- sched-int.h	(revision 197944)
+++ sched-int.h	(working copy)
@@ -21,10 +21,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_SCHED_INT_H
 #define GCC_SCHED_INT_H
 
+#include "insn-attr.h"
+
 #ifdef INSN_SCHEDULING
 
-/* For state_t.  */
-#include "insn-attr.h"
 #include "df.h"
 #include "basic-block.h"
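The sched-int.h problem is a general preprocessor pitfall: an `#ifdef` is evaluated where it appears, so a guard macro that is only defined by a header included inside the guarded region cannot enable that region. The single-file sketch below uses a hypothetical `FEATURE_FLAG` macro in place of INSN_SCHEDULING and a plain `#define` in place of including insn-attr.h.

```cpp
// Before the "header" has been seen, the guard is undefined, so this
// branch is skipped. This is what happened to the body of sched-int.h
// unless some earlier include had already pulled in insn-attr.h.
#ifdef FEATURE_FLAG
constexpr bool guarded_code_before_header = true;
#else
constexpr bool guarded_code_before_header = false;
#endif

// Stand-in for #include "insn-attr.h", which may define the flag.
#define FEATURE_FLAG 1

// Once the definition is visible, the same test succeeds; this mirrors the
// fix of including insn-attr.h before checking INSN_SCHEDULING.
#ifdef FEATURE_FLAG
constexpr bool guarded_code_after_header = true;
#else
constexpr bool guarded_code_after_header = false;
#endif
```

No diagnostic is produced in the first case; the guarded declarations simply vanish, which is why this kind of ordering bug can go unnoticed.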
[PATCH] Enable java for aarch64
This enables building java for aarch64. Most of the configuration bits were copied from arm.

		=== libjava Summary ===

# of expected passes		2533
# of unexpected failures	29
# of untested testcases		25

Andreas.

	* configure.ac (aarch64-*-*): Don't disable java.
	* configure: Regenerate.

libjava/:
	* configure.host: Add support for aarch64.
	* sysdep/aarch64/locks.h: New file.

libjava/classpath/:
	* native/fdlibm/ieeefp.h: Add support for aarch64.
---
 configure                                |  2 ++
 configure.ac                             |  2 ++
 libjava/classpath/native/fdlibm/ieeefp.h |  8 +
 libjava/configure.host                   |  8 -
 libjava/sysdep/aarch64/locks.h           | 57
 5 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 libjava/sysdep/aarch64/locks.h

diff --git a/configure b/configure
index d809535..e161cad 100755
--- a/configure
+++ b/configure
@@ -3272,6 +3272,8 @@ esac
 
 # Disable Java if libffi is not supported.
 case ${target} in
+  aarch64-*-*)
+    ;;
   alpha*-*-*)
     ;;
   arm*-*-*)
diff --git a/configure.ac b/configure.ac
index 48ec1aa..bec489f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -611,6 +611,8 @@ esac
 
 # Disable Java if libffi is not supported.
 case ${target} in
+  aarch64-*-*)
+    ;;
   alpha*-*-*)
     ;;
   arm*-*-*)
diff --git a/libjava/classpath/native/fdlibm/ieeefp.h b/libjava/classpath/native/fdlibm/ieeefp.h
index c230bbb..7ef2ae7e 100644
--- a/libjava/classpath/native/fdlibm/ieeefp.h
+++ b/libjava/classpath/native/fdlibm/ieeefp.h
@@ -4,6 +4,14 @@
 #ifndef __IEEE_BIG_ENDIAN
 #ifndef __IEEE_LITTLE_ENDIAN
 
+#ifdef __aarch64__
+#ifdef __AARCH64EB__
+#define __IEEE_BIG_ENDIAN
+#else
+#define __IEEE_LITTLE_ENDIAN
+#endif
+#endif
+
 #ifdef __alpha__
 #define __IEEE_LITTLE_ENDIAN
 #endif
diff --git a/libjava/configure.host b/libjava/configure.host
index 0c3b41c..96f86fe 100644
--- a/libjava/configure.host
+++ b/libjava/configure.host
@@ -81,6 +81,11 @@ ATOMICSPEC=
 
 # This case statement supports per-CPU defaults.
 case ${host} in
+  aarch64*-linux*)
+	libgcj_interpreter=yes
+	sysdeps_dir=aarch64
+	ATOMICSPEC=-fuse-atomic-builtins
+	;;
   arm*-elf)
 	with_libffi_default=no
 	PROCESS=Ecos
@@ -224,7 +229,8 @@ case ${host} in
   x86_64*-linux* | \
   hppa*-linux* | \
   m68k*-linux* | \
-  sh-linux* | sh[34]*-linux*)
+  sh-linux* | sh[34]*-linux* | \
+  aarch64*-linux*)
 	can_unwind_signal=yes
 	libgcj_ld_symbolic='-Wl,-Bsymbolic'
 	if test x$slow_pthread_self = xyes \
diff --git a/libjava/sysdep/aarch64/locks.h b/libjava/sysdep/aarch64/locks.h
new file mode 100644
index 000..f91473d
--- /dev/null
+++ b/libjava/sysdep/aarch64/locks.h
@@ -0,0 +1,57 @@
+// locks.h - Thread synchronization primitives. AArch64 implementation.
+
+#ifndef __SYSDEP_LOCKS_H__
+#define __SYSDEP_LOCKS_H__
+
+typedef size_t obj_addr_t;	/* Integer type big enough for object	*/
+				/* address.				*/
+
+// Atomically replace *addr by new_val if it was initially equal to old.
+// Return true if the comparison succeeded.
+// Assumed to have acquire semantics, i.e. later memory operations
+// cannot execute before the compare_and_swap finishes.
+inline static bool
+compare_and_swap(volatile obj_addr_t *addr,
+		 obj_addr_t old,
+		 obj_addr_t new_val)
+{
+  return __sync_bool_compare_and_swap(addr, old, new_val);
+}
+
+// Set *addr to new_val with release semantics, i.e. making sure
+// that prior loads and stores complete before this
+// assignment.
+inline static void
+release_set(volatile obj_addr_t *addr, obj_addr_t new_val)
+{
+  __sync_synchronize();
+  *addr = new_val;
+}
+
+// Compare_and_swap with release semantics instead of acquire semantics.
+// On many architecture, the operation makes both guarantees, so the
+// implementation can be the same.
+inline static bool
+compare_and_swap_release(volatile obj_addr_t *addr,
+			 obj_addr_t old,
+			 obj_addr_t new_val)
+{
+  return __sync_bool_compare_and_swap(addr, old, new_val);
+}
+
+// Ensure that subsequent instructions do not execute on stale
+// data that was loaded from memory before the barrier.
+inline static void
+read_barrier()
+{
+  __sync_synchronize();
+}
+
+// Ensure that prior stores to memory are completed with respect to other
+// processors.
+inline static void
+write_barrier()
+{
+  __sync_synchronize();
+}
+#endif
-- 
1.8.2.1

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.
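As an illustration of how the locks.h primitives combine, the sketch below builds a simple spin lock from `compare_and_swap` (acquire) and `release_set` (release), whose bodies are copied from the patch. The `spin_lock` type is hypothetical and not part of libjava; the `__sync` builtins are GCC extensions (also accepted by Clang).

```cpp
#include <cassert>
#include <cstddef>

typedef std::size_t obj_addr_t;

// Same bodies as in the patch's libjava/sysdep/aarch64/locks.h.
inline static bool compare_and_swap(volatile obj_addr_t* addr,
                                    obj_addr_t old, obj_addr_t new_val) {
  return __sync_bool_compare_and_swap(addr, old, new_val);
}

inline static void release_set(volatile obj_addr_t* addr, obj_addr_t new_val) {
  __sync_synchronize();  // full barrier: earlier loads/stores complete first
  *addr = new_val;
}

// Hypothetical spin lock: word == 0 means free, 1 means held.
struct spin_lock {
  volatile obj_addr_t word = 0;

  bool try_lock() { return compare_and_swap(&word, 0, 1); }

  void lock() {
    while (!try_lock()) {
      // Busy-wait; a real implementation would back off or yield.
    }
  }

  void unlock() {
    assert(word == 1 && "unlocking a lock that is not held");
    release_set(&word, 0);
  }
};
```

GCC documents the `__sync` builtins as full barriers in most cases, which is why the patch can use the same `__sync_bool_compare_and_swap` body for both the acquire and the release variant.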
[patch] fix PR52139 correctly
Hello,

The fix for PR52139 only papered over another problem: things from a basic block header were emitted into the insn stream. When going out of cfglayout mode, these header insns will be lost. It's probably possible to construct a test case where e.g. a NOTE_INSN_DELETED_DEBUG_LABEL is lost because of this, but I haven't tried to do so.

The correct fix is to find a new home for the header and footer insns, and the most logical place is to put them in the footer of the merged block.

Bootstrapped&tested on x86_64-unknown-linux-gnu with current trunk, and with r184004 (modified patch) to make sure the test case is fixed (it doesn't fail on trunk even with r184005 reverted).

OK for trunk?

Ciao!
Steven

	* cfgrtl.c (cfg_layout_merge_blocks): Revert r184005, implement
	correct fix by moving header and footer insn to the footer of
	the merged basic block.  Clear BB_END of the merged-away block.

Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 197942)
+++ cfgrtl.c	(working copy)
@@ -4083,18 +4083,40 @@ cfg_layout_merge_blocks (basic_block a, 
   if (!optimize)
     emit_nop_for_unique_locus_between (a, b);
 
-  /* Possible line number notes should appear in between.  */
-  if (BB_HEADER (b))
-    {
-      rtx first = BB_END (a), last;
-
-      last = emit_insn_after_noloc (BB_HEADER (b), BB_END (a), a);
-      /* The above might add a BARRIER as BB_END, but as barriers
-	 aren't valid parts of a bb, remove_insn doesn't update
-	 BB_END if it is a barrier.  So adjust BB_END here.  */
-      while (BB_END (a) != first && BARRIER_P (BB_END (a)))
-	BB_END (a) = PREV_INSN (BB_END (a));
-      delete_insn_chain (NEXT_INSN (first), last, false);
+  /* Move things from b->footer after a->footer.  */
+  if (BB_FOOTER (b))
+    {
+      if (!BB_FOOTER (a))
+	BB_FOOTER (a) = BB_FOOTER (b);
+      else
+	{
+	  rtx last = BB_FOOTER (a);
+
+	  while (NEXT_INSN (last))
+	    last = NEXT_INSN (last);
+	  NEXT_INSN (last) = BB_FOOTER (b);
+	  PREV_INSN (BB_FOOTER (b)) = last;
+	}
+      BB_FOOTER (b) = NULL;
+    }
+
+  /* Move things from b->header before a->footer.
+     Note that this may include dead tablejump data, but we don't clean
+     those up until we go out of cfglayout mode.  */
+   if (BB_HEADER (b))
+     {
+      if (! BB_FOOTER (a))
+	BB_FOOTER (a) = BB_HEADER (b);
+      else
+	{
+	  rtx last = BB_HEADER (b);
+
+	  while (NEXT_INSN (last))
+	    last = NEXT_INSN (last);
+	  NEXT_INSN (last) = BB_FOOTER (a);
+	  PREV_INSN (BB_FOOTER (a)) = last;
+	  BB_FOOTER (a) = BB_HEADER (b);
+	}
       BB_HEADER (b) = NULL;
     }
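Both new blocks in cfg_layout_merge_blocks perform the same operation: walk to the last insn of one chain and splice a second chain after it by fixing NEXT_INSN and PREV_INSN. A self-contained sketch of that splice on a hypothetical `insn` struct (not GCC's rtx) follows.

```cpp
#include <cassert>

// Hypothetical stand-in for an insn chain: a NULL-terminated doubly linked list.
struct insn {
  insn* next = nullptr;
  insn* prev = nullptr;
  int id = 0;
};

// Append chain B after the last element of chain A and return the head of
// the combined chain. Either chain may be empty (nullptr).
insn* append_chain(insn* a, insn* b) {
  assert(!b || !b->prev);  // b must be the head of its chain
  if (!a) return b;
  if (!b) return a;
  insn* last = a;
  while (last->next) last = last->next;  // find the tail of chain A
  last->next = b;
  b->prev = last;
  return a;
}
```

In the patch's terms, the footer case corresponds to `BB_FOOTER (a) = append_chain (BB_FOOTER (a), BB_FOOTER (b))` and the header case to `BB_FOOTER (a) = append_chain (BB_HEADER (b), BB_FOOTER (a))`, which places b's header insns in front of a's footer.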
Fwd: Fix std::pair std::is_copy_assignable behavior
Hi,

Here is a patch already posted to the libstdc++ mailing list, which I am resending to gcc-patches following the libstdc++ maintainers' advice.

This patch proposes to fix the behavior of std::pair regarding the std::is_*_assignable metaprogramming traits. As announced, it requires a compiler patch to extend the DR 1402 resolution to all defaulted methods.

2013-04-12  François Dumont  fdum...@gcc.gnu.org

	* call.c (joust): Extend DR 1402 to all defaulted methods.

This modification is mandatory so that pair& operator=(const pair&) can be defaulted while still letting GCC consider the other operator= in some situations, like std::pair<int&, int>. This way, with std::enable_if on the template operator=, we can control when p1 = p2 is a valid expression, resulting in correct behavior of std::is_copy_assignable.

For the moment I preferred to add a dg-require-normal-mode option in the single test that fails to compile because of the compiler modification.

Does the DR 1402 resolution generalization need Standard committee validation first?

2013-04-13  François Dumont  fdum...@gcc.gnu.org

	* include/bits/stl_pair.h (operator=(const pair&)): Defaulted.
	(operator=(pair&&)): Likewise.
	(template operator=(const pair&)): Add noexcept
	qualification.  Enable if is_assignable<T&, const U&> is true for
	both parameters.
	(template operator=(pair&&)): Add noexcept
	qualification.  Enable if is_assignable<T&, U&&> is true for both
	parameters.
	* testsuite/23_containers/unordered_set/55043.cc: Add
	dg-require-normal-mode.
	* testsuite/20_util/pair/is_move_assignable.cc: New.
	* testsuite/20_util/pair/is_copy_assignable.cc: Likewise.
	* testsuite/20_util/pair/is_assignable.cc: Likewise.
	* testsuite/20_util/pair/is_nothrow_move_assignable.cc: Likewise.
	* testsuite/20_util/pair/assign_neg.cc: Likewise.
	* testsuite/20_util/pair/is_nothrow_copy_assignable.cc: Likewise.
	* testsuite/20_util/pair/assign.cc: Likewise.

François

Index: call.c
===================================================================
--- call.c	(revision 197829)
+++ call.c	(working copy)
@@ -8377,19 +8377,20 @@
       && (IS_TYPE_OR_DECL_P (cand1->fn)))
     return 1;
 
-  /* Prefer a non-deleted function over an implicitly deleted move
-     constructor or assignment operator.  This differs slightly from the
-     wording for issue 1402 (which says the move op is ignored by overload
-     resolution), but this way produces better error messages.  */
+  /* Prefer a non-deleted function over an implicitly deleted one.  This
+     differs slightly from the wording for issue 1402 because:
+     - it is extended to all defaulted functions, not only the ones with
+       move semantic
+     - it says the op is ignored by overload resolution while we are
+       only making it a worst candidate, but this way produces better error
+       messages.  */
   if (TREE_CODE (cand1->fn) == FUNCTION_DECL
       && TREE_CODE (cand2->fn) == FUNCTION_DECL
       && DECL_DELETED_FN (cand1->fn) != DECL_DELETED_FN (cand2->fn))
     {
-      if (DECL_DELETED_FN (cand1->fn) && DECL_DEFAULTED_FN (cand1->fn)
-	  && move_fn_p (cand1->fn))
+      if (DECL_DELETED_FN (cand1->fn) && DECL_DEFAULTED_FN (cand1->fn))
 	return -1;
-      if (DECL_DELETED_FN (cand2->fn) && DECL_DEFAULTED_FN (cand2->fn)
-	  && move_fn_p (cand2->fn))
+      if (DECL_DELETED_FN (cand2->fn) && DECL_DEFAULTED_FN (cand2->fn))
 	return 1;
     }
 
Index: include/bits/stl_pair.h
===================================================================
--- include/bits/stl_pair.h	(revision 197829)
+++ include/bits/stl_pair.h	(working copy)
@@ -155,26 +155,18 @@
         pair(piecewise_construct_t, tuple<_Args1...>, tuple<_Args2...>);
 
       pair&
-      operator=(const pair& __p)
-      {
-	first = __p.first;
-	second = __p.second;
-	return *this;
-      }
+      operator=(const pair&) = default;
 
       pair&
-      operator=(pair&& __p)
-      noexcept(__and_<is_nothrow_move_assignable<_T1>,
-	              is_nothrow_move_assignable<_T2>>::value)
-      {
-	first = std::forward<first_type>(__p.first);
-	second = std::forward<second_type>(__p.second);
-	return *this;
-      }
+      operator=(pair&&) = default;
 
       template<class _U1, class _U2>
-	pair&
+	typename enable_if<__and_<is_assignable<_T1&, const _U1&>,
+				  is_assignable<_T2&, const _U2&>>::value,
+			   pair&>::type
 	operator=(const pair<_U1, _U2>& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, const _U1&>,
+			is_nothrow_assignable<_T2&, const _U2&>>::value)
 	{
 	  first = __p.first;
 	  second = __p.second;
@@ -182,8 +174,12 @@
 	}
 
       template<class _U1, class _U2>
-	pair&
+	typename enable_if<__and_<is_assignable<_T1&, _U1&&>,
+				  is_assignable<_T2&, _U2&&>>::value,
+			   pair&>::type
 	operator=(pair<_U1, _U2>&& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, _U1&&>,
+			is_nothrow_assignable<_T2&, _U2&&>>::value)
 	{
 	  first = std::forward<_U1>(__p.first);
 	  second = std::forward<_U2>(__p.second);
Index:
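The library side of the change relies on SFINAE: the converting `operator=` templates only participate in overload resolution when both member-wise assignments are valid, so traits such as `std::is_copy_assignable` report what `p1 = p2` would actually do. The sketch below reproduces that pattern on a hypothetical `mini_pair` (not libstdc++'s `std::pair`, and using `&&` on the trait values instead of the internal `__and_`) and checks it with `static_assert`. It only covers cases whose outcome does not depend on the proposed `joust` change.

```cpp
#include <type_traits>

template <class T1, class T2>
struct mini_pair {
  T1 first;
  T2 second;

  // Defaulted, so they are defined as deleted when a member (e.g. a const
  // member) cannot be assigned, which is what is_copy_assignable observes.
  mini_pair& operator=(const mini_pair&) = default;
  mini_pair& operator=(mini_pair&&) = default;

  // Converting assignment, removed from overload resolution unless both
  // member assignments are well-formed.
  template <class U1, class U2>
  typename std::enable_if<std::is_assignable<T1&, const U1&>::value &&
                              std::is_assignable<T2&, const U2&>::value,
                          mini_pair&>::type
  operator=(const mini_pair<U1, U2>& p) noexcept(
      std::is_nothrow_assignable<T1&, const U1&>::value &&
      std::is_nothrow_assignable<T2&, const U2&>::value) {
    first = p.first;
    second = p.second;
    return *this;
  }
};
```

A runtime use such as `mini_pair<long, int> a{0, 0}; a = mini_pair<int, short>{1, 2};` goes through the converting template, while `mini_pair<const int, int>` correctly reports that it is not copy assignable.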
[PATCH] V2DI zero constant in GPR (PR target/56948)
V2DI mode is allowed in GPRs and the pattern predicate allows easy vector constants, but the pattern in vsx.md does not provide an alternative for that case, which can lead to an ICE where the insn does not satisfy its constraints. The following patch adds an alternative for this case.

I also noticed that the VSX movti_64bit pattern does not handle loading constants into a GPR. And both the movti_64bit and movti_32bit patterns use j->wa instead of O->wa. The j constraint will work because it will accept any mode, but I think that an O constraint is more accurate for a scalar mode like TImode.

Because the failure depends on the details of register allocation, I do not have a short testcase.

Comments?

Thanks, David

	PR target/56948
	* config/rs6000/vsx.md (vsx_mov<mode>): Add j->r alternative.
	(vsx_movti_64bit): Change j->wa to O->wa.  Add n->r alternative.
	(vsx_movti_32bit): Change j->wa to O->wa.

Index: vsx.md
===================================================================
--- vsx.md	(revision 197940)
+++ vsx.md	(working copy)
@@ -207,8 +207,8 @@
 
 ;; VSX moves
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,v,wZ,v")
-	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v")
+	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -238,23 +238,24 @@
     case 6:
     case 7:
     case 8:
+    case 11:
       return "#";
 
     case 9:
     case 10:
       return "xxlxor %x0,%x0,%x0";
 
-    case 11:
+    case 12:
       return output_vec_const_move (operands);
 
-    case 12:
+    case 13:
       gcc_assert (MEM_P (operands[0])
 		  && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
 		  && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
 		  && GET_CODE (XEXP (operands[0], 0)) != PRE_MODIFY);
       return "stvx %1,%y0";
 
-    case 13:
+    case 14:
       gcc_assert (MEM_P (operands[0])
 		  && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
 		  && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
@@ -265,14 +266,14 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,vecstore,vecload")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,*,vecstore,vecload")])
 
 ;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r")
-	(match_operand:TI 1 "input_operand"        "wa, Z,wa, j,W,wZ, v, r, Y, r"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r,?r")
+	(match_operand:TI 1 "input_operand"        "wa, Z,wa, O,W,wZ, v, r, Y, r, n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
@@ -303,18 +304,19 @@
     case 7:
     case 8:
     case 9:
+    case 10:
       return "#";
 
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*")
-   (set_attr "length" "4, 4,4, 4, 8, 4, 4,8,8,8")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*,*")
+   (set_attr "length" "4, 4,4, 4, 8, 4, 4,8,8,8,8")])
 
 (define_insn "*vsx_movti_32bit"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,r,r,r,r")
-	(match_operand:TI 1 "input_operand"        "wa, Z,wa, j,W,wZ, v,r,r,Q,Y,r,n"))]
+	(match_operand:TI 1 "input_operand"        "wa, Z,wa, O,W,wZ, v,r,r,Q,Y,r,n"))]
   "! TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
Re: Fwd: Fix std::pair std::is_copy_assignable behavior
On 04/13/2013 09:21 PM, François Dumont wrote:
Does DR 1402 resolution generalization need a Standard committee validation first?

In my opinion, it's much clearer to send the C++ front-end patch *separately*, together with a simple C++-only (no library) testcase. I would also CC Jason.

Paolo.
[PATCH, tree-ssa] Avoid -Wuninitialized warning in try_unroll_loop_completely()
Hi,

I noticed there is an uninitialized variable warning when compiling the tree-ssa-loop-ivcanon.c file. The attached patch is a slight modification to avoid the warning, and a plaintext ChangeLog is below. Is it OK for trunk?

2013-04-14  Chung-Ju Wu  jasonw...@gmail.com

	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Avoid
	-Wuninitialized warning.

Best regards,
jasonwucj

Attachment: gcc490-tree-ssa-loop-ivcanon.svn.patch
Re: Fix std::pair std::is_copy_assignable behavior
Does DR 1402 resolution generalization need a Standard committee validation first?

I cannot see why we would want otherwise :-)

-- Gaby