Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-13 Thread Ondřej Bílka
On Fri, Apr 12, 2013 at 01:08:15PM +0400, Michael Zolotukhin wrote:
  I did some profiling of builtin implementation, download this
  http://kam.mff.cuni.cz/~ondra/memcpy_profile_builtin.tar.bz2
 Nice data, thanks!
 Could you please describe what is memcpy_new_builtin here? Is it how
 GCC expanded memcpy with this patch?
 Is this a comparison between libcall, libcall with your version of
 glibc, and expanded memmov with implementation from this patch?

 
I try to make benchmarks self contained. So now I measure
 libcall, libcall with my version and current builtin expansion.

I updated my benchmark. One problem with measuring memcpy is that most
memory ops happen asynchronously, so this version should capture that.
(Padding should now be sufficient, but I have not subtracted it from the
time yet.)

Now memcpy_gcc_builtin there measures builtin for first 100 sizes, then
switches to my implementation.

I added memcpy_new_builtin which is now same as memcpy_gcc_builtin.

To add your implementation compile variant/builtin.c file into
variant/builtin.s file. 
Then run ./benchmark.

Ondra
 Michael
 
 On 12 April 2013 12:54, Ondřej Bílka nel...@seznam.cz wrote:
  On Thu, Apr 11, 2013 at 04:32:30PM +0400, Michael Zolotukhin wrote:
   128 is about upper bound you can expand with sse moves.
   Tuning did not take into account code size and measured only when code
   is in a tight loop.
   For GPR-moves limit is around 64.
  Thanks for the data - I've not performed measurements with this
  implementation yet, but we surely should adjust thresholds to avoid
  performance degradations on small sizes.
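
The limits quoted above (about 128 bytes for SSE moves, about 64 bytes for GPR moves) amount to a dispatch on the known copy size. A minimal sketch in C, assuming hypothetical names (`copy_strategy`, `pick_copy_strategy`) that are not GCC's internal ones, and treating both thresholds as untuned starting points rather than values the patch uses:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategy names for illustration only.  */
enum copy_strategy { COPY_INLINE_GPR, COPY_INLINE_SSE, COPY_LIBCALL };

/* Rough upper bounds taken from the discussion; real values need tuning.  */
enum { GPR_INLINE_LIMIT = 64, SSE_INLINE_LIMIT = 128 };

/* Pick an expansion strategy for a copy of SIZE bytes.  */
static enum copy_strategy
pick_copy_strategy (size_t size, int have_sse)
{
  assert (have_sse == 0 || have_sse == 1);
  if (have_sse && size <= SSE_INLINE_LIMIT)
    return COPY_INLINE_SSE;
  if (size <= GPR_INLINE_LIMIT)
    return COPY_INLINE_GPR;
  return COPY_LIBCALL;
}
```

Anything above the inline limits falls back to the library call, which is where code size stops growing with the copy size.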
 
 
  I did some profiling of builtin implementation, download this
  http://kam.mff.cuni.cz/~ondra/memcpy_profile_builtin.tar.bz2
 
  see files results_rand/result.html and results_rand_noicache/result.html
 
  A memcpy_new_builtin for sizes x0,x1...x5 calls builtin and new
  otherwise.
  I did same for memcpy_glibc to see variance.
 
  memcpy_new does not call builtin.
 
  To regenerate graphs on other arch run benchmarks script.
  To use other builtin change in Makefile how to compile variant/builtin.c
  file.
 
  Builtins are faster by the cost of the avoided function call; I did not
  add that, as I do not have an estimate of this cost.
 
  Michael
 
  On 10 April 2013 22:53, Ondřej Bílka nel...@seznam.cz wrote:
   On Wed, Apr 10, 2013 at 09:53:09PM +0400, Michael Zolotukhin wrote:
Hi, I am writing memcpy for libc. It avoids computed jump and is
much faster on small strings (variant for sandy bridge attached).
  
   I'm not sure I get what you meant - could you please explain what is
   computed jumps?
   computed goto. See Duff's device; it works almost exactly the same.
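
For readers who have not seen it, Duff's device jumps into the middle of an unrolled loop based on the remainder of the count, which is the "computed jump" being discussed. A minimal sketch in C of the classic byte-copy form (the name `duff_copy` is made up here; this is not the code GCC emits, nor the attached memcpy variant):

```c
#include <assert.h>
#include <stddef.h>

/* Copy N bytes from SRC to DST with an 8-way unrolled loop.
   The switch jumps into the loop body to handle N % 8 first.  */
static void
duff_copy (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (n == 0)
    return;
  assert (dst != NULL && src != NULL);

  size_t rounds = (n + 7) / 8;	/* Number of passes through the loop.  */
  switch (n % 8)
    {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
	       } while (--rounds > 0);
    }
}
```

The entry point depends on the size, so the branch target is only known at run time; that indirect jump is the cost the message above wants to avoid for small copies.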
  
You must also check performance with cold instruction cache.
Now memcpy(x,y,128) takes 126 bytes which is too much.
  
Do not align for small sizes. Dependency caused by this erases any gains
that you might get. Keep in mind that in 55% of cases data are already
aligned.
  
   Other algorithms are still available and we can use them for small
   sizes. E.g. for sizes 128 we could emit loop with GPR-moves and don't
   use vector instructions in it.
  
   128 is about upper bound you can expand with sse moves.
   Tuning did not take into account code size and measured only when code
   is in a tight loop.
   For GPR-moves limit is around 64.
  
   What matters which code has best performance/size ratio.
   But that's tuning and I haven't worked on it yet - I'm going to
   measure performance of all algorithms on all sizes and thus defines on
   which sizes which algorithm is preferable.
   What I did in this patch is introducing some infrastructure to allow
   emitting of vector moves in movmem expanding - tuning is certainly
   possible and needed, but that's out of the scope of the patch.
  
   On 10 April 2013 21:43, Ondřej Bílka nel...@seznam.cz wrote:
On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
Hi,
This patch adds a new algorithm of expanding movmem in x86 and refactors
the existing implementation a bit. This is a reincarnation of the patch
that was sent but wasn't checked a couple of years ago - now I reworked it
from scratch and divided it into several more manageable parts.
   
Hi, I am writing memcpy for libc. It avoids computed jump and is
much faster on small strings (variant for sandy bridge attached).
   
For now this algorithm isn't used, because cost_models are tuned to
use existing ones. I believe the new algorithm will give better
performance, but I'll leave cost-models tuning for a separate patch.
   
You must also check performance with cold instruction cache.
Now memcpy(x,y,128) takes 126 bytes which is too much.
   
Also, I changed get_mem_align_offset to make it handle MEM_REFs as
well. Probably, there is another way of getting info about alignment -
if so, please let me know.
   
Do not align for small sizes. Dependency caused by this 

Re: [patch][DF] do not call df_insn_delete in remove_insn, only unlink the insn

2013-04-13 Thread Paolo Bonzini
Il 13/04/2013 02:02, Steven Bosscher ha scritto:
 
   * emit-rtl.c (remove_insn): Do not call df_insn_delete here.
   * cfgrtl.c (delete_insn): Call it here instead.
   * lra-spills.c (lra_final_code_change): Use delete_insn.
   * haifa-sched.c (sched_remove_insn): Likewise.
   * sel-sched-ir.c (return_nop_to_pool): Clear INSN_DELETED_P for nops
   returning to the nop pool.
   (sel_remove_insn): Simplify the only_disconnect case via remove_insn,
   use delete_insn for definitive removal.  Clear BLOCK_FOR_INSN.

Ok, thanks!

Paolo


[Patch, Fortran, OOP] PR 55959: ICE in in gfc_simplify_expr, at fortran/expr.c:1920

2013-04-13 Thread Janus Weil
Hi all,

here is another trivial ICE-on-invalid fix that I will commit later
today (as obvious and regtested on x86_64-unknown-linux-gnu).

Cheers,
Janus


2013-04-13  Janus Weil  ja...@gcc.gnu.org

PR fortran/55959
* expr.c (gfc_simplify_expr): Branch is not unreachable.


2013-04-13  Janus Weil  ja...@gcc.gnu.org

PR fortran/55959
* gfortran.dg/typebound_proc_29.f03: New.


pr55959.diff
Description: Binary data


typebound_proc_29.f90
Description: Binary data


Re: [patch, fortran, backport, 4.8] PR51825 - Fortran runtime error: Cannot match namelist object name

2013-04-13 Thread Janus Weil
Hi Tilo,

 I would like to backport the fix for PR51825 I posted here

 http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00316.html

 to the 4.8 branch.

well, the usual gfortran policy is to only do backports of regression
fixes. In exceptional cases, also non-regression fixes can be
backported (and have been in the past), if the bug is extremely severe
and/or the fix is extremely simple (IMHO the most severe type of
bug is a wrong-code issue, where the user does not see any kind of
error message, but just 'silently' gets wrong results).

None of these conditions is completely fulfilled, if you ask me
(although the patch is indeed relatively simple). Therefore I don't
directly see an urgent need for backporting, unless you can convince
us why backporting this is extremely important.

Anyway, I don't feel sufficiently familiar with the namelist I/O
sector to ok the patch, but maybe someone else can share his opinion
...

Cheers,
Janus


Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec

2013-04-13 Thread Mikael Morin
Hello,

Le 12/04/2013 20:38, Janus Weil a écrit :
 Unless someone has a better idea how to treat this, I will commit the
 attached patch as obvious.
 
Not really a better idea, but it seems to me that function calls can
have trailing sub-references, so that gfc_match_varspec could be called
on them.

gfc_match_rvalue has:

[...]
switch (sym->attr.flavor)
 {
[...]

case FL_UNKNOWN:
  [... try to match a variable ...]
  /* Give up, assume we have a function.  */
  [...]
  e->expr_type = EXPR_FUNCTION;
  [...]
  gfc_match_actual_arglist (...);
  [...]
  /* If our new function returns a character, array or structure
 type, it might have subsequent references.  */
  m = gfc_match_varspec (e, ...);


So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec.
And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being
parsed as a function.  Is this supported?

Mikael



Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec

2013-04-13 Thread Janus Weil
Hi Mikael,

 Unless someone has a better idea how to treat this, I will commit the
 attached patch as obvious.

 Not really a better idea, but it seems to me that function calls can
 have trailing sub-references, so that gfc_match_varspec could be called
 on them.

 gfc_match_rvalue has:

 [...]
 switch (sym->attr.flavor)
  {
 [...]

 case FL_UNKNOWN:
   [... try to match a variable ...]
   /* Give up, assume we have a function.  */
   [...]
   e->expr_type = EXPR_FUNCTION;
   [...]
   gfc_match_actual_arglist (...);
   [...]
   /* If our new function returns a character, array or structure
  type, it might have subsequent references.  */
   m = gfc_match_varspec (e, ...);


 So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec.
 And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being
 parsed as a function.  Is this supported?

I think this is forbidden by the Fortran standard, cf. e.g.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42188

Actually I'm not sure in which context a function call with sub-refs
would be valid. One should re-check the standard on this ...

(Btw, I have already committed the patch as r197936.)

Cheers,
Janus


Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-13 Thread Julian Brown
On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown jul...@codesourcery.com wrote:

 On Fri, 12 Apr 2013 15:19:18 +0100
 Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:
 
  Hi all,
  
  This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
  to arm_neon.h through the generator ML scripts and also adds the
  built-ins to which the intrinsics will map to. The generator ML
  scripts are updated and used to generate the relevant .texi
  documentation, arm_neon.h and the tests in gcc.target/arm/neon .
 
 FWIW, some of the changes to neon*.ml can be simplified somewhat -- my
 attempt at an improved version of those bits is attached. I'm still
 not too happy with mode_suffix, but these new instructions require
 adding semantics to parts of the generator program which weren't
 really very well-defined to start with :-). I appreciate that it's a
 bit of a tangle...

I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!

Cheers,

Julian

Index: neon-gen.ml
===
--- neon-gen.ml	(revision 197804)
+++ neon-gen.ml	(working copy)
@@ -121,6 +121,7 @@ let rec signed_ctype = function
  | T_uint16 | T_int16 -> T_intHI
  | T_uint32 | T_int32 -> T_intSI
  | T_uint64 | T_int64 -> T_intDI
+  | T_float16 -> T_floatHF
  | T_float32 -> T_floatSF
  | T_poly8 -> T_intQI
  | T_poly16 -> T_intHI
@@ -275,8 +276,8 @@ let rec mode_suffix elttype shape =
 let mode = mode_of_elt elttype shape in
 string_of_mode mode
  with MixedMode (dst, src) ->
-let dstmode = mode_of_elt dst shape
-and srcmode = mode_of_elt src shape in
+let dstmode = mode_of_elt ~argpos:0 dst shape
+and srcmode = mode_of_elt ~argpos:1 src shape in
 string_of_mode dstmode ^ string_of_mode srcmode
 
 let get_shuffle features =
@@ -291,19 +292,24 @@ let print_feature_test_start features =
 match List.find (fun feature ->
match feature with Requires_feature _ -> true
 | Requires_arch _ -> true
+| Requires_FP_bit _ -> true
 | _ -> false)
  features with
-  Requires_feature feature -> 
+  Requires_feature feature ->
 Format.printf "#ifdef __ARM_FEATURE_%s@\n" feature
 | Requires_arch arch ->
 Format.printf "#if __ARM_ARCH >= %d@\n" arch
+| Requires_FP_bit bit ->
+Format.printf "#if ((__ARM_FP & 0x%X) != 0)@\n"
+  (1 lsl bit)
 | _ -> assert false
   with Not_found -> assert true
 
 let print_feature_test_end features =
   let feature =
-List.exists (function Requires_feature x -> true
-  | Requires_arch x -> true
+List.exists (function Requires_feature _ -> true
+  | Requires_arch _ -> true
+  | Requires_FP_bit _ -> true
   |  _ -> false) features in
   if feature then Format.printf "#endif@\n"
 
@@ -365,6 +371,7 @@ let deftypes () =
 "__builtin_neon_hi", "int", 16, 4;
 "__builtin_neon_si", "int", 32, 2;
 "__builtin_neon_di", "int", 64, 1;
+"__builtin_neon_hf", "float", 16, 4;
 "__builtin_neon_sf", "float", 32, 2;
 "__builtin_neon_poly8", "poly", 8, 8;
 "__builtin_neon_poly16", "poly", 16, 4;
Index: neon.ml
===
--- neon.ml	(revision 197804)
+++ neon.ml	(working copy)
@@ -21,7 +21,7 @@
http://www.gnu.org/licenses/.  *)
 
 (* Shorthand types for vector elements.  *)
-type elts = S8 | S16 | S32 | S64 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
   | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
   | Cast of elts * elts | NoElts
 
@@ -37,6 +37,7 @@ type vectype = T_int8x8| T_int8x16
 	 | T_uint16x4  | T_uint16x8
 	 | T_uint32x2  | T_uint32x4
 	 | T_uint64x1  | T_uint64x2
+	 | T_float16x4
 	 | T_float32x2 | T_float32x4
 	 | T_poly8x8   | T_poly8x16
 	 | T_poly16x4  | T_poly16x8
@@ -46,11 +47,13 @@ type vectype = T_int8x8| T_int8x16
  | T_uint8 | T_uint16
  | T_uint32| T_uint64
  | T_poly8 | T_poly16
- | T_float32   | T_arrayof of int * vectype
+ | T_float16   | T_float32
+ | T_arrayof of int * vectype
  | T_ptrto of vectype | T_const of vectype
  | T_void  | T_intQI
  | T_intHI | T_intSI
- | T_intDI | T_floatSF
+ | T_intDI | T_floatHF
+ | T_floatSF
 
 (* The meanings of the following are:
  TImode : Tetra, two registers (four words).
@@ -93,7 +96,7 @@ type arity = Arity0 of vectype
| Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V2SI | V2SF | DI
-

Re: [Patch, fortran] PR 56919 SYSTEM_CLOCK on Windows

2013-04-13 Thread Janne Blomqvist
On Fri, Apr 12, 2013 at 11:49 PM, Dave Korn dave.korn.cyg...@gmail.com wrote:
 On 12/04/2013 19:47, Janne Blomqvist wrote:

 As I don't have a Windows system to test on, I would appreciate if somebody
 more familiar with that platform could take a quick look. In particular, I
 *think* it should be Ok to use win32 API functions on Cygwin (that is,
 cygwin-gcc ships the windows.h and other necessary headers out of the
 box?),

   Well, after installing the w32api package, but basically yes, that's fine
 for simple stuff like that.  (You shouldn't go doing things like creating
 threads or synchronisation through the Win32 API, but calling GetTickCount[64]
 will be fine.)

Ok, thanks.

 and that _WIN32 is the correct macro to use to select code which is common
 to MinGW and Cygwin.

   Alas no:

 $ gcc-4 -E - < /dev/null -dM | grep WIN
 #define __WINT_MAX__ 4294967295U
 #define __WINT_MIN__ 0U
 #define __SIZEOF_WINT_T__ 4
 #define __CYGWIN__ 1
 #define __WINT_TYPE__ unsigned int
 #define __CYGWIN32__ 1

   You should probably use #if defined(__MINGW32__) || defined (__CYGWIN__),
 since that'll also work on 64-bit Cygwin, as opposed to using __CYGWIN32__.  I
 think __MINGW32__ is defined for 64-bit as well as 32-bit targets.

Ok, I'll do that. Thanks for the info. FWIW, I grepped through the gcc
tree and there's quite a lot of

#if defined(_WIN32) && !defined(__CYGWIN__)

and similar, which in the light of the above, is pointless.

And yes, I also recall that mingw-w64 also defines __MINGW32__.
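
The macro test suggested above can be sketched as follows. This is a minimal illustration in C, not the code from the patch; `TARGET_HAS_WIN32_API`, `target_family` and `target_has_win32_api` are names invented for the example:

```c
#include <assert.h>

/* True on both MinGW (32- and 64-bit) and Cygwin; _WIN32 alone would
   miss Cygwin, as the preprocessor dump above shows.  */
#if defined(__MINGW32__) || defined(__CYGWIN__)
# define TARGET_HAS_WIN32_API 1
#else
# define TARGET_HAS_WIN32_API 0
#endif

/* Report which family the code was compiled for.  */
static const char *
target_family (void)
{
#if defined(__CYGWIN__)
  return "cygwin";
#elif defined(__MINGW32__)
  return "mingw";
#else
  return "other";
#endif
}

/* Runtime view of the compile-time selection.  */
static int
target_has_win32_api (void)
{
  int has_api = TARGET_HAS_WIN32_API;
  assert (has_api == 0 || has_api == 1);
  return has_api;
}
```

Code that must differ between the two Windows environments can still test `__CYGWIN__` on its own inside the common block.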


--
Janne Blomqvist


Re: [Patch, fortran] PR 56919 SYSTEM_CLOCK on Windows

2013-04-13 Thread Janne Blomqvist
On Sat, Apr 13, 2013 at 1:02 AM, Tobias Burnus bur...@net-b.de wrote:
 Janne Blomqvist wrote:

 the attached patch implements the SYSTEM_CLOCK intrinsics on the MinGW
 and Cygwin targets using the GetTickCount/GetTickCount64 functions.
 These should be quite robust monotonic clocks and AFAICS are the best
 we can do on Windows.


 I think using QueryPerformanceCounter is the better approach. It is
 supported since Windows 2000 and recommended as high-performance counter:
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms644900%28v=vs.85%29.aspx

I didn't want to use QPC, as it can apparently be unreliable, see e.g.
the PEP 418 I linked to previously. But it seems that the worst issues
are caused by old and somewhat rare hardware, or have been fixed in
more recent Windows service packs, so maybe it's not worth worrying
about.

 I really dislike GetTickCount, which overflows after 50 days - that's not
 what you want to have. And GetTickCount64 only exists since Vista/2008. By
 contrast, QueryPerformanceCounter should allow for finer resolution and it
 is already available since Windows 2000.

Attached is an updated patch which uses GetTickCount for
system_clock_4; this should be fine as system_clock_4 wraps around in
~25 days anyways. For system_clock_8 it uses
QueryPerformance{Counter,Frequency}.
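
The wrap-around figures mentioned in this thread follow from the counter width and tick rate. A small sketch in C that reproduces the arithmetic, assuming a millisecond tick (1000 ticks per second), 32 usable bits for the unsigned GetTickCount value, and 31 usable bits for a signed 4-byte Fortran integer; `wrap_days` is a name made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Days until a counter with BITS usable bits wraps around when it
   advances TICKS_PER_SEC times per second.  */
static double
wrap_days (unsigned bits, double ticks_per_sec)
{
  assert (bits >= 1 && bits <= 63 && ticks_per_sec > 0.0);
  double ticks = (double) ((uint64_t) 1 << bits);
  return ticks / ticks_per_sec / 86400.0;	/* 86400 seconds per day.  */
}
```

With these assumptions, `wrap_days (32, 1000.0)` is a little under 50 days and `wrap_days (31, 1000.0)` a little under 25 days, which matches the numbers quoted above.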

 Regarding clock_gettime: I really think we should check
 _POSIX_MONOTONIC_CLOCK as well. Currently, only CLOCK_MONOTONIC is checked,
 which is always available (on POSIX-conforming systems).

 See GLIBCXX_ENABLE_LIBSTDCXX_TIME in libstdc++-v3/acinclude.m4 - and in
 particular ac_has_clock_monotonic.

The patch also adds an additional check for _POSIX_MONOTONIC_CLOCK.
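
As an illustration of what such a check can look like, here is a minimal sketch in C. It is not the libgfortran code; `read_clock_ns` is a hypothetical helper, and the fallback to CLOCK_REALTIME is an assumption made for the example:

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Store the current clock reading in nanoseconds in *NS.
   Prefer the monotonic clock when POSIX says it is usable.
   Return 0 on success, -1 on failure.  */
static int
read_clock_ns (long long *ns)
{
  struct timespec ts;
  clockid_t id = CLOCK_REALTIME;

  assert (ns != NULL);
#if defined(CLOCK_MONOTONIC) && defined(_POSIX_MONOTONIC_CLOCK) \
    && _POSIX_MONOTONIC_CLOCK >= 0
# if _POSIX_MONOTONIC_CLOCK > 0
  id = CLOCK_MONOTONIC;		/* Always supported.  */
# else
  if (sysconf (_SC_MONOTONIC_CLOCK) > 0)	/* Decided at run time.  */
    id = CLOCK_MONOTONIC;
# endif
#endif
  if (clock_gettime (id, &ts) != 0)
    return -1;
  *ns = ts.tv_sec * 1000000000LL + ts.tv_nsec;
  return 0;
}
```

A value of 0 for `_POSIX_MONOTONIC_CLOCK` means the headers support the option but the running system has to be asked, hence the `sysconf` call in that branch.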

Ok for trunk?

Frontend ChangeLog:

2013-04-13  Janne Blomqvist  j...@gcc.gnu.org

PR fortran/56919
* intrinsics.texi (SYSTEM_CLOCK): Update documentation.

libgfortran ChangeLog:

2013-04-13  Janne Blomqvist  j...@gcc.gnu.org

PR fortran/56919
* intrinsics/time_1.h: Check __CYGWIN__ in addition to
__MINGW32__.
* intrinsics/system_clock.c (GF_CLOCK_MONOTONIC): Check
_POSIX_MONOTONIC_CLOCK as well.
(system_clock_4): Use GetTickCount on Windows.
(system_clock_8): Use QueryPerformanceCounter and
QueryPerformanceFrequency on Windows.


--
Janne Blomqvist


sysclockwin.2.diff
Description: Binary data


Re: [Patch, Fortran, OOP] PR 56266: ICE on invalid in gfc_match_varspec

2013-04-13 Thread Mikael Morin

Le 13/04/2013 16:02, Janus Weil a écrit :
 Hi Mikael,
 
 So, it seems that EXPR_FUNCTION is acceptable in gfc_match_varspec.
 And then, there is nothing preventing 'c(i)' in 'c(i)%encM()' from being
 parsed as a function.  Is this supported?
 
 I think this is forbidden by the Fortran standard, cf. e.g.
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42188
 
 Actually I'm not sure in which context a function call with sub-refs
 would be valid. One should re-check the standard on this ...
 
Indeed, that's invalid:

structure-component is data-ref
data-ref is part-ref [ % part-ref ] ...
part-ref is part-name [ ( section-subscript-list ) ] [ image-selector ]

(R611) The leftmost part-name shall be the name of a data object.


I thought they were allowed for pointer-returning functions.

Mikael


[patch] committed: minor sched-int.h and sched-deps.c fixes

2013-04-13 Thread Steven Bosscher
Hello,

In sched-deps.c:deps_analyze_insn there's no need to look for
EH_REGION notes, they don't exist until just before final. This
assert, and some code in alpha.c (PR56858), are the only remaining
meaningful references outside final.c.

In sched-int.h, the header is only non-empty if INSN_SCHEDULING is
defined. After #ifdef INSN_SCHEDULING the first header included is
insn-attr.h - which defines (or not) INSN_SCHEDULING. So move that
include outside the #ifdef INSN_SCHEDULING guard.

Bootstrapped and tested on several targets over the past three weeks. Committed.

Ciao!
Steven


* sched-deps.c (deps_analyze_insn): Do not check for EH_REGION insn
notes, they are emitted only just before final.
* sched-int.h: Include insn-attr.h before checking INSN_SCHEDULING.

Index: sched-deps.c
===
--- sched-deps.c(revision 197944)
+++ sched-deps.c(working copy)
@@ -3680,12 +3680,6 @@ deps_analyze_insn (struct deps_desc *deps, rtx ins
  if (sched_deps_info->use_cselib)
 cselib_process_insn (insn);

-  /* EH_REGION insn notes can not appear until well after we complete
- scheduling.  */
-  if (NOTE_P (insn))
-gcc_assert (NOTE_KIND (insn) != NOTE_INSN_EH_REGION_BEG
-&& NOTE_KIND (insn) != NOTE_INSN_EH_REGION_END);
-
  if (sched_deps_info->finish_insn)
 sched_deps_info->finish_insn ();

Index: sched-int.h
===
--- sched-int.h (revision 197944)
+++ sched-int.h (working copy)
@@ -21,10 +21,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_SCHED_INT_H
 #define GCC_SCHED_INT_H

+#include "insn-attr.h"
+
 #ifdef INSN_SCHEDULING

-/* For state_t.  */
-#include "insn-attr.h"
 #include "df.h"
 #include "basic-block.h"


[PATCH] Enable java for aarch64

2013-04-13 Thread Andreas Schwab
This enables building java for aarch64.  Most of the configuration bits
were copied from arm.

=== libjava Summary ===

# of expected passes		2533
# of unexpected failures	29
# of untested testcases		25

Andreas.

* configure.ac (aarch64-*-*): Don't disable java.
* configure: Regenerate.

libjava/:
* configure.host: Add support for aarch64.
* sysdep/aarch64/locks.h: New file.

libjava/classpath/:
* native/fdlibm/ieeefp.h: Add support for aarch64.
---
 configure|  2 ++
 configure.ac |  2 ++
 libjava/classpath/native/fdlibm/ieeefp.h |  8 +
 libjava/configure.host   |  8 -
 libjava/sysdep/aarch64/locks.h   | 57 
 5 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 libjava/sysdep/aarch64/locks.h

diff --git a/configure b/configure
index d809535..e161cad 100755
--- a/configure
+++ b/configure
@@ -3272,6 +3272,8 @@ esac
 
 # Disable Java if libffi is not supported.
 case ${target} in
+  aarch64-*-*)
+;;
   alpha*-*-*)
 ;;
   arm*-*-*)
diff --git a/configure.ac b/configure.ac
index 48ec1aa..bec489f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -611,6 +611,8 @@ esac
 
 # Disable Java if libffi is not supported.
 case ${target} in
+  aarch64-*-*)
+;;
   alpha*-*-*)
 ;;
   arm*-*-*)
diff --git a/libjava/classpath/native/fdlibm/ieeefp.h b/libjava/classpath/native/fdlibm/ieeefp.h
index c230bbb..7ef2ae7e 100644
--- a/libjava/classpath/native/fdlibm/ieeefp.h
+++ b/libjava/classpath/native/fdlibm/ieeefp.h
@@ -4,6 +4,14 @@
 #ifndef __IEEE_BIG_ENDIAN
 #ifndef __IEEE_LITTLE_ENDIAN
 
+#ifdef __aarch64__
+#ifdef __AARCH64EB__
+#define __IEEE_BIG_ENDIAN
+#else
+#define __IEEE_LITTLE_ENDIAN
+#endif
+#endif
+
 #ifdef __alpha__
 #define __IEEE_LITTLE_ENDIAN
 #endif
diff --git a/libjava/configure.host b/libjava/configure.host
index 0c3b41c..96f86fe 100644
--- a/libjava/configure.host
+++ b/libjava/configure.host
@@ -81,6 +81,11 @@ ATOMICSPEC=
 
 # This case statement supports per-CPU defaults.
 case ${host} in
+  aarch64*-linux*)
+   libgcj_interpreter=yes
+   sysdeps_dir=aarch64
+   ATOMICSPEC=-fuse-atomic-builtins
+   ;;
   arm*-elf)
with_libffi_default=no
PROCESS=Ecos
@@ -224,7 +229,8 @@ case ${host} in
   x86_64*-linux* | \
   hppa*-linux* | \
   m68k*-linux* | \
-  sh-linux* | sh[34]*-linux*)
+  sh-linux* | sh[34]*-linux* | \
+  aarch64*-linux*)
can_unwind_signal=yes
libgcj_ld_symbolic='-Wl,-Bsymbolic'
if test x$slow_pthread_self = xyes \
diff --git a/libjava/sysdep/aarch64/locks.h b/libjava/sysdep/aarch64/locks.h
new file mode 100644
index 0000000..f91473d
--- /dev/null
+++ b/libjava/sysdep/aarch64/locks.h
@@ -0,0 +1,57 @@
+// locks.h - Thread synchronization primitives. AArch64 implementation.
+
+#ifndef __SYSDEP_LOCKS_H__
+#define __SYSDEP_LOCKS_H__
+
+typedef size_t obj_addr_t; /* Integer type big enough for object   */
+   /* address. */
+
+// Atomically replace *addr by new_val if it was initially equal to old.
+// Return true if the comparison succeeded.
+// Assumed to have acquire semantics, i.e. later memory operations
+// cannot execute before the compare_and_swap finishes.
+inline static bool
+compare_and_swap(volatile obj_addr_t *addr,
+ obj_addr_t old,
+ obj_addr_t new_val)
+{
+  return __sync_bool_compare_and_swap(addr, old, new_val);
+}
+
+// Set *addr to new_val with release semantics, i.e. making sure
+// that prior loads and stores complete before this
+// assignment.
+inline static void
+release_set(volatile obj_addr_t *addr, obj_addr_t new_val)
+{
+  __sync_synchronize();
+  *addr = new_val;
+}
+
+// Compare_and_swap with release semantics instead of acquire semantics.
+// On many architectures, the operation makes both guarantees, so the
+// implementation can be the same.
+inline static bool
+compare_and_swap_release(volatile obj_addr_t *addr,
+obj_addr_t old,
+obj_addr_t new_val)
+{
+  return __sync_bool_compare_and_swap(addr, old, new_val);
+}
+
+// Ensure that subsequent instructions do not execute on stale
+// data that was loaded from memory before the barrier.
+inline static void
+read_barrier()
+{
+  __sync_synchronize();
+}
+
+// Ensure that prior stores to memory are completed with respect to other
+// processors.
+inline static void
+write_barrier()
+{
+  __sync_synchronize();
+}
+#endif
-- 
1.8.2.1

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


[patch] fix PR52139 correctly

2013-04-13 Thread Steven Bosscher
Hello,

The fix for PR52139 only papered over another problem: things
from a basic block header were emitted into the insn stream. When
going out of cfglayout mode, these header insns will be lost. It's
probably possible to construct a test case where e.g. a
NOTE_INSN_DELETED_DEBUG_LABEL is lost because of this, but I haven't
tried to do so.

The correct fix is to find a new home for the header and footer insn,
and the most logical place is to put them in the footer of the merged
block.
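
The intended ordering can be sketched on a plain doubly linked list. This is a simplified model in C, not the cfgrtl.c code: `struct insn`, `chain_concat` and `merge_side_chains` are hypothetical stand-ins for the rtx chain, the hand-written linking loops, and the relevant part of cfg_layout_merge_blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an insn in a header or footer chain.  */
struct insn
{
  int id;
  struct insn *prev, *next;
};

/* Return the last element of a non-empty chain.  */
static struct insn *
chain_last (struct insn *head)
{
  assert (head != NULL);
  while (head->next)
    head = head->next;
  return head;
}

/* Link SECOND after FIRST and return the head of the result.  */
static struct insn *
chain_concat (struct insn *first, struct insn *second)
{
  if (!first)
    return second;
  if (!second)
    return first;
  struct insn *last = chain_last (first);
  last->next = second;
  second->prev = last;
  return first;
}

/* Merge the side chains of block B into the footer of block A.
   The resulting order is: B's header, A's footer, B's footer.  */
static void
merge_side_chains (struct insn **a_footer, struct insn **b_header,
		   struct insn **b_footer)
{
  *a_footer = chain_concat (*a_footer, *b_footer);
  *b_footer = NULL;
  *a_footer = chain_concat (*b_header, *a_footer);
  *b_header = NULL;
}
```

Nothing is emitted into the insn stream in this model, so nothing is lost when the blocks are later laid out.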

Bootstrapped and tested on x86_64-unknown-linux-gnu with current trunk,
and with r184004 (modified patch) to make sure the test case is fixed
(it doesn't fail on trunk even with r184005 reverted).
OK for trunk?

Ciao!
Steven


* cfgrtl.c (cfg_layout_merge_blocks): Revert r184005, implement
correct fix by moving header and footer insn to the footer of
the merged basic block.  Clear BB_END of the merged-away block.

Index: cfgrtl.c
===
--- cfgrtl.c(revision 197942)
+++ cfgrtl.c(working copy)
@@ -4083,18 +4083,40 @@ cfg_layout_merge_blocks (basic_block a,
   if (!optimize)
 emit_nop_for_unique_locus_between (a, b);

-  /* Possible line number notes should appear in between.  */
-  if (BB_HEADER (b))
-{
-  rtx first = BB_END (a), last;
-
-  last = emit_insn_after_noloc (BB_HEADER (b), BB_END (a), a);
-  /* The above might add a BARRIER as BB_END, but as barriers
-aren't valid parts of a bb, remove_insn doesn't update
-BB_END if it is a barrier.  So adjust BB_END here.  */
-  while (BB_END (a) != first && BARRIER_P (BB_END (a)))
-   BB_END (a) = PREV_INSN (BB_END (a));
-  delete_insn_chain (NEXT_INSN (first), last, false);
+  /* Move things from b->footer after a->footer.  */
+  if (BB_FOOTER (b))
+{
+  if (!BB_FOOTER (a))
+   BB_FOOTER (a) = BB_FOOTER (b);
+  else
+   {
+ rtx last = BB_FOOTER (a);
+
+ while (NEXT_INSN (last))
+   last = NEXT_INSN (last);
+ NEXT_INSN (last) = BB_FOOTER (b);
+ PREV_INSN (BB_FOOTER (b)) = last;
+   }
+  BB_FOOTER (b) = NULL;
+}
+
+  /* Move things from b->header before a->footer.
+ Note that this may include dead tablejump data, but we don't clean
+ those up until we go out of cfglayout mode.  */
+   if (BB_HEADER (b))
+ {
+  if (! BB_FOOTER (a))
+   BB_FOOTER (a) = BB_HEADER (b);
+  else
+   {
+ rtx last = BB_HEADER (b);
+
+ while (NEXT_INSN (last))
+   last = NEXT_INSN (last);
+ NEXT_INSN (last) = BB_FOOTER (a);
+ PREV_INSN (BB_FOOTER (a)) = last;
+ BB_FOOTER (a) = BB_HEADER (b);
+   }
   BB_HEADER (b) = NULL;
 }


Fwd: Fix std::pair std::is_copy_assignable behavior

2013-04-13 Thread François Dumont


Hi

Here is a patch already posted to the libstdc++ mailing list; I am
resending it following the libstdc++ maintainers' advice to add the
gcc-patches mailing list.


This patch proposal is to fix the behavior of std::pair regarding 
the std::is_*_assignable meta programming functions.


As announced, it requires a compiler patch to extend the DR 1402
resolution to all defaulted methods.


2013-04-12 François Dumont fdum...@gcc.gnu.org

* call.c (joust): Extend DR 1402 to all defaulted methods.

This modification is mandatory so that pair& operator=(const pair&)
can be defaulted while still letting gcc consider the other operator= in
some situations like std::pair<int&, int>. This way, using
std::enable_if on the template operator=, we can control when p1 = p2 is
a valid expression, resulting in correct behavior of
std::is_copy_assignable.


For the moment I preferred to add a dg-require-normal-mode option
in the single test that fails to compile because of the compiler
modification.


Does generalizing the DR 1402 resolution need Standard committee
validation first?


2013-04-13  François Dumont  fdum...@gcc.gnu.org

* include/bits/stl_pair.h (operator=(const pair&)): Defaulted.
(operator=(pair&&)): Likewise.
(template operator=(const pair&)): Add noexcept
qualification. Enable if is_assignable<T&, const U&> true for both
parameters.
(template operator=(pair&&)): Add noexcept
qualification. Enable if is_assignable<T&, U&&> true for both
parameters.
* testsuite/23_containers/unordered_set/55043.cc: Add
dg-require-normal-mode.
* testsuite/20_util/pair/is_move_assignable.cc: New.
* testsuite/20_util/pair/is_copy_assignable.cc: Likewise.
* testsuite/20_util/pair/is_assignable.cc: Likewise.
* testsuite/20_util/pair/is_nothrow_move_assignable.cc: Likewise.
* testsuite/20_util/pair/assign_neg.cc: Likewise.
* testsuite/20_util/pair/is_nothrow_copy_assignable.cc: Likewise.
* testsuite/20_util/pair/assign.cc: Likewise.

François

Index: call.c
===
--- call.c	(revision 197829)
+++ call.c	(working copy)
@@ -8377,19 +8377,20 @@
(IS_TYPE_OR_DECL_P (cand1->fn)))
 return 1;
 
-  /* Prefer a non-deleted function over an implicitly deleted move
- constructor or assignment operator.  This differs slightly from the
- wording for issue 1402 (which says the move op is ignored by overload
- resolution), but this way produces better error messages.  */
+  /* Prefer a non-deleted function over an implicitly deleted one. This
+ differs slightly from the wording for issue 1402 because:
+ - it is extended to all defaulted functions, not only the ones with
+ move semantic
+ - it says the op is ignored by overload resolution while we are
+ only making it a worst candidate, but this way produces better error
+ messages.  */
   if (TREE_CODE (cand1->fn) == FUNCTION_DECL
&& TREE_CODE (cand2->fn) == FUNCTION_DECL
&& DECL_DELETED_FN (cand1->fn) != DECL_DELETED_FN (cand2->fn))
 {
-  if (DECL_DELETED_FN (cand1->fn) && DECL_DEFAULTED_FN (cand1->fn)
-	  && move_fn_p (cand1->fn))
+  if (DECL_DELETED_FN (cand1->fn) && DECL_DEFAULTED_FN (cand1->fn))
 	return -1;
-  if (DECL_DELETED_FN (cand2->fn) && DECL_DEFAULTED_FN (cand2->fn)
-	  && move_fn_p (cand2->fn))
+  if (DECL_DELETED_FN (cand2->fn) && DECL_DEFAULTED_FN (cand2->fn))
 	return 1;
 }
 

Index: include/bits/stl_pair.h
===
--- include/bits/stl_pair.h	(revision 197829)
+++ include/bits/stl_pair.h	(working copy)
@@ -155,26 +155,18 @@
 pair(piecewise_construct_t, tuple<_Args1...>, tuple<_Args2...>);
 
   pair&
-  operator=(const pair& __p)
-  {
-	first = __p.first;
-	second = __p.second;
-	return *this;
-  }
+  operator=(const pair&) = default;
 
   pair&
-  operator=(pair&& __p)
-  noexcept(__and_<is_nothrow_move_assignable<_T1>,
-	  is_nothrow_move_assignable<_T2>>::value)
-  {
-	first = std::forward<first_type>(__p.first);
-	second = std::forward<second_type>(__p.second);
-	return *this;
-  }
+  operator=(pair&&) = default;
 
   template<class _U1, class _U2>
-	pair&
+	typename enable_if<__and_<is_assignable<_T1&, const _U1&>,
+  is_assignable<_T2&, const _U2&>>::value,
+			   pair&>::type
 	operator=(const pair<_U1, _U2>& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, const _U1&>,
+			is_nothrow_assignable<_T2&, const _U2&>>::value)
 	{
 	  first = __p.first;
 	  second = __p.second;
@@ -182,8 +174,12 @@
 	}
 
   template<class _U1, class _U2>
-	pair&
+  	typename enable_if<__and_<is_assignable<_T1&, _U1&&>,
+    is_assignable<_T2&, _U2&&>>::value,
+  			   pair&>::type
 	operator=(pair<_U1, _U2>&& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, _U1&&>,
+			is_nothrow_assignable<_T2&, _U2&&>>::value)
 	{
 	  first = std::forward<_U1>(__p.first);
 	  second = std::forward<_U2>(__p.second);
Index: 
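
The thread subject concerns `std::is_copy_assignable` for `std::pair`. As a rough sketch of why replacing the hand-written operators with `= default` changes that trait, the two class templates below differ only in how copy assignment is declared. Both are hypothetical stand-ins, not libstdc++ code. The choice of `const int` as a member type that cannot be assigned is an assumption made for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-in for a pair with a hand-written copy assignment.
template <class T1, class T2>
struct user_provided_pair {
  T1 first;
  T2 second;
  // Always declared, so the trait reports "assignable" even when the body
  // could not be compiled for these member types.
  user_provided_pair& operator=(const user_provided_pair& p) {
    first = p.first;
    second = p.second;
    return *this;
  }
};

// Hypothetical stand-in for a pair with a defaulted copy assignment.
template <class T1, class T2>
struct defaulted_pair {
  T1 first;
  T2 second;
  // Defined as deleted when a member cannot be copy-assigned.
  defaulted_pair& operator=(const defaulted_pair&) = default;
};

// With a const member the hand-written operator still claims assignability...
static_assert(
    std::is_copy_assignable<user_provided_pair<const int, int>>::value,
    "user-provided operator= is always visible to the trait");
// ...while the defaulted one is deleted and the trait reports the truth.
static_assert(
    !std::is_copy_assignable<defaulted_pair<const int, int>>::value,
    "defaulted operator= is deleted for a const member");
// Ordinary member types stay assignable either way.
static_assert(std::is_copy_assignable<defaulted_pair<int, int>>::value,
              "defaulted operator= works for assignable members");
```

The body of the hand-written operator is never instantiated here because the trait only inspects the declaration, which is why the first assertion holds.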

[PATCH] V2DI zero constant in GPR (PR target/56948)

2013-04-13 Thread David Edelsohn
V2DI mode is allowed in GPRs and the pattern predicate allows easy
vector constants but the pattern in vsx.md does not provide an
alternative for that case, which can lead to an ICE where the insn
does not satisfy its constraints.  The following patch adds an
alternative for this case.

I also noticed that the VSX movti_64bit pattern does not handle
loading constants into a GPR.

And both the movti_64bit and movti_32bit patterns use j->wa instead of
O->wa.  The j constraint will work because it will accept any mode,
but I think that an O constraint is more accurate for a scalar mode
like TImode.

Because the failure depends on the details of register allocation, I
do not have a short testcase.

Comments?

Thanks, David

PR target/56948
* config/rs6000/vsx.md (vsx_mov<mode>): Add j->r alternative.
(vsx_movti_64bit): Change j->wa to O->wa.  Add n->r alternative.
(vsx_movti_32bit): Change j->wa to O->wa.

Index: vsx.md
===
--- vsx.md(revision 197940)
+++ vsx.md(working copy)
@@ -207,8 +207,8 @@

 ;; VSX moves
 (define_insn "*vsx_mov<mode>"
-  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,v,wZ,v")
-        (match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,W,v,wZ"))]
+  [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v")
+        (match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -238,23 +238,24 @@
     case 6:
     case 7:
     case 8:
+    case 11:
       return "#";

     case 9:
     case 10:
       return "xxlxor %x0,%x0,%x0";

-    case 11:
+    case 12:
       return output_vec_const_move (operands);

-    case 12:
+    case 13:
       gcc_assert (MEM_P (operands[0])
                   && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
                   && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
                   && GET_CODE (XEXP (operands[0], 0)) != PRE_MODIFY);
       return "stvx %1,%y0";

-    case 13:
+    case 14:
       gcc_assert (MEM_P (operands[0])
                   && GET_CODE (XEXP (operands[0], 0)) != PRE_INC
                   && GET_CODE (XEXP (operands[0], 0)) != PRE_DEC
@@ -265,14 +266,14 @@
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,vecstore,vecload")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,*,vecstore,vecload")])

 ;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
 ;; use of TImode is for unions.  However for plain data movement, slightly
 ;; favor the vector loads
 (define_insn "*vsx_movti_64bit"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r")
-        (match_operand:TI 1 "input_operand" "wa, Z,wa, j,W,wZ, v, r, Y, r"))]
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r,?r")
+        (match_operand:TI 1 "input_operand" "wa, Z,wa, O,W,wZ, v, r, Y, r, n"))]
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
@@ -303,18 +304,19 @@
     case 7:
     case 8:
     case 9:
+    case 10:
       return "#";

     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*")
-   (set_attr "length" "4, 4, 4, 4, 8, 4, 4, 8, 8, 8")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*,*")
+   (set_attr "length" "4, 4, 4, 4, 8, 4, 4, 8, 8, 8, 8")])

 (define_insn "*vsx_movti_32bit"
   [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,r,r,r,r")
-        (match_operand:TI 1 "input_operand" "wa, Z,wa, j,W,wZ, v,r,r,Q,Y,r,n"))]
+        (match_operand:TI 1 "input_operand" "wa, Z,wa, O,W,wZ, v,r,r,Q,Y,r,n"))]
   "! TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode)
        || register_operand (operands[1], TImode))"
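
Adding an alternative in the middle of a pattern means the two operand constraint strings, the `type` attribute and the `case` numbers all have to shift together. The sketch below checks that bookkeeping for the first pattern in the patch. The strings are transcribed from its `+` lines, with the `<VSr>` spelling of the mode attribute assumed. The helper names are made up for this example and are not part of GCC.

```cpp
// Constraint and attribute strings transcribed from the "+" lines of the
// first hunk; each comma-separated field is one alternative.
constexpr const char* kOperand0 =
    "=Z,<VSr>,<VSr>,?Z,?wa,?wa,*Y,*r,*r,<VSr>,?wa,*r,v,wZ,v";
constexpr const char* kOperand1 =
    "<VSr>,Z,<VSr>,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ";
constexpr const char* kTypeAttr =
    "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,"
    "vecsimple,vecsimple,*,*,vecstore,vecload";

// Number of comma-separated alternatives in `s`.
constexpr int count_alternatives(const char* s) {
  return *s == '\0' ? 1 : (*s == ',' ? 1 : 0) + count_alternatives(s + 1);
}

// Pointer just past the next comma in `s` (caller guarantees one exists).
constexpr const char* after_comma(const char* s) {
  return *s == ',' ? s + 1 : after_comma(s + 1);
}

// Pointer to the start of alternative number `n` (0-based).
constexpr const char* alternative(const char* s, int n) {
  return n == 0 ? s : alternative(after_comma(s), n - 1);
}

// True when the alternative starting at `s` is exactly `want`.
constexpr bool alternative_is(const char* s, const char* want) {
  return *want == '\0' ? (*s == ',' || *s == '\0')
                       : (*s == *want && alternative_is(s + 1, want + 1));
}

// All three lists must describe the same number of alternatives.
static_assert(count_alternatives(kOperand0) == 15, "operand 0");
static_assert(count_alternatives(kOperand1) == 15, "operand 1");
static_assert(count_alternatives(kTypeAttr) == 15, "type attribute");

// The new alternative sits at index 11: a GPR destination with a j source,
// which is the index the patch adds to the case list that returns "#".
static_assert(alternative_is(alternative(kOperand0, 11), "*r"), "dest");
static_assert(alternative_is(alternative(kOperand1, 11), "j"), "source");
static_assert(alternative_is(alternative(kTypeAttr, 11), "*"), "type");
```

The same kind of check could be applied to the `vsx_movti_64bit` strings, where the new alternative is appended at the end rather than inserted in the middle.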


Re: Fwd: Fix std::pair std::is_copy_assignable behavior

2013-04-13 Thread Paolo Carlini

On 04/13/2013 09:21 PM, François Dumont wrote:
> Does DR 1402 resolution generalization need a Standard committee
> validation first?
In my opinion, it would be much clearer to send the C++ front-end patch
*separately*, together with a simple C++-only (no library) testcase. I
would also CC Jason.


Paolo.


[PATCH, tree-ssa] Avoid -Wuninitialized warning in try_unroll_loop_completely()

2013-04-13 Thread Chung-Ju Wu
Hi,

I noticed an uninitialized-variable warning when compiling
tree-ssa-loop-ivcanon.c.

The attached patch is a slight modification that avoids the warning;
a plain-text ChangeLog entry is below.

Is it OK for trunk?


2013-04-14  Chung-Ju Wu  jasonw...@gmail.com

* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Avoid
-Wuninitialized warning.


Best regards,
jasonwucj


gcc490-tree-ssa-loop-ivcanon.svn.patch
Description: Binary data


Re: Fix std::pair std::is_copy_assignable behavior

2013-04-13 Thread Gabriel Dos Reis
> Does DR 1402 resolution generalization need a Standard committee validation
> first?

I cannot see why we would want otherwise :-)

-- Gaby