Re: [PATCH] Add arm_cortex_m7_tune.

2014-11-04 Thread Ramana Radhakrishnan
On Tue, Oct 21, 2014 at 10:48 AM, Hale Wang hale.w...@arm.com wrote:
 Hi,

 This patch is used to tune the gcc for Cortex-M7.

 The performance of Dhrystone can be improved by 1%.
 The performance of Coremark can be improved by 2.3%.

 Patch also attached for convenience.

 Is it ok for trunk?

 Thanks and Best Regards,
 Hale Wang


Ok - For the future please include [Patch ARM] in your subject line.
Otherwise my filters won't catch these as requiring attention for the
ARM backend.

regards
Ramana

 gcc/ChangeLog
 2014-10-11  Hale Wang  hale.w...@arm.com

 * config/arm/arm.c: Add cortex-m7 tune.
 * config/arm/arm-cores.def: Use cortex-m7 tune.

 diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
 index 56ec7fd..3b34173 100644
 --- a/gcc/config/arm/arm-cores.def
 +++ b/gcc/config/arm/arm-cores.def
 @@ -149,7 +149,7 @@ ARM_CORE(cortex-r4,
 cortexr4, cortexr4,  7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f,cortexr4f, cortexr4f,
 7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5,  cortexr5, cortexr5,
 7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
 ARM_CORE(cortex-r7,  cortexr7, cortexr7,
 7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
 -ARM_CORE(cortex-m7,   cortexm7, cortexm7,
 7EM, FL_LDSCHED, v7m)
 +ARM_CORE(cortex-m7,  cortexm7, cortexm7,
 7EM, FL_LDSCHED, cortex_m7)
 ARM_CORE(cortex-m4,   cortexm4, cortexm4,
 7EM, FL_LDSCHED, v7m)
 ARM_CORE(cortex-m3,   cortexm3, cortexm3,
 7M,  FL_LDSCHED, v7m)
 ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4,
 7A,  FL_LDSCHED, 9e)
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 93b989d..834b13a 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -2003,6 +2003,27 @@ const struct tune_params arm_v7m_tune =
8
 /* Maximum insns to inline memset.  */
 };

 +/* Cortex-M7 tuning.  */
 +
 +const struct tune_params arm_cortex_m7_tune =
 +{
 +  arm_9e_rtx_costs,
 +  v7m_extra_costs,
 +  NULL,
 /* Sched adj cost.  */
 +  0,
 /* Constant limit.  */
 +  0,
 /* Max cond insns.  */
 +  ARM_PREFETCH_NOT_BENEFICIAL,
 +  true,
 /* Prefer constant pool.  */
 +  arm_cortex_m_branch_cost,
 +  false,
 /* Prefer LDRD/STRD.  */
 +  {true, true},
 /* Prefer non short circuit.  */
 +  arm_default_vec_cost,/* Vectorizer costs.  */
 +  false,/* Prefer Neon for 64-bits
 bitops.  */
 +  false, false, /* Prefer 32-bit encodings.
 */
 +  false,
 /* Prefer Neon for stringops.  */
 +  8
 /* Maximum insns to inline memset.  */
 +};
 +
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
 arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus.
 */
 const struct tune_params arm_v6m_tune =





Re: [1/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-11-04 Thread Ramana Radhakrishnan
On Tue, Oct 21, 2014 at 10:18 AM, Terry Guo terry@arm.com wrote:
 Hi There,

 This is the first patch to enable GCC generate UAL assembly code for Thumb1
 target. This new option enables user to specify which syntax is used in
 their inline assembly code.  If the inline assembly code uses UAL format,
 then gcc does nothing because gcc generates UAL code as well. If the inline
 assembly code uses non-UAL, then gcc will insert some directives in final
 assembly code. Is it ok to trunk?

 BR,
 Terry

 2014-10-21  Terry Guo  terry@arm.com

 * config/arm/arm.h (TARGET_UNIFIED_ASM): Also include thumb1.
 (ASM_APP_ON): Redefined.
 * config/arm/arm.c (arm_option_override): Thumb2 always uses UAL
 for inline assembly code.
 * config/arm/arm.opt (masm-syntax-unified): New option.
 * doc/invoke.texi (-masm-syntax-unified): Document new option.

Minor tweaks to documentation are required about ARM state, you should
also state that currently we default to divided syntax in ARM state
too.

Otherwise I'm ok with this. This deserves a news article entry - so
please consider writing something up for wwwdocs.

Ramana


Re: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

2014-11-04 Thread Ramana Radhakrishnan
On Tue, Oct 21, 2014 at 12:19 PM, Terry Guo terry@arm.com wrote:
 Hi there,

 Attached patch intends to enable GCC generate UAL format code for Thumb1
 target. Tested with regression test and no regressions. Is it OK to trunk?

This is OK -  Please don't commit it until we have sorted out patch
1/2 with the options. I'd encourage a follow-up patch to switch ARM
state also to UAL :) ...

Ramana


 BR,
 Terry

 2014-10-21  Terry Guo  terry@arm.com

* config/arm/arm.c (arm_output_mi_thunk): Use UAL for Thumb1
 target.
* config/arm/thumb1.md: Likewise.

 gcc/testsuite
 2014-10-21  Terry Guo  terry@arm.com

 * gcc.target/arm/anddi_notdi-1.c: Match with UAL format.
 * gcc.target/arm/pr40956.c: Likewise.
 * gcc.target/arm/thumb1-Os-mult.c: Likewise.
  * gcc.target/arm/thumb1-load-64bit-constant-3.c: Likewise.


Re: [libgomp, libiberty, libobjc] Fix gnu11 fallout on Solaris 10+

2014-11-04 Thread Richard Henderson
On 11/03/2014 05:22 PM, Rainer Orth wrote:
 2014-10-22  Rainer Orth  r...@cebitec.uni-bielefeld.de
 
   libobjc:
   * thr.c (_XOPEN_SOURCE): Define as 600.
 
   libiberty:
   * sigsetmask.c (_POSIX_SOURCE): Remove.
 
   libgomp:
   * config/posix/lock.c (_XOPEN_SOURCE) Define as 600.

Ok.


r~



Re: [COMMITTED][PATCH][ARM]Add ACLE 2.0 predefined marco __ARM_FEATURE_IDIV

2014-11-04 Thread Ramana Radhakrishnan
On Thu, Oct 23, 2014 at 11:30 AM, Renlin Li renlin...@arm.com wrote:
 Are you sure that the ACLE documents this with trailing underscores ?
 The copy that I have doesn't.


 You are right, it's my incaution. I have double checked, the macro should be
 __ARM_FEATURE_IDIV.

 Could you please do a obvious fix? Thank you so much!

applied this as obvious.

Ramana


2014-11-04  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix typo in definition
of __ARM_FEATURE_IDIV.



 Kind regards,
 Renlin

Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h(revision 217070)
+++ gcc/config/arm/arm.h(working copy)
@@ -166,7 +166,7 @@
if (TARGET_IDIV)\
  { \
 builtin_define (__ARM_ARCH_EXT_IDIV__);  \
-builtin_define (__ARM_FEATURE_IDIV__);   \
+builtin_define (__ARM_FEATURE_IDIV); \
  } \
 } while (0)
 


Re: [RFC PATCH, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses

2014-11-04 Thread Uros Bizjak
On Mon, Nov 3, 2014 at 10:00 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Following patch fixes PR 63538, where the data in the large data
 section was accessed through 32bit address. The patch unifies places
 where large data sections are determined and passes all declarations
 to ix86_in_large_data_p only.

 The patch fixes the testcase form the PR. Also, the code from several
 tests involving various -mlarge-data-threshold= settings looks
 consistent now.

 2014-11-03  Uros Bizjak  ubiz...@gmail.com

 PR target/63538
 * config/i386/i386.c (ix86_encode_section_info): Do not check
 TREE_STATIC and DECL_EXTERNAL when setting SYMBOL_FLAG_FAR_ADDR flag.
 (x86_64_elf_select_section): Do not check ix86_cmodel here.
 (x86_64_elf_unique_section): Ditto.

 The patch was bootstrapped and regression tested on x86_64-linux-gnu
 {,-m32}. However, the -mcmodel testcases are virtually non-existent,
 this is the reason for the RFC status of the patch.

I forgot to add the original reporter to CC ...

We probably want to avoid putting automatic variables (and STRING_CSTs
?) to large data section. However, it is important that we use the
same condition in ix86_encode_section_info and
x86_64_elf_select_section (and likes), otherwise the data can go into
large section, while the access to the data will be via limited 32bit
pointers.

The attached alternative patch rejects automatic variables and
STRING_CSTs from large data sections. Please note, that it still
access the data with the correct instructions.

Honza, can you please comment the approach from the ABI perspective -
should we reject automatic variables and STRING_CSTs from going into
large section?

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 217069)
+++ config/i386/i386.c  (working copy)
@@ -5066,6 +5066,14 @@ ix86_in_large_data_p (tree exp)
   if (TREE_CODE (exp) == FUNCTION_DECL)
 return false;
 
+  /* Strings are never large data.  */
+  if (TREE_CODE (exp) == STRING_CST)
+return false;
+
+  /* Automatic variables are never large data.  */
+  if (TREE_CODE (exp) == VAR_DECL  !is_global_var (exp))
+return false;
+
   if (TREE_CODE (exp) == VAR_DECL  DECL_SECTION_NAME (exp))
 {
   const char *section = DECL_SECTION_NAME (exp);
@@ -5099,8 +5107,7 @@ ATTRIBUTE_UNUSED static section *
 x86_64_elf_select_section (tree decl, int reloc,
   unsigned HOST_WIDE_INT align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 {
   const char *sname = NULL;
   unsigned int flags = SECTION_WRITE;
@@ -5186,8 +5193,7 @@ x86_64_elf_section_type_flags (tree decl, const ch
 static void ATTRIBUTE_UNUSED
 x86_64_elf_unique_section (tree decl, int reloc)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 {
   const char *prefix = NULL;
   /* We only need to use .gnu.linkonce if we don't have COMDAT groups.  */
@@ -44230,9 +44236,7 @@ ix86_encode_section_info (tree decl, rtx rtl, int
 {
   default_encode_section_info (decl, rtl, first);
 
-  if (TREE_CODE (decl) == VAR_DECL
-   (TREE_STATIC (decl) || DECL_EXTERNAL (decl))
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FAR_ADDR;
 }
 


Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Paolo Carlini
.. thanks a lot Jon! (after all this parallel mode is still useful for 
something ;)


Paolo.


Re: [AArch64] Fix predicate and constraint mismatch in logical atomic operations

2014-11-04 Thread Marcus Shawcroft
On 25 September 2014 04:45, Michael Collison
michael.colli...@linaro.org wrote:
 On certain patterns in atomics.md the constraint 'n' is used in combination
 with the predicate atomic_op_operand. The constraint is too general and
 allows constants that are disallowed by the predicate. This causes an ICE In
 final_scan_insn when the insn cannot be split because the constraint and
 predicate do not match.

 Tested on aarch64-none-elf, aarch64-linux-gnu. Additionally the originally
 reporter of the bug, (d...@ubuntu.com), applied the patch and successfully
 bootstrapped and tested with no regressions.

 2014-09-23  Michael Collison michael.colli...@linaro.org

 * config/aarch64/iterators.md (lconst_atomic): New mode attribute to
 support constraints for CONST_INT in atomic operations.
 * config/aarch64/atomics.md
 (atomic_atomic_optabmode): Use lconst_atomic constraint.
 (atomic_nandmode): Likewise.
 (atomic_fetch_atomic_optabmode): Likewise.
 (atomic_fetch_nandmode): Likewise.
 (atomic_atomic_optab_fetchmode): Likewise.
 (atomic_nand_fetchmode): Likewise.

OK Thanks.  /Marcus


Re: [AArch64] Make gentune.sh also generate generic_sched attribute

2014-11-04 Thread Marcus Shawcroft
On 25 September 2014 14:37, James Greenhalgh james.greenha...@arm.com wrote:

 Hi,

 This patch fixes an annoying gotcha when adding new cores or piepline
 models in builds for AArch64. The generic_sched attribute also needs
 updating in addition to aarch64-tune.md.

 I see no good reason for this, we can generate that attribute in
 gentune.sh quite easily.

 For testing, I built an aarch64-none-elf toolchain with no issues.

 OK?

 Thanks,
 James

 ---
 2014-09-25  James Greenhalgh  james.greenha...@arm.com

 * config/aarch64/aarch64.md (generic_sched): Don't define here.
 * config/aarch64/gentune.sh: Also generate generic_sched attribute.
 * config/aarch64/aarch64-tune.md: Regenerate.

OK /Marcus


Re: [PATCH][AArch64] Fix PR63293

2014-11-04 Thread Marcus Shawcroft
On 25 September 2014 17:32, Jiong Wang jiong.w...@arm.com wrote:


 patch updated, please review.


 2014-09-25  Jiong Wang  jiong.w...@arm.com
 2014-09-25  Wilco Dijkstra  wilco.dijks...@arm.com

 gcc/
   PR target/63293
   * config/aarch64/aarch64.c (aarch64_expand_epiloue): Add barriers before
 stack adjustment.

OK /Marcus


[PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)

2014-11-04 Thread Jan-Benedict Glaw
On Sat, 2014-10-18 12:54:33 +0200, Oleg Endo oleg.e...@t-online.de wrote:
 Hi,
 
 As discussed in the PR, this adds two new SH built-in functions
 __builtin_sh_get_fpscr __builtin_sh_set_fpscr.  Tested on r216173 with  
 
 make -k check RUNTESTFLAGS=--target_board=sh-sim\{-m4/-ml,-m4/-mb}
 
 and no new failures.  Committed as r216424.
 
 Cheers,
 Oleg
 
 gcc/ChangeLog:
   PR target/53513
   * config/sh/sh-modes.def (PSI): Remove.
   * config/sh/sh-protos.h (get_fpscr_rtx): Remove.
   * config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove.
   (sh_reorg): Remove commented out FPSCR code.
   (fpscr_set_from_mem): Use SImode instead of PSImode.  Emit lds_fpscr
   insn instead of move insn.
   (sh_hard_regno_mode_ok): Return SImode for FPSCR.
   (sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode
   handling.
   (sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns.
   (sh1_builtin_p): Uncomment.
   (SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros.
   (bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr.

Change to emit_fpu_switch() is missing here.

 Index: gcc/config/sh/sh.c
 ===
 --- gcc/config/sh/sh.c(revision 216350)
 +++ gcc/config/sh/sh.c(working copy)

This chunk is in emit_fpu_switch(), which drops every reference to
`dst':
 @@ -10055,13 +10032,12 @@
emit_move_insn (scratch, XEXP (src, 0));
if (index != 0)
   emit_insn (gen_addsi3 (scratch, scratch, GEN_INT (index * 4)));
 -  src = adjust_automodify_address (src, PSImode, scratch, index * 4);
 +  src = adjust_automodify_address (src, SImode, scratch, index * 4);
  }
else
 -src = adjust_address (src, PSImode, index * 4);
 +src = adjust_address (src, SImode, index * 4);
  
 -  dst = get_fpscr_rtx ();
 -  emit_move_insn (dst, src);
 +  emit_insn (gen_lds_fpscr (src));
  }
  
  static rtx get_free_reg (HARD_REG_SET);

...which in turn results in fall-out when simply building it with
config-list.mk, see eg.
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=372472 :

[...]
g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common 
 -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. 
-I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include 
-I/opt/cfarm/mpc/include  -I../../../gcc/gcc/../libdecnumber 
-I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../../gcc/gcc/../libbacktrace-o sh.o -MT sh.o -MMD -MP -MF 
./.deps/sh.TPo ../../../gcc/gcc/config/sh/sh.c
../../../gcc/gcc/config/sh/sh.c: In function ‘void emit_fpu_switch(rtx, int)’:
../../../gcc/gcc/config/sh/sh.c:10027:7: error: unused variable ‘dst’ 
[-Werror=unused-variable]
   rtx dst, src;
   ^
cc1plus: all warnings being treated as errors
make[2]: *** [sh.o] Error 1

This should fix it, ok for mainline?

2014-11-04  Jan-Benedict Glaw  jbg...@lug-owl.de

* config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable.

 
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 3a4f5e9..be944da 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -10024,7 +10024,7 @@ static GTY(()) tree fpscr_values;
 static void
 emit_fpu_switch (rtx scratch, int index)
 {
-  rtx dst, src;
+  rtx src;
 
   if (fpscr_values == NULL)
 {



MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
the second  :


signature.asc
Description: Digital signature


Re: [PATCH][ARM] Fix names of some rounding intrinsics, impement vrndx_f32 and vrndxq_f32

2014-11-04 Thread Kyrill Tkachov

Phew,

This one slipped through the cracks. Ping?
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01981.html

Thanks,
Kyrill

On 23/09/14 16:25, Kyrill Tkachov wrote:

On 23/09/14 16:07, Kyrill Tkachov wrote:

Hi all,

Some intrinsics had the wrong name (inconsistent with the NEON
intrinsics spec). This patch fixes that and adds the vrndx_f32 and
vrndxq_f32 intrinsics that were missing.

For reference, the NEON intrinsics spec can be found at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

Kyrill


These map down to vrintx.f32 NEON instructions (d and q forms). We
already had builtins defined for them, just the intrinsics were not
wired up to them properly.

Tested arm-none-eabi

Ok for trunk?

2014-09-23  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * config/arm/arm_neon.h (vrndqn_f32): Rename to...
   (vrndnq_f32): ... this.
   (vrndqa_f32): Rename to...
   (vrndaq_f32): ... this.
   (vrndqp_f32): Rename to...
   (vrndpq_f32): ... this.
   (vrndqm_f32): Rename to...
   (vrndmq_f32): ... this.
   (vrndx_f32): New intrinsic.
   (vrndxq_f32): Likewise.

2014-09-23  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * gcc.target/arm/simd/neon-vrndx_f32_1.c: New test.
   * gcc.target/arm/simd/neon-vrndxq_f32_1.c: Likewise.
   * gcc.target/arm/neon/vrndqaf32.c: Rename to...
   * gcc.target/arm/neon/vrndaqf32.c: ... This. Update intrinsic names.
   * gcc.target/arm/neon/vrndqmf32.c: Rename to...
   * gcc.target/arm/neon/vrndmqf32.c: ... This. Update intrinsic names.
   * gcc.target/arm/neon/vrndqnf32.c: Rename to...
   * gcc.target/arm/neon/vrndnqf32.c: ... This. Update intrinsic names.
   * gcc.target/arm/neon/vrndqpf32.c: Rename to...
   * gcc.target/arm/neon/vrndpqf32.c: ... This. Update intrinsic names.








Re: [PATCH][AArch64] LR register not used in leaf functions

2014-11-04 Thread Jiong Wang


On 01/10/14 09:00, Kugan wrote:

On 01/10/14 01:00, Jiong Wang wrote:

On 27/09/14 22:20, Kugan wrote:

On 23/09/14 01:58, Jiong Wang wrote:

On 22/09/14 16:43, Kugan wrote:


AArch64 has the same issue ARM had where the LR register was not
used in
leaf functions. This was reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this
test-case need to be added with more live ranges for the need for the
LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for
the case AArch64 to see this.

The same fix (from the thread
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went
into ARM should apply to AArch64 as well. Regression tested on qemu for
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?

This still be a partial fix. LR should be a caller-saved register free
to use in case it's saved properly to across function call.

Indeed. This should be improved from the generic code. Right now, if a
hard register is used in EPILOGUE_USES, it conflicts with all the live
ranges till a call site kills.  I think we should have this patch till
the generic code can be improved.

below is my local patch. LR is treated as free register, and strictly
following AArch64 ABI, frame should always be created, FP maintained
properly if LR clobbered under -fno-omit-frame-pointer.

Thanks Jiong. Sorry I missed your point. As for the additions in your patch:


I am really sorry, just noticed the reply...


/* ... and any callee saved register that dataflow says is live.  */
for (regno = R0_REGNUM; regno = R30_REGNUM; regno++)
  if (df_regs_ever_live_p (regno)
-!call_used_regs[regno])
+(regno == R30_REGNUM
+   || !call_used_regs[regno]))
cfun-machine-frame.reg_offset[regno] = SLOT_REQUIRED;

AArch64 CALL_USED_REGISTERS defines R30_REGNUM to be zero. Therefore
shouldn’t this be redundant?


that's because the patch also modified r30 to be caller saved.


-0, 0, 0, 0,   0, 1, 0, 1,  /* R24 - R30, SP */ \
+0, 0, 0, 0,   0, 1, 1, 1,  /* R24 - R30, SP */ \

thus, we don't need ad-hoc code in pro/epi to save/restore LR.

on the other hand, we want LR to be a callee-saved register that it could be
used as free register is some scenario.

you can't compile the testcase lr_free_2.c for details.


  }
+  else
+{
+  /* If we decided that we didn't need a leaf frame pointer but
then used
+LR in the function, then we'll want a frame pointer after all, so
+prevent this elimination to ensure a frame pointer is used.  */
+  if (to == STACK_POINTER_REGNUM
+  flag_omit_leaf_frame_pointer
+  df_regs_ever_live_p (LR_REGNUM))
+   return false;
+}

aarch64_frame_pointer_required makes frame pointer needed when
flag_omit_leaf_frame_pointer and df_regs_ever_live_p (LR_REGNUM).
Is this addition still needed?


it's needed.

the problem is the df_reg_ever_live_p (LR_REGNUM) in 
aarch64_frame_pointer_required

  if (flag_omit_leaf_frame_pointer
   (!crtl-is_leaf || df_regs_ever_live_p (LR_REGNUM)))

is invoked before register allocation that it can only catch those LR_REGNUM 
alive scenarios produced
by inline assembly clobber, for example __asm__ (:::x30), written by user 
explicitly in code, while
it can't catch the scenarios that LR_REGNUM allocated by register allocator.

so, we need to add another check in aarch64_can_eliminate which is invoked 
after register allocation to
catch those scenarios where LR allocated by register allocator.

thanks.



Thanks,
Kugan






Re: [PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab

2014-11-04 Thread Ramana Radhakrishnan
On Fri, Oct 24, 2014 at 1:01 PM, Alan Lawrence alan.lawre...@arm.com wrote:
 Similarly to last patch.

 Tested, in combination with previous patch:
 bootstrap on arm-none-linux-gnueabihf
 cross-tested check-gcc on arm-none-eabi.

 gcc/ChangeLog:

 config/arm/neon.md (reduc_smin_mode *2): Rename to...
 (reduc_smin_scal_mode *2): ...this; extract scalar result.
 (reduc_smax_mode *2): Rename to...
 (reduc_smax_scal_mode *2): ...this; extract scalar result.
 (reduc_umin_mode *2): Rename to...
 (reduc_umin_scal_mode *2): ...this; extract scalar result.
 (reduc_umax_mode *2): Rename to...
 (reduc_umax_scal_mode *2): ...this; extract scalar result.

Ok.

Ramana


Re: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.

2014-11-04 Thread Marcus Shawcroft
On 29 October 2014 10:34, Tejas Belagod tejas.bela...@arm.com wrote:
 On 10/10/14 15:48, David Sherwood wrote:

 I have a fix (originally written by Tejas Belagod) for the following bug:

In which case you should add his name along side yours in the ChangeLog entry...

 ChangeLog:

  gcc/:
  2014-10-10  David Sherwood  david.sherw...@arm.com

  * config/aarch64/aarch64-protos.h
 (aarch64_simd_attr_length_rglist,
  aarch64_reverse_mask): New decls.
  * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum.
  * config/aarch64/iterators.md (insn_count): New mode_attr.
  * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i,
  vec_load_lanes(o/c/x)i): Fixed to work for Big Endian.
  * config/aarch64/aarch64-simd.md (aarch64_rev_reglist,
  aarch64_simd_(ld/st)(2/3/4)): Added.
  * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist,
  aarch64_reverse_mask): Added.


 +  RTVEC_ELT (v, (i * usize + j)) = GEN_INT (((i+1) * usize) - 1 - j);


 s/i+1/i + 1. Remove extra parentheses.

 Tejas.



OK with the fix up requested by Tejas.

/Marcus


Re: [AArch64] [BE] Fix vector load/stores to not use ld1/st1

2014-11-04 Thread Marcus Shawcroft
On 10 October 2014 16:19, Alan Hayward alan.hayw...@arm.com wrote:
 This patch is dependant on [AArch64] [BE] [1/2] Make large opaque integer
 modes endianness-safe.”

 It fixes up movoi/ci/xi for Big Endian, so that we end up with the lsb of
 a big-endian integer to be in the low byte of the highest-numbered
 register.

 movoi uses stp/ldp
 movxi needs a split version (which is shared with register-register
 movxi), which just splits to two new movs
 movci can then split to three movs. A future patch will instead split to
 an movoi and a movti.


 There are no changes for LE.

 Ran whole of check with both parts of Make large opaque integer modes
 endianness-safe”. No regressions.


 ChangeLog:

 gcc/:
 2014-10-10  Alan Hayward  alan.hayw...@arm.com
 * config/aarch64/aarch64.c
 (aarch64_classify_address): Allow extra addressing modes for BE.
 (aarch64_print_operand): new operand for printing a q register+1.
 (aarch64_simd_emit_reg_reg_move): replacement for
 aarch64_simd_disambiguate_copy that plants the required mov.
 * config/aarch64/aarch64-protos.h
 (aarch64_simd_emit_reg_reg_move): replacement for
 aarch64_simd_disambiguate_copy.
 * config/aarch64/aarch64-simd.md
 (define_split) Use new aarch64_simd_emit_reg_reg_move.
 (define_expand movmode) less restrictive predicates.
 (define_insn *aarch64_movmode) Simplify and only allow for LE.
 (define_insn *aarch64_be_movoi) New.  BE only.  Plant ldp or stp.
 (define_insn *aarch64_be_movci) New.  BE only.  No instructions.
 (define_insn *aarch64_be_movxi) New.  BE only.  No instructions.
 (define_split) OI mov.  Use new aarch64_simd_emit_reg_reg_move.
 (define_split) CI mov.  Use new aarch64_simd_emit_reg_reg_move.
 On BE
 plant movs for reg to/from mem case.
 (define_split) XI mov.  Use new aarch64_simd_emit_reg_reg_move.
 On BE
 plant movs for reg to/from mem case.


OK /Marcus


Re: Recent go changes broke alpha bootstrap

2014-11-04 Thread Uros Bizjak
On Tue, Nov 4, 2014 at 1:00 AM, Ian Taylor i...@golang.org wrote:
 On Mon, Nov 3, 2014 at 9:02 AM, Dominik Vogt v...@linux.vnet.ibm.com wrote:
 On Thu, Oct 30, 2014 at 08:05:14AM -0700, Ian Taylor wrote:
 On Thu, Oct 30, 2014 at 5:46 AM, Dominik Vogt v...@linux.vnet.ibm.com 
 wrote:
  I'm not quite sure about the best approach.  The attempt to
  represent C unions in the right way is ultimately futile as Go
  does not have the types necessary for it.  For example, the
  handling of anonymous bit fields will never be right as it's
  undefinied.  On the other hand I could fix the issue at hand by
  changing the way anonymous unions are represented in Go.
 
  Example:
 
struct { int8_t x; union { int16_t y; int 32_t z; }; };
 
  Was represented (before the patch) as
 
struct { X byte; int16 Y; }
 
  which had size 4, alignment 2 and y at offset 2 but should had
  have size 8, alignment 4 and y at offset 4.  With the current
  patch the Go layout is
 
struct { X byte; artificial_name struct { y [2]byte; align [0]int32; } }
 
  with the proper size, alignment and offset, but y is addressed as
  .artificial_name.y insted of just .y, and y is a byte array
  and not an int16.
 
  I could remove the artificial_name struct and add padding before
  and after y instead:
 
struct { X byte; pad_0 [3]byte; Y int16; pad_1 [2]byte; align [0]int32; 
  }
 
  What do you think?

 Sounds good to me.  Basically the fields of the anonymous union should
 be promoted to become fields of the struct.  We can't do it in
 general, but we can do it for the first field.  That addresses the
 actual uses of anonymous unions.

 The attached patch fixes this, at least if the first element of the
 union is not a bitfield.

 Bitfields can really not be represented properly in Go (think about
 constructs like struct { int : 1; int bf : 1; }), I'd rather not
 try to represent them in a predictable way.  The patched code may
 or may not give them a name, and reserves the proper amount of
 padding where required in structs.  If a union begins with an
 anonymous bitfield (which makes no sense), that is ignored.  If a
 union begins with a named bitfield (possibly after unnamed ones),
 this may or may not be used as the (sole) element of the Go
 struct.


 Thanks.  I committed your patch.

I have checked that the patch fixes alpha bootstrap with libgo.

Thanks,
Uros.


Re: [AArch64] [BE] [2/2] Make large opaque integer modes endianness-safe.

2014-11-04 Thread Marcus Shawcroft
On 13 October 2014 11:01, David Sherwood david.sherw...@arm.com wrote:
 Hi,

 This is the second patch of the work to fix:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810

 and removes the CANNOT_CHANGE_MODE_CLASS macro, which now permits subregs of
 vector registers to work correctly on aarch64_be.

 NOTE: This patch depends upon the following:
 [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
 [AArch64] [BE] Fix vector load/stores to not use ld1/st1

 Thanks,
 David.

 ChangeLog:

 gcc/:
 2014-13-10  David Sherwood  david.sherw...@arm.com

 * config/aarch64/aarch64.h (CLEAR_INSN_CACHE): Removed.

CANNOT_CHANGE_MODE_CLASS rather than CLEAR_INSN_CACHE.


 * config/aarch64/aarch64.c (aarch64_cannot_change_mode_class): 
 Removed.
 * config/aarch64/aarch64-protos.h (aarch64_cannot_change_mode_class):
 Removed.

The first line of the ChangeLog entry should also reference the PR
number it resolves.

OK with corrected ChangeLog entry.  Thanks /Marcus


[match-and-simplify] Merge from trunk

2014-11-04 Thread Richard Biener

2014-11-04  Richard Biener  rguent...@suse.de

Merge from trunk r216941 through r217074.

Brings back next merge piece.



Re: [gomp4] acc reductions with multiple variables

2014-11-04 Thread Thomas Schwinge
Hi!

On Fri, 31 Oct 2014 16:35:27 -0700, Cesar Philippidis ce...@codesourcery.com 
wrote:
 This patch also resolves an issue when reductions are preformed on the
 host, i.e. ACC_DEVICE_TYPE=host.

Additional cleanup has now been possible; r217078:

commit a10c5e14ec563fffa45d24366f95f8cba62af4fd
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 4 11:50:58 2014 +

libgomp testsuite: Remove special-casing for acc_device_host_nonshm.

libgomp/
* testsuite/libgomp.oacc-c/reduction-initial-1.c
[ACC_DEVICE_TYPE_host_nonshm]: Remove special-casing for N.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217078 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 5 +
 libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c | 7 +--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 35322f0..c70850b 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-11-04  Thomas Schwinge  tho...@codesourcery.com
+
+   * testsuite/libgomp.oacc-c/reduction-initial-1.c
+   [ACC_DEVICE_TYPE_host_nonshm]: Remove special-casing for N.
+
 2014-11-03  Cesar Philippidis  ce...@codesourcery.com
Thomas Schwinge  tho...@codesourcery.com
 
diff --git libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c 
libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
index 0f66a39..81cf865 100644
--- libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
+++ libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c
@@ -4,12 +4,7 @@ int
 main(void)
 {
 #define I 5
-/* TODO */
-#ifdef ACC_DEVICE_TYPE_host_nonshm
-# define N 1
-#else
-# define N 11
-#endif
+#define N 11
 #define A 8
 
   int a = A;


Grüße,
 Thomas


pgpKmWaICt1ll.pgp
Description: PGP signature


Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version

2014-11-04 Thread Joern Rennecke
On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote:

 While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is
 unused, so clean that up too.

That's not a clean-up.  This pertains to PR 39350.
Which, incidentally, this hookization completely ignores, entrenching
the conflation of
move expander and move cost estimates.
Thus, can_move_by_pieces gives the wrong result for purposes of rtl
optimizations
when a target-specific movmem etc expander emits target-specific code.
The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt
shows a number of call sites that are affected.

 arc only implements MOVE_BY_PIECES_P, wiring it to false. Mirror that
 behaviour, and use the default hook for other by_pieces operations.

 I tried building a compiler but no amount of fiddling with target
 strings got me to a sensible result, so this patch is completely
 untested.

You could just pick one of the configs in contrib/config-list.mk


Re: [PATCH][PPC] Skip gcc.target tests with conflicting -mcpu

2014-11-04 Thread Andrew Stubbs

On 03/11/14 18:36, Mike Stump wrote:

On Oct 30, 2014, at 10:25 AM, Andrew Stubbs a...@codesourcery.com wrote:

Many of the tests in gcc.target/powerpc specify an explicit -mcpu option with 
dg-options. This is a problem for multilib configurations that use -mcpu in 
their definition



OK to commit?


Given the discussion, I think the patch as is is fine.  Thanks.


Committed, thanks.

Andrew


Re: libgomp testsuite: (not) using a specific driver for C++, Fortran?

2014-11-04 Thread Thomas Schwinge
Hi!

On Wed, 15 Oct 2014 17:46:48 +0200, I wrote:
 No matter whether it's C, C++, or Fortran source code, the libgomp
 testsuite always uses (for build-tree testing) gcc/xgcc, or (for
 installed testing) GCC_UNDER_TEST.  It doesn't make use of
 GXX_UNDER_TEST, GFORTRAN_UNDER_TEST.  To support the latter two
 languages' needs, some -l[...] flags are then added via lang_link_flags.
 For example, for Fortran this is -lgfortran.  This is, however, not what
 would happen if using the gfortran driver to build (which is what a user
 would be doing -- which we should replicate as much as possible at least
 for installed testing): the gfortran driver also adds -lquadmath, if
 applicable.
 
 Now, I wonder why to re-invent all that in the libgomp testsuite, if the
 respective driver already has that knowledge, via spec files, for
 example?  (Also, the regular GCC compiler tests, gcc/testsuite/, are
 doing the right thing.)  Why is libgomp testsuite implemented this way --
 just a legacy of the past, or is there a need for that (that I'm not
 seeing)?
 
 Maybe the question also is: why isn't the libgomp testsuite using more of
 the infrastructure for specific languages, that is already implemented in
 gcc/testsuite/lib/?  (That is, why does libgomp have to use
 libgomp_target_compile, instead of [language]_target_compile, for
 example.)

(I decided not to look into that latter idea, at the moment.)

 And maybe that problem also applied to additional target libraries'
 testsuites; I have not yet looked.

(It does, but I'm not addressing that with the following patches.)

 Anyway, here is a prototype patch to describe how I began to address this
 for the issue I stumbled upon, which is that the linker complained:
 
 Executing on host: x86_64-none-linux-gnu-gcc 
 [...]/libgomp/testsuite/libgomp.fortran/aligned1.f03 [...] -fopenmp   -O0  
 -fopenmp -fcray-pointer   -lgfortran -lm   -o ./aligned1.exe(timeout = 
 300)
 [...]/ld: warning: libquadmath.so.0, needed by [...]/libgfortran.so, not 
 found (try using -rpath or -rpath-link)
 [...]/libgfortran.so: undefined reference to `logq@QUADMATH_1.0'
 [...]
 
 (That goes away if I add -lquadmath to the command line, but that's not
 the point I'm making here.)
 
 Am I on the right track with the following?

Nobody commented, which also means nobody disagreed -- so, here are first
a bunch of cleanup patches, refactoring, and then a patch to enable usage
of GXX_UNDER_TEST, GFORTRAN_UNDER_TEST for installed testing.  OK to
commit all that to trunk?  I tested (that is, diffed the libgomp.log
file) each step incrementally, both in non-installed and in installed
testing scenarios.

commit 6229e75038b47a09638454a812fb9eff5f31d761
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Mon Nov 3 09:58:38 2014 +0100

libgomp testsuite: Only use blddir if set.

libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Only use blddir if
set.
* testsuite/libgomp.c++/c++.exp: Likewise.

(It is unclear to me why the current working directory needs to be in
LD_LIBRARY_PATH.)
---
 libgomp/testsuite/lib/libgomp.exp | 5 +++--
 libgomp/testsuite/libgomp.c++/c++.exp | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 094e5ed..4234d4f 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -53,7 +53,6 @@ proc libgomp_init { args } {
 global srcdir blddir objdir tool_root_dir
 global libgomp_initialized
 global tmpdir
-global blddir
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
 global CFLAGS
@@ -105,7 +104,7 @@ proc libgomp_init { args } {
 }
 
 # Compute what needs to be put into LD_LIBRARY_PATH
-set always_ld_library_path .:${blddir}/.libs
+set always_ld_library_path .
 
 # Compute what needs to be added to the existing LD_LIBRARY_PATH.
 if {$gccdir != } {
@@ -139,6 +138,8 @@ proc libgomp_init { args } {
 lappend ALWAYS_CFLAGS additional_flags=-B${blddir}/.libs
 lappend ALWAYS_CFLAGS additional_flags=-I${blddir}
 lappend ALWAYS_CFLAGS ldflags=-L${blddir}/.libs
+
+   append always_ld_library_path :${blddir}/.libs
 }
 lappend ALWAYS_CFLAGS additional_flags=-I${srcdir}/..
 
diff --git libgomp/testsuite/libgomp.c++/c++.exp 
libgomp/testsuite/libgomp.c++/c++.exp
index a9cf41a..c2288c8 100644
--- libgomp/testsuite/libgomp.c++/c++.exp
+++ libgomp/testsuite/libgomp.c++/c++.exp
@@ -53,7 +53,8 @@ if { $lang_test_file_found } {
 set_ld_library_path_env_vars
 
 set flags_file ${blddir}/../libstdc++-v3/scripts/testsuite_flags
-if { [file exists $flags_file] } {
+if { $blddir !=  \
+ [file exists $flags_file] } {
set libstdcxx_includes [exec sh $flags_file --build-includes]
 } else {
set libstdcxx_includes 

commit 6d8f949c98e04966e6757c80094a63ce5eb2b334
Author: Thomas Schwinge 

[PATCH 1/4] Gccgo port to s390[x] -- part II

2014-11-04 Thread Dominik Vogt
See commit comment and ChangeLog for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
ChangeLog
2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/go/syscall/libcall_linux_s390.go: New file for s390 support.
* libgo/go/syscall/syscall_linux_s390.go: Ditto.
* libgo/go/syscall/libcall_linux_s390x.go: New file for s390x support.
* libgo/go/syscall/syscall_linux_s390x.go: Ditto.
* libgo/go/runtime/pprof/pprof.go (printStackRecord): Support s390 and
s390x.
* libgo/runtime/runtime.c (runtime_cputicks): Add support for s390 and
s390x
* libgo/mksysinfo.sh: Ditto.
(upcase_fields): New helper function

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/go/debug/elf/file.go (applyRelocations): Implement relocations
on s390x.
(applyRelocationsS390x): Ditto.
(DWARF): Ditto.
* libgo/go/debug/elf/elf.go (R_390): New constants for S390 relocations.
(r390Strings): Ditto.
(String): Helper function for S390 relocations.
(GoString): Ditto.

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/go/reflect/makefuncgo_s390.go: New file.
(S390MakeFuncStubGo): Implementation of s390 abi.
* libgo/go/reflect/makefuncgo_s390x.go: New file.
(S390xMakeFuncStubGo): Implementation of s390x abi.
* libgo/go/reflect/makefunc_s390.c: New file.
(makeFuncStub): s390 and s390x specific implementation of function.
* libgo/go/reflect/makefunc.go
(MakeFunc): Add support for s390 and s390x.
(makeMethodValue): Ditto.
(makeValueMethod): Ditto.
* libgo/Makefile.am (go_reflect_makefunc_s_file): Ditto.
(go_reflect_makefunc_file): Ditto.
* libgo/go/reflect/makefunc_dummy.c: Ditto.
* libgo/runtime/runtime.h (__go_makefunc_can_recover): Export prototype
for use in makefunc_s390.c.
(__go_makefunc_returning): Ditto.

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/go/syscall/exec_linux.go (forkAndExecInChild): Fix order of the
arguments of the clone system call for s390[x].

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/configure.ac (is_s390): New variable.
(is_s390x): Ditto
(LIBGO_IS_S390): Ditto.
(LIBGO_IS_S390X): Ditto.
(GOARCH): Support s390 and s390x.
* libgo/go/go/build/build.go (cgoEnabled): Ditto.
* libgo/go/go/build/syslist.go (goarchList): Ditto.
From 27c07fd6b61dfe86169e42ffbd63d24db31aea6c Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Tue, 4 Nov 2014 10:13:26 +0100
Subject: [PATCH 1/4] libgo: Port to s390[x].

---
 libgo/Makefile.am   |  15 ++
 libgo/configure.ac  |  16 ++
 libgo/go/debug/elf/elf.go   |  66 +
 libgo/go/debug/elf/file.go  |  45 +++-
 libgo/go/go/build/build.go  |   2 +
 libgo/go/go/build/syslist.go|   2 +-
 libgo/go/reflect/makefunc.go|   2 +
 libgo/go/reflect/makefunc_s390.c|  92 +++
 libgo/go/reflect/makefuncgo_s390.go | 454 
 libgo/go/reflect/makefuncgo_s390x.go| 436 ++
 libgo/go/runtime/pprof/pprof.go |   5 +
 libgo/go/syscall/exec_linux.go  |   7 +-
 libgo/go/syscall/libcall_linux_s390.go  |   7 +
 libgo/go/syscall/libcall_linux_s390x.go |   7 +
 libgo/go/syscall/syscall_linux_s390.go  |  21 ++
 libgo/go/syscall/syscall_linux_s390x.go |  21 ++
 libgo/mksysinfo.sh  |  53 +++-
 libgo/runtime/runtime.c |   8 +
 18 files changed, 1248 insertions(+), 11 deletions(-)
 create mode 100644 libgo/go/reflect/makefunc_s390.c
 create mode 100644 libgo/go/reflect/makefuncgo_s390.go
 create mode 100644 libgo/go/reflect/makefuncgo_s390x.go
 create mode 100644 libgo/go/syscall/libcall_linux_s390.go
 create mode 100644 libgo/go/syscall/libcall_linux_s390x.go
 create mode 100644 libgo/go/syscall/syscall_linux_s390.go
 create mode 100644 libgo/go/syscall/syscall_linux_s390x.go

diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 3a42250..79cfdd8 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -943,11 +943,26 @@ go_reflect_makefunc_file = \
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_386.S
 else
+if LIBGO_IS_S390
+go_reflect_makefunc_file = \
+	go/reflect/makefuncgo_s390.go
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_s390.c
+else
+if LIBGO_IS_S390X
+go_reflect_makefunc_file = \
+	go/reflect/makefuncgo_s390x.go \
+	go/reflect/makefuncgo_s390.go
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_s390.c
+else
 go_reflect_makefunc_file =
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_dummy.c
 endif
 endif
+endif
+endif
 
 go_reflect_files = \
 	go/reflect/deepequal.go \
diff --git a/libgo/configure.ac b/libgo/configure.ac
index 0469b89..d651827 

[PATCH 0/4] Gccgo port to s390[x] -- part II

2014-11-04 Thread Dominik Vogt
This is the second set of patches to support s390[x] with Gccgo,
containing mostly architecture dependent parts that affect the
following directories:

  gcc/testsuite/go.test (golang)
  gcc/testsuite/go.test/go-test.exp (Gcc?)
  libgo (Gcc?)
  libgo/go  (gofrontend)
  libgo/runtime (gofrontend)

It seemed infeasible to split this patch series and send the
patches only to the right places.

All patches are required for proper s390[x] support.  With this
series, s390[x] support is basically complete.  However, I have
another patch that adds complex type support in the reflection
library, but that requires the latest libffi code.  I'll look into
merging that back into gcc next.

Summary of this series:
---

0001: Ports libgo to s390[x].
0002: Adapts some tests in libgo but also fixes some bugs in the
  tests.
0003: Allow //+ build for the go tests in Gcc.
0004: s390[x] specific fixes for some tests in golang.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH 2/4] Gccgo port to s390[x] -- part II

2014-11-04 Thread Dominik Vogt
See commit comment and ChangeLog for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
ChangeLog
2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* libgo/go/sync/atomic/atomic_test.go: Use LoadInt32() to access sync
variables.
* libgo/go/math/all_test.go (tolerance): Fix bug in calculation.
(alike): Less strict precision requirements.
(TestLog2): Likewise.
From f62844a6e4a0f76c30fe3f1cffc0933f21bbf43d Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Tue, 4 Nov 2014 10:13:16 +0100
Subject: [PATCH 2/4] libgo: Test fixes for s390[x].

1) libgo/math: Fix TestLog2 failures.

Fixes necessary to successfully run the test on s390x.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63269 for details.

2) libgo/sync/atomic: Fix tests for sync_atomic.CompareAndSwap...().

*addr must not be accessed directly as the compiler can optimise the access to
memory away, or make mulitple accesses (which it actually does on s390x). As Go
does have no way to express that the memory is volatile, read it with the
sync_atomic.Load...() function family instead.
---
 libgo/go/math/all_test.go   |  9 ++---
 libgo/go/sync/atomic/atomic_test.go | 16 
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/libgo/go/math/all_test.go b/libgo/go/math/all_test.go
index 0d8b10f..0d19e14 100644
--- a/libgo/go/math/all_test.go
+++ b/libgo/go/math/all_test.go
@@ -1671,8 +1671,8 @@ func tolerance(a, b, e float64) bool {
 		d = -d
 	}
 
-	if a != 0 {
-		e = e * a
+	if b != 0 {
+		e = e * b
 		if e  0 {
 			e = -e
 		}
@@ -1687,6 +1687,9 @@ func alike(a, b float64) bool {
 	switch {
 	case IsNaN(a)  IsNaN(b):
 		return true
+	case a == 0  !IsNaN(b)  !IsInf(b, 0):
+		// allow deviations when the expected value is zero
+		return true
 	case a == b:
 		return Signbit(a) == Signbit(b)
 	}
@@ -2284,7 +2287,7 @@ func TestLog2(t *testing.T) {
 	for i := -1074; i = 1023; i++ {
 		f := Ldexp(1, i)
 		l := Log2(f)
-		if l != float64(i) {
+		if !veryclose(l, float64(i)) {
 			t.Errorf(Log2(2**%d) = %g, want %d, i, l, i)
 		}
 	}
diff --git a/libgo/go/sync/atomic/atomic_test.go b/libgo/go/sync/atomic/atomic_test.go
index d2af4f4..eaa3b6b 100644
--- a/libgo/go/sync/atomic/atomic_test.go
+++ b/libgo/go/sync/atomic/atomic_test.go
@@ -858,7 +858,7 @@ func hammerCompareAndSwapInt32(uaddr *uint32, count int) {
 	addr := (*int32)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadInt32(addr)
 			if CompareAndSwapInt32(addr, v, v+1) {
 break
 			}
@@ -869,7 +869,7 @@ func hammerCompareAndSwapInt32(uaddr *uint32, count int) {
 func hammerCompareAndSwapUint32(addr *uint32, count int) {
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadUint32(addr)
 			if CompareAndSwapUint32(addr, v, v+1) {
 break
 			}
@@ -883,7 +883,7 @@ func hammerCompareAndSwapUintptr32(uaddr *uint32, count int) {
 	addr := (*uintptr)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadUintptr(addr)
 			if CompareAndSwapUintptr(addr, v, v+1) {
 break
 			}
@@ -897,7 +897,7 @@ func hammerCompareAndSwapPointer32(uaddr *uint32, count int) {
 	addr := (*unsafe.Pointer)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadPointer(addr)
 			if CompareAndSwapPointer(addr, v, unsafe.Pointer(uintptr(v)+1)) {
 break
 			}
@@ -1039,7 +1039,7 @@ func hammerCompareAndSwapInt64(uaddr *uint64, count int) {
 	addr := (*int64)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadInt64(addr)
 			if CompareAndSwapInt64(addr, v, v+1) {
 break
 			}
@@ -1050,7 +1050,7 @@ func hammerCompareAndSwapInt64(uaddr *uint64, count int) {
 func hammerCompareAndSwapUint64(addr *uint64, count int) {
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadUint64(addr)
 			if CompareAndSwapUint64(addr, v, v+1) {
 break
 			}
@@ -1064,7 +1064,7 @@ func hammerCompareAndSwapUintptr64(uaddr *uint64, count int) {
 	addr := (*uintptr)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadUintptr(addr)
 			if CompareAndSwapUintptr(addr, v, v+1) {
 break
 			}
@@ -1078,7 +1078,7 @@ func hammerCompareAndSwapPointer64(uaddr *uint64, count int) {
 	addr := (*unsafe.Pointer)(unsafe.Pointer(uaddr))
 	for i := 0; i  count; i++ {
 		for {
-			v := *addr
+			v := LoadPointer(addr)
 			if CompareAndSwapPointer(addr, v, unsafe.Pointer(uintptr(v)+1)) {
 break
 			}
-- 
1.8.4.2



[PATCH 3/4] Gccgo port to s390[x] -- part II

2014-11-04 Thread Dominik Vogt
See commit comment and ChangeLog for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog
2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* go.test/go-test.exp: Add handling of // +build.
From 12f9861253b1c9b8758b5356daedad98bfc044c4 Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Tue, 4 Nov 2014 10:13:39 +0100
Subject: [PATCH 3/4] go.test: Add handling of // +build to go-test.exp.

---
 gcc/testsuite/go.test/go-test.exp | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp
index fd83722..71272a3 100644
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -449,13 +449,22 @@ proc go-gc-tests { } {
 	}
 
 	if { [ string match // +build * $test_line ] } {
-		if { [ string match *[getenv GOARCH]* $test_line ] } {
-		continue
-		}
-		if { [ string match *linux* $test_line ] } {
-		continue
+		set matches_pos 0
+		set matches_neg 0
+		if { [ regexp -line \[ 	\][getenv GOARCH]\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_pos 1
+		} elseif { [ regexp -line \[ 	\]\![getenv GOARCH]\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_neg 1
+		} elseif { [ regexp -line \[ 	\]linux\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_pos 1
+		} elseif { [ regexp -line \[ 	\]\!linux\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_neg 1
+		} elseif { [ regexp -line \[ 	\]\!windows\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_pos 1
+		} elseif { [ regexp -line \[ 	\]windows\(\[ 	\]\|\$\) $test_line ] } {
+		set matches_neg 1
 		}
-		if { [ string match *!windows* $test_line ] } {
+		if { $matches_pos == 1  $matches_neg == 0 } {
 		continue
 		}
 		close $fd
-- 
1.8.4.2



[PATCH 4/4] Gccgo port to s390[x] -- part II

2014-11-04 Thread Dominik Vogt
See commit comment and ChangeLog for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog
2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* go.test/test/ken/cplx2.go:
Fix s390 test failures.

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* go.test/test/nilptr_s390.go: Port nilptr.go to s390.
* go.test/test/nilptr_s390x.go: Port nilptr.go to s390x.
* go.test/test/nilptr.go: Do not run on s390/s390x.
* go.test/go-test.exp:
// +build  matches complete words without whitespace.  This is
rnecessary to distinguish s390 from s390x.  It also handles negation
with the '!' prefix.

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* go.test/go-test.exp (go-set-goarch): Enable tests on s390[x].

2014-11-04  Dominik Vogt  v...@linux.vnet.ibm.com

* go.test/go-test.exp: Add handling of // +build.
From 5b0eaf73d005bd152fff5a1a922f5618da4be939 Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Tue, 4 Nov 2014 10:13:22 +0100
Subject: [PATCH 4/4] go.test: Changes required for s390[x] port.

1) Add Go architectures s390 and s390x.

2) Do not run test nilptr.go on s390 - platform specific test.

   * Detects word boundaries to distinguish between s390 and s390x.
 (Switch to using regular expressions.)
   * Implement the Prefix '!' to exclude targets from build.

3) Fix go/test/ken/cplx2.go test failures.
---
 gcc/testsuite/go.test/go-test.exp  |   6 +
 gcc/testsuite/go.test/test/ken/cplx2.go|  20 ++-
 gcc/testsuite/go.test/test/nilptr.go   |   1 +
 gcc/testsuite/go.test/test/nilptr_s390.go  | 190 +
 gcc/testsuite/go.test/test/nilptr_s390x.go | 190 +
 5 files changed, 405 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/go.test/test/nilptr_s390.go
 create mode 100644 gcc/testsuite/go.test/test/nilptr_s390x.go

diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp
index 71272a3..25e405b 100644
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -244,6 +244,12 @@ proc go-set-goarch { } {
 		set goarch ppc64
 	}
 	}
+	s390-*-* {
+	set goarch s390
+	}
+	s390x-*-* {
+	set goarch s390x
+	}
 	sparc*-*-* {
 	if [check_effective_target_ilp32] {
 		set goarch sparc
diff --git a/gcc/testsuite/go.test/test/ken/cplx2.go b/gcc/testsuite/go.test/test/ken/cplx2.go
index eb1da7b..d11e33c 100644
--- a/gcc/testsuite/go.test/test/ken/cplx2.go
+++ b/gcc/testsuite/go.test/test/ken/cplx2.go
@@ -97,13 +97,29 @@ func main() {
 	}
 
 	cd := c5 / c6
-	if cd != Cd {
+	dr := real(Cd) - real(cd)
+	if dr  0 {
+		dr = -dr
+	}
+	di := imag(Cd) - imag(cd)
+	if di  0 {
+		di = -di
+	}
+	if dr  .00059604644775390625 || di  0 {
 		println(opcode x, cd, Cd)
 		panic(fail)
 	}
 
 	ce := cd * c6
-	if ce != Ce {
+	dr = real(Ce) - real(ce)
+	if dr  0 {
+		dr = -dr
+	}
+	di = imag(Ce) - imag(ce)
+	if di  0 {
+		di = -di
+	}
+	if dr  0 || di  0.0095367431640625 {
 		println(opcode x, ce, Ce)
 		panic(fail)
 	}
diff --git a/gcc/testsuite/go.test/test/nilptr.go b/gcc/testsuite/go.test/test/nilptr.go
index 9631d16..574d662 100644
--- a/gcc/testsuite/go.test/test/nilptr.go
+++ b/gcc/testsuite/go.test/test/nilptr.go
@@ -1,3 +1,4 @@
+// +build !s390 !s390x
 // run
 
 // Copyright 2011 The Go Authors.  All rights reserved.
diff --git a/gcc/testsuite/go.test/test/nilptr_s390.go b/gcc/testsuite/go.test/test/nilptr_s390.go
new file mode 100644
index 000..d79916d
--- /dev/null
+++ b/gcc/testsuite/go.test/test/nilptr_s390.go
@@ -0,0 +1,190 @@
+// +build s390
+// run
+
+// Copyright 2011 The Go Authors.  All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Test that the implementation catches nil ptr indirection in a
+// large address space.
+
+package main
+
+import unsafe
+
+// Having a big address space means that indexing at a large
+// offset from a nil pointer might not cause a memory access
+// fault.  This test checks that Go is doing the correct explicit
+// checks to catch these nil pointer accesses, not just relying on
+// the hardware.
+//
+// Give us a big address space somewhere near min_bss_offset.
+const in_mem_size uintptr = 256  20 // 256 MiB
+const min_bss_offset uintptr = 1  22 // 0x0040
+const maxlen uintptr = (1  31) - 2 // 0x7ffe
+var dummy [in_mem_size]byte
+
+func main() {
+	// The test only tests what we intend to test if dummy
+	// starts near 0x004.  Otherwise there might not be
+	// anything mapped at the address that might be
+	// accidentally dereferenced below.
+	if uintptr(unsafe.Pointer(dummy))  in_mem_size + min_bss_offset {
+		panic(dummy too far out)
+	} else if uintptr(unsafe.Pointer(dummy))  min_bss_offset {
+		panic(dummy too close)
+	}
+
+	shouldPanic(p1)
+	shouldPanic(p2)
+	shouldPanic(p3)
+	shouldPanic(p4)
+	

Re: [PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)

2014-11-04 Thread Oleg Endo




On 4 Nov 2014, at 11:50, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

 On Sat, 2014-10-18 12:54:33 +0200, Oleg Endo oleg.e...@t-online.de wrote:
 Hi,
 
 As discussed in the PR, this adds two new SH built-in functions
 __builtin_sh_get_fpscr __builtin_sh_set_fpscr.  Tested on r216173 with  
 
 make -k check RUNTESTFLAGS=--target_board=sh-sim\{-m4/-ml,-m4/-mb}
 
 and no new failures.  Committed as r216424.
 
 Cheers,
 Oleg
 
 gcc/ChangeLog:
PR target/53513
* config/sh/sh-modes.def (PSI): Remove.
* config/sh/sh-protos.h (get_fpscr_rtx): Remove.
* config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove.
(sh_reorg): Remove commented out FPSCR code.
(fpscr_set_from_mem): Use SImode instead of PSImode.  Emit lds_fpscr
insn instead of move insn.
(sh_hard_regno_mode_ok): Return SImode for FPSCR.
(sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode
handling.
(sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns.
(sh1_builtin_p): Uncomment.
(SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros.
(bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr.
 
 Change to emit_fpu_switch() is missing here.
 
 Index: gcc/config/sh/sh.c
 ===
 --- gcc/config/sh/sh.c(revision 216350)
 +++ gcc/config/sh/sh.c(working copy)
 
 This chunk is in emit_fpu_switch(), which drops every reference to
 `dst':
 @@ -10055,13 +10032,12 @@
   emit_move_insn (scratch, XEXP (src, 0));
   if (index != 0)
emit_insn (gen_addsi3 (scratch, scratch, GEN_INT (index * 4)));
 -  src = adjust_automodify_address (src, PSImode, scratch, index * 4);
 +  src = adjust_automodify_address (src, SImode, scratch, index * 4);
 }
   else
 -src = adjust_address (src, PSImode, index * 4);
 +src = adjust_address (src, SImode, index * 4);
 
 -  dst = get_fpscr_rtx ();
 -  emit_move_insn (dst, src);
 +  emit_insn (gen_lds_fpscr (src));
 }
 
 static rtx get_free_reg (HARD_REG_SET);
 
 ...which in turn results in fall-out when simply building it with
 config-list.mk, see eg.
 http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=372472 :
 
 [...]
 g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
 -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual 
 -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror 
 -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. 
 -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include 
 -I/opt/cfarm/mpc/include  -I../../../gcc/gcc/../libdecnumber 
 -I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
 -I../../../gcc/gcc/../libbacktrace-o sh.o -MT sh.o -MMD -MP -MF 
 ./.deps/sh.TPo ../../../gcc/gcc/config/sh/sh.c
 ../../../gcc/gcc/config/sh/sh.c: In function ‘void emit_fpu_switch(rtx, int)’:
 ../../../gcc/gcc/config/sh/sh.c:10027:7: error: unused variable ‘dst’ 
 [-Werror=unused-variable]
   rtx dst, src;
   ^
 cc1plus: all warnings being treated as errors
 make[2]: *** [sh.o] Error 1
 
 This should fix it, ok for mainline?

Yes, OK.  Thanks for catching it.

Cheers,
Oleg


 
 2014-11-04  Jan-Benedict Glaw  jbg...@lug-owl.de
 
* config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable.
 
 
 diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
 index 3a4f5e9..be944da 100644
 --- a/gcc/config/sh/sh.c
 +++ b/gcc/config/sh/sh.c
 @@ -10024,7 +10024,7 @@ static GTY(()) tree fpscr_values;
 static void
 emit_fpu_switch (rtx scratch, int index)
 {
 -  rtx dst, src;
 +  rtx src;
 
   if (fpscr_values == NULL)
 {
 
 
 
 MfG, JBG
 
 -- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
 Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
 the second  :


Re: [PATCH] Update libgcc.texi to match implementation in libgcc/libgcc2.c

2014-11-04 Thread Kito Cheng
ping!

On Tue, Oct 28, 2014 at 11:19 AM, Kito Cheng kito.ch...@gmail.com wrote:
 Hi all:

 This patch update `Bit operations` section in libgcc.text, most bit
 operation function is take an unsigned integer instead of signed
 integer in libgcc/libgcc2.c [1], and it seem more make sense :)

 ChangeLog
 2014-10-28  Kito Cheng  k...@0xlab.org

 * doc/libgcc.texi: Update text to match implementation in
 libgcc/libgcc2.c

 [1] 
 https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=libgcc/libgcc2.c;h=46d6a2ef030ff98a944935f77529cee9c05fb326;hb=HEAD


Re: nvptx offloading patches [3/n], i386 bits RFD

2014-11-04 Thread Bernd Schmidt

On 11/03/2014 11:27 PM, Jeff Law wrote:

On 11/01/14 05:57, Bernd Schmidt wrote:

This is not against current trunk; it applies to gomp-4_0-branch where
it is one of the necessary parts to make offloading x86-nvptx work. The
issue is that the LTO file format depends on the machine_modes enum, it
needs to match between host and offload target. The easiest way to do
this is to just use the host-modes.def when compiling an offload
compiler.

Ports that want to be hosts for offloading may need to modify their
modes.def. The patch below contains changes to i386-modes.def which
modifies XFmode depending on a target switch. I'm not actually entirely
sure what to do about this. Do we want to make this flag an error when
offloading is enabled? Or maybe add float format support to the
-foffload-abi option?

Thoughts? Ok for the first part of the patch once the other offloading
patches have gone in (bootstrapped and tested on x86_64-linux)?

It feels like we've got another real distinction to make.  We've had
host, build  target and they're all independent.  It feels like we need
offload target and better separate between target and offload target.
Then we need to figure out the places where we've got bleed-out.


Is this a question of terminology? I agree that saying offload host 
when we'd normally be calling it the target is confusing, but it's 
difficult to come up with better names.


Offload host and target are not quite independent - when it's 
implemented through LTO at least there has to be a fairly close 
agreement even down to machine modes, which is why a patch like this is 
necessary.



Not sure how to deal with this any further out than the immediate term
than using a hack like this. Though I'd prefer to avoid the #ifdef as it
seems to me this shouldn't be baked in at build/configure time.


Yeah, I'm not expecting the i386 part to go in quite as-is. For 
reference I'm including the offload-abi patch - Ilya is submitting this 
along with other option changes. One possibility would be to print and 
recognize strings such as lp64D128 or lp64D96 which would include 
information about the size of long double. Somehow though I can't really 
bring myself to believe that -mlong-double128 is a real use case with 
offloading so we might just disallow the combination.


CCing Uros in case he has an opinion.


Bernd

Index: gcc/common.opt
===
--- gcc/common.opt.orig
+++ gcc/common.opt
@@ -1601,6 +1601,19 @@ fnon-call-exceptions
 Common Report Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions
 
+foffload-abi=
+Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
+-foffload-abi=[lp64|ilp32]	Set the ABI to use in an offload compiler
+
+Enum
+Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs)
+
+EnumValue
+Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
+
+EnumValue
+Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c.orig
+++ gcc/config/i386/i386.c
@@ -4261,6 +4261,15 @@ ix86_option_override (void)
   register_pass (insert_vzeroupper_info);
 }
 
+/* Implement the TARGET_OFFLOAD_OPTIONS hook.  */
+static char *
+ix86_offload_options (void)
+{
+  if (TARGET_LP64)
+return xstrdup (-foffload-abi=lp64);
+  return xstrdup (-foffload-abi=ilp32);
+}
+
 /* Update register usage after having seen the compiler flags.  */
 
 static void
@@ -47142,6 +47151,10 @@ ix86_atomic_assign_expand_fenv (tree *ho
 #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \
   ix86_float_exceptions_rounding_supported_p
 
+#undef TARGET_OFFLOAD_OPTIONS
+#define TARGET_OFFLOAD_OPTIONS \
+  ix86_offload_options
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include gt-i386.h
Index: gcc/coretypes.h
===
--- gcc/coretypes.h.orig
+++ gcc/coretypes.h
@@ -111,6 +111,12 @@ enum tls_model {
   TLS_MODEL_LOCAL_EXEC
 };
 
+enum offload_abi {
+  OFFLOAD_ABI_UNSET,
+  OFFLOAD_ABI_LP64,
+  OFFLOAD_ABI_ILP32
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -11446,3 +11446,12 @@ Used when offloaded functions are seen i
 sections are available.  It is called once for each symbol that must be
 recorded in the offload function and variable table.
 @end deftypefn
+
+@deftypefn {Target Hook} {char *} TARGET_OFFLOAD_OPTIONS (void)
+Used when writing out the list of options into an LTO file.  It should
+translate any relevant target-specific options (such as the ABI in use)
+into one 

Re: [PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)

2014-11-04 Thread Jan-Benedict Glaw
On Tue, 2014-11-04 13:21:24 +0100, Oleg Endo oleg.e...@t-online.de wrote:
 On 4 Nov 2014, at 11:50, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 
  2014-11-04  Jan-Benedict Glaw  jbg...@lug-owl.de
  
 * config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable.
[...]
  This should fix it, ok for mainline?
 
 Yes, OK.  Thanks for catching it.

Thanks.  Committed as r217082.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  Fortschritt bedeutet, einen Schritt so zu machen,
the second  :   daß man den nächsten auch noch machen kann.


signature.asc
Description: Digital signature


Re: [match-and-simplify] support operator list

2014-11-04 Thread Richard Biener
On Mon, Nov 3, 2014 at 4:50 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Nov 03, 2014 at 04:46:43PM +0100, Richard Biener wrote:
 On Mon, Nov 3, 2014 at 2:01 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Nov 03, 2014 at 06:26:12PM +0530, Prathamesh Kulkarni wrote:
  --- gcc/match-comparison.pd   (revision 216916)
  +++ gcc/match-comparison.pd   (working copy)
  @@ -1,5 +1,8 @@
   /* From fold_binary.  */
 
  +(define_operator_list eq_ops eq ne)
  +(define_operator_list cc eq_ops lt le gt ge)
 
  I think cc is a bad name for the macro, that usually stands for condition
  code register.

 OTOH it is a perfect match for 'condition code'.

 So eqcodes and ccodes, or comp_code, ... ?
 Saving a few keystrokes there can be a problem for readability.
 Not to mention that there are various other tcc_comparison codes (lggt,
 unordered, ordered, un{lt,le,gt,ge,eq}).

Sure.  Let's use ccodes then - or tcc_comparison (though I thought that
was quite long).

Well, the patch should be mostly about the new syntax of course.

Richard.


 Jakub


Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Paolo Carlini

Hi again,

On 11/04/2014 05:34 AM, Jonathan Wakely wrote:

On 04/11/14 03:41 +, Jonathan Wakely wrote:

On 03/11/14 22:07 +, Jonathan Wakely wrote:
On 3 November 2014 17:51, Paolo Carlini paolo.carl...@oracle.com 
wrote:

.. other than the above issue, I see a segmentation fault for:

   performance/ext/pb_ds/multimap_text_insert_mem_large.cc
Not a big deal of course, but unfortunately today I'm seeing *two* 
segfaults for pb_ds:


performance/ext/pb_ds/multimap_text_insert_mem_large.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation 
fault  ./$EXE_NAME tmp.$FILE_NAME

performance/ext/pb_ds/multimap_text_insert_mem_small.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation 
fault  ./$EXE_NAME tmp.$FILE_NAME


Paolo.


Re: [AArch64, Docs, Patch] Add reference to ACLE in docs.

2014-11-04 Thread Tejas Belagod

On 03/11/14 17:58, Joseph Myers wrote:

On Mon, 3 Nov 2014, Tejas Belagod wrote:


If I mention in a couple of sentences the level of ACLE support there is in
GCC currently, this section will need to be updated every time there is an
improvement in ACLE support - I guess we'll just have to remember to remove
parts of this section as we do that.


Yes, it's generally the case when adding new user-visible features that
documentation needs updating.  The release notes (gcc-N/changes.html in
htdocs) should be updated for any significant new features as well.



Thanks. The AArch64 ACLE CRC32 intrinsics were introduced in 4.9, so its 
not new in 5.0 - https://gcc.gnu.org/gcc-4.9/changes.html. But we've 
improved AArch64 NEON Intrinsics in 5.0 significantly which deserves a 
mention. I'll do that in a separate patch.


Revised patch to fix extend.texi for ACLE attached. OK for trunk?

Thanks,
Tejas.

2014-11-04  Tejas Belagod  tejas.bela...@arm.com

gcc/
* Makefile.in (TEXI_GCC_FILES): Remove arm-acle-intrinsics.texi,
arm-neon-intrinsics.texi, aarch64-acle-intrinsics.texi.
* doc/aarch64-acle-intrinsics.texi: Remove.
* doc/arm-acle-intrinsics.texi: Remove.
* doc/arm-neon-intrinsics.texi: Remove.
* doc/extend.texi: Consolidate sections AArch64 intrinsics,
ARM NEON Intrinsics, ARM ACLE Intrinsics into one ARM C Language
Extension section. Add references to public ACLE specification.

intrinsics-doc-3.txt.gz
Description: application/gzip


Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Paolo Carlini

.. and also:

performance/ext/pb_ds/priority_queue_text_pop_mem.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16905 Segmentation 
fault  ./$EXE_NAME tmp.$FILE_NAME


Paolo.


Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Rainer Emrich
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Am 11/04/2014 10:55 AM, schrieb Paolo Carlini:
 .. thanks a lot Jon! (after all this parallel mode is still useful
 for something ;)
 
 Paolo.
Sorry for the terse message. I'm under heavy workload at the moment.
But AFAIS now everything looks good again.
Thanks a lot, Paolo and Jon!

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJUWNVLAAoJEB3HOsWs+KJb24YIAJ9WUYMXxfVfQetYQjh60rKv
VGJaJ30imyhx5i+G06XI9vPpds0zICnAm/CPNp4bQKAR8XkO0CzI7qbP63KYTPOA
nRBDG1sbhxOpXYWbx7CJ52MM860bYF4MuvVSX1aXVmE2at1MqiYxSscARYGJx5b8
e7+7C12TuyocdU9l3edv3j1eA2eW0R4A9Ae1XECsUQg8llgnhiT5p+Jd1fffwUSH
g+v8TDXwqcvmdt/aTfyP1C/Gl3YWh/uZARqcbiShq+dlb2je7n2s1l/ZpXc26vXN
Z10BIt7x3FRM61qQuUuiRHlRgEcEeaFpy4uPoYKLfbsAToPYcPKbY8YE7qQGNKo=
=SdTc
-END PGP SIGNATURE-


Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version

2014-11-04 Thread James Greenhalgh
On Tue, Nov 04, 2014 at 12:07:56PM +, Joern Rennecke wrote:
 On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote:
 
  While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is
  unused, so clean that up too.
 
 That's not a clean-up.  This pertains to PR 39350.

Well, it is a clean-up in the sense that this macro is completely unused
in the compiler and has no effect, but please revert this hunk if that
is your preference.

 Which, incidentally, this hookization completely ignores, entrenching
 the conflation of move expander and move cost estimates.

No, I have to disagree. The use_by_pieces_infrastructure_p hook doesn't
conflate anything - it gives a response to the question Should the
by_pieces infrastructure be used?. A target specific movmem pattern
- though it might itself choose to move things by pieces, is
categorically not using the move_by_pieces infrastructure.

If we want to keep a clean separation of concerns here, we would
want a similar target hook asking the single question will your
movmem/setmem expander succeed?.

 Thus, can_move_by_pieces gives the wrong result for purposes of rtl
 optimizations
 when a target-specific movmem etc expander emits target-specific code.
 The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt
 shows a number of call sites that are affected.

can_move_by_pieces (likewise can_store_by_pieces) gives the right
result, the RTL expanders are using it wrong.

I disagree with the approach taken in your patch as it overloads the
purpose of can_move_by_pieces. However, I would support a patch pulling
this out in to two hooks, so the call in
value-prof.c:gimple_stringops_transform would change from:

  if (!can_move_by_pieces (val, MIN (dest_align, src_align)))
return false;

to something like:

  if (!can_move_by_pieces (val, MIN (dest_align, src_align))
   !targetm.can_expand_mem_op_p (val, MIN (dest_align, src_align),
   MOVE_BY_PIECES))
return false;

But let's not confuse the use of what should be a simple hook!

  arc only implements MOVE_BY_PIECES_P, wiring it to false. Mirror that
  behaviour, and use the default hook for other by_pieces operations.
 
  I tried building a compiler but no amount of fiddling with target
  strings got me to a sensible result, so this patch is completely
  untested.
 
 You could just pick one of the configs in contrib/config-list.mk

Digging in to this, my scripts like to integrate a GDB build - which
doesn't work for arc-unknown-elf. I've been following the builds on
Jan Benedict-Glaw's since I put the patches in on Saturday morning,
and it doesn't look like I broke anything for arc. If I have any more
arc patches I'll keep this in mind.

Thanks,
James



Re: [match-and-simplify] support operator list

2014-11-04 Thread Richard Biener
On Mon, Nov 3, 2014 at 1:56 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 (had sent it earlier by private mail).

 The attached patch supports operator-list and it's use in for.
 For now, operator-list is rejected in expression.

Ok.

 This patch also allows user-defined operator to be used as operator-list
 (user-defined ops are really temporary or scoped operator-lists).
 (for op (plus minus)
   op2 (op...)
   (simplify
  ...))

 Should that be supported or rejected ?

I've made it supported, but not in the way you implemented it.  Instead
I am only expanding operator lists.  Thus

(for op (plus minus)
 (for op2 (minus op)
  ...

will iterate (op, op2) (plus, minus) (minus, minus) (plus, plus) (minus, minus).

I think that makes more sense(?) but it's maybe too confusing as well so
eventually we should reject this.

I've adapted the patch slightly, using safe_splice, adding a comment and
avoiding a long line.

I've also used an example operator list in match-builtin.pd instead where
it shows the obvious improvement we're going to implement.

Committed.

Thanks,
Richard.

2014-11-04  Prathamesh Kulkarni  bilbotheelffri...@gmail.com

* genmatch.c (user_id): Add new member is_oper_list.
(user_id::user_id): Add new default argument.
(parser::parse_operator_list): New function.
(parser::parse_for): Allow operator-list.
(parser::parse_pattern): Call parser::parse_operator_list.
(parser::parse_operation): Reject operator-list.
* match-builtin.pd: Define operator lists POWs, CBRTs and SQRTs.

(gmail is too stupid to attach stuff it seems)

 * genmatch.c
   (user_id): Add new member is_oper_list.
   (user_id::user_id): Add new default argument.
   (parser::parse_operator_list): New function.
   (parser::parse_for): Allow operator-list.
   (parser::parse_pattern): Call parser::parse_operator_list.
   (parser::parse_operation): Reject operator-list.

 * match-comparison.pd
   Define operator-lists eq_ops and cc and use them in patterns.

 Thanks,
 Prathamesh


RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Bernd Schmidt
The ptx port by its nature is lacking features that are expected on 
normal machines, such as alloca and indirect jumps. We have a subset 
of the C library which contains functions that can be implemented on the 
target (excluding things like file I/O other than printf which is a ptx 
builtin).


It would be good to be able to also build as much of libgfortran as 
possible, and the following patch is what I've been using so far. It 
recognizes the target at configure time and restricts the list of 
compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef 
tests. There's also a new file minimal.c which contains alternative 
implementations of some functionality (using printf to write error 
messages rather than fprintf and such). Constructors are currently 
unimplemented on ptx and therefore the init function is commented out.


Comments on the approach, do the Fortran maintainers have a preference 
how this should look? The whole thing is good enough to substantially 
reduce the number of failures when trying to run the Fortran testsuites 
on nvptx (although many remain).



Bernd
Index: libgfortran/Makefile.am
===
--- libgfortran/Makefile.am.orig
+++ libgfortran/Makefile.am
@@ -77,6 +77,14 @@ AM_CFLAGS += $(SECTION_FLAGS)
 AM_CFLAGS += $(IEEE_FLAGS)
 AM_FCFLAGS += $(IEEE_FLAGS)
 
+if LIBGFOR_MINIMAL
+AM_CFLAGS += -DLIBGFOR_MINIMAL
+endif
+
+if LIBGFOR_MINIMAL
+gfor_io_src= \
+io/size_from_kind.c
+else
 gfor_io_src= \
 io/close.c \
 io/file_pos.c \
@@ -94,6 +102,7 @@ io/unit.c \
 io/unix.c \
 io/write.c \
 io/fbuf.c
+endif
 
 gfor_io_headers= \
 io/io.h \
@@ -101,6 +110,41 @@ io/fbuf.h \
 io/format.h \
 io/unix.h
 
+if LIBGFOR_MINIMAL
+gfor_helper_src= \
+intrinsics/associated.c \
+intrinsics/abort.c \
+intrinsics/args.c \
+intrinsics/bit_intrinsics.c \
+intrinsics/cshift0.c \
+intrinsics/eoshift0.c \
+intrinsics/eoshift2.c \
+intrinsics/erfc_scaled.c \
+intrinsics/extends_type_of.c \
+intrinsics/fnum.c \
+intrinsics/ierrno.c \
+intrinsics/ishftc.c \
+intrinsics/iso_c_generated_procs.c \
+intrinsics/iso_c_binding.c \
+intrinsics/malloc.c \
+intrinsics/mvbits.c \
+intrinsics/move_alloc.c \
+intrinsics/pack_generic.c \
+intrinsics/selected_char_kind.c \
+intrinsics/size.c \
+intrinsics/spread_generic.c \
+intrinsics/string_intrinsics.c \
+intrinsics/rand.c \
+intrinsics/random.c \
+intrinsics/reshape_generic.c \
+intrinsics/reshape_packed.c \
+intrinsics/selected_int_kind.f90 \
+intrinsics/selected_real_kind.f90 \
+intrinsics/transpose_generic.c \
+intrinsics/unpack_generic.c \
+runtime/in_pack_generic.c \
+runtime/in_unpack_generic.c
+else
 gfor_helper_src= \
 intrinsics/associated.c \
 intrinsics/abort.c \
@@ -165,6 +209,7 @@ intrinsics/unlink.c \
 intrinsics/unpack_generic.c \
 runtime/in_pack_generic.c \
 runtime/in_unpack_generic.c
+endif
 
 if IEEE_SUPPORT
 
@@ -181,6 +226,15 @@ gfor_ieee_src=
 
 endif
 
+if LIBGFOR_MINIMAL
+gfor_src= \
+runtime/bounds.c \
+runtime/compile_options.c \
+runtime/memory.c \
+runtime/minimal.c \
+runtime/string.c \
+runtime/select.c
+else
 gfor_src= \
 runtime/backtrace.c \
 runtime/bounds.c \
@@ -195,6 +249,7 @@ runtime/pause.c \
 runtime/stop.c \
 runtime/string.c \
 runtime/select.c
+endif
 
 i_all_c= \
 $(srcdir)/generated/all_l1.c \
Index: libgfortran/Makefile.in
===
--- libgfortran/Makefile.in.orig
+++ libgfortran/Makefile.in
@@ -54,10 +54,11 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-@IEEE_SUPPORT_TRUE@am__append_1 = ieee/ieee_helper.c
+@LIBGFOR_MINIMAL_TRUE@am__append_1 = -DLIBGFOR_MINIMAL
+@IEEE_SUPPORT_TRUE@am__append_2 = ieee/ieee_helper.c
 
 # dummy sources for libtool
-@onestep_TRUE@am__append_2 = libgfortran_c.c libgfortran_f.f90
+@onestep_TRUE@am__append_3 = libgfortran_c.c libgfortran_f.f90
 subdir = .
 DIST_COMMON = ChangeLog $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
 	$(top_srcdir)/configure $(am__configure_deps) \
@@ -121,9 +122,13 @@ libcaf_single_la_LIBADD =
 am_libcaf_single_la_OBJECTS = single.lo
 libcaf_single_la_OBJECTS = $(am_libcaf_single_la_OBJECTS)
 libgfortran_la_LIBADD =
-am__objects_1 = backtrace.lo bounds.lo compile_options.lo \
-	convert_char.lo environ.lo error.lo fpu.lo main.lo memory.lo \
-	pause.lo stop.lo string.lo select.lo
+@LIBGFOR_MINIMAL_FALSE@am__objects_1 = backtrace.lo bounds.lo \
+@LIBGFOR_MINIMAL_FALSE@	compile_options.lo convert_char.lo \
+@LIBGFOR_MINIMAL_FALSE@	environ.lo error.lo fpu.lo main.lo \
+@LIBGFOR_MINIMAL_FALSE@	memory.lo pause.lo stop.lo string.lo \
+@LIBGFOR_MINIMAL_FALSE@	select.lo
+@LIBGFOR_MINIMAL_TRUE@am__objects_1 = bounds.lo compile_options.lo \
+@LIBGFOR_MINIMAL_TRUE@	memory.lo minimal.lo string.lo select.lo
 am__objects_2 = all_l1.lo all_l2.lo all_l4.lo all_l8.lo all_l16.lo
 am__objects_3 = any_l1.lo any_l2.lo any_l4.lo any_l8.lo any_l16.lo
 am__objects_4 = count_1_l.lo count_2_l.lo count_4_l.lo 

[PATCH][C-Family] Disable bogus shortening of DFP vs FP compare

2014-11-04 Thread Richard Biener

On the match-and-simplify branch we expose an issue in shorten_compare
which happily transforms (double) float-var != (double) dfp-float-var
to (float) float-var != (float) dfp-float-var which is wrong
and causes

FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++11 execution test
FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++1y execution test
FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++98 execution test

you can expose this latent bug by making get_narrower also
treat CONVERT_EXPRs as conversions - on trunk the DFP value
happens to be converted to double using that.

You then also run into the issue that the C frontend rejects
the compare via its common_type implementation which complains
about a mixed FP - DFP compare.

So the following patch which I am testing right now disables
the (premature) shorten-compare optimization on mixed
FP - DFP operands.  It also exposes the issue on trunk by
improving get_narrower.

Ok for trunk either with or without the tree.c hunk?  (I'm
going to remove that if it exposes more issues elsewhere
revealed by testing)

Thanks,
Richard.

2014-11-04  Richard Biener  rguent...@suse.de

* tree.c (get_narrower): Also look through CONVERT_EXPRs.

c-family/
* c-common.c (shorten_compare): Do not shorten mixed
DFP and non-DFP compares.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 217049)
+++ gcc/tree.c  (working copy)
@@ -8494,7 +8494,7 @@ get_narrower (tree op, int *unsignedp_pt
   tree win = op;
   bool integral_p = INTEGRAL_TYPE_P (TREE_TYPE (op));
 
-  while (TREE_CODE (op) == NOP_EXPR)
+  while (CONVERT_EXPR_P (op))
 {
   int bitschange
= (TYPE_PRECISION (TREE_TYPE (op))
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 217049)
+++ gcc/c-family/c-common.c (working copy)
@@ -4314,9 +4314,15 @@ shorten_compare (location_t loc, tree *o
   /* If either arg is decimal float and the other is float, find the
  proper common type to use for comparison.  */
   else if (real1  real2
+   DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0)))
+   DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1
+type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1));
+
+  /* If either arg is decimal float and the other is float, fail.  */
+  else if (real1  real2
(DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0)))
   || DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1)
-type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1));
+return 0;
 
   else if (real1  real2
(TYPE_PRECISION (TREE_TYPE (primop0))


Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Jonathan Wakely
On 4 November 2014 13:13, Paolo Carlini paolo.carl...@oracle.com wrote:
 Hi again,


 On 11/04/2014 05:34 AM, Jonathan Wakely wrote:

 On 04/11/14 03:41 +, Jonathan Wakely wrote:

 On 03/11/14 22:07 +, Jonathan Wakely wrote:

 On 3 November 2014 17:51, Paolo Carlini paolo.carl...@oracle.com
 wrote:

 .. other than the above issue, I see a segmentation fault for:

performance/ext/pb_ds/multimap_text_insert_mem_large.cc

 Not a big deal of course, but unfortunately today I'm seeing *two* segfaults
 for pb_ds:

 performance/ext/pb_ds/multimap_text_insert_mem_large.cc
 .../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation
 fault  ./$EXE_NAME tmp.$FILE_NAME
 performance/ext/pb_ds/multimap_text_insert_mem_small.cc
 .../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation
 fault  ./$EXE_NAME tmp.$FILE_NAME

They were both caused by the infinite recursion in the operator== for
tracker_allocator, which should be fixed.

However, my change to std::deque's move constructor introduced some
new failures, mostly due to dg-error line numbers that need to change,
but one real failure in 23_containers/deque/allocator/move_assign.cc

I'm not sure whether the right approach is to violate the requirement
that deque's move constructor move constructs the allocator, (which I
changed to copy construction, to meet the exception safety guarantee
that the RHS of the move construction maintains its invariants) or to
change the invariant so that a deque doesn't always have a map
allocated.

I'll fix it asap.


Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare

2014-11-04 Thread Richard Biener
On Tue, 4 Nov 2014, Richard Biener wrote:

 
 On the match-and-simplify branch we expose an issue in shorten_compare
 which happily transforms (double) float-var != (double) dfp-float-var
 to (float) float-var != (float) dfp-float-var which is wrong
 and causes
 
 FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++11 execution test
 FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++1y execution test
 FAIL: c-c++-common/dfp/convert-bfp-12.c  -std=c++98 execution test
 
 you can expose this latent bug by making get_narrower also
 treat CONVERT_EXPRs as conversions - on trunk the DFP value
 happens to be converted to double using that.
 
 You then also run into the issue that the C frontend rejects
 the compare via its common_type implementation which complains
 about a mixed FP - DFP compare.
 
 So the following patch which I am testing right now disables
 the (premature) shorten-compare optimization on mixed
 FP - DFP operands.  It also exposes the issue on trunk by
 improving get_narrower.
 
 Ok for trunk either with or without the tree.c hunk?  (I'm
 going to remove that if it exposes more issues elsewhere
 revealed by testing)

I removed it - it fails very early in stage2 with a warning about an
always-true comparison.  No time to sort that out at this point.
So consider this a c-common.c change only.

Thanks,
Richard.

 2014-11-04  Richard Biener  rguent...@suse.de
 
   * tree.c (get_narrower): Also look through CONVERT_EXPRs.
 
   c-family/
   * c-common.c (shorten_compare): Do not shorten mixed
   DFP and non-DFP compares.
 
 Index: gcc/tree.c
 ===
 --- gcc/tree.c(revision 217049)
 +++ gcc/tree.c(working copy)
 @@ -8494,7 +8494,7 @@ get_narrower (tree op, int *unsignedp_pt
tree win = op;
bool integral_p = INTEGRAL_TYPE_P (TREE_TYPE (op));
  
 -  while (TREE_CODE (op) == NOP_EXPR)
 +  while (CONVERT_EXPR_P (op))
  {
int bitschange
   = (TYPE_PRECISION (TREE_TYPE (op))
 Index: gcc/c-family/c-common.c
 ===
 --- gcc/c-family/c-common.c   (revision 217049)
 +++ gcc/c-family/c-common.c   (working copy)
 @@ -4314,9 +4314,15 @@ shorten_compare (location_t loc, tree *o
/* If either arg is decimal float and the other is float, find the
   proper common type to use for comparison.  */
else if (real1  real2
 + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0)))
 + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1
 +type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1));
 +
 +  /* If either arg is decimal float and the other is float, fail.  */
 +  else if (real1  real2
   (DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0)))
  || DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1)
 -type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1));
 +return 0;
  
else if (real1  real2
   (TYPE_PRECISION (TREE_TYPE (primop0))
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: The nvptx port [1/11+] indirect jumps

2014-11-04 Thread Bernd Schmidt

On 10/20/2014 04:19 PM, Bernd Schmidt wrote:

ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined.  Add a sorry.


Looking back through all the mails it turns out this one wasn't approved 
yet. Ping?



Bernd



[PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Martin Jambor
Hi,

since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
because although the tested-for devirtualization does happen, it is
probably being done earlier and the string we are trying to match is
not emitted.  But the important thing is that the tested
devirtualization takes place.

Patch has been pre-approved on IRC by Honza, I have tested it with
make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
sufficient and I have tested that the pattern matches a call to an
OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
the patch shortly.

Thanks,

Martin


gcc/testsuite/

2014-11-04  Martin Jambor  mjam...@suse.cz

* devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in
function body.

Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C
===
--- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C
+++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C
@@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr
   UnicodeString a[2];
 }
 
-/* { dg-final { scan-tree-dump converting indirect call to function virtual 
UnicodeString fre2  } } */
+/* { dg-final { scan-tree-dump-not \\n  OBJ_TYPE_REF fre2  } } */
 /* { dg-final { cleanup-tree-dump fre2 } } */


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Steve Kargl
On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote:
 The ptx port by its nature is lacking features that are expected on 
 normal machines, such as alloca and indirect jumps. We have a subset 
 of the C library which contains functions that can be implemented on the 
 target (excluding things like file I/O other than printf which is a ptx 
 builtin).
 
 It would be good to be able to also build as much of libgfortran as 
 possible, and the following patch is what I've been using so far. It 
 recognizes the target at configure time and restricts the list of 
 compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef 
 tests. There's also a new file minimal.c which contains alternative 
 implementations of some functionality (using printf to write error 
 messages rather than fprintf and such). Constructors are currently 
 unimplemented on ptx and therefore the init function is commented out.
 
 Comments on the approach, do the Fortran maintainers have a preference 
 how this should look? The whole thing is good enough to substantially 
 reduce the number of failures when trying to run the Fortran testsuites 
 on nvptx (although many remain).
 

It is unclear to me from reading the diff whether this patch 
cause gfortran on ptx to knowingly violate the fortran standard.
If the answer is yes, this patch causes gfortran on ptx to
violate the standard, then the patch is IMHO unacceptable.

-- 
steve


Re: The nvptx port [1/11+] indirect jumps

2014-11-04 Thread Richard Henderson
On 11/04/2014 04:32 PM, Bernd Schmidt wrote:
 On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
 ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
 defined.  Add a sorry.
 
 Looking back through all the mails it turns out this one wasn't approved yet.
 Ping?

Ok.


r~



Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Paolo Carlini

Hi,

On 11/04/2014 04:32 PM, Jonathan Wakely wrote:

Not a big deal of course, but unfortunately today I'm seeing *two* segfaults
for pb_ds:

performance/ext/pb_ds/multimap_text_insert_mem_large.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation
fault  ./$EXE_NAME tmp.$FILE_NAME
performance/ext/pb_ds/multimap_text_insert_mem_small.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation
fault  ./$EXE_NAME tmp.$FILE_NAME
They were both caused by the infinite recursion in the operator== for
tracker_allocator, which should be fixed.
To be clear, I see (deterministically, apparently) the three pb_ds fails 
which I reported earlier today with a *current* tree. But the pb_ds 
issues are definitely low priority, unless they point to something 
serious elsewhere. Thanks for looking into that.


Paolo.



Re: [PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Richard Biener
On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
 because although the tested-for devirtualization does happen, it is
 probably being done earlier and the string we are trying to match is
 not emitted.  But the important thing is that the tested
 devirtualization takes place.

 Patch has been pre-approved on IRC by Honza, I have tested it with
 make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
 sufficient and I have tested that the pattern matches a call to an
 OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
 the patch shortly.

 Thanks,

 Martin


 gcc/testsuite/

 2014-11-04  Martin Jambor  mjam...@suse.cz

 * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in
 function body.

 Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C
 ===
 --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C
 +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C
 @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr
UnicodeString a[2];
  }

 -/* { dg-final { scan-tree-dump converting indirect call to function virtual 
 UnicodeString fre2  } } */
 +/* { dg-final { scan-tree-dump-not \\n  OBJ_TYPE_REF fre2  } } */

What's the odd newline and spaces here?

  /* { dg-final { cleanup-tree-dump fre2 } } */


Re: [PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Paolo Carlini

Hi,

On 11/04/2014 04:37 PM, Martin Jambor wrote:

Hi,

since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
because although the tested-for devirtualization does happen, it is
probably being done earlier and the string we are trying to match is
not emitted.  But the important thing is that the tested
devirtualization takes place.

Patch has been pre-approved on IRC by Honza, I have tested it with
make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
sufficient and I have tested that the pattern matches a call to an
OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
the patch shortly.
Thanks. Just to confirm, devirt-42.C is also failing, that is also 
known, right?


FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized 
return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y 
scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C 
-std=gnu++98 scan-tree-dump-times optimized return 2 2


Thanks again,
Paolo.


Re: [PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Jiong Wang


On 04/11/14 15:51, Paolo Carlini wrote:

Hi,

On 11/04/2014 04:37 PM, Martin Jambor wrote:

Hi,

since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
because although the tested-for devirtualization does happen, it is
probably being done earlier and the string we are trying to match is
not emitted.  But the important thing is that the tested
devirtualization takes place.

Patch has been pre-approved on IRC by Honza, I have tested it with
make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
sufficient and I have tested that the pattern matches a call to an
OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
the patch shortly.

Thanks. Just to confirm, devirt-42.C is also failing,


+1, on my local arm/aarch64 test environment.

FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++11  scan-tree-dump-times optimized return 
2 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++98  scan-tree-dump-times optimized return 
2 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++1y  scan-ipa-dump-times inline Discovered a 
virtual call to a known target 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++1y  scan-ipa-dump-times inline First type 
is base of second 3
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++11  scan-ipa-dump-times inline First type 
is base of second 3
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++98  scan-ipa-dump-times inline Discovered a 
virtual call to a known target 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++11  scan-ipa-dump-times inline Discovered a 
virtual call to a known target 2
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++98  scan-ipa-dump-times inline First type 
is base of second 3
FAIL: g++.dg/ipa/devirt-42.C  -std=gnu++1y  scan-tree-dump-times optimized return 
2 2


that is also
known, right?

FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized
return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y
scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C
-std=gnu++98 scan-tree-dump-times optimized return 2 2

Thanks again,
Paolo.






Re: [PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Martin Jambor
On Tue, Nov 04, 2014 at 04:47:18PM +0100, Richard Biener wrote:
 On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
  because although the tested-for devirtualization does happen, it is
  probably being done earlier and the string we are trying to match is
  not emitted.  But the important thing is that the tested
  devirtualization takes place.
 
  Patch has been pre-approved on IRC by Honza, I have tested it with
  make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
  sufficient and I have tested that the pattern matches a call to an
  OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
  the patch shortly.
 
  Thanks,
 
  Martin
 
 
  gcc/testsuite/
 
  2014-11-04  Martin Jambor  mjam...@suse.cz
 
  * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in
  function body.
 
  Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C
  ===
  --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C
  +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C
  @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr
 UnicodeString a[2];
   }
 
  -/* { dg-final { scan-tree-dump converting indirect call to function 
  virtual UnicodeString fre2  } } */
  +/* { dg-final { scan-tree-dump-not \\n  OBJ_TYPE_REF fre2  } } */
 
 What's the odd newline and spaces here?

That only matches an OBJ_TYPE_REF as a statement in the function body
and specifically does not match
Determining dynamic type for call: OBJ_TYPE_REF and 
Starting walk at: OBJ_TYPE_REF
that are present in the dump file above the dump function.

An alternative would be of course to just scan the optimized dump if
this is considered too much of a hack.  But at least I though that for
a testcas it would do nicely.

Martin


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Bernd Schmidt

On 11/04/2014 04:41 PM, Steve Kargl wrote:

It is unclear to me from reading the diff whether this patch
cause gfortran on ptx to knowingly violate the fortran standard.
If the answer is yes, this patch causes gfortran on ptx to
violate the standard, then the patch is IMHO unacceptable.


I don't have the Fortran standard, but I assume that missing pieces in 
the library would be a violation. However, the alternative is no Fortran 
(library) support at all, which doesn't seem like an improvement. The 
target simply does not allow full language support, even for something 
like C.


Note that the intention is not to support Fortran (or any other 
language) directly targetting ptx code. The only way it's supposed to be 
used is as an accelerator for OpenACC offloading.



Bernd



Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread FX
 Comments on the approach, do the Fortran maintainers have a preference how 
 this should look? The whole thing is good enough to substantially reduce the 
 number of failures when trying to run the Fortran testsuites on nvptx 
 (although many remain).

I’m afraid I don’t really see the point. Maybe I’ll say it differently that 
Steve has: what would be the state of Fortran on such platform, and would it 
have users? It looks that, at the very least, your target wouldn’t be able to 
do any I/O.

Unless we have a good reason to think there will be users, because the state of 
support will be good, I don’t see that we should add to the library 
maintainance burden by special-casing targets.

Also: if other targets come along that have this need, how does your strategy 
scale up?

Thanks,
FX

Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare

2014-11-04 Thread Joseph Myers
On Tue, 4 Nov 2014, Richard Biener wrote:

   c-family/
   * c-common.c (shorten_compare): Do not shorten mixed
   DFP and non-DFP compares.

OK.  I think it's also wrong for get_narrower to strip conversions between 
binary and decimal floating point at all, as all such conversions for 
supported pairs of types can change values.  (_Decimal128 can represent 
all values of __fp16, but no target currently supports both types.)  And 
if flag_signaling_nans (strictly, if HONOR_SNANS (source-mode)), no 
floating-point conversions should be stripped in get_narrower at all.  And 
even without signaling NaNs, TYPE_PRECISION may not be a reliable 
indication of whether a conversion is widening - the only issue applying 
at present might be the corner case that a conversion of a subnormal from 
__float80 to __float128 is exact but still raises underflow if traps on 
underflow are enabled, but when you have both binary128 and IBM long 
double supported together then neither has a set of values that is a 
superset of the other (it seems 
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01086.html misses 
get_narrower in the places that need addressing).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare

2014-11-04 Thread Richard Biener
On Tue, 4 Nov 2014, Joseph Myers wrote:

 On Tue, 4 Nov 2014, Richard Biener wrote:
 
  c-family/
  * c-common.c (shorten_compare): Do not shorten mixed
  DFP and non-DFP compares.
 
 OK.  I think it's also wrong for get_narrower to strip conversions between 
 binary and decimal floating point at all, as all such conversions for 
 supported pairs of types can change values.  (_Decimal128 can represent 
 all values of __fp16, but no target currently supports both types.)  And 
 if flag_signaling_nans (strictly, if HONOR_SNANS (source-mode)), no 
 floating-point conversions should be stripped in get_narrower at all.  And 
 even without signaling NaNs, TYPE_PRECISION may not be a reliable 
 indication of whether a conversion is widening - the only issue applying 
 at present might be the corner case that a conversion of a subnormal from 
 __float80 to __float128 is exact but still raises underflow if traps on 
 underflow are enabled, but when you have both binary128 and IBM long 
 double supported together then neither has a set of values that is a 
 superset of the other (it seems 
 https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01086.html misses 
 get_narrower in the places that need addressing).

From this and past issues it looks like most (if not all) conversions
between FP types use CONVERT_EXPR, not NOP_EXPR and thus may work
as expected here.  That's of course somewhat unreliable as the
middle-end happily uses NOP_EXPR for those.

Richard.


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Bernd Schmidt

On 11/04/2014 04:59 PM, FX wrote:

Comments on the approach, do the Fortran maintainers have a
preference how this should look? The whole thing is good enough to
substantially reduce the number of failures when trying to run the
Fortran testsuites on nvptx (although many remain).


I’m afraid I don’t really see the point. Maybe I’ll say it
differently that Steve has: what would be the state of Fortran on
such platform, and would it have users? It looks that, at the very
least, your target wouldn’t be able to do any I/O.


It would be used through OpenACC - and that I do believe that does have 
Fortran users, with other compilers. We can make Fortran OpenACC work 
for simple testcases without any runtime library, but IMO it would be 
better to go as far as possible in supporting whatever can be made to 
work. I/O is the major piece that is missing, but you'd expect that to 
be done on the host rather than on the accelerator.



Bernd


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Jakub Jelinek
On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote:
 On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote:
  The ptx port by its nature is lacking features that are expected on 
  normal machines, such as alloca and indirect jumps. We have a subset 
  of the C library which contains functions that can be implemented on the 
  target (excluding things like file I/O other than printf which is a ptx 
  builtin).
  
  It would be good to be able to also build as much of libgfortran as 
  possible, and the following patch is what I've been using so far. It 
  recognizes the target at configure time and restricts the list of 
  compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef 
  tests. There's also a new file minimal.c which contains alternative 
  implementations of some functionality (using printf to write error 
  messages rather than fprintf and such). Constructors are currently 
  unimplemented on ptx and therefore the init function is commented out.
  
  Comments on the approach, do the Fortran maintainers have a preference 
  how this should look? The whole thing is good enough to substantially 
  reduce the number of failures when trying to run the Fortran testsuites 
  on nvptx (although many remain).
  
 
 It is unclear to me from reading the diff whether this patch 
 cause gfortran on ptx to knowingly violate the fortran standard.
 If the answer is yes, this patch causes gfortran on ptx to
 violate the standard, then the patch is IMHO unacceptable.

The point is, if the target can implement just a subset of the Fortran (or
C or C++) standards, then ideally if you use anything that is not supported
would just cause always host fallback, the code will still work, but will
not be offloaded.  So even supporting a subset of the standard is
worthwhile, usually one will just offload the most performance critical
parts of his code.

Jakub


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread FX
 The point is, if the target can implement just a subset of the Fortran (or
 C or C++) standards, then ideally if you use anything that is not supported
 would just cause always host fallback, the code will still work, but will
 not be offloaded.  So even supporting a subset of the standard is
 worthwhile, usually one will just offload the most performance critical
 parts of his code.

Do we have the architecture for that in place in GCC in general, and in the 
Fortran front-end in particular? I’d be interested to see how it works…

FX

Re: [PATCH 12/27] New file: gcc/jit/jit-recording.h

2014-11-04 Thread David Malcolm
On Mon, 2014-11-03 at 14:27 -0700, Jeff Law wrote:
 On 10/31/14 11:02, David Malcolm wrote:
  This file declares the gcc::jit::recording internal API, so that
  libgccjit.c can record the calls that are made to the public API, for
  later playback by the dummy frontend.
 
  gcc/jit/
  * jit-recording.h: New.
  ---
gcc/jit/jit-recording.h | 1593 
  +++
1 file changed, 1593 insertions(+)
create mode 100644 gcc/jit/jit-recording.h
 
  diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
  new file mode 100644
  index 000..bb1a2ee
  --- /dev/null
  +++ b/gcc/jit/jit-recording.h
 [ ... ]
 
  +
  +private:
  +  void validate ();
 So give the complexities in interfacing with the guts of GCC, would it 
 make sense to expose the validate method?

Most of the error-checking in the API happens in the API calls in
libgccjit.c, testing that the individual pieces are sane.

The validate method tests the things that can only be verified as a
whole, when the context is about to be compiled: are there unreachable
blocks?  is every block terminated?  etc

I can't quite see why client code might want to perform the latter kind
of validation without actually doing a compile, so I don't plan to
expose this at this time.  It's trivial to do so if someone needs it.

  +/* or just use std::string? */
  +class string : public memento
 Is there some reason not to use std::string?  I really like using 
 standard components rather than rolling our own.

I think I was trying to avoid std::string for some reason, but I'm not
quite sure why, perhaps out of a misremembered idea that libstdc++ was
verboten (I currently use std:: in one place, in jit-playback.h, where a
playback::context has a:
  vecstd::pairtree, location *  m_cached_locations;
).

In any case recording::string is an implementation detail hidden within
the library.  It is a recording::memento and hence has the same lifetime
as the recording::context.  I can't think of a reason off the top of my
head why a std::string wouldn't work instead, but the existing code
works, and has been through a fair amount of debugging.

One thing I do make use of is that it's a pointer, for this field within
recording::memento:

  string *m_debug_string;

so that I can use NULL for the common case of no debug string has been
built for this thing yet - though I suspect I could use a std::string *
for that.

 OK for the trunk.

Thanks.



Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Jakub Jelinek
On Tue, Nov 04, 2014 at 05:15:52PM +0100, FX wrote:
  The point is, if the target can implement just a subset of the Fortran (or
  C or C++) standards, then ideally if you use anything that is not supported
  would just cause always host fallback, the code will still work, but will
  not be offloaded.  So even supporting a subset of the standard is
  worthwhile, usually one will just offload the most performance critical
  parts of his code.
 
 Do we have the architecture for that in place in GCC in general, and in
 the Fortran front-end in particular?  I’d be interested to see how it
 works…

See https://gcc.gnu.org/wiki/Offloading and kyukhin/gomp4-offload and
branches/gomp-4_0-branch branches.  Both are in the process of being merged
into trunk these days.

Jakub


Re: [PATCH] Fix dump scan in test devirt-40.C

2014-11-04 Thread Richard Biener
On Tue, Nov 4, 2014 at 4:57 PM, Martin Jambor mjam...@suse.cz wrote:
 On Tue, Nov 04, 2014 at 04:47:18PM +0100, Richard Biener wrote:
 On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing
  because although the tested-for devirtualization does happen, it is
  probably being done earlier and the string we are trying to match is
  not emitted.  But the important thing is that the tested
  devirtualization takes place.
 
  Patch has been pre-approved on IRC by Honza, I have tested it with
  make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is
  sufficient and I have tested that the pattern matches a call to an
  OBJ_TYPE_REF in function body in an earlier dump file.  I will commit
  the patch shortly.
 
  Thanks,
 
  Martin
 
 
  gcc/testsuite/
 
  2014-11-04  Martin Jambor  mjam...@suse.cz
 
  * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in
  function body.
 
  Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C
  ===
  --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C
  +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C
  @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr
 UnicodeString a[2];
   }
 
  -/* { dg-final { scan-tree-dump converting indirect call to function 
  virtual UnicodeString fre2  } } */
  +/* { dg-final { scan-tree-dump-not \\n  OBJ_TYPE_REF fre2  } } */

 What's the odd newline and spaces here?

 That only matches an OBJ_TYPE_REF as a statement in the function body
 and specifically does not match
 Determining dynamic type for call: OBJ_TYPE_REF and
 Starting walk at: OBJ_TYPE_REF
 that are present in the dump file above the dump function.

Ah ok.

Thanks,
Richard.

 An alternative would be of course to just scan the optimized dump if
 this is considered too much of a hack.  But at least I though that for
 a testcas it would do nicely.

 Martin


Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version

2014-11-04 Thread Joern Rennecke
On 4 November 2014 14:24, James Greenhalgh james.greenha...@arm.com wrote:
 On Tue, Nov 04, 2014 at 12:07:56PM +, Joern Rennecke wrote:
 On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote:

  While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is
  unused, so clean that up too.

 That's not a clean-up.  This pertains to PR 39350.

 Well, it is a clean-up in the sense that this macro is completely unused
 in the compiler and has no effect, but please revert this hunk if that
 is your preference.

 Which, incidentally, this hookization completely ignores, entrenching
 the conflation of move expander and move cost estimates.

 No, I have to disagree. The use_by_pieces_infrastructure_p hook doesn't
 conflate anything - it gives a response to the question Should the
 by_pieces infrastructure be used?. A target specific movmem pattern
 - though it might itself choose to move things by pieces, is
 categorically not using the move_by_pieces infrastructure.

 If we want to keep a clean separation of concerns here, we would
 want a similar target hook asking the single question will your
 movmem/setmem expander succeed?.

That would not be helpful.  What the rtl optimizers actually want to know is
will this block copy / memset be cheap? .
A movmem expander might succeed (or not) for various reasons.  The one that's
interesting for the above question is if the call has been inlined
with a fast set
of instructions.

 Thus, can_move_by_pieces gives the wrong result for purposes of rtl
 optimizations
 when a target-specific movmem etc expander emits target-specific code.
 The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt
 shows a number of call sites that are affected.

 can_move_by_pieces (likewise can_store_by_pieces) gives the right
 result, the RTL expanders are using it wrong.

I could agree with that view if there was a good strategy agreed what the rtl
expanders should do instead.

 I disagree with the approach taken in your patch as it overloads the
 purpose of can_move_by_pieces. However, I would support a patch pulling
 this out in to two hooks, so the call in
 value-prof.c:gimple_stringops_transform would change from:

   if (!can_move_by_pieces (val, MIN (dest_align, src_align)))
 return false;

 to something like:

   if (!can_move_by_pieces (val, MIN (dest_align, src_align))
!targetm.can_expand_mem_op_p (val, MIN (dest_align, src_align),
MOVE_BY_PIECES))
 return false;

But this goes back to the problem that it's not about if we can expand the mem
op at all, but if we have a fast expansion.  We can always expand via libcall
(the middle end does this as a fall-back).  Also, the target might do some
target-specific slow expansion, e.g. call a function with another name
and maybe a
modified ABI, but still relatively slow to work.

So, either the new hook would answer the wrong question, or it would be
misnamed, in which case it's likely that the semantics will sooner or
later follow
the name.
it will gravitate to answer the wrong question again.

 But let's not confuse the use of what should be a simple hook!

What would that be?  TARGET_RTX_COST is unsuitable because the RTL
for the call hasn't been made yet, and it it was, it would tend to be multiple
instructions, maybe even a loop.
Should we have an analogous TARGET_TREE_COST hook, so that you can ask the
target what it thinks the cost of a tree will be once it's expanded?


[patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Andrew MacLeod
java/builtins.c needs to call can_compare_and_swap, which happens to be 
provided via optabs.h.

As a result, we see the following in java/builtins.c:

/* FIXME: All these headers are necessary for sync_compare_and_swap.
   Front ends should never have to look at that.  */
#include rtl.h
#include insn-codes.h
#include expr.h
#include optabs.h

by providing a target hook and a default which simply calls 
can_compare_and_swap_p(),  we can remove all that include crap from the 
front end.  None of those 4 include files are now directly included by 
any front end files.


Bootstraps on x86_64-unknown-linux-gnu, and no new testsuite failures.  
OK for trunk?


Andrew

	* target.def (can_compare_and_swap_p): Define target hook.
	* targhooks.c (default_can_compare_and_swap_p): Provide default hook.
	* targhooks.h (default_can_compare_and_swap_p): Provide prototype.
	* doc/tm.texi (target_can_compare_and_swap_p): Provide description.
	* doc/tm.texi.in (TARGET_CAN_COMPARE_AND_SWAP_P): Provide location.
	* java/builtins.c. Remove backend includes, include target.h.
	(compareAndSwapInt_builtin, compareAndSwapLong_builtin,
	compareAndSwapObject_builtin, VMSupportsCS8_builtin): Use
	compare_and_swap_p target hook.
	
Index: target.def
===
*** target.def	(revision 217057)
--- target.def	(working copy)
*** specific target options and the caller d
*** 5185,5190 
--- 5185,5199 
   bool, (tree caller, tree callee),
   default_target_can_inline_p)
  
+ DEFHOOK
+ (can_compare_and_swap_p,
+  This macro returns true if the target can perform an atomic\n\
+ compare_and_swap operation on an object of machine mode @var{mode}.\n\
+ @var{allow_libcall} is a boolean which is used to indicate whether the\n\
+ operation is allowed to be performed with a libcall.,
+  bool, (machine_mode mode, bool allow_libcall),
+  default_can_compare_and_swap_p)
+ 
  HOOK_VECTOR_END (target_option)
  
  /* For targets that need to mark extra registers as live on entry to
Index: targhooks.c
===
*** targhooks.c	(revision 217057)
--- targhooks.c	(working copy)
*** default_target_can_inline_p (tree caller
*** 1322,1327 
--- 1322,1334 
return ret;
  }
  
+ bool 
+ default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall)
+ {
+   return can_compare_and_swap_p (mode, allow_libcall);
+ }
+ 
+ 
  #ifndef HAVE_casesi
  # define HAVE_casesi 0
  #endif
Index: targhooks.h
===
*** targhooks.h	(revision 217057)
--- targhooks.h	(working copy)
*** extern tree std_gimplify_va_arg_expr (tr
*** 220,224 
--- 220,226 
  extern bool can_use_doloop_if_innermost (const widest_int ,
  	 const widest_int ,
  	 unsigned int, bool);
+ extern bool default_can_compare_and_swap_p (machine_mode, bool);
+ 
  
  #endif /* GCC_TARGHOOKS_H */
Index: doc/tm.texi
===
*** doc/tm.texi	(revision 217057)
--- doc/tm.texi	(working copy)
*** default, inlining is not allowed if the
*** 9742,9747 
--- 9742,9754 
  specific target options and the caller does not use the same options.
  @end deftypefn
  
+ @deftypefn {Target Hook} bool TARGET_CAN_COMPARE_AND_SWAP_P (machine_mode @var{mode}, bool @var{allow_libcall})
+ This macro returns true if the target can perform an atomic
+ compare_and_swap operation on an object of machine mode @var{mode}.
+ @var{allow_libcall} is a boolean which is used to indicate whether the
+ operation is allowed to be performed with a libcall.
+ @end deftypefn
+ 
  @node Emulated TLS
  @section Emulating TLS
  @cindex Emulated TLS
Index: doc/tm.texi.in
===
*** doc/tm.texi.in	(revision 217057)
--- doc/tm.texi.in	(working copy)
*** on this implementation detail.
*** 7227,7232 
--- 7227,7234 
  
  @hook TARGET_CAN_INLINE_P
  
+ @hook TARGET_CAN_COMPARE_AND_SWAP_P
+ 
  @node Emulated TLS
  @section Emulating TLS
  @cindex Emulated TLS
Index: java/builtins.c
===
*** java/builtins.c	(revision 217057)
--- java/builtins.c	(working copy)
*** The Free Software Foundation is independ
*** 37,49 
  #include flags.h
  #include langhooks.h
  #include java-tree.h
! 
! /* FIXME: All these headers are necessary for sync_compare_and_swap.
!Front ends should never have to look at that.  */
! #include rtl.h
! #include insn-codes.h
! #include expr.h
! #include optabs.h
  
  static tree max_builtin (tree, tree);
  static tree min_builtin (tree, tree);
--- 37,43 
  #include flags.h
  #include langhooks.h
  #include java-tree.h
! #include target.h
  
  static tree max_builtin (tree, tree);
  static tree min_builtin (tree, tree);
*** compareAndSwapInt_builtin 

Re: [PATCH 13/27] New file: gcc/jit/jit-recording.c

2014-11-04 Thread David Malcolm
On Mon, 2014-11-03 at 15:03 -0700, Jeff Law wrote:
 On 10/31/14 11:02, David Malcolm wrote:
  Implementation of the gcc::jit::recording internal API, so that
  libgccjit.c can record the calls that are made to the public API, for
  later playback by the dummy frontend.
 
  gcc/jit/
  * jit-recording.c: New.
 Cursory review since it's the JIT implementation directory which you'll 
 likely end up owning...
 
 
  +
  +/* Assuming that this block has been terminated, get the number of
  +   successor blocks, which will be 0, 1 or 2, for return, unconditional
  +   jump, and conditional jump respectively.
  +   NEXT1 and NEXT2 must be non-NULL.  The first successor block (if any)
  +   is written to NEXT1, and the second (if any) to NEXT2.
  +
  +   Used when validating functions, and when dumping dot representations
  +   of them.  */
 So presumably no multi-way branches yet?  Might be better to build the 
 API in such a way that it handles multi-way branches from the get-go 
 rather than retrofitting later.  Your call.

This is an internal API.  If the external API needs to support building
switch statements from client code maybe with something like this:

  extern void
  gcc_jit_block_end_with_switch (gcc_jit_block *block,
 gcc_jit_location *loc,
 gcc_jit_rvalue *expr,
 int num_cases,
 gcc_jit_rvalue **case_values,
 gcc_jit_block **case_blocks,
 gcc_jit_block *default_block);

then this internal API would obviously need to change, maybe becoming:

  virtual int get_num_successor_blocks () const;
  virtual block *get_successor_block (int index) const;

but that won't affect the public API.

That said it's not clear to me that we do need to support switch
statements; no JIT implementation I've seen has needed them
(I put them in the Probably not needed section of TODO.rst).

 Otherwise I don't see anything objectionable.  OK for the trunk.

Thanks
Dave





Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Paolo Carlini

On 11/04/2014 04:46 PM, Paolo Carlini wrote:

Hi,

On 11/04/2014 04:32 PM, Jonathan Wakely wrote:
Not a big deal of course, but unfortunately today I'm seeing *two* 
segfaults

for pb_ds:

performance/ext/pb_ds/multimap_text_insert_mem_large.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation
fault  ./$EXE_NAME tmp.$FILE_NAME
performance/ext/pb_ds/multimap_text_insert_mem_small.cc
.../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation
fault  ./$EXE_NAME tmp.$FILE_NAME
They were both caused by the infinite recursion in the operator== for
tracker_allocator, which should be fixed.
To be clear, I see (deterministically, apparently) the three pb_ds 
fails which I reported earlier today with a *current* tree. But the 
pb_ds issues are definitely low priority, unless they point to 
something serious elsewhere. Thanks for looking into that.
Ah! The testsuite_allocator.h fix of yours is still unapplied, didn't 
know that, sorry ;)


Paolo.


Re: [PATCH 14/27] New files: gcc/jit/jit-builtins.{c|h}

2014-11-04 Thread David Malcolm
On Mon, 2014-11-03 at 14:04 -0700, Jeff Law wrote:
 On 10/31/14 11:02, David Malcolm wrote:
  These files implement support for builtins, for the
 gcc_jit_context_get_builtin_function
  API entrypoint.
 
  Only a subset of builtins are currently supported, based on those
  that I needed when porting GNU Octave's JIT.
 
  Attempts to use other builtins may lead to an error:
 unimplemented primitive type for builtin
  being emitted on the gcc_jit_context.
 
  gcc/jit/
  * jit-builtins.c: New.
  * jit-builtins.h: New.
 So how do you envision maintenance on this in the future?  Reality is 
 folks are regularly adding new builtins, are they going to have to 
 update the JIT too?

I believe the jit will only need updating if someone adds a new *macro*
to builtin-types.def e.g. if someone adds a new

  DEF_FUNCTION_TYPE_9

or somesuch.

Everything is keyed off the .def files, so GCC developers adding a new
builtin shouldn't need to touch the jit.

As I noted above, only a subset of the types are currently supported
from the jit; attempts to use builtins of other types from the jit will
fail at runtime with the type error I pasted above (and can be fixed up
on a case-by-case basis if the users file bug reports).

 OK for the trunk.

Thanks
Dave



Re: The nvptx port [10/11+] Target files

2014-11-04 Thread Richard Henderson
On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
 +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
 +{
 +  switch (mode)
 +{
 +case BLKmode:
 +  return .b8;
 +case BImode:
 +  return .pred;
 +case QImode:
 +  if (promote)
 + return .u32;
 +  else
 + return .u8;
 +case HImode:
 +  return .u16;

Promote here too?  Or does this have nothing to do with

 +static enum machine_mode
 +arg_promotion (enum machine_mode mode)
 +{
 +  if (mode == QImode || mode == HImode)
 +return SImode;
 +  return mode;
 +}


r~


Re: The nvptx port [10/11+] Target files

2014-11-04 Thread Bernd Schmidt

On 11/04/2014 05:48 PM, Richard Henderson wrote:

On 10/28/2014 03:56 PM, Bernd Schmidt wrote:

+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+  switch (mode)
+{
+case BLKmode:
+  return .b8;
+case BImode:
+  return .pred;
+case QImode:
+  if (promote)
+   return .u32;
+  else
+   return .u8;
+case HImode:
+  return .u16;


Promote here too?  Or does this have nothing to do with


+static enum machine_mode
+arg_promotion (enum machine_mode mode)
+{
+  if (mode == QImode || mode == HImode)
+return SImode;
+  return mode;
+}


No, these are different problems - the one in arg promotion is purely 
about KR C and trying to match untyped function decls with calls, while 
the type_from_mode bit was about some ptx ideosyncracy. Although I 
forget what the problem was, that code is more than a year old - I'll 
see if I can get rid of this.



Bernd



Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore

2014-11-04 Thread Jonathan Wakely
On 4 November 2014 16:34, Paolo Carlini wrote:
 Ah! The testsuite_allocator.h fix of yours is still unapplied, didn't know
 that, sorry ;)

My bad - I thought I'd committed it!  Done now.


Re: [PATCH 10/27] New file: gcc/jit/libgccjit.c

2014-11-04 Thread David Malcolm
On Mon, 2014-11-03 at 13:32 -0700, Jeff Law wrote:
 On 10/31/14 11:02, David Malcolm wrote:
  This file implements the entrypoints of the library's public API.
 
  It performs error-checking at this boundary, before calling into the
  jit-recording.h internal API.
 
  gcc/jit/
  * libgccjit.c: New.
  ---
gcc/jit/libgccjit.c | 1506 
  +++
1 file changed, 1506 insertions(+)
create mode 100644 gcc/jit/libgccjit.c
 
  +
  +#define IS_ASCII_ALPHA(CHAR) \
  +  (\
  +((CHAR) = 'a'  (CHAR) ='z')\
  +|| \
  +((CHAR) = 'A'  (CHAR) = 'Z')   \
  +  )
  +
  +#define IS_ASCII_DIGIT(CHAR) \
  +  ((CHAR) = '0'  (CHAR) ='9')
  +
  +#define IS_ASCII_ALNUM(CHAR) \
  +  (IS_ASCII_ALPHA (CHAR) || IS_ASCII_DIGIT (CHAR))
 Can't we rely on the C library to give us equivalents?

I've been burned in the past by the C library using locales, in
particular the two lowercase i variants in Turkish.

These macros are used by gcc_jit_context_new_function to enforce C's
naming restrictions, to avoid errors from the assembler.  The comment I
put there was:

  /* The assembler can only handle certain names, so for now, enforce
 C's rules for identifiers upon the name.
 Eventually we'll need some way to interact with e.g. C++ name mangling.  */

Am I right in thinking that for the assembler we need to enforce the C
naming rules specifically on *ASCII*.

(clearly another comment is needed here).

  +
  +/* TODO: mark failure branches as unlikely? */
  +
 Not likely worth the effort.  And it'd be better to somehow mark 
 jit_error such that anytime a path unconditionally calls jit_error, the 
 whole path is considered unlikely.

(nods)

 I think it was Ball  Larus that had a predictor of this nature, I don't 
 recall its effectiveness offhand.  But I'd rather be marking the 
 function than sprinlkling unlikely all over the actual codepaths.

Presumably by marking it with __attribute__((cold)) ?  (with a suitable
macro in case we're not being compiled with a gcc that supports it).

I suspect doing so will be of more use in teaching me about gcc
internals and the effect on generated code that in measurable benefits
to said code in this case :)

  +static bool
  +compatible_types (gcc::jit::recording::type *ltype,
  + gcc::jit::recording::type *rtype)
 All function definitions should have a block comment describing the 
 function and its arguments.   This comment applies throughout the code 
 and needs to be addressed prior to commit.  In fact I really can't even 
 continue review of this code due to the lack of comments -- I rely 
 heavily on them.

Sorry.  I'll post a followup with comments added.

Many of the functions are public API entrypoints, where there's a
comment in the public header.  Should I simply duplicate the comment
from there into the .c file, or put a comment like:

  /* Public entrypoint.  See description in libgccjit.h.  */

for each of these?  (perhaps with additional text giving implementation
notes?)


Thanks for all the reviews.  Looks like this and patch 16 are now the
only non-approved parts of the kit (I didn't see a review of 16).

Dave



Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Steve Kargl
On Tue, Nov 04, 2014 at 04:54:54PM +0100, Bernd Schmidt wrote:
 On 11/04/2014 04:41 PM, Steve Kargl wrote:
  It is unclear to me from reading the diff whether this patch
  cause gfortran on ptx to knowingly violate the fortran standard.
  If the answer is yes, this patch causes gfortran on ptx to
  violate the standard, then the patch is IMHO unacceptable.
 
 I don't have the Fortran standard, but I assume that missing pieces in 
 the library would be a violation. However, the alternative is no Fortran 
 (library) support at all, which doesn't seem like an improvement. The 
 target simply does not allow full language support, even for something 
 like C.
 
 Note that the intention is not to support Fortran (or any other 
 language) directly targetting ptx code. The only way it's supposed to be 
 used is as an accelerator for OpenACC offloading.
 

I see.  I get nervous when a patch appears that throws away
a part of the runtime library.  There are typically unintended
consequences, which then becomes a support issue.

-- 
Steve


[RFC PATCH v4] Fix PR56480 aka DR374. Allow explicit specialization in enclosing namespace.

2014-11-04 Thread Markus Trippelsdorf
On 2014.11.03 at 08:47 -0600, Jason Merrill wrote:
 On 11/03/2014 05:27 AM, Markus Trippelsdorf wrote:
  BTW both EDG and clang reject g++.dg/template/spec17.C:
 
  namespace io {
 template typename int foo();
  }
  using namespace io;
  template int fooint();
 
  But I think it is a reasonable extension to accept it.
 
 We should reject it, too.  We can only leave out the namespace 
 qualification with inline namespaces.
 
  +// { dg-options -pedantic -pedantic-errors }
 
 You shouldn't need these dg-options lines; if a testcase has no 
 dg-options line, -pedantic-errors is the default.

The following patch passes testing and fixes both issues.
Please let me know if you're OK with the general direction, before I
write a full Changelog.

Thanks.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 320c39f636a2..ddfa5ebfd968 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7855,7 +7855,8 @@ grokfndecl (tree ctype,
   decl = check_explicit_specialization (orig_declarator, decl,
template_count,
2 * funcdef_flag +
-   4 * (friendp != 0));
+   4 * (friendp != 0) +
+   8 * (in_namespace != 0));
   if (decl == error_mark_node)
 return NULL_TREE;
 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2cf10f442f68..5cc320ed2fcf 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -755,57 +755,58 @@ end_explicit_instantiation (void)
   processing_explicit_instantiation = false;
 }
 
-/* An explicit specialization or partial specialization of TMPL is being
-   declared.  Check that the namespace in which the specialization is
-   occurring is permissible.  Returns false iff it is invalid to
-   specialize TMPL in the current namespace.  */
+/* If specialization is true, an explicit specialization or partial
+   specialization of TMPL is being declared.  Check that the namespace
+   in which the specialization is occurring is permissible.  Returns
+   false iff it is invalid to specialize TMPL in the current namespace.
+   If specialization is false, tmpl is an explicit instantiation.
+   Check that it is valid to perform this explicit instantiation in
+   the current namespace.  */
 
 static bool
-check_specialization_namespace (tree tmpl)
+check_instant_or_special_namespace (tree tmpl, bool specialization)
 {
   tree tpl_ns = decl_namespace_context (tmpl);
 
-  /* [tmpl.expl.spec]
-
- An explicit specialization shall be declared in the namespace of
- which the template is a member, or, for member templates, in the
- namespace of which the enclosing class or enclosing class
- template is a member.  An explicit specialization of a member
- function, member class or static data member of a class template
- shall be declared in the namespace of which the class template is
- a member.  */
-  if (current_scope() != DECL_CONTEXT (tmpl)
-   !at_namespace_scope_p ())
-{
-  error (specialization of %qD must appear at namespace scope, tmpl);
-  return false;
-}
-  if (is_associated_namespace (current_namespace, tpl_ns))
-/* Same or super-using namespace.  */
-return true;
-  else
+  if (specialization)
 {
-  permerror (input_location, specialization of %qD in different 
namespace, tmpl);
-  permerror (input_location,   from definition of %q+#D, tmpl);
-  return false;
-}
-}
+  /* [tmpl.expl.spec]
 
-/* SPEC is an explicit instantiation.  Check that it is valid to
-   perform this explicit instantiation in the current namespace.  */
-
-static void
-check_explicit_instantiation_namespace (tree spec)
-{
-  tree ns;
+An explicit specialization shall be declared in the namespace of
+which the template is a member, or, for member templates, in the
+namespace of which the enclosing class or enclosing class
+template is a member.  An explicit specialization of a member
+function, member class or static data member of a class template
+shall be declared in the namespace of which the class template is
+a member.  */
+  if (current_scope () != DECL_CONTEXT (tmpl)  !at_namespace_scope_p ())
+   {
+ error (specialization of %qD must appear at namespace scope, tmpl);
+ return false;
+   }
+  if (is_associated_namespace (current_namespace, tpl_ns))
+   /* Same or super-using namespace.  */
+   return true;
+  /* DR374: explicit specializations of templates in an enclosing
+namespace are allowed.  */
+  else if (cxx_dialect  cxx11)
+   {
+ pedwarn (input_location, OPT_Wpedantic,
+  specialization of %qD in different namespace, tmpl);
+ pedwarn (input_location, OPT_Wpedantic,   from definition of %q+#D,
+  tmpl);
+   }
+}
 
   /* DR 275: An explicit instantiation shall appear in an enclosing
  namespace of its 

Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread FX
 See https://gcc.gnu.org/wiki/Offloading and kyukhin/gomp4-offload and
 branches/gomp-4_0-branch branches.  Both are in the process of being merged
 into trunk these days.

Thanks for the link, I’ll look into it.
I suppose then it makes sense to provide partial libgfortran support, assuming 
someone is volunteering to make offloading work nicely for Fortran code and 
maintain it.

FX

Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Jeff Law

On 11/04/14 09:11, Jakub Jelinek wrote:

On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote:

On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote:

The ptx port by its nature is lacking features that are expected on
normal machines, such as alloca and indirect jumps. We have a subset
of the C library which contains functions that can be implemented on the
target (excluding things like file I/O other than printf which is a ptx
builtin).

It would be good to be able to also build as much of libgfortran as
possible, and the following patch is what I've been using so far. It
recognizes the target at configure time and restricts the list of
compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef
tests. There's also a new file minimal.c which contains alternative
implementations of some functionality (using printf to write error
messages rather than fprintf and such). Constructors are currently
unimplemented on ptx and therefore the init function is commented out.

Comments on the approach, do the Fortran maintainers have a preference
how this should look? The whole thing is good enough to substantially
reduce the number of failures when trying to run the Fortran testsuites
on nvptx (although many remain).



It is unclear to me from reading the diff whether this patch
cause gfortran on ptx to knowingly violate the fortran standard.
If the answer is yes, this patch causes gfortran on ptx to
violate the standard, then the patch is IMHO unacceptable.


The point is, if the target can implement just a subset of the Fortran (or
C or C++) standards, then ideally if you use anything that is not supported
would just cause always host fallback, the code will still work, but will
not be offloaded.  So even supporting a subset of the standard is
worthwhile, usually one will just offload the most performance critical
parts of his code.
Also note there's a reasonable chance that the GPUs will continue to 
evolve and will be able to support more of the standard language 
features.  Not sure if they'll ever do the IO side of Fortran, but they 
could always surprise us.


jeff


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Richard Henderson
On 11/04/2014 05:28 PM, Andrew MacLeod wrote:
 + bool 
 + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall)
 + {
 +   return can_compare_and_swap_p (mode, allow_libcall);
 + }

This is silly.  I think the problem you point out can be better fixed by moving
the can_compare_and_swap_p prototype elsewhere.


r~


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Steve Kargl
On Tue, Nov 04, 2014 at 10:20:53AM -0700, Jeff Law wrote:
 On 11/04/14 09:11, Jakub Jelinek wrote:
  On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote:
  On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote:
  The ptx port by its nature is lacking features that are expected on
  normal machines, such as alloca and indirect jumps. We have a subset
  of the C library which contains functions that can be implemented on the
  target (excluding things like file I/O other than printf which is a ptx
  builtin).
 
  It would be good to be able to also build as much of libgfortran as
  possible, and the following patch is what I've been using so far. It
  recognizes the target at configure time and restricts the list of
  compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef
  tests. There's also a new file minimal.c which contains alternative
  implementations of some functionality (using printf to write error
  messages rather than fprintf and such). Constructors are currently
  unimplemented on ptx and therefore the init function is commented out.
 
  Comments on the approach, do the Fortran maintainers have a preference
  how this should look? The whole thing is good enough to substantially
  reduce the number of failures when trying to run the Fortran testsuites
  on nvptx (although many remain).
 
 
  It is unclear to me from reading the diff whether this patch
  cause gfortran on ptx to knowingly violate the fortran standard.
  If the answer is yes, this patch causes gfortran on ptx to
  violate the standard, then the patch is IMHO unacceptable.
 
  The point is, if the target can implement just a subset of the Fortran (or
  C or C++) standards, then ideally if you use anything that is not supported
  would just cause always host fallback, the code will still work, but will
  not be offloaded.  So even supporting a subset of the standard is
  worthwhile, usually one will just offload the most performance critical
  parts of his code.
 Also note there's a reasonable chance that the GPUs will continue to 
 evolve and will be able to support more of the standard language 
 features.  Not sure if they'll ever do the IO side of Fortran, but they 
 could always surprise us.
 

Thanks for the explanation.  Certainly mapping Fortran's array syntax
and many of the intrinsic subprograms to an accelerator may provide a
nice speed improvement.  It wasn't clear to me from Bernd original message
that the patch was intended for OpenACC offloading.  I was assuming
that ptx was a new cpu architecture, which had limited capabilites.

-- 
Steve


Go patch committed: untyped bool for logical operators

2014-11-04 Thread Ian Taylor
The Go language was tweaked to permit logical operators to return an
untyped boolean value, which means that code like
type myBool bool
var b myBool = a  b || c  d
is now permitted (previously the || operator would return the named
type bool and an explicit conversion was required for this code).

This patch from Chris Manghane implements this in gccgo, simply by
removing the old code that prevented it from working.  This requires
adjusting one existing testcase.  This is http://golang.org/issue/6671
.

Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 217049)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -5314,15 +5314,6 @@ Binary_expression::do_determine_type(con
   subcontext.type = NULL;
 }
 
-  if (this-op_ == OPERATOR_ANDAND || this-op_ == OPERATOR_OROR)
-{
-  // For a logical operation, the context does not determine the
-  // types of the operands.  The operands must be some boolean
-  // type but if the context has a boolean type they do not
-  // inherit it.  See http://golang.org/issue/3924.
-  subcontext.type = NULL;
-}
-
   // Set the context for the left hand operand.
   if (is_shift_op)
 {
Index: gcc/testsuite/go.test/test/fixedbugs/issue3924.go
===
--- gcc/testsuite/go.test/test/fixedbugs/issue3924.go   (revision 217049)
+++ gcc/testsuite/go.test/test/fixedbugs/issue3924.go   (working copy)
@@ -1,4 +1,4 @@
-// errorcheck
+// compile
 
 // Copyright 2012 The Go Authors.  All rights reserved.
 // Use of this source code is governed by a BSD-style
@@ -9,5 +9,5 @@ package foo
 type mybool bool
 
 var x, y = 1, 2
-var _ mybool = x  y  x  y // ERROR cannot use
-var _ mybool = x  y || x  y // ERROR cannot use
+var _ mybool = x  y  x  y
+var _ mybool = x  y || x  y


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Andrew MacLeod

On 11/04/2014 12:25 PM, Richard Henderson wrote:

On 11/04/2014 05:28 PM, Andrew MacLeod wrote:

+ bool
+ default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall)
+ {
+   return can_compare_and_swap_p (mode, allow_libcall);
+ }

This is silly.  I think the problem you point out can be better fixed by moving
the can_compare_and_swap_p prototype elsewhere.

yeah, except it uses some of the optab table stuff that is static to 
optabs.c...   so the basic functionality remains there.   we could stick 
a wrapper somewhere else... but thats effectively what this does.

we are asking a question about target behaviour...

Im open to other suggestions..

Andrew


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Richard Henderson
On 11/04/2014 06:56 PM, Andrew MacLeod wrote:
 On 11/04/2014 12:25 PM, Richard Henderson wrote:
 On 11/04/2014 05:28 PM, Andrew MacLeod wrote:
 + bool
 + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall)
 + {
 +   return can_compare_and_swap_p (mode, allow_libcall);
 + }
 This is silly.  I think the problem you point out can be better fixed by 
 moving
 the can_compare_and_swap_p prototype elsewhere.

 yeah, except it uses some of the optab table stuff that is static to
 optabs.c...   so the basic functionality remains there.

I said move the prototype.  Of course the implementation remains where it is.


r~


Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread Jeff Law

On 11/04/14 08:54, Bernd Schmidt wrote:

Note that the intention is not to support Fortran (or any other
language) directly targetting ptx code. The only way it's supposed to be
used is as an accelerator for OpenACC offloading.
Right.  To reiterate for everyone, offloading is the goal of the nvptx 
port.  Normal C, C++, Fortran code really isn't interesting, though the 
direction the hardware is going should allow more and more C, C++  
Fortran code to be used as-is on the GPU.


Bernd has done a fair amount of work to allow normal-ish C, C++  
Fortran code to run, that's really been to allow running GCC's testsuite 
to shake out the first level bugs in the ptx support.


Jeff


Re: Recent go changes broke alpha bootstrap

2014-11-04 Thread Uros Bizjak
On Tue, Nov 4, 2014 at 12:20 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Bitfields can really not be represented properly in Go (think about
 constructs like struct { int : 1; int bf : 1; }), I'd rather not
 try to represent them in a predictable way.  The patched code may
 or may not give them a name, and reserves the proper amount of
 padding where required in structs.  If a union begins with an
 anonymous bitfield (which makes no sense), that is ignored.  If a
 union begins with a named bitfield (possibly after unnamed ones),
 this may or may not be used as the (sole) element of the Go
 struct.


 Thanks.  I committed your patch.

 I have checked that the patch fixes alpha bootstrap with libgo.

Also passes gcc.misc-tests/godump-1.c test.

2014-11-04  Uros Bizjak  ubiz...@gmail.com

* gcc.misc-tests/godump-1.c (dg-skip-if): Add alpha*-*-*.

Tested on alphaev68-linux-gnu and committed to mainline SVN.

Uros.

Index: gcc.misc-tests/godump-1.c
===
--- gcc.misc-tests/godump-1.c   (revision 217089)
+++ gcc.misc-tests/godump-1.c   (working copy)
@@ -2,7 +2,7 @@

 /* { dg-options -c -fdump-go-spec=godump-1.out } */
 /* { dg-do compile } */
-/* { dg-skip-if not supported for target { ! s390*-*-* i?86-*-*
x86_64-*-* } } */
+/* { dg-skip-if not supported for target { ! alpha*-*-* s390*-*-*
i?86-*-* x86_64-*-* } } */
 /* { dg-skip-if not supported for target { ! lp64 } } */

 #include stdint.h


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Andrew MacLeod

On 11/04/2014 12:57 PM, Richard Henderson wrote:

On 11/04/2014 06:56 PM, Andrew MacLeod wrote:

On 11/04/2014 12:25 PM, Richard Henderson wrote:

On 11/04/2014 05:28 PM, Andrew MacLeod wrote:

+ bool
+ default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall)
+ {
+   return can_compare_and_swap_p (mode, allow_libcall);
+ }

This is silly.  I think the problem you point out can be better fixed by moving
the can_compare_and_swap_p prototype elsewhere.


yeah, except it uses some of the optab table stuff that is static to
optabs.c...   so the basic functionality remains there.

I said move the prototype.  Of course the implementation remains where it is.

prototype is in optabs.h where it belongs since its defined in 
optabs.c.  :-)


I'm not sure why this is much different than something like the targhook 
for builtin_support_vector_misalignment(), other than we are calling the 
routine in optabs.c rather than putting the actual code in targhooks.c.


from targhooks.c:
bool
default_builtin_support_vector_misalignment (machine_mode mode, 
const_tree type, ...)

 {
  if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing)
return true;
  return false;
}

the idea is to move all the functionality that front ends need into well 
defined and controlled places so we can increase the separation.  can 
perform a  compare_and_swap operation is clearly a target specific 
question isn't it?


Andrew








Re: libgomp testsuite: (not) using a specific driver for C++, Fortran?

2014-11-04 Thread Mike Stump
On Nov 4, 2014, at 4:13 AM, Thomas Schwinge tho...@codesourcery.com wrote:
 
 On Wed, 15 Oct 2014 17:46:48 +0200, I wrote:
 No matter whether it's C, C++, or Fortran source code, the libgomp
 testsuite always uses (for build-tree testing) gcc/xgcc, or (for
 installed testing) GCC_UNDER_TEST.  It doesn't make use of
 GXX_UNDER_TEST, GFORTRAN_UNDER_TEST.  To support the latter two
 languages' needs, some -l[...] flags are then added via lang_link_flags.
 For example, for Fortran this is -lgfortran.  This is, however, not what
 would happen if using the gfortran driver to build (which is what a user
 would be doing -- which we should replicate as much as possible at least
 for installed testing): the gfortran driver also adds -lquadmath, if
 applicable.
 
 Now, I wonder why to re-invent all that in the libgomp testsuite, if the
 respective driver already has that knowledge, via spec files, for
 example?  (Also, the regular GCC compiler tests, gcc/testsuite/, are
 doing the right thing.)  Why is libgomp testsuite implemented this way --
 just a legacy of the past, or is there a need for that (that I'm not
 seeing)?
 
 Maybe the question also is: why isn't the libgomp testsuite using more of
 the infrastructure for specific languages, that is already implemented in
 gcc/testsuite/lib/?  (That is, why does libgomp have to use
 libgomp_target_compile, instead of [language]_target_compile, for
 example.)
 
 (I decided not to look into that latter idea, at the moment.)
 
 And maybe that problem also applied to additional target libraries'
 testsuites; I have not yet looked.
 
 (It does, but I'm not addressing that with the following patches.)
 
 Anyway, here is a prototype patch to describe how I began to address this
 for the issue I stumbled upon, which is that the linker complained:
 
Executing on host: x86_64-none-linux-gnu-gcc 
 [...]/libgomp/testsuite/libgomp.fortran/aligned1.f03 [...] -fopenmp   -O0  
 -fopenmp -fcray-pointer   -lgfortran -lm   -o ./aligned1.exe(timeout = 
 300)
[...]/ld: warning: libquadmath.so.0, needed by [...]/libgfortran.so, not 
 found (try using -rpath or -rpath-link)
[...]/libgfortran.so: undefined reference to `logq@QUADMATH_1.0'
[...]
 
 (That goes away if I add -lquadmath to the command line, but that's not
 the point I'm making here.)
 
 Am I on the right track with the following?
 
 Nobody commented, which also means nobody disagreed

:-)

 OK to commit all that to trunk?

Ok, thanks.

If you could, one minor point, s/puts/verbose “boa bla“ N/   I know it was 
preexisting.

I wonder if any variables that you set that need to be cleared out are, that 
seems a defect in the api or conventions we use and it can screw following 
tests with a wrong environment in subtle ways.  I could miss accidental bleed 
over from the .exp you’re modifying into other unrelated parts of the test 
suite.  Usually, we just try and be careful and clean it up if it breaks.

Watch for any review points from the libgomp people, they might trickle a few 
in.  I don’t mean to cut short any review points from them.

Also, please watch for breakage.

Re: [PATCH] Optimize UBSAN_NULL checks, add sanopt.c

2014-11-04 Thread Marek Polacek
On Mon, Nov 03, 2014 at 04:35:00PM +0100, Jakub Jelinek wrote:
 Well, in theory you could just script removing them one by one and just
 make sanopt.o after each step to see if the header is or is not needed,
 perhaps with some manual tweeks.

I pruned those headers some more.
 
 Perhaps in preparation for future optimizations (other UBSAN_*
 calls, and ASAN_CHECK and tsan builtins), you should consider
 putting the hash_map into some structure and pass address of that
 structure instead, so that you have all the pass context at the same spot.
 You could put asan_num_accesses in there too, see below.
 
Done.

 I think it would be better to walk the vector backwards instead of forward.
 1) if you have the same SSA_NAME there multiple times, ignoring the already
unnecessary stmts, the only case where you'd have multiple stmts is
if the earlier stmts dominate the following stmts and for some reason
weren't optimized away; that for some reason currently should be
just higher required alignment in the dominated stmt (e.g. say have
UBSAN_NULL (foo_23, 0); UBSAN_NULL (foo_23, 2); UBSAN_NULL (foo_23, 4);
where the first stmt dominates the second two and second stmt dominates
the last one.
 2) All the si-visited_p stmts should be always at the end of the vector
IMHO, preceeded only by !si-visited_p stmts.
 Thus, when walking backwards, first remove the stmts with bb-visited_p,
 once you hit one that doesn't have it set, I believe there shouldn't be any
 further.  And then in theory it should be fine to just compare the last
 stmt in the vector that was left (if any).
 
I agree that should work - so changed.

 Perhaps you could return the asan_num_accesses value computed during
 sanopt_optimize (just count IFN_ASAN_CHECK calls that you don't optimize
 away), and do this only in else if (i.e. when sanopt_optimize has not been
 run).  That way, you'd save one extra IL walk.

Done.

Bootstrap-ubsan/regtest passed on x86_64-linux, ok for trunk?

2014-11-04  Marek Polacek  pola...@redhat.com

* Makefile.in (OBJS): Add sanopt.o.
(GTFILES): Add sanopt.c.
* asan.h (asan_expand_check_ifn): Declare.
* asan.c (asan_expand_check_ifn): No longer static.
(class pass_sanopt, pass_sanopt::execute, make_pass_sanopt): Move...
* sanopt.c: ...here.  New file.
testsuite/
* c-c++-common/ubsan/align-2.c: Remove dg-output.
* c-c++-common/ubsan/align-4.c: Likewise.
* g++.dg/ubsan/null-1.C: Likewise.
* g++.dg/ubsan/null-2.C: Likewise.

diff --git gcc/Makefile.in gcc/Makefile.in
index 9c67fe2..f383032 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -1376,6 +1376,7 @@ OBJS = \
asan.o \
tsan.o \
ubsan.o \
+   sanopt.o \
tree-call-cdce.o \
tree-cfg.o \
tree-cfgcleanup.o \
@@ -2305,6 +2306,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c \
+  $(srcdir)/sanopt.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
   @all_gtfiles@
diff --git gcc/asan.c gcc/asan.c
index 8f146d2..79dede7 100644
--- gcc/asan.c
+++ gcc/asan.c
@@ -2497,7 +2497,7 @@ asan_finish_file (void)
 
 /* Expand the ASAN_{LOAD,STORE} builtins.  */
 
-static bool
+bool
 asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
 {
   gimple g = gsi_stmt (*iter);
@@ -2800,114 +2800,4 @@ make_pass_asan_O0 (gcc::context *ctxt)
   return new pass_asan_O0 (ctxt);
 }
 
-/* Perform optimization of sanitize functions.  */
-
-namespace {
-
-const pass_data pass_data_sanopt =
-{
-  GIMPLE_PASS, /* type */
-  sanopt, /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
-  ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
-};
-
-class pass_sanopt : public gimple_opt_pass
-{
-public:
-  pass_sanopt (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_sanopt, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_sanitize; }
-  virtual unsigned int execute (function *);
-
-}; // class pass_sanopt
-
-unsigned int
-pass_sanopt::execute (function *fun)
-{
-  basic_block bb;
-
-  int asan_num_accesses = 0;
-  if (flag_sanitize  SANITIZE_ADDRESS)
-{
-  gimple_stmt_iterator gsi;
-  FOR_EACH_BB_FN (bb, fun)
-   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
- {
-   gimple stmt = gsi_stmt (gsi);
-   if (is_gimple_call (stmt)  gimple_call_internal_p (stmt)
-gimple_call_internal_fn (stmt) == IFN_ASAN_CHECK)
- ++asan_num_accesses;
- }
-}
-
-  bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD  INT_MAX
- asan_num_accesses = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD;
-
-  FOR_EACH_BB_FN (bb, fun)
-{
-  

Re: RFC: Building a minimal libgfortran for nvptx

2014-11-04 Thread N.M. Maclaren

On Nov 4 2014, Jeff Law wrote:

On 11/04/14 09:11, Jakub Jelinek wrote:


The point is, if the target can implement just a subset of the Fortran 
(or C or C++) standards, then ideally if you use anything that is not 
supported would just cause always host fallback, the code will still 
work, but will not be offloaded. So even supporting a subset of the 
standard is worthwhile, usually one will just offload the most 
performance critical parts of his code.


As I see it, this isn't a free-standing compilation environment, but a
component of one for heterogeneous architectures.  There are similar
issues for some embedded systems, in several languages.  That doesn't
fit well with the current build model, unfortunately :-(

Also note there's a reasonable chance that the GPUs will continue to 
evolve and will be able to support more of the standard language 
features.  Not sure if they'll ever do the IO side of Fortran, but they 
could always surprise us.


I am almost certain that the current situation is going to change
significantly, probably by 2020, but there is as yet no indication of
how.  And it wouldn't be entirely reasonable to say nothing should be
done for this sort of use until the situation becomes clear.


Regards,
Nick Maclaren.



[PATCH v2, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses

2014-11-04 Thread Uros Bizjak
On Tue, Nov 4, 2014 at 10:46 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Following patch fixes PR 63538, where the data in the large data
 section was accessed through 32bit address. The patch unifies places
 where large data sections are determined and passes all declarations
 to ix86_in_large_data_p only.

 The patch fixes the testcase form the PR. Also, the code from several
 tests involving various -mlarge-data-threshold= settings looks
 consistent now.

[...]

 We probably want to avoid putting automatic variables (and STRING_CSTs
 ?) to large data section. However, it is important that we use the
 same condition in ix86_encode_section_info and
 x86_64_elf_select_section (and likes), otherwise the data can go into
 large section, while the access to the data will be via limited 32bit
 pointers.

 The attached alternative patch rejects automatic variables and
 STRING_CSTs from large data sections. Please note, that it still
 access the data with the correct instructions.

Considering that STRING_CSTs always have TREE_STATIC set, the patch
should allow them, similar to the patch at [1].

Attached is the final patch.

2014-11-04  Uros Bizjak  ubiz...@gmail.com

PR target/63538
* config/i386/i386.c (in_large_data_p): Reject automatic variables.
(ix86_encode_section_info): Do not check for non-automatic varibles
when setting SYMBOL_FLAG_FAR_ADDR flag.
(x86_64_elf_select_section): Do not check ix86_cmodel here.
(x86_64_elf_unique_section): Ditto.
(x86_elf_aligned_common): Emit tab before .largecomm.

testsuite/ChangeLog:

2014-11-03  Uros Bizjak  ubiz...@gmail.com

PR target/63538
* gcc.target/i386/pr63538.c: New test.

I plan to commit the patch in a couple of days to mainline and 4.9 branch.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01963.html

Uros.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 217089)
+++ gcc/config/i386/i386.c  (working copy)
@@ -5066,6 +5066,10 @@ ix86_in_large_data_p (tree exp)
   if (TREE_CODE (exp) == FUNCTION_DECL)
 return false;
 
+  /* Automatic variables are never large data.  */
+  if (TREE_CODE (exp) == VAR_DECL  !is_global_var (exp))
+return false;
+
   if (TREE_CODE (exp) == VAR_DECL  DECL_SECTION_NAME (exp))
 {
   const char *section = DECL_SECTION_NAME (exp);
@@ -5099,8 +5103,7 @@ ATTRIBUTE_UNUSED static section *
 x86_64_elf_select_section (tree decl, int reloc,
   unsigned HOST_WIDE_INT align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 {
   const char *sname = NULL;
   unsigned int flags = SECTION_WRITE;
@@ -5186,8 +5189,7 @@ x86_64_elf_section_type_flags (tree decl, const ch
 static void ATTRIBUTE_UNUSED
 x86_64_elf_unique_section (tree decl, int reloc)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 {
   const char *prefix = NULL;
   /* We only need to use .gnu.linkonce if we don't have COMDAT groups.  */
@@ -5256,7 +5258,7 @@ x86_elf_aligned_common (FILE *file,
 {
   if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
size  (unsigned int)ix86_section_threshold)
-fputs (.largecomm\t, file);
+fputs (\t.largecomm\t, file);
   else
 fputs (COMMON_ASM_OP, file);
   assemble_name (file, name);
@@ -44230,9 +44232,7 @@ ix86_encode_section_info (tree decl, rtx rtl, int
 {
   default_encode_section_info (decl, rtl, first);
 
-  if (TREE_CODE (decl) == VAR_DECL
-   (TREE_STATIC (decl) || DECL_EXTERNAL (decl))
-   ix86_in_large_data_p (decl))
+  if (ix86_in_large_data_p (decl))
 SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FAR_ADDR;
 }
 
Index: gcc/testsuite/gcc.target/i386/pr63538.c
===
--- gcc/testsuite/gcc.target/i386/pr63538.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr63538.c (working copy)
@@ -0,0 +1,13 @@
+/* PR target/63538 */
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options -O2 -mcmodel=medium -mlarge-data-threshold=0 } */
+
+static char *str = Hello World;
+
+char *foo ()
+{
+  return str;
+}
+
+/* { dg-final { scan-assembler movabs } } */


Re: [PATCH v2, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses

2014-11-04 Thread Sriraman Tallam
On Tue, Nov 4, 2014 at 10:47 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Nov 4, 2014 at 10:46 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Following patch fixes PR 63538, where the data in the large data
 section was accessed through 32bit address. The patch unifies places
 where large data sections are determined and passes all declarations
 to ix86_in_large_data_p only.

 The patch fixes the testcase form the PR. Also, the code from several
 tests involving various -mlarge-data-threshold= settings looks
 consistent now.

 [...]

 We probably want to avoid putting automatic variables (and STRING_CSTs
 ?) to large data section. However, it is important that we use the
 same condition in ix86_encode_section_info and
 x86_64_elf_select_section (and likes), otherwise the data can go into
 large section, while the access to the data will be via limited 32bit
 pointers.

 The attached alternative patch rejects automatic variables and
 STRING_CSTs from large data sections. Please note, that it still
 access the data with the correct instructions.

 Considering that STRING_CSTs always have TREE_STATIC set, the patch
 should allow them, similar to the patch at [1].

Yes, it would be nice to be able to have STRING_CSTs in .lrodata.
Thanks for the patch.

Sri


 Attached is the final patch.

 2014-11-04  Uros Bizjak  ubiz...@gmail.com

 PR target/63538
 * config/i386/i386.c (in_large_data_p): Reject automatic variables.
 (ix86_encode_section_info): Do not check for non-automatic varibles
 when setting SYMBOL_FLAG_FAR_ADDR flag.
 (x86_64_elf_select_section): Do not check ix86_cmodel here.
 (x86_64_elf_unique_section): Ditto.
 (x86_elf_aligned_common): Emit tab before .largecomm.

 testsuite/ChangeLog:

 2014-11-03  Uros Bizjak  ubiz...@gmail.com

 PR target/63538
 * gcc.target/i386/pr63538.c: New test.

 I plan to commit the patch in a couple of days to mainline and 4.9 branch.

 [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01963.html

 Uros.


Re: nvptx offloading patches [3/n], i386 bits RFD

2014-11-04 Thread Uros Bizjak
On Tue, Nov 4, 2014 at 1:35 PM, Bernd Schmidt ber...@codesourcery.com wrote:

 Not sure how to deal with this any further out than the immediate term
 than using a hack like this. Though I'd prefer to avoid the #ifdef as it
 seems to me this shouldn't be baked in at build/configure time.


 Yeah, I'm not expecting the i386 part to go in quite as-is. For reference
 I'm including the offload-abi patch - Ilya is submitting this along with
 other option changes. One possibility would be to print and recognize
 strings such as lp64D128 or lp64D96 which would include information about
 the size of long double. Somehow though I can't really bring myself to
 believe that -mlong-double128 is a real use case with offloading so we might
 just disallow the combination.

 CCing Uros in case he has an opinion.

-mlong-double-128 was introduced for Android in:

2014-02-03  H.J. Lu  hongjiu...@intel.com

* config/i386/i386.c (flag_opts): Add -mlong-double-128.
(ix86_option_override_internal): Default long double to 64-bit for
32-bit Bionic and to 128-bit for 64-bit Bionic.

[...]

IMO, if it troubles offloading, anything else than the default
-mlong-double-80 should be disallowed.

Uros.


Re: [PATCH] Optimize UBSAN_NULL checks, add sanopt.c

2014-11-04 Thread Jakub Jelinek
On Tue, Nov 04, 2014 at 07:36:26PM +0100, Marek Polacek wrote:
 +   FOR_EACH_VEC_ELT_REVERSE (v, i, g)
 + {
 +   /* Remove statements for BBs that have been
 +  already processed.  */
 +   sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux;
 +   if (si-visited_p)
 + {
 +   v.unordered_remove (i);
 +   continue;
 + }
 +   /* At this point we shouldn't have any statements
 +  that aren't dominating the current BB.  */
 +   tree align = gimple_call_arg (g, 2);
 +   remove = tree_int_cst_le (cur_align, align);
 +   break;
 + }

As you only remove last item, that is pop.
So, how about

while (!v.is_empty ())
  {
gimple g = v.last ();
/* Remove statements for BBs that have been
   already processed.  */
sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux;
if (si-visited_p)
  v.pop ();
else
  {
/* At this point we shouldn't have any statements
   that aren't dominating the current BB.  */
tree align = gimple_call_arg (g, 2);
remove = tree_int_cst_le (cur_align, align);
break;
  }
  }
?

The patch is ok either way.

Jakub


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Richard Biener
On November 4, 2014 7:30:18 PM CET, Andrew MacLeod amacl...@redhat.com wrote:
On 11/04/2014 12:57 PM, Richard Henderson wrote:
 On 11/04/2014 06:56 PM, Andrew MacLeod wrote:
 On 11/04/2014 12:25 PM, Richard Henderson wrote:
 On 11/04/2014 05:28 PM, Andrew MacLeod wrote:
 + bool
 + default_can_compare_and_swap_p (machine_mode mode, bool
allow_libcall)
 + {
 +   return can_compare_and_swap_p (mode, allow_libcall);
 + }
 This is silly.  I think the problem you point out can be better
fixed by moving
 the can_compare_and_swap_p prototype elsewhere.

 yeah, except it uses some of the optab table stuff that is static to
 optabs.c...   so the basic functionality remains there.
 I said move the prototype.  Of course the implementation remains
where it is.

prototype is in optabs.h where it belongs since its defined in 
optabs.c.  :-)

I'm not sure why this is much different than something like the
targhook 
for builtin_support_vector_misalignment(), other than we are calling
the 
routine in optabs.c rather than putting the actual code in targhooks.c.

from targhooks.c:
bool
default_builtin_support_vector_misalignment (machine_mode mode, 
const_tree type, ...)
  {
   if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing)
 return true;
   return false;
}

the idea is to move all the functionality that front ends need into
well 
defined and controlled places so we can increase the separation.  can 
perform a  compare_and_swap operation is clearly a target specific 
question isn't it?

I would rather question what is so special about java that it needs to ask that 
and other frontends not.  Don't we have generic atomics support now?

Richard.

Andrew




[committed] Remove unused variables

2014-11-04 Thread Marek Polacek
I forgot to remove two unused variables.

Applying to trunk as obvious.

2014-11-04  Marek Polacek  pola...@redhat.com

* sanopt.c (sanopt_optimize_walker): Remove unused variables.

diff --git gcc/sanopt.c gcc/sanopt.c
index f1d51d1..4483ff7 100644
--- gcc/sanopt.c
+++ gcc/sanopt.c
@@ -117,9 +117,6 @@ sanopt_optimize_walker (basic_block bb, struct sanopt_ctx 
*ctx)
 for this pointer.  Perhaps we can drop this one.
 But only if this check doesn't specify stricter
 alignment.  */
- int i;
- gimple g;
-
  while (!v.is_empty ())
{
  gimple g = v.last ();

Marek


Re: [patch] Provide a can_compare_and_swap_p target hook.

2014-11-04 Thread Andrew MacLeod

On 11/04/2014 02:53 PM, Richard Biener wrote:

On November 4, 2014 7:30:18 PM CET, Andrew MacLeod amacl...@redhat.com wrote:

On 11/04/2014 12:57 PM, Richard Henderson wrote:

On 11/04/2014 06:56 PM, Andrew MacLeod wrote:

On 11/04/2014 12:25 PM, Richard Henderson wrote:

On 11/04/2014 05:28 PM, Andrew MacLeod wrote:

+ bool
+ default_can_compare_and_swap_p (machine_mode mode, bool

allow_libcall)

+ {
+   return can_compare_and_swap_p (mode, allow_libcall);
+ }

This is silly.  I think the problem you point out can be better

fixed by moving

the can_compare_and_swap_p prototype elsewhere.


yeah, except it uses some of the optab table stuff that is static to
optabs.c...   so the basic functionality remains there.

I said move the prototype.  Of course the implementation remains

where it is.
prototype is in optabs.h where it belongs since its defined in
optabs.c.  :-)

I'm not sure why this is much different than something like the
targhook
for builtin_support_vector_misalignment(), other than we are calling
the
routine in optabs.c rather than putting the actual code in targhooks.c.


from targhooks.c:

bool
default_builtin_support_vector_misalignment (machine_mode mode,
const_tree type, ...)
  {
   if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing)
 return true;
   return false;
}

the idea is to move all the functionality that front ends need into
well
defined and controlled places so we can increase the separation.  can
perform a  compare_and_swap operation is clearly a target specific
question isn't it?

I would rather question what is so special about java that it needs to ask that 
and other frontends not.  Don't we have generic atomics support now?

Richard.

True... I don't know if this is a thing that simply predates our current 
level of support or if it is something else that is java specific for 
its builtins.

Don't know enough about java to comment.

aph? Looks like you wrote the originals in 2006...   Can the java CAS 
builtins simply use our current atomic calls rather than doing their own 
thing and querying whether the target has a sync compare and swap operation?


Andrew


Re: [x86, 2/n] Replace builtins with vector extensions

2014-11-04 Thread Marc Glisse

Ping?
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01808.html

On Sat, 18 Oct 2014, Marc Glisse wrote:


Hello,

this time, +-* for 128 bit integer vectors. I am using an unsigned type so 
the compiler knows that we expect wrapping. I don't know why Intel's 
description of mullo insists that the multiplication is signed, that only 
matters for the high part...


Next parts (waiting for approval for this one) should be:
- same thing with 256 and 512 bit integer vectors
-  | ^ (integer only)

Maybe (or it can wait until the next release):
-   == abs min max (integer only)


2014-10-20  Marc Glisse  marc.gli...@inria.fr

* config/i386/emmintrin.h (__v2du, __v4su, __v8hu, __v16qu): New
typedefs.
(_mm_add_epi8, _mm_add_epi16, _mm_add_epi32, _mm_add_epi64,
_mm_sub_epi8, _mm_sub_epi16, _mm_sub_epi32, _mm_sub_epi64,
_mm_mullo_epi16): Use vector extensions instead of builtins.
* config/i386/smmintrin.h (_mm_mullo_epi32): Likewise.




--
Marc Glisse


Re: nvptx offloading patches [1/n]

2014-11-04 Thread Jeff Law

On 11/03/14 16:07, Bernd Schmidt wrote:

On 11/03/2014 11:22 PM, Jeff Law wrote:

On 11/01/14 05:47, Bernd Schmidt wrote:

This is one of the patches required to make offloading via the LTO path
work when the machines involved differ.

x86 requires bigger alignments for some types than nvptx does, which
becomes an issue when reading LTO produced by the host compiler. The
problem with having a variable with DECL_ALIGN larger than the stack
alignment is that gcc will try to align the variable dynamically with an
alloca/rounding operation, and there isn't a working alloca on nvptx.
Besides, the overhead would be pointless.

The patch below restricts the alignments to the maximum possible when
reading in LTO data in an offload compiler. Unfortunately
BIGGEST_ALIGNMENT isn't suitable for this, as it can vary at runtime
with attribute((target)), and because vector modes can exceed it, so a
limit based on BIGGEST_ALIGNMENT would be unsuitable for some ports.
Instead I've added a hook called limit_offload_alignment which is called
when reading LTO on an offload compiler. It does nothing anywhere except
on ptx where it limits alignments to 64 bit.

Bootstrapped and tested on x86_64-linux. Ok?

Not ideal.

Doesn't this affect our ability to pass data back and forth between the
host and GPU?  Or is this strictly a problem with stack objects and thus
lives entirely on the GPU?


Communication between host and GPU is all done via some form of memcpy,
so I wouldn't expect this to be a problem.
They still need to agree on the layout of the structure.  And assuming 
it'll always be memcpy perhaps isn't wise.  Consider the possibility 
that one day (perhaps soon) the host and GPU may share address space  
memory.



Structure layouts and such
are decided by the host compiler and since that uses higher alignments,
they should work fine on the GPU. I believe the only thing this really
does is relax the requirements when allocating storage on the GPU side.
But if the structure has a higher alignment on the host and that 
structure is embedded in another structure or array, then that higher 
alignment affects the composite object's layout.


But again, if this just affects stack objects, then it shouldn't be a 
problem.


Jeff


Re: [PATCH 12/27] New file: gcc/jit/jit-recording.h

2014-11-04 Thread Jeff Law

On 11/04/14 09:12, David Malcolm wrote:

So give the complexities in interfacing with the guts of GCC, would it
make sense to expose the validate method?


Most of the error-checking in the API happens in the API calls in
libgccjit.c, testing that the individual pieces are sane.

The validate method tests the things that can only be verified as a
whole, when the context is about to be compiled: are there unreachable
blocks?  is every block terminated?  etc

I can't quite see why client code might want to perform the latter kind
of validation without actually doing a compile, so I don't plan to
expose this at this time.  It's trivial to do so if someone needs it.
Yea, I saw all the border checking -- I was thinking mostly about things 
that require larger context.  In particular I was thinking block 
contents, the cfg and such.


Thinking about how I tend to work and might use this, I'd be likely to 
start building up statements/blocks and want to verify them without 
going all the way through compilation.  But that may be an artifact of 
living in a world where we have many points (between each pass) where a 
verification step for key data structures is useful.


I certainly don't see it as a blocking issue, just wanted to raise the 
possibility that exposing the verification step in the ABI might be 
useful.  If you don't want to do that right now, I can live with it.




I think I was trying to avoid std::string for some reason, but I'm not
quite sure why, perhaps out of a misremembered idea that libstdc++ was
verboten (I currently use std:: in one place, in jit-playback.h, where a
playback::context has a:
   vecstd::pairtree, location *  m_cached_locations;
).
I can't see any reason why we wouldn't use basic capabilities of the C++ 
runtime system.  To use an example we both know and understand, 
switching EXPR_LIST to a forward_list would be something I would look 
favorably upon simply because everyone doing C++ knows what a 
forward_list is, it's proprties, strenghts  weaknesses.  Only GCC 
junkies happen to know that EXPR_LIST is just a hand-rolled forward list :-)





In any case recording::string is an implementation detail hidden within
the library.  It is a recording::memento and hence has the same lifetime
as the recording::context.  I can't think of a reason off the top of my
head why a std::string wouldn't work instead, but the existing code
works, and has been through a fair amount of debugging.
How about as a follow-up?  I don't see this as being big enough to 
warrant blocking the work.


jeff


Re: [PATCH 13/27] New file: gcc/jit/jit-recording.c

2014-11-04 Thread Jeff Law

On 11/04/14 09:24, David Malcolm wrote:

So presumably no multi-way branches yet?  Might be better to build the
API in such a way that it handles multi-way branches from the get-go
rather than retrofitting later.  Your call.


This is an internal API.  If the external API needs to support building
switch statements from client code maybe with something like this:

   extern void
   gcc_jit_block_end_with_switch (gcc_jit_block *block,
  gcc_jit_location *loc,
  gcc_jit_rvalue *expr,
  int num_cases,
  gcc_jit_rvalue **case_values,
  gcc_jit_block **case_blocks,
  gcc_jit_block *default_block);

then this internal API would obviously need to change, maybe becoming:

   virtual int get_num_successor_blocks () const;
   virtual block *get_successor_block (int index) const;

but that won't affect the public API.

That said it's not clear to me that we do need to support switch
statements; no JIT implementation I've seen has needed them
(I put them in the Probably not needed section of TODO.rst).

Fair enough :-)

jeff



Re: [PATCH 10/27] New file: gcc/jit/libgccjit.c

2014-11-04 Thread Jeff Law

On 11/04/14 09:57, David Malcolm wrote:

+#define IS_ASCII_DIGIT(CHAR) \
+  ((CHAR) = '0'  (CHAR) ='9')
+
+#define IS_ASCII_ALNUM(CHAR) \
+  (IS_ASCII_ALPHA (CHAR) || IS_ASCII_DIGIT (CHAR))

Can't we rely on the C library to give us equivalents?


I've been burned in the past by the C library using locales, in
particular the two lowercase i variants in Turkish.

These macros are used by gcc_jit_context_new_function to enforce C's
naming restrictions, to avoid errors from the assembler.  The comment I
put there was:

   /* The assembler can only handle certain names, so for now, enforce
  C's rules for identifiers upon the name.
  Eventually we'll need some way to interact with e.g. C++ name mangling.  
*/

Am I right in thinking that for the assembler we need to enforce the C
naming rules specifically on *ASCII*.

(clearly another comment is needed here).
I guess you've got to do it somewhere.  Presumably there isn't something 
already in GCC that enforces an input character set?  I guess I just 
dislike seeing something that feels like it ought to already be available.




Presumably by marking it with __attribute__((cold)) ?  (with a suitable
macro in case we're not being compiled with a gcc that supports it).
Yup.  That's precisely what you want since that gives the predictors 
enough information to mark paths as unlikely without having to mark each 
path yourself.




Sorry.  I'll post a followup with comments added.
Thanks.  I probably rely more on those for this kind of review than 
anything, so the lack of them really stood out.




Many of the functions are public API entrypoints, where there's a
comment in the public header.  Should I simply duplicate the comment
from there into the .c file, or put a comment like:
Good question.  Normally in the past we'd have you duplicate the 
comment, but with this new usage scenario that may not make a lot of 
sense since one or the other will likely get out of sync at some point.


At this point a snarky comment about generating documentation and the 
interface from a single definition would be appropriate.





   /* Public entrypoint.  See description in libgccjit.h.  */

for each of these?  (perhaps with additional text giving implementation
notes?)
Let's go with this.  If folks want the comment duplicated, they can 
argue for it after the fact :-)





Thanks for all the reviews.  Looks like this and patch 16 are now the
only non-approved parts of the kit (I didn't see a review of 16).

Right.  I didn't get to #16 yesterday.

jeff


Re: nvptx offloading patches [3/n], i386 bits RFD

2014-11-04 Thread Jeff Law

On 11/04/14 05:35, Bernd Schmidt wrote:

Ports that want to be hosts for offloading may need to modify their
modes.def. The patch below contains changes to i386-modes.def which
modifies XFmode depending on a target switch. I'm not actually entirely
sure what to do about this. Do we want to make this flag an error when
offloading is enabled? Or maybe add float format support to the
-foffload-abi option?

Thoughts? Ok for the first part of the patch once the other offloading
patches have gone in (bootstrapped and tested on x86_64-linux)?

It feels like we've got another real distinction to make.  We've had
host, build  target and they're all independent.  It feels like we need
offload target and better separate between target and offload target.
Then we need to figure out the places where we've got bleed-out.


Is this a question of terminology? I agree that saying offload host
when we'd normally be calling it the target is confusing, but it's
difficult to come up with better names.
No, I don't think it's terminology.  It's really that in effect we have 
two targets.  One is a normal CPU, the other is a GPU.


ie, there's nothing that says we won't have a GPU that's being driven by 
an ARM or PPC.  What I want to avoid is GPU-isms getting sprinkled into 
the x86 (or any other) backend.


The problem is we don't have any infrastructure in place for this kind 
of situation.  So we start off with a few hacks and hopefully we're able 
to see some commonality and start to see how to handle the 
multi-architecture target issues a bit better.


Jeff


Re: [PATCH] x86: allow to suppress default clobbers added to asm()s

2014-11-04 Thread Jeff Law

On 11/03/14 01:11, Jan Beulich wrote:

I really don't like having an option that's globally applied for this
feature.  THough I am OK with having a mechanism to avoid implicit
clobbers on specific ASMs.


Why not? That way, for projects/components knowing all their
asm()s have all clobbers explicitly specified they could avoid
having to touch all of them and instead just pass the new option.
That said - if the option isn't being liked, I'm fine dropping it.
In general, I doubt projects are going to achieve and maintain that 
level of knowledge about their asms.  It's just asking for subtle bugs 
in the long run.






Why use negative numbers for the hard register numbers?  I wouldn't be
at all surprised if lots of random code assumes register numbers are
always positive.


I think I want through all relevant places, and fixed the one where
the assumption was wrongly made. The nice thing about using
negative numbers here is that there is already at least one place
where negative numbers aren't valid, and we want them to not be
valid there.


I don't like adding new registers with special names like !foo.  Instead
I think that listing !cc or something similar in the asm itself if it
doesn't clobber the cc register would be better.


You mean interpreting the ! in generic code? Doable, but not very
nice imo considering that the default addition of clobbers is limited
to very few architectures, plus dealing with this in a generic way
would - afaict - make the change quite a bit more intrusive.
yes, precisely.  This is a generic concept and while more invasive, I 
think what we get in the end is going to ultimately be better.


I also suspect other targets ought to be defining those implicit 
clobbers, but don't :(


jeff


  1   2   >