RE: [PATCH, ARM 3/6] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
[Fixed the subject and added ARM maintainers to recipient.]

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 3:51 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition
> after adding support for ARMv8-M
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch fixes the indentation of the FL_FOR_ARCH* macro
> definitions following the patch to add support for ARMv8-M. Since this is
> an obvious change, I'm not expecting a review and will commit it as soon
> as the other patches in the series are accepted.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-06  Thomas Preud'homme  
> 
> * config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions.
> 
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 1371ee7..bf0d1b4 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx);
>  #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
>                   | FL_CO_PROC)
> 
> -#define FL_FOR_ARCH2   FL_NOTM
> -#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
> -#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
> -#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
> -#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
> -#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
> -#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
> -#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
> -#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
> -#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
> -#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
> -#define FL_FOR_ARCH6J  FL_FOR_ARCH6
> -#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
> -#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
> -#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
> -#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
> -#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
> -#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
> -#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
> -#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
> -#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
> -#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
> -#define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
> -#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
> -#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
> -#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
> +#define FL_FOR_ARCH2   FL_NOTM
> +#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
> +#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
> +#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
> +#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
> +#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
> +#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
> +#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
> +#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
> +#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
> +#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
> +#define FL_FOR_ARCH6J  FL_FOR_ARCH6
> +#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
> +#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
> +#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K
> +#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
> +#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
> +#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
> +#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
> +#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
> +#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
> +#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
> +#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
> +#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM)
> +#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
> +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
> +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
> 
>  /* There are too many feature bits to fit in a single word so the set of cpu and
>     fpu capabilities is a structure.  A feature set is created and manipulated
> 
> 
> Is this ok for stage3?
> 
> Best regards,
> 
> Thomas




[arm-embedded][PATCH, ARM 3/6] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 3:51 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition
> after adding support for ARMv8-M
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch fixes the indentation of the FL_FOR_ARCH* macro
> definitions following the patch to add support for ARMv8-M. Since this is
> an obvious change, I'm not expecting a review and will commit it as soon
> as the other patches in the series are accepted.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-06  Thomas Preud'homme  
> 
> * config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions.
> 
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 1371ee7..bf0d1b4 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx);
>  #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
>                   | FL_CO_PROC)
> 
> -#define FL_FOR_ARCH2   FL_NOTM
> -#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
> -#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
> -#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
> -#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
> -#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
> -#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
> -#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
> -#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
> -#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
> -#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
> -#define FL_FOR_ARCH6J  FL_FOR_ARCH6
> -#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
> -#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
> -#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
> -#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
> -#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
> -#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
> -#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
> -#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
> -#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
> -#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
> -#define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
> -#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
> -#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
> -#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
> +#define FL_FOR_ARCH2   FL_NOTM
> +#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
> +#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
> +#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
> +#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
> +#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
> +#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
> +#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
> +#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
> +#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
> +#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
> +#define FL_FOR_ARCH6J  FL_FOR_ARCH6
> +#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
> +#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
> +#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K
> +#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
> +#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
> +#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
> +#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
> +#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
> +#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
> +#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
> +#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
> +#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM)
> +#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
> +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
> +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
> 
>  /* There are too many feature bits to fit in a single word so the set of cpu and
>     fpu capabilities is a structure.  A feature set is created and manipulated
> 
> 
> Is this ok for stage3?
> 
> Best regards,
> 
> Thomas




[PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch fixes the indentation of the FL_FOR_ARCH* macro definitions 
following the patch to add support for ARMv8-M. Since this is an obvious 
change, I'm not expecting a review and will commit it as soon as the other 
patches in the series are accepted.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-11-06  Thomas Preud'homme  

* config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions.


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1371ee7..bf0d1b4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
                  | FL_CO_PROC)
 
-#define FL_FOR_ARCH2   FL_NOTM
-#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
-#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
-#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
-#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
-#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
-#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
-#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
-#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
-#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
-#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
-#define FL_FOR_ARCH6J  FL_FOR_ARCH6
-#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
-#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
-#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
-#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
-#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
-#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
-#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
-#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
-#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
-#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
-#define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
-#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
-#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
-#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
+#define FL_FOR_ARCH2   FL_NOTM
+#define FL_FOR_ARCH3   (FL_FOR_ARCH2 | FL_MODE32)
+#define FL_FOR_ARCH3M  (FL_FOR_ARCH3 | FL_ARCH3M)
+#define FL_FOR_ARCH4   (FL_FOR_ARCH3M | FL_ARCH4)
+#define FL_FOR_ARCH4T  (FL_FOR_ARCH4 | FL_THUMB)
+#define FL_FOR_ARCH5   (FL_FOR_ARCH4 | FL_ARCH5)
+#define FL_FOR_ARCH5T  (FL_FOR_ARCH5 | FL_THUMB)
+#define FL_FOR_ARCH5E  (FL_FOR_ARCH5 | FL_ARCH5E)
+#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
+#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
+#define FL_FOR_ARCH6   (FL_FOR_ARCH5TE | FL_ARCH6)
+#define FL_FOR_ARCH6J  FL_FOR_ARCH6
+#define FL_FOR_ARCH6K  (FL_FOR_ARCH6 | FL_ARCH6K)
+#define FL_FOR_ARCH6Z  FL_FOR_ARCH6
+#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K
+#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
+#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
+#define FL_FOR_ARCH6M  (FL_FOR_ARCH6 & ~FL_NOTM)
+#define FL_FOR_ARCH7   ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
+#define FL_FOR_ARCH7A  (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
+#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
+#define FL_FOR_ARCH7R  (FL_FOR_ARCH7A | FL_THUMB_DIV)
+#define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
+#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM)
+#define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
+#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
    fpu capabilities is a structure.  A feature set is created and manipulated


Is this ok for stage3?

Best regards,

Thomas
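
For readers less familiar with the FL_FOR_ARCH* scheme, here is a minimal sketch of how such feature sets compose. The bit positions below are made up for illustration and the FL_FOR_ARCH6 definition is simplified (the real one builds on FL_FOR_ARCH5TE); has_feature is a hypothetical helper, not a GCC function. Only the shape of the derived macros follows the patch.

```c
#include <assert.h>

/* Hypothetical bit positions, for illustration only; the real values are
   defined in gcc/config/arm/arm-protos.h and differ from these.  */
#define FL_NOTM      (1u << 0)  /* Instructions not present in the 'M' profile.  */
#define FL_THUMB     (1u << 1)
#define FL_ARCH6     (1u << 2)
#define FL_THUMB2    (1u << 3)
#define FL_ARCH7     (1u << 4)
#define FL_THUMB_DIV (1u << 5)
#define FL_ARCH8     (1u << 6)

/* Simplified stand-in for the real FL_FOR_ARCH6.  */
#define FL_FOR_ARCH6       (FL_NOTM | FL_THUMB | FL_ARCH6)

/* These follow the shape of the definitions in the patch: OR adds a
   feature, AND with the complement removes one.  */
#define FL_FOR_ARCH6T2     (FL_FOR_ARCH6 | FL_THUMB2)
#define FL_FOR_ARCH6M      (FL_FOR_ARCH6 & ~FL_NOTM)
#define FL_FOR_ARCH7       ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
#define FL_FOR_ARCH7M      (FL_FOR_ARCH7 | FL_THUMB_DIV)
#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)

/* Hypothetical helper: nonzero if every bit of FLAG is present in SET.  */
static int
has_feature (unsigned int set, unsigned int flag)
{
  return (set & flag) == flag;
}
```

With these stand-ins, has_feature (FL_FOR_ARCH8M_BASE, FL_NOTM) is 0 because the bit was masked out via FL_FOR_ARCH6M, while FL_THUMB_DIV and FL_ARCH8 are present. Note also that the whitespace between a macro name and a parenthesised body is significant: "#define NAME(...)" with no space would declare a function-like macro.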



[arm-embedded][PATCH, ARM 2/6] Add support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 3:25 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, ARM 2/6] Add support for ARMv8-M
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch adds basic support for the new architecture, allowing
> the new names to be accepted by -march and the compiler to behave
> like ARMv6-M (for ARMv8-M Baseline) or ARMv7-M (for ARMv8-M
> Mainline). The changes are divided into two categories:
> 
> * those to recognize the new architecture names
> * those to keep the behavior the same as for previous architectures
> 
> Changes to make the compiler generate code with the new instructions
> are in follow-up patches.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> 
> ChangeLog entries are as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-23  Thomas Preud'homme  
> 
> * config/arm/arm-arches.def (armv8-m.base): Define new architecture.
> (armv8-m.main): Likewise.
> (armv8-m.main+dsp): Likewise
> * config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define.
> (FL_FOR_ARCH8M_MAIN): Likewise.
> * config/arm/arm-tables.opt: Regenerate.
> * config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and
> armv8-m.main+dsp to BE8_LINK_SPEC.
> * config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M.
> (enum base_architecture): Add BASE_ARCH_8M_BASE and BASE_ARCH_8M_MAIN.
> (TARGET_ARM_V8M): Define.
> * config/arm/arm.c (arm_arch_name): Increase size to work with ARMv8-M
> Baseline and Mainline.
> (arm_option_override_internal): Also disable arm_restrict_it when
> !arm_arch_notm.
> (arm_file_start): Increase architecture buffer size.
> * doc/invoke.texi: Document architectures armv8-m.base, armv8-m.main
> and armv8-m.main+dsp.
> (mno-unaligned-access): Clarify that this is disabled by default for
> ARMv8-M Baseline architecture as well.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-11-10  Thomas Preud'homme  
> 
> * lib/target-supports.exp: Generate add_options_for_arm_arch_FUNC and
> check_effective_target_arm_arch_FUNC_multilib for ARMv8-M Baseline and
> ARMv8-M Mainline architectures.
> 
> 
> *** libgcc/ChangeLog ***
> 
> 2015-11-10  Thomas Preud'homme  
> 
> * config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8-M.
> 
> 
> diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
> index ddf6c3c330f91640d647d266f3d0e2350e7b986a..1d0301a3b9414127d387834584f3e42c225b6d3f 100644
> --- a/gcc/config/arm/arm-arches.def
> +++ b/gcc/config/arm/arm-arches.def
> @@ -57,6 +57,12 @@ ARM_ARCH("armv7-m", cortexm3,  7M,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC |  FL_FOR_
>  ARM_ARCH("armv7e-m", cortexm4,  7EM, ARM_FSET_MAKE_CPU1 (FL_CO_PROC |   FL_FOR_ARCH7EM))
>  ARM_ARCH("armv8-a", cortexa53,  8A,  ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
>  ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
> +ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
> +  ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE))
> +ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
> +  ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN))
> +ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
> +  ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
>  ARM_ARCH("iwmmxt",  iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
>  ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index e7328e79650739fca1c3e21b10c194feaa697465..dc7a0871c37bfda2671f197bfe83c20c7888 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -415,6 +415,8 @@ extern bool arm_is_constant_pool_ref (rtx);
>  #define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
>  #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
>  #define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
> +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
> +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
> 
>  /* There are too many feature bits to fit in a single word so the set of cpu and
>     fpu capabilities is a structure.  A feature set is created and manipulated
> diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-
> tables.opt
> index
> 48aac41c37a35e27440f67863d4a92457916dd

[gomp4] Merge trunk r231689 (2015-12-16) into gomp-4_0-branch

2015-12-16 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r231738:

commit d0b110f2163a5b186f15d05c9bfc6f51a42d652c
Merge: 2a5a682 565bc8f
Author: tschwinge 
Date:   Thu Dec 17 07:11:02 2015 +

svn merge -r 231118:231689 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231738 
138bc75d-0d04-0410-961f-82ee72b054a4

This merge commit imports two trunk regressions: gcc/testsuite/ Cilk+
testing is no longer run for build-tree testing, and the
libgomp.oacc-c-c++-common/declare-4.c compilation regression.

The merge commit also includes a few new test cases (ChangeLog fixed in
r231739):

2015-12-17  Tom de Vries  

* c-c++-common/goacc/kernels-offload-alias-2.c: New test.
* c-c++-common/goacc/kernels-offload-alias-3.c: New test.
* c-c++-common/goacc/kernels-offload-alias-4.c: New test.
* c-c++-common/goacc/kernels-offload-alias-5.c: New test.
* c-c++-common/goacc/kernels-offload-alias-6.c: New test.
* c-c++-common/goacc/kernels-offload-alias.c: New test.

diff --cc gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c
index 000,000..ae829dc
new file mode 100644
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c
@@@ -1,0 -1,0 +1,24 @@@
++/* { dg-additional-options "-O2 -foffload-alias=pointer" } */
++/* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
++
++#define N 2
++
++void
++foo (void)
++{
++  unsigned int a[N];
++  unsigned int *p = &a[0];
++
++#pragma acc kernels pcopyin (a, p[0:2])
++  {
++a[0] = 0;
++*p = 1;
++  }
++}
++
++/* { dg-final { scan-tree-dump-times " = 0" 1 "optimized" } } */
++
++/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 4 "ealias" } } */
diff --cc gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c
index 000,000..2eb009e
new file mode 100644
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c
@@@ -1,0 -1,0 +1,22 @@@
++/* { dg-additional-options "-O2 -foffload-alias=pointer" } */
++/* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
++
++void
++foo (int *a)
++{
++  int *p = a;
++
++#pragma acc kernels pcopyin (a[0:1], p[0:1])
++  {
++*a = 0;
++*p = 1;
++  }
++}
++
++/* { dg-final { scan-tree-dump-times " = 0" 1 "optimized" } } */
++
++/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 4 "ealias" } } */
++
diff --cc gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c
index 000,000..bb5a3c3
new file mode 100644
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c
@@@ -1,0 -1,0 +1,26 @@@
++/* { dg-additional-options "-O2 -foffload-alias=pointer" } */
++/* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
++
++typedef __SIZE_TYPE__ size_t;
++extern void *acc_copyin (void *, size_t);
++
++void
++foo (void)
++{
++  int a = 2;
++  int *p = (int *)acc_copyin (&a, sizeof (a));
++
++#pragma acc kernels deviceptr (p) pcopy(a)
++  {
++a = 0;
++*p = 1;
++  }
++}
++
++/* { dg-final { scan-tree-dump-times " = 0" 1 "optimized" } } */
++
++/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 4 "ealias" } } */
++
diff --cc gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c
index 000,000..5e076c0
new file mode 100644
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c
@@@ -1,0 -1,0 +1,27 @@@
++/* { dg-additional-options "-O2 -foffload-alias=pointer" } */
++/* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
++
++typedef __SIZE_TYPE__ size_t;
++extern void *acc_copyin (void *, size_t);
++
++#define N 2
++
++void
++foo (void)
++{
++  int a[N];
++  int *p = (int *)acc_copyin (&a[0], sizeof (a));
++
++#pragma acc kernels deviceptr (p) pcopy(a)
++  {
++a[0] = 0;
++*p = 1;
++  }
++}
++
++/* { dg-final { scan-tree-dump-times " = 0" 1 "optimized" } } */
++
++/* { dg-final { scan-tree-dump-times "clique 1 base 1" 2 "ealias" } } */
++/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
++/* { dg-final { scan-t

Re: [build] Only support -gstabs on Mac OS X if assembler supports it (PR target/67973)

2015-12-16 Thread Rainer Orth
Mike Stump  writes:

> On Dec 15, 2015, at 5:35 AM, Rainer Orth wrote:
>> Right: I'm effectively keeping just the first configure test for .stabs
>> support in the assembler to enable or disable
>> DBX_DEBUG/DBX_DEBUGGING_INFO.  I'll post it later since …
>
>> ... testing revealed another instance of static assumptions which hurts
>> us now: while support for -gstabs* is checked for dynamically in
>> lib/gcc-dg.exp and lib/gfortran-dg.exp for the debug.exp tests, there
>> are a couple of testcases that use -gstabs* unconditionally, but have a
>> hardcoded list of targets that support those options.  I'll introduce a
>> new effective-target keyword (simply checking if -gstabs is accepted
>> should be enough) to also perform this test dynamically and repost once
>> it's tested.
>
> Sounds good.

Here's what I came up with.  Tested with the appropriate runtest
invocations both in a tree with the Xcode 7/LLVM as without stabs
support, where the tests come out UNSUPPORTED, and another one with the
Xcode 6.4/gas as with stabs, where they PASS.

I've left alone two testcases using -gstabs* which are guaranteed to
work without the keyword:

gcc.target/powerpc/stabs-attrib-vect-darwin.c
gcc.target/s390/20041216-1.c

In case the current test for stabs (checking if one can compile/assemble
with -gstabs) isn't enough on some of the targets currently listed
explicitly, it could easily be augmented.

Ok for mainline?

Rainer
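
To make the intended effect easier to follow, here is a rough, self-contained sketch of the selection logic, assuming HAVE_AS_STABS_DIRECTIVE is only defined when the configure test succeeds. The enum and the preferred_debugging_type function are stand-ins written for this illustration; in the patch itself the choice is made by the PREFERRED_DEBUGGING_TYPE macro in the darwin headers.

```c
#include <assert.h>

/* Stand-in for GCC's debug info kinds; only the two relevant here.  */
enum debug_info_type { DBX_DEBUG, DWARF2_DEBUG };

/* HAVE_AS_STABS_DIRECTIVE would normally come from the configure-generated
   header.  It is left undefined here to model an assembler without .stabs
   support; compile with -DHAVE_AS_STABS_DIRECTIVE to model the other case.  */

/* Hypothetical function mirroring the i386/darwin.h logic from the patch:
   stabs is only ever preferred when the assembler can handle it.  */
static enum debug_info_type
preferred_debugging_type (int target_64bit)
{
#ifdef HAVE_AS_STABS_DIRECTIVE
  return target_64bit ? DWARF2_DEBUG : DBX_DEBUG;
#else
  (void) target_64bit;
  return DWARF2_DEBUG;
#endif
}
```

Without the define, both 32-bit and 64-bit targets fall back to DWARF2_DEBUG, which corresponds to the Xcode 7/LLVM as case described above.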


2015-12-11  Rainer Orth  

gcc:
PR target/67973
* configure.ac (gcc_cv_as_stabs_directive): New test.
* configure: Regenerate.
* config.in: Regenerate.
* config/darwin.h (DBX_DEBUGGING_INFO): Wrap in
HAVE_AS_STABS_DIRECTIVE.
(PREFERRED_DEBUGGING_TYPE): Likewise.
* config/i386/darwin.h (PREFERRED_DEBUGGING_TYPE): Only include
DBX_DEBUG if HAVE_AS_STABS_DIRECTIVE.

* doc/sourcebuild.texi (Effective-Target Keywords, Environment
attributes): Document stabs.

gcc/testsuite:
* lib/target-supports.exp (check_effective_target_stabs): New
proc.
* g++.dg/cpp0x/alias-decl-debug-0.C: Restrict to stabs targets.
* g++.dg/other/PR23205.C: Likewise.
* g++.dg/other/pr23205-2.C: Likewise.
* gcc.dg/20040813-1.c: Likewise.
* gcc.dg/darwin-20040809-2.c: Likewise.
* objc.dg/stabs-1.m: Likewise.

# HG changeset patch
# Parent  c7950a736b94d8efa4a0120ac24359bc446b4c0b
Only support -gstabs on Mac OS X if assember supports it (PR target/67973)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -400,12 +400,13 @@ extern GTY(()) int darwin_ms_struct;
 
 #define ASM_DEBUG_SPEC  "%{g*:%{!g0:%{!gdwarf*:--gstabs}}}"
 
-/* We still allow output of STABS.  */
-
+/* We still allow output of STABS if the assembler supports it.  */
+#ifdef HAVE_AS_STABS_DIRECTIVE
 #define DBX_DEBUGGING_INFO 1
+#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
+#endif
 
 #define DWARF2_DEBUGGING_INFO 1
-#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
 
 #define DEBUG_FRAME_SECTION	"__DWARF,__debug_frame,regular,debug"
 #define DEBUG_INFO_SECTION	"__DWARF,__debug_info,regular,debug"
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -226,7 +226,11 @@ do {	\
compiles default to stabs+.  darwin9+ defaults to dwarf-2.  */
 #ifndef DARWIN_PREFER_DWARF
 #undef PREFERRED_DEBUGGING_TYPE
+#ifdef HAVE_AS_STABS_DIRECTIVE
 #define PREFERRED_DEBUGGING_TYPE (TARGET_64BIT ? DWARF2_DEBUG : DBX_DEBUG)
+#else
+#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
+#endif
 #endif
 
 /* Darwin uses the standard DWARF register numbers but the default
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2909,6 +2909,11 @@ AC_DEFINE_UNQUOTED(HAVE_GAS_SHF_MERGE,
   [`if test $gcc_cv_as_shf_merge = yes; then echo 1; else echo 0; fi`],
 [Define 0/1 if your assembler supports marking sections with SHF_MERGE flag.])
 
+gcc_GAS_CHECK_FEATURE([stabs directive], gcc_cv_as_stabs_directive, ,,
+[.stabs "gcc2_compiled.",60,0,0,0],,
+[AC_DEFINE(HAVE_AS_STABS_DIRECTIVE, 1,
+  [Define if your assembler supports .stabs.])])
+
 gcc_GAS_CHECK_FEATURE([COMDAT group support (GNU as)],
  gcc_cv_as_comdat_group,
  [elf,2,16,0], [--fatal-warnings],
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1824,6 +1824,9 @@ time) should be run on this target.  Thi
 Test system runs executables on a simulator (i.e. slowly) rather than
 hardware (i.e. fast).
 
+@item stabs
+Target supports the stabs debugging format.
+
 @item stdint_types
 Target has the basic signed and unsigned C types in @code{stdint.h}.
 This will be obsolete when GCC ensures a working @code{stdint.h} for
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-debug-0.C b/gcc/testsuite/g++.d

[PATCH, ARM 2/6] Add support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch adds basic support for the new architecture, allowing the new 
names to be accepted by -march and the compiler to behave like ARMv6-M (for 
ARMv8-M Baseline) or ARMv7-M (for ARMv8-M Mainline). The changes are 
divided into two categories:

* those to recognize the new architecture names
* those to keep the behavior the same as for previous architectures

Changes to make the compiler generate code with the new instructions are in 
follow-up patches.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.
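
As a rough illustration of the first category (recognizing the new names), the sketch below maps the three new -march strings to a base architecture and a feature set. This is not how GCC parses -march (that goes through arm-arches.def and the generated arm-tables.opt); struct arch_entry, arch_table, find_arch and the flag values are all hypothetical. Only the names, base architectures and flag combinations follow the ARM_ARCH entries added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-bit stand-ins; in GCC the FL_FOR_* values are
   whole feature sets rather than single bits.  */
#define FL_CO_PROC         (1u << 0)
#define FL_ARCH7EM         (1u << 1)
#define FL_FOR_ARCH8M_BASE (1u << 2)
#define FL_FOR_ARCH8M_MAIN (1u << 3)

/* Hypothetical table row: -march name, base architecture, features.  */
struct arch_entry
{
  const char *name;
  const char *base;
  unsigned int flags;
};

static const struct arch_entry arch_table[] = {
  { "armv8-m.base",     "8M_BASE", FL_FOR_ARCH8M_BASE },
  { "armv8-m.main",     "8M_MAIN", FL_CO_PROC | FL_FOR_ARCH8M_MAIN },
  { "armv8-m.main+dsp", "8M_MAIN",
    FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN },
};

/* Return the entry whose name matches NAME exactly, or NULL if none.  */
static const struct arch_entry *
find_arch (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof arch_table / sizeof arch_table[0]; i++)
    if (strcmp (arch_table[i].name, name) == 0)
      return &arch_table[i];
  return NULL;
}
```

In this sketch the +dsp variant shares its base architecture with armv8-m.main and differs only by the extra FL_ARCH7EM bit, matching the ARM_ARCH lines in the diff.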


ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-11-23  Thomas Preud'homme  

* config/arm/arm-arches.def (armv8-m.base): Define new architecture.
(armv8-m.main): Likewise.
(armv8-m.main+dsp): Likewise
* config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define.
(FL_FOR_ARCH8M_MAIN): Likewise.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and
armv8-m.main+dsp to BE8_LINK_SPEC.
* config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M.
(enum base_architecture): Add BASE_ARCH_8M_BASE and BASE_ARCH_8M_MAIN.
(TARGET_ARM_V8M): Define.
* config/arm/arm.c (arm_arch_name): Increase size to work with ARMv8-M
Baseline and Mainline.
(arm_option_override_internal): Also disable arm_restrict_it when
!arm_arch_notm.
(arm_file_start): Increase architecture buffer size.
* doc/invoke.texi: Document architectures armv8-m.base, armv8-m.main
and armv8-m.main+dsp.
(mno-unaligned-access): Clarify that this is disabled by default for
ARMv8-M Baseline architecture as well.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* lib/target-supports.exp: Generate add_options_for_arm_arch_FUNC and
check_effective_target_arm_arch_FUNC_multilib for ARMv8-M Baseline and
ARMv8-M Mainline architectures.


*** libgcc/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8-M.


diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c330f91640d647d266f3d0e2350e7b986a..1d0301a3b9414127d387834584f3e42c225b6d3f 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,12 @@ ARM_ARCH("armv7-m", cortexm3,7M, ARM_FSET_MAKE_CPU1 (FL_CO_PROC |  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC |  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
+  ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE))
+ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
+  ARM_FSET_MAKE_CPU1(FL_CO_PROC |  FL_FOR_ARCH8M_MAIN))
+ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
+  ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,   ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,   ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e7328e79650739fca1c3e21b10c194feaa697465..dc7a0871c37bfda2671f197bfe83c20c7888 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -415,6 +415,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M  (FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A  (FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
+#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
    fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41c37a35e27440f67863d4a92457916dd1b..2f24bf4c9a9ae1a12ba284fd160c34e577b2e4c6 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,19 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8-m.base) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8-m.main) Value(28)
+
+EnumValue
+Enum(arm_arch) String(armv8-m.main+dsp) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(30)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(31)
 
 Enum
 Name(arm_fpu) Type(int

RE: [arm-embedded][PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2015-12-16 Thread Thomas Preud'homme
The following was committed, once rebased on top of the embedded branch (patch 
was generated on top of gcc-5-branch):

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
8c10ea3c9053e89b8eae1e5353b92d6020499409..bf1a0e874b1669f3ebe1e5870556a46b80686b82
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2336,8 +2336,10 @@ extern int making_const_table;
 #define TARGET_ARM_ARCH\
   (arm_base_arch)  \
 
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \
+   && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \
+   && arm_arch_thumb2)
 
 /* The highest Thumb instruction set version supported by the chip.  */
 #define TARGET_ARM_ARCH_ISA_THUMB  \
diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index 
c56bbdff69466af8b2e8db70f99f33054748b650..fe06cd1a2857db44fb7c1d9407aa6710a061d4df
 100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -149,8 +149,9 @@
   while (0)
 
 /* Horrible hack: We want to prevent some libgcc routines being included
-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 #undef L_fixdfsi
 #undef L_fixunsdfsi
 #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
950db11637636f46e805beee3bd55ead62aec67e..35867a281a01cbbb3810013b17e3b2173eb275ee
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2973,10 +2973,8 @@ proc check_effective_target_arm_cortex_m { } {
return 0
 }
 return [check_no_compiler_messages arm_cortex_m assembly {
-   #if !defined(__ARM_ARCH_7M__) \
-&& !defined (__ARM_ARCH_7EM__) \
-&& !defined (__ARM_ARCH_6M__)
-   #error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__
+   #if defined(__ARM_ARCH_ISA_ARM)
+   #error __ARM_ARCH_ISA_ARM is defined
#endif
int i;
 } "-mthumb"]
diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S
index 
a1e164032a08d04f7e8be80094d3b054b4e8bed4..9ae0bb82d1b3e3f81baf73e11b484f4212ea28e4
 100644
--- a/libgcc/config/arm/bpabi-v6m.S
+++ b/libgcc/config/arm/bpabi-v6m.S
@@ -1,4 +1,4 @@
-/* Miscellaneous BPABI functions.  ARMv6M implementation
+/* Miscellaneous BPABI functions.  Thumb-1 only implementation
 
Copyright (C) 2006-2015 Free Software Foundation, Inc.
Contributed by CodeSourcery.
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 
a1e41c292b22bcbd6b6ec9e11060d7a1f6e28fba..e8fd73f48a197fe4b81eaf73c13a11f913bba9e0
 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -124,7 +124,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  && !defined(__thumb2__)   \
  && (!defined(__THUMB_INTERWORK__) \
 || defined (__OPTIMIZE_SIZE__) \
-|| defined(__ARM_ARCH_6M__)))
+|| !__ARM_ARCH_ISA_ARM))
 # define __prefer_thumb__
 #endif
 
@@ -305,7 +305,7 @@ LSYM(Lend_fde):
 
 #ifdef __ARM_EABI__
 .macro THUMB_LDIV0 name signed
-#if defined(__ARM_ARCH_6M__)
+#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
 
push{r0, lr}
mov r0, #0
@@ -456,7 +456,7 @@ _L__\name:
 
 #else /* !(__INTERWORKING_STUBS__ || __thumb2__) */
 
-#ifdef __ARM_ARCH_6M__
+#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
 #define EQUIV .thumb_set
 #else
 .macro ARM_FUNC_START name sp_section=
@@ -488,7 +488,7 @@ SYM (__\name):
 #endif
 .endm
 
-#ifndef __ARM_ARCH_6M__
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 .macro ARM_FUNC_ALIAS new old
.globl  SYM (__\new)
EQUIV   SYM (__\new), SYM (__\old)
@@ -1213,7 +1213,7 @@ ARM_FUNC_START aeabi_uidivmod
 /*  */
 #ifdef L_umodsi3
 
-#ifdef __ARM_ARCH_EXT_IDIV__
+#if defined(__ARM_ARCH_EXT_IDIV__) && __ARM_ARCH_ISA_THUMB != 1
 
ARM_FUNC_START umodsi3
 
@@ -1424,7 +1424,7 @@ ARM_FUNC_START aeabi_idivmod
 /*  */
 #ifdef L_modsi3
 
-#if defined(__ARM_ARCH_EXT_IDIV__)
+#if defined(__ARM_ARCH_EXT_IDIV__) && __ARM_ARCH_ISA_THUMB != 1
 
ARM_FUNC_START modsi3
 
@@ -1685,14 +1685,14 @@ LSYM(Lover12):
 
 #endif /* __symbian__ */
 
-#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
+#if ((__ARM_ARCH__ > 5) && (__ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1)) \
 || defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
 || defined(__ARM_ARCH_5TEJ__)
 #define HAVE_ARM_CLZ 1
 #endif
 
 #ifdef L_clzsi2
-#if defined(__ARM_ARCH_6M__)
+#if !__

[arm-embedded][PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2015-12-16 Thread Thomas Preud'homme
Hi,

We decided to apply the following patch to the ARM embedded 5 branch. This is 
*not* intended for trunk for now. We will send a separate email for trunk.

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch fixes some assumptions related to M profile architectures. 
Currently GCC (mostly libgcc) contains several assumptions that the only ARM 
architecture with Thumb-1 only instructions is ARMv6-M and the only one with 
Thumb-2 only instructions is ARMv7-M. ARMv8-M [1] makes this wrong since ARMv8-M 
Baseline is also (mostly) Thumb-1 only and ARMv8-M Mainline is also Thumb-2 
only. This patch replaces checks for __ARM_ARCH_*__ with checks against 
__ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM. For instance, Thumb-1 only 
can be checked with #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB 
== 1). It also fixes the guard for the DIV code so that it does not apply to 
ARMv8-M Baseline, since that code uses Thumb-2 instructions.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.
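
To illustrate the kind of check described above, here is a minimal sketch 
(not part of the patch). The helper thumb1_only_p and the macro 
BUILT_FOR_THUMB1_ONLY are hypothetical names used only for illustration; 
__ARM_ARCH_ISA_ARM and __ARM_ARCH_ISA_THUMB are the macros the patch tests. 
The sketch uses the !__ARM_ARCH_ISA_ARM form seen in the diff, which relies 
on the preprocessor treating an undefined identifier as 0 inside #if.

```c
/* Hypothetical helper, not part of the patch: models the preprocessor
   test as a plain function so it can be exercised on any host.
   ISA_ARM is nonzero when the ARM instruction set is available,
   ISA_THUMB is the highest Thumb version supported (0 if none).  */
static int
thumb1_only_p (int isa_arm, int isa_thumb)
{
  return !isa_arm && isa_thumb == 1;
}

/* The same test applied to the compiler-provided macros.  On a host
   that defines neither macro both evaluate to 0 inside #if, so the
   result is 0 there.  BUILT_FOR_THUMB1_ONLY is a made-up name.  */
#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
# define BUILT_FOR_THUMB1_ONLY 1
#else
# define BUILT_FOR_THUMB1_ONLY 0
#endif
```

With this model, thumb1_only_p (0, 1) corresponds to a Thumb-1 only target, 
thumb1_only_p (0, 2) to a Thumb-2 only target, and thumb1_only_p (1, 1) to a 
target that has both the ARM and Thumb-1 instruction sets.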

ChangeLog entries are as follows:


*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
decide whether to prevent some libgcc routines being included for some
multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
link between this condition and the one in
libgcc/config/arm/lib1funcs.S.
* config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH.
(TARGET_ARM_V7M): Likewise.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
__ARM_ARCH_ISA_ARM to test for Cortex-M devices.


*** libgcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather
than ARMv6-M.
* config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
for all Thumb-1 only targets.
(__only_thumb1__): Define for all Thumb-1 only targets.
(THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__.
(EQUIV): Likewise.
(ARM_FUNC_ALIAS): Likewise.
(umodsi3): Add check to __only_thumb1__ to guard the idiv version.
(modsi3): Likewise.
(HAVE_ARM_CLZ): Test for __only_thumb1__ rather than __ARM_ARCH_6M__.
(clzsi2): Likewise.
(clzdi2): Likewise.
(ctzsi2): Likewise.
(L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
__ARM_ARCH_6M__ in guard for checking whether it is defined.
(final includes): Test for __only_thumb1__ rather than
__ARM_ARCH_6M__ and add comment to indicate the connection between
this condition and the one in gcc/config/arm/elf.h.
* config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
__ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
* config/arm/t-softfp: Likewise.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 6ed8ad3..06abcf3 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2181,8 +2181,10 @@ extern int making_const_table;
 #define TARGET_ARM_ARCH\
   (arm_base_arch)  \
 
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \
+   && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \
+   && arm_arch_thumb2)
 
 /* The highest Thumb instruction set version supported by the chip.  */
 #define TARGET_ARM_ARCH_ISA_THUMB  \
diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index 3795728..579a580 100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -148,8 +148,9 @@
   while (0)
 
 /* Horrible hack: We want to prevent some libgcc routines being included
-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 #undef L_fixdfsi
 #undef L_fixunsdfsi
 #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 254c4e3..6cf7ee1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3210,10 +3210,8 @@ proc check_effective_target_arm_cortex_m { } {
return 0
 }
 return [check_no_compiler_messages arm_cortex_m assembly {
-   #if !defined(__ARM_ARCH_7M__) \
-&& !defined (__ARM_ARCH_7EM__) \
-&& !defined (__ARM_ARCH_6M__)
-   #error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__
+   #if defined(__ARM_ARCH_ISA_ARM)
+   #error __ARM_ARCH_ISA_ARM is defined
#endif
int i;
 } 

[PATCH, GCC, V8M 0/6] Add support for ARMv8-M

2015-12-16 Thread Thomas Preud'homme
Hi,

I'll be posting a patch series intended for trunk whose aim is to add support 
for ARMv8-M. This patch series does not include changes to support the security 
extensions [nor does it include atomics for ARMv8-M Baseline]. This will be 
posted as a separate patch series.


=== Quick overview of ARMv8-M ===

ARMv8-M has two profiles[1]: Baseline and Mainline.  In terms of features they 
can be defined as:

ARMv8-M Baseline (armv8-m.base):
 * All ARMv6-M features
 * 16-bit immediate moves
 * Wide Branch
 * Compare & branch if (not) zero
 * Integer divide
 * Load/store exclusives
 * Atomic Load/stores
 * security extensions

ARMv8-M Mainline (armv8-m.main):
 * All ARMv7-M features
 * Atomic load/stores
 * security extensions.

ARMv8-M Mainline with DSP extension (armv8-m.main+dsp):
 * ARMv8-M Mainline
 * Those instructions added to ARMv7E-M on top of ARMv7-M.

Note that although certain architectural features of the security extensions 
are optional for cores implementing ARMv8-M, some of the new instructions are 
always available in the architecture.

Note also that only the security extensions instructions are new instructions; 
all other instructions have previously been available in other ARM Architecture 
profiles.

[1] 
http://www.arm.com/products/processors/instruction-set-architectures/armv8-m-architecture.php
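
The feature lists above map onto feature-flag sets built by OR-ing bits 
together, as in the FL_FOR_ARCH8M_BASE and FL_FOR_ARCH8M_MAIN definitions of 
the series. The following is only a sketch of that composition: the bit 
values, the two *_STANDIN flags, MAIN_DSP_FLAGS and has_feature are all 
invented for illustration and do not match the real arm-protos.h.

```c
/* Bit values are invented for this sketch; GCC's real FL_* values and
   the full contents of FL_FOR_ARCH6M / FL_FOR_ARCH7 differ.  */
#define FL_ARCH6M_STANDIN (1u << 0)  /* stands in for FL_FOR_ARCH6M */
#define FL_ARCH7_STANDIN  (1u << 1)  /* stands in for FL_FOR_ARCH7 */
#define FL_THUMB_DIV      (1u << 2)
#define FL_ARCH7EM        (1u << 3)
#define FL_ARCH8          (1u << 4)

#define FL_FOR_ARCH6M      FL_ARCH6M_STANDIN
#define FL_FOR_ARCH7M      (FL_ARCH7_STANDIN | FL_THUMB_DIV)

/* Same shape as the two definitions added by the series.  */
#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)

/* Hypothetical name for the flags of armv8-m.main+dsp.  */
#define MAIN_DSP_FLAGS     (FL_ARCH7EM | FL_FOR_ARCH8M_MAIN)

/* Return nonzero if every bit of FLAG is present in SET.  */
static int
has_feature (unsigned int set, unsigned int flag)
{
  return (set & flag) == flag;
}
```

In this model Baseline gains integer divide through FL_THUMB_DIV, and the DSP 
variant differs from plain Mainline only by FL_ARCH7EM.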



Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-16 Thread Jason Merrill

OK, thanks.

Jason


[PATCH] C FE: improvements to ranges of bad return values

2015-12-16 Thread David Malcolm
In the C FE, c_parser_statement_after_labels passes "xloc" to
c_finish_return, which is the location of the first token
within the returned expression.

Hence we don't get a full underline for the following:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
   return &some_local;
  ^

This feels like a bug; this patch fixes it to use the location of
the expr if available, and to fall back to xloc otherwise, giving
us underlining of the full expression:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
   return &some_local;
  ^~~
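
The fallback described above ("use the location of the expr if available, 
and fall back to xloc otherwise") can be modelled outside GCC roughly as 
follows. This is a sketch only: struct expr_model and expr_loc_or_loc are 
hypothetical stand-ins, and the real patch uses the EXPR_LOC_OR_LOC macro on 
a tree.

```c
/* Simplified stand-ins for GCC's types; 0 plays the role of an
   unknown location in this model.  */
typedef unsigned int location_t;
#define UNKNOWN_LOCATION ((location_t) 0)

/* Hypothetical model of an expression that may or may not carry
   its own location.  */
struct expr_model
{
  location_t loc;
};

/* Return the expression's own location when it has one, otherwise
   FALLBACK (the location of the first token in the patch's case).  */
static location_t
expr_loc_or_loc (const struct expr_model *e, location_t fallback)
{
  return e->loc != UNKNOWN_LOCATION ? e->loc : fallback;
}
```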

The testcase also adds some coverage for underlining the
"return" token for the cases where we're warning about the
erroneous presence/absence of a return value.

As an additional tweak, it struck me that we could be more
user-friendly for these latter diagnostics by issuing a note
about where the function was declared, so this patch also adds
an inform for these cases:

diagnostic-range-bad-return.c: In function 'missing_return_value':
diagnostic-range-bad-return.c:31:3: warning: 'return' with no value, in 
function returning non-void
   return; /* { dg-warning "'return' with no value, in function returning 
non-void" } */
   ^~

diagnostic-range-bad-return.c:29:5: note: declared here
 int missing_return_value (void)
 ^~~~

(ideally we'd put the underline on the return type, but that location
isn't captured)
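
The shape of this change is "emit the note only if the warning was actually 
emitted". A minimal sketch of that pattern, using made-up helpers 
(model_warning, model_inform, diagnose_missing_value and the notes_emitted 
counter are not GCC functions), could look like this:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts notes so the behaviour can be checked; illustration only.  */
static int notes_emitted;

/* Stand-in for a warning routine that reports whether it really
   emitted something (a warning may be disabled or suppressed).  */
static bool
model_warning (bool enabled, const char *msg)
{
  if (enabled)
    fprintf (stderr, "warning: %s\n", msg);
  return enabled;
}

/* Stand-in for a follow-up note attached to a previous diagnostic.  */
static void
model_inform (const char *msg)
{
  fprintf (stderr, "note: %s\n", msg);
  notes_emitted++;
}

/* Only follow up with "declared here" when the warning was emitted,
   so a suppressed warning does not leave a dangling note.  */
static void
diagnose_missing_value (bool warning_enabled)
{
  bool warned_here
    = model_warning (warning_enabled,
		     "'return' with no value, in function returning non-void");
  if (warned_here)
    model_inform ("declared here");
}
```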

This latter part of the patch is an enhancement rather than a
bugfix, though FWIW, and I'm not sure I can argue this with a
straight face, the tweak was posted as part of:
  "[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
in https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
during stage 1.  Hopefully low risk, and a small usability improvement;
but if this is pushing it, it'd be simple to split this up and only
do the bug fix.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
adds 12 PASS results to gcc.sum.

OK for trunk for stage 3?

gcc/c/ChangeLog:
* c-parser.c (c_parser_statement_after_labels): When calling
c_finish_return, use the return expression's location if it has
one, falling back to the location of the first token within it.
* c-typeck.c (c_finish_return): When issuing warnings about
the incorrect presence/absence of a return value, issue a note
showing the declaration of the function.

gcc/testsuite/ChangeLog:
* gcc.dg/diagnostic-range-bad-return.c: New test case.
---
 gcc/c/c-parser.c   |  3 +-
 gcc/c/c-typeck.c   | 28 
 gcc/testsuite/gcc.dg/diagnostic-range-bad-return.c | 52 ++
 3 files changed, 74 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-range-bad-return.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e149e19..933a938 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -5108,7 +5108,8 @@ c_parser_statement_after_labels (c_parser *parser, 
vec<tree> *chain)
  location_t xloc = c_parser_peek_token (parser)->location;
  struct c_expr expr = c_parser_expression_conv (parser);
  mark_exp_read (expr.value);
- stmt = c_finish_return (xloc, expr.value, expr.original_type);
+ stmt = c_finish_return (EXPR_LOC_OR_LOC (expr.value, xloc),
+ expr.value, expr.original_type);
  goto expect_semicolon;
}
  break;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index b93da07..c314f08 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -9546,24 +9546,36 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
   if ((warn_return_type || flag_isoc99)
  && valtype != 0 && TREE_CODE (valtype) != VOID_TYPE)
{
+ bool warned_here;
  if (flag_isoc99)
-   pedwarn (loc, 0, "%<return%> with no value, in "
-"function returning non-void");
+   warned_here = pedwarn
+ (loc, 0,
+  "%<return%> with no value, in function returning non-void");
  else
-   warning_at (loc, OPT_Wreturn_type, "%<return%> with no value, "
-   "in function returning non-void");
+   warned_here = warning_at
+ (loc, OPT_Wreturn_type,
+  "%<return%> with no value, in function returning non-void");
  no_warning = true;
+ if (warned_here)
+   inform (DECL_SOURCE_LOCATION (current_function_decl),
+   "declared here");
}
 }
   else if (valtype == 0 || TREE_CODE (valtype) == VOID_TYPE)
 {
   current_function_returns_null = 1;
+  bool warned_here;
   if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
-   pedwarn (xloc, 0,
-"%<return%> with a value, in function returning void");
+ 

[committed] Fix HPPA/PARISC 32-bit Linux kernel build

2015-12-16 Thread John David Anglin
The attached patch fixes a reload error in one of the new 64-bit atomic
patterns in a 32-bit kernel build.

Kernel builds disable the use of floating point registers with
-mdisable-fpregs.  The atomic patterns need to use floating point loads
and stores and should have been disabled when -mdisable-fpregs was
specified.

Although no atomic instructions were explicitly used by the code, the
atomic pattern in question was created during cse1.  In general, we want
to use the normal floating point patterns.  So, I moved the atomic
patterns to the end of pa.md.

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11 
with no observed
regressions.  Committed to trunk and gcc-5 branch.
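
As a rough model of the fix, the patterns that load or store 64 bits through 
a floating point register should only be enabled when floating point 
registers may be used at all. The sketch below is an assumption about the 
shape of the condition, not the text of the patch: fp_atomic_di_ok_p and its 
parameter names are hypothetical, and only the 64-bit and soft-float tests 
mirror the condition visible on the existing patterns.

```c
#include <stdbool.h>

/* Hypothetical predicate, not taken from pa.md: the existing
   patterns are conditioned on a 32-bit target without soft float;
   the extra DISABLE_FPREGS test models honoring -mdisable-fpregs.  */
static bool
fp_atomic_di_ok_p (bool target_64bit, bool soft_float, bool disable_fpregs)
{
  return !target_64bit && !soft_float && !disable_fpregs;
}
```

Under this model a 32-bit kernel build with -mdisable-fpregs would never 
match the floating point based atomic patterns.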

Dave
--
John David Anglin   dave.ang...@bell.net


2015-12-16  John David Anglin  

PR target/68779
* config/pa/pa.md (atomic_loaddi): Honor -mdisable-fpregs.
(atomic_loaddi_1): Likewise.
(atomic_storedi): Likewise.
(atomic_storedi_1): Likewise.
(atomic_loaddf): Likewise.
(atomic_loaddf_1): Likewise.
(atomic_storedf): Likewise.
(atomic_storedf_1): Likewise.
Move all atomic patterns to end of file.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 231580)
+++ config/pa/pa.md (working copy)
@@ -692,237 +692,6 @@
 (include "predicates.md")
 (include "constraints.md")
 
-;; Atomic instructions
-
-;; All memory loads and stores access storage atomically except
-;; for one exception.  The STORE BYTES, STORE DOUBLE BYTES, and
-;; doubleword loads and stores are not guaranteed to be atomic
-;; when referencing the I/O address space.
-
-;; The kernel cmpxchg operation on linux is not atomic with respect to
-;; memory stores on SMP machines, so we must do stores using a cmpxchg
-;; operation.
-
-;; Implement atomic QImode store using exchange.
-
-(define_expand "atomic_storeqi"
-  [(match_operand:QI 0 "memory_operand");; memory
-   (match_operand:QI 1 "register_operand")  ;; val out
-   (match_operand:SI 2 "const_int_operand")];; model
-  ""
-{
-  if (TARGET_SYNC_LIBCALL)
-{
-  rtx mem = operands[0];
-  rtx val = operands[1];
-  if (pa_maybe_emit_compare_and_swap_exchange_loop (NULL_RTX, mem, val))
-   DONE;
-}
-  FAIL;
-})
-
-;; Implement atomic HImode stores using exchange.
-
-(define_expand "atomic_storehi"
-  [(match_operand:HI 0 "memory_operand");; memory
-   (match_operand:HI 1 "register_operand")  ;; val out
-   (match_operand:SI 2 "const_int_operand")];; model
-  ""
-{
-  if (TARGET_SYNC_LIBCALL)
-{
-  rtx mem = operands[0];
-  rtx val = operands[1];
-  if (pa_maybe_emit_compare_and_swap_exchange_loop (NULL_RTX, mem, val))
-   DONE;
-}
-  FAIL;
-})
-
-;; Implement atomic SImode store using exchange.
-
-(define_expand "atomic_storesi"
-  [(match_operand:SI 0 "memory_operand");; memory
-   (match_operand:SI 1 "register_operand")  ;; val out
-   (match_operand:SI 2 "const_int_operand")];; model
-  ""
-{
-  if (TARGET_SYNC_LIBCALL)
-{
-  rtx mem = operands[0];
-  rtx val = operands[1];
-  if (pa_maybe_emit_compare_and_swap_exchange_loop (NULL_RTX, mem, val))
-   DONE;
-}
-  FAIL;
-})
-
-;; Implement atomic SFmode store using exchange.
-
-(define_expand "atomic_storesf"
-  [(match_operand:SF 0 "memory_operand");; memory
-   (match_operand:SF 1 "register_operand")  ;; val out
-   (match_operand:SI 2 "const_int_operand")];; model
-  ""
-{
-  if (TARGET_SYNC_LIBCALL)
-{
-  rtx mem = operands[0];
-  rtx val = operands[1];
-  if (pa_maybe_emit_compare_and_swap_exchange_loop (NULL_RTX, mem, val))
-   DONE;
-}
-  FAIL;
-})
-
-;; Implement atomic DImode load using 64-bit floating point load.
-
-(define_expand "atomic_loaddi"
-  [(match_operand:DI 0 "register_operand")  ;; val out
-   (match_operand:DI 1 "memory_operand");; memory
-   (match_operand:SI 2 "const_int_operand")];; model
-  ""
-{
-  enum memmodel model;
-
-  if (TARGET_64BIT || TARGET_SOFT_FLOAT)
-FAIL;
-
-  model = memmodel_from_int (INTVAL (operands[2]));
-  operands[1] = force_reg (SImode, XEXP (operands[1], 0));
-  expand_mem_thread_fence (model);
-  emit_insn (gen_atomic_loaddi_1 (operands[0], operands[1]));
-  if (is_mm_seq_cst (model))
-expand_mem_thread_fence (model);
-  DONE;
-})
-
-(define_insn "atomic_loaddi_1"
-  [(set (match_operand:DI 0 "register_operand" "=f,r")
-(mem:DI (match_operand:SI 1 "register_operand" "r,r")))
-   (clobber (match_scratch:DI 2 "=X,f"))]
-  "!TARGET_64BIT && !TARGET_SOFT_FLOAT"
-  "@
-   {fldds|fldd} 0(%1),%0
-   {fldds|fldd} 0(%1),%2\n\t{fstds|fstd} %2,-16(%%sp)\n\t{ldws|ldw} 
-16(%%sp),%0\n\t{ldws|ldw} -12(%%sp),%R0"
-  [(set_attr "type" "move,move

Re: [ARM] Use vector wide add for mixed-mode adds

2015-12-16 Thread Michael Collison

Kyrill,

I have attached a patch that addresses your comments. The only change I 
would ask you to reconsider is the renaming of the function 'bool 
aarch32_simd_check_vect_par_cnst_half'. This function was copied from 
the aarch64 port and I thought it was important to match the naming for 
maintenance purposes. I did rename the function to 'bool 
arm_simd_check_vect_par_cnst_half_p': I changed 'aarch32' to 'arm' and 
added '_p' per your suggestions. Is this okay?


I implemented all your other change suggestions.
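
For readers who have not followed the thread, a "mixed mode" add is a loop 
that accumulates narrow elements into a wider sum. The function below is a 
hypothetical example of such a loop, not one of the new tests; whether the 
vectorizer actually emits vaddw for it depends on the target, the options 
used and this patch.

```c
/* Hypothetical example, not taken from the patch's testsuite.  Each
   16-bit element is widened to int before being accumulated, which is
   the HImode to SImode widening sum the new patterns are aimed at.  */
int
sum_widen_s16 (const short *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```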

2015-12-16  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half_p.
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.

On 12/10/2015 08:09 AM, Kyrill Tkachov wrote:

Hi Michael,

A few comments while I look deeper into this patch...

On 30/11/15 01:18, Michael Collison wrote:


This is a modified version of my previous patch that supports vector 
wide add. I added support for vaddw on big endian when generating the 
parallel operand for the vector select.


There are four failing test cases on arm big endian with similar 
code. They are:


gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


The failures occur without my patch and are related to a bug with 
vector loads using VUZP operations.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Validated on arm-none-eabi, arm-none-linux-gnueabi, 
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.


2015-11-29  Michael Collison 

* config/arm/neon.md (widen_sum): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): Prototype
for new function.
(aarch32_simd_check_vect_par_cnst_half): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
aarch32_simd_check_vect_par_cnst_half
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon support vector widen sum of HImode TO SImode.

Okay for trunk?



--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool 
initialize_p

   ATTRIBUTE_UNUSED);
 extern void arm_init_builtins (void);
 extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
tree *update);

-
+extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool 
high);
+extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, 
machine_mode mode,

+   boo

Re: [PATCH] Pr 68805, Fix PowerPC little endian -mvsx-timode

2015-12-16 Thread David Edelsohn
On Wed, Dec 16, 2015 at 6:20 PM, Michael Meissner
 wrote:
> My first mail did not seem to be delivered, so I'm trying again.
>
> This fixes a bug with the debug switch -mvsx-timode that we would eventually
> like to enable by default on PowerPC little endian server systems.  The bug is
> that the load with rotate or rotate with store instructions needed on power8
> little endian systems used VEC_SELECT to swap the 64-bit words.  This patch
> uses ROTATE for TImode, just like I did for KFmode.
>
> Without this patch, 10 of the 30 spec 2006 benchmarks fail to compile on a
> little endian PowerPC system with -mvsx-timode.  With the patch, all 30
> benchmarks compile and do the spec verification.
>
> In developing the patch, I noticed that the generic swap optimizations that
> are done for vector types are not done for TImode, since we don't split the
> TImoves until after register allocation when we discover a vector register
> was used instead of a GPR register.  So, I added a peephole2 to catch the
> common case of store followed by load, eliminating the pair of ROTATE insns.
>
> I bootstrapped it on both a big endian power7 and a little endian power8
> system with no regressions.  Is it ok to install on the trunk?
>
> At the current time, I don't see the need to back port it to GCC 5 (though the
> backport is fairly simple), because it isn't on by default in GCC 5, and we
> don't plan to eventually have -mvsx-timode and -mlra on by default in that
> branch.
>
> [gcc]
> 2015-12-15  Michael Meissner  
>
> PR target/68805
> * config/rs6000/rs6000.c (rs6000_gen_le_vsx_permute): Use ROTATE
> instead of VEC_SELECT for TImode.
>
> * config/rs6000/vsx.md (VSX_LE): Move TImode from VSX_LE to
> VSX_LE_128, so that we use ROTATE to swap the 64-bit words instead
> of using VEC_SELECT.
> (VSX_LE_128): Likewise.
> (define_peephole2): Add peephole to eliminate double xxpermdi when
> copying TImode.
>
> [gcc/testsuite]
> 2015-12-15  Michael Meissner  
>
> PR target/68805
> * gcc.target/powerpc/pr68805.c: New test.

Okay.

Thanks, David
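
As background on why the peephole is safe: rotating a 128-bit value by 64 
bits swaps its two 64-bit words, so a store followed by a load that each 
apply the rotation cancel out. A small host-side model of that observation 
(struct ti_model, make_ti and rotate64 are hypothetical names, not GCC code) 
might be:

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit TImode value as two 64-bit words.  */
struct ti_model
{
  uint64_t lo;
  uint64_t hi;
};

/* Build a model value from its two words.  */
static struct ti_model
make_ti (uint64_t lo, uint64_t hi)
{
  struct ti_model v = { lo, hi };
  return v;
}

/* Rotating 128 bits by 64 simply exchanges the two words.  */
static struct ti_model
rotate64 (struct ti_model v)
{
  return make_ti (v.hi, v.lo);
}
```

Applying rotate64 twice returns the original value, which is the property 
that lets a pair of such rotations be dropped.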


[PATCH] [graphite] move all isl include files to graphite.h

2015-12-16 Thread Sebastian Pop
* graphite-dependences.c: Move all isl include files to...
* graphite-isl-ast-to-gimple.c: Same.
* graphite-optimize-isl.c: Same.
* graphite-poly.c: Same.
* graphite-scop-detection.c: Same.
* graphite.c: Same.
* graphite.h: ... here.
---
 gcc/graphite-dependences.c   |  9 -
 gcc/graphite-isl-ast-to-gimple.c | 12 
 gcc/graphite-optimize-isl.c  | 17 -
 gcc/graphite-poly.c  | 14 --
 gcc/graphite-scop-detection.c|  6 --
 gcc/graphite.c   |  7 ---
 gcc/graphite.h   | 20 ++--
 7 files changed, 18 insertions(+), 67 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 407a11e..46869d7 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -37,17 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "cfgloop.h"
 #include "tree-data-ref.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
-
 /* Add the constraints from the set S to the domain of MAP.  */
 
 static isl_map *
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 66c4c5a..795232a 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -53,18 +53,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "cfganal.h"
 #include "value-prof.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
-#include 
-#endif
-
 #include "graphite.h"
 #include 
 
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index cb45bfe..c546edc 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -37,23 +37,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-data-ref.h"
 #include "params.h"
 #include "dumpfile.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
-/* isl 0.15 or later.  */
-#include 
-#include 
-#endif
-
 #include "graphite.h"
 
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 1f50ff8..d188341 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -40,22 +40,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "pretty-print.h"
 #include "gimple-pretty-print.h"
 #include "tree-dump.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
-#define OPENSCOP_MAX_STRING 256
-
-
 /* Print to STDERR the GMP value VAL.  */
 
 DEBUG_FUNCTION void
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 0d5dc2b..dd506b5 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -48,12 +48,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-pretty-print.h"
-
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
 class debug_printer
diff --git a/gcc/graphite.c b/gcc/graphite.c
index f156c5e..8d0d24c 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -52,13 +52,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "tree-parloops.h"
 #include "tree-cfgcleanup.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
 /* Print global statistics to FILE.  */
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 01da3db..83f8191 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -23,10 +23,26 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_GRAPHITE_POLY_H
 
 #include "sese.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 
-#ifndef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
-  /* isl 0.14.  */
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+/* isl 0.15 or later.  */
+#include 
+
+#else
+/* isl 0.14 or 0.13.  */
 # define isl_stat int
 # define isl_stat_ok 0
 #endif
-- 
1.9.1



[PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-16 Thread H.J. Lu
Since a sibcall never returns, we can only use a call-clobbered register
as the GOT base.  Otherwise, a callee-saved register used as the GOT base
won't be properly restored.

Tested on x86-64 with -m32.  OK for trunk?


H.J.
---
gcc/

PR target/68937
* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
call via GOT slot as indirect call.
(ix86_expand_call): Mark PIC register used for sibcall as
call-clobbered.
* config/i386/i386.md (*sibcall_GOT_32): New pattern.
(*sibcall_value_GOT_32): Likewise.

gcc/testsuite/

PR target/68937
* gcc.target/i386/pr68937-1.c: New test.
* gcc.target/i386/pr68937-2.c: Likewise.
* gcc.target/i386/pr68937-3.c: Likewise.
* gcc.target/i386/pr68937-4.c: Likewise.
---
 gcc/config/i386/i386.c| 15 ---
 gcc/config/i386/i386.md   | 45 +++
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 +
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 +
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 +
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 +
 6 files changed, 109 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..ebc9d09 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 call-clobbered register for the address of the target function.
 Make sure that all such registers are not used for passing
-parameters.  Note that DLLIMPORT functions are indirect.  */
+parameters.  Note that DLLIMPORT functions and call via GOT
+slot are indirect.  */
   if (!decl
+ || (flag_pic && !flag_plt)
  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
{
  /* Check if regparm >= 3 since arg_reg_available is set to
@@ -27019,8 +27021,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  rtx callarg2,
  rtx pop, bool sibcall)
 {
-  rtx vec[3];
-  rtx use = NULL, call;
+  rtx vec[4];
+  rtx use = NULL, call, clobber = NULL;
   unsigned int vec_len = 0;
 
   if (pop == const0_rtx)
@@ -27075,6 +27077,10 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
   UNSPEC_GOT);
  fnaddr = gen_rtx_CONST (Pmode, fnaddr);
+ /* Since sibcall never returns, mark PIC register as
+call-clobbered.  */
+ if (sibcall)
+   clobber = pic_offset_table_rtx;
  fnaddr = gen_rtx_PLUS (Pmode, pic_offset_table_rtx,
 fnaddr);
}
@@ -27151,6 +27157,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 }
   vec[vec_len++] = call;
 
+  if (clobber)
+vec[vec_len++] = gen_rtx_CLOBBER (VOIDmode, clobber);
+
   if (pop)
 {
   pop = gen_rtx_PLUS (Pmode, stack_pointer_rtx, pop);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..65c1534 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,28 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+  (mem:SI (plus:SI
+(match_operand:SI 0 "register_operand" "U")
+(const:SI
+  (unspec:SI [(match_operand:SI 1 "symbol_operand")]
+   UNSPEC_GOT)
+(match_operand 2))
+   (clobber (match_dup 0))]
+  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
+{
+  rtx fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[1]),
+  UNSPEC_GOT);
+  fnaddr = gen_rtx_CONST (Pmode, fnaddr);
+  fnaddr = gen_rtx_PLUS (Pmode, operands[0], fnaddr);
+  fnaddr = gen_const_mem (Pmode, fnaddr);
+  return ix86_output_call_insn (insn, fnaddr);
+}
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
 (match_operand 1))]
@@ -12042,6 +12064,29 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_value_GOT_32"
+  [(set (match_operand 0)
+(call (mem:QI
+   (mem:SI

[PATCH] Pr 68805, Fix PowerPC little endian -mvsx-timode

2015-12-16 Thread Michael Meissner
My first mail did not seem to be delivered, so I'm trying again.

This fixes a bug with the debug switch -mvsx-timode that we would eventually
like to enable by default on PowerPC little endian server systems.  The bug is
that the load with rotate or rotate with store instructions needed on power8
little endian systems used VEC_SELECT to swap the 64-bit words.  This patch
uses ROTATE for TImode, just like I did for KFmode.

Without this patch, 10 of the 30 spec 2006 benchmarks fail to compile on a
little endian PowerPC system with -mvsx-timode.  With the patch, all 30
benchmarks compile and do the spec verification.

In developing the patch, I noticed that the generic swap optimizations that are
done for vector types are not done for TImode, since we don't split the TImoves
until after register allocation when we discover a vector register was used
instead of a GPR register.  So, I added a peephole2 to catch the common case of
store followed by load, eliminating the pair of ROTATE insns.
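
To illustrate why dropping the pair is safe: a rotate by 64 on a 128-bit value just exchanges its two 64-bit words, so two of them in a row cancel out.  A minimal sketch, where TI128, rotate64 and same are hypothetical names and nothing here is taken from rs6000.c or vsx.md:

```cpp
#include <cstdint>

// Hypothetical stand-in for a 128-bit TImode value, kept as two 64-bit words.
struct TI128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

// Rotating a 128-bit value by 64 bits exchanges its two 64-bit words.
constexpr TI128 rotate64(TI128 v) { return TI128{v.lo, v.hi}; }

// Word-wise equality, used to check that two rotates give back the input.
constexpr bool same(TI128 a, TI128 b) { return a.hi == b.hi && a.lo == b.lo; }
```

With these definitions, same(rotate64(rotate64(v)), v) holds for every v, which is the identity the peephole relies on when it replaces the two ROTATEs with a plain move.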

I bootstrapped it on both a big endian power7 and a little endian power8 system
with no regressions.  Is it ok to install on the trunk?

At the current time, I don't see the need to back port it to GCC 5 (though the
backport is fairly simple), because it isn't on by default in GCC 5, and we
don't plan to eventually have -mvsx-timode and -mlra on by default in that
branch.

[gcc]
2015-12-15  Michael Meissner  

PR target/68805
* config/rs6000/rs6000.c (rs6000_gen_le_vsx_permute): Use ROTATE
instead of VEC_SELECT for TImode.

* config/rs6000/vsx.md (VSX_LE): Move TImode from VSX_LE to
VSX_LE_128, so that we use ROTATE to swap the 64-bit words instead
of using VEC_SELECT.
(VSX_LE_128): Likewise.
(define_peephole2): Add peephole to eliminate double xxpermdi when
copying TImode.

[gcc/testsuite]
2015-12-15  Michael Meissner  

PR target/68805
* gcc.target/powerpc/pr68805.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 231624)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -8829,8 +8829,9 @@ rs6000_const_vec (machine_mode mode)
 rtx
 rs6000_gen_le_vsx_permute (rtx source, machine_mode mode)
 {
-  /* Use ROTATE instead of VEC_SELECT on IEEE 128-bit floating point.  */
-  if (FLOAT128_VECTOR_P (mode))
+  /* Use ROTATE instead of VEC_SELECT on IEEE 128-bit floating point, and
+ 128-bit integers if they are allowed in VSX registers.  */
+  if (FLOAT128_VECTOR_P (mode) || mode == TImode)
 return gen_rtx_ROTATE (mode, source, GEN_INT (64));
   else
 {
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 231624)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -26,15 +26,13 @@ (define_mode_iterator VSX_D [V2DF V2DI])
 
 ;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with
 ;; lxvd2x to properly handle swapping words on little endian
-(define_mode_iterator VSX_LE [V2DF
- V2DI
- V1TI
- (TI   "VECTOR_MEM_VSX_P (TImode)")])
+(define_mode_iterator VSX_LE [V2DF V2DI V1TI])
 
 ;; Mode iterator to handle swapping words on little endian for the 128-bit
 ;; types that goes in a single vector register.
 (define_mode_iterator VSX_LE_128 [(KF   "FLOAT128_VECTOR_P (KFmode)")
- (TF   "FLOAT128_VECTOR_P (TFmode)")])
+ (TF   "FLOAT128_VECTOR_P (TFmode)")
+ (TI   "TARGET_VSX_TIMODE")])
 
 ;; Iterator for the 2 32-bit vector types
 (define_mode_iterator VSX_W [V4SF V4SI])
@@ -739,6 +737,21 @@ (define_split
: operands[0];
 })
 
+;; Peephole to catch memory to memory transfers for TImode if TImode landed in
+;; VSX registers on a little endian system.  The vector types and IEEE 128-bit
+;; floating point are handled by the more generic swap elimination pass.
+(define_peephole2
+  [(set (match_operand:TI 0 "vsx_register_operand" "")
+   (rotate:TI (match_operand:TI 1 "vsx_register_operand" "")
+  (const_int 64)))
+   (set (match_operand:TI 2 "vsx_register_operand" "")
+   (rotate:TI (match_dup 0)
+  (const_int 64)))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && TARGET_VSX_TIMODE
+   && (rtx_equal_p (operands[0], operands[2])
+   || peep2_reg_dead_p (2, operands[0]))"
+   [(set (match_dup 2) (match_dup 1))])
+
 ;; The post-reload split requires that we re-permute the source
 ;; register in case it is still live.
 (define_split
Index: gcc/testsuite/gcc.target/powerpc/pr68805.c
===

libgcc: unwind-ia64.c without malloc/free

2015-12-16 Thread Bernd Edlinger
Hi,

this is just an idea for how to avoid the use of malloc in unwind-ia64.c.

I can compile this with my cross-compiler, but cannot test anything.

If you find it interesting, then someone should continue this work and test
and/or fix it until it really works.

The idea is that I can use alloca instead of malloc: this is re-entrant and
also works with zero free memory.  The allocations have to be done
in the main loop, and the state machine is always guaranteed to have
at least one available memory block.  What do you think?
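
The allocation scheme can be sketched in isolation as below.  This is a simplified illustration, not the code from the patch: RegState, AllocContext, donate, alloc_state and free_state are hypothetical names, and an ordinary array stands in for the alloca'd memory.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for unw_reg_state / unw_alloc_context.
struct RegState {
  RegState *next;
  int payload;
};

struct AllocContext {
  RegState *avail;  // singly linked list of currently unused blocks
};

// Hand caller-provided storage (stack memory in the proposal) to the context.
inline void donate(AllocContext &ac, RegState *blocks, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    blocks[i].next = ac.avail;
    ac.avail = &blocks[i];
  }
}

// Pop one block from the list; returns nullptr if none is available.
inline RegState *alloc_state(AllocContext &ac) {
  RegState *rs = ac.avail;
  if (rs)
    ac.avail = rs->next;
  return rs;
}

// Push a block back so that a later alloc_state can reuse it.
inline void free_state(AllocContext &ac, RegState *rs) {
  rs->next = ac.avail;
  ac.avail = rs;
}
```

Nothing here touches the heap or any global state, so it needs neither malloc nor the atomic emergency buffers; the caller only has to make sure a block has been donated before the state machine asks for one.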


Regards,
Bernd.
Index: libgcc/config/ia64/unwind-ia64.c
===
--- libgcc/config/ia64/unwind-ia64.c	(revision 231696)
+++ libgcc/config/ia64/unwind-ia64.c	(working copy)
@@ -129,6 +129,14 @@ struct unw_labeled_state {
   struct unw_reg_state saved_state;
 };
 
+struct unw_alloc_context
+{
+  struct unw_reg_state *avail_reg_state;
+  struct unw_labeled_state *avail_labeled_state;
+  struct unw_reg_state **dup_state_result;
+  struct unw_reg_state *dup_state_param;
+};
+
 typedef struct unw_state_record
 {
   unsigned int first_region : 1;	/* is this the first region? */
@@ -152,9 +160,10 @@ typedef struct unw_state_record
 
   struct unw_labeled_state *labeled_states;	/* list of all labeled states */
   struct unw_reg_state curr;	/* current state */
+  struct unw_alloc_context ac;
 
   _Unwind_Personality_Fn personality;
-  
+
 } _Unwind_FrameState;
 
 enum unw_nat_type
@@ -237,128 +246,44 @@ static unsigned char const save_order[] =
 
 #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
 
-/* MASK is a bitmap describing the allocation state of emergency buffers,
-   with bit set indicating free. Return >= 0 if allocation is successful;
-   < 0 if failure.  */
-
-static inline int
-atomic_alloc (unsigned int *mask)
-{
-  unsigned int old = *mask, ret, new;
-
-  while (1)
-{
-  if (old == 0)
-	return -1;
-  ret = old & -old;
-  new = old & ~ret;
-  new = __sync_val_compare_and_swap (mask, old, new);
-  if (old == new)
-	break;
-  old = new;
-}
-
-  return __builtin_ffs (ret) - 1;
-}
-
-/* Similarly, free an emergency buffer.  */
-
-static inline void
-atomic_free (unsigned int *mask, int bit)
-{
-  __sync_xor_and_fetch (mask, 1 << bit);
-}
-
-
-#define SIZE(X)		(sizeof(X) / sizeof(*(X)))
-#define MASK_FOR(X)	((2U << (SIZE (X) - 1)) - 1)
-#define PTR_IN(X, P)	((P) >= (X) && (P) < (X) + SIZE (X))
-
-static struct unw_reg_state emergency_reg_state[32];
-static unsigned int emergency_reg_state_free = MASK_FOR (emergency_reg_state);
-
-static struct unw_labeled_state emergency_labeled_state[8];
-static unsigned int emergency_labeled_state_free = MASK_FOR (emergency_labeled_state);
-
-#ifdef ENABLE_MALLOC_CHECKING
-static int reg_state_alloced;
-static int labeled_state_alloced;
-#endif
-
 /* Allocation and deallocation of structures.  */
 
 static struct unw_reg_state *
-alloc_reg_state (void)
+alloc_reg_state (struct unw_alloc_context *ac)
 {
   struct unw_reg_state *rs;
 
-#ifdef ENABLE_MALLOC_CHECKING
-  reg_state_alloced++;
-#endif
+  rs = ac->avail_reg_state;
+  if (rs)
+ac->avail_reg_state = rs->next;
 
-  rs = malloc (sizeof (struct unw_reg_state));
-  if (!rs)
-{
-  int n = atomic_alloc (&emergency_reg_state_free);
-  if (n >= 0)
-	rs = &emergency_reg_state[n];
-}
-
   return rs;
 }
 
 static void
-free_reg_state (struct unw_reg_state *rs)
+free_reg_state (struct unw_reg_state *rs, struct unw_alloc_context *ac)
 {
-#ifdef ENABLE_MALLOC_CHECKING
-  reg_state_alloced--;
-#endif
-
-  if (PTR_IN (emergency_reg_state, rs))
-atomic_free (&emergency_reg_state_free, rs - emergency_reg_state);
-  else
-free (rs);
+  rs->next = ac->avail_reg_state;
+  ac->avail_reg_state = rs;
 }
 
 static struct unw_labeled_state *
-alloc_label_state (void)
+alloc_label_state (struct unw_alloc_context *ac)
 {
   struct unw_labeled_state *ls;
 
-#ifdef ENABLE_MALLOC_CHECKING
-  labeled_state_alloced++;
-#endif
+  ls = ac->avail_labeled_state;
+  ac->avail_labeled_state = NULL;
 
-  ls = malloc(sizeof(struct unw_labeled_state));
-  if (!ls)
-{
-  int n = atomic_alloc (&emergency_labeled_state_free);
-  if (n >= 0)
-	ls = &emergency_labeled_state[n];
-}
-
   return ls;
 }
 
-static void
-free_label_state (struct unw_labeled_state *ls)
-{
-#ifdef ENABLE_MALLOC_CHECKING
-  labeled_state_alloced--;
-#endif
-
-  if (PTR_IN (emergency_labeled_state, ls))
-atomic_free (&emergency_labeled_state_free, emergency_labeled_state - ls);
-  else
-free (ls);
-}
-
 /* Routines to manipulate the state stack.  */
 
 static void
 push (struct unw_state_record *sr)
 {
-  struct unw_reg_state *rs = alloc_reg_state ();
+  struct unw_reg_state *rs = alloc_reg_state (&sr->ac);
   memcpy (rs, &sr->curr, sizeof (*rs));
   sr->curr.next = rs;
 }
@@ -371,34 +296,40 @@ pop (struct unw_state_record *sr)
   if (!rs)
 abort ();
   memcpy (&sr->curr, rs, sizeof(*rs));
-  free_reg_state (rs);
+  free_

Re: [gomp4] [WIP] OpenACC bind, nohost clauses

2015-12-16 Thread Cesar Philippidis
On 12/14/2015 12:36 PM, Cesar Philippidis wrote:
> On 12/08/2015 11:55 AM, Thomas Schwinge wrote:
>> On Sat, 14 Nov 2015 09:36:36 +0100, I wrote:

>> C front end:
>>
>> --- gcc/c/c-parser.c
>> +++ gcc/c/c-parser.c
>> @@ -11607,6 +11607,8 @@ c_parser_oacc_clause_async (c_parser *parser, 
>> tree list)
>>  static tree
>>  c_parser_oacc_clause_bind (c_parser *parser, tree list)
>>  {
>> +  check_no_duplicate_clause (list, OMP_CLAUSE_BIND, "bind");
>> +
>>location_t loc = c_parser_peek_token (parser)->location;
>>  
>>parser->lex_untranslated_string = true;
>> @@ -11615,20 +11617,43 @@ c_parser_oacc_clause_bind (c_parser *parser, 
>> tree list)
>>parser->lex_untranslated_string = false;
>>return list;
>>  }
>> -  if (c_parser_next_token_is (parser, CPP_NAME)
>> -  || c_parser_next_token_is (parser, CPP_STRING))
>> +  tree name = error_mark_node;
>> +  c_token *token = c_parser_peek_token (parser);
>> +  if (c_parser_next_token_is (parser, CPP_NAME))
>>  {
>> -  tree t = c_parser_peek_token (parser)->value;
>> +  tree decl = lookup_name (token->value);
>> +  if (!decl)
>> +   error_at (token->location, "%qE has not been declared",
>> + token->value);
>> +  else if (TREE_CODE (decl) != FUNCTION_DECL)
>> +   error_at (token->location, "%qE does not refer to a function",
>> + token->value);
>>
>> Quite possibly we'll want to add more error checking (matching signature
>> of X and Y, for example).
> 
> Good idea, but I wonder if that would be too strict. Should we allow
> integer promotion in the bind function arguments?

I decided to be strict here. In C++ there is a decls_match function
which determines if, say, a function prototype matches its definition.
It turns out that function was a little too strict and generic for the bind
clause, because I think we want to allow the user to bind functions
declared in different namespaces. E.g.

 namespace foo {
   #pragma acc routine
   int bar ();
 }

 #pragma acc routine bind (foo::bar)
 ...

This should be acceptable. As a consequence, I created a
bind_decls_match function for this purpose. I probably could have
taught the existing decls_match function how to cope with bind clauses,
but I suspect we may need to add more special cases for the bind clause
because the spec is so vague.
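
To make "strict" concrete, here is a rough sketch of the rule in plain C++ type traits.  It is not bind_decls_match itself; strict_match, host_bar, narrow and wide are hypothetical names used only for illustration.

```cpp
#include <type_traits>

namespace foo {
int bar();       // function named in a bind clause
}
int host_bar();  // routine that would carry the bind clause
int narrow(short);
int wide(int);

// Strict matching: the two function types must be identical.  The enclosing
// namespace is not part of a function type, so it does not matter here, but
// an argument that differs only by integer promotion does.
template <typename F, typename G>
constexpr bool strict_match() {
  return std::is_same<F, G>::value;
}
```

Under this rule strict_match<decltype(foo::bar), decltype(host_bar)>() is true, while strict_match<decltype(narrow), decltype(wide)>() is false because short and int differ.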

>> Again simplifying the c_head/clauses handling (snipped), the C++ front
>> end changes are very similar to the C front end changes:
>>
>> --- gcc/cp/parser.c
>> +++ gcc/cp/parser.c
>> @@ -31539,42 +31538,76 @@ static tree
>>  cp_parser_oacc_clause_bind (cp_parser *parser, tree list)
>>  {
>> [...]
>> -  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
>> -  || cp_lexer_next_token_is (parser->lexer, CPP_STRING))
>> +  tree name = error_mark_node;
>> +  cp_token *token = cp_lexer_peek_token (parser->lexer);
>> +  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
>>
>> I'm not particularly confident in the following lookup/error checking
>> (which I copied a lot from C++ OpenACC routine parsing):
>>
>>  {
>> -  tree t;
>> -
>> -  if (cp_lexer_peek_token (parser->lexer)->type == CPP_STRING)
>> -   {
>> - t = cp_lexer_peek_token (parser->lexer)->u.value;
>> - cp_lexer_consume_token (parser->lexer);
>> +  //TODO
>> +  tree id = cp_parser_id_expression (parser, /*template_p=*/false,
>> +/*check_dependency_p=*/true,
>> +/*template_p=*/NULL,
>> +/*declarator_p=*/false,
>> +/*optional_p=*/false);
>> +  tree decl = cp_parser_lookup_name_simple (parser, id, 
>> token->location);
>> +  if (id != error_mark_node && decl == error_mark_node)
>> +   cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
>> +token->location);
>> +  if (/* TODO */ !decl || decl == error_mark_node)
>> +   error_at (token->location, "%qE has not been declared",
>> + token->u.value);
>> +  else if (/* TODO */ is_overloaded_fn (decl)
>> +  && (TREE_CODE (decl) != FUNCTION_DECL
>> +  || DECL_FUNCTION_TEMPLATE_P (decl)))
>> +   error_at (token->location, "%qE names a set of overloads",
>> + token->u.value);
>> +  else if (/* TODO */ !DECL_NAMESPACE_SCOPE_P (decl))
>> +   {
>> + /* Perhaps we should use the same rule as declarations in 
>> different
>> +namespaces?  */
>> + error_at (token->location,
>> +   "%qE does not refer to a namespace scope function",
>> +

[PATCH] [graphite] attach schedule tree to the scop

2015-12-16 Thread Sebastian Pop
We used to translate the just-computed schedule tree into a union_map,
and code generation would then translate it back to a schedule tree
just before generating AST code.
---
 gcc/graphite-isl-ast-to-gimple.c | 65 ++--
 gcc/graphite-optimize-isl.c  |  5 +++-
 gcc/graphite-poly.c  |  4 +--
 gcc/graphite.h   |  4 +++
 4 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index e2b5a00..66c4c5a 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -61,6 +61,9 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+#include 
+#endif
 
 #include "graphite.h"
 #include 
@@ -124,6 +127,29 @@ void ivs_params_clear (ivs_params &ip)
 }
 }
 
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+
+/* Set the "separate" option for the schedule node.  */
+
+static __isl_give isl_schedule_node *
+set_separate_option (__isl_take isl_schedule_node *node, void *user)
+{
+  if (user)
+return node;
+
+  if (isl_schedule_node_get_type (node) != isl_schedule_node_band)
+return node;
+
+  /* Set the "separate" option unless it is set earlier to another option.  */
+  if (isl_schedule_node_band_member_get_ast_loop_type (node, 0)
+  == isl_ast_loop_default)
+return isl_schedule_node_band_member_set_ast_loop_type
+  (node, 0, isl_ast_loop_separate);
+
+  return node;
+}
+#endif
+
 class translate_isl_ast_to_gimple
 {
  public:
@@ -289,6 +315,14 @@ class translate_isl_ast_to_gimple
 
   __isl_give isl_union_map *generate_isl_schedule (scop_p scop);
 
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+  /* Set the "separate" option for all schedules.  This helps reducing control
+ overhead.  */
+
+  __isl_give isl_schedule *
+set_options_for_schedule_tree (__isl_take isl_schedule *schedule);
+#endif
+
   /* Set the separate option for all dimensions.
  This helps to reduce control overhead.  */
 
@@ -3162,6 +3196,19 @@ ast_build_before_for (__isl_keep isl_ast_build *build, 
void *user)
   return id;
 }
 
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+/* Set the separate option for all schedules.  This helps reducing control
+   overhead.  */
+
+__isl_give isl_schedule *
+translate_isl_ast_to_gimple::set_options_for_schedule_tree
+(__isl_take isl_schedule *schedule)
+{
+  return isl_schedule_map_schedule_node_bottom_up
+(schedule, set_separate_option, NULL);
+}
+#endif
+
 /* Set the separate option for all dimensions.
This helps to reduce control overhead.  */
 
@@ -3186,6 +3233,7 @@ translate_isl_ast_to_gimple::set_options (__isl_take 
isl_ast_build *control,
 __isl_give isl_ast_node *
 translate_isl_ast_to_gimple::scop_to_isl_ast (scop_p scop, ivs_params &ip)
 {
+  isl_ast_node *ast_isl = NULL;
   /* Generate loop upper bounds that consist of the current loop iterator, an
  operator (< or <=) and an expression not involving the iterator.  If this
  option is not set, then the current loop iterator may appear several times
@@ -3203,8 +3251,21 @@ translate_isl_ast_to_gimple::scop_to_isl_ast (scop_p 
scop, ivs_params &ip)
isl_ast_build_set_before_each_for (context_isl, ast_build_before_for,
   dependence);
 }
-  isl_ast_node *ast_isl = isl_ast_build_ast_from_schedule (context_isl,
-  schedule_isl);
+
+#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+  if (scop->schedule)
+{
+  scop->schedule = set_options_for_schedule_tree (scop->schedule);
+  ast_isl = isl_ast_build_node_from_schedule (context_isl, scop->schedule);
+  isl_union_map_free(schedule_isl);
+}
+  else
+ast_isl = isl_ast_build_ast_from_schedule (context_isl, schedule_isl);
+#else
+  ast_isl = isl_ast_build_ast_from_schedule (context_isl, schedule_isl);
+  isl_schedule_free (scop->schedule);
+#endif
+
   isl_ast_build_free (context_isl);
   return ast_isl;
 }
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index f5cb5c4..cb45bfe 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -420,6 +420,10 @@ optimize_isl (scop_p scop)
   return false;
 }
 
+  // Attach the schedule to scop so that it can be used in code generation.
+  // schedule freeing will occur in code generation.
+  scop->schedule = schedule;
+
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
   /* isl 0.15 or later.  */
   isl_union_map *schedule_map = get_schedule_map_st (schedule);
@@ -428,7 +432,6 @@ optimize_isl (scop_p scop)
 #endif
   apply_schedule_map_to_scop (scop, schedule_map);
 
-  isl_schedule_free (schedule);
   isl_union_map_free (schedule_map);
   return true;
 }
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 00d674c..1f50ff8 100644
--- a/gcc/graphite-poly.c

Re: [v3 PATCH] PR libstdc++/68276

2015-12-16 Thread Ville Voutilainen
On 17 December 2015 at 00:12, Ville Voutilainen
 wrote:
> Tested on Linux-PPC64.
>
> 2015-12-17  Ville Voutilainen  
>
> PR libstdc++/68276
>
> * src/c++11/ios.cc (_M_grow_words): Use nothrow new.
> * testsuite/27_io/ios_base/storage/11584.cc: Adjust.

Shock horror, inconsistent indentation introduced by the patch. Fixed by the
attached patch.
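
For reference, the pattern the patch switches to can be sketched on its own as below.  This is an illustration only, not the ios.cc code: Words, fail_next and grow are hypothetical, and a class-specific allocator is used so that failure can be simulated without replacing the global operator new.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical element type with its own nothrow array allocator.
struct Words {
  static bool fail_next;  // set to true to simulate an allocation failure
  long value;

  static void *operator new[](std::size_t n, const std::nothrow_t &) noexcept {
    return fail_next ? nullptr : std::malloc(n);
  }
  static void operator delete[](void *p) noexcept { std::free(p); }
  // Matching placement delete; only used if a constructor throws.
  static void operator delete[](void *p, const std::nothrow_t &) noexcept {
    std::free(p);
  }
};
bool Words::fail_next = false;

// Replaces the buffer on success.  On failure it returns false and leaves
// the old buffer untouched (the real code would set badbit at this point),
// with a null check taking the place of the try/catch around plain new.
inline bool grow(Words *&words, std::size_t newsize) {
  Words *fresh = new (std::nothrow) Words[newsize];
  if (!fresh)
    return false;
  delete[] words;
  words = fresh;
  return true;
}
```

The null check does the job the __try/__catch block did before, without relying on exceptions being enabled.
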
diff --git a/libstdc++-v3/src/c++11/ios.cc b/libstdc++-v3/src/c++11/ios.cc
index 4adc701..f701e61 100644
--- a/libstdc++-v3/src/c++11/ios.cc
+++ b/libstdc++-v3/src/c++11/ios.cc
@@ -121,9 +121,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (__ix < numeric_limits::max())
  {
__newsize = __ix + 1;
-   __try
- { __words = new _Words[__newsize]; }
-   __catch(const std::bad_alloc&)
+   __words = new (std::nothrow) _Words[__newsize];
+   if (!__words)
  {
_M_streambuf_state |= badbit;
if (_M_streambuf_state & _M_exception)
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc 
b/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
index 0c80795..ae680c7 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
@@ -26,14 +26,14 @@
 
 int new_fails;
 
-void* operator new(std::size_t n) throw (std::bad_alloc)
+void* operator new(std::size_t n, const std::nothrow_t&) throw()
 {
   if (new_fails)
-throw std::bad_alloc();  
+return 0;
   return malloc(n);
 }
-void* operator new[] (std::size_t n) throw (std::bad_alloc)
-{ return operator new(n); }
+void* operator new[] (std::size_t n, const std::nothrow_t& ntt) throw()
+{ return operator new(n, ntt); }
 
 void operator delete (void *p) throw() { free(p); }
 void operator delete[] (void *p) throw() { operator delete(p); }


Re: ISL version check patch

2015-12-16 Thread Jeff Law

On 12/16/2015 02:22 PM, Nathan Sidwell wrote:

This patch https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01273.html
breaks builds using static libisl & libgmp.  (a whole slew of undefined
__gmpz_FOO symbols).

Fixed with the attached patch to add -lgmp etc to the isl link test. ok?

OK.
jeff


Re: Fix PR66208

2015-12-16 Thread Jeff Law

On 12/16/2015 10:13 AM, Bernd Schmidt wrote:

This is a relatively straightforward PR where we should mention a macro
expansion in a warning message. The patch below implements the
suggestion by Marek to pass a location down from
build_function_call_vec. Ok if tests pass on x86_64-linux?

One question I have is about -Wformat, which is dealt with in the same
area. We do get a mention of the macro expansion, but in a different style:

/* { dg-do compile } */
/* { dg-options "-Wformat" } */

int foox (const char *, ...) __attribute__ ((format (printf, 1, 2)));

#define foo(p, x) foox (p, x)
#define foo3(p, x, y) foox (p, x, y)

void
bar (unsigned int x)
{
   foo ("%x", "fish");
   foo3 ("%x", 3, "fish");
}

macroloc2.c: In function ‘bar’:
macroloc2.c:12:8: warning: format ‘%x’ expects argument of type
‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat=]
foo ("%x", "fish");
 ^
macroloc2.c:6:25: note: in definition of macro ‘foo’
  #define foo(p, x) foox (p, x)
  ^
macroloc2.c:13:9: warning: too many arguments for format
[-Wformat-extra-args]
foo3 ("%x", 3, "fish");
  ^
macroloc2.c:7:29: note: in definition of macro ‘foo3’
  #define foo3(p, x, y) foox (p, x, y)
  ^

Is this what we're looking for in terms of output?
Are you referring to the "in definition" vs "in expansion" difference? 
That appears to be a difference in the locus of the first displayed 
diagnostic before the unwinding of the macros.


So in the case above, the first locus is outside the macro 
definitions (diagnostic for line #12) and we get the "in definition" 
form.  Similarly for the diagnostic on line #13 and the following note 
using the "in definition" form.


In your pr66208.c testcase we get the following diagnostics after your 
patch:


j.c: In function ‘baz’:
j.c:5:16: warning: null argument where non-null required (argument 1) 
[-Wnonnull]

 #define foo(p) foox (p, "p is null") /* { dg-warning "null argument" } */
^

j.c:9:3: note: in expansion of macro ‘foo’
   foo (0); /* { dg-message "note: in expansion" } */
   ^~~

So the diagnostic is inside the macro, so according to tree-diagnostic.c 
you should get the "in expansion" variant.


The difference may be an artifact of input_location's state when we 
detect the error.


Jeff





[v3 PATCH] PR libstdc++/68276

2015-12-16 Thread Ville Voutilainen
Tested on Linux-PPC64.

2015-12-17  Ville Voutilainen  

PR libstdc++/68276

* src/c++11/ios.cc (_M_grow_words): Use nothrow new.
* testsuite/27_io/ios_base/storage/11584.cc: Adjust.
diff --git a/libstdc++-v3/src/c++11/ios.cc b/libstdc++-v3/src/c++11/ios.cc
index 4adc701..4241bef 100644
--- a/libstdc++-v3/src/c++11/ios.cc
+++ b/libstdc++-v3/src/c++11/ios.cc
@@ -121,9 +121,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (__ix < numeric_limits::max())
  {
__newsize = __ix + 1;
-   __try
- { __words = new _Words[__newsize]; }
-   __catch(const std::bad_alloc&)
+__words = new (std::nothrow) _Words[__newsize];
+if (!__words)
  {
_M_streambuf_state |= badbit;
if (_M_streambuf_state & _M_exception)
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc 
b/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
index 0c80795..ae680c7 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/storage/11584.cc
@@ -26,14 +26,14 @@
 
 int new_fails;
 
-void* operator new(std::size_t n) throw (std::bad_alloc)
+void* operator new(std::size_t n, const std::nothrow_t&) throw()
 {
   if (new_fails)
-throw std::bad_alloc();  
+return 0;
   return malloc(n);
 }
-void* operator new[] (std::size_t n) throw (std::bad_alloc)
-{ return operator new(n); }
+void* operator new[] (std::size_t n, const std::nothrow_t& ntt) throw()
+{ return operator new(n, ntt); }
 
 void operator delete (void *p) throw() { free(p); }
 void operator delete[] (void *p) throw() { operator delete(p); }


Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-16 Thread Patrick Palka

On Wed, 16 Dec 2015, Jason Merrill wrote:


On 12/15/2015 04:16 PM, Patrick Palka wrote:

+  if (MAYBE_CLASS_TYPE_P (type))
+;


What does this patch do with conversion to const reference to class?  I think 
we want to check MAYBE_CLASS_TYPE_P (non_reference (type)) here.


That makes sense.  Here's an updated patch using non_reference, with
tests updated to each have a reference-returning variant.
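
As a compact illustration of what the tests exercise (a sketch in the style of the PR testcases, not one of the new files; Len, from_return and via_reference are hypothetical names): the constructor template below can only deduce N while the argument is still an array, so decaying to a pointer first would leave no viable conversion.

```cpp
#include <cstddef>

// Hypothetical class: records the length of the array it was built from.
struct Len {
  std::size_t n;
  template <std::size_t N>
  Len(const char (&)[N]) : n(N) {}
};

// Conversion to a class type in a return statement.
inline Len from_return() { return "abc"; }

// Conversion to a const reference to class, the case non_reference covers.
inline std::size_t via_reference(const Len &arg = "hello") { return arg.n; }
```

Both functions are expected to compile only when the array type survives until overload resolution; the lengths include the terminating NUL, so from_return().n is 4 and via_reference() is 6.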

-- 8< --

gcc/cp/ChangeLog:

PR c++/16333
PR c++/41426
PR c++/59878
PR c++/66895
* typeck.c (convert_for_initialization): Don't perform an early
decaying conversion if converting to a class type.

gcc/testsuite/ChangeLog:

PR c++/16333
PR c++/41426
PR c++/59878
PR c++/66895
* g++.dg/conversion/pr16333.C: New test.
* g++.dg/conversion/pr41426.C: New test.
* g++.dg/conversion/pr59878.C: New test.
* g++.dg/conversion/pr66895.C: New test.
---
 gcc/cp/typeck.c   | 16 +++--
 gcc/testsuite/g++.dg/conversion/pr16333.C | 10 
 gcc/testsuite/g++.dg/conversion/pr41426.C | 40 +++
 gcc/testsuite/g++.dg/conversion/pr59878.C | 25 +++
 gcc/testsuite/g++.dg/conversion/pr66895.C | 16 +
 5 files changed, 100 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr16333.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr41426.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr59878.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr66895.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 39c1af2..a06ecf0 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -8479,13 +8479,15 @@ convert_for_initialization (tree exp, tree type, tree 
rhs, int flags,
   || (TREE_CODE (rhs) == TREE_LIST && TREE_VALUE (rhs) == error_mark_node))
 return error_mark_node;

-  if ((TREE_CODE (TREE_TYPE (rhs)) == ARRAY_TYPE
-   && TREE_CODE (type) != ARRAY_TYPE
-   && (TREE_CODE (type) != REFERENCE_TYPE
-  || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
-  || (TREE_CODE (TREE_TYPE (rhs)) == FUNCTION_TYPE
- && !TYPE_REFFN_P (type))
-  || TREE_CODE (TREE_TYPE (rhs)) == METHOD_TYPE)
+  if (MAYBE_CLASS_TYPE_P (non_reference (type)))
+;
+  else if ((TREE_CODE (TREE_TYPE (rhs)) == ARRAY_TYPE
+   && TREE_CODE (type) != ARRAY_TYPE
+   && (TREE_CODE (type) != REFERENCE_TYPE
+   || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
+  || (TREE_CODE (TREE_TYPE (rhs)) == FUNCTION_TYPE
+  && !TYPE_REFFN_P (type))
+  || TREE_CODE (TREE_TYPE (rhs)) == METHOD_TYPE)
 rhs = decay_conversion (rhs, complain);

   rhstype = TREE_TYPE (rhs);
diff --git a/gcc/testsuite/g++.dg/conversion/pr16333.C 
b/gcc/testsuite/g++.dg/conversion/pr16333.C
new file mode 100644
index 000..810c12a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr16333.C
@@ -0,0 +1,10 @@
+// PR c++/16333
+
+struct X {
+   X (const int (&)[3]);
+};
+
+int a[3];
+X foo1 () { return a; }
+const X &foo2 () { return a; } // { dg-warning "returning reference to 
temporary" }
+X &foo3 () { return a; } // { dg-error "invalid initialization" }
diff --git a/gcc/testsuite/g++.dg/conversion/pr41426.C 
b/gcc/testsuite/g++.dg/conversion/pr41426.C
new file mode 100644
index 000..78ec5fb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr41426.C
@@ -0,0 +1,40 @@
+// PR c++/41426
+
+template 
+struct A
+{
+   template 
+   A(_T (&V)[_N]);
+   A();
+};
+
+A g1()
+{
+   float f[] = {1.1f, 2.3f};
+   return f;
+}
+
+const A &g3()
+{
+   float f[] = {1.1f, 2.3f};
+   return f; // { dg-warning "returning reference to temporary" }
+}
+
+A &g4()
+{
+   float f[] = {1.1f, 2.3f};
+   return f; // { dg-error "invalid initialization" }
+}
+
+struct B
+{
+   B (int (&v)[10]);
+   B();
+};
+
+B g2()
+{
+   int c[10];
+   return c;
+}
+
diff --git a/gcc/testsuite/g++.dg/conversion/pr59878.C b/gcc/testsuite/g++.dg/conversion/pr59878.C
new file mode 100644
index 000..ed567fe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr59878.C
@@ -0,0 +1,25 @@
+// PR c++/59878
+
+struct Test {
+ template 
+ Test(const char (&array)[N]) {}
+};
+
+Test test() {
+ return "test1";
+}
+
+void test2(Test arg = "test12") {}
+
+template 
+void test3(T arg = "test123") {}
+
+template 
+void test4(const T &arg = "test123") {}
+
+int main() {
+ test();
+ test2();
+ test3();
+ test4();
+}
diff --git a/gcc/testsuite/g++.dg/conversion/pr66895.C b/gcc/testsuite/g++.dg/conversion/pr66895.C
new file mode 100644
index 000..14203bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr66895.C
@@ -0,0 +1,16 @@
+// PR c++/66895
+// { dg-do compile { target c++11 } }
+
+#include 
+#include 
+
+struct S {
+template S(char const (&)[N]);
+};
+struct T1 { S s; };
+void f1(std::initializer_list);
+void g1() { f1({{""}}); }
+
+struct T2 { const S& s; };
+void f2(std::initializer_list);
+void g2() 

[PATCH] Fix INTEGER_CST handling for > 64 bits wide bitfields (PR tree-optimization/68835)

2015-12-16 Thread Jakub Jelinek
Hi!

As can be seen on the testcases below, on > 64 bit precision bitfields
we either ICE or miscompile.

get_int_cst_ext_nunits already has code that for unsigned precision
in multiples of HOST_BITS_PER_WIDE_INT it forces TREE_INT_CST_EXT_NUNITS
to be bigger than TREE_INT_CST_NUNITS, the former holds the actual
value (as negative) and is followed by 0 or more -1 values and a final 0
value.  But for some reason this isn't done for > HOST_BITS_PER_WIDE_INT
precisions that aren't multiples of HOST_BITS_PER_WIDE_INT, while we want to
say even in those cases that the value is actually not negative, but very
large.

The following patch attempts to do that by handling those precisions
the same way: the TREE_INT_CST_NUNITS elements again hold the negative value,
followed by 0 or more -1 values and finally one element which is -1 zero
extended to precision % HOST_BITS_PER_WIDE_INT bits (so for the former special
case of precision % HOST_BITS_PER_WIDE_INT == 0 it is still 0, as before).
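
As a rough illustration of the layout described above, here is a standalone sketch. It is not GCC code: `kHwiBits`, `zext_sketch` and `extend_unsigned` are hypothetical names, a HOST_WIDE_INT is assumed to be 64 bits, and the masking only mirrors what zext_hwi is understood to do for precisions below the HWI width.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for this sketch: a HOST_WIDE_INT (HWI) is 64 bits wide.
constexpr unsigned kHwiBits = 64;

// Keep only the low PREC bits of SRC (PREC < 64); PREC == 0 yields 0.
inline std::uint64_t zext_sketch(std::int64_t src, unsigned prec) {
  assert(prec < kHwiBits);
  return static_cast<std::uint64_t>(src) & ((std::uint64_t{1} << prec) - 1);
}

// COMPACT holds the sign-extended elements of an unsigned constant whose
// top bit is set.  Returns the extended element list: the compact
// elements, then -1 padding, then a final element in which only the
// leftover (precision % 64) bits are set.
inline std::vector<std::uint64_t>
extend_unsigned(const std::vector<std::int64_t>& compact, unsigned precision) {
  const unsigned ext_len = precision / kHwiBits + 1;
  std::vector<std::uint64_t> out(compact.begin(), compact.end());
  if (out.size() < ext_len) {
    out.resize(ext_len - 1, ~std::uint64_t{0});
    out.push_back(zext_sketch(-1, precision % kHwiBits));
  }
  return out;
}
```

Under these assumptions, a 65-bit constant whose compact form is the single element -2 extends to two elements, the second being 1, while a 128-bit all-ones constant still ends in a 0 element as it did before the patch.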

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, the tree-pretty-print.c printing of such INTEGER_CSTs still looks
wrong, we use for the unsigned wi::neg_p values
print_hex (const wide_int_ref &, char *) which prints digits rounded up
to BLOCKS_NEEDED (wi.get_precision ()).  I think it would be better
to print in that case just the non-padded number of digits (and for digits
not divisible by 4 start with 0x1, 0x3 or 0x7), but not sure if additional
parameter should be added for this to print_hex, or just tree-pretty-print
should call sprintf directly in that case.  Preferences?

2015-12-16  Jakub Jelinek  

PR tree-optimization/68835
* tree.c (get_int_cst_ext_nunits): Return
cst.get_precision () / HOST_BITS_PER_WIDE_INT + 1
for all unsigned wi::neg_p (cst) constants.
(build_new_int_cst): If cst.get_precision is not a multiple
of HOST_BITS_PER_WIDE_INT, zero extend -1 to the precision
% HOST_BITS_PER_WIDE_INT.

* gcc.dg/pr68835-1.c: New test.
* gcc.dg/pr68835-2.c: New test.

--- gcc/tree.c.jj   2015-12-16 09:02:11.0 +0100
+++ gcc/tree.c  2015-12-16 17:50:25.0 +0100
@@ -1245,11 +1245,9 @@ static unsigned int
 get_int_cst_ext_nunits (tree type, const wide_int &cst)
 {
   gcc_checking_assert (cst.get_precision () == TYPE_PRECISION (type));
-  /* We need an extra zero HWI if CST is an unsigned integer with its
- upper bit set, and if CST occupies a whole number of HWIs.  */
-  if (TYPE_UNSIGNED (type)
-  && wi::neg_p (cst)
-  && (cst.get_precision () % HOST_BITS_PER_WIDE_INT) == 0)
+  /* We need extra HWIs if CST is an unsigned integer with its
+ upper bit set.  */
+  if (TYPE_UNSIGNED (type) && wi::neg_p (cst))
 return cst.get_precision () / HOST_BITS_PER_WIDE_INT + 1;
   return cst.get_len ();
 }
@@ -1266,7 +1264,8 @@ build_new_int_cst (tree type, const wide
   if (len < ext_len)
 {
   --ext_len;
-  TREE_INT_CST_ELT (nt, ext_len) = 0;
+  TREE_INT_CST_ELT (nt, ext_len)
+   = zext_hwi (-1, cst.get_precision () % HOST_BITS_PER_WIDE_INT);
   for (unsigned int i = len; i < ext_len; ++i)
TREE_INT_CST_ELT (nt, i) = -1;
 }
--- gcc/testsuite/gcc.dg/pr68835-1.c.jj 2015-12-16 18:14:08.960943653 +0100
+++ gcc/testsuite/gcc.dg/pr68835-1.c 2015-12-16 18:15:56.803447877 +0100
@@ -0,0 +1,12 @@
+/* PR tree-optimization/68835 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+unsigned __int128
+foo (unsigned long a, unsigned long b)
+{
+  unsigned __int128 x = (unsigned __int128) a * b;
+  struct { unsigned __int128 a : 96; } w;
+  w.a = x;
+  return w.a;
+}
--- gcc/testsuite/gcc.dg/pr68835-2.c.jj 2015-12-16 18:41:32.162177493 +0100
+++ gcc/testsuite/gcc.dg/pr68835-2.c 2015-12-16 18:43:07.829853729 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/68835 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+__attribute__((noinline, noclone)) unsigned __int128
+foo (void)
+{
+  unsigned __int128 x = (unsigned __int128) 0xULL;
+  struct { unsigned __int128 a : 65; } w;
+  w.a = x;
+  w.a += x;
+  return w.a;
+}
+
+int
+main ()
+{
+  unsigned __int128 x = foo ();
+  if ((unsigned long long) x != 0xfffeULL
+  || (unsigned long long) (x >> 64) != 1)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-16 Thread Markus Trippelsdorf
On 2015.12.14 at 19:34 -0500, Jason Merrill wrote:
> OK.

This patch caused https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68936 

-- 
Markus


Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-16 Thread Jason Merrill

On 12/15/2015 04:16 PM, Patrick Palka wrote:

+  if (MAYBE_CLASS_TYPE_P (type))
+;


What does this patch do with conversion to const reference to class?  I 
think we want to check MAYBE_CLASS_TYPE_P (non_reference (type)) here.


Jason



Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-16 Thread Jason Merrill

Looks good.

Jason


Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-16 Thread Patrick Palka

On Wed, 16 Dec 2015, Michael Matz wrote:


Hi,

On Mon, 14 Dec 2015, Patrick Palka wrote:


This should use cp_tree_operand_length.

Hmm, I don't immediately see how I can use this function here.  It
expects a tree but I don't have an appropriate tree to give to it, only a
tree_code.


True.  So let's introduce cp_tree_code_length next to cp_tree_operand_length.

Jason




Like this?  Incremental diff followed by patch v4:


Not my turf, but if I may make a suggestion: please use the new function
in the old one instead of duplicating the implementation.


The implementation is not exactly a duplicate since the old function
returns TREE_OPERAND_LENGTH in the default case and the new function
returns TREE_CODE_LENGTH in the default case.

It's still doable though. Defining one in terms of the other would look
like this.  Is this patch OK to commit after testing?

-- 8< --

Subject: [PATCH] Avoid code duplication in cp_tree_[operand|code]_length

gcc/cp/ChangeLog:

* tree.c (cp_tree_operand_length): Define in terms of
cp_tree_code_length.
---
 gcc/cp/tree.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 0c0987d..ae176d0 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -4427,23 +4427,10 @@ cp_tree_operand_length (const_tree t)
 {
   enum tree_code code = TREE_CODE (t);

-  switch (code)
-{
-case PREINCREMENT_EXPR:
-case PREDECREMENT_EXPR:
-case POSTINCREMENT_EXPR:
-case POSTDECREMENT_EXPR:
-  return 1;
+  if (TREE_CODE_CLASS (code) == tcc_vl_exp)
+return VL_EXP_OPERAND_LENGTH (t);

-case ARRAY_REF:
-  return 2;
-
-case EXPR_PACK_EXPANSION:
-  return 1;
-
-default:
-  return TREE_OPERAND_LENGTH (t);
-}
+  return cp_tree_code_length (code);
 }

 /* Like cp_tree_operand_length, but takes a tree_code CODE.  */
--
2.7.0.rc0.50.g1470d8f.dirty






Ciao,
Michael.



Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-12-16 Thread Evandro Menezes

On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:

On 20 October 2015 at 00:40, Evandro Menezes  wrote:

In the existing targets, it seems that it's always faster to zero up a DF
register with "movi %d0, #0" instead of "fmov %d0, xzr".

This patch modifies the respective pattern.


Hi Evandro,

This patch changes the generic, microarchitecture-independent instruction
selection. The ARM ARM (C3.5.3) makes a specific recommendation about
the choice of instruction in this situation and the current
implementation in GCC follows that recommendation.  Wilco has also
picked up on this issue; he has the same patch internal to ARM along
with an ongoing discussion with ARM architecture folk regarding this
recommendation.  I'm reluctant to take this patch right now on the
basis that it runs contrary to ARM ARM recommendation pending the
conclusion of Wilco's discussion with ARM architecture folk.



Marcus,

Have you had a chance to discuss this internally further?

Thank you,

--
Evandro Menezes



Re: [PATCHES, PING*5] Enhance standard DWARF for Ada

2015-12-16 Thread Jason Merrill

On 12/16/2015 03:53 AM, Pierre-Marie de Rodat wrote:

+  /* Called from finalize_size_functions for functions whose body is needed to
+ generate complete debug info.  For instance, functions used to compute the
+ size of variable-length structures.  */
+  void (* function_body) (tree decl);


Calling this "function_body" seems overly generic; let's call it 
size_function and talk specifically about encoding the function body in 
the debug info.



   debug_nothing_rtx_insn,   /* var_location */
+  debug_nothing_tree,   /* var_location */


And this comment shouldn't be the same as the previous line.


+/* Helper for loc_descr_without_nops: free the location description operation
+   P.  */
+bool
+free_loc_descr (const dw_loc_descr_ref &loc, void *data ATTRIBUTE_UNUSED)


Blank line between comment and function.

OK with those changes.

Jason



ISL version check patch

2015-12-16 Thread Nathan Sidwell
This patch https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01273.html breaks 
builds using static libisl & libgmp.  (a whole slew of undefined __gmpz_FOO 
symbols).


Fixed with the attached patch to add -lgmp etc to the isl link test. ok?

nathan
2015-12-16  Nathan Sidwell  

	* config/isl.m4 (ISL_CHECK_VERSION): Add gmp libs.
	* configure: Regenerate.

Index: config/isl.m4
===
--- config/isl.m4	(revision 231721)
+++ config/isl.m4	(working copy)
@@ -103,8 +103,8 @@ AC_DEFUN([ISL_CHECK_VERSION],
 _isl_saved_LIBS=$LIBS
 
 CFLAGS="${_isl_saved_CFLAGS} ${islinc} ${gmpinc}"
-LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
-LIBS="${_isl_saved_LIBS} -lisl"
+LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs} ${gmplibs}"
+LIBS="${_isl_saved_LIBS} -lisl -lgmp"
 
 AC_MSG_CHECKING([for isl 0.15 (or deprecated 0.14)])
 AC_TRY_LINK([#include ],
Index: configure
===
--- configure	(revision 231721)
+++ configure	(working copy)
@@ -6017,8 +6017,8 @@ $as_echo "$as_me: WARNING: using in-tree
 _isl_saved_LIBS=$LIBS
 
 CFLAGS="${_isl_saved_CFLAGS} ${islinc} ${gmpinc}"
-LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
-LIBS="${_isl_saved_LIBS} -lisl"
+LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs} ${gmplibs}"
+LIBS="${_isl_saved_LIBS} -lisl -lgmp"
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for isl 0.15 (or deprecated 0.14)" >&5
 $as_echo_n "checking for isl 0.15 (or deprecated 0.14)... " >&6; }


Re: [PATCH] C and C++ FE: better source ranges for binary ops

2015-12-16 Thread Jeff Law

On 12/16/2015 11:04 AM, David Malcolm wrote:

Currently trunk emits range information for most bad binary operations
in the C++ frontend; but not in the C frontend.

The helper function binary_op_error shared by C and C++ takes a
location_t.  In the C++ frontend, a location_t containing the range
has already been built, so we get the underline when it later validates
the expression: e.g. this example from:
https://gcc.gnu.org/wiki/ClangDiagnosticsComparison

t.cc:9:19: error: no match for ‘operator+’ (operand types are ‘int’ and ‘foo’)
return P->bar() + *P;
   ~^~~~

In the C frontend, the "full" location_t is only built
after type-checking, so if type-checking fails, we get a caret/range
covering just the operator so e.g. for:

return (some_function ()
 + some_other_function ());

we get just:

gcc.dg/bad-binary-ops.c:29:4: error: invalid operands to binary + (have 'struct s' and 'struct t')
 + some_other_function ());
 ^

The following patch updates binary_op_error to accept a rich_location *.
For the C++ frontend we populate it with just the location_t as before,
but for the C frontend we can add locations for the operands, giving
this underlining for the example above:

bad-binary-ops.c:29:4: error: invalid operands to binary + (have 'struct s' and 'struct t')
return (some_function ()

 + some_other_function ());
 ^ ~~

Additionally, in the C++ frontend, cp_build_binary_op has an "error"
call, which implicitly uses input_location, giving this reduced
information for another test case from
https://gcc.gnu.org/wiki/ClangDiagnosticsComparison:

bad-binary-ops.C:10:14: error: invalid operands of types '__m128 {aka float}' and 'const int*' to binary 'operator/'
myvec[1] / ptr;
   ^~~

The patch updates it to use error_at with the location_t provided,
fixing the above to become:

bad-binary-ops.C:10:12: error: invalid operands of types '__m128 {aka float}' and 'const int*' to binary 'operator/'
myvec[1] / ptr;
~^

Finally, since this patch adds a usage of
gcc_rich_location::maybe_add_expr to cc1, we can drop the hacked-in
copy of gcc_rich_location::add_expr from
gcc.dg/plugin/diagnostic_plugin_show_trees.c.

Successfully bootstrapped&regtested on x86_64-pc-linux-gnu;
adds 15 new PASS results in g++.sum and 7 new PASS results in gcc.sum

OK for trunk in stage 3?

gcc/c-family/ChangeLog:
* c-common.c (binary_op_error): Convert first param from
location_t to rich_location * and use it when emitting an error.
* c-common.h (binary_op_error): Convert first param from
location_t to rich_location *.

gcc/c/ChangeLog:
* c-typeck.c: Include "gcc-rich-location.h".
(build_binary_op): In the two places that call binary_op_error,
create a gcc_rich_location and populate it with the location of
the binary op and its two operands.

gcc/cp/ChangeLog:
* typeck.c (cp_build_binary_op): Update for change in signature
of build_binary_op.  Use error_at to replace an implicit use
of input_location with param "location" in "invalid operands"
error.
(cp_build_binary_op): Replace an error with an error_at, using
"location", rather than implicitly using input_location.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/bad-binary-ops.C: New test case.
* gcc.dg/bad-binary-ops.c: New test case.
* gcc.dg/plugin/diagnostic_plugin_show_trees.c (get_range_for_expr):
Remove material copied from gcc-rich-location.c
(gcc_rich_location::add_expr): Likewise.

So I'm of a mixed mind here.

We're well into stage3 and I can't see a reasonable way to call this a 
bugfix, but I see the value here at the end user level and it looks to 
be quite safe.


I'll tentatively approve -- give other maintainers 48hrs to object on 
the grounds this isn't a bugfix before committing.


Jeff


Re: [PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.

2015-12-16 Thread Jonathan Wakely

Sorry for the delay finishing this review, some of the code kept
melting my brain ;-)


On 14/11/15 20:45 +0100, Torvald Riegel wrote:

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 1b3184a..d902b03 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1876,6 +1876,12 @@ GLIBCXX_3.4.22 {
_ZNSt6thread6_StateD[012]Ev;

_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;

+# Support for the Transactional Memory TS (N4514)
+_ZGTtNSt11logic_errorC1EPKc;
+
_ZGTtNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;
+_ZGTtNKSt11logic_error4whatEv;
+_ZGTtNSt11logic_errorD1Ev;
+
} GLIBCXX_3.4.21;


This is OK but ...


# Symbols in the support library (libsupc++) have their own tag.
@@ -2107,6 +2113,12 @@ CXXABI_1.3.9 {
# operator delete[](void*, std::size_t)
_ZdaPv[jmy];

+# Support for the Transactional Memory TS (N4514)
+_ZGTtNKSt9exceptionD1Ev;
+_ZGTtNKSt9exception4whatEv;
+_ZGTtNKSt13bad_exceptionD1Ev;
+_ZGTtNKSt13bad_exception4whatEv;
+
} CXXABI_1.3.8;


That symbol version was already used in the gcc-5 release and so is
frozen; you'll need CXXABI_1.3.10 for these new symbols (similar to
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00258.html so if
Catherine's already added that version you can just add them there).



diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 723feb1..0e66bb0 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -481,6 +481,17 @@ namespace std
# define _GLIBCXX_BEGIN_EXTERN_C extern "C" {
# define _GLIBCXX_END_EXTERN_C }

+// Conditionally enable annotations for the Transactional Memory TS on C++11.
+#if __cplusplus >= 201103L && \
+  _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_DUAL_ABI && \


It's possible we can make this work for the old ABI too, but this is
OK for now. The old ABI always uses COW strings, but that's what the
code you've written deals with anyway.


+  defined(__cpp_transactional_memory) && __cpp_transactional_memory >= 201505L


The defined(__cpp_transactional_memory) check is redundant, isn't it?

Users aren't allowed to define it, so it will either be defined to an
integer value or undefined and evaluate to zero.
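
The point about undefined identifiers can be checked with a small standalone sketch. The macro names below are made up for illustration and stand in for a feature-test macro; note that the plain form may draw a warning when building with -Wundef, which is one reason some code keeps the defined() check anyway.

```cpp
// SKETCH_TM_UNDEFINED is a made-up macro name that is never defined,
// so it evaluates to 0 inside #if.
#if SKETCH_TM_UNDEFINED >= 201505L
constexpr bool undefined_plain = true;
#else
constexpr bool undefined_plain = false;
#endif

#if defined(SKETCH_TM_UNDEFINED) && SKETCH_TM_UNDEFINED >= 201505L
constexpr bool undefined_guarded = true;
#else
constexpr bool undefined_guarded = false;
#endif

// SKETCH_TM_DEFINED stands in for a feature-test macro with a value.
#define SKETCH_TM_DEFINED 201505L

#if SKETCH_TM_DEFINED >= 201505L
constexpr bool defined_plain = true;
#else
constexpr bool defined_plain = false;
#endif

#if defined(SKETCH_TM_DEFINED) && SKETCH_TM_DEFINED >= 201505L
constexpr bool defined_guarded = true;
#else
constexpr bool defined_guarded = false;
#endif

// Both spellings select the same branch in either situation.
static_assert(undefined_plain == undefined_guarded, "undefined case agrees");
static_assert(defined_plain == defined_guarded, "defined case agrees");
```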


+#define _GLIBCXX_TXN_SAFE transaction_safe
+#define _GLIBCXX_TXN_SAFE_DYN transaction_safe_dynamic
+#else
+#define _GLIBCXX_TXN_SAFE
+#define _GLIBCXX_TXN_SAFE_DYN
+#endif
+
#else // !__cplusplus
# define _GLIBCXX_BEGIN_EXTERN_C
# define _GLIBCXX_END_EXTERN_C




@@ -44,7 +46,36 @@ std::exception::what() const _GLIBCXX_USE_NOEXCEPT
}

const char* 
-std::bad_exception::what() const _GLIBCXX_USE_NOEXCEPT

+std::bad_exception::what() const _GLIBCXX_TXN_SAFE_DYN _GLIBCXX_USE_NOEXCEPT
{
  return "std::bad_exception";
}
+
+// Transactional clones for the destructors and what().
+// what() is effectively transaction_pure, but we do not want to annotate it
+// as such; thus, we call exactly the respective nontransactional function.
+extern "C" {
+
+void
+_ZGTtNKSt9exceptionD1Ev(const std::exception*)
+{ }
+
+const char*
+_ZGTtNKSt9exception4whatEv(const std::exception* that)
+{
+  return that->std::exception::what();
+}


This makes a non-virtual call, is that correct?

If users derive from std::exception and override what() they will
expect a call to what() to dispatch to their override in the derived
class, but IIUC in a transactional block they would call this
function, which would call the base what(), not their override.
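
To make the concern concrete, here is a minimal standalone sketch of the language rule involved. It says nothing about how the transactional clones are actually reached at run time; `my_error`, `what_nonvirtual`, `what_virtual` and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <exception>

// Hypothetical user exception that overrides what().
struct my_error : std::exception {
  const char* what() const noexcept override { return "my_error"; }
};

// Qualified call: always runs std::exception::what, even when E points
// at a my_error object.
inline const char* what_nonvirtual(const std::exception* e) {
  return e->std::exception::what();
}

// Unqualified call: dispatches to the most derived override.
inline const char* what_virtual(const std::exception* e) {
  return e->what();
}

inline void demo() {
  my_error err;
  assert(std::strcmp(what_virtual(&err), "my_error") == 0);
  assert(std::strcmp(what_nonvirtual(&err), "my_error") != 0);
}
```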


+_ZGTtNKSt13bad_exception4whatEv(
+const std::bad_exception* that)
+{
+  return that->std::bad_exception::what();
+}


Ditto.


@@ -151,3 +164,220 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

_GLIBCXX_END_NAMESPACE_VERSION
} // namespace
+
+// Support for the Transactional Memory TS (N4514).
+//
+// logic_error and runtime_error both carry a message in the form of a COW
+// string.  This COW string is never made visible to users of the exception
+// because what() returns a C string.  The COW string can be constructed as
+// either a copy of a COW string of another logic_error/runtime_error, or
+// using a C string or SSO string; thus, the COW string's _Rep is only
+// accessed by logic_error operations.  We control all txnal clones of those
+// operations and thus can ensure that _Rep is never accessed transactionally.
+// Furthermore, _Rep will always have been allocated or deallocated via
+// global new or delete, so nontransactional writes we do to _Rep cannot


Hmm, will it always be global new/delete? It uses std::allocator,
which by default uses new/delete but libstdc++ can be configured to
use a different std::allocator implementation. If they always use new
at some point maybe we're OK, but I'd have to check the alternative
allocators. Maybe we just say only new_allocator is supported for TM.

I assume we want to avoid making a txnal std::allocator.


+

Re: update_vtable_references segfault

2015-12-16 Thread Jan Hubicka
> On 12/12/15 09:44, Nathan Sidwell wrote:
> >On 12/11/15 13:15, Jan Hubicka wrote:
> >>>Jan,
> >
> >>>b) augment can_replace_by_local_alias_in_vtable to check whether
> >>>aliases can be created?
> >>
> >>I think this is best: can_replace_by_local_alias_in_vtable exists to
> >>prevent the body walk in cases we are not going to create the alias.
> >>This is because in LTO we may need to stream in the constructor from
> >>the object file that is not completely free and thus it is better to
> >>not touch it unless necessary.
> >
> >I went with augmenting can_replace_by_local_alias, which
> >can_replace_by_local_alias_in_vtable calls.  I also noticed that both
> >should be static, which  I suspect will encourage the inliner to go
> >inline them and then determine a bunch of code is unreachable.
> >
> >tested on x86-linux and ptx-none.
> >
> >ok?
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01324.html
OK, thanks!
Honza
> 
> ping?
> 
> nathan


[PATCH] Remove use of 'struct map' from plugin (nvptx)

2015-12-16 Thread James Norris

Hi,

The attached patch removes the use of the map structure
(struct map) from the NVPTX plugin.

Regtested on x86_64-pc-linux-gnu

Ok for trunk?

Thanks!
Jim

ChangeLog
=

2015-12-XX  James Norris  

libgomp/
* plugin/plugin-nvptx.c (struct map): Removed.
(map_init, map_pop): Remove use of struct map. (map_push):
Likewise and change argument list.
* testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New test.
Index: plugin/plugin-nvptx.c
===
diff --git a/trunk/libgomp/plugin/plugin-nvptx.c b/trunk/libgomp/plugin/plugin-nvptx.c
--- a/trunk/libgomp/plugin/plugin-nvptx.c	(revision 231649)
+++ b/trunk/libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -91,13 +91,6 @@
   struct ptx_device *ptx_dev;
 };
 
-struct map
-{
-  int async;
-  size_t  size;
-  charmappings[0];
-};
-
 static void
 map_init (struct ptx_stream *s)
 {
@@ -140,17 +133,13 @@
 static void
 map_pop (struct ptx_stream *s)
 {
-  struct map *m;
-
   assert (s != NULL);
   assert (s->h_next);
   assert (s->h_prev);
   assert (s->h_tail);
 
-  m = s->h_tail;
+  s->h_tail = s->h_next;
 
-  s->h_tail += m->size;
-
   if (s->h_tail >= s->h_end)
 s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
 
@@ -167,16 +156,14 @@
 }
 
 static void
-map_push (struct ptx_stream *s, int async, size_t size, void **h, void **d)
+map_push (struct ptx_stream *s, size_t size, void **h, void **d)
 {
   int left;
   int offset;
-  struct map *m;
 
   assert (s != NULL);
 
   left = s->h_end - s->h_next;
-  size += sizeof (struct map);
 
   assert (s->h_prev);
   assert (s->h_next);
@@ -183,22 +170,14 @@
 
   if (size >= left)
 {
-  m = s->h_prev;
-  m->size += left;
-  s->h_next = s->h_begin;
-
-  if (s->h_next + size > s->h_end)
-	GOMP_PLUGIN_fatal ("unable to push map");
+  assert (s->h_next == s->h_prev);
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
 }
 
   assert (s->h_next);
 
-  m = s->h_next;
-  m->async = async;
-  m->size = size;
+  offset = s->h_next - s->h;
 
-  offset = (void *)&m->mappings[0] - s->h;
-
   *d = (void *)(s->d + offset);
   *h = (void *)(s->h + offset);
 
@@ -904,7 +883,7 @@
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
  the host and the device. HP is a host pointer to the new chunk, and DP is
  the corresponding device pointer.  */
-  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+  map_push (dev_str, mapnum * sizeof (void *), &hp, &dp);
 
   GOMP_PLUGIN_debug (0, "  %s: prepare mappings\n", __FUNCTION__);
 
Index: testsuite/libgomp.oacc-c-c++-common/mapping-1.c
===
diff --git a/trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c b/trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c
new file mode 10644
--- /dev/null	(revision 0)
+++ b/trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c	(working copy)
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+#include 
+#include 
+#include 
+
+/* Exercise the kernel launch argument mapping.  */
+
+int
+main (int argc, char **argv)
+{
+  int a[256], b[256], c[256], d[256], e[256], f[256];
+  int i;
+  int n;
+
+  /* 48 is the size of the mappings for the first parallel construct.  */
+  n = sysconf (_SC_PAGESIZE) / 48 - 1;
+
+  i = 0;
+
+  for (i = 0; i < n; i++)
+{
+  #pragma acc parallel copy (a, b, c, d)
+	{
+	  int j;
+
+	  for (j = 0; j < 256; j++)
+	{
+	  a[j] = j;
+	  b[j] = j;
+	  c[j] = j;
+	  d[j] = j;
+	}
+	}
+}
+
+#pragma acc parallel copy (a, b, c, d, e, f)
+  {
+int j;
+
+for (j = 0; j < 256; j++)
+  {
+	a[j] = j;
+	b[j] = j;
+	c[j] = j;
+	d[j] = j;
+	e[j] = j;
+	f[j] = j;
+  }
+  }
+
+  for (i = 0; i < 256; i++)
+   {
+ if (a[i] != i) abort();
+ if (b[i] != i) abort();
+ if (c[i] != i) abort();
+ if (d[i] != i) abort();
+ if (e[i] != i) abort();
+ if (f[i] != i) abort();
+   }
+
+  exit (0);
+}


Re: [RFA] [PATCH] Fix invalid redundant extension elimination for rl78 port

2015-12-16 Thread Jeff Law

On 12/01/2015 12:32 PM, Richard Sandiford wrote:

Jeff Law  writes:

@@ -1080,6 +1070,18 @@ add_removable_extension (const_rtx expr, rtx_insn *insn,
  }
  }

+  /* Fourth, if the extended version occupies more registers than the
+original and the source of the extension is the same hard register
+as the destination of the extension, then we can not eliminate
+the extension without deep analysis, so just punt.
+
+We allow this when the registers are different because the
+code in combine_reaching_defs will handle that case correctly.  */
+  if ((HARD_REGNO_NREGS (REGNO (dest), mode)
+  != HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
+ && REGNO (dest) == REGNO (reg))
+   return;
+
/* Then add the candidate to the list and insert the reaching 
definitions
   into the definition map.  */
ext_cand e = {expr, code, mode, insn};


I might be wrong, but the check looks specific to little-endian.  Would
it make sense to use reg_overlap_mentioned_p instead of the REGNO check?

Fixed thusly.  Installed on the trunk after the usual testing on
x86_64-linux-gnu and verifying that rl78-elf doesn't botch pr42833.c.
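
One way to picture the difference between the two checks is the toy model below. It is not GCC's rtx representation or the real reg_overlap_mentioned_p; `HardRegRange`, `same_first_reg` and `ranges_overlap` are hypothetical names, and a value is modelled simply as a first hard register number plus a count of consecutive registers.

```cpp
// Toy model of a value living in hard registers: the first register
// number and how many consecutive registers it occupies.
struct HardRegRange {
  unsigned regno;
  unsigned nregs;
};

// The original test: only compares the first register numbers.
constexpr bool same_first_reg(HardRegRange a, HardRegRange b) {
  return a.regno == b.regno;
}

// An overlap test: true if the two ranges share any hard register.
constexpr bool ranges_overlap(HardRegRange a, HardRegRange b) {
  return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}

// A two-register destination starting at r0 and a one-register source
// in r1: the first-register comparison misses the clash.
static_assert(!same_first_reg({0, 2}, {1, 1}), "first registers differ");
static_assert(ranges_overlap({0, 2}, {1, 1}), "but r1 is shared");

// When the source is the first register of the destination, both agree.
static_assert(same_first_reg({0, 2}, {0, 1}), "same first register");
static_assert(ranges_overlap({0, 2}, {0, 1}), "and the ranges overlap");
```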


Jeff
commit 362f406136207b89bddab99f5ff904b0798b7115
Author: law 
Date:   Wed Dec 16 20:34:31 2015 +

* ree.c (add_removable_extension): Use reg_overlap_mentioned_p
rather than testing hard register #s.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@231719 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1d2a994..a8475b7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-16  Jeff Law  
+
+   * ree.c (add_removable_extension): Use reg_overlap_mentioned_p
+   rather than testing hard register #s.
+
 2015-12-16  Nathan Sidwell  
 
* config/nvptx/nvptx.h (OUTGOING_STATIC_CHAIN_REGNUM): Remove.
diff --git a/gcc/ree.c b/gcc/ree.c
index 6cfc477..d12e24d 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -1085,7 +1085,7 @@ add_removable_extension (const_rtx expr, rtx_insn *insn,
 code in combine_reaching_defs will handle that case correctly.  */
   if ((HARD_REGNO_NREGS (REGNO (dest), mode)
   != HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
- && REGNO (dest) == REGNO (reg))
+ && reg_overlap_mentioned_p (dest, reg))
return;
 
   /* Then add the candidate to the list and insert the reaching definitions


Re: update_vtable_references segfault

2015-12-16 Thread Nathan Sidwell

On 12/12/15 09:44, Nathan Sidwell wrote:

On 12/11/15 13:15, Jan Hubicka wrote:

Jan,



b) augment can_replace_by_local_alias_in_vtable to check whether
aliases can be created?


I think this is best: can_replace_by_local_alias_in_vtable exists to prevent the
body walk in cases we are not going to create the alias.  This is because in LTO
we may need to stream in the constructor from the object file that is not
completely free and thus it is better to not touch it unless necessary.


I went with augmenting can_replace_by_local_alias, which
can_replace_by_local_alias_in_vtable calls.  I also noticed that both should be
static, which  I suspect will encourage the inliner to go inline them and then
determine a bunch of code is unreachable.

tested on x86-linux and ptx-none.

ok?


https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01324.html

ping?

nathan


Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2015-12-16 Thread Evandro Menezes

On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote:

On 15/12/15 23:34, Evandro Menezes wrote:

On 12/14/2015 05:26 AM, James Greenhalgh wrote:

On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:

On 11/20/2015 05:53 AM, James Greenhalgh wrote:

On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:

On 11/05/2015 02:51 PM, Evandro Menezes wrote:

2015-11-05  Evandro Menezes 

gcc/

* config/aarch64/aarch64.c
(aarch64_override_options_internal):
Increase loop peeling limit.

This patch increases the limit for the number of peeled insns.
With this change, I noticed no major regression in either
Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
ones, improved significantly.

I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
benefit from this tuning too.  However, I'd appreciate comments
from other stakeholders.

Ping.

I'd like to leave this for a call from the port maintainers. I can see why
this leads to more opportunities for vectorization, but I'm concerned about
the wider impact on code size. Certainly I wouldn't expect this to be our
default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there are
presumably good reasons why the default for this parameter across GCC has
fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
like Marcus or Richard to make the call as to whether or not we take this
patch.

Please, correct me if I'm wrong, but loop peeling is enabled only
with loop unrolling (and with PGO).  If so, then extra code size is
not a concern, for this heuristic is only active when unrolling
loops, when code size is already of secondary importance.

My understanding was that loop peeling is enabled from -O2 upwards, and
is also used to partially peel unaligned loops for vectorization (allowing
the vector code to be well aligned), or to completely peel inner loops which
may then become amenable to SLP vectorization.

If I'm wrong then I take back these objections. But I was sure this
parameter was used in a number of situations outside of just
-funroll-loops/-funroll-all-loops . Certainly I remember seeing performance
sensitivities to this parameter at -O3 in some internal workloads I was
analysing.

Vectorization, including SLP, is only enabled at -O3, isn't it?  It
seems to me that peeling is only used by optimizations which already
lead to potential increase in code size.

For instance, with "-Ofast -funroll-all-loops", the total text size for
the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB
without it; with just "-O2", it is the same at 23.1MB regardless of this
setting.

So it seems to me that this proposal should be neutral for up to -O2.

Thank you,


My preference would be to not diverge from the global parameter
settings.  I haven't looked in detail at this parameter but it seems to
me there are two possible paths:

1) We could get agreement globally that the parameter should be increased.
2) We could agree that this specific use of the parameter is distinct
from some other uses and deserves a new param in its own right with a
higher value.



Here's what I have observed, not only in AArch64: architectures benefit
differently from certain loop optimizations, especially those dealing
with vectorization.  Be it because some have plenty of registers for more
aggressive loop unrolling, or because some have lower costs to
vectorize.  With this, I'm trying to imply that there may be a case for
adjusting this parameter so that loop optimizations better suit specific
targets.  While it is not the only parameter related to loop
optimizations, it seems to be the one with the desired effects, as
exemplified by PPC, S390 and x86 (AOSP).  Though there is the
possibility that they are actually side-effects, as Richard Biener
perhaps implied in another reply.


Cheers,

--
Evandro Menezes



[PTX] function frame emission

2015-12-16 Thread Nathan Sidwell
This patch removes OUTGOING_STATIC_CHAIN_REGNUM -- there's no need for it to be 
distinct from STATIC_CHAIN_REGNUM.  Also, when we have to emit a frame or 
outgoing args, but it's zero sized, there's no need to actually emit the frame 
or arg array.  We can just initialize the appropriate register to zero -- it's 
ill formed for the area to be dereferenced.
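
As a rough illustration of the zero-sized case, the following standalone
sketch formats the two variants into a buffer. The helper name and its
parameters are hypothetical and it is not the GCC code; it only mirrors the
idea of emitting the backing array when the frame is non-empty and otherwise
just zeroing the register.

```c
#include <stdio.h>

/* Format the PTX that sets up register REG as a frame pointer.
   Hypothetical helper mirroring the idea in the patch; not GCC code.
   Returns what snprintf returns.  */
int
format_frame_init (char *buf, size_t len, const char *reg,
                   unsigned pointer_bits, unsigned align, unsigned size)
{
  if (size)
    /* Non-empty frame: declare the backing array and take its address.  */
    return snprintf (buf, len,
                     "\t.local .align %u .b8 %s_ar[%u];\n"
                     "\t.reg.u%u %s;\n"
                     "\tcvta.local.u%u %s, %s_ar;\n",
                     align, reg, size, pointer_bits, reg,
                     pointer_bits, reg, reg);

  /* Empty frame: no array, just a register holding zero.  */
  return snprintf (buf, len,
                   "\t.reg.u%u %s;\n"
                   "\tmov.u%u %s, 0;\n",
                   pointer_bits, reg, pointer_bits, reg);
}
```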


Some other minor tidying up included.

nathan
2015-12-16  Nathan Sidwell  

	* config/nvptx/nvptx.h (OUTGOING_STATIC_CHAIN_REGNUM): Remove.
	(REGISTER_NAMES): Adjust.
	* config/nvptx/nvptx.c (nvptx_pass_by_reference): Avoid long line.
	(nvptx_static_chain): Delete.
	(write_arg_mode): Don't emit initializer if argno < 0.
	(write_arg_type): Fix whitespace.
	(init_frame): Initialize reg to zero if frame is zero-sized.
	(nvptx_declare_function_name):  Use write_arg_type to emit chain
	decl.
	(nvptx_output_call_insn): Adjust static chain emission.
	(nvptx_goacc_reduction): Make static.
	(TARGET_STATIC_CHAIN): Don't override.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231717)
+++ config/nvptx/nvptx.c	(working copy)
@@ -525,8 +525,9 @@ nvptx_function_value_regno_p (const unsi
reference in memory.  */
 
 static bool
-nvptx_pass_by_reference (cumulative_args_t ARG_UNUSED (cum), machine_mode mode,
-			 const_tree type, bool ARG_UNUSED (named))
+nvptx_pass_by_reference (cumulative_args_t ARG_UNUSED (cum),
+			 machine_mode mode, const_tree type,
+			 bool ARG_UNUSED (named))
 {
   return pass_in_memory (mode, type, false);
 }
@@ -549,18 +550,6 @@ nvptx_promote_function_mode (const_tree
   return promote_arg (mode, for_return || !type || TYPE_ARG_TYPES (funtype));
 }
 
-/* Implement TARGET_STATIC_CHAIN.  */
-
-static rtx
-nvptx_static_chain (const_tree fndecl, bool incoming_p)
-{
-  if (!DECL_STATIC_CHAIN (fndecl))
-return NULL;
-
-  return gen_rtx_REG (Pmode, (incoming_p ? STATIC_CHAIN_REGNUM
-			  : OUTGOING_STATIC_CHAIN_REGNUM));
-}
-
 /* Helper for write_arg.  Emit a single PTX argument of MODE, either
in a prototype, or as copy in a function prologue.  ARGNO is the
index of this argument in the PTX function.  FOR_REG is negative,
@@ -588,12 +577,15 @@ write_arg_mode (std::stringstream &s, in
   else
 	s << "%ar" << argno;
   s << ";\n";
-  s << "\tld.param" << ptx_type << " ";
-  if (for_reg)
-	s << reg_names[for_reg];
-  else
-	s << "%ar" << argno;
-  s << ", [%in_ar" << argno << "];\n";
+  if (argno >= 0)
+	{
+	  s << "\tld.param" << ptx_type << " ";
+	  if (for_reg)
+	s << reg_names[for_reg];
+	  else
+	s << "%ar" << argno;
+	  s << ", [%in_ar" << argno << "];\n";
+	}
 }
   return argno + 1;
 }
@@ -625,7 +617,7 @@ write_arg_type (std::stringstream &s, in
 	{
 	  /* Complex types are sent as two separate args.  */
 	  type = TREE_TYPE (type);
-	  mode  = TYPE_MODE (type);
+	  mode = TYPE_MODE (type);
 	  prototyped = true;
 	}
 
@@ -917,16 +909,20 @@ nvptx_maybe_record_fnsym (rtx sym)
 }
 
 /* Emit a local array to hold some part of a conventional stack frame
-   and initialize REGNO to point to it.  */
+   and initialize REGNO to point to it.  If the size is zero, it'll
+   never be valid to dereference, so we can simply initialize to
+   zero.  */
 
 static void
 init_frame (FILE  *file, int regno, unsigned align, unsigned size)
 {
-  fprintf (file, "\t.reg.u%d %s;\n"
-	   "\t.local.align %d .b8 %s_ar[%u];\n"
-	   "\tcvta.local.u%d %s, %s_ar;\n",
-	   POINTER_SIZE, reg_names[regno],
-	   align, reg_names[regno], size ? size : 1,
+  if (size)
+fprintf (file, "\t.local .align %d .b8 %s_ar[%u];\n",
+	 align, reg_names[regno], size);
+  fprintf (file, "\t.reg.u%d %s;\n",
+	   POINTER_SIZE, reg_names[regno]);
+  fprintf (file, (size ? "\tcvta.local.u%d %s, %s_ar;\n"
+		  :  "\tmov.u%d %s, 0;\n"),
 	   POINTER_SIZE, reg_names[regno], reg_names[regno]);
 }
 
@@ -981,12 +977,14 @@ nvptx_declare_function_name (FILE *file,
 }
 
   if (stdarg_p (fntype))
-argno = write_arg_type (s, ARG_POINTER_REGNUM, argno, ptr_type_node, true);
-
-  if (DECL_STATIC_CHAIN (decl))
-argno = write_arg_type (s, STATIC_CHAIN_REGNUM, argno, ptr_type_node,
+argno = write_arg_type (s, ARG_POINTER_REGNUM, argno, ptr_type_node,
 			true);
 
+  if (DECL_STATIC_CHAIN (decl) || cfun->machine->has_chain)
+write_arg_type (s, STATIC_CHAIN_REGNUM,
+		DECL_STATIC_CHAIN (decl) ? argno : -1, ptr_type_node,
+		true);
+
   fprintf (file, "%s", s.str().c_str());
 
   /* Declare a local var for outgoing varargs.  */
@@ -1000,10 +998,6 @@ nvptx_declare_function_name (FILE *file,
 init_frame (file, FRAME_POINTER_REGNUM,
 		crtl->stack_alignment_needed / BITS_PER_UNIT, sz);
 
-  if (cfun->machine->has_chain)
-fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
-	 reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
-
   /* Declare the pseudos we have as ptx registers.  */
   int maxregs = max_reg_num ();
   for (

Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-16 Thread Michael Matz
Hi,

On Mon, 14 Dec 2015, Patrick Palka wrote:

> >>> >This should use cp_tree_operand_length.
> >> Hmm, I don't immediately see how I can use this function here.  It
> >> expects a tree but I don't have an appropriate tree to give to it, only a
> >> tree_code.
> >
> > True.  So let's introduce cp_tree_code_length next to 
> > cp_tree_operand_length.
> >
> > Jason
> >
> >
> 
> Like this?  Incremental diff followed by patch v4:

Not my turf, but if I may make a suggestion: please use the new function 
in the old one instead of duplicating the implementation.


Ciao,
Michael.


Last testcase for PR middle-end/25140

2015-12-16 Thread Jan Hubicka
Hi,
I checked the ipa-pta and pta implementations and these seem to work just
fine in the presence of aliases because get_constraint_for_ssa_var already
looks into the alias targets.

This patch adds a testcase I constructed.  Since I am done with auditing
*alias*.c for variable aliases I will close the PR (after 10 years, yay)

Honza

PR middle-end/25140
* gcc.c-torture/execute/alias-4.c: New testcase.
Index: testsuite/gcc.c-torture/execute/alias-4.c
===
--- testsuite/gcc.c-torture/execute/alias-4.c   (revision 0)
+++ testsuite/gcc.c-torture/execute/alias-4.c   (revision 0)
@@ -0,0 +1,19 @@
+/* { dg-require-alias "" } */
+int a = 1;
+extern int b __attribute__ ((alias ("a")));
+int c = 1;
+extern int d __attribute__ ((alias ("c")));
+main (int argc)
+{
+  int *p;
+  int *q;
+  if (argc)
+p = &a, q = &b;
+  else
+p = &c, q = &d;
+  *p = 1;
+  *q = 2;
+  if (*p == 1)
+__builtin_abort ();
+  return 0;
+}


Re: ipa-cp heuristics fixes

2015-12-16 Thread Jan Hubicka
Hi,
just to summarize a discussion on IRC. The problem is that we produce debug
statements for eliminated arguments only in ipa-sra and ipa-split, while we
don't do anything for cgraph clones. This is a problem on release branches,
too.

It seems we have all the necessary logic, but the callee modification code from
ipa-split should be moved to tree_function_versioning (which is used by both
ipa-split and the cgraph clone mechanism) and the caller modification copied to
cgraph_edge::redirect_call_stmt_to_callee.

I am trying to do that. It seems a bit difficult as the caller and callee
modifications are tied together and I do not know how chaining of
transformations is going to work.

Honza


Re: [RFA] [PATCH] Fix invalid redundant extension elimination for rl78 port

2015-12-16 Thread Jeff Law

On 12/01/2015 12:32 PM, Richard Sandiford wrote:

Jeff Law  writes:

@@ -1080,6 +1070,18 @@ add_removable_extension (const_rtx expr, rtx_insn *insn,
  }
  }

+  /* Fourth, if the extended version occupies more registers than the
+original and the source of the extension is the same hard register
+as the destination of the extension, then we can not eliminate
+the extension without deep analysis, so just punt.
+
+We allow this when the registers are different because the
+code in combine_reaching_defs will handle that case correctly.  */
+  if ((HARD_REGNO_NREGS (REGNO (dest), mode)
+  != HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
+ && REGNO (dest) == REGNO (reg))
+   return;
+
/* Then add the candidate to the list and insert the reaching 
definitions
   into the definition map.  */
ext_cand e = {expr, code, mode, insn};


I might be wrong, but the check looks specific to little-endian.  Would
it make sense to use reg_overlap_mentioned_p instead of the REGNO check?

Agreed.  Testing in progress now...
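
To make the difference concrete: comparing REGNO values only catches the
case where both values start at the same hard register, whereas an overlap
test also catches a source that sits in a later register of the wider
destination. The sketch below is a hypothetical standalone helper working on
plain register numbers and counts; it is not reg_overlap_mentioned_p and
makes no claim about how that function is implemented.

```c
#include <stdbool.h>

/* True if hard register ranges [regno1, regno1 + nregs1) and
   [regno2, regno2 + nregs2) share at least one register.
   Hypothetical helper for illustration; not GCC's implementation.  */
bool
hard_reg_ranges_overlap (unsigned regno1, unsigned nregs1,
                         unsigned regno2, unsigned nregs2)
{
  return regno1 < regno2 + nregs2 && regno2 < regno1 + nregs1;
}
```

With a two-register destination starting at register 0, a one-register
source in register 1 overlaps even though the two REGNO values differ.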

jeff



Fix size of enum bitfield in recently added test

2015-12-16 Thread Jeff Law


Matthew pointed out this test was failing for arm-none-eabi because the 
rtx_code enum is represented in 8 bits which causes this error:


pr68619-4.c:42:17: error: width of 'code' exceeds its type
   enum rtx_code code:16;


I changed the size of the bitfield in the obvious way.  I verified all
the other pr68619 tests on arm-none-eabi as well as verifying that x86_64
was happy with the change to pr68619-4.c.
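
A reduced sketch of the portable form, using a made-up enum in place of
rtx_code. Enum bit-fields are a GCC-accepted extension rather than strict
ISO C, and the names below are assumptions for illustration only. An
eight-bit width is accepted both where the enum is stored in a single byte
and where it is stored in an int.

```c
/* Stand-in for rtx_code; the enumerators are made up for illustration.  */
enum small_code { CODE_A, CODE_B, CODE_C };

struct node
{
  /* Eight bits fits whether the enum is stored in a char or an int.  */
  enum small_code code : 8;
  unsigned flags : 8;
};

/* Store CODE in a node and read it back.  */
enum small_code
round_trip (enum small_code code)
{
  struct node n = { code, 0 };
  return n.code;
}
```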


Installed on the trunk.

Jeff
commit 7b42c818ec41486da307b50f504d29d20086e8a8
Author: law 
Date:   Wed Dec 16 18:53:25 2015 +

* gcc.dg/tree-ssa/pr68619-4.c: Change size of code bitfield.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@231717 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 83cf7ac..9ce80b1 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-12-13  Jeff Law  
+
+   * gcc.dg/tree-ssa/pr68619-4.c: Change size of code bitfield.
+
 2015-12-16  David Malcolm  
 
* c-c++-common/conflict-markers-1.c: New testcase.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c
index da3cdb9..6c7d180 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c
@@ -39,7 +39,7 @@ union rtunion_def
 typedef union rtunion_def rtunion;
 struct rtx_def
 {
-  enum rtx_code code:16;
+  enum rtx_code code:8;
   union u
   {
 rtunion fld[1];


Re: [PATCH] Better error recovery for merge-conflict markers (v5)

2015-12-16 Thread David Malcolm
On Wed, 2015-12-16 at 00:52 +0100, Bernd Schmidt wrote:
> On 12/15/2015 08:30 PM, David Malcolm wrote:
> 
> > I got thinking about what we'd have to do to support Perforce-style
> > markers, and began to find my token-matching approach to be a little
> > clunky (in conjunction with reading Martin's observations on
> > c_parser_peek_nth_token).
> >
> > Here's a reimplementation of the patch which takes a much simpler
> > approach, and avoids the need to touch the C lexer: check that we're
> > not in a macro expansion and then read in the source line, and
> > textually compare against the various possible conflict markers.
> > This adds the requirement that the source file be readable, so it
> > won't detect conflict markers in a .i file from -save-temps,
> 
> How come? Is source file defined as the one before preprocessing?

Yes, unless you manually strip the #line directives.  So unless the
original source files are still around in the right path relative to
where you're compiling the .i file, the calls to
location_get_source_line will fail.
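
For reference, the textual comparison itself can be sketched as below. This
is a standalone illustration, not the patch: the function names are
hypothetical, and it assumes the seven-character markers discussed earlier
in the thread, so Perforce-style markers are not covered.

```c
#include <stdbool.h>

/* True if LINE begins with exactly seven copies of MARKER_CHAR followed
   by a space or the end of the line.  */
static bool
starts_with_marker (const char *line, char marker_char)
{
  /* A shorter line stops at its terminating NUL, which never matches.  */
  for (int i = 0; i < 7; i++)
    if (line[i] != marker_char)
      return false;
  return line[7] == ' ' || line[7] == '\n' || line[7] == '\0';
}

/* True if LINE looks like a start, middle or end conflict marker.  */
bool
is_conflict_marker_line (const char *line)
{
  return (starts_with_marker (line, '<')
          || starts_with_marker (line, '=')
          || starts_with_marker (line, '>'));
}
```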

> And I do think this is an unfortunate limitation (given that we often 
> load .i files into cc1 for debugging and we'd ideally like that to be 
> consistent with normal compilation as much as possible). I'd rather go 
> with the original patch based on this.

(nods)  This can be a pain when debugging diagnostic_show_locus, but
there's not much that can be done about it.



Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-16 Thread David Malcolm
On Wed, 2015-12-09 at 18:44 +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:58 PM, David Malcolm wrote:
> > On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> >>
> >> This seems like fairly low impact but also low cost, so I'm fine with it
> >> in principle. I wonder whether the length of the marker is the same
> >> across all versions of patch (and VC tools)?
> >
> > It's hardcoded for GNU patch:
> [...]
> From what I can tell, Perforce is the outlier here.
> 
> Thanks for checking all that.
> 
> >> Just thinking out loud - I guess it would be too much to hope for to
> >> share lexers between frontends so that we need only one copy of this?
> >
> > Probably :(
> 
> Someone slap sense into me, I just thought of deriving C and C++ parsers 
> from a common base class... (no this is not a suggestion for this patch).
> 
> > Would a better wording be:
> >
> > extern short some_var; /* This line would lead to a warning due to the
> >duplicate name, but it is skipped when handling
> >the conflict marker.  */
> 
> I think so, yes.
> 
> > That said, it's not clear they're always at the beginning of a line;
> > this bazaar bug indicates that CVS (and bazaar) can emit them
> > mid-line:
> >https://bugs.launchpad.net/bzr/+bug/36399
> 
> Ok. CVS I think we shouldn't worry about, and it looks like this is one 
> particular bug/corner case where the conflict end marker is the last 
> thing in the file. I think on the whole it's best to check for beginning 
> of the line as you've done.
> 
> > Wording-wise, should it be "merge conflict marker", rather
> > than "patch conflict marker"?
> >
> > Clang spells it:
> > "error: version control conflict marker in file"
> > http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts
> 
> Yeah, if another compiler has a similar/identical diagnostic I think we 
> should just copy that unless there's a very good reason not to.
> 
> > Rebased on top of r231445 (from yesterday).
> > Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
> > Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.
> >
> > OK for trunk?
> 
> I'm inclined to say yes since it was originally submitted in time and 
> it's hard to imagine how the change could be risky (I'll point out right 
> away that there are one or two other patches in the queue that were also 
> submitted in time which I feel should not be considered for gcc-6 at 
> this point due to risk).
> 
> Let's wait until the end of the week for objections, commit then.

Thanks.  I updated it based on the feedback above, including changing
the wording to match clang's:
 "error: version control conflict marker in file"
I replaced "patch conflict marker" with "conflict marker" in the code
and names of test cases.

I took the liberty of adding:
  gcc_assert (n > 0);
to c_parser_peek_nth_token based on Martin's feedback.

Having verified bootstrap&regrtest (on x86_64-pc-linux-gnu), I've
committed it to trunk as r231712.

I'm attaching what I committed, for reference.
Index: gcc/c-family/ChangeLog
===
--- gcc/c-family/ChangeLog	(revision 231711)
+++ gcc/c-family/ChangeLog	(revision 231712)
@@ -1,3 +1,8 @@
+2015-12-16  David Malcolm  
+
+	* c-common.h (conflict_marker_get_final_tok_kind): New prototype.
+	* c-lex.c (conflict_marker_get_final_tok_kind): New function.
+
 2015-12-15  Ilya Verbin  
 
 	* c-common.c (c_common_attribute_table): Handle "omp declare target
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 231711)
+++ gcc/c-family/c-lex.c	(revision 231712)
@@ -1263,3 +1263,29 @@
 
   return value;
 }
+
+/* Helper function for c_parser_peek_conflict_marker
+   and cp_lexer_peek_conflict_marker.
+   Given a possible conflict marker token of kind TOK1_KIND
+   consisting of a pair of characters, get the token kind for the
+   standalone final character.  */
+
+enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind)
+{
+  switch (tok1_kind)
+{
+default: gcc_unreachable ();
+case CPP_LSHIFT:
+  /* "<<" and '<' */
+  return CPP_LESS;
+
+case CPP_EQ_EQ:
+  /* "==" and '=' */
+  return CPP_EQ;
+
+case CPP_RSHIFT:
+  /* ">>" and '>' */
+  return CPP_GREATER;
+}
+}
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 231711)
+++ gcc/c-family/c-common.h	(revision 231712)
@@ -1089,6 +1089,10 @@
 extern int c_gimplify_expr (tree *, gimple_seq *, gimple_seq *);
 extern tree c_build_bind_expr (location_t, tree, tree);
 
+/* In c-lex.c.  */
+extern enum cpp_ttype
+conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+
 /* In c-pch.c  */
 extern void pch_init (void);
 extern void pch_cpp_save_state (void);
Index: gcc/c/ChangeLog
=

C++ PATCH for c++/63628 (generic lambdas and variadic capture)

2015-12-16 Thread Jason Merrill
My patch for 63809 fixed non-capturing use of a parameter pack in a 
regular lambda, but not in a generic lambda, where we can't rely on 
being instantiated within the enclosing context.  So we use the existing 
support for references to parameters from trailing return type, another 
situation where we have a reference to a pack from unevaluated context 
and don't have a local_specialization to help us.


With this change, my patch to copy the local_specializations table is no 
longer needed, so I'm reverting it (but leaving the hash table copy 
constructors).


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 46101839657233781f6fc9e824971cfbf8202a1e
Author: Jason Merrill 
Date:   Tue Dec 15 16:43:02 2015 -0500

	PR c++/63628
	* pt.c (tsubst_pack_expansion): Also make dummy decls if
	retrieve_local_specialization fails.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a45e6df..58742b0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10794,12 +10794,16 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 	  if (PACK_EXPANSION_LOCAL_P (t) || CONSTRAINT_VAR_P (parm_pack))
 	arg_pack = retrieve_local_specialization (parm_pack);
 	  else
+	/* We can't rely on local_specializations for a parameter
+	   name used later in a function declaration (such as in a
+	   late-specified return type).  Even if it exists, it might
+	   have the wrong value for a recursive call.  */
+	need_local_specializations = true;
+
+	  if (!arg_pack)
 	{
-	  /* We can't rely on local_specializations for a parameter
-		 name used later in a function declaration (such as in a
-		 late-specified return type).  Even if it exists, it might
-		 have the wrong value for a recursive call.  Just make a
-		 dummy decl, since it's only used for its type.  */
+	  /* This parameter pack was used in an unevaluated context.  Just
+		 make a dummy decl, since it's only used for its type.  */
 	  arg_pack = tsubst_decl (parm_pack, args, complain);
 	  if (arg_pack && DECL_PACK_P (arg_pack))
 		/* Partial instantiation of the parm_pack, we can't build
@@ -10807,7 +10811,6 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 		arg_pack = NULL_TREE;
 	  else
 		arg_pack = make_fnparm_pack (arg_pack);
-	  need_local_specializations = true;
 	}
 	}
   else if (TREE_CODE (parm_pack) == FIELD_DECL)
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic3.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic3.C
new file mode 100644
index 000..9b3455a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic3.C
@@ -0,0 +1,15 @@
+// PR c++/63628
+// { dg-do compile { target c++14 } }
+
+auto const pack = [](auto&&... t)
+{
+  return [&](auto&& f)->decltype(auto)
+  {
+return f(static_cast<decltype(t)>(t)...);
+  };
+};
+
+int main(int argc, char** argv) {
+  pack(1)([](int){});
+  return 0;
+}


[PATCH] C and C++ FE: better source ranges for binary ops

2015-12-16 Thread David Malcolm
Currently trunk emits range information for most bad binary operations
in the C++ frontend; but not in the C frontend.

The helper function binary_op_error shared by C and C++ takes a
location_t.  In the C++ frontend, a location_t containing the range
has already been built, so we get the underline when it later validates
the expression: e.g. this example from:
   https://gcc.gnu.org/wiki/ClangDiagnosticsComparison

t.cc:9:19: error: no match for ‘operator+’ (operand types are ‘int’ and ‘foo’)
   return P->bar() + *P;
  ~^~~~

In the C frontend, the "full" location_t is only built
after type-checking, so if type-checking fails, we get a caret/range
covering just the operator so e.g. for:

   return (some_function ()
+ some_other_function ());

we get just:

gcc.dg/bad-binary-ops.c:29:4: error: invalid operands to binary + (have 'struct 
s' and 'struct t')
+ some_other_function ());
^

The following patch updates binary_op_error to accept a rich_location *.
For the C++ frontend we populate it with just the location_t as before,
but for the C frontend we can add locations for the operands, giving
this underlining for the example above:

bad-binary-ops.c:29:4: error: invalid operands to binary + (have 'struct s' and 
'struct t')
   return (some_function ()
   
+ some_other_function ());
^ ~~

Additionally, in the C++ frontend, cp_build_binary_op has an "error"
call, which implicitly uses input_location, giving this reduced
information for another test case from
https://gcc.gnu.org/wiki/ClangDiagnosticsComparison:

bad-binary-ops.C:10:14: error: invalid operands of types '__m128 {aka float}' 
and 'const int*' to binary 'operator/'
   myvec[1] / ptr;
  ^~~

The patch updates it to use error_at with the location_t provided,
fixing the above to become:

bad-binary-ops.C:10:12: error: invalid operands of types '__m128 {aka float}' 
and 'const int*' to binary 'operator/'
   myvec[1] / ptr;
   ~^

Finally, since this patch adds a usage of
gcc_rich_location::maybe_add_expr to cc1, we can drop the hacked-in
copy of gcc_rich_location::add_expr from
gcc.dg/plugin/diagnostic_plugin_show_trees.c.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
adds 15 new PASS results in g++.sum and 7 new PASS results in gcc.sum

OK for trunk in stage 3?

gcc/c-family/ChangeLog:
* c-common.c (binary_op_error): Convert first param from
location_t to rich_location * and use it when emitting an error.
* c-common.h (binary_op_error): Convert first param from
location_t to rich_location *.

gcc/c/ChangeLog:
* c-typeck.c: Include "gcc-rich-location.h".
(build_binary_op): In the two places that call binary_op_error,
create a gcc_rich_location and populate it with the location of
the binary op and its two operands.

gcc/cp/ChangeLog:
* typeck.c (cp_build_binary_op): Update for change in signature
of build_binary_op.  Use error_at to replace an implicit use
of input_location with param "location" in "invalid operands"
error.
(cp_build_binary_op): Replace an error with an error_at, using
"location", rather than implicitly using input_location.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/bad-binary-ops.C: New test case.
* gcc.dg/bad-binary-ops.c: New test case.
* gcc.dg/plugin/diagnostic_plugin_show_trees.c (get_range_for_expr):
Remove material copied from gcc-rich-location.c
(gcc_rich_location::add_expr): Likewise.
---
 gcc/c-family/c-common.c| 21 +++---
 gcc/c-family/c-common.h|  2 +-
 gcc/c/c-typeck.c   | 11 -
 gcc/cp/typeck.c| 13 --
 gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C   | 44 
 gcc/testsuite/gcc.dg/bad-binary-ops.c  | 48 ++
 .../gcc.dg/plugin/diagnostic_plugin_show_trees.c   | 44 
 7 files changed, 128 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C
 create mode 100644 gcc/testsuite/gcc.dg/bad-binary-ops.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4250cdf..653d1dc 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3795,10 +3795,21 @@ c_register_builtin_type (tree type, const char* name)
 
 /* Print an error message for invalid operands to arith operation
CODE with TYPE0 for operand 0, and TYPE1 for operand 1.
-   LOCATION is the location of the message.  */
+   RICHLOC is a rich location for the message, containing either
+   three separate locations for each of the operator and operands
+
+  lhs op rhs
+  ~~~ ^~ ~~~
+
+   (C FE), or one location ranging over all over them
+
+  lhs op rhs
+  ^~
+
+   (C++ FE).  */
 
 void
-binary_op_error (

[PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian

2015-12-16 Thread Charles Baylis
Hi

This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
and VZIP on armeb-* targets. It also fixes the definition of the
vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
specifiers in the use of __builtin_shuffle().

The problem with arm_neon.h can be seen by temporarily altering
arm_expand_vec_perm_const_1() to unconditionally return false. If this
is done, the vuzp/vzip tests in the advsimd execution tests will fail.
With these patches, this is no longer the case.

The problem is caused by the weird mapping of architectural lane order
to gcc lane order in big endian. For 64 bit vectors, the order is
simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
where the lane ordering is reversed inside those. This is due to the
memory ordering defined by the EABI. There is a large comment in
gcc/config/arm/arm.c above output_move_neon() which describes this in more
detail.
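
A minimal sketch of the lane mapping as described above, assuming it is
simply "reverse within each 64-bit group". The function name and signature
are hypothetical and this is not the code from the patches.

```c
#include <stdbool.h>

/* Map LANE of a vector with NELEMS elements to its big-endian
   counterpart: the order is reversed within each 64-bit group.
   Hypothetical helper for illustration only.  */
unsigned
big_endian_lane (unsigned nelems, bool is_128_bit, unsigned lane)
{
  /* A 128-bit vector is handled as two 64-bit halves.  */
  unsigned group = is_128_bit ? nelems / 2 : nelems;
  unsigned base = lane - lane % group;
  return base + (group - 1 - lane % group);
}
```

For a 128-bit vector of four elements this gives 0->1, 1->0, 2->3, 3->2,
whereas a plain reversal would give 0->3, 1->2, 2->1, 3->0.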

The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
allow for this lane order, instead treating the lane order as simply
reversed in 128 bit vectors. These patches fix this. I have included a
test case for vuzp, but I don't have one for vzip.

Tested with make check on arm-unknown-linux-gnueabihf with no regressions
Tested with make check on armeb-unknown-linux-gnueabihf. Some
gcc.dg/vect tests fail due to no longer being vectorized. I haven't
analysed these, but it is expected since vuzp is not usable for the
shuffle patterns for which it was previously used. There are also a
few new PASSes.


Patch 1 (vuzp):

gcc/ChangeLog:

2015-12-15  Charles Baylis  

* config/arm/arm.c (arm_neon_endian_lane_map): New function.
(arm_neon_vector_pair_endian_lane_map): New function.
(arm_evpc_neon_vuzp): Allow for big endian lane order.
* config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
endian.
(vuzpq_s16): Likewise.
(vuzpq_s32): Likewise.
(vuzpq_f32): Likewise.
(vuzpq_u8): Likewise.
(vuzpq_u16): Likewise.
(vuzpq_u32): Likewise.
(vuzpq_p8): Likewise.
(vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2015-12-15  Charles Baylis  

* gcc.c-torture/execute/pr68532.c: New test.


Patch 2 (vzip)

gcc/ChangeLog:

2015-12-15  Charles Baylis  

* config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
order.
* config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
endian.
(vzipq_s16): Likewise.
(vzipq_s32): Likewise.
(vzipq_f32): Likewise.
(vzipq_u8): Likewise.
(vzipq_u16): Likewise.
(vzipq_u32): Likewise.
(vzipq_p8): Likewise.
(vzipq_p16): Likewise.


0001-ARM-Fix-up-vuzp-for-big-endian.patch
Description: application/download


0002-ARM-Fix-up-vzip-recognition-for-big-endian.patch
Description: application/download


Re: ipa-cp heuristics fixes

2015-12-16 Thread Jakub Jelinek
On Wed, Dec 16, 2015 at 06:15:33PM +0100, Jan Hubicka wrote:
> > On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> > > I am trying to understand Jakub's debug code and perhaps it can be improved.
> > > But in the case of optimized out unused parameters I think it is perfectly
> > > reasonable to say that the variable was optimized out.
> > 
> > As long as the values that would be passed to the unused parameters are
> > constant or live in memory or registers that isn't clobbered by the call in
> > the caller, we have the ability to express that in the debug info now, and
> > we should.
> int
> main ()
> {
>   int l = 0;
>   asm ("" : "=r" (l) : "0" (l));
>   a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
>   asm volatile ("" :: "r" (l));
>   return 0;
> }
> 
> the unused parameters are not constant because of the asm block and we simply
> do not compute them if cloning happens.  The following patch to the testcase

So, we had a debuginfo QoI bug before, but with the change we hit that now
way more often than in the past, which makes it a debuginfo quality regression.

But, we still do emit the right debuginfo stuff for
volatile int vv;

static __attribute__ ((noinline))
int f1 (int x, int y, int z)
{
  int a = x * 2;
  int b = y * 2;
  int c = z * 2;
  vv++;
  return x + z;
}

int
u1 (int x)
{
  return f1 (x, 2, 3);
}

and I really can't see what is the difference between that and say
the pr36728-1.c testcase or the one with your modification to also pass
in a constant.  This one also has one used argument, some unused ones, and
some where a constant is passed.  BTW, if I adjust your modified
pr36728-1.c so that cst is used somewhere, then that parameter is passed,
rather than optimized away as I'd expect (and as happens on the above
testcase).
So, clearly there are multiple ways to perform the same parameter
optimizations, one in isra (though I don't see anything to scalarize above),
another one in ipa-cp or wherever, and only some of them hit the debug info
creation for the optimized away arguments.
Thus the question is why we have multiple spots to do the same thing,
what is so different between the testcases, and what do we need to change
to emit the debug info we should, and what we should change to emit
the cst below if it is used, if we already clone it anyway.

> Index: testsuite/gcc.dg/guality/pr36728-1.c
> ===
> --- testsuite/gcc.dg/guality/pr36728-1.c(revision 231022)
> +++ testsuite/gcc.dg/guality/pr36728-1.c(working copy)
> @@ -7,7 +7,7 @@
>  int a, b;
> 
>  int __attribute__((noinline))
> -foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
> +foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int cst)
>  { 
>char *x = __builtin_alloca (arg7);
>int __attribute__ ((aligned(32))) y;
> @@ -48,7 +48,7 @@ main ()
>  { 
>int l = 0;
>asm ("" : "=r" (l) : "0" (l));
> -  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
> +  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
>asm volatile ("" :: "r" (l));
>return 0;
>  }
> 
> makes it fail on the GCC 5 tree, too. The extra unused constant argument makes
> GCC 5 clone foo and optimize out l, too.

Jakub


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-16 Thread Jeff Law

On 12/16/2015 10:00 AM, Kyrill Tkachov wrote:


On 16/12/15 12:18, Bernd Schmidt wrote:

On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:

Then for the shift pattern in the MD file we'd have to
dynamically select the scheduling type depending on whether or
not the shift amount is 1 and the costs line up?


Yes. This isn't unusual, take a look at i386.md where you have a
lot of switches on attr type to decide which string to print.



I'm just worried that if we take this idea to its logical conclusion,
we have to add a new canonicalisation rule: "all (plus x x)
expressions shall be expressed as (ashift x 1)". Such a rule seems
too specific to me and all targets would have to special-case it in
their MD patterns and costs if they ever wanted to treat an add and a
shift differently. In this particular case we'd have to
conditionalise the scheduling string selection on a particular CPU
tuning and the shift amount, which will make the pattern much harder
to read. To implement this properly we'd also have to

That's not terribly unusual.  And we've done those kinds of
canonicalization rules before -- most recently to deal with issues in
combine we settled on canonicalization rules for ashift vs mult.  While
there was fallout, it's manageable.






The price we pay when trying these substitutions is an iteration
over the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if
that iteration actually performed a substitution of x + x into x
<< 1. Is that too high a price to pay? (I'm not familiar with the
performance characteristics of the FOR_EACH_SUBRTX machinery)


It depends on how many of these transforms we are going to try; it
 also feels very hackish, trying to work around the core design of
the combiner. IMO it would be better for machine descriptions to
work with the pass rather than against it.



Perhaps I'm lacking the historical context, but what is the core
design of the combiner? Why should the backend have to jump through
these hoops if it already communicates to the midend (through correct
rtx costs) that a shift is more expensive than a plus? I'd be more
inclined to agree that this is perhaps a limitation in recog rather
than combine, but still not a backend problem.

The historical design of combine is pretty simple.

Use data dependence to substitute the definition of an operand in a use
of the operand. Essentially create bigger blobs of RTL. Canonicalize and
simplify that larger blob of RTL, then try to match it with a pattern in
the backend.

Note that costing didn't enter the picture. The assumption was that if
the combination succeeds, then it's profitable (fewer insns).  We 
haven't generally encouraged trying to match multiple forms of 
equivalent expressions, instead we declare a canonical form and make 
sure combine uses it.






If you can somehow arrange for the (plus x x) to be turned into a
shift while substituting that might be yet another approach to
try.



I did investigate where else we could make this transformation. For
the zero_extend+shift case (the ubfiz instruction from the testcase
in my original submission) we could fix this by modifying
make_extraction to convert its argument to a shift from (plus x x)
as, in that context, shifts are undoubtedly more likely to simplify
with the various extraction operations that it's trying to perform.

Note that canonicalizing (plus x x) to (ashift x 1) is consistent with
the canonicalization we do for (mult x C) to (ashift x log2 (C)) where C 
is an exact power of two.


When we made that change consistently (there were cases where we instead 
preferred MULT in the past), we had to fix some backends, but the 
fallout wasn't terrible.


I would think from a representational standpoint canonicalizing (plus x 
x) to (ashift x 1) would be generally a good thing.
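
A minimal sketch of the two equivalences discussed above, written in plain C on 32-bit unsigned values rather than on RTL. The helper names (is_pow2, exact_log2_u32, plus_as_shift, mult_as_shift) are made up for illustration and are not GCC functions; the sketch only shows why the shift forms compute the same values, not how combine or the simplifiers implement the rule.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if C is an exact power of two (nonzero, single bit set).  */
int
is_pow2 (uint32_t c)
{
  return c != 0 && (c & (c - 1)) == 0;
}

/* Base-2 logarithm of C.  Only meaningful when C is a power of two.  */
unsigned
exact_log2_u32 (uint32_t c)
{
  unsigned n = 0;

  assert (is_pow2 (c));
  while ((c >>= 1) != 0)
    n++;
  return n;
}

/* (plus x x) expressed as (ashift x 1).  Unsigned arithmetic wraps
   modulo 2^32, so both forms agree for every X.  */
uint32_t
plus_as_shift (uint32_t x)
{
  return x << 1;
}

/* (mult x C) expressed as (ashift x log2 (C)) when C is a power of
   two; otherwise the multiply is kept as it is.  */
uint32_t
mult_as_shift (uint32_t x, uint32_t c)
{
  return is_pow2 (c) ? x << exact_log2_u32 (c) : x * c;
}
```

For instance, mult_as_shift (5, 8) takes the shift path and mult_as_shift (5, 6) keeps the multiply; both return the ordinary product.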



jeff


Re: ipa-cp heuristics fixes

2015-12-16 Thread Jan Hubicka
> On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> > I am trying to understand Jakub's debug code and perhaps it can be
> > improved. But in the case of optimized out unused parameters I think it
> > is perfectly reasonable to say that the variable was optimized out.
> 
> As long as the values that would be passed to the unused parameters are
> constant or live in memory or registers that aren't clobbered by the call in
> the caller, we have the ability to express that in the debug info now, and
> we should.

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
  asm volatile ("" :: "r" (l));
  return 0;
}

the unused parameters are not constant because of the asm block and we simply
do not compute them if cloning happens.  The following patch to the testcase
Index: testsuite/gcc.dg/guality/pr36728-1.c
===
--- testsuite/gcc.dg/guality/pr36728-1.c(revision 231022)
+++ testsuite/gcc.dg/guality/pr36728-1.c(working copy)
@@ -7,7 +7,7 @@
 int a, b;

 int __attribute__((noinline))
-foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
+foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int cst)
 { 
   char *x = __builtin_alloca (arg7);
   int __attribute__ ((aligned(32))) y;
@@ -48,7 +48,7 @@ main ()
 { 
   int l = 0;
   asm ("" : "=r" (l) : "0" (l));
-  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
+  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
   asm volatile ("" :: "r" (l));
   return 0;
 }

makes it fail on the GCC 5 tree, too. The extra unused constant argument makes
GCC 5 clone foo and optimize out l, too.

Honza
> 
> > As you can see, the testcase explicitly prevents ipa-cp constant
> > propagation by the asm statement.  We can just update the testcases to
> > use the parameters and test the original issue again.
> 
> No, the testcase intentionally wants to test unused arguments, they happen
> in quite a lot of code and "optimized out" is not really a desirable answer if
> we can provide the values some other way.
> 
>   Jakub


Fix PR66208

2015-12-16 Thread Bernd Schmidt
This is a relatively straightforward PR where we should mention a macro 
expansion in a warning message. The patch below implements the 
suggestion by Marek to pass a location down from 
build_function_call_vec. Ok if tests pass on x86_64-linux?
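
The mechanism used by the patch, handing a location to a generic callback through its opaque context pointer, can be sketched in isolation as follows. This is a toy model under stated assumptions: toy_location stands in for location_t, the toy_* functions are hypothetical, and the "warning" is recorded in globals instead of going through warning_at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's location_t.  */
typedef unsigned int toy_location;

/* Callback type: CTX is opaque data supplied by the caller.  */
typedef void (*toy_check_fn) (void *ctx, const void *param,
			      unsigned param_num);

/* State of the most recent "warning", recorded instead of printed.  */
toy_location toy_last_warning_loc;
unsigned toy_last_warning_arg;
int toy_warning_count;

/* Callback: CTX points at the location to report, in the same way the
   patched check_nonnull_arg reads *ploc.  */
void
toy_check_nonnull_arg (void *ctx, const void *param, unsigned param_num)
{
  toy_location *ploc = (toy_location *) ctx;

  assert (ctx != NULL);
  if (param == NULL)
    {
      toy_last_warning_loc = *ploc;
      toy_last_warning_arg = param_num;
      toy_warning_count++;
    }
}

/* Apply CHECK to each of the NARGS arguments, passing the address of
   LOC as the context.  Argument numbers start at 1.  */
void
toy_check_function_nonnull (toy_location loc, int nargs,
			    const void **argarray, toy_check_fn check)
{
  int i;

  for (i = 0; i < nargs; i++)
    check (&loc, argarray[i], (unsigned) i + 1);
}
```

Calling toy_check_function_nonnull with location 42 and a null second argument records one warning for argument 2 at location 42; the callback never needs a global "current location".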


One question I have is about -Wformat, which is dealt with in the same 
area. We do get a mention of the macro expansion, but in a different style:


/* { dg-do compile } */
/* { dg-options "-Wformat" } */

int foox (const char *, ...) __attribute__ ((format (printf, 1, 2)));

#define foo(p, x) foox (p, x)
#define foo3(p, x, y) foox (p, x, y)

void
bar (unsigned int x)
{
  foo ("%x", "fish");
  foo3 ("%x", 3, "fish");
}

macroloc2.c: In function ‘bar’:
macroloc2.c:12:8: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat=]
   foo ("%x", "fish");
        ^
macroloc2.c:6:25: note: in definition of macro ‘foo’
 #define foo(p, x) foox (p, x)
                         ^
macroloc2.c:13:9: warning: too many arguments for format [-Wformat-extra-args]
   foo3 ("%x", 3, "fish");
         ^
macroloc2.c:7:29: note: in definition of macro ‘foo3’
 #define foo3(p, x, y) foox (p, x, y)
                             ^

Is this what we're looking for in terms of output?


Bernd
	PR c/66208
	* c-common.c (check_function_nonnull): Remove unnecessary declaration.
	Add new arg loc and pass it down as context.
	(check_nonnull_arg): Don't mark ctx arg as unused. Use it as a pointer
	to the location to use for the warning.
	(check_function_arguments): New arg loc.  All callers changed.  Pass
	it to check_function_nonnull.
	* c-common.h (check_function_arguments): Adjust declaration.

testsuite/
	PR c/66208
	* c-c++-common/pr66208.c: New file.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 369574f..6074d14 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -395,7 +395,6 @@ static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *)
 static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
 static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
 
-static void check_function_nonnull (tree, int, tree *);
 static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
 static bool nonnull_check_p (tree, unsigned HOST_WIDE_INT);
 static bool get_nonnull_operand (tree, unsigned HOST_WIDE_INT *);
@@ -9611,11 +9610,10 @@ handle_nonnull_attribute (tree *node, tree ARG_UNUSED (name),
 
 /* Check the argument list of a function call for null in argument slots
that are marked as requiring a non-null pointer argument.  The NARGS
-   arguments are passed in the array ARGARRAY.
-*/
+   arguments are passed in the array ARGARRAY.  */
 
 static void
-check_function_nonnull (tree attrs, int nargs, tree *argarray)
+check_function_nonnull (location_t loc, tree attrs, int nargs, tree *argarray)
 {
   tree a;
   int i;
@@ -9635,7 +9633,7 @@ check_function_nonnull (tree attrs, int nargs, tree *argarray)
 
   if (a != NULL_TREE)
 for (i = 0; i < nargs; i++)
-  check_function_arguments_recurse (check_nonnull_arg, NULL, argarray[i],
+  check_function_arguments_recurse (check_nonnull_arg, &loc, argarray[i],
 	i + 1);
   else
 {
@@ -9651,7 +9649,7 @@ check_function_nonnull (tree attrs, int nargs, tree *argarray)
 	}
 
 	  if (a != NULL_TREE)
-	check_function_arguments_recurse (check_nonnull_arg, NULL,
+	check_function_arguments_recurse (check_nonnull_arg, &loc,
 	  argarray[i], i + 1);
 	}
 }
@@ -9737,9 +9735,10 @@ nonnull_check_p (tree args, unsigned HOST_WIDE_INT param_num)
via check_function_arguments_recurse.  */
 
 static void
-check_nonnull_arg (void * ARG_UNUSED (ctx), tree param,
-		   unsigned HOST_WIDE_INT param_num)
+check_nonnull_arg (void *ctx, tree param, unsigned HOST_WIDE_INT param_num)
 {
+  location_t *ploc = (location_t *) ctx;
+
   /* Just skip checking the argument if it's not a pointer.  This can
  happen if the "nonnull" attribute was given without an operand
  list (which means to check every pointer argument).  */
@@ -9748,8 +9748,8 @@ check_nonnull_arg (void * ARG_UNUSED (ctx), tree param,
 return;
 
   if (integer_zerop (param))
-warning (OPT_Wnonnull, "null argument where non-null required "
-	 "(argument %lu)", (unsigned long) param_num);
+warning_at (*ploc, OPT_Wnonnull, "null argument where non-null required "
+		"(argument %lu)", (unsigned long) param_num);
 }
 
 /* Helper for nonnull attribute handling; fetch the operand number
@@ -10198,15 +10198,17 @@ handle_designated_init_attribute (tree *node, tree name, tree, int,
 
 
 /* Check for valid arguments being passed to a function with FNTYPE.
-   There are NARGS arguments in the array ARGARRAY.  */
+   There are NARGS arguments in the array ARGARRAY.  LOC should be used for
+   diagnostics.  */
 void
-check_function_arguments (const_tree fntype, int nargs, tree *argarray)
+check_function_arguments (location_t loc,

Re: [PATCH, PR67627][RFC] broken libatomic multilib parallel build

2015-12-16 Thread Jeff Law

On 12/04/2015 05:39 AM, Szabolcs Nagy wrote:

As described in pr other/67627, the all-multi target can be
built in parallel with the %_.lo targets which generate make
dependencies that are parsed during the build of all-multi.

gcc -MD does not generate the makefile dependencies in an
atomic way so make can fail if it concurrently parses those
half-written files.
(not observed on x86, but happens on arm native builds.)

this workaround forces all-multi to only run after the *_.lo
targets are done, but there might be a better solution using
automake properly. (automake should know about the generated
make dependency files that are included into the makefile so
no manual tinkering is needed to get the right build order,
but i don't know how to do that.)

2015-12-04  Szabolcs Nagy  

 PR other/67627
 * Makefile.am (all-multi): Add dependency.
 * Makefile.in: Regenerate.

So looking at the patch, it looks like you're adding a dependency in
Makefile.am to pass it through to Makefile.in, which is fine.


So I think you just need to replicate that fix across the other 
libraries which have this problem.


jeff


Re: [PATCH, PR67627][RFC] broken libatomic multilib parallel build

2015-12-16 Thread Jeff Law

On 12/04/2015 05:39 AM, Szabolcs Nagy wrote:

As described in pr other/67627, the all-multi target can be
built in parallel with the %_.lo targets which generate make
dependencies that are parsed during the build of all-multi.

gcc -MD does not generate the makefile dependencies in an
atomic way so make can fail if it concurrently parses those
half-written files.
(not observed on x86, but happens on arm native builds.)

this workaround forces all-multi to only run after the *_.lo
targets are done, but there might be a better solution using
automake properly. (automake should know about the generated
make dependency files that are included into the makefile so
no manual tinkering is needed to get the right build order,
but i don't know how to do that.)

2015-12-04  Szabolcs Nagy  

 PR other/67627
 * Makefile.am (all-multi): Add dependency.
 * Makefile.in: Regenerate.

ISTM that many of the libraries have this problem.


[law@localhost gcc]$ grep all-am: */Makefile.in | grep -v install
boehm-gc/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi
gotools/Makefile.in:all-am: Makefile $(PROGRAMS) $(MANS)
libatomic/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi auto-config.h
libbacktrace/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi config.h
libcc1/Makefile.in:all-am: Makefile $(LTLIBRARIES) cc1plugin-config.h
libcilkrts/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi $(HEADERS)
libffi/Makefile.in:all-am: Makefile $(INFO_DEPS) $(LTLIBRARIES) all-multi $(DATA) \
libgfortran/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi $(DATA) $(HEADERS) config.h
libgo/Makefile.in:all-am: Makefile $(LIBRARIES) $(LTLIBRARIES) all-multi $(DATA) \
libgomp/Makefile.in:all-am: Makefile $(INFO_DEPS) $(LTLIBRARIES) all-multi $(HEADERS) \
libitm/Makefile.in:all-am: Makefile $(INFO_DEPS) $(LTLIBRARIES) all-multi $(HEADERS) \
libjava/Makefile.in:all-am: Makefile $(LTLIBRARIES) $(PROGRAMS) $(SCRIPTS) all-multi \
libmpx/Makefile.in:all-am: Makefile all-multi $(HEADERS) config.h
liboffloadmic/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi $(HEADERS) all-local
libquadmath/Makefile.in:all-am: Makefile $(INFO_DEPS) $(LTLIBRARIES) all-multi $(HEADERS) \
libsanitizer/Makefile.in:all-am: Makefile all-multi $(HEADERS) config.h
libssp/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi $(HEADERS) config.h
libstdc++-v3/Makefile.in:all-am: Makefile all-multi config.h
libvtv/Makefile.in:all-am: Makefile $(LTLIBRARIES) all-multi $(HEADERS)
lto-plugin/Makefile.in:all-am: Makefile $(LTLIBRARIES) config.h all-local
zlib/Makefile.in:all-am: Makefile $(LIBRARIES) $(LTLIBRARIES) all-multi


And if you look at the all-multi targets in each of those, all-multi has 
no dependencies.  So any of them which use auto-dependency generation 
are potentially going to bump into this problem.


We can't fix it by twiddling Makefile.in as that's a generated file.  I 
think it has to happen at the automake level.


jeff


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-16 Thread Kyrill Tkachov


On 16/12/15 12:18, Bernd Schmidt wrote:

On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:

Then for the shift pattern in the MD file we'd have to dynamically
select the scheduling type depending on whether or not the shift
amount is 1 and the costs line up?


Yes. This isn't unusual, take a look at i386.md where you have a lot of 
switches on attr type to decide which string to print.



I'm just worried that if we take this idea to its logical conclusion, we have
to add a new canonicalisation rule:
"all (plus x x) expressions shall be expressed as (ashift x 1)".
Such a rule seems too specific to me and all targets would have to special-case
it in their MD patterns and costs if they ever wanted to treat an add and a
shift differently.
In this particular case we'd have to conditionalise the scheduling string
selection on a particular CPU tuning and the shift amount, which will make
the pattern much harder to read.
To implement this properly we'd also have to


The price we pay when trying these substitutions is an iteration over
 the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if that
iteration actually performed a substitution of x + x into x << 1. Is
that too high a price to pay? (I'm not familiar with the performance
 characteristics of the FOR_EACH_SUBRTX machinery)


It depends on how many of these transforms we are going to try; it also feels 
very hackish, trying to work around the core design of the combiner. IMO it 
would be better for machine descriptions to work with the pass rather than 
against it.



Perhaps I'm lacking the historical context, but what is the core design of the
combiner?
Why should the backend have to jump through these hoops if it already
communicates to the midend (through correct rtx costs) that a shift is more
expensive than a plus?
I'd be more inclined to agree that this is perhaps a limitation in recog rather
than combine, but still not a backend problem.


If you can somehow arrange for the (plus x x) to be turned into a shift while 
substituting that might be yet another approach to try.



I did investigate where else we could make this transformation.
For the zero_extend+shift case (the ubfiz instruction from the testcase in my
original submission) we could fix this by modifying make_extraction to convert
its argument to a shift from (plus x x) as, in that context, shifts are
undoubtedly more likely to simplify with the various extraction operations
that it's trying to perform.

That leaves the other case (orr + shift), where converting to a shift isn't a
simplification in any way, but the backend happens to have an instruction that
matches the combined orr+shift form. There we want to perform the
transformation purely to aid recognition, not out of any simplification
considerations. That's what I'm trying to figure out how to do now.

However, we want to avoid doing it unconditionally because if we have just a
simple set of y = x + x we want to leave it as a plus rather than a shift
because it's cheaper on that target.
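
The "rewrite only when nested" idea can be sketched on a toy expression tree. This is an assumption-laden illustration: struct toy_expr and the TOY_* codes only mimic RTL names, the walker is a plain recursive function rather than FOR_EACH_SUBRTX_PTR, and nothing here is combine's actual code. The function returns the number of rewrites so that a caller could retry recognition only when something changed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression codes; they mimic RTL names but are not GCC's rtx.  */
enum toy_code { TOY_REG, TOY_PLUS, TOY_ASHIFT, TOY_IOR, TOY_CONST };

struct toy_expr
{
  enum toy_code code;
  int value;			/* Register number or constant value.  */
  struct toy_expr *op0, *op1;	/* Operands; NULL for leaves.  */
};

/* Nonzero if A and B are the same register.  */
int
toy_same_reg_p (const struct toy_expr *a, const struct toy_expr *b)
{
  return (a != NULL && b != NULL
	  && a->code == TOY_REG && b->code == TOY_REG
	  && a->value == b->value);
}

/* Walk the operands of X and rewrite every nested (plus r r) into
   (ashift r 1), using ONE as the constant 1 node.  X itself is never
   rewritten, so a bare (plus r r) at the top level stays a plus.
   Return the number of rewrites performed.  */
int
toy_plus_to_shift (struct toy_expr *x, struct toy_expr *one)
{
  struct toy_expr *ops[2];
  int changed = 0;
  int i;

  assert (one != NULL);
  if (x == NULL)
    return 0;

  ops[0] = x->op0;
  ops[1] = x->op1;
  for (i = 0; i < 2; i++)
    {
      struct toy_expr *op = ops[i];

      if (op == NULL)
	continue;
      if (op->code == TOY_PLUS && toy_same_reg_p (op->op0, op->op1))
	{
	  op->code = TOY_ASHIFT;
	  op->op1 = one;
	  changed++;
	}
      else
	changed += toy_plus_to_shift (op, one);
    }
  return changed;
}
```

With this shape, (ior (plus r1 r1) r2) becomes (ior (ashift r1 1) r2) and reports one change, while a top-level (plus r1 r1) is left alone and reports zero.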

Thanks,
Kyrill


Bernd




[PATCH] [graphite] update required isl version

2015-12-16 Thread Sebastian Pop
We check for the isl compute timeout function added in isl 0.13.
That means GCC could still be configured with isl 0.13, 0.14, and 0.15.

* config/isl.m4 (ISL_CHECK_VERSION): Check for
isl_ctx_get_max_operations.
* configure: Regenerate.

gcc/
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove checks for functions that exist in isl 0.13 or
later.
* graphite-isl-ast-to-gimple.c: Remove #ifdefs and code for isl 0.12.
* graphite-optimize-isl.c: Same.
* graphite-poly.c: Same.
* graphite-sese-to-poly.c: Same.
* graphite.h: Add comment for isl 0.14.
* toplev.c (print_version): Print isl version.
---
 config/isl.m4| 29 +++
 configure| 23 +--
 gcc/config.in| 12 
 gcc/configure| 61 ++--
 gcc/configure.ac | 23 ---
 gcc/graphite-isl-ast-to-gimple.c | 10 ++-
 gcc/graphite-optimize-isl.c  | 34 --
 gcc/graphite-poly.c  |  8 --
 gcc/graphite-sese-to-poly.c  |  8 --
 gcc/graphite.h   |  1 +
 gcc/toplev.c | 10 +--
 11 files changed, 49 insertions(+), 170 deletions(-)

diff --git a/config/isl.m4 b/config/isl.m4
index 459fac1..7387ff2 100644
--- a/config/isl.m4
+++ b/config/isl.m4
@@ -19,23 +19,23 @@
 
 # ISL_INIT_FLAGS ()
 # -
-# Provide configure switches for ISL support.
+# Provide configure switches for isl support.
 # Initialize isllibs/islinc according to the user input.
 AC_DEFUN([ISL_INIT_FLAGS],
 [
   AC_ARG_WITH([isl-include],
 [AS_HELP_STRING(
   [--with-isl-include=PATH],
-  [Specify directory for installed ISL include files])])
+  [Specify directory for installed isl include files])])
   AC_ARG_WITH([isl-lib],
 [AS_HELP_STRING(
   [--with-isl-lib=PATH],
-  [Specify the directory for the installed ISL library])])
+  [Specify the directory for the installed isl library])])
 
   AC_ARG_ENABLE(isl-version-check,
 [AS_HELP_STRING(
   [--disable-isl-version-check],
-  [disable check for ISL version])],
+  [disable check for isl version])],
 ENABLE_ISL_CHECK=$enableval,
 ENABLE_ISL_CHECK=yes)
   
@@ -58,15 +58,15 @@ AC_DEFUN([ISL_INIT_FLAGS],
   if test "x${with_isl_lib}" != x; then
 isllibs="-L$with_isl_lib"
   fi
-  dnl If no --with-isl flag was specified and there is in-tree ISL
+  dnl If no --with-isl flag was specified and there is in-tree isl
   dnl source, set up flags to use that and skip any version tests
-  dnl as we cannot run them before building ISL.
+  dnl as we cannot run them before building isl.
   if test "x${islinc}" = x && test "x${isllibs}" = x \
  && test -d ${srcdir}/isl; then
 isllibs='-L$$r/$(HOST_SUBDIR)/isl/'"$lt_cv_objdir"' '
 islinc='-I$$r/$(HOST_SUBDIR)/isl/include -I$$s/isl/include'
 ENABLE_ISL_CHECK=no
-AC_MSG_WARN([using in-tree ISL, disabling version check])
+AC_MSG_WARN([using in-tree isl, disabling version check])
   fi
 
   isllibs="${isllibs} -lisl"
@@ -75,7 +75,7 @@ AC_DEFUN([ISL_INIT_FLAGS],
 
 # ISL_REQUESTED (ACTION-IF-REQUESTED, ACTION-IF-NOT)
 # 
-# Provide actions for failed ISL detection.
+# Provide actions for failed isl detection.
 AC_DEFUN([ISL_REQUESTED],
 [
   AC_REQUIRE([ISL_INIT_FLAGS])
@@ -106,12 +106,17 @@ AC_DEFUN([ISL_CHECK_VERSION],
 LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
 LIBS="${_isl_saved_LIBS} -lisl"
 
-AC_MSG_CHECKING([for compatible ISL])
-AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include ]], [[;]])],
-   [gcc_cv_isl=yes],
-   [gcc_cv_isl=no])
+AC_MSG_CHECKING([for isl 0.15 (or deprecated 0.14)])
+AC_TRY_LINK([#include ],
+[isl_ctx_get_max_operations (isl_ctx_alloc ());],
+[gcc_cv_isl=yes],
+[gcc_cv_isl=no])
 AC_MSG_RESULT([$gcc_cv_isl])
 
+if test "${gcc_cv_isl}" = no ; then
+  AC_MSG_RESULT([recommended isl version is 0.15, minimum required isl version 0.14 is deprecated])
+fi
+
 CFLAGS=$_isl_saved_CFLAGS
 LDFLAGS=$_isl_saved_LDFLAGS
 LIBS=$_isl_saved_LIBS
diff --git a/configure b/configure
index 090615f..a6495c4 100755
--- a/configure
+++ b/configure
@@ -1492,7 +1492,7 @@ Optional Features:
   build static libjava [default=no]
   --enable-bootstrap  enable bootstrapping [yes if native build]
   --disable-isl-version-check
-  disable check for ISL version
+  disable check for isl version
   --enable-ltoenable link time optimization support
   --enable-linker-plugin-configure-flags=FLAGS
   additional flags for configuring linker plugins
@@ -1553,8 +1553,8 @@ Optional Packages:
  

[PTX] xfail sibcall test

2015-12-16 Thread Nathan Sidwell

PTX doesn't support sibcalls, so this test is doomed to fail.

2015-12-16  Nathan Sidwell  

	* gcc.dg/sibcall-9.c: Xfail for nvptx.

Index: gcc.dg/sibcall-9.c
===
--- gcc.dg/sibcall-9.c	(revision 231689)
+++ gcc.dg/sibcall-9.c	(working copy)
@@ -5,7 +5,7 @@
Copyright (C) 2002 Free Software Foundation Inc.
Contributed by Hans-Peter Nilsson*/
 
-/* { dg-do run { xfail { { cris-*-* crisv32-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { cris-*-* crisv32-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* nvptx-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */


RE: [Patch] Fix for MIPS PR target/65604

2015-12-16 Thread Moore, Catherine


> -----Original Message-----
> From: Steve Ellcey [mailto:sell...@imgtec.com]
> Sent: Tuesday, December 15, 2015 4:09 PM
> To: Moore, Catherine
> Cc: gcc-patches@gcc.gnu.org; matthew.fort...@imgtec.com
> Subject: RE: [Patch] Fix for MIPS PR target/65604
> 
> On Tue, 2015-12-15 at 15:13 +, Moore, Catherine wrote:
> 
> >
> > Hi Steve, The patch is OK.  Will you please add a test case and repost?
> > Thanks,
> > Catherine
> 
> Here is the patch with a test case.
> 

Looks good, thanks.


Re: C PATCH for c/64637 (better location for -Wunused-value)

2015-12-16 Thread Jeff Law

On 12/16/2015 07:58 AM, Marek Polacek wrote:

The following improves the location for "statement with no effect" warning by
using the location of the expression if available.  Can't use EXPR_LOCATION as
*_DECLs still don't carry a location.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-16  Marek Polacek  

PR c/64637
* c-typeck.c (c_process_expr_stmt): Use location of the expression if
available.

* gcc.dg/pr64637.c: New test.

OK.
jeff



Re: ipa-cp heuristics fixes

2015-12-16 Thread Jakub Jelinek
On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> I am trying to understand Jakub's debug code and perhaps it can be improved.
> But in the case of optimized out unused parameters I think it is perfectly
> reasonable to say that the variable was optimized out.

As long as the values that would be passed to the unused parameters are
constant or live in memory or registers that aren't clobbered by the call in
the caller, we have the ability to express that in the debug info now, and
we should.

> As you can see, the testcase explicitly prevents ipa-cp constant propagation
> by the asm statement.  We can just update the testcases to use the parameters
> and test the original issue again.

No, the testcase intentionally wants to test unused arguments, they happen
in quite a lot of code and "optimized out" is not really a desirable answer if
we can provide the values some other way.

Jakub


Re: ipa-cp heuristics fixes

2015-12-16 Thread Jan Hubicka
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > (good_cloning_opportunity_p): Likewise.
> > (gather_context_independent_values): Do not return true when
> > polymorphic call context is known or when we have known aggregate
> > value of unused parameter.
> > (estimate_local_effects): Try to create clone for all context
> > when either some params are substituted or devirtualization is possible
> > or some params can be removed; use local flag instead of
> > node->will_be_removed_from_program_if_no_direct_calls_p.
> > (identify_dead_nodes): Likewise.
> 
> This commit breaks several guality tests on S/390x:

The patch changes the ipa-cp heuristics so that it will clone in order to
remove an unused parameter.  Previously it did know how to remove an unused
parameter, but would do so only if one of the function's parameters (perhaps
the unused one) was also constant.  This is inconsistent.  As a result we do
more ipa-cp cloning, and I suppose we are more likely to hit such failures in
guality, whose simplified testcases often contain unused code:
int __attribute__((noinline))
foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
{
  char *x = __builtin_alloca (arg7);
  int __attribute__ ((aligned(32))) y;

  y = 2;
  asm (NOP : "=m" (y), "=m" (b) : "m" (y));
  x[0] = 25;
  asm (NOP : "=m" (x[0]), "=m" (a) : "m" (x[0]), "m" (b));
  return y;
}

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
  asm volatile ("" :: "r" (l));
  return 0;
}

I am trying to understand Jakub's debug code and perhaps it can be improved.
But in the case of optimized out unused parameters I think it is perfectly
reasonable to say that the variable was optimized out.

As you can see, the testcase explicitly prevents ipa-cp constant propagation
by the asm statement.  We can just update the testcases to use the parameters
and test the original issue again.
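
To make the effect of cloning for unused parameters concrete, here is a hand-written C sketch of the before and after shapes. It is not compiler output: scale_orig, scale_clone, caller_before and caller_after are hypothetical names, and the "clone" is simply written by hand to show that the caller no longer has to compute the values for the dropped parameters, which is why a debugger may then report them as optimized out.

```c
#include <assert.h>

/* Original function: arg1..arg3 are never used, only arg4 matters.  */
int
scale_orig (int arg1, int arg2, int arg3, int arg4)
{
  (void) arg1;
  (void) arg2;
  (void) arg3;
  return arg4 * 2;
}

/* Hand-written equivalent of a clone with the unused parameters
   removed.  */
int
scale_clone (int arg4)
{
  return arg4 * 2;
}

/* Caller before the transformation: every argument is computed.  */
int
caller_before (int l)
{
  return scale_orig (l + 1, l + 2, l + 3, l + 30);
}

/* Caller after the transformation: l + 1, l + 2 and l + 3 are no
   longer computed anywhere.  */
int
caller_after (int l)
{
  return scale_clone (l + 30);
}

/* Both callers must return the same value for any L.  */
void
check_equivalence (int l)
{
  assert (caller_before (l) == caller_after (l));
}
```

The observable result is unchanged, so the only visible difference is in what the debug information can say about the removed arguments.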
> 
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 ==

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-16 Thread Thomas Schwinge
Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin  wrote:
> [updated patch]

This regresses libgomp.oacc-c-c++-common/declare-4.c compilation for
nvptx offloading:

spawn [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ 
[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/declare-4.c
 -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/ 
-B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs 
-I[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp 
-I[...]/source-gcc/libgomp/testsuite/../../include 
-I[...]/source-gcc/libgomp/testsuite/.. -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never 
-B/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B/bin 
-B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/libexec/gcc/x86_64-pc-linux-gnu/6.0.0
 -B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/bin 
-fopenacc -I[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 
-L[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -lm -o ./declare-4.exe
ptxas /tmp/ccLXqNjE.o, line 50; error   : State space mismatch between instruction and address in instruction 'ld'
ptxas /tmp/ccLXqNjE.o, line 50; error   : Unknown symbol 'b_linkptr'
ptxas /tmp/ccLXqNjE.o, line 50; error   : Label expected for forward reference of 'b_linkptr'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: [...]/build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.

That "b_linkptr" symbol is not declared/referenced in the test case
itself, libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c:

/* { dg-do run  { target openacc_nvidia_accel_selected } } */

#include 
#include 

float b;
#pragma acc declare link (b)

#pragma acc routine
int
func (int a)
{
  b = a + 1;

  return b;
}

int
main (int argc, char **argv)
{
  float a;

  a = 2.0;

#pragma acc parallel copy (a)
  {
    b = a;
    a = 1.0;
    a = a + b;
  }

  if (a != 3.0)
    abort ();

  a = func (a);

  if (a != 4.0)
    abort ();

  return 0;
}

..., but I see that the "b_linkptr" identifier is generated for "b" in
the new gcc/lto/lto.c:offload_handle_link_vars based on whether attribute
"omp declare target link" is set, so maybe we fail to set that one as
appropriate?  Jim, as the main author of the OpenACC declare
implementation, would you please have a look?  I have not yet studied in
detail the thread, starting at
,
that resulted in the trunk r231655 commit:

> gcc/c-family/
>   * c-common.c (c_common_attribute_table): Handle "omp declare target
>   link" attribute.
> gcc/
>   * cgraphunit.c (output_in_order): Do not assemble "omp declare target
>   link" variables in ACCEL_COMPILER.
>   * gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
>   "omp declare target link" variables.
>   * lto/lto.c: Include stringpool.h and fold-const.h.
>   (offload_handle_link_vars): New static function.
>   (lto_main): Call offload_handle_link_vars.
>   * omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
>   declare target link" variables.
>   (add_decls_addresses_to_decl_constructor): For "omp declare target link"
>   variables output address of the artificial pointer instead of address of
>   the variable.  Set most significant bit of the size to mark them.
>   (pass_data_omp_target_link): New pass_data.
>   (pass_omp_target_link): New class.
>   (find_link_var_op): New static function.
>   (make_pass_omp_target_link): New function.
>   * passes.def: Add pass_omp_target_link.
>   * tree-pass.h (make_pass_omp_target_link): Declare.
>   * varpool.c (symbol_table::output_variables): Do not assemble "omp
>   declare target link" variables in ACCEL_COMPILER.
> libgomp/
>   * libgomp.h (REFCOUNT_LINK): Define.
>   (struct splay_tree_key_s): Add link_key.
>   * target.c (gomp_map_vars): Treat REFCOUNT_LINK objects as not mapped.
>   Replace target address of the pointer with target address of newly
>   mapped object in the splay tree.  Set link pointer on target to the
>   device address of the mapped object.
>   (gomp_unmap_vars): Restore target address of the pointer in the splay
>   tree for REFCOUNT_LINK objects after unmapping.
>   (gomp_load_image_to_device): Set refcount to REFCOUNT_LINK for "omp
>   declare target link" objects.
>   (gomp_unload_image_from_device): Replace j with i.  Force unmap of all
>   "omp declare target link" objects, which were mapped for the image.
>   (gomp_exit_d

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-16 Thread Tom de Vries

On 10/12/15 14:14, Tom de Vries wrote:

[ copy-pasting-with-quote from
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
reason I didn't get this email ]


On Thu, 3 Dec 2015, Tom de Vries wrote:

The flag is set here in expand_omp_target:
...
12682 /* Prevent IPA from removing child_fn as unreachable,
 since there are no
12683refs from the parent function to child_fn in offload
 LTO mode.  */
12684 if (ENABLE_OFFLOADING)
12685   cgraph_node::get (child_fn)->mark_force_output ();
...



How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?


That descriptor is the offload table, which is emitted in
omp_finish_file. The function iterates over vectors offload_vars and
offload_funcs.

[ I would guess there's a one-to-one correspondence between
symtab_node::offloadable and membership of either offload_vars or
offload_funcs. ]


I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.


I've tried an approach where I add a test for node->offloadable next to
each test for node->force_output, except for the test in the nonlocal_p
def in ipa_pta_execute. But I didn't (yet) manage to make that work.


I guess setting forced_by_abi instead would also mean child_fn is not
removed
as unreachable, while still allowing optimizations:
...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
 to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
 symbols promoted to static and it does not inhibit
 optimization.  */
  unsigned forced_by_abi : 1;
...

But I suspect that other optimizations (than ipa-pta) might break
things.


How so?


Probably it's more accurate to say that I do not understand very well the
difference between force_output and forced_by_abi, or what class of
optimizations is enabled by using forced_by_abi instead of
force_output.


Essentially we have two situations:
- in the host compiler, there is no need for the force_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed


Actually, things are slightly more complicated, I realize now. There's
also the distinction between:
- symbols declared as offloadable in the source code, and
- symbols created by the compiler and marked offloadable


I wonder if setting the force_output flag only when streaming the
bytecode for
offloading would work. That way, it wouldn't be set in the host
compiler,
while being set in the accelerator compiler.


Yeah, that was my original thinking btw.


FTR, I've tried that approach, as attached. It fixed the
goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
using an accelerator configuration) without any regressions.


How about this patch?

We remove the setting of force_output when:
- encountering offloadable symbols in the frontend, or
- creating offloadable symbols in expand-omp.

Instead, we set force_output in input_offload_tables.

This is an improvement because:
- it moves the force_output setting to a single location
- it does the force_output setting ALAP
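
As a rough illustration of the idea (a toy model, not GCC code: the struct
and every toy_* name are made up for this sketch), marking force_output only
while reading the offload tables means that a node is kept alive solely in a
compiler that actually reads those tables:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a symbol-table node; not GCC's symtab_node.  */
struct toy_node
{
  const char *name;
  bool offloadable;   /* listed in the (toy) offload tables */
  bool referenced;    /* has an explicit reference from live code */
  bool force_output;  /* must not be removed as unreachable */
};

/* Model of reading the offload tables: every table entry gets
   force_output.  A compiler that never reads the tables never
   sets the flag.  */
static void
toy_input_offload_tables (struct toy_node *nodes, size_t n)
{
  assert (nodes != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (nodes[i].offloadable)
      nodes[i].force_output = true;
}

/* Model of unreachable-node removal: a node survives if it is
   referenced or forced.  Returns the number of surviving nodes.  */
static size_t
toy_count_kept (const struct toy_node *nodes, size_t n)
{
  assert (nodes != NULL || n == 0);
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (nodes[i].referenced || nodes[i].force_output)
      kept++;
  return kept;
}
```

In this model an unreferenced offloadable node is dropped until
toy_input_offload_tables has run, after which it is kept.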

Thanks,
- Tom
Mark symbols in offload tables with force_output in read_offload_tables

2015-12-15  Tom de Vries  

	* c-parser.c (c_parser_oacc_declare, c_parser_omp_declare_target): Don't
	set force_output.

	* parser.c (cp_parser_oacc_declare, cp_parser_omp_declare_target): Don't
	set force_output.

	* omp-low.c (expand_omp_target): Don't set force_output.
	* varpool.c (varpool_node::get_create): Same.
	* lto-cgraph.c (input_offload_tables): Mark entries in offload_vars and
	offload_funcs with force_output.

---
 gcc/c/c-parser.c | 10 ++
 gcc/cp/parser.c  | 10 ++
 gcc/lto-cgraph.c |  9 +
 gcc/omp-low.c|  5 -
 gcc/varpool.c|  1 -
 5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 124c30b..6e6f4b8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13527,10 +13527,7 @@ c_parser_oacc_declare (c_parser *parser)
 		{
 		  g->have_offload = true;
 		  if (is_a  (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_output = 1;
-			}
+			vec_safe_push (offload_vars, decl);
 		}
 		}
 	}
@@ -16412,10 +16409,7 @@ c_parser_omp_declare_target (c_parser *parser)
 		{
 		  g->have_offload = true;
 		  if (is_a  (node))
-		{
-		  vec_safe_push (offload_vars, t);
-		  node->force_output = 1;
-		}
+		vec_safe_push (offload_vars, t);
 		}
 	}
 	}
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..340cc4a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -35091,10 +35091,7 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 		{
 		  g->have_offload = true;
 		  if (is_a  (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_

Re: Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-16 Thread Martin Sebor

I think this caused PR68932 - FAIL:
obj-c++.dg/property/at-property-23.mm -fgnu-runtime (internal compiler
error)


Sorry about that. I'll look into it today.

Martin


Re: [C++ Patch] Use default arguments in one more place

2015-12-16 Thread Jason Merrill

OK.

Jason


Re: [PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-16 Thread Bernd Edlinger


On 16.12.2015 05:59, Bernd Edlinger wrote:
> Hi,
>
> On 16.12.2015 00:55 Bernd Schmidt wrote:
>> On 12/15/2015 10:13 PM, Bernd Edlinger wrote:
>>> due to recent discussion on the basic asm, and the special handling
>>> of ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler
>>> for ia64, but that did not work, because it seems to be impossible to
>>> build it without having a stdlib.h.
>>
>> Actually David Howells has complained to me about this as well, it 
>> seems to be a problem when building a toolchain for kernel compilation.
>
> yes.  I am not sure, if this is also problematic, when building a 
> cross-glibc,
> then I need also a cross C compiler first, build glibc, and install .h 
> and objects
> to the sysroot, then I build gcc again, with all languages.
>
>>
>>> With the attached patch, I was finally able to build the cross
>>> compiler, by declaring abort in the way as it is done already in many
>>> other places at libgcc.
>>
>> Can you just use __builtin_abort ()? Ok with that change.
>>
>
> I will try, but ./config/ia64/unwind-ia64.c includes this header,
> and it also uses abort () at many places.  So I'd expect warnings
> there.
>
>

OK, no new warnings, because tsystem.h defines a prototype of abort.

=> checked in as r231697.
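
For readers unfamiliar with the suggestion above, here is a minimal sketch of
why __builtin_abort helps in a bare-metal build: it is a compiler built-in in
GCC, so the file needs neither <stdlib.h> nor its own declaration of abort.
The helper checked_div is hypothetical and only exists for this example; it
is not taken from the ia64 sources.

```c
/* Deliberately no #include <stdlib.h>: __builtin_abort is a GCC
   built-in, so no declaration of abort is required here.  */

/* Hypothetical helper: divide, aborting on a zero divisor.  */
int
checked_div (int num, int den)
{
  if (den == 0)
    __builtin_abort ();
  return num / den;
}
```

The call still ends up as a call to abort at link time, so the target
environment has to provide that symbol.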

But I see that unwind-ia64.c uses not only abort, but also malloc and
free.

If I see that right, the stack walk will crash if malloc fails. What happens
on this target if new throws out_of_memory?

I think this function should be rewritten to use alloca instead.
(I have an idea how this could be done, but no way to test a patch :( )


Bernd.


[Ping]Re: [AArch64] Simplify TLS pattern by hardcoding relocation modifiers into pattern

2015-12-16 Thread Jiong Wang



On 10/09/15 12:28, Jiong Wang wrote:

TLS instruction sequences always have a fixed format, so there is no need
to use operand modifiers.  We can hardcode the relocation modifiers into the
instruction patterns, and all those redundant checks in aarch64_print_operand
can then be removed.

OK for trunk?

2015-09-10  Jiong Wang  

gcc/
   * config/aarch64/aarch64.md (ldr_got_tiny): Hardcode relocation
   modifiers.
   (tlsgd_small): Likewise.
   (tlsgd_tiny): Likewise.
   (tlsie_small_): Likewise.
   (tlsie_small_sidi): Likewise.
   (tlsie_tiny_): Likewise.
   (tlsie_tiny_sidi): Likewise.
   (tlsle12_): Likewise.
   (tlsle24_): Likewise.
   (tlsdesc_small_): Likewise.
   (tlsdesc_small_pseudo_): Likewise.
   (tlsdesc_tiny_): Likewise.
   (tlsdesc_tiny_pseudo_): Likewise.
   * config/aarch64/aarch64.c (aarch64_print_operand): Delete useless
   check on 'A', 'L', 'G'.
   


Ping ~

This patch makes no functional change; it just cleans up the
unnecessary use of output modifiers.

All these instruction sequences have a fixed format, so we can just
hardcode the relocation modifiers into the instruction patterns; all those
redundant checks in aarch64_print_operand can then be removed.




Re: C PATCH for c/64637 (better location for -Wunused-value)

2015-12-16 Thread Marek Polacek
On Wed, Dec 16, 2015 at 10:04:05AM -0500, David Malcolm wrote:
> On Wed, 2015-12-16 at 15:58 +0100, Marek Polacek wrote:
> > The following improves the location for "statement with no effect" warning 
> > by
> > using the location of the expression if available.  Can't use EXPR_LOCATION 
> > as
> > *_DECLs still don't carry a location.
> 
> Out of interest, does it emit sane underlined ranges for these cases,
> with the patch?

Yes, it emits what I'd expect, e.g.:

pr64637.c:10:28: warning: statement with no effect [-Wunused-value]
   for (int i = 0; i < b; i + b)
  ~~^~~
Similarly for the rest.

(Yes, I could've used dg-begin-multiline-output + dg-end-multiline-output to
check that, but I think what I have right now in the test should be enough.)

Marek


Re: C PATCH for c/64637 (better location for -Wunused-value)

2015-12-16 Thread David Malcolm
On Wed, 2015-12-16 at 16:09 +0100, Marek Polacek wrote:
> On Wed, Dec 16, 2015 at 10:04:05AM -0500, David Malcolm wrote:
> > On Wed, 2015-12-16 at 15:58 +0100, Marek Polacek wrote:
> > > The following improves the location for "statement with no effect" 
> > > warning by
> > > using the location of the expression if available.  Can't use 
> > > EXPR_LOCATION as
> > > *_DECLs still don't carry a location.
> > 
> > Out of interest, does it emit sane underlined ranges for these cases,
> > with the patch?
> 
> Yes, it emits what I'd expect, e.g.:
> 
> pr64637.c:10:28: warning: statement with no effect [-Wunused-value]
>for (int i = 0; i < b; i + b)
>   ~~^~~
> Similarly for the rest.
> 
> (Yes, I could've used dg-begin-multiline-output + dg-end-multiline-output to
> check that, but I think what I have right now in the test should be enough.)

Excellent; thanks!



Re: C PATCH for c/64637 (better location for -Wunused-value)

2015-12-16 Thread David Malcolm
On Wed, 2015-12-16 at 15:58 +0100, Marek Polacek wrote:
> The following improves the location for "statement with no effect" warning by
> using the location of the expression if available.  Can't use EXPR_LOCATION as
> *_DECLs still don't carry a location.

Out of interest, does it emit sane underlined ranges for these cases,
with the patch?

> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-12-16  Marek Polacek  
> 
>   PR c/64637
>   * c-typeck.c (c_process_expr_stmt): Use location of the expression if
>   available.
> 
>   * gcc.dg/pr64637.c: New test.
> 
> diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
> index 9d6c604..a147ac6 100644
> --- gcc/c/c-typeck.c
> +++ gcc/c/c-typeck.c
> @@ -10131,7 +10131,7 @@ c_process_expr_stmt (location_t loc, tree expr)
>   out which is the result.  */
>if (!STATEMENT_LIST_STMT_EXPR (cur_stmt_list)
>&& warn_unused_value)
> -emit_side_effect_warnings (loc, expr);
> +emit_side_effect_warnings (EXPR_LOC_OR_LOC (expr, loc), expr);
>  
>exprv = expr;
>while (TREE_CODE (exprv) == COMPOUND_EXPR)
> diff --git gcc/testsuite/gcc.dg/pr64637.c gcc/testsuite/gcc.dg/pr64637.c
> index e69de29..779ff50 100644
> --- gcc/testsuite/gcc.dg/pr64637.c
> +++ gcc/testsuite/gcc.dg/pr64637.c
> @@ -0,0 +1,25 @@
> +/* PR c/64637 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wunused" } */
> +
> +void g ();
> +
> +void
> +f (int b)
> +{
> +  for (int i = 0; i < b; i + b) /* { dg-warning "28:statement with no 
> effect" } */
> +g ();
> +  // PARM_DECLs still don't have a location, don't expect an exact location.
> +  for (int i = 0; i < b; b) /* { dg-warning "statement with no effect" } */
> +g ();
> +  for (int i = 0; i < b; !i) /* { dg-warning "26:statement with no effect" } 
> */
> +g ();
> +  for (!b;;) /* { dg-warning "8:statement with no effect" } */
> +g ();
> +  for (;; b * 2) /* { dg-warning "13:statement with no effect" } */
> +g ();
> +  ({
> + b / 5; /* { dg-warning "8:statement with no effect" } */
> + b ^ 5;
> +   });
> +}
> 
>   Marek




[PATCH] Fix PR68707, 67323

2015-12-16 Thread Richard Biener

The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).

It's a heuristic that may end up vectorizing fewer loops, or vectorizing
loops in a less optimal way.
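
The shape of the heuristic can be sketched as a small predicate. This is a
simplified toy model, not the vectorizer code: the struct and the toy_* names
are made up, and the extra conditions in the real hunk (loop vectorization
only, a data reference being present) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one load group; not a GCC data structure.  */
struct toy_load
{
  bool strided;               /* models STMT_VINFO_STRIDED_P */
  bool load_lanes_supported;  /* models vect_load_lanes_supported (...) */
};

/* Return true if the SLP instance should be dropped in favour of
   load/store-lanes: the loads are permuted, the store can use
   store-lanes, and every load is either strided or can use
   load-lanes.  */
static bool
toy_prefer_lanes (bool loads_permuted, bool store_lanes_supported,
                  const struct toy_load *loads, size_t n)
{
  assert (loads != NULL || n == 0);
  if (!loads_permuted || !store_lanes_supported)
    return false;
  for (size_t i = 0; i < n; i++)
    if (!loads[i].strided && !loads[i].load_lanes_supported)
      return false;   /* this load has no lanes alternative; keep SLP */
  return true;
}
```

A single load group that is neither strided nor load-lanes capable is enough
to keep the SLP instance in this model.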

Thus I wait for your ok (it's essentially ARM specific).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Ok?  It will require a bunch of vectorizer tests to be adjusted for
ARM I think.

Thanks,
Richard.

2015-12-16  Richard Biener  

PR tree-optimization/68707
PR tree-optimization/67323
* tree-vect-slp.c (vect_analyze_slp_instance): Drop SLP instances
if they can be vectorized using load/store-lane instructions.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 231673)
--- gcc/tree-vect-slp.c (working copy)
*** vect_analyze_slp_instance (vec_info *vin
*** 1808,1813 
--- 1802,1836 
  }
  }
  
+   /* If the loads and stores can be handled with load/store-lane
+  instructions do not generate this SLP instance.  */
+   if (is_a  (vinfo)
+ && loads_permuted
+ && dr && vect_store_lanes_supported (vectype, group_size))
+   {
+ slp_tree load_node;
+ FOR_EACH_VEC_ELT (loads, i, load_node)
+   {
+ gimple *first_stmt = GROUP_FIRST_ELEMENT
+ (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (load_node)[0]));
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (first_stmt);
+ if (! STMT_VINFO_STRIDED_P (stmt_vinfo)
+ && ! vect_load_lanes_supported
+(STMT_VINFO_VECTYPE (stmt_vinfo),
+ GROUP_SIZE (stmt_vinfo)))
+   break;
+   }
+ if (i == loads.length ())
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Built SLP cancelled: can use "
+"load/store-lanes\n");
+   vect_free_slp_instance (new_instance);
+   return false;
+   }
+   }
+ 
vinfo->slp_instances.safe_push (new_instance);
  
if (dump_enabled_p ())


Re: [PATCH][AArch64] Properly cost zero_extend+ashift forms of ubfi[xz]

2015-12-16 Thread James Greenhalgh
On Fri, Dec 04, 2015 at 09:30:45AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> We don't properly handle the patterns for the [us]bfiz and [us]bfx
> instructions when they have an extend+ashift form. For example, the
> *_ashl pattern.
> This leads to rtx costs recursing into the extend and assigning a cost to
> these patterns that is too large.
> 
> This patch fixes that oversight.
> I stumbled across this when working on a different combine patch and ended up 
> matching the above
> pattern, only to have it rejected for -mcpu=cortex-a53 due to the erroneous 
> cost.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-12-04  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_extend_bitfield_pattern_p):
> New function.
> (aarch64_rtx_costs, ZERO_EXTEND, SIGN_EXTEND cases): Use the above
> to handle extend+shift rtxes.
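
For illustration only (not part of the patch): the reason an extend+ashift
rtx deserves the cost of a single instruction is that a zero-extend of a
narrow left shift computes the same value as one unsigned bit-field insert in
zero. The sketch below checks that identity in plain C for the HImode case;
toy_ubfiz and toy_zext_ashift_hi are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of an unsigned bit-field insert in zero:
   take the low WIDTH bits of X and place them at bit LSB, with
   all other result bits zero.  */
static uint64_t
toy_ubfiz (uint64_t x, unsigned lsb, unsigned width)
{
  assert (width >= 1 && width < 64 && lsb + width <= 64);
  uint64_t mask = (UINT64_C (1) << width) - 1;
  return (x & mask) << lsb;
}

/* Models (zero_extend:DI (ashift:HI x shift)): shift within
   16 bits, then widen to 64 bits.  */
static uint64_t
toy_zext_ashift_hi (uint16_t x, unsigned shift)
{
  assert (shift < 16);
  return (uint64_t) (uint16_t) ((unsigned) x << shift);
}
```

For a 16-bit inner mode and a shift of n, the two agree when the bit-field
width is 16 - n, since the shift pushes the top n bits out of the narrow mode.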

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c97ecdc0859e0a24792a57aeb18b2e4ea35918f4..d180f6f2d37a280ad77f34caad8496ddaa6e01b2
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5833,6 +5833,50 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, 
> int *cost, bool speed)
>return false;
>  }
>  
> +/* Check whether X is a bitfield operation of the form shift + extend that
> +   maps down to a UBFIZ/SBFIZ/UBFX/SBFX instruction.  If so, return the
> +   operand to which the bitfield operation is applied to.  Otherwise return

No need for that second "to" at the end of the sentence.

> +   NULL_RTX.  */
> +
> +static rtx
> +aarch64_extend_bitfield_pattern_p (rtx x)
> +{
> +  rtx_code outer_code = GET_CODE (x);
> +  machine_mode outer_mode = GET_MODE (x);
> +
> +  if (outer_code != ZERO_EXTEND && outer_code != SIGN_EXTEND
> +  && outer_mode != SImode && outer_mode != DImode)
> +return NULL_RTX;
> +
> +  rtx inner = XEXP (x, 0);
> +  rtx_code inner_code = GET_CODE (inner);
> +  machine_mode inner_mode = GET_MODE (inner);
> +  rtx op = NULL_RTX;
> +
> +  switch (inner_code)
> +{
> +  case ASHIFT:
> + if (CONST_INT_P (XEXP (inner, 1))
> + && (inner_mode == QImode || inner_mode == HImode))
> +   op = XEXP (inner, 0);
> + break;
> +  case LSHIFTRT:
> + if (outer_code == ZERO_EXTEND && CONST_INT_P (XEXP (inner, 1))
> + && (inner_mode == QImode || inner_mode == HImode))
> +   op = XEXP (inner, 0);
> + break;
> +  case ASHIFTRT:
> + if (outer_code == SIGN_EXTEND && CONST_INT_P (XEXP (inner, 1))
> + && (inner_mode == QImode || inner_mode == HImode))
> +   op = XEXP (inner, 0);
> + break;
> +  default:
> + break;
> +}
> +
> +  return op;
> +}
> +
>  /* Calculate the cost of calculating X, storing it in *COST.  Result
> is true if the total cost of the operation has now been calculated.  */
>  static bool
> @@ -6521,6 +6565,14 @@ cost_plus:
> return true;
>   }
>  
> +  op0 = aarch64_extend_bitfield_pattern_p (x);
> +  if (op0)
> + {
> +   *cost += rtx_cost (op0, mode, ZERO_EXTEND, 0, speed);
> +   if (speed)
> + *cost += extra_cost->alu.bfx;
> +   return true;
> + }

Newline here.

>if (speed)
>   {
> if (VECTOR_MODE_P (mode))
> @@ -6552,6 +6604,14 @@ cost_plus:
> return true;
>   }
>  
> +  op0 = aarch64_extend_bitfield_pattern_p (x);
> +  if (op0)
> + {
> +   *cost += rtx_cost (op0, mode, SIGN_EXTEND, 0, speed);
> +   if (speed)
> + *cost += extra_cost->alu.bfx;
> +   return true;
> + }

And here.

>if (speed)
>   {
> if (VECTOR_MODE_P (mode))

OK with those changes.

Thanks,
James




C PATCH for c/64637 (better location for -Wunused-value)

2015-12-16 Thread Marek Polacek
The following improves the location for "statement with no effect" warning by
using the location of the expression if available.  Can't use EXPR_LOCATION as
*_DECLs still don't carry a location.
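
The fallback behaviour relied on here can be sketched in a few lines. This is
a toy model under stated assumptions: toy_location, TOY_UNKNOWN_LOCATION and
struct toy_expr are stand-ins invented for the example, not GCC's location_t
machinery.

```c
#include <assert.h>

typedef unsigned int toy_location;   /* stand-in for location_t */
#define TOY_UNKNOWN_LOCATION 0u      /* stand-in for UNKNOWN_LOCATION */

struct toy_expr
{
  /* TOY_UNKNOWN_LOCATION models a node that carries no location,
     such as a bare declaration.  */
  toy_location loc;
};

/* Prefer the expression's own location; fall back to the location
   of the enclosing statement when the expression has none.  */
static toy_location
toy_expr_loc_or_loc (const struct toy_expr *expr, toy_location fallback)
{
  assert (expr != 0);
  return expr->loc != TOY_UNKNOWN_LOCATION ? expr->loc : fallback;
}
```

With this shape, an expression such as "i + b" reports its own column, while
a bare "b" still falls back to the statement location.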

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-16  Marek Polacek  

PR c/64637
* c-typeck.c (c_process_expr_stmt): Use location of the expression if
available.

* gcc.dg/pr64637.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 9d6c604..a147ac6 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -10131,7 +10131,7 @@ c_process_expr_stmt (location_t loc, tree expr)
  out which is the result.  */
   if (!STATEMENT_LIST_STMT_EXPR (cur_stmt_list)
   && warn_unused_value)
-emit_side_effect_warnings (loc, expr);
+emit_side_effect_warnings (EXPR_LOC_OR_LOC (expr, loc), expr);
 
   exprv = expr;
   while (TREE_CODE (exprv) == COMPOUND_EXPR)
diff --git gcc/testsuite/gcc.dg/pr64637.c gcc/testsuite/gcc.dg/pr64637.c
index e69de29..779ff50 100644
--- gcc/testsuite/gcc.dg/pr64637.c
+++ gcc/testsuite/gcc.dg/pr64637.c
@@ -0,0 +1,25 @@
+/* PR c/64637 */
+/* { dg-do compile } */
+/* { dg-options "-Wunused" } */
+
+void g ();
+
+void
+f (int b)
+{
+  for (int i = 0; i < b; i + b) /* { dg-warning "28:statement with no effect" 
} */
+g ();
+  // PARM_DECLs still don't have a location, don't expect an exact location.
+  for (int i = 0; i < b; b) /* { dg-warning "statement with no effect" } */
+g ();
+  for (int i = 0; i < b; !i) /* { dg-warning "26:statement with no effect" } */
+g ();
+  for (!b;;) /* { dg-warning "8:statement with no effect" } */
+g ();
+  for (;; b * 2) /* { dg-warning "13:statement with no effect" } */
+g ();
+  ({
+ b / 5; /* { dg-warning "8:statement with no effect" } */
+ b ^ 5;
+   });
+}

Marek


Re: [PATCH][AArch64] PR target/68696 FAIL: gcc.target/aarch64/vbslq_u64_1.c scan-assembler-times bif\tv 1

2015-12-16 Thread James Greenhalgh
On Tue, Dec 08, 2015 at 09:21:29AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> The test gcc.target/aarch64/vbslq_u64_1.c started failing recently due to 
> some tree-level changes.
> This just exposed a deficiency in our xor-and-xor pattern for the vector 
> bit-select pattern:
> aarch64_simd_bsl_internal.
> 
> We now fail to match the rtx:
> (set (reg:V4SI 79)
> (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 32 v0 [ a ])
> (reg/v:V4SI 77 [ b ]))
> (reg:V4SI 34 v2 [ mask ]))
> (reg/v:V4SI 77 [ b ])))
> 
> whereas before combine attempted:
> (set (reg:V4SI 79)
> (xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 77 [ b ])
> (reg:V4SI 32 v0 [ a ]))
> (reg:V4SI 34 v2 [ mask ]))
> (reg/v:V4SI 77 [ b ])))
> 
> Note that just the order of the operands of the inner XOR has changed.
> This could be solved by making the second operand of the outer XOR a 4th 
> operand
> of the pattern, enforcing that it should be equal to operand 2 or 3 in the 
> pattern
> condition and performing the appropriate swapping in the output template.
> However, the aarch64_simd_bsl_internal pattern is expanded to by other
> places in aarch64-simd.md and updating all the callsites to add a 4th operand 
> is
> wasteful and makes them harder to understand.
> 
> Therefore this patch adds a new define_insn with the match_dup of operand 2 in
> the outer XOR.  I also had to update the alternatives/constraints in the 
> pattern
> and the output template. Basically it involves swapping operands 2 and 3 
> around in the
> constraints and output templates.

Yuck, but OK.

Thanks,
James



[PATCH] Fix PR68870

2015-12-16 Thread Richard Biener

This extends the previous fix for the CFG cleanup issue WRT dead SSA
defs to properly avoid doing anything fancy with conditions in the first pass.
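
A toy model of the two-phase idea, assuming a condition of the form
"lhs < rhs" (the types and toy_* names are invented for this sketch and do
not correspond to GIMPLE data structures): in the first pass only literal
constants are folded, and everything else is queued for the later iteration.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fold { TOY_FALSE = 0, TOY_TRUE = 1, TOY_UNKNOWN = 2 };

/* Toy operand: either a known integer constant or a name whose
   value is not known yet.  */
struct toy_operand
{
  bool is_constant;
  long value;        /* meaningful only when is_constant is set */
};

/* First-pass handling of a "lhs < rhs" condition: fold only when
   both operands are literal constants; otherwise set *REQUEUE so
   the block is revisited in the later, more aggressive pass.  */
static enum toy_fold
toy_fold_lt_first_pass (struct toy_operand lhs, struct toy_operand rhs,
                        bool *requeue)
{
  assert (requeue != 0);
  if (lhs.is_constant && rhs.is_constant)
    {
      *requeue = false;
      return lhs.value < rhs.value ? TOY_TRUE : TOY_FALSE;
    }
  *requeue = true;
  return TOY_UNKNOWN;
}
```

The requeue flag plays the role that setting a bit in the altered-blocks
bitmap plays in the patch.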

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-16  Richard Biener  

PR tree-optimization/68870
* tree-cfgcleanup.c (cleanup_control_expr_graph): Add first_p
parameter, if set only perform trivial constant folding.
Queue other blocks with conditions for later processing.
(cleanup_control_flow_bb): Add first_p parameter and pass it through.
(cleanup_tree_cfg_1): Pass true for the first iteration
cleanup_control_expr_graph.

* gcc.dg/torture/pr68870.c: New testcase.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 231673)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -78,7 +78,8 @@ remove_fallthru_edge (vec *
at block BB.  */
 
 static bool
-cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
+cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi,
+   bool first_p)
 {
   edge taken_edge;
   bool retval = false;
@@ -95,15 +96,26 @@ cleanup_control_expr_graph (basic_block
   switch (gimple_code (stmt))
{
case GIMPLE_COND:
- {
-   code_helper rcode;
-   tree ops[3] = {};
-   if (gimple_simplify (stmt, &rcode, ops, NULL, no_follow_ssa_edges,
-no_follow_ssa_edges)
-   && rcode == INTEGER_CST)
- val = ops[0];
-   break;
- }
+ /* During a first iteration on the CFG only remove trivially
+dead edges but mark other conditions for re-evaluation.  */
+ if (first_p)
+   {
+ val = const_binop (gimple_cond_code (stmt), boolean_type_node,
+gimple_cond_lhs (stmt),
+gimple_cond_rhs (stmt));
+ if (! val)
+   bitmap_set_bit (cfgcleanup_altered_bbs, bb->index);
+   }
+ else
+   {
+ code_helper rcode;
+ tree ops[3] = {};
+ if (gimple_simplify (stmt, &rcode, ops, NULL, no_follow_ssa_edges,
+  no_follow_ssa_edges)
+ && rcode == INTEGER_CST)
+   val = ops[0];
+   }
+ break;
 
case GIMPLE_SWITCH:
  val = gimple_switch_index (as_a  (stmt));
@@ -176,7 +188,7 @@ cleanup_call_ctrl_altering_flag (gimple
true if anything changes.  */
 
 static bool
-cleanup_control_flow_bb (basic_block bb)
+cleanup_control_flow_bb (basic_block bb, bool first_p)
 {
   gimple_stmt_iterator gsi;
   bool retval = false;
@@ -199,7 +211,7 @@ cleanup_control_flow_bb (basic_block bb)
   || gimple_code (stmt) == GIMPLE_SWITCH)
 {
   gcc_checking_assert (gsi_stmt (gsi_last_bb (bb)) == stmt);
-  retval |= cleanup_control_expr_graph (bb, gsi);
+  retval |= cleanup_control_expr_graph (bb, gsi, first_p);
 }
   else if (gimple_code (stmt) == GIMPLE_GOTO
   && TREE_CODE (gimple_goto_dest (stmt)) == ADDR_EXPR
@@ -680,7 +692,7 @@ cleanup_tree_cfg_1 (void)
 {
   bb = BASIC_BLOCK_FOR_FN (cfun, i);
   if (bb)
-   retval |= cleanup_control_flow_bb (bb);
+   retval |= cleanup_control_flow_bb (bb, true);
 }
 
   /* After doing the above SSA form should be valid (or an update SSA
@@ -708,7 +720,7 @@ cleanup_tree_cfg_1 (void)
   if (!bb)
continue;
 
-  retval |= cleanup_control_flow_bb (bb);
+  retval |= cleanup_control_flow_bb (bb, false);
   retval |= cleanup_tree_cfg_bb (bb);
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr68870.c
===
--- gcc/testsuite/gcc.dg/torture/pr68870.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68870.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+int printf (const char *, ...);
+
+int a, f, g;
+char b, d;
+short c;
+static short e;
+
+char
+fn1 ()
+{
+  for (; b; b++)
+{
+  int h = 5;
+  for (a = 0; a < 1; a++)
+   {
+ for (d = 0; d < 1; d++)
+   for (c = 0; c < 1; c++)
+ for (; e >= 0;)
+   return 5;
+ if (f)
+   h = 0;
+   }
+  if (h)
+   printf ("%d", 0);
+}
+  return g;
+}


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-16 Thread Tom de Vries

On 16/12/15 14:16, Richard Biener wrote:

On Mon, 14 Dec 2015, Tom de Vries wrote:


On 14/12/15 14:26, Richard Biener wrote:

On Sun, 13 Dec 2015, Tom de Vries wrote:


On 11/12/15 14:00, Richard Biener wrote:

On Fri, 11 Dec 2015, Tom de Vries wrote:


On 13/11/15 12:39, Jakub Jelinek wrote:

We simply have some compiler internal interface between the caller
and
callee of the outlined regions, each interface in between those has
its own structure type used to communicate the info;
we can attach attributes on the fields, or some flags to indicate
some
properties interesting from aliasing POV.  We don't really need to
perform
full IPA-PTA, perhaps it would be enough to a) record somewhere in
cgraph
the relationship in between such callers and callees (for offloading
regions
we already have "omp target entrypoint" attribute on the callee and
a single caller), tell LTO if possible not to split those into different
partitions if easily possible, and then just for these pairs perform
aliasing/points-to analysis in the caller and the result record
using
cliques/special attributes/whatever to the callee side, so that the
callee
(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias
analysis.


Hi,

This work-in-progress patch allows me to use IPA PTA information in
the
kernels pass group.

Since:
-  I'm running IPA PTA before ealias, and IPA PTA does not interpret
  restrict, and
- compute_may_alias doesn't run if IPA PTA information is present
I needed to convince ealias to do the restrict clique/base annotation.

It would be more logical to fit IPA PTA after ealias, but one is an
IPA
pass,
the other a regular one-function pass, so I would have to split the
containing
pass groups pass_all_early_optimizations and
pass_local_optimization_passes.
I'll give that a try now.



I've tried this approach, but realized that this changes the order in
which
non-openacc functions are processed in the compiler, so I've abandoned
this
idea.


Any comments?


I don't think you want to run IPA PTA before early
optimizations, it (and ealias) rely on some initial cleanup to
do anything meaningful with well-spent resources.

The local PTA "hack" also looks more like a waste of resources, but well
... teaching IPA PTA to honor restrict might be an impossible task
though I didn't think much about it other than handling it only for
nonlocal_p functions (for others we should see all incoming args
if IPA PTA works optimally).  The restrict tags will leak all over
the place of course and in the end no meaningful cliques may remain.



This patch:
- moves the kernels pass group to the first position in the pass list
after ealias where we're back in ipa mode
- inserts a new ipa pass to contain the gimple pass group called
pass_oacc_ipa
- inserts a version of ipa-pta before the pass group.


In principle I like this a lot, but

+  NEXT_PASS (pass_ipa_pta_oacc_kernels);
+  NEXT_PASS (pass_oacc_ipa);
+  PUSH_INSERT_PASSES_WITHIN (pass_oacc_ipa)

I think you can put pass_ipa_pta_oacc_kernels into the pass_oacc_ipa
group and thus just "clone" ipa_pta?


Done. But using a clone means using the same gate function, and that means
that this pass_ipa_pta instance no longer runs by default for openacc.

I've added enabling-by-default of fipa-pta for fopenacc in
default_options_optimization to fix that.


Hmm, but that enables both IPA PTA passes then?


Yes. An alternative could be to:
- have 'NEXT_PASS (pass_ipa_pta, true/false /* oacc_p */)' in the pass
  list,
- declare a new flag fipa-pta-oacc, and
- use fipa-pta or fipa-pta-oacc in the gate function depending on
  oacc_p.
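
A minimal sketch of what such a gate could look like, purely as an
assumption-laden illustration: fipa-pta-oacc is only the flag proposed in
this mail, and struct toy_flags and toy_ipa_pta_gate are names made up here,
not existing GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option state; the fields only mirror the flags
   named in the thread.  */
struct toy_flags
{
  bool ipa_pta;        /* -fipa-pta */
  bool ipa_pta_oacc;   /* proposed -fipa-pta-oacc (hypothetical) */
};

/* Gate for one pass instance: the instance created with oacc_p
   consults the OpenACC-specific flag, the other instance consults
   the regular one.  */
static bool
toy_ipa_pta_gate (const struct toy_flags *flags, bool oacc_p)
{
  assert (flags != 0);
  return oacc_p ? flags->ipa_pta_oacc : flags->ipa_pta;
}
```

With separate flags, enabling the OpenACC instance by default would not also
switch on the late IPA PTA instance.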


I suppose that's ok,
and if not enabling the "late" IPA PTA you'd want to re-set
gimple_df->ipa_pta.


sub-passes of IPA passes can
be both ipa passes and non-ipa passes.


Right. It does mean that I need yet another pass (pass_ipa_oacc_kernels) to do
the IPA/non-IPA transition at pass/sub-pass boundary:
...
   NEXT_PASS (pass_ipa_oacc);
   PUSH_INSERT_PASSES_WITHIN (pass_ipa_oacc)
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_ipa_oacc_kernels);
   PUSH_INSERT_PASSES_WITHIN (pass_ipa_oacc_kernels)
  /* out-of-ipa */
  NEXT_PASS (pass_oacc_kernels);
  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
...

OK for stage3 if bootstrap and reg-test succeeds?


Ok.


Committed as attached, with the following changes:
- test for opt->value of OPT_fopenacc in default_options_optimization,
  to prevent fipa-pta from being switched on by default for -fno-openacc.
- fixed pta -> pta2 scan failures.

Thanks,
- Tom


Add pass_oacc_ipa

2015-12-14  Tom de Vries  

	* opts.c (default_options_optimization): Set fipa-pta on by default for
	fopenacc.
	* passes.def: Move kernels pass group to pass_ipa_oacc.
	* tree-pass.h (make_pass_oacc_kernels2): Remove.
	(make_pass_ipa_oacc, make_pass_ipa_oacc_kernels): Declare.
	* tree-ssa-loop.c (pass_oacc_kernels2, make_pass_oacc_kernels2): Remove.
	(pass_ipa_oacc, pass_ipa_oacc_kernels)

Re: [PATCH PR68906]

2015-12-16 Thread Richard Biener
On Wed, Dec 16, 2015 at 3:36 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Here is updated patch which includes (1) a test on exit proposed by
> you and (2) another test from PR68021 which is caught by new check on
> counted loop. Outer-loop unswitching is not performed for both new
> tests.

As said I don't think

   /* If the loop is not expected to iterate, there is no need
   for unswitching.  */
-  iterations = estimated_loop_iterations_int (loop);
-  if (iterations >= 0 && iterations <= 1)
+  niters = number_of_latch_executions (loop);
+  if (!niters || chrec_contains_undetermined (niters))
 {

is good.  We do want to retain the estimated_loop_iterations check
as it takes into account profile data while yours lets through all
counted loops.

Also I don't see why SCEV needs to be able to analyze the IV to check
for validity.

Can you please split the patch into the part I suggested (which is ok)
and the rest?
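
For reference, the exit check from the suggested part can be sketched in
isolation.  The struct definitions below are simplified, hypothetical
stand-ins for GCC's loop, basic_block and edge types; only the loop_father
comparison mirrors the quoted hunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for GCC's data structures.  */
struct loop { struct loop *outer; };
struct basic_block { struct loop *loop_father; };  /* innermost containing loop */
struct edge { struct basic_block *src; };

/* Return true if EXIT can serve as the single exit of LOOP for outer-loop
   unswitching: it must exist, and its source block must belong to LOOP
   itself rather than to a loop nested inside it.  */
static bool
exit_belongs_to_loop (const struct edge *exit, const struct loop *loop)
{
  assert (loop != NULL);
  return exit != NULL && exit->src->loop_father == loop;
}
```

An exit edge whose source block sits in the inner loop is rejected, which
is the case the original single_exit () test let through.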

Thanks,
Richard.

>
> Bootstrapping and regression testing did not show any new failures.
>
> Is it OK for trunk?
>
> ChangeLog:
> 2014-12-16  Yuri Rumyantsev  
>
> PR tree-optimization/68021
> PR tree-optimization/68906
> * tree-ssa-loop-unswitch.c : Include couple header files.
> (tree_unswitch_outer_loop): Add check that an exit is not inside inner
> loop, use number_of_latch_executions to detect non-iterated loops.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/torture/pr68021.c: New test.
> * gcc.dg/torture/pr68906.c: Likewise.
>
> 2015-12-16 15:51 GMT+03:00 Richard Biener :
>> On Wed, Dec 16, 2015 at 1:14 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is simple patch which cures the issue with outer-loop unswitching
>>> - added invocation of number_of_latch_executions() to reject
>>> unswitching for non-iterated loops.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>> Is it OK for trunk?
>>
>> No, that looks like just papering over the issue.
>>
>> The issue (with the 2nd testcase at least) is that single_exit () accepts
>> an exit from the inner loop.
>>
>> Index: gcc/tree-ssa-loop-unswitch.c
>> ===
>> --- gcc/tree-ssa-loop-unswitch.c(revision 231686)
>> +++ gcc/tree-ssa-loop-unswitch.c(working copy)
>> @@ -431,7 +431,7 @@ tree_unswitch_outer_loop (struct loop *l
>>  return false;
>>/* Accept loops with single exit only.  */
>>exit = single_exit (loop);
>> -  if (!exit)
>> +  if (!exit || exit->src->loop_father != loop)
>>  return false;
>>/* Check that phi argument of exit edge is not defined inside loop.  */
>>if (!check_exit_phi (loop))
>>
>> fixes the runtime testcase for me (not suitable for the testsuite due
>> to the infinite looping though).
>>
>> Can you please bootstrap/test the above with your testcase?  The above patch
>> is ok if it passes testing (no time myself right now)
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog:
>>>
>>> 2014-12-16  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/68906
>>> * tree-ssa-loop-unswitch.c : Include couple header files.
>>> (tree_unswitch_outer_loop): Use number_of_latch_executions
>>> to reject non-iterated loops.
>>>
>>> gcc/testsuite/ChangeLog
>>> * gcc.dg/torture/pr68906.c: New test.


Re: [PATCH][AArch64] Avoid emitting zero immediate as zero register

2015-12-16 Thread James Greenhalgh
On Tue, Dec 15, 2015 at 11:17:35AM +, Wilco Dijkstra wrote:
> ping
> 
> > -Original Message-
> > From: Wilco Dijkstra [mailto:wdijk...@arm.com]
> > Sent: 28 October 2015 17:33
> > To: GCC Patches
> > Subject: [PATCH][AArch64] Avoid emitting zero immediate as zero register
> > 
> > Several instructions accidentally emit wzr/xzr even when the pattern
> > specifies an immediate. Fix this by removing the register specifier in
> > patterns that emit immediates.
> > 
> > Passes regression tests. OK for commit?
> > 
> > ChangeLog:
> > 2015-10-28  Wilco Dijkstra  
> > 
> > * gcc/config/aarch64/aarch64.md (ccmp_and): Emit
> > immediate as %1.
> > (ccmp_ior): Likewise.
> > (add3_compare0): Likewise.
> > (addsi3_compare0_uxtw): Likewise.
> > (add3nr_compare0): Likewise.
> > (compare_neg): Likewise.
> > (3): Likewise.

Remove the gcc/ from the ChangeLog entries. Otherwise, this is OK.

Thanks,
James



Re: [PATCH PR68906]

2015-12-16 Thread Yuri Rumyantsev
Richard,

Here is updated patch which includes (1) a test on exit proposed by
you and (2) another test from PR68021 which is caught by new check on
counted loop. Outer-loop unswitching is not performed for both new
tests.

Bootstrapping and regression testing did not show any new failures.

Is it OK for trunk?

ChangeLog:
2014-12-16  Yuri Rumyantsev  

PR tree-optimization/68021
PR tree-optimization/68906
* tree-ssa-loop-unswitch.c : Include couple header files.
(tree_unswitch_outer_loop): Add check that an exit is not inside inner
loop, use number_of_latch_executions to detect non-iterated loops.

gcc/testsuite/ChangeLog
* gcc.dg/torture/pr68021.c: New test.
* gcc.dg/torture/pr68906.c: Likewise.

2015-12-16 15:51 GMT+03:00 Richard Biener :
> On Wed, Dec 16, 2015 at 1:14 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is simple patch which cures the issue with outer-loop unswitching
>> - added invocation of number_of_latch_executions() to reject
>> unswitching for non-iterated loops.
>>
>> Bootstrapping and regression testing did not show any new failures.
>> Is it OK for trunk?
>
> No, that looks like just papering over the issue.
>
> The issue (with the 2nd testcase at least) is that single_exit () accepts
> an exit from the inner loop.
>
> Index: gcc/tree-ssa-loop-unswitch.c
> ===
> --- gcc/tree-ssa-loop-unswitch.c(revision 231686)
> +++ gcc/tree-ssa-loop-unswitch.c(working copy)
> @@ -431,7 +431,7 @@ tree_unswitch_outer_loop (struct loop *l
>  return false;
>/* Accept loops with single exit only.  */
>exit = single_exit (loop);
> -  if (!exit)
> +  if (!exit || exit->src->loop_father != loop)
>  return false;
>/* Check that phi argument of exit edge is not defined inside loop.  */
>if (!check_exit_phi (loop))
>
> fixes the runtime testcase for me (not suitable for the testsuite due
> to the infinite looping though).
>
> Can you please bootstrap/test the above with your testcase?  The above patch
> is ok if it passes testing (no time myself right now)
>
> Thanks,
> Richard.
>
>> ChangeLog:
>>
>> 2014-12-16  Yuri Rumyantsev  
>>
>> PR tree-optimization/68906
>> * tree-ssa-loop-unswitch.c : Include couple header files.
>> (tree_unswitch_outer_loop): Use number_of_latch_executions
>> to reject non-iterated loops.
>>
>> gcc/testsuite/ChangeLog
>> * gcc.dg/torture/pr68906.c: New test.




Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-16 Thread James Greenhalgh
On Wed, Dec 16, 2015 at 01:05:21PM +, Wilco Dijkstra wrote:
> James Greenhalgh wrote:
> > On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > > ping
> > >
> > > > -Original Message-
> > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > Sent: 06 November 2015 20:06
> > > > To: 'gcc-patches@gcc.gnu.org'
> > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > >
> > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the
> > > > register allocator always uses ALL_REGS even when it has a much higher
> > > > cost. The hook changes the class to either FP_REGS or GENERAL_REGS
> > > > depending on the mode of the register. This results in better register
> > > > allocation overall, fewer spills and reduced codesize - particularly in
> > > > SPEC2006 gamess.
> > > >
> > > > GCC regression passes with several minor fixes.
> > > >
> > > > OK for commit?
> > > >
> > > > ChangeLog:
> > > > 2015-11-06  Wilco Dijkstra  
> > > >
> > > > * gcc/config/aarch64/aarch64.c
> > > > (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > (test_corners_sisd_di): Improve force to SIMD register.
> > > > (test_corners_sisd_si): Likewise.
> > > > * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2.
> > > > * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > Remove scan-assembler check for ldr.
> > 
> > Drop the gcc/ from the ChangeLog.
> > 
> > > > --
> > > >  gcc/config/aarch64/aarch64.c   | 22 
> > > > ++
> > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> > 
> > These testsuite changes concern me a bit, and you don't mention them beyond
> > saying they are minor fixes...
> 
> Well any changes to register allocator preferencing would cause fallout in
> tests that are assuming which register is allocated, especially if they use
> nasty inline assembler hacks to do so...

Sure, but the testcases here each operate on data that should live in
FP_REGS given the initial conditions that the nasty hacks try to mimic -
that's what makes the regressions notable.

>
> > > >  #define FCVTDEF(ftype,itype) \
> > > >  void \
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > index 363f554..8465c89 100644
> > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > >  {
> > > >force_simd_di (b);
> > > >b = b >> 63;
> > > > +  force_simd_di (b);
> > > >b = b >> 0;
> > > >b += b >> 65; /* { dg-warning "right shift count >= width of type" } 
> > > > */
> > > > -  force_simd_di (b);
> > 
> > This one I don't understand, but seems to say that we've decided to move
> > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > another register allocator regression?
> 
> No, basically the register allocator is now making better decisions as to
> where to allocate integer variables. It will only allocate them to FP
> registers if they are primarily used by other FP operations. The
> force_simd_di inline assembler tries to mimic FP uses, and if there are
> enough of them at the right places then everything works as expected.  If
> however you do 3 consecutive integer operations then the allocator will now
> correctly prefer to allocate them to the integer registers (while previously
> it wouldn't, which is inefficient).

I'm not sure I understand this argument in the abstract (though I believe
it for some of the supported cores for the AArch64 target). At an abstract
level, given a set of operations which can execute in either FP_REGS or
GENERAL_REGS and initial and post conditions that allocate all input and
output registers from those operations to FP_REGS, I would expect those
operations to take place using FP_REGS? Your patch seems to break this
expectation?
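
To make the behaviour under discussion concrete, here is a minimal sketch
of a hook of this kind.  The enums are hypothetical stand-ins for GCC's
reg_class and machine_mode, and the rule "float and vector modes go to
FP_REGS, everything else to GENERAL_REGS" is an assumption used for
illustration; the patch description only says the choice depends on the
mode of the register:

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's register classes and mode categories.  */
enum reg_class { GENERAL_REGS, FP_REGS, ALL_REGS };
enum mode_kind { MODE_INT, MODE_FLOAT, MODE_VECTOR };

/* Narrow ALL_REGS to a single class based on the pseudo's mode.  Any other
   class chosen by the allocator is left untouched.  */
static enum reg_class
change_pseudo_allocno_class (enum mode_kind mode, enum reg_class allocno_class)
{
  /* Unknown mode kinds would be a caller bug in this sketch.  */
  assert (mode == MODE_INT || mode == MODE_FLOAT || mode == MODE_VECTOR);

  if (allocno_class != ALL_REGS)
    return allocno_class;

  return (mode == MODE_FLOAT || mode == MODE_VECTOR) ? FP_REGS : GENERAL_REGS;
}
```

Under such a rule an integer-mode pseudo that previously got ALL_REGS is
steered towards GENERAL_REGS even when its surrounding uses are in FP_REGS,
which is the situation the question above is about.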

> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c 
> > > > b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > index a49db3e..c5a9c52 100644
> > > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > > @@ -1,6 +1,6 @@
> > > >  /* Test vdup_lane intrinsics work correctly.  */
> > > >  /* { dg-do run } */
> > > 

Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-16 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Wed, Dec 16, 2015 at 01:51:45PM +0100, Ulrich Weigand wrote:
> > Dominik Vogt wrote:
> > > > r2 through r4 should be fine.  [ Not sure if there will be many (any?)
> > > > cases where one of those is unused but r5 isn't, however. ]
> > > 
> > > This can happen if the function only uses register pairs
> > > (__int128).  Actually I'm not sure whether r2 and r4 are valid
> > > candidates.
> > 
> > Huh?  Why not?
> 
> Because I'm not sure it is possible to write code where r2 (r4) is
> free but r3 (r5) is not - at least when s390_emit_prologue is
> called.  Writing code that uses r4 and r5 but not r3 was difficult
> enough:

Ah, OK.  I agree that it will rarely happen (it could in more
complex cases where something initially uses r2 but a very late
optimization pass manages to eliminate that use).

However, when it *is* free, it is a valid candidate in the
sense that it would be safe and correct to use it.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PTX] simplify calling struct

2015-12-16 Thread Nathan Sidwell
PTX's machine_function structure squirrels away the function type to calculate
the presence of variadic args later, rather than calculating it immediately.  It
also uses an rtx field as a boolean.  This patch reorganizes it with less
verbose names and more apt types.


I also noticed that nvptx_hard_regno_mode_ok wasn't being used, so that's
deleted.

nathan
2015-12-16  Nathan Sidwell  

	* config/nvptx/nvptx-protos.h (nvptx_hard_regno_mode_ok): Delete.
	* config/nvptx/nvptx.h (struct machine_function):
	Reimplement. Adjust all users.
	* config/nvptx/nvptx.c (nvptx_declare_function_name): Move stack
	and frame array generation earlier.
	(nvptx_call_args): Reimplement.
	(nvptx_expand_call): Adjust.
	(nvptx_hard_regno_mode_ok): Delete.
	(nvptx_reorg): Revert scan of hard regs.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231689)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -41,7 +41,6 @@ extern const char *nvptx_ptx_type_from_m
 extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
-extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
 extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
 #endif
 #endif
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231689)
+++ config/nvptx/nvptx.c	(working copy)
@@ -147,7 +147,7 @@ static struct machine_function *
 nvptx_init_machine_status (void)
 {
   struct machine_function *p = ggc_cleared_alloc ();
-  p->ret_reg_mode = VOIDmode;
+  p->return_mode = VOIDmode;
   return p;
 }
 
@@ -487,7 +487,7 @@ nvptx_strict_argument_naming (cumulative
 static rtx
 nvptx_libcall_value (machine_mode mode, const_rtx)
 {
-  if (cfun->machine->start_call == NULL_RTX)
+  if (!cfun->machine->doing_call)
 /* Pretend to return in a hard reg for early uses before pseudos can be
generated.  */
 return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
@@ -506,7 +506,7 @@ nvptx_function_value (const_tree type, c
 
   if (outgoing)
 {
-  cfun->machine->ret_reg_mode = mode;
+  cfun->machine->return_mode = mode;
   return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
 }
 
@@ -678,14 +678,14 @@ write_return_type (std::stringstream &s,
 	 optimization-level specific, so no caller can make use of
 	 this data, but more importantly for us, we must ensure it
 	 doesn't change the PTX prototype.  */
-  mode = (machine_mode) cfun->machine->ret_reg_mode;
+  mode = (machine_mode) cfun->machine->return_mode;
 
   if (mode == VOIDmode)
 	return return_in_mem;
 
-  /* Clear ret_reg_mode to inhibit copy of retval to non-existent
+  /* Clear return_mode to inhibit copy of retval to non-existent
 	 retval parameter.  */
-  cfun->machine->ret_reg_mode = VOIDmode;
+  cfun->machine->return_mode = VOIDmode;
 }
   else
 mode = promote_return (mode);
@@ -989,7 +989,18 @@ nvptx_declare_function_name (FILE *file,
 
   fprintf (file, "%s", s.str().c_str());
 
-  if (regno_reg_rtx[OUTGOING_STATIC_CHAIN_REGNUM] != const0_rtx)
+  /* Declare a local var for outgoing varargs.  */
+  if (cfun->machine->has_varadic)
+init_frame (file, STACK_POINTER_REGNUM,
+		UNITS_PER_WORD, crtl->outgoing_args_size);
+
+  /* Declare a local variable for the frame.  */
+  HOST_WIDE_INT sz = get_frame_size ();
+  if (sz || cfun->machine->has_chain)
+init_frame (file, FRAME_POINTER_REGNUM,
+		crtl->stack_alignment_needed / BITS_PER_UNIT, sz);
+
+  if (cfun->machine->has_chain)
 fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
 	 reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
 
@@ -1010,17 +1021,6 @@ nvptx_declare_function_name (FILE *file,
 	}
 }
 
-  /* Declare a local var for outgoing varargs.  */
-  if (cfun->machine->has_call_with_varargs)
-init_frame (file, STACK_POINTER_REGNUM,
-		UNITS_PER_WORD, crtl->outgoing_args_size);
-
-  /* Declare a local variable for the frame.  */
-  HOST_WIDE_INT sz = get_frame_size ();
-  if (sz || cfun->machine->has_call_with_sc)
-init_frame (file, FRAME_POINTER_REGNUM,
-		crtl->stack_alignment_needed / BITS_PER_UNIT, sz);
-
   /* Emit axis predicates. */
   if (cfun->machine->axis_predicate[0])
 nvptx_init_axis_predicate (file,
@@ -1036,7 +1036,7 @@ nvptx_declare_function_name (FILE *file,
 const char *
 nvptx_output_return (void)
 {
-  machine_mode mode = (machine_mode)cfun->machine->ret_reg_mode;
+  machine_mode mode = (machine_mode)cfun->machine->return_mode;
 
   if (mode != VOIDmode)
 fprintf (asm_out_file, "\tst.param%s\t[%s_out], %s;\n",
@@ -1076,20 +1076,28 @@ nvptx_get_drap_rtx (void)
argument to the next call.  */
 
 static void
-nvptx_call_args (rtx arg, tree funtype)
+nvptx_call_args (rtx arg, tree fntype)
 {
-  if (cfun->machine->start_call == NULL_RTX)
+  if (!cfun->machine->doing_call)
 

Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-16 Thread Dominik Vogt
On Wed, Dec 16, 2015 at 01:51:45PM +0100, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> > > r2 through r4 should be fine.  [ Not sure if there will be many (any?)
> > > cases where one of those is unused but r5 isn't, however. ]
> > 
> > This can happen if the function only uses register pairs
> > (__int128).  Actually I'm not sure whether r2 and r4 are valid
> > candidates.
> 
> Huh?  Why not?

Because I'm not sure it is possible to write code where r2 (r4) is
free but r3 (r5) is not - at least when s390_emit_prologue is
called.  Writing code that uses r4 and r5 but not r3 was difficult
enough:

  __int128 gi;
  const int c = 0x12345678u;
  int foo(void)
  {
gi += c;
return c;
  }

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH, obvious, i386] Remove duplicate check for CLZERO.

2015-12-16 Thread Kirill Yukhin
Hello,
ix86_target_macros_internal () contains a duplicated check of the `clzero'
option.
I've committed the patch at the bottom to main trunk as obvious.

gcc/
* config/i386/i386-c.c (ix86_target_macros_internal): Remove
duplicate check (__CLZERO__).

--
Thanks, K

Index: gcc/config/i386/i386-c.c
===
--- gcc/config/i386/i386-c.c(revision 231687)
+++ gcc/config/i386/i386-c.c(revision 231688)
@@ -439,8 +439,6 @@
 def_or_undef (parse_in, "__CLWB__");
   if (isa_flag & OPTION_MASK_ISA_MWAITX)
 def_or_undef (parse_in, "__MWAITX__");
-  if (isa_flag & OPTION_MASK_ISA_CLZERO)
-def_or_undef (parse_in, "__CLZERO__");
   if (TARGET_IAMCU)
 {
   def_or_undef (parse_in, "__iamcu");


Re: [RFC] Request for comments on ivopts patch

2015-12-16 Thread Richard Biener
On Tue, Dec 15, 2015 at 12:06 AM, Steve Ellcey  wrote:
> On Mon, 2015-12-14 at 09:57 +0100, Richard Biener wrote:
>
>> I don't know enough to assess the effect of this but
>>
>>  1) not all archs can do auto-incdec so either the comment is misleading
>> or the test should probably be amended
>>  2) I wonder why with the comment ("during the loop") you exclude 
>> IP_NORMAL/END
>>
>> that said, the comment needs to explain the situation better.
>>
>> Of course all such patches need some code-gen effect investigation
>> on more than one arch.
>>
>> [I wonder if a IV cost adjust target hook makes sense at some point]
>>
>> Richard.
>
> I like the idea of a target hook to modify IV costs.  What do you think
> about this?  I had to move some structures from tree-ssa-loop-ivopts.c
> to tree-ssa-loop-ivopts.h in order to give a target hooks access to
> information on the IV candidates.

IMHO it's better to not have empty default implementations but do

if (targetm.adjust_iv_cand_cost)
  targetm.adjust_iv_cand_cost (...);

which saves an unconditional indirect call on most targets.
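
A self-contained sketch of that pattern, assuming a cut-down, hypothetical
target vector and iv_cand struct rather than the real GCC definitions.  A
NULL entry means the target does not implement the hook:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down versions of the real structures.  */
struct iv_cand { int cost; };

struct target_hooks
{
  /* NULL when the target does not implement the hook.  */
  void (*adjust_iv_cand_cost) (struct iv_cand *);
};

/* Example target implementation: make every candidate one unit dearer.  */
static void
example_adjust (struct iv_cand *cand)
{
  cand->cost += 1;
}

/* Compute a candidate's cost, calling the hook only if it is set, so that
   targets without the hook pay no indirect call.  */
static int
determine_cost (const struct target_hooks *targetm, int base_cost)
{
  assert (targetm != NULL);
  struct iv_cand cand = { base_cost };
  if (targetm->adjust_iv_cand_cost)
    targetm->adjust_iv_cand_cost (&cand);
  return cand.cost;
}
```

The trade-off is a cheap, well-predicted NULL test on every target versus
an indirect call to an empty default function on every target.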

Generally I think we should prefer target independent heuristics if possible
or heuristics derived from target cost queries.  So this kind of hook
says we've given up (and it makes it too easy to change things just for
a single target).

Anyway, this is for next stage1 of course so there's some time to come up
with good ideas and do testing on more than one target.

Richard.

> Steve Ellcey
> sell...@imgtec.com
>
>
> 2015-12-14  Steve Ellcey  
>
> * doc/tm.texi.in (TARGET_ADJUST_IV_CAND_COST): New target function.
> * target.def (adjust_iv_cand_cost): New target function.
> * target.h (struct iv_cand): New forward declaration.
> * targhooks.c (default_adjust_iv_cand_cost): New default function.
> * targhooks.h (default_adjust_iv_cand_cost): Ditto.
> * tree-ssa-loop-ivopts.c (struct iv, enum iv_position, struct iv_cand)
> Moved to tree-ssa-loop-ivopts.h.
> (determine_iv_cost): Add call to targetm.adjust_iv_cand_cost.
> * tree-ssa-loop-ivopts.h (struct iv, enum iv_position, struct iv_cand)
> Copied here from tree-ssa-loop-ivopts.c.
>
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index a0a0a81..1ad4c2d 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8221,6 +8221,8 @@ and the associated definitions of those functions.
>
>  @hook TARGET_OFFLOAD_OPTIONS
>
> +@hook TARGET_ADJUST_IV_CAND_COST
> +
>  @defmac TARGET_SUPPORTS_WIDE_INT
>
>  On older ports, large integers are stored in @code{CONST_DOUBLE} rtl
> diff --git a/gcc/target.def b/gcc/target.def
> index d754337..6bdcfcc 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5846,6 +5846,12 @@ DEFHOOK
>   void, (tree *hold, tree *clear, tree *update),
>   default_atomic_assign_expand_fenv)
>
> +DEFHOOK
> +(adjust_iv_cand_cost,
> +"Allow target to modify the cost of a possible induction variable.",
> +void, (struct iv_cand *cand),
> + default_adjust_iv_cand_cost)
> +
>  /* Leave the boolean fields at the end.  */
>
>  /* True if we can create zeroed data by switching to a BSS section
> diff --git a/gcc/target.h b/gcc/target.h
> index ffc4d6a..6f55575 100644
> --- a/gcc/target.h
> +++ b/gcc/target.h
> @@ -139,6 +139,9 @@ struct ao_ref;
>  /* This is defined in tree-vectorizer.h.  */
>  struct _stmt_vec_info;
>
> +/* This is defined in tree-ivopts.h. */
> +struct iv_cand;
> +
>  /* These are defined in tree-vect-stmts.c.  */
>  extern tree stmt_vectype (struct _stmt_vec_info *);
>  extern bool stmt_in_inner_loop_p (struct _stmt_vec_info *);
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index dcf0863..0d0bbfc 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1961,4 +1961,10 @@ default_optab_supported_p (int, machine_mode, 
> machine_mode, optimization_type)
>return true;
>  }
>
> +/* Default implementation of TARGET_ADJUST_IV_CAND_COST.  */
> +void
> +default_adjust_iv_cand_cost (struct iv_cand *)
> +{
> +}
> +
>  #include "gt-targhooks.h"
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 47b5cfc..dd0481d 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -253,4 +253,6 @@ extern void default_setup_incoming_vararg_bounds 
> (cumulative_args_t ca ATTRIBUTE
>  extern bool default_optab_supported_p (int, machine_mode, machine_mode,
>optimization_type);
>
> +extern void default_adjust_iv_cand_cost (struct iv_cand *);
> +
>  #endif /* GCC_TARGHOOKS_H */
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 98dc451..5bfd232 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -126,21 +126,6 @@ avg_loop_niter (struct loop *loop)
>return niter;
>  }
>
> -/* Representation of the induction variable.  */
> -struct iv
> -{
> -  tree base;   /* Initial value of the iv.  */
> -  tree base_object;/* A memory object to that the induction variable 
> poin

Re: [PATCH] Fix Fortran deviceptr clause.

2015-12-16 Thread James Norris

Hi,

This is an update of my previous patch. Cesar (thanks!)
pointed out some issues with the original patch that
have now been addressed.

Regtested on x86_64

OK for trunk?

Thanks!
Jim


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 276f2f1..9350dc4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -812,19 +812,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    OMP_MAP_ALLOC))
 	continue;
   if ((mask & OMP_CLAUSE_DEVICEPTR)
-	  && gfc_match ("deviceptr ( ") == MATCH_YES)
-	{
-	  gfc_omp_namelist **list = &c->lists[OMP_LIST_MAP];
-	  gfc_omp_namelist **head = NULL;
-	  if (gfc_match_omp_variable_list ("", list, true, NULL, &head, false)
-	  == MATCH_YES)
-	{
-	  gfc_omp_namelist *n;
-	  for (n = *head; n; n = n->next)
-		n->u.map_op = OMP_MAP_FORCE_DEVICEPTR;
-	  continue;
-	}
-	}
+	  && gfc_match ("deviceptr ( ") == MATCH_YES
+	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
+   OMP_MAP_FORCE_DEVICEPTR))
+	continue;
   if ((mask & OMP_CLAUSE_USE_DEVICE)
 	  && gfc_match_omp_variable_list ("use_device (",
 	  &c->lists[OMP_LIST_USE_DEVICE], true)
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index db7cab3..98982c3 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -49,6 +49,51 @@ find_pset (int pos, size_t mapnum, unsigned short *kinds)
   return kind == GOMP_MAP_TO_PSET;
 }
 
+/* Handle the mapping pair that are presented when a
+   deviceptr clause is used with Fortran.  */
+
+static void
+handle_ftn_pointers (size_t mapnum, void **hostaddrs, size_t *sizes,
+		 unsigned short *kinds)
+{
+  int i;
+
+  for (i = 0; i < mapnum; i++)
+{
+  unsigned short kind1 = kinds[i] & 0xff;
+
+  /* Handle Fortran deviceptr clause.  */
+  if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
+	{
+	  unsigned short kind2;
+
+	  if (i < (signed)mapnum - 1)
+	kind2 = kinds[i + 1] & 0xff;
+	  else
+	kind2 = 0x;
+
+	  if (sizes[i] == sizeof (void *))
+	continue;
+
+	  /* At this point, we're dealing with a Fortran deviceptr.
+	 If the next element is not what we're expecting, then
+	 this is an instance of where the deviceptr variable was
+	 not used within the region and the pointer was removed
+	 by the gimplifier.  */
+	  if (kind2 == GOMP_MAP_POINTER
+	  && sizes[i + 1] == 0
+	  && hostaddrs[i] == *(void **)hostaddrs[i + 1])
+	{
+	  kinds[i+1] = kinds[i];
+	  sizes[i+1] = sizeof (void *);
+	}
+
+	  /* Invalidate the entry.  */
+	  hostaddrs[i] = NULL;
+	}
+}
+}
+
 static void goacc_wait (int async, int num_waits, va_list *ap);
 
 
@@ -88,6 +133,8 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
   thr = goacc_thread ();
   acc_dev = thr->dev;
 
+  handle_ftn_pointers (mapnum, hostaddrs, sizes, kinds);
+
   /* Host fallback if "if" clause is false or if the current device is set to
  the host.  */
   if (host_fallback)
@@ -172,8 +219,13 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			+ tgt->list[i].key->tgt_offset);
+{
+  if (tgt->list[i].key != NULL)
+	devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
++ tgt->list[i].key->tgt_offset);
+  else
+	devaddrs[i] = NULL;
+}
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
 			  async, dims, tgt);
@@ -224,6 +276,8 @@ GOACC_data_start (int device, size_t mapnum,
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
+  handle_ftn_pointers (mapnum, hostaddrs, sizes, kinds);
+
   /* Host fallback or 'do nothing'.  */
   if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
   || host_fallback)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
index f717d1b..2d4b707 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,29 +1,22 @@
 ! { dg-do run  { target openacc_nvidia_accel_selected } }
 
+! Tests to exercise the declare directive along with
+! the clauses: copy
+!  copyin
+!  copyout
+!  create
+!  present
+!  present_or_copy
+!  present_or_copyin
+!  present_or_copyout
+!  present_or_create
+
 module vars
   implicit none
   integer z
   !$acc declare create (z)
 end module vars
 
-subroutine subr6 (a, d)
-  implicit none
-  integer, parameter :: N = 8
-  integer :: i
-  integer :: a(N)
-  !$acc declare deviceptr (a)
-  integer :: d(N)
-
-  i = 0
-
-  !$acc parallel copy (d)
-do i = 1, N
-  d(i) = a(i) + a(i)
-end do
-  !$acc end parallel
-
-end subroutine
-
 subroutine subr5 (a, b, c, d)
   implicit none
   integer, parameter :: N = 8
@@ 

[PATCH] Fix PR68861

2015-12-16 Thread Richard Biener

The following fixes the SLP miscompile in PR68861 which happens because
we didn't think of stmts appearing multiple times in a SLP node when
doing the operand swapping support.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-12-16  Richard Biener  

PR tree-optimization/68861
* tree-vect-slp.c (vect_build_slp_tree): Properly handle
duplicate stmts when applying swapping to stmts.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 231675)
--- gcc/tree-vect-slp.c (working copy)
*** vect_build_slp_tree (vec_info *vinfo,
*** 1049,1059 
 if we end up building the operand from scalars as
 we'll continue to process swapped operand two.  */
  for (j = 0; j < group_size; ++j)
!   if (!matches[j])
  {
gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
!   swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
!  gimple_assign_rhs2_ptr (stmt));
  }
  
  /* If we have all children of child built up from scalars then
--- 1049,1077 
 if we end up building the operand from scalars as
 we'll continue to process swapped operand two.  */
  for (j = 0; j < group_size; ++j)
!   {
! gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
! gimple_set_plf (stmt, GF_PLF_1, false);
!   }
! for (j = 0; j < group_size; ++j)
!   {
! gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
! if (!matches[j])
!   {
! /* Avoid swapping operands twice.  */
! if (gimple_plf (stmt, GF_PLF_1))
!   continue;
! swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
!gimple_assign_rhs2_ptr (stmt));
! gimple_set_plf (stmt, GF_PLF_1, true);
!   }
!   }
! /* Verify we swap all duplicates or none.  */
! if (flag_checking)
!   for (j = 0; j < group_size; ++j)
  {
gimple *stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
!   gcc_assert (gimple_plf (stmt, GF_PLF_1) == ! matches[j]);
  }
  
  /* If we have all children of child built up from scalars then
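
The idea of the fix can be shown outside GCC with a small sketch.  The
stmt struct and its "swapped" field are hypothetical stand-ins for a
gimple statement and the GF_PLF_1 pass-local flag; the two-loop structure
follows the hunk above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a statement with two swappable operands.
   SWAPPED plays the role of the pass-local flag.  */
struct stmt
{
  int rhs1, rhs2;
  bool swapped;
};

/* Swap the operands of every statement whose lane did not match.  STMTS may
   contain the same statement in several lanes; the flag ensures each
   statement is swapped at most once instead of being swapped back by a
   later duplicate.  */
static void
swap_mismatched_once (struct stmt **stmts, const bool *matches, int n)
{
  assert (n >= 0);

  /* First clear the flag everywhere, since it may hold stale state.  */
  for (int j = 0; j < n; ++j)
    stmts[j]->swapped = false;

  for (int j = 0; j < n; ++j)
    if (!matches[j] && !stmts[j]->swapped)
      {
        int tmp = stmts[j]->rhs1;
        stmts[j]->rhs1 = stmts[j]->rhs2;
        stmts[j]->rhs2 = tmp;
        stmts[j]->swapped = true;
      }
}
```

Without the flag, a statement occupying two mismatched lanes would be
swapped twice and end up unchanged.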



Re: [PATCH PR68542]

2015-12-16 Thread Richard Biener
On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev  wrote:
> Richard.
> Thanks for your review.
> I re-designed fix for assert by adding additional checks for vector
> comparison with boolean result to fold_binary_op_with_conditional_arg
> and remove early exit to combine_cond_expr_cond.
> Unfortunately, I am not able to provide you with test-case since it is
> in my second patch related to back-end patch which I sent earlier
> (12-08).
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

+  else if (TREE_CODE (type) == VECTOR_TYPE)
 {
   tree testtype = TREE_TYPE (cond);
   test = cond;
   true_value = constant_boolean_node (true, testtype);
   false_value = constant_boolean_node (false, testtype);
 }
+  else
+{
+  test = cond;
+  cond_type = type;
+  true_value = boolean_true_node;
+  false_value = boolean_false_node;
+}

So this is, say, vec1 != vec2 with scalar vs. vector result.  If we have
scalar result and thus, say, scalar + vec1 != vec2.  I believe rather
than doing the above (not seeing how this would not generate wrong
code eventually) we should simply detect the case of mixing vector
and scalar types and bail out.  At least without some comments
your patch makes the function even more difficult to understand than
it is already.

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
   if (TREE_CODE (op0_type) == VECTOR_TYPE
  || TREE_CODE (op1_type) == VECTOR_TYPE)
 {
-  error ("vector comparison returning a boolean");
-  debug_generic_expr (op0_type);
-  debug_generic_expr (op1_type);
-  return true;
+ /* Allow vector comparison returning boolean if operand types
+are boolean or integral and CODE is EQ/NE.  */
+ if (code != EQ_EXPR && code != NE_EXPR
+ && !VECTOR_BOOLEAN_TYPE_P (op0_type)
+ && !VECTOR_INTEGER_TYPE_P (op0_type))
+   {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+   }
 }
 }
   /* Or a boolean vector type with the same element count

As said before, please merge the cascaded if()s.  A better wording for
the error is "unsupported operation or type for vector comparison
returning a boolean".
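
One possible reading of the "merge the cascaded if()s" request, sketched outside of GCC: the vector check and the EQ/NE plus element-kind check fold into a single condition. The predicate follows the comment in the quoted hunk (EQ/NE only, with boolean or integral vector operands), which is an assumption about the intent and not the committed code. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_code { TOY_EQ, TOY_NE, TOY_LT };
enum toy_elem { TOY_ELEM_BOOL, TOY_ELEM_INT, TOY_ELEM_FLOAT };

struct toy_type
{
  bool is_vector;
  enum toy_elem elem;
};

/* Return true (meaning "report an error") for a boolean-result
   comparison with a vector operand, unless the comparison is EQ/NE
   on boolean or integer vectors.  One merged condition replaces the
   nested if()s.  */
static bool
toy_reject_vector_compare (enum toy_code code, struct toy_type op0,
                           struct toy_type op1)
{
  if ((op0.is_vector || op1.is_vector)
      && ((code != TOY_EQ && code != TOY_NE)
          || (op0.elem != TOY_ELEM_BOOL && op0.elem != TOY_ELEM_INT)))
    return true;
  return false;
}

/* Usage example with made-up types.  */
static void
toy_vector_compare_demo (void)
{
  struct toy_type vec_int = { true, TOY_ELEM_INT };
  struct toy_type vec_float = { true, TOY_ELEM_FLOAT };

  assert (!toy_reject_vector_compare (TOY_EQ, vec_int, vec_int));
  assert (toy_reject_vector_compare (TOY_LT, vec_int, vec_int));
  assert (toy_reject_vector_compare (TOY_NE, vec_float, vec_float));
}
```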

Otherwise the patch looks sensible to me though it shows that overloading of
EQ/NE_EXPR for scalar result and vector operands might have some more unexpected
fallout (which is why I originally preferred the view-convert to large
integer type variant).

Thanks,
Richard.


> ChangeLog:
> 2015-12-11  Yuri Rumyantsev  
>
> PR middle-end/68542
> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks on
> vector comparison with boolean result to avoid ICE.
> (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
> 2015-12-10 16:36 GMT+03:00 Richard Biener :
>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard.
>>>
>>> Thanks a lot for your review.
>>> Below are my answers.
>>>
>>> You asked why I inserted additional check to
>>> ++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining if types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE 
>>> (op0
>>> +return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> This is because without it I've got assert in fold_convert_loc
>>>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>
>>> since it tries to convert vector of bool to scalar bool.
>>> Here is essential part of call-stack:
>>>
>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>> at ../../gcc/diagnostic.c:1259
>>> #1  0x01743ada in fancy_abort (
>>> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>> function=0x184b9d0 >> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>> ../../gcc/diagnostic.c:1332
>>> #2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
>>> arg=0x71a7f488) at ../../gcc/fold-const.c:2216
>>> #3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>> 

[PATCH] OpenACC documentation for libgomp

2015-12-16 Thread James Norris

Hi,

Attached is the patch to add OpenACC documentation for libgomp.

Ok to commit to trunk?

Thanks!
Jim
Index: libgomp.texi
===
--- libgomp.texi	(revision 231662)
+++ libgomp.texi	(working copy)
@@ -94,10 +94,25 @@
 @comment  better formatting.
 @comment
 @menu
+* Enabling OpenACC::   How to enable OpenACC for your
+   applications.
+* OpenACC Runtime Library Routines::
+   The OpenACC runtime application
+   programming interface.
+* OpenACC Environment Variables::
+   Influencing OpenACC runtime behavior with
+   environment variables.
+* CUDA Streams Usage:: Notes on the implementation of
+   asynchronous operations.
+* OpenACC Library Interoperability::
+   OpenACC library interoperability with the
+   NVIDIA CUBLAS library.
 * Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
+* OpenMP Runtime Library Routines::
+   The OpenMP runtime application programming 
interface.
-* Environment Variables::  Influencing runtime behavior with environment 
+* OpenMP Environment Variables::
+   Influencing runtime behavior with environment 
variables.
 * The libgomp ABI::Notes on the external ABI presented by libgomp.
 * Reporting Bugs:: How to report bugs in the GNU Offloading and
@@ -113,6 +128,643 @@
 
 
 @c -
+@c Enabling OpenACC
+@c -
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{OpenACC Runtime Library Routines}).
+
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{http://www.openacc.org/, OpenACC Application Programming
+Interface} manual, version 2.0.
+
+Note that this is an experimental feature, incomplete, and subject to
+change in future versions of GCC.  See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
+
+
+@c -
+@c OpenACC Runtime Library Routines
+@c -
+
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
+
+@menu
+* acc_get_num_devices:: Get number of devices for the given device type
+* acc_set_device_type::
+* acc_get_device_type::
+* acc_set_device_num::
+* acc_get_device_num::
+* acc_init::
+* acc_shutdown::
+* acc_on_device::   Whether executing on a particular device
+* acc_malloc::
+* acc_free::
+* acc_copyin::
+* acc_present_or_copyin::
+* acc_create::
+* acc_present_or_create::
+* acc_copyout::
+* acc_delete::
+* acc_update_device::
+* acc_update_self::
+* acc_map_data::
+* acc_unmap_data::
+* acc_deviceptr::
+* acc_hostptr::
+* acc_is_present::
+* acc_memcpy_to_device::
+* acc_memcpy_from_device::
+
+API routines for target platforms.
+
+* acc_get_current_cuda_device::
+* acc_get_current_cuda_context::
+* acc_get_cuda_stream::
+* acc_set_cuda_stream::
+@end menu
+
+
+
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This routine returns a value indicating the
+number of devices available for the given device type.  It determines
+the number of devices in a @emph{passive} manner.  In other words, it
+does not alter the state within the runtime environment aside from
+possibly initializing an uninitialized device.  This aspect allows
+the routine to be called without concern for altering the interaction
+with an attached accelerator device.
+
+@item @emph{Reference}:
+@uref{http://www.openacc.org/, OpenACC specification v2.0}, section
+3

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-12-16 Thread Richard Biener
On Thu, Dec 10, 2015 at 1:27 AM, Kugan
 wrote:
> Hi Richard,
>
> Thanks for the reviews.
>
> I think since we have some unresolved issues here, it is best to aim for
> the next stage1. I however would like any feedback so that I can
> continue to improve this.

Yeah, sorry, I've been distracted lately and am not sure I'll get to
the patch before the Christmas break.

> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01063.html is also related
> to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67714. I don't think
> there is any agreement on this. Or is there any better place to fix this?

I don't know enough in this area to suggest anything.

Richard.

> Thanks,
> Kugan


Re: [RFC, rtl optimization]: Better heuristics for estimate_reg_pressure_cost in presence of call for LICM.

2015-12-16 Thread Bernd Schmidt

On 12/16/2015 12:54 PM, Ajit Kumar Agarwal wrote:

/* If there is a call in the loop body, the call-clobbered registers
   are not available for loop invariants.  */
+
if (call_p)
  available_regs = available_regs - target_clobbered_regs;
-
+
/* If we have enough registers, we should use them and not restrict
   the transformations unnecessarily.  */
if (regs_needed + target_res_regs <= available_regs)
  return 0;


Just a thought, one thing that might make sense here is counting some of 
the target_res_regs among the clobbered ones. This would become


 int res_regs = target_res_regs;
 if (call_p)
   {
 available_regs = available_regs - target_clobbered_regs;
 res_regs /= 2;
   }

 /* If we have enough registers, we should use them and not restrict
the transformations unnecessarily.  */
 if (regs_needed + res_regs <= available_regs)
   return 0;

It's all a bit crude, before and after, but such a change might be 
justifiable.
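
A standalone sketch of the adjusted check, for illustration only. The parameters are plain integers standing in for the target_* values in the quoted code, and `toy_enough_regs_p` is a hypothetical name, not the real estimate_reg_pressure_cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the loop has enough registers that invariant motion
   need not be penalized.  */
static bool
toy_enough_regs_p (int regs_needed, int available_regs,
                   int target_res_regs, int target_clobbered_regs,
                   bool call_p)
{
  int res_regs = target_res_regs;

  if (call_p)
    {
      /* Call-clobbered registers cannot hold invariants across the
         call; count half of the reserved registers among them.  */
      available_regs -= target_clobbered_regs;
      res_regs /= 2;
    }

  return regs_needed + res_regs <= available_regs;
}

/* Usage example with a made-up target: 16 usable registers,
   4 reserved, 8 call-clobbered.  */
static void
toy_reg_pressure_demo (void)
{
  assert (toy_enough_regs_p (10, 16, 4, 8, false)); /* 10 + 4 <= 16 */
  assert (toy_enough_regs_p (6, 16, 4, 8, true));   /* 6 + 2 <= 8 */
  assert (!toy_enough_regs_p (7, 16, 4, 8, true));  /* 7 + 2 > 8 */
}
```

With these made-up numbers, a check that does not halve the reserved registers would reject regs_needed = 6 in a loop containing a call, since 6 + 4 > 8.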



Bernd


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-16 Thread Richard Biener
On Mon, 14 Dec 2015, Tom de Vries wrote:

> On 14/12/15 14:26, Richard Biener wrote:
> > On Sun, 13 Dec 2015, Tom de Vries wrote:
> > 
> > > On 11/12/15 14:00, Richard Biener wrote:
> > > > On Fri, 11 Dec 2015, Tom de Vries wrote:
> > > > 
> > > > > On 13/11/15 12:39, Jakub Jelinek wrote:
> > > > > > We simply have some compiler internal interface between the caller
> > > > > > and
> > > > > > callee of the outlined regions, each interface in between those has
> > > > > > its own structure type used to communicate the info;
> > > > > > we can attach attributes on the fields, or some flags to indicate
> > > > > > some
> > > > > > properties interesting from aliasing POV.  We don't really need to
> > > > > > perform
> > > > > > full IPA-PTA, perhaps it would be enough to a) record somewhere in
> > > > > > cgraph
> > > > > > the relationship in between such callers and callees (for offloading
> > > > > > regions
> > > > > > we already have "omp target entrypoint" attribute on the callee and
> > > > > > a
> > > > > > single caller), tell LTO if possible not to split those into
> > > > > > different
> > > > > > partitions if easily possible, and then just for these pairs perform
> > > > > > aliasing/points-to analysis in the caller and the result record
> > > > > > using
> > > > > > cliques/special attributes/whatever to the callee side, so that the
> > > > > > callee
> > > > > > (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias
> > > > > > analysis.
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > This work-in-progress patch allows me to use IPA PTA information in
> > > > > the
> > > > > kernels pass group.
> > > > > 
> > > > > Since:
> > > > > -  I'm running IPA PTA before ealias, and IPA PTA does not interpret
> > > > >  restrict, and
> > > > > - compute_may_alias doesn't run if IPA PTA information is present
> > > > > I needed to convince ealias to do the restrict clique/base annotation.
> > > > > 
> > > > > It would be more logical to fit IPA PTA after ealias, but one is an
> > > > > IPA
> > > > > pass,
> > > > > the other a regular one-function pass, so I would have to split the
> > > > > containing
> > > > > pass groups pass_all_early_optimizations and
> > > > > pass_local_optimization_passes.
> > > > > I'll give that a try now.
> > > > > 
> > > 
> > > I've tried this approach, but realized that this changes the order in
> > > which
> > > non-openacc functions are processed in the compiler, so I've abandoned
> > > this
> > > idea.
> > > 
> > > > > Any comments?
> > > > 
> > > > I don't think you want to run IPA PTA before early
> > > > optimizations, it (and ealias) rely on some initial cleanup to
> > > > do anything meaningful with well-spent resources.
> > > > 
> > > > The local PTA "hack" also looks more like a waste of resources, but well
> > > > ... teaching IPA PTA to honor restrict might be an impossible task
> > > > though I didn't think much about it other than handling it only for
> > > > nonlocal_p functions (for others we should see all incoming args
> > > > if IPA PTA works optimally).  The restrict tags will leak all over
> > > > the place of course and in the end no meaningful cliques may remain.
> > > > 
> > > 
> > > This patch:
> > > - moves the kernels pass group to the first position in the pass list
> > >after ealias where we're back in ipa mode
> > > - inserts a new ipa pass to contain the gimple pass group called
> > >pass_oacc_ipa
> > > - inserts a version of ipa-pta before the pass group.
> > 
> > In principle I like this a lot, but
> > 
> > +  NEXT_PASS (pass_ipa_pta_oacc_kernels);
> > +  NEXT_PASS (pass_oacc_ipa);
> > +  PUSH_INSERT_PASSES_WITHIN (pass_oacc_ipa)
> > 
> > I think you can put pass_ipa_pta_oacc_kernels into the pass_oacc_ipa
> > group and thus just "clone" ipa_pta?
> 
> Done. But using a clone means using the same gate function, and that means
> that this pass_ipa_pta instance no longer runs by default for openacc.
> 
> I've added enabling-by-default of fipa-pta for fopenacc in
> default_options_optimization to fix that.

Hmm, but that enables both IPA PTA passes then?  I suppose that's ok,
and if not enabling the "late" IPA PTA you'd want to re-set 
gimple_df->ipa_pta.

> > sub-passes of IPA passes can
> > be both ipa passes and non-ipa passes.
> 
> Right. It does mean that I need yet another pass (pass_ipa_oacc_kernels) to do
> the IPA/non-IPA transition at pass/sub-pass boundary:
> ...
>   NEXT_PASS (pass_ipa_oacc);
>   PUSH_INSERT_PASSES_WITHIN (pass_ipa_oacc)
>   NEXT_PASS (pass_ipa_pta);
>   NEXT_PASS (pass_ipa_oacc_kernels);
>   PUSH_INSERT_PASSES_WITHIN (pass_ipa_oacc_kernels)
>  /* out-of-ipa */
>  NEXT_PASS (pass_oacc_kernels);
>  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> ...
> 
> OK for stage3 if bootstrap and reg-test succeeds?

Ok.

Richard.


RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-16 Thread Wilco Dijkstra
James Greenhalgh wrote:
> On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > ping
> >
> > > -Original Message-
> > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > Sent: 06 November 2015 20:06
> > > To: 'gcc-patches@gcc.gnu.org'
> > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > >
> > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the register
> > > allocator always uses ALL_REGS even when it has a much higher cost. The
> > > hook changes the class to either FP_REGS or GENERAL_REGS depending on the
> > > mode of the register. This results in better register allocation overall,
> > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > >
> > > GCC regression passes with several minor fixes.
> > >
> > > OK for commit?
> > >
> > > ChangeLog:
> > > 2015-11-06  Wilco Dijkstra  
> > >
> > >   * gcc/config/aarch64/aarch64.c
> > >   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > >   (aarch64_ira_change_pseudo_allocno_class): New function.
> > >   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > >   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > >   (test_corners_sisd_di): Improve force to SIMD register.
> > >   (test_corners_sisd_si): Likewise.
> > >   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2.
> > >   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > >   Remove scan-assembler check for ldr.
> 
> Drop the gcc/ from the ChangeLog.
> 
> > > --
> > >  gcc/config/aarch64/aarch64.c   | 22 
> > > ++
> > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> 
> These testsuite changes concern me a bit, and you don't mention them beyond
> saying they are minor fixes...

Well, any change to register allocator preferencing would cause fallout in
tests that assume which register is allocated, especially if they use nasty
inline assembler hacks to do so...

> > > diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c 
> > > b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > index 5f2ff81..96501db 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run } */
> > > -/* { dg-options "-save-temps -fno-inline -O1" } */
> > > +/* { dg-options "-save-temps -fno-inline -O2" } */
> 
> This one says we have a code-gen regression at -O1 ?

It avoids a regalloc bug - see below.

> > >  #define FCVTDEF(ftype,itype) \
> > >  void \
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > index 363f554..8465c89 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > >  {
> > >force_simd_di (b);
> > >b = b >> 63;
> > > +  force_simd_di (b);
> > >b = b >> 0;
> > >b += b >> 65; /* { dg-warning "right shift count >= width of type" } */
> > > -  force_simd_di (b);
> 
> This one I don't understand, but seems to say that we've decided to move
> b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> another register allocator regression?

No, basically the register allocator is now making better decisions as to
where to allocate integer variables. It will only allocate them to FP
registers if they are primarily used by other FP operations. The
force_simd_di inline assembler tries to mimic FP uses, and if there are
enough of them at the right places then everything works as expected. If,
however, you do 3 consecutive integer operations, then the allocator will now
correctly prefer to allocate them to the integer registers (while previously
it wouldn't, which is inefficient).
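
For readers unfamiliar with the hook, a toy model of the behaviour described in the patch summary ("changes the class to either FP_REGS or GENERAL_REGS depending on the mode of the register") might look as follows. This is a sketch, not the patch itself: the `toy_*` enums and function are hypothetical, and two details are assumptions, namely that only an undecided ALL_REGS class is rewritten and that vector modes are grouped with floating-point ones.

```c
#include <assert.h>

enum toy_reg_class { TOY_GENERAL_REGS, TOY_FP_REGS, TOY_ALL_REGS };
enum toy_mode { TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_VECTOR };

/* If the allocator could not decide (ALL_REGS), pick a class from
   the mode of the pseudo; otherwise keep the allocator's choice.  */
static enum toy_reg_class
toy_change_pseudo_allocno_class (enum toy_mode mode,
                                 enum toy_reg_class allocno_class)
{
  if (allocno_class != TOY_ALL_REGS)
    return allocno_class;

  return (mode == TOY_MODE_FLOAT || mode == TOY_MODE_VECTOR)
         ? TOY_FP_REGS : TOY_GENERAL_REGS;
}

/* Usage example.  */
static void
toy_allocno_class_demo (void)
{
  assert (toy_change_pseudo_allocno_class (TOY_MODE_INT, TOY_ALL_REGS)
          == TOY_GENERAL_REGS);
  assert (toy_change_pseudo_allocno_class (TOY_MODE_FLOAT, TOY_ALL_REGS)
          == TOY_FP_REGS);
  /* A class the allocator already settled on is left alone.  */
  assert (toy_change_pseudo_allocno_class (TOY_MODE_INT, TOY_FP_REGS)
          == TOY_FP_REGS);
}
```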

> > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c 
> > > b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > index a49db3e..c5a9c52 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> > > @@ -1,6 +1,6 @@
> > >  /* Test vdup_lane intrinsics work correctly.  */
> > >  /* { dg-do run } */
> > > -/* { dg-options "-O1 --save-temps" } */
> > > +/* { dg-options "-O2 --save-temps" } */
> 
> Another -O1 regression ?

No, it's triggering a bug in the -O1 register preferencing that causes
incorrect preferences to be selected despite the costs being right. The cost
calculation with -O1 for e.g. wrap_vdupb_lane_s8_0() in vdup_lane_2.c:

Pass 0 for finding pseudo/allocno costs

r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
 

Re: [PATCH PR68906]

2015-12-16 Thread Richard Biener
On Wed, Dec 16, 2015 at 1:14 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is simple patch which cures the issue with outer-loop unswitching
> - added invocation of number_of_latch_executions() to reject
> unswitching for non-iterated loops.
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

No, that looks like just papering over the issue.

The issue (with the 2nd testcase at least) is that single_exit () accepts
an exit from the inner loop.

Index: gcc/tree-ssa-loop-unswitch.c
===
--- gcc/tree-ssa-loop-unswitch.c(revision 231686)
+++ gcc/tree-ssa-loop-unswitch.c(working copy)
@@ -431,7 +431,7 @@ tree_unswitch_outer_loop (struct loop *l
 return false;
   /* Accept loops with single exit only.  */
   exit = single_exit (loop);
-  if (!exit)
+  if (!exit || exit->src->loop_father != loop)
 return false;
   /* Check that phi argument of exit edge is not defined inside loop.  */
   if (!check_exit_phi (loop))

fixes the runtime testcase for me (not suitable for the testsuite due
to the infinite looping though).
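
The extra condition in the hunk above can be illustrated with toy data structures. The sketch assumes only what the diff shows: an exit edge has a source block, and each block records the innermost loop containing it. The `toy_*` types and function are hypothetical, not GCC's struct loop, basic_block or edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_loop { int num; };
struct toy_block { struct toy_loop *loop_father; };
struct toy_edge { struct toy_block *src; };

/* Accept only an exit edge that leaves from a block belonging to
   LOOP itself, not to a loop nested inside it.  */
static bool
toy_exit_ok_for_outer_unswitch (const struct toy_edge *exit,
                                const struct toy_loop *loop)
{
  if (!exit || exit->src->loop_father != loop)
    return false;
  return true;
}

/* Usage example: an outer loop with one nested inner loop.  */
static void
toy_single_exit_demo (void)
{
  struct toy_loop outer = { 1 }, inner = { 2 };
  struct toy_block in_outer = { &outer }, in_inner = { &inner };
  struct toy_edge from_outer = { &in_outer }, from_inner = { &in_inner };

  assert (toy_exit_ok_for_outer_unswitch (&from_outer, &outer));
  /* The single exit leaves from the inner loop: reject.  */
  assert (!toy_exit_ok_for_outer_unswitch (&from_inner, &outer));
  assert (!toy_exit_ok_for_outer_unswitch (NULL, &outer));
}
```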

Can you please bootstrap/test the above with your testcase?  The above patch is
ok if it passes testing (no time myself right now)

Thanks,
Richard.

> ChangeLog:
>
> 2014-12-16  Yuri Rumyantsev  
>
> PR tree-optimization/68906
> * tree-ssa-loop-unswitch.c : Include couple header files.
> (tree_unswitch_outer_loop): Use number_of_latch_executions
> to reject non-iterated loops.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/torture/pr68906.c: New test.


Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-16 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Mon, Dec 14, 2015 at 04:08:32PM +0100, Ulrich Weigand wrote:
> > I don't think that r1 is actually safe here.  Note that it may be used
> > (unconditionally) as temp register in s390_emit_prologue in certain cases;
> > the upcoming split-stack code will also need to use r1 in some cases.
> 
> How about the attached patch?  It also allows using r0 as the
> temp register if possible (needs more testing).

This doesn't look safe either.  In particular:

- you use cfun_save_high_fprs_p at a place where its value might not yet
  have been determined (when calling s390_get_prologue_temp_regno from
  s390_init_frame_layout *before* the s390_register_info/s390_frame_info
  calls)

- r0 might hold the incoming static chain value, in which case it cannot
  be used as temp register

> If that's too
> much effort, I'm fine with limiting the original patch to r4 to
> r2.

That seems preferable to me.

> > r2 through r4 should be fine.  [ Not sure if there will be many (any?) cases
> > where one of those is unused but r5 isn't, however. ]
> 
> This can happen if the function only uses register pairs
> (__int128).  Actually I'm not sure whether r2 and r4 are valid
> candidates.

Huh?  Why not?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com


