[PR83663] Revert r255946

2018-01-08 Thread Vidya Praveen
Hello,

This patch reverts the changes introduced by r255946, along with the follow-up
changes made to it by r256195, as the former causes a large number of regressions
on aarch64_be* targets. It should be respun once the mismatch between AArch64's
lane numbering and GCC's numbering is fixed, as explained in PR83663.

OK for trunk?

VP.


ChangeLog:

gcc/

PR target/83663 - Revert r255946

* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.

gcc/testsuite/

PR target/83663 - Revert r255946

* gcc.target/aarch64/vect-slp-dup.c: New.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a189605..03a92b6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12129,51 +12129,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	maxv = matches[i][1];
 	  }
 
-  /* Create a duplicate of the most common element, unless all elements
-	 are equally useless to us, in which case just immediately set the
-	 vector register using the first element.  */
-
-  if (maxv == 1)
-	{
-	  /* For vectors of two 64-bit elements, we can do even better.  */
-	  if (n_elts == 2
-	  && (inner_mode == E_DImode
-		  || inner_mode == E_DFmode))
-
-	{
-	  rtx x0 = XVECEXP (vals, 0, 0);
-	  rtx x1 = XVECEXP (vals, 0, 1);
-	  /* Combine can pick up this case, but handling it directly
-		 here leaves clearer RTL.
-
-		 This is load_pair_lanes, and also gives us a clean-up
-		 for store_pair_lanes.  */
-	  if (memory_operand (x0, inner_mode)
-	      && memory_operand (x1, inner_mode)
-	      && !STRICT_ALIGNMENT
-	      && rtx_equal_p (XEXP (x1, 0),
-			      plus_constant (Pmode,
-					     XEXP (x0, 0),
-					     GET_MODE_SIZE (inner_mode))))
-	    {
-		  rtx t;
-		  if (inner_mode == DFmode)
-		t = gen_load_pair_lanesdf (target, x0, x1);
-		  else
-		t = gen_load_pair_lanesdi (target, x0, x1);
-		  emit_insn (t);
-		  return;
-		}
-	}
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
-	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
-	  maxelement = 0;
-	}
-  else
-	{
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
-	}
+  /* Create a duplicate of the most common element.  */
+  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
 
   /* Insert the rest.  */
   for (int i = 0; i < n_elts; i++)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 6cb5a6e..b052fbb 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5888,57 +5888,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
 	}
 
-	  /* Replace:
-
-	  (vec_merge:outer (vec_duplicate:outer x:inner)
-			   (subreg:outer y:inner 0)
-			   (const_int N))
-
-	 with (vec_concat:outer x:inner y:inner) if N == 1,
-	 or (vec_concat:outer y:inner x:inner) if N == 2.
-	 We assume that degenrate cases (N == 0 or N == 3), which
-	 represent taking all elements from either input, are handled
-	 elsewhere.
-
-	 Implicitly, this means we have a paradoxical subreg, but such
-	 a check is cheap, so make it anyway.
-
-	 Only applies for vectors of two elements.  */
-
-	  if ((GET_CODE (op0) == VEC_DUPLICATE
-	   || GET_CODE (op1) == VEC_DUPLICATE)
-	  && GET_MODE (op0) == GET_MODE (op1)
-	  && known_eq (GET_MODE_NUNITS (GET_MODE (op0)), 2)
-	  && known_eq (GET_MODE_NUNITS (GET_MODE (op1)), 2)
-	  && IN_RANGE (sel, 1, 2))
-	{
-	  rtx newop0 = op0, newop1 = op1;
-
-	  /* Canonicalize locally such that the VEC_DUPLICATE is always
-		 the first operand.  */
-	  if (GET_CODE (newop1) == VEC_DUPLICATE)
-		{
-		  std::swap (newop0, newop1);
-		  /* If we swap the operand order, we also need to swap
-		 the selector mask.  */
-		  sel = sel == 1 ? 2 : 1;
-		}
-
-	  if (GET_CODE (newop1) == SUBREG
-		  && paradoxical_subreg_p (newop1)
-		  && subreg_lowpart_p (newop1)
-		  && GET_MODE (SUBREG_REG (newop1))
-		  == GET_MODE (XEXP (newop0, 0)))
-		{
-		  newop0 = XEXP (newop0, 0);
-		  newop1 = SUBREG_REG (newop1);
-		  if (sel == 2)
-		std::swap (newop0, newop1);
-		  return simplify_gen_binary (VEC_CONCAT, mode,
-	  newop0, newop1);
-		}
-	}
-
 	  /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
  (const_int n))
 	 with (vec_concat x y) or (vec_concat y x) depending on value
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
deleted file mode 100644
index 

[patch][arm] (respin) Improve error checking in parsecpu.awk

2017-09-22 Thread Vidya Praveen
Hello,

This patch by Richard Earnshaw was reverted along with the commit that preceded
it, as the preceding commit was causing cross-native builds to fail and I
presumed this patch was related too. I am now respinning it, since the issue
that caused the cross-native failure has been fixed. This patch itself is
simply rebased and has no other changes.

For reference, here is the ChangeLog of the preceding patch that broke the
cross-native build.

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case.  This is helpful as in some
contexts they share the same namespace.  Most of the minor changes in
this patch are related to adopting this new naming convention.
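
For illustration, the two new constructs look roughly like this in arm-cpus.in
(the feature and group names below are invented for the example, not taken from
the patch):

```
# Individual feature bits use lower-case names.
define feature featx
define feature featy

# Feature groups use (predominantly) upper-case names and may only
# refer to previously defined features or groups.
define fgroup GROUPXY featx featy
```

A cpu or arch entry can then reference GROUPXY wherever it would otherwise have
to list featx and featy individually.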

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.

Tested by building cross and cross-native arm-none-linux-gnueabihf toolchains
and a bare-metal cross build (arm-none-eabi) on x86_64.

OK for trunk?

Regards
VP.

gcc/ChangeLog:

[arm] Improve error checking in parsecpu.awk

This patch adds a bit more error checking to parsecpu.awk to ensure
that statements are neither missing arguments nor carrying excess
arguments beyond those permitted.  It also slightly improves the
handling of errors, so that we terminate properly if parsing fails
while being as helpful as we can during the parsing phase.

2017-09-22  Richard Earnshaw  

* config/arm/parsecpu.awk (fatal): Note that we've encountered an
error.  Only quit immediately if parsing is complete.
(BEGIN): Initialize fatal_err and parse_done.
(begin fpu, end fpu): Check number of arguments.
(begin arch, end arch): Likewise.
(begin cpu, end cpu): Likewise.
(cname, tune for, tune flags, architecture, fpu, option): Likewise.
(optalias): Likewise.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5b9217c..4885746 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -2154,6 +2154,17 @@
 
 2017-09-06  Richard Earnshaw  
 
+	* config/arm/parsecpu.awk (fatal): Note that we've encountered an
+	error.  Only quit immediately if parsing is complete.
+	(BEGIN): Initialize fatal_err and parse_done.
+	(begin fpu, end fpu): Check number of arguments.
+	(begin arch, end arch): Likewise.
+	(begin cpu, end cpu): Likewise.
+	(cname, tune for, tune flags, architecture, fpu, option): Likewise.
+	(optalias): Likewise.
+
+2017-09-06  Richard Earnshaw  
+
 	* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
 	* config/arm/arm-isa.h: Delete.  Move definitions to ...
 	* arm-cpus.in: ... here.  Use new feature and fgroup values.
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index d07d3fc..0b4fc68 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -32,7 +32,8 @@
 
 function fatal (m) {
 print "error ("lineno"): " m > "/dev/stderr"
-exit 1
+fatal_err = 1
+if (parse_done) exit 1
 }
 
 function toplevel () {
@@ -502,14 +503,18 @@ BEGIN {
 arch_name = ""
 fpu_name = ""
 lineno = 0
+fatal_err = 0
+parse_done = 0
 if (cmd == "") fatal("Usage parsecpu.awk -v cmd=")
 }
 
+# New line.  Reset parse status and increment line count for error messages
 // {
 lineno++
 parse_ok = 0
 }
 
+# Comments must be on a line on their own.
 /^#/ {
 parse_ok = 1
 }
@@ -552,12 +557,14 @@ BEGIN {
 }
 
 /^begin fpu / {
+if (NF != 3) fatal("syntax: begin fpu ")
 toplevel()
 fpu_name = $3
 parse_ok = 1
 }
 
 /^end fpu / {
+if (NF != 3) fatal("syntax: end fpu ")
 if (fpu_name != $3) fatal("mimatched end fpu")
 if (! (fpu_name in fpu_isa)) {

[patch][arm] (respin) auto-generate arm-isa.h from CPU descriptions

2017-09-22 Thread Vidya Praveen
Hello,

This patch by Richard Earnshaw was reverted earlier as it was breaking
cross-native builds. I am respinning it now with a minor change that fixes the
build issue: adding arm-isa.h to GTM_H. It also removes a redundant dependency
(TM_H already includes GTM_H).
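
A sketch of what the t-arm fix amounts to (Makefile fragment; the rule is
paraphrased from the ChangeLog below, so treat the variable names and exact
recipe as illustrative rather than the literal hunk):

```make
# arm-isa.h is generated, and the generator programs only pull in GTM_H,
# so the generated header must be listed there.  TM_H already includes
# GTM_H, which makes a separate TM_H entry redundant.
GTM_H += arm-isa.h

# Regenerate arm-isa.h whenever the CPU description changes.
arm-isa.h: $(srcdir)/config/arm/parsecpu.awk $(srcdir)/config/arm/arm-cpus.in
	$(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=isa \
		$(srcdir)/config/arm/arm-cpus.in > arm-isa.h
```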

Tested by building cross and cross-native arm-none-linux-gnueabihf toolchains
and a bare-metal cross build (arm-none-eabi) on x86_64.

OK for trunk?

Regards
VP.

gcc/ChangeLog:

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case. This is helpful as in some
contexts they share the same namespace. Most of the minor changes in
this patch are related to adopting this new naming convention.

2017-09-22  Richard Earnshaw  

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (TM_H): Remove.
(GTM_H): Add arm-isa.h.
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.

diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 38bd3a7..7cb99ec 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv)
 	{
 	  /* The easiest and safest way to remove the default fpu
 	 capabilities is to look for a '+no..' option that removes
-	 the base FPU bit (isa_bit_VFPv2).  If that doesn't exist
+	 the base FPU bit (isa_bit_vfpv2).  If that doesn't exist
 	 then the best we can do is strip out all the bits that
 	 might be part of the most capable FPU we know about,
 	 which is "crypto-neon-fp-armv8".  */
@@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv)
 		   ++ext)
 		{
 		  if (ext->remove
-		  && check_isa_bits_for (ext->isa_bits, isa_bit_VFPv2))
+		  && check_isa_bits_for (ext->isa_bits, isa_bit_vfpv2))
 		{
 		  arm_initialize_isa (fpu_isa, ext->isa_bits);
 		  bitmap_and_compl (target_isa, target_isa, fpu_isa);
@@ -620,7 +620,7 @@ arm_canon_arch_option (int argc, const char **argv)
 {
   /* Clearing the VFPv2 bit is sufficient to stop any extention that
 	 builds on the FPU from matching.  */
-  bitmap_clear_bit (target_isa, isa_bit_VFPv2);
+  bitmap_clear_bit (target_isa, isa_bit_vfpv2);
 }
 
   /* If we don't have a selected architecture by now, something's
@@ -692,8 +692,8 @@ arm_canon_arch_option (int argc, const char **argv)
  capable FPU variant that we do support.  This is sufficient for
  multilib selection.  */
 
-  if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_VFPv2)
-  && bitmap_bit_p (fpu_isa, isa_bit_VFPv2))
+  if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_vfpv2)
+  && bitmap_bit_p (fpu_isa, isa_bit_vfpv2))
 {
   std::list::iterator ipoint = extensions.begin ();
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 555ed69..00225104 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -593,7 +593,7 @@ x86_64-*-*)
 	tm_file="vxworks-dummy.h ${tm_file}"
 	;;
 arm*-*-*)
-	tm_p_file="arm/arm-flags.h arm/arm-isa.h ${tm_p_file} arm/aarch-common-protos.h"
+	tm_p_file="arm/arm-flags.h ${tm_p_file} arm/aarch-common-protos.h"
 	tm_file="vxworks-dummy.h ${tm_file}"
 	;;
 mips*-*-* | sh*-*-* | sparc*-*-*)
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d009a9e..07de4c9 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -40,6 +40,210 @@
 # names in the final compiler.  The order within each group is preserved and
 # forms the order for the list within the compiler.
 
+# Most objects in this file support forward references.  The major
+# exception is feature groups, which may only refer to previously
+# defined features or feature groups.  This is done to avoid the risk
+# of 

Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-11 Thread Vidya Praveen
On Tue, Sep 05, 2017 at 03:12:47PM +0200, Richard Biener wrote:
> On Tue, 5 Sep 2017, Tamar Christina wrote:
> 
> > 
> > 
> > > -Original Message-
> > > From: Richard Biener [mailto:rguent...@suse.de]
> > > Sent: 05 September 2017 13:51
> > > To: Tamar Christina
> > > Cc: Andrew Pinski; Andreas Schwab; Jon Beniston; gcc-patches@gcc.gnu.org;
> > > nd
> > > Subject: RE: [RFC, vectorizer] Allow single element vector types for 
> > > vector
> > > reduction operations
> > > 
> > > On Tue, 5 Sep 2017, Richard Biener wrote:
> > > 
> > > > On Tue, 5 Sep 2017, Tamar Christina wrote:
> > > >
> > > > > Hi Richard,
> > > > >
> > > > > That was an really interesting analysis, thanks for the details!
> > > > >
> > > > > Would you be submitting the patch you proposed at the end as a fix?
> > > >
> > > > I'm testing it currently.
> > > 
> > > Unfortunately it breaks some required lowering.  I'll have to more closely
> > > look at this.
> > 
> > Ah, ok. In the meantime, can this patch be reverted? It's currently 
> > breaking spec for us so we're
> > Not able to get any benchmarking numbers.
> 
> Testing the following instead:

Any news on this?

VP.


> 
> Index: gcc/tree-vect-generic.c
> ===
> --- gcc/tree-vect-generic.c (revision 251642)
> +++ gcc/tree-vect-generic.c (working copy)
> @@ -1640,7 +1640,7 @@ expand_vector_operations_1 (gimple_stmt_
>|| code == VEC_UNPACK_FLOAT_LO_EXPR)
>  type = TREE_TYPE (rhs1);
>  
> -  /* For widening/narrowing vector operations, the relevant type is of 
> the
> +  /* For widening vector operations, the relevant type is of the
>   arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
>   calculated in the same way above.  */
>if (code == WIDEN_SUM_EXPR
> @@ -1650,9 +1650,6 @@ expand_vector_operations_1 (gimple_stmt_
>|| code == VEC_WIDEN_MULT_ODD_EXPR
>|| code == VEC_UNPACK_HI_EXPR
>|| code == VEC_UNPACK_LO_EXPR
> -  || code == VEC_PACK_TRUNC_EXPR
> -  || code == VEC_PACK_SAT_EXPR
> -  || code == VEC_PACK_FIX_TRUNC_EXPR
>|| code == VEC_WIDEN_LSHIFT_HI_EXPR
>|| code == VEC_WIDEN_LSHIFT_LO_EXPR)
>  type = TREE_TYPE (rhs1);
> 
> 
> also fix for a bug uncovered by the previous one:
> 
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c (revision 251710)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -1742,8 +1742,7 @@ find_candidates_dom_walker::before_dom_c
> slsr_process_ref (gs);
>  
>else if (is_gimple_assign (gs)
> -  && SCALAR_INT_MODE_P
> -   (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> +  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))))
> {
>   tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
>  
> 


Re: [COMMITTED][arm] Revert r251800 & r251799

2017-09-11 Thread Vidya Praveen
Now with the patch :-)

VP.

On Mon, Sep 11, 2017 at 03:20:12PM +0100, Vidya Praveen wrote:
> Hello,
> 
> The following two related patches need to be reverted as it causes 
> cross-native
> builds to fail with the following message:
> 
> g++ -c -DIN_GCC -DGENERATOR_FILE  -I. [...] \
>   -o build/genpreds.o /path/to/src/gcc/gcc/genpreds.c
> In file included from ./options.h:8:0,
>  from ./tm.h:23,
>  from /path/to/src/gcc/gcc/genpreds.c:26:
> /path/to/src/gcc/gcc/config/arm/arm-opts.h:29:21: fatal error: arm-isa.h: No 
> such file or directory
>  #include "arm-isa.h"
>   ^
> genpreds depends on GTM_H which does not depend on options.h, or any of its
> dependencies.  Nevertheless, it still tries to include options.h when reading
> tm.h, so we miss the rule to build arm-isa.h. It is unclear why it is only an
> issue with the cross-native builds.
> 
> For now, in order to keep the builds going, I am reverting these patches.
> 
> 
> r251800 | rearnsha | 2017-09-06 14:42:54 +0100 (Wed, 06 Sep 2017) | 16 lines
> 
> [arm] Improve error checking in parsecpu.awk
> 
> This patch adds a bit more error checking to parsecpu.awk to ensure
> that statements are not missing arguments or have excess arguments
> beyond those permitted.  It also slightly improves the handling of
> errors so that we terminate properly if parsing fails and be as
> helpful as we can while in the parsing phase.
> 
> * config/arm/parsecpu.awk (fatal): Note that we've encountered an
> error.  Only quit immediately if parsing is complete.
> (BEGIN): Initialize fatal_err and parse_done.
> (begin fpu, end fpu): Check number of arguments.
> (begin arch, end arch): Likewise.
> (begin cpu, end cpu): Likewise.
> (cname, tune for, tune flags, architecture, fpu, option): Likewise.
> (optalias): Likewise.
> 
> r251799 | rearnsha | 2017-09-06 14:42:46 +0100 (Wed, 06 Sep 2017) | 31 lines
> 
> [arm] auto-generate arm-isa.h from CPU descriptions
> 
> This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
> This has the primary advantage that it makes the description file more
> self-contained, but it also solves the 'array dimensioning' problem
> that Tamar recently encountered.  It adds two new constructs to
> arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
> a group of feature bits so that they can be referenced together.  We
> follow the convention that feature bits are all lower case, while
> fgroups are (predominantly) upper case.  This is helpful as in some
> contexts they share the same namespace.  Most of the minor changes in
> this patch are related to adopting this new naming convention.
> 
> * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
> * config/arm/arm-isa.h: Delete.  Move definitions to ...
> * arm-cpus.in: ... here.  Use new feature and fgroup values.
> * config/arm/arm.c (arm_option_override): Use lower case for feature
> bit names.
> * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
> (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
> * config/arm/parsecpu.awk (END): Add new command 'isa'.
> (isa_pfx): Delete.
> (print_isa_bits_for): New function.
> (gen_isa): New function.
> (gen_comm_data): Use print_isa_bits_for.
> (define feature): New keyword.
> (define fgroup): New keyword.
> * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h
>     (arm-isa.h): Add rule to generate file.
> * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
> case for feature bit names.
> 
> Regards,
> VP.
> 
> 
> gcc/ChangeLog:
> 
> 2017-09-11  Vidya Praveen  <vidyaprav...@arm.com>
> 
>   Revert r251800 and r251799.
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 7cb99ec..38bd3a7 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv)
 	{
 	  /* The easiest and safest way to remove the default fpu
 	 capabilities is to look for a '+no..' option that removes
-	 the base FPU bit (isa_bit_vfpv2).  If that doesn't exist
+	 the base FPU bit (isa_bit_VFPv2).  If that doesn't exist
 	 then the best we can do is strip out all the bits that
 	 might be part of the most capable FPU we know about,
 	 which is "crypto-neon-fp-armv8".  */
@@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv)
 		   ++ext)
 		{
 		  if (ext->remove
-		  && check_isa_bits_f

[COMMITTED][arm] Revert r251800 & r251799

2017-09-11 Thread Vidya Praveen
Hello,

The following two related patches need to be reverted as they cause cross-native
builds to fail with the following message:

g++ -c -DIN_GCC -DGENERATOR_FILE  -I. [...] \
-o build/genpreds.o /path/to/src/gcc/gcc/genpreds.c
In file included from ./options.h:8:0,
 from ./tm.h:23,
 from /path/to/src/gcc/gcc/genpreds.c:26:
/path/to/src/gcc/gcc/config/arm/arm-opts.h:29:21: fatal error: arm-isa.h: No 
such file or directory
 #include "arm-isa.h"
  ^
genpreds depends on GTM_H which does not depend on options.h, or any of its
dependencies.  Nevertheless, it still tries to include options.h when reading
tm.h, so we miss the rule to build arm-isa.h. It is unclear why it is only an
issue with the cross-native builds.

For now, in order to keep the builds going, I am reverting these patches.


r251800 | rearnsha | 2017-09-06 14:42:54 +0100 (Wed, 06 Sep 2017) | 16 lines

[arm] Improve error checking in parsecpu.awk

This patch adds a bit more error checking to parsecpu.awk to ensure
that statements are not missing arguments or have excess arguments
beyond those permitted.  It also slightly improves the handling of
errors so that we terminate properly if parsing fails and be as
helpful as we can while in the parsing phase.

* config/arm/parsecpu.awk (fatal): Note that we've encountered an
error.  Only quit immediately if parsing is complete.
(BEGIN): Initialize fatal_err and parse_done.
(begin fpu, end fpu): Check number of arguments.
(begin arch, end arch): Likewise.
(begin cpu, end cpu): Likewise.
(cname, tune for, tune flags, architecture, fpu, option): Likewise.
(optalias): Likewise.

r251799 | rearnsha | 2017-09-06 14:42:46 +0100 (Wed, 06 Sep 2017) | 31 lines

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case.  This is helpful as in some
contexts they share the same namespace.  Most of the minor changes in
this patch are related to adopting this new naming convention.

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.

Regards,
VP.


gcc/ChangeLog:

2017-09-11  Vidya Praveen  <vidyaprav...@arm.com>

Revert r251800 and r251799.


Re: [PATCH] Be careful about combined chain with length == 0 (PR tree-optimization/70754).

2017-01-18 Thread Vidya Praveen
On Wed, Jan 18, 2017 at 11:10:32AM +0100, Martin Liška wrote:
> Hello.
> 
> After basic understanding of loop predictive commoning, the problematic 
> combined chain is:
> 
> Loads-only chain 0x38b6730 (combined)
>   max distance 0
>   references:
> MEM[(real(kind=8) *)vectp_a.29_81] (id 1)
>   offset 20
>   distance 0
> MEM[(real(kind=8) *)vectp_a.38_141] (id 3)
>   offset 20
>   distance 0
> 
> Loads-only chain 0x38b68b0 (combined)
>   max distance 0
>   references:
> MEM[(real(kind=8) *)vectp_a.23_102] (id 0)
>   offset 0
>   distance 0
> MEM[(real(kind=8) *)vectp_a.33_33] (id 2)
>   offset 0
>   distance 0
> 
> Combination chain 0x38b65b0
>   max distance 0, may reuse first
>   equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8)
>   references:
> combination ref
>   in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> 
>   distance 0
> combination ref
>   in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
> 
>   distance 0
> 
> It's important to note that distance is equal to zero (happening within a 
> same loop iteration).
> Aforementioned chains correspond to:
> 
> ...
> r2:  vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102];
>   vectp_a.23_99 = vectp_a.23_102 + 16;
>   vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99];
>   vect__82.27_97 = vect__22.22_108;
>   vect__82.27_96 = vect__22.22_107;
>   vect__79.28_95 = vect__82.27_97 + vect__84.17_120;
>   vect__79.28_94 = vect__82.27_96 + vect__84.17_119;
> r1:  vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81];
>   vectp_a.29_77 = vectp_a.29_81 + 16;
>   vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77];
>   vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57];
> r2':  vectp_a.33_33 = vectp_a.33_57 + 16;
>   vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33];
>   vect__56.37_23 = vect__38.35_39;
>   vect__56.37_15 = vect__32.32_76;
>   vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163];
>   vectp_a.38_141 = vectp_a.38_163 + 16;
> r1':  vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141];
>   vect__54.42_135 = vect__42.40_161 + vect__56.37_23;
> r1'+r2':  predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>   predreastmp.51_18 = vect__56.37_15;
>   vect__54.42_134 = predreastmp.50_17;
> r1+r2:  predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> ...
> 
> Problematic construct is that while having load-only chains r1->r1' and 
> r2->r2', the combination
> is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper 
> fix is to reject such
> combinations where combined root stmt does not dominate usages. It's probably 
> corner case as it does
> not reuse any values among loop iterations (which is main motivation of the 
> pass), it's doing PRE
> if I'm right.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

I could bootstrap on aarch64-none-linux-gnu without any issues; regression
tests are fine and the testcase compiles without an ICE.

Thanks for fixing this.

VP.




> 
> Ready to be installed?
> Martin
> 

> From 41b153cf975374fff48419ec8ac5991ac134735f Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 17 Jan 2017 14:22:40 +0100
> Subject: [PATCH] Be careful about combined chain with length == 0 (PR
>  tree-optimization/70754).
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-01-17  Martin Liska  
> 
>   PR tree-optimization/70754
>   * gfortran.dg/pr70754.f90: New test.
> 
> gcc/ChangeLog:
> 
> 2017-01-17  Martin Liska  
> 
>   PR tree-optimization/70754
>   * tree-predcom.c (combine_chains): Do not create a combined chain
>   with length equal to zero when root_stmt does not dominate
>   stmts of references.
> ---
>  gcc/testsuite/gfortran.dg/pr70754.f90 | 35 
> +++
>  gcc/tree-predcom.c| 10 ++
>  2 files changed, 45 insertions(+)
>  create mode 100644 gcc/testsuite/gfortran.dg/pr70754.f90
> 
> diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 
> b/gcc/testsuite/gfortran.dg/pr70754.f90
> new file mode 100644
> index 000..758901ce2b2
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pr70754.f90
> @@ -0,0 +1,35 @@
> +! { dg-options "-Ofast" }
> +
> +module m
> +  implicit none
> +  private
> +  save
> +
> +  integer, parameter, public :: &
> +ii4  = selected_int_kind(6), &
> +rr8  = selected_real_kind(13)
> +
> +  integer (ii4), dimension(40,40,199), public :: xyz
> +  public :: foo
> +contains
> +  subroutine foo(a)
> +real (rr8), dimension(40,40), intent(out) :: a
> +real (rr8), dimension(40,40) :: b
> +integer (ii4), dimension(40,40) :: c
> +integer  i, j
> +
> +do i=1,8
> +  b(i,j) = 123 * a(i,j) + a(i,j+1) &
> + + a(i,j) + a(i+1,j+1) &
> + + a(i+1,j) + a(i-1,j+1) &
> + + a(i-1,j)
> +  c(i,j) = 123
> +end do
> +
> +where ((xyz(:,:,2) /= 0) 

Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8

2015-06-16 Thread Vidya Praveen
On Mon, Jun 15, 2015 at 08:14:31PM +0100, Bill Schmidt wrote:
 On Fri, 2015-06-12 at 17:36 +0100, Vidya Praveen wrote:
  On Thu, Apr 30, 2015 at 01:34:18PM +0100, Bill Schmidt wrote:
   On Thu, 2015-04-30 at 18:26 +0800, Bin.Cheng wrote:
On Mon, Apr 27, 2015 at 9:26 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 On Mon, 2015-04-27 at 14:23 +0800, Bin.Cheng wrote:
 On Mon, Mar 30, 2015 at 1:42 AM, Bill Schmidt
 wschm...@linux.vnet.ibm.com wrote:


  Index: gcc/testsuite/gcc.dg/vect/vect-33.c
  ===
  --- gcc/testsuite/gcc.dg/vect/vect-33.c (revision 221118)
  +++ gcc/testsuite/gcc.dg/vect/vect-33.c (working copy)
  @@ -36,9 +36,10 @@ int main (void)
 return main1 ();
   }
 
  +/* vect_hw_misalign && { ! vect64 } */
 
   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  -/* { dg-final { scan-tree-dump "Vectorizing an unaligned access" "vect" { target { vect_hw_misalign && { {! vect64} || vect_multiple_sizes } } } } } */
  +/* { dg-final { scan-tree-dump "Vectorizing an unaligned access" "vect" { target { { { ! powerpc*-*-* } && vect_hw_misalign } && { { ! vect64 } || vect_multiple_sizes } } } } }  */
   /* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" { target { vector_alignment_reachable && { vect64 && {! vect_multiple_sizes} } } } } } */
   /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 1 "vect" { target { { {! vector_alignment_reachable} || {! vect64} } && {! vect_hw_misalign} } } } } */
   /* { dg-final { cleanup-tree-dump "vect" } } */

 Hi Bill,
 With this change, the test case is skipped on aarch64 now.  Since it
 passed before, Is it expected to act like this on 64bit platforms?

 Hi Bin,

 No, that's a mistake on my part -- thanks for the report!  That first
 added line was not intended to be part of the patch:

 +/* vect_hw_misalign && { ! vect64 } */

 Please try removing that line and verify that the patch succeeds again
 for ARM.  Assuming so, I'll prepare a patch to fix this.

 It looks like this mistake was introduced only in this particular 
 test,
 but please let me know if you see any other anomalies.
Hi Bill,
I chased the wrong branch.  The test disappeared on the fsf-48 branch in
our build, rather than trunk.  I guess it's not your patch's fault.
Will follow up and get back to you later.
Sorry for the inconvenience.
   
   OK, thanks for letting me know!  There was still a bad line in this
   patch, although it was only introduced in 5.1 and trunk, so I guess that
   wasn't responsible in this case.  Thanks for checking!
  
  
  Hi Bill,
  
  In 4.8 branch, you have changed:
  
   -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
   +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! vect_hw_misalign } } } } */
  
  Whereas your comment says:
  
 2015-04-24  Bill Schmidt  wschm...@linux.vnet.ibm.com
  
  Backport from mainline r222349
  2015-04-22  Bill Schmidt  wschm...@linux.vnet.ibm.com
  
  PR target/65456
  [...]
  * gcc.dg/vect/vect-33.c: Exclude unaligned access test for
  POWER8.
  [...]
  
  There wasn't an unaligned access test in the first place. But if you wanted 
  to
  introduce it and exclude it for POWER8 then it should've been:
  
    ...  { { ! powerpc*-*-* } && vect_hw_misalign } ...
  
   like you have done for the trunk. At the moment, this change has caused the
   test to be skipped for AArch64. It should've been skipped for x86_64-*-*
   and i*86-*-* as well.
   
   I believe it wasn't intended to be skipped, was it?
 
 Right, wasn't intended to be skipped.  This test changed substantially
 between 4.8 and 4.9, so when I did the backport I tried (and failed) to
 adjust it properly.
 
 Because the sense of the test has been reversed, I believe the correct
 change is
 
  /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { { ! powerpc*-*-* } || { ! vect_hw_misalign } } } } } */

Makes sense. If I understand it right, it shouldn't vectorize unaligned
accesses for targets (except powerpc) that support misaligned vector access?

Regards
VP.
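For readers unfamiliar with the selector syntax, the corrected expression can be sanity-checked outside DejaGnu; the sketch below is a hypothetical model in plain Python booleans, not DejaGnu syntax:

```python
# Model of the corrected selector:
#   { { ! powerpc*-*-* } || { ! vect_hw_misalign } }
# The zero-occurrence scan applies everywhere EXCEPT on powerpc
# targets that support misaligned vector accesses (e.g. POWER8).
def scan_applies(is_powerpc: bool, vect_hw_misalign: bool) -> bool:
    return (not is_powerpc) or (not vect_hw_misalign)

# aarch64 with misaligned-access support: scan still applies
assert scan_applies(False, True)
# pre-POWER8 powerpc without misaligned-access support: scan applies
assert scan_applies(True, False)
# POWER8 (powerpc with misaligned-access support): scan is skipped
assert not scan_applies(True, True)
```

This matches the intent stated above: only powerpc targets with hardware misalignment support are excluded from the zero-count check.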






 
 I'll give that a quick test.
 
 Bill
 
  
  Regards
  VP.
  
  
  
  
  
   
   Bill
   

Thanks,
bin

 Thanks very much!

 Bill

 PASS-NA: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects
 scan-tree-dump-times vect Vectorizing an unaligned access 0
 PASS-NA: gcc.dg/vect/vect-33.c scan-tree-dump-times vect 
 Vectorizing
 an unaligned access 0

 Thanks,
 bin




   
   
  
 
 



Re: [PATCH, rs6000, testsuite, PR65456] Changes for unaligned vector load/store support on POWER8

2015-06-12 Thread Vidya Praveen
Re: [PATCH, RFC] New memory usage statistics infrastructure

2015-06-01 Thread Vidya Praveen

On 01/06/15 15:21, Vidya Praveen wrote:

On 01/06/15 15:08, Martin Liška wrote:

On 06/01/2015 02:18 PM, Richard Biener wrote:

On Mon, Jun 1, 2015 at 1:38 PM, Martin Liška mli...@suse.cz wrote:

On 05/29/2015 06:09 PM, Vidya Praveen wrote:


Martin,

The following change:

@@ -2655,10 +2655,10 @@ s-iov: build/gcov-iov$(build_exeext) $(BASEVER) 
$(DEVPHASE)

   GCOV_OBJS = gcov.o
   gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
-   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) $(LIBS) -o $@
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) 
build/hash-table.o ggc-none.o $(LIBS) -o $@


seems to cause a canadian cross build failure for arm and aarch64 on x86_64, as
build/hash-table.o and ggc-none.o are not built by the same compiler?

arm-none-linux-gnueabi-g++ -no-pie   -g -O2 -DIN_GCC-fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
+-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual 
-pedantic -Wno-long-long -Wn
  build/hash-table.o ggc-none.o libcommon.a ../libcpp/libcpp.a 
../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ..
+/libdecnumber/libdecnumber.a  -o gcov
build/hash-table.o: file not recognized: File format not recognized
collect2: error: ld returned 1 exit status
make[1]: *** [gcov] Error 1


Should it be:

-   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) $(LIBS) -o $@
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) hash-table.o 
ggc-none.o $(LIBS) -o $@

instead?
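The rule being applied here can be sketched as follows; the helper below is purely illustrative (it is not part of any Makefile) and just encodes the convention that host tools must link host-compiled objects, while only generator programs that run on the build machine may link the build/-prefixed ones:

```shell
# Illustrative only: decide which objects a link step should use.
# In a canadian cross, objects under build/ come from the *build*
# compiler and have the wrong file format for the host linker;
# host tools such as gcov need the host-compiled copies.
pick_objects() {
  case "$1" in
    gcov|gcov-dump|gcov-tool) echo "hash-table.o ggc-none.o" ;;
    gen*)                     echo "build/hash-table.o build/ggc-none.o" ;;
    *)                        echo "unknown" ;;
  esac
}

pick_objects gcov       # host tool -> host objects
pick_objects gengtype   # generator -> build objects
```

This is exactly why Richard's remark below ("gcov isn't a build but a host tool") settles the question.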


Hello Vidya.

Thanks for pointing out. To be honest, I'm not a build system guru and it's
hard for me to verify that the change you suggest is the correct one.

May I please ask you to send a patch to the mailing list?


gcov isn't a build but a host tool so the patch looks good to me.

Richard.


Thanks,
Martin



VP.


On 15/05/15 15:38, Martin Liška wrote:

Hello.

Following patch attempts to rewrite memory reports for GCC's internal 
allocations
so that it uses a new template type. The type shares parts which are currently 
duplicated,
adds support for special 'counters' and introduces new support for 
hash-{set,map,table}.

Transformation of the current code is a bit tricky as we internally used 
hash-table as main
data structure which takes care of location-related allocations. As I want to 
add support even
for hash tables (and all derived types), header files inclusion and forward 
declaration is utilized.

Feel free to comment the patch, as well as missing features one may want to 
track by location sensitive
memory allocation.

Attachment contains sample output taken from tramp3d-v4.cpp.

Thanks,
Martin







Ok.

I'm going to install following patch.



Martin,

I realized we require a change in one more place. I'm just doing builds to
verify this.

VP.




   GCOV_OBJS = gcov.o
   gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
  +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) \
-   build/hash-table.o ggc-none.o $(LIBS) -o $@
+   hash-table.o ggc-none.o $(LIBS) -o $@
   GCOV_DUMP_OBJS = gcov-dump.o
   gcov-dump$(exeext): $(GCOV_DUMP_OBJS) $(LIBDEPS)
  +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_DUMP_OBJS) \
-   build/hash-table.o build/ggc-none.o\
+   hash-table.o ggc-none.o\
  $(LIBS) -o $@

   GCOV_TOOL_DEP_FILES = $(srcdir)/../libgcc/libgcov-util.c gcov-io.c 
$(GCOV_IO_H) \



Installing the following patch as it is obvious and the same kind of change as
the previous one (which was approved by richi). Verified by building a canadian
cross of aarch64-none-linux-gnu and a cross build of arm-none-eabi.

gcc/ChangeLog:

2015-06-01  Vidya Praveen  vidyaprav...@arm.com

* Makefile.in: Pick up gcov-dump dependencies from gcc/ directory
rather than from gcc/build directory.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 952f285..3d14938 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2671,7 +2671,7 @@ gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
 GCOV_DUMP_OBJS = gcov-dump.o
 gcov-dump$(exeext): $(GCOV_DUMP_OBJS) $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_DUMP_OBJS) \
-   build/hash-table.o build/ggc-none.o\
+   hash-table.o ggc-none.o\
$(LIBS) -o $@

 GCOV_TOOL_DEP_FILES = $(srcdir)/../libgcc/libgcov-util.c gcov-io.c 
$(GCOV_IO_H) \




Re: [PATCH, RFC] New memory usage statistics infrastructure

2015-05-29 Thread Vidya Praveen



Re: [Patch,testsuite] Fix bind_pic_locally

2014-06-25 Thread Vidya Praveen
PING!

On Wed, Jun 04, 2014 at 02:56:00PM +0100, Vidya Praveen wrote:
[Patch,testsuite] Fix bind_pic_locally

2014-06-04 Thread Vidya Praveen
Hello,

This is to follow up the patch I had posted to fix bind_pic_locally some time
ago (sorry, this went into my backlog for a while).

To summarize, when multilib_flags contains -fpic or -fPIC, it overrides the
-fpie or -fPIE that is added by bind_pic_locally. The fix that was finally
agreed on was to store the flags in a variable at bind_pic_locally, append it
to multilib_flags just before invoking target_compile, and remove it
immediately after that (Refer [1]).
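The ordering problem being fixed is simply that the last of the PIC/PIE flags on the command line wins; this toy shell sketch (assumed GCC flag semantics, illustrative only) shows why appending after multilib_flags works:

```shell
# Old behaviour: bind_pic_locally adds -fPIE early...
flags="-fPIE"
# ...then the board's multilib_flags get appended, and -fPIC overrides it.
flags="$flags -fPIC"
# The agreed fix: remember the PIE flag instead (flags_to_postpone) and
# append it *after* multilib_flags so it takes effect; the test harness
# then restores multilib_flags after target_compile.
flags_to_postpone="-fPIE"
flags="$flags $flags_to_postpone"
echo "$flags"
```

The patch below implements exactly this append/restore dance around each `target_compile` call.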

This patch  implements the same. Since this is  an issue  not only for gcc but 
also for g++ and gfortran tests, I have fixed this in g++.exp and gfortran.exp
along with gcc.exp. 

This was tested and works fine on:

aarch64-none-elf
aarch64-none-linux-gnu
arm-none-linux-gnueabihf
x86_64-unknown-linux-gnu

OK for trunk?

Cheers
VP.

[1] http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00365.html

~~~

gcc/testsuite/ChangeLog:

2014-06-04  Vidya Praveen  vidyaprav...@arm.com

* lib/target-support.exp (bind_pic_locally): Save the flags to
'flags_to_postpone' instead of appending to 'flags'.
* lib/gcc.exp (gcc_target_compile): Append board_info's multilib_flags
with flags_to_postpone and revert after target_compile.
* lib/g++.exp (g++_target_compile): Ditto.
* lib/gfortran.exp (gfortran_target_compile): Ditto.

diff --git a/gcc/testsuite/lib/g++.exp b/gcc/testsuite/lib/g++.exp
index 751e27b..6658c58 100644
--- a/gcc/testsuite/lib/g++.exp
+++ b/gcc/testsuite/lib/g++.exp
@@ -288,6 +288,8 @@ proc g++_target_compile { source dest type options } {
 global gluefile wrap_flags
 global ALWAYS_CXXFLAGS
 global GXX_UNDER_TEST
+global flags_to_postpone
+global board_info
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
 	lappend options "libs=${gluefile}"
@@ -313,10 +315,25 @@ proc g++_target_compile { source dest type options } {
 	exec rm -f $rponame
 }
 
+# bind_pic_locally adds -fpie/-fPIE flags to flags_to_postpone and it is
+# appended here to multilib_flags as it can be overridden by the latter
+# if it was added earlier. After the target_compile, multilib_flags is
+# restored to its original content.
+set tboard [target_info name]
+if {[board_info $tboard exists multilib_flags]} {
+    set orig_multilib_flags [board_info [target_info name] multilib_flags]
+    append board_info($tboard,multilib_flags) " $flags_to_postpone"
+}
+
 set options [dg-additional-files-options $options $source]
 
 set result [target_compile $source $dest $type $options]
 
+if {[board_info $tboard exists multilib_flags]} {
+    set board_info($tboard,multilib_flags) $orig_multilib_flags
+    set flags_to_postpone ""
+}
+
 return $result
 }
 
diff --git a/gcc/testsuite/lib/gcc.exp b/gcc/testsuite/lib/gcc.exp
index 49394b0..f937064 100644
--- a/gcc/testsuite/lib/gcc.exp
+++ b/gcc/testsuite/lib/gcc.exp
@@ -126,7 +126,9 @@ proc gcc_target_compile { source dest type options } {
 global GCC_UNDER_TEST
 global TOOL_OPTIONS
 global TEST_ALWAYS_FLAGS
-	
+global flags_to_postpone
+global board_info
+
 if {[target_info needs_status_wrapper] != "" && \
 	[target_info needs_status_wrapper] != "0" && \
 	[info exists gluefile] } {
@@ -162,8 +164,26 @@ proc gcc_target_compile { source dest type options } {
 	set options [concat "{additional_flags=$TOOL_OPTIONS}" $options]
 }
 
+# bind_pic_locally adds -fpie/-fPIE flags to flags_to_postpone and it is
+# appended here to multilib_flags as it can be overridden by the latter
+# if it was added earlier. After the target_compile, multilib_flags is
+# restored to its original content.
+set tboard [target_info name]
+if {[board_info $tboard exists multilib_flags]} {
+    set orig_multilib_flags [board_info [target_info name] multilib_flags]
+    append board_info($tboard,multilib_flags) " $flags_to_postpone"
+}
+
 lappend options timeout=[timeout_value]
 lappend options compiler=$GCC_UNDER_TEST
 set options [dg-additional-files-options $options $source]
-return [target_compile $source $dest $type $options]
+set return_val [target_compile $source $dest $type $options]
+
+if {[board_info $tboard exists multilib_flags]} {
+    set board_info($tboard,multilib_flags) $orig_multilib_flags
+    set flags_to_postpone ""
+}
+
+return $return_val
 }
+
diff --git a/gcc/testsuite/lib/gfortran.exp b/gcc/testsuite/lib/gfortran.exp
index c9b5d64..9d174bb 100644
--- a/gcc/testsuite/lib/gfortran.exp
+++ b/gcc/testsuite/lib/gfortran.exp
@@ -234,16 +234,35 @@ proc gfortran_target_compile { source dest type options } {
 global gluefile wrap_flags
 global ALWAYS_GFORTRANFLAGS
 global GFORTRAN_UNDER_TEST
+global flags_to_postpone
+global board_info
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
 	lappend options "libs=${gluefile}"
 	lappend options ldflags

[Patch,testsuite] Fix tests that fail due to symbol visibility when -fPIC

2014-06-04 Thread Vidya Praveen
Hello,

The following test cases fail when -fPIC is passed as a dejagnu multilib flag,
since -fPIC causes the 'availability' of the functions to become overridable.
I have fixed this by adding bind_pic_locally to these cases.

  gcc.dg/fail_always_inline.c
  gcc.dg/inline-22.c
  gcc.dg/inline_4.c
  g++.dg/ipa/devirt-25.C

Tested on:

  aarch64-none-elf
  aarch64-none-linux-gnu
  arm-none-linux-gnueabihf
  x86_64-unknown-linux-gnu

OK for trunk?

Cheers
VP.

~~~

gcc/testsuite/ChangeLog:

2014-06-04  Vidya Praveen  vidyaprav...@arm.com

* gcc.dg/inline-22.c: Add bind_pic_locally.
* gcc.dg/inline_4.c: Ditto.
* gcc.dg/fail_always_inline.c: Ditto.
* g++.dg/ipa/devirt-25.C: Ditto.
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-25.C b/gcc/testsuite/g++.dg/ipa/devirt-25.C
index 7516479..387d529 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-25.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-25.C
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-ipa-cp" } */
+/* { dg-add-options bind_pic_locally } */
 
 class ert_RefCounter {
  protected:
diff --git a/gcc/testsuite/gcc.dg/fail_always_inline.c b/gcc/testsuite/gcc.dg/fail_always_inline.c
index 4b196ac..86645b8 100644
--- a/gcc/testsuite/gcc.dg/fail_always_inline.c
+++ b/gcc/testsuite/gcc.dg/fail_always_inline.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-add-options bind_pic_locally } */
 
 extern __attribute__ ((always_inline)) void
  bar() { } /* { dg-warning "function might not be inlinable" } */
diff --git a/gcc/testsuite/gcc.dg/inline-22.c b/gcc/testsuite/gcc.dg/inline-22.c
index 1785e1c..6795c5f 100644
--- a/gcc/testsuite/gcc.dg/inline-22.c
+++ b/gcc/testsuite/gcc.dg/inline-22.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-funit-at-a-time -Wno-attributes" } */
+/* { dg-add-options bind_pic_locally } */
 /* Verify we can inline without a complete prototype and with promoted
arguments.  See also PR32492.  */
 __attribute__((always_inline)) void f1() {}
diff --git a/gcc/testsuite/gcc.dg/inline_4.c b/gcc/testsuite/gcc.dg/inline_4.c
index dd4fadb..ebd57e9 100644
--- a/gcc/testsuite/gcc.dg/inline_4.c
+++ b/gcc/testsuite/gcc.dg/inline_4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized -fdisable-tree-einline=foo2 -fdisable-ipa-inline -Wno-attributes" } */
+/* { dg-add-options bind_pic_locally } */
 int g;
 __attribute__((always_inline)) void bar (void)
 {

[Patch,AArch64] Support SISD variants of SCVTF,UCVTF

2014-01-13 Thread Vidya Praveen
Hello,

This patch adds support for the SISD variants of the SCVTF/UCVTF instructions.
It also refactors the existing support for the floating-point variants of
SCVTF/UCVTF so that instruction selection is directed by the constraints.
Given that the floating-point variants support unequal-width conversions
(SI to DF and DI to SF), new mode attributes w1 and w2 have been introduced,
and fcvt_target/FCVT_TARGET have been extended to support non-vector types.
Since this patch changes the existing patterns, the testcase includes tests
for both the SISD and floating-point variants of the instructions.

Tested for aarch64-none-elf.

OK for trunk?

Cheers
VP.

gcc/ChangeLog:

2013-01-13  Vidya Praveen  vidyaprav...@arm.com

* aarch64.md (float<GPI:mode><GPF:mode>2): Remove.
(floatuns<GPI:mode><GPF:mode>2): Remove.
(<optab><fcvt_target><GPF:mode>2): New pattern for equal width float
and floatuns conversions.
(<optab><fcvt_iesize><GPF:mode>2): New pattern for inequal width float
and floatuns conversions.
* iterators.md (fcvt_target, FCVT_TARGET): Support SF and DF modes.
(w1, w2): New mode attributes for inequal width conversions.

gcc/testsuite/ChangeLog:

2013-01-13  Vidya Praveen  vidyaprav...@arm.com

* gcc.target/aarch64/cvtf_1.c: New.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c83622d..1775849 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3295,20 +3295,24 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
-(define_insn "float<GPI:mode><GPF:mode>2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-(float:GPF (match_operand:GPI 1 "register_operand" "r")))]
-  "TARGET_FLOAT"
-  "scvtf\\t%<GPF:s>0, %<GPI:w>1"
-  [(set_attr "type" "f_cvti2f")]
+(define_insn "<optab><fcvt_target><GPF:mode>2"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+(FLOATUORS:GPF (match_operand:<FCVT_TARGET> 1 "register_operand" "w,r")))]
+  ""
+  "@
+   <su_optab>cvtf\t%<GPF:s>0, %<s>1
+   <su_optab>cvtf\t%<GPF:s>0, %<w1>1"
+  [(set_attr "simd" "yes,no")
+   (set_attr "fp" "no,yes")
+   (set_attr "type" "neon_int_to_fp_<Vetype>,f_cvti2f")]
 )
 
-(define_insn "floatuns<GPI:mode><GPF:mode>2"
+(define_insn "<optab><fcvt_iesize><GPF:mode>2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-(unsigned_float:GPF (match_operand:GPI 1 "register_operand" "r")))]
+(FLOATUORS:GPF (match_operand:<FCVT_IESIZE> 1 "register_operand" "r")))]
   "TARGET_FLOAT"
-  "ucvtf\\t%<GPF:s>0, %<GPI:w>1"
-  [(set_attr "type" "f_cvt")]
+  "<su_optab>cvtf\t%<GPF:s>0, %<w2>1"
+  [(set_attr "type" "f_cvti2f")]
 )
 
 ;; ---
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c4f95dc..11bdc35 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -293,6 +293,10 @@
 ;; 32-bit version and %x0 in the 64-bit version.
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
+;; For inequal width int to float conversion
+(define_mode_attr w1 [(SF "w") (DF "x")])
+(define_mode_attr w2 [(SF "x") (DF "w")])
+
 ;; For constraints used in scalar immediate vector moves
 (define_mode_attr hq [(HI "h") (QI "q")])
 
@@ -558,8 +562,12 @@
 (define_mode_attr atomic_sfx
   [(QI "b") (HI "h") (SI "") (DI "")])
 
-(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")])
-(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")])
+(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si") (SF "si") (DF "di")])
+(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI") (SF "SI") (DF "DI")])
+
+;; for the inequal width integer to fp conversions
+(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
 
 (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
 (V4HI "V8HI") (V8HI "V4HI")
diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
new file mode 100644
index 0000000..80ab9a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -fno-inline -O1" } */
+
+#define FCVTDEF(ftype,itype) \
+void \
+cvt_##itype##_to_##ftype (itype a, ftype b)\
+{\
+  ftype c;\
+  c = (ftype) a;\
+  if ( (c - b) > 0.1) abort();\
+}
+
+#define force_simd_for_float(v) asm volatile ("mov %s0, %1.s[0]" : "=w" (v) : "w" (v) :)
+#define force_simd_for_double(v) asm volatile ("mov %d0, %1.d[0]" : "=w" (v) : "w" (v) :)
+
+#define FCVTDEF_SISD(ftype,itype) \
+void \
+cvt_##itype##_to_##ftype##_sisd (itype a, ftype b)\
+{\
+  ftype c;\
+  force_simd_for_##ftype(a);\
+  c = (ftype) a;\
+  if ( (c - b) > 0.1) abort();\
+}
+
+#define FCVT(ftype,itype,ival,fval) cvt_##itype##_to_##ftype (ival, fval);
+#define FCVT_SISD(ftype,itype,ival,fval) cvt_##itype##_to_##ftype##_sisd (ival, fval);
+
+typedef int int32_t;
+typedef unsigned int uint32_t;
+typedef long long int int64_t;
+typedef unsigned long long int uint64_t;
+
+extern void abort();
+
+FCVTDEF (float, int32_t)
+/* { dg-final

Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2014-01-09 Thread Vidya Praveen
On Wed, Jan 08, 2014 at 12:28:56PM +, Jakub Jelinek wrote:
 On Wed, Jan 08, 2014 at 11:49:08AM +, Vidya Praveen wrote:
  On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote:
   On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote:
bind_pic_locally is broken for targets that doesn't pass -fPIC/-fpic by
default [1][2].
   
   Let's give Jakub 2 days to weigh in?  If no objections, Ok, though, do 
   see about adding documentation for it.  
  
  Sure. I didn't respin the patch with documentation since I wanted to know
  if the solution is acceptable. If this patch is OK, I'll respin with the
  documentation for bind_pic_locally_ok. 
  
   I kinda would like a simpler interface for these two, but?  that can be 
   follow on work, if someone has a bright idea and some time to implement 
   it.
   
  
  Could you explain what do you mean by simpler interface here? 
 
 The simpler interface, as I said earlier, would be just to make sure
 /* { dg-add-options bind_pic_locally } */
 does the right thing, I really don't believe you've tried hard enough.
 
 It is true dejagnu's default_target_compile has:
 if {[board_info $dest exists multilib_flags]} {
 append add_flags " [board_info $dest multilib_flags]"
 }
 last (before just adding -o $destfile; is multilib_flags where the
 -fpic/-fPIC comes in, right?), but if say dg-add-options bind_pic_locally
 adds the necessary options not to dg-extra-tools-flags, but to some
 other variable and say gcc_target_compile (and g++_target_compile)
 around the [target_compile ...] invocation e.g. temporarily append
 that other variable (if not empty) to board_info's multilib_flags
 and afterwards remove it, I don't see why it wouldn't work.
 Tcl is quite flexible in this.

Thanks Jakub. I seem to have not properly understood your earlier email. I tried
this and it works fine. I'll test and post the patch.

VP.




Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2014-01-08 Thread Vidya Praveen
On Tue, Jan 07, 2014 at 09:35:54PM +, Mike Stump wrote:
 On Dec 17, 2013, at 6:06 AM, Vidya Praveen vidyaprav...@arm.com wrote:
  bind_pic_locally is broken for targets that doesn't pass -fPIC/-fpic by
  default [1][2].
 
 Let's give Jakub 2 days to weigh in?  If no objections, Ok, though, do see 
 about adding documentation for it.  

Sure. I didn't respin the patch with documentation since I wanted to know
if the solution is acceptable. If this patch is OK, I'll respin with the
documentation for bind_pic_locally_ok. 

 I kinda would like a simpler interface for these two, but?  that can be 
 follow on work, if someone has a bright idea and some time to implement it.
 

Could you explain what you mean by simpler interface here?

Cheers
VP.




Re: [Patch,testsuite] Fix testcases that use bind_pic_locally

2014-01-07 Thread Vidya Praveen
Ping!

On Tue, Dec 17, 2013 at 02:06:13PM +, Vidya Praveen wrote:
 Hello,
 
 bind_pic_locally is broken for targets that doesn't pass -fPIC/-fpic by
 default [1][2].
 
 One of the suggestions was to have a effective target check called
 bind_pic_locally_ok which checks if bind_pic_locally will work and have it
 included in all the tests that uses bind_pic_locally in dg-add-options [1].
 
 This patch implements the same by checking if -fpic/-fPIC are passed by
 default as well in general with the flags passed through various means. It
 returns 1 when either the -fpic/-fPIC is passed by default OR when it is 
 not passed by default as well as not passed through any other means. This 
 however, will allow if -fpic/-fPIC is passed both by default and by the 
 other means since we can't really tell such a case and it makes no sense 
 to do so (because there's no reason for the testcase to pass -fPIC/-fpic 
 when it tries to override the same using bind_pic_locally and if it is 
 passed by default, there's no need to pass them through, say, board file's
 cflags).
 
 default  other-means  returns
 pic      -            1
 pic      pic          1 (invalid)
 -        pic          0
 -        -            1
 
 This patch also modifies all the testcases that use bind_pic_locally to 
 include this bind_pic_locally_ok check.
 
 Tested for aarch64-none-elf, arm-none-eabi, arm-none-linux-gnueabihf.
 
 OK?
 
 Cheers
 VP.
 
 [1] http://gcc.gnu.org/ml/gcc/2013-09/msg00207.html
 [2] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00462.html
 
 
 gcc/testsuite/ChangeLog:
 
 2013-12-17  Vidya Praveen  vidyaprav...@arm.com
 
  * lib/target-supports.exp (check_effective_target_bind_pic_locally_ok):
   New check.
   * g++.dg/ipa/iinline-1.C: Introduce bind_pic_locally_ok.
   * g++.dg/ipa/iinline-2.C: Likewise.
   * g++.dg/ipa/iinline-3.C: Likewise.
   * g++.dg/ipa/inline-1.C: Likewise.
   * g++.dg/ipa/inline-2.C: Likewise.
   * g++.dg/ipa/inline-3.C: Likewise.
   * g++.dg/other/first-global.C: Likewise.
   * g++.dg/parse/attr-externally-visible-1.C: Likewise.
   * g++.dg/torture/pr40323.C: Likewise.
   * g++.dg/torture/pr55260-1.C: Likewise.
   * g++.dg/torture/pr55260-2.C: Likewise.
   * g++.dg/tree-ssa/inline-1.C: Likewise.
   * g++.dg/tree-ssa/inline-2.C: Likewise.
   * g++.dg/tree-ssa/inline-3.C: Likewise.
   * g++.dg/tree-ssa/nothrow-1.C: Likewise.
   * gcc.dg/inline-33.c: Likewise.
   * gcc.dg/ipa/ipa-1.c: Likewise.
   * gcc.dg/ipa/ipa-2.c: Likewise.
   * gcc.dg/ipa/ipa-3.c: Likewise.
   * gcc.dg/ipa/ipa-4.c: Likewise.
   * gcc.dg/ipa/ipa-5.c: Likewise.
   * gcc.dg/ipa/ipa-7.c: Likewise.
   * gcc.dg/ipa/ipa-8.c: Likewise.
   * gcc.dg/ipa/ipacost-2.c: Likewise.
   * gcc.dg/ipa/ipcp-1.c: Likewise.
   * gcc.dg/ipa/ipcp-2.c: Likewise.
   * gcc.dg/ipa/ipcp-4.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-1.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-2.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-3.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-4.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-5.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-6.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-7.c: Likewise.
   * gcc.dg/ipa/ipcp-agg-8.c: Likewise.
   * gcc.dg/ipa/pr56988.c: Likewise.
   * gcc.dg/tree-ssa/inline-3.c: Likewise.
   * gcc.dg/tree-ssa/inline-4.c: Likewise.
   * gcc.dg/tree-ssa/ipa-cp-1.c: Likewise.
   * gcc.dg/tree-ssa/local-pure-const.c: Likewise.
   * gfortran.dg/whole_file_5.f90: Likewise.
   * gfortran.dg/whole_file_6.f90: Likewise.
 

 diff --git a/gcc/testsuite/g++.dg/ipa/iinline-1.C b/gcc/testsuite/g++.dg/ipa/iinline-1.C
 index 9f99893..b86daf1 100644
 --- a/gcc/testsuite/g++.dg/ipa/iinline-1.C
 +++ b/gcc/testsuite/g++.dg/ipa/iinline-1.C
 @@ -1,6 +1,7 @@
  /* Verify that simple indirect calls are inlined even without early
 inlining..  */
  /* { dg-do compile } */
 +/* { dg-require-effective-target bind_pic_locally_ok } */
   /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
  /* { dg-add-options bind_pic_locally } */
  
 diff --git a/gcc/testsuite/g++.dg/ipa/iinline-2.C b/gcc/testsuite/g++.dg/ipa/iinline-2.C
 index 670a5dd..d4329c1 100644
 --- a/gcc/testsuite/g++.dg/ipa/iinline-2.C
 +++ b/gcc/testsuite/g++.dg/ipa/iinline-2.C
 @@ -1,6 +1,7 @@
  /* Verify that simple indirect calls are inlined even without early
 inlining..  */
  /* { dg-do compile } */
 +/* { dg-require-effective-target bind_pic_locally_ok } */
   /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
  /* { dg-add-options bind_pic_locally } */
  
 diff --git a/gcc/testsuite/g++.dg/ipa/iinline-3.C b/gcc/testsuite/g++.dg/ipa/iinline-3.C
 index 3daee9a..4dc604e 100644
 --- a/gcc/testsuite/g++.dg/ipa/iinline-3.C
 +++ b/gcc/testsuite/g++.dg/ipa/iinline-3.C
 @@ -1,6 +1,7 @@
  /* Verify that we do not indirect-inline using member pointer
 parameters which have been modified.  */
  /* { dg-do run

[Patch,testsuite] Fix testcases that use bind_pic_locally

2013-12-17 Thread Vidya Praveen
Hello,

bind_pic_locally is broken for targets that don't pass -fPIC/-fpic by
default [1][2].

One of the suggestions was to have an effective-target check called
bind_pic_locally_ok which checks whether bind_pic_locally will work, and to
have it included in all the tests that use bind_pic_locally in dg-add-options [1].

This patch implements that by checking whether -fpic/-fPIC is passed by
default as well as through flags passed by other means. It returns 1 either
when -fpic/-fPIC is passed by default, or when it is passed neither by
default nor through any other means. It will, however, also allow the case
where -fpic/-fPIC is passed both by default and by other means, since we
can't really tell such a case apart, and there is no sensible reason to do
it (there's no reason for a testcase to pass -fPIC/-fpic when it tries to
override the same flags using bind_pic_locally, and if it is passed by
default, there's no need to pass it through, say, the board file's
cflags).

default  other-means  returns
pic      -            1
pic      pic          1 (invalid)
-        pic          0
-        -            1

This patch also modifies all the testcases that use bind_pic_locally to 
include this bind_pic_locally_ok check.

Tested for aarch64-none-elf, arm-none-eabi, arm-none-linux-gnueabihf.

OK?

Cheers
VP.

[1] http://gcc.gnu.org/ml/gcc/2013-09/msg00207.html
[2] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00462.html


gcc/testsuite/ChangeLog:

2013-12-17  Vidya Praveen  vidyaprav...@arm.com

* lib/target-supports.exp (check_effective_target_bind_pic_locally_ok):
New check.
* g++.dg/ipa/iinline-1.C: Introduce bind_pic_locally_ok.
* g++.dg/ipa/iinline-2.C: Likewise.
* g++.dg/ipa/iinline-3.C: Likewise.
* g++.dg/ipa/inline-1.C: Likewise.
* g++.dg/ipa/inline-2.C: Likewise.
* g++.dg/ipa/inline-3.C: Likewise.
* g++.dg/other/first-global.C: Likewise.
* g++.dg/parse/attr-externally-visible-1.C: Likewise.
* g++.dg/torture/pr40323.C: Likewise.
* g++.dg/torture/pr55260-1.C: Likewise.
* g++.dg/torture/pr55260-2.C: Likewise.
* g++.dg/tree-ssa/inline-1.C: Likewise.
* g++.dg/tree-ssa/inline-2.C: Likewise.
* g++.dg/tree-ssa/inline-3.C: Likewise.
* g++.dg/tree-ssa/nothrow-1.C: Likewise.
* gcc.dg/inline-33.c: Likewise.
* gcc.dg/ipa/ipa-1.c: Likewise.
* gcc.dg/ipa/ipa-2.c: Likewise.
* gcc.dg/ipa/ipa-3.c: Likewise.
* gcc.dg/ipa/ipa-4.c: Likewise.
* gcc.dg/ipa/ipa-5.c: Likewise.
* gcc.dg/ipa/ipa-7.c: Likewise.
* gcc.dg/ipa/ipa-8.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-1.c: Likewise.
* gcc.dg/ipa/ipcp-2.c: Likewise.
* gcc.dg/ipa/ipcp-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-1.c: Likewise.
* gcc.dg/ipa/ipcp-agg-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-3.c: Likewise.
* gcc.dg/ipa/ipcp-agg-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-5.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/ipcp-agg-7.c: Likewise.
* gcc.dg/ipa/ipcp-agg-8.c: Likewise.
* gcc.dg/ipa/pr56988.c: Likewise.
* gcc.dg/tree-ssa/inline-3.c: Likewise.
* gcc.dg/tree-ssa/inline-4.c: Likewise.
* gcc.dg/tree-ssa/ipa-cp-1.c: Likewise.
* gcc.dg/tree-ssa/local-pure-const.c: Likewise.
* gfortran.dg/whole_file_5.f90: Likewise.
* gfortran.dg/whole_file_6.f90: Likewise.
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-1.C b/gcc/testsuite/g++.dg/ipa/iinline-1.C
index 9f99893..b86daf1 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-1.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-1.C
@@ -1,6 +1,7 @@
 /* Verify that simple indirect calls are inlined even without early
inlining..  */
 /* { dg-do compile } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
 /* { dg-add-options bind_pic_locally } */
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-2.C b/gcc/testsuite/g++.dg/ipa/iinline-2.C
index 670a5dd..d4329c1 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-2.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-2.C
@@ -1,6 +1,7 @@
 /* Verify that simple indirect calls are inlined even without early
inlining..  */
 /* { dg-do compile } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
 /* { dg-add-options bind_pic_locally } */
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-3.C b/gcc/testsuite/g++.dg/ipa/iinline-3.C
index 3daee9a..4dc604e 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-3.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-3.C
@@ -1,6 +1,7 @@
 /* Verify that we do not indirect-inline using member pointer
parameters which have been modified.  */
 /* { dg-do run } */
+/* { dg-require-effective-target bind_pic_locally_ok } */
 /* { dg-options -O3 -fno-early-inlining

Re: [RFC] Vectorization of indexed elements

2013-12-04 Thread Vidya Praveen
Hi Richi,

Apologies for the late response. I was on vacation.

On Mon, Oct 14, 2013 at 09:04:58AM +0100, Richard Biener wrote:
 On Fri, 11 Oct 2013, Vidya Praveen wrote:
 
  On Tue, Oct 01, 2013 at 09:26:25AM +0100, Richard Biener wrote:
   On Mon, 30 Sep 2013, Vidya Praveen wrote:
   
On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote:
 On Mon, 30 Sep 2013, Vidya Praveen wrote:
 
  On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
   On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
   [...]
  I can't really insist on the single lane load.. something 
  like:
  
  vc:V4SI[0] = c
  vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
  va:V4SI = vb:V4SI op vt:V4SI
  
  Or is there any other way to do this?
 
 Can you elaborate on I can't really insist on the single 
 lane load?
 What's the single lane load in your example? 

Loading just one lane of the vector like this:

vc:V4SI[0] = c // from the above scalar example

or 

vc:V4SI[0] = c[2] 

is what I meant by single lane load. In this example:

t = c[2] 
...
vb:v4si = b[0:3] 
vc:v4si = { t, t, t, t }
va:v4si = vb:v4si op vc:v4si 

If we are expanding the CONSTRUCTOR as vec_duplicate at 
vec_init, I cannot
insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] 
(which could be 
seen as vec_select:SI (vect_t 0) ). 

 I'd expect the instruction
 pattern as quoted to just work (and I hope we expand an 
 uniform
 constructor { a, a, a, a } properly using vec_duplicate).

As much as I went through the code, this is only done using 
vect_init. It is
not expanded as vec_duplicate from, for example, 
store_constructor() of expr.c
   
   Do you see any issues if we expand such constructor as 
   vec_duplicate directly 
   instead of going through vect_init way? 
  
  Sorry, that was a bad question.
  
  But here's what I would like to propose as a first step. Please 
  tell me if this
  is acceptable or if it makes sense:
  
  - Introduce standard pattern names 
  
  vmulim4 - vector muliply with second operand as indexed operand
  
  Example:
  
  (define_insn "vmuliv4si4"
     [set (match_operand:V4SI 0 "register_operand")
      (mul:V4SI (match_operand:V4SI 1 "register_operand")
                (vec_duplicate:V4SI
                  (vec_select:SI
                    (match_operand:V4SI 2 "register_operand")
                    (match_operand:V4SI 3 "immediate_operand")]
   ...
  )
 
 We could factor this with providing a standard pattern name for
 
 (define_insn "vdupi<mode>"
   [set (match_operand:<mode> 0 "register_operand")
        (vec_duplicate:<mode>
           (vec_select:<scalarmode>
              (match_operand:<mode> 1 "register_operand")
              (match_operand:SI 2 "immediate_operand")))]

This is good. I did think about this but then I thought of avoiding the 
need
for combiner patterns :-) 

But do you find the lane specific mov pattern I proposed, acceptable? 
   
   The specific mul pattern?  As said, consider factoring to vdupi to
   avoid an explosion in required special optabs.
   
 (you use V4SI for the immediate?  

Sorry typo again!! It should've been SI.

 Ideally vdupi has another custom
 mode for the vector index).
 
 Note that this factored pattern is already available as 
 vec_perm_const!
 It is simply (vec_perm_const:V4SI source source 
 immediate-selector).
 
 Which means that on the GIMPLE level we should try to combine
 
 el_4 = BIT_FIELD_REF <v_3, ...>;
 v_5 = { el_4, el_4, ... };

I don't think we reach this state at all for the scenarios in 
discussion.
what we generally have is:

 el_4 = MEM_REF <array + index*size>
 v_5 = { el_4, ... }

Or am I missing something?
   
   Well, but in that case I doubt it is profitable (or even valid!) to
   turn this into a vector lane load from the array.  If it is profitable
   to perform a vector read (because we're going to use the other elements
   of the vector as well) then the vectorizer should produce a vector
   load and materialize the uniform vector from one of its elements.
   
   Maybe at this point you should show us a compilable C testcase
   with a loop that should be vectorized using your instructions in
   the end?
  
  Here's a compilable example:
  
  void 
  foo (int *__restrict__ a,
   int *__restrict__ b,
   int *__restrict__ c)
  {
int i;
  
for (i = 0; i < 8; i++)
  a[i

Re: [RFC] Vectorization of indexed elements

2013-12-04 Thread Vidya Praveen
Hi Jakub,

Apologies for the late response.

On Fri, Oct 11, 2013 at 04:05:24PM +0100, Jakub Jelinek wrote:
 On Fri, Oct 11, 2013 at 03:54:08PM +0100, Vidya Praveen wrote:
  Here's a compilable example:
  
  void 
  foo (int *__restrict__ a,
   int *__restrict__ b,
   int *__restrict__ c)
  {
int i;
  
 for (i = 0; i < 8; i++)
  a[i] = b[i] * c[2];
  }
  
  This is vectorized by duplicating c[2] now. But I'm trying to take advantage
  of target instructions that can take a vector register as second argument 
  but
  use only one element (by using the same value for all the lanes) of the 
  vector register.
  
  Eg. mul vec-reg, vec-reg, vec-reg[index]
  mla vec-reg, vec-reg, vec-reg[index] // multiply and add
  
  But for a loop like the one in the C example given, I will have to load the
  c[2] in one element of the vector register (leaving the remaining unused)
  rather. This is why I was proposing to load just one element in a vector 
  register (what I meant as lane specific load). The benefit of doing this 
  is
  that we avoid explicit duplication, however such a simplification can only
  be done where such support is available - the reason why I was thinking in
  terms of optional standard pattern name. Another benefit is we will also be
  able to support scalars in the expression like in the following example:
  
  void
  foo (int *__restrict__ a,
   int *__restrict__ b,
   int c)
  {
int i;
  
 for (i = 0; i < 8; i++)
  a[i] = b[i] * c;
  }
 
 So just during combine let the broadcast operation be combined with the
 arithmetics?  

Yes. I can do that. But I always want it to be possible to recognize and load
directly to the indexed vector register from memory.


 Intel AVX512 ISA has similar feature, not sure what exactly
 they are doing for this. 

Thanks. I'll try to go through the code to understand.

 That said, the broadcast is likely going to be
 hoisted before the loop, and in that case is it really cheaper to have
 it unbroadcasted in a vector register rather than to broadcast it before the
 loop and just use there?

Could you explain what you mean by unbroadcast? The constructor needs to be
expanded one way or another, doesn't it? I thought expanding to vec_duplicate
when the values are uniform is the most efficient approach when vec_duplicate
is supported by the target. If you meant that each element of the vector is
loaded separately, I am wondering how I can combine such an operation with the
arithmetic operation.

Thanks
VP.





Re: Re: [Patch] Fix gcc.dg/20050922-*.c

2013-11-18 Thread Vidya Praveen

Mike,

On 25/10/13 00:37, Mike Stump wrote:

On Oct 24, 2013, at 2:26 AM, Vidya Praveen vidyaprav...@arm.com wrote:

On Mon, Oct 21, 2013 at 06:40:28PM +0100, Mike Stump wrote:

On Oct 21, 2013, at 3:28 AM, Vidya Praveen vidyaprav...@arm.com wrote:

Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This can
be a issue especially since they define uint32_t.



OK for 4.7, 4.8?


It fails on arm-none-eabi.


Ok, let it bake on trunk and then you can back port it if no one screams.



I think it has baked long enough. Could this be approved for 4.7 and 4.8 now?

VP.



Re: [Patch] Fix gcc.dg/20050922-*.c

2013-10-24 Thread Vidya Praveen
On Mon, Oct 21, 2013 at 06:40:28PM +0100, Mike Stump wrote:
 On Oct 21, 2013, at 3:28 AM, Vidya Praveen vidyaprav...@arm.com wrote:
  Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This 
  can
  be a issue especially since they define uint32_t.
 
  OK for 4.7, 4.8?
 
 For release branches, you'd need to transition from the theoretical to the 
 practical.  On which systems (software) does it fail?  If none, then no, a 
 back port isn't necessary.  If it fails on a system (or software) on which 
 real users use, then I'll approve it once you name the system (software) and 
 let it bake on trunk for a week and see if anyone objects?


It fails on arm-none-eabi. 

VP.

 



Re: [Patch] Fix gcc.dg/20050922-*.c

2013-10-24 Thread Vidya Praveen
On Mon, Oct 21, 2013 at 05:47:44PM +0100, Jeff Law wrote:
 On 10/21/13 04:28, Vidya Praveen wrote:
  Hello,
 
  Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This 
  can
  be a issue especially since they define uint32_t. Testcase writing 
  guidelines
  discourages such inclusion as well.
 
  This patch replaces these #includes with manual declarations.
 
  Tested for aarch64-none-elf, arm-none-eabi and x86_64-linux-gnu
 
  OK for trunk, 4.7, 4.8?
 
  VP.
 
  ---
 
  gcc/testsuite/ChangeLog:
 
  2013-10-21  Vidya Praveen  vidyaprav...@arm.com
 
  * gcc.dg/20050922-1.c: Remove stdlib.h and declare abort().
  * gcc.dg/20050922-2.c: Remove stdlib.h and declare abort() and exit().
 OK  installed on trunk.
 
 Release managers would need to make a decision about whether or not to 
 include this for the 4.7/4.8 branches.
 

Thanks Jeff!

VP.




[Patch] Fix gcc.dg/20050922-*.c

2013-10-21 Thread Vidya Praveen
Hello,

Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c include stdlib.h. This can
be an issue, especially since they define uint32_t. The testcase writing
guidelines discourage such inclusion as well.

This patch replaces these #includes with manual declarations.

Tested for aarch64-none-elf, arm-none-eabi and x86_64-linux-gnu

OK for trunk, 4.7, 4.8?

VP.

---

gcc/testsuite/ChangeLog:

2013-10-21  Vidya Praveen  vidyaprav...@arm.com

* gcc.dg/20050922-1.c: Remove stdlib.h and declare abort().
* gcc.dg/20050922-2.c: Remove stdlib.h and declare abort() and exit().

diff --git a/gcc/testsuite/gcc.dg/20050922-1.c b/gcc/testsuite/gcc.dg/20050922-1.c
index ed5a3c6..982f820 100644
--- a/gcc/testsuite/gcc.dg/20050922-1.c
+++ b/gcc/testsuite/gcc.dg/20050922-1.c
@@ -4,7 +4,7 @@
 /* { dg-do run } */
 /* { dg-options "-O1 -std=c99" } */
 
-#include <stdlib.h>
+extern void abort (void);
 
 #if __INT_MAX__ == 2147483647
 typedef unsigned int uint32_t;
diff --git a/gcc/testsuite/gcc.dg/20050922-2.c b/gcc/testsuite/gcc.dg/20050922-2.c
index c2974d0..2e8db82 100644
--- a/gcc/testsuite/gcc.dg/20050922-2.c
+++ b/gcc/testsuite/gcc.dg/20050922-2.c
@@ -4,7 +4,8 @@
 /* { dg-do run } */
 /* { dg-options "-O1 -std=c99" } */
 
-#include <stdlib.h>
+extern void abort (void);
+extern void exit (int);
 
 #if __INT_MAX__ == 2147483647
 typedef unsigned int uint32_t;

Re: [RFC] Vectorization of indexed elements

2013-10-11 Thread Vidya Praveen
On Tue, Oct 01, 2013 at 09:26:25AM +0100, Richard Biener wrote:
 On Mon, 30 Sep 2013, Vidya Praveen wrote:
 
  On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote:
   On Mon, 30 Sep 2013, Vidya Praveen wrote:
   
On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
 On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
 [...]
I can't really insist on the single lane load.. something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI op vt:V4SI

Or is there any other way to do this?
   
   Can you elaborate on I can't really insist on the single lane 
   load?
   What's the single lane load in your example? 
  
  Loading just one lane of the vector like this:
  
  vc:V4SI[0] = c // from the above scalar example
  
  or 
  
  vc:V4SI[0] = c[2] 
  
  is what I meant by single lane load. In this example:
  
  t = c[2] 
  ...
  vb:v4si = b[0:3] 
  vc:v4si = { t, t, t, t }
  va:v4si = vb:v4si op vc:v4si 
  
  If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I 
  cannot
  insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which 
  could be 
  seen as vec_select:SI (vect_t 0) ). 
  
   I'd expect the instruction
   pattern as quoted to just work (and I hope we expand an uniform
   constructor { a, a, a, a } properly using vec_duplicate).
  
  As much as I went through the code, this is only done using 
  vect_init. It is
  not expanded as vec_duplicate from, for example, 
  store_constructor() of expr.c
 
 Do you see any issues if we expand such constructor as vec_duplicate 
 directly 
 instead of going through vect_init way? 

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me 
if this
is acceptable or if it makes sense:

- Introduce standard pattern names 

vmulim4 - vector muliply with second operand as indexed operand

Example:

(define_insn "vmuliv4si4"
   [set (match_operand:V4SI 0 "register_operand")
        (mul:V4SI (match_operand:V4SI 1 "register_operand")
                  (vec_duplicate:V4SI
                    (vec_select:SI
                      (match_operand:V4SI 2 "register_operand")
                      (match_operand:V4SI 3 "immediate_operand")]
 ...
)
   
   We could factor this with providing a standard pattern name for
   
   (define_insn "vdupi<mode>"
     [set (match_operand:<mode> 0 "register_operand")
          (vec_duplicate:<mode>
             (vec_select:<scalarmode>
                (match_operand:<mode> 1 "register_operand")
                (match_operand:SI 2 "immediate_operand")))]
  
  This is good. I did think about this but then I thought of avoiding the need
  for combiner patterns :-) 
  
  But do you find the lane specific mov pattern I proposed, acceptable? 
 
 The specific mul pattern?  As said, consider factoring to vdupi to
 avoid an explosion in required special optabs.
 
   (you use V4SI for the immediate?  
  
  Sorry typo again!! It should've been SI.
  
   Ideally vdupi has another custom
   mode for the vector index).
   
   Note that this factored pattern is already available as vec_perm_const!
   It is simply (vec_perm_const:V4SI source source immediate-selector).
   
   Which means that on the GIMPLE level we should try to combine
   
   el_4 = BIT_FIELD_REF <v_3, ...>;
   v_5 = { el_4, el_4, ... };
  
  I don't think we reach this state at all for the scenarios in discussion.
  what we generally have is:
  
   el_4 = MEM_REF <array + index*size>
   v_5 = { el_4, ... }
  
  Or am I missing something?
 
 Well, but in that case I doubt it is profitable (or even valid!) to
 turn this into a vector lane load from the array.  If it is profitable
 to perform a vector read (because we're going to use the other elements
 of the vector as well) then the vectorizer should produce a vector
 load and materialize the uniform vector from one of its elements.
 
 Maybe at this point you should show us a compilable C testcase
 with a loop that should be vectorized using your instructions in
 the end?

Here's a compilable example:

void 
foo (int *__restrict__ a,
 int *__restrict__ b,
 int *__restrict__ c)
{
  int i;

  for (i = 0; i < 8; i++)
a[i] = b[i] * c[2];
}

This is vectorized by duplicating c[2] now. But I'm trying to take advantage
of target instructions that can take a vector register as second argument but
use only one element (by using the same value for all the lanes) of the 
vector register.

Eg. mul vec-reg, vec-reg, vec-reg[index]
mla vec-reg, vec-reg, vec-reg[index] // multiply and add

But for a loop like the one in the C example given, I will have to load the
c[2] in one element of the vector register (leaving

[Patch] Fix the testcases that use bind_pic_locally

2013-10-08 Thread Vidya Praveen
Hello,

There are several tests that use dg-add-options bind_pic_locally in order to
add -fPIE or -fpie when -fPIC or -fpic are used respectively, with the
expectation that -fPIE/-fpie will override -fPIC/-fpic. But this doesn't
happen, since -fPIE/-fpie will be added before -fPIC/-fpic (whether
-fPIC/-fpic is added as a multilib option or through cflags). This is
essentially because cflags and multilib flags are added after the options
added through dg-options, dg-add-options, et al. in the default_target_compile
function.

Assuming dg-options or dg-add-options should always win, we can fix this by
modifying the order in which they are concatenated at default_target_compile in
target.exp. But this is not recommended since it depends on everyone who tests
upgrading their dejagnu (refer [1]). 

So this patch replaces:

/* { dg-add-options bind_pic_locally } */

with 

/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */

in all the applicable test files. 

NOTE: There are many files that use bind_pic_locally but PASS whether or not
-fPIE/-fpie is passed. I've nevertheless made the replacement in all the
files that use bind_pic_locally.

add_options_for_bind_pic_locally should IMO be removed or deprecated since it
is misleading. I can post a separate patch for this if everyone agrees to it.

References:
[1] http://gcc.gnu.org/ml/gcc/2013-07/msg00281.html
[2] http://gcc.gnu.org/ml/gcc/2013-09/msg00207.html

This issue is, for obvious reasons, common to all targets.

Tested for aarch64-none-elf. OK for trunk?

Cheers
VP

---

gcc/testsuite/ChangeLog:

2013-10-08  Vidya Praveen  vidyaprav...@arm.com

* gcc.dg/inline-33.c: Remove bind_pic_locally and skip if -fPIC/-fpic
is used.
* gcc.dg/ipa/ipa-3.c: Likewise.
* gcc.dg/ipa/ipa-5.c: Likewise.
* gcc.dg/ipa/ipa-7.c: Likewise.
* gcc.dg/ipa/ipcp-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-1.c: Likewise.
* gcc.dg/ipa/ipcp-agg-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/ipa-1.c: Likewise.
* gcc.dg/ipa/ipa-2.c: Likewise.
* gcc.dg/ipa/ipa-4.c: Likewise.
* gcc.dg/ipa/ipa-8.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-1.c: Likewise.
* gcc.dg/ipa/ipcp-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-3.c: Likewise.
* gcc.dg/ipa/ipcp-agg-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-5.c: Likewise.
* gcc.dg/ipa/ipcp-agg-7.c: Likewise.
* gcc.dg/ipa/ipcp-agg-8.c: Likewise.
* gcc.dg/ipa/pr56988.c: Likewise.
* g++.dg/ipa/iinline-1.C: Likewise.
* g++.dg/ipa/iinline-2.C: Likewise.
* g++.dg/ipa/iinline-3.C: Likewise.
* g++.dg/ipa/inline-1.C: Likewise.
* g++.dg/ipa/inline-2.C: Likewise.
* g++.dg/ipa/inline-3.C: Likewise.
* g++.dg/other/first-global.C: Likewise.
* g++.dg/parse/attr-externally-visible-1.C: Likewise.
* g++.dg/torture/pr40323.C: Likewise.
* g++.dg/torture/pr55260-1.C: Likewise.
* g++.dg/torture/pr55260-2.C: Likewise.
* g++.dg/tree-ssa/inline-1.C: Likewise.
* g++.dg/tree-ssa/inline-2.C: Likewise.
* g++.dg/tree-ssa/inline-3.C: Likewise.
* g++.dg/tree-ssa/nothrow-1.C: Likewise.
* gcc.dg/tree-ssa/inline-3.c: Likewise.
* gcc.dg/tree-ssa/inline-4.c: Likewise.
* gcc.dg/tree-ssa/ipa-cp-1.c: Likewise.
* gcc.dg/tree-ssa/local-pure-const.c: Likewise.
* gfortran.dg/whole_file_5.f90: Likewise.
* gfortran.dg/whole_file_6.f90: Likewise.
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-1.C b/gcc/testsuite/g++.dg/ipa/iinline-1.C
index 9f99893..e4daa8c 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-1.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-1.C
@@ -2,7 +2,7 @@
inlining..  */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 extern void non_existent (const char *, int);
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-2.C b/gcc/testsuite/g++.dg/ipa/iinline-2.C
index 670a5dd..64a4dce 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-2.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-2.C
@@ -2,7 +2,7 @@
inlining..  */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining" } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 extern void non_existent (const char *, int);
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-3.C b/gcc/testsuite/g++.dg/ipa/iinline-3.C
index 3daee9a..0d59969 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-3.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-3.C
@@ -2,7 +2,7 @@
parameters which have been modified.  */
 /* { dg-do run } */
 /* { dg-options "-O3 -fno-early-inlining" } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 extern "C" void abort

Re: [Patch] Fix the testcases that use bind_pic_locally

2013-10-08 Thread Vidya Praveen
On Tue, Oct 08, 2013 at 10:30:22AM +0100, Jakub Jelinek wrote:
 On Tue, Oct 08, 2013 at 10:14:59AM +0100, Vidya Praveen wrote:
  There are several tests that use dg-add-options bind_pic_locally in order
  to add -fPIE or -fpie when -fPIC or -fpic are used respectively, with the
  expectation that -fPIE/-fpie will override -fPIC/-fpic. But this doesn't
  happen, since -fPIE/-fpie will be added before the -fPIC/-fpic (whether
  -fPIC/-fpic is added as a multilib option or through cflags). This is
  essentially due to the fact that cflags and multilib flags are added after
  the options added through dg-options, dg-add-options, et al. in the
  default_target_compile function.
  
  Assuming dg-options or dg-add-options should always win, we can fix this by
  modifying the order in which they are concatenated at
  default_target_compile in target.exp. But this is not recommended since it
  depends on everyone who tests upgrading their dejagnu (refer [1]).
 
 This looks like a big step backwards and I'm afraid it can break targets
 where -fpic/-fPIC is the default. 

I agree. I didn't think of this. Since the -fPIC/-fpic comes before the 
-fPIE/-fpie
this will work here. In other words, bind_pic_locally is not broken in this 
case.

(This is assuming the -fPIC/-fpic as default option is passed through 
DRIVER_SELF_SPECS or similar).

 If dg-add-options bind_pic_locally must
 add options to the end of command line, then can't you just push the options
 that must go last to some variable other than dg-extra-tool-flags and as we
 override dejagnu's dg-test, put it in our override last (or in whatever
 other method that already added the multilib options)?

Well, multilib options are added at default_target_compile, which is in
target.exp. If I store the flags in some variable at
add_options_for_bind_pic_locally and add it later, it is still going to be
before default_target_compile is called.

Hope I understood your suggestion right.

Cheers
VP




Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
 On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
 [...]
I can't really insist on the single lane load.. something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI op vt:V4SI

Or is there any other way to do this?
   
   Can you elaborate on "I can't really insist on the single lane load"?
   What's the single lane load in your example? 
  
  Loading just one lane of the vector like this:
  
  vc:V4SI[0] = c // from the above scalar example
  
  or 
  
  vc:V4SI[0] = c[2] 
  
  is what I meant by single lane load. In this example:
  
  t = c[2] 
  ...
  vb:v4si = b[0:3] 
  vc:v4si = { t, t, t, t }
  va:v4si = vb:v4si op vc:v4si 
  
  If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
  insist on 't' being a vector and t = c[2] being vect_t[0] = c[2] (which
  could be seen as vec_select:SI (vect_t 0)).
  
   I'd expect the instruction
   pattern as quoted to just work (and I hope we expand an uniform
   constructor { a, a, a, a } properly using vec_duplicate).
  
  As much as I went through the code, this is only done using vect_init. It is
  not expanded as vec_duplicate from, for example, store_constructor() of 
  expr.c
 
 Do you see any issues if we expand such constructor as vec_duplicate directly 
 instead of going through vect_init way? 

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me if this
is acceptable or if it makes sense:

- Introduce standard pattern names 

vmulim4 - vector multiply with the second operand as an indexed operand

Example:

(define_insn "vmuliv4si4"
   [(set (match_operand:V4SI 0 "register_operand")
 (mul:V4SI (match_operand:V4SI 1 "register_operand")
   (vec_duplicate:V4SI
 (vec_select:SI
   (match_operand:V4SI 2 "register_operand")
   (match_operand:V4SI 3 "immediate_operand")))))]
 ...
)

vlmovmn3 - a move where one of the operands is a specific lane of a vector
 and the other is a scalar. 

Example:

(define_insn "vlmovv4sisi3"
  [(set (vec_select:SI (match_operand:V4SI 0 "register_operand")
  (match_operand:SI 1 "immediate_operand"))
   (match_operand:SI 2 "memory_operand"))]
  ...
)

- Identify the following idiom and expand through the above standard patterns:

  t = c[m] 
  vc[0:n] = { t, t, t, t}
  a[0:n] = b[0:n] * vc[0:n] 

as 

 (insn (set (vec_select:SI (reg:V4SI 0) 0) (mem:SI ... )))
 (insn (set (reg:V4SI 1)
(mult:V4SI (reg:V4SI 2)
   (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 0) 0)))))

If this path is acceptable, then I can extend this to support 

vmaddim4 - multiply and add (with indexed element as multiplier)
vmsubim4 - multiply and subtract (with indexed element as multiplier)

Please let me know your thoughts.

Cheers
VP




Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Wed, Sep 25, 2013 at 10:22:05AM +0100, Richard Biener wrote:
 On Tue, 24 Sep 2013, Vidya Praveen wrote:
 
  On Tue, Sep 10, 2013 at 09:25:32AM +0100, Richard Biener wrote:
   On Mon, 9 Sep 2013, Marc Glisse wrote:
   
On Mon, 9 Sep 2013, Vidya Praveen wrote:

 Hello,
 
 This post details some thoughts on an enhancement to the vectorizer 
 that
 could take advantage of the SIMD instructions that allows indexed 
 element
 as an operand thus reducing the need for duplication and possibly 
 improve
 reuse of previously loaded data.
 
 Appreciate your opinion on this.
 
 ---
 
 A phrase like this:
 
 for(i=0;i<4;i++)
   a[i] = b[i] op c[2];
 
 is usually vectorized as:
 
  va:V4SI = a[0:3]
  vb:V4SI = b[0:3]
  t = c[2]
  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at 
 vec_init
  ...
  va:V4SI = vb:V4SI op vc:V4SI
 
 But this could be simplified further if a target has instructions that
 support
 indexed element as a parameter. For example an instruction like this:
 
  mul v0.4s, v1.4s, v2.4s[2]
 
  can perform multiplication of each element of v1.4s with the third 
 element
 of
 v2.4s (specified as v2.4s[2]) and store the results in the 
 corresponding
 elements of v0.4s.
 
 For this to happen, vectorizer needs to understand this idiom and 
 treat the
 operand c[2] specially (and by taking in to consideration if the 
 machine
 supports indexed element as an operand for op through a target hook 
 or
 macro)
 and consider this as vectorizable statement without having to 
 duplicate the
 elements explicitly.
 
  There are a few ways this could be represented at gimple:
 
  ...
  va:V4SI = vb:V4SI op VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 
 2))
  ...
 
 or by allowing a vectorizer treat an indexed element as a valid 
 operand in a
 vectorizable statement:

Might as well allow any scalar then...
   
   I agree.  The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form
   would necessarily be two extra separate statements and thus subject
   to CSE obfuscating it enough for RTL expansion to no longer notice it.
  
  I also thought about having a specialized expression like
  
VEC_INDEXED_op_EXPR <arg0, arg1, arg2, index>
  
  to mean:
  
  arg0 = arg1 op arg2[index]
  
  and handle it directly in the expander, like (for eg.) how VEC_LSHIFT_EXPR
  is handled in expr.c. But I dropped this idea since we may need to introduce
  many such nodes.
  
   
   That said, allowing mixed scalar/vector ops isn't very nice and
   your scheme can be simplified by just using
   
 vc:V4SI = VEC_DUPLICATE_EXPR ...
 va:V4SI = vb:V4SI op vc:V4SI
   
   where the expander only has to see that vc:V4SI is defined by
   a duplicate.
  
  I did try out something like this quickly before I posted this RFC, though
  I called it VEC_DUP to mean an equivalent of vec_duplicate(vec_select())
  
  for: 
  
for(i=0;i<8;i++)
  a[i] = b[2] * c[i];
  
  I could generate:
  
...
bb 8:
_88 = prolog_loop_adjusted_niters.6_60 * 4;
vectp_c.13_87 = c_10(D) + _88;
vect_ldidx_.16_92 = MEM[(int *)b_8(D) + 8B]; 
vect_idxed_.17_93 = (vect_ldidx_.16_92)  ???  (0); 
_96 = prolog_loop_adjusted_niters.6_60 * 4;
vectp_a.19_95 = a_6(D) + _96;
vect__12.14_115 = MEM[(int *)vectp_c.13_87];
vect_patt_40.15_116 = vect__12.14_115 * vect_idxed_.17_93;   
MEM[(int *)vectp_a.19_95] = vect_patt_40.15_116; 
vectp_c.12_118 = vectp_c.13_87 + 16;
vectp_a.18_119 = vectp_a.19_95 + 16;
ivtmp_120 = 1;
if (ivtmp_120 < bnd.8_62)
  goto bb 9;
else
  goto bb 11;
  
bb 9:
# vectp_c.12_89 = PHI <vectp_c.12_118(8)>
# vectp_a.18_97 = PHI <vectp_a.18_119(8)>
# ivtmp_14 = PHI <ivtmp_120(8)>
vect__12.14_91 = MEM[(int *)vectp_c.12_89];  
vect_patt_40.15_94 = vect__12.14_91 * vect_idxed_.17_93; 
MEM[(int *)vectp_a.18_97] = vect_patt_40.15_94;
...
  
  It's a crude implementation so VEC_DUP is printed as:
  
(vect_ldidx_.16_92)  ???  (0);
  
  
  ...
  va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 2)
  ...

 For the sake of explanation, the above two representations assumes 
 that
 c[0:3] is loaded in vc for some other use and reused here. But when 
 c[2] is
 the
 only use of 'c' then it may be safer to just load one element and use 
 it
 like
 this:

  vc:V4SI[0] = c[2]
  va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
 
 This could also mean that expressions involving scalar could be 
 treated
 similarly. For example,
 
 for(i=0;i<4;i++)
a[i] = b[i] op c
 
 could be vectorized as:
 
  vc:V4SI[0] = c

Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote:
 On Mon, 30 Sep 2013, Vidya Praveen wrote:
 
  On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
   On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
   [...]
  I can't really insist on the single lane load.. something like:
  
  vc:V4SI[0] = c
  vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
  va:V4SI = vb:V4SI op vt:V4SI
  
  Or is there any other way to do this?
 
 Can you elaborate on "I can't really insist on the single lane load"?
 What's the single lane load in your example? 

Loading just one lane of the vector like this:

vc:V4SI[0] = c // from the above scalar example

or 

vc:V4SI[0] = c[2] 

is what I meant by single lane load. In this example:

t = c[2] 
...
vb:v4si = b[0:3] 
vc:v4si = { t, t, t, t }
va:v4si = vb:v4si op vc:v4si 

If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
insist on 't' being a vector and t = c[2] being vect_t[0] = c[2] (which
could be seen as vec_select:SI (vect_t 0)).

 I'd expect the instruction
 pattern as quoted to just work (and I hope we expand an uniform
 constructor { a, a, a, a } properly using vec_duplicate).

As much as I went through the code, this is only done using vect_init. 
It is
not expanded as vec_duplicate from, for example, store_constructor() of 
expr.c
   
   Do you see any issues if we expand such constructor as vec_duplicate 
   directly 
   instead of going through vect_init way? 
  
  Sorry, that was a bad question.
  
  But here's what I would like to propose as a first step. Please tell me if 
  this
  is acceptable or if it makes sense:
  
  - Introduce standard pattern names 
  
  vmulim4 - vector multiply with the second operand as an indexed operand
  
  Example:
  
  (define_insn "vmuliv4si4"
 [(set (match_operand:V4SI 0 "register_operand")
  (mul:V4SI (match_operand:V4SI 1 "register_operand")
    (vec_duplicate:V4SI
  (vec_select:SI
    (match_operand:V4SI 2 "register_operand")
    (match_operand:V4SI 3 "immediate_operand")))))]
   ...
  )
 
 We could factor this with providing a standard pattern name for
 
 (define_insn "vdupi<mode>"
   [(set (match_operand:<mode> 0 "register_operand")
(vec_duplicate:<mode>
   (vec_select:<scalarmode>
  (match_operand:<mode> 1 "register_operand")
  (match_operand:SI 2 "immediate_operand"))))]

This is good. I did think about this but then I thought of avoiding the need
for combiner patterns :-) 

But do you find the lane specific mov pattern I proposed, acceptable? 

 (you use V4SI for the immediate?  

Sorry, typo again! It should've been SI.

 Ideally vdupi has another custom
 mode for the vector index).
 
 Note that this factored pattern is already available as vec_perm_const!
 It is simply (vec_perm_const:V4SI source source immediate-selector).
 
 Which means that on the GIMPLE level we should try to combine
 
 el_4 = BIT_FIELD_REF <v_3, ...>;
 v_5 = { el_4, el_4, ... };

I don't think we reach this state at all for the scenarios under discussion.
What we generally have is:

 el_4 = MEM_REF <array + index*size>
 v_5 = { el_4, ... }

Or am I missing something?

 
 into
 
 v_5 = VEC_PERM_EXPR <v_3, v_3, ...>;
 
 which it should already do with simplify_permutation.
 
 But I'm not sure what you are after at the end ;)
 
 Richard.

 
Regards
VP



[Patch,AArch64] Support SADDL/SSUBL/UADDL/USUBL

2013-09-30 Thread Vidya Praveen
Hello,

This patch adds support to generate SADDL/SSUBL/UADDL/USUBL. Part of the support
is available already (supported for intrinsics). This patch extends this support
to generate these instructions (and lane variations) in all scenarios and adds a
testcase. Tested for aarch64-none-elf, aarch64_be-none-elf with no regressions.

OK for trunk?

Cheers
VP

~~~

gcc/ChangeLog:

2013-09-30  Vidya Praveen  vidyaprav...@arm.com

* aarch64-simd.md 
(aarch64_<ANY_EXTEND:su><ADDSUB:optab>l2<mode>_internal): Rename to ...
(aarch64_<ANY_EXTEND:su><ADDSUB:optab>l<mode>_hi_internal): ... this;
Insert '\t' to output template.
(aarch64_<ANY_EXTEND:su><ADDSUB:optab>l<mode>_lo_internal): New.
(aarch64_saddl2<mode>, aarch64_uaddl2<mode>): Modify to call 
gen_aarch64_<ANY_EXTEND:su><ADDSUB:optab>l<mode>_hi_internal() instead.
(aarch64_ssubl2<mode>, aarch64_usubl2<mode>): Ditto.

gcc/testsuite/ChangeLog:

2013-09-30  Vidya Praveen  vidyaprav...@arm.com

* gcc.target/aarch64/vect_saddl_1.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f13cd5b..a0259b8 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2586,7 +2586,7 @@
 
 ;; <su><addsub>l<q>.
 
-(define_insn "aarch64_<ANY_EXTEND:su><ADDSUB:optab>l2<mode>_internal"
+(define_insn "aarch64_<ANY_EXTEND:su><ADDSUB:optab>l<mode>_hi_internal"
  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
       (ADDSUB:<VWIDE> (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
			   (match_operand:VQW 1 "register_operand" "w")
@@ -2595,11 +2595,26 @@
			   (match_operand:VQW 2 "register_operand" "w")
			   (match_dup 3)))))]
   "TARGET_SIMD"
-  "<ANY_EXTEND:su><ADDSUB:optab>l2 %0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
+  "<ANY_EXTEND:su><ADDSUB:optab>l2\t%0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
   [(set_attr "simd_type" "simd_addl")
    (set_attr "simd_mode" "<MODE>")]
 )
 
+(define_insn "aarch64_<ANY_EXTEND:su><ADDSUB:optab>l<mode>_lo_internal"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+   (ADDSUB:<VWIDE> (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+   (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3)))))]
+  "TARGET_SIMD"
+  "<ANY_EXTEND:su><ADDSUB:optab>l\t%0.<Vwtype>, %1.<Vhalftype>, %2.<Vhalftype>"
+  [(set_attr "simd_type" "simd_addl")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+
 (define_expand "aarch64_saddl2<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VQW 1 "register_operand" "w")
@@ -2607,8 +2622,8 @@
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  emit_insn (gen_aarch64_saddl2<mode>_internal (operands[0], operands[1],
-		operands[2], p));
+  emit_insn (gen_aarch64_saddl<mode>_hi_internal (operands[0], operands[1],
+  operands[2], p));
   DONE;
 })
 
@@ -2619,8 +2634,8 @@
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  emit_insn (gen_aarch64_uaddl2<mode>_internal (operands[0], operands[1],
-		operands[2], p));
+  emit_insn (gen_aarch64_uaddl<mode>_hi_internal (operands[0], operands[1],
+  operands[2], p));
   DONE;
 })
 
@@ -2631,7 +2646,7 @@
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  emit_insn (gen_aarch64_ssubl2<mode>_internal (operands[0], operands[1],
+  emit_insn (gen_aarch64_ssubl<mode>_hi_internal (operands[0], operands[1],
 		operands[2], p));
   DONE;
 })
@@ -2643,7 +2658,7 @@
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  emit_insn (gen_aarch64_usubl2<mode>_internal (operands[0], operands[1],
+  emit_insn (gen_aarch64_usubl<mode>_hi_internal (operands[0], operands[1],
 		operands[2], p));
   DONE;
 })
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_saddl_1.c b/gcc/testsuite/gcc.target/aarch64/vect_saddl_1.c
new file mode 100644
index 000..ecbd8a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_saddl_1.c
@@ -0,0 +1,315 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-inline -save-temps -fno-vect-cost-model" } */
+
+typedef signed char S8_t;
+typedef signed short S16_t;
+typedef signed int S32_t;
+typedef signed long long S64_t;
+
+typedef signed char *__restrict__ pS8_t;
+typedef signed short *__restrict__ pS16_t;
+typedef signed int *__restrict__ pS32_t;
+typedef signed long long *__restrict__ pS64_t;
+
+typedef unsigned char U8_t;
+typedef unsigned short U16_t;
+typedef unsigned int U32_t;
+typedef unsigned long long U64_t;
+
+typedef unsigned char *__restrict__ pU8_t;
+typedef unsigned short *__restrict__ pU16_t;
+typedef unsigned int *__restrict__ pU32_t;
+typedef unsigned long long *__restrict__ pU64_t;
+
+extern void abort ();
+
+void
+test_addl_S64_S32_4 (pS64_t a, pS32_t b, pS32_t c)
+{
+  int i;
+  for (i = 0; i < 4; i

Re: [RFC] Vectorization of indexed elements

2013-09-27 Thread Vidya Praveen
On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
[...]
   I can't really insist on the single lane load.. something like:
   
   vc:V4SI[0] = c
   vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
   va:V4SI = vb:V4SI op vt:V4SI
   
   Or is there any other way to do this?
  
  Can you elaborate on "I can't really insist on the single lane load"?
  What's the single lane load in your example? 
 
 Loading just one lane of the vector like this:
 
 vc:V4SI[0] = c // from the above scalar example
 
 or 
 
 vc:V4SI[0] = c[2] 
 
 is what I meant by single lane load. In this example:
 
 t = c[2] 
 ...
 vb:v4si = b[0:3] 
 vc:v4si = { t, t, t, t }
 va:v4si = vb:v4si op vc:v4si 
 
 If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
 insist on 't' being a vector and t = c[2] being vect_t[0] = c[2] (which
 could be seen as vec_select:SI (vect_t 0)).
 
  I'd expect the instruction
  pattern as quoted to just work (and I hope we expand an uniform
  constructor { a, a, a, a } properly using vec_duplicate).
 
 As much as I went through the code, this is only done using vect_init. It is
 not expanded as vec_duplicate from, for example, store_constructor() of expr.c

Do you see any issues if we expand such constructor as vec_duplicate directly 
instead of going through vect_init way? 

VP




Re: dejagnu multilib options and dg-{add|additional-}options

2013-09-24 Thread Vidya Praveen
On Tue, Aug 27, 2013 at 04:34:09PM +0100, Janis Johnson wrote:
 On 08/27/2013 06:52 AM, Marcus Shawcroft wrote:
  On 23 July 2013 17:40, Janis Johnson janis_john...@mentor.com wrote:
  On 07/22/2013 02:59 AM, Vidya Praveen wrote:
  Hello,
 
  There are 42 test files (25 under gcc.dg) that specifies
 
  { dg-add-options bind_pic_locally }
 
  in the regression testsuite. The procedure 
  add_options_for_bind_pic_locally
  from lib/target-supports.exp adds -fPIE or -fpie when -fPIC or -fpic is 
  passed
  respectively. But this is added before the dejagnu multilib options are 
  added.
  So when -fPIC is passed as a multilib option, -fPIE will be unset by -fPIC
  (see gcc/common.opt). This should have been the behaviour since the patch
  http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01026.html that brings all 
  -fPIC
   -fPIE variants in a Negative loop, was applied.
 
  I tried fixing this in dejagnu/target.exp by adding the multilib options 
  before
  the other options:
 
  default_target_compile:
 
 append add_flags " [board_info $dest multilib_flags]"
  ---
   set add_flags "[board_info $dest multilib_flags] $add_flags"
 
  and ran regressions for x86_64-unknown-linux-gnu before and after the 
  change.
  The only difference in the results after the change was 24 new PASSes 
  which
  are from the testcases which either use bind_pic_locally or that use 
  -fno-pic.
 
  (Interestingly, there are many test files that bind_pic_locally pass 
  without
  any issue before and after the change.)
 
  I tend to think that the options added from the test files should always 
  win.
  Please correct me if I'm wrong. If I'm right, is dejagnu/target.exp is the
  best place to fix this and the way it tried to fix? Any better 
  suggestions?
 
  Though this case is to do with -fPIC, I'm sure there are other options 
  which
  when they come as multilib options might have same issue with the some of 
  the
  options added by the test files or the default options.
 
  Regards
  VP
 
  Ideally we would ask for that change in DejaGnu, but relying on such a
  change would require everyone testing GCC to upgrade to a version of
  DejaGnu with that fix, and I don't think we're ready to do that.
 
  Tests that add options that might override or conflict with multilib
  flags should check for those flags first and skip the test if they are
  used.  For examples, see tests in gcc.target/arm that use dg-skip-if.
  
  Umm, the purpose of bind_pic_locally appears to be to detect the use
  of -fPIC and override that behavior with -fPIE.  If I understand the
  above paragraph correctly then bind_pic_locally is fundamentally
  broken and all of the tests that use it need rewriting to skip if
  -fPIC or -fpic is in use?
  
  Cheers
  /Marcus
  
 
 That is correct.  There should probably be an effective-target check
 bind_pic_locally_ok that fails if adding -fpie or -fPIE doesn't have the
 expected result, and the tests that use dg-add-options bind_pic_locally
 should be skipped if bind_pic_locally_ok fails.
 
 Janis


Janis, whether we pass -fPIC/-fpic through multilib_flags or through cflags,
bind_pic_locally remains broken. So I am wondering if it's really necessary
to go through the bind_pic_locally_ok route. Instead, could we just replace
all the uses of bind_pic_locally with dg-skip-if and perhaps remove the
definition of bind_pic_locally to avoid future use of it?

Cheers
VP



Re: [RFC] Vectorization of indexed elements

2013-09-24 Thread Vidya Praveen
On Tue, Sep 10, 2013 at 09:25:32AM +0100, Richard Biener wrote:
 On Mon, 9 Sep 2013, Marc Glisse wrote:
 
  On Mon, 9 Sep 2013, Vidya Praveen wrote:
  
   Hello,
   
   This post details some thoughts on an enhancement to the vectorizer that
   could take advantage of the SIMD instructions that allows indexed element
   as an operand thus reducing the need for duplication and possibly improve
   reuse of previously loaded data.
   
   Appreciate your opinion on this.
   
   ---
   
   A phrase like this:
   
    for(i=0;i<4;i++)
 a[i] = b[i] op c[2];
   
   is usually vectorized as:
   
va:V4SI = a[0:3]
vb:V4SI = b[0:3]
t = c[2]
vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at 
   vec_init
...
va:V4SI = vb:V4SI op vc:V4SI
   
   But this could be simplified further if a target has instructions that
   support
   indexed element as a parameter. For example an instruction like this:
   
mul v0.4s, v1.4s, v2.4s[2]
   
    can perform multiplication of each element of v1.4s with the third element
   of
   v2.4s (specified as v2.4s[2]) and store the results in the corresponding
   elements of v0.4s.
   
   For this to happen, vectorizer needs to understand this idiom and treat 
   the
   operand c[2] specially (and by taking in to consideration if the machine
   supports indexed element as an operand for op through a target hook or
   macro)
   and consider this as vectorizable statement without having to duplicate 
   the
   elements explicitly.
   
    There are a few ways this could be represented at gimple:
   
...
va:V4SI = vb:V4SI op VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
...
   
   or by allowing a vectorizer treat an indexed element as a valid operand 
   in a
   vectorizable statement:
  
  Might as well allow any scalar then...
 
 I agree.  The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form
 would necessarily be two extra separate statements and thus subject
 to CSE obfuscating it enough for RTL expansion to no longer notice it.

I also thought about having a specialized expression like

VEC_INDEXED_op_EXPR <arg0, arg1, arg2, index>

to mean:

arg0 = arg1 op arg2[index]

and handle it directly in the expander, like (for eg.) how VEC_LSHIFT_EXPR
is handled in expr.c. But I dropped this idea since we may need to introduce
many such nodes.

 
 That said, allowing mixed scalar/vector ops isn't very nice and
 your scheme can be simplified by just using
 
   vc:V4SI = VEC_DUPLICATE_EXPR ...
   va:V4SI = vb:V4SI op vc:V4SI
 
 where the expander only has to see that vc:V4SI is defined by
 a duplicate.

I did try out something like this quickly before I posted this RFC, though
I called it VEC_DUP to mean an equivalent of vec_duplicate(vec_select())

for: 

  for(i=0;i<8;i++)
a[i] = b[2] * c[i];

I could generate:

  ...
  bb 8:
  _88 = prolog_loop_adjusted_niters.6_60 * 4;
  vectp_c.13_87 = c_10(D) + _88;
  vect_ldidx_.16_92 = MEM[(int *)b_8(D) + 8B]; 
  vect_idxed_.17_93 = (vect_ldidx_.16_92)  ???  (0); 
  _96 = prolog_loop_adjusted_niters.6_60 * 4;
  vectp_a.19_95 = a_6(D) + _96;
  vect__12.14_115 = MEM[(int *)vectp_c.13_87];
  vect_patt_40.15_116 = vect__12.14_115 * vect_idxed_.17_93;   
  MEM[(int *)vectp_a.19_95] = vect_patt_40.15_116; 
  vectp_c.12_118 = vectp_c.13_87 + 16;
  vectp_a.18_119 = vectp_a.19_95 + 16;
  ivtmp_120 = 1;
  if (ivtmp_120 < bnd.8_62)
goto bb 9;
  else
goto bb 11;

  bb 9:
  # vectp_c.12_89 = PHI <vectp_c.12_118(8)>
  # vectp_a.18_97 = PHI <vectp_a.18_119(8)>
  # ivtmp_14 = PHI <ivtmp_120(8)>
  vect__12.14_91 = MEM[(int *)vectp_c.12_89];  
  vect_patt_40.15_94 = vect__12.14_91 * vect_idxed_.17_93; 
  MEM[(int *)vectp_a.18_97] = vect_patt_40.15_94;
  ...

It's a crude implementation so VEC_DUP is printed as:

  (vect_ldidx_.16_92)  ???  (0);


...
va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 2)
...
  
   For the sake of explanation, the above two representations assumes that
   c[0:3] is loaded in vc for some other use and reused here. But when c[2] 
   is
   the
   only use of 'c' then it may be safer to just load one element and use it
   like
   this:
  
vc:V4SI[0] = c[2]
va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
   
   This could also mean that expressions involving scalar could be treated
   similarly. For example,
   
 for(i=0;i<4;i++)
  a[i] = b[i] op c
   
   could be vectorized as:
   
vc:V4SI[0] = c
va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
   
   Such a change would also require new standard pattern names to be defined
   for
   each op.
   
   Alternatively, having something like this:
   
...
vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
va:V4SI = vb:V4SI op vt:V4SI
...
   
   would remove the need to introduce several new standard pattern names but
   have
   just one to represent vec_duplicate(vec_select()) but ofcourse this will
   expect

Re: [RFC] Vectorization of indexed elements

2013-09-24 Thread Vidya Praveen
On Mon, Sep 09, 2013 at 07:02:52PM +0100, Marc Glisse wrote:
 On Mon, 9 Sep 2013, Vidya Praveen wrote:
 
  Hello,
 
  This post details some thoughts on an enhancement to the vectorizer that
  could take advantage of the SIMD instructions that allows indexed element
  as an operand thus reducing the need for duplication and possibly improve
  reuse of previously loaded data.
 
  Appreciate your opinion on this.
 
  ---
 
  A phrase like this:
 
  for(i=0;i<4;i++)
a[i] = b[i] op c[2];
 
  is usually vectorized as:
 
   va:V4SI = a[0:3]
   vb:V4SI = b[0:3]
   t = c[2]
   vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
   ...
   va:V4SI = vb:V4SI op vc:V4SI
 
  But this could be simplified further if a target has instructions that 
  support
  indexed element as a parameter. For example an instruction like this:
 
   mul v0.4s, v1.4s, v2.4s[2]
 
   can perform multiplication of each element of v1.4s with the third element 
  of
  v2.4s (specified as v2.4s[2]) and store the results in the corresponding
  elements of v0.4s.
 
  For this to happen, vectorizer needs to understand this idiom and treat the
  operand c[2] specially (and by taking in to consideration if the machine
  supports indexed element as an operand for op through a target hook or 
  macro)
  and consider this as vectorizable statement without having to duplicate the
  elements explicitly.
 
   There are a few ways this could be represented at gimple:
 
   ...
   va:V4SI = vb:V4SI op VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
   ...
 
  or by allowing a vectorizer treat an indexed element as a valid operand in a
  vectorizable statement:
 
 Might as well allow any scalar then...

Yes, I had given an example below.

 
   ...
   va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 2)
   ...
 
  For the sake of explanation, the above two representations assumes that
  c[0:3] is loaded in vc for some other use and reused here. But when c[2] is 
  the
  only use of 'c' then it may be safer to just load one element and use it 
  like
  this:
 
   vc:V4SI[0] = c[2]
   va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
 
  This could also mean that expressions involving scalar could be treated
  similarly. For example,
 
   for(i=0;i<4;i++)
 a[i] = b[i] op c
 
  could be vectorized as:
 
   vc:V4SI[0] = c
   va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
 
  Such a change would also require new standard pattern names to be defined 
  for
  each op.
 
  Alternatively, having something like this:
 
   ...
   vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
   va:V4SI = vb:V4SI op vt:V4SI
   ...
 
  would remove the need to introduce several new standard pattern names but 
  have
  just one to represent vec_duplicate(vec_select()) but ofcourse this will 
  expect
  the target to have combiner patterns.
 
 The cost estimation wouldn't be very good, but aren't combine patterns 
 enough for the whole thing? Don't you model your mul instruction as:
 
 (mult:V4SI
(match_operand:V4SI)
(vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI
 
 anyway? Seems that combine should be able to handle it. What currently 
 happens that we fail to generate the right instruction?

At vec_init, I can recognize an idiom in order to generate vec_duplicate but
I can't really insist on the single-lane load. Something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI op vt:V4SI

Or is there any other way to do this?

Cheers
VP

 
 In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR
 for vec_duplicate; adding new nodes is always painful.
 
  This enhancement could possibly help in further optimizing larger scenarios
  such as linear systems.
 
  Regards
  VP
 
 -- 
 Marc Glisse





[RFC] Vectorization of indexed elements

2013-09-09 Thread Vidya Praveen
Hello,

This post details some thoughts on an enhancement to the vectorizer that
could take advantage of the SIMD instructions that allow an indexed element
as an operand, thus reducing the need for duplication and possibly improving
reuse of previously loaded data.

Appreciate your opinion on this. 

--- 

A phrase like this:

 for(i=0;i<4;i++)
   a[i] = b[i] op c[2];

is usually vectorized as:

  va:V4SI = a[0:3]
  vb:V4SI = b[0:3]
  t = c[2]
  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
  ...
  va:V4SI = vb:V4SI op vc:V4SI

But this could be simplified further if a target has instructions that support
an indexed element as an operand. For example, an instruction like this:

  mul v0.4s, v1.4s, v2.4s[2]

can perform multiplication of each element of v1.4s with the third element of
v2.4s (specified as v2.4s[2]) and store the results in the corresponding
elements of v0.4s.

For this to happen, the vectorizer needs to understand this idiom and treat
the operand c[2] specially (taking into consideration whether the machine
supports an indexed element as an operand for op, through a target hook or
macro) and consider this a vectorizable statement without having to duplicate
the elements explicitly.

There are a few ways this could be represented at gimple:

  ...
  va:V4SI = vb:V4SI op VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
  ...

or by allowing the vectorizer to treat an indexed element as a valid operand
in a vectorizable statement:

  ...
  va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 2)
  ...

For the sake of explanation, the above two representations assume that
c[0:3] is loaded in vc for some other use and reused here. But when c[2] is
the only use of 'c' then it may be safer to just load one element and use it
like this:

  vc:V4SI[0] = c[2]
  va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)

This could also mean that expressions involving a scalar could be treated
similarly. For example,

  for(i=0;i<4;i++)
a[i] = b[i] op c

could be vectorized as:

  vc:V4SI[0] = c
  va:V4SI = vb:V4SI op VEC_SELECT_EXPR (vc:V4SI 0)
  
Such a change would also require new standard pattern names to be defined
for each op.

Alternatively, having something like this:

  ...
  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
  va:V4SI = vb:V4SI op vt:V4SI
  ...

would remove the need to introduce several new standard pattern names but
have just one to represent vec_duplicate(vec_select()), but of course this
will expect the target to have combiner patterns.

This enhancement could possibly help in further optimizing larger scenarios
such as linear systems.

Regards
VP





[Patch,AArch64] Support SISD Shifts (SHL/USHR/SSHL/USHL/SSHR)

2013-08-20 Thread Vidya Praveen
Hello,

This patch supports the SISD shift instructions SHL/USHR/SSHR/SSHL/USHL for
SImode and DImode. This patch also refactors the integer shifts pattern
<optab><mode>3_insn. The pattern for rotate is moved out as ror<mode>3_insn.

Shift patterns (aarch64_{lshr|ashl|ashr}_sisd_or_int_{si|di}3) support
both SIMD registers and general purpose registers, with the shift quantity
either as a variable or a literal. Since there are no SISD instructions for
right shifts, the instructions SSHL and USHL are used with the shift operand
negated using NEG in order to reverse the direction. This is done by
insisting on splitting (after reload) into a neg and an UNSPEC_SISD_USHL,
UNSPEC_SISD_SSHL, UNSPEC_USHL_2S or UNSPEC_SSHL_2S pattern. Since there
are no SISD variants of the shift instructions available for SImode, the SIMD
variants of the corresponding instructions are used with 2S size, taking
one lane alone into consideration and ignoring the other.

This patch also introduces a predicate aarch64_simd_register to help in
splitting patterns. Tests for both newly introduced instructions as well
as for the integer instructions are included.

Tested and no new regressions.

OK for trunk?

Regards
VP

---

gcc/ChangeLog

2013-08-20  Vidya Praveen  vidyaprav...@arm.com

* config/aarch64/aarch64.md (unspec): Add UNSPEC_SISD_SSHL,
UNSPEC_SISD_USHL, UNSPEC_USHL_2S, UNSPEC_SSHL_2S, UNSPEC_SISD_NEG.
(<optab><mode>3_insn): Remove.
(aarch64_ashl_sisd_or_int_<mode>3): New Pattern.
(aarch64_lshr_sisd_or_int_<mode>3): Likewise.
(aarch64_ashr_sisd_or_int_<mode>3): Likewise.
(define_split for aarch64_lshr_sisd_or_int_di3): Likewise.
(define_split for aarch64_lshr_sisd_or_int_si3): Likewise.
(define_split for aarch64_ashr_sisd_or_int_di3): Likewise.
(define_split for aarch64_ashr_sisd_or_int_si3): Likewise.
(aarch64_sisd_ushl, aarch64_sisd_sshl): Likewise.
(aarch64_ushl_2s, aarch64_sshl_2s, aarch64_sisd_neg_qi): Likewise.
(ror<mode>3_insn): Likewise.
* config/aarch64/predicates.md (aarch64_simd_register): New.

gcc/testsuite/ChangeLog

2013-08-20  Vidya Praveen  vidyaprav...@arm.com

* gcc.target/aarch64/scalar_shift_1.c: New.





Re: [Patch,AArch64] Support SISD Shifts (SHL/USHR/SSHL/USHL/SSHR)

2013-08-20 Thread Vidya Praveen
With the attachment this time :-)

Regards
VP

On Tue, Aug 20, 2013 at 04:01:59PM +0100, Vidya Praveen wrote:
 Hello,
 
 This patch supports the SISD shift instructions SHL/USHR/SSHR/SSHL/USHL for
 SImode and DImode. This patch also refactors the integer shifts pattern
 <optab><mode>3_insn. The pattern for rotate is moved out as ror<mode>3_insn.
 
 Shift patterns (aarch64_{lshr|ashl|ashr}_sisd_or_int_{si|di}3) support
 both SIMD registers and general purpose registers, with the shift quantity
 either as a variable or a literal. Since there are no SISD instructions for
 right shifts, the instructions SSHL and USHL are used with the shift operand
 negated using NEG in order to reverse the direction. This is done by
 insisting on splitting (after reload) into a neg and an UNSPEC_SISD_USHL,
 UNSPEC_SISD_SSHL, UNSPEC_USHL_2S or UNSPEC_SSHL_2S pattern. Since there
 are no SISD variants of the shift instructions available for SImode, the SIMD
 variants of the corresponding instructions are used with 2S size, taking
 one lane alone into consideration and ignoring the other.
 
 This patch also introduces a predicate aarch64_simd_register to help in
 splitting patterns. Tests for both newly introduced instructions as well
 as for the integer instructions are included.
 
 Tested and no new regressions.
 
 OK for trunk?
 
 Regards
 VP
 
 ---
 
 gcc/ChangeLog
 
 2013-08-20  Vidya Praveen  vidyaprav...@arm.com
 
 * config/aarch64/aarch64.md (unspec): Add UNSPEC_SISD_SSHL,
 UNSPEC_SISD_USHL, UNSPEC_USHL_2S, UNSPEC_SSHL_2S, UNSPEC_SISD_NEG.
 (<optab><mode>3_insn): Remove.
 (aarch64_ashl_sisd_or_int_<mode>3): New Pattern.
 (aarch64_lshr_sisd_or_int_<mode>3): Likewise.
 (aarch64_ashr_sisd_or_int_<mode>3): Likewise.
 (define_split for aarch64_lshr_sisd_or_int_di3): Likewise.
 (define_split for aarch64_lshr_sisd_or_int_si3): Likewise.
 (define_split for aarch64_ashr_sisd_or_int_di3): Likewise.
 (define_split for aarch64_ashr_sisd_or_int_si3): Likewise.
 (aarch64_sisd_ushl, aarch64_sisd_sshl): Likewise.
 (aarch64_ushl_2s, aarch64_sshl_2s, aarch64_sisd_neg_qi): Likewise.
 (ror<mode>3_insn): Likewise.
 * config/aarch64/predicates.md (aarch64_simd_register): New.
 
 gcc/testsuite/ChangeLog
 
 2013-08-20  Vidya Praveen  vidyaprav...@arm.com
 
 * gcc.target/aarch64/scalar_shift_1.c: New.
 
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5312a79..07349c6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -88,11 +88,16 @@
 UNSPEC_NOP
 UNSPEC_PRLG_STK
 UNSPEC_RBIT
+UNSPEC_SISD_NEG
+UNSPEC_SISD_SSHL
+UNSPEC_SISD_USHL
+UNSPEC_SSHL_2S
 UNSPEC_ST2
 UNSPEC_ST3
 UNSPEC_ST4
 UNSPEC_TLS
 UNSPEC_TLSDESC
+UNSPEC_USHL_2S
 UNSPEC_VSTRUCTDUMMY
 ])
 
@@ -3183,13 +3188,182 @@
   }
 )
 
-(define_insn "*<optab><mode>3_insn"
+;; Logical left shift using SISD or Integer instruction
+(define_insn "*aarch64_ashl_sisd_or_int_<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
+(ashift:GPI
+  (match_operand:GPI 1 "register_operand" "w,w,r")
+  (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))]
+  ""
+  "@
+   shl\t%<rtn>0<vas>, %<rtn>1<vas>, %2
+   ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas>
+   lsl\t%w0, %w1, %w2"
+  [(set_attr "simd" "yes,yes,no")
+   (set_attr "simd_type" "simd_shift_imm,simd_shift,*")
+   (set_attr "simd_mode" "<MODE>,<MODE>,*")
+   (set_attr "v8type" "*,*,shift")
+   (set_attr "type" "*,*,shift")
+   (set_attr "mode" "*,*,<MODE>")]
+)
+
+;; Logical right shift using SISD or Integer instruction
+(define_insn "*aarch64_lshr_sisd_or_int_<mode>3"
+  [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
+(lshiftrt:GPI
+  (match_operand:GPI 1 "register_operand" "w,w,r")
+  (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))]
+  ""
+  "@
+   ushr\t%<rtn>0<vas>, %<rtn>1<vas>, %2
+   #
+   lsr\t%w0, %w1, %w2"
+  [(set_attr "simd" "yes,yes,no")
+   (set_attr "simd_type" "simd_shift_imm,simd_shift,*")
+   (set_attr "simd_mode" "<MODE>,<MODE>,*")
+   (set_attr "v8type" "*,*,shift")
+   (set_attr "type" "*,*,shift")
+   (set_attr "mode" "*,*,<MODE>")]
+)
+
+(define_split
+  [(set (match_operand:DI 0 "aarch64_simd_register")
+(lshiftrt:DI
+   (match_operand:DI 1 "aarch64_simd_register")
+   (match_operand:QI 2 "aarch64_simd_register")))]
+  "TARGET_SIMD && reload_completed"
+  [(set (match_dup 2)
+(unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
+   (set (match_dup 0)
+(unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
+  ""
+)
+
+(define_split
+  [(set (match_operand:SI 0 "aarch64_simd_register")
+(lshiftrt:SI
+   (match_operand:SI 1 "aarch64_simd_register")
+   (match_operand:QI 2 "aarch64_simd_register")))]
+  "TARGET_SIMD && reload_completed"
+  [(set (match_dup 2)
+(unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
+   (set (match_dup 0)
+(unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S

[Patch] Fix selector for vect-iv-5.c

2013-07-23 Thread Vidya Praveen
Hello

gcc.dg/vect/vect-iv-5.c XPASSes for arm-*-* since gcc.dg/vect/*.c tests are
always run with -ffast-math for arm-*-*. This patch makes the xfail conditional
for this test by adding the effective target keyword !arm_neon_ok.

OK for trunk?

Regards
VP

--

gcc/testsuite/ChangeLog:

2013-07-22  Vidya Praveen  vidyaprav...@arm.com

* gcc.dg/vect/vect-iv-5.c: Make xfail conditional with !arm_neon_ok.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
index 1766ae6..8861095 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
@@ -36,5 +36,5 @@ int main (void)
   return main1 ();
 } 
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! arm_neon_ok } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */

dejagnu multilib options and dg-{add|additional-}options

2013-07-22 Thread Vidya Praveen
Hello,

There are 42 test files (25 under gcc.dg) that specify

{ dg-add-options bind_pic_locally }

in the regression testsuite. The procedure add_options_for_bind_pic_locally
from lib/target-supports.exp adds -fPIE or -fpie when -fPIC or -fpic is passed,
respectively. But this is added before the dejagnu multilib options are added.
So when -fPIC is passed as a multilib option, -fPIE will be unset by -fPIC
(see gcc/common.opt). This has been the behaviour since the patch
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01026.html, which brings all
-fPIC and -fPIE variants into a Negative loop, was applied.

I tried fixing this in dejagnu/target.exp by adding the multilib options before
the other options:

default_target_compile:

   append add_flags " [board_info $dest multilib_flags]"
---
   set add_flags "[board_info $dest multilib_flags] $add_flags"

and ran regressions for x86_64-unknown-linux-gnu before and after the change.
The only difference in the results after the change was 24 new PASSes which
are from the testcases which either use bind_pic_locally or that use -fno-pic.

(Interestingly, there are many test files that use bind_pic_locally and pass
without any issue before and after the change.)

I tend to think that the options added from the test files should always win.
Please correct me if I'm wrong. If I'm right, is dejagnu/target.exp the best
place to fix this, and is this the right way to fix it? Any better suggestions?

Though this case is to do with -fPIC, I'm sure there are other options which,
when they come as multilib options, might have the same issue with some of the
options added by the test files or the default options.

Regards
VP




[AArch64] Support for SMLAL/SMLSL/UMLAL/UMLSL

2013-06-14 Thread Vidya Praveen

Hello,

This patch adds support for the SMLAL/SMLSL/UMLAL/UMLSL instructions and adds tests
for the same. Regression test run for aarch64-none-elf with no regressions.

OK?

~VP

---

gcc/ChangeLog

2013-06-14  Vidya Praveen vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (*aarch64_<su>mlal_lo<mode>):
  New pattern to support SMLAL,UMLAL instructions.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlal_hi<mode>):
  New pattern to support SMLAL2,UMLAL2 instructions.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlsl_lo<mode>):
  New pattern to support SMLSL,UMLSL instructions.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlsl_hi<mode>):
  New pattern to support SMLSL2,UMLSL2 instructions.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlal<mode>): New pattern
  to support SMLAL/UMLAL instructions for 64 bit vector modes.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlsl<mode>): New pattern
  to support SMLSL/UMLSL instructions for 64 bit vector modes.

gcc/testsuite/ChangeLog

2013-06-14  Vidya Praveen vidyaprav...@arm.com

* gcc.target/aarch64/vect_smlal_1.c: New file.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e5990d4..8589476 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1190,6 +1190,104 @@
 
 ;; Widening arithmetic.
 
+(define_insn "*aarch64_<su>mlal_lo<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(plus:VWIDE
+  (mult:VWIDE
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 2 "register_operand" "w")
+ (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 4 "register_operand" "w")
+ (match_dup 3))))
+  (match_operand:VWIDE 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "<su>mlal\t%0.<Vwtype>, %2.<Vhalftype>, %4.<Vhalftype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+(define_insn "*aarch64_<su>mlal_hi<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(plus:VWIDE
+  (mult:VWIDE
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 2 "register_operand" "w")
+ (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 4 "register_operand" "w")
+ (match_dup 3))))
+  (match_operand:VWIDE 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "<su>mlal2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vtype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+(define_insn "*aarch64_<su>mlsl_lo<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(minus:VWIDE
+  (match_operand:VWIDE 1 "register_operand" "0")
+  (mult:VWIDE
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 2 "register_operand" "w")
+ (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 4 "register_operand" "w")
+ (match_dup 3))))))]
+  "TARGET_SIMD"
+  "<su>mlsl\t%0.<Vwtype>, %2.<Vhalftype>, %4.<Vhalftype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+(define_insn "*aarch64_<su>mlsl_hi<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(minus:VWIDE
+  (match_operand:VWIDE 1 "register_operand" "0")
+  (mult:VWIDE
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 2 "register_operand" "w")
+ (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+  (ANY_EXTEND:VWIDE (vec_select:<VHALF>
+ (match_operand:VQW 4 "register_operand" "w")
+ (match_dup 3))))))]
+  "TARGET_SIMD"
+  "<su>mlsl2\t%0.<Vwtype>, %2.<Vtype>, %4.<Vtype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+(define_insn "*aarch64_<su>mlal<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(plus:VWIDE
+  (mult:VWIDE
+(ANY_EXTEND:VWIDE
+  (match_operand:VDW 1 "register_operand" "w"))
+(ANY_EXTEND:VWIDE
+  (match_operand:VDW 2 "register_operand" "w")))
+  (match_operand:VWIDE 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "<su>mlal\t%0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
+(define_insn "*aarch64_<su>mlsl<mode>"
+  [(set (match_operand:VWIDE 0 "register_operand" "=w")
+(minus:VWIDE
+  (match_operand:VWIDE 1 "register_operand" "0")
+  (mult:VWIDE
+(ANY_EXTEND:VWIDE
+  (match_operand:VDW 2 "register_operand" "w"))
+(ANY_EXTEND:VWIDE
+  (match_operand:VDW 3 "register_operand" "w")))))]
+  "TARGET_SIMD"
+  "<su>mlsl\t%0.<Vwtype>, %2.<Vtype>, %3.<Vtype>"
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "<MODE>")]
+)
+
 (define_insn

Re: [AArch64] Support for SMLAL/SMLSL/UMLAL/UMLSL

2013-06-14 Thread Vidya Praveen

On 14/06/13 16:01, Richard Earnshaw wrote:

On 14/06/13 15:33, Marcus Shawcroft wrote:

On 14/06/13 14:55, Vidya Praveen wrote:

[...]

  to support SMLAL/UMLAL instructions for 64 bit vector modes.
* config/aarch64/aarch64-simd.md (*aarch64_<su>mlsl<mode>): New pattern
  to support SMLSL/UMLSL instructions for 64 bit vector modes.



Convention is that we say what changed in the ChangeLog entry and write
the justification in the covering email summary.  Therefore in instances
like this where you are defining a new pattern it is sufficient to write
simply:

* config/aarch64/aarch64-simd.md (*aarch64_<su>mlal_lo<mode>): Define.



I tend to prefer "New pattern." over "Define." on the grounds that it tells me
that this is a pattern, not a constraint or some other construct.

Also, there's no need to repeat the file name each time, or put the leading '*'
on the pattern name.  You can also list more than one function at the same time
if it has the same description, and use 'Likewise' when this extends to
multiple lines.

Finally, don't over-indent continuation lines.

So:

2013-06-14  Vidya Praveen vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (aarch64_<su>mlal_lo<mode>):
New pattern.
(aarch64_<su>mlal_hi<mode>, aarch64_<su>mlsl_lo<mode>): Likewise.
(aarch64_<su>mlsl_hi<mode>, aarch64_<su>mlal<mode>): Likewise.

etc.


Thanks Marcus/Richard for the recommendations. After changes:

gcc/ChangeLog

2013-06-14  Vidya Praveen vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (aarch64_<su>mlal_lo<mode>):
New pattern.
(aarch64_<su>mlal_hi<mode>, aarch64_<su>mlsl_lo<mode>): Likewise.
(aarch64_<su>mlsl_hi<mode>, aarch64_<su>mlal<mode>): Likewise.
(aarch64_<su>mlsl<mode>): Likewise.

gcc/testsuite/ChangeLog

2013-06-14  Vidya Praveen vidyaprav...@arm.com

* gcc.target/aarch64/vect_smlal_1.c: New file.

~VP



Added myself to MAINTAINERS (Write After Approval)

2013-06-14 Thread Vidya Praveen


2013-06-14  Vidya Praveen  vidyaprav...@arm.com

* MAINTAINERS (Write After Approval): Add myself.


Index: MAINTAINERS
===
--- MAINTAINERS (revision 200091)
+++ MAINTAINERS (working copy)
@@ -487,6 +487,7 @@
 Paul Pluzhnikovppluzhni...@google.com
 Marek Polacek  pola...@redhat.com
 Antoniu Popantoniu@gmail.com
+Vidya Praveen  vidyaprav...@arm.com
 Vladimir Prus  vladi...@codesourcery.com
 Yao Qi y...@codesourcery.com
 Jerry Quinnjlqu...@optonline.net



Re: [AArch64] Support for CLZ

2013-05-23 Thread Vidya Praveen

On 23/05/13 14:40, Marcus Shawcroft wrote:

On 22 May 2013 12:47, Vidya Praveen vidyaprav...@arm.com wrote:

Hello,

This patch adds support for the AdvSIMD CLZ instruction and adds tests for the
same.
Regression test done for aarch64-none-elf with no issues.

OK?

Regards
VP

---

gcc/ChangeLog

2013-05-22  Vidya Praveen vidyaprav...@arm.com

 * config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ
   instruction (AdvSIMD).
 * config/aarch64/aarch64-builtins.c
   (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ.
 * config/aarch64/aarch64-simd-builtins.def: Entry for CLZ.
 * testsuite/gcc.target/aarch64/vect-clz.c: New file.


I committed this for you, and moved the testsuite ChangeLog entry over
to gcc/testsuite/ChangeLog.


Thanks Marcus! :-)

Regards
VP





[AArch64] Support for CLZ

2013-05-22 Thread Vidya Praveen

Hello,

This patch adds support for the AdvSIMD CLZ instruction and adds tests for the same.
Regression test done for aarch64-none-elf with no issues.

OK?

Regards
VP

---

gcc/ChangeLog

2013-05-22  Vidya Praveen vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (clzv4si2): Support for CLZ
  instruction (AdvSIMD).
* config/aarch64/aarch64-builtins.c
  (aarch64_builtin_vectorized_function): Handler for BUILT_IN_CLZ.
* config/aarch64/aarch64-simd-builtins.def: Entry for CLZ.
* testsuite/gcc.target/aarch64/vect-clz.c: New file.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 4fdfe24..2a0e5fd 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1245,6 +1245,16 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 	  return AARCH64_FIND_FRINT_VARIANT (sqrt);
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
+  (out_mode == SImode && out_n == C \
+   && in_mode == N##Imode && in_n == C)
+case BUILT_IN_CLZ:
+  {
+if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_clzv4si];
+return NULL_TREE;
+  }
+#undef AARCH64_CHECK_BUILTIN_MODE
+#define AARCH64_CHECK_BUILTIN_MODE(C, N) \
    (out_mode == N##Imode && out_n == C \
     && in_mode == N##Fmode && in_n == C)
 	case BUILT_IN_LFLOOR:
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e420173..5134f96 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -49,6 +49,7 @@
   BUILTIN_VDQF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
+  VAR1 (UNOP, clz, 2, v4si)
 
   BUILTIN_VD_RE (REINTERP, reinterpretdi, 0)
   BUILTIN_VDC (REINTERP, reinterpretv8qi, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9069a73..82fe1ad 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1611,6 +1611,15 @@
   DONE;
 })
 
+(define_insn "clz<mode>2"
+ [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
+   (clz:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "clz\\t%0.<Vtype>, %1.<Vtype>"
+ [(set_attr "simd_type" "simd_cls")
+  (set_attr "simd_mode" "<MODE>")]
+)
+
 ;; 'across lanes' max and min ops.
 
 (define_insn "reduc_<maxmin_uns>_<mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-clz.c b/gcc/testsuite/gcc.target/aarch64/vect-clz.c
new file mode 100644
index 000..8f1fe70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-clz.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps -fno-inline" } */
+
+extern void abort ();
+
+void
+count_lz_v4si (unsigned *__restrict a, int *__restrict b)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+b[i] = __builtin_clz (a[i]);
+}
+
+/* { dg-final { scan-assembler "clz\tv\[0-9\]+\.4s" } } */
+
+int
+main ()
+{
+  unsigned int x[4] = { 0x0, 0xFFFF, 0x1FFFF, 0xFFFFFFFF };
+  int r[4] = { 32, 16, 15, 0 };
+  int d[4], i;
+
+  count_lz_v4si (x, d);
+
+  for (i = 0; i < 4; i++)
+{
+  if (d[i] != r[i])
+	abort ();
+}
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */

[AArch64] Support scalar form of FABD

2013-05-02 Thread Vidya Praveen

Hello,

This attached patch adds support for the scalar form of the FABD
instruction along with compile & execute tests for the same.

Regression tested on aarch64-none-elf with no issues.

OK?

Regards
VP

---

gcc/ChangeLog

2013-05-02  Vidya Praveen  vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (*fabd_scalar<mode>3): Support
  scalar form of FABD instruction.

gcc/testsuite/ChangeLog

2013-05-02  Vidya Praveen vidyaprav...@arm.com

* gcc.target/aarch64/fabd.c: New file.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5862d26..e5fc032 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -556,6 +556,17 @@
 (set_attr "simd_mode" "<MODE>")]
 )
 
+(define_insn "*fabd_scalar<mode>3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+(abs:GPF (minus:GPF
+ (match_operand:GPF 1 "register_operand" "w")
+ (match_operand:GPF 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+  "fabd\t%<s>0, %<s>1, %<s>2"
+  [(set_attr "simd_type" "simd_fabd")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "and<mode>3"
   [(set (match_operand:VDQ 0 "register_operand" "=w")
 (and:VDQ (match_operand:VDQ 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/fabd.c b/gcc/testsuite/gcc.target/aarch64/fabd.c
new file mode 100644
index 000..7206d5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fabd.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-inline --save-temps" } */
+
+extern double fabs (double);
+extern float fabsf (float);
+extern void abort ();
+extern void exit (int);
+
+void
+fabd_d (double x, double y, double d)
+{
+  if ((fabs (x - y) - d) > 0.1)
+abort ();
+}
+
+/* { dg-final { scan-assembler "fabd\td\[0-9\]+" } } */
+
+void
+fabd_f (float x, float y, float d)
+{
+  if ((fabsf (x - y) - d) > 0.1)
+abort ();
+}
+
+/* { dg-final { scan-assembler "fabd\ts\[0-9\]+" } } */
+
+int
+main ()
+{
+  fabd_d (10.0, 5.0, 5.0);
+  fabd_d (5.0, 10.0, 5.0);
+  fabd_f (10.0, 5.0, 5.0);
+  fabd_f (5.0, 10.0, 5.0);
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */

[AArch64] Fix the description of simd_fabd

2013-05-02 Thread Vidya Praveen

Hello,

This attached patch corrects the description for simd_fabd.

OK?

Regards
VP


gcc/ChangeLog

2013-05-02  Vidya Praveen  vidyaprav...@arm.com

* config/aarch64/aarch64-simd.md (simd_fabd): Correct the description.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5862d26..65847ce 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -44,7 +44,7 @@
 ; simd_dup  duplicate element.
 ; simd_dupgpduplicate general purpose register.
 ; simd_ext  bitwise extract from pair.
-; simd_fabd floating absolute difference and accumulate.
+; simd_fabd floating point absolute difference.
 ; simd_fadd floating point add/sub.
 ; simd_fcmp floating point compare.
 ; simd_fcvti floating point convert to integer.