[PATCH v2, RFC] fsra: gimple final sra pass for parameters and returns

2024-03-07 Thread Jiufu Guo
Hi,

As is known, there are a few PRs (meta-bug PR101926) about
accessing aggregate parameters/returns that are passed through registers.
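
For readers who have not followed those PRs, here is a minimal sketch of the
kind of source involved. It is only an illustration, not one of the testcases
in this patch: the type and function names are made up, and whether the
struct really travels in registers depends on the target ABI.

```c
#include <assert.h>

/* Hypothetical small aggregate.  On many 64-bit ABIs a struct like this
   is passed and returned in registers rather than in memory.  */
typedef struct { double a; double b; } pair_t;

/* Reads one field of an aggregate parameter.  Ideally this needs no
   stack traffic when P arrives in registers.  */
double
first_field (pair_t p)
{
  return p.a;
}

/* Builds an aggregate return value field by field.  */
pair_t
make_pair (double x, double y)
{
  pair_t r = { x, y };
  return r;
}

/* Small usage check for the two functions above.  */
void
pair_demo (void)
{
  pair_t p = make_pair (1.5, 2.5);
  assert (first_field (p) == 1.5);
}
```

Comparing the assembly for first_field and make_pair at -O2 is a simple way
to see whether the aggregate is spilled to the stack on a given target.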

We could use the current SRA pass in a special mode right before
RTL expansion for the incoming/outgoing part, as suggested in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637935.html

Compared to the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646625.html
This version enhances expand_ARG_PARTS so that bootstrap passes on
aarch64, and merges the previous three patches into one patch for
review.

As mentioned in the previous version:
This patch uses the IFNs ARG_PARTS and SET_RET_PARTS for parameters
and returns, and expands the IFNs according to the incoming/outgoing
registers.
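
To make the idea of expanding a part according to the incoming registers more
concrete, here is a simplified C model. It is not the code in internal-fn.cc:
the real expansion works on RTL (including PARALLELs, machine modes and
endianness), while this sketch assumes an aggregate laid out over consecutive
64-bit registers, lowest bits first, and only handles a part that fits in one
register. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BITS 64u

/* Position of a part inside the register sequence (model only).  */
typedef struct
{
  unsigned reg_index;   /* which register holds the part */
  unsigned bit_offset;  /* offset of the part inside that register */
} reg_pos_t;

/* Map {BITPOS, BITSIZE} of an aggregate part to a register position.
   This sketch requires the part not to straddle two registers.  */
reg_pos_t
locate_part (unsigned bitpos, unsigned bitsize)
{
  assert (bitsize > 0 && bitsize <= REG_BITS);
  assert (bitpos / REG_BITS == (bitpos + bitsize - 1) / REG_BITS);
  reg_pos_t pos = { bitpos / REG_BITS, bitpos % REG_BITS };
  return pos;
}

/* Read the part directly from the registers in REGS, without going
   through a memory copy of the whole aggregate.  */
uint64_t
extract_part (const uint64_t *regs, unsigned bitpos, unsigned bitsize)
{
  reg_pos_t pos = locate_part (bitpos, bitsize);
  uint64_t value = regs[pos.reg_index] >> pos.bit_offset;
  if (bitsize < REG_BITS)
    value &= (UINT64_C (1) << bitsize) - 1;
  return value;
}
```

For example, with regs[0] = 0x1122334455667788, extract_part (regs, 32, 32)
yields 0x11223344 in this model.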

Bootstrapped/regtested on ppc64{,le}, x86_64 and aarch64.

Again, there are a few things that could be enhanced in this patch:
* Multi-register access
* Parameter access across calls
* Optimizing access to parameters that are in memory
* Checking more cases/targets

I would like to ask for comments to avoid major flaws.

BR,
Jeff (Jiufu Guo)


PR target/108073
PR target/65421
PR target/69143

gcc/ChangeLog:

* cfgexpand.cc (expand_value_return): Update for rtx eq checking.
(expand_return): Update for scalarized returns.
* internal-fn.cc (query_position_in_parallel): New function.
(construct_reg_seq): New function.
(get_incoming_element): New function.
(reference_alias_ptr_type): Extern declare.
(expand_ARG_PARTS): New IFN expand.
(store_outgoing_element): New function.
(expand_SET_RET_PARTS): New IFN expand.
(expand_SET_RET_LAST_PARTS): New IFN expand.
* internal-fn.def (ARG_PARTS): New IFN.
(SET_RET_PARTS): New IFN.
(SET_RET_LAST_PARTS): New IFN.
* passes.def (pass_sra_final): Add new pass.
* tree-pass.h (make_pass_sra_final): New function.
* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
(scan_function): Update for argument in fsra.
(find_var_candidates): Collect candidates for SRA_MODE_FINAL_INTRA.
(analyze_access_subtree): Update analyze for fsra.
(generate_subtree_copies): Update to generate new IFNs.
(final_intra_sra): New function.
(class pass_sra_final): New class.
(make_pass_sra_final): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Update instructions.
* gcc.target/powerpc/pr108073-1.c: New test.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421.c: New test.
* gcc.target/powerpc/pr69143.c: New test.

---
 gcc/cfgexpand.cc  |   6 +-
 gcc/internal-fn.cc| 254 ++
 gcc/internal-fn.def   |   9 +
 gcc/passes.def|   2 +
 gcc/tree-pass.h   |   1 +
 gcc/tree-sra.cc   | 157 ++-
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 ++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 +
 gcc/testsuite/gcc.target/powerpc/pr65421.c|  10 +
 gcc/testsuite/gcc.target/powerpc/pr69143.c|  23 ++
 11 files changed, 598 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index eef565eddb5..1ec6c2d8102 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3759,7 +3759,7 @@ expand_value_return (rtx val)
 
   tree decl = DECL_RESULT (current_function_decl);
   rtx return_reg = DECL_RTL (decl);
-  if (return_reg != val)
+  if (!rtx_equal_p (return_reg, val))
     {
       tree funtype = TREE_TYPE (current_function_decl);
       tree type = TREE_TYPE (decl);
@@ -3832,6 +3832,10 @@ expand_return (tree retval)
      been stored into it, so we don't have to do anything special.  */
   if (TREE_CODE (retval_rhs) == RESULT_DECL)
     expand_value_return (result_rtl);
+  /* return is scalarized by fsra: TODO use FLAG. */
+  else if (VAR_P (retval_rhs)
+           && rtx_equal_p (result_rtl, DECL_RTL (retval_rhs)))
+    expand_null_return_1 ();
 
   /* If the result is an aggregate that is being returned in one (or more)
      registers, load the registers here.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index fcf47c7fa12..905ee7da005 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3394,6 +3394,260 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
 }
 }
 
+/* In the parallel rtx register series REGS, compute the register position for
+   given {BITPOS, B
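
The expand_value_return hunk above replaces a pointer comparison with
rtx_equal_p, which compares two rtx expressions structurally. The difference
matters when two distinct objects describe the same location. The toy model
below illustrates only that distinction. The expr_t type and expr_equal are
hypothetical stand-ins, not GCC's rtx or rtx_equal_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node (hypothetical, much simpler than an rtx).  */
typedef struct expr
{
  int code;                    /* kind of expression */
  int regno;                   /* register number, if any */
  const struct expr *operand;  /* optional sub-expression */
} expr_t;

/* Structural equality: true when X and Y describe the same expression,
   even if they are different objects in memory.  */
bool
expr_equal (const expr_t *x, const expr_t *y)
{
  if (x == y)
    return true;
  if (x == NULL || y == NULL)
    return false;
  return x->code == y->code
         && x->regno == y->regno
         && expr_equal (x->operand, y->operand);
}

/* Two separately built nodes: pointer comparison says "different",
   structural comparison says "same".  */
void
expr_demo (void)
{
  expr_t a = { 1, 3, NULL };
  expr_t b = { 1, 3, NULL };
  assert (&a != &b);
  assert (expr_equal (&a, &b));
}
```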

Re: [PATCH v2, RFC] fsra: gimple final sra pass for parameters and returns

2024-04-16 Thread Jiufu Guo


Hi,

Ping for this RFC; I look forward to comments for the upcoming stage1.

Jeff (Jiufu Guo)


Re: [PATCH v2, RFC] fsra: gimple final sra pass for parameters and returns

2024-05-10 Thread Jiufu Guo


Hi,

Ping for this patch; I look forward to comments.


Jeff (Jiufu Guo)
