On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
Hello

This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION target hook for the AMD GCN architecture, such that when vectorized, calls to builtin standard math functions such as asinf, exp, pow etc. are converted to calls to the recently added vectorized math functions for GCN in Newlib. The -fno-math-errno flag is required in addition to the usual vectorization optimization flags for this to occur, and some of the math functions (the larger double-precision ones) require a large stack size to function properly.

This patch requires the GCN vector math functions in Newlib to function - these were included in the recent 4.3.0.20230120 snapshot. As this was a minimum requirement starting from the patch 'amdgcn, libgomp: Manually allocated stacks', this should not be a problem.

I have added new testcases in the testsuite that compare the output of the vectorized math functions against the scalar, passing if they are sufficiently close. With the testcase for standalone GCN (without libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently cannot set the stack size correctly in DejaGnu testing, so I have made it a compile test for now - it is still useful to check that calls to the correct functions are being made. The runtime correctness is still covered by the libgomp test.

Okay for trunk?

The main part of the patch is OK, with the small changes below.

Others have pointed out that "omp declare simd" exists, but you and I have been all through that verbally, long ago, and as Tobias says the offload compiler cannot rely on markup in the host compiler's header files to solve this problem.

@@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_simd_clone_compute_vecsize_and_simdlen
 #undef  TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  gcn_vectorize_builtin_vectorized_function
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p

Please keep these in alphabetical order.

+/* Ideally this test should be run, but the math routines require a large
+   stack and gcn-run currently does not respect the stack-size parameter.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 
-fdump-tree-vect" } */

This isn't ideal. The dg-set-target-env-var directive (I think this is it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise that doesn't work with remote test targets (like ours).

I suggest adding an additional test that sets the envvar and #includes the code from this one; one test to scan the dumps, one test to run it. Like this .... (untested, syntax uncertain).

/* { dg-do run } */
/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
/* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
#include "simd-math-1.c"

The run test will get skipped in our test environment (and anyone else using remote), but the libgomp test should make up for that.

Andrew

Reply via email to