On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
Hello
This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
target hook for the AMD GCN architecture, such that when vectorized,
calls to builtin standard math functions such as asinf, exp, pow etc.
are converted to calls to the recently added vectorized math functions
for GCN in Newlib. The -fno-math-errno flag is required in addition to
the usual vectorization optimization flags for this to occur, and some
of the math functions (the larger double-precision ones) require a large
stack size to function properly.
This patch requires the GCN vector math functions in Newlib to function
- these were included in the recent 4.3.0.20230120 snapshot. As this was
a minimum requirement starting from the patch 'amdgcn, libgomp: Manually
allocated stacks', this should not be a problem.
I have added new testcases in the testsuite that compare the output of
the vectorized math functions against the scalar, passing if they are
sufficiently close. With the testcase for standalone GCN (without
libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently
cannot set the stack size correctly in DejaGnu testing, so I have made
it a compile test for now - it is still useful to check that calls to
the correct functions are being made. The runtime correctness is still
covered by the libgomp test.
Okay for trunk?
The main part of the patch is OK, with the small changes below.
Others have pointed out that "omp declare simd" exists, but you and I
have been all through that verbally, long ago, and as Tobias says the
offload compiler cannot rely on markup in the host compiler's header
files to solve this problem.
@@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
gcn_simd_clone_compute_vecsize_and_simdlen
#undef TARGET_SIMD_CLONE_USABLE
#define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+ gcn_vectorize_builtin_vectorized_function
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
#undef TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
#define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
gcn_small_register_classes_for_mode_p
Please keep these in alphabetical order.
+/* Ideally this test should be run, but the math routines require a large
+ stack and gcn-run currently does not respect the stack-size parameter. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000
-fdump-tree-vect" } */
This isn't ideal. The dg-set-target-env-var directive (I think this is
it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise
that doesn't work with remote test targets (like ours).
I suggest adding an additional test that sets the envvar and #includes
the code from this one; one test to scan the dumps, one test to run it.
Like this .... (untested, syntax uncertain).
/* { dg-do run } */
/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
/* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
#include "simd-math-1.c"
The run test will get skipped in our test environment (and anyone else
using remote), but the libgomp test should make up for that.
Andrew