[PATCH] D97869: [OpenCL][Draft] Add OpenCL builtin test generator

Anastasia Stulova via Phabricator via cfe-commits Wed, 24 Mar 2021 13:33:06 -0700

Anastasia added a comment.

I have done some measurements using the test produced from this Tablegen 
emitter (59K lines).


I have used the test in two files:

1. `SemaOpenCL/all-std-buitins.cl` that has the following RUN line appended 6 
times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, v3.0, C++)

  //RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header -fdeclare-opencl-builtins



2. `SemaOpenCL/all-std-buitins-slow-header.cl` that has the following RUN line 
appended 6 times (for every supported OpenCL version v1.0, v1.1, v1.2, v2.0, 
v3.0, C++)

  //RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify -cl-std=CL2.0 -finclude-default-header
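
To make the "one RUN line per language version" setup concrete, here is a minimal sketch that prints the six RUN lines for either file. The helper name `run_lines` and the exact `-cl-std` spellings for the other versions are my assumptions; only the `CL2.0` line is quoted from the test above.

```python
from typing import List

# Assumed -cl-std spellings, one per supported version (v1.0, v1.1, v1.2,
# v2.0, v3.0, C++). Only CL2.0 appears verbatim in the comment above.
CL_STD_VALUES = ["CL1.0", "CL1.1", "CL1.2", "CL2.0", "CL3.0", "clc++"]


def run_lines(use_tablegen_builtins: bool) -> List[str]:
    """Return one lit RUN line per OpenCL version (hypothetical helper)."""
    flags = "-finclude-default-header"
    if use_tablegen_builtins:
        # all-std-buitins.cl additionally enables the Tablegen declarations.
        flags += " -fdeclare-opencl-builtins"
    return [
        "//RUN: %clang_cc1 %s -triple=spir -fsyntax-only -verify "
        f"-cl-std={std} {flags}"
        for std in CL_STD_VALUES
    ]


if __name__ == "__main__":
    for line in run_lines(use_tablegen_builtins=True):
        print(line)
```

Calling `run_lines(False)` would give the lines for `all-std-buitins-slow-header.cl`, which relies on `opencl-c.h` alone.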

This gives the following testing time breakdown:

  201.61s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
  199.70s: Clang :: SemaOpenCL/all-std-buitins.cl
  85.14s: Clang :: Headers/arm-neon-header.c
  68.06s: Clang :: OpenMP/nesting_of_regions.cpp
  65.23s: Clang :: Driver/crash-report.c
  60.26s: Clang :: Analysis/PR24184.cpp
  57.80s: Clang :: CodeGen/X86/rot-intrinsics.c
  57.58s: Clang :: CodeGen/X86/x86_64-xsave.c
  56.34s: Clang :: Headers/opencl-c-header.cl
  55.68s: Clang :: CodeGen/X86/x86_32-xsave.c
  44.83s: Clang :: Driver/crash-report-with-asserts.c
  40.38s: Clang :: Lexer/SourceLocationsOverflow.c
  37.44s: Clang :: Headers/x86intrin-2.c
  36.53s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
  34.09s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
  33.41s: Clang :: CodeGen/X86/sse-builtins-constrained.c
  32.82s: Clang :: Analysis/iterator-modeling.cpp
  31.37s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
  31.10s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp
  30.78s: Clang :: Analysis/use-after-move.cpp

I am very confused, though, about why the difference between Tablegen and 
`opencl-c.h` is so insignificant. FYI, for a single clang invocation the 
difference in parsing time of this test between Tablegen and `opencl-c.h` is 
also very small: 20.794s vs 21.401s. This is really interesting because with 
small files the difference is huge: 0.043s vs 3.990s on a test with an empty 
kernel.
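
For reference, the single-invocation numbers can be compared directly. This is just arithmetic on the figures quoted above; I am assuming the first number of each pair is the Tablegen run and the second is the `opencl-c.h` run.

```python
# Parse times in seconds, copied from the measurements above.
# Assumption: first value of each pair is Tablegen, second is opencl-c.h.
large_tablegen, large_header = 20.794, 21.401   # generated 59K-line test
empty_tablegen, empty_header = 0.043, 3.990     # test with an empty kernel

large_gap = large_header - large_tablegen
empty_gap = empty_header - empty_tablegen

print(f"large test:   gap {large_gap:.3f}s "
      f"({large_gap / large_tablegen:.1%} of the Tablegen time)")
print(f"empty kernel: gap {empty_gap:.3f}s "
      f"(header run is {empty_header / empty_tablegen:.0f}x the Tablegen time)")
```

The absolute gap on the large test is smaller than on the empty kernel, which is the puzzling part.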

---------------------------------------

I also timed the `check-clang` invocation on my 8-core machine:

1. with both tests  - 697.70s
2. with all-std-buitins.cl only  - 684.43s
3. without any new tests  - 673.00s

The change in total testing time appears to be insignificant. I guess this is 
due to parallel execution?

Btw, one thing I have thought of: since OpenCL v1.0 and v1.1 don't differ a lot 
in builtin functions and those are not modified much either, perhaps among the 
v1.x versions we only need to test v1.2? That would reduce the number of clang 
invocations to 4 in each test (v1.2, v2.0, v3.0, C++). Then the measurements 
are as follows:

  134.13s: Clang :: SemaOpenCL/all-std-buitins-slow-header.cl
  131.52s: Clang :: SemaOpenCL/all-std-buitins.cl
  85.81s: Clang :: Headers/arm-neon-header.c
  69.14s: Clang :: OpenMP/nesting_of_regions.cpp
  60.08s: Clang :: Driver/crash-report.c
  59.67s: Clang :: Analysis/PR24184.cpp
  57.27s: Clang :: CodeGen/X86/rot-intrinsics.c
  56.93s: Clang :: CodeGen/X86/x86_32-xsave.c
  56.59s: Clang :: CodeGen/X86/x86_64-xsave.c
  55.68s: Clang :: Headers/opencl-c-header.cl
  40.71s: Clang :: Driver/crash-report-with-asserts.c
  39.44s: Clang :: Lexer/SourceLocationsOverflow.c
  38.02s: Clang :: OpenMP/target_teams_distribute_parallel_for_simd_codegen_registration.cpp
  37.07s: Clang :: Headers/x86intrin-2.c
  32.61s: Clang :: CodeGen/X86/avx512f-builtins-constrained.c
  32.58s: Clang :: CodeGen/X86/sse-builtins-constrained.c
  32.19s: Clang :: Analysis/use-after-move.cpp
  31.96s: Clang :: Analysis/iterator-modeling.cpp
  31.02s: Clang :: OpenMP/target_teams_distribute_simd_codegen_registration.cpp
  30.59s: Clang :: OpenMP/target_parallel_for_simd_codegen_registration.cpp

with a total testing time of 688.61s.
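
To put the `check-clang` totals side by side, here is a small sketch of the overhead relative to the run without any new tests. The numbers are the ones reported above; the labels and the `overhead` helper are mine, and I am assuming the runs are otherwise comparable (same machine, same build).

```python
BASELINE = 673.00  # check-clang total in seconds without any new tests

# check-clang totals in seconds, as reported above.
TOTALS = {
    "both tests, 6 RUN lines each": 697.70,
    "all-std-buitins.cl only, 6 RUN lines": 684.43,
    "both tests, 4 RUN lines each": 688.61,
}


def overhead(total: float, baseline: float = BASELINE):
    """Return (extra seconds, extra percent) relative to the baseline."""
    extra = total - baseline
    return extra, extra / baseline * 100.0


if __name__ == "__main__":
    for label, total in TOTALS.items():
        extra, pct = overhead(total)
        print(f"{label}: +{extra:.2f}s (+{pct:.1f}%)")
```

In every configuration the added wall-clock time is far below the per-test times in the listings, consistent with the guess about parallel execution.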

**Conclusion:**

- If we test the whole functionality, the test will be at least 2x slower than 
the slowest clang test so far, but it hardly affects the full testing time of 
`check-clang` on modern HW due to the parallel execution. Relatedly, 
partitioning the test files could help with latency thanks to the parallel 
execution.
- Testing `opencl-c.h` as well only doubles the testing time.
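
As a rough illustration of the partitioning idea, the sketch below estimates how many pieces the new test would need to be split into so that no piece is slower than the slowest existing test. It assumes the split is even and ignores any per-file startup cost, so it is an optimistic estimate rather than a measurement; `parts_needed` is a hypothetical helper.

```python
import math

SLOWEST_NEW = 201.61       # SemaOpenCL/all-std-buitins-slow-header.cl, 6 RUN lines
SLOWEST_EXISTING = 85.14   # Headers/arm-neon-header.c


def parts_needed(new_test: float, limit: float) -> int:
    """Smallest number of equal parts so that each part takes at most `limit`."""
    return math.ceil(new_test / limit)


parts = parts_needed(SLOWEST_NEW, SLOWEST_EXISTING)
per_part = SLOWEST_NEW / parts

if __name__ == "__main__":
    print(f"split into {parts} files, roughly {per_part:.1f}s each")
```

Under these assumptions, each partition would finish before the current slowest test does, so the new tests would no longer define the tail of a parallel run.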


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97869/new/

https://reviews.llvm.org/D97869

