https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98302
--- Comment #15 from Alex Coplan <acoplan at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #11)
> > which is miscompiled at -O2 -ftree-vectorize or -O3.
>
> What a great reduction, can you please share knowledge how did you achieve
> that?!

Hi Martin,

Sorry for the late reply. Generally, when reducing a yarpgen testcase, I start
by checking whether we can hit the bug with just one large C++ file, i.e. with:

$ cat init.h func.cpp driver.cpp > combined.cc

and then deleting the #include of "init.h" from combined.cc. For most issues,
you should be able to reproduce the bug with the combined file.

For a wrong code bug, I usually proceed with two separate reductions, with a
manual conversion step in between:

1. Reduce the C++ file without preprocessing.
2. Attempt to manually convert the reduced C++ testcase to a C testcase.
3. Preprocess the C testcase and then reduce the preprocessed file.

Step (2) usually involves superficial syntax changes, using C headers instead
of their C++ equivalents, and replacing calls to std::{min,max} with a C
equivalent (using pointers instead of references).

I use a relatively beefy x86 machine for reduction, using qemu to run target
code (aarch64, in this case). My reduction script broadly does the following:

Sanitizer checks:

 * I run three separate compile+executes (2x gcc, 1x clang) on the x86 host
   with sanitizer options enabled (-fsanitize=address -fsanitize=undefined
   -fno-sanitize-recover=undefined). I use the host GCC (7.5.0 on Ubuntu 18.04)
   and clang 10 for this. I run clang at -O0 and GCC at both -O0 and
   -O1 -fno-common (the latter appears to be necessary to catch some global
   buffer overflows).
 * We fail the interestingness test if any of these executions fail. If
   possible, I also save the results of these executions for comparison with
   the target executions.

Target comparisons:

 * Then, using the target (aarch64) GCC, I compile+execute at various
   optimisation levels. In this case I used -O{0,1,2,s,g,3}, checking that the
   output (exit code and stdout/stderr) differs for the optimisation level
   exhibiting the bug (-O3 in this case) but is the same for all other cases.

Finally, I run the reduced testcase through https://cerberus.cl.cam.ac.uk/ to
check for any undefined behaviour that the sanitizers didn't manage to catch.
More recently I've also been experimenting with Frama-C, which works on some
testcases that are too large for Cerberus to handle.

Currently the reduction script is a simple bash script that runs everything
serially. I've been experimenting with some Python scripts that can run things
in parallel; I might also explore generating make/ninja to parallelise the
interestingness test at some point.

I'm sure that this process could be improved, but this is what I've been using
so far, and it seems to work.

I hope this is of some use. Let me know if you have any more specific questions
or thoughts.
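
For illustration, the decision logic of the interestingness test described above could be sketched in Python roughly as follows. This is not the actual script: the names RunResult, compile_and_run, sanitizer_runs_pass and is_interesting are hypothetical, and treating a non-zero exit code as a sanitizer "failure" is an assumption about what "fail" means here. The compiler and qemu command lines are left to the caller.

```python
import os
import subprocess
from typing import Dict, Iterable, List, NamedTuple


class RunResult(NamedTuple):
    """Observable behaviour of one compile+execute (hypothetical helper type)."""
    returncode: int
    stdout: str
    stderr: str


def compile_and_run(compiler: List[str], flags: List[str], source: str,
                    exe: str, runner: List[str], timeout: int = 60) -> RunResult:
    """Compile `source` to `exe`, then execute it and capture its behaviour.

    `runner` is a command prefix for emulation (e.g. a qemu user-mode binary);
    pass [] to run natively on the host. A compile failure raises
    CalledProcessError, which a caller would treat as "not interesting".
    """
    subprocess.run(compiler + flags + [source, "-o", exe],
                   check=True, timeout=timeout)
    proc = subprocess.run(runner + [os.path.abspath(exe)],
                          capture_output=True, text=True, timeout=timeout)
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def sanitizer_runs_pass(runs: Iterable[RunResult]) -> bool:
    """Assumption: a sanitizer-enabled run "fails" iff it exits non-zero."""
    return all(r.returncode == 0 for r in runs)


def is_interesting(results: Dict[str, RunResult], buggy_level: str) -> bool:
    """True iff every non-buggy level agrees and the buggy level differs."""
    if buggy_level not in results:
        return False
    good = [r for level, r in results.items() if level != buggy_level]
    if not good:
        return False
    reference = good[0]
    if any(r != reference for r in good):
        return False  # the "good" levels disagree, so the testcase is suspect
    return results[buggy_level] != reference
```

A driver would call compile_and_run once per optimisation level, collect the results in a dict keyed by flag (e.g. "-O0" ... "-O3"), and exit with status 0 only if both sanitizer_runs_pass and is_interesting return True.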