JonasToth added a comment.

> - The output of clang-tidy diagnostic is YAML, and YAML is not an
>   space-efficient format (just for human readability). If you want to save
>   space further, we might consider using some compressed formats, e.g.
>   llvm::bitcode. Given the reduced YAML result (5.4MB) is promising, this might
>   not matter.

The output was normal diagnostics written to stdout; deduplication happens from there (see the test cases). The files I created were just produced by piping the output to filter out some of the noise. Without deduplication it is very hard to get something useful out of a run with many checks activated on bigger projects. Blender and OpenCV, for example, are useless to try, because they have some commonly used macros that trigger a check violation. The buildbot filled 30GB of RAM before it crashed and couldn't even finish the analysis of the project. The same holds for LLVM.

> - clang-tidy itself doesn't do deduplication, and `run-clang-tidy.py` seems
>   an old way of running clang-tidy in parallel. The python script seems become
>   more complicated now. We have `AllTUsToolExecutor` right now, which supports
>   running clang tools on a compilation database in parallel, so another option
>   would be to use `AllTUsToolExecutor` in clang-tidy, and we can do
>   deduplication inside clang-tidy binary (in reduce phase), which should be
>   faster than the python script (spawn new clang-tidy processes and do
>   round-trip of all the data through YAML-on-disk).

Yes, this patch came out of necessity: running all available clang-tidy checks on big projects to see whether their transformations are correct was, and still is, impossible with the tools we have upstream. I agree that `AllTUsToolExecutor` would be better than the Python script, but I think getting that done takes longer than just patching the script now. With the patch here (it is an off-by-default option as well) it is easier to test all pieces of clang-tidy. From there we can easily migrate to something better than `run-clang-tidy.py`. Deduplication within clang-tidy would be the best option! But for full deduplication the parallelization must happen first.

> The python script seems become more complicated now.

A bit, yes. The actual invocation of clang-tidy and the other parts are not touched.
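
For illustration only, here is a minimal sketch of what deduplicating plain stdout diagnostics can look like. It is not the parser from this patch. The regular expression, the choice of key, and the name `deduplicate` are assumptions made for the example; the only thing taken as given is the usual `path:line:col: warning: message [check-name]` header line that clang-tidy prints.

```python
import re

# Assumed shape of a diagnostic header line on stdout, e.g.
#   /src/a.h:12:5: warning: use nullptr [modernize-use-nullptr]
# Notes, source snippets and caret lines do not match and are treated
# as continuation lines of the preceding diagnostic.
DIAG_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<level>warning|error): (?P<msg>.*) \[(?P<check>[^\]]+)\]$"
)


def deduplicate(lines):
    """Return each diagnostic block once, in first-seen order.

    Hypothetical helper: two diagnostics count as duplicates when file,
    line, column, check name and message are all identical.
    Lines before the first diagnostic header are dropped.
    """
    seen = set()
    unique = []
    keep = False  # whether the current diagnostic block is being kept
    for line in lines:
        m = DIAG_RE.match(line)
        if m:
            key = (m["path"], m["line"], m["col"], m["check"], m["msg"])
            keep = key not in seen
            if keep:
                seen.add(key)
        if keep:
            unique.append(line)
    return unique
```

A header that is included from many translation units produces the same key every time, so only its first occurrence survives. This is the situation described above for commonly used macros.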
Only the parser adds complexity, and it is covered by the unit tests. I don't think this solution will live forever, but it is fast and effective, and again it is optional and off by default.

For context: this is more of a spin-off of my attempts to gather statistics on clang-tidy results for a wide range of projects. This parser is the minimal version that can do deduplication.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits