JonasToth added a comment.

> - The output of clang-tidy diagnostic is YAML, and YAML is not an 
> space-efficient format (just for human readability). If you want to save 
> space further, we might consider using some compressed formats, e.g. 
> llvm::bitcode. Given the reduced YAML result (5.4MB) is promising, this might 
> not matter.

The output was normal diagnostics written to stdout; deduplication happens 
from there (see the test cases). The files I created were just the result of 
piping to filter some of the noise.
Without de-duplication it's very hard to get something useful out of a run with 
many checks activated for bigger projects. Blender and OpenCV, for example, are 
useless to try, because they have some commonly used macros with a check 
violation. The buildbot filled 30GB of RAM before it crashed and couldn't even 
finish the analysis of the project. The same goes for LLVM.

> - clang-tidy itself doesn't do deduplication, and `run-clang-tidy.py` seems 
> an old way of running clang-tidy in parallel. The python script seems become 
> more complicated now.  We have `AllTUsToolExecutor` right now, which supports 
> running clang tools on a compilation database in parallel, so another option 
> would be to use `AllTUsToolExecutor` in clang-tidy, and we can do 
> deduplication inside clang-tidy binary (in reduce phase), which should be 
> faster than the python script (spawn new clang-tidy processes and do 
> round-trip of all the data through YAML-on-disk).

Yes, this patch came out of necessity, because running all available 
clang-tidy checks on big projects to see whether their transformations are 
correct was/is just impossible right now with the tools we have upstream.
I agree that `AllTUsToolExecutor` would be better than the python script, 
but I think getting that done takes longer than just patching the script now. 
With this patch (it is an off-by-default option as well) it is easier to 
test all pieces of clang-tidy. From there we can easily migrate to something 
better than `run-clang-tidy.py`.
The deduplication within clang-tidy would be the best option! But for full 
deduplication the parallelization must happen first.

> The python script seems become more complicated now.

A bit, yes. The actual calling of clang-tidy and other parts are not touched. 
Only the parser adds complexity, and it is covered by the unit tests. 
I don't think this solution will live forever, but it's fast and effective, and 
again it's optional and off by default.

For context: this is more of a spinoff of my attempts at gathering statistics 
on clang-tidy results for a wide range of projects. This parser is the minimal 
version that can do de-duplication.
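
To illustrate the idea only (this is not the parser from the patch), here is a 
minimal sketch of de-duplicating diagnostics read from stdout. It assumes each 
diagnostic starts with a `path:line:col: warning|error: message` header and 
that every following line (notes, source snippets, carets) belongs to that 
diagnostic. The names `DIAG_HEADER` and `deduplicate` are hypothetical.

```python
import re

# Assumed header shape: "path:line:col: warning: message [check-name]".
# "note:" lines deliberately do not match, so they stay attached to the
# diagnostic that precedes them.
DIAG_HEADER = re.compile(r"^.+?:\d+:\d+: (?:warning|error): .*$")


def deduplicate(lines):
    """Keep only the first occurrence of each diagnostic block.

    A block is a header line plus all lines up to the next header.
    Two blocks count as duplicates when their header lines are identical.
    Lines before the first header are treated as noise and dropped.
    """
    seen = set()
    result = []
    keep = False
    for line in lines:
        if DIAG_HEADER.match(line):
            # New block: keep it only if this header was not emitted before.
            keep = line not in seen
            seen.add(line)
        if keep:
            result.append(line)
    return result
```

Only the header strings are stored, so memory grows with the number of unique 
diagnostics rather than with the total output, which is what matters when a 
macro in a common header triggers the same warning in every translation unit.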


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits