Hi,

Sorry that it's taking me so long to get back to you on this. I wanted to finish a bunch of patches and check them in, then perform a merge, before proceeding to the tests, so that the performance results could be easily duplicated. This took longer than I'd anticipated.
On Jun 8, 2009, Diego Novillo <dnovi...@google.com> wrote:

> - Size of the IL over some standard code bodies
>   (http://gcc.gnu.org/wiki/PerformanceTesting).

I started looking at this wiki page last night. I expected to find something in there to measure the size of the IL, but nothing jumped out at me. Are you speaking of taking the sizes of tree or rtl dumps, or is there a more accurate measure?

> - Memory consumption in cc1/cc1plus at -Ox -g over that set of apps.

Wouldn't this be expected to be strongly correlated with the above? Is -fmem-report processed by mem-stats what you're after?

> - Compile time in cc1/cc1plus at -Ox -g.

While trying to figure out the items above, I've been working on this first, although now I realize there's -ftime-report and a time-stats that you might have wanted instead. Anyhow... I'll present the methodology and results I have so far, mainly because I expect to be mostly away from computers starting some time tomorrow (*), to return only on the 28th.

(*) still pending a fix for tickets that were purchased incorrectly for a Free Software event at which I'm expected to speak; I might end up not flying, and stay around till Tuesday.

On x86_64-linux-gnu, I bootstrapped and installed tags/var-tracking-assignments-merge-148582-trunk (tr...@148582) and tags/var-tracking-assignments-merge-148582-after (branches/var-tracking-assignments-bra...@148600), then used these toolchains to build and install --disable-bootstrap --enable-languages=c toolchains with -O2 -g0. These C-only toolchains were the ones I used for the performance tests below.

Then, out of the sources in the vta branch, I configured and built 6 variants of GCC, all of them --disable-bootstrap --enable-languages=c, with CFLAGS="-O2 -time=`pwd`.log" and CC="/path/to/installed/$which/bin/gcc $gflags" (which and gflags are defined in the table below; in it, vt=var-tracking and vta=var-tracking-assignments):

  #  name         user time    which  gflags
  1  g0-trunk     18m57.284s   trunk  -g0
  2  g0           18m36.999s   vta    -g0
  3  g-novt       19m08.668s   vta    -g -fno-$vt -fno-$vta
  4  g-novta      19m35.518s   vta    -g -f$vt -fno-$vta
  5  g-novt-vta   19m29.107s   vta    -g -fno-$vt -f$vta
  6  g            21m19.831s   vta    -g -f$vt -f$vta

This is a single run so far; I'm now running a few more of these to average the results, but they already show some interesting points:

- Using the trunk compiler, rather than the vta compiler, makes the build slower. AFAICT, the difference between the object files is mostly limited to the compiler version in .comment, but I found cases in which rodata and eh_frame were emitted by trunk, but not by vta. I don't recall any patch that might have this effect, and I haven't looked into it further yet.

  This difference might also be explained by caching: I built the 3 toolchains out of a top-level Makefile that recursed into them, using -j3 for the top level, on a 4-processor box, so there were two toolchains building out of the vta compiler while only one was building out of the trunk compiler. This could keep the vta compiler hotter in the cache or something.

  Anyhow, this hopefully provides some evidence that supporting VTA doesn't make the compiler slower, in spite of the testing for debug stmts and insns. I'll repeat the tests without -j, just to be on the safe side, roughly along the lines of the sketch below.
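  Something along these lines should do for the serial re-run; the source and install paths below are placeholders rather than my actual layout, and the $vt/$vta shorthands are spelled out in full:

    #! /bin/bash
    # Sketch of a serial (no -j) driver for the six variants.  $src and the
    # installed-compiler prefixes are placeholders; the -f[no-]var-tracking
    # and -f[no-]var-tracking-assignments flags are the expansions of the
    # $vt/$vta shorthands used in the tables.
    src=/path/to/vta-branch-sources

    build_one() {
      name=$1; which=$2; shift 2; gflags="$*"
      mkdir -p $name && cd $name || exit 1
      CC="/path/to/installed/$which/bin/gcc $gflags" \
      CFLAGS="-O2 -time=`pwd`.log" \
        $src/configure --disable-bootstrap --enable-languages=c
      # Time the whole build; make's own output goes to a separate file.
      { time make > ../$name.out 2>&1; } 2> ../$name.time
      cd ..
    }

    build_one g0-trunk   trunk -g0
    build_one g0         vta   -g0
    build_one g-novt     vta   -g -fno-var-tracking -fno-var-tracking-assignments
    build_one g-novta    vta   -g -fvar-tracking -fno-var-tracking-assignments
    build_one g-novt-vta vta   -g -fno-var-tracking -fvar-tracking-assignments
    build_one g          vta   -g -fvar-tracking -fvar-tracking-assignments

  The per-variant `pwd`.log files that -time= appends to could then be post-processed for a per-subprocess (cc1) breakdown, which should be less noisy than the overall make times.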
- Emitting debug information, without any var tracking whatsoever, incurs an overhead of 2.9% on -O2 compilations.

- Enabling var tracking raises the overhead over -g0 to 5.2%, or 2.2% over non-VT debug info, and this doesn't count the overhead of carrying REG and MEM attributes that are only used for debug information purposes.

- Carrying, maintaining and ignoring as needed the annotations needed for VTA, without running the var-tracking pass, costs less than var-tracking: 4.7% over -g0, or 1.7% over non-VT debug info.

- Carrying all the VTA debug annotations and running var-tracking with support for them, tracking values as needed to support VTA, raises the overhead to 14.6% over -g0, or 11.3% over non-VT debug info. This is more than the sum of the overhead of carrying the annotations and that of running the old, much simpler VT pass: the VTA-supporting VT pass maintains and propagates far more information in order to get better debug info.

The extra information exposes weaknesses in var-tracking's data structures and algorithms, such as excess memory use and algorithmic complexity. This is not unique to VTA: https://bugzilla.redhat.com/show_bug.cgi?id=503816 comes up in a toolchain that has no traces whatsoever of VTA code, but it still exhibits excessive memory use and compile time, very much like compiling HTML401F in libjava with vs. without VTA. Redesigning the VT data structures for more efficient propagation of information is something that we should look into, for it will benefit VTA as well as non-VTA VT compilations. But I hope that's not set as a requirement to have VTA support integrated into the compiler.

Results from the second run (still with -j3) are just in:

  #  name         run0         run1         which  gflags
  1  g0-trunk     18m57.284s   19m04.429s   trunk  -g0
  2  g0           18m36.999s   19m13.588s   vta    -g0
  3  g-novt       19m08.668s   19m16.078s   vta    -g -fno-$vt -fno-$vta
  4  g-novta      19m35.518s   20m06.529s   vta    -g -f$vt -fno-$vta
  5  g-novt-vta   19m29.107s   19m52.220s   vta    -g -fno-$vt -f$vta
  6  g            21m19.831s   20m44.965s   vta    -g -f$vt -f$vta

The distortion between 1 and 2 appears to be fixed, the overhead for -g without VT is much smaller, and the VTA overhead is down to 8.8%.

Ok, so the results of the first run are not that significant, and I guess I'll have to average the results over more runs, but maybe they can at least give you a rough idea of where we'll be heading if we bring VTA in.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer