Re: SSA range class and removal of VR_ANTI_RANGEs

Aldy Hernandez Wed, 31 May 2017 07:21:17 -0700

On 05/23/2017 08:11 AM, Jakub Jelinek wrote:

On Tue, May 23, 2017 at 06:48:15AM -0400, Aldy Hernandez wrote:


[ughh, one more time, but CCing the list.]

Sorry, for the delayed response. I was fighting with Firefox + LTO togather some data :).

I'm worried a lot here about compile time memory usage increase, especially
with EVRP and IPA-VRP and even more so with LTO.
The changes mean that for every SSA_NAME of any kind we now need 8 more
bytes, and for every SSA_NAME that has range info (typically both range info
and nonzero_bits; in the old implementation the two were tied together for a
good reason, many ranges also imply some non-zero bits and from non-zero
bits one can in many cases derive a range) much more than that (through
the nonzero_bits_def you get all the overhead of trailing_wide_ints -
the 3 fields it has, just save on the actual 2 lengths and 2 HWI sets,
but then you add 3x 8 bytes and 6x size of the whole wide_int which is
from what I understood not really meant as the memory storage of wide ints
in data structures, but as something not memory efficient you can work
quickly on (so ideal for automatic variables in functions that work with
those).  So it is a 8 byte growth for SSA_NAMEs without ranges and
8+3*8+6*32-2*4-2*8*x growth for SSA_NAMEs with ranges if x is the number
of HWIs needed to represent the integers of that type (so most commonly
x=1).  For x=1 8+3*8+6*32-2*4-2*8*x is 200 bytes growth.
With say 10000000 SSA_NAMEs, 5000000 of them with ranges, that will be
already a 1GB difference, dunno how many SSA_NAMEs are there e.g. in firefox
LTO build.
Can the nonzero_bits stuff be folded back into irange (and have code to
maintain nonzero_bits in addition to ranges as before (from setting a range
compute or update nonzero_bits and vice versa)?  Can you use
trailing_wide_int for the irange storage of the values?  Can you allocate
only as many HWIs as you really need, rather than always 6?
Again, it can be different between a class you use for accessing the
information and manipulating it and for actual underlying storage on
SSA_NAMEs.

Before I fire off some stats, a few things. Yes, we can do better. Yes,we can use trailing wide ints or another idiom. Yes, we could joinnonzero bits with the range info (if absolutely necessary, because asI'll show below, the extra word doesn't blow up memory as much asadvertised).

Thanks to Martin Liska and Markus I was able to build a three-month oldfirefox branch with GCC7 LTO. I built firefox with -O2 -flto=9, andthen looked at the biggest partition at the end of VRP1 (that is,warn_array_bound_p == true).

The short version is that I don't see anywhere close to 10 millionSSA_NAMEs. Of the SSA_NAMEs we do get, 25% are pointers and irrelevant.22% of the non-pointer SSA_NAMEs actually have range information, andof this range info 32% is actually useless because it spans the entiredomain. So, unless I'm misunderstanding something, even in the worstcase scenario (in firefox + LTO anyways), I don't see such a big blow upof memory.

Now on to the actual numbers. Remember, this is only for one of thepartitions during the ltrans stage of LTO, running VRP1, since thisseems to be the granular level of a compilation process. If you'retrying to compile 20 partitions at the same time, get more memory :).

The biggest number of SSA_NAMEs I saw was actually 472,225. Of these,357,032 were non-pointers, so could conceivably have range information.In reality, 77,398 had range information, so 16% of all pointer andnon-pointer SSA_NAMEs actually have range information.

Now here's the interesting bit... in analyzing those 77,398 ranges, Inoticed that a great many of them were useless. They consisted of therange for the entire domain even with no NON_ZERO_BITS set. The newimplementation will obviously not have this limitation :-). Taking thisinto account, I see a total of 52,718 non-useless ranges out of 472,225.That's about 11%.

In the compilation of this large partition, G.stats.total_allocated is2,192,677,688, so about 2 gigs of memory. Of which the totalnon-useless range info is 1,686,976 for vanilla GCC7. This looks likeless than 2 megs. I calculated the range info memory consumption with(sizeof (range_info_def) + trailing_wide_ints <3>::extra_size (precision)).

Assuming a pathological case of 200 extra bytes in each of the 52,718non-useless ranges, we'd have 10 megabytes of extra data. And that's ina compilation that already consumes 2 gigs of memory. Isn't that likeless than half of a percent? We could even account for that extrapointer in SSA_NAME and add less than 4 megs to that number (472,225 * 8bytes).

Again, do not take this as an excuse for the careless memory layout ofmy patch (after all, it was meant as a proof of concept). I willre-arrange things and optimize things a bit. However, I also don'tthink that adding a word or two here is going to make a difference,since even at the pathological case, we're only increasing memory byless than a percent.

This begs the question, is firefox with LTO representative of a largeenough sample? Am I wrong in analyzing SSA_NAME usage at the end ofVRP1? Are there programs out there with 10 million SSA_NAMEs, for whicha larger range info would be a non-trivial increase of the compilationprocess's memory?

I will post a less horrible version, but in the meantime I wanted toshow some numbers.


Thanks to everyone for their input.
Aldy

p.s. I did measure the number of entries in the internal VR_VALUE tablefor the above case. It had 103,577 non-empty entries. Again, thisdoesn't look like a significant increase in memory, even for thisinternal VRP representation.

Re: SSA range class and removal of VR_ANTI_RANGEs

Reply via email to