On Tue, 29 May 2012, Steve Reinhardt wrote:

> Hi folks,
>
> Thinking more about Yasuko's problem where performance is going down
> because of increased physical register pressure due to the split CC
> register, it occurs to me that in the end we'll almost certainly need to
> specialize the renaming mechanism to deal with condition codes anyway.

I think you have not spelled out why you would need to specialize the renaming mechanism. If register file pressure is the only concern, you can always increase the register file size to take care of it. A crude way would be to figure out the expected number of renamed CC registers over all the instructions in the pipeline and increase the register file size by that number. On average, this should mean that none of the physical registers in the original register file are consumed by the split CC registers.
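A minimal sketch of that crude sizing rule, assuming each in-flight micro-op independently writes the split CC register with some fixed probability. The function name and parameters are hypothetical, not part of the O3 model. Because it sizes for the average, a burst of flag-writing micro-ops can still exceed the extra capacity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: expected number of physical registers held by
// renamed CC destinations, assuming each of inFlight micro-ops writes the
// split CC register with probability ccWriteFraction (an assumption).
// Rounded up so the extra capacity covers the average case.
std::size_t extraCCPhysRegs(std::size_t inFlight, double ccWriteFraction) {
    assert(ccWriteFraction >= 0.0 && ccWriteFraction <= 1.0);
    return static_cast<std::size_t>(std::ceil(inFlight * ccWriteFraction));
}
```

For example, with 192 micro-ops in flight and 40% of them writing flags (both numbers made up for illustration), the rule adds 77 registers.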

> That is, no rational x86 implementation would take up an entire physical
> GPR just to hold a few bits that probably aren't going to be read anyway.

Steve, you are in a very good position to know what a rational x86 implementation does.


> The more "conventional" solution is described here:
> http://www.ptlsim.org/Documentation/html/node7.html#SECTION02540000000000000000
>
> Basically every physical GPR includes a set of (possibly invalid) CC bits.
> The various CC sub-fields have to be renamed separately, as the producer
> of a particular CC field may not be the latest producer of any particular
> GPR value, but the CC subfield rename registers will always point to a
> physical register file entry.
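To make this scheme concrete, here is a rough C++ sketch (not gem5 or PTLsim code): each physical register entry carries CC bits, and a separate rename table maps each CC sub-field to a physical register. The three-way ZAPS/CF/OF split, the class and method names, and the register counts are all assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CC sub-fields, each renamed independently (assumed split).
enum CCField : std::size_t { CC_ZAPS = 0, CC_CF = 1, CC_OF = 2, NumCCFields = 3 };

// One physical register file entry: a GPR value plus (possibly invalid) CC bits.
struct PhysReg {
    uint64_t value = 0;
    uint8_t ccBits = 0;    // flags computed by the producing micro-op
    bool ccValid = false;  // false if the producer wrote no flags
};

class Renamer {
  public:
    static constexpr std::size_t NumArchRegs = 16;
    static constexpr std::size_t NumPhysRegs = 64;

    Renamer() : prf(NumPhysRegs) {
        // Arch reg i starts in phys reg i; initial flags assumed to live in phys reg 0.
        for (std::size_t i = 0; i < NumArchRegs; ++i) gprMap[i] = i;
        ccMap.fill(0);
    }

    // Rename a micro-op that writes archDest and the listed CC sub-fields.
    // Returns the new physical register (no free-list recycling in this sketch).
    std::size_t renameDest(std::size_t archDest, const std::vector<CCField>& ccWrites) {
        assert(nextFree < NumPhysRegs && "out of physical registers");
        std::size_t p = nextFree++;
        gprMap[archDest] = p;
        prf[p].ccValid = !ccWrites.empty();
        // Only the sub-fields this micro-op writes are redirected; the others
        // keep pointing at whichever older entry produced them.
        for (CCField f : ccWrites) ccMap[f] = p;
        return p;
    }

    std::size_t gprMapping(std::size_t arch) const { return gprMap[arch]; }
    std::size_t ccMapping(CCField f) const { return ccMap[f]; }

  private:
    std::array<std::size_t, NumArchRegs> gprMap{};  // arch GPR -> phys reg
    std::array<std::size_t, NumCCFields> ccMap{};   // CC sub-field -> phys reg
    std::vector<PhysReg> prf;                       // physical register file
    std::size_t nextFree = NumArchRegs;             // next unallocated entry
};
```

For instance, renaming an add that writes all flags, then a micro-op that writes only CF, then a mov to the add's destination leaves ZAPS and OF pointing at the add's entry even though that GPR now maps to the mov's entry: the CC producer is no longer the latest GPR producer, but every CC sub-field still names a valid physical register.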

> I realize this is not a trivial change to the O3 model, but I think it's
> one that needs to be made.  I'm bringing it up now because it may (probably
> will) impact how we handle the partial CC read/write issue that's still
> outstanding.  To me there's no sense in fixing the CC problem in the
> current renaming context, particularly since there are no easy solutions.
> We're probably better off revamping how CC renaming works to begin with,
> then try and optimize that solution to deal with partial writes.

Given my familiarity with the O3 CPU and the ISA parser, I am inclined to say that the ISA parser is much easier to work with.


> For example, in the CC renaming model I'm envisioning, having bitmasks of
> which CC subfields get read or written by a particular micro-op might be
> adequate, and it would be much easier to generate bitmasks like this on the
> fly than it would to do the dynamic munging of source register indices that
> we've been talking about.
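A sketch of the bitmask idea with hypothetical names (CCMasks, ccSources, ccRenameDest) and an assumed three-way sub-field split: each micro-op carries a read mask and a write mask, and renaming just walks the set bits. It also shows why partial reads remain the tricky case, since a micro-op reading several sub-fields may depend on more than one earlier producer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical CC sub-field bits (assumed three-way split).
constexpr uint8_t CCBit_ZAPS = 1u << 0;
constexpr uint8_t CCBit_CF   = 1u << 1;
constexpr uint8_t CCBit_OF   = 1u << 2;
constexpr std::size_t NumCCFields = 3;

// Per-micro-op flag metadata, cheap to generate when the micro-op is built.
struct CCMasks {
    uint8_t read = 0;   // sub-fields this micro-op consumes
    uint8_t write = 0;  // sub-fields this micro-op produces
};

using CCMap = std::array<std::size_t, NumCCFields>;  // CC sub-field -> phys reg

// Distinct physical registers a micro-op must wait on for its flag inputs.
// A partial-flags reader may depend on several different producers.
std::set<std::size_t> ccSources(const CCMasks& m, const CCMap& map) {
    std::set<std::size_t> srcs;
    for (std::size_t f = 0; f < NumCCFields; ++f)
        if (m.read & (1u << f)) srcs.insert(map[f]);
    return srcs;
}

// Point every sub-field in the write mask at the micro-op's new phys reg.
void ccRenameDest(const CCMasks& m, std::size_t newPhys, CCMap& map) {
    assert((m.write >> NumCCFields) == 0 && "unknown CC sub-field bit");
    for (std::size_t f = 0; f < NumCCFields; ++f)
        if (m.write & (1u << f)) map[f] = newPhys;
}
```

If one micro-op writes all three sub-fields (renamed to register 10) and a later one writes only CF (register 11), a reader of ZAPS and CF must wait on both 10 and 11, while a reader of OF alone waits only on 10.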

It is still not clear to me what this renaming process would achieve, or why you would need to model the renaming of CC registers with so much accuracy. Moreover, once we improve the handling of the source and destination registers, the pressure on the register file may go down significantly, in which case the effort spent on redoing the renaming of CC registers may not be worth it.

--
Nilay
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
