Re: [m5-dev] Condition code bits in X86 O3

Gabe Black Sun, 13 Feb 2011 21:09:23 -0800

Of course. I haven't really thought about it very much yet beyond what
was in my earlier email, but when I do I'll be sure to keep you (and
this list) in the loop.


Gabe

On 02/13/11 20:04, Steve Reinhardt wrote:
> Hi Gabe,
>
> I just got around to reading this... please fill me in with more
> design details as you work on this, as I'd like to keep on top of what
> you're doing and (perhaps) be in a position to offer some suggestions.
>
> Thanks,
>
> Steve
>
> On Fri, Feb 11, 2011 at 4:16 PM, Gabriel Michael Black
> <gbl...@eecs.umich.edu> wrote:
>
>     Hello again. I've had a chance to talk with an expert, and I have
>     an idea of how to approach this. It's going to require more
>     flexibility than the ISA parser has currently, though,
>     specifically in how the lists of source and destination registers
>     are managed. It would also be nice to have a more integrated idea
>     of composite operands, i.e. ones where some bits come from here,
>     some from there, and in the end it builds a single uint64_t,
>     double precision float, vector of uint32_ts, etc.
>
>     Rather than try to shoehorn this into a system that's already
>     suffered enough of my abuse, aka the ISA description language, I'm
>     going to attempt to build a parallel facility for defining
>     instructions usable from inside the python in "let" blocks.
>     Basically it would be python classes, functions, etc., (hopefully
>     not that many) exported into the let block context that would
>     allow more direct interaction with the parser's guts, and more
>     control over how things are put together.
>
>     In the future I'd like to see this bud into isa_parser2.py, but
>     that's going to be a lot of work and is a somewhat orthogonal
>     issue. Ideally this sort of thing will also make it easier to
>     split output into smaller files.
>
>     Gabe
>
>
>     Quoting Gabe Black <gbl...@eecs.umich.edu>:
>
>         I'm looking at why x86 goes so much slower than Alpha on O3 (4x
>         the ticks), and I think one culprit is the dependencies set up
>         by the condition code bits of the flags register. Many
>         instructions in x86 modify or depend on those bits, and even
>         though the condition codes are separated out from the flags
>         register (which does a lot of other stuff too), they're being
>         updated with a read-modify-write sort of mechanism. I expect
>         that's setting up long chains of serializing dependencies, which
>         is killing parallelism and performance.
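
A minimal sketch of why a read-modify-write update of one shared condition code register serializes otherwise independent microops. Everything here is an assumption made for illustration: the function name `critical_path`, the register names, unit latency per microop, and ideal renaming that leaves only true (read-after-write) dependencies. It is not M5's O3 model.

```python
def critical_path(ops):
    """Length of the longest read-after-write chain in a microop sequence.

    ops is a list of (sources, dests) pairs of register-name sets.
    Assumes unit latency and ideal renaming (hypothetical simplifications).
    """
    depth_of = {}  # register name -> depth of the microop that last wrote it
    longest = 0
    for sources, dests in ops:
        # A microop can issue one step after its latest producer.
        depth = 1 + max((depth_of.get(r, 0) for r in sources), default=0)
        for r in dests:
            depth_of[r] = depth
        longest = max(longest, depth)
    return longest


# Four independent microops that each update part of a shared "cc" register.
# Read-modify-write: "cc" is both a source and a destination.
rmw_ops = [({"a%d" % i, "cc"}, {"d%d" % i, "cc"}) for i in range(4)]

# Same microops if "cc" could be written without reading the old value.
overwrite_ops = [({"a%d" % i}, {"d%d" % i, "cc"}) for i in range(4)]

rmw_depth = critical_path(rmw_ops)
overwrite_depth = critical_path(overwrite_ops)
```

Under these assumptions the read-modify-write sequence forms a chain as long as the sequence itself, while the overwrite-only sequence has no chain at all.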
>
>         Basically, there are 6 condition codes in x86: Z, C, A, S, P,
>         and O, or zero, carry, auxiliary carry, sign, parity, and
>         overflow. In M5's implementation (and in the patent I patterned
>         it after) there are also artificial "emulation" zero and carry
>         flags that work like the regular ones but are maintained
>         separately. They can be updated independently and checked
>         separately, and are useful behind the scenes when implementing
>         some macroops. Instructions may update all of these flags or
>         only some of them. The PTLSim manual claims that there's a
>         "ZAPS" rule where the zero, auxiliary carry, parity and sign
>         bits are always updated together. That's usually true, but
>         certain instructions change only the zero flag. CMPXCHG8B is an
>         example.
>
>         What I'd been thinking of doing to handle this is to further
>         split up the condition code bits into separate registers to be
>         managed independently for any register renaming. There are a
>         couple of issues with that, though. First, it looks like there'd
>         have to be 6 different registers: APS, Z, O, C, EZ, and EC. A
>         non-trivial number of instructions would need to update 4 or
>         more of those, putting a perhaps unrealistic burden on any
>         rename mechanism. That would also make the simple CPUs slower
>         because they'd have to read/write all those extra registers.
>         Bread and butter x86 tends to be condition code happy, so that
>         could be a significant slowdown.
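
A rough sketch of the proposed split, showing how many renamed registers a microop would write for a given set of updated flags. The table `FLAG_TO_REG` and the helper `regs_written` are hypothetical names; the grouping (A, P and S together, Z on its own because some instructions change only the zero flag) follows the six registers listed above.

```python
# Hypothetical mapping from each flag to the register that would hold it.
FLAG_TO_REG = {
    "A": "APS", "P": "APS", "S": "APS",  # always updated together
    "Z": "Z",    # separate, since e.g. CMPXCHG8B changes only Z
    "O": "O",
    "C": "C",
    "EZ": "EZ",  # emulation zero flag
    "EC": "EC",  # emulation carry flag
}


def regs_written(flags):
    """Return the sorted list of registers written when `flags` are updated."""
    return sorted({FLAG_TO_REG[f] for f in flags})


# A microop that updates all six architectural flags.
all_arch = regs_written(["Z", "C", "A", "S", "P", "O"])

# A microop that updates only the zero flag.
zero_only = regs_written(["Z"])
```

With this grouping, a microop that sets all six architectural flags has four extra destinations to rename, which is the burden described above.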
>
>         Also, that complicates decoding significantly. Conceptually it's
>         easy to imagine reading/writing the registers with the bits you
>         need, but with the ISA parser, the code needs to either be there
>         or not be there. If you have code that's never used but accesses
>         a register, it'll still get pulled in as a source or dest. That
>         means there would need to be a hard coded version of every
>         microop that would correspond to each possible combination of
>         condition code registers. Since there are 6 registers, that's
>         2^6 combinations, times 2 variants for partial or complete
>         register writes, so 2^7 or 128 versions of every microop. There
>         are also register/immediate versions of many microops. We would
>         likely end up with thousands of microop classes. We'd also need
>         to generate selection functions that would pick which variant to
>         use. This is all possible, but fairly ugly and clunky.
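
The variant count above can be checked by enumeration. This is only a sketch of the arithmetic: `CC_REGS`, `cc_subsets`, and `variants` are hypothetical names, and nothing here reflects how the ISA parser actually generates classes.

```python
from itertools import combinations

# The six separately renamed condition code registers named in the text.
CC_REGS = ["APS", "Z", "O", "C", "EZ", "EC"]


def cc_subsets(regs):
    """Yield every subset of regs, from the empty set to the full set."""
    for size in range(len(regs) + 1):
        for subset in combinations(regs, size):
            yield subset


# One hard coded variant per (register subset, partial or complete write).
variants = [
    (subset, partial_write)
    for subset in cc_subsets(CC_REGS)
    for partial_write in (False, True)
]

per_microop = len(variants)

# Microops that exist in both register and immediate forms double again.
with_reg_and_imm = per_microop * 2
```

Here `per_microop` is 2^6 * 2 = 128, and a microop with both register and immediate forms would need 256 classes, so even a few dozen microops would reach the thousands.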
>
>         So does anybody have any suggestions on how to unserialize these
>         microops? I found a paper here:
>
>         http://www.wseas.us/e-library/conferences/2006elounda1/papers/537-325.pdf
>
>         that claims IPC for x86 CPUs is significantly worse than other
>         ISAs specifically because of this sort of thing. Is this just a
>         fact of life with x86? Would fixing it be not only very annoying
>         but also unrealistic? Is that paper's claim actually true?
>
>         Gabe
>         _______________________________________________
>         m5-dev mailing list
>         m5-dev@m5sim.org
>         http://m5sim.org/mailman/listinfo/m5-dev
