Steve Reinhardt wrote:
> On Tue, Aug 25, 2009 at 11:03 PM, Gabe Black<gbl...@eecs.umich.edu> wrote:
>   
>> The problem is that you need the displacement/immediate to actually do
>> the cache look up since those are part of the ExtMachInst and are
>> factored into a match. Those could be ignored for a preliminary lookup,
>> read in if there's a match, and then considered for the second look up,
>> but that sounds less efficient than just doing it like it's done now.
>> There could be a more direct simplification of the logic in there, a way
>> to reduce the number of function calls, etc. that would be easier. The
>> code is in arch/x86/predecoder.cc if you want to take a look.
>>     
>
> I'm confused.  My thought was that, for the purposes of caching, the
> ExtMachInst would contain the raw instruction bytes, including the
> displacement/immediate, plus the byte count, plus whatever decode
> context is necessary.  So the cache lookup for PC X would be:
>   

That's basically what it contains right now, except that it doesn't have
the byte count.

> 1. Get StaticInst for PC X (if any).
> 2. Read the StaticInst's ExtMachInst to learn that the original
> instruction it represents occupied N bytes, and what those bytes were.
> 3. Compare the N bytes in memory at PC X with the N bytes stored in
> the ExtMachInst.
>   

That mostly sounds reasonable in isolation, but either it will need to be
an #ifdef'd mechanism just for x86 (I don't like those at all) or all the
other ISAs will have to go through the motions with little to no
benefit. Step 3 would also need to compare the extra contextualizing
state, not just the bytes. I do like the idea of reading in all the bytes
at once, though; it would simplify the logic in the predecoder. On a miss
we would still need a version that figures out which bytes are needed,
so we'd end up with two copies of that logic, one simplified and one
regular.
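Steve's three steps, with the extra context comparison added to step 3,
could look roughly like the C++ sketch below. DecodeContext, CachedInst,
DecodeCache and all their fields are hypothetical stand-ins invented for
illustration, not M5's actual ExtMachInst or decode cache. A nullptr
result (a miss) would still have to fall back to the full predecoder to
learn how many bytes the instruction needs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical decode context: state other than the bytes themselves that
// changes how the same bytes decode (names invented for illustration).
struct DecodeContext {
    uint8_t mode;         // e.g. 64-bit vs. compatibility mode
    uint8_t defOpSize;    // default operand size, in bytes
    uint8_t defAddrSize;  // default address size, in bytes

    bool operator==(const DecodeContext &o) const {
        return mode == o.mode && defOpSize == o.defOpSize &&
               defAddrSize == o.defAddrSize;
    }
};

// Hypothetical cache entry: the raw bytes the StaticInst was decoded from,
// how many there were, and the context they were decoded under.
struct CachedInst {
    std::array<uint8_t, 15> bytes{};  // x86 instructions are at most 15 bytes
    uint8_t numBytes = 0;
    DecodeContext context{};
    int staticInstId = -1;  // stand-in for a pointer to the decoded StaticInst
};

class DecodeCache {
  public:
    void insert(uint64_t pc, const CachedInst &ci) {
        assert(static_cast<std::size_t>(ci.numBytes) <= ci.bytes.size());
        entries[pc] = ci;
    }

    // Returns the entry for pc only if it still matches memory and context.
    // mem points at the bytes now in memory at pc; avail is how many are
    // readable.
    const CachedInst *lookup(uint64_t pc, const uint8_t *mem, std::size_t avail,
                             const DecodeContext &ctx) const {
        auto it = entries.find(pc);              // 1. StaticInst for PC X?
        if (it == entries.end())
            return nullptr;
        const CachedInst &ci = it->second;       // 2. it occupied N bytes
        if (static_cast<std::size_t>(ci.numBytes) > avail || !(ci.context == ctx))
            return nullptr;                      //    (plus the context check)
        if (std::memcmp(mem, ci.bytes.data(), ci.numBytes) != 0)
            return nullptr;                      // 3. same N bytes in memory?
        return &ci;
    }

  private:
    std::unordered_map<uint64_t, CachedInst> entries;
};
```

The hit path is just a hash lookup plus a short memcmp, which is the
appeal of reading all the bytes at once. The cost is that the miss path
still needs the full length-finding logic.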

> This will require some changes in the decode cache, since the
> ExtMachInst isn't fixed-size anymore (unless you've changed that
> already), and probably some bigger changes in the decode structure
> since the ExtMachInst isn't in this "predecoded" format anymore.
> Maybe the predecoded thing is a separate structure that's only
> temporary.  Actually creating the ExtMachInst on a decode cache miss
> will of course still require some decoding to learn how many bytes
> need to go into it.
>   

It could still be fixed size like it is now and just have empty spots if
the instruction is smaller. If there were no "predecoded" format, though,
the changes to the decode structure would be significant. Really, it
isn't predecoded in the sense that any decoding is done; the different
regions of the instruction are just identified and separated. Most of
the fields aren't always present, the rules that decide whether they are
present are fairly complex, and they aren't at fixed positions within
the instruction. Decoding with bitfields breaks down completely if you
can't identify where the value you want to decode on actually is. A
number of fields do double duty as part of the opcode or the operand
specifiers, so you can't always figure out the extra parts after you've
identified an instruction. We would have to do this level of decoding
somewhere before entering the actual decoder, and the decoder function
generated from the ISA description doesn't allow for that.
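To illustrate why the fields can't be pulled out with fixed bitfields,
here is a deliberately tiny C++ sketch of that "identify and separate"
step, filling a fixed-size record with empty slots as described above.
It assumes 64-bit mode and only understands a few one-byte opcodes
(ADD/MOV with ModRM, the 0x81/0x83 group-1 immediates, MOV reg, imm, NOP
and RET). Predecoded and predecode() are hypothetical names, not the
interface in arch/x86/predecoder.cc.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size predecoded record. Every field has a slot; the
// flags and sizes say which slots this particular instruction fills.
struct Predecoded {
    uint8_t numPrefixes = 0;
    bool opSizePrefix = false;                 // saw a 0x66 prefix
    bool hasRex = false;    uint8_t rex = 0;
    uint8_t opcode = 0;
    bool hasModRM = false;  uint8_t modRM = 0;
    bool hasSib = false;    uint8_t sib = 0;
    uint8_t dispSize = 0;   int64_t disp = 0;  // 0, 1 or 4 bytes; sign-extended
    uint8_t immSize = 0;    uint64_t imm = 0;  // 0, 1, 2, 4 or 8 bytes
    uint8_t length = 0;                        // total bytes consumed
};

// Operand/address size, LOCK, REP/REPNE and segment-override prefixes.
static bool isLegacyPrefix(uint8_t x) {
    return x == 0x66 || x == 0x67 || x == 0xF0 || x == 0xF2 || x == 0xF3 ||
           x == 0x26 || x == 0x2E || x == 0x36 || x == 0x3E || x == 0x64 ||
           x == 0x65;
}

// Reads `size` bytes starting at p as a little-endian unsigned value.
static uint64_t readLE(const uint8_t *p, unsigned size) {
    uint64_t v = 0;
    for (unsigned k = 0; k < size; ++k)
        v |= static_cast<uint64_t>(p[k]) << (8 * k);
    return v;
}

// Splits the instruction starting at b[0] (n bytes readable) into fields.
// Assumes 64-bit mode and knows only a toy subset of one-byte opcodes;
// returns false for anything else or if the bytes run out.
bool predecode(const uint8_t *b, std::size_t n, Predecoded &out) {
    out = Predecoded{};
    std::size_t i = 0;
    // Are k more bytes available? (x86 instructions are at most 15 bytes.)
    auto have = [&](std::size_t k) { return i + k <= n && i + k <= 15; };

    for (; have(1) && isLegacyPrefix(b[i]); ++i, ++out.numPrefixes)
        if (b[i] == 0x66) out.opSizePrefix = true;
    if (have(1) && (b[i] & 0xF0) == 0x40) {   // REX (64-bit mode), just before opcode
        out.hasRex = true;
        out.rex = b[i++];
    }
    if (!have(1)) return false;
    out.opcode = b[i++];
    const bool rexW = out.hasRex && (out.rex & 0x08);

    // Which fields follow depends on the opcode and on the prefixes seen.
    unsigned immSize = 0;
    switch (out.opcode) {
      case 0x01: case 0x03: case 0x89: case 0x8B:  // ADD/MOV, r/m forms
        out.hasModRM = true;
        break;
      case 0x83:                                   // group 1 r/m, imm8
        out.hasModRM = true;
        immSize = 1;
        break;
      case 0x81:                                   // group 1 r/m, imm16/imm32
        out.hasModRM = true;
        immSize = (!rexW && out.opSizePrefix) ? 2 : 4;
        break;
      case 0x90: case 0xC3:                        // NOP, RET
        break;
      default:
        if (out.opcode < 0xB8 || out.opcode > 0xBF)
            return false;                          // outside the toy subset
        immSize = rexW ? 8 : (out.opSizePrefix ? 2 : 4);  // MOV reg, imm
    }

    if (out.hasModRM) {
        if (!have(1)) return false;
        out.modRM = b[i++];
        const unsigned mod = out.modRM >> 6, rm = out.modRM & 7;
        if (mod != 3 && rm == 4) {                 // a SIB byte follows
            if (!have(1)) return false;
            out.hasSib = true;
            out.sib = b[i++];
        }
        if (mod == 1)
            out.dispSize = 1;
        else if (mod == 2)
            out.dispSize = 4;
        else if (mod == 0 && (rm == 5 || (out.hasSib && (out.sib & 7) == 5)))
            out.dispSize = 4;                      // RIP-relative, or SIB with no base
    }
    if (out.dispSize) {
        if (!have(out.dispSize)) return false;
        const uint64_t raw = readLE(b + i, out.dispSize);
        out.disp = out.dispSize == 1 ? static_cast<int8_t>(raw)
                                     : static_cast<int32_t>(raw);
        i += out.dispSize;
    }
    if (immSize) {
        if (!have(immSize)) return false;
        out.immSize = static_cast<uint8_t>(immSize);
        out.imm = readLE(b + i, immSize);
        i += immSize;
    }
    out.length = static_cast<uint8_t>(i);
    return true;
}
```

Even in this subset, whether a displacement exists depends on ModRM.mod,
ModRM.rm and SIB.base. The immediate's width depends on the opcode plus
the REX.W and 0x66 prefixes. So the field boundaries only become known
while walking the bytes, not from fixed bit positions.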

I don't mean to suggest this can't work with enough effort, but there
will be significant challenges that may make it not worthwhile. Decoding
x86 efficiently certainly deserves its reputation for being hard.

Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
