Steve Reinhardt wrote:
> On Tue, Aug 25, 2009 at 11:03 PM, Gabe Black<gbl...@eecs.umich.edu> wrote:
>
>> The problem is that you need the displacement/immediate to actually do
>> the cache look up since those are part of the ExtMachInst and are
>> factored into a match. Those could be ignored for a preliminary lookup,
>> read in if there's a match, and then considered for the second look up,
>> but that sounds less efficient than just doing it like it's done now.
>> There could be a more direct simplification of the logic in there, a way
>> to reduce the number of function calls, etc. that would be easier. The
>> code is in arch/x86/predecoder.cc if you want to take a look.
>>
>
> I'm confused. My thought was that, for the purposes of caching, the
> ExtMachInst would contain the raw instruction bytes, including the
> displacement/immediate, plus the byte count, plus whatever decode
> context is necessary. So the cache lookup for PC X would be:
>
That's basically what it contains right now, except that it doesn't
have the byte count.

> 1. Get StaticInst for PC X (if any).
> 2. Read the StaticInst's ExtMachInst to learn that the original
> instruction it represents occupied N bytes, and what those bytes were.
> 3. Compare the N bytes in memory at PC X with the N bytes stored in
> the ExtMachInst.
>

That mostly sounds reasonable in isolation, but either it's going to
need to be an #ifdef'ed mechanism just for x86 (I don't like those at
all), or all the other ISAs will have to go through the motions with
little to no benefit. Step 3 will also need to compare the extra
contextualizing state. I do like the idea of reading in all the bytes at
once, though. It would simplify the logic in the predecoder in that
case. There would still need to be a version that figures out which
bytes are needed if we miss, though, so we'd end up with two copies of
it, one simplified and one regular.

> This will require some changes in the decode cache, since the
> ExtMachInst isn't fixed-size anymore (unless you've changed that
> already), and probably some bigger changes in the decode structure
> since the ExtMachInst isn't in this "predecoded" format anymore.
> Maybe the predecoded thing is a separate structure that's only
> temporary. Actually creating the ExtMachInst on a decode cache miss
> will of course still require some decoding to learn how many bytes
> need to go into it.
>

It could still be fixed size like it is now and just have empty spots if
the instruction is smaller. The changes in the decode structure would be
significant, assuming there isn't a "predecoded" format. Really, it's
not "predecoded" in the sense that any decoding is done, just that
different regions of the instruction are identified and separated. Most
of the fields aren't always there, the rules deciding whether they are
present are fairly complex, they aren't at fixed positions within the
instruction, etc.
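
To make the three-step lookup concrete, here is a minimal sketch of how
it might look with a fixed-size entry that leaves unused byte slots
empty. Everything named here (CachedInst, DecodeCache, makeEntry, lookup,
the single-byte context field, the integer staticInstId) is hypothetical
and only for illustration; none of it is the actual m5 ExtMachInst or
decode cache code. The only outside fact it relies on is the 15-byte
architectural limit on x86 instruction length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical sketch; these are not the real m5 types.
constexpr std::size_t MaxInstBytes = 15;  // x86 instruction length limit

struct CachedInst {
    std::array<std::uint8_t, MaxInstBytes> bytes{};  // unused tail stays zero
    std::uint8_t numBytes = 0;  // how many entries of bytes are meaningful
    std::uint8_t context = 0;   // stand-in for the extra contextualizing state
    int staticInstId = -1;      // stand-in for a StaticInst pointer
};

using DecodeCache = std::unordered_map<std::uint64_t, CachedInst>;

// Build an entry after a miss, once the slow path has worked out that the
// instruction occupies n bytes.
inline CachedInst
makeEntry(const std::uint8_t *raw, std::size_t n, std::uint8_t context, int id)
{
    assert(n > 0 && n <= MaxInstBytes);
    CachedInst e;
    std::memcpy(e.bytes.data(), raw, n);
    e.numBytes = static_cast<std::uint8_t>(n);
    e.context = context;
    e.staticInstId = id;
    return e;
}

// Steps 1 to 3 from the quoted list, plus the context comparison.
// Returns the entry on a hit and nullptr on a miss.
inline const CachedInst *
lookup(const DecodeCache &cache, std::uint64_t pc,
       const std::uint8_t *mem, std::size_t avail, std::uint8_t context)
{
    // Step 1: is there an entry for this PC at all?
    auto it = cache.find(pc);
    if (it == cache.end())
        return nullptr;

    // Step 2: the entry says how many bytes the original instruction used.
    const CachedInst &e = it->second;
    if (e.numBytes > avail || e.context != context)
        return nullptr;

    // Step 3: compare those bytes with what is in memory at the PC now.
    if (std::memcmp(e.bytes.data(), mem, e.numBytes) != 0)
        return nullptr;
    return &e;
}
```

A miss here would still have to fall back to the regular predecoder to
find out how many bytes belong to the instruction before makeEntry could
be called, which is the "two copies" problem mentioned above.
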
Decoding with bitfields breaks down totally if you can't identify where
the value you want to decode with actually is. There are a number of
fields that do double duty as part of the opcode or as operand
specifiers, so it's not always the case that you can figure out the
extra parts after you've identified an instruction. We would have to do
this level of decoding somewhere before we went into the actual decoder.
The decoder function as generated by the ISA description doesn't allow
for that. I don't want to sound like this won't work with enough effort,
but there will be significant challenges that may make it not
worthwhile. x86 certainly deserves its reputation for being hard to
decode efficiently.

Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev