It looks like the right place to put "the code that checks for a fault and calls the CPU read/write function" would be BaseDynInst<Impl>::finishTranslation().
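To make the idea concrete, here is a rough, standalone sketch of moving the fault check into the translation callback. It is a toy model only, not M5 code: ToyTlb, ToyCpu, ToyDynInst, MissPolicy, initiateRead() and completeOneWalk() are all made-up names, and the "walk" is just a queued closure standing in for a page-table walker that finishes on a later cycle.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Toy stand-ins only; none of these are the real M5 classes or signatures.
enum class Fault { NoFault, TlbMiss };

struct Request {
    unsigned long vaddr = 0;
    unsigned long paddr = 0;
};

// Callback interface, loosely modelled on the translation->finish() idea.
struct Translation {
    virtual ~Translation() = default;
    virtual void finish(Fault fault, Request &req) = 0;
};

// Hypothetical TLB: a hit finishes immediately; a miss either reports a
// fault right away (software-managed) or queues a deferred walk.
struct ToyTlb {
    enum class MissPolicy { SoftwareFault, HardwareWalk };

    explicit ToyTlb(MissPolicy p) : policy(p) {}

    void insert(unsigned long vaddr) { entries.push_back(vaddr); }

    void translateTiming(Request &req, Translation *t) {
        for (unsigned long v : entries) {
            if (v == req.vaddr) {
                req.paddr = req.vaddr;           // identity mapping, toy only
                t->finish(Fault::NoFault, req);  // hit: callback runs now
                return;
            }
        }
        if (policy == MissPolicy::SoftwareFault) {
            t->finish(Fault::TlbMiss, req);      // miss reported as a fault
            return;
        }
        // Hardware walk: the callback runs later, when the walk completes.
        Request *r = &req;
        pendingWalks.push([r, t] {
            r->paddr = r->vaddr;
            t->finish(Fault::NoFault, *r);
        });
    }

    // Pretend one outstanding walk finished on some later cycle.
    bool completeOneWalk() {
        if (pendingWalks.empty()) return false;
        std::function<void()> walk = pendingWalks.front();
        pendingWalks.pop();
        walk();
        return true;
    }

    MissPolicy policy;
    std::vector<unsigned long> entries;
    std::queue<std::function<void()>> pendingWalks;
};

struct ToyCpu {
    int reads = 0;
    Fault read(const Request &) { ++reads; return Fault::NoFault; }
};

// Hypothetical dynamic instruction: initiateRead() only starts translation;
// the fault check and the CPU read live in the callback.
struct ToyDynInst : Translation {
    ToyDynInst(ToyCpu &c, ToyTlb &t, unsigned long vaddr) : cpu(c), tlb(t) {
        req.vaddr = vaddr;
    }

    void initiateRead() { tlb.translateTiming(req, this); }

    void finish(Fault f, Request &r) override {
        assert(!translationDone);  // the callback should fire exactly once
        translationDone = true;
        fault = f;
        if (fault == Fault::NoFault) {
            effAddrValid = true;
            fault = cpu.read(r);   // the read starts only after translation
        } else {
            executed = true;       // faulting instruction is marked executed
        }
    }

    ToyCpu &cpu;
    ToyTlb &tlb;
    Request req;
    Fault fault = Fault::NoFault;
    bool translationDone = false;
    bool effAddrValid = false;
    bool executed = false;
};
```

In this toy, a hit calls the read from inside initiateRead(), a software-managed miss marks the instruction executed with a fault, and a walked miss does nothing until completeOneWalk() is called. The instruction object has to stay alive until its walk finishes, since the queued closure keeps a pointer to it.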
All the code to make this work seems to be there already. If it hits in
the TLB, then TheISA::TLB::translateTiming() should call
translation->finish() right away. I checked Alpha, x86 and ARM and they
do. That would execute "the code" at the end of the
initiateTranslation()->translateTiming() call chain, which is effectively
the same as now, where "the code" is executed right after
initiateTranslation() returns.

In case of a TLB miss,

1) for Alpha (and other sw-handling archs), it would call
   translation->finish() with a fault, which can be handled in
   finishTranslation() the same way
2) for archs with a hw page-table walker,
   a) memory is timing: translation->finish() is called when the walk is
      finished. x86 seems to have the code for this in
      Walker::recvTiming(); ARM has the code and it's working with
      TimingSimpleCPU.
   b) memory is atomic (is it a possible combination? dyn_inst + atomic?):
      x86 doesn't seem to have code for this case, since
      Walker::recvAtomic() does nothing.

Sounds safe?

Thanks,
Min

On Tue, Jul 13, 2010 at 6:19 PM, Gabriel Michael Black
<gbl...@eecs.umich.edu> wrote:

> I think you've mostly interpreted this correctly. The instructions
> aren't retried if the translation fails, they just hang around and wait
> for it. The check if fault == NoFault will work if the translation is
> finished by the time initiateTranslation is done. That's true for
> everything we have now except x86 and ARM, neither of which is currently
> supported by O3. What might work to fix this is to move the code that
> checks for a fault and calls the CPU read/write function into the
> callback itself. That way once translation is done, whenever that may
> be, the correct action will happen.
>
> Gabe
>
> Quoting Min Kyu Jeong <mkje...@gmail.com>:
>
>> Thanks, Tim
>>
>> It looks like, for the DTLB translation, some code is there to handle
>> this, but it is not complete for the ISAs that do a hardware page
>> table walk.
>>
>> cpu/base_dyn_inst.hh
>> BaseDynInst<Impl>::read(Addr addr, T &data, unsigned flags)
>> {
>>     ...
>>     initiateTranslation(req, sreqLow, sreqHigh, NULL, BaseTLB::Read);
>>
>>     if (fault == NoFault) {
>>         effAddr = req->getVaddr();
>>         effAddrValid = true;
>>         fault = cpu->read(req, sreqLow, sreqHigh, data, lqIdx);
>>     } else {
>>         ...
>>         this->setExecuted();
>>     }
>>
>> It first initiates translation, and would call cpu->read() as long as
>> a fault has not been generated during the translation. This should
>> work for Alpha, where a TLB miss is treated as a fault and handled in
>> software by PALcode. The Alpha TLB returns a fault in case of a miss.
>>
>> For the ISAs that do a hardware page-table walk, the TLB-miss
>> instruction should neither start a read (cpu->read()) nor be taken out
>> of the instruction window (this->setExecuted()). I think it should
>> wait for the table walk to finish and retry the execution of the
>> load/store (this might not be true, depending on the implementation??)
>>
>> I looked into the x86 code, and if the memory is timing, then the
>> page-table walker would initiate a memory access and return without a
>> fault. That means cpu->read() would be called without the translation
>> having finished. It is the same case for ARM.
>>
>> Is there any plan or ongoing effort to support this wait-on-TLB-miss
>> on the other ISAs? Or ideas about how to go about implementing it?
>>
>> Thanks,
>>
>> Min
>>
>> On Mon, Jul 12, 2010 at 5:44 PM, Timothy M Jones
>> <tjon...@inf.ed.ac.uk> wrote:
>>
>>> Hi Min,
>>>
>>> The way that the TLB deals with a timing translation is specific to
>>> each ISA. I don't have much experience with anything other than Power
>>> but for that ISA, yes, you're correct. The timing translation is just
>>> a wrapper around the atomic translation. It seems from a quick check
>>> that Alpha is the same.
>>>
>>> If you actually wanted to have the fetch translation finish on a
>>> different cycle to the one it was initiated on then you would have to
>>> make some changes to the fetch stage to allow that. I wouldn't have
>>> thought it would be too difficult but it might require splitting up
>>> several functions into code that's executed before the translation
>>> and code that's executed afterwards.
>>>
>>> Cheers
>>> Tim
>>>
>>> On 12/07/2010 18:14, Min Kyu Jeong wrote:
>>>
>>>> Hi,
>>>>
>>>> This question is regarding the changeset
>>>> (http://repo.m5sim.org/m5?cmd=changeset;node=a123bd350935).
>>>>
>>>>     This initiates a timing translation and passes the read or
>>>>     write on to the processor before waiting for it to finish
>>>>
>>>> It looks like even in the event of a TLB miss, the TLB walk does not
>>>> delay the actual execution of the loads. Am I correct?
>>>>
>>>> I was trying to find a reference for replacing the translateAtomic()
>>>> in the fetch stage w/ translateTiming(). It would require some
>>>> mechanism to stop the actual fetch until the translation is
>>>> finished, which doesn't seem to exist in the O3 CPU even for the
>>>> data translation.
>>>>
>>>> Thanks,
>>>>
>>>> Min
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> m5-dev@m5sim.org
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>
>>> --
>>> Timothy M. Jones
>>> http://homepages.inf.ed.ac.uk/tjones1
>>>
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.