Re: [m5-dev] O3CPU + translateTiming
The dynamic instruction object is really just the dynamic information associated with an instruction, as opposed to the static instruction object that gets reused. Strictly speaking, there's no guarantee that an arbitrary dynamic instruction will always use timing mode. But as far as I know, all the CPU models we have that are complicated enough to need dynamic instructions use timing mode exclusively. O3's dynamic instruction object certainly uses only timing mode, so yes, in that case the data parameter is there just so the function signatures are the same in the read case. initiateAcc is how memory instructions access memory in timing mode. So again, in this case and in most practical cases, a dynamic instruction object's read function would be called from initiateAcc and not from execute.

If you're only worried about O3 (i.e., code in the o3 directory), you don't need to keep track of data. The base dynamic instruction object is a little less clear. -Practically- speaking it will probably only be used with timing mode, but I can't think of any reason that has to be true in all cases. I think the base dynamic instruction object may actually not be very far separated from O3, so it may already assume timing mode in other places.

Please someone speak up if I'm wildly mischaracterizing how this is supposed to work. I've worked a lot with O3's innards, but I think all the design work was done before I was associated with M5.

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
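The following is a minimal C++ sketch of the static/dynamic split described above. It assumes nothing about M5's real class layout, and `StaticInstSketch` and `DynInstSketch` are hypothetical names. A single immutable decoded instruction is shared by every in-flight instance. Each dynamic object holds only per-execution state, such as a sequence number and an effective address. The field names loosely echo `effAddr`/`effAddrValid` from the `read()` excerpt quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical decoded instruction: immutable, so a single object can back
// every execution of the same machine instruction.
struct StaticInstSketch {
    std::string mnemonic;
    bool isLoad;
};

// Hypothetical dynamic instruction: only the state that changes from one
// execution to the next lives here; the decoded form is shared.
struct DynInstSketch {
    std::shared_ptr<const StaticInstSketch> staticInst;  // shared, reused
    uint64_t seqNum;              // program-order sequence number
    uint64_t effAddr = 0;         // effective address, once computed
    bool effAddrValid = false;

    DynInstSketch(std::shared_ptr<const StaticInstSketch> si, uint64_t seq)
        : staticInst(std::move(si)), seqNum(seq) {}
};
```

Suppose the same load executes twice in a loop. That produces two `DynInstSketch` objects that differ in `seqNum` and `effAddr` but point at one `StaticInstSketch`.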
Re: [m5-dev] O3CPU + translateTiming
I think you stated it pretty accurately, Gabe. It looks like the base_dyn_inst.hh file is only used in o3, ozone, and checker, and of those only o3 is really being used right now.

Steve

On Thu, Jul 22, 2010 at 12:56 AM, Gabe Black gbl...@eecs.umich.edu wrote:
I think the base dynamic instruction object may actually not be very far separated from O3 so it may already assume timing mode in other places. Please someone speak up if I'm wildly mischaracterizing how this is supposed to work.
Re: [m5-dev] O3CPU + translateTiming
So base_dyn_inst always uses timing memory. I assumed so, but I just wanted to confirm it. Specifically, I want to make sure that the 'data' argument of the read function, BaseDynInst<Impl>::read(Addr addr, T &data, unsigned flags), isn't really doing anything except acting as a placeholder for function-signature matching in the xc interface. Is this correct?

BaseDynInst::read() will be called only in initiateAcc(), not in the execute() function. When the initiateAcc()/completeAcc() pair is used, pkt->get() is used in completeAcc() to read the data. I am moving some code from read() to finishTranslation(), and it looks like the 'data' variable doesn't need to be passed.

Thanks,
Min

On Thu, Jul 15, 2010 at 10:01 AM, Steve Reinhardt ste...@gmail.com wrote:
On Wed, Jul 14, 2010 at 8:35 AM, Min Kyu Jeong mkje...@gmail.com wrote:
b) memory is atomic (is it a possible combination? dyn_inst + atomic?) - x86 doesn't seem to have code for this case - Walker::recvAtomic() does nothing.

I don't believe that O3+atomic mode works. Practically speaking it doesn't make any sense.

Steve
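Min and Gabe agree on two points about timing mode. First, the `data` argument of `read()` only keeps the signatures uniform. Second, the loaded value is picked up from the response packet in `completeAcc()`. The sketch below illustrates this. Everything in it is hypothetical: `TimingMemSketch`, `PacketSketch`, `initiateAccSketch` and `completeAccSketch` are illustrative names. The `get<T>()` accessor only imitates the spirit of M5's `pkt->get()`, and no real M5 signatures are implied.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>

// Hypothetical response packet: in timing mode the loaded value travels here.
struct PacketSketch {
    uint64_t addr = 0;
    uint64_t payload = 0;
    template <class T>
    T get() const { return static_cast<T>(payload); }
};

// Hypothetical timing-mode memory: requests are queued and answered later.
struct TimingMemSketch {
    std::unordered_map<uint64_t, uint64_t> backing;
    std::deque<uint64_t> inFlight;   // addresses of outstanding reads

    // Timing-mode read: record the request and return at once. 'data' is
    // accepted only so the signature matches an atomic-style read; it is
    // never written here.
    template <class T>
    void read(uint64_t addr, T &data) {
        (void)data;
        inFlight.push_back(addr);
    }

    // Deliver the oldest outstanding response; false if none is pending.
    bool respond(PacketSketch &pkt) {
        if (inFlight.empty()) return false;
        pkt.addr = inFlight.front();
        inFlight.pop_front();
        auto it = backing.find(pkt.addr);
        pkt.payload = (it == backing.end()) ? 0 : it->second;
        return true;
    }
};

// initiateAcc-like half: issue the access; the placeholder is left untouched.
inline void initiateAccSketch(TimingMemSketch &mem, uint64_t addr,
                              uint32_t &placeholder) {
    mem.read(addr, placeholder);
}

// completeAcc-like half: the value comes from the packet, not from 'data'.
inline uint32_t completeAccSketch(const PacketSketch &pkt) {
    return pkt.get<uint32_t>();
}
```

In this model, dropping `data` from the timing path changes nothing observable, because the memory side never reads or writes `placeholder`.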
Re: [m5-dev] O3CPU + translateTiming
It looks like the right place to put the code that checks for a fault and calls the CPU read/write function would be BaseDynInst<Impl>::finishTranslation(). All the code to make this work seems to be there already.

If it hits in the TLB, then TheISA::TLB::translateTiming() should call translation->finish() right away. I checked Alpha, x86 and ARM, and they do. The code would then execute at the end of the initiateTranslation()->translateTiming() call chain. That is effectively the same as now, where the code is executed right after initiateTranslation() returns.

In case of a TLB miss:
1) For Alpha (and other archs that handle TLB misses in software), it would call translation->finish() with a fault, which can be handled in finishTranslation() the same way.
2) For archs that use a hardware page-table walker:
   a) If memory is timing, translation->finish() is called when the walk is finished. x86 seems to have the code for this in Walker::recvTiming(). ARM has the code, and it's working with TimingSimpleCPU.
   b) If memory is atomic (is that a possible combination? dyn_inst + atomic?), x86 doesn't seem to have code for this case: Walker::recvAtomic() does nothing.

Sounds safe?

Thanks,
Min

On Tue, Jul 13, 2010 at 6:19 PM, Gabriel Michael Black gbl...@eecs.umich.edu wrote:
What might work to fix this is to move the code that checks for a fault and calls the CPU read/write function into the callback itself. That way once translation is done, whenever that may be, the correct action will happen.
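The three outcomes enumerated above are a hit, a software-handled miss and a hardware walk with timing memory. A toy TLB and event queue can model them, as a sketch only. `TlbSketch`, `TranslationSketch`, `EventQueueSketch`, the 8 KiB page size and the fixed walk latency are all assumptions for illustration. None of them is M5's `TheISA::TLB` or its translation callback interface. What the sketch shows is when `finish()` runs relative to `translateTiming()` returning.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>

using Tick = uint64_t;

// Toy event queue: callbacks run in tick order (FIFO within a tick).
struct EventQueueSketch {
    Tick now = 0;
    std::multimap<Tick, std::function<void()>> events;

    void schedule(Tick when, std::function<void()> fn) {
        events.emplace(when, std::move(fn));
    }
    void runAll() {
        while (!events.empty()) {
            auto it = events.begin();
            now = it->first;
            auto fn = std::move(it->second);
            events.erase(it);
            fn();
        }
    }
};

enum class FaultSketch { NoFault, TlbMissFault };

// Hypothetical callback interface in the spirit of translation->finish().
struct TranslationSketch {
    virtual void finish(FaultSketch fault, uint64_t paddr) = 0;
    virtual ~TranslationSketch() = default;
};

// Hypothetical TLB; identity-maps addresses to keep the sketch short.
struct TlbSketch {
    EventQueueSketch &eq;
    bool hwWalker;      // false: misses are reported as faults (Alpha-like)
    Tick walkLatency;   // assumed fixed cost of a hardware walk
    std::set<uint64_t> mappedPages;

    TlbSketch(EventQueueSketch &q, bool hw, Tick lat)
        : eq(q), hwWalker(hw), walkLatency(lat) {}

    void translateTiming(uint64_t vaddr, TranslationSketch *t) {
        const uint64_t page = vaddr >> 13;  // assumed 8 KiB pages
        if (mappedPages.count(page)) {
            t->finish(FaultSketch::NoFault, vaddr);       // hit: finish now
        } else if (!hwWalker) {
            t->finish(FaultSketch::TlbMissFault, 0);      // SW miss: fault now
        } else {
            // HW walk with timing memory: finish once the walk completes.
            eq.schedule(eq.now + walkLatency, [this, page, vaddr, t] {
                mappedPages.insert(page);
                t->finish(FaultSketch::NoFault, vaddr);
            });
        }
    }
};

// Records when and how a translation finished.
struct RecordingTranslation : TranslationSketch {
    EventQueueSketch &eq;
    bool finished = false;
    FaultSketch fault = FaultSketch::NoFault;
    Tick finishTick = 0;

    explicit RecordingTranslation(EventQueueSketch &q) : eq(q) {}
    void finish(FaultSketch f, uint64_t) override {
        finished = true;
        fault = f;
        finishTick = eq.now;
    }
};
```

For a hit or a software-handled miss, `finish()` has already run when `translateTiming()` returns. In those cases, code placed in the callback behaves exactly like code placed right after `initiateTranslation()`. Only the hardware-walk case differs, which is the argument for moving the fault check into `finishTranslation()`.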
Re: [m5-dev] O3CPU + translateTiming
Thanks, Tim.

It looks like, for the DTLB translation, some code is there to handle this. However, it is not complete for the ISAs that do a hardware page-table walk. In cpu/base_dyn_inst.hh:

    BaseDynInst<Impl>::read(Addr addr, T &data, unsigned flags)
    {
        ...
        initiateTranslation(req, sreqLow, sreqHigh, NULL, BaseTLB::Read);

        if (fault == NoFault) {
            effAddr = req->getVaddr();
            effAddrValid = true;
            fault = cpu->read(req, sreqLow, sreqHigh, data, lqIdx);
        } else {
            ...
            this->setExecuted();
        }

It first initiates translation, then calls cpu->read() as long as no fault has been generated during the translation. This should work for Alpha, where a TLB miss is treated as a fault and handled in software by PALcode: the Alpha TLB returns a fault in case of a miss.

For the ISAs that do a hardware page-table walk, an instruction that misses in the TLB should neither start a read (cpu->read()) nor be taken out of the instruction window (this->setExecuted()). I think it should wait for the table walk to finish and then retry the execution of the load/store (this might not be true depending on the implementation?).

I looked into the x86 code. If memory is timing, the page-table walker initiates a memory access and returns without a fault. That means cpu->read() would be called before the translation has finished. The same is true for ARM.

Is there any plan or ongoing effort to support this wait-on-TLB-miss on the other ISAs? Or any ideas about how to go about implementing it?

Thanks,
Min

On Mon, Jul 12, 2010 at 5:44 PM, Timothy M Jones tjon...@inf.ed.ac.uk wrote:
If you actually wanted to have the fetch translation finish on a different cycle to the one it was initiated on then you would have to make some changes to the fetch stage to allow that.
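The following self-contained model reproduces the failure described above. Translation completes later, as with a hardware walk in timing mode. Meanwhile the load follows the quoted pattern of testing `fault` right after starting translation. `DeferredTranslator`, `NaiveLoad` and the fixed +0x100000 mapping are hypothetical, and `readIssued` merely stands in for `cpu->read()`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

enum class FaultKind { NoFault, PageFault };

// Hypothetical translator whose callbacks run later, like a hardware
// page-table walk in timing mode: translateTiming() returns first.
struct DeferredTranslator {
    std::vector<std::function<void()>> pending;

    void translateTiming(uint64_t vaddr,
                         std::function<void(FaultKind, uint64_t)> done) {
        // Assumed fake mapping: physical = virtual + 0x100000.
        pending.push_back([vaddr, done] { done(FaultKind::NoFault, vaddr + 0x100000); });
    }
    void completeAll() {
        auto work = std::move(pending);
        pending.clear();
        for (auto &fn : work) fn();
    }
};

// Hypothetical load mirroring the quoted read(): start translation, then
// immediately test 'fault' and issue the access.
struct NaiveLoad {
    FaultKind fault = FaultKind::NoFault;
    bool paddrValid = false;
    uint64_t paddr = 0;
    bool readIssued = false;
    bool readSawValidAddr = false;

    void read(DeferredTranslator &tlb, uint64_t vaddr) {
        tlb.translateTiming(vaddr, [this](FaultKind f, uint64_t pa) {
            fault = f;
            paddr = pa;
            paddrValid = true;
        });
        // 'fault' still has its initial NoFault value, so the access is
        // issued even though translation has not finished yet.
        if (fault == FaultKind::NoFault) {
            readIssued = true;               // stand-in for cpu->read(...)
            readSawValidAddr = paddrValid;   // false: this is the race
        }
    }
};
```

The read goes out with `paddrValid` still false. The translated address only appears once the deferred completion runs.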
Re: [m5-dev] O3CPU + translateTiming
I think you've mostly interpreted this correctly. The instructions aren't retried if the translation fails; they just hang around and wait for it. The 'if (fault == NoFault)' check will work if the translation is finished by the time initiateTranslation is done. That's true for everything we have now except x86 and ARM, neither of which is currently supported by O3.

What might work to fix this is to move the code that checks for a fault and calls the CPU read/write function into the callback itself. That way, once translation is done, whenever that may be, the correct action will happen.

Gabe
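The suggestion above can be sketched under a hypothetical model in which a deferred translator stands in for a timing-mode hardware walk. After starting translation, the load does nothing. The fault check and the read move into the completion callback, so the instruction simply waits. `CallbackLoad` is an illustrative name. Its `finishTranslation()` is named after the hook discussed in this thread but is not that code. The flags stand in for `cpu->read()` and `setExecuted()`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

enum class FaultKind { NoFault, PageFault };

// Hypothetical deferred translator; 'outcome' lets a caller force a fault.
struct DeferredTranslator {
    std::vector<std::function<void()>> pending;

    void translateTiming(uint64_t vaddr, FaultKind outcome,
                         std::function<void(FaultKind, uint64_t)> done) {
        pending.push_back([vaddr, outcome, done] {
            // Assumed fake mapping: physical = virtual + 0x100000.
            uint64_t pa = (outcome == FaultKind::NoFault) ? vaddr + 0x100000 : 0;
            done(outcome, pa);
        });
    }
    void completeAll() {
        auto work = std::move(pending);
        pending.clear();
        for (auto &fn : work) fn();
    }
};

// Hypothetical load with the fault check and the access moved into the
// translation callback.
struct CallbackLoad {
    FaultKind fault = FaultKind::NoFault;
    bool translationDone = false;
    bool readIssued = false;
    bool executed = false;
    uint64_t paddr = 0;

    void read(DeferredTranslator &tlb, uint64_t vaddr, FaultKind outcome) {
        // Start translation and return; nothing else happens until the
        // callback fires, whenever that is.
        tlb.translateTiming(vaddr, outcome, [this](FaultKind f, uint64_t pa) {
            finishTranslation(f, pa);
        });
    }

    void finishTranslation(FaultKind f, uint64_t pa) {
        translationDone = true;
        fault = f;
        if (fault == FaultKind::NoFault) {
            paddr = pa;
            readIssued = true;   // stand-in for cpu->read(...)
        } else {
            executed = true;     // stand-in for setExecuted(); fault handled later
        }
    }
};
```

Suppose the translator instead invoked the callback before returning, as on a TLB hit. Then `finishTranslation()` would run inside `read()`, and the behavior would match the old inline check.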
Re: [m5-dev] O3CPU + translateTiming
Hi Min,

The way that the TLB deals with a timing translation is specific to each ISA. I don't have much experience with anything other than Power, but for that ISA, yes, you're correct: the timing translation is just a wrapper around the atomic translation. From a quick check, Alpha seems to be the same.

If you actually wanted to have the fetch translation finish on a different cycle to the one it was initiated on, then you would have to make some changes to the fetch stage to allow that. I wouldn't have thought it would be too difficult. However, it might require splitting several functions into code that's executed before the translation and code that's executed afterwards.

Cheers
Tim

On 12/07/2010 18:14, Min Kyu Jeong wrote:
Hi,

This question is regarding the changeset (http://repo.m5sim.org/m5?cmd=changeset;node=a123bd350935): "This initiates a timing translation and passes the read or write on to the processor before waiting for it to finish."

It looks like, even in the event of a TLB miss, the TLB walk does not delay the actual execution of the loads. Am I correct?

I was trying to find a reference for replacing the translateAtomic() in the fetch stage with translateTiming(). It would require some mechanism to stop the actual fetch until the translation is finished. That mechanism doesn't seem to exist in the O3 CPU, even for the data translation.

Thanks,
Min

-- 
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
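For Power (and apparently Alpha), the timing translation is just a wrapper around the atomic one. That arrangement amounts to something like the sketch below. `WrapperTlbSketch`, `RecorderSketch`, the fixed offset mapping and the address limit are hypothetical. Only the shape reflects the message: an atomic lookup followed by an immediate callback.

```cpp
#include <cassert>
#include <cstdint>

enum class FaultSketch { NoFault, TlbMiss };

// Hypothetical callback interface in the spirit of translation->finish().
struct TranslationSketch {
    virtual void finish(FaultSketch fault, uint64_t paddr) = 0;
    virtual ~TranslationSketch() = default;
};

// Hypothetical TLB whose timing interface wraps its atomic interface.
class WrapperTlbSketch {
  public:
    explicit WrapperTlbSketch(uint64_t offset) : offset_(offset) {}

    // Atomic translation: the answer is available before returning.
    FaultSketch translateAtomic(uint64_t vaddr, uint64_t &paddr) const {
        if (vaddr >= kLimit) return FaultSketch::TlbMiss;  // assumed unmapped
        paddr = vaddr + offset_;
        return FaultSketch::NoFault;
    }

    // Timing translation as a thin wrapper: do the atomic lookup, then call
    // finish() before returning, so translation never spans cycles.
    void translateTiming(uint64_t vaddr, TranslationSketch *t) const {
        uint64_t paddr = 0;
        FaultSketch fault = translateAtomic(vaddr, paddr);
        t->finish(fault, paddr);
    }

  private:
    static constexpr uint64_t kLimit = 0x10000;
    uint64_t offset_;
};

// Records the result handed to finish().
struct RecorderSketch : TranslationSketch {
    bool finished = false;
    FaultSketch fault = FaultSketch::NoFault;
    uint64_t paddr = 0;
    void finish(FaultSketch f, uint64_t pa) override {
        finished = true;
        fault = f;
        paddr = pa;
    }
};
```

With such an ISA, `finish()` has always run by the time `translateTiming()` returns. So an inline `if (fault == NoFault)` check after `initiateTranslation()` happens to be safe. That is consistent with Gabe's remark that the problem only affects x86 and ARM.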