Re: [m5-dev] O3CPU + translateTiming

2010-07-22 Thread Gabe Black
The dynamic instruction object is really just the dynamic information
associated with an instruction, as opposed to the static instruction
object that gets reused. Strictly speaking there's no guarantee that an
arbitrary dynamic instruction will always use timing mode, but all the
CPU models we have that are complicated enough to need dynamic
instructions use timing mode exclusively as far as I know. O3's dynamic
instruction object for sure uses only timing mode, so yes, in that case,
the data parameter is just so the function signatures are the same in
the read case. initiateAcc is how memory instructions access memory in
timing mode, so again, in this case and in most practical cases a
dynamic instruction object's read function would be called from
initiateAcc and not execute. If you're only worried about O3 (i.e. it's
in the o3 directory) you don't need to keep track of data. If it's the
base dynamic instruction object it's a little less clear since
-practically- speaking it will probably only be used with timing mode,
but there isn't any reason I can think of that has to be true in all
cases. I think the base dynamic instruction object may actually not be
very far separated from O3 so it may already assume timing mode in other
places.
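
To illustrate the point about the data parameter, here is a minimal sketch under simplified assumptions. AtomicContext, TimingContext, Packet, loadExecute, loadInitiateAcc and loadCompleteAcc are hypothetical stand-ins invented for this example, not the real M5 classes. It only shows why a timing-mode read() can accept a data argument purely to keep the signature uniform, with the value arriving later in completeAcc.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Addr = uint64_t;

// Hypothetical flat memory shared by both contexts.
using Memory = std::map<Addr, uint32_t>;

// Stand-in for a response packet carrying the loaded value.
struct Packet {
    uint32_t value;
    uint32_t get() const { return value; }
};

// Atomic-style context: read() completes immediately and fills 'data'.
struct AtomicContext {
    explicit AtomicContext(Memory &m) : mem(m) {}
    void read(Addr addr, uint32_t &data) { data = mem[addr]; }
    Memory &mem;
};

// Timing-style context: read() only starts the access. 'data' is accepted
// so the signature matches AtomicContext::read(), but it is never written.
struct TimingContext {
    explicit TimingContext(Memory &m) : mem(m), pending(0), hasPending(false) {}
    void read(Addr addr, uint32_t & /* data: placeholder only */)
    {
        pending = addr;
        hasPending = true;
    }
    // Later, the memory system delivers the value in a packet.
    Packet respond()
    {
        assert(hasPending);
        hasPending = false;
        return Packet{mem[pending]};
    }
    Memory &mem;
    Addr pending;
    bool hasPending;
};

// Single-step path, as an atomic CPU model would use it.
template <class XC>
uint32_t loadExecute(XC &xc, Addr addr)
{
    uint32_t data = 0;
    xc.read(addr, data);
    return data;
}

// Split path: initiateAcc starts the access...
template <class XC>
void loadInitiateAcc(XC &xc, Addr addr)
{
    uint32_t unused = 0;   // only here to satisfy the shared signature
    xc.read(addr, unused);
}

// ...and completeAcc takes the value from the packet instead.
inline uint32_t loadCompleteAcc(const Packet &pkt)
{
    return pkt.get();
}
```

In this sketch, loadExecute() on an AtomicContext and the loadInitiateAcc()/loadCompleteAcc() pair on a TimingContext return the same value; only the second path ignores the data argument.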

Please someone speak up if I'm wildly mischaracterizing how this is
supposed to work. I've worked a lot with O3's innards, but I think all
the design work was done before I was associated with M5.

Gabe


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] O3CPU + translateTiming

2010-07-22 Thread Steve Reinhardt
I think you stated it pretty accurately, Gabe.  It looks like the
base_dyn_inst.hh file is only used in o3, ozone, and checker, and of
those only o3 is really being used right now.

Steve



Re: [m5-dev] O3CPU + translateTiming

2010-07-21 Thread Min Kyu Jeong
So base_dyn_inst is always used with timing-mode memory. I assumed so, but just
wanted to confirm, to make sure that in the read function,

BaseDynInst<Impl>::read(Addr addr, T data, unsigned flags)

the 'data' argument isn't really doing anything other than being a placeholder
so the function signature matches the xc interface. Is this correct?

BaseDynInst::read() will be called only in initiateAcc(), not in execute()
function. When the initiateAcc()/completeAcc() pair is used, pkt->get() is
used in completeAcc() to read the data.

I am moving some code from read() to finishTranslation() and it looks like
'data' variable doesn't need to be passed.

Thanks,

Min

On Thu, Jul 15, 2010 at 10:01 AM, Steve Reinhardt ste...@gmail.com wrote:

 On Wed, Jul 14, 2010 at 8:35 AM, Min Kyu Jeong mkje...@gmail.com wrote:
   b) memory is atomic (is it a possible combination? dyn_inst + atomic?) -
  x86 doesn't seem to have code for this case - Walker::recvAtomic() does
  nothing.

 I don't believe that O3+atomic mode works.  Practically speaking it
 doesn't make any sense.

 Steve


Re: [m5-dev] O3CPU + translateTiming

2010-07-14 Thread Min Kyu Jeong
It looks like the right place to put the code that checks for a fault and
calls the CPU read/write function would be
BaseDynInst<Impl>::finishTranslation().

All the code to make this work seems to be there already. If it hits in the
TLB, then TheISA::TLB::translateTiming() should call translation->finish()
right away. I checked Alpha, x86 and ARM, and they do. The code would then
execute at the end of the initiateTranslation() -> translateTiming() call
chain, which is effectively the same as now, where the code is executed right
after initiateTranslation() returns.

In case of a TLB miss,
1) for Alpha (and other architectures that handle misses in software), it
would call translation->finish() with a fault, which can be handled in
finishTranslation() the same way

2) for archs that do a hardware page-table walk,

 a) memory is timing, then translation->finish() is called when the walk is
finished. x86 seems to have the code for this in Walker::recvTiming(); ARM has
the code and it's working with TimingSimpleCPU.

 b) memory is atomic (is it a possible combination? dyn_inst + atomic?) -
x86 doesn't seem to have code for this case - Walker::recvAtomic() does
nothing.
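
The cases above can be sketched roughly as follows. This is a toy model under stated assumptions: SketchTlb, FinishFn, setPageTableEntry and completeWalks are hypothetical names made up for the example, and completeWalks() merely stands in for whatever the walker does when its memory responses arrive. The only point is that finish() is reached in every case, either inside translateTiming() or later.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using Addr = uint64_t;
enum class Fault { None, TlbMiss, PageFault };

// Callback invoked once the translation outcome is known.
using FinishFn = std::function<void(Fault, Addr)>;

class SketchTlb {
  public:
    explicit SketchTlb(bool hw) : hwWalker(hw) {}

    void insert(Addr vpn, Addr ppn) { entries[vpn] = ppn; }
    void setPageTableEntry(Addr vpn, Addr ppn) { pageTable[vpn] = ppn; }

    void translateTiming(Addr vaddr, FinishFn finish)
    {
        assert(static_cast<bool>(finish));
        const Addr vpn = vaddr >> 12;
        auto it = entries.find(vpn);
        if (it != entries.end()) {
            // TLB hit: finish right away, inside this call.
            finish(Fault::None, (it->second << 12) | (vaddr & 0xfff));
        } else if (!hwWalker) {
            // Case 1: software-managed TLB, the miss is reported as a fault.
            finish(Fault::TlbMiss, 0);
        } else {
            // Case 2a: hardware walker with timing memory, finish() is
            // deferred until the walk completes.
            walks.push_back([this, vaddr, vpn, finish]() {
                auto pte = pageTable.find(vpn);
                if (pte == pageTable.end()) {
                    finish(Fault::PageFault, 0);
                    return;
                }
                entries[vpn] = pte->second;
                finish(Fault::None, (pte->second << 12) | (vaddr & 0xfff));
            });
        }
    }

    // Stand-in for the walker receiving its memory responses.
    void completeWalks()
    {
        std::vector<std::function<void()>> ready;
        ready.swap(walks);
        for (auto &walk : ready)
            walk();
    }

  private:
    bool hwWalker;
    std::map<Addr, Addr> entries;
    std::map<Addr, Addr> pageTable;
    std::vector<std::function<void()>> walks;
};
```

With hw set to false, a miss calls finish() immediately with a fault; with hw set to true, finish() is not called until completeWalks() runs, after which the same page hits in the TLB.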

Sounds safe?

Thanks,

Min



Re: [m5-dev] O3CPU + translateTiming

2010-07-13 Thread Min Kyu Jeong
Thanks, Tim

It looks like, for the DTLB translation, some code is there to handle this,
but it is not complete for the ISAs that do a hardware page-table walk.

cpu/base_dyn_inst.hh
BaseDynInst<Impl>::read(Addr addr, T data, unsigned flags)
{
    ...
    initiateTranslation(req, sreqLow, sreqHigh, NULL, BaseTLB::Read);

    if (fault == NoFault) {
        effAddr = req->getVaddr();
        effAddrValid = true;
        fault = cpu->read(req, sreqLow, sreqHigh, data, lqIdx);
    } else {
        ...
        this->setExecuted();
    }
    ...
}

It first initiates translation, and then calls cpu->read() as long as no
fault was generated during the translation. This should work for Alpha,
where a TLB miss is treated as a fault and handled in software by PALcode:
the Alpha TLB returns a fault in case of a miss.

For the ISAs that do a hardware page-table walk, the instruction that missed
in the TLB should neither start a read (cpu->read()) nor be taken out of the
instruction window (this->setExecuted()). I think it should wait for the table
walk to finish and then retry the execution of the load/store (this might not
be true, depending on the implementation).

I looked into the x86 code, and if the memory is timing, then the page-table
walker initiates a memory access and returns without a fault. This means
cpu->read() would be called before the translation has finished. The same is
true for ARM.

Is there any plan or ongoing effort to support this wait-on-TLB-miss on the
other ISAs? Or any ideas about how to go about implementing it?

Thanks,

Min



Re: [m5-dev] O3CPU + translateTiming

2010-07-13 Thread Gabriel Michael Black
I think you've mostly interpreted this correctly. The instructions
aren't retried if the translation fails, they just hang around and
wait for it. The check for fault == NoFault will work if the
translation is finished by the time initiateTranslation is done.
That's true for everything we have now except x86 and ARM, neither of
which is currently supported by O3. What might work to fix this is to
move the code that checks for a fault and calls the CPU read/write
function into the callback itself. That way once translation is done,
whenever that may be, the correct action will happen.
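
As a rough illustration of the difference between the two orderings, here is a small self-contained sketch. DeferredTlb, SketchLoad and their members are hypothetical names for this example only; issueRead() stands in for the call to the CPU read function, and the TLB is assumed to always complete later, as a hardware walk in timing mode would.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Addr = uint64_t;
enum class Fault { None, PageFault };

// Hypothetical TLB whose translations always complete later.
struct DeferredTlb {
    std::vector<std::function<void()>> pending;

    void translateTiming(Addr vaddr, std::function<void(Fault, Addr)> finish)
    {
        // Identity mapping, delivered only when completeAll() runs.
        pending.push_back([vaddr, finish]() { finish(Fault::None, vaddr); });
    }

    void completeAll()
    {
        std::vector<std::function<void()>> ready;
        ready.swap(pending);
        for (auto &cb : ready)
            cb();
    }
};

// Hypothetical load instruction; the counters stand in for the CPU read.
struct SketchLoad {
    Fault fault = Fault::None;
    bool translationDone = false;
    Addr effAddr = 0;
    int readsIssued = 0;
    int readsBeforeTranslation = 0;

    void issueRead()
    {
        ++readsIssued;
        if (!translationDone)
            ++readsBeforeTranslation;   // physical address not known yet
    }

    void recordTranslation(Fault f, Addr paddr)
    {
        assert(!translationDone);       // each load is translated once here
        fault = f;
        effAddr = paddr;
        translationDone = true;
    }

    // Existing ordering: check the fault as soon as translateTiming() returns.
    void readCheckInline(DeferredTlb &tlb, Addr vaddr)
    {
        tlb.translateTiming(vaddr, [this](Fault f, Addr p) {
            recordTranslation(f, p);
        });
        if (fault == Fault::None)
            issueRead();   // also fires while the translation is still pending
    }

    // Suggested ordering: the callback performs the check and the read.
    void readCheckInCallback(DeferredTlb &tlb, Addr vaddr)
    {
        tlb.translateTiming(vaddr, [this](Fault f, Addr p) {
            recordTranslation(f, p);
            if (fault == Fault::None)
                issueRead();
        });
    }
};
```

In this toy model readCheckInline() issues its read before the translation is recorded, while readCheckInCallback() issues nothing until completeAll() delivers the result.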


Gabe




Re: [m5-dev] O3CPU + translateTiming

2010-07-12 Thread Timothy M Jones

Hi Min,

The way that the TLB deals with a timing translation is specific to each 
ISA.  I don't have much experience with anything other than Power but 
for that ISA, yes, you're correct.  The timing translation is just a 
wrapper around the atomic translation.  It seems from a quick check that 
Alpha is the same.
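
A minimal sketch of what "a wrapper around the atomic translation" can look like, assuming a toy page-granular TLB. SimpleTlb, Translation and Recorder are hypothetical types written for this example and do not reproduce the Power or Alpha TLB code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Addr = uint64_t;
enum class Fault { None, TlbMiss };

// Callback interface: finish() is invoked once the outcome is known.
struct Translation {
    virtual ~Translation() {}
    virtual void finish(Fault fault, Addr paddr) = 0;
};

// Hypothetical TLB with 4 KiB pages.
class SimpleTlb {
  public:
    void insert(Addr vpn, Addr ppn) { entries[vpn] = ppn; }

    // Atomic translation: the result is available on return.
    Fault translateAtomic(Addr vaddr, Addr &paddr) const
    {
        auto it = entries.find(vaddr >> 12);
        if (it == entries.end())
            return Fault::TlbMiss;
        paddr = (it->second << 12) | (vaddr & 0xfff);
        return Fault::None;
    }

    // Timing translation as a thin wrapper: translate atomically, then
    // call finish() before returning, so no simulated time elapses.
    void translateTiming(Addr vaddr, Translation *translation) const
    {
        assert(translation != nullptr);
        Addr paddr = 0;
        const Fault fault = translateAtomic(vaddr, paddr);
        translation->finish(fault, paddr);
    }

  private:
    std::map<Addr, Addr> entries;
};

// Records what finish() was called with.
struct Recorder : Translation {
    bool done = false;
    Fault fault = Fault::None;
    Addr paddr = 0;
    void finish(Fault f, Addr p) override
    {
        done = true;
        fault = f;
        paddr = p;
    }
};
```

Because finish() always runs before translateTiming() returns, a caller that inspects the result immediately afterwards sees a completed translation, for hits and misses alike.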


If you actually wanted to have the fetch translation finish on a 
different cycle to the one it was initiated on then you would have to 
make some changes to the fetch stage to allow that.  I wouldn't have 
thought it would be too difficult but might require splitting up several 
functions into code that's executed before the translation and code 
that's executed afterwards.
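
One possible shape for such a split, shown only as a sketch: SketchFetch, FetchStatus, startFetch and the Translator callable are all hypothetical and are not taken from the O3 fetch stage. The first half starts the translation and stalls the stage; the second half runs from the callback, whenever that happens.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using Addr = uint64_t;

enum class FetchStatus { Running, TranslationWait };

// Hypothetical fetch stage. 'Translator' is any callable that eventually
// invokes the supplied callback with the physical address.
class SketchFetch {
  public:
    using Done = std::function<void(Addr)>;
    using Translator = std::function<void(Addr, Done)>;

    explicit SketchFetch(Translator t) : translate(t) {}

    // First half: everything that can happen before the address is known.
    void startFetch(Addr pc)
    {
        if (status != FetchStatus::Running)
            return;   // stalled: an earlier translation is outstanding
        status = FetchStatus::TranslationWait;
        translate(pc, [this](Addr paddr) { finishTranslation(paddr); });
    }

    // Second half: runs from the callback, possibly on a later cycle.
    void finishTranslation(Addr paddr)
    {
        assert(status == FetchStatus::TranslationWait);
        lastFetchedPaddr = paddr;   // stand-in for issuing the I-cache access
        ++linesFetched;
        status = FetchStatus::Running;
    }

    FetchStatus status = FetchStatus::Running;
    Addr lastFetchedPaddr = 0;
    int linesFetched = 0;

  private:
    Translator translate;
};
```

The status is set to TranslationWait before the translator is called, so the sketch behaves the same whether the callback fires immediately or on a later cycle.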


Cheers
Tim

On 12/07/2010 18:14, Min Kyu Jeong wrote:

Hi,

This question is regarding the changeset
(http://repo.m5sim.org/m5?cmd=changeset;node=a123bd350935).

   This initiates a timing translation and passes the read or write on
   to the processor before waiting for it to finish


It looks like even in the event of TLB miss, TLB-walk does not delay the
actual execution of the loads. Am I correct?

I was trying to find a reference for replacing the translateAtomic() in
the fetch stage w/ translateTiming(). It would require some mechanism to
stop the actual fetch until the translation is finished - which doesn't
seem to exist in the O3 CPU even for the data translation.

Thanks,

Min





--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
