> On Nov. 6, 2015, 10:14 a.m., Stephan Diestelhorst wrote:
> > src/cpu/trace/trace_cpu.cc, line 168
> > <http://reviews.gem5.org/r/3029/diff/2/?file=51455#file51455line168>
> >
> >     space around /

Done.


> On Nov. 6, 2015, 10:14 a.m., Stephan Diestelhorst wrote:
> > src/cpu/trace/trace_cpu.cc, line 1074
> > <http://reviews.gem5.org/r/3029/diff/2/?file=51455#file51455line1074>
> >
> >     I think
> >     
> >     trace.read(&currElement)
> >     
> >     would make it more obvious that currElement is an output operand here.
> >     (Another option would be currElement = trace.read(), but then checking
> >     for failure needs extra handling.)

Done.
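
For readers of the archive, the pattern discussed above can be sketched
roughly as follows. This is only a minimal illustration, not the code in the
patch: TraceElement, TraceReader and countElements are hypothetical stand-ins,
and the only point is that passing a pointer marks currElement as an output
operand while the bool return value reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical trace record; the element type in the patch differs.
struct TraceElement {
    uint64_t seqNum;
    uint64_t addr;
};

// Hypothetical in-memory stand-in for the trace input stream.
class TraceReader {
  public:
    explicit TraceReader(std::vector<TraceElement> elems)
        : elems_(std::move(elems)) {}

    // Fills *element with the next record and returns true, or returns
    // false at end of trace and leaves *element untouched.
    bool read(TraceElement *element) {
        assert(element != nullptr);
        if (next_ >= elems_.size())
            return false;
        *element = elems_[next_++];
        return true;
    }

  private:
    std::vector<TraceElement> elems_;
    std::size_t next_ = 0;
};

// Counts the remaining records by looping until read() reports failure.
std::size_t countElements(TraceReader &trace) {
    TraceElement currElement{};
    std::size_t n = 0;
    while (trace.read(&currElement))
        ++n;
    return n;
}
```

With the alternative form, currElement = trace.read(), the end-of-trace case
would need a sentinel value or a separate status query, which is the extra
handling mentioned in the review comment.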


- Radhika


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3029/#review7521
-----------------------------------------------------------


On Nov. 19, 2015, 5:22 p.m., Curtis Dunham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3029/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2015, 5:22 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> This patch defines a TraceCPU that replays traces generated using the elastic
> trace probe attached to the O3 CPU model. The elastic trace is an execution
> trace annotated with data dependencies and ordering dependencies. The TraceCPU
> also replays the fixed-timestamp instruction fetch trace that is generated by
> the same elastic trace probe.
> 
> The TraceCPU inherits from BaseCPU, so some methods need to be defined. It
> has two port subclasses inherited from MasterPort for the instruction and
> data ports. It issues memory requests, deducing the timing from the trace,
> without performing real execution of micro-ops. As soon as the last
> dependency for an instruction is complete, its computational delay, also
> provided in the input trace, is added. The dependency-free nodes are
> maintained in a list, called 'ReadyList', ordered by ready time.
> Instructions that depend on a load stall until the responses for the read
> requests are received, thus achieving elastic replay. If a dependency is
> not found when adding a new node, it is assumed to be complete. Thus, if the
> node is found to be completely dependency-free, its issue time is calculated
> and it is added to the ready list immediately. This is encapsulated in the
> subclass ElasticDataGen.
> 
> If ready nodes are issued in an unconstrained way, there can be more nodes
> outstanding, which results in a divergence in timing compared to the O3CPU.
> Therefore, the Trace CPU also models hardware resources. A sub-class to model
> hardware resources is added, which contains the maximum sizes of the load
> buffer, store buffer and ROB. If resources are not available, the node is not
> issued. The 'depFreeQueue' structure holds nodes that are pending issue.
> 
> The ROB size is arguably the most important of the resource limits modeled in
> the Trace CPU. The ROB occupancy is estimated using the newly added field
> 'robNum'. We need to use the ROB number because the sequence number is at
> times much higher due to squashing, and trace replay is focused on
> correct-path modeling.
> 
> A map called 'inFlightNodes' is added to track not only nodes that are in
> the readyList but also load nodes that have executed (and thus been removed
> from the readyList) but are not complete. The readyList determines which node
> to execute next and when, while inFlightNodes is used for resource modelling.
> The oldest ROB number is updated when any node occupies the ROB or when an
> entry in the ROB is released. The ROB occupancy is equal to the difference
> between the ROB number of the newly dependency-free node and the oldest ROB
> number in flight.
> 
> If no node depends on a non-load/store node, then there is no reason to track
> it in the dependency graph. We filter out such nodes but count them and add a
> weight field to the subsequent node that we do include in the trace. The
> weight field is used to model ROB occupancy during replay.
> 
> The depFreeQueue is chosen to be FIFO so that child nodes which are in
> program order get pushed into it in that order and are thus issued in program
> order, as in the O3CPU. This is also why the dependents container is changed
> from std::set to a sequential container, std::vector. We only check the head
> of the depFreeQueue, as nodes are issued in order and blocking on the head
> models that better than looping over the entire queue. An alternative choice
> would be to inspect the top N pending nodes, where N is the issue width. This
> is left for future work as the timing correlation looks good as it is.
> 
> At the start of an execution event, we first attempt to issue the pending
> nodes in the depFreeQueue by checking if appropriate resources have become
> available. If so, we compute the execute tick with respect to the current
> time. Then we proceed to complete nodes from the readyList.
> 
> When a read response is received, sometimes a dependency on it that was
> supposed to be released when it was issued is still not released. This occurs
> because the dependent gets added to the graph after the read was sent. So the
> check is made less strict and the dependency is marked complete on read
> response instead of insisting that it should have been removed on read sent.
> 
> There is a check for requests spanning two cache lines, as this condition
> triggers an assert failure in the L1 cache. If a request does span two lines,
> its size is truncated to access only up to the end of the first line and the
> remainder is ignored. Strictly-ordered requests are skipped and the
> dependencies on such requests are handled by simply marking them complete
> immediately.
> 
> The simulated seconds can be calculated as the difference between the
> final_tick stat and the tickOffset stat. A CountedExitEvent that contains
> a static int belonging to the Trace CPU class as a down counter is used to
> implement simulation exit when multiple Trace CPUs are used.
> 
> 
> Diffs
> -----
> 
>   src/cpu/trace/trace_cpu.hh PRE-CREATION 
>   src/cpu/trace/trace_cpu.cc PRE-CREATION 
>   src/cpu/trace/SConscript PRE-CREATION 
>   src/cpu/trace/TraceCPU.py PRE-CREATION 
> 
> Diff: http://reviews.gem5.org/r/3029/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Curtis Dunham
> 
>
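
For reference, the cache-line check described in the quoted description can be
sketched as below. This is only an illustration under stated assumptions:
truncateToLine and its parameters are hypothetical names that do not come from
the patch, and the line size is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the request size clipped so that the access
// [addr, addr + size) does not cross the end of the cache line holding addr.
// Any bytes beyond the end of that line are simply dropped.
uint64_t truncateToLine(uint64_t addr, uint64_t size, uint64_t lineSize) {
    // Assumption: lineSize is a non-zero power of two.
    assert(lineSize != 0 && (lineSize & (lineSize - 1)) == 0);
    const uint64_t offset = addr & (lineSize - 1);  // offset within the line
    const uint64_t bytesLeft = lineSize - offset;   // bytes to end of line
    return size > bytesLeft ? bytesLeft : size;
}
```

For example, with 64-byte lines an 8-byte request at address 0x3C would be
clipped to the 4 bytes that remain in its line, while the same request at
0x38 fits and is left unchanged.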

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev