Dear MARSS developers

I recently found some timing issues in MARSS. As far as I can tell, none of
them affects functional correctness.

1) Memory requests in CPUController wake up inconsistently.
Memory requests sometimes wake up after a single cycle and at other times
after 2 cycles.
CPUController's clocking logic intends the wakeup time of a memory request
to depend only on operational latency (wakeup delay, cache access latency),
but it also depends on where the request is placed in CPUController's
pending request queue.

CPUController iterates over its pending requests and decrements the
remaining cycle count of each request.
This logic seems reasonable, but it misses a corner case in which a woken-up
entry produces a new request.
Say request A wakes up at cycle 10 and produces a new request B, which is
placed after request A in CPUController's pending request queue.
Request B is scheduled to wake up after 2 cycles, so its remaining cycle
count is set to 2.
Because it is located after request A in the queue, CPUController visits it
in the same pass and decrements its remaining cycles to 1 at cycle 10.
As a result, request B is woken up in the next cycle, cycle 11, when its
remaining cycles reach 0.
If it were located before request A, it would wake up at cycle 12.

To fix this problem, I changed the clock function to skip requests that were
created in the current cycle. See my cpuController.{cpp, h}.patch.
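
The ordering effect can be reproduced in a small standalone model. The
sketch below is not MARSS code: ToyController, PendingRequest and all field
names are hypothetical, and the only thing it assumes about the real
controller is the behavior described above (one pass over the queue per
cycle, decrementing each entry). It shows request B waking at cycle 11
without the skip and at cycle 12 with it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an entry in the pending request queue.
struct PendingRequest {
    int cycles_left;          // remaining latency in cycles
    uint64_t created_cycle;   // cycle at which the entry was enqueued
    uint64_t woke_at;         // recorded for the demonstration only
    bool done;
};

// Toy controller: one clock() call per simulated cycle.
struct ToyController {
    std::vector<PendingRequest> queue;
    uint64_t now = 0;
    bool skip_new_entries;    // true models the proposed fix

    explicit ToyController(bool fix) : skip_new_entries(fix) {}

    void enqueue(int latency) {
        queue.push_back(PendingRequest{latency, now, 0, false});
    }

    void clock() {
        // Index-based walk: entries appended during the pass are visited
        // too, which is what exposes the dependence on queue position.
        for (std::size_t i = 0; i < queue.size(); ++i) {
            if (queue[i].done) continue;
            if (skip_new_entries && queue[i].created_cycle == now) continue;
            if (--queue[i].cycles_left == 0) {
                queue[i].done = true;
                queue[i].woke_at = now;
                if (i == 0) enqueue(2);  // request A spawns request B
            }
        }
        ++now;
    }
};

// Returns the cycle at which request B wakes up when A wakes at cycle 10.
uint64_t wake_cycle_of_b(bool fix) {
    ToyController c(fix);
    c.now = 10;
    c.queue.push_back(PendingRequest{1, 9, 0, false});  // A: due at cycle 10
    while (c.queue.size() < 2 || !c.queue[1].done) c.clock();
    return c.queue[1].woke_at;
}
```

Calling wake_cycle_of_b(false) models the current behavior, and
wake_cycle_of_b(true) models the behavior with the skip in place.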

2) A thread's context switch makes the other threads flush their TLBs.
I found an instruction that had already hit in the ITLB and accessed the
I-cache, yet took an ITLB miss again after a different thread (one the
instruction does not belong to) finished a context switch. The current TLB
flushing logic makes all threads flush their TLBs. This does not hurt the
functionality of MARSS, but it adds a lot of timing overhead that a real
microarchitecture would not incur.

I changed the flush_tlb function to check which thread the request belongs
to. See my patch to OooCore::flush_tlb (ooo.cpp.patch).
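
As a rough illustration of the idea, here is a minimal sketch of a
per-thread flush. ToyCore, ToyThread and the set-based TLBs are hypothetical
and far simpler than the real OooCore; the only point is that the flush
filters on the thread that switched context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical per-thread state; real thread contexts hold far more.
struct ToyThread {
    int id;
    std::set<uint64_t> itlb;  // cached virtual page numbers
    std::set<uint64_t> dtlb;
};

struct ToyCore {
    std::vector<ToyThread> threads;

    // Flush only the TLBs of the thread whose context changed.
    void flush_tlb(int thread_id) {
        for (ToyThread& t : threads) {
            if (t.id != thread_id) continue;  // keep other threads' entries
            t.itlb.clear();
            t.dtlb.clear();
        }
    }
};

// Counts the ITLB entries of thread 1 that survive a flush for thread 0.
std::size_t surviving_itlb_entries_of_thread1() {
    ToyCore core;
    core.threads.push_back(ToyThread{0, {0x10, 0x11}, {0x20}});
    core.threads.push_back(ToyThread{1, {0x30, 0x31}, {0x40}});
    core.flush_tlb(0);
    return core.threads[1].itlb.size();
}
```

With the filter in place, thread 1 keeps both of its ITLB entries after
thread 0 switches context.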

3) Stores waiting for the value to store wake up earlier than the uops that
produce that value.
The O3 model issues stores in two phases, address calculation and memory
access. So a store first issues when operands ra and rb are ready. At the
same time, the rc value is checked with the PhysicalRegister::ready()
function. If rc is ready, the store issue succeeds. However, the ready
function does not check whether the value is available in the O3 timing
model; it checks whether the value has been calculated by PTLSim.

I changed the line at ooo-exec.cpp:562 as follows:
     bool rcready = rc.state != PHYSREG_WAITING;
     completed = issuestore(*lsq, origvirt, radata, rbdata, rcdata,
rcready, pteupdate);
This fix is not complete, as it does not take clustered architectures into
account. To model a clustered architecture, which has several issue
queues, you need to know when the rc value arrives at the issue queue where
the store is waiting to issue.
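
The difference between the two checks can be sketched in isolation. The
types below are hypothetical stand-ins, not the real PhysicalRegister:
value_computed models a value that the functional side has already produced,
and state models where the producer is in the timing model. Only the name
PHYSREG_WAITING comes from the change above; TOY_WAITING mirrors it.

```cpp
#include <cassert>

// Hypothetical timing states; TOY_WAITING mirrors PHYSREG_WAITING.
enum ToyRegState { TOY_WAITING, TOY_WRITTEN };

// Hypothetical stand-in for a physical register.
struct ToyPhysReg {
    ToyRegState state;     // where the producer is in the timing model
    bool value_computed;   // functional side has already produced the value

    // Models a functional-only check, as ready() is described above.
    bool functionally_ready() const { return value_computed; }
    // Models the patched check: the producer is no longer waiting.
    bool timing_ready() const { return state != TOY_WAITING; }
};

// A store may complete only when its data operand rc is available.
bool store_may_complete(const ToyPhysReg& rc, bool use_timing_check) {
    return use_timing_check ? rc.timing_ready() : rc.functionally_ready();
}

// rc whose value exists functionally but whose producer is still waiting.
ToyPhysReg early_rc() { return ToyPhysReg{TOY_WAITING, true}; }
```

For early_rc(), the functional check lets the store complete while the
timing check holds it back, which is the early wakeup described above.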

4) A load/store that has already hit in the TLB sometimes gets a TLB miss.
I think this is not a timing bug, but I want to propose it as an O3 pipeline
optimization. Memory operations are sometimes issued several times due to
memory disambiguation and unavailable caches. During the repeated issues,
a memory instruction can suffer a TLB miss for an address that it has
already calculated. The implementations of issueload and issuestore
always check the TLB, even if the instruction already has its address,
and the TLB entry the instruction hit at a previous issue is sometimes
evicted by other accesses, so the instruction takes an additional TLB
miss.
This type of TLB miss is avoidable if the address is latched in the
instruction's LSQ entry. I patched the pipeline to cache the calculated
addresses in the LSQ. See the patches ooo.h.patch, ooo-exec.cpp.patch, and
ooo-pipe.cpp.patch.
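
The latching idea can be sketched as follows. This is an illustration under
stated assumptions, not the contents of the patches: ToyTlb, ToyLsqEntry,
the issue function and the 4 KB page size are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical TLB: maps virtual page number to physical page number.
struct ToyTlb {
    std::unordered_map<uint64_t, uint64_t> entries;
    int misses = 0;

    bool lookup(uint64_t vpn, uint64_t& ppn) {
        auto it = entries.find(vpn);
        if (it == entries.end()) { ++misses; return false; }
        ppn = it->second;
        return true;
    }
};

// Hypothetical LSQ entry that latches the translation from its first issue.
struct ToyLsqEntry {
    uint64_t virtaddr = 0;
    uint64_t physaddr = 0;
    bool addr_latched = false;
};

// Translate on first issue; a replayed issue reuses the latched address.
bool issue(ToyLsqEntry& e, ToyTlb& tlb) {
    if (e.addr_latched) return true;  // replay: no TLB access needed
    uint64_t ppn = 0;
    if (!tlb.lookup(e.virtaddr >> 12, ppn)) return false;  // genuine miss
    e.physaddr = (ppn << 12) | (e.virtaddr & 0xfff);  // assumes 4 KB pages
    e.addr_latched = true;
    return true;
}

// First issue hits, the TLB entry is evicted, then the access is replayed.
int misses_after_replay() {
    ToyTlb tlb;
    tlb.entries[0x1234] = 0x42;
    ToyLsqEntry e;
    e.virtaddr = 0x1234abc;
    issue(e, tlb);              // hit, address latched
    tlb.entries.erase(0x1234);  // another access evicts the entry
    issue(e, tlb);              // replayed issue
    return tlb.misses;
}
```

Because the replayed issue never consults the TLB, the eviction between the
two issues does not cost an extra miss in this model.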


Thanks,
Hanhwi

Attachments:
  cpuController.h.patch
  ooo-pipe.cpp.patch
  ooo.cpp.patch
  ooo.h.patch
  cpuController.cpp.patch
  ooo-exec.cpp.patch

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
