Hi guys,

I am using MARSSx86 for some investigations on cache design and noticed
some strange behavior in the simulator's handling of stores.

Stores are fed into the memory hierarchy during the commit stage, and the
corresponding ROB and LSQ entries are deallocated immediately afterwards.
In some situations this causes the pendingRequest list in the
cpuController to be flooded with store requests.
This does not seem right from an architectural point of view: every
memory request needs to be tracked by some kind of hardware structure until
its data is finally merged into the cache.
In MARSSx86 that structure would be the ROB+STQ entry. On a cache miss,
finalizing the request can take a long time, and new requests from the
issueQ have to be re-issued when the LD/ST queues are full.
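
To make that expectation concrete, here is a minimal sketch of the STQ
entry lifecycle I would expect (not actual MARSS code; the types and the
onStoreComplete callback are hypothetical names):

#include <cstddef>
#include <cstdint>

// Hypothetical STQ model: an entry stays allocated from the moment the
// store is handed to the memory hierarchy until the hierarchy signals
// completion, so the request is always tracked by a hardware structure.
struct StoreQueueEntry {
    uint64_t addr;
    bool     valid;      // entry is allocated
    bool     in_flight;  // request sent, completion not yet signalled
};

struct StoreQueue {
    static const size_t STQ_SIZE = 32;  // assumed size, for illustration
    StoreQueueEntry entries[STQ_SIZE];

    // Called at commit: hand the store to the hierarchy, but keep the
    // entry allocated so the in-flight request remains tracked.
    void commitStore(size_t idx) {
        entries[idx].in_flight = true;  // note: 'valid' is NOT cleared
    }

    // Called by the wakeup signal from the hierarchy (after a miss this
    // may arrive many cycles later); only now can the entry be reused.
    void onStoreComplete(size_t idx) {
        entries[idx].in_flight = false;
        entries[idx].valid = false;     // deallocate after completion
    }
};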

I was curious about the purpose and functionality of the pendingRequest
list, so I did some debugging by dumping the pendingRequest list of the
cpuController from time to time.
For that I increased its size to 512 entries (and also enlarged the
pendingRequest lists of the cacheControllers), and in some cases the list
fills up with store requests.

I would expect this list to hold at most STQ_SIZE+LDQ_SIZE (plus some
entries for icache requests?) entries.
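
One could check that expectation with a debug assertion along these lines
(just a sketch; the member names and the icache allowance are my
assumptions, not actual MARSS identifiers):

#include <cassert>

// Hypothetical sanity check on pendingRequest occupancy.
static const int ICACHE_SLACK = 4;  // assumed allowance for icache requests
assert(pendingRequests_.count() <= LDQ_SIZE + STQ_SIZE + ICACHE_SLACK
       && "pendingRequest list exceeds what the LSQ can track in flight");

With the current code this check would fire almost immediately under
store-heavy workloads.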

Because the ROB/STQ entry is deallocated and the stores_in_flight counter
is decremented right after the store request is sent, a subsequent store
can allocate the same ROB/STQ entry and issue a new request while the
first store is still in flight due to a miss.

Request{Memory Request: core[0] thread[0] address[0x00011db42bc0] robid[*109*] init-cycle[276140] ref-counter[4] op-type[memory_op_write] isData[1] ownerUUID[288412] ownerRIP[0x4ca4b2] History[ {+core_0_cont} {+L1_D_0} {+L2_0} {+MEM_0} ] Signal[ ooo_0_0-dcache-wakeup] } idx[145] cycles[-428] depends[176] waitFor[-1] annuled[0]

Request{Memory Request: core[0] thread[0] address[0x00011db42cf0] robid[*109*] init-cycle[276185] ref-counter[1] op-type[memory_op_write] isData[1] ownerUUID[288540] ownerRIP[0x4ca4ab] History[ {+core_0_cont} ] Signal[ ooo_0_0-dcache-wakeup] } idx[225] cycles[-383] depends[226] waitFor[242] annuled[0]

(I added the robid here for debugging purposes; in the original sources
the robid is always zero for store requests.)

I observed situations where over 460 store requests were present in the
pendingRequest list of the cpuController (size = 512).
This happens if, for example, a memset function is called to zero a bunch
of cache lines inside a loop.
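
For reference, guest code along these lines is enough to trigger it (an
illustrative example, not taken from the actual workload):

#include <cstring>

// Each iteration zeroes one 64-byte cache line, producing a long burst
// of committed stores, many of which miss in the cache.
enum { LINE_SIZE = 64, NUM_LINES = 1024 };
static char buf[NUM_LINES * LINE_SIZE];

void zero_lines(void) {
    for (int i = 0; i < NUM_LINES; i++)
        memset(buf + i * LINE_SIZE, 0, LINE_SIZE);
}

Every committed store is pushed into the hierarchy and its ROB/STQ entry
is freed at once, so nothing throttles the burst.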

What do you think about this?
I think it would be a valid scenario if only the ROB entry were
deallocated after the store request, but in that case the STQ entry would
need to stay valid until the store request is finalized. I am not sure
whether that is possible, as the ROB and LSQ are closely coupled in MARSS,
if I interpreted the code correctly.
For now I worked around the issue by limiting the allocation of new
pending requests during the call to MemoryHierarchy::is_cache_available():
I track the number of pending loads and stores inside the cpuController
class and only allow an allocation if the store count does not exceed
STQ_SIZE and the load count does not exceed LDQ_SIZE. When the function
returns false, a load operation is re-issued and a store may not commit at
that point.
I am not sure, though, whether that is a good solution.
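
The change looks roughly like this (a simplified sketch with my own field
names and LSQ sizes; the real signature of is_cache_available() differs):

// Sketch of the workaround: cpuController counts requests it has
// accepted but not yet finalized, and refuses new ones once the
// notional LSQ capacity is exhausted.
struct CPUController {
    static const int STQ_SIZE = 32;   // assumed sizes, for illustration
    static const int LDQ_SIZE = 48;
    int pending_store_count_;  // ++ when a store request is accepted,
    int pending_load_count_;   // -- when its completion signal arrives

    bool is_cache_available(bool is_store) const {
        if (is_store)
            return pending_store_count_ < STQ_SIZE;  // else stall commit
        return pending_load_count_ < LDQ_SIZE;       // else re-issue load
    }
};

This bounds the pendingRequest occupancy by LDQ_SIZE+STQ_SIZE, but it only
approximates the tracking that the STQ itself should be doing.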

Regards,
Stefan