I'm experimenting with various O3 configurations combined with Ruby's 
MESI_Three_Level memory subsystem, and I've noticed that it's very difficult 
to get the core to consume more memory bandwidth. With typical/realistic 
O3/Ruby/memory parameters, a single core struggles to reach 3000 MB/s in 
STREAM. Even if I max out all the parameters of the O3 core, Ruby and the NoC, 
and provide plenty of memory bandwidth, STREAM only just reaches 6000 MB/s. I 
believe this should be much higher, and I've found one possible explanation 
for this behaviour.

The Ruby Sequencer receives memory requests from the core via the function 
Sequencer::insertRequest(PacketPtr pkt, RubyRequestType request_type). This 
function determines whether there are already outstanding requests to the same 
cache line and, if there are, returns without enqueuing the new request. This 
happens even for a load request when the only outstanding request to that 
cache line is another load.

RequestTable::value_type default_entry(line_addr, (SequencerRequest*) NULL);
pair<RequestTable::iterator, bool> r =
    m_readRequestTable.insert(default_entry);

if (r.second) {
    /* snip */
} else {
    // There is an outstanding read request for the cache line
    m_load_waiting_on_load++;
    return RequestStatus_Aliased;
}

The Aliased status eventually propagates back to the LSQ, which interprets it 
as the cache controller being blocked.

bool LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
{
    bool ret = true;
    bool cache_got_blocked = false;

    if (!lsq->cacheBlocked() &&
        lsq->cachePortAvailable(isLoad)) {
        if (!dcachePort->sendTimingReq(data_pkt)) {
            ret = false;
            cache_got_blocked = true;
        }
    }
    /* snip */
    if (cache_got_blocked) {
        lsq->cacheBlocked(true);
        ++lsqCacheBlocked;
    }
    /* snip */
    return ret;
}

If the code is generating many load requests to contiguous memory, e.g. in 
STREAM, won't the cache get blocked extremely frequently? Would this explain 
why it's so difficult to get the core to consume more bandwidth?
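
To make the aliasing concrete, here's a minimal STREAM-style copy loop (my 
own illustration, not gem5 or STREAM source). Assuming 8-byte doubles and 
64-byte cache lines, eight consecutive iterations read the same line of a, so 
an O3 core with several iterations in flight will send back-to-back loads to 
one line and immediately hit the Aliased path above.

// Hypothetical illustration, not gem5/STREAM code: consecutive loads
// from a dense array map onto the same cache line.
#include <cstddef>

void stream_copy(double *c, const double *a, std::size_t n)
{
    for (std::size_t j = 0; j < n; ++j) {
        // With 64-byte lines and 8-byte doubles, a[j], a[j+1], ..., a[j+7]
        // all live in one line; an aggressive O3 core issues several of
        // these loads before the first miss completes.
        c[j] = a[j];
    }
}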

I'm happy to go ahead and fix/improve this, but I wanted to check first that 
I'm not missing something--can Ruby handle multiple outstanding loads to the 
same cache line without blocking the cache?
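
For what it's worth, the kind of change I have in mind would coalesce aliased 
loads instead of rejecting them. The sketch below is purely illustrative and 
is not the real Sequencer interface; the names (insertLoad, completeLoads, 
the simplified SequencerRequest) are stand-ins. The idea is that additional 
loads to a line with an outstanding request are queued and all serviced when 
the fill arrives.

// Rough sketch only: simplified stand-in types, not actual gem5 code.
#include <cstdint>
#include <deque>
#include <unordered_map>

using Addr = uint64_t;
struct SequencerRequest { /* stands in for gem5's SequencerRequest */ };

// One queue of outstanding loads per cache line instead of a single slot.
std::unordered_map<Addr, std::deque<SequencerRequest*>> readRequestTable;

// Append aliased loads rather than returning RequestStatus_Aliased.
void insertLoad(Addr line_addr, SequencerRequest *req)
{
    readRequestTable[line_addr].push_back(req);
}

// When the line fill arrives, satisfy every load that was waiting on it.
void completeLoads(Addr line_addr)
{
    auto it = readRequestTable.find(line_addr);
    if (it == readRequestTable.end())
        return;
    for (SequencerRequest *req : it->second) {
        // In gem5 this would hand the data back to the core for each
        // coalesced load via the response port.
        (void)req;
    }
    readRequestTable.erase(it);
}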


--

Timothy Hayes
Senior Research Engineer
Arm Research
Phone: +44-1223405170
timothy.ha...@arm.com



