Hi Nilay,
Overall, I believe we are in more agreement with each other than maybe you
think. I'm glad you included pseudo code in your latest email. That is a
great idea. I think part of our problem is we are comprehending our textual
descriptions in different ways.
Below are my responses:
> >
> > Absolutely, we still need the ability to allocate or deallocate
> > entries within actions. I'm not advocating to completely eliminate
> > the set/unset cache and tbe entry functions. I just want to avoid
> > including those calls in the inports. I'm confused why the mandatory
> > queue is different than other queues. They all trigger events in the same
> way.
>
> if (L1IcacheMemory.isTagPresent(in_msg.LineAddress)) {
> // The tag matches for the L1, so the L1 asks the L2 for it.
> trigger(mandatory_request_type_to_event(in_msg.Type),
> in_msg.LineAddress); }
>
> Brad, mandatory queue is just an example where an inport may perform tag
> lookup before cache and transaction buffer entries has been set. Above is an
> excerpt from the file MOESI_CMP_directory-L1cache.sm. Before the
> trigger() is called, isTagPresent() is called. This means tag look up is being
> performed before cache or transaction buffer entries have been set.
> Suppose the tag was present in L1Icache, then in the trigger() call, we will
> again perform lookup.
>
> Similarly, there is an inport in the Hammer's protocol implementation where
> getCacheEntry() is called before a call to trigger(). Now, why should we use
> getCacheEntry() in the inport and cache entry in the action?
>
The reason is, as you pointed out, we ideally want to call getCacheEntry once.
I believe your suggestion to use local variables in the input ports gets us
there. Below is what I'm envisioning for the MOESI_hammer mandatory queue
in_port logic (at least the IFETCH half of the logic):
ENTRY getL1ICacheEntry(Address addr) {
assert(is_valid(L1DcacheMemory.lookup(addr) == FALSE);
assert(is_valid(L2cacheMemory.lookup(addr) == FALSE);
return L1IcacheMemory.lookup(addr);
}
ENTRY getL1DCacheEntry(Address addr) {
assert(is_valid(L1IcacheMemory.lookup(addr) == FALSE);
assert(is_valid(L2cacheMemory.lookup(addr) == FALSE);
return L1DcacheMemory.lookup(addr);
}
ENTRY getL2CacheEntry(Address addr) {
assert(is_valid(L1IcacheMemory.lookup(addr) == FALSE);
assert(is_valid(L1DcacheMemory.lookup(addr) == FALSE);
return L2cacheMemory.lookup(addr);
}
in_port(mandatoryQueue_in, CacheMsg, mandatoryQueue, desc="...", rank=0) {
if (mandatoryQueue_in.isReady()) {
peek(mandatoryQueue_in, CacheMsg, block_on="LineAddress") {
// Set the local entry variables
ENTRY L1I_cache_entry = getL1ICacheEntry(in_msg.LineAddress);
ENTRY L1D_cache_entry = getL1DCacheEntry(in_msg.LineAddress);
TBE_Entry tbe_entry = getTBE(in_msg.LineAddress);
// Check for data access to blocks in I-cache and ifetchs to blocks in
D-cache
if (in_msg.Type == CacheRequestType:IFETCH) {
// ** INSTRUCTION ACCESS ***
// Check to see if it is in the OTHER L1
if (is_valid(L1D_cache_entry)) {
// The block is in the wrong L1, try to write it to the L2
if (L2cacheMemory.cacheAvail(in_msg.LineAddress)) {
trigger(Event:L1_to_L2, in_msg.LineAddress, L1D_cache_entry,
tbe_entry);
} else {
replace_addr = L2cacheMemory.cacheProbe(in_msg.LineAddress);
replace_cache_entry = getL2CacheEntry(replace_addr);
replace_tbe_entry = getTBE(replace_addr);
trigger(Event:L2_Replacement, replace_addr, replace_cache_entry,
replace_tbe_entry);
}
}
if (is_valid(L1I_cache_entry)) {
// The tag matches for the L1, so the L1 fetches the line. We know
it can't be in the L2 due to exclusion
trigger(mandatory_request_type_to_event(in_msg.Type),
in_msg.LineAddress, L1I_cache_entry, tbe_entry);
} else {
if (L1IcacheMemory.cacheAvail(in_msg.LineAddress)) {
// L1 does't have the line, but we have space for it in the L1
ENTRY L2_cache_entry = getL2CacheEntry(in_msg.LineAddress);
if (is_valid(L2_cache_entry)) {
// L2 has it (maybe not with the right permissions)
trigger(Event:Trigger_L2_to_L1I, in_msg.LineAddress,
L2_cache_entry, tbe_entry);
} else {
// We have room, the L2 doesn't have it, so the L1 fetches the
line
trigger(mandatory_request_type_to_event(in_msg.Type),
in_msg.LineAddress, L1Icache_entry, tbe_entry);
// you could also say here:
"trigger(mandatory_request_type_to_event(in_msg.Type), in_msg.LineAddress, ODD,
ODD);"
}
} else {
// No room in the L1, so we need to make room
if
(L2cacheMemory.cacheAvail(L1IcacheMemory.cacheProbe(in_msg.LineAddress))) {
// The L2 has room, so we move the line from the L1 to the L2
replace_addr = L1IcacheMemory.cacheProbe(in_msg.LineAddress);
replace_cache_entry = getL2CacheEntry(replace_addr);
replace_tbe_entry = getTBE(replace_addr);
trigger(Event:L1_to_L2, replace_addr, replace_cache_entry,
replace_tbe_entry);
} else {
// The L2 does not have room, so we replace a line from the L2
replace_addr =
L2cacheMemory.cacheProbe(L1IcacheMemory.cacheProbe(in_msg.LineAddress));
replace_cache_entry = getL2CacheEntry(replace_addr);
replace_tbe_entry = getTBE(replace_addr);
trigger(Event:L2_Replacement, replace_addr,
replace_cache_entry, replace_tbe_entry);
}
}
}
} else {
// *** DATA ACCESS ***
...
}
> // The tag matches for the L1, so the L1 asks the L2 for it.
> trigger(mandatory_request_type_to_event(in_msg.Type),
> in_msg.LineAddress); }
> In MESI_CMP_directory, getState() is called from an inport. This means we
> cannot have an implementation of getState() that makes use of cache entry
> variable since it would not have been set. Now, different implementations
> for setState() and getState() simply does not make sense to me, so in my
> opinion setState() will also not use cache entry. These two function (I just
> went through the profile output for MOESI hammer), account for about half
> of the calls to isTagPresent(), the function that we are trying to get rid of.
>
I think we need to modify MESI_CMP_directories getState function to use the
cache_entry variable. It doesn't makes sense to me that it wouldn't since the
state is part of the cache_entry. How else does getState and setState access
the cache state?
> > Maybe I should point out that I'm assuming that getCacheEntry can
> > return a NULL pointer and thus that can be passed into the trigger
> > call when no cache or tbe entry exists.
>
> You are correct that getCacheEntry() would have to return a NULL, another
> thing which I believe we earlier preferred avoiding. As an aside, we cannot
> use the keyword NULL since it is already in use. So, we will have to rename
> NULL as some thing else.
>
> On second thought, I think it may not be necessary for getCacheEntry() to
> use the keyword.
>
Yes, I initially was hoping to avoid the NULL (or the OOD) keyword, but over
the past few weeks I have realized that isn't possible. By the way, if all
things are equal, I would prefer to use your OOD keyword versus NULL. That way
we differentiate from C++ NULL since the getCacheEntry functions don't return
pointers. At least I'm still hoping to avoid that.
> >
> >
> >> Another concern is in implementation of getCacheEntry(). If this
> >> function has to return a pointer to a cache entry, we would have to
> >> provide support for local variables which internally SLICC would assume to
> be pointer variables.
> >>
> >
> > Within SLICC understanding that certain variables are actually
> > pointers is a little bit of a nuisance, but there already exists
> > examples where we make that distinction. For instance, look at the "if
> para.pointer"
> > conditionals in StateMachine.py. We just have to treat cache and tbe
> > entries in the same fashion.
>
> I know it is possible to declare pointers in SLICC, CacheMemory being the
> foremost example. But we only allow declaration of pointers. We tacitly
> assume that when ever they will be used, they will only be used after being
> dereferenced. Now, in case of getCacheEntry(), I do not see this happening.
> Below is the current implementation of getCacheEntry() from
> MOESI_CMP_directory-L1cache.sm.
>
> Entry getCacheEntry(Address addr), return_by_ref="yes" {
> if (L1DcacheMemory.isTagPresent(addr)) {
> return static_cast(Entry, L1DcacheMemory[addr]);
> } else {
> return static_cast(Entry, L1IcacheMemory[addr]);
> }
> }
>
> I would probably convert it to some thing like this.
>
> AbstractCacheEntry getCacheEntry(Address addr) {
> AbstractCacheEntry * cache_entry = L1DcacheMemory.lookup(addr);
> if(is_valid(cache_entry)) return cache_entry;
> return L1IcacheMemory.lookup(addr);
> }
>
> Now, if we assume that cache_entry will always be used in its dereferenced
> form, then we will face problem in returning cache entry. We can introduce a
> 'return_by_pointer' annotation for the function to handle this situtation. But
> you have pointed out in your previous emails that it is possible for
> getCacheEntry() to do more things, which may cause problems for the
> compiler.
>
I think we always assume cache_entry is dereference whenever one of its members
or member functions is called. However cache_entry itself is a pointer and
getCacheEntry always returns a pointer.
> >
> >
> >> In my opinion, we should maintain one method for looking up cache
> entries.
> >> My own experience informs me that it is not difficult to incorporate
> >> calls to set/unset_cache_entry () in already existing protocol
> implementations.
> >> For implementing new protocols, I think the greater obstacle will be
> >> in implementing the protocol correctly and not in using entry
> >> variables correctly. If we document this change lucidly, there is no
> >> reason to believe a SLICC programmer will be exceptionally pushed
> because of this change.
> >>
> >> Assuming that this change does introduce some complexity in
> >> progamming with SLICC, does that complexity out weigh the performance
> improvements?
> >>
> >
> > My position is we can leverage SLICC as an intermediate language and
> > achieve the performance benefits of your change without significantly
> > impacting the programmability. I agree that we need the
> > set/unset_cache_entry calls in the allocate and deallocate actions. I
> > see no problem with that. I just want to treat these new implicit
> > cache and tbe entry variables like the existing implicit variable address.
> > Therefore I want to pass them into the trigger operation like the
> > address variable. I also want just one method for looking up cache
> > entries. I believe the only difference is that I would like to set
> > the cache and tbe entries in the trigger function, as well as allowing
> > them to be set in the actions.
> >
>
> I still don't think that we can avoid those calls in inports. By avoiding
> those calls, the job of the compiler and the programmer will only become
> more difficult. Currently I am of the opinion that if we avoid those
> calls, we should avoid this change all together.
>
We definitely should not give up on this patch. You've put a lot of hard work
into it and, as your performance data has suggested, it should give us a
noticeable speedup. I sympathize with your position and I really appreciate
the effort you've put it. If you'd like, I would be happy to take over the
patch if you'd like.
Brad
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev