Hi Nilay,

Overall, I believe we agree more than you may think.  I'm glad you included 
pseudocode in your latest email.  That is a great idea.  I think part of our 
problem is that we are interpreting each other's textual descriptions 
differently.

Below are my responses:
 
> >
> > Absolutely, we still need the ability to allocate or deallocate
> > entries within actions.  I'm not advocating completely eliminating
> > the set/unset cache and tbe entry functions.  I just want to avoid
> > including those calls in the inports.  I'm confused why the mandatory
> > queue is different than other queues.  They all trigger events in the
> > same way.
> 
> if (L1IcacheMemory.isTagPresent(in_msg.LineAddress)) {
>   // The tag matches for the L1, so the L1 asks the L2 for it.
>   trigger(mandatory_request_type_to_event(in_msg.Type),
>           in_msg.LineAddress);
> }
> 
> Brad, the mandatory queue is just an example where an inport may perform a
> tag lookup before the cache and transaction buffer entries have been set.
> Above is an excerpt from the file MOESI_CMP_directory-L1cache.sm. Before
> trigger() is called, isTagPresent() is called. This means a tag lookup is
> performed before the cache or transaction buffer entries have been set.
> Suppose the tag was present in the L1Icache; then, in the trigger() call,
> we will perform the lookup again.
> 
> Similarly, there is an inport in the Hammer's protocol implementation where
> getCacheEntry() is called before a call to trigger(). Now, why should we use
> getCacheEntry() in the inport and cache entry in the action?
> 

The reason is, as you pointed out, that we ideally want to call getCacheEntry 
only once.  I believe your suggestion to use local variables in the input 
ports gets us there.  Below is what I'm envisioning for the MOESI_hammer 
mandatory queue in_port logic (at least the IFETCH half of it):

ENTRY getL1ICacheEntry(Address addr) {
  assert(is_valid(L1DcacheMemory.lookup(addr)) == false);
  assert(is_valid(L2cacheMemory.lookup(addr)) == false);
  return L1IcacheMemory.lookup(addr);
}

ENTRY getL1DCacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == false);
  assert(is_valid(L2cacheMemory.lookup(addr)) == false);
  return L1DcacheMemory.lookup(addr);
}

ENTRY getL2CacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == false);
  assert(is_valid(L1DcacheMemory.lookup(addr)) == false);
  return L2cacheMemory.lookup(addr);
}

in_port(mandatoryQueue_in, CacheMsg, mandatoryQueue, desc="...", rank=0) {
  if (mandatoryQueue_in.isReady()) {
    peek(mandatoryQueue_in, CacheMsg, block_on="LineAddress") {

      // Set the local entry variables
      ENTRY L1I_cache_entry = getL1ICacheEntry(in_msg.LineAddress);
      ENTRY L1D_cache_entry = getL1DCacheEntry(in_msg.LineAddress);
      TBE_Entry tbe_entry = getTBE(in_msg.LineAddress);

      // Check for data accesses to blocks in the I-cache and ifetches to
      // blocks in the D-cache

      if (in_msg.Type == CacheRequestType:IFETCH) {
        // *** INSTRUCTION ACCESS ***

        // Check to see if it is in the OTHER L1
        if (is_valid(L1D_cache_entry)) {
          // The block is in the wrong L1, try to write it to the L2
          if (L2cacheMemory.cacheAvail(in_msg.LineAddress)) {
            trigger(Event:L1_to_L2, in_msg.LineAddress, L1D_cache_entry,
                    tbe_entry);
          } else {
            Address replace_addr =
              L2cacheMemory.cacheProbe(in_msg.LineAddress);
            ENTRY replace_cache_entry = getL2CacheEntry(replace_addr);
            TBE_Entry replace_tbe_entry = getTBE(replace_addr);
            trigger(Event:L2_Replacement, replace_addr, replace_cache_entry,
                    replace_tbe_entry);
          }
        }

        if (is_valid(L1I_cache_entry)) {
          // The tag matches for the L1, so the L1 fetches the line.  We
          // know it can't be in the L2 due to exclusion.
          trigger(mandatory_request_type_to_event(in_msg.Type),
                  in_msg.LineAddress, L1I_cache_entry, tbe_entry);
        } else {
          if (L1IcacheMemory.cacheAvail(in_msg.LineAddress)) {
            // The L1 doesn't have the line, but we have space for it
            ENTRY L2_cache_entry = getL2CacheEntry(in_msg.LineAddress);
            if (is_valid(L2_cache_entry)) {
              // The L2 has it (maybe not with the right permissions)
              trigger(Event:Trigger_L2_to_L1I, in_msg.LineAddress,
                      L2_cache_entry, tbe_entry);
            } else {
              // We have room and the L2 doesn't have it, so the L1
              // fetches the line
              trigger(mandatory_request_type_to_event(in_msg.Type),
                      in_msg.LineAddress, L1I_cache_entry, tbe_entry);
              // You could also say here:
              // trigger(mandatory_request_type_to_event(in_msg.Type),
              //         in_msg.LineAddress, OOD, OOD);
            }
          } else {
            // No room in the L1, so we need to make room
            if (L2cacheMemory.cacheAvail(
                  L1IcacheMemory.cacheProbe(in_msg.LineAddress))) {
              // The L2 has room, so we move the line from the L1 to the L2
              Address replace_addr =
                L1IcacheMemory.cacheProbe(in_msg.LineAddress);
              ENTRY replace_cache_entry = getL1ICacheEntry(replace_addr);
              TBE_Entry replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L1_to_L2, replace_addr, replace_cache_entry,
                      replace_tbe_entry);
            } else {
              // The L2 does not have room, so we replace a line from the L2
              Address replace_addr = L2cacheMemory.cacheProbe(
                L1IcacheMemory.cacheProbe(in_msg.LineAddress));
              ENTRY replace_cache_entry = getL2CacheEntry(replace_addr);
              TBE_Entry replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L2_Replacement, replace_addr,
                      replace_cache_entry, replace_tbe_entry);
            }
          }
        }
      } else {
        // *** DATA ACCESS ***
...
}

> In MESI_CMP_directory, getState() is called from an inport. This means we
> cannot have an implementation of getState() that makes use of the cache
> entry variable, since it would not have been set. Now, different
> implementations for setState() and getState() simply do not make sense to
> me, so in my opinion setState() will also not use the cache entry. These
> two functions (I just went through the profile output for MOESI hammer)
> account for about half of the calls to isTagPresent(), the function that
> we are trying to get rid of.
> 
I think we need to modify MESI_CMP_directory's getState() function to use the 
cache_entry variable.  It doesn't make sense to me that it wouldn't, since the 
state is part of the cache entry.  How else would getState() and setState() 
access the cache state?
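
Something along these lines is what I'm picturing (just a sketch, assuming 
the entry variables get passed in as parameters; the TBEState/CacheState 
field names and the NP default are hypothetical):

State getState(TBE_Entry tbe_entry, ENTRY cache_entry, Address addr) {
  if (is_valid(tbe_entry)) {
    return tbe_entry.TBEState;
  } else if (is_valid(cache_entry)) {
    return cache_entry.CacheState;
  }
  return State:NP;
}

setState() would be the mirror image, writing those same fields, so the two 
stay consistent and neither needs an isTagPresent() call.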

> > Maybe I should point out that I'm assuming that getCacheEntry can
> > return a NULL pointer and thus that can be passed into the trigger
> > call when no cache or tbe entry exists.
> 
> You are correct that getCacheEntry() would have to return a NULL, another
> thing which I believe we earlier preferred avoiding. As an aside, we cannot
> use the keyword NULL since it is already in use, so we will have to rename
> NULL as something else.
> 
> On second thought, I think it may not be necessary for getCacheEntry() to
> use the keyword.
> 

Yes, I was initially hoping to avoid the NULL (or the OOD) keyword, but over 
the past few weeks I have realized that isn't possible.  By the way, all 
things being equal, I would prefer to use your OOD keyword over NULL.  That 
way we differentiate it from the C++ NULL, since at the SLICC level the 
getCacheEntry functions don't return pointers; at least, I'm still hoping to 
avoid that.

> >
> >
> >> Another concern is in implementation of getCacheEntry(). If this
> >> function has to return a pointer to a cache entry, we would have to
> >> provide support for local variables which internally SLICC would assume to
> be pointer variables.
> >>
> >
> > Within SLICC, understanding that certain variables are actually
> > pointers is a little bit of a nuisance, but there already exist
> > examples where we make that distinction.  For instance, look at the
> > "if para.pointer" conditionals in StateMachine.py.  We just have to
> > treat cache and tbe entries in the same fashion.
> 
> I know it is possible to declare pointers in SLICC, CacheMemory being the
> foremost example. But we only allow the declaration of pointers. We tacitly
> assume that whenever they are used, they will only be used after being
> dereferenced. Now, in the case of getCacheEntry(), I do not see this
> happening. Below is the current implementation of getCacheEntry() from
> MOESI_CMP_directory-L1cache.sm.
> 
> Entry getCacheEntry(Address addr), return_by_ref="yes" {
>    if (L1DcacheMemory.isTagPresent(addr)) {
>      return static_cast(Entry, L1DcacheMemory[addr]);
>    } else {
>      return static_cast(Entry, L1IcacheMemory[addr]);
>    }
> }
> 
> I would probably convert it to something like this.
> 
> AbstractCacheEntry getCacheEntry(Address addr) {
>    AbstractCacheEntry * cache_entry = L1DcacheMemory.lookup(addr);
>    if(is_valid(cache_entry)) return cache_entry;
>    return L1IcacheMemory.lookup(addr);
> }
> 
> Now, if we assume that cache_entry will always be used in its dereferenced
> form, then we will face a problem in returning the cache entry. We could
> introduce a 'return_by_pointer' annotation on the function to handle this
> situation. But you have pointed out in your previous emails that it is
> possible for getCacheEntry() to do more things, which may cause problems
> for the compiler.
> 

I think we always assume cache_entry is dereferenced whenever one of its 
members or member functions is accessed.  However, cache_entry itself is a 
pointer, and getCacheEntry always returns a pointer.
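
In other words, an action would look something like this (a hypothetical 
sketch; the action and queue names are just illustrative):

action(u_writeDataToCache, "u", desc="Write data to cache") {
  peek(responseToCache_in, ResponseMsg) {
    assert(is_valid(cache_entry));
    // Accessing a member implicitly dereferences the underlying pointer,
    // so the programmer never manipulates the pointer directly.
    cache_entry.DataBlk = in_msg.DataBlk;
  }
}

The compiler only has to know that cache_entry is a pointer when it generates 
the member access; everywhere else it can treat it as an opaque handle.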

> >
> >
> >> In my opinion, we should maintain one method for looking up cache
> >> entries.  My own experience informs me that it is not difficult to
> >> incorporate calls to set/unset_cache_entry() in already existing
> >> protocol implementations.  For implementing new protocols, I think the
> >> greater obstacle will be in implementing the protocol correctly and not
> >> in using entry variables correctly.  If we document this change lucidly,
> >> there is no reason to believe a SLICC programmer will be exceptionally
> >> pushed because of this change.
> >>
> >> Assuming that this change does introduce some complexity in programming
> >> with SLICC, does that complexity outweigh the performance improvements?
> >>
> >
> > My position is we can leverage SLICC as an intermediate language and
> > achieve the performance benefits of your change without significantly
> > impacting the programmability.  I agree that we need the
> > set/unset_cache_entry calls in the allocate and deallocate actions.  I
> > see no problem with that.  I just want to treat these new implicit
> > cache and tbe entry variables like the existing implicit variable address.
> > Therefore I want to pass them into the trigger operation like the
> > address variable.  I also want just one method for looking up cache
> > entries.  I believe the only difference is that I would like to set
> > the cache and tbe entries in the trigger function, as well as allow
> > them to be set in the actions.
> >
> 
> I still don't think that we can avoid those calls in inports. By avoiding
> those calls, the job of the compiler and the programmer will only become
> more difficult. Currently I am of the opinion that if we avoid those
> calls, we should avoid this change altogether.
> 

We definitely should not give up on this patch.  You've put a lot of hard work 
into it and, as your performance data suggests, it should give us a noticeable 
speedup.  I sympathize with your position and I really appreciate the effort 
you've put in.  If you'd like, I would be happy to take over the patch.

Brad
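
P.S. For completeness, the set/unset_cache_entry calls I agree we still need 
in the actions would look something like this (again just a sketch; the 
action names and shorthand are illustrative):

action(ii_allocateL1ICacheBlock, "\i", desc="Allocate an L1 I-cache block") {
  if (is_valid(cache_entry) == false) {
    // Allocate sets the implicit cache_entry variable for later actions
    set_cache_entry(L1IcacheMemory.allocate(address, new Entry));
  }
}

action(gg_deallocateCacheBlock, "\g", desc="Deallocate a cache block") {
  if (L1IcacheMemory.isTagPresent(address)) {
    L1IcacheMemory.deallocate(address);
  } else {
    L1DcacheMemory.deallocate(address);
  }
  // Clear the implicit variable so no later action uses a stale entry
  unset_cache_entry();
}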

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
