Stephan Diestelhorst wrote on Tuesday 13 November 2012, 19:29:56:
> Hi,
>   I am curious about the handling of dependencies / order in the cache
> subsystem.  So far, I have looked at the function find_dependencies in
> *Controller, and have a few related questions:

[...]

> 2) How are the dependencies enforced?  I have to admit I frequently get
>    lost with all the *_cb and handle_* functions and the multiple levels
>    of requests etc.  I have, however, failed so far to understand how
>    dependants are woken up (only found
>    CPUController::wakeup_dependents), but that one should not apply for
>    the dependencies in the normal coherent CacheController.  I also did
>    not find any retry path which will make the stalled request check its
>    dependencies periodically, in case there is no external wakeup.

Just answering myself here, since I just found that path:

CacheController::clear_entry_cb wakes up dependent (younger) entries
by enqueueing a call to CacheController::cache_access (through the
respective signal) for the next-youngest entry.

(Somehow I had older code in build/ that was picked up by my indexer
first.)
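
For anyone else digging through this, a rough sketch of that hand-off
as I understand it.  Only CacheQueueEntry, clear_entry_cb and
cache_access are names from the source / the text above; the queue and
signal plumbing below is simplified stand-in code, not the actual
controller:

#include <queue>
#include <functional>

struct CacheQueueEntry {
    CacheQueueEntry* depends   = nullptr;  // older entry (same line) we wait on
    CacheQueueEntry* dependent = nullptr;  // next-youngest entry waiting on us
    bool free = false;
};

// Stand-in for the event queue that the controller's signal enqueues into.
static std::queue<std::function<void()>> event_queue;

void cache_access(CacheQueueEntry* e);     // re-tries the access for 'e'

void clear_entry_cb(CacheQueueEntry* e) {
    // Entry is done: release it and wake the next-youngest dependent by
    // scheduling a cache_access for it (in MARSS this goes through the
    // respective signal rather than a plain lambda).
    e->free = true;
    if (e->dependent) {
        CacheQueueEntry* next = e->dependent;
        next->depends = nullptr;
        event_queue.push([next]() { cache_access(next); });
    }
}

void cache_access(CacheQueueEntry* e) {
    if (e->depends && !e->depends->free) {
        // Still blocked behind an older entry: nothing to do yet.
        return;
    }
    // ... perform the actual lookup / miss handling here ...
}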

[...]

> 1) AFAICS, there is no mechanism that orders stores, correct? 
[...]
> 4) I wanted to use the dependency mechanism to be able to have a memory
>    request that will wait on all older stores (and loads) to ripple
>    through the memory subsystem, first.  Using an mfence in the core
>    *should* be enough, but I would need to know when the (implicit in
>    Marss86) store buffer has drained and know that stores are actually
>    drained in-order.  Since neither is in place, my attempt to model
>    that particular request as a store has failed so far.

Since dependencies cannot change in the current layout and each element
can only be part of a single chain, it is impossible to track both
coherence (same address) and store order (different addresses, same
type) for each entry.  I have been contemplating various options:

* full dependency tracking, i.e., every entry in pendingRequests_ can be
  dependent on every other -> this requires at least 256 ready bits for
  each CacheQueueEntry -> not really feasible

* implement in-order cache update for stores only where it is required
  (until the stores have reached the point of coherence).  For normal WB
  L1s this would just mean buffering stores in order behind the core's
  retire stage, either in the existing LSQ or in a separate store
  write-back queue.
  
  This is problematic for systems using WT L1s.  I have to admit I have
  not fully understood the WT handling in Marss, but one would need an
  in-order queue that makes sure things get written to the L2 (or
  whichever cache is coherent) in order.

  Advantage:  This would leave the memory subsystem without changes.

* add a single new entry for store dependencies, allowing a second
  precondition to be added for each CacheQueueEntry (namely, that the
  previous store has completed); see the sketch after this list

* rebuild dependencies by scanning for dependencies for each entry when
  it gets ready in cache_access_cb, i.e., rebuilding dependencies on the
  fly

  Advantage: No new tracking structures.
  Disadvantage: May cause more replays than needed: get woken up, see
  that there is still an older store / some other dependency in front of
  you -> set dependency -> go back to sleep
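
To make the second-precondition idea (third bullet above) a bit more
concrete, here is a minimal sketch; the field names are made up, only
the two-chain idea matters:

struct CacheQueueEntry {
    CacheQueueEntry* depends_coherence = nullptr; // older access to the same line
    CacheQueueEntry* depends_store     = nullptr; // previous store in program order
    bool free = false;

    // An entry may issue only once both chains in front of it have drained.
    bool ready() const {
        return (depends_coherence == nullptr || depends_coherence->free)
            && (depends_store     == nullptr || depends_store->free);
    }
};

// clear_entry_cb would then have to wake both kinds of dependents, and
// each woken entry re-checks ready() before re-entering cache_access.

The cost would be one extra pointer and a second wakeup path in
clear_entry_cb, instead of the full ready-bit matrix from the first
bullet.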


Any opinions on what would be best?

Cheers,
  Stephan

