Stephan Diestelhorst wrote on Tuesday 13 November 2012, 19:29:56:
> Hi,
> I am curious about the handling of dependencies / order in the cache
> subsystem. So far, I have looked at the function find_dependencies in
> *Controller, and have a few related questions:
[...]
> 2) How are the dependencies enforced? I have to admit I frequently get
> lost with all the *_cb and handle_* functions and the multiple levels
> of requests etc. I have, however, failed so far to understand how
> dependants are woken up (only found CPUController::wakeup_dependents),
> but that one should not apply for the dependencies in the normal
> coherent CacheController. I also did not find any retry path which
> will make the stalled request check its dependencies periodically,
> in case there is no external wakeup.

Just answering myself here, since I just found that path:
CacheController::clear_entry_cb wakes up depending (younger) entries by
enqueueing a call to CacheController::cache_access (through the respective
signal) for the next youngest entry. (Somehow I had older code in build/
that was picked up by my indexer first.)

[...]
> 1) AFAICS, there is no mechanism that orders stores, correct?
[...]
> 4) I wanted to use the dependency mechanism to be able to have a memory
> request that will wait on all older stores (and loads) to ripple
> through the memory subsystem first. Using an mfence in the core
> *should* be enough, but I would need to know when the (implicit in
> Marss86) store buffer has drained and know that stores are actually
> drained in order. Since neither is in place, my attempt to model
> that particular request as a store has failed so far.

Since dependencies cannot change in the current layout and each element can
only be part of a single chain, it is impossible to track both coherence
(same address) and store order (different addresses, same type) for each
entry. I have been contemplating various options:

* Full dependency tracking, i.e., every entry in pendingRequests_ can be
  dependent on every other entry. This requires at least 256 ready bits per
  CacheQueueEntry, which is not really feasible.

* Implement an in-order cache update for stores only where it is required,
  i.e., until the stores have reached the point of coherence. For normal WB
  L1s this would just mean buffering stores in order behind the core's
  retire stage, either in the existing LSQ or in a separate store write-back
  queue. This is problematic for systems using WT L1s: I have to admit I
  have not fully understood the WT handling in Marss, but one would need an
  in-order queue that makes sure things get written to the L2 (or whichever
  cache is coherent) in order.
  Advantage: this would leave the memory subsystem without changes.

* Add a single new entry for store dependencies, allowing a second
  precondition for each CacheQueueEntry (namely, that the previous store
  needs to complete). A rough sketch of this is in the P.S. below.

* Rebuild dependencies on the fly, i.e., re-scan for dependencies for each
  entry when it becomes ready in cache_access_cb.
  Advantage: no new tracking structures.
  Disadvantage: may cause more replays than needed: an entry gets woken up,
  sees that there is still a store / some other dependence in front of it,
  sets the dependency again, and goes back to sleep. (A second sketch below
  illustrates this.)

Any opinions on what would be best?

Cheers,
  Stephan
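
P.S.: To make the "second precondition" option a bit more concrete, here is
a minimal, self-contained C++ sketch of the idea. It is illustration only:
the names (Entry, coherence_dep, store_dep, is_ready) are made up and are
not the actual CacheQueueEntry fields. The point is just that each entry
carries the existing same-address link plus one extra link to the previous
in-flight store, and is considered ready only when both links are satisfied.

    // Toy mock-up of the "second precondition per entry" idea.
    // None of these names are real MARSS identifiers.
    #include <cstdio>
    #include <vector>

    struct Entry {
        int    id;
        bool   is_store;
        bool   done;
        Entry* coherence_dep;  // existing same-address dependency link
        Entry* store_dep;      // proposed second link: previous in-flight store
    };

    // An entry may be issued only when both preconditions are satisfied.
    static bool is_ready(const Entry& e) {
        bool coh_ok   = (e.coherence_dep == nullptr) || e.coherence_dep->done;
        bool store_ok = (e.store_dep     == nullptr) || e.store_dep->done;
        return coh_ok && store_ok;
    }

    int main() {
        // Two stores to different addresses plus a fence-like request that
        // must wait for the youngest store via the store-order link.
        Entry s0{0, true,  false, nullptr, nullptr};
        Entry s1{1, true,  false, nullptr, &s0};  // ordered behind s0
        Entry f2{2, false, false, nullptr, &s1};  // waits on the last store

        std::vector<Entry*> pending{&s0, &s1, &f2};
        for (Entry* e : pending)
            std::printf("entry %d ready? %d\n", e->id, is_ready(*e));

        s0.done = true;   // completing s0 unblocks s1 ...
        std::printf("after s0: entry 1 ready? %d\n", is_ready(s1));
        s1.done = true;   // ... and completing s1 unblocks f2
        std::printf("after s1: entry 2 ready? %d\n", is_ready(f2));
        return 0;
    }

In the real controller the second link would presumably be filled in by
find_dependencies when an entry is added to pendingRequests_, and cleared in
clear_entry_cb just like the existing one, so the wakeup path would not need
to change much.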

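And the same kind of toy sketch for the last option (rebuilding dependencies
on the fly): on wake-up the request re-scans the older pending entries and,
if it still finds an incomplete older store (or same-address request),
re-registers itself behind it and goes back to sleep, which is exactly where
the extra replays come from. Again, the names (Req, find_blocker, the
fields) are illustrative only and not the actual cache_access_cb code.

    // Toy mock-up of the "rebuild dependencies on the fly" option.
    // None of these names are real MARSS identifiers.
    #include <cstdio>
    #include <deque>

    struct Req {
        int           id;
        bool          is_store;
        bool          done;
        unsigned long addr;
    };

    // On wake-up, re-scan all older pending requests. Returns the entry the
    // request must (re-)depend on, or nullptr if it may proceed now.
    static Req* find_blocker(const std::deque<Req*>& pending, const Req& me) {
        for (Req* older : pending) {
            if (older->id == me.id) break;  // only consider older entries
            if (older->done) continue;
            bool same_addr   = older->addr == me.addr;          // coherence order
            bool store_order = older->is_store && me.is_store;  // store-store order
            if (same_addr || store_order)
                return older;               // replay: sleep behind this one
        }
        return nullptr;
    }

    int main() {
        Req s0{0, true, false, 0x100};
        Req s1{1, true, false, 0x200};  // different address, store-ordered after s0
        std::deque<Req*> pending{&s0, &s1};

        if (Req* b = find_blocker(pending, s1))
            std::printf("req %d sleeps again behind req %d\n", s1.id, b->id);

        s0.done = true;                 // after s0 completes, the re-scan finds nothing
        if (!find_blocker(pending, s1))
            std::printf("req %d may proceed after the re-scan\n", s1.id);
        return 0;
    }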