Dear All,

I have submitted a review request that enables gem5 to simulate a simple HMC
device:
http://reviews.gem5.org/r/2986/
(I had previously submitted a set of patches for this model, but I had to
discard them all because they were corrupted, so please ignore them.)

This model reuses existing gem5 components as much as possible (please see the
attached pdf), and its parameters have been tuned based on the following
references:
 [1] http://www.hybridmemorycube.org/specification-download/
 [2] High performance AXI-4.0 based interconnect for extensible smart memory
     cubes (E. Azarkhish et al.)
 [3] Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level
     Prefetching (J. Ahn et al.)
 [4] Memory-centric system interconnect design with Hybrid Memory Cubes
     (G. Kim et al.)
 [5] Near Data Processing, Are we there yet? (M. Gokhale)
     http://www.cs.utah.edu/wondp/gokhale.pdf

Here are the main components in this model:

- VAULT CONTROLLERS:
  Vault controllers are simply instances of the HMC_2500_x32 class, with their
  functionality specified in the DRAMCtrl class.

- THE MAIN XBAR OF THE HMC:
  This component is just an instance of the NoncoherentXBar class, and its
  parameters are tuned to [2], which is a cycle-accurate and synthesizable
  interconnect to be used in the logic base of the HMC.

- SERIAL LINKS:
  Each SerialLink is a simple variation of the Bridge class, with the ability to
  account for the latency of packet serialization. We assume that the
  serializer component at the transmitter side does not need to receive the
  whole packet before it starts serializing, whereas the deserializer waits
  for the complete packet so that it can check its integrity first.
  * Bandwidth of the serial links is not modeled in the SerialLink component
    itself. Instead, bandwidth/port of the HMCController has been adjusted to
    reflect the bandwidth delivered by each serial link.

- HMC CONTROLLER:
  Contains a large buffer (modeled with Bridge; see config.dot.pdf) to hide
  the access latency of the memory cube. It also forwards the packets to the
  serial links in a round-robin fashion to balance the load among them.
  * It is inferred from the standard [1] and the literature [3] that the
    serial links share the same address range and that packets can travel
    over any of them, so a load distribution mechanism is required among
    them.
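
As a rough illustration of the SerialLink latency assumption and the
round-robin forwarding described above, here is a minimal Python sketch. It is
not the code from the patch: the function and class names are hypothetical,
the lane count, lane speed and link delay are arbitrary example inputs, and
the actual SerialLink and HMCController code in the changeset may well compute
these differently.

```python
from math import ceil


def serialization_delay_ps(packet_bytes, num_lanes, lane_gbps):
    """Time to shift a whole packet onto the link, in picoseconds.

    Hypothetical helper: the parameters are illustrative, not patch values.
    """
    if packet_bytes <= 0 or num_lanes <= 0 or lane_gbps <= 0:
        raise ValueError("all arguments must be positive")
    bits = packet_bytes * 8
    link_gbps = num_lanes * lane_gbps  # aggregate raw rate of the link
    # 1 Gb/s corresponds to one bit per 1000 ps.
    return ceil(bits * 1000 / link_gbps)


def link_traversal_delay_ps(packet_bytes, num_lanes, lane_gbps, link_delay_ps):
    """Delay seen by one packet crossing the link (assumed model).

    The serializer starts streaming right away, so no extra wait is added at
    the transmitter. The deserializer waits for the last bit before it checks
    integrity, so the full serialization time is paid once on top of the
    fixed link delay.
    """
    return link_delay_ps + serialization_delay_ps(
        packet_bytes, num_lanes, lane_gbps)


class RoundRobinSelector:
    """Picks link indices 0, 1, ..., n-1, 0, 1, ... in turn."""

    def __init__(self, num_links):
        if num_links < 1:
            raise ValueError("need at least one link")
        self._num_links = num_links
        self._next = 0

    def select(self):
        link = self._next
        self._next = (self._next + 1) % self._num_links
        return link


def distribute(packets, num_links):
    """Assign packets to per-link queues in round-robin order."""
    selector = RoundRobinSelector(num_links)
    queues = [[] for _ in range(num_links)]
    for pkt in packets:
        queues[selector.select()].append(pkt)
    return queues
```

For example, link_traversal_delay_ps(64, 16, 10, 5000) adds the time needed
to serialize a 64-byte packet over 16 lanes of 10 Gb/s to an assumed fixed
delay of 5000 ps, and distribute(list(range(8)), 4) spreads eight packets
evenly over four links. Because all links share the same address range, the
selector does not need to look at the packet address at all.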

Lastly, an accuracy comparison of this model against our in-house
cycle-accurate model of the HMC shows that "execution-time" and "bandwidth"
match really well between the two (less than 5% difference). We have also
adjusted the queue sizes in gem5 (in the Bridges and the DRAM Controllers) to
obtain reasonable "memory-access-time" values in comparison with the
cycle-accurate model. Nevertheless, there are still disparities, and
"memory-access-time" should be reported carefully.
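
For anyone who wants to repeat this kind of comparison on their own runs, one
possible way to express the 5% criterion is as a relative difference against
the cycle-accurate result. This is only a sketch: the helper names are
hypothetical, and taking the cycle-accurate model as the reference is an
assumption about how the difference is defined.

```python
def relative_difference(measured, reference):
    """Relative difference of a gem5 statistic against a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(measured - reference) / abs(reference)


def within_tolerance(measured, reference, tolerance=0.05):
    """True if the two values differ by less than the given fraction."""
    return relative_difference(measured, reference) < tolerance
```

Here "measured" would be a value taken from the gem5 statistics (for example
the execution time) and "reference" the corresponding value from the
cycle-accurate model.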

After applying the changeset, you can simply run a full-system simulation:
./build/ARM/gem5.opt configs/example/fs.py \
        --mem-type=HMC_2500_x32 --mem-channels=16 \
        --caches --l2cache \
        --cpu-type=timing

Or a traffic-based simulation:
./build/ARM/gem5.opt configs/example/hmctest.py \
        --mem-type=HMC_2500_x32 \
        --mode=RANDOM

I hope that this patch is helpful,
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev