Dear All,

I have submitted a review request that enables gem5 to simulate a simple HMC (Hybrid Memory Cube) device: http://reviews.gem5.org/r/2986/ (I had previously submitted a set of patches for this model, but I had to discard them all because they were corrupted, so please ignore them.)
This model largely reuses existing gem5 components (please see the attached PDF), and its parameters have been tuned based on the following references:

[1] Hybrid Memory Cube Specification: http://www.hybridmemorycube.org/specification-download/
[2] High performance AXI-4.0 based interconnect for extensible smart memory cubes (E. Azarkhish et al.)
[3] Low-Power Hybrid Memory Cubes With Link Power Management and Two-Level Prefetching (J. Ahn et al.)
[4] Memory-centric system interconnect design with Hybrid Memory Cubes (G. Kim et al.)
[5] Near Data Processing, Are we there yet? (M. Gokhale) http://www.cs.utah.edu/wondp/gokhale.pdf

Here are the main components in this model:

- VAULT CONTROLLERS: The vault controllers are simply instances of the HMC_2500_x32 class, with their functionality specified in the DRAMCtrl class.

- THE MAIN XBAR OF THE HMC: This component is an instance of the NoncoherentXBar class, with its parameters tuned to match [2], a cycle-accurate and synthesizable interconnect intended for the logic base of the HMC.

- SERIAL LINKS: Each SerialLink is a simple variation of the Bridge class, with the ability to account for packet serialization latency. We assume that the serializer on the transmitter side does not need to receive the whole packet before starting serialization, whereas the deserializer must wait for the complete packet in order to check its integrity first.
  * The bandwidth of the serial links is not modeled in the SerialLink component itself. Instead, the per-port bandwidth of the HMCController has been adjusted to reflect the bandwidth delivered by each serial link.

- HMC CONTROLLER: Contains a large buffer (modeled with a Bridge; see config.dot.pdf) to hide the access latency of the memory cube, and simply forwards packets to the serial links in round-robin fashion to balance the load among them.
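As a rough illustration of the SerialLink timing assumptions (cut-through on transmit, store-and-forward on receive) and the round-robin forwarding described above, here is a small back-of-the-envelope Python sketch. The lane count, bit rate, and packet/flit sizes below are hypothetical placeholders for illustration, not the tuned parameters of the gem5 model.

```python
# Sketch of the SerialLink timing assumptions and the HMCController's
# round-robin link selection. All numbers are illustrative placeholders,
# not the tuned gem5 parameters.
from itertools import cycle

LINK_WIDTH_BITS = 16     # hypothetical lane count of one serial link
LINK_CLOCK_GHZ = 10.0    # hypothetical per-lane bit rate (Gb/s)

def serialization_delay_ns(bits):
    """Time to push `bits` through one serial link."""
    return bits / (LINK_WIDTH_BITS * LINK_CLOCK_GHZ)

def tx_latency_ns(packet_bits, flit_bits=128):
    # Transmitter is cut-through: serialization starts as soon as the
    # first flit arrives, so only one flit's delay is on the critical
    # path (packet_bits does not matter here).
    return serialization_delay_ns(flit_bits)

def rx_latency_ns(packet_bits):
    # Receiver is store-and-forward: the deserializer must see the whole
    # packet before it can check integrity and forward it.
    return serialization_delay_ns(packet_bits)

class RoundRobinController:
    """Distributes packets over N links that share one address range."""
    def __init__(self, num_links=4):
        self.links = cycle(range(num_links))

    def pick_link(self, packet):
        return next(self.links)

ctrl = RoundRobinController(num_links=4)
print([ctrl.pick_link(p) for p in range(6)])  # -> [0, 1, 2, 3, 0, 1]
print(rx_latency_ns(1024))                    # -> 6.4 (ns, 1024-bit packet)
```

The asymmetry between `tx_latency_ns` and `rx_latency_ns` is the point: the receive side always pays the full per-packet serialization delay, while the transmit side overlaps serialization with packet arrival.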
  * It can be inferred from the standard [1] and the literature [3] that the serial links share the same address range and packets can travel over any of them, so a load-distribution mechanism is required among them.

Lastly, an accuracy comparison of this model against our in-house cycle-accurate HMC model shows that execution time and bandwidth match well between the two (less than 5% difference). We have also adjusted the queue sizes in gem5 (in the Bridges and the DRAM controllers) to obtain reasonable memory-access-time values compared with the cycle-accurate model; nevertheless, some disparities remain, so memory-access-time results should be reported carefully.

After applying the changeset, you can simply run a full-system simulation:

./build/ARM/gem5.opt configs/example/fs.py \
    --mem-type=HMC_2500_x32 --mem-channels=16 \
    --caches --l2cache \
    --cpu-type=timing

Or a traffic-based simulation:

./build/ARM/gem5.opt configs/example/hmctest.py \
    --mem-type=HMC_2500_x32 \
    --mode=RANDOM

I hope that this patch is helpful.

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
