Hi Andreas,

Some brief clarifications before addressing your questions below: I've
validated most of our gem5-gpu memory hierarchy modeling against NVIDIA
Fermi hardware (GTX580 and Tesla C2070) using some reverse engineering.
While I've also tested newer hardware, it will be easier to validate this
gem5 change if we aim to model something close to these same Fermi
baselines. Also, the RubyMemoryController doesn't model things like
separate data, command, and core frequency, or an open-page policy, so I've
had to do some digging to translate the RubyMemoryController parameters
back to actual parameters - more on this below.


> I am not sure I grok the latency and queue argument still. Adding a larger
> response queue does not increase the latency unless there is also a bunch
> of transactions queued up. Am I missing something?
>

I probably should have been referring to these as "delay queues" rather
than buffers. They do not model actual hardware buffers, but rather they
are meant to model the DRAM controller pipeline (and possibly interconnect)
latencies. GPGPU-Sim often decouples functional simulation from timing
simulation, and this is one of those cases. Unlike gem5's event-driven
simulation which allows scheduling when an access should complete,
GPGPU-Sim puts accesses in these delay queues, which are stepped each DRAM
controller cycle to move the accesses through at predictable latencies.
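
To make the delay-queue idea concrete, here is a minimal Python sketch. It is
only an illustration, not GPGPU-Sim code: the class name, method names, and
the 3-cycle latency are hypothetical. It shows a queue that is stepped once
per controller cycle and releases each access a fixed number of cycles after
it entered, regardless of how many slots the queue has.

```python
from collections import deque


class DelayQueue:
    """Fixed-latency pipeline stage, stepped once per controller cycle."""

    def __init__(self, latency_cycles):
        self.latency = latency_cycles
        self.cycle = 0
        self.entries = deque()  # (ready_cycle, access), in arrival order

    def push(self, access):
        # Every access becomes ready exactly `latency` cycles after entry.
        self.entries.append((self.cycle + self.latency, access))

    def step(self):
        """Advance one cycle and return the accesses completing in it."""
        self.cycle += 1
        done = []
        while self.entries and self.entries[0][0] <= self.cycle:
            done.append(self.entries.popleft()[1])
        return done


q = DelayQueue(latency_cycles=3)  # hypothetical latency
q.push("read A")
for _ in range(3):
    finished = q.step()
print(finished)
```

An event-driven model would instead schedule a single completion event at
the ready time, which is why the same latency can be expressed in gem5
without any queue at all.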


> We can easily set the static pipeline latency to a high value, but 100 ns
> sounds much too large. What is that value based on?
>

This was based on my own empirical testing, and what we've used for
validating gem5-gpu in the past. However, I appreciate the prompt for
review, since I had not thought through how the DRAMCtrl models this
differently than the RubyMemoryController:

First, the RubyMemoryController models a close-page policy with a static
"bank access latency", which is meant to approximate RAS + CAS + bank
access. However, given that row-buffers are large for 8n prefetch, GPU DRAM
controllers probably use an open-page policy, so an average random access
(what I used for validating) would be an open-page row buffer miss and
probably include precharge latency. For this reason, I had pegged
RubyMemoryController zero-load latency between 88 and 104ns depending on
the system being modeled. A row-buffer hit would bring that zero-load
latency down to maybe 20-36ns.

Second, it doesn't address your prompt directly, but it is worth noting
that most GPU memory hierarchies aim for very predictable latencies, which
often means seemingly excessive pipelining. This is to help support
very-near-peak bandwidths where necessary and makes GPU-wide scheduling
more predictable. This is likely a second-order reason why GPU DRAM
controllers have higher than expected zero-load latency.

I'm not sure I can get any more precise with the data I have, but maybe
that is enough to work with?


> Do you have details on the addressing scheme you mention?
>

The GPGPU-Sim config contains the following for the DRAM pin identifiers (R
= row address, C = column address, B = bank address, 0 = data), and the
field sizes are echoed on page 15, table 6 of the Hynix data sheet:

-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS
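
As a quick sanity check on the field sizes, here is a small Python sketch
that reads the mask literally, counts the bits per letter, and pulls one
field out of an address. It is my own illustration: the helper names are
hypothetical, and it deliberately ignores the `dramid@8` part, so it is not
a reimplementation of GPGPU-Sim's address decoder.

```python
MASK = ("00000000.00000000.00000000.00000000."
        "0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS").replace(".", "")


def field_widths(mask):
    """Count how many address bits each letter in the mask claims."""
    widths = {}
    for ch in mask:
        if ch != "0":
            widths[ch] = widths.get(ch, 0) + 1
    return widths


def extract(addr, mask, letter):
    """Gather the bits marked `letter`, most significant bit first."""
    value = 0
    top = len(mask) - 1  # the leftmost mask character is the top bit
    for i, ch in enumerate(mask):
        if ch == letter:
            value = (value << 1) | ((addr >> (top - i)) & 1)
    return value


print(field_widths(MASK))
print(extract(1 << 15, MASK, "B"))
```

Read this way, the mask has 4 bank bits, which is at least consistent with
nbk=16 in the GPGPU-Sim timing options.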

I'm not familiar with the different (sub)cycle interpretations of control
and address signals on particular lines though. Are the pin IDs sufficient,
or are you looking for something more specific here?


> In GDDR5, the command bus is DDR and the data bus QDR. The part you refer
> to is either running at 5.5 Gbps or 6 Gbps according to the data sheet, so
> I do find the 4 Gbps surprising. Are you sure?
>

Yes, to start with I think we should use the 4 Gbps target: We've validated
against the GeForce GTX480 and 580, which use low-end data bus clocks
between 3.2 and 4 Gbps/pin (note: in practice, discrete GPUs take more
liberties to set frequencies as desired, since their tighter coupling with
DRAMs offers more freedom than the clocking agreements required between CPU
chips and their memories). More recent cards use the higher frequencies,
but I have limited testing with them.
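
For reference, here is the arithmetic behind that target as a short Python
sketch. It assumes the ratio of 4 between per-pin data rate and command
clock (as in -dram_data_command_freq_ratio 4 from the GPGPU-Sim config) and
a x32 interface; the function names are hypothetical and the numbers are
per device, not per GPU.

```python
DATA_TO_COMMAND_RATIO = 4  # assumption: -dram_data_command_freq_ratio 4
PINS = 32                  # assumption: x32 mode


def command_clock_mhz(data_rate_mbps, ratio=DATA_TO_COMMAND_RATIO):
    """Command clock implied by a per-pin data rate."""
    return data_rate_mbps // ratio


def tck_ps(clock_mhz):
    """Clock period in picoseconds for a clock given in MHz."""
    return 1_000_000 // clock_mhz


def peak_bandwidth_mbytes_per_s(data_rate_mbps, pins=PINS):
    """Peak data-bus bandwidth of one device in MB/s."""
    return data_rate_mbps * pins // 8


for rate in (3200, 4000):  # the 3.2 and 4 Gbps/pin endpoints
    ck = command_clock_mhz(rate)
    print(rate, ck, tck_ps(ck), peak_bandwidth_mbytes_per_s(rate))
```

At 4 Gbps/pin this gives a 1 GHz command clock, i.e. tCK = 1 ns.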


> The burst length I am talking about is the DRAM burst length, which for
> GDDR5 is 8 beats (so with a x32 mode that would be 32 bytes per burst). I
> do not see this specified anywhere in the line you sent, and therefore I
> am curious if it is omitted completely (DRAMSim2 for example makes
> dangerous assumptions here).
>

Yep, I follow but wasn't sure if/how GPU vendors might play with this for
streaming/interleaving purposes. On closer inspection of the Hynix doc, it
says burst length is "8 only" on page 5. I also did a spot check of a few
more recent GDDR5 data sheets, and they show the same thing.
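
A small Python sketch of what a fixed burst length of 8 implies, for
illustration only. The helper names are hypothetical, and the 128-byte
access size and the 4 Gbps/pin rate in the example are assumptions chosen
to show the calculation.

```python
def burst_bytes(interface_bits, burst_length=8):
    """Bytes transferred by one DRAM burst."""
    return interface_bits * burst_length // 8


def burst_time_ps(data_rate_mbps, burst_length=8):
    """Data-bus occupancy of one burst, one beat per bit time."""
    return burst_length * 1_000_000 // data_rate_mbps


def bursts_per_access(access_bytes, interface_bits, burst_length=8):
    """Number of bursts needed to move one access (rounded up)."""
    per_burst = burst_bytes(interface_bits, burst_length)
    return -(-access_bytes // per_burst)  # ceiling division


print(burst_bytes(32))             # x32 mode
print(burst_bytes(16))             # x16 mode
print(burst_time_ps(4000))         # assumed 4 Gbps/pin
print(bursts_per_access(128, 32))  # assumed 128-byte access
```

With tCK = 1 ns, the 2 ns burst time at 4 Gbps/pin is two command clocks,
which lines up with CCD=2 in the GPGPU-Sim timing options.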


  Joel





>
> On 14/10/2014 00:22, "Joel Hestness via gem5-dev" <gem5-dev@gem5.org>
> wrote:
>
> >Hi Andreas,
> >
> >
> >> Thanks. I really do not understand the return queue argument. Why on
> >> earth would you need such a large return queue? Surely the agent making
> >> the requests (the GPU in this case) should have allocated space for the
> >> response, no?
> >>
> >
> >Good question. I believe that buffer is just to get the appropriate
> >overall minimum latency for memory accesses (i.e. GPGPU-Sim does some
> >level of functional controller modeling, and the buffer is just for
> >timing). If there is a way to add arbitrary latency to the DRAMCtrl, it
> >would be nice to aim for no-load latency of roughly 100ns (from when the
> >controller receives the access to when a response is returned to the Ruby
> >network). Also, if I have a chance to test out Nilay's patch, I could tune
> >this setting.
> >
> >> Concerning the configuration, what is the assumed clock speed (tCK), and
> >> is it operated as a x16 or x32 part? Is the burst length implicit in
> >> your configuration (or is 8 the default)?
> >>
> >
> >The memory modeled in GPGPU-Sim is x32 based on the addressing scheme.
> >That's where I'd start the mode.
> >
> >In GDDR5, the core frequency is 1/4 of command frequency (and 1/2 in
> >GDDR3?). A common channel frequency is 4GHz, resulting in effective
> >channel bandwidth of 32GT/s. So, a starting baseline of tCK = 1ns should
> >be sufficient. Does that sound right?
> >
> >I believe burst length is typically 8 and can be much longer. In practice,
> >this depends highly on address hashing/interleaving across many
> >controllers in various different GPUs, so 8 should be a sufficient
> >baseline.
> >
> >
> >  Joel
> >
> >
> >
> >On 10/13/14, 10:28 PM, "Joel Hestness via gem5-dev" <gem5-dev@gem5.org>
> >> wrote:
> >>
> >> >Hi Andreas,
> >> >  Sure thing. We try to closely replicate the parameters used in
> >> >GPGPU-Sim v3.2.2, which are specified in the Hynix datasheet here:
> >> >http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf.
> >> >
> >> >  Here are relevant excerpts from the GPGPU-Sim GTX480 config file
> >> >(attached):
> >> >
> >> ># The DRAM return queue and the scheduler queue together should provide
> >> ># buffer to sustain the memory level parallelism to tolerate DRAM latency
> >> ># To allow 100% DRAM utility, there should at least be enough buffer to
> >> ># sustain the minimum DRAM latency (100 core cycles).  I.e.
> >> >#   Total buffer space required = 100 x 924MHz / 700MHz = 132
> >> >-gpgpu_frfcfs_dram_sched_queue_size 16
> >> >-gpgpu_dram_return_queue_size 116
> >> >
> >> >-dram_data_command_freq_ratio 4  # GDDR5 is QDR
> >> >-gpgpu_dram_timing_opt "nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:
> >> >                        CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2"
> >> >
> >> >  If anything needs clarification, I'm happy to help sort it out. Just
> >> >let me know.
> >> >
> >> >  Thanks!
> >> >  Joel
> >> >
> >> >
> >> >
> >> >On Mon, Oct 13, 2014 at 4:11 PM, Andreas Hansson via gem5-dev <
> >> >gem5-dev@gem5.org> wrote:
> >> >
> >> >> Hi Joel,
> >> >>
> >> >> I am happy to spend the 5 minutes creating a GDDR5 configuration. Do
> >> >> you have any specific data sheet you would like to capture?
> >> >>
> >> >> Andreas
> >> >>
> >> >> On 10/13/14, 10:09 PM, "Joel Hestness via gem5-dev"
> >><gem5-dev@gem5.org>
> >> >> wrote:
> >> >>
> >> >> >Hi guys,
> >> >> >
> >> >> >
> >> >> >> Thanks for the clarification. I believe the RubyMemoryController is
> >> >> >> completely Pareto dominated by the vanilla DRAMCtrl module, but if
> >> >> >> there is any specific feature/setting missing I would be keen to
> >> >> >> know.
> >> >> >>
> >> >> >> If possible I would like to make sure we use the same controller as
> >> >> >> a default for all timing simulations (even if the other one would
> >> >> >> be maintained as a fallback).
> >> >> >
> >> >> >
> >> >> >I'd like to second the desire to have a simple replacement baseline
> >> >> >that performs at least as well as the RubyMemoryController in
> >> >> >most/all cases: gem5-gpu now has more than 100 users, and as far as I
> >> >> >know, we are all using Ruby and thus the RubyMemoryController. The
> >> >> >RubyMemoryController is pretty simple to configure similarly to DDR3
> >> >> >or GDDR5 and to interpret results. It performs surprisingly close to
> >> >> >some GPU hardware. If this controller goes away, I (and I'm sure
> >> >> >other gem5-gpu users) would prefer to have something that is known to
> >> >> >perform as well and is also easy to configure.
> >> >> >
> >> >> >I think we (the gem5-gpu crew) are fine with the RubyMemoryController
> >> >> >going away eventually. However, given that there isn't currently a
> >> >> >GDDR-like DRAMCtrl configuration in gem5, I'd like to second Nilay
> >> >> >and Brad that we offer users sufficient time to prepare for
> >> >> >RubyMemoryController removal. We will need to adapt our heterogeneous
> >> >> >Ruby coherence protocols, and other users have their own protocols
> >> >> >they'd need to adapt as well.
> >> >> >
> >> >> >
> >> >> >  Joel
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >> On 10/13/14, 9:01 PM, "Nilay Vaish via gem5-dev"
> >><gem5-dev@gem5.org>
> >> >> >> wrote:
> >> >> >>
> >> >> >> >On Mon, 13 Oct 2014, Andreas Hansson via gem5-dev wrote:
> >> >> >> >
> >> >> >> >> Hi all,
> >> >> >> >>
> >> >> >> >> With Nilay's recent improvements to Ruby I would like to
> >> >> >> >> understand if there is any point in still having the
> >> >> >> >> RubyMemoryControl, or if we should just clean things up a bit
> >> >> >> >> and remove it. I would think the best way forward is to clean
> >> >> >> >> up the integration of Ruby and classic and ensure that there is
> >> >> >> >> no duplicated functionality beyond what is strictly necessary.
> >> >> >> >>
> >> >> >> >> Nilay, do you think this would make sense? Is there anyone else
> >> >> >> >> with any opinions in this matter?
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >I was in favor of dropping RubyMemoryControl.  But I had some
> >> >> >> >discussion with Brad Beckmann from AMD.  Since AMD has some
> >> >> >> >infrastructure in place already, they would like to retain
> >> >> >> >RubyMemoryControl for the time being.
> >> >> >> >
> >> >> >> >I suggest that we retain the memory controller code in ruby for
> >> >> >> >another six months or so, and then we will drop it.  In the mean
> >> >> >> >time, we will update the interface so that ruby protocols can use
> >> >> >> >classic memory controller.  The code for this is already on the
> >> >> >> >reviewboard.  Over this six month period, I hope, most users would
> >> >> >> >have switched to using classic controller.
> >> >> >> >
> >> >> >> >Thanks
> >> >> >> >Nilay
> >> >> >> >_______________________________________________
> >> >> >> >gem5-dev mailing list
> >> >> >> >gem5-dev@gem5.org
> >> >> >> >http://m5sim.org/mailman/listinfo/gem5-dev
> >> >> >>
> >> >> >>
> >> >> >> -- IMPORTANT NOTICE: The contents of this email and any attachments
> >> >> >> are confidential and may also be privileged. If you are not the
> >> >> >> intended recipient, please notify the sender immediately and do not
> >> >> >> disclose the contents to any other person, use it for any purpose,
> >> >> >> or store or copy the information in any medium.  Thank you.
> >> >> >>
> >> >> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
> >> >> >> Registered in England & Wales, Company No:  2557590
> >> >> >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
> >> >> >> 9NJ, Registered in England & Wales, Company No:  2548782
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >--
> >> >> >  Joel Hestness
> >> >> >  PhD Student, Computer Architecture
> >> >> >  Dept. of Computer Science, University of Wisconsin - Madison
> >> >> >  http://pages.cs.wisc.edu/~hestness/
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
>
>
>

-- 
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
