On Fri, Jul 8, 2016 at 9:40 AM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>
>> > 1) How do we run at bandwidth peak on new architectures like Cori or
>> > Aurora?
>>
>> Huh, there is a how here, not a why?
>>
>> > Patrick and Rich have good suggestions here. Karl and Rich showed some
>> > promising numbers for KNL at the PETSc meeting.
>> >
>> > Future systems from multiple vendors basically move from a 2-tier memory
>> > hierarchy of shared LLC and DRAM to a 3-tier hierarchy of fast memory
>> > (e.g. HBM), regular memory (e.g. DRAM), and slow (likely nonvolatile)
>> > memory on a node.
>>
>> Jeff,
>>
>> Would Intel sell me a system that had essentially no regular memory
>> (DRAM, which is too slow anyway) and no slow memory (which is absurdly too
>> slow)? What cost savings would I get in $ and power usage compared to, say,
>> what is going into Theta? 10% and 20%, 5% and 30%, 5% and 5%? If it is a
>> significant savings, then get the cut-down machine; if it is insignificant,
>> then realize the cost of not using it (the DRAM you paid so little for) is
>> insignificant and not worth worrying about, just like cruise control when
>> you don't use the highway. Actually, I could use the DRAM to store the
>> history needed for the adjoints, so maybe it is OK to keep, but it is surely
>> not useful for data that is continuously involved in the computation.
>>
>
> *Disclaimer: All of the following data is pulled off of the Internet,
> which in some cases is horribly unreliable. My comments are strictly for
> academic discussion and not meant to be authoritative or have any influence
> on purchasing or design decisions. Do not equate quoted TDP to measured
> power during any workload, or assume that different measurements can be
> compared directly.*
>
> Your thinking is in line with
> http://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/
> ...
>
> Intel sells KNL packages as parts (
> http://ark.intel.com/products/family/92650/Intel-Xeon-Phi-Product-Family-x200#@Server)
> that don't have any DRAM in them, just MCDRAM. It's the decision of the
> integrator what goes into the system, which of course is correlated to what
> the intended customer wants. While you might not need a node with DRAM,
> many users do, and the systems that DOE buys are designed to meet the needs
> of their broad user base.
>
> I don't know if KNL is bootable with no DRAM at all - this likely has
> more to do with what the motherboard, BIOS, etc. expect than with the
> processor package itself. However, the KNL all-to-all mode addresses the
> case where DRAM channels are underpopulated (with fully populated channels,
> one should use quadrant, hemisphere, SNC-2 or SNC-4), so if DRAM is
> necessary, you should be able to boot it with only one channel populated.
> Of course, if you do this, you'll get 1/6 of the DDR4 bandwidth.
>

Just FYI: I have run on KNL systems with no DRAM, only MCDRAM. This was on
an internal lab machine and not a commercially available system, but I see
no reason why one couldn't buy systems this way.

--Richard

> As to the question of DRAM power, there is a lot of detailed information
> available (e.g.
> https://www.micron.com/~/media/documents/products/power-calculator/ddr4_power_calc.xlsm,
> https://www.micron.com/~/media/Documents/Products/Technical%20Note/DRAM/TN4603.pdf,
> https://lenovopress.com/lp0083.pdf) but since I am lazy, I'll use the
> numbers reported on
> http://www.tomshardware.com/reviews/intel-core-i7-5960x-haswell-e-cpu,3918-13.html
> for client memory (i.e. not server memory, hence probably not providing
> ECC, but ECC doesn't change power consumption much), which works out to
> 0.37 W/GB for DDR4-2133, hence 71 W for 192 GB [
> http://www.nextplatform.com/2015/11/30/inside-future-knights-landing-xeon-phi-systems/].
> That 71 W is ~1/3 of the processor package power (215 W). The network
> adapter draws some power, and the cables and switches (especially optics)
> are a nontrivial power draw. So DRAM is at most 25% of the node power, and
> perhaps ~17% of system power based upon what I can derive from Shaheen II.
>
> Shaheen II Cray XC40
> 1.96 MW = 6174 nodes * (2 sockets * 135 W/socket + 128 GB * 0.37 W/GB)
> 2.83 MW total
> 1.96 MW / 2.83 MW = 69% from CPU+DRAM
>
> Again, *these are not the exact numbers* but what I can derive from
> https://www.top500.org/system/178515,
> https://www.hpc.kaust.edu.sa/content/shaheen-ii and
> http://ark.intel.com/products/81060/Intel-Xeon-Processor-E5-2698-v3-40M-Cache-2_30-GHz
> .
>
> Back to the higher level analysis, what is unfortunate about DRAM is that
> it needs power to hold data even if the data isn't used, because it is not
> persistent. I don't know how well it powers down when the physical memory
> isn't mapped, but it seems that power is not gated today [
> http://digitalpiglet.org/research/sion2014socc.pdf]. The advantage of
> nonvolatile memory is that it doesn't require power when not being
> accessed, whether or not the data is preserved.
>
> I suspect that nonvolatile memory (NVM) is the right place to put your
> adjoint matrices, provided the NVM bandwidth is sufficient.
>
> *Disclaimer: All of these are academic comments. Do not use them to try
> to influence others or make any decisions. Do your own research and be
> skeptical of everything I derived from the Internet.*
>
> Jeff
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
>
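For anyone who wants to redo the DRAM power arithmetic quoted above, here is a
minimal Python sketch. It uses only the rough numbers from the thread (quoted
TDP and the 0.37 W/GB client-memory estimate), so the same caveats apply. The
function and variable names are made up for illustration. The last step is an
assumption about how the ~17% figure was reached (the KNL DRAM share of
CPU+DRAM power multiplied by the Shaheen II CPU+DRAM share of system power);
the thread does not spell that step out.

```python
# Back-of-the-envelope check of the DRAM power figures quoted in the thread.
# Inputs are rough Internet-derived numbers (TDP, client-memory W/GB),
# not measurements.

W_PER_GB = 0.37  # DDR4-2133 estimate quoted in the thread


def dram_watts(gigabytes, w_per_gb=W_PER_GB):
    """Estimated DRAM power draw in watts for a given capacity."""
    return gigabytes * w_per_gb


def share(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


# KNL node: 192 GB of DDR4 next to a 215 W processor package.
knl_dram = dram_watts(192)
knl_package = 215.0
# Upper bound on the DRAM share: network, cables, etc. are ignored here.
knl_dram_share = share(knl_dram, knl_package + knl_dram)

# Shaheen II: 6174 nodes, 2 sockets at 135 W each, 128 GB per node,
# 2.83 MW for the whole system.
shaheen_cpu_dram = 6174 * (2 * 135.0 + dram_watts(128))
shaheen_total = 2.83e6
cpu_dram_share = share(shaheen_cpu_dram, shaheen_total)

# Assumption (not stated in the thread): scale the KNL DRAM share by the
# Shaheen II CPU+DRAM share to estimate DRAM as a fraction of system power.
knl_dram_system_share = knl_dram_share * cpu_dram_share

if __name__ == "__main__":
    print(f"KNL DRAM power:               {knl_dram:.0f} W")
    print(f"DRAM share of CPU+DRAM (KNL): {knl_dram_share:.1%}")
    print(f"Shaheen II CPU+DRAM power:    {shaheen_cpu_dram / 1e6:.2f} MW")
    print(f"CPU+DRAM share of system:     {cpu_dram_share:.1%}")
    print(f"DRAM share of system (est.):  {knl_dram_system_share:.1%}")
```

Under these inputs the sketch lands on roughly 71 W, 25%, 1.96 MW, 69% and
17%, matching the figures in the quoted message.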