On Thu, Mar 13, 2014 at 09:04:52AM +0100, Gerd Hoffmann wrote:
>> Should we just assert((ram_size >> 10) < 0x80000000), and officially
>> limit guests to < 2T ?
>
> No. Not fully sure what reasonable behavior would be in case more than
> 2T are present. I guess either not generating type16 entries at all or
> simply fill in the maximum value we can represent.

Well, there's an "extended maximum capacity" field available starting
with smbios v2.7, which is a uint64_t counting bytes. Bumping the few
other types up to 2.7 shouldn't be too onerous, but I have no idea how
well the various currently supported OSs would react to smbios suddenly
going v2.7...

> > Then, a type20 node is assigned to the sub-4G portion of the first
> > Type17 "device", and another type20 node is assigned to the over-4G
> > portion of the same.
> >
> > From then on, type20 nodes correspond to the rest of the 16G-or-less
> > type17 devices pretty much on a 1:1 basis.
>
> Hmm, not sure why type20 entries are handled the way they are. I think
> it would make more sense to have one type20 entry per e820 ram entry,
> similar to type19.

Type20 entries have pointers to type17 (memory_device_handle) and type19
(memory_array_mapped_address_handle). Which, if you turn it upside down,
could be interpreted as "every type17 dimm needs (at least) one type20
device mapped address to point at it".

> > If the e820 table will contain more than just two E820_RAM entries,
> > and therefore we'll have more than the two Type19 nodes on the bottom
> > row, what are the rules for extending the rest of the figure
> > accordingly (i.e. how do we hook together more Type17 and Type20 nodes
> > to go along with the extra Type19 nodes) ?
>
> See above for type19+20. type17 represents the dimms, so where the
> memory is actually mapped doesn't matter there. Let's simply sum up all
> memory, then split into 16g pieces and create a type17 entry for each
> piece. At least initially.

That's pretty much what happens now.
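
A rough sketch of the "sum up all memory, then split into 16g pieces"
rule, for illustration only. The names GiB, MAX_DIMM_SIZE, dimm_count()
and dimm_size() are made up here and are not taken from the actual code;
the 16G cap is the per-type17 limit discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define GiB           (1ULL << 30)
#define MAX_DIMM_SIZE (16 * GiB)   /* assumed cap per type17 entry */

/* Hypothetical helper: number of type17 entries needed for ram_size bytes. */
static unsigned dimm_count(uint64_t ram_size)
{
    /* Round up, so a trailing partial piece still gets its own entry. */
    return (unsigned)((ram_size + MAX_DIMM_SIZE - 1) / MAX_DIMM_SIZE);
}

/* Hypothetical helper: size in bytes of the i-th piece.
 * Every piece is 16G except possibly the last one. */
static uint64_t dimm_size(uint64_t ram_size, unsigned i)
{
    uint64_t start, left;

    assert(i < dimm_count(ram_size));
    start = (uint64_t)i * MAX_DIMM_SIZE;
    left = ram_size - start;
    return left < MAX_DIMM_SIZE ? left : MAX_DIMM_SIZE;
}
```

With ram_size = 40G (below_4g and above_4g summed), dimm_count() gives 3
and the pieces come out as 16G, 16G and 8G.
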
If we decide to use e820 instead of simply (below_4g, above_4g), I'd
like to add some sort of assertion that would alert anyone who might
start adding extra entries into e820 beyond the current two (below_4g
and above_4g) :)

> As further improvement we could make the dimm size configurable, so if
> you have a 4 node numa machine with 4g ram on each node you can present
> 4 virtual 4g ram dimms to the guest instead of a single 16g dimm. But
> that is clearly beyond the scope of the initial revision ...

Minimum number of largest-possible power-of-two dimms per node, given
the size of RAM assigned to each node. Then we'd basically just
replicate the figure laterally, one instance per node (perhaps keeping
a common T16 on top, but having one T19 at the bottom per node, and
one T17,T20 pair per DIMM):

                           t16
<------------------------------------------------------->
t17 t17 ... t17   t17 t17 ... t17   ...   t17 t17 ... t17
t20 t20 ... t20   t20 t20 ... t20   ...   t20 t20 ... t20
<------------->   <------------->         <------------->
      t19               t19                     t19
    (node 0)          (node 1)                (node N)

Would the 4G boundary issue still occur on a NUMA system (i.e., would
node 0 have two t19s, and two t20s for the first t17, just like my
current picture)?

Do NUMA systems even have (or need) a smbios table ? :)

But I agree, this shouldn't have to be sorted out right away :)

--Gabriel
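
For reference, the "minimum number of largest-possible power-of-two
dimms per node" rule could be sketched roughly as below. This is only
an illustration under assumptions: node_dimms() and its parameters are
hypothetical names, and the max_dimm cap (e.g. the 16G limit discussed
in the thread) is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split node_ram bytes into power-of-two DIMMs,
 * largest first, none bigger than max_dimm. Writes the sizes into
 * sizes[] (which holds up to cap entries) and returns how many DIMMs
 * were produced. */
static size_t node_dimms(uint64_t node_ram, uint64_t max_dimm,
                         uint64_t *sizes, size_t cap)
{
    size_t n = 0;

    /* The cap must itself be a non-zero power of two. */
    assert(max_dimm != 0 && (max_dimm & (max_dimm - 1)) == 0);

    while (node_ram != 0) {
        uint64_t dimm = max_dimm;

        /* Shrink to the largest power of two that still fits. */
        while (dimm > node_ram) {
            dimm >>= 1;
        }
        assert(n < cap);
        sizes[n++] = dimm;
        node_ram -= dimm;
    }
    return n;
}
```

Under these assumptions a 4G node yields a single 4G DIMM, a 6G node
yields 4G + 2G, and a 40G node with a 16G cap yields 16G + 16G + 8G.
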