On Thu, Mar 13, 2014 at 09:04:52AM +0100, Gerd Hoffmann wrote:
>> Should we just assert((ram_size >> 10) < 0x80000000), and officially
>> limit guests to < 2T ?
> No.  Not fully sure what reasonable behavior would be in case more than
> 2T are present.  I guess either not generating type16 entries at all or
> simply fill in the maximum value we can represent.

Well, there's an "extended maximum capacity" field available starting
with smbios v2.7, which is a uint64_t counting bytes. Bumping the few
other types up to 2.7 shouldn't be too onerous, but I have no idea how
well the various currently supported OSs would react to smbios suddenly
going v2.7...
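
For illustration, filling in the extended field could look roughly like
the sketch below. It assumes the v2.7 convention that a 32-bit maximum
capacity of 0x80000000 means "look at the extended field instead"; the
struct and function names are made up for this example and are not
QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the type16 fields discussed here. */
struct t16_capacity {
    uint32_t maximum_capacity;          /* in KiB, or the sentinel below */
    uint64_t extended_maximum_capacity; /* in bytes, used with the sentinel */
};

/* Sentinel meaning "capacity is in the extended field" (2T in KiB). */
#define T16_CAPACITY_USE_EXTENDED 0x80000000u

static struct t16_capacity t16_encode_capacity(uint64_t ram_size)
{
    struct t16_capacity c = { 0, 0 };
    uint64_t size_kb = ram_size >> 10;

    if (size_kb < T16_CAPACITY_USE_EXTENDED) {
        /* Below 2T: the legacy 32-bit field can represent the size. */
        c.maximum_capacity = (uint32_t)size_kb;
    } else {
        /* 2T and up: flag the legacy field, store bytes in the new one. */
        c.maximum_capacity = T16_CAPACITY_USE_EXTENDED;
        c.extended_maximum_capacity = ram_size;
    }

    /* The extended field is non-zero exactly when the sentinel is set. */
    assert((c.maximum_capacity == T16_CAPACITY_USE_EXTENDED) ==
           (c.extended_maximum_capacity != 0));
    return c;
}
```

This keeps guests under 2T on the old encoding, so only larger guests
would depend on the OS understanding the v2.7 field.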

> > Then, a type20 node is assigned to the sub-4G portion of the first
> > Type17 "device", and another type20 node is assigned to the over-4G
> > portion of the same.
> > 
> > From then on, type20 nodes correspond to the rest of the 16G-or-less
> > type17 devices pretty much on a 1:1 basis.
> 
> Hmm, not sure why type20 entries are handled the way they are.  I think
> it would make more sense to have one type20 entry per e820 ram entry,
> similar to type19.

Type20 entries have pointers to type17 (memory_device_handle) and
type19 (memory_array_mapped_address_handle). Which, if you turn it
upside down, could be interpreted as "every type 17 dimm needs (at
least) a type20 device mapped address to point at it".
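
To make the two back-pointers concrete, here is a rough sketch of a
v2.1-sized type20 record and a helper that fills it in. The field order
follows my reading of the spec; the helper name and the handle values
used with it are invented for the example, and the packed attribute
assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

struct smbios_header {
    uint8_t type;
    uint8_t length;
    uint16_t handle;
} __attribute__((packed));

/* Type20 (memory device mapped address), without the v2.7 extensions. */
struct smbios_type20 {
    struct smbios_header header;
    uint32_t starting_address;                   /* in KiB */
    uint32_t ending_address;                     /* in KiB */
    uint16_t memory_device_handle;               /* points at a type17 */
    uint16_t memory_array_mapped_address_handle; /* points at a type19 */
    uint8_t partition_row_position;
    uint8_t interleave_position;
    uint8_t interleaved_data_depth;
} __attribute__((packed));

/* Hypothetical helper: build one type20 tying a type17 to a type19. */
static struct smbios_type20 t20_make(uint16_t handle, uint16_t t17_handle,
                                     uint16_t t19_handle,
                                     uint32_t start_kb, uint32_t end_kb)
{
    struct smbios_type20 t = { { 0, 0, 0 }, 0, 0, 0, 0, 0, 0, 0 };

    assert(start_kb <= end_kb);
    t.header.type = 20;
    t.header.length = sizeof(t);
    t.header.handle = handle;
    t.starting_address = start_kb;
    t.ending_address = end_kb;
    t.memory_device_handle = t17_handle;
    t.memory_array_mapped_address_handle = t19_handle;
    t.partition_row_position = 1; /* assumed: device is not partitioned */
    return t;
}
```

A type17 that straddles the 4G boundary would then get two such
records, both carrying the same memory_device_handle but different
type19 handles.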

> > If the e820 table will contain more than just two E820_RAM entries,
> > and therefore we'll have more than the two Type19 nodes on the bottom
> > row, what are the rules for extending the rest of the figure
> > accordingly (i.e. how do we hook together more Type17 and Type20 nodes
> > to go along with the extra Type19 nodes) ?
> 
> See above for type19+20.  type17 represents the dimms, so where the
> memory is actually mapped doesn't matter there.  Let's simply sum up all
> memory, then split into 16g pieces and create a type17 entry for each
> piece.  At least initially.

That's pretty much what happens now. If we decide to use e820 instead
of simply (below_4g, above_4g), I'd like to add some sort of assertion
that would alert anyone who might start adding extra entries into e820
beyond the current two (below_4g and above_4g) :)
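
As a rough sketch of the "sum up, then split into 16G pieces" rule
(helper names are hypothetical, and the total is assumed to be
below_4g + above_4g):

```c
#include <assert.h>
#include <stdint.h>

/* Largest size a single type17 "dimm" is given in this scheme. */
#define T17_MAX_DIMM_SIZE (16ULL << 30)

/* Number of type17 entries needed to cover ram_size bytes. */
static uint64_t t17_count(uint64_t ram_size)
{
    return ram_size / T17_MAX_DIMM_SIZE +
           (ram_size % T17_MAX_DIMM_SIZE != 0);
}

/* Size in bytes of the index-th type17 entry; only the last may be short. */
static uint64_t t17_size(uint64_t ram_size, uint64_t index)
{
    uint64_t count = t17_count(ram_size);

    assert(index < count);
    if (index < count - 1) {
        return T17_MAX_DIMM_SIZE;
    }
    return ram_size - (count - 1) * T17_MAX_DIMM_SIZE;
}
```

With 40G total this yields three entries of 16G, 16G and 8G,
regardless of where the memory ends up mapped.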

> As further improvement we could make the dimm size configurable, so if
> you have a 4 node numa machine with 4g ram on each node you can present
> 4 virtual 4g ram dimms to the guest instead of a single 16g dimm.  But
> that is clearly beyond the scope of the initial revision ...

Minimum number of largest-possible power-of-two dimms per node, given
the size of RAM assigned to each node. Then we'd basically just
replicate the figure laterally, one instance per node (perhaps keeping
a common T16 on top, but having one T19 at the bottom per node, and
one T17,T20 pair per DIMM):

                       t16
<------------------------------------------------------->
t17 t17 ... t17   t17 t17 ... t17   ...   t17 t17 ... t17
t20 t20 ... t20   t20 t20 ... t20   ...   t20 t20 ... t20
<------------->   <------------->         <------------->
      t19              t19                      t19
    (node 0)         (node 1)                 (node N)
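
One way to read "minimum number of largest-possible power-of-two
dimms" is to take one dimm per set bit in the node's RAM size, largest
first. A minimal sketch under that assumption (function names are
hypothetical, and no per-dimm size cap is applied):

```c
#include <assert.h>
#include <stdint.h>

/* Number of power-of-two dimms needed for a node: one per set bit. */
static unsigned node_dimm_count(uint64_t node_size)
{
    unsigned count = 0;

    while (node_size) {
        node_size &= node_size - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Size in bytes of the index-th dimm of a node, largest dimm first. */
static uint64_t node_dimm_size(uint64_t node_size, unsigned index)
{
    int bit;

    assert(index < node_dimm_count(node_size));
    for (bit = 63; bit >= 0; bit--) {
        uint64_t size = 1ULL << bit;

        if (node_size & size) {
            if (index == 0) {
                return size;
            }
            index--;
        }
    }
    return 0; /* not reached when the assertion above holds */
}
```

A node with 4G would get a single 4G dimm, while a node with 6G would
get a 4G and a 2G dimm, each with its own T17,T20 pair.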

Would the 4G boundary issue still occur on a NUMA system (i.e., would
node 0 have two t19s, and two t20s for the first t17, just like my
current picture)? Do NUMA systems even have (or need) a smbios table ? :)

But I agree, this shouldn't have to be sorted out right away :)

--Gabriel
