Re: [Qemu-devel] SMBIOS vs. NUMA (was: Build full type 19 tables)

Gerd Hoffmann Thu, 13 Mar 2014 01:06:07 -0700

>  ----------------------------------------------------------------------------
> |                               Type16  0x1000                               |
>  ----------------------------------------------------------------------------
>  ^             ^               ^           ^                    ^           ^
>  |             |               |           |                    |           |
>  |         ----+---        ----+----   ----+----       ---------+--------   |
>  |        | Type17 |      | Type17  | | Type17  |     | Type17           |  |
>  |        | 0..16G |      | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
>  |        | 0x1100 |      | 0x1101  | | 0x1102  |     | 0x110<N>         |  |
>  |         --------        ---------   ---------       ------------------   |
>  |          ^   ^              ^           ^                    ^           |
>  |          |   |              |           |                    |           |
>  |       +--+   +--+           |           |                    |           |
>  |       |         |           |           |                    |           |
>  |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
>  |  | Type20 | | Type20 | | Type20  | | Type20  |     | Type20           |  |
>  |  | 0..4G  | | 4..16G | | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
>  |  | 0x1400 | | 0x1401 | | 0x1402  | | 0x1403  |     | 0x140<N+1>       |  |
>  |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
>  |       |         |           |           |                    |           |
>  |       |         |           +-------+   |   +----------------+           |
>  |       |         +----------------+  |   |   |                            |
>  |       |                          |  |   |   |                            |
>  |       v                          v  v   v   v                            |
>  |   --------                      --------------                           |
>  |  | Type19 |                    | Type19       |                          |
>  |  | 0..4G  |                    | 4G..ram_size |                          |
>  |  | 0x1300 |                    | 0x1301       |                          |
>  |   ----+---                      ------+-------                           |
>  |       |                               |                                  |
>  +-------+                               +----------------------------------+


Very nice.

> Here are some of the limit values, and some questions and thoughts:
> 
> - Type16 max == 2T - 1K;
> 
> Should we just assert((ram_size >> 10) < 0x80000000), and officially
> limit guests to < 2T ?

No.  I'm not fully sure what reasonable behavior would be in case more
than 2T are present.  I guess either not generating type16 entries at
all, or simply filling in the maximum value we can represent.
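
A minimal sketch of the "fill in the maximum value" option, in Python for
brevity rather than QEMU's C.  The function name is hypothetical, and the
limit is simply the one quoted above (anything below 0x80000000 KB, i.e.
2T - 1K); this is not taken from actual QEMU code.

```python
# Sketch only: the clamp policy is an assumption, one of the two options
# discussed above. The limit comes from the quoted figure of 2T - 1K.
TYPE16_MAX_KB = 0x7FFFFFFF  # largest value below 0x80000000 KB


def type16_capacity_kb(ram_size_bytes):
    """Return the capacity in KB, clamped to the largest representable value."""
    return min(ram_size_bytes >> 10, TYPE16_MAX_KB)
```

With this policy a guest with 4T of RAM would simply report 2T - 1K in
type16 instead of tripping an assert.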

> - Type17 max == 32G - 1M;
> 
> This explains why we create Type17 device tables in increments of 16G,
> since that's the largest possible value that's a nice, round power of
> two :)

Yes.

> - Type19 & Type20 max == 4T - 1K;
> 
> If we limit ourselves to what Type16 can currently represent (2T),
> this should be plenty enough to work with...

And I think there is the option to simply create multiple type19+20
entries to cover more.

> So, currently, we split available ram into blobs of up to 16G each,
> and assign each blob a Type17 node.
> 
> We then split available ram into <4G and 4G+, and create up to two
> Type19 nodes for these two areas.

Yes.

> Now, re. e820: currently, the expectation is that the (up to) two
> Type19 nodes in the above figure correspond to (up to) two entries of
> type E820_RAM in the e820 table.

Yes.  If more e820 ram entries show up one day, additional type19 nodes
should be generated (i.e. basically simply loop over the e820 table).
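
Looping over the e820 table could look roughly like the sketch below
(Python for brevity, not QEMU's C).  The tuple layout, the function name
and the dict keys are made up for illustration; the 0x1300 handle base is
the one from the figure, and the 4T - 1K limit of type19 is not handled.

```python
# Sketch only: table layout and names are hypothetical, not QEMU's.
E820_RAM = 1  # e820 type code for usable RAM


def build_type19(e820_table, base_handle=0x1300):
    """Create one type19 entry per E820_RAM range in the table.

    e820_table is a list of (address, length, type) tuples, in bytes.
    """
    entries = []
    for addr, length, etype in e820_table:
        if etype != E820_RAM or length == 0:
            continue  # skip reserved ranges and empty entries
        entries.append({
            "handle": base_handle + len(entries),
            "start_kb": addr >> 10,
            # the end address is inclusive, so use the last byte of the range
            "end_kb": (addr + length - 1) >> 10,
        })
    return entries
```

With a table holding one RAM range below 4G, one reserved range and one
RAM range above 4G, this yields exactly the two nodes 0x1300 and 0x1301
from the figure.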

> Then, a type20 node is assigned to the sub-4G portion of the first
> Type17 "device", and another type20 node is assigned to the over-4G
> portion of the same.
> 
> From then on, type20 nodes correspond to the rest of the 16G-or-less
> type17 devices pretty much on a 1:1 basis.

Hmm, not sure why type20 entries are handled the way they are.  I think
it would make more sense to have one type20 entry per e820 ram entry,
similar to type19.

> If the e820 table will contain more than just two E820_RAM entries,
> and therefore we'll have more than the two Type19 nodes on the bottom
> row, what are the rules for extending the rest of the figure
> accordingly (i.e. how do we hook together more Type17 and Type20 nodes
> to go along with the extra Type19 nodes) ?

See above for type19+20.  type17 represents the dimms, so where the
memory is actually mapped doesn't matter there.  Let's simply sum up all
memory, then split it into 16G pieces and create a type17 entry for each
piece.  At least initially.

As a further improvement we could make the dimm size configurable, so if
you have a 4 node NUMA machine with 4G ram on each node you can present
4 virtual 4G ram dimms to the guest instead of a single 16G dimm.  But
that is clearly beyond the scope of the initial revision ...
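
A rough sketch of the sum-and-split approach, with the dimm size as a
parameter to show what the later improvement might look like (Python for
brevity, not QEMU's C).  All names are hypothetical; the 0x1100 handle
base is the one from the figure and sizes are reported in MB.

```python
# Sketch only: function name, parameters and dict keys are hypothetical.
GiB = 1 << 30


def split_into_dimms(region_sizes, dimm_size=16 * GiB, base_handle=0x1100):
    """Sum all RAM regions, then cut the total into dimm_size pieces.

    Where the memory is mapped is ignored on purpose; only the total counts.
    """
    total = sum(region_sizes)
    dimms = []
    offset = 0
    while offset < total:
        size = min(dimm_size, total - offset)  # the last piece may be smaller
        dimms.append({"handle": base_handle + len(dimms), "size_mb": size >> 20})
        offset += size
    return dimms
```

For example, 3G below 4G plus 37G above gives two 16G dimms and one 8G
dimm, while split_into_dimms([4 * GiB] * 4, dimm_size=4 * GiB) gives the
four 4G dimms of the NUMA case.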

cheers,
  Gerd
