>  ----------------------------------------------------------------------------
> |                               Type16 0x1000                                |
>  ----------------------------------------------------------------------------
>  ^            ^                ^           ^                    ^           ^
>  |            |                |           |                    |           |
>  |        ----+---         ----+----   ----+----       ---------+--------   |
>  |       | Type17 |       | Type17  | | Type17  |     |      Type17      |  |
>  |       | 0..16G |       | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
>  |       | 0x1100 |       | 0x1101  | | 0x1102  |     |     0x110<N>     |  |
>  |        --------         ---------   ---------       ------------------   |
>  |          ^   ^              ^           ^                    ^           |
>  |          |   |              |           |                    |           |
>  |       +--+   +--+           |           |                    |           |
>  |       |         |           |           |                    |           |
>  |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
>  |  | Type20 | | Type20 | | Type20  | | Type20  |     |      Type20      |  |
>  |  | 0..4G  | | 4..16G | | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
>  |  | 0x1400 | | 0x1401 | | 0x1402  | | 0x1403  |     |    0x140<N+1>    |  |
>  |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
>  |       |         |           |           |                    |           |
>  |       |         |           +-------+   |   +----------------+           |
>  |       |         +----------------+  |   |   |                            |
>  |       |                          |  |   |   |                            |
>  |       v                          v  v   v   v                            |
>  |   --------                      --------------                           |
>  |  | Type19 |                    |    Type19    |                          |
>  |  | 0..4G  |                    | 4G..ram_size |                          |
>  |  | 0x1300 |                    |    0x1301    |                          |
>  |   ----+---                      ------+-------                           |
>  |       |                               |                                  |
>  +-------+                               +----------------------------------+

Very nice.

> Here are some of the limit values, and some questions and thoughts:
>
> - Type16 max == 2T - 1K;
>
> Should we just assert((ram_size >> 10) < 0x80000000), and officially
> limit guests to < 2T ?

No.  I'm not fully sure what reasonable behavior would be in case more
than 2T are present.  I guess either not generating type16 entries at
all, or simply filling in the maximum value we can represent.

> - Type17 max == 32G - 1M;
>
> This explains why we create Type17 device tables in increments of 16G,
> since that's the largest possible value that's a nice, round power of
> two :)

Yes.

> - Type19 & Type20 max == 4T - 1K;
>
> If we limit ourselves to what Type16 can currently represent (2T),
> this should be plenty enough to work with...

And there is the option to simply create multiple type19+20 entries to
cover more, I think.

> So, currently, we split available ram into blobs of up to 16G each,
> and assign each blob a Type17 node.
>
> We then split available ram into <4G and 4G+, and create up to two
> Type19 nodes for these two areas.

Yes.

> Now, re. e820: currently, the expectation is that the (up to) two
> Type19 nodes in the above figure correspond to (up to) two entries of
> type E820_RAM in the e820 table.

Yes.  If more e820 ram entries show up one day, additional type19 nodes
should be generated (i.e. basically just loop over the e820 table).

> Then, a type20 node is assigned to the sub-4G portion of the first
> Type17 "device", and another type20 node is assigned to the over-4G
> portion of the same.
>
> From then on, type20 nodes correspond to the rest of the 16G-or-less
> type17 devices pretty much on a 1:1 basis.

Hmm, not sure why type20 entries are handled the way they are.  I think
it would make more sense to have one type20 entry per e820 ram entry,
similar to type19.
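
To illustrate the "loop over the e820 table" idea, here is a minimal
Python sketch, not the actual firmware code.  It assumes the e820 table
is available as (base, length, type) tuples; the function name and the
tuple layout are made up for this example.  E820_RAM is the usual type
code 1, and the 0x1300 handle base is taken from the figure.

```python
E820_RAM = 1                 # e820 type code for usable RAM
TYPE19_HANDLE_BASE = 0x1300  # handle numbering as used in the figure
GIB = 1 << 30


def type19_entries(e820_table):
    """Return one (handle, start, end) tuple per E820_RAM entry.

    e820_table is a hypothetical list of (base, length, type) tuples;
    'end' is the last byte address covered by the entry (inclusive).
    """
    entries = []
    for base, length, kind in e820_table:
        if kind != E820_RAM or length == 0:
            continue  # skip reserved/ACPI/etc. ranges and empty entries
        handle = TYPE19_HANDLE_BASE + len(entries)
        entries.append((handle, base, base + length - 1))
    return entries


# Example layout (made up): 3G below the PCI hole, 5G above 4G.
example_e820 = [
    (0, 3 * GIB, E820_RAM),
    (3 * GIB, 1 * GIB, 2),        # reserved range, ignored
    (4 * GIB, 5 * GIB, E820_RAM),
]
```

With one type20 entry per e820 ram entry, as suggested above, the same
loop could produce the type20 ranges too, just with a different handle
base.
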

> If the e820 table will contain more than just two E820_RAM entries,
> and therefore we'll have more than the two Type19 nodes on the bottom
> row, what are the rules for extending the rest of the figure
> accordingly (i.e. how do we hook together more Type17 and Type20 nodes
> to go along with the extra Type19 nodes) ?

See above for type19+20.

type17 represents the dimms, so where the memory is actually mapped
doesn't matter there.  Let's simply sum up all memory, then split it
into 16G pieces and create a type17 entry for each piece.  At least
initially.

As a further improvement we could make the dimm size configurable, so if
you have a 4-node numa machine with 4G of ram on each node you can
present 4 virtual 4G dimms to the guest instead of a single 16G dimm.
But that is clearly beyond the scope of the initial revision ...

cheers,
  Gerd
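
For reference, the "sum up, then split into 16G pieces" rule for type17
could be sketched in Python roughly as follows.  This is only an
illustration under assumptions: the function name, its parameters and
the list-of-sizes input are hypothetical, and the 0x1100 handle base is
taken from the figure.  The dimm_size parameter stands in for the
configurable dimm size mentioned as a possible later improvement.

```python
GIB = 1 << 30


def type17_dimms(ram_region_sizes, dimm_size=16 * GIB, handle_base=0x1100):
    """Return (handle, size_in_bytes) for each virtual dimm.

    ram_region_sizes is a hypothetical list of RAM region sizes in
    bytes; where each region is mapped does not matter here.
    """
    total = sum(ram_region_sizes)
    full, rest = divmod(total, dimm_size)
    sizes = [dimm_size] * full
    if rest:
        sizes.append(rest)  # last dimm holds whatever is left over
    return [(handle_base + i, size) for i, size in enumerate(sizes)]


# 40G total in two regions -> two 16G dimms plus one 8G dimm.
default_split = type17_dimms([3 * GIB, 37 * GIB])

# 4 numa nodes with 4G each, presented as four 4G dimms.
numa_split = type17_dimms([4 * GIB] * 4, dimm_size=4 * GIB)
```
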