On Sat, 03 Aug 2013 15:56:50 +0200 Andreas Färber <afaer...@suse.de> wrote:
> Am 26.07.2013 16:37, schrieb Paolo Bonzini: > > Il 26/07/2013 14:51, Igor Mammedov ha scritto: > >> On Fri, 26 Jul 2013 11:26:16 +0200 > >> Paolo Bonzini <pbonz...@redhat.com> wrote: > >>> Il 26/07/2013 09:38, Igor Mammedov ha scritto: > >>>>>> I agree that specifying the policy on every hotplug complicates > >>>>>> management and may be overkill. But then, most guests are not NUMA at > >>>>>> all and you would hardly perceive the difference, you would just have > >>>>>> to > >>>>>> separate > >>>>>> > >>>>>> set-mem-policy 0 size=2G > >>>>>> device_add dimm,slot=0 > >>>> I'm confused, size is inherent property of generic DimmDevice and > >>>> policies > >>>> are NUMA specific of node. > >>> > >>> No, the size is not a property of the DimmDevice, it is a property of > >>> the host object that backs the DimmDevice. This is like the size is not > >>> a property of a disk, but rather of the image that backs it. > >>> > >>> Now, there may be a good reason to remove the host/guest distinction in > >>> the case of memory, but I'm still not sure this is the case. > >> I don't like split model in this case because it's complicates interface > >> and confuses user. On bare-metal when user adds DIMM, it definitely has > >> size property. So when user adds DIMM device like: > >> > >> set-mem-policy 0 size=2G,somehostprop=y > >> device_add dimm,slot=0 > >> > >> it's not very clear/intuitive to what 'size' relates to. On contrary: > >> > >> set-mem-policy 0 somehostprop=y > >> device_add dimm,slot=0,size=2G > >> > >> clearly says that we are adding 2Gb DIMM and allocator of memory that > >> bakes up DIMM, takes care about host specific details isolating them > >> from DIMM device model (which resembles baremetal one as much as possible). > > > > How is this different from > > > > drive-add id=foo,aio=native > > device-add virtio-block,drive=foo,file=/vm/foo.img > > > > We clearly do not do that, we put the file in the drive-add. > > The difference is that a user can understand drive-add, whereas > set-mem-policy in this use case is really hard to grasp as a prereq. If > it were > memory-add id=foo,size=2G > device-add dimm,slot=0,mem=foo > that may be different, but that is unhandy implementation-wise because > we assume initial RAM to be in one chunk and want to partition it into > guest NUMA nodes IIUC. Or has that changed? > > I don't care if we name the device DIMM or MemoryBar, point is it should > represent the physical thing one plugs into a physical board, to match > what admins do with physical servers. And that physical bar has a size, > as Igor points out. It should not just represent some virtual > hot-plugged policy thing. > > The drive is separate from the device because we can exchange the CD > media while leaving the device connected to ATA/AHCI/SCSI (and it allows > us to keep file, cache, etc. properties in a central place). Admins buy > servers/boards and memory bars, they won't solder chips onto it nor > change the NUMA configuration of the board while running, so we should > consider it immutable once created. If we unplug physical DIMMs, the > data can be considered lost (ignoring deep cooling tricks etc.), so no > point to transfer memory data from one DIMM to another. > > Would we seriously want to exchange the memory a DIMM is backed by while > leaving the DIMM plugged? Rather the entity that owns the DIMM slots (as > opposed to DIMMs) has the guest NUMA policy configured for the > unpopulated slots. The slot has a maximum size and the DIMM has a size > less or equal to slot's maximum size. maximum size per slot is hardware limitation, we can ignore it for virtual hardware case. I've added 'slot' option only for migration case, so we could replicate environment on receiving side (it's interface connecting QEMU and ACPI objects in guest). The thing is even if slot was an object providing NUMA policies and etc., I'd like to keep it dynamic and specifying such properties in runtime rather then at startup time to keep it flexible and avoid not necessary limitations wich lead again to backend/frontend separation. If DimmDevice is separated on backend/frontend parts than for simplicity 'size' option could go to memory-add command as well. So taking previous feedback, would be following interface acceptable? memory-add id=foo,size=x,numa-node=y device-add dimm,slot=z,mem=foo memory-add would: 1. allocate host memory 2. apply policy for a specified node device-add would create DimmDevice which: 1. will own memory region 2. have properties: slot - addressing specific ACPI memory device through QEMU/ACPI OSPM layers [default: auto selected] start - address where memory is mapped [default: auto selected] 3. serve as proxy object for back-end size and proximity properties > I just wonder whether we need a DimmBus at all? Because if we get the > slot specified as in your examples then we could just set some dimm[n] > link<DIMM> on realize (question is where). We had a discussion once > about a missing callback when a link property is set to a new value, not > sure if that has been resolved meantime? > Can't think of a situation where we would have multiple DimmBus'es, only > some cases where it's on a SoC or SoM and not directly on the mainboard, > i.e. not /machine/dimm[n]. > Code seems to be copying ICC bus, but ICC bus was added because APIC > needed a hot-pluggable bus to replace SysBus. Bus here is not only source of implicit links but also a gateway that provides address space for mapping DimmDevice-s in acceptable way. I've tried /machine/link<dimm> looking for simpler solution, but result was more hackish, so I've dumped idea and returned to original Bus approach with established usage pattern. > > Hadn't Anthony and Ping Fang factored out a memory controller on the > i440fx? Patch 9 seems to add the bus directly to the i440fx PCI device. not upstream so far, there was ignored RFC by Hu Tao [http://lists.nongnu.org/archive/html/qemu-devel/2013-06/msg03500.html] This series provides memory hotplug in the least invasive way, and it would be trivial to move to factored out memory controller once it's upstream. > Patch 13 seems to use a Notifier for PIIX4 ACPI rather than having the > bus handle hotplug itself and bus / memory controller communicating with > ACPI as necessary (layering violation). For the CPU we initially had no > bus at all to handle such event (we still don't outside x86). I'll look at it. > > What I am not finding in this series is a translation from -m to initial > non-hotplugged DIMMs? it wasn't goal of this series for it would require factored out a memory controller, so not to increase mess in current code. > > Some random other remarks on the series: > * s/"dimmbus"/"dimm-bus"/g > * s/klass/oc/g -- there were loud protests against k* (class reserved) > * gtk-doc markup for parent fields is missing in header > * s/qdev/parent_obj/, s/qbus/parent_obj/ -- don't invite to use them > * s/dc/dbc/g -- dc is for DeviceClass > * DimmBusClass::register_memory() is never overwritten? In that case it > could well be just a static function called from realizefn - only It's not much of over-engineering and keeps things object oriented. > disctinction I can think of is how to allocate MemoryRegion. A > caller-settable DimmBus::slot_plugged() for i440fx to notify piix4 would > seem to solve the issue of the Notifier elegantly. PCIBus has such hooks. Thanks for a tip, I'll certainly try it. > > Regards, > Andreas >