Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
> > > > > > >> + > > > > > > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, > > > > > > >> mr); > > > > > > > missing vmstate registration? > > > > > > > > > > > > Missed this one: To be called by the caller. Important because e.g. > > > > > > for > > > > > > virtio-pmem we don't want this (I assume :) ). > > > > > if pmem isn't on shared storage, then We'd probably want to migrate > > > > > it as well, otherwise target would experience data loss. > > > > > Anyways, I'd just reat it as normal RAM in migration case > > > > > > > > Main difference between RAM and pmem it acts like combination of RAM > > > > and > > > > disk. > > > > Saying this, in normal use-case size would be 100 GB's - few TB's > > > > range. > > > > I am not sure we really want to migrate it for non-shared storage > > > > use-case. > > > with non shared storage you'd have to migrate it target host but > > > with shared storage it might be possible to flush it and use directly > > > from target host. That probably won't work right out of box and would > > > need some sort of synchronization between src/dst hosts. > > > > Shared storage should work out of the box. > > Only thing is data in destination > > host will be cache cold and existing pages in cache should be invalidated > > first. > > But if we migrate entire fake DAX RAMstate it will populate destination > > host page > > cache including pages while were idle in source host. This would > > unnecessarily > > create entropy in destination host. > > > > To me this feature don't make much sense. Problem which we are solving is: > > Efficiently use guest RAM. > What would live migration handover flow look like in case of > guest constantly dirting memory provided by virtio-pmem and > and sometimes issuing async flush req along with it? Dirty entire pmem (disk) at once not a usual scenario. Some part of disk/pmem would get dirty and we need to handle that. I just want to say moving entire pmem (disk) is not efficient solution because we are using this solution to manage guest memory efficiently. Otherwise it will be like any block device copy with non-shared storage. > > > > > The same applies to nv/pc-dimm as well, as backend file easily could be > > > on pmem storage as well. > > > > Are you saying backing file is in actual actual nvdimm hardware? we don't > > need > > emulation at all. > depends on if file is on DAX filesystem, but your argument about > migrating huge 100Gb- TB's range applies in this case as well. > > > > > > > > > Maybe for now we should migrate everything so it would work in case of > > > non shared NVDIMM on host. And then later add migration-less capability > > > to all of them. > > > > not sure I agree. > So would you inhibit migration in case of non shared backend storage, > to avoid loosing data since they aren't migrated? I am just thinking what features we want to support with pmem. And live migration with shared storage is the one which comes to my mind. If live migration with non-shared storage is what we want to support (I don't know yet) we can add this? Even with shared storage it would copy entire pmem state? Thanks, Pankaj > > > > > > One reason why nvdimm added vmstate info could be: still there would be > > > > transient > > > > writes in memory with fake DAX and there is no way(till now) to flush > > > > the > > > > guest > > > > writes. But with virtio-pmem we can flush such writes before migration > > > > and > > > > automatically > > > > at destination host with shared disk we will have updated data. > > > nvdimm has concept of flush address hint (may be not implemented in qemu > > > yet) > > > but it can flush. The only reason I'm buying into virtio-mem idea > > > is that would allow async flush queues which would reduce number > > > of vmexits. > > > > Thats correct. > > > > Thanks, > > Pankaj > > > > > > >
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Wed, 25 Apr 2018 09:56:49 -0400 (EDT) Pankaj Gupta wrote: > > > > > > > > > > > >> + > > > > > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, > > > > > >> mr); > > > > > > missing vmstate registration? > > > > > > > > > > Missed this one: To be called by the caller. Important because e.g. > > > > > for > > > > > virtio-pmem we don't want this (I assume :) ). > > > > if pmem isn't on shared storage, then We'd probably want to migrate > > > > it as well, otherwise target would experience data loss. > > > > Anyways, I'd just reat it as normal RAM in migration case > > > > > > Main difference between RAM and pmem it acts like combination of RAM and > > > disk. > > > Saying this, in normal use-case size would be 100 GB's - few TB's range. > > > I am not sure we really want to migrate it for non-shared storage > > > use-case. > > with non shared storage you'd have to migrate it target host but > > with shared storage it might be possible to flush it and use directly > > from target host. That probably won't work right out of box and would > > need some sort of synchronization between src/dst hosts. > > Shared storage should work out of the box. > Only thing is data in destination > host will be cache cold and existing pages in cache should be invalidated > first. > But if we migrate entire fake DAX RAMstate it will populate destination host > page > cache including pages while were idle in source host. This would > unnecessarily > create entropy in destination host. > > To me this feature don't make much sense. Problem which we are solving is: > Efficiently use guest RAM. What would live migration handover flow look like in case of guest constantly dirting memory provided by virtio-pmem and and sometimes issuing async flush req along with it? > > The same applies to nv/pc-dimm as well, as backend file easily could be > > on pmem storage as well. > > Are you saying backing file is in actual actual nvdimm hardware? we don't > need > emulation at all. depends on if file is on DAX filesystem, but your argument about migrating huge 100Gb- TB's range applies in this case as well. > > > > > Maybe for now we should migrate everything so it would work in case of > > non shared NVDIMM on host. And then later add migration-less capability > > to all of them. > > not sure I agree. So would you inhibit migration in case of non shared backend storage, to avoid loosing data since they aren't migrated? > > > One reason why nvdimm added vmstate info could be: still there would be > > > transient > > > writes in memory with fake DAX and there is no way(till now) to flush the > > > guest > > > writes. But with virtio-pmem we can flush such writes before migration and > > > automatically > > > at destination host with shared disk we will have updated data. > > nvdimm has concept of flush address hint (may be not implemented in qemu > > yet) > > but it can flush. The only reason I'm buying into virtio-mem idea > > is that would allow async flush queues which would reduce number > > of vmexits. > > Thats correct. > > Thanks, > Pankaj > >
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
> > > > > > > > > >> + > > > > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > > > > > missing vmstate registration? > > > > > > > > Missed this one: To be called by the caller. Important because e.g. for > > > > virtio-pmem we don't want this (I assume :) ). > > > if pmem isn't on shared storage, then We'd probably want to migrate > > > it as well, otherwise target would experience data loss. > > > Anyways, I'd just reat it as normal RAM in migration case > > > > Main difference between RAM and pmem it acts like combination of RAM and > > disk. > > Saying this, in normal use-case size would be 100 GB's - few TB's range. > > I am not sure we really want to migrate it for non-shared storage use-case. > with non shared storage you'd have to migrate it target host but > with shared storage it might be possible to flush it and use directly > from target host. That probably won't work right out of box and would > need some sort of synchronization between src/dst hosts. Shared storage should work out of the box. Only thing is data in destination host will be cache cold and existing pages in cache should be invalidated first. But if we migrate entire fake DAX RAMstate it will populate destination host page cache including pages while were idle in source host. This would unnecessarily create entropy in destination host. To me this feature don't make much sense. Problem which we are solving is: Efficiently use guest RAM. > > The same applies to nv/pc-dimm as well, as backend file easily could be > on pmem storage as well. Are you saying backing file is in actual actual nvdimm hardware? we don't need emulation at all. > > Maybe for now we should migrate everything so it would work in case of > non shared NVDIMM on host. And then later add migration-less capability > to all of them. not sure I agree. > > > One reason why nvdimm added vmstate info could be: still there would be > > transient > > writes in memory with fake DAX and there is no way(till now) to flush the > > guest > > writes. But with virtio-pmem we can flush such writes before migration and > > automatically > > at destination host with shared disk we will have updated data. > nvdimm has concept of flush address hint (may be not implemented in qemu yet) > but it can flush. The only reason I'm buying into virtio-mem idea > is that would allow async flush queues which would reduce number > of vmexits. Thats correct. Thanks, Pankaj
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Wed, 25 Apr 2018 01:45:12 -0400 (EDT) Pankaj Gupta wrote: > > > > > > > > > >> +/* we will need a new memory slot for kvm and vhost */ > > > >> +if (kvm_enabled() && !kvm_has_free_slot(machine)) { > > > >> +error_setg(errp, "hypervisor has no free memory slots left"); > > > >> +return; > > > >> +} > > > >> +if (!vhost_has_free_slot()) { > > > >> +error_setg(errp, "a used vhost backend has no free memory > > > >> slots > > > >> left"); > > > >> +return; > > > >> +} > > > > move these checks to pre_plug time > > > > > > > >> + > > > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > > > > missing vmstate registration? > > > > > > Missed this one: To be called by the caller. Important because e.g. for > > > virtio-pmem we don't want this (I assume :) ). > > if pmem isn't on shared storage, then We'd probably want to migrate > > it as well, otherwise target would experience data loss. > > Anyways, I'd just reat it as normal RAM in migration case > > Main difference between RAM and pmem it acts like combination of RAM and disk. > Saying this, in normal use-case size would be 100 GB's - few TB's range. > I am not sure we really want to migrate it for non-shared storage use-case. with non shared storage you'd have to migrate it target host but with shared storage it might be possible to flush it and use directly from target host. That probably won't work right out of box and would need some sort of synchronization between src/dst hosts. The same applies to nv/pc-dimm as well, as backend file easily could be on pmem storage as well. Maybe for now we should migrate everything so it would work in case of non shared NVDIMM on host. And then later add migration-less capability to all of them. > One reason why nvdimm added vmstate info could be: still there would be > transient > writes in memory with fake DAX and there is no way(till now) to flush the > guest > writes. But with virtio-pmem we can flush such writes before migration and > automatically > at destination host with shared disk we will have updated data. nvdimm has concept of flush address hint (may be not implemented in qemu yet) but it can flush. The only reason I'm buying into virtio-mem idea is that would allow async flush queues which would reduce number of vmexits. > > > Thanks, > Pankaj > >
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
> > > > > > >> +/* we will need a new memory slot for kvm and vhost */ > > >> +if (kvm_enabled() && !kvm_has_free_slot(machine)) { > > >> +error_setg(errp, "hypervisor has no free memory slots left"); > > >> +return; > > >> +} > > >> +if (!vhost_has_free_slot()) { > > >> +error_setg(errp, "a used vhost backend has no free memory slots > > >> left"); > > >> +return; > > >> +} > > > move these checks to pre_plug time > > > > > >> + > > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > > > missing vmstate registration? > > > > Missed this one: To be called by the caller. Important because e.g. for > > virtio-pmem we don't want this (I assume :) ). > if pmem isn't on shared storage, then We'd probably want to migrate > it as well, otherwise target would experience data loss. > Anyways, I'd just reat it as normal RAM in migration case Main difference between RAM and pmem it acts like combination of RAM and disk. Saying this, in normal use-case size would be 100 GB's - few TB's range. I am not sure we really want to migrate it for non-shared storage use-case. One reason why nvdimm added vmstate info could be: still there would be transient writes in memory with fake DAX and there is no way(till now) to flush the guest writes. But with virtio-pmem we can flush such writes before migration and automatically at destination host with shared disk we will have updated data. Thanks, Pankaj
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On 24.04.2018 16:44, Igor Mammedov wrote: > On Tue, 24 Apr 2018 15:41:23 +0200 > David Hildenbrand wrote: > >> On 24.04.2018 15:31, Igor Mammedov wrote: >>> On Mon, 23 Apr 2018 14:52:37 +0200 >>> David Hildenbrand wrote: >>> > >> +/* we will need a new memory slot for kvm and vhost */ >> +if (kvm_enabled() && !kvm_has_free_slot(machine)) { >> +error_setg(errp, "hypervisor has no free memory slots left"); >> +return; >> +} >> +if (!vhost_has_free_slot()) { >> +error_setg(errp, "a used vhost backend has no free memory slots >> left"); >> +return; >> +} > move these checks to pre_plug time > >> + >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > missing vmstate registration? Missed this one: To be called by the caller. Important because e.g. for virtio-pmem we don't want this (I assume :) ). >>> if pmem isn't on shared storage, then We'd probably want to migrate >>> it as well, otherwise target would experience data loss. >>> Anyways, I'd just reat it as normal RAM in migration case >> >> Yes, if we realize that all MemoryDevices need this call, we can move it >> to that place, too. >> >> Wonder if we might want to make this configurable for virtio-pmem later >> on (via a flag or sth like that). > I don't see any reason why we wouldn't like it to be migrated, > it's the same as nvdimm but with another qemu:guest ABI > and async flush instead of sync one we have with nvdimm. > Didn't you just mention "shared storage" ? :) Anyhow, I leave such stuff to Pankaj to figure out. I remember him working on some page cache details. Once clarified, this is easily refactored later on. -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Tue, 24 Apr 2018 15:41:23 +0200 David Hildenbrand wrote: > On 24.04.2018 15:31, Igor Mammedov wrote: > > On Mon, 23 Apr 2018 14:52:37 +0200 > > David Hildenbrand wrote: > > > >>> > +/* we will need a new memory slot for kvm and vhost */ > +if (kvm_enabled() && !kvm_has_free_slot(machine)) { > +error_setg(errp, "hypervisor has no free memory slots left"); > +return; > +} > +if (!vhost_has_free_slot()) { > +error_setg(errp, "a used vhost backend has no free memory slots > left"); > +return; > +} > >>> move these checks to pre_plug time > >>> > + > +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > >>> missing vmstate registration? > >> > >> Missed this one: To be called by the caller. Important because e.g. for > >> virtio-pmem we don't want this (I assume :) ). > > if pmem isn't on shared storage, then We'd probably want to migrate > > it as well, otherwise target would experience data loss. > > Anyways, I'd just reat it as normal RAM in migration case > > Yes, if we realize that all MemoryDevices need this call, we can move it > to that place, too. > > Wonder if we might want to make this configurable for virtio-pmem later > on (via a flag or sth like that). I don't see any reason why we wouldn't like it to be migrated, it's the same as nvdimm but with another qemu:guest ABI and async flush instead of sync one we have with nvdimm.
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Tue, 24 Apr 2018 15:39:30 +0200 David Hildenbrand wrote: > On 24.04.2018 15:28, Igor Mammedov wrote: > > On Mon, 23 Apr 2018 14:44:34 +0200 > > David Hildenbrand wrote: > > > >> On 23.04.2018 14:19, Igor Mammedov wrote: > >>> On Fri, 20 Apr 2018 14:34:56 +0200 > >>> David Hildenbrand wrote: > > considering v4 queued, I'm dropping mostly nor relevant points at this > > point. > > wrt, virtio I'll elaborate more in reply to Pankaj > > > > [...] > > > diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c > index b860c9c582..b96efa3bf4 100644 > --- a/hw/mem/memory-device.c > +++ b/hw/mem/memory-device.c > @@ -15,6 +15,8 @@ > #include "qapi/error.h" > #include "hw/boards.h" > #include "qemu/range.h" > +#include "hw/virtio/vhost.h" > +#include "sysemu/kvm.h" > > static gint memory_device_addr_sort(gconstpointer a, gconstpointer b) > { > @@ -106,6 +108,166 @@ uint64_t get_plugged_memory_size(void) > return size; > } > > +static int memory_device_used_region_size_internal(Object *obj, void > *opaque) > +{ > +uint64_t *size = opaque; > + > +if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) { > +DeviceState *dev = DEVICE(obj); > +MemoryDeviceState *md = MEMORY_DEVICE(obj); > +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); > + > +if (dev->realized) { > +*size += mdc->get_region_size(md, &error_abort); > +} > +} > + > +object_child_foreach(obj, memory_device_used_region_size_internal, > opaque); > +return 0; > +} > + > +static uint64_t memory_device_used_region_size(void) > +{ > +uint64_t size = 0; > + > +memory_device_used_region_size_internal(qdev_get_machine(), &size); > + > +return size; > +} > + > +uint64_t memory_device_get_free_addr(uint64_t *hint, uint64_t align, > + uint64_t size, Error **errp) > >>> I'd suggest to pc_dimm_memory_plug/pc_dimm_get_free_addr first, > >>> namely most of the stuff it does like checks and assigning default > >>> values should go to pre_plug (pre realize) handler and then only > >>> actual mapping is left for plug (after realize) handler to deal with: > >>> > >> > >> Can you elaborate what you mean by pre-plug? If this is about pre plug > >> handler of the (machine) hotplug handler, it might be problematic for > >> virtio devices. > > yes, something along these lines: c871bc70b > > > > > > Yes, we can factor that out (at least) for pc-dimm later on easily. > Seems to be just about moving a couple of calls. yep, but there is nice side effect, there is no need to call devicefoo::unrealize() on failure since devicefoo:realize() hasn't been called yet.
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On 24.04.2018 15:31, Igor Mammedov wrote: > On Mon, 23 Apr 2018 14:52:37 +0200 > David Hildenbrand wrote: > >>> +/* we will need a new memory slot for kvm and vhost */ +if (kvm_enabled() && !kvm_has_free_slot(machine)) { +error_setg(errp, "hypervisor has no free memory slots left"); +return; +} +if (!vhost_has_free_slot()) { +error_setg(errp, "a used vhost backend has no free memory slots left"); +return; +} >>> move these checks to pre_plug time >>> + +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); >>> missing vmstate registration? >> >> Missed this one: To be called by the caller. Important because e.g. for >> virtio-pmem we don't want this (I assume :) ). > if pmem isn't on shared storage, then We'd probably want to migrate > it as well, otherwise target would experience data loss. > Anyways, I'd just reat it as normal RAM in migration case Yes, if we realize that all MemoryDevices need this call, we can move it to that place, too. Wonder if we might want to make this configurable for virtio-pmem later on (via a flag or sth like that). -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On 24.04.2018 15:28, Igor Mammedov wrote: > On Mon, 23 Apr 2018 14:44:34 +0200 > David Hildenbrand wrote: > >> On 23.04.2018 14:19, Igor Mammedov wrote: >>> On Fri, 20 Apr 2018 14:34:56 +0200 >>> David Hildenbrand wrote: > considering v4 queued, I'm dropping mostly nor relevant points at this point. > wrt, virtio I'll elaborate more in reply to Pankaj > > [...] > diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index b860c9c582..b96efa3bf4 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -15,6 +15,8 @@ #include "qapi/error.h" #include "hw/boards.h" #include "qemu/range.h" +#include "hw/virtio/vhost.h" +#include "sysemu/kvm.h" static gint memory_device_addr_sort(gconstpointer a, gconstpointer b) { @@ -106,6 +108,166 @@ uint64_t get_plugged_memory_size(void) return size; } +static int memory_device_used_region_size_internal(Object *obj, void *opaque) +{ +uint64_t *size = opaque; + +if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) { +DeviceState *dev = DEVICE(obj); +MemoryDeviceState *md = MEMORY_DEVICE(obj); +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); + +if (dev->realized) { +*size += mdc->get_region_size(md, &error_abort); +} +} + +object_child_foreach(obj, memory_device_used_region_size_internal, opaque); +return 0; +} + +static uint64_t memory_device_used_region_size(void) +{ +uint64_t size = 0; + +memory_device_used_region_size_internal(qdev_get_machine(), &size); + +return size; +} + +uint64_t memory_device_get_free_addr(uint64_t *hint, uint64_t align, + uint64_t size, Error **errp) >>> I'd suggest to pc_dimm_memory_plug/pc_dimm_get_free_addr first, >>> namely most of the stuff it does like checks and assigning default >>> values should go to pre_plug (pre realize) handler and then only >>> actual mapping is left for plug (after realize) handler to deal with: >>> >> >> Can you elaborate what you mean by pre-plug? If this is about pre plug >> handler of the (machine) hotplug handler, it might be problematic for >> virtio devices. > yes, something along these lines: c871bc70b > > Yes, we can factor that out (at least) for pc-dimm later on easily. Seems to be just about moving a couple of calls. -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Mon, 23 Apr 2018 14:52:37 +0200 David Hildenbrand wrote: > > > >> +/* we will need a new memory slot for kvm and vhost */ > >> +if (kvm_enabled() && !kvm_has_free_slot(machine)) { > >> +error_setg(errp, "hypervisor has no free memory slots left"); > >> +return; > >> +} > >> +if (!vhost_has_free_slot()) { > >> +error_setg(errp, "a used vhost backend has no free memory slots > >> left"); > >> +return; > >> +} > > move these checks to pre_plug time > > > >> + > >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > > missing vmstate registration? > > Missed this one: To be called by the caller. Important because e.g. for > virtio-pmem we don't want this (I assume :) ). if pmem isn't on shared storage, then We'd probably want to migrate it as well, otherwise target would experience data loss. Anyways, I'd just reat it as normal RAM in migration case > > Thanks! > >
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Mon, 23 Apr 2018 14:44:34 +0200 David Hildenbrand wrote: > On 23.04.2018 14:19, Igor Mammedov wrote: > > On Fri, 20 Apr 2018 14:34:56 +0200 > > David Hildenbrand wrote: considering v4 queued, I'm dropping mostly nor relevant points at this point. wrt, virtio I'll elaborate more in reply to Pankaj [...] > >> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c > >> index b860c9c582..b96efa3bf4 100644 > >> --- a/hw/mem/memory-device.c > >> +++ b/hw/mem/memory-device.c > >> @@ -15,6 +15,8 @@ > >> #include "qapi/error.h" > >> #include "hw/boards.h" > >> #include "qemu/range.h" > >> +#include "hw/virtio/vhost.h" > >> +#include "sysemu/kvm.h" > >> > >> static gint memory_device_addr_sort(gconstpointer a, gconstpointer b) > >> { > >> @@ -106,6 +108,166 @@ uint64_t get_plugged_memory_size(void) > >> return size; > >> } > >> > >> +static int memory_device_used_region_size_internal(Object *obj, void > >> *opaque) > >> +{ > >> +uint64_t *size = opaque; > >> + > >> +if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) { > >> +DeviceState *dev = DEVICE(obj); > >> +MemoryDeviceState *md = MEMORY_DEVICE(obj); > >> +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); > >> + > >> +if (dev->realized) { > >> +*size += mdc->get_region_size(md, &error_abort); > >> +} > >> +} > >> + > >> +object_child_foreach(obj, memory_device_used_region_size_internal, > >> opaque); > >> +return 0; > >> +} > >> + > >> +static uint64_t memory_device_used_region_size(void) > >> +{ > >> +uint64_t size = 0; > >> + > >> +memory_device_used_region_size_internal(qdev_get_machine(), &size); > >> + > >> +return size; > >> +} > >> + > >> +uint64_t memory_device_get_free_addr(uint64_t *hint, uint64_t align, > >> + uint64_t size, Error **errp) > > I'd suggest to pc_dimm_memory_plug/pc_dimm_get_free_addr first, > > namely most of the stuff it does like checks and assigning default > > values should go to pre_plug (pre realize) handler and then only > > actual mapping is left for plug (after realize) handler to deal with: > > > > Can you elaborate what you mean by pre-plug? If this is about pre plug > handler of the (machine) hotplug handler, it might be problematic for > virtio devices. yes, something along these lines: c871bc70b
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On 23.04.2018 14:19, Igor Mammedov wrote: > On Fri, 20 Apr 2018 14:34:56 +0200 > David Hildenbrand wrote: > >> To be able to reuse MemoryDevice logic from other devices besides >> pc-dimm, factor the relevant stuff out into the MemoryDevice code. >> >> As we don't care about slots for memory devices that are not pc-dimm, >> don't factor that part out. > that's not really true, you still consume kvm and vhost slots (whatever it is) > whenever you map it into address space as ram memory region. > > Also ram_slots currently are (ab)used as flag that user enabled memory > hotplug via CLI. > >> Most of this patch just moves checks and logic around. While at it, make >> the code properly detect certain error conditions better (e.g. fragmented >> memory). > I'd suggest splitting patch in several smaller ones if possible, > especially parts that do anything more than just moving code around. > > >> Signed-off-by: David Hildenbrand >> --- >> hw/i386/pc.c | 12 +-- >> hw/mem/memory-device.c | 162 >> hw/mem/pc-dimm.c | 185 >> +++-- >> hw/ppc/spapr.c | 9 +- >> include/hw/mem/memory-device.h | 4 + >> include/hw/mem/pc-dimm.h | 14 +--- >> 6 files changed, 185 insertions(+), 201 deletions(-) >> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >> index fa8862af33..1c25546a0c 100644 >> --- a/hw/i386/pc.c >> +++ b/hw/i386/pc.c >> @@ -1711,7 +1711,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev, >> goto out; >> } >> >> -pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err); >> +pc_dimm_memory_plug(dev, align, &local_err); > Is there a reason why you are dropping pcms->hotplug_memory argument > and fall back to qdev_get_machine()? > > I'd rather see it going other direction, > i.e. move hotplug_memory from PC > machine to MachineState and then pass it down as argument whenever it's > needed. FWIW, I think I found a way to split this into smaller patches. The current prototypes will look like this for pc_dimm void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine, uint64_t align, Error **errp); void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine); I am not sure yet if I'll work on the pre-plug stuff for pc-dimm (I want to get memory devices running not rewrite all of the pc-dimm memory hotplug code :) ), but that can be reworked later on easily. -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
> >> +/* we will need a new memory slot for kvm and vhost */ >> +if (kvm_enabled() && !kvm_has_free_slot(machine)) { >> +error_setg(errp, "hypervisor has no free memory slots left"); >> +return; >> +} >> +if (!vhost_has_free_slot()) { >> +error_setg(errp, "a used vhost backend has no free memory slots >> left"); >> +return; >> +} > move these checks to pre_plug time > >> + >> +memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); > missing vmstate registration? Missed this one: To be called by the caller. Important because e.g. for virtio-pmem we don't want this (I assume :) ). Thanks! -- Thanks, David / dhildenb
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On 23.04.2018 14:19, Igor Mammedov wrote: > On Fri, 20 Apr 2018 14:34:56 +0200 > David Hildenbrand wrote: > >> To be able to reuse MemoryDevice logic from other devices besides >> pc-dimm, factor the relevant stuff out into the MemoryDevice code. >> >> As we don't care about slots for memory devices that are not pc-dimm, >> don't factor that part out. > that's not really true, you still consume kvm and vhost slots (whatever it is) > whenever you map it into address space as ram memory region. Let me rephrase ACPI slots are not of interest. That user visible part is not needed for other memory devices. KVM and VHOST slots are different (just memory regions, we don't care about which specific slot is taken) > > Also ram_slots currently are (ab)used as flag that user enabled memory > hotplug via CLI. Yes, have a patch for this :) > >> Most of this patch just moves checks and logic around. While at it, make >> the code properly detect certain error conditions better (e.g. fragmented >> memory). > I'd suggest splitting patch in several smaller ones if possible, > especially parts that do anything more than just moving code around. I tried to do it in smaller steps but most of it turned out to just introduce and delete temporary code. Will have a look if this can be done after we got a common understanding of what the end result should look like. > > >> Signed-off-by: David Hildenbrand >> --- >> hw/i386/pc.c | 12 +-- >> hw/mem/memory-device.c | 162 >> hw/mem/pc-dimm.c | 185 >> +++-- >> hw/ppc/spapr.c | 9 +- >> include/hw/mem/memory-device.h | 4 + >> include/hw/mem/pc-dimm.h | 14 +--- >> 6 files changed, 185 insertions(+), 201 deletions(-) >> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >> index fa8862af33..1c25546a0c 100644 >> --- a/hw/i386/pc.c >> +++ b/hw/i386/pc.c >> @@ -1711,7 +1711,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev, >> goto out; >> } >> >> -pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err); >> +pc_dimm_memory_plug(dev, align, &local_err); > Is there a reason why you are dropping pcms->hotplug_memory argument > and fall back to qdev_get_machine()? Yes, because we a) access machine either way internally (a couple of times even). b) this only works if we have a hotplug handler on the machine (esp: not for virtio devices). Otherwise we have to get the machine from the virtio realize function - also ugly. > > I'd rather see it going other direction, > i.e. move hotplug_memory from PC > machine to MachineState and then pass it down as argument whenever it's > needed. As said, ugly for virtio devices. And I don't see a benefit if we access the machine internally already either way. > >> if (local_err) { >> goto out; >> } >> @@ -1761,17 +1761,9 @@ static void pc_dimm_unplug(HotplugHandler >> *hotplug_dev, >> DeviceState *dev, Error **errp) >> { >> PCMachineState *pcms = PC_MACHINE(hotplug_dev); >> -PCDIMMDevice *dimm = PC_DIMM(dev); >> -PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm); >> -MemoryRegion *mr; >> HotplugHandlerClass *hhc; >> Error *local_err = NULL; >> >> -mr = ddc->get_memory_region(dimm, &local_err); >> -if (local_err) { >> -goto out; >> -} >> - >> hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev); >> hhc->unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err); >> >> @@ -1779,7 +1771,7 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev, >> goto out; >> } >> >> -pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr); >> +pc_dimm_memory_unplug(dev); > ditto (and I still think it looks cleaner, but we can discuss) > >> object_unparent(OBJECT(dev)); >> >> out: >> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c >> index b860c9c582..b96efa3bf4 100644 >> --- a/hw/mem/memory-device.c >> +++ b/hw/mem/memory-device.c >> @@ -15,6 +15,8 @@ >> #include "qapi/error.h" >> #include "hw/boards.h" >> #include "qemu/range.h" >> +#include "hw/virtio/vhost.h" >> +#include "sysemu/kvm.h" >> >> static gint memory_device_addr_sort(gconstpointer a, gconstpointer b) >> { >> @@ -106,6 +108,166 @@ uint64_t get_plugged_memory_size(void) >> return size; >> } >> >> +static int memory_device_used_region_size_internal(Object *obj, void >> *opaque) >> +{ >> +uint64_t *size = opaque; >> + >> +if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) { >> +DeviceState *dev = DEVICE(obj); >> +MemoryDeviceState *md = MEMORY_DEVICE(obj); >> +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); >> + >> +if (dev->realized) { >> +*size += mdc->get_region_size(md, &error_abort); >> +} >> +} >> + >> +object_child_foreach(obj, memory_device_used_region_size_internal, >>
Re: [Qemu-devel] [PATCH v3 3/3] pc-dimm: factor out address space logic into MemoryDevice code
On Fri, 20 Apr 2018 14:34:56 +0200 David Hildenbrand wrote: > To be able to reuse MemoryDevice logic from other devices besides > pc-dimm, factor the relevant stuff out into the MemoryDevice code. > > As we don't care about slots for memory devices that are not pc-dimm, > don't factor that part out. that's not really true, you still consume kvm and vhost slots (whatever it is) whenever you map it into address space as ram memory region. Also ram_slots currently are (ab)used as flag that user enabled memory hotplug via CLI. > Most of this patch just moves checks and logic around. While at it, make > the code properly detect certain error conditions better (e.g. fragmented > memory). I'd suggest splitting patch in several smaller ones if possible, especially parts that do anything more than just moving code around. > Signed-off-by: David Hildenbrand > --- > hw/i386/pc.c | 12 +-- > hw/mem/memory-device.c | 162 > hw/mem/pc-dimm.c | 185 > +++-- > hw/ppc/spapr.c | 9 +- > include/hw/mem/memory-device.h | 4 + > include/hw/mem/pc-dimm.h | 14 +--- > 6 files changed, 185 insertions(+), 201 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index fa8862af33..1c25546a0c 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1711,7 +1711,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev, > goto out; > } > > -pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err); > +pc_dimm_memory_plug(dev, align, &local_err); Is there a reason why you are dropping pcms->hotplug_memory argument and fall back to qdev_get_machine()? I'd rather see it going other direction, i.e. move hotplug_memory from PC machine to MachineState and then pass it down as argument whenever it's needed. > if (local_err) { > goto out; > } > @@ -1761,17 +1761,9 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev, > DeviceState *dev, Error **errp) > { > PCMachineState *pcms = PC_MACHINE(hotplug_dev); > -PCDIMMDevice *dimm = PC_DIMM(dev); > -PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm); > -MemoryRegion *mr; > HotplugHandlerClass *hhc; > Error *local_err = NULL; > > -mr = ddc->get_memory_region(dimm, &local_err); > -if (local_err) { > -goto out; > -} > - > hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev); > hhc->unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err); > > @@ -1779,7 +1771,7 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev, > goto out; > } > > -pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr); > +pc_dimm_memory_unplug(dev); ditto > object_unparent(OBJECT(dev)); > > out: > diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c > index b860c9c582..b96efa3bf4 100644 > --- a/hw/mem/memory-device.c > +++ b/hw/mem/memory-device.c > @@ -15,6 +15,8 @@ > #include "qapi/error.h" > #include "hw/boards.h" > #include "qemu/range.h" > +#include "hw/virtio/vhost.h" > +#include "sysemu/kvm.h" > > static gint memory_device_addr_sort(gconstpointer a, gconstpointer b) > { > @@ -106,6 +108,166 @@ uint64_t get_plugged_memory_size(void) > return size; > } > > +static int memory_device_used_region_size_internal(Object *obj, void *opaque) > +{ > +uint64_t *size = opaque; > + > +if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) { > +DeviceState *dev = DEVICE(obj); > +MemoryDeviceState *md = MEMORY_DEVICE(obj); > +MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); > + > +if (dev->realized) { > +*size += mdc->get_region_size(md, &error_abort); > +} > +} > + > +object_child_foreach(obj, memory_device_used_region_size_internal, > opaque); > +return 0; > +} > + > +static uint64_t memory_device_used_region_size(void) > +{ > +uint64_t size = 0; > + > +memory_device_used_region_size_internal(qdev_get_machine(), &size); > + > +return size; > +} > + > +uint64_t memory_device_get_free_addr(uint64_t *hint, uint64_t align, > + uint64_t size, Error **errp) I'd suggest to pc_dimm_memory_plug/pc_dimm_get_free_addr first, namely most of the stuff it does like checks and assigning default values should go to pre_plug (pre realize) handler and then only actual mapping is left for plug (after realize) handler to deal with: memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr); vmstate_register_ram(vmstate_mr, dev); > +{ > +const uint64_t used_region_size = memory_device_used_region_size(); > +uint64_t address_space_start, address_space_end; > +MachineState *machine = MACHINE(qdev_get_machine()); > +MachineClass *mc = MACHINE_GET_CLASS(machine); > +MemoryHotplugState *hpms; > +GSList *list = NULL, *item; > +ui