On 11/30/21 10:28, David Hildenbrand wrote: > With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading > unplugged memory is not supported. We have to fail feature negotiation > in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. > > First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to properly handle > memory backends (or architectures) without support for the shared zeropage > in the hypervisor cleanly. Without the shared zeropage, even reading an > unpopulated virtual memory location can populate real memory and > consequently consume memory in the hypervisor. We have a guaranteed shared > zeropage only on MAP_PRIVATE anonymous memory. > > Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default > long-term as even populating the shared zeropage can be problematic: for > example, without THP support (possible) or without support for the shared > huge zeropage with THP (unlikely), the PTE page tables to hold the shared > zeropage entries can consume quite some memory that cannot be reclaimed > easily. > > Third, there are other optimizations+features (e.g., protection of > unplugged memory, reducing the total memory slot size and bitmap sizes) > that will require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. > > We really only support x86 targets with virtio-mem for now (and > Linux similarly only support x86), but that might change soon, so prepare > for different targets already. > > Add a new "unplugged-inaccessible" tristate property for x86 targets: > - "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy > guests working. > - "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests > from using the device. > - "auto" selects the default based on support for the shared zeropage. > > Warn in case the property is set to "off" and we don't have support for the > shared zeropage. > > For existing compat machines, the property will default to "off", to > not change the behavior but eventually warn about a problematic setup. > Short-term, we'll set the property default to "auto" for new QEMU machines. > Mid-term, we'll set the property default to "on" for new QEMU machines. > Long-term, we'll deprecate the parameter and disallow legacy > guests completely. > > The property has to match on the migration source and destination. "auto" > will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long > as the qemu command line (esp. memdev) match -- so "auto" is good enough > for migration purposes and the parameter doesn't have to be migrated > explicitly. > > Signed-off-by: David Hildenbrand <da...@redhat.com> > --- > hw/virtio/virtio-mem.c | 63 ++++++++++++++++++++++++++++++++++ > include/hw/virtio/virtio-mem.h | 8 +++++ > 2 files changed, 71 insertions(+) > > diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c > index d5a578142b..1e57156e81 100644 > --- a/hw/virtio/virtio-mem.c > +++ b/hw/virtio/virtio-mem.c > @@ -32,6 +32,14 @@ > #include CONFIG_DEVICES > #include "trace.h" > > +/* > + * We only had legacy x86 guests that did not support > + * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy > guests. > + */ > +#if defined(TARGET_X86_64) || defined(TARGET_I386) > +#define VIRTIO_MEM_HAS_LEGACY_GUESTS > +#endif > + > /* > * Let's not allow blocks smaller than 1 MiB, for example, to keep the > tracking > * bitmap small. > @@ -110,6 +118,19 @@ static uint64_t virtio_mem_default_block_size(RAMBlock > *rb) > return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE); > } > > +#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS) > +static bool virtio_mem_has_shared_zeropage(RAMBlock *rb) > +{ > + /* > + * We only have a guaranteed shared zeropage on ordinary MAP_PRIVATE > + * anonymous RAM. In any other case, reading unplugged *can* populate a > + * fresh page, consuming actual memory. > + */ > + return !qemu_ram_is_shared(rb) && rb->fd < 0 && > + qemu_ram_pagesize(rb) == qemu_real_host_page_size; > +} > +#endif /* VIRTIO_MEM_HAS_LEGACY_GUESTS */ > + > /* > * Size the usable region bigger than the requested size if possible. Esp. > * Linux guests will only add (aligned) memory blocks in case they fully > @@ -653,15 +674,29 @@ static uint64_t virtio_mem_get_features(VirtIODevice > *vdev, uint64_t features, > Error **errp) > { > MachineState *ms = MACHINE(qdev_get_machine()); > + VirtIOMEM *vmem = VIRTIO_MEM(vdev); > > if (ms->numa_state) { > #if defined(CONFIG_ACPI) > virtio_add_feature(&features, VIRTIO_MEM_F_ACPI_PXM); > #endif > } > + assert(vmem->unplugged_inaccessible != ON_OFF_AUTO_AUTO); > + if (vmem->unplugged_inaccessible == ON_OFF_AUTO_ON) { > + virtio_add_feature(&features, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE); > + } > return features; > } > > +static int virtio_mem_validate_features(VirtIODevice *vdev) > +{ > + if (virtio_host_has_feature(vdev, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE) && > + !virtio_vdev_has_feature(vdev, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE)) > { > + return -EFAULT; > + } > + return 0; > +} > + > static void virtio_mem_system_reset(void *opaque) > { > VirtIOMEM *vmem = VIRTIO_MEM(opaque); > @@ -716,6 +751,29 @@ static void virtio_mem_device_realize(DeviceState *dev, > Error **errp) > rb = vmem->memdev->mr.ram_block; > page_size = qemu_ram_pagesize(rb); > > +#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS) > + switch (vmem->unplugged_inaccessible) { > + case ON_OFF_AUTO_AUTO: > + if (virtio_mem_has_shared_zeropage(rb)) { > + vmem->unplugged_inaccessible = ON_OFF_AUTO_OFF; > + } else { > + vmem->unplugged_inaccessible = ON_OFF_AUTO_ON; > + } > + break; > + case ON_OFF_AUTO_OFF: > + if (!virtio_mem_has_shared_zeropage(rb)) { > + warn_report("'%s' property set to 'off' with a memdev that does" > + " not support the shared zeropage.", > + VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP); > + } > + break; > + default: > + break; > + } > +#else /* VIRTIO_MEM_HAS_LEGACY_GUESTS */ > + vmem->unplugged_inaccessible = ON_OFF_AUTO_ON; > +#endif /* VIRTIO_MEM_HAS_LEGACY_GUESTS */ > + > /* > * If the block size wasn't configured by the user, use a sane default. > This > * allows using hugetlbfs backends of any page size without manual > @@ -1109,6 +1167,10 @@ static Property virtio_mem_properties[] = { > DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0), > DEFINE_PROP_LINK(VIRTIO_MEM_MEMDEV_PROP, VirtIOMEM, memdev, > TYPE_MEMORY_BACKEND, HostMemoryBackend *), > +#if defined(VIRTIO_MEM_HAS_LEGACY_GUESTS) > + DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, > VirtIOMEM, > + unplugged_inaccessible, ON_OFF_AUTO_OFF),
In theory, libvirt likes properties to be arch agnostic, but that boat sailed away already. And honestly, the default is reasonable so I don't think we will need to query for this capability. > +#endif > DEFINE_PROP_END_OF_LIST(), > }; > Reviewed-by: Michal Privoznik <mpriv...@redhat.com> Michal