On 05/06/2016 08:39 AM, Alex Williamson wrote:
On Wed, 4 May 2016 16:52:13 +1000
Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
This postpones VFIO container deinitialization to let region_del()
callbacks (called via vfio_listener_release) do proper clean up
while the group is still attached to the container.
Any mappings within the container should clean themselves up when the
container is deprivleged by removing the last group in the kernel. Is
the issue that that doesn't happen, which would be a spapr vfio kernel
bug, or that our QEMU side structures get all out of whack if we let
that happen?
My mailbase got corrupted, missed that.
This is mostly for "[PATCH qemu v16 17/19] spapr_iommu, vfio, memory:
Notify IOMMU about starting/stopping being used by VFIO", I should have put
01/19 and 02/19 right before 17/19, sorry about that.
Every reboot the spapr machine removes all (i.e. one or two) windows and
creates the default one.
I do this by memory_region_del_subregion(iommu_mr) +
memory_region_add_subregion(iommu_mr). Which gets translated to
VFIO_IOMMU_SPAPR_TCE_REMOVE + VFIO_IOMMU_SPAPR_TCE_CREATE via
vfio_memory_listener if there is VFIO; no direct calls from spapr to vfio
=> cool. During the machine reset, the VFIO device is there with the
container and groups attached, at some point with no windows.
Now to VFIO plug/unplug.
When VFIO plug happens, vfio_memory_listener is created, region_add() is
called, the hardware window is created (via VFIO_IOMMU_SPAPR_TCE_CREATE).
Unplugging should end up doing VFIO_IOMMU_SPAPR_TCE_REMOVE somehow. If
region_del() is not called when the container is being destroyed (as before
this patchset), then the kernel cleans and destroys windows when
close(container->fd) is called or when qemu is killed (and this fd is
naturally closed), I hope this answers the comment from 02/19.
So far so good (right?)
However I also have a guest view of the TCE table, this is what the guest
sees and this is what emulated PCI devices use. This guest view is either
allocated in the KVM (so H_PUT_TCE can be handled quickly right in the host
kernel, even in real mode) or userspace (VFIO case).
I generally want the guest view to be in the KVM. However when I plug VFIO,
I have to move the table to the userspace. When I unplug VFIO, I want to do
the opposite so I need a way to tell spapr that it can move the table.
region_del() seemed a natural way of doing this as region_add() is already
doing the opposite part.
With this patchset, each IOMMU MR gets a usage counter, region_add() does
+1, region_del() does -1 (yeah, not extremely optimal during reset). When
the counter goes from 0 to 1, vfio_start() hook is called, when the counter
becomes 0 - vfio_stop(). Note that we may have multiple VFIO containers on
the same PHB.
Without 01/19 and 02/19, I'll have to repeat region_del()'s counter
decrement steps in vfio_disconnect_container(). And I still cannot move
counting from region_add() to vfio_connect_container() so there will be
asymmetry which I am fine with, I am just checking here - what would be the
best approach here?
Thanks.
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
hw/vfio/common.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fe5ec6a..0b40262 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -921,23 +921,31 @@ static void vfio_disconnect_container(VFIOGroup *group)
{
VFIOContainer *container = group->container;
- if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
- error_report("vfio: error disconnecting group %d from container",
- group->groupid);
- }
-
QLIST_REMOVE(group, container_next);
+
+ if (QLIST_EMPTY(&container->group_list)) {
+ VFIOGuestIOMMU *giommu;
+
+ vfio_listener_release(container);
+
+ QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+ memory_region_unregister_iommu_notifier(&giommu->n);
+ }
+ }
+
group->container = NULL;
+ if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
+ error_report("vfio: error disconnecting group %d from container",
+ group->groupid);
+ }
if (QLIST_EMPTY(&container->group_list)) {
VFIOAddressSpace *space = container->space;
VFIOGuestIOMMU *giommu, *tmp;
- vfio_listener_release(container);
QLIST_REMOVE(container, next);
QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
- memory_region_unregister_iommu_notifier(&giommu->n);
QLIST_REMOVE(giommu, giommu_next);
g_free(giommu);
}
I'm not spotting why this is a 2-pass process vs simply moving the
existing QLIST_EMPTY cleanup above the ioctl. Thanks,
--
Alexey