Re: [PATCH v12 07/17] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-12-14 Thread Tony Krowiak




On 11/26/20 10:54 AM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:06 -0500
> Tony Krowiak wrote:
>
> > Let's implement the callback to indicate when an APQN
> > is in use by the vfio_ap device driver. The callback is
> > invoked whenever a change to the apmask or aqmask would
> > result in one or more queue devices being removed from the driver. The
> > vfio_ap device driver will indicate a resource is in use
> > if the APQN of any of the queue devices to be removed are assigned to
> > any of the matrix mdevs under the driver's control.
> >
> > There is potential for a deadlock condition between the matrix_dev->lock
> > used to lock the matrix device during assignment of adapters and domains
> > and the ap_perms_mutex locked by the AP bus when changes are made to the
> > sysfs apmask/aqmask attributes.
> >
> > Consider following scenario (courtesy of Halil Pasic):
> > 1) apmask_store() takes ap_perms_mutex
> > 2) assign_adapter_store() takes matrix_dev->lock
> > 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
> >    to take matrix_dev->lock
> > 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
> >    which tries to take ap_perms_mutex
> >
> > BANG!
> >
> > To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> > function to lock the matrix device during assignment of an adapter or
> > domain to a matrix_mdev as well as during the in_use callback, the
> > mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> > obtained, then the assignment and in_use functions will terminate with
> > -EBUSY.
>
> Good news is: the final product is OK with regards to in_use(). Bad news
> is: this patch does not do enough. At this stage we are still racy.
>
> The problem is that the assign operations don't bother to take the
> ap_perms_mutex lock under the matrix_dev->lock.
>
> The scenario is the following:
> 1) apmask_store() takes ap_perms_mutex
> 2) apmask_store() calls vfio_ap_mdev_resource_in_use() which
>    takes matrix_dev->lock
> 3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
>    and returns 0
> 4) assign_adapter_store() takes matrix_dev->lock, does the
>    assign (the queues are still bound to vfio_ap) and releases
>    matrix_dev->lock
> 5) apmask_store() carries on, does the update to apmask and releases
>    ap_perms_mutex
> 6) The queues get 'stolen' from vfio_ap while used.

You're missing an interim step between 5 and 6: the apmask_store()
function executes the device_reprobe() function, which unbinds the
queues that are to be taken from vfio_ap. In this case, the
vfio_ap_mdev_remove_queue() function gets called to remove the
queues resulting in unplugging

> This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
> queues to mdev device". Maybe we can reorder these patches. I didn't
> look into that.
>
> We could also just ignore the problem, because it is just for a couple
> of commits, but I would prefer it gone.

Reordering the patches is not a trivial task; I prefer not to do it.

> Regards,
> Halil








Re: [PATCH v12 07/17] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-11-26 Thread Halil Pasic
On Tue, 24 Nov 2020 16:40:06 -0500
Tony Krowiak  wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
> 
> There is potential for a deadlock condition between the matrix_dev->lock
> used to lock the matrix device during assignment of adapters and domains
> and the ap_perms_mutex locked by the AP bus when changes are made to the
> sysfs apmask/aqmask attributes.
> 
> Consider following scenario (courtesy of Halil Pasic):
> 1) apmask_store() takes ap_perms_mutex
> 2) assign_adapter_store() takes matrix_dev->lock
> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>    to take matrix_dev->lock
> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>    which tries to take ap_perms_mutex
> 
> BANG!
> 
> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> function to lock the matrix device during assignment of an adapter or
> domain to a matrix_mdev as well as during the in_use callback, the
> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> obtained, then the assignment and in_use functions will terminate with
> -EBUSY.

Good news is: the final product is OK with regards to in_use(). Bad news
is: this patch does not do enough. At this stage we are still racy.

The problem is that the assign operations don't bother to take the
ap_perms_mutex lock under the matrix_dev->lock.

The scenario is the following:
1) apmask_store() takes ap_perms_mutex
2) apmask_store() calls vfio_ap_mdev_resource_in_use() which
   takes matrix_dev->lock
3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
   and returns 0
4) assign_adapter_store() takes matrix_dev->lock, does the
   assign (the queues are still bound to vfio_ap) and releases
   matrix_dev->lock
5) apmask_store() carries on, does the update to apmask and releases
   ap_perms_mutex
6) The queues get 'stolen' from vfio_ap while used.
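
For illustration only, the interleaving in steps 1-6 can be replayed in a
small single-threaded userspace model. This is a sketch, not driver code:
struct toy_state and the toy_*() helpers are hypothetical names, and the two
locks are left out because each step runs to completion before the next one
starts, which is exactly what the scenario allows.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy state for a single queue; both fields are hypothetical. */
struct toy_state {
	bool bound_to_vfio_ap;	/* queue is still bound to the vfio_ap driver */
	bool assigned_to_mdev;	/* queue is assigned to a matrix mdev */
};

/* Steps 2-3: the in_use check; the lock is dropped again on return. */
static int toy_in_use(const struct toy_state *s)
{
	return s->assigned_to_mdev ? -EBUSY : 0;
}

/* Step 4: an assignment that never looks at the pending mask change. */
static void toy_assign(struct toy_state *s)
{
	/* The queue is still bound, so the assignment looks legitimate. */
	assert(s->bound_to_vfio_ap);
	s->assigned_to_mdev = true;
}

/* Step 5 onwards: the mask update takes the queue away from vfio_ap. */
static void toy_apply_mask(struct toy_state *s)
{
	s->bound_to_vfio_ap = false;
}

/* Replays the interleaving; true means the queue was 'stolen' while used. */
static bool toy_replay_race(void)
{
	struct toy_state s = {
		.bound_to_vfio_ap = true,
		.assigned_to_mdev = false,
	};

	if (toy_in_use(&s) != 0)	/* steps 2-3: check passes */
		return false;
	toy_assign(&s);			/* step 4: slips in after the check */
	toy_apply_mask(&s);		/* step 5: acts on the stale answer */

	return s.assigned_to_mdev && !s.bound_to_vfio_ap;	/* step 6 */
}
```

In this model toy_replay_race() returns true: the check and the mask update
are not atomic with respect to the assignment, which is the window described
above.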

This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
queues to mdev device". Maybe we can reorder these patches. I didn't
look into that.

We could also just ignore the problem, because it is just for a couple
of commits, but I would prefer it gone.

Regards,
Halil
   




[PATCH v12 07/17] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-11-24 Thread Tony Krowiak
Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed are assigned to
any of the matrix mdevs under the driver's control.
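
As a rough userspace illustration of the in-use rule described above (not
the code from this patch), the check can be thought of as a mask
intersection per mdev. Assumptions: struct toy_matrix and
toy_resource_in_use() are hypothetical names, the masks are shrunk to 64 bits
instead of bitmaps sized by AP_DEVICES and AP_DOMAINS, and the inverse bit
numbering used by the real bitmaps is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy matrix of one mdev; real masks are wider bitmaps. */
struct toy_matrix {
	uint64_t apm;	/* bit i set: adapter (APID) i is assigned */
	uint64_t aqm;	/* bit j set: domain (APQI) j is assigned */
};

/*
 * Returns -EBUSY if any APQN in the cross product of apm and aqm is
 * assigned to one of the n mdevs, 0 otherwise.
 */
static int toy_resource_in_use(const struct toy_matrix *mdevs, size_t n,
			       uint64_t apm, uint64_t aqm)
{
	assert(mdevs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* An APQN overlaps only if both its APID and APQI do. */
		if ((mdevs[i].apm & apm) && (mdevs[i].aqm & aqm))
			return -EBUSY;
	}
	return 0;
}
```

With one mdev holding adapters {0, 1} and domain {2}, a request covering
adapter 0 and domain 2 is reported as in use, while adapter 0 with domain 3
is not, because no complete APQN overlaps.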

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EBUSY.
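
A minimal userspace sketch of this back-off pattern follows. It is an
illustration under stated assumptions, not the driver code: a toy try-lock
built on C11 atomic_flag stands in for mutex_trylock(), and toy_trylock(),
toy_unlock(), model_resource_in_use() and demo_backoff() are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for matrix_dev->lock; not the kernel mutex API. */
static atomic_flag matrix_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void toy_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/*
 * Hypothetical model of the in_use callback. Its caller already holds
 * the permissions mutex, so it must never block on matrix_lock.
 */
static int model_resource_in_use(void)
{
	if (!toy_trylock(&matrix_lock))
		return -EBUSY;	/* back off instead of waiting */

	/* ... check whether the APQNs are assigned to an mdev ... */

	toy_unlock(&matrix_lock);
	return 0;
}

/* Demo: succeeds when the lock is free, backs off when it is held. */
static void demo_backoff(void)
{
	bool held;

	assert(model_resource_in_use() == 0);

	held = toy_trylock(&matrix_lock);	/* another path holds the lock */
	assert(held);
	(void)held;
	assert(model_resource_in_use() == -EBUSY);
	toy_unlock(&matrix_lock);
}
```

Because neither path ever waits for the second lock while holding the first,
the circular wait from the scenario cannot form; the cost is that callers
have to cope with -EBUSY.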

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 +
 drivers/s390/crypto/vfio_ap_ops.c | 96 +++
 drivers/s390/crypto/vfio_ap_private.h |  2 +
 3 files changed, 71 insertions(+), 28 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
 	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 07caf871943c..3c2479d7e674 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -520,18 +520,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+			 "already assigned to %s"
+
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+					 unsigned long *apm,
+					 unsigned long *aqm)
+{
+	unsigned long apid, apqi;
+
+	for_each_set_bit_inv(apid, apm, AP_DEVICES)
+		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+			pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);
+}
+
+
 /**
  * vfio_ap_mdev_verify_no_sharing
  *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * Verifies that each APQN derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising an AP matrix is not assigned to a
  * mediated device. AP queue sharing is not allowed.
  *
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ *  are assigned. If the value is not NULL, then verification will
+ *  proceed for all other matrix mediated devices; otherwise, all
+ *  matrix mediated devices will be verified.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
  *
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if no APQNs are shared; otherwise, returns -EBUSY if one
+ * or more APQNs are shared.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+					  unsigned long *mdev_apm,
+					  unsigned long *mdev_aqm)
 {
 	struct ap_matrix_mdev *lstdev;
 	DECLARE_BITMAP(apm, AP_DEVICES);
@@ -548,15 +570,16 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 		 * We work on full longs, as we can only exclude the leftover
 		 * bits in non-inverse order. The leftover is all zeros.
 		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
+		if (!bitmap_and(apm, mdev_apm,