Re: [PATCH v15 00/13] s390/vfio-ap: dynamic configuration support

2021-04-09 Thread Tony Krowiak




On 4/8/21 4:38 PM, Halil Pasic wrote:

On Tue,  6 Apr 2021 11:31:09 -0400
Tony Krowiak  wrote:


Tony Krowiak (13):
   s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

The subsequent patches re-introduce this circular locking dependency
problem. See my kernel messages for the details. The link we sever
in the above patch is re-introduced in several places. One of them is
assign_adapter_store().


Like in the patch referenced above, the lockdep splat occurs when
the APCB masks are set which requires acquisition of the kvm lock.
Patch 08/13, allow hot plug/unplug of AP resources using mdev,
introduces code that updates the APCB masks whenever an
adapter, domain or control domain is assigned or unassigned
as well as when a queue device is probed or removed.
I think the solution from the patch above can be implemented
here to resolve this problem.



Regards,
Halil

[  +0.000236] vfio_ap matrix: MDEV: Registered
[  +0.037919] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: Adding to iommu group 1
[  +0.92] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: MDEV: group_id = 1

[Apr 8 22:31] ==
[  +0.02] WARNING: possible circular locking dependency detected
[  +0.02] 5.12.0-rc6-00016-g5bea90816c56 #57 Not tainted
[  +0.02] --
[  +0.02] CPU 1/KVM/6651 is trying to acquire lock:
[  +0.02] cef9d508 (_dev->lock){+.+.}-{3:3}, at: 
handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.11]
   but task is already holding lock:
[  +0.01] d41f4308 (>mutex){+.+.}-{3:3}, at: 
kvm_vcpu_ioctl+0x90/0x898 [kvm]
[  +0.38]
   which lock already depends on the new lock.

[  +0.02]
   the existing dependency chain (in reverse order) is:
[  +0.01]
   -> #2 (>mutex){+.+.}-{3:3}:
[  +0.04]validate_chain+0x796/0xa20
[  +0.06]__lock_acquire+0x420/0x7c8
[  +0.03]lock_acquire.part.0+0xec/0x1e8
[  +0.02]lock_acquire+0xb8/0x208
[  +0.02]__mutex_lock+0xa2/0x928
[  +0.05]mutex_lock_nested+0x32/0x40
[  +0.02]kvm_s390_cpus_to_pv+0x4e/0xf8 [kvm]
[  +0.19]kvm_s390_handle_pv+0x1ce/0x6b0 [kvm]
[  +0.18]kvm_arch_vm_ioctl+0x3ec/0x550 [kvm]
[  +0.19]kvm_vm_ioctl+0x40e/0x4a8 [kvm]
[  +0.18]__s390x_sys_ioctl+0xc0/0x100
[  +0.04]do_syscall+0x7e/0xd0
[  +0.43]__do_syscall+0xc0/0xd8
[  +0.04]system_call+0x72/0x98
[  +0.04]
   -> #1 (>lock){+.+.}-{3:3}:
[  +0.04]validate_chain+0x796/0xa20
[  +0.02]__lock_acquire+0x420/0x7c8
[  +0.02]lock_acquire.part.0+0xec/0x1e8
[  +0.02]lock_acquire+0xb8/0x208
[  +0.03]__mutex_lock+0xa2/0x928
[  +0.02]mutex_lock_nested+0x32/0x40
[  +0.02]kvm_arch_crypto_set_masks+0x4a/0x2b8 [kvm]
[  +0.18]vfio_ap_mdev_refresh_apcb+0xd0/0xe0 [vfio_ap]
[  +0.03]assign_adapter_store+0x1f2/0x240 [vfio_ap]
[  +0.03]kernfs_fop_write_iter+0x13e/0x1e0
[  +0.03]new_sync_write+0x10a/0x198
[  +0.03]vfs_write.part.0+0x196/0x290
[  +0.02]ksys_write+0x6c/0xf8
[  +0.03]do_syscall+0x7e/0xd0
[  +0.02]__do_syscall+0xc0/0xd8
[  +0.03]system_call+0x72/0x98
[  +0.02]
   -> #0 (_dev->lock){+.+.}-{3:3}:
[  +0.04]check_noncircular+0x16e/0x190
[  +0.02]check_prev_add+0xec/0xf38
[  +0.02]validate_chain+0x796/0xa20
[  +0.02]__lock_acquire+0x420/0x7c8
[  +0.02]lock_acquire.part.0+0xec/0x1e8
[  +0.02]lock_acquire+0xb8/0x208
[  +0.02]__mutex_lock+0xa2/0x928
[  +0.02]mutex_lock_nested+0x32/0x40
[  +0.03]handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.02]handle_pqap+0xe2/0x1d8 [kvm]
[  +0.19]kvm_handle_sie_intercept+0x134/0x248 [kvm]
[  +0.19]vcpu_post_run+0x2b6/0x580 [kvm]
[  +0.18]__vcpu_run+0x27e/0x388 [kvm]
[  +0.19]kvm_arch_vcpu_ioctl_run+0x10a/0x278 [kvm]
[  +0.18]kvm_vcpu_ioctl+0x2cc/0x898 [kvm]
[  +0.18]__s390x_sys_ioctl+0xc0/0x100
[  +0.03]do_syscall+0x7e/0xd0
[  +0.02]__do_syscall+0xc0/0xd8
[  +0.02]system_call+0x72/0x98
[  +0.03]
   other info that might help us debug this:

[  +0.01] Chain exists of:
 _dev->lock --> >lock --> >mutex

[  +0.05]  Possible unsafe locking scenario:

[  +0.01]CPU0CPU1
[  +0.01]
[  +0.02]   lock(>mutex);
[  +0.02]lock(>lock);
[  +0.02]lock(>mutex);
[  +0.02]   lock(_dev-&g

[PATCH v15 13/13] s390/vfio-ap: update docs to include dynamic config support

2021-04-06 Thread Tony Krowiak
Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak 
---
 Documentation/s390/vfio-ap.rst | 383 -
 1 file changed, 284 insertions(+), 99 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..031c2e5ee138 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,9 +123,9 @@ Let's now take a look at how AP instructions executed on a 
guest are interpreted
 by the hardware.
 
 A satellite control block called the Crypto Control Block (CRYCB) is attached 
to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP 
Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:
 
 * The AP Mask (APM) field is a bit mask that identifies the AP adapters 
assigned
   to the KVM guest. Each bit in the mask, from left to right (i.e. from most
@@ -192,7 +192,7 @@ The design introduces three new objects:
 
 1. AP matrix device
 2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device
 
 The VFIO AP device driver
 -
@@ -200,12 +200,13 @@ The VFIO AP (vfio_ap) device driver serves the following 
purposes:
 
 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
 
-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
device and creates the sysfs interfaces for assigning adapters, usage
domains, and control domains comprising the matrix for a KVM guest.
 
-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
-   SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB 
referenced
+   by a KVM guest's SIE state description to grant the guest access to a matrix
+   of AP devices
 
 Reserve APQNs for exclusive use of KVM guests
 -
@@ -253,7 +254,7 @@ The process for reserving an AP queue for use by a KVM 
guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all vfio_ap mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +270,7 @@ The process for reserving an AP queue for use by a KVM 
guest is:
default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
 to be exclusively used by a guest.
@@ -279,14 +280,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus 
driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::
 
+-+
| |
@@ -343,7 +344,7 @@ matrix device.
* device_api:
the mediated device type's API
* available_instances:
-   the number of mediated matrix passthrough devices
+   the number of vfio_ap mediated passthrough devices
that can be created
* device_api:
specifies the VFIO API

[PATCH v15 12/13] s390/zcrypt: notify drivers on config changed and scan complete callbacks

2021-04-06 Thread Tony Krowiak
This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

 This callback is invoked at the start of the AP bus scan
 function when it determines that the host AP configuration information
 has changed since the previous scan. This is done by storing
 an old and current QCI info struct and comparing them. If there is any
 difference, the callback is invoked.

 Note that when the AP bus scan detects that AP adapters, domains or
 control domains have been removed from the host's AP configuration, it
 will remove the associated devices from the AP bus subsystem's device
 model. This callback gives the device driver a chance to respond to
 the removal of the AP devices from the host configuration prior to
 calling the device driver's remove callback. The primary purpose of
 this callback is to allow the vfio_ap driver to do a bulk unplug of
 all affected adapters, domains and control domains from affected
 guests rather than unplugging them one at a time when the remove
 callback is invoked.

  void (*on_scan_complete)(struct ap_config_info *new_config_info,
   struct ap_config_info *old_config_info);

 The on_scan_complete callback is invoked after the ap bus scan is
 complete if the host AP configuration data has changed.

 Note that when the AP bus scan detects that adapters, domains or
 control domains have been added to the host's configuration, it will
 create new devices in the AP bus subsystem's device model. The primary
 purpose of this callback is to allow the vfio_ap driver to do a bulk
 plug of all affected adapters, domains and control domains into
 affected guests rather than plugging them one at a time when the
 probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
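
The ordering of the two callbacks around a bus scan can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct cfg_info, struct drv, scan_bus() and the counting callbacks are
hypothetical stand-ins, not the AP bus code, and the real
struct ap_config_info carries far more than three masks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_config_info: three small masks. */
struct cfg_info {
	unsigned int apm;	/* adapters */
	unsigned int aqm;	/* usage domains */
	unsigned int adm;	/* control domains */
};

/* Hypothetical driver carrying the two optional callbacks. */
struct drv {
	void (*on_config_changed)(const struct cfg_info *new_cfg,
				  const struct cfg_info *old_cfg);
	void (*on_scan_complete)(const struct cfg_info *new_cfg,
				 const struct cfg_info *old_cfg);
};

static struct cfg_info cur_cfg, old_cfg;
static int changed_calls, complete_calls;

/* Example callbacks that only count how often they are invoked. */
static void count_changed(const struct cfg_info *new_cfg,
			  const struct cfg_info *old)
{
	(void)new_cfg;
	(void)old;
	changed_calls++;
}

static void count_complete(const struct cfg_info *new_cfg,
			   const struct cfg_info *old)
{
	(void)new_cfg;
	(void)old;
	complete_calls++;
}

static int cfg_equal(const struct cfg_info *a, const struct cfg_info *b)
{
	return a->apm == b->apm && a->aqm == b->aqm && a->adm == b->adm;
}

/*
 * One scan pass: remember the previous configuration, store the freshly
 * fetched one, and notify drivers before and after the (omitted) device
 * add/remove work, but only if the configuration changed.
 * Returns 1 if a change was detected, 0 otherwise.
 */
static int scan_bus(const struct drv *drivers, size_t ndrivers,
		    const struct cfg_info *fetched)
{
	size_t i;
	int changed;

	assert(fetched != NULL);
	old_cfg = cur_cfg;
	cur_cfg = *fetched;
	changed = !cfg_equal(&cur_cfg, &old_cfg);

	if (changed)
		for (i = 0; i < ndrivers; i++)
			if (drivers[i].on_config_changed)
				drivers[i].on_config_changed(&cur_cfg, &old_cfg);

	/* Devices would be added to or removed from the bus here. */

	if (changed)
		for (i = 0; i < ndrivers; i++)
			if (drivers[i].on_scan_complete)
				drivers[i].on_scan_complete(&cur_cfg, &old_cfg);

	return changed;
}
```

A driver that leaves either callback NULL is simply skipped, which mirrors
the NULL checks in the helper functions in the diff below.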

Signed-off-by: Harald Freudenberger 
Reviewed-by: Halil Pasic 
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/ap_bus.c  |  89 ++-
 drivers/s390/crypto/ap_bus.h  |  12 ++
 drivers/s390/crypto/vfio_ap_drv.c |   4 +-
 drivers/s390/crypto/vfio_ap_ops.c | 215 +++---
 drivers/s390/crypto/vfio_ap_private.h |  15 +-
 5 files changed, 306 insertions(+), 29 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index b7653cec81ac..ccc87fb84c6d 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -82,6 +82,7 @@ static atomic64_t ap_scan_bus_count;
 static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1579,6 +1580,50 @@ static int __match_queue_device_with_queue_id(struct 
device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_config_changed)
+   ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_scan_complete)
+   ap_drv->on_scan_complete(ap_qci_info,
+ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_scan_complete);
+}
+
 /*
  * Helper function for ap_scan_bus().
  * Remove card device and associated queue devices.
@@ -1857,15 +1902,51 @@ static inline void ap_scan_adapter(int ap)
put_device(>ap_dev.device);
 }
 
+/*
+ * ap_get_configuration
+ *
+ * Stores the host

[PATCH v15 11/13] s390/vfio-ap: sysfs attribute to display the guest's matrix

2021-04-06 Thread Tony Krowiak
The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
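
For readers unfamiliar with the output, the format produced by the
matrix and guest_matrix attributes can be sketched as follows. This is a
userspace approximation, not the driver code: format_matrix(), any_set()
and append() are hypothetical helpers, and byte arrays stand in for the
kernel's inverted bitmaps. Only the three line formats ("%02lx.%04lx",
"%02lx." and ".%04lx") are taken from the diff below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if any entry of the mask is non-zero. */
static int any_set(const unsigned char *mask, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (mask[i])
			return 1;
	return 0;
}

/* Appends line to buf at pos if it fits; returns the new position. */
static size_t append(char *buf, size_t len, size_t pos, const char *line)
{
	size_t n = strlen(line);

	if (pos + n >= len)	/* no room: drop the line, keep buf terminated */
		return pos;
	memcpy(buf + pos, line, n + 1);
	return pos + n;
}

/*
 * Prints one "apid.apqi" line per APQN in the cross product of the
 * assigned adapters (apm) and domains (aqm). If only adapters or only
 * domains are assigned, the missing half is left empty.
 * Returns the number of characters written to buf.
 */
static size_t format_matrix(const unsigned char *apm, size_t napm,
			    const unsigned char *aqm, size_t naqm,
			    char *buf, size_t len)
{
	char line[40];
	size_t pos = 0;
	unsigned long apid, apqi;
	int have_ap = any_set(apm, napm);
	int have_aq = any_set(aqm, naqm);

	assert(len > 0);
	buf[0] = '\0';

	for (apid = 0; have_ap && apid < napm; apid++) {
		if (!apm[apid])
			continue;
		if (!have_aq) {
			snprintf(line, sizeof(line), "%02lx.\n", apid);
			pos = append(buf, len, pos, line);
			continue;
		}
		for (apqi = 0; apqi < naqm; apqi++) {
			if (!aqm[apqi])
				continue;
			snprintf(line, sizeof(line), "%02lx.%04lx\n",
				 apid, apqi);
			pos = append(buf, len, pos, line);
		}
	}

	for (apqi = 0; !have_ap && apqi < naqm; apqi++) {
		if (!aqm[apqi])
			continue;
		snprintf(line, sizeof(line), ".%04lx\n", apqi);
		pos = append(buf, len, pos, line);
	}

	return pos;
}
```

With adapters 1 and 3 and domain 4 assigned, the sketch yields the two
lines 01.0004 and 03.0004.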

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 818757739f5d..618d9e37e82b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1106,29 +1106,24 @@ static ssize_t control_domains_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(control_domains);
 
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
-  char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
 {
-   struct mdev_device *mdev = mdev_from_dev(dev);
-   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
-   unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-   unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+   unsigned long napm_bits = matrix->apm_max + 1;
+   unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;
 
-   apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
-   apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-   mutex_lock(&matrix_dev->lock);
+   apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+   apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
 
if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm,
 naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -1137,25 +1132,52 @@ static ssize_t matrix_show(struct device *dev, struct 
device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}
 
+   return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);
 
return nchars;
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+   mutex_unlock(&matrix_dev->lock);
+
+   return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1165,6 +1187,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+   &dev_attr_guest_matrix.attr,
NULL,
 };
 
-- 
2.21.3



[PATCH v15 10/13] s390/vfio-ap: implement in-use callback for vfio_ap driver

2021-04-06 Thread Tony Krowiak
Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

To resolve this issue, the matrix device will no longer be locked with
mutex_lock(&matrix_dev->lock) during assignment of an adapter or domain
to a matrix_mdev or during the in_use callback. Instead,
mutex_trylock(&matrix_dev->lock) will be used. If the lock is not
obtained, the assignment and in_use functions will terminate with
-EAGAIN.
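
The try-lock-or-bail pattern described above can be illustrated outside
the kernel. The following is a minimal userspace sketch, assuming a C11
atomic_flag as a stand-in for the kernel mutex; matrix_trylock(),
matrix_unlock(), assign_adapter() and assigned_apm are hypothetical names
used for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static atomic_flag matrix_lock = ATOMIC_FLAG_INIT;

/* Hypothetical adapter mask of a single mdev. */
static unsigned long assigned_apm;

/* Returns true if the lock was free and is now held by the caller. */
static bool matrix_trylock(void)
{
	return !atomic_flag_test_and_set(&matrix_lock);
}

static void matrix_unlock(void)
{
	atomic_flag_clear(&matrix_lock);
}

/*
 * Assigns an adapter without ever blocking on the lock. If another
 * path currently holds it, the caller gets -EAGAIN and may retry,
 * so the two lock orders from the scenario above cannot deadlock.
 */
static int assign_adapter(unsigned int apid)
{
	assert(apid < 8 * sizeof(assigned_apm));

	if (!matrix_trylock())
		return -EAGAIN;

	assigned_apm |= 1UL << apid;
	matrix_unlock();

	return 0;
}
```

The cost of this choice is that userspace writing to the sysfs attribute
has to be prepared to retry when it sees EAGAIN.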

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 +
 drivers/s390/crypto/vfio_ap_ops.c | 38 ++-
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+   vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 8e7f24f0cd49..818757739f5d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -667,10 +667,14 @@ static void vfio_ap_mdev_link_adapter(struct 
ap_matrix_mdev *matrix_mdev,
  *driver; or, if no APQIs have yet been assigned, the APID is not
  *contained in an APQN bound to the vfio_ap device driver.
  *
- * 4. -EBUSY
+ * 4. -EADDRINUSE
  *An APQN derived from the cross product of the APID being assigned
  *and the APQIs previously assigned is being used by another mediated
- *matrix device or the mdev lock could not be acquired.
+ *matrix device.
+ *
+ * 5. -EAGAIN
+ *The mdev lock could not be acquired which is required in order to
+ *change the AP configuration for the mdev
  */
 static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
@@ -681,7 +685,8 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EAGAIN;
 
/*
 * If the KVM pointer is in flux or the guest is running, disallow
@@ -820,10 +825,14 @@ static void vfio_ap_mdev_link_domain(struct 
ap_matrix_mdev *matrix_mdev,
  *driver; or, if no APIDs have yet been assigned, the APQI is not
  *contained in an APQN bound to the vfio_ap device driver.
  *
- * 4. -BUSY
+ * 4. -EADDRINUSE
  *An APQN derived from the cross product of the APQI being assigned
  *and the APIDs previously assigned is being used by another mediated
- *matrix device or the mdev lock could not be acquired.
+ *matrix device.
+ *
+ * 5. -EAGAIN
+ *The mdev lock could not be acquired which is required in order to
+ *change the AP configuration for the mdev
  */
 static ssize_t assign_domain_store(struct device *dev,
   struct device_attribute *attr,
@@ -835,7 +844,8 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EAGAIN;
 
/*
 * If the KVM pointer is in flux or the guest is running, disallow
@@ -963,6 +973,7 @@ static void vfio_ap_mdev_hot_plug_cdom(struct 
ap_matrix_mdev *matrix_mdev,
  * returns one of the fol

[PATCH v15 09/13] s390/zcrypt: driver callback to indicate resource in use

2021-04-06 Thread Tony Krowiak
Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
busy error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
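
The veto logic can be sketched in userspace C as follows. This is an
illustration under stated assumptions rather than the AP bus code:
64-bit masks replace the kernel bitmaps, and struct drv, perms_apm,
perms_aqm, mdev_apm, mdev_aqm and passthrough_in_use() are hypothetical
names. Only the overall shape (compute the newly reserved bits, query
non-default drivers, reject with -EBUSY) follows the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver descriptor with an optional in_use callback. */
struct drv {
	int is_default;
	int (*in_use)(uint64_t apm, uint64_t aqm);
};

/* Adapters and domains currently reserved for the default drivers. */
static uint64_t perms_apm;
static uint64_t perms_aqm;

/* Hypothetical masks assigned to a pass-through device. */
static uint64_t mdev_apm, mdev_aqm;

/* Reports in use if any APQN in apm x aqm is assigned to the mdev. */
static int passthrough_in_use(uint64_t apm, uint64_t aqm)
{
	return (mdev_apm & apm) && (mdev_aqm & aqm);
}

/*
 * Commits a new adapter mask unless a non-default driver reports that
 * one of the adapters being taken away from it is still in use.
 */
static int apmask_commit(uint64_t newapm, const struct drv *drivers,
			 size_t ndrivers)
{
	/* Bits set in the new mask but not in the old one. */
	uint64_t newly_reserved = newapm & ~perms_apm;
	size_t i;

	assert(drivers != NULL || ndrivers == 0);

	if (newly_reserved) {
		for (i = 0; i < ndrivers; i++) {
			if (drivers[i].is_default || !drivers[i].in_use)
				continue;
			if (drivers[i].in_use(newly_reserved, perms_aqm))
				return -EBUSY;
		}
	}

	perms_apm = newapm;
	return 0;
}
```

Once the administrator unassigns the adapter from the mdev, the same
mask change goes through, which is the procedure the commit message
describes.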

Signed-off-by: Tony Krowiak 
Reviewed-by: Harald Freudenberger 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/ap_bus.c | 160 ---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 154 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 2758d05a802d..b7653cec81ac 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -1006,6 +1007,23 @@ static int modify_bitmap(const char *str, unsigned long 
*bitmap, int bits)
return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int 
bits,
+  unsigned long *newmap)
+{
+   unsigned long size;
+   int rc;
+
+   size = BITS_TO_LONGS(bits) * sizeof(unsigned long);
+   if (*str == '+' || *str == '-') {
+   memcpy(newmap, bitmap, size);
+   rc = modify_bitmap(str, newmap, bits);
+   } else {
+   memset(newmap, 0, size);
+   rc = hex2bitmap(str, newmap, bits);
+   }
+   return rc;
+}
+
 int ap_parse_mask_str(const char *str,
  unsigned long *bitmap, int bits,
  struct mutex *lock)
@@ -1025,14 +1043,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
-   if (*str == '+' || *str == '-') {
-   memcpy(newmap, bitmap, size);
-   rc = modify_bitmap(str, newmap, bits);
-   } else {
-   memset(newmap, 0, size);
-   rc = hex2bitmap(str, newmap, bits);
-   }
+   rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1224,12 +1235,76 @@ static ssize_t apmask_show(struct bus_type *bus, char 
*buf)
return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+   int rc = 0;
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+   unsigned long *newapm = (unsigned long *)data;
+
+   /*
+* No need to verify whether the driver is using the queues if it is the
+* default driver.
+*/
+   if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+   return 0;
+
+   /*
+* increase the driver's module refcounter to be sure it is not
+* going away when we invoke the callback function.
+*/
+   if (!try_module_get(drv->owner))
+   return 0;
+
+   if (ap_drv->in_use) {
+   rc = ap_drv->in_use(newapm, ap_perms.aqm);
+   if (rc)
+   rc = -EBUSY;
+   }
+
+   /* release the driver's module */
+   module_put(drv->owner);
+
+   return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+   int rc;
+   unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+   /*
+* Check if any bits in the apmask have been set which will
+* result in queues being removed from non-default drivers
+*/
+   if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+   rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+   if (rc)
+   return rc;
+   }
+
+   memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+   return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
 {
int rc;
+   DECLARE_BITMAP(newapm, AP_DEVICES);
+
+   if (mutex_lock_interruptible(&ap_perms_mutex))
+   return -ERESTARTSYS;
 
-   rc = ap_parse_mask_str(buf, ap_perms.apm

[PATCH v15 07/13] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-04-06 Thread Tony Krowiak
The current implementation does not allow assignment of an AP adapter or
domain to an mdev device if each APQN resulting from the assignment
does not reference an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
   1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is that the AP architecture does not preclude
assignment of APQNs to an AP configuration profile that are not available
to the system.
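
The two remaining conditions can be sketched as a simple mask check.
This is a userspace illustration only: struct matrix,
validate_assignment() and the use of 64-bit masks are assumptions, and
the choice of -EADDRNOTAVAIL for the reserved case is likewise assumed
for the sketch. Note that nothing in it asks whether a queue device is
bound to the driver, which is the point of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced matrix: adapter mask and domain mask. */
struct matrix {
	uint64_t apm;
	uint64_t aqm;
};

/* Returns 1 if the cross products of a and b share at least one APQN. */
static int apqns_overlap(const struct matrix *a, const struct matrix *b)
{
	return (a->apm & b->apm) && (a->aqm & b->aqm);
}

/*
 * Checks a candidate matrix (the mdev's matrix with the new adapter or
 * domain already added) against the two rules from the text above:
 * 1. no APQN may be reserved for the zcrypt drivers
 * 2. no APQN may be assigned to another mdev
 */
static int validate_assignment(const struct matrix *candidate,
			       const struct matrix *zcrypt_perms,
			       const struct matrix *others, size_t nothers)
{
	size_t i;

	assert(candidate != NULL && zcrypt_perms != NULL);

	if (apqns_overlap(candidate, zcrypt_perms))
		return -EADDRNOTAVAIL;

	for (i = 0; i < nothers; i++)
		if (apqns_overlap(candidate, &others[i]))
			return -EADDRINUSE;

	return 0;
}
```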

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 228 --
 1 file changed, 56 insertions(+), 172 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index a8ae1d22aeba..69b58b0fac8f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -536,141 +536,50 @@ static struct attribute_group 
*vfio_ap_mdev_type_groups[] = {
NULL,
 };
 
-struct vfio_ap_queue_reserved {
-   unsigned long *apid;
-   unsigned long *apqi;
-   bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
-   struct vfio_ap_queue_reserved *qres = data;
-   struct ap_queue *ap_queue = to_ap_queue(dev);
-   ap_qid_t qid;
-   unsigned long id;
-
-   if (qres->apid && qres->apqi) {
-   qid = AP_MKQID(*qres->apid, *qres->apqi);
-   if (qid == ap_queue->qid)
-   qres->reserved = true;
-   } else if (qres->apid && !qres->apqi) {
-   id = AP_QID_CARD(ap_queue->qid);
-   if (id == *qres->apid)
-   qres->reserved = true;
-   } else if (!qres->apid && qres->apqi) {
-   id = AP_QID_QUEUE(ap_queue->qid);
-   if (id == *qres->apqi)
-   qres->reserved = true;
-   } else {
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP 
device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-unsigned long *apqi)
-{
-   int ret;
-   struct vfio_ap_queue_reserved qres;
-
-   qres.apid = apid;
-   qres.apqi = apqi;
-   qres.reserved = false;
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+"already assigned to %s"
 
-   ret = driver_for_each_device(_dev->vfio_ap_drv->driver, NULL,
-, vfio_ap_has_queue);
-   if (ret)
-   return ret;
-
-   if (qres.reserved)
-   return 0;
-
-   return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev 
*matrix_mdev,
-unsigned long apid)
+static void vfio_ap_mdev_log_sharing_err(struct ap_matrix_mdev *matrix_mdev,
+unsigned long *apm,
+unsigned long *aqm)
 {
-   int ret;
-   unsigned long apqi;
-   unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-   if (find_first_bit_inv(matrix_mdev->matrix.aqm, n

[PATCH v15 08/13] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-04-06 Thread Tony Krowiak
Let's allow adapters, domains and control domains to be hot plugged into
and hot unplugged from a KVM guest using a matrix mdev when:

* The adapter, domain or control domain is assigned to or unassigned from
  the matrix mdev

* A queue device with an APQN assigned to the matrix mdev is bound to or
  unbound from the vfio_ap device driver.

Whenever an assignment or unassignment of an adapter, domain or control
domain is performed as well as when a bind or unbind of a queue device
is executed, the AP control block (APCB) that supplies the AP configuration
to the guest is first refreshed.

After refreshing the APCB, if the mdev is in use by a KVM guest, the
refreshed APCB is hot plugged into the guest to dynamically provide
access to the adapters, domains and control domains it contains.

Signed-off-by: Tony Krowiak 
Acked-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 72 +++
 1 file changed, 63 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 69b58b0fac8f..8e7f24f0cd49 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -311,6 +311,20 @@ static void vfio_ap_matrix_init(struct ap_config_info 
*info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+   return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   if (vfio_ap_mdev_has_crycb(matrix_mdev))
+   kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
 /*
  * vfio_ap_mdev_filter_apcb
  *
@@ -378,6 +392,7 @@ static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev 
*matrix_mdev)
   sizeof(struct ap_matrix)) != 0) {
memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
   sizeof(struct ap_matrix));
+   vfio_ap_mdev_commit_apcb(matrix_mdev);
}
 }
 
@@ -672,7 +687,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of adapter
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -745,7 +760,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of adapter
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -826,7 +841,7 @@ static ssize_t assign_domain_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * assignment of domain
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -898,7 +913,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of domain
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -923,6 +938,16 @@ static ssize_t unassign_domain_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static void vfio_ap_mdev_hot_plug_cdom(struct ap_matrix_mdev *matrix_mdev,
+  unsigned long domid)
+{
+   if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm) &&
+   test_bit_inv(domid, (unsigned long *)matrix_dev->info.adm)) {
+   set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+   vfio_ap_mdev_commit_apcb(matrix_mdev);
+   }
+}
+
 /**
  * assign_control_domain_store
  *
@@ -954,7 +979,7 @@ static ssize_t assign_control_domain_store(struct device 
*dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * assignment of control domain.
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -974,7 +999,7 @@ static ssize_t assign_control_domain_store(struct device 
*dev,
 * number of control domains that can be assigned.
 */
set_bit_inv(id, matrix_mdev->matrix.adm);
-   vfio_ap_mdev_

[PATCH v15 06/13] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev

2021-04-06 Thread Tony Krowiak
Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
that do not reference an AP queue device bound to the vfio_ap device
driver. The mdev's APQNs will be filtered according to the following rules:

* The APID of each adapter and the APQI of each domain that is not in the
  host's AP configuration is filtered out.

* The APID of each adapter comprising an APQN that does not reference a
  queue device bound to the vfio_ap device driver is filtered. The APQNs
  are derived from the Cartesian product of the APID of each adapter and
  APQI of each domain assigned to the mdev.

The filtering will take place:

* Whenever an adapter, domain or control domain is assigned or
  unassigned.

* When a queue device is bound to or unbound from the vfio_ap device
  driver.
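
The two filtering rules can be sketched in user-space C as follows. This is
a simplified illustration under stated assumptions: bool arrays replace the
kernel bitmaps, control domains are left out, and sketch_matrix,
sketch_bound_fn, sketch_filter() and the demo helpers are hypothetical
names. The callback stands in for the check that a queue device with the
given APQN is bound to the vfio_ap device driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX 256	/* sketch-only size for adapter and domain tables */

struct sketch_matrix {
	bool apm[SKETCH_MAX];	/* adapters, indexed by APID */
	bool aqm[SKETCH_MAX];	/* usage domains, indexed by APQI */
};

/* Hypothetical: is the queue for (apid, apqi) bound to the driver? */
typedef bool (*sketch_bound_fn)(unsigned int apid, unsigned int apqi);

static void sketch_filter(const struct sketch_matrix *mdev,
			  const struct sketch_matrix *host,
			  sketch_bound_fn is_bound,
			  struct sketch_matrix *shadow)
{
	unsigned int apid, apqi;

	assert(mdev && host && is_bound && shadow);

	/* Rule 1: keep only what is assigned to the mdev AND in the host config. */
	for (apid = 0; apid < SKETCH_MAX; apid++)
		shadow->apm[apid] = mdev->apm[apid] && host->apm[apid];
	for (apqi = 0; apqi < SKETCH_MAX; apqi++)
		shadow->aqm[apqi] = mdev->aqm[apqi] && host->aqm[apqi];

	/* Rule 2: drop the whole adapter if any of its APQNs is not bound. */
	for (apid = 0; apid < SKETCH_MAX; apid++) {
		if (!shadow->apm[apid])
			continue;
		for (apqi = 0; apqi < SKETCH_MAX; apqi++) {
			if (shadow->aqm[apqi] && !is_bound(apid, apqi)) {
				shadow->apm[apid] = false;
				break;
			}
		}
	}
}

/* Demo assumption: every queue is bound except adapter 2, domain 5. */
static bool sketch_demo_bound(unsigned int apid, unsigned int apqi)
{
	return !(apid == 2 && apqi == 5);
}

/* Returns (adapters left * 10) + domains left after filtering. */
static int sketch_filter_demo(void)
{
	struct sketch_matrix mdev, host, shadow;
	int adapters = 0, domains = 0;
	unsigned int i;

	memset(&mdev, 0, sizeof(mdev));
	memset(&host, 0, sizeof(host));
	mdev.apm[1] = mdev.apm[2] = mdev.apm[3] = true;
	mdev.aqm[4] = mdev.aqm[5] = true;
	host.apm[1] = host.apm[2] = true;	/* adapter 3 is not in the host */
	host.aqm[4] = host.aqm[5] = true;

	sketch_filter(&mdev, &host, sketch_demo_bound, &shadow);

	for (i = 0; i < SKETCH_MAX; i++) {
		adapters += shadow.apm[i];
		domains += shadow.aqm[i];
	}
	return adapters * 10 + domains;
}
```

In the demo, adapter 3 is removed by the first rule and adapter 2 by the
second, while both domains remain because the filtering is done per adapter
rather than per APQN.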

Signed-off-by: Tony Krowiak 
Acked-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 84 +--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index ce57d7f24f74..a8ae1d22aeba 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -311,6 +311,76 @@ static void vfio_ap_matrix_init(struct ap_config_info 
*info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+/*
+ * vfio_ap_mdev_filter_apcb
+ *
+ * @matrix_mdev: the mdev whose AP configuration is to be filtered.
+ * @shadow_apcb: the APCB to use to store the guest's AP configuration after
+ *  filtering takes place.
+ */
+static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
+struct ap_matrix *shadow_apcb)
+{
+   int ret;
+   unsigned long apid, apqi, apqn;
+
+   ret = ap_qci(&matrix_dev->info);
+   if (ret)
+   return;
+
+   /*
+* Copy the adapters, domains and control domains to the shadow_apcb
+* from the matrix mdev, but only those that are assigned to the host's
+* AP configuration.
+*/
+   bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
+  (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+   bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
+  (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+   bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
+  (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+
+   for_each_set_bit_inv(apid, shadow_apcb->apm, AP_DEVICES) {
+   for_each_set_bit_inv(apqi, shadow_apcb->aqm, AP_DOMAINS) {
+   /*
+* If the APQN is not bound to the vfio_ap device
+* driver, then we can't assign it to the guest's
+* AP configuration. The AP architecture won't
+* allow filtering of a single APQN, so if we're
+* filtering APIDs, then filter the APID; otherwise,
+* filter the APQI.
+*/
+   apqn = AP_MKQID(apid, apqi);
+   if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+   clear_bit_inv(apid, shadow_apcb->apm);
+   break;
+   }
+   }
+   }
+}
+
+/**
+ * vfio_ap_mdev_refresh_apcb
+ *
+ * Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
+ * that do not reference an AP queue device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ */
+static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   struct ap_matrix shadow_apcb;
+
+   vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
+   vfio_ap_mdev_filter_apcb(matrix_mdev, &shadow_apcb);
+
+   if (memcmp(&shadow_apcb, &matrix_mdev->shadow_apcb,
+  sizeof(struct ap_matrix)) != 0) {
+   memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
+  sizeof(struct ap_matrix));
+   }
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev;
@@ -711,6 +781,7 @@ static ssize_t assign_adapter_store(struct device *dev,
goto share_err;
 
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
ret = count;
goto done;
 
@@ -780,6 +851,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
ret = count;
 done:
mutex_unlock(&matrix_dev->lock);
@@ -888,6 +960,7 @@ static ssize_t assign_domain_store(struct device *dev,
goto share_err;
 
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
+   

[PATCH v15 04/13] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-04-06 Thread Tony Krowiak
Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

 * When the queue device is probed, if its APQN is assigned to a matrix
   mdev, the structures representing the queue device and the matrix mdev
   will be linked.

 * When an adapter or domain is assigned to a matrix mdev, for each new
   APQN assigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be linked.

The links will be removed as follows:

 * When the queue device is removed, if its APQN is assigned to a matrix
   mdev, the link from the structure representing the matrix mdev to the
   structure representing the queue will be removed. The link from the
   queue to the matrix mdev will be maintained because if the queue device
   is being removed due to a manual sysfs unbind, it may be needed after
   the queue is reset to clean up the IRQ resources allocated to enable AP
   interrupts for the KVM guest. Since the storage for the structure
   representing the queue device is ultimately freed by the remove
   callback, keeping the reference shouldn't be a problem.

 * When an adapter or domain is unassigned from a matrix mdev, for each
   APQN unassigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be unlinked.

 * When an mdev is removed, the link from any queues assigned to the mdev
   to the mdev will be removed.
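
The linking scheme described above can be sketched in user-space C. The
patch itself relies on the kernel hashtable helpers (hash_init, hash_add,
hash_for_each_possible); the chained table, the modulo bucket function and
all sketch_* names below are hypothetical and serve only as illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BUCKETS 16

struct sketch_mdev;

struct sketch_queue {
	unsigned int apqn;
	struct sketch_mdev *mdev;	/* back-link: queue -> mdev */
	struct sketch_queue *next;	/* chain within one bucket */
};

struct sketch_mdev {
	/* forward links: mdev -> queues, keyed by APQN */
	struct sketch_queue *qtable[SKETCH_BUCKETS];
};

static unsigned int sketch_bucket(unsigned int apqn)
{
	return apqn % SKETCH_BUCKETS;
}

/* Link in both directions, as done on probe or on assignment. */
static void sketch_link(struct sketch_mdev *m, struct sketch_queue *q)
{
	unsigned int b;

	assert(m != NULL && q != NULL);
	b = sketch_bucket(q->apqn);
	q->mdev = m;
	q->next = m->qtable[b];
	m->qtable[b] = q;
}

/* Look up a linked queue by APQN; only one bucket chain is walked. */
static struct sketch_queue *sketch_get(struct sketch_mdev *m,
				       unsigned int apqn)
{
	struct sketch_queue *q;

	for (q = m->qtable[sketch_bucket(apqn)]; q != NULL; q = q->next) {
		if (q->apqn == apqn)
			return q;
	}
	return NULL;
}

/*
 * Remove only the mdev -> queue link and keep the back-link, which
 * mirrors the queue-removal case described above.
 */
static void sketch_unlink_from_mdev(struct sketch_mdev *m,
				    struct sketch_queue *q)
{
	struct sketch_queue **pp = &m->qtable[sketch_bucket(q->apqn)];

	while (*pp != NULL && *pp != q)
		pp = &(*pp)->next;
	if (*pp != NULL) {
		*pp = q->next;
		q->next = NULL;
	}
}

/* Returns the number of expectations that held (5 when all hold). */
static int sketch_link_demo(void)
{
	struct sketch_mdev m = { { NULL } };
	struct sketch_queue a = { 0x0104, NULL, NULL };
	struct sketch_queue b = { 0x0114, NULL, NULL };	/* same bucket as a */
	int ok = 0;

	sketch_link(&m, &a);
	sketch_link(&m, &b);
	ok += (sketch_get(&m, 0x0104) == &a);
	ok += (sketch_get(&m, 0x0114) == &b);

	sketch_unlink_from_mdev(&m, &a);
	ok += (sketch_get(&m, 0x0104) == NULL);	/* forward link removed */
	ok += (a.mdev == &m);			/* back-link still present */
	ok += (sketch_get(&m, 0x0114) == &b);	/* other queue unaffected */
	return ok;
}
```

A lookup that returns NULL then doubles as the check that an APQN assigned
to the mdev has no queue device bound to the driver.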

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 162 ++
 drivers/s390/crypto/vfio_ap_private.h |   3 +
 2 files changed, 140 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index c630abac81d0..8bc21f3ec2b4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,33 +27,17 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
- *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Returns the pointer to the associated vfio_ap_queue
- */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-   struct ap_matrix_mdev *matrix_mdev,
-   int apqn)
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
 {
struct vfio_ap_queue *q;
 
-   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-   return NULL;
-   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-   return NULL;
-
-   q = vfio_ap_find_queue(apqn);
-   if (q)
-   q->matrix_mdev = matrix_mdev;
+   hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+   if (q && q->apqn == apqn)
+   return q;
+   }
 
-   return q;
+   return NULL;
 }
 
 /**
@@ -171,7 +155,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct 
vfio_ap_queue *q)
  status.response_code);
 end_free:
vfio_ap_free_aqic_resources(q);
-   q->matrix_mdev = NULL;
return status;
 }
 
@@ -300,7 +283,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
if (!matrix_mdev->kvm)
goto out_unlock;
 
-   q = vfio_ap_get_queue(matrix_mdev, apqn);
+   q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
 
@@ -344,6 +327,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct 
mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
init_waitqueue_head(&matrix_mdev->wait_for_kvm);
+   hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -354,6 +338,66 @@ static int vfio_ap_mdev_create(struct kobject *kobj, 
struct mdev_device *mdev)
return 0;
 }
 
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+   struct vfio_ap_queue *q)
+{
+   if (q) {
+   q->matrix_mdev = matrix_mdev;
+   hash_add(matrix_mdev-&

[PATCH v15 05/13] s390/vfio-ap: introduce shadow APCB

2021-04-06 Thread Tony Krowiak
The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 10 ++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 8bc21f3ec2b4..ce57d7f24f74 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -327,6 +327,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct 
mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
init_waitqueue_head(&matrix_mdev->wait_for_kvm);
+   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -1201,12 +1202,13 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
}
 
kvm_get_kvm(kvm);
+   memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+  sizeof(struct ap_matrix));
matrix_mdev->kvm_busy = true;
mutex_unlock(&matrix_dev->lock);
-   kvm_arch_crypto_set_masks(kvm,
- matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+   kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
mutex_lock(&matrix_dev->lock);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
matrix_mdev->kvm = kvm;
diff --git a/drivers/s390/crypto/vfio_ap_private.h 
b/drivers/s390/crypto/vfio_ap_private.h
index af3f53a3ea4c..6f4f1f5bd611 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:  allows the ap_matrix_mdev struct to be added to a list
  * @matrix:the adapters, usage domains and control domains assigned to the
  * mediated matrix device.
+ * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  * handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:   the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+   struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
bool kvm_busy;
-- 
2.21.3



[PATCH v15 03/13] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c

2021-04-06 Thread Tony Krowiak
Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_drv.c | 41 ++-
 drivers/s390/crypto/vfio_ap_ops.c | 28 ++
 drivers/s390/crypto/vfio_ap_private.h |  5 ++--
 3 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
b/drivers/s390/crypto/vfio_ap_drv.c
index 7dc72cb718b0..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,43 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-
-   q = kzalloc(sizeof(*q), GFP_KERNEL);
-   if (!q)
-   return -ENOMEM;
-   dev_set_drvdata(&apdev->device, q);
-   q->apqn = to_ap_queue(&apdev->device)->qid;
-   q->saved_isc = VFIO_AP_ISC_INVALID;
-   return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-
-   mutex_lock(&matrix_dev->lock);
-   q = dev_get_drvdata(&apdev->device);
-   vfio_ap_mdev_reset_queue(q, 1);
-   dev_set_drvdata(&apdev->device, NULL);
-   kfree(q);
-   mutex_unlock(&matrix_dev->lock);
-}
-
 static void vfio_ap_matrix_dev_release(struct device *dev)
 {
struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -182,8 +145,8 @@ static int __init vfio_ap_init(void)
return ret;
 
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
-   vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
-   vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+   vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+   vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 128a66d57305..c630abac81d0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1446,3 +1446,31 @@ void vfio_ap_mdev_unregister(void)
 {
mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+
+   q = kzalloc(sizeof(*q), GFP_KERNEL);
+   if (!q)
+   return -ENOMEM;
+   mutex_lock(&matrix_dev->lock);
+   q->apqn = to_ap_queue(&apdev->device)->qid;
+   q->saved_isc = VFIO_AP_ISC_INVALID;
+   dev_set_drvdata(&apdev->device, q);
+   mutex_unlock(&matrix_dev->lock);
+
+   return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+
+   mutex_lock(&matrix_dev->lock);
+   q = dev_get_drvdata(&apdev->device);
+   vfio_ap_mdev_reset_queue(q, 1);
+   dev_set_drvdata(&apdev->device, NULL);
+   kfree(q);
+   mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h 
b/drivers/s390/crypto/vfio_ap_private.h
index f82a6396acae..3ca2da62bdee 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -100,7 +100,8 @@ struct vfio_ap_queue {
 
 int vfio_ap_mdev_register(void);
 void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
-unsigned int retry);
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.3



[PATCH v15 01/13] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-04-06 Thread Tony Krowiak
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
...
bool kvm_busy;
wait_queue_head_t wait_for_kvm;
   ...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer 
invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 308 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 95 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 1ffdd411201c..6946a7e26eff 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
+   /*
+* If the KVM pointer is in the process of being set, wait until the
+* process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  !matrix_mdev->kvm_busy,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   /* If there is no guest using the mdev, there is nothing to do */
+   if (!matrix_mdev->kvm)
+   goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct 
mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   if (matrix_mdev->kvm)
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of control domain.
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   mutex_unlock(&matrix_dev->lock);
return -EBUSY;
+   }
 
-   mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
-   mutex_unlock(&matrix_dev->lock);
-
kfree(matrix_mdev);
mdev_set_drvdata(mdev, NULL);
atomic_inc(&matrix_dev->available_instances);
+   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   /* If the guest is running, disallow assignment of adapter */
-   if (matrix_mdev->kvm)
-   return -EBUSY;
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of adapter
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   ret = -EBUSY;
+   goto done;
+   }
 

[PATCH v15 02/13] s390/vfio-ap: use new AP bus interface to search for queue devices

2021-04-06 Thread Tony Krowiak
This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is much more efficient than looping over
the list of devices attached to the AP bus by several orders of
magnitude.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 6946a7e26eff..128a66d57305 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,13 +27,6 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-   struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-   return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
  * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
  * @matrix_mdev: the associated mediated matrix
@@ -1232,15 +1225,17 @@ static int vfio_ap_mdev_group_notifier(struct 
notifier_block *nb,
 
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
-   struct device *dev;
+   struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;
 
-   dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-                            &apqn, match_apqn);
-   if (dev) {
-   q = dev_get_drvdata(dev);
-   put_device(dev);
-   }
+   queue = ap_get_qdev(apqn);
+   if (!queue)
+   return NULL;
+
+   if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+   q = dev_get_drvdata(&queue->ap_dev.device);
+
+   put_device(&queue->ap_dev.device);
 
return q;
 }
-- 
2.21.3



[PATCH v15 00/13] s390/vfio-ap: dynamic configuration support

2021-04-06 Thread Tony Krowiak
so that if any
  APQN assigned to the mdev is not bound to the vfio_ap device driver,
  the adapter will not get plugged into the KVM guest on startup, or when
  a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
  probe/remove).

* Added a patch 1 to remove disable IRQ after a reset because the reset
  already disables a queue.

* Now using filtering code to update the KVM guest's matrix when
  notified that AP bus scan has completed.

* Fixed issue with probe/remove not initiated by a configuration change
  occurring within a config change.


Change log v9-v10:
-
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:

* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
  manual bind operation.

Change log v7-v8:

* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers will result in taking a queue away from a KVM guest to provide
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series.

Change log v6-v7:

* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
the AP configuration has changed since last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:

* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being freed.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2.

Change log v4-v5:

* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-
* Removed patches preventing root user from unbinding AP queues from
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (13):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: driver callback to indicate resource 

Re: [PATCH v14 00/13] s390/vfio-ap: dynamic configuration support

2021-04-06 Thread Tony Krowiak
Given what I finally was able to figure out, it is interesting to note
that this failure only occurred when building the kernel with the
debug_defconfig configuration. The problem occurs when the
vfio_ap_mdev_remove_queue() callback is called subsequent to the mdev
being removed via the vfio_ap_mdev_remove() callback. The failure
results because the vfio_ap_queue object representing the queue device
being removed still has a link to the mdev to which the queue is
assigned. The fix is to remove the link to the mdev from all
vfio_ap_queue objects when the mdev is removed. I will provide a new
set of patches with the fix included.

On 4/1/21 3:17 PM, Halil Pasic wrote:

On Wed, 31 Mar 2021 11:22:43 -0400
Tony Krowiak  wrote:


Change log v13-v14:
--

When testing I've experienced this kernel panic.


[ 4422.479706] vfio_ap matrix: MDEV: Registered
[ 4422.516999] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Adding to iommu group 1
[ 4422.517037] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: group_id = 1
[ 4577.906708] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Removing from iommu group 1
[ 4577.906917] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: detaching iommu
[ 4577.908093] Unable to handle kernel pointer dereference in virtual kernel address space
[ 4577.908097] Failing address: 0006ec02f000 TEID: 0006ec02f403
[ 4577.908100] Fault in home space mode while using kernel ASCE.
[ 4577.908106] AS:00035eb4c007 R3:0024
[ 4577.908126] Oops: 003b ilc:3 [#1] PREEMPT SMP
[ 4577.908132] Modules linked in: vfio_ap vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc s390_trng eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel configfs ip_tables x_tables dm_service_time ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core zfcp scsi_transport_fc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod rng_core autofs4
[ 4577.908181] CPU: 0 PID: 14315 Comm: nose2 Not tainted 5.12.0-rc5-00030-g4cd110385fa2 #55
[ 4577.908183] Hardware name: IBM 8561 T01 701 (LPAR)
[ 4577.908185] Krnl PSW : 0404e0018000 00035d2a50f4 (__lock_acquire+0xdc/0x7c8)
[ 4577.908194]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 4577.908232] Krnl GPRS: 00039d168d46 0006ec02f538 00035e7de940 

[ 4577.908235]  0001 
f9e04150
[ 4577.908237]00035fa8b100 006b6b6b680c417f f9e04150 
00035e61e8d0
[ 4577.908239]00035fa8b100  038010c4b7d8 
038010c4b738
[ 4577.908247] Krnl Code: 00035d2a50e4: eb110003000dsllg
%r1,%r1,3
[ 4577.908247]00035d2a50ea: b9080012agr %r1,%r2
[ 4577.908247]   #00035d2a50ee: e31003b80008ag  %r1,952
[ 4577.908247]   >00035d2a50f4: eb01107aagsi0(%r1),1
[ 4577.908247]00035d2a50fa: a718lhi %r1,-1
[ 4577.908247]00035d2a50fe: eb1103a800f8laa 
%r1,%r1,936
[ 4577.908247]00035d2a5104: ec18026b017ecij 
%r1,1,8,00035d2a55da
[ 4577.908247]00035d2a510a: c4180086d01flgrl
%r1,00035e37f148
[ 4577.908262] Call Trace:
[ 4577.908264]  [<00035d2a50f4>] __lock_acquire+0xdc/0x7c8
[ 4577.908267]  [<00035d2a41ac>] lock_acquire.part.0+0xec/0x1e8
[ 4577.908270]  [<00035d2a4360>] lock_acquire+0xb8/0x208
[ 4577.908272]  [<00035de6fa2a>] _raw_spin_lock_irqsave+0x6a/0xd8
[ 4577.908279]  [<00035d2874fe>] prepare_to_wait_event+0x2e/0x1e0
[ 4577.908281]  [<03ff805d539a>] vfio_ap_mdev_remove_queue+0x122/0x148 [vfio_ap]
[ 4577.908287]  [<00035de20e94>] ap_device_remove+0x4c/0xf0
[ 4577.908292]  [<00035db268a2>] __device_release_driver+0x18a/0x230
[ 4577.908298]  [<00035db27cf0>] device_driver_detach+0x58/0xd0
[ 4577.908301]  [<00035db25000>] device_reprobe+0x30/0xc0
[ 4577.908304]  [<00035de22570>] __ap_revise_reserved+0x110/0x148
[ 4577.908307]  [<00035db2408c>] bus_for_each_dev+0x7c/0xb8
[ 4577.908310]  [<00035de2290c>] apmask_store+0xd4/0x118
[ 4577.908313]  [<00035d639316>] kernfs_fop_write_iter+0x13e/0x1e0
[ 4577.908317]  [<00035d542d22>] new_sync_write+0x10a/0x198
[ 4577.908321]  [<00035d5433ee>] v

Re: [PATCH v14 00/13] s390/vfio-ap: dynamic configuration support

2021-04-02 Thread Tony Krowiak




On 4/1/21 3:17 PM, Halil Pasic wrote:

On Wed, 31 Mar 2021 11:22:43 -0400
Tony Krowiak  wrote:


Change log v13-v14:
--

When testing I've experienced this kernel panic.


I am able to recreate this, but only when the kernel is built with
the debug_defconfig configuration. I'll look into this to try to
figure out why.







[ 4422.479706] vfio_ap matrix: MDEV: Registered
[ 4422.516999] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Adding to iommu group 1
[ 4422.517037] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: group_id = 1
[ 4577.906708] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: Removing from iommu group 1
[ 4577.906917] vfio_mdev b2013234-18b2-49bf-badd-a4be9c78b120: MDEV: detaching iommu
[ 4577.908093] Unable to handle kernel pointer dereference in virtual kernel address space
[ 4577.908097] Failing address: 0006ec02f000 TEID: 0006ec02f403
[ 4577.908100] Fault in home space mode while using kernel ASCE.
[ 4577.908106] AS:00035eb4c007 R3:0024
[ 4577.908126] Oops: 003b ilc:3 [#1] PREEMPT SMP
[ 4577.908132] Modules linked in: vfio_ap vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc s390_trng eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel configfs ip_tables x_tables dm_service_time ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core zfcp scsi_transport_fc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod rng_core autofs4
[ 4577.908181] CPU: 0 PID: 14315 Comm: nose2 Not tainted 5.12.0-rc5-00030-g4cd110385fa2 #55
[ 4577.908183] Hardware name: IBM 8561 T01 701 (LPAR)
[ 4577.908185] Krnl PSW : 0404e0018000 00035d2a50f4 (__lock_acquire+0xdc/0x7c8)
[ 4577.908194]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 4577.908232] Krnl GPRS: 00039d168d46 0006ec02f538 00035e7de940 

[ 4577.908235]  0001 
f9e04150
[ 4577.908237]00035fa8b100 006b6b6b680c417f f9e04150 
00035e61e8d0
[ 4577.908239]00035fa8b100  038010c4b7d8 
038010c4b738
[ 4577.908247] Krnl Code: 00035d2a50e4: eb110003000dsllg
%r1,%r1,3
[ 4577.908247]00035d2a50ea: b9080012agr %r1,%r2
[ 4577.908247]   #00035d2a50ee: e31003b80008ag  %r1,952
[ 4577.908247]   >00035d2a50f4: eb01107aagsi0(%r1),1
[ 4577.908247]00035d2a50fa: a718lhi %r1,-1
[ 4577.908247]00035d2a50fe: eb1103a800f8laa 
%r1,%r1,936
[ 4577.908247]00035d2a5104: ec18026b017ecij 
%r1,1,8,00035d2a55da
[ 4577.908247]00035d2a510a: c4180086d01flgrl
%r1,00035e37f148
[ 4577.908262] Call Trace:
[ 4577.908264]  [<00035d2a50f4>] __lock_acquire+0xdc/0x7c8
[ 4577.908267]  [<00035d2a41ac>] lock_acquire.part.0+0xec/0x1e8
[ 4577.908270]  [<00035d2a4360>] lock_acquire+0xb8/0x208
[ 4577.908272]  [<00035de6fa2a>] _raw_spin_lock_irqsave+0x6a/0xd8
[ 4577.908279]  [<00035d2874fe>] prepare_to_wait_event+0x2e/0x1e0
[ 4577.908281]  [<03ff805d539a>] vfio_ap_mdev_remove_queue+0x122/0x148 [vfio_ap]
[ 4577.908287]  [<00035de20e94>] ap_device_remove+0x4c/0xf0
[ 4577.908292]  [<00035db268a2>] __device_release_driver+0x18a/0x230
[ 4577.908298]  [<00035db27cf0>] device_driver_detach+0x58/0xd0
[ 4577.908301]  [<00035db25000>] device_reprobe+0x30/0xc0
[ 4577.908304]  [<00035de22570>] __ap_revise_reserved+0x110/0x148
[ 4577.908307]  [<00035db2408c>] bus_for_each_dev+0x7c/0xb8
[ 4577.908310]  [<00035de2290c>] apmask_store+0xd4/0x118
[ 4577.908313]  [<00035d639316>] kernfs_fop_write_iter+0x13e/0x1e0
[ 4577.908317]  [<00035d542d22>] new_sync_write+0x10a/0x198
[ 4577.908321]  [<00035d5433ee>] vfs_write.part.0+0x196/0x290
[ 4577.908323]  [<00035d545f44>] ksys_write+0x6c/0xf8
[ 4577.908326]  [<00035d1ce7ae>] do_syscall+0x7e/0xd0
[ 4577.908330]  [<00035de5fc00>] __do_syscall+0xc0/0xd8
[ 4577.908334]  [<00035de70c22>] system_call+0x72/0x98
[ 4577.908337] INFO: lockdep is turned off.
[ 4577.908338] Last Breaking-Event-Address:
[ 4577.908340]  [<038010c4b648>] 0x38010c4b648
[ 4577.908345] Kernel panic - not syncing: Fatal exception: panic_on_oops




[PATCH v14 12/13] s390/zcrypt: notify drivers on config changed and scan complete callbacks

2021-03-31 Thread Tony Krowiak
This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

 This callback is invoked at the start of the AP bus scan
 function when it determines that the host AP configuration information
 has changed since the previous scan. This is done by storing
 an old and current QCI info struct and comparing them. If there is any
 difference, the callback is invoked.

 Note that when the AP bus scan detects that AP adapters, domains or
 control domains have been removed from the host's AP configuration, it
 will remove the associated devices from the AP bus subsystem's device
 model. This callback gives the device driver a chance to respond to
 the removal of the AP devices from the host configuration prior to
 calling the device driver's remove callback. The primary purpose of
 this callback is to allow the vfio_ap driver to do a bulk unplug of
 all affected adapters, domains and control domains from affected
 guests rather than unplugging them one at a time when the remove
 callback is invoked.

  void (*on_scan_complete)(struct ap_config_info *new_config_info,
   struct ap_config_info *old_config_info);

 The on_scan_complete callback is invoked after the ap bus scan is
 complete if the host AP configuration data has changed.

 Note that when the AP bus scan detects that adapters, domains or
 control domains have been added to the host's configuration, it will
 create new devices in the AP bus subsystem's device model. The primary
 purpose of this callback is to allow the vfio_ap driver to do a bulk
 plug of all affected adapters, domains and control domains into
 affected guests rather than plugging them one at a time when the
 probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
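
As a rough illustration of the ordering described above, here is a userspace sketch, assuming a heavily simplified configuration struct. `struct config_model`, `struct driver_model`, `scan_bus_model()` and the two recording callbacks are hypothetical names for this example only; the real callbacks take `struct ap_config_info` pointers as shown in the prototypes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ap_config_info. */
struct config_model {
	unsigned long apm;	/* adapters */
	unsigned long aqm;	/* usage domains */
	unsigned long adm;	/* control domains */
};

struct driver_model {
	void (*on_config_changed)(const struct config_model *new_cfg,
				  const struct config_model *old_cfg);
	void (*on_scan_complete)(const struct config_model *new_cfg,
				 const struct config_model *old_cfg);
};

static unsigned long last_removed_apm;
static unsigned long last_added_apm;
static int changed_calls;
static int complete_calls;

/* Example driver callback: adapters that vanished can be bulk unplugged. */
static void record_removed(const struct config_model *new_cfg,
			   const struct config_model *old_cfg)
{
	last_removed_apm = old_cfg->apm & ~new_cfg->apm;
	changed_calls++;
}

/* Example driver callback: adapters that appeared can be bulk plugged. */
static void record_added(const struct config_model *new_cfg,
			 const struct config_model *old_cfg)
{
	last_added_apm = new_cfg->apm & ~old_cfg->apm;
	complete_calls++;
}

/*
 * One scan pass: save the previous configuration, store the fetched one,
 * and invoke the callbacks only when the two differ.
 */
static int scan_bus_model(const struct driver_model *drv,
			  struct config_model *cur, struct config_model *old,
			  const struct config_model *fetched)
{
	int changed;

	*old = *cur;
	*cur = *fetched;
	changed = cur->apm != old->apm || cur->aqm != old->aqm ||
		  cur->adm != old->adm;

	if (changed && drv->on_config_changed)
		drv->on_config_changed(cur, old);

	/* The real scan would add and remove bus devices at this point. */

	if (changed && drv->on_scan_complete)
		drv->on_scan_complete(cur, old);

	return changed;
}
```

The point of the sketch is the ordering: the first callback runs before any device is removed, the second only after the scan has finished.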

Signed-off-by: Harald Freudenberger 
Reviewed-by: Halil Pasic 
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/ap_bus.c  |  89 ++-
 drivers/s390/crypto/ap_bus.h  |  12 ++
 drivers/s390/crypto/vfio_ap_drv.c |   4 +-
 drivers/s390/crypto/vfio_ap_ops.c | 215 +++---
 drivers/s390/crypto/vfio_ap_private.h |  15 +-
 5 files changed, 306 insertions(+), 29 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index b7653cec81ac..ccc87fb84c6d 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -82,6 +82,7 @@ static atomic64_t ap_scan_bus_count;
 static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1579,6 +1580,50 @@ static int __match_queue_device_with_queue_id(struct 
device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_config_changed)
+   ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about an qci config change */
+static inline void notify_config_changed(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_scan_complete)
+   ap_drv->on_scan_complete(ap_qci_info,
+ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_scan_complete);
+}
+
 /*
  * Helper function for ap_scan_bus().
  * Remove card device and associated queue devices.
@@ -1857,15 +1902,51 @@ static inline void ap_scan_adapter(int ap)
put_device(&ac->ap_dev.device);
 }
 
+/*
+ * ap_get_configuration
+ *
+ * Stores the host

[PATCH v14 10/13] s390/vfio-ap: implement in-use callback for vfio_ap driver

2021-03-31 Thread Tony Krowiak
Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EAGAIN.
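
The back-off behaviour can be sketched in userspace C. This is a sketch under stated assumptions, not the driver code: `struct lock_model` and its helpers are hypothetical and use a C11 `atomic_flag` in place of a kernel mutex, and `assign_adapter_model()` is a made-up stand-in for the sysfs store function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a mutex, offering trylock semantics only. */
struct lock_model {
	atomic_flag held;
};

/* Returns 1 if the lock was taken, 0 if somebody else already holds it. */
static int lock_model_trylock(struct lock_model *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void lock_model_unlock(struct lock_model *l)
{
	atomic_flag_clear(&l->held);
}

/* Plays the role of matrix_dev->lock in this sketch. */
static struct lock_model matrix_lock = { ATOMIC_FLAG_INIT };

/*
 * Models an assignment path: rather than blocking on the lock (and risking
 * the ABBA deadlock from the scenario above), give up with -EAGAIN and let
 * the caller retry.
 */
static int assign_adapter_model(unsigned long *apm, unsigned int apid)
{
	if (apid >= sizeof(*apm) * CHAR_BIT)
		return -ENODEV;

	if (!lock_model_trylock(&matrix_lock))
		return -EAGAIN;

	*apm |= 1UL << apid;
	lock_model_unlock(&matrix_lock);
	return 0;
}
```

Because neither path ever sleeps while waiting for the second lock, the circular wait cannot form; the cost is that userspace has to be prepared to retry on -EAGAIN.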

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 +
 drivers/s390/crypto/vfio_ap_ops.c | 38 ++-
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+   vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 2578dfe68cda..191807c10c23 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -650,10 +650,14 @@ static void vfio_ap_mdev_link_adapter(struct 
ap_matrix_mdev *matrix_mdev,
  *driver; or, if no APQIs have yet been assigned, the APID is not
  *contained in an APQN bound to the vfio_ap device driver.
  *
- * 4. -EBUSY
+ * 4. -EADDRINUSE
  *An APQN derived from the cross product of the APID being assigned
  *and the APQIs previously assigned is being used by another mediated
- *matrix device or the mdev lock could not be acquired.
+ *matrix device.
+ *
+ * 5. -EAGAIN
+ *The mdev lock could not be acquired which is required in order to
+ *change the AP configuration for the mdev
  */
 static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
@@ -664,7 +668,8 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EAGAIN;
 
/*
 * If the KVM pointer is in flux or the guest is running, disallow
@@ -803,10 +808,14 @@ static void vfio_ap_mdev_link_domain(struct 
ap_matrix_mdev *matrix_mdev,
  *driver; or, if no APIDs have yet been assigned, the APQI is not
  *contained in an APQN bound to the vfio_ap device driver.
  *
- * 4. -BUSY
+ * 4. -EADDRINUSE
  *An APQN derived from the cross product of the APQI being assigned
  *and the APIDs previously assigned is being used by another mediated
- *matrix device or the mdev lock could not be acquired.
+ *matrix device.
+ *
+ * 5. -EAGAIN
+ *The mdev lock could not be acquired which is required in order to
+ *change the AP configuration for the mdev
  */
 static ssize_t assign_domain_store(struct device *dev,
   struct device_attribute *attr,
@@ -818,7 +827,8 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EAGAIN;
 
/*
 * If the KVM pointer is in flux or the guest is running, disallow
@@ -946,6 +956,7 @@ static void vfio_ap_mdev_hot_plug_cdom(struct 
ap_matrix_mdev *matrix_mdev,
  * returns one of the fol

[PATCH v14 13/13] s390/vfio-ap: update docs to include dynamic config support

2021-03-31 Thread Tony Krowiak
Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak 
---
 Documentation/s390/vfio-ap.rst | 383 -
 1 file changed, 284 insertions(+), 99 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..031c2e5ee138 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,9 +123,9 @@ Let's now take a look at how AP instructions executed on a 
guest are interpreted
 by the hardware.
 
 A satellite control block called the Crypto Control Block (CRYCB) is attached 
to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP 
Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:
 
 * The AP Mask (APM) field is a bit mask that identifies the AP adapters 
assigned
   to the KVM guest. Each bit in the mask, from left to right (i.e. from most
@@ -192,7 +192,7 @@ The design introduces three new objects:
 
 1. AP matrix device
 2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device
 
 The VFIO AP device driver
 -
@@ -200,12 +200,13 @@ The VFIO AP (vfio_ap) device driver serves the following 
purposes:
 
 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
 
-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
device and creates the sysfs interfaces for assigning adapters, usage
domains, and control domains comprising the matrix for a KVM guest.
 
-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
-   SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB 
referenced
+   by a KVM guest's SIE state description to grant the guest access to a matrix
+   of AP devices
 
 Reserve APQNs for exclusive use of KVM guests
 -
@@ -253,7 +254,7 @@ The process for reserving an AP queue for use by a KVM 
guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all vfio_ap mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +270,7 @@ The process for reserving an AP queue for use by a KVM 
guest is:
default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
 to be exclusively used by a guest.
@@ -279,14 +280,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus 
driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::
 
+-+
| |
@@ -343,7 +344,7 @@ matrix device.
* device_api:
the mediated device type's API
* available_instances:
-   the number of mediated matrix passthrough devices
+   the number of vfio_ap mediated passthrough devices
that can be created
* device_api:
specifies the VFIO API

[PATCH v14 09/13] s390/zcrypt: driver callback to indicate resource in use

2021-03-31 Thread Tony Krowiak
Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
busy error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
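
A minimal userspace sketch of the check described above, assuming single-word masks with plain LSB-first bit numbering. `struct drv_model`, `vfio_in_use_model()` and `apmask_commit_model()` are hypothetical names for illustration; the patch itself works on kernel bitmaps and walks the drivers registered on the AP bus.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver model with an optional in_use() callback. */
struct drv_model {
	int is_default;
	int (*in_use)(unsigned long apm, unsigned long aqm);
};

/* Adapters and domains assumed to be assigned to some mdev in this sketch. */
static unsigned long assigned_apm = 0x4;
static unsigned long assigned_aqm = 0x2;

/* Reports whether any APQN in the cross product apm x aqm is assigned. */
static int vfio_in_use_model(unsigned long apm, unsigned long aqm)
{
	return (apm & assigned_apm) && (aqm & assigned_aqm);
}

/*
 * Bits that are newly set in the mask hand adapters back to the default
 * drivers. Ask every non-default driver whether it still uses them and
 * reject the change with -EBUSY if so.
 */
static int apmask_commit_model(unsigned long *cur_apm, unsigned long new_apm,
			       unsigned long aqm,
			       const struct drv_model *drvs, size_t ndrvs)
{
	unsigned long reclaimed = new_apm & ~*cur_apm;
	size_t i;

	for (i = 0; reclaimed && i < ndrvs; i++) {
		if (drvs[i].is_default || !drvs[i].in_use)
			continue;
		if (drvs[i].in_use(reclaimed, aqm))
			return -EBUSY;
	}

	*cur_apm = new_apm;
	return 0;
}
```

Only when no non-default driver objects is the new mask stored, so a rejected write leaves the current mask untouched.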

Signed-off-by: Tony Krowiak 
Reviewed-by: Harald Freudenberger 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/ap_bus.c | 160 ---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 154 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 2758d05a802d..b7653cec81ac 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -1006,6 +1007,23 @@ static int modify_bitmap(const char *str, unsigned long 
*bitmap, int bits)
return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int 
bits,
+  unsigned long *newmap)
+{
+   unsigned long size;
+   int rc;
+
+   size = BITS_TO_LONGS(bits) * sizeof(unsigned long);
+   if (*str == '+' || *str == '-') {
+   memcpy(newmap, bitmap, size);
+   rc = modify_bitmap(str, newmap, bits);
+   } else {
+   memset(newmap, 0, size);
+   rc = hex2bitmap(str, newmap, bits);
+   }
+   return rc;
+}
+
 int ap_parse_mask_str(const char *str,
  unsigned long *bitmap, int bits,
  struct mutex *lock)
@@ -1025,14 +1043,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
-   if (*str == '+' || *str == '-') {
-   memcpy(newmap, bitmap, size);
-   rc = modify_bitmap(str, newmap, bits);
-   } else {
-   memset(newmap, 0, size);
-   rc = hex2bitmap(str, newmap, bits);
-   }
+   rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1224,12 +1235,76 @@ static ssize_t apmask_show(struct bus_type *bus, char 
*buf)
return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+   int rc = 0;
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+   unsigned long *newapm = (unsigned long *)data;
+
+   /*
+* No need to verify whether the driver is using the queues if it is the
+* default driver.
+*/
+   if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+   return 0;
+
+   /*
+* increase the driver's module refcounter to be sure it is not
+* going away when we invoke the callback function.
+*/
+   if (!try_module_get(drv->owner))
+   return 0;
+
+   if (ap_drv->in_use) {
+   rc = ap_drv->in_use(newapm, ap_perms.aqm);
+   if (rc)
+   rc = -EBUSY;
+   }
+
+   /* release the driver's module */
+   module_put(drv->owner);
+
+   return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+   int rc;
+   unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+   /*
+* Check if any bits in the apmask have been set which will
+* result in queues being removed from non-default drivers
+*/
+   if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+   rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+   if (rc)
+   return rc;
+   }
+
+   memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+   return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
 {
int rc;
+   DECLARE_BITMAP(newapm, AP_DEVICES);
+
+   if (mutex_lock_interruptible(&ap_perms_mutex))
+   return -ERESTARTSYS;
 
-   rc = ap_parse_mask_str(buf, ap_perms.apm

[PATCH v14 08/13] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-03-31 Thread Tony Krowiak
Let's allow adapters, domains and control domains to be hot plugged into
and hot unplugged from a KVM guest using a matrix mdev when:

* The adapter, domain or control domain is assigned to or unassigned from
  the matrix mdev

* A queue device with an APQN assigned to the matrix mdev is bound to or
  unbound from the vfio_ap device driver.

Whenever an assignment or unassignment of an adapter, domain or control
domain is performed as well as when a bind or unbind of a queue device
is executed, the AP control block (APCB) that supplies the AP configuration
to the guest is first refreshed.

After refreshing the APCB, if the mdev is in use by a KVM guest, it is
hot plugged into the guest to dynamically provide access to the
adapters, domains and control domains provided via the newly refreshed
APCB.
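
The refresh-then-commit flow can be sketched as follows. This is a simplified userspace model, not the patch: `struct matrix_model`, `struct guest_model`, `struct mdev_model` and both functions are hypothetical, and the filter here is a plain AND against the host masks, whereas the driver filters per APQN.

```c
#include <assert.h>
#include <stddef.h>

struct matrix_model {
	unsigned long apm;
	unsigned long aqm;
	unsigned long adm;
};

/* Hypothetical guest: holds the masks it was last given. */
struct guest_model {
	struct matrix_model apcb;
	int updates;
};

struct mdev_model {
	struct matrix_model matrix;	 /* what the admin assigned */
	struct matrix_model shadow_apcb; /* what the guest may see */
	struct guest_model *guest;	 /* NULL when no guest uses the mdev */
};

/* Push the shadow copy to the guest, if there is one. */
static void commit_apcb_model(struct mdev_model *m)
{
	if (!m->guest)
		return;
	m->guest->apcb = m->shadow_apcb;
	m->guest->updates++;
}

/*
 * Recompute the shadow APCB from the assigned matrix and what the host
 * offers; commit only when the result differs from the previous shadow.
 * Returns 1 when the shadow copy changed, 0 otherwise.
 */
static int refresh_apcb_model(struct mdev_model *m,
			      const struct matrix_model *host)
{
	struct matrix_model filtered;

	filtered.apm = m->matrix.apm & host->apm;
	filtered.aqm = m->matrix.aqm & host->aqm;
	filtered.adm = m->matrix.adm & host->adm;

	if (filtered.apm == m->shadow_apcb.apm &&
	    filtered.aqm == m->shadow_apcb.aqm &&
	    filtered.adm == m->shadow_apcb.adm)
		return 0;

	m->shadow_apcb = filtered;
	commit_apcb_model(m);
	return 1;
}
```

Comparing before committing keeps a no-op assignment from touching a running guest.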

Signed-off-by: Tony Krowiak 
Acked-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 72 +++
 1 file changed, 63 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 1f2a3049b283..2578dfe68cda 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -311,6 +311,20 @@ static void vfio_ap_matrix_init(struct ap_config_info 
*info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+   return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   if (vfio_ap_mdev_has_crycb(matrix_mdev))
+   kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
 /*
  * vfio_ap_mdev_filter_apcb
  *
@@ -378,6 +392,7 @@ static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev 
*matrix_mdev)
   sizeof(struct ap_matrix)) != 0) {
memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
   sizeof(struct ap_matrix));
+   vfio_ap_mdev_commit_apcb(matrix_mdev);
}
 }
 
@@ -655,7 +670,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of adapter
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -728,7 +743,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of adapter
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -809,7 +824,7 @@ static ssize_t assign_domain_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * assignment of domain
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -881,7 +896,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * un-assignment of domain
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -906,6 +921,16 @@ static ssize_t unassign_domain_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static void vfio_ap_mdev_hot_plug_cdom(struct ap_matrix_mdev *matrix_mdev,
+  unsigned long domid)
+{
+   if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm) &&
+   test_bit_inv(domid, (unsigned long *)matrix_dev->info.adm)) {
+   set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+   vfio_ap_mdev_commit_apcb(matrix_mdev);
+   }
+}
+
 /**
  * assign_control_domain_store
  *
@@ -937,7 +962,7 @@ static ssize_t assign_control_domain_store(struct device 
*dev,
 * If the KVM pointer is in flux or the guest is running, disallow
 * assignment of control domain.
 */
-   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   if (matrix_mdev->kvm_busy) {
ret = -EBUSY;
goto done;
}
@@ -957,7 +982,7 @@ static ssize_t assign_control_domain_store(struct device 
*dev,
 * number of control domains that can be assigned.
 */
set_bit_inv(id, matrix_mdev->matrix.adm);
-   vfio_ap_mdev_

[PATCH v14 11/13] s390/vfio-ap: sysfs attribute to display the guest's matrix

2021-03-31 Thread Tony Krowiak
The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
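
To make the output format concrete, here is a small userspace sketch that prints a matrix in the same `adapter.domain` style as the show function in this patch. It is an illustration under assumptions: `matrix_show_model()` and its helpers are hypothetical, masks are single words, and bits are numbered LSB-first, whereas the driver uses the MSB-first `_inv` bit helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MODEL_BITS (sizeof(unsigned long) * CHAR_BIT)

static int bit_is_set(unsigned long mask, unsigned int bit)
{
	return (int)((mask >> bit) & 1UL);
}

/* Append one line; returns 0 (and appends nothing) if it does not fit. */
static int append_line(char *buf, size_t len, size_t *pos, const char *line)
{
	size_t n = strlen(line);

	if (*pos + n + 1 > len)
		return 0;
	memcpy(buf + *pos, line, n + 1);
	*pos += n;
	return 1;
}

/*
 * Writes one "aa.qqqq" line per APQN when both masks have bits set,
 * "aa." lines when only adapters are set, ".qqqq" lines when only domains
 * are set. Returns the number of characters written.
 */
static size_t matrix_show_model(unsigned long apm, unsigned long aqm,
				char *buf, size_t len)
{
	char line[32];
	size_t pos = 0;
	unsigned int apid, apqi;

	if (len)
		buf[0] = '\0';

	for (apid = 0; apm && apid < MODEL_BITS; apid++) {
		if (!bit_is_set(apm, apid))
			continue;
		if (!aqm) {
			snprintf(line, sizeof(line), "%02x.\n", apid);
			if (!append_line(buf, len, &pos, line))
				return pos;
			continue;
		}
		for (apqi = 0; apqi < MODEL_BITS; apqi++) {
			if (!bit_is_set(aqm, apqi))
				continue;
			snprintf(line, sizeof(line), "%02x.%04x\n", apid, apqi);
			if (!append_line(buf, len, &pos, line))
				return pos;
		}
	}

	for (apqi = 0; !apm && apqi < MODEL_BITS; apqi++) {
		if (!bit_is_set(aqm, apqi))
			continue;
		snprintf(line, sizeof(line), ".%04x\n", apqi);
		if (!append_line(buf, len, &pos, line))
			return pos;
	}

	return pos;
}
```

Running the same formatter over the assigned matrix and over the shadow APCB is what lets the two attributes share one helper.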

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 191807c10c23..c147e3f43ae4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1089,29 +1089,24 @@ static ssize_t control_domains_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(control_domains);
 
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
-  char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
 {
-   struct mdev_device *mdev = mdev_from_dev(dev);
-   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
-   unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-   unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+   unsigned long napm_bits = matrix->apm_max + 1;
+   unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;
 
-   apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
-   apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-   mutex_lock(&matrix_dev->lock);
+   apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+   apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
 
if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm,
 naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -1120,25 +1115,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}
 
+   return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);
 
return nchars;
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+   mutex_unlock(&matrix_dev->lock);
+
+   return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -1148,6 +1170,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+   &dev_attr_guest_matrix.attr,
NULL,
 };
 
-- 
2.21.3



[PATCH v14 05/13] s390/vfio-ap: introduce shadow APCB

2021-03-31 Thread Tony Krowiak
The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 10 ++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 28266165eb75..588de7ec4866 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -327,6 +327,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
init_waitqueue_head(&matrix_mdev->wait_for_kvm);
+   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -1184,12 +1185,13 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
}
 
kvm_get_kvm(kvm);
+   memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+  sizeof(struct ap_matrix));
matrix_mdev->kvm_busy = true;
mutex_unlock(&matrix_dev->lock);
-   kvm_arch_crypto_set_masks(kvm,
- matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+   kvm_arch_crypto_set_masks(kvm, matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
mutex_lock(&matrix_dev->lock);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
matrix_mdev->kvm = kvm;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index af3f53a3ea4c..6f4f1f5bd611 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:  allows the ap_matrix_mdev struct to be added to a list
  * @matrix:the adapters, usage domains and control domains assigned to the
  * mediated matrix device.
+ * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  * handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:   the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+   struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
bool kvm_busy;
-- 
2.21.3



[PATCH v14 07/13] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-03-31 Thread Tony Krowiak
The current implementation does not allow assignment of an AP adapter or
domain to an mdev device if each APQN resulting from the assignment
does not reference an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
   1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is that the AP architecture does not preclude
assignment of APQNs to an AP configuration profile that are not available
to the system.

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 228 --
 1 file changed, 56 insertions(+), 172 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 241051565783..1f2a3049b283 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -475,141 +475,50 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
 };
 
-struct vfio_ap_queue_reserved {
-   unsigned long *apid;
-   unsigned long *apqi;
-   bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
-   struct vfio_ap_queue_reserved *qres = data;
-   struct ap_queue *ap_queue = to_ap_queue(dev);
-   ap_qid_t qid;
-   unsigned long id;
-
-   if (qres->apid && qres->apqi) {
-   qid = AP_MKQID(*qres->apid, *qres->apqi);
-   if (qid == ap_queue->qid)
-   qres->reserved = true;
-   } else if (qres->apid && !qres->apqi) {
-   id = AP_QID_CARD(ap_queue->qid);
-   if (id == *qres->apid)
-   qres->reserved = true;
-   } else if (!qres->apid && qres->apqi) {
-   id = AP_QID_QUEUE(ap_queue->qid);
-   if (id == *qres->apqi)
-   qres->reserved = true;
-   } else {
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-unsigned long *apqi)
-{
-   int ret;
-   struct vfio_ap_queue_reserved qres;
-
-   qres.apid = apid;
-   qres.apqi = apqi;
-   qres.reserved = false;
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+"already assigned to %s"
 
-   ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-                                &qres, vfio_ap_has_queue);
-   if (ret)
-   return ret;
-
-   if (qres.reserved)
-   return 0;
-
-   return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-unsigned long apid)
+static void vfio_ap_mdev_log_sharing_err(struct ap_matrix_mdev *matrix_mdev,
+unsigned long *apm,
+unsigned long *aqm)
 {
-   int ret;
-   unsigned long apqi;
-   unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-   if (find_first_bit_inv(matrix_mdev->matrix.aqm, n

[PATCH v14 06/13] s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev

2021-03-31 Thread Tony Krowiak
Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
that do not reference an AP queue device bound to the vfio_ap device
driver. The mdev's APQNs will be filtered according to the following rules:

* The APID of each adapter and the APQI of each domain that is not in the
  host's AP configuration is filtered out.

* The APID of each adapter comprising an APQN that does not reference a
  queue device bound to the vfio_ap device driver is filtered. The APQNs
  are derived from the Cartesian product of the APID of each adapter and
  APQI of each domain assigned to the mdev.

The filtering will take place:

* Whenever an adapter, domain or control domain is assigned or
  unassigned.

* When a queue device is bound to or unbound from the vfio_ap device
  driver.
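
The filtering rules above can be sketched in userspace as follows. This
is an illustration only, not the driver code: struct matrix, MAX_IDS,
filter_matrix() and the bound[][] table are names invented for the
example, 16-bit masks stand in for the kernel bitmaps, and bit i
(LSB = bit 0) stands for ID i.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_IDS 16	/* hypothetical small limit for the sketch */

struct matrix {
	uint16_t apm;	/* adapters (APIDs) */
	uint16_t aqm;	/* domains (APQIs) */
};

/*
 * bound[apid][apqi] is true when a queue device for that APQN is bound
 * to the driver. Returns the filtered (shadow) configuration.
 */
static struct matrix filter_matrix(struct matrix mdev, struct matrix host,
				   bool bound[MAX_IDS][MAX_IDS])
{
	struct matrix shadow;

	/* Rule 1: keep only what is in the host's AP configuration. */
	shadow.apm = mdev.apm & host.apm;
	shadow.aqm = mdev.aqm & host.aqm;

	/* Rule 2: drop an adapter if any of its APQNs has no bound queue. */
	for (int apid = 0; apid < MAX_IDS; apid++) {
		if (!(shadow.apm & (1u << apid)))
			continue;
		for (int apqi = 0; apqi < MAX_IDS; apqi++) {
			if (!(shadow.aqm & (1u << apqi)))
				continue;
			if (!bound[apid][apqi]) {
				shadow.apm = (uint16_t)(shadow.apm & ~(1u << apid));
				break;
			}
		}
	}

	return shadow;
}
```

A single APQN cannot be removed on its own, so the sketch clears the
whole adapter bit as soon as one of its APQNs lacks a bound queue, which
mirrors the behavior described in the rules.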

Signed-off-by: Tony Krowiak 
Acked-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 84 +--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 588de7ec4866..241051565783 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -311,6 +311,76 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+/*
+ * vfio_ap_mdev_filter_apcb
+ *
+ * @matrix_mdev: the mdev whose AP configuration is to be filtered.
+ * @shadow_apcb: the APCB to use to store the guest's AP configuration after
+ *  filtering takes place.
+ */
+static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
+struct ap_matrix *shadow_apcb)
+{
+   int ret;
+   unsigned long apid, apqi, apqn;
+
+   ret = ap_qci(&matrix_dev->info);
+   if (ret)
+   return;
+
+   /*
+* Copy the adapters, domains and control domains to the shadow_apcb
+* from the matrix mdev, but only those that are assigned to the host's
+* AP configuration.
+*/
+   bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
+  (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+   bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
+  (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+   bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
+  (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+
+   for_each_set_bit_inv(apid, shadow_apcb->apm, AP_DEVICES) {
+   for_each_set_bit_inv(apqi, shadow_apcb->aqm, AP_DOMAINS) {
+   /*
+* If the APQN is not bound to the vfio_ap device
+* driver, then we can't assign it to the guest's
+* AP configuration. The AP architecture won't
+* allow filtering of a single APQN, so if we're
+* filtering APIDs, then filter the APID; otherwise,
+* filter the APQI.
+*/
+   apqn = AP_MKQID(apid, apqi);
+   if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+   clear_bit_inv(apid, shadow_apcb->apm);
+   break;
+   }
+   }
+   }
+}
+
+/**
+ * vfio_ap_mdev_refresh_apcb
+ *
+ * Refresh the guest's APCB by filtering the APQNs assigned to the matrix mdev
+ * that do not reference an AP queue device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ */
+static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   struct ap_matrix shadow_apcb;
+
+   vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
+   vfio_ap_mdev_filter_apcb(matrix_mdev, &shadow_apcb);
+
+   if (memcmp(&shadow_apcb, &matrix_mdev->shadow_apcb,
+  sizeof(struct ap_matrix)) != 0) {
+   memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
+  sizeof(struct ap_matrix));
+   }
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev;
@@ -694,6 +764,7 @@ static ssize_t assign_adapter_store(struct device *dev,
goto share_err;
 
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
ret = count;
goto done;
 
@@ -763,6 +834,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
ret = count;
 done:
mutex_unlock(&matrix_dev->lock);
@@ -871,6 +943,7 @@ static ssize_t assign_domain_store(struct device *dev,
goto share_err;
 
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
+   

[PATCH v14 02/13] s390/vfio-ap: use new AP bus interface to search for queue devices

2021-03-31 Thread Tony Krowiak
This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is much more efficient than looping over
the list of devices attached to the AP bus by several orders of
magnitude.
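
As a rough illustration of why a keyed lookup beats a list walk, here is
a small userspace sketch of a chained hash table keyed by APQN. It is not
the AP bus implementation: struct queue, NBUCKETS, queue_add() and
queue_find() are hypothetical names, and the modulo hash is an assumption
chosen for brevity.

```c
#include <stddef.h>

#define NBUCKETS 64	/* hypothetical table size */

struct queue {
	int apqn;
	struct queue *next;	/* chain of queues sharing a bucket */
};

static struct queue *buckets[NBUCKETS];

static unsigned int bucket_of(int apqn)
{
	return (unsigned int)apqn % NBUCKETS;
}

/* Insert a queue at the head of its bucket's chain. */
static void queue_add(struct queue *q)
{
	unsigned int b = bucket_of(q->apqn);

	q->next = buckets[b];
	buckets[b] = q;
}

/* Only the entries of one bucket are visited, not every queue. */
static struct queue *queue_find(int apqn)
{
	struct queue *q;

	for (q = buckets[bucket_of(apqn)]; q; q = q->next) {
		if (q->apqn == apqn)
			return q;
	}

	return NULL;
}
```

A lookup compares the key against the few entries that hash to the same
bucket, whereas the removed match_apqn() approach visits every device
bound to the driver until it finds a match.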

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6946a7e26eff..128a66d57305 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,13 +27,6 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-   struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-   return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
  * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
  * @matrix_mdev: the associated mediated matrix
@@ -1232,15 +1225,17 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
-   struct device *dev;
+   struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;
 
-   dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-                            &apqn, match_apqn);
-   if (dev) {
-   q = dev_get_drvdata(dev);
-   put_device(dev);
-   }
+   queue = ap_get_qdev(apqn);
+   if (!queue)
+   return NULL;
+
+   if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+   q = dev_get_drvdata(&queue->ap_dev.device);
+
+   put_device(&queue->ap_dev.device);
 
return q;
 }
-- 
2.21.3



[PATCH v14 03/13] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c

2021-03-31 Thread Tony Krowiak
Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_drv.c | 41 ++-
 drivers/s390/crypto/vfio_ap_ops.c | 28 ++
 drivers/s390/crypto/vfio_ap_private.h |  5 ++--
 3 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 7dc72cb718b0..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,43 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-
-   q = kzalloc(sizeof(*q), GFP_KERNEL);
-   if (!q)
-   return -ENOMEM;
-   dev_set_drvdata(&apdev->device, q);
-   q->apqn = to_ap_queue(&apdev->device)->qid;
-   q->saved_isc = VFIO_AP_ISC_INVALID;
-   return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-
-   mutex_lock(&matrix_dev->lock);
-   q = dev_get_drvdata(&apdev->device);
-   vfio_ap_mdev_reset_queue(q, 1);
-   dev_set_drvdata(&apdev->device, NULL);
-   kfree(q);
-   mutex_unlock(&matrix_dev->lock);
-}
-
 static void vfio_ap_matrix_dev_release(struct device *dev)
 {
struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -182,8 +145,8 @@ static int __init vfio_ap_init(void)
return ret;
 
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
-   vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
-   vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+   vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+   vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 128a66d57305..c630abac81d0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1446,3 +1446,31 @@ void vfio_ap_mdev_unregister(void)
 {
mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+
+   q = kzalloc(sizeof(*q), GFP_KERNEL);
+   if (!q)
+   return -ENOMEM;
+   mutex_lock(&matrix_dev->lock);
+   q->apqn = to_ap_queue(&apdev->device)->qid;
+   q->saved_isc = VFIO_AP_ISC_INVALID;
+   dev_set_drvdata(&apdev->device, q);
+   mutex_unlock(&matrix_dev->lock);
+
+   return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+
+   mutex_lock(&matrix_dev->lock);
+   q = dev_get_drvdata(&apdev->device);
+   vfio_ap_mdev_reset_queue(q, 1);
+   dev_set_drvdata(&apdev->device, NULL);
+   kfree(q);
+   mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f82a6396acae..3ca2da62bdee 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -100,7 +100,8 @@ struct vfio_ap_queue {
 
 int vfio_ap_mdev_register(void);
 void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q,
-unsigned int retry);
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.3



[PATCH v14 04/13] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-03-31 Thread Tony Krowiak
Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

 * When the queue device is probed, if its APQN is assigned to a matrix
   mdev, the structures representing the queue device and the matrix mdev
   will be linked.

 * When an adapter or domain is assigned to a matrix mdev, for each new
   APQN assigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be linked.

The links will be removed as follows:

 * When the queue device is removed, if its APQN is assigned to a matrix
   mdev, the link from the structure representing the matrix mdev to the
   structure representing the queue will be removed. The link from the
   queue to the matrix mdev will be maintained because if the queue device
   is being removed due to a manual sysfs unbind, it may be needed after
   the queue is reset to clean up the IRQ resources allocated to enable AP
   interrupts for the KVM guest. Since the storage for the structure
   representing the queue device is ultimately freed by the remove
   callback, keeping the reference shouldn't be a problem.

 * When an adapter or domain is unassigned from a matrix mdev, for each
   APQN unassigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be unlinked.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 145 +-
 drivers/s390/crypto/vfio_ap_private.h |   3 +
 2 files changed, 123 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c630abac81d0..28266165eb75 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,33 +27,17 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
- *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Returns the pointer to the associated vfio_ap_queue
- */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-   struct ap_matrix_mdev *matrix_mdev,
-   int apqn)
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
 {
struct vfio_ap_queue *q;
 
-   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-   return NULL;
-   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-   return NULL;
-
-   q = vfio_ap_find_queue(apqn);
-   if (q)
-   q->matrix_mdev = matrix_mdev;
+   hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+   if (q && q->apqn == apqn)
+   return q;
+   }
 
-   return q;
+   return NULL;
 }
 
 /**
@@ -171,7 +155,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
  status.response_code);
 end_free:
vfio_ap_free_aqic_resources(q);
-   q->matrix_mdev = NULL;
return status;
 }
 
@@ -300,7 +283,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
if (!matrix_mdev->kvm)
goto out_unlock;
 
-   q = vfio_ap_get_queue(matrix_mdev, apqn);
+   q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
 
@@ -344,6 +327,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
init_waitqueue_head(&matrix_mdev->wait_for_kvm);
+   hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -578,6 +562,60 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
 }
 
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+   struct vfio_ap_queue *q)
+{
+   if (q) {
+   q->matrix_mdev = matrix_mdev;
+   hash_add(matrix_mdev->qtable,
+            &q->mdev_qnode, q->apqn);
+   }
+}
+
+static void vfio_a

[PATCH v14 00/13] s390/vfio-ap: dynamic configuration support

2021-03-31 Thread Tony Krowiak
not get plugged into the KVM guest on startup, or when
  a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
  probe/remove).

* Added a patch 1 to remove disable IRQ after a reset because the reset
  already disables a queue.

* Now using filtering code to update the KVM guest's matrix when
  notified that AP bus scan has completed.

* Fixed issue with probe/remove not initiated by a configuration change
  occurring within a config change.


Change log v9-v10:
-
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:

* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
  manual bind operation.

Change log v7-v8:

* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers will result in taking a queue away from a KVM guest to provide
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series.

Change log v6-v7:

* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
the AP configuration has changed since last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:

* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2.

Change log v4-v5:

* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-
* Removed patches preventing root user from unbinding AP queues from
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (13):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/vfio-ap: sysfs attribute to disp

[PATCH v14 01/13] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-31 Thread Tony Krowiak
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
...
bool kvm_busy;
wait_queue_head_t wait_for_kvm;
   ...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.
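
The handshake described above can be approximated in userspace with a
mutex and a condition variable. The sketch below is an analogy only: it
uses POSIX threads instead of the kernel's wait queue API, and the names
struct mdev_state, set_kvm() and get_kvm() are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct mdev_state {
	pthread_mutex_t lock;		/* stands in for matrix_dev->lock */
	pthread_cond_t wait_for_kvm;	/* stands in for the wait queue */
	bool kvm_busy;
	void *kvm;
};

/* Notification path: the pointer is "in flux" while the lock is dropped. */
static void set_kvm(struct mdev_state *s, void *kvm)
{
	pthread_mutex_lock(&s->lock);
	s->kvm_busy = true;
	pthread_mutex_unlock(&s->lock);

	/* The slow mask update would run here, without holding the lock. */

	pthread_mutex_lock(&s->lock);
	s->kvm = kvm;
	s->kvm_busy = false;
	pthread_cond_broadcast(&s->wait_for_kvm);	/* wake all waiters */
	pthread_mutex_unlock(&s->lock);
}

/* Reader path: sleep until the pointer is stable, then read it. */
static void *get_kvm(struct mdev_state *s)
{
	void *kvm;

	pthread_mutex_lock(&s->lock);
	while (s->kvm_busy)
		pthread_cond_wait(&s->wait_for_kvm, &s->lock);
	kvm = s->kvm;
	pthread_mutex_unlock(&s->lock);

	return kvm;
}
```

pthread_cond_wait() releases the mutex while sleeping and reacquires it
before returning, which plays the role of the unlock and lock commands
passed to wait_event_cmd() in the patch.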

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 308 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 95 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1ffdd411201c..6946a7e26eff 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
+   /*
+* If the KVM pointer is in the process of being set, wait until the
+* process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  !matrix_mdev->kvm_busy,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   /* If the there is no guest using the mdev, there is nothing to do */
+   if (!matrix_mdev->kvm)
+   goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   if (matrix_mdev->kvm)
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of control domain.
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   mutex_unlock(&matrix_dev->lock);
return -EBUSY;
+   }
 
-   mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
-   mutex_unlock(&matrix_dev->lock);
-
kfree(matrix_mdev);
mdev_set_drvdata(mdev, NULL);
atomic_inc(&matrix_dev->available_instances);
+   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   /* If the guest is running, disallow assignment of adapter */
-   if (matrix_mdev->kvm)
-   return -EBUSY;
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of adapter
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   ret = -EBUSY;
+   goto done;
+   }
 

Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-03-31 Thread Tony Krowiak




On 1/14/21 8:44 PM, Halil Pasic wrote:

On Thu, 14 Jan 2021 12:54:39 -0500
Tony Krowiak  wrote:


   /**
* vfio_ap_mdev_verify_no_sharing
*
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * Verifies that each APQN derived from the Cartesian product of the AP adapter
+ * IDs and AP queue indexes comprising the AP matrix are not configured for
+ * another mediated device. AP queue sharing is not allowed.
*
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ *  are assigned.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
*
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
*/
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long *mdev_apm,
+ unsigned long *mdev_aqm)
   {
struct ap_matrix_mdev *lstdev;
DECLARE_BITMAP(apm, AP_DEVICES);
@@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 * We work on full longs, as we can only exclude the leftover
 * bits in non-inverse order. The leftover is all zeros.
 */
-   if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-   lstdev->matrix.apm, AP_DEVICES))
+   if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
continue;
   
-   if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-   lstdev->matrix.aqm, AP_DOMAINS))
+   if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
continue;
   
-   return -EADDRINUSE;
+   vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+apm, aqm);
+
+   return -EBUSY;

Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
userspace, doesn't it? So a tool that detects the "another mdev has it"
condition by checking for -EADDRINUSE would be confused...
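
For reference, the overlap test in the quoted hunk can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the masks are taken to be 256 bits wide and stored as four 64-bit words, and masks_intersect() and apqns_shared() are hypothetical names standing in for the kernel's bitmap_and() calls. Because the APQNs are the Cartesian product of the adapter IDs and the queue indexes, two mdevs share an APQN only if both their adapter masks and their domain masks intersect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MASK_WORDS 4	/* 4 x 64 bits = 256 IDs (assumed mask width) */

/* Hypothetical helper: true if the two masks have at least one bit in common. */
static bool masks_intersect(const uint64_t *a, const uint64_t *b)
{
	assert(a && b);

	for (size_t i = 0; i < MASK_WORDS; i++) {
		if (a[i] & b[i])
			return true;
	}
	return false;
}

/*
 * Hypothetical stand-in for the per-mdev sharing test: an APQN is shared
 * only when an adapter ID *and* a queue index are common to both mdevs.
 */
static bool apqns_shared(const uint64_t *apm, const uint64_t *aqm,
			 const uint64_t *other_apm, const uint64_t *other_aqm)
{
	return masks_intersect(apm, other_apm) &&
	       masks_intersect(aqm, other_aqm);
}
```

With this model, two mdevs that have an adapter in common but no common domain do not conflict, which matches the two "continue" branches in the hunk.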

Back in v8 of the series, Christian suggested the occurrences
of -EADDRINUSE should be replaced by the more appropriate
-EBUSY (Message ID ),
so I changed it here. It does get bubbled up to userspace, so you make a
valid point. I will change it back. I will, however, set the value
returned from the __verify_card_reservations() function in ap_bus.c to
-EBUSY as suggested by Christian.

As long as the error code for an ephemeral failure (can't take a lock
right now) and the error code for a failure due to a sharing conflict
(which most likely requires admin action to be resolved) are different,
I'm fine.

Choosing EBUSY for sharing conflict, and something else for can't take
lock for the bus attributes, while choosing EADDRINUSE for sharing
conflict, and EBUSY for can't take lock in the case of the mdev
attributes (assign_*; unassign_*) sounds confusing to me, but is still
better than conflating the two conditions. Maybe we can choose EAGAIN
or EWOULDBLOCK for the can't-take-the-lock-right-now case. I don't know.


I was in the process of creating the change log for v14 of
this patch series and realized I never addressed this.
I think EAGAIN would be a better return code for the
mutex_trylock failures in the mdev assign/unassign
operations.
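
To make the distinction concrete, here is a small userspace sketch of the idea, not the driver code. It assumes a C11 atomic_flag as a stand-in for mutex_trylock() on matrix_dev->lock; model_trylock(), model_unlock() and assign_adapter_model() are hypothetical names. The transient failure (lock not available) maps to -EAGAIN, while the sharing conflict keeps -EADDRINUSE.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static atomic_flag matrix_lock_model = ATOMIC_FLAG_INIT;

/* Returns true if the lock was taken, false if someone else holds it. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&matrix_lock_model);
}

static void model_unlock(void)
{
	atomic_flag_clear(&matrix_lock_model);
}

/*
 * Hypothetical model of an mdev assign operation.
 * shared_with_other_mdev: the requested resource is already in use elsewhere.
 */
static int assign_adapter_model(bool shared_with_other_mdev)
{
	int ret = 0;

	if (!model_trylock())
		return -EAGAIN;		/* transient: the caller may retry */

	if (shared_with_other_mdev)
		ret = -EADDRINUSE;	/* persistent: needs admin action */

	model_unlock();
	return ret;
}
```

A tool in userspace can then retry on -EAGAIN and report a configuration conflict on -EADDRINUSE without guessing which of the two happened.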



I'm open to suggestions. And if Christian wants to change this for
the already released interfaces, I will have to live with that. But it
has to be a conscious decision at least.

What I consider tricky about EBUSY, is that according to my intuition,
in pseudocode, object.operation(argument) returns -EBUSY probably tells
me that object is busy (i.e. is in the middle of something incompatible
with performing operation). In our case, it is not the object that is
busy, but the resource denoted by the argument.

Regards,
Halil




Re: [PATCH v5 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-26 Thread Tony Krowiak




On 3/25/21 4:32 PM, Halil Pasic wrote:

On Thu, 25 Mar 2021 08:46:40 -0400
Tony Krowiak  wrote:


This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
	...
	bool kvm_busy;
	wait_queue_head_t wait_for_kvm;
	...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 

I intend to give a couple of work-days to others, and if nobody objects
merge this. (I will wait till Tuesday.)


Thanks Halil.



I've tested it and it does silence the lockdep splat.

Regards,
Halil
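
For readers following the thread, the kvm_busy / wait_for_kvm scheme from the commit message can be sketched in userspace as below. This is a model under assumptions, not the driver code: a pthread mutex stands in for matrix_dev->lock, a condition variable stands in for the wait queue, and pthread_cond_wait() plays the role of wait_event_cmd() with the unlock/lock commands. All names (struct mdev_model, model_set_kvm(), model_guest_attached()) are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the relevant ap_matrix_mdev state. */
struct mdev_model {
	pthread_mutex_t lock;		/* stands in for matrix_dev->lock */
	pthread_cond_t wait_for_kvm;	/* stands in for the wait queue */
	bool kvm_busy;			/* true while the KVM pointer is in flux */
	void *kvm;			/* stands in for matrix_mdev->kvm */
};

static void model_init(struct mdev_model *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->wait_for_kvm, NULL);
	m->kvm_busy = false;
	m->kvm = NULL;
}

/* Updater side: the lock is dropped while the slow work is done. */
static void model_set_kvm(struct mdev_model *m, void *kvm)
{
	pthread_mutex_lock(&m->lock);
	m->kvm_busy = true;
	pthread_mutex_unlock(&m->lock);

	/* The real driver sets or clears the APCB masks here, unlocked. */

	pthread_mutex_lock(&m->lock);
	m->kvm = kvm;
	m->kvm_busy = false;
	pthread_cond_broadcast(&m->wait_for_kvm);	/* like wake_up_all() */
	pthread_mutex_unlock(&m->lock);
}

/* Reader side: sleep until no update is in flight, then read the pointer. */
static bool model_guest_attached(struct mdev_model *m)
{
	bool attached;

	pthread_mutex_lock(&m->lock);
	while (m->kvm_busy)
		pthread_cond_wait(&m->wait_for_kvm, &m->lock);
	attached = (m->kvm != NULL);
	pthread_mutex_unlock(&m->lock);

	return attached;
}
```

The point of the flag is that a reader which gets the lock while the updater has dropped it never sees a half-updated pointer; it waits for the broadcast instead.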




[PATCH v5 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-25 Thread Tony Krowiak
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
	...
	bool kvm_busy;
	wait_queue_head_t wait_for_kvm;
	...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 309 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 96 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1ffdd411201c..7deb0f9bb9ee 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
+   /*
+* If the KVM pointer is in the process of being set, wait until the
+* process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   /* If there is no guest using the mdev, there is nothing to do */
+   if (!matrix_mdev->kvm)
+   goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   if (matrix_mdev->kvm)
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of control domain.
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   mutex_unlock(&matrix_dev->lock);
return -EBUSY;
+   }
 
-   mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
-   mutex_unlock(&matrix_dev->lock);
-
kfree(matrix_mdev);
mdev_set_drvdata(mdev, NULL);
atomic_inc(&matrix_dev->available_instances);
+   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   /* If the guest is running, disallow assignment of adapter */
-   if (matrix_mdev->kvm)
-   return -EBUSY;
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of adapter
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   ret = -EBUSY;
+   go

[PATCH v5 0/1] s390/vfio-ap: fix circular lockdep when starting

2021-03-25 Thread Tony Krowiak
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver
is notified, the masks will not be updated under the matrix_dev->lock.
The lock is necessary for the setting/unsetting of the KVM pointer,
however, so that will remain in place.

The dependency chain for the circular lockdep resolved by this patch
is (in reverse order):

2:  vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

1:  handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

0:  kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock
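
As an illustration of why this chain is circular, the three entries can be written as edges of a small lock-order graph and checked for a cycle. The sketch below is a userspace toy, not lockdep. It assumes that each entry lists the lock being acquired first and the lock already held second (which matches the commit description: the masks were set, taking kvm->lock, while matrix_dev->lock was held). The names lock_graph, build_chain() and has_cycle() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum lock_id { KVM_LOCK, VCPU_MUTEX, MATRIX_DEV_LOCK, NR_LOCKS };

/* dep[a][b] is true when some path acquires lock b while holding lock a. */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: can lock 'to' be reached from lock 'from'? */
static bool reaches(const struct lock_graph *g, int from, int to,
		    bool *visited)
{
	if (from == to)
		return true;

	visited[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (g->dep[from][next] && !visited[next] &&
		    reaches(g, next, to, visited))
			return true;
	}
	return false;
}

/* A cycle exists if, for some edge a -> b, lock a is reachable from b. */
static bool has_cycle(const struct lock_graph *g)
{
	for (int a = 0; a < NR_LOCKS; a++) {
		for (int b = 0; b < NR_LOCKS; b++) {
			bool visited[NR_LOCKS] = { false };

			if (g->dep[a][b] && reaches(g, b, a, visited))
				return true;
		}
	}
	return false;
}

/* Build the chain above; the flag models "masks set under matrix_dev->lock". */
static void build_chain(struct lock_graph *g, bool masks_under_matrix_lock)
{
	memset(g, 0, sizeof(*g));
	/* 0: kvm_vm_ioctl holds kvm->lock, kvm_s390_cpus_to_pv takes vcpu->mutex */
	g->dep[KVM_LOCK][VCPU_MUTEX] = true;
	/* 1: kvm_vcpu_ioctl holds vcpu->mutex, handle_pqap takes matrix_dev->lock */
	g->dep[VCPU_MUTEX][MATRIX_DEV_LOCK] = true;
	/* 2: the notifier holds matrix_dev->lock while kvm->lock is taken */
	g->dep[MATRIX_DEV_LOCK][KVM_LOCK] = masks_under_matrix_lock;
}
```

Dropping the third edge, which is what releasing matrix_dev->lock before setting the masks amounts to, leaves the graph acyclic.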

Please note:
---
* If checkpatch is run against this patch series, you may
  get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not
  pulled?" message. The commit 'f21916ec4826', however, is definitely
  in the master branch on top of which this patch series was built, so
  I'm not sure why this message is being output by checkpatch.
* All acks granted from previous review of this patch have been removed
  due to the fact that this patch introduces non-trivial changes (see
  change log below).

Change log v4=> v5:
--
* In vfio_ap_mdev_ioctl() function:
  - Verify matrix_mdev is not NULL before doing reset
  - Do reset regardless of whether matrix_mdev->kvm is NULL

Change log v3=> v4:
--
* In vfio_ap_mdev_set_kvm() function, moved the setting of
  matrix_mdev->kvm_busy just prior to unlocking matrix_dev->lock.

* Reset queues regardless of the value of matrix_mdev->kvm
  in response to the VFIO_DEVICE_RESET ioctl.

Change log v2=> v3:
--
* Added two fields - 'bool kvm_busy' and 'wait_queue_head_t wait_for_kvm'
  to struct ap_matrix_mdev. The former indicates that the KVM
  pointer is in the process of being updated and the second allows a
  function that needs access to the KVM pointer to wait until it is
  no longer being updated. Resolves problem of synchronization between
  the functions that change the KVM pointer value and the functions that
  require access to it.

Change log v1=> v2:
--
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm()
  function instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 309 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 96 deletions(-)

--
2.21.3



Re: [PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-18 Thread Tony Krowiak




On 3/17/21 7:17 PM, Halil Pasic wrote:

On Wed, 10 Mar 2021 10:05:59 -0500
Tony Krowiak  wrote:


-   ret = vfio_ap_mdev_reset_queues(mdev);
+   matrix_mdev = mdev_get_drvdata(mdev);

Is it guaranteed that matrix_mdev can't be NULL here? If yes, please
remind me of the mechanism that ensures this.


+
+   /*
+* If the KVM pointer is in the process of being set, wait until
+* the process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   if (matrix_mdev->kvm)
+   ret = vfio_ap_mdev_reset_queues(mdev);
+   else
+   ret = -ENODEV;

Didn't we agree to make the call to vfio_ap_mdev_reset_queues()
unconditional again (for reference please take look at
Message-ID: <64afa72c-2d6a-2ca1-e576-34e15fa57...@linux.ibm.com>)?


How about this:

static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
                    unsigned int cmd, unsigned long arg)
{
    int ret = 0;
    struct ap_matrix_mdev *matrix_mdev;

    ...
    case VFIO_DEVICE_RESET:
        matrix_mdev = mdev_get_drvdata(mdev);
        WARN(!matrix_mdev, "Driver data missing from mdev!!");

        if (matrix_mdev) {
            /*
             * If the KVM pointer is in the process of being set, wait until
             * the process has completed.
             */
            wait_event_cmd(matrix_mdev->wait_for_kvm,
                   matrix_mdev->kvm_busy == false,
                   mutex_unlock(&matrix_dev->lock),
                   mutex_lock(&matrix_dev->lock));

            ret = vfio_ap_mdev_reset_queues(mdev);
        }
        break;
    ...

    return ret;
}



Regards,
Halil




Re: [PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-18 Thread Tony Krowiak




On 3/17/21 7:17 PM, Halil Pasic wrote:

On Wed, 10 Mar 2021 10:05:59 -0500
Tony Krowiak  wrote:


-   ret = vfio_ap_mdev_reset_queues(mdev);
+   matrix_mdev = mdev_get_drvdata(mdev);

Is it guaranteed that matrix_mdev can't be NULL here? If yes, please
remind me of the mechanism that ensures this.


The matrix_mdev is set as drvdata when the mdev is created and
is only cleared when the mdev is removed. Likewise, this function
is a callback defined by vfio in the vfio_ap_matrix_ops structure
when the matrix_dev is registered and is intended to handle ioctl
calls from userspace during the lifetime of the mdev. While I can't
speak definitively to the guarantee, I think it is extremely unlikely
that matrix_mdev would be NULL at this point. On the other hand,
it wouldn't hurt to check for NULL and log an error or warning
message (I prefer an error here) if NULL.




+
+   /*
+* If the KVM pointer is in the process of being set, wait until
+* the process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   if (matrix_mdev->kvm)
+   ret = vfio_ap_mdev_reset_queues(mdev);
+   else
+   ret = -ENODEV;

Didn't we agree to make the call to vfio_ap_mdev_reset_queues()
unconditional again (for reference please take look at
Message-ID: <64afa72c-2d6a-2ca1-e576-34e15fa57...@linux.ibm.com>)?


Yes, we did agree to that and I changed it at the time. That change
got lost somehow; I'll reinstate it.



Regards,
Halil




[PATCH v4 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-10 Thread Tony Krowiak
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
	...
	bool kvm_busy;
	wait_queue_head_t wait_for_kvm;
	...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 309 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 96 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..445d1457faa8 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
+   /*
+* If the KVM pointer is in the process of being set, wait until the
+* process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   /* If there is no guest using the mdev, there is nothing to do */
+   if (!matrix_mdev->kvm)
+   goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   if (matrix_mdev->kvm)
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of control domain.
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   mutex_unlock(&matrix_dev->lock);
return -EBUSY;
+   }
 
-   mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
-   mutex_unlock(&matrix_dev->lock);
-
kfree(matrix_mdev);
mdev_set_drvdata(mdev, NULL);
atomic_inc(&matrix_dev->available_instances);
+   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   /* If the guest is running, disallow assignment of adapter */
-   if (matrix_mdev->kvm)
-   return -EBUSY;
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of adapter
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   ret = -EBUSY;
+   go

[PATCH v4 0/1] s390/vfio-ap: fix circular lockdep when starting

2021-03-10 Thread Tony Krowiak
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM 
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver
is notified, the masks will not be updated under the matrix_dev->lock.
The lock is necessary for the setting/unsetting of the KVM pointer,
however, so that will remain in place. 

The dependency chain for the circular lockdep resolved by this patch 
is (in reverse order):

2:  vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

1:  handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

0:  kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock

Please note:
---
* If checkpatch is run against this patch series, you may
  get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not 
  pulled?" message. The commit 'f21916ec4826', however, is definitely
  in the master branch on top of which this patch series was built, so
  I'm not sure why this message is being output by checkpatch.
* All acks granted from previous review of this patch have been removed
  due to the fact that this patch introduces non-trivial changes (see
  change log below).

Change log v3=> v4:
--
* In vfio_ap_mdev_set_kvm() function, moved the setting of 
  matrix_mdev->kvm_busy just prior to unlocking matrix_dev->lock.

* Reset queues regardless of the value of matrix_mdev->kvm
  in response to the VFIO_DEVICE_RESET ioctl.

Change log v2=> v3:
-- 
* Added two fields - 'bool kvm_busy' and 'wait_queue_head_t wait_for_kvm'
  to struct ap_matrix_mdev. The former indicates that the KVM
  pointer is in the process of being updated and the second allows a
  function that needs access to the KVM pointer to wait until it is
  no longer being updated. Resolves problem of synchronization between
  the functions that change the KVM pointer value and the functions that
  require access to it.

Change log v1=> v2:
--
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm()
  function instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 309 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 215 insertions(+), 96 deletions(-)

-- 
2.21.3



Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-09 Thread Tony Krowiak




On 3/9/21 5:23 AM, Halil Pasic wrote:

On Thu, 4 Mar 2021 12:43:44 -0500
Tony Krowiak  wrote:


On the other hand, if we don't have ->kvm because something broke,
then we may be out of luck anyway. There will certainly be no
way to unregister the GISC; however, it may still be possible
to unpin the pages if we still have q->saved_pfn.

The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
clean up the IRQ resources because something is broken, then there
is nothing we can do about that.

Especially since the recently added WARN_ONCE macros calling reset_queues
unconditionally ain't that bad: we would at least see if there is a
problem with cleaning up the IRQ resources.

Let's make it unconditional again and observe. Can you send out a v4 with
this and the other issue fixed?


I agree and I can do that.

  


Regards,
Halil




Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-04 Thread Tony Krowiak




On 3/3/21 2:47 PM, Halil Pasic wrote:

On Wed, 3 Mar 2021 12:10:11 -0500
Tony Krowiak  wrote:


On 3/3/21 10:23 AM, Halil Pasic wrote:

On Tue,  2 Mar 2021 15:43:22 -0500
Tony Krowiak  wrote:
  

This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

[..]
  

@@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
   {
struct ap_matrix_mdev *m;

-   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-   if ((m != matrix_mdev) && (m->kvm == kvm))
-   return -EPERM;
-   }
+   if (kvm->arch.crypto.crycbd) {
+   matrix_mdev->kvm_busy = true;

-   matrix_mdev->kvm = kvm;
-   kvm_get_kvm(kvm);
-   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
+   if ((m != matrix_mdev) && (m->kvm == kvm)) {
+   wake_up_all(&matrix_mdev->wait_for_kvm);

This ain't no good. kvm_busy will remain true if we take this exit. The
wake_up_all() is not needed, because we hold the lock, so nobody can
observe it if we don't forget kvm_busy set.

I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
before the unlock, and removing the wake_up_all().
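
A simplified, single-threaded sketch of the ordering suggested here might look like the following. It is an illustration only: struct mdev_sketch and set_kvm_sketch() are hypothetical, an array replaces the mdev list, and the locking and the mask update are reduced to comments. What it shows is that when the validation loop runs before kvm_busy is set, the -EPERM exit cannot leave the flag stuck at true.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of an mdev: just the two fields of interest. */
struct mdev_sketch {
	const void *kvm;
	bool kvm_busy;
};

static int set_kvm_sketch(struct mdev_sketch *self, struct mdev_sketch *all,
			  size_t count, const void *kvm)
{
	/* Validate first: no state has changed yet, so bailing out is safe. */
	for (size_t i = 0; i < count; i++) {
		if (&all[i] != self && all[i].kvm == kvm)
			return -EPERM;
	}

	/* Only now mark the pointer as in flux, right before "unlocking". */
	self->kvm_busy = true;

	/* unlock; set the guest's masks; lock again (elided in this model) */

	self->kvm = kvm;
	self->kvm_busy = false;
	/* wake_up_all() on the wait queue would go here */

	return 0;
}
```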
  

+   return -EPERM;
+   }
+   }
+
+   kvm_get_kvm(kvm);
+   mutex_unlock(&matrix_dev->lock);
+   kvm_arch_crypto_set_masks(kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+   mutex_lock(&matrix_dev->lock);
+   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   matrix_mdev->kvm = kvm;
+   matrix_mdev->kvm_busy = false;
+   wake_up_all(&matrix_mdev->wait_for_kvm);
+   }

return 0;
   }

[..]
  

@@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
-   ret = vfio_ap_mdev_reset_queues(mdev);
+   matrix_mdev = mdev_get_drvdata(mdev);
+
+   /*
+* If the KVM pointer is in the process of being set, wait until
+* the process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   if (matrix_mdev->kvm)
+   ret = vfio_ap_mdev_reset_queues(mdev);
+   else
+   ret = -ENODEV;

I don't think rejecting the reset is a good idea. I gave you a more detailed
explanation on the list, where we initially discussed this question.

How do you expect userspace to react to this -ENODEV?

After reading your more detailed explanation, I have come to the
conclusion that the test for matrix_mdev->kvm should not be
performed here and the vfio_ap_mdev_reset_queues() function
should be called regardless. Each queue assigned to the mdev
that is also bound to the vfio_ap driver will get reset and its
IRQ resources cleaned up if they haven't already been and the
other required conditions are met (i.e., see
vfio_ap_mdev_free_irq_resources()).

My point is if !->kvm the other required conditions are not met. But
yes we can go back to unconditional vfio_ap_mdev_reset_queues(mdev),
and think about the necessity of performing a
vfio_ap_mdev_reset_queues() if !->kvm later as I proposed in the other
mail.


The other conditions may or may not have been met depending
upon whether ->kvm is NULL because the VFIO_DEVICE_RESET
ioctl was invoked while the matrix_dev->lock was released
in the vfio_ap_mdev_unset_kvm() function. If that was the case,
then there is no need to clean up the IRQ resources because it
will already have been done.

On the other hand, if we don't have ->kvm because something broke,
then we may be out of luck anyway. There will certainly be no
way to unregister the GISC; however, it may still be possible
to unpin the pages if we still have q->saved_pfn.

The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
clean up the IRQ resources because something is broken, then there
is nothing we can do about that.




Regards,
Halil




Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-04 Thread Tony Krowiak




On 3/3/21 2:42 PM, Halil Pasic wrote:

On Wed, 3 Mar 2021 11:41:22 -0500
Tony Krowiak  wrote:


How do you expect userspace to react to this -ENODEV?

The VFIO_DEVICE_RESET ioctl expects a return code.
The vfio_ap_mdev_reset_queues() function can return -EIO or
-EBUSY, so I would expect userspace to handle -ENODEV
similarly to -EIO or any other non-zero return code. I also
looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
how the return from the ioctl call is handled:

* ap: reports the reset failed along with the rc

And carries on as if nothing happened. There is not much smart
userspace can do in such a situation. Therefore the reset really
should not fail.


Regardless of what we decide to do here, there is the
possibility that the vfio_ap_mdev_reset_queues()
function will return an error, so your point is moot
and maybe should be brought up as a QEMU
implementation issue. I don't think it is incumbent
upon the KVM code to anticipate how userspace
will respond to a non-zero return code. I think the
pertinent question is what return code makes sense.
Having said that, I have other concerns which I
discussed below.



Please note that in this particular case, if the userspace would
opt for a retry, we would most likely end up in a retry loop.


* ccw: doesn't check the rc
* pci: kind of hard to follow without digging deep, but definitely
   handles non-zero rc.

I think the caller should be notified whether the queues were
successfully reset or not, and why; in this case, the answer is
there are no devices to reset.

That is the wrong answer. The ioctl is supposed to reset the
ap_matrix_mdev device. The ap_matrix_mdev device still exists. Thus
returning -ENODEV is bogus.


That makes sense and it begs the question, what does it mean to
reset the mdev? Is resetting the queues an appropriate response
to the VFIO_DEVICE_RESET ioctl call?

The purpose of the mdev is to supply the AP configuration to a KVM
guest. The queues themselves belong to the guest. If the guest enables
interrupts for a queue and vfio_ap does a reset in response to the ioctl
call, then the guest will be sitting there waiting for interrupts which
have been disabled by the reset. It seems that as long as a guest is
using the mdev, then management of its queues (i.e., reset) should be
left to the guest. Unless there is something to reset as far as the
mdev is concerned, maybe the response to the VFIO_DEVICE_RESET
ioctl ought to be a NOP regardless of the value of ->kvm.



Regards,
Halil




Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-03 Thread Tony Krowiak




On 3/3/21 10:23 AM, Halil Pasic wrote:

On Tue,  2 Mar 2021 15:43:22 -0500
Tony Krowiak  wrote:


This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

[..]


@@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
  {
struct ap_matrix_mdev *m;

-   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-   if ((m != matrix_mdev) && (m->kvm == kvm))
-   return -EPERM;
-   }
+   if (kvm->arch.crypto.crycbd) {
+   matrix_mdev->kvm_busy = true;

-   matrix_mdev->kvm = kvm;
-   kvm_get_kvm(kvm);
-   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
+   if ((m != matrix_mdev) && (m->kvm == kvm)) {
+   wake_up_all(&matrix_mdev->wait_for_kvm);

This ain't no good. kvm_busy will remain true if we take this exit. The
wake_up_all() is not needed: we hold the lock, so nobody can observe
kvm_busy being set as long as we don't leave it set on the way out.

I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
before the unlock, and removing the wake_up_all().


+   return -EPERM;
+   }
+   }
+
+   kvm_get_kvm(kvm);
+   mutex_unlock(&matrix_dev->lock);
+   kvm_arch_crypto_set_masks(kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+   mutex_lock(&matrix_dev->lock);
+   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   matrix_mdev->kvm = kvm;
+   matrix_mdev->kvm_busy = false;
+   wake_up_all(&matrix_mdev->wait_for_kvm);
+   }

return 0;
  }
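
The reordering suggested for the -EPERM exit can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: struct mdev_model and model_set_kvm() are hypothetical stand-ins, not driver code, and the locking and the mask update are reduced to comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ap_matrix_mdev. */
struct mdev_model {
    const void *kvm;   /* guest using this mdev, or NULL */
    bool kvm_busy;     /* true only while the masks are being set */
};

/*
 * Model of the reordered set_kvm logic: the duplicate check runs first,
 * so the -EPERM exit never leaves kvm_busy set and has no waiters to
 * wake. The matrix_dev->lock is assumed to be held by the caller.
 */
static int model_set_kvm(struct mdev_model *self, struct mdev_model *all,
                         size_t count, const void *kvm)
{
    size_t i;

    assert(self != NULL && all != NULL && kvm != NULL);

    for (i = 0; i < count; i++) {
        if (&all[i] != self && all[i].kvm == kvm)
            return -EPERM;      /* kvm_busy was never set */
    }

    self->kvm_busy = true;      /* right before the lock would be dropped */
    /* ... lock released, masks set, lock re-taken ... */
    self->kvm = kvm;
    self->kvm_busy = false;     /* waiters would be woken here */
    return 0;
}
```

With this ordering a caller that gets -EPERM finds kvm_busy still false, so nothing can end up sleeping on a flag that is never cleared.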

[..]


@@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
-   ret = vfio_ap_mdev_reset_queues(mdev);
+   matrix_mdev = mdev_get_drvdata(mdev);
+
+   /*
+* If the KVM pointer is in the process of being set, wait until
+* the process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   if (matrix_mdev->kvm)
+   ret = vfio_ap_mdev_reset_queues(mdev);
+   else
+   ret = -ENODEV;

I don't think rejecting the reset is a good idea. I gave you a more detailed
explanation on the list, where we initially discussed this question.

How do you expect userspace to react to this -ENODEV?


After reading your more detailed explanation, I have come to the
conclusion that the test for matrix_mdev->kvm should not be
performed here and the vfio_ap_mdev_reset_queues() function
should be called regardless. Each queue assigned to the mdev
that is also bound to the vfio_ap driver will get reset and its
IRQ resources cleaned up if they haven't already been and the
other required conditions are met (i.e., see 
vfio_ap_mdev_free_irq_resources()).




Otherwise looks good to me!

I've tested your branch from yesterday (which looks to me like this patch
without the above check on ->kvm and reset) for the lockdep splat, but I
didn't do any comprehensive testing -- which would ensure that we didn't
break something else in the process. With the two issues fixed, and your
word that the patch was properly tested (except for the lockdep splat
which I tested myself), I feel comfortable with moving forward with this.

Regards,





Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-03 Thread Tony Krowiak




On 3/3/21 10:23 AM, Halil Pasic wrote:

On Tue,  2 Mar 2021 15:43:22 -0500
Tony Krowiak  wrote:


This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

[..]


@@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
  {
struct ap_matrix_mdev *m;

-   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-   if ((m != matrix_mdev) && (m->kvm == kvm))
-   return -EPERM;
-   }
+   if (kvm->arch.crypto.crycbd) {
+   matrix_mdev->kvm_busy = true;

-   matrix_mdev->kvm = kvm;
-   kvm_get_kvm(kvm);
-   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   list_for_each_entry(m, &matrix_dev->mdev_list, node) {
+   if ((m != matrix_mdev) && (m->kvm == kvm)) {
+   wake_up_all(&matrix_mdev->wait_for_kvm);

This ain't no good. kvm_busy will remain true if we take this exit. The
wake_up_all() is not needed: we hold the lock, so nobody can observe
kvm_busy being set as long as we don't leave it set on the way out.

I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
before the unlock, and removing the wake_up_all().


Okay




+   return -EPERM;
+   }
+   }
+
+   kvm_get_kvm(kvm);
+   mutex_unlock(&matrix_dev->lock);
+   kvm_arch_crypto_set_masks(kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+   mutex_lock(&matrix_dev->lock);
+   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   matrix_mdev->kvm = kvm;
+   matrix_mdev->kvm_busy = false;
+   wake_up_all(&matrix_mdev->wait_for_kvm);
+   }

return 0;
  }

[..]


@@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
-   ret = vfio_ap_mdev_reset_queues(mdev);
+   matrix_mdev = mdev_get_drvdata(mdev);
+
+   /*
+* If the KVM pointer is in the process of being set, wait until
+* the process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   if (matrix_mdev->kvm)
+   ret = vfio_ap_mdev_reset_queues(mdev);
+   else
+   ret = -ENODEV;

I don't think rejecting the reset is a good idea. I gave you a more detailed
explanation on the list, where we initially discussed this question.

How do you expect userspace to react to this -ENODEV?


The VFIO_DEVICE_RESET ioctl expects a return code.
The vfio_ap_mdev_reset_queues() function can return -EIO or
-EBUSY, so I would expect userspace to handle -ENODEV
similarly to -EIO or any other non-zero return code. I also
looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
how the return from the ioctl call is handled:

* ap: reports the reset failed along with the rc
* ccw: doesn't check the rc
* pci: kind of hard to follow without digging deep, but definitely
 handles non-zero rc.

I think the caller should be notified whether the queues were
successfully reset or not, and why; in this case, the answer is
there are no devices to reset.



Otherwise looks good to me!

I've tested your branch from yesterday (which looks to me like this patch
without the above check on ->kvm and reset) for the lockdep splat, but I
didn't do any comprehensive testing -- which would ensure that we didn't
break something else in the process. With the two issues fixed, and your
word that the patch was properly tested (except for the lockdep splat
which I tested myself), I feel comfortable with moving forward with this.

Regards,





[PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-03-02 Thread Tony Krowiak
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
...
bool kvm_busy;
wait_queue_head_t wait_for_kvm;
   ...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.
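
The handshake described above can be approximated in userspace as a minimal sketch, assuming a pthread mutex in place of matrix_dev->lock and a condition variable in place of the wait queue. All names (struct mdev_model, model_begin_update(), model_end_update(), model_get_kvm()) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ap_matrix_mdev. */
struct mdev_model {
    pthread_mutex_t lock;         /* plays the role of matrix_dev->lock */
    pthread_cond_t wait_for_kvm;  /* plays the role of the wait queue */
    bool kvm_busy;
    void *kvm;
};

/* Updater, step 1: flag the update, then drop the lock for the slow part. */
static void model_begin_update(struct mdev_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->kvm_busy = true;
    pthread_mutex_unlock(&m->lock);
    /* The masks would be set or cleared here, without the lock held. */
}

/* Updater, step 2: publish the pointer, clear the flag, wake all waiters. */
static void model_end_update(struct mdev_model *m, void *kvm)
{
    pthread_mutex_lock(&m->lock);
    assert(m->kvm_busy);
    m->kvm = kvm;
    m->kvm_busy = false;
    pthread_cond_broadcast(&m->wait_for_kvm);
    pthread_mutex_unlock(&m->lock);
}

/* Reader: sleep until no update is in flight, then read under the lock. */
static void *model_get_kvm(struct mdev_model *m)
{
    void *kvm;

    pthread_mutex_lock(&m->lock);
    while (m->kvm_busy)
        pthread_cond_wait(&m->wait_for_kvm, &m->lock);
    kvm = m->kvm;
    pthread_mutex_unlock(&m->lock);
    return kvm;
}

static void *reader_thread(void *arg)
{
    return model_get_kvm(arg);
}

/* Starts a reader while an update is in flight; it must see the new value. */
static bool demo_reader_sees_final_pointer(void)
{
    static int fake_kvm;
    struct mdev_model m = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .wait_for_kvm = PTHREAD_COND_INITIALIZER,
        .kvm_busy = false,
        .kvm = NULL,
    };
    pthread_t reader;
    void *seen = NULL;

    model_begin_update(&m);
    if (pthread_create(&reader, NULL, reader_thread, &m) != 0)
        return false;
    model_end_update(&m, &fake_kvm);
    pthread_join(reader, &seen);
    return seen == &fake_kvm;
}
```

Whether the reader starts before or after the update finishes, it only returns once kvm_busy is false, so it never observes a pointer that is still in flux.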

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 312 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 218 insertions(+), 96 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..aaf642a21a9d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
+   /*
+* If the KVM pointer is in the process of being set, wait until the
+* process has completed.
+*/
+   wait_event_cmd(matrix_mdev->wait_for_kvm,
+  matrix_mdev->kvm_busy == false,
+  mutex_unlock(&matrix_dev->lock),
+  mutex_lock(&matrix_dev->lock));
+
+   /* If there is no guest using the mdev, there is nothing to do */
+   if (!matrix_mdev->kvm)
+   goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   init_waitqueue_head(&matrix_mdev->wait_for_kvm);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   if (matrix_mdev->kvm)
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of control domain.
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   mutex_unlock(&matrix_dev->lock);
return -EBUSY;
+   }
 
-   mutex_lock(&matrix_dev->lock);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
-   mutex_unlock(&matrix_dev->lock);
-
kfree(matrix_mdev);
mdev_set_drvdata(mdev, NULL);
atomic_inc(&matrix_dev->available_instances);
+   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-   /* If the guest is running, disallow assignment of adapter */
-   if (matrix_mdev->kvm)
-   return -EBUSY;
+   mutex_lock(&matrix_dev->lock);
+
+   /*
+* If the KVM pointer is in flux or the guest is running, disallow
+* un-assignment of adapter
+*/
+   if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+   ret = -EBUSY;
+   go

[PATCH v3 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest

2021-03-02 Thread Tony Krowiak
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM 
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver is
notified, the masks will not be updated under the matrix_dev->lock. The
lock is necessary for the setting/unsetting of the KVM pointer, however,
so that will remain in place. 

The dependency chain for the circular lockdep resolved by this patch 
is (in reverse order):

2:  vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

1:  handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

0:  kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock
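
Why these three entries form a cycle, and why dropping the matrix_dev->lock before the masks are set breaks it, can be checked with a small sketch. It assumes each entry reads as "the first lock is acquired while the second is held"; the graph type and helper names are hypothetical and this is not how lockdep itself is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_id { MATRIX_DEV_LOCK, KVM_LOCK, VCPU_MUTEX, NR_LOCKS };

/* edge[a][b] is true when lock b is acquired while lock a is held. */
struct lock_graph {
    bool edge[NR_LOCKS][NR_LOCKS];
};

static void add_dependency(struct lock_graph *g, enum lock_id held,
                           enum lock_id acquired)
{
    g->edge[held][acquired] = true;
}

/* Depth-first search: can 'target' be reached from 'from'? */
static bool reaches(const struct lock_graph *g, int from, int target,
                    bool *visited)
{
    int next;

    if (visited[from])
        return false;
    visited[from] = true;
    for (next = 0; next < NR_LOCKS; next++) {
        if (!g->edge[from][next])
            continue;
        if (next == target || reaches(g, next, target, visited))
            return true;
    }
    return false;
}

static bool has_cycle(const struct lock_graph *g)
{
    int start;

    for (start = 0; start < NR_LOCKS; start++) {
        bool visited[NR_LOCKS] = { false };

        if (reaches(g, start, start, visited))
            return true;
    }
    return false;
}

/* Builds the chain; the group notifier edge is optional. */
static bool chain_has_cycle(bool masks_set_under_matrix_dev_lock)
{
    struct lock_graph g = { { { false } } };

    if (masks_set_under_matrix_dev_lock)
        add_dependency(&g, MATRIX_DEV_LOCK, KVM_LOCK);  /* #2 */
    add_dependency(&g, VCPU_MUTEX, MATRIX_DEV_LOCK);    /* #1 */
    add_dependency(&g, KVM_LOCK, VCPU_MUTEX);           /* #0 */
    assert(g.edge[KVM_LOCK][VCPU_MUTEX]);
    return has_cycle(&g);
}
```

Calling chain_has_cycle(true) models the state before the fix; chain_has_cycle(false) models the masks being set without the matrix_dev->lock held.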

Please note:
---
* If checkpatch is run against this patch series, you may
  get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not 
  pulled?" message. The commit 'f21916ec4826', however, is definitely
  in the master branch on top of which this patch series was built, so I'm
 not sure why this message is being output by checkpatch.
* All acks granted from previous review of this patch have been removed due
  to the fact that this patch introduces non-trivial changes (see change
  log below).

Change log v2=> v3:
-- 
* Added two fields - 'bool kvm_busy' and 'wait_queue_head_t wait_for_kvm' -
  to struct ap_matrix_mdev. The former indicates that the KVM
  pointer is in the process of being updated and the latter allows a
  function that needs access to the KVM pointer to wait until it is
  no longer being updated. This resolves the problem of synchronization
  between the functions that change the KVM pointer value and the
  functions that require access to it.

Change log v1=> v2:
--
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm() function
  instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 312 ++
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 218 insertions(+), 96 deletions(-)

-- 
2.21.3



Re: [PATCH] s390: crypto: Return -EFAULT if copy_to_user() fails

2021-03-01 Thread Tony Krowiak




On 3/1/21 7:08 AM, Wang Qing wrote:

The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return -EFAULT if the copy doesn't complete.

Signed-off-by: Wang Qing 
---
  drivers/s390/crypto/vfio_ap_ops.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e413..1ffdd41
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1286,7 +1286,7 @@ static int vfio_ap_mdev_get_device_info(unsigned long arg)
info.num_regions = 0;
info.num_irqs = 0;
  
-   return copy_to_user((void __user *)arg, &info, minsz);
+   return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
  }


LGTM
Reviewed-by: Tony Krowiak 

  
  static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
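
The difference the one-line change makes can be shown with a userspace sketch. mock_copy_to_user() is a hypothetical stand-in that only mimics the return convention of copy_to_user() (the number of bytes not copied); its 'writable' parameter is an assumption used to simulate a partially accessible destination.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for copy_to_user(). Like the kernel
 * function it returns the number of bytes that could NOT be copied;
 * 'writable' simulates how much of the destination is accessible.
 */
static unsigned long mock_copy_to_user(void *to, const void *from,
                                       unsigned long n, unsigned long writable)
{
    unsigned long done = n < writable ? n : writable;

    assert(to != NULL && from != NULL);
    memcpy(to, from, done);
    return n - done;
}

/* Before the patch: a short copy leaks a positive byte count to the caller. */
static int get_info_before(void *to, const void *from,
                           unsigned long n, unsigned long writable)
{
    return (int)mock_copy_to_user(to, from, n, writable);
}

/* After the patch: any short copy is reported as -EFAULT. */
static int get_info_after(void *to, const void *from,
                          unsigned long n, unsigned long writable)
{
    return mock_copy_to_user(to, from, n, writable) ? -EFAULT : 0;
}
```

In the "before" variant an ioctl caller would see a positive return value on a failed copy, which is easy to mistake for success; the "after" variant returns either 0 or a negative errno.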




Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-25 Thread Tony Krowiak




On 2/25/21 10:35 AM, Halil Pasic wrote:

On Thu, 25 Feb 2021 10:25:24 -0500
Tony Krowiak  wrote:


On 2/25/21 8:53 AM, Tony Krowiak wrote:


On 2/25/21 6:28 AM, Halil Pasic wrote:

On Wed, 24 Feb 2021 22:28:50 -0500
Tony Krowiak  wrote:
  

static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   struct kvm *kvm;
+
+   if (matrix_mdev->kvm) {
+   kvm = matrix_mdev->kvm;
+   kvm_get_kvm(kvm);
+   matrix_mdev->kvm = NULL;

I think if there were two threads doing the unset in parallel, one
of them could bail out and carry on before the cleanup is done. But
since nothing much happens in release after that, I don't see an
immediate problem.

Another thing to consider is, that setting ->kvm to NULL arms
vfio_ap_mdev_remove()...

I'm not entirely sure what you mean by this, but my
assumption is that you are talking about the check
for matrix_mdev->kvm != NULL at the start of
that function.

Yes I was talking about the check

static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
  struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
  
  if (matrix_mdev->kvm)

  return -EBUSY;
...
  kfree(matrix_mdev);
...
}

As you see, we bail out if kvm is still set, otherwise we clean up the
matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
initiated via the sysfs, i.e. can be initiated at any time. If we were
to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
with mutex_lock(&matrix_dev->lock); that would be bad.

I agree.
  
  

The reason
matrix_mdev->kvm is set to NULL before giving up
the matrix_dev->lock is so that functions that check
for the presence of the matrix_mdev->kvm pointer,
such as assign_adapter_store() - will exit if they get
control while the masks are being cleared.

I disagree!

static ssize_t assign_adapter_store(struct device *dev,
  struct device_attribute *attr,
  const char *buf, size_t count)
{
  int ret;
  unsigned long apid;
  struct mdev_device *mdev = mdev_from_dev(dev);
  struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
  
  /* If the guest is running, disallow assignment of adapter */

  if (matrix_mdev->kvm)
  return -EBUSY;

We bail out when kvm != NULL, so having it set to NULL while the
masks are being cleared will make these not bail out.

You are correct, I am an idiot.
  

So what we have
here is a catch-22; in other words, we have the case
you pointed out above and the cases related to
assigning/unassigning adapters, domains and
control domains which should exit when a guest
is running.

See above.

Ditto.
  

I may have an idea to resolve this. Suppose we add:

struct ap_matrix_mdev {
       ...
       bool kvm_busy;
       ...
}

This flag will be set to true at the start of both the
vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm()
and set to false at the end. The assignment/unassignment
and remove callback functions can test this flag and
return -EBUSY if the flag is true. That will preclude assigning
or unassigning adapters, domains and control domains when
the KVM pointer is being set/unset. Likewise, removal of the
mediated device will also be prevented while the KVM pointer
is being set/unset.

In the case of the PQAP handler function, it can wait for the
set/unset of the KVM pointer as follows:

while (matrix_mdev->kvm_busy) {
        mutex_unlock(&matrix_dev->lock);
        msleep(100);
        mutex_lock(&matrix_dev->lock);
}

if (!matrix_mdev->kvm)
        goto out_unlock;

What say you?

I'm not sure. Since I disagree with your analysis above it is difficult
to deal with the conclusion. I'm not against decoupling the tracking of
the state of the mdev_matrix device from the value of the kvm pointer. I
think we should first get a common understanding of the problem, before
we proceed to the solution.

Regardless of my brain fog regarding the testing of the
matrix_mdev->kvm pointer, I stand by what I stated
in the paragraphs just before the code snippet.

The problem is there are 10 functions that depend upon
the value of the matrix_mdev->kvm pointer that can get
control while the pointer is being set/unset and the
matrix_dev->lock is given up to set/clear the masks:

* vfio_ap_irq_enable: called by handle_pqap() when AQIC is intercepted
* vfio_ap_irq_disable: called by handle_pqap() when AQ

Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-25 Thread Tony Krowiak




On 2/25/21 8:53 AM, Tony Krowiak wrote:



On 2/25/21 6:28 AM, Halil Pasic wrote:

On Wed, 24 Feb 2021 22:28:50 -0500
Tony Krowiak  wrote:


   static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
   {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   struct kvm *kvm;
+
+   if (matrix_mdev->kvm) {
+   kvm = matrix_mdev->kvm;
+   kvm_get_kvm(kvm);
+   matrix_mdev->kvm = NULL;

I think if there were two threads doing the unset in parallel, one
of them could bail out and carry on before the cleanup is done. But
since nothing much happens in release after that, I don't see an
immediate problem.

Another thing to consider is, that setting ->kvm to NULL arms
vfio_ap_mdev_remove()...

I'm not entirely sure what you mean by this, but my
assumption is that you are talking about the check
for matrix_mdev->kvm != NULL at the start of
that function.

Yes I was talking about the check

static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
 struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 if (matrix_mdev->kvm)

 return -EBUSY;
...
 kfree(matrix_mdev);
...
}

As you see, we bail out if kvm is still set, otherwise we clean up the
matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
initiated via the sysfs, i.e. can be initiated at any time. If we were
to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
with mutex_lock(&matrix_dev->lock); that would be bad.


I agree.




The reason
matrix_mdev->kvm is set to NULL before giving up
the matrix_dev->lock is so that functions that check
for the presence of the matrix_mdev->kvm pointer,
such as assign_adapter_store() - will exit if they get
control while the masks are being cleared.

I disagree!

static ssize_t assign_adapter_store(struct device *dev,
 struct device_attribute *attr,
 const char *buf, size_t count)
{
 int ret;
 unsigned long apid;
 struct mdev_device *mdev = mdev_from_dev(dev);
 struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 /* If the guest is running, disallow assignment of adapter */

 if (matrix_mdev->kvm)
 return -EBUSY;

We bail out when kvm != NULL, so having it set to NULL while the
masks are being cleared will make these not bail out.


You are correct, I am an idiot.


So what we have
here is a catch-22; in other words, we have the case
you pointed out above and the cases related to
assigning/unassigning adapters, domains and
control domains which should exit when a guest
is running.

See above.


Ditto.


I may have an idea to resolve this. Suppose we add:

struct ap_matrix_mdev {
      ...
      bool kvm_busy;
      ...
}

This flag will be set to true at the start of both the
vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm()
and set to false at the end. The assignment/unassignment
and remove callback functions can test this flag and
return -EBUSY if the flag is true. That will preclude assigning
or unassigning adapters, domains and control domains when
the KVM pointer is being set/unset. Likewise, removal of the
mediated device will also be prevented while the KVM pointer
is being set/unset.
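
The proposed gate for the assignment callbacks can be sketched as below. This is a single-threaded model under stated assumptions: struct mdev_model, its 64-bit apm field and model_assign_adapter() are hypothetical, the matrix_dev->lock is assumed to be held by the caller, and the -EINVAL range check belongs to the model, not to the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ap_matrix_mdev. */
struct mdev_model {
    bool kvm_busy;            /* KVM pointer is being set or unset */
    const void *kvm;          /* non-NULL while a guest uses the mdev */
    unsigned long long apm;   /* one bit per adapter, model only */
};

/*
 * Model of an assignment callback: refuse with -EBUSY while the KVM
 * pointer is in flux or a guest is running, otherwise record the adapter.
 */
static int model_assign_adapter(struct mdev_model *m, unsigned int apid)
{
    assert(m != NULL);

    if (apid >= 64)
        return -EINVAL;       /* model limit, not the driver's check */
    if (m->kvm_busy || m->kvm)
        return -EBUSY;

    m->apm |= 1ULL << apid;
    return 0;
}
```

Testing both conditions in one place means a callback that runs while the lock is temporarily released for the mask update still bails out, even though ->kvm may be NULL at that moment.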

In the case of the PQAP handler function, it can wait for the
set/unset of the KVM pointer as follows:

while (matrix_mdev->kvm_busy) {
        mutex_unlock(&matrix_dev->lock);
        msleep(100);
        mutex_lock(&matrix_dev->lock);
}

if (!matrix_mdev->kvm)
        goto out_unlock;

What say you?

I'm not sure. Since I disagree with your analysis above it is difficult
to deal with the conclusion. I'm not against decoupling the tracking of
the state of the mdev_matrix device from the value of the kvm pointer. I
think we should first get a common understanding of the problem, before
we proceed to the solution.


Regardless of my brain fog regarding the testing of the
matrix_mdev->kvm pointer, I stand by what I stated
in the paragraphs just before the code snippet.

The problem is there are 10 functions that depend upon
the value of the matrix_mdev->kvm pointer that can get
control while the pointer is being set/unset and the
matrix_dev->lock is given up to set/clear the masks:


* vfio_ap_irq_enable: called by handle_pqap() when AQIC is intercepted
* vfio_ap_irq_disable: called by handle_pqap() when AQIC is intercepted
* assign_adapter_store: sysfs
* unassign_adapter_store: sysfs
* assign_domain_store: sy

Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-24 Thread Tony Krowiak




On 2/24/21 11:10 AM, Christian Borntraeger wrote:


On 23.02.21 10:48, Halil Pasic wrote:

On Mon, 15 Feb 2021 20:15:47 -0500
Tony Krowiak  wrote:


This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.



With the one little thing I commented on below addressed:
Acked-by: Halil Pasic 

Tony, can you comment on Halil's comment or send a v3 right away?


I was locked out of email due to expiration of my w3 password.
I am working on the response now.




Re: [PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-19 Thread Tony Krowiak




On 2/19/21 8:45 AM, Cornelia Huck wrote:

On Mon, 15 Feb 2021 20:15:47 -0500
Tony Krowiak  wrote:


This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 119 +-
  1 file changed, 84 insertions(+), 35 deletions(-)

I've been looking at the patch for a bit now and tried to follow down
the various paths; and while I think it's ok, I do not really have
enough confidence about that for a R-b. But have an

Acked-by: Cornelia Huck 


Thanks for the review.







[PATCH v2 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest

2021-02-15 Thread Tony Krowiak
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM 
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver is
notified, the masks will not be updated under the matrix_dev->lock. The
lock is necessary for the setting/unsetting of the KVM pointer, however,
so that will remain in place. 

The dependency chain for the circular lockdep resolved by this patch 
is (in reverse order):

2:  vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

1:  handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

0:  kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock

Please note that if checkpatch is run against this patch series, you may
get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not 
pulled?" message. The commit 'f21916ec4826', however, is definitely
in the master branch on top of which this patch series was built, so I'm
not sure why this message is being output by checkpatch. 

Change log v1=> v2:
--
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm() function
  instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 119 +-
 1 file changed, 84 insertions(+), 35 deletions(-)

-- 
2.21.1



[PATCH v2 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-15 Thread Tony Krowiak
This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 119 +-
 1 file changed, 84 insertions(+), 35 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..8574b6ecc9c5 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1027,8 +1027,21 @@ static const struct attribute_group 
*vfio_ap_mdev_attr_groups[] = {
  * @matrix_mdev: a mediated matrix device
  * @kvm: reference to KVM instance
  *
- * Verifies no other mediated matrix device has @kvm and sets a reference to
- * it in @matrix_mdev->kvm.
+ * Sets all data for @matrix_mdev that are needed to manage AP resources
+ * for the guest whose state is represented by @kvm:
+ * 1. Verifies no other mediated device has a reference to @kvm.
+ * 2. Increments the ref count for @kvm so it doesn't disappear until the
+ *    vfio_ap driver is notified the pointer is being nullified.
+ * 3. Sets a reference to the PQAP hook (i.e., handle_pqap() function) into
+ *    @kvm to handle interception of the PQAP(AQIC) instruction.
+ * 4. Sets the masks supplying the AP configuration to the KVM guest.
+ * 5. Sets the KVM pointer into @kvm so the vfio_ap driver can access it.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling
+ * this function; however, the lock will be temporarily released to avoid a
+ * potential circular lock dependency with other asynchronous processes that
+ * lock the kvm->lock mutex which is also needed to supply the guest's AP
+ * configuration.
  *
  * Return 0 if no other mediated matrix device has a reference to @kvm;
  * otherwise, returns an -EPERM.
@@ -1043,9 +1056,17 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
return -EPERM;
}
 
-   matrix_mdev->kvm = kvm;
-   kvm_get_kvm(kvm);
-   kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   if (kvm->arch.crypto.crycbd) {
+       kvm_get_kvm(kvm);
+       kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+       mutex_unlock(&matrix_dev->lock);
+       kvm_arch_crypto_set_masks(kvm,
+                                 matrix_mdev->matrix.apm,
+                                 matrix_mdev->matrix.aqm,
+                                 matrix_mdev->matrix.adm);
+       mutex_lock(&matrix_dev->lock);
+       matrix_mdev->kvm = kvm;
+   }
 
return 0;
 }
@@ -1079,51 +1100,80 @@ static int vfio_ap_mdev_iommu_notifier(struct 
notifier_block *nb,
return NOTIFY_DONE;
 }
 
+/**
+ * vfio_ap_mdev_unset_kvm
+ *
+ * @matrix_mdev: a matrix mediated device
+ *
+ * Performs clean-up of resources no longer needed by @matrix_mdev.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling this
+ * function; however,  the lock will be temporarily released to avoid a
+ * potential circular lock dependency with other asynchronous processes that
+ * lock the kvm->lock mutex which is also needed to update the guest's AP
+ * configuration as follows:
+ * 1.  Grab a reference to the KVM pointer stored in @matrix_mdev.
+ * 2.  Set the KVM pointer in @matrix_mdev to NULL so no other asynchronous
+ *     process uses it (e.g., assign_adapter store function) after
+ *     unlocking the matrix_dev->lock mutex.
+ * 3.  Set the PQAP hook to NULL so it will not be invoked after unlocking
+ *     the matrix_dev->lock mutex.
+ * 4.  Unlock the matrix_dev->lock mutex to avoid circular lock
+ *     dependencies.
+ * 5.  Clear the masks in the guest's APCB to remove guest access to AP
+ *     resources assigned to @matrix_mdev.
+ * 6.  Lock the matrix_dev->lock mutex to prevent access to resources
+ *     assigned to @matrix_mdev while the remainder of the cleanup
+ *     operations take place.
+ * 7.  Decrement the reference counter incremented in #1.
+ * 8.  Set the reference to the KVM pointer grabbed in #1 into @matrix_mdev
+ *     (set to NULL in #2) because it will be needed 
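
The numbered steps in the comment above are cut off in this archive after step 8. As a rough illustration of steps 1 through 7 only, here is a userspace sketch with toy locks instead of kernel mutexes. Everything in it (toy_lock, toy_kvm, toy_mdev, toy_unset_kvm and friends) is hypothetical, and reading step 1 as "take an extra reference" is an assumption based on step 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only tracks whether it is held. */
struct toy_lock { bool held; };

struct toy_kvm {
    struct toy_lock lock;   /* models kvm->lock */
    int refcount;           /* models the kvm_get_kvm()/kvm_put_kvm() count */
    bool masks_set;         /* models the masks in the guest's APCB */
    void *pqap_hook;        /* models kvm->arch.crypto.pqap_hook */
};

struct toy_mdev { struct toy_kvm *kvm; };   /* models matrix_mdev->kvm */

static struct toy_lock matrix_dev_lock;     /* models matrix_dev->lock */
static bool nested_acquire_seen;            /* set if kvm->lock is taken under matrix_dev_lock */

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Models kvm_arch_crypto_clear_masks(): it needs kvm->lock. */
static void toy_clear_masks(struct toy_kvm *kvm)
{
    if (matrix_dev_lock.held)
        nested_acquire_seen = true;
    toy_acquire(&kvm->lock);
    kvm->masks_set = false;
    toy_release(&kvm->lock);
}

/* The caller must hold matrix_dev_lock, as the note in the comment requires. */
static void toy_unset_kvm(struct toy_mdev *mdev)
{
    struct toy_kvm *kvm = mdev->kvm;

    assert(matrix_dev_lock.held);
    if (!kvm)
        return;
    kvm->refcount++;                /* step 1: grab a reference */
    mdev->kvm = NULL;               /* step 2: hide the pointer from other paths */
    kvm->pqap_hook = NULL;          /* step 3: disable the PQAP hook */
    toy_release(&matrix_dev_lock);  /* step 4: drop the lock */
    toy_clear_masks(kvm);           /* step 5: takes kvm->lock without nesting */
    toy_acquire(&matrix_dev_lock);  /* step 6: retake the lock */
    kvm->refcount--;                /* step 7: drop the reference from step 1 */
}
```

The point of the sketch is the ordering: kvm->lock is never acquired while matrix_dev_lock is held, and the caller still gets the lock back on return.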

Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-11 Thread Tony Krowiak




On 2/11/21 11:47 AM, Halil Pasic wrote:

On Thu, 11 Feb 2021 09:21:26 -0500
Tony Krowiak  wrote:


Yes, it makes sense. I guess I didn't look closely at your
suggestion when I said it was exactly what I implemented
after agreeing with Connie. I had a slight difference in
my implementation:

static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
      struct kvm *kvm;

      mutex_lock(&matrix_dev->lock);

      if (matrix_mdev->kvm) {
          kvm = matrix_mdev->kvm;
          mutex_unlock(&matrix_dev->lock);

The problem with this one is that as soon as we drop
the lock here, another thread can in theory execute
the critical section below, which drops our reference
to kvm via kvm_put_kvm(kvm). Thus when we enter
kvm_arch_crypto_clear_masks(), even if we are guaranteed
to have a non-null pointer, the pointee is not guaranteed
to be around. So like Connie suggested, you better take
another reference to kvm in the first critical section.


Sure.



Regards,
Halil


          kvm_arch_crypto_clear_masks(kvm);
          mutex_lock(&matrix_dev->lock);
          kvm->arch.crypto.pqap_hook = NULL;
          vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
          matrix_mdev->kvm = NULL;
          kvm_put_kvm(kvm);
      }

      mutex_unlock(&matrix_dev->lock);
}
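
The lifetime concern raised in this message can be shown with a small single-threaded simulation. This is a sketch only: toy_kvm, toy_get, toy_put and racing_unset are invented names, the "race" is replayed sequentially at the point where the lock would be dropped, and destruction is modelled with a flag so the example never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy refcounted object; `destroyed` stands in for the memory being freed. */
struct toy_kvm { int users; bool destroyed; };

static void toy_get(struct toy_kvm *kvm)
{
    assert(!kvm->destroyed);
    kvm->users++;
}

static void toy_put(struct toy_kvm *kvm)
{
    assert(kvm->users > 0);
    if (--kvm->users == 0)
        kvm->destroyed = true;
}

/* What a second caller may do as soon as the lock is dropped. */
static void racing_unset(struct toy_kvm **slot)
{
    if (*slot) {
        toy_put(*slot);
        *slot = NULL;
    }
}

/*
 * Returns whether the pointee is still alive at the point where the
 * masks would be cleared, with or without an extra reference taken
 * in the first critical section.
 */
static bool pointee_alive_at_clear_masks(struct toy_kvm **slot, bool extra_ref)
{
    struct toy_kvm *kvm = *slot;
    bool alive;

    if (!kvm)
        return false;
    if (extra_ref)
        toy_get(kvm);           /* taken while the lock is still held */
    racing_unset(slot);         /* lock dropped: the other path runs here */
    alive = !kvm->destroyed;    /* the mask clearing would dereference kvm here */
    if (extra_ref)
        toy_put(kvm);
    return alive;
}
```

Without the extra reference the racing path drops the last user and the pointer dangles; with it, the object survives until the local put.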




Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-11 Thread Tony Krowiak




On 2/11/21 7:23 AM, Cornelia Huck wrote:

On Wed, 10 Feb 2021 15:34:24 -0500
Tony Krowiak  wrote:


On 2/10/21 5:53 AM, Cornelia Huck wrote:

On Tue,  9 Feb 2021 14:48:30 -0500
Tony Krowiak  wrote:
  

This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer 
invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
   drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
   1 file changed, 45 insertions(+), 30 deletions(-)

   static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
   {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   if (matrix_mdev->kvm) {

If you're doing setting/unsetting under matrix_dev->lock, is it
possible that matrix_mdev->kvm gets unset between here and the next
line, as you don't hold the lock?

That is highly unlikely because the only place the matrix_mdev->kvm
pointer is cleared is in this function which is called from only two
places: the notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM
notification when the KVM pointer is cleared; the vfio_ap_mdev_release()
function which is called when the mdev fd is closed (i.e., when the guest
is shut down). The fact is, with the only end-to-end implementation
currently available, the notifier callback is never invoked to clear
the KVM pointer because the vfio_ap_mdev_release callback is
invoked first and it unregisters the notifier callback.

Having said that, I suppose there is no guarantee that there will not
be different userspace clients in the future that do things in a
different order. At the very least, it wouldn't hurt to protect against
that as you suggest below.

Yes, if userspace is able to use the interfaces in a certain way, we
should always make sure that nothing bad happens if it does so, even if
known userspace applications are well-behaved.

[Can we make an 'evil userspace' test program, maybe? The hardware
dependency makes this hard to run, though.]


Of course it is possible to create such a test program, but off the
top of my head, I can't come up with an algorithm that would
result in the scenario you have laid out. I haven't dabbled in the QEMU
space in quite some time; so, there would also be a bit of a re-learning
curve. I'm not sure it would be worth the effort to take this on given
how unlikely it is this scenario can happen, but I will take it into
consideration as it is a good idea.




Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?
  

+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   mutex_lock(&matrix_dev->lock);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+   mutex_unlock(&matrix_dev->lock);
+   }
   }




Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-11 Thread Tony Krowiak




On 2/10/21 5:46 PM, Halil Pasic wrote:

On Wed, 10 Feb 2021 17:05:48 -0500
Tony Krowiak  wrote:


On 2/10/21 10:32 AM, Halil Pasic wrote:

On Wed, 10 Feb 2021 16:24:29 +0100
Halil Pasic  wrote:
  

Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?

I agree, matrix_mdev->kvm can go NULL any time and we are risking
a null pointer dereference here.

Another idea would be to do


static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
  struct kvm *kvm;

  mutex_lock(&matrix_dev->lock);

  if (matrix_mdev->kvm) {
      kvm = matrix_mdev->kvm;
      matrix_mdev->kvm = NULL;
      mutex_unlock(&matrix_dev->lock);
      kvm_arch_crypto_clear_masks(kvm);
      mutex_lock(&matrix_dev->lock);
      matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;

s/matrix_mdev->kvm/kvm

      vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
      kvm_put_kvm(kvm);
  }
  mutex_unlock(&matrix_dev->lock);
}

That way only one unset would actually do the unset and cleanup
and every other invocation would bail out with only checking
matrix_mdev->kvm.

But the problem with that is that we enable the assign/unassign
prematurely, which could interfere with reset_queues(). Forget about
it.

Not sure what you mean by this.



I mean because above I first do
(1) matrix_mdev->kvm = NULL;
and then do
(2) vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
another thread could do
static ssize_t unassign_adapter_store(struct device *dev,
   struct device_attribute *attr,
   const char *buf, size_t count)
{
 int ret;
 unsigned long apid;
 struct mdev_device *mdev = mdev_from_dev(dev);
 struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 /* If the guest is running, disallow un-assignment of adapter */

 if (matrix_mdev->kvm)
 return -EBUSY;
...
}
between (1) and (2), and we would not bail out with -EBUSY, because
matrix_mdev->kvm is already NULL as a result of (1). That means we would
change matrix_mdev->matrix and we would not reset the queues that
correspond to the apid that was just removed, because by the time we do
the reset_queues, the queues are not in the matrix_mdev->matrix any more.

Does that make sense?


Yes, it makes sense. I guess I didn't look closely at your
suggestion when I said it was exactly what I implemented
after agreeing with Connie. I had a slight difference in
my implementation:

static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
    struct kvm *kvm;

    mutex_lock(&matrix_dev->lock);

    if (matrix_mdev->kvm) {
        kvm = matrix_mdev->kvm;
        mutex_unlock(&matrix_dev->lock);
        kvm_arch_crypto_clear_masks(kvm);
        mutex_lock(&matrix_dev->lock);
        kvm->arch.crypto.pqap_hook = NULL;
        vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
        matrix_mdev->kvm = NULL;
        kvm_put_kvm(kvm);
    }

    mutex_unlock(&matrix_dev->lock);
}

In your scenario, the unassignment would fail with -EBUSY because
the matrix_mdev->kvm pointer would not have yet been
cleared. The other problem with your implementation is that
IRQ resources would not get cleared after the reset because
the matrix_mdev->kvm pointer would be NULL at that time.
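
To make the ordering argument in this exchange concrete, the sketch below replays the unassign race sequentially for both variants. It is a toy model under stated assumptions: toy_mdev, toy_unassign, toy_reset_queues and toy_unset are invented names, adapters are tracked as bits in an unsigned int, and the racing unassign is simply called at the point where the lock would be dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mdev {
    void *kvm;               /* non-NULL while a guest is using the mdev */
    unsigned int assigned;   /* bit n set: adapter n is assigned */
    unsigned int reset_mask; /* bit n set: adapter n's queues were reset */
};

/* Models unassign_adapter_store(): refused while a guest is running. */
static int toy_unassign(struct toy_mdev *m, int apid)
{
    assert(apid >= 0 && apid < 32);
    if (m->kvm)
        return -EBUSY;
    m->assigned &= ~(1u << apid);
    return 0;
}

/* Models vfio_ap_mdev_reset_queues(): resets whatever is still assigned. */
static void toy_reset_queues(struct toy_mdev *m)
{
    m->reset_mask |= m->assigned;
}

/*
 * clear_early selects the variant that sets kvm to NULL before the
 * lock is dropped. Returns the result seen by the racing unassign.
 */
static int toy_unset(struct toy_mdev *m, bool clear_early, int racing_apid)
{
    int rc;

    if (clear_early)
        m->kvm = NULL;
    rc = toy_unassign(m, racing_apid);  /* runs while the lock is dropped */
    toy_reset_queues(m);
    m->kvm = NULL;
    return rc;
}
```

In the early-clear variant the racing unassign succeeds and the removed adapter is never reset; in the other variant it is refused with -EBUSY and every assigned adapter is reset.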


Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Tony Krowiak




On 2/10/21 10:24 AM, Halil Pasic wrote:

On Wed, 10 Feb 2021 11:53:34 +0100
Cornelia Huck  wrote:


On Tue,  9 Feb 2021 14:48:30 -0500
Tony Krowiak  wrote:


This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer 
invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
  1 file changed, 45 insertions(+), 30 deletions(-)
   
  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)

  {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   if (matrix_mdev->kvm) {

If you're doing setting/unsetting under matrix_dev->lock, is it
possible that matrix_mdev->kvm gets unset between here and the next
line, as you don't hold the lock?

Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?

I agree, matrix_mdev->kvm can go NULL any time and we are risking
a null pointer dereference here.

Another idea would be to do


static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
  struct kvm *kvm;

  mutex_lock(&matrix_dev->lock);

  if (matrix_mdev->kvm) {
      kvm = matrix_mdev->kvm;
      matrix_mdev->kvm = NULL;
      mutex_unlock(&matrix_dev->lock);
      kvm_arch_crypto_clear_masks(kvm);
      mutex_lock(&matrix_dev->lock);
      matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
      vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
      kvm_put_kvm(kvm);
  }
  mutex_unlock(&matrix_dev->lock);
}

That way only one unset would actually do the unset and cleanup
and every other invocation would bail out with only checking
matrix_mdev->kvm.


How ironic, that is exactly what I did after agreeing with Connie.



  

+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   mutex_lock(&matrix_dev->lock);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+   mutex_unlock(&matrix_dev->lock);
+   }
  }




Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Tony Krowiak




On 2/10/21 10:32 AM, Halil Pasic wrote:

On Wed, 10 Feb 2021 16:24:29 +0100
Halil Pasic  wrote:


Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?

I agree, matrix_mdev->kvm can go NULL any time and we are risking
a null pointer dereference here.

Another idea would be to do


static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
  struct kvm *kvm;

  mutex_lock(&matrix_dev->lock);

  if (matrix_mdev->kvm) {
      kvm = matrix_mdev->kvm;
      matrix_mdev->kvm = NULL;
      mutex_unlock(&matrix_dev->lock);
      kvm_arch_crypto_clear_masks(kvm);
      mutex_lock(&matrix_dev->lock);
      matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;

s/matrix_mdev->kvm/kvm

      vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
      kvm_put_kvm(kvm);
  }
  mutex_unlock(&matrix_dev->lock);
}

That way only one unset would actually do the unset and cleanup
and every other invocation would bail out with only checking
matrix_mdev->kvm.

But the problem with that is that we enable the assign/unassign
prematurely, which could interfere with reset_queues(). Forget about
it.


Not sure what you mean by this.




Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-10 Thread Tony Krowiak




On 2/10/21 5:53 AM, Cornelia Huck wrote:

On Tue,  9 Feb 2021 14:48:30 -0500
Tony Krowiak  wrote:


This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer 
invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
  1 file changed, 45 insertions(+), 30 deletions(-)

  static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
  {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   if (matrix_mdev->kvm) {

If you're doing setting/unsetting under matrix_dev->lock, is it
possible that matrix_mdev->kvm gets unset between here and the next
line, as you don't hold the lock?


That is highly unlikely because the only place the matrix_mdev->kvm
pointer is cleared is in this function which is called from only two
places: the notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM
notification when the KVM pointer is cleared; the vfio_ap_mdev_release()
function which is called when the mdev fd is closed (i.e., when the guest
is shut down). The fact is, with the only end-to-end implementation
currently available, the notifier callback is never invoked to clear
the KVM pointer because the vfio_ap_mdev_release callback is
invoked first and it unregisters the notifier callback.

Having said that, I suppose there is no guarantee that there will not
be different userspace clients in the future that do things in a
different order. At the very least, it wouldn't hurt to protect against
that as you suggest below.



Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?


+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   mutex_lock(&matrix_dev->lock);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+   mutex_unlock(&matrix_dev->lock);
+   }
  }




[PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

2021-02-09 Thread Tony Krowiak
This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..f4e19aa2acb9 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -322,6 +322,20 @@ static void vfio_ap_matrix_init(struct ap_config_info 
*info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+   return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   if (vfio_ap_mdev_has_crycb(matrix_mdev))
+   kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev;
@@ -1028,7 +1042,9 @@ static const struct attribute_group 
*vfio_ap_mdev_attr_groups[] = {
  * @kvm: reference to KVM instance
  *
  * Verifies no other mediated matrix device has @kvm and sets a reference to
- * it in @matrix_mdev->kvm.
+ * it in @matrix_mdev->kvm. The matrix_dev->lock must not be taken prior to
+ * calling this function; doing so may result in a circular lock dependency
+ * when the kvm->lock is taken to set masks in the guest's APCB.
  *
  * Return 0 if no other mediated matrix device has a reference to @kvm;
  * otherwise, returns an -EPERM.
@@ -1038,6 +1054,8 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
 {
struct ap_matrix_mdev *m;
 
+   mutex_lock(&matrix_dev->lock);
+
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
if ((m != matrix_mdev) && (m->kvm == kvm))
return -EPERM;
@@ -1046,6 +1064,8 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+   mutex_unlock(&matrix_dev->lock);
+   vfio_ap_mdev_commit_apcb(matrix_mdev);
 
return 0;
 }
@@ -1079,13 +1099,27 @@ static int vfio_ap_mdev_iommu_notifier(struct 
notifier_block *nb,
return NOTIFY_DONE;
 }
 
+/**
+ * vfio_ap_mdev_unset_kvm
+ *
+ * @matrix_mdev: a matrix mediated device
+ *
+ * Clears the masks in the guest's APCB as well as the reference to KVM from
+ * @matrix_mdev. The matrix_dev->lock must not be taken prior to calling this
+ * function; doing so may result in a circular lock dependency when the
+ * kvm->lock is taken to clear the masks in the guest's APCB.
+ */
 static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
+   if (matrix_mdev->kvm) {
+       kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+       mutex_lock(&matrix_dev->lock);
+       matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+       vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+       kvm_put_kvm(matrix_mdev->kvm);
+       matrix_mdev->kvm = NULL;
+       mutex_unlock(&matrix_dev->lock);
+   }
 }
 
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
@@ -1098,32 +1132,15 @@ static int vfio_ap_mdev_group_notifier(struct 
notifier_block *nb,
return NOTIFY_OK;
 
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
-   mutex_lock(&matrix_dev->lock);
-
-   if (!data) {
-   if (matrix_mdev->kvm)
-   vfio_ap_mdev_unset_kvm(matrix_mdev

[PATCH 0/1] fix circular lockdep when starting SE guest

2021-02-09 Thread Tony Krowiak
Patch f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
locking dependency when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM 
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver is
notified, the masks will not be updated under the matrix_dev->lock. The
lock is necessary for the setting/unsetting of the KVM pointer, however,
so that will remain in place.

The dependency chain for the circular lockdep resolved by this patch
is:

#2: vfio_ap_mdev_group_notifier:    kvm->lock
                                    matrix_dev->lock

#1: handle_pqap:                    matrix_dev->lock
    kvm_vcpu_ioctl:                 vcpu->mutex

#0: kvm_s390_cpus_to_pv:            vcpu->mutex
    kvm_vm_ioctl:                   kvm->lock

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c | 75 ++-
 1 file changed, 45 insertions(+), 30 deletions(-)

-- 
2.21.1



Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-02-03 Thread Tony Krowiak




On 1/12/21 12:55 PM, Halil Pasic wrote:

On Tue, 12 Jan 2021 02:12:51 +0100
Halil Pasic  wrote:


@@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
  
-	if (q->matrix_mdev)

+   if (q->matrix_mdev) {
+   matrix_mdev = q->matrix_mdev;
vfio_ap_mdev_unlink_queue(q);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
+   }
  
  	kfree(q);

mutex_unlock(&matrix_dev->lock);

Shouldn't we first remove the queue from the APCB and then
reset? Sorry, I missed this one yesterday.


I agreed to move the reset, however if the remove callback is
invoked due to a manual unbind of the queue and the queue is
in use by a guest, the cleanup of the IRQ resources after the
reset of the queue will not happen because the link from the
queue to the matrix mdev was removed. Consequently, I'm going
to have to change the patch 05/15 to split the vfio_ap_mdev_unlink_queue()
function into two functions: one to remove the link from the matrix mdev to
the queue; and, one to remove the link from the queue to the matrix
mdev. Only the first will be used for the remove callback which should
be fine since the queue object is freed at the end of the remove
function anyway.
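
A rough picture of the split described above, as a userspace sketch. The structures and helper names here (toy_queue, toy_mdev, toy_unlink_queue_from_mdev, toy_unlink_mdev_from_queue) are hypothetical and much simpler than the driver's; the only point is that the two directions of the link can be removed independently.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_QUEUES 4

struct toy_mdev;

struct toy_queue {
    int apqn;
    struct toy_mdev *matrix_mdev;   /* queue -> mdev link */
};

struct toy_mdev {
    struct toy_queue *queues[TOY_MAX_QUEUES];   /* mdev -> queue links */
};

/* Establish both directions of the link. */
static void toy_link_queue(struct toy_mdev *mdev, struct toy_queue *q, size_t slot)
{
    assert(slot < TOY_MAX_QUEUES);
    mdev->queues[slot] = q;
    q->matrix_mdev = mdev;
}

/* Remove only the mdev -> queue direction; q->matrix_mdev stays usable. */
static void toy_unlink_queue_from_mdev(struct toy_queue *q)
{
    struct toy_mdev *mdev = q->matrix_mdev;

    if (!mdev)
        return;
    for (size_t i = 0; i < TOY_MAX_QUEUES; i++)
        if (mdev->queues[i] == q)
            mdev->queues[i] = NULL;
}

/* Remove only the queue -> mdev direction. */
static void toy_unlink_mdev_from_queue(struct toy_queue *q)
{
    q->matrix_mdev = NULL;
}
```

In the remove path only the first helper would be called, so code that runs after the reset can still find the matrix mdev through the queue before the queue object is freed.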



Regards,
Halil




Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2021-02-01 Thread Tony Krowiak




On 1/12/21 12:55 PM, Halil Pasic wrote:

On Tue, 12 Jan 2021 02:12:51 +0100
Halil Pasic  wrote:


@@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
  
-	if (q->matrix_mdev)

+   if (q->matrix_mdev) {
+   matrix_mdev = q->matrix_mdev;
vfio_ap_mdev_unlink_queue(q);
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
+   }
  
  	kfree(q);

mutex_unlock(&matrix_dev->lock);

Shouldn't we first remove the queue from the APCB and then
reset? Sorry, I missed this one yesterday.


Yes, that's probably the order in which it should be done.
I'll change it.



Regards,
Halil




Re: [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix

2021-01-28 Thread Tony Krowiak




On 1/11/21 5:58 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:59 -0500
Tony Krowiak  wrote:


The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix

Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 

But because vfio_ap_mdev_commit_shadow_apcb() is not used (see prev
patch) the attribute won't show the guest matrix at this point. :(


I'll move this patch following all of the filtering and hot plug
patches.




---
  drivers/s390/crypto/vfio_ap_ops.c | 51 ++-
  1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 44b3a81cadfb..1b1d5975ee0e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
  }
  static DEVICE_ATTR_RO(control_domains);
  
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,

-  char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
  {
-   struct mdev_device *mdev = mdev_from_dev(dev);
-   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
-   unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-   unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+   unsigned long napm_bits = matrix->apm_max + 1;
+   unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;
  
-	apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);

-   apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-   mutex_lock(&matrix_dev->lock);
+   apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+   apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
  
  	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {

-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm,
 naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct 
device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}
  
+	return nchars;

+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);
  
  	return nchars;

  }
  static DEVICE_ATTR_RO(matrix);
  
+static ssize_t guest_matrix_show(struct device *dev,

+struct device_attribute *attr, char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+   mutex_unlock(&matrix_dev->lock);
+
+   return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
  static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -953,6 +975,7 @@ static struct attribute *vfio_ap_mde

Re: [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB

2021-01-14 Thread Tony Krowiak




On 1/11/21 5:50 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:58 -0500
Tony Krowiak  wrote:


The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
  drivers/s390/crypto/vfio_ap_ops.c | 15 +++
  drivers/s390/crypto/vfio_ap_private.h |  2 ++
  2 files changed, 17 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 2d58b39977be..44b3a81cadfb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info 
*info,
matrix->adm_max = info->apxa ? info->Nd : 15;
  }
  
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)

+{
+   return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   if (vfio_ap_mdev_has_crycb(matrix_mdev))
+   kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
  {
struct ap_matrix_mdev *matrix_mdev;
@@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct 
mdev_device *mdev)
  
  	matrix_mdev->mdev = mdev;

vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
diff --git a/drivers/s390/crypto/vfio_ap_private.h 
b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..d2d26ba18602 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
   * @list: allows the ap_matrix_mdev struct to be added to a list
   * @matrix:   the adapters, usage domains and control domains assigned to the
   *mediated matrix device.
+ * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
   * @group_notifier: notifier block used for specifying callback function for
   *handling the VFIO_GROUP_NOTIFY_SET_KVM event
   * @kvm:  the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
  struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+   struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;

What happened to the following hunk from v12?


That's a very good question, I'll reinstate it.



@@ -1218,13 +1233,9 @@ static int vfio_ap_mdev_group_notifier(struct 
notifier_block *nb,
if (ret)
return NOTIFY_DONE;
  
-	/* If there is no CRYCB pointer, then we can't copy the masks */

-   if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
-
-   kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
+   memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+  sizeof(matrix_mdev->shadow_apcb));
+   vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
  
  	return NOTIFY_OK;

  }




Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-01-14 Thread Tony Krowiak




On 1/13/21 9:50 PM, Halil Pasic wrote:

On Wed, 13 Jan 2021 16:41:27 -0500
Tony Krowiak  wrote:


On 1/11/21 2:17 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:56 -0500
Tony Krowiak  wrote:
  

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

 * When the queue device is probed, if its APQN is assigned to a matrix
   mdev, the structures representing the queue device and the matrix mdev
   will be linked.

 * When an adapter or domain is assigned to a matrix mdev, for each new
   APQN assigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be linked.

The links will be removed as follows:

 * When the queue device is removed, if its APQN is assigned to a matrix
   mdev, the structures representing the queue device and the matrix mdev
   will be unlinked.

 * When an adapter or domain is unassigned from a matrix mdev, for each
   APQN unassigned that references a queue device bound to the vfio_ap
   device driver, the structures representing the queue device and the
   matrix mdev will be unlinked.

Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 
  

[..]


+
   int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
   {
struct vfio_ap_queue *q;
@@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
q = kzalloc(sizeof(*q), GFP_KERNEL);
if (!q)
return -ENOMEM;
+   mutex_lock(&matrix_dev->lock);
dev_set_drvdata(&apdev->device, q);
q->apqn = to_ap_queue(&apdev->device)->qid;
q->saved_isc = VFIO_AP_ISC_INVALID;
+   vfio_ap_queue_link_mdev(q);
+   mutex_unlock(&matrix_dev->lock);
+

Does the critical section have to include more than just
vfio_ap_queue_link_mdev()? Did we need the critical section
before this patch?

We did not need the critical section before this patch because
the only function that retrieved the vfio_ap_queue via the queue
device's drvdata was the remove callback. I included the initialization
of the vfio_ap_queue object under lock because the
vfio_ap_find_queue() function retrieves the vfio_ap_queue object from
the queue device's drvdata so it might be advantageous to initialize
it under the mdev lock. On the other hand, I can't come up with a good
argument to change this.



I was asking out of curiosity, not because I want it changed. I was
also wondering if somebody could see a partially initialized device:
we even call dev_set_drvdata() first and only then finish the
initialization. Before 's390/vfio-ap: use new AP bus interface to search
for queue devices', which is the previous patch, we had the klist code
in between, which uses spinlocks, which I think ensures that all
effects of probe are seen when we get the queue from
vfio_ap_find_queue(). But with patch 4 in place that is no longer the
case. Or am I wrong?


You are correct insofar as patch 4 replaces the driver_find_device()
function call with a call to AP bus's ap_get_qdev() function which
does not use spinlocks. Without digging deeply into the probe call
chain I do not know whether or not the use of spinlocks by the klist
code ensures all effects of the probe are seen when we get the
queue from vfio_ap_find_queue(). What I'm sure about is that since
both vfio_ap_find_queue() and the setting of the drvdata in the
probe function are always done under the mdev lock, consistency
should be maintained. What I did decide when thinking about your
previous review comment is that we should probably initialize the
vfio_ap_queue object before setting the drvdata, so I made that change.
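
The ordering described above (finish initializing the object, then publish
it, with both the publish and every lookup done under the same lock) can be
sketched in userspace C. This is only an illustration, not the driver code:
the demo_* names, the pthread mutex standing in for the mdev lock, and the
drvdata field are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; not the kernel types. */
struct demo_queue {
	int apqn;
	int saved_isc;
};

struct demo_device {
	void *drvdata;		/* plays the role of the device's drvdata */
};

#define DEMO_ISC_INVALID 0xff	/* illustrative value only */

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Probe: fully initialize the object, then publish it under the lock. */
static int demo_probe(struct demo_device *dev, int apqn)
{
	struct demo_queue *q = calloc(1, sizeof(*q));

	assert(dev);
	if (!q)
		return -1;

	q->apqn = apqn;
	q->saved_isc = DEMO_ISC_INVALID;

	pthread_mutex_lock(&demo_lock);
	dev->drvdata = q;	/* published only after initialization */
	pthread_mutex_unlock(&demo_lock);
	return 0;
}

/* Lookup: takes the same lock, so it sees NULL or a complete object. */
static int demo_find_apqn(struct demo_device *dev)
{
	struct demo_queue *q;
	int apqn = -1;

	pthread_mutex_lock(&demo_lock);
	q = dev->drvdata;
	if (q)
		apqn = q->apqn;
	pthread_mutex_unlock(&demo_lock);
	return apqn;
}
```

With this shape, a reader that takes the lock can never observe a queue
whose fields are still being filled in, regardless of what other
synchronization the lookup path happens to use.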



Regards,
Halil




Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2021-01-14 Thread Tony Krowiak




On 1/11/21 3:40 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:57 -0500
Tony Krowiak  wrote:


The current implementation does not allow assignment of an AP adapter or
domain to an mdev device if any APQN resulting from the assignment
does not reference an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
1. The AP architecture does not preclude assignment of APQNs to an AP
   configuration that are not available to the system.
2. APQNs that do not reference a queue device bound to the vfio_ap
   device driver will not be assigned to the guest's CRYCB, so the
   guest will not get access to queues not bound to the vfio_ap driver.
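
One way to picture the second point is the sketch below. It is not the
filtering code from this series; it assumes one possible policy (drop an
adapter from the guest's view if any of its assigned APQNs is not bound),
uses small bool arrays instead of the real bitmaps, and every demo_* name
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ID 16	/* illustrative; the real limits are larger */

/* Hypothetical model of the adapters and domains assigned to an mdev. */
struct demo_matrix {
	bool apm[DEMO_MAX_ID];	/* adapters (APIDs) */
	bool aqm[DEMO_MAX_ID];	/* usage domains (APQIs) */
};

/* Hypothetical predicate: is a queue device for this APQN bound? */
typedef bool (*demo_bound_fn)(unsigned int apid, unsigned int apqi);

/* Example predicate for the sketch: nothing on adapter 2 is bound. */
static bool demo_all_but_adapter_2(unsigned int apid, unsigned int apqi)
{
	(void)apqi;
	return apid != 2;
}

/*
 * Build the guest's view from the mdev's matrix. An adapter is kept only
 * if every APQN it forms with the assigned domains is bound. Returns the
 * number of APQNs the guest ends up with.
 */
static size_t demo_filter(const struct demo_matrix *mdev,
			  struct demo_matrix *guest, demo_bound_fn bound)
{
	size_t napqns = 0;
	unsigned int apid, apqi;

	assert(mdev && guest && bound);
	*guest = (struct demo_matrix){ 0 };

	for (apid = 0; apid < DEMO_MAX_ID; apid++) {
		bool usable = mdev->apm[apid];

		for (apqi = 0; usable && apqi < DEMO_MAX_ID; apqi++)
			if (mdev->aqm[apqi] && !bound(apid, apqi))
				usable = false;
		guest->apm[apid] = usable;
	}

	for (apqi = 0; apqi < DEMO_MAX_ID; apqi++)
		guest->aqm[apqi] = mdev->aqm[apqi];

	/* Count the Cartesian product of what survived the filter. */
	for (apid = 0; apid < DEMO_MAX_ID; apid++)
		for (apqi = 0; apqi < DEMO_MAX_ID; apqi++)
			if (guest->apm[apid] && guest->aqm[apqi])
				napqns++;
	return napqns;
}
```

The mdev's own matrix is left untouched, so the unavailable adapter stays
assigned and can reach the guest later once its queues are bound.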

You didn't tell us about the changed error code.


I am assuming you are talking about returning -EBUSY from
the vfio_ap_mdev_verify_no_sharing() function instead of
-EADDRINUSE. I'm going to change this back per your comments
below.



Also notice that at this point we have neither filtering nor in-use.
This used to be patch 11, and most of that stuff used to be in place. But
I'm going to trust you, if you say it's fine to enable it this early.


The patch order was changed due to your review comments in
Message ID <20201126165431.6ef1457a.pa...@linux.ibm.com>,
patch 07/17 in the v12 series. In order to ensure that only queues
bound to the vfio_ap driver are given to the guest, I'm going to
create a patch that will precede this one and introduce the
filtering code currently introduced in patch 12/17, the hot
plug patch.




Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 241 --
  1 file changed, 62 insertions(+), 179 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index cdcc6378b4a5..2d58b39977be 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -379,134 +379,37 @@ static struct attribute_group 
*vfio_ap_mdev_type_groups[] = {
NULL,
  };
  
-struct vfio_ap_queue_reserved {

-   unsigned long *apid;
-   unsigned long *apqi;
-   bool reserved;
-};
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+"already assigned to %s"
  
-/**

- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+unsigned long *apm,
+unsigned long *aqm)

[..]

-   return 0;
+   for_each_set_bit_inv(apid, apm, AP_DEVICES)
+   for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+   pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);

I would prefer dev_warn() here. We know which device is about to get
more queues, and this device can provide a clue regarding the initiator.


Will do.



Also I believe a warning is too heavy-handed here. Warnings should not
be ignored. This is a condition that can emerge during normal operation,
AFAIU. Or am I wrong?


It can happen during normal operation, but we had this discussion
in the previous review. Both Connie and I felt it should be a warning
since this message is the only way for a user to identify the queues
in use. A message of lower severity may not get logged, preventing the
user from easily determining why an adapter or domain could not
be assigned.
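
For reference, the shape of the message being discussed (one line per APQN
in the cross product, naming the mdev that already owns it) can be sketched
in userspace C as below. The format string follows MDEV_SHARING_ERR from the
hunk above with a trailing newline added; writing into a buffer instead of
calling pr_warn() or dev_warn(), the ID arrays in place of bitmaps, and the
demo_* names are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SHARING_ERR \
	"Userspace may not re-assign queue %02lx.%04lx already assigned to %s\n"

/*
 * Hypothetical helper: append one message per APQN in the cross product
 * of @apids and @apqis. Returns the number of complete messages written.
 */
static int demo_log_sharing_err(char *buf, size_t len, const char *mdev_name,
				const unsigned long *apids, size_t napids,
				const unsigned long *apqis, size_t napqis)
{
	size_t i, j, used = 0;
	int nmsgs = 0;

	assert(buf && len > 0 && mdev_name);
	buf[0] = '\0';

	for (i = 0; i < napids; i++) {
		for (j = 0; j < napqis; j++) {
			int n = snprintf(buf + used, len - used,
					 DEMO_SHARING_ERR,
					 apids[i], apqis[j], mdev_name);

			if (n < 0 || (size_t)n >= len - used) {
				buf[used] = '\0';	/* drop the partial line */
				return nmsgs;
			}
			used += (size_t)n;
			nmsgs++;
		}
	}
	return nmsgs;
}
```

Whatever severity is chosen, the point of the message is that each
conflicting queue is listed together with its current owner.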




  }
  
  /**

   * vfio_ap_mdev_verify_no_sharing
   *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * Verifies that each APQN derived from the Cartesian product of the AP adapter
+ * IDs and AP queue indexes comprising the AP matrix are not configured for
+ * another mediated device. AP queue sharing is not allo

Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-13 Thread Tony Krowiak




On 1/13/21 4:21 PM, Halil Pasic wrote:

On Wed, 13 Jan 2021 12:06:28 -0500
Tony Krowiak  wrote:


On 1/11/21 11:32 AM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:53 -0500
Tony Krowiak  wrote:
  

The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.

Signed-off-by: Tony Krowiak 
---
   drivers/s390/crypto/vfio_ap_drv.c |  1 -
   drivers/s390/crypto/vfio_ap_ops.c | 40 +--
   drivers/s390/crypto/vfio_ap_private.h |  1 -
   3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..ca18c91afec9 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
-   vfio_ap_irq_disable(q);
kfree(q);
mutex_unlock(&matrix_dev->lock);
   }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 7339043906cf..052f61391ec7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -25,6 +25,7 @@
   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
   
   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
   
   static int match_apqn(struct device *dev, const void *data)

   {
@@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
int apqn)
   {
struct vfio_ap_queue *q;
-   struct device *dev;
   
   	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))

return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;
   
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,

- &apqn, match_apqn);
-   if (!dev)
-   return NULL;
-   q = dev_get_drvdata(dev);
-   q->matrix_mdev = matrix_mdev;
-   put_device(dev);
+   q = vfio_ap_find_queue(apqn);
+   if (q)
+   q->matrix_mdev = matrix_mdev;
   
   	return q;

   }
@@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct 
notifier_block *nb,
return notify_rc;
   }
   
-static void vfio_ap_irq_disable_apqn(int apqn)

+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
   {
struct device *dev;
-   struct vfio_ap_queue *q;
+   struct vfio_ap_queue *q = NULL;
   
   	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,

 &apqn, match_apqn);
if (dev) {
q = dev_get_drvdata(dev);
-   vfio_ap_irq_disable(q);
put_device(dev);
}
+
+   return q;
   }

This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
have next to nothing to do with the patch's objective. If we were at an
earlier stage, I would ask to split it up.

The rewrite of vfio_ap_get_queue() definitely is related to this
patch's objective.

Definitively loosely related.


A matter of opinion I suppose and I respect yours.




Below, in the vfio_ap_mdev_reset_queue()
function, there is the label 'free_aqic_resources', which is where
the vfio_ap_free_aqic_resources() function is called.
That function takes a struct vfio_ap_queue as an argument,
so the object needs to be retrieved prior to calling the function.
We can't use the vfio_ap_get_queue() function for two reasons:
1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
      as a parameter and we do not have a pointer to such at the time.
2. The vfio_ap_get_queue() function is used to link the mdev to the
      vfio_ap_queue object with the specified APQN.
So, we needed a way to retrieve the vfio_ap_queue object by its
APQN only. Rather than creating a function that retrieves the
vfio_ap_queue object which duplicates the retrieval code in
vfio_ap_get_queue(), I created the vfio_ap_find_queue()
function to do just that and modified the vfio_ap_get_queue()
function to call it (i.e., code reuse).

Please tell me what prevented you from splitting out
vfio_ap_find_queue() from vfio_ap_get_queue() in a separate patch that
precedes this patch? It would have resulted in simpler diffs, because
the split out wouldn't be intermingled with other stuff, i.e. getting
rid of vfio_ap_irq_disable_apqn(). Don't you see that the two are
intermingled in this diff?


I included this here for the reasons I stated abov

Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev

2021-01-13 Thread Tony Krowiak




On 1/11/21 2:17 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:56 -0500
Tony Krowiak  wrote:


Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

* When the queue device is probed, if its APQN is assigned to a matrix
  mdev, the structures representing the queue device and the matrix mdev
  will be linked.

* When an adapter or domain is assigned to a matrix mdev, for each new
  APQN assigned that references a queue device bound to the vfio_ap
  device driver, the structures representing the queue device and the
  matrix mdev will be linked.

The links will be removed as follows:

* When the queue device is removed, if its APQN is assigned to a matrix
  mdev, the structures representing the queue device and the matrix mdev
  will be unlinked.

* When an adapter or domain is unassigned from a matrix mdev, for each
  APQN unassigned that references a queue device bound to the vfio_ap
  device driver, the structures representing the queue device and the
  matrix mdev will be unlinked.

Signed-off-by: Tony Krowiak 

Reviewed-by: Halil Pasic 


---
  drivers/s390/crypto/vfio_ap_ops.c | 140 +-
  drivers/s390/crypto/vfio_ap_private.h |   3 +
  2 files changed, 117 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 835c963ae16d..cdcc6378b4a5 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,33 +27,17 @@
  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
  static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
  
-/**

- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
- *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Returns the pointer to the associated vfio_ap_queue
- */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-   struct ap_matrix_mdev *matrix_mdev,
-   int apqn)
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
  {
-   struct vfio_ap_queue *q = NULL;
-
-   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-   return NULL;
-   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-   return NULL;
+   struct vfio_ap_queue *q;
  
-	q = vfio_ap_find_queue(apqn);

-   if (q)
-   q->matrix_mdev = matrix_mdev;
+   hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+   if (q && (q->apqn == apqn))
+   return q;
+   }
  
-	return q;

+   return NULL;
  }
  
  /**

@@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct 
vfio_ap_queue *q)
  status.response_code);
  end_free:
vfio_ap_free_aqic_resources(q);
-   q->matrix_mdev = NULL;
return status;
  }
  
@@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)

matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
  
-	q = vfio_ap_get_queue(matrix_mdev, apqn);

+   q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
  
@@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
  
  	matrix_mdev->mdev = mdev;

vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct 
ap_matrix_mdev *matrix_mdev)
return 0;
  }
  
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,

+   struct vfio_ap_queue *q)
+{
+   if (q) {
+   q->matrix_mdev = matrix_mdev;
+   hash_add(matrix_mdev->qtable,
+ &q->mdev_qnode, q->apqn);
+   }
+}
+
+static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int 
apqn)
+{
+   struct vfio_ap_queue *q;
+
+   q = vfio_ap_find_queue(apqn);
+   vfio_ap_mdev_link_queue(matrix_mdev, q);
+}
+
+static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue

Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2021-01-13 Thread Tony Krowiak




On 1/11/21 11:32 AM, Halil Pasic wrote:

On Tue, 22 Dec 2020 20:15:53 -0500
Tony Krowiak  wrote:


The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.

Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_drv.c |  1 -
  drivers/s390/crypto/vfio_ap_ops.c | 40 +--
  drivers/s390/crypto/vfio_ap_private.h |  1 -
  3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..ca18c91afec9 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
-   vfio_ap_irq_disable(q);
kfree(q);
mutex_unlock(&matrix_dev->lock);
  }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 7339043906cf..052f61391ec7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -25,6 +25,7 @@
  #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
  
  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);

+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
  
  static int match_apqn(struct device *dev, const void *data)

  {
@@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
int apqn)
  {
struct vfio_ap_queue *q;
-   struct device *dev;
  
  	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))

return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;
  
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,

- &apqn, match_apqn);
-   if (!dev)
-   return NULL;
-   q = dev_get_drvdata(dev);
-   q->matrix_mdev = matrix_mdev;
-   put_device(dev);
+   q = vfio_ap_find_queue(apqn);
+   if (q)
+   q->matrix_mdev = matrix_mdev;
  
  	return q;

  }
@@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct 
notifier_block *nb,
return notify_rc;
  }
  
-static void vfio_ap_irq_disable_apqn(int apqn)

+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
  {
struct device *dev;
-   struct vfio_ap_queue *q;
+   struct vfio_ap_queue *q = NULL;
  
  	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,

 &apqn, match_apqn);
if (dev) {
q = dev_get_drvdata(dev);
-   vfio_ap_irq_disable(q);
put_device(dev);
}
+
+   return q;
  }

This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
have next to nothing to do with the patch's objective. If we were at an
earlier stage, I would ask to split it up.


The rewrite of vfio_ap_get_queue() definitely is related to this
patch's objective. Below, in the vfio_ap_mdev_reset_queue()
function, there is the label 'free_aqic_resources', which is where
the vfio_ap_free_aqic_resources() function is called.
That function takes a struct vfio_ap_queue as an argument,
so the object needs to be retrieved prior to calling the function.
We can't use the vfio_ap_get_queue() function for two reasons:
1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
    as a parameter and we do not have a pointer to such at the time.
2. The vfio_ap_get_queue() function is used to link the mdev to the
    vfio_ap_queue object with the specified APQN.
So, we needed a way to retrieve the vfio_ap_queue object by its
APQN only. Rather than creating a function that retrieves the
vfio_ap_queue object which duplicates the retrieval code in
vfio_ap_get_queue(), I created the vfio_ap_find_queue()
function to do just that and modified the vfio_ap_get_queue()
function to call it (i.e., code reuse).
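
The split described above (a lookup that needs only the APQN, and a second
function that checks the mdev's masks, reuses the lookup and then links the
queue to the mdev) can be sketched in userspace C. This is an illustration
only: the static demo_queues table replaces the driver's device search, the
bool arrays replace the bitmaps, and the demo_* names and macros are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative split of an APQN into adapter and queue index. */
#define DEMO_APID(apqn) (((apqn) >> 8) & 0xff)
#define DEMO_APQI(apqn) ((apqn) & 0xff)

struct demo_mdev {
	bool apm[256];	/* adapters assigned to the mdev */
	bool aqm[256];	/* domains assigned to the mdev */
};

struct demo_queue {
	int apqn;
	struct demo_mdev *mdev;
};

/* Stand-in for the set of queue devices bound to the driver. */
static struct demo_queue demo_queues[] = {
	{ 0x0104, NULL },
	{ 0x0205, NULL },
};

/* Lookup by APQN only: needs no mdev pointer and has no side effects. */
static struct demo_queue *demo_find_queue(int apqn)
{
	size_t i;

	for (i = 0; i < sizeof(demo_queues) / sizeof(demo_queues[0]); i++)
		if (demo_queues[i].apqn == apqn)
			return &demo_queues[i];
	return NULL;
}

/* Lookup plus link: checks the masks, then reuses demo_find_queue(). */
static struct demo_queue *demo_get_queue(struct demo_mdev *mdev, int apqn)
{
	struct demo_queue *q;

	assert(mdev);
	if (!mdev->apm[DEMO_APID(apqn)] || !mdev->aqm[DEMO_APQI(apqn)])
		return NULL;

	q = demo_find_queue(apqn);
	if (q)
		q->mdev = mdev;
	return q;
}
```

A reset path that only knows the APQN can call the first function, while
callers that hold an mdev pointer keep using the second.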




  
  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,

 unsigned int retry)
  {
struct ap_queue_status status;
+   struct vfio_ap_queue *q;
+   int ret;
int retry2 = 2;
int apqn = AP_MKQID(apid, apqi);
  
@@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,

status = ap_tapq(apqn, NULL);
}
WARN_ON_ONCE(retry2 <= 0);
-   return 0;
+   ret = 0;
+   

Re: [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support

2021-01-06 Thread Tony Krowiak

Ping

On 12/22/20 8:15 PM, Tony Krowiak wrote:

Note: Patch 1, 's390/vfio-ap: clean up vfio_ap resources when KVM
   pointer invalidated', does not belong to this series. It has been
   posted as a separate patch to fix a known problem. It is included
   here because it will likely be a prerequisite for this series.

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains can not be added to or removed
from a running guest. In order to modify a guest's AP configuration,
the guest must be terminated; only then can AP resources be assigned
to or unassigned from the guest's matrix mdev. The new AP
configuration becomes available to the guest when it is subsequently
restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
be modified by a root user without any restrictions. A change to
either mask can result in AP queue devices being unbound from the
vfio_ap device driver and bound to a zcrypt device driver even if a
guest is using the queues, thus giving the host access to the guest's
private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
adapters and APQIs of the domains assigned to a matrix mdev must
reference an AP queue device bound to the vfio_ap device driver. The
AP architecture allows assignment of AP resources that are not
available to the system, so this artificial restriction is not
compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
host after a KVM guest is started. For example, a new domain can be
dynamically added to the configuration profile via the SE or an HMC
connected to a DPM enabled LPAR. Likewise, AP adapters can be
dynamically configured (online state) and deconfigured (standby state)
using the SE, an SCLP command or an HMC connected to a DPM enabled
LPAR. This can result in inadvertent sharing of AP queues between the
guest and host.

5. A root user can manually unbind an AP queue device representing a
queue in use by a KVM guest via the vfio_ap device driver's sysfs
unbind attribute. In this case, the guest will be using a queue that
is not bound to the driver which violates the device model.

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
/sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
ownership of an APQN from the vfio_ap device driver to a zcrypt driver
while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
domains for a KVM guest using the matrix mdev via its sysfs
assign/unassign attributes.

4. Allow assignment of an AP adapter or domain to a matrix mdev even if
it results in assignment of an APQN that does not reference an AP
queue device bound to the vfio_ap device driver, as long as the APQN
is not reserved for use by the default zcrypt drivers (also known as
over-provisioning of AP resources). Allowing over-provisioning of AP
resources better models the architecture which does not preclude
assigning AP resources that are not yet available in the system. Such
APQNs, however, will not be assigned to the guest using the matrix
mdev; only APQNs referencing AP queue devices bound to the vfio_ap
device driver will actually get assigned to the guest.

5. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
--
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
avoidable, and avoiding it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
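
As a rough illustration of preventing the situation at the source, the
sketch below rejects a mask update that would newly reserve, for the default
drivers, an APQN that is assigned to a matrix mdev. It is not the check
implemented by this series. It assumes an APQN is reserved when both its
adapter bit and its domain bit are set in the bus masks, uses small bool
arrays instead of bitmaps, picks -EBUSY arbitrarily as the error code, and
all demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_IDS 8	/* illustrative; the real masks are larger */

struct demo_masks {
	bool apm[DEMO_IDS];	/* adapter mask */
	bool aqm[DEMO_IDS];	/* domain mask */
};

/* Assumed rule: reserved for default drivers when both bits are set. */
static bool demo_reserved(const struct demo_masks *bus, int apid, int apqi)
{
	return bus->apm[apid] && bus->aqm[apqi];
}

/*
 * Return 0 if moving from @old_bus to @new_bus leaves every APQN assigned
 * to @mdev alone, or -EBUSY if the change would take one of them away.
 */
static int demo_check_mask_change(const struct demo_masks *old_bus,
				  const struct demo_masks *new_bus,
				  const struct demo_masks *mdev)
{
	int apid, apqi;

	assert(old_bus && new_bus && mdev);
	for (apid = 0; apid < DEMO_IDS; apid++) {
		for (apqi = 0; apqi < DEMO_IDS; apqi++) {
			if (!mdev->apm[apid] || !mdev->aqm[apqi])
				continue;
			if (!demo_reserved(old_bus, apid, apqi) &&
			    demo_reserved(new_bus, apid, apqi))
				return -EBUSY;
		}
	}
	return 0;
}
```

Rejecting the write up front means the guest never loses a queue as a side
effect of an unrelated sysfs operation.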

2. Rationale for hot plug/unplug

[PATCH v5] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Tony Krowiak
The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
   of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
   the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
   the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
Reviewed-by: Cornelia Huck 
---
 drivers/s390/crypto/vfio_ap_ops.c | 49 ++-
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..7339043906cf 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,19 +1037,14 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 {
struct ap_matrix_mdev *m;
 
-   mutex_lock(&matrix_dev->lock);
-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-   if ((m != matrix_mdev) && (m->kvm == kvm)) {
-   mutex_unlock(&matrix_dev->lock);
+   if ((m != matrix_mdev) && (m->kvm == kvm))
return -EPERM;
-   }
}
 
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -1083,35 +1078,52 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
 }
 
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+}
+
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
 {
-   int ret;
+   int ret, notify_rc = NOTIFY_OK;
struct ap_matrix_mdev *matrix_mdev;
 
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;
 
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
+   mutex_lock(&matrix_dev->lock);
 
if (!data) {
-   matrix_mdev->kvm = NULL;
-   return NOTIFY_OK;
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
+   goto notify_done;
}
 
ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
-   if (ret)
-   return NOTIFY_DONE;
+   if (ret) {
+   notify_rc = NOTIFY_DONE;
+   goto notify_done;
+   }
 
/* If there is no CRYCB pointer, then we can't copy the masks */
-   if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
+   if (!matrix_mdev->kvm->arch.crypto.crycbd) {
+   notify_rc = NOTIFY_DONE;
+   goto notify_done;
+   }
 
kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
  matrix_mdev->matrix.aqm,
  matrix_mdev->matrix.adm);
 
-   return NOTIFY_OK;
+notify_done:
+   mutex_unlock(&matrix_dev->lock);
+   return notify_rc;
 }
 
 static void vfio_ap_irq_disable_apqn(int apqn)
@@ -1222,13 +1234,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
mutex_lock(&matrix_dev->lock);
-   if (matrix_mdev->kvm) {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
-   }
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
mutex_unlock(&matrix_dev->lock);
 
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
-- 
2.21.1



[PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Tony Krowiak

[PATCH v13 04/15] s390/vfio-ap: use new AP bus interface to search for queue devices

2020-12-22 Thread Tony Krowiak
This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is several orders of magnitude more efficient
than looping over the list of devices attached to the AP bus.
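
As a rough illustration of why a keyed lookup beats a list walk, here is a
minimal userspace sketch of a chained hash table keyed by APQN. It is not the
AP bus implementation: struct demo_queue, struct demo_qtable, DEMO_BUCKETS and
the demo_* functions are all hypothetical names invented for this example, and
the modulo hash is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUCKETS 64 /* hypothetical table size, not taken from the AP bus */

/* Hypothetical stand-in for a queue device; only the key matters here. */
struct demo_queue {
	int apqn;
	struct demo_queue *next; /* next entry in the same bucket */
};

struct demo_qtable {
	struct demo_queue *bucket[DEMO_BUCKETS];
};

static unsigned int demo_hash(int apqn)
{
	return (unsigned int)apqn % DEMO_BUCKETS;
}

/* Insert at the head of the bucket selected by the APQN. */
static void demo_qtable_add(struct demo_qtable *t, struct demo_queue *q)
{
	unsigned int b;

	assert(t != NULL && q != NULL);
	b = demo_hash(q->apqn);
	q->next = t->bucket[b];
	t->bucket[b] = q;
}

/* Walk only the one bucket the APQN hashes to; NULL if not present. */
static struct demo_queue *demo_qtable_find(const struct demo_qtable *t, int apqn)
{
	struct demo_queue *q;

	for (q = t->bucket[demo_hash(apqn)]; q != NULL; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

A lookup touches only the entries that share a bucket with the requested APQN,
whereas a list walk visits every device until it finds a match.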

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a83d6e75361b..835c963ae16d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,13 +27,6 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-   struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-   return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
  * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
  * @matrix_mdev: the associated mediated matrix
@@ -49,7 +42,7 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
struct ap_matrix_mdev *matrix_mdev,
int apqn)
 {
-   struct vfio_ap_queue *q;
+   struct vfio_ap_queue *q = NULL;
 
if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
return NULL;
@@ -1124,15 +1117,17 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
-   struct device *dev;
+   struct ap_queue *queue;
struct vfio_ap_queue *q = NULL;
 
-   dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-   &apqn, match_apqn);
-   if (dev) {
-   q = dev_get_drvdata(dev);
-   put_device(dev);
-   }
+   queue = ap_get_qdev(apqn);
+   if (!queue)
+   return NULL;
+
+   put_device(&queue->ap_dev.device);
+
+   if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+   q = dev_get_drvdata(&queue->ap_dev.device);
 
return q;
 }
-- 
2.21.1



[PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-12-22 Thread Tony Krowiak
Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

   * When the queue device is probed, if its APQN is assigned to a matrix
 mdev, the structures representing the queue device and the matrix mdev
 will be linked.

   * When an adapter or domain is assigned to a matrix mdev, for each new
 APQN assigned that references a queue device bound to the vfio_ap
 device driver, the structures representing the queue device and the
 matrix mdev will be linked.

The links will be removed as follows:

   * When the queue device is removed, if its APQN is assigned to a matrix
 mdev, the structures representing the queue device and the matrix mdev
 will be unlinked.

   * When an adapter or domain is unassigned from a matrix mdev, for each
 APQN unassigned that references a queue device bound to the vfio_ap
 device driver, the structures representing the queue device and the
 matrix mdev will be unlinked.

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 140 +-
 drivers/s390/crypto/vfio_ap_private.h |   3 +
 2 files changed, 117 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 835c963ae16d..cdcc6378b4a5 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,33 +27,17 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
- *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Returns the pointer to the associated vfio_ap_queue
- */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-   struct ap_matrix_mdev *matrix_mdev,
-   int apqn)
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
 {
-   struct vfio_ap_queue *q = NULL;
-
-   if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-   return NULL;
-   if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-   return NULL;
+   struct vfio_ap_queue *q;
 
-   q = vfio_ap_find_queue(apqn);
-   if (q)
-   q->matrix_mdev = matrix_mdev;
+   hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+   if (q && (q->apqn == apqn))
+   return q;
+   }
 
-   return q;
+   return NULL;
 }
 
 /**
@@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
  status.response_code);
 end_free:
vfio_ap_free_aqic_resources(q);
-   q->matrix_mdev = NULL;
return status;
 }
 
@@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
   struct ap_matrix_mdev, pqap_hook);
 
-   q = vfio_ap_get_queue(matrix_mdev, apqn);
+   q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
 
@@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
return 0;
 }
 
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+   struct vfio_ap_queue *q)
+{
+   if (q) {
+   q->matrix_mdev = matrix_mdev;
+   hash_add(matrix_mdev->qtable,
+   &q->mdev_qnode, q->apqn);
+   }
+}
+
+static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
+{
+   struct vfio_ap_queue *q;
+
+   q = vfio_ap_find_queue(apqn);
+   vfio_ap_mdev_link_queue(matrix_mdev, q);
+}
+
+static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
+{
+   if (q) {
+   q->matrix_mdev = NULL;
+   hash_del(&q->mdev_qnode);
+   }
+}
+
+static void vfio_ap_mdev_u

[PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-12-22 Thread Tony Krowiak
The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
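
The output format can be illustrated with a small userspace sketch that mirrors
the three cases handled by the show function in this patch: adapters and
domains ("aa.dddd"), adapters only ("aa."), and domains only (".dddd"). This is
an assumption-laden illustration, not the driver code: the masks are plain bool
arrays rather than the kernel's inverted-bit bitmaps, and demo_any(),
demo_append() and demo_matrix_show() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if any entry of the mask is set. */
static bool demo_any(const bool *mask, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (mask[i])
			return true;
	return false;
}

/* Append one line if it fits (including the terminating NUL). */
static bool demo_append(char *buf, size_t len, size_t *pos, const char *line)
{
	size_t n = strlen(line);

	if (n >= len - *pos)
		return false;
	memcpy(buf + *pos, line, n + 1);
	*pos += n;
	return true;
}

/* Returns the number of characters written to buf (excluding the NUL). */
static size_t demo_matrix_show(const bool *apm, size_t napm,
			       const bool *aqm, size_t naqm,
			       char *buf, size_t len)
{
	bool have_ap = demo_any(apm, napm);
	bool have_aq = demo_any(aqm, naqm);
	char line[48];
	size_t pos = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t apid = 0; have_ap && apid < napm; apid++) {
		if (!apm[apid])
			continue;
		if (!have_aq) {
			/* adapters only: "<apid>." */
			snprintf(line, sizeof(line), "%02lx.\n",
				 (unsigned long)apid);
			if (!demo_append(buf, len, &pos, line))
				return pos;
			continue;
		}
		for (size_t apqi = 0; apqi < naqm; apqi++) {
			if (!aqm[apqi])
				continue;
			/* one line per APQN: "<apid>.<apqi>" */
			snprintf(line, sizeof(line), "%02lx.%04lx\n",
				 (unsigned long)apid, (unsigned long)apqi);
			if (!demo_append(buf, len, &pos, line))
				return pos;
		}
	}

	for (size_t apqi = 0; !have_ap && apqi < naqm; apqi++) {
		if (!aqm[apqi])
			continue;
		/* domains only: ".<apqi>" */
		snprintf(line, sizeof(line), ".%04lx\n", (unsigned long)apqi);
		if (!demo_append(buf, len, &pos, line))
			return pos;
	}

	return pos;
}
```

With adapters 1 and 3 and domain 4 set, the sketch emits the lines 01.0004 and
03.0004, which is the shape of output one would expect from the attribute.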

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 51 ++-
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 44b3a81cadfb..1b1d5975ee0e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(control_domains);
 
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
-  char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
 {
-   struct mdev_device *mdev = mdev_from_dev(dev);
-   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
char *bufpos = buf;
unsigned long apid;
unsigned long apqi;
unsigned long apid1;
unsigned long apqi1;
-   unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-   unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+   unsigned long napm_bits = matrix->apm_max + 1;
+   unsigned long naqm_bits = matrix->aqm_max + 1;
int nchars = 0;
int n;
 
-   apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
-   apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-   mutex_lock(&matrix_dev->lock);
+   apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+   apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
 
if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm,
 naqm_bits) {
n = sprintf(bufpos, "%02lx.%04lx\n", apid,
apqi);
@@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
}
}
} else if (apid1 < napm_bits) {
-   for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+   for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
n = sprintf(bufpos, "%02lx.\n", apid);
bufpos += n;
nchars += n;
}
} else if (apqi1 < naqm_bits) {
-   for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+   for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
n = sprintf(bufpos, ".%04lx\n", apqi);
bufpos += n;
nchars += n;
}
}
 
+   return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+  char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
mutex_unlock(&matrix_dev->lock);
 
return nchars;
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   ssize_t nchars;
+   struct mdev_device *mdev = mdev_from_dev(dev);
+   struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+   mutex_lock(&matrix_dev->lock);
+   nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+   mutex_unlock(&matrix_dev->lock);
+
+   return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_assign_adapter.attr,
&dev_attr_unassign_adapter.attr,
@@ -953,6 +975,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
&dev_attr_unassign_control_domain.attr,
&dev_attr_control_domains.attr,
&dev_attr_matrix.attr,
+   &dev_attr_guest_matrix.attr,
NULL,
 };
 
-- 
2.21.1



[PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-12-22 Thread Tony Krowiak
The current implementation allows an AP adapter or domain to be assigned to
an mdev device only if every APQN resulting from the assignment references
an AP queue device that is bound to the vfio_ap device driver. This patch
allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
   1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
   1. The AP architecture does not preclude assignment of APQNs to an AP
  configuration that are not available to the system.
   2. APQNs that do not reference a queue device bound to the vfio_ap
  device driver will not be assigned to the guest's CRYCB, so the
  guest will not get access to queues not bound to the vfio_ap driver.
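
The two assignment conditions above can be sketched as a pair of mask checks.
This is a simplified userspace illustration under stated assumptions: masks are
32-bit words with plain LSB-first numbering (the kernel uses larger bitmaps
with inverted bit order), struct demo_matrix and the demo_* functions are
hypothetical, and the errno values returned are illustrative choices rather
than a statement of what the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set in apm/aqm means adapter/domain i is present in the matrix. */
struct demo_matrix {
	uint32_t apm;
	uint32_t aqm;
};

/*
 * Two matrices have an APQN in common exactly when they share at least
 * one adapter AND at least one domain, because the APQNs of a matrix are
 * the cross product of its adapters and domains.
 */
static int demo_shares_apqn(const struct demo_matrix *a,
			    const struct demo_matrix *b)
{
	return (a->apm & b->apm) != 0 && (a->aqm & b->aqm) != 0;
}

/*
 * Accept the candidate matrix unless one of its APQNs is
 *   1. reserved by the bus for the default (zcrypt) drivers, or
 *   2. already assigned to another matrix mdev.
 * Whether a queue device is currently bound is deliberately not checked.
 */
static int demo_validate(const struct demo_matrix *candidate,
			 const struct demo_matrix *bus_reserved,
			 const struct demo_matrix *others, size_t nothers)
{
	if (demo_shares_apqn(candidate, bus_reserved))
		return -EADDRNOTAVAIL; /* illustrative errno */

	for (size_t i = 0; i < nothers; i++)
		if (demo_shares_apqn(candidate, &others[i]))
			return -EADDRINUSE; /* illustrative errno */

	return 0;
}
```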

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 241 --
 1 file changed, 62 insertions(+), 179 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cdcc6378b4a5..2d58b39977be 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -379,134 +379,37 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
NULL,
 };
 
-struct vfio_ap_queue_reserved {
-   unsigned long *apid;
-   unsigned long *apqi;
-   bool reserved;
-};
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+"already assigned to %s"
 
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+unsigned long *apm,
+unsigned long *aqm)
 {
-   struct vfio_ap_queue_reserved *qres = data;
-   struct ap_queue *ap_queue = to_ap_queue(dev);
-   ap_qid_t qid;
-   unsigned long id;
-
-   if (qres->apid && qres->apqi) {
-   qid = AP_MKQID(*qres->apid, *qres->apqi);
-   if (qid == ap_queue->qid)
-   qres->reserved = true;
-   } else if (qres->apid && !qres->apqi) {
-   id = AP_QID_CARD(ap_queue->qid);
-   if (id == *qres->apid)
-   qres->reserved = true;
-   } else if (!qres->apid && qres->apqi) {
-   id = AP_QID_QUEUE(ap_queue->qid);
-   if (id == *qres->apqi)
-   qres->reserved = true;
-   } else {
-   return -EINVAL;
-   }
+   unsigned long apid, apqi;
 
-   return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-unsigned long *apqi)
-{
-   int ret;
-   struct vfio_ap_queue_reserved qres;
-
-   qres.apid = apid;
-   qres.apqi = apqi;
-   qres.reserved = false;
-
-   ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-   &qres, vfio_ap_has_queue);
-   if (ret)
-   return ret;
-
-   if (qres.reserved)
-   return 0;
-
-   return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev 

[PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use

2020-12-22 Thread Tony Krowiak
Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
busy error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
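
The check described above can be sketched in userspace as follows. This is a
simplified model, not the AP bus code: masks are 32-bit words with plain
LSB-first numbering, struct demo_driver, demo_guest_in_use() and
demo_apmask_commit() are hypothetical names, and the single "guest" that owns
adapter 2 / domain 0 is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Callback type: nonzero means "some of these queues are in use". */
typedef int (*demo_in_use_fn)(uint32_t apm, uint32_t aqm);

struct demo_driver {
	int is_default;        /* default drivers are never queried */
	demo_in_use_fn in_use; /* may be NULL */
};

/* Made-up state: one guest is using adapter 2 with domain 0. */
static const uint32_t demo_guest_apm = 1u << 2;
static const uint32_t demo_guest_aqm = 1u << 0;

static int demo_guest_in_use(uint32_t apm, uint32_t aqm)
{
	return (apm & demo_guest_apm) != 0 && (aqm & demo_guest_aqm) != 0;
}

/*
 * Commit a new apmask. Bits that are newly set take adapters away from
 * the non-default drivers, so each such driver is asked first; if any
 * of them reports the queues as in use, the mask is left unchanged.
 */
static int demo_apmask_commit(uint32_t *cur_apm, uint32_t aqm,
			      uint32_t new_apm,
			      const struct demo_driver *drv, size_t ndrv)
{
	uint32_t newly_set = new_apm & ~*cur_apm;

	if (newly_set) {
		for (size_t i = 0; i < ndrv; i++) {
			if (drv[i].is_default || !drv[i].in_use)
				continue;
			if (drv[i].in_use(newly_set, aqm))
				return -EBUSY;
		}
	}

	*cur_apm = new_apm;
	return 0;
}
```

In this model, a write that would reserve adapter 2 is rejected while the guest
still has it assigned, and succeeds once the adapter is no longer reported as
in use.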

Signed-off-by: Tony Krowiak 
Reviewed-by: Harald Freudenberger 
---
 drivers/s390/crypto/ap_bus.c | 160 ---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 154 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 2758d05a802d..7d8add952dd6 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -1006,6 +1007,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+  unsigned long *newmap)
+{
+   unsigned long size;
+   int rc;
+
+   size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+   if (*str == '+' || *str == '-') {
+   memcpy(newmap, bitmap, size);
+   rc = modify_bitmap(str, newmap, bits);
+   } else {
+   memset(newmap, 0, size);
+   rc = hex2bitmap(str, newmap, bits);
+   }
+   return rc;
+}
+
 int ap_parse_mask_str(const char *str,
  unsigned long *bitmap, int bits,
  struct mutex *lock)
@@ -1025,14 +1043,7 @@ int ap_parse_mask_str(const char *str,
kfree(newmap);
return -ERESTARTSYS;
}
-
-   if (*str == '+' || *str == '-') {
-   memcpy(newmap, bitmap, size);
-   rc = modify_bitmap(str, newmap, bits);
-   } else {
-   memset(newmap, 0, size);
-   rc = hex2bitmap(str, newmap, bits);
-   }
+   rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
if (rc == 0)
memcpy(bitmap, newmap, size);
mutex_unlock(lock);
@@ -1224,12 +1235,76 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+   int rc = 0;
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+   unsigned long *newapm = (unsigned long *)data;
+
+   /*
+* No need to verify whether the driver is using the queues if it is the
+* default driver.
+*/
+   if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+   return 0;
+
+   /*
+* increase the driver's module refcounter to be sure it is not
+* going away when we invoke the callback function.
+*/
+   if (!try_module_get(drv->owner))
+   return 0;
+
+   if (ap_drv->in_use) {
+   rc = ap_drv->in_use(newapm, ap_perms.aqm);
+   if (rc)
+   return rc;
+   }
+
+   /* release the driver's module */
+   module_put(drv->owner);
+
+   return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+   int rc;
+   unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+   /*
+* Check if any bits in the apmask have been set which will
+* result in queues being removed from non-default drivers
+*/
+   if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+   rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+ __verify_card_reservations);
+   if (rc)
+   return rc;
+   }
+
+   memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+   return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
size_t count)
 {
int rc;
+   DECLARE_BITMAP(newapm, AP_DEVICES);
+
+   if (mutex_lock_interruptible(&ap_perms_mutex))
+   return -ERESTARTSYS;
 
-   rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex

[PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB

2020-12-22 Thread Tony Krowiak
The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_ops.c | 15 +++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2d58b39977be..44b3a81cadfb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+   return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+   if (vfio_ap_mdev_has_crycb(matrix_mdev))
+   kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+ matrix_mdev->shadow_apcb.apm,
+ matrix_mdev->shadow_apcb.aqm,
+ matrix_mdev->shadow_apcb.adm);
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev;
@@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
matrix_mdev->mdev = mdev;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+   vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable);
mdev_set_drvdata(mdev, matrix_mdev);
matrix_mdev->pqap_hook.hook = handle_pqap;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..d2d26ba18602 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:  allows the ap_matrix_mdev struct to be added to a list
  * @matrix:the adapters, usage domains and control domains assigned to the
  * mediated matrix device.
+ * @shadow_apcb:the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  * handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:   the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
struct list_head node;
struct ap_matrix matrix;
+   struct ap_matrix shadow_apcb;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
struct kvm *kvm;
-- 
2.21.1



[PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification

2020-12-22 Thread Tony Krowiak
Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to hot plug the adapters, domains and control domains into each
guest using the matrix mdev to which they are assigned. Keep in mind that
an adapter or domain can be plugged in only if:
* Each APQN derived from the newly added APID of the adapter and the APQIs
  already assigned to the guest's APCB references an AP queue device bound
  to the vfio_ap driver
* Each APQN derived from the newly added APQI of the domain and the APIDs
  already assigned to the guest's APCB references an AP queue device bound
  to the vfio_ap driver
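
The two plug-in conditions above can be sketched as follows. This is a
userspace illustration with assumed simplifications: a tiny 4x4 configuration,
plain bool arrays instead of kernel bitmaps, and a hypothetical bound[][] table
standing in for "an AP queue device with this APQN is bound to the vfio_ap
driver". None of the demo_* names come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NAP 4 /* hypothetical number of adapters */
#define DEMO_NAQ 4 /* hypothetical number of domains */

/*
 * An adapter may be plugged in only if, for every domain already in the
 * guest's configuration, the queue (apid, apqi) is bound to the driver.
 */
static bool demo_can_plug_adapter(size_t apid,
				  const bool guest_aqm[DEMO_NAQ],
				  bool bound[DEMO_NAP][DEMO_NAQ])
{
	assert(apid < DEMO_NAP);
	for (size_t apqi = 0; apqi < DEMO_NAQ; apqi++)
		if (guest_aqm[apqi] && !bound[apid][apqi])
			return false;
	return true;
}

/*
 * A domain may be plugged in only if, for every adapter already in the
 * guest's configuration, the queue (apid, apqi) is bound to the driver.
 */
static bool demo_can_plug_domain(size_t apqi,
				 const bool guest_apm[DEMO_NAP],
				 bool bound[DEMO_NAP][DEMO_NAQ])
{
	assert(apqi < DEMO_NAQ);
	for (size_t apid = 0; apid < DEMO_NAP; apid++)
		if (guest_apm[apid] && !bound[apid][apqi])
			return false;
	return true;
}
```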

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 +
 drivers/s390/crypto/vfio_ap_ops.c | 21 +
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 2029d8392416..075495fc44c0 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -149,6 +149,7 @@ static int __init vfio_ap_init(void)
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+   vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8bbbd1dc7546..b8ed01297812 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1592,3 +1592,24 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
vfio_ap_mdev_on_cfg_add();
mutex_unlock(&matrix_dev->lock);
 }
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info)
+{
+   struct ap_matrix_mdev *matrix_mdev;
+
+   mutex_lock(&matrix_dev->lock);
+   list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+   if (bitmap_intersects(matrix_mdev->matrix.apm,
+ matrix_dev->ap_add, AP_DEVICES) ||
+   bitmap_intersects(matrix_mdev->matrix.aqm,
+ matrix_dev->aq_add, AP_DOMAINS) ||
+   bitmap_intersects(matrix_mdev->matrix.adm,
+ matrix_dev->ad_add, AP_DOMAINS))
+   vfio_ap_mdev_refresh_apcb(matrix_mdev);
+   }
+
+   bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
+   bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
+   mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index b99b68968447..7f0f7c92e686 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -117,5 +117,7 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
 
 void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+ struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



[PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

2020-12-22 Thread Tony Krowiak
Let's allow adapters, domains and control domains to be hot plugged into
and hot unplugged from a KVM guest using a matrix mdev when:

* The adapter, domain or control domain is assigned to or unassigned from
  the matrix mdev

* A queue device with an APQN assigned to the matrix mdev is bound to or
  unbound from the vfio_ap device driver.

Whenever an assignment or unassignment of an adapter, domain or control
domain is performed as well as when a bind or unbind of a queue device
is executed, the AP control block (APCB) that supplies the AP configuration
to a guest is first refreshed. The APCB is refreshed by copying the AP
configuration from the mdev's matrix to the APCB, then filtering the
APCB according to the following rules:

* The APID of each adapter and the APQI of each domain that is not in the
  host's AP configuration is filtered out.

* The APID of each adapter comprising an APQN that does not reference a
  queue device bound to the vfio_ap device driver is filtered. The APQNs
  are derived from the Cartesian product of the APID of each adapter and
  APQI of each domain assigned to the mdev's matrix.

After refreshing the APCB, if the mdev is in use by a KVM guest, the APCB
is hot plugged into the guest to dynamically provide access to the
adapters, domains and control domains identified by the newly refreshed
APCB.

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_ops.c | 143 --
 1 file changed, 118 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1b1d5975ee0e..843862c88379 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -307,6 +307,88 @@ static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
  matrix_mdev->shadow_apcb.adm);
 }
 
+static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
+struct ap_matrix *shadow_apcb)
+{
+   int ret;
+   unsigned long apid, apqi, apqn;
+
+   ret = ap_qci(&matrix_dev->info);
+   if (ret)
+   return;
+
+   memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
+
+   /*
+* Copy the adapters, domains and control domains to the shadow_apcb
+* from the matrix mdev, but only those that are assigned to the host's
+* AP configuration.
+*/
+   bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
+  (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+   bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
+  (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+   bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
+  (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+
+   /* If there are no APQNs assigned, then filtering them is unnecessary */
+   if (bitmap_empty(shadow_apcb->apm, AP_DEVICES)) {
+   if (!bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
+   bitmap_clear(shadow_apcb->aqm, 0, AP_DOMAINS);
+   return;
+   } else if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS)) {
+   if (!bitmap_empty(shadow_apcb->apm, AP_DEVICES))
+   bitmap_clear(shadow_apcb->apm, 0, AP_DEVICES);
+   return;
+   }
+
+   for_each_set_bit_inv(apid, shadow_apcb->apm, AP_DEVICES) {
+   for_each_set_bit_inv(apqi, shadow_apcb->aqm, AP_DOMAINS) {
+   /*
+* If the APQN is not bound to the vfio_ap device
+* driver, then we can't assign it to the guest's
+* AP configuration. The AP architecture won't
+* allow filtering of a single APQN, so if we're
+* filtering APIDs, then filter the APID; otherwise,
+* filter the APQI.
+*/
+   apqn = AP_MKQID(apid, apqi);
+   if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+   clear_bit_inv(apid, shadow_apcb->apm);
+   break;
+   }
+   }
+   }
+}
+
+/**
+ * vfio_ap_mdev_refresh_apcb
+ *
+ * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
+ * device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP configuration
+ *   for guest)
+ * @filter_apids: boolean value indicating whether the APQNs shall be filtered
+ *   by APID (true) or by APQI (false).
+ *
+ * Returns the number of APQNs remaining after filtering is complete.
+ */
+static void vfio_ap_mdev_refresh_apcb(struct ap_ma

[PATCH v13 15/15] s390/vfio-ap: update docs to include dynamic config support

2020-12-22 Thread Tony Krowiak
Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak 
---
 Documentation/s390/vfio-ap.rst | 383 -
 1 file changed, 284 insertions(+), 99 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..031c2e5ee138 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,9 +123,9 @@ Let's now take a look at how AP instructions executed on a guest are interpreted
 by the hardware.
 
 A satellite control block called the Crypto Control Block (CRYCB) is attached to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:
 
 * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
   to the KVM guest. Each bit in the mask, from left to right (i.e. from most
@@ -192,7 +192,7 @@ The design introduces three new objects:
 
 1. AP matrix device
 2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device
 
 The VFIO AP device driver
 -------------------------
@@ -200,12 +200,13 @@ The VFIO AP (vfio_ap) device driver serves the following purposes:
 
 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
 
-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
device and creates the sysfs interfaces for assigning adapters, usage
domains, and control domains comprising the matrix for a KVM guest.
 
-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
-   SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced
+   by a KVM guest's SIE state description to grant the guest access to a matrix
+   of AP devices
 
 Reserve APQNs for exclusive use of KVM guests
 ---------------------------------------------
@@ -253,7 +254,7 @@ The process for reserving an AP queue for use by a KVM guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all vfio_ap mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +270,7 @@ The process for reserving an AP queue for use by a KVM guest is:
default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
 to be exclusively used by a guest.
@@ -279,14 +280,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::
 
+-+
| |
@@ -343,7 +344,7 @@ matrix device.
* device_api:
the mediated device type's API
* available_instances:
-   the number of mediated matrix passthrough devices
+   the number of vfio_ap mediated passthrough devices
that can be created
* device_api:
specifies the VFIO API

[PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification

2020-12-22 Thread Tony Krowiak
The motivation for config change notification is to enable the vfio_ap
device driver to handle hot plug/unplug of AP queues for a KVM guest as a
bulk operation. For example, if a new APID is dynamically assigned to the
host configuration, then a queue device will be created for each APQN that
can be formulated from the new APID and all APQIs already assigned to the
host configuration. Each of these new queue devices will get bound to their
respective driver one at a time, as they are created. In the case of the
vfio_ap driver, if the APQN of the queue device being bound to the driver
is assigned to a matrix mdev in use by a KVM guest, it will be hot plugged
into the guest if possible. Given that the AP architecture allows for 256
adapters and 256 domains, one can see the possibility of the vfio_ap
driver's probe/remove callbacks getting invoked an inordinate number of
times when the host configuration changes. Keep in mind that in order to
plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
then the guest's AP configuration must be updated followed by the VCPUs
being resumed. If this is done each time the probe or remove callback is
invoked and there are hundreds or thousands of queues to be probed or
removed, this would be incredibly inefficient and could have a large impact
on guest performance. What the config notification does is allow us to
make the changes to the guest in a single operation.

This patch implements the on_cfg_changed callback which notifies the
AP device drivers that the host AP configuration has changed (i.e.,
adapters, domains and/or control domains are added to or removed from the
host AP configuration).

Adapters added to host configuration:
* The APIDs of the adapters added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
  assigned to an mdev in use by a guest, the queue may get hot plugged
  into the guest; however, if the APID of the adapter is contained in the
  bitmap of adapters added, the queue hot plug operation will be skipped
  until the AP bus notifies the driver that its scan operation has
  completed (another patch).
* When the vfio_ap driver is notified that the AP bus scan has completed,
  the guest's APCB will be refreshed by filtering the mdev's matrix by
  APID.

Domains added to host configuration:
* The APQIs of the domains added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
  assigned to an mdev in use by a guest, the queue may get hot plugged
  into the guest; however, if the APQI of the domain is contained in the
  bitmap of domains added, the queue hot plug operation will be skipped
  until the AP bus notifies the driver that its scan operation has
  completed (another patch).

Control domains added to the host configuration:
* The domain numbers of the domains added will be stored in a bitmap
  contained within the struct representing the matrix device which is the
  parent device of all matrix mediated devices.

When the vfio_ap device driver is notified that the AP bus scan has
completed, the APCB for each matrix mdev to which the adapters, domains
and control domains added are assigned will be refreshed. If a KVM guest is
using the matrix mdev, the APCB will be hot plugged into the guest to
refresh its AP configuration.
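
The deferral described above can be sketched in plain user-space C. This is a simplified model, not the patch's implementation: sketch_matrix_dev, sketch_on_cfg_changed(), sketch_probe_queue() and sketch_on_scan_complete() are hypothetical names, only adapters are tracked, and a single counter stands in for the refresh of every affected guest's APCB.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_AP_DEVICES 256

struct sketch_matrix_dev {
	bool ap_add[SKETCH_AP_DEVICES];	/* APIDs added by the last config change */
	unsigned int refreshes;		/* guest APCB refreshes performed so far */
};

/* Record which APIDs are present in the new config but not in the old one. */
static void sketch_on_cfg_changed(struct sketch_matrix_dev *md,
				  const bool *old_apm, const bool *new_apm)
{
	unsigned int apid;

	for (apid = 0; apid < SKETCH_AP_DEVICES; apid++)
		md->ap_add[apid] = new_apm[apid] && !old_apm[apid];
}

/* Probe of one queue: skip the refresh while its adapter is pending. */
static void sketch_probe_queue(struct sketch_matrix_dev *md, unsigned int apid)
{
	assert(apid < SKETCH_AP_DEVICES);

	if (md->ap_add[apid])
		return;		/* deferred until the bus scan completes */
	md->refreshes++;
}

/* End of bus scan: one refresh covers every deferred queue. */
static void sketch_on_scan_complete(struct sketch_matrix_dev *md)
{
	bool pending = false;
	unsigned int apid;

	for (apid = 0; apid < SKETCH_AP_DEVICES; apid++)
		pending = pending || md->ap_add[apid];

	if (pending)
		md->refreshes++;
	memset(md->ap_add, 0, sizeof(md->ap_add));
}
```

With this shape, adding one adapter whose 256 queues are probed one after another costs a single refresh at scan completion instead of 256 individual ones.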

Adapters removed from configuration:
* Each queue device with the APID identifying an adapter removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the refresh of the guest's
  APCB will be skipped.

Domains removed from configuration:
* Each queue device with the APQI identifying a domain removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the refresh of the guest's
  APCB will be skipped.

If any queues with an APQN assigned to a given matrix mdev have been
unlinked or any control domains assigned to a given matrix mdev have been
removed from the host AP configuration, the APCB of the matrix mdev will
be refreshed. If a KVM guest is using the matrix mdev, the APCB will be hot
plugged into the guest to refresh its AP configuration.

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |   3 +-
 drivers/s390/crypto/vfio_ap_ops.c | 159 +++---
 drivers/s390/crypto/vfio_ap_private.h |  13 ++-
 3 files changed, 158 insertions(+), 17 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c

[PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-12-22 Thread Tony Krowiak
Let's implement the callback to indicate when an APQN is in use by the
vfio_ap device driver. The callback is invoked whenever a change to the
apmask or aqmask would result in one or more queue devices being removed
from the driver. The vfio_ap device driver will indicate a resource is in
use if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EBUSY.
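
The try-lock pattern can be sketched in user-space C. This is only an illustration of the idea: a C11 atomic_flag stands in for the kernel mutex, and sketch_trylock(), sketch_unlock(), sketch_assign_adapter() and sketch_apm are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for matrix_dev->lock; clear means unlocked. */
static atomic_flag sketch_matrix_lock = ATOMIC_FLAG_INIT;

/* Stand-in for the adapter mask of a matrix mdev. */
static bool sketch_apm[256];

/* Like mutex_trylock(): returns true if the lock was acquired. */
static bool sketch_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Never block on the lock: if another path already holds it, give up
 * with -EBUSY so that a lock-ordering cycle cannot turn into a deadlock.
 */
static int sketch_assign_adapter(unsigned int apid)
{
	assert(apid < 256);

	if (!sketch_trylock(&sketch_matrix_lock))
		return -EBUSY;

	sketch_apm[apid] = true;
	sketch_unlock(&sketch_matrix_lock);
	return 0;
}
```

The trade-off is that the caller sees -EBUSY and has to retry; in exchange, neither path ever waits while holding the other lock.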

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 +
 drivers/s390/crypto/vfio_ap_ops.c | 21 ++---
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+   vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 843862c88379..6bc2e80cc565 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -644,7 +644,8 @@ static ssize_t assign_adapter_store(struct device *dev,
memset(apm, 0, sizeof(apm));
set_bit_inv(apid, apm);
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EBUSY;
 
ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
  matrix_mdev->matrix.aqm);
@@ -777,7 +778,8 @@ static ssize_t assign_domain_store(struct device *dev,
memset(aqm, 0, sizeof(aqm));
set_bit_inv(apqi, aqm);
 
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EBUSY;
 
ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
  aqm);
@@ -896,7 +898,8 @@ static ssize_t assign_control_domain_store(struct device *dev,
 * least significant, correspond to IDs 0 up to the one less than the
 * number of control domains that can be assigned.
 */
-   mutex_lock(&matrix_dev->lock);
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EBUSY;
set_bit_inv(id, matrix_mdev->matrix.adm);
vfio_ap_mdev_hot_plug_cdom(matrix_mdev, id);
mutex_unlock(&matrix_dev->lock);
@@ -1446,3 +1449,15 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
kfree(q);
mutex_unlock(&matrix_dev->lock);
 }
+
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+   int ret;
+
+   if (!mutex_trylock(&matrix_dev->lock))
+   return -EBUSY;
+   ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+   mutex_unlock(&matrix_dev->lock);
+
+   return ret;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d2d26ba18602..15b7cd74843b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -107,4 +107,6 @@ struct vfio_ap_queue {
 int vfio_ap_mdev_probe_queue(struct ap_device *queue);
 void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



[PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks

2020-12-22 Thread Tony Krowiak
This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

 This callback is invoked at the start of the AP bus scan
 function when it determines that the host AP configuration information
 has changed since the previous scan. This is done by storing
 an old and current QCI info struct and comparing them. If there is any
 difference, the callback is invoked.

 Note that when the AP bus scan detects that AP adapters, domains or
 control domains have been removed from the host's AP configuration, it
 will remove the associated devices from the AP bus subsystem's device
 model. This callback gives the device driver a chance to respond to
 the removal of the AP devices from the host configuration prior to
 calling the device driver's remove callback. The primary purpose of
 this callback is to allow the vfio_ap driver to do a bulk unplug of
 all affected adapters, domains and control domains from affected
 guests rather than unplugging them one at a time when the remove
 callback is invoked.

  void (*on_scan_complete)(struct ap_config_info *new_config_info,
   struct ap_config_info *old_config_info);

 The on_scan_complete callback is invoked after the ap bus scan is
 complete if the host AP configuration data has changed.

 Note that when the AP bus scan detects that adapters, domains or
 control domains have been added to the host's configuration, it will
 create new devices in the AP bus subsystem's device model. The primary
 purpose of this callback is to allow the vfio_ap driver to do a bulk
 plug of all affected adapters, domains and control domains into
 affected guests rather than plugging them one at a time when the
 probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
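
The ordering of the two notifications can be sketched in user-space C. This is a hedged model rather than the AP bus code: sketch_config, sketch_driver, sketch_scan_bus() and the counting callbacks are hypothetical, the device add/remove step is reduced to a comment, and the change detection is a plain memcmp() of the old and new configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct ap_config_info: three 256-bit masks. */
struct sketch_config {
	unsigned char apm[32];
	unsigned char aqm[32];
	unsigned char adm[32];
};

/* Both callbacks are optional, as in the description above. */
struct sketch_driver {
	void (*on_config_changed)(const struct sketch_config *new_cfg,
				  const struct sketch_config *old_cfg);
	void (*on_scan_complete)(const struct sketch_config *new_cfg,
				 const struct sketch_config *old_cfg);
};

/* Example callbacks that only count how often they are invoked. */
static unsigned int sketch_changed_calls;
static unsigned int sketch_complete_calls;

static void sketch_count_changed(const struct sketch_config *new_cfg,
				 const struct sketch_config *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	sketch_changed_calls++;
}

static void sketch_count_complete(const struct sketch_config *new_cfg,
				  const struct sketch_config *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	sketch_complete_calls++;
}

/* One bus scan: notify before and after the device model is updated. */
static void sketch_scan_bus(const struct sketch_driver *const *drivers,
			    size_t ndrivers,
			    const struct sketch_config *new_cfg,
			    struct sketch_config *cur_cfg)
{
	struct sketch_config old_cfg = *cur_cfg;
	int changed = memcmp(new_cfg, &old_cfg, sizeof(old_cfg)) != 0;
	size_t i;

	*cur_cfg = *new_cfg;

	for (i = 0; changed && i < ndrivers; i++)
		if (drivers[i]->on_config_changed)
			drivers[i]->on_config_changed(cur_cfg, &old_cfg);

	/* ... devices would be added or removed here (probe/remove) ... */

	for (i = 0; changed && i < ndrivers; i++)
		if (drivers[i]->on_scan_complete)
			drivers[i]->on_scan_complete(cur_cfg, &old_cfg);
}
```

A driver that leaves either pointer NULL is simply skipped, so existing drivers need no changes.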

Signed-off-by: Harald Freudenberger 
Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/ap_bus.c | 91 +++-
 drivers/s390/crypto/ap_bus.h | 12 +
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 7d8add952dd6..788bfdaadafd 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -82,6 +82,7 @@ static atomic64_t ap_scan_bus_count;
 static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1579,6 +1580,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_config_changed)
+   ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_scan_complete)
+   ap_drv->on_scan_complete(ap_qci_info,
+ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_scan_complete);
+}
+
+
+
 /*
  * Helper function for ap_scan_bus().
  * Remove card device and associated queue devices.
@@ -1857,15 +1904,51 @@ static inline void ap_scan_adapter(int ap)
put_device(&ac->ap_dev.device);
 }
 
+/*
+ * ap_get_configuration
+ *
+ * Stores the host AP configuration information returned from the previous call
+ * to Query Configuration Information (QCI), then retrieves and stores the
+ * current AP configuration returned from QC

[PATCH v13 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c

2020-12-22 Thread Tony Krowiak
Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.

Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
---
 drivers/s390/crypto/vfio_ap_drv.c | 44 ++-
 drivers/s390/crypto/vfio_ap_ops.c | 34 +++--
 drivers/s390/crypto/vfio_ap_private.h |  6 ++--
 3 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index ca18c91afec9..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,46 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-
-   q = kzalloc(sizeof(*q), GFP_KERNEL);
-   if (!q)
-   return -ENOMEM;
-   dev_set_drvdata(&apdev->device, q);
-   q->apqn = to_ap_queue(&apdev->device)->qid;
-   q->saved_isc = VFIO_AP_ISC_INVALID;
-   return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
-   struct vfio_ap_queue *q;
-   int apid, apqi;
-
-   mutex_lock(&matrix_dev->lock);
-   q = dev_get_drvdata(&apdev->device);
-   dev_set_drvdata(&apdev->device, NULL);
-   apid = AP_QID_CARD(q->apqn);
-   apqi = AP_QID_QUEUE(q->apqn);
-   vfio_ap_mdev_reset_queue(apid, apqi, 1);
-   kfree(q);
-   mutex_unlock(&matrix_dev->lock);
-}
-
 static void vfio_ap_matrix_dev_release(struct device *dev)
 {
struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -185,8 +145,8 @@ static int __init vfio_ap_init(void)
return ret;
 
memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
-   vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
-   vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+   vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+   vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
vfio_ap_drv.ids = ap_queue_ids;
 
ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 052f61391ec7..a83d6e75361b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -140,7 +140,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
  * Returns if ap_aqic function failed with invalid, deconfigured or
  * checkstopped AP.
  */
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 {
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status;
@@ -1137,8 +1137,8 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
return q;
 }
 
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-unsigned int retry)
+static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
+   unsigned int retry)
 {
struct ap_queue_status status;
struct vfio_ap_queue *q;
@@ -1321,3 +1321,31 @@ void vfio_ap_mdev_unregister(void)
 {
mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+
+   q = kzalloc(sizeof(*q), GFP_KERNEL);
+   if (!q)
+   return -ENOMEM;
+   dev_set_drvdata(&apdev->device, q);
+   q->apqn = to_ap_queue(&apdev->device)->qid;
+   q->saved_isc = VFIO_AP_ISC_INVALID;
+   return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+   struct vfio_ap_queue *q;
+   int apid, apqi;
+
+   mutex_lock(&matrix_dev->lock);
+   q = dev_get_drvdata(&apdev->device);
+   dev_set_drvdata(&apdev->device, NULL);
+   apid = AP_QID_CARD(q->apqn);
+   apqi = AP_QID_QUEUE(q->apqn);
+   vfio_ap_mdev_reset_queue(apid, apqi, 1);
+   kfree(q);
+   mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 0db6fb3d56d5..d9003de4fbad 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -90,8 +90,6 @@ struct ap_matrix_mdev {
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-unsigne

[PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset

2020-12-22 Thread Tony Krowiak
The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.

Signed-off-by: Tony Krowiak 
---
 drivers/s390/crypto/vfio_ap_drv.c |  1 -
 drivers/s390/crypto/vfio_ap_ops.c | 40 +--
 drivers/s390/crypto/vfio_ap_private.h |  1 -
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..ca18c91afec9 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
apid = AP_QID_CARD(q->apqn);
apqi = AP_QID_QUEUE(q->apqn);
vfio_ap_mdev_reset_queue(apid, apqi, 1);
-   vfio_ap_irq_disable(q);
kfree(q);
mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7339043906cf..052f61391ec7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -25,6 +25,7 @@
 #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
 static int match_apqn(struct device *dev, const void *data)
 {
@@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
int apqn)
 {
struct vfio_ap_queue *q;
-   struct device *dev;
 
if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
return NULL;
if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
return NULL;
 
-   dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-    &apqn, match_apqn);
-   if (!dev)
-   return NULL;
-   q = dev_get_drvdata(dev);
-   q->matrix_mdev = matrix_mdev;
-   put_device(dev);
+   q = vfio_ap_find_queue(apqn);
+   if (q)
+   q->matrix_mdev = matrix_mdev;
 
return q;
 }
@@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
return notify_rc;
 }
 
-static void vfio_ap_irq_disable_apqn(int apqn)
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
struct device *dev;
-   struct vfio_ap_queue *q;
+   struct vfio_ap_queue *q = NULL;
 
dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
 &apqn, match_apqn);
if (dev) {
q = dev_get_drvdata(dev);
-   vfio_ap_irq_disable(q);
put_device(dev);
}
+
+   return q;
 }
 
 int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
 unsigned int retry)
 {
struct ap_queue_status status;
+   struct vfio_ap_queue *q;
+   int ret;
int retry2 = 2;
int apqn = AP_MKQID(apid, apqi);
 
@@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
status = ap_tapq(apqn, NULL);
}
WARN_ON_ONCE(retry2 <= 0);
-   return 0;
+   ret = 0;
+   goto free_aqic_resources;
case AP_RESPONSE_RESET_IN_PROGRESS:
case AP_RESPONSE_BUSY:
msleep(20);
break;
default:
/* things are really broken, give up */
-   return -EIO;
+   ret = -EIO;
+   goto free_aqic_resources;
}
} while (retry--);
 
return -EBUSY;
+
+free_aqic_resources:
+   /*
+* In order to free the aqic resources, the queue must be linked to
+* the matrix_mdev to which its APQN is assigned and the KVM pointer
+* must be available.
+*/
+   q = vfio_ap_find_queue(apqn);
+   if (q && q->matrix_mdev && q->matrix_mdev->kvm)
+   vfio_ap_free_aqic_resources(q);
+
+   return ret;
 }
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
@@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device 
*mdev)
 */
if (ret)
rc = ret;
-   vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
}
}
 
diff --git a/drivers/s390/crypto/vfio_ap_private.h 
b/drivers/s390/crypto/vfio_ap_private.h
index f46dde

[PATCH v13 00/15] s390/vfio-ap: dynamic configuration support

2020-12-22 Thread Tony Krowiak
:

* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2. 

Change log v4->v5:

* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-
* Removed patches preventing root user from unbinding AP queues from 
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic 
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (15):
  s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
  s390/vfio-ap: No need to disable IRQ after queue reset
  s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: sysfs attribute to display the guest's matrix
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/zcrypt: Notify driver on config changed and scan complete
callbacks
  s390/vfio-ap: handle host AP config change notification
  s390/vfio-ap: handle AP bus scan completed notification
  s390/vfio-ap: update docs to include dynamic config support

 Documentation/s390/vfio-ap.rst| 383 ---
 drivers/s390/crypto/ap_bus.c  | 251 +++-
 drivers/s390/crypto/ap_bus.h  |  16 +
 drivers/s390/crypto/vfio_ap_drv.c |  50 +-
 drivers/s390/crypto/vfio_ap_ops.c | 891 +-
 drivers/s390/crypto/vfio_ap_private.h |  29 +-
 6 files changed, 1170 insertions(+), 450 deletions(-)

-- 
2.21.1



Re: [PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Tony Krowiak




On 12/22/20 2:43 PM, Halil Pasic wrote:

On Tue, 22 Dec 2020 16:57:06 +0100
Cornelia Huck  wrote:


On Tue, 22 Dec 2020 10:37:01 -0500
Tony Krowiak  wrote:


On 12/21/20 11:05 PM, Halil Pasic wrote:

On Mon, 21 Dec 2020 13:56:25 -0500
Tony Krowiak  wrote:

   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
   {
-   int ret;
+   int ret, notify_rc = NOTIFY_DONE;
struct ap_matrix_mdev *matrix_mdev;
   
   	if (action != VFIO_GROUP_NOTIFY_SET_KVM)

return NOTIFY_OK;
   
   	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);

+   mutex_lock(&matrix_dev->lock);
   
   	if (!data) {

-   matrix_mdev->kvm = NULL;
-   return NOTIFY_OK;
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
+   notify_rc = NOTIFY_OK;
+   goto notify_done;
}
   
   	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);

if (ret)
-   return NOTIFY_DONE;
+   goto notify_done;
   
   	/* If there is no CRYCB pointer, then we can't copy the masks */

if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
+   goto notify_done;
   
   	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,

  matrix_mdev->matrix.aqm,
  matrix_mdev->matrix.adm);
   
-	return NOTIFY_OK;

Shouldn't there be an
   +notify_rc = NOTIFY_OK;
here? I mean you initialize notify_rc to NOTIFY_DONE, in the !data branch
on success you set notify_rc to NOTIFY_OK, but in the !!data branch it
just stays NOTIFY_DONE. Or am I missing something?

I don't think it matters much since NOTIFY_OK and NOTIFY_DONE have
no further effect on processing of the notification queue, but I believe
you are correct, this is a change from what we originally had. I can
restore the original return values if you'd prefer.

Even if they have the same semantics now, that might change in the
future; restoring the original behaviour looks like the right thing to
do.

I agree. Especially since we do care to preserve the behavior in
the !data branch. If there is no difference between the two, then it
would probably make sense to clean that up globally.
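
To make the point under review concrete, here is a minimal user-space
sketch of the single-exit pattern with the success path setting notify_rc
explicitly. It is not the driver code: struct demo_mdev, demo_set_kvm(),
demo_group_notifier() and the demo_lock()/demo_unlock() pair standing in
for matrix_dev->lock are assumptions made up for illustration. Only the
NOTIFY_DONE and NOTIFY_OK values are taken from include/linux/notifier.h.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct demo_mdev {
	void *kvm;	/* non-NULL while a guest is attached */
};

/* Hypothetical stand-in for matrix_dev->lock; only tracks balance. */
static int demo_lock_held;

static void demo_lock(void)
{
	assert(!demo_lock_held);
	demo_lock_held = 1;
}

static void demo_unlock(void)
{
	assert(demo_lock_held);
	demo_lock_held = 0;
}

/* Hypothetical set_kvm: fails if a guest is already attached. */
static int demo_set_kvm(struct demo_mdev *m, void *kvm)
{
	if (m->kvm)
		return -1;
	m->kvm = kvm;
	return 0;
}

static int demo_group_notifier(struct demo_mdev *m, void *data)
{
	int notify_rc = NOTIFY_DONE;

	demo_lock();

	if (!data) {
		m->kvm = NULL;		/* unset path */
		notify_rc = NOTIFY_OK;
		goto notify_done;
	}

	if (demo_set_kvm(m, data))
		goto notify_done;	/* failure keeps NOTIFY_DONE */

	notify_rc = NOTIFY_OK;		/* success path sets it explicitly */

notify_done:
	demo_unlock();
	return notify_rc;
}
```

With the explicit assignment before the label, every path that reaches
notify_done has a deliberate return value, and the lock is dropped in
exactly one place.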


Got it. I'm going to do a quick turnaround on the next version so we
can get this merged if need be. I will be taking off for Christmas vacation
and will be gone until sometime the first week in January.



Regards,
Halil




Re: [PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-22 Thread Tony Krowiak




On 12/21/20 11:05 PM, Halil Pasic wrote:

On Mon, 21 Dec 2020 13:56:25 -0500
Tony Krowiak  wrote:


The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
Reviewed-by: Cornelia Huck 

[..]


  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
  {
-   int ret;
+   int ret, notify_rc = NOTIFY_DONE;
struct ap_matrix_mdev *matrix_mdev;
  
  	if (action != VFIO_GROUP_NOTIFY_SET_KVM)

return NOTIFY_OK;
  
  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);

+   mutex_lock(&matrix_dev->lock);
  
  	if (!data) {

-   matrix_mdev->kvm = NULL;
-   return NOTIFY_OK;
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
+   notify_rc = NOTIFY_OK;
+   goto notify_done;
}
  
  	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);

if (ret)
-   return NOTIFY_DONE;
+   goto notify_done;
  
  	/* If there is no CRYCB pointer, then we can't copy the masks */

if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
+   goto notify_done;
  
  	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,

  matrix_mdev->matrix.aqm,
  matrix_mdev->matrix.adm);
  
-	return NOTIFY_OK;

Shouldn't there be an
  + notify_rc = NOTIFY_OK;
here? I mean you initialize notify_rc to NOTIFY_DONE, in the !data branch
on success you set notify_rc to NOTIFY_OK, but in the !!data branch it
just stays NOTIFY_DONE. Or am I missing something?


I don't think it matters much since NOTIFY_OK and NOTIFY_DONE have
no further effect on processing of the notification queue, but I believe
you are correct, this is a change from what we originally had. I can
restore the original return values if you'd prefer.



Otherwise LGTM!

Regards,
Halil


+notify_done:
+   mutex_unlock(&matrix_dev->lock);
+   return notify_rc;
  }


[..]




[PATCH v4] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-21 Thread Tony Krowiak
The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
   of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
   the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
   the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: sta...@vger.kernel.org
Signed-off-by: Tony Krowiak 
Reviewed-by: Halil Pasic 
Reviewed-by: Cornelia Huck 
---
 drivers/s390/crypto/vfio_ap_ops.c | 42 +--
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..44f3378540d5 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,19 +1037,14 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
 {
struct ap_matrix_mdev *m;
 
-   mutex_lock(&matrix_dev->lock);
-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-   if ((m != matrix_mdev) && (m->kvm == kvm)) {
-   mutex_unlock(&matrix_dev->lock);
+   if ((m != matrix_mdev) && (m->kvm == kvm))
return -EPERM;
-   }
}
 
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-   mutex_unlock(&matrix_dev->lock);
 
return 0;
 }
@@ -1083,35 +1078,49 @@ static int vfio_ap_mdev_iommu_notifier(struct 
notifier_block *nb,
return NOTIFY_DONE;
 }
 
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+}
+
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
 {
-   int ret;
+   int ret, notify_rc = NOTIFY_DONE;
struct ap_matrix_mdev *matrix_mdev;
 
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;
 
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
+   mutex_lock(&matrix_dev->lock);
 
if (!data) {
-   matrix_mdev->kvm = NULL;
-   return NOTIFY_OK;
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
+   notify_rc = NOTIFY_OK;
+   goto notify_done;
}
 
ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
if (ret)
-   return NOTIFY_DONE;
+   goto notify_done;
 
/* If there is no CRYCB pointer, then we can't copy the masks */
if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
+   goto notify_done;
 
kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
  matrix_mdev->matrix.aqm,
  matrix_mdev->matrix.adm);
 
-   return NOTIFY_OK;
+notify_done:
+   mutex_unlock(&matrix_dev->lock);
+   return notify_rc;
 }
 
 static void vfio_ap_irq_disable_apqn(int apqn)
@@ -1222,13 +1231,8 @@ static void vfio_ap_mdev_release(struct mdev_device 
*mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
mutex_lock(&matrix_dev->lock);
-   if (matrix_mdev->kvm) {
-   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-   vfio_ap_mdev_reset_queues(mdev);
-   kvm_put_kvm(matrix_mdev->kvm);
-   matrix_mdev->kvm = NULL;
-   }
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
mutex_unlock(&matrix_dev->lock);
 
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
-- 
2.21.1
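
For readers following the thread, the symmetry the commit message asks for
can be sketched in plain C. This is an illustration under assumptions, not
the driver: struct demo_kvm, struct demo_mdev, demo_set_kvm() and
demo_unset_kvm() are hypothetical, the reference count is a bare integer
rather than kvm_get_kvm()/kvm_put_kvm(), and the queue reset that the real
vfio_ap_mdev_unset_kvm() performs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm. */
struct demo_kvm {
	int refcount;		/* models kvm_get_kvm()/kvm_put_kvm() */
	void *pqap_hook;	/* models kvm->arch.crypto.pqap_hook */
	int masks_set;		/* models the CRYCB masks being populated */
};

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct demo_mdev {
	struct demo_kvm *kvm;
	int hook;		/* its address serves as the hook value */
};

/* Take the four actions listed in the commit message. */
static void demo_set_kvm(struct demo_mdev *m, struct demo_kvm *kvm)
{
	m->kvm = kvm;			/* 1. stash the pointer */
	kvm->refcount++;		/* 2. take a reference */
	kvm->pqap_hook = &m->hook;	/* 3. install the intercept hook */
	kvm->masks_set = 1;		/* 4. set the masks */
}

/* Undo them; the reference is dropped before the pointer is cleared. */
static void demo_unset_kvm(struct demo_mdev *m)
{
	if (!m->kvm)
		return;			/* nothing to undo */
	m->kvm->masks_set = 0;
	m->kvm->pqap_hook = NULL;
	m->kvm->refcount--;
	m->kvm = NULL;
}
```

In the patch the NULL check sits in the callers rather than in the helper;
the sketch folds it in only so that a repeated unset is visibly harmless.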



Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-16 Thread Tony Krowiak




On 12/16/20 11:05 AM, Christian Borntraeger wrote:


On 16.12.20 10:58, Christian Borntraeger wrote:

On 16.12.20 02:21, Halil Pasic wrote:

On Tue, 15 Dec 2020 19:10:20 +0100
Christian Borntraeger  wrote:



On 15.12.20 11:57, Halil Pasic wrote:

On Mon, 14 Dec 2020 11:56:17 -0500
Tony Krowiak  wrote:


The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 29 -
  1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..cd22e85588e1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
  {
struct ap_matrix_mdev *m;

-   mutex_lock(&matrix_dev->lock);
-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
if ((m != matrix_mdev) && (m->kvm == kvm)) {
mutex_unlock(&matrix_dev->lock);
@@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-   mutex_unlock(&matrix_dev->lock);

return 0;
  }
@@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct 
notifier_block *nb,
return NOTIFY_DONE;
  }

+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;


This patch LGTM. The only concern I have with it is whether a
different cpu is guaranteed to observe the above assignment as
an atomic operation. I think we didn't finish this discussion
at v1, or did we?

You mean just this assignment:

+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;

should either have the old or the new value, but not half zero, half old?


Yes, that is the assignment I was referring to. The old value will work as
well because kvm holds a reference to this module while in the pqap_hook.
  

Normally this should be ok (and I would consider it a compiler bug if
this were split into two 32-bit zero stores). But if you really want to be
sure, then we can use WRITE_ONCE.

Just out of curiosity: what would make this a bug? Is it the s390 ELF ABI,
or some gcc feature, or even the C standard? Also, how exactly would
WRITE_ONCE, that is, access via volatile, help in this particular situation?

I think it's a tricky thing and not strictly guaranteed, but there is a lot
of code that relies on the atomicity of word-sized accesses. See for example
the discussion here:
https://lore.kernel.org/lkml/CAHk-=wgc4+kv9ailokw7cpp429rkcu+vja8cwafyojc3mtq...@mail.gmail.com/

WRITE_ONCE will not change the guarantees a lot, but it mostly serves as
documentation that we assume atomic access here.

After looking again at the code, I think I have to correct myself.
WRITE_ONCE does not look necessary.


Another thing, though:
Shouldn't we also replace this code

[...]
static void vfio_ap_mdev_release(struct mdev_device *mdev)
{
 struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

 mutex_lock(&matrix_dev->lock);
 if (matrix_mdev->kvm) {
--->  kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
--->  matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
--->  vfio_ap_mdev_reset_queues(mdev);
--->  kvm_put_kvm(matrix_mdev->kvm);
--->  matrix_mdev->kvm = NULL;
[...]

with vfio_ap_mdev_unset_kvm ?


I had that in the v2 patches, but mistakenly removed it
because of a misinterpretation of the docs on posting a
patch for a stable release. I'll restore it since I have to
remove the unlock from the vfio_ap_mdev_unset_kvm
function.




Re: [PATCH v3] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated

2020-12-16 Thread Tony Krowiak




On 12/14/20 11:56 AM, Tony Krowiak wrote:

The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/vfio_ap_ops.c | 29 -
  1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..cd22e85588e1 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,8 +1037,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
  {
struct ap_matrix_mdev *m;
  
-	mutex_lock(&matrix_dev->lock);

-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
if ((m != matrix_mdev) && (m->kvm == kvm)) {
mutex_unlock(&matrix_dev->lock);


This unlock needs to be removed.


@@ -1049,7 +1047,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev 
*matrix_mdev,
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-   mutex_unlock(&matrix_dev->lock);
  
  	return 0;

  }
@@ -1083,35 +1080,49 @@ static int vfio_ap_mdev_iommu_notifier(struct 
notifier_block *nb,
return NOTIFY_DONE;
  }
  
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)

+{
+   kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+   matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+   vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+   kvm_put_kvm(matrix_mdev->kvm);
+   matrix_mdev->kvm = NULL;
+}
+
  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
   unsigned long action, void *data)
  {
-   int ret;
+   int ret, notify_rc = NOTIFY_DONE;
struct ap_matrix_mdev *matrix_mdev;
  
  	if (action != VFIO_GROUP_NOTIFY_SET_KVM)

return NOTIFY_OK;
  
  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);

+   mutex_lock(&matrix_dev->lock);
  
  	if (!data) {

-   matrix_mdev->kvm = NULL;
-   return NOTIFY_OK;
+   if (matrix_mdev->kvm)
+   vfio_ap_mdev_unset_kvm(matrix_mdev);
+   notify_rc = NOTIFY_OK;
+   goto notify_done;
}
  
  	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);

if (ret)
-   return NOTIFY_DONE;
+   goto notify_done;
  
  	/* If there is no CRYCB pointer, then we can't copy the masks */

if (!matrix_mdev->kvm->arch.crypto.crycbd)
-   return NOTIFY_DONE;
+   goto notify_done;
  
  	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,

  matrix_mdev->matrix.aqm,
  matrix_mdev->matrix.adm);
  
-	return NOTIFY_OK;

+notify_done:
+   mutex_unlock(&matrix_dev->lock);
+   return notify_rc;
  }
  
  static void vfio_ap_irq_disable_apqn(int apqn)




Re: [PATCH v12 14/17] s390/zcrypt: Notify driver on config changed and scan complete callbacks

2020-12-16 Thread Tony Krowiak




On 11/30/20 4:18 AM, h...@d06av26.portsmouth.uk.ibm.com wrote:

On Tue, 24 Nov 2020 16:40:13 -0500
Tony Krowiak  wrote:


This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

   void (*on_config_changed)(struct ap_config_info *new_config_info,
 struct ap_config_info *old_config_info);

  This callback is invoked at the start of the AP bus scan
  function when it determines that the host AP configuration information
  has changed since the previous scan. This is done by storing
  an old and current QCI info struct and comparing them. If there is any
  difference, the callback is invoked.

  Note that when the AP bus scan detects that AP adapters, domains or
  control domains have been removed from the host's AP configuration, it
  will remove the associated devices from the AP bus subsystem's device
  model. This callback gives the device driver a chance to respond to
  the removal of the AP devices from the host configuration prior to
  calling the device driver's remove callback. The primary purpose of
  this callback is to allow the vfio_ap driver to do a bulk unplug of
  all affected adapters, domains and control domains from affected
  guests rather than unplugging them one at a time when the remove
  callback is invoked.

   void (*on_scan_complete)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

  The on_scan_complete callback is invoked after the ap bus scan is
  complete if the host AP configuration data has changed.

  Note that when the AP bus scan detects that adapters, domains or
  control domains have been added to the host's configuration, it will
  create new devices in the AP bus subsystem's device model. The primary
  purpose of this callback is to allow the vfio_ap driver to do a bulk
  plug of all affected adapters, domains and control domains into
  affected guests rather than plugging them one at a time when the
  probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.

Signed-off-by: Harald Freudenberger 
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/ap_bus.c  | 83 ++-
  drivers/s390/crypto/ap_bus.h  | 12 
  drivers/s390/crypto/vfio_ap_private.h | 14 -
  3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 593573740981..3a63f6b33d8a 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -75,6 +75,7 @@ DEFINE_MUTEX(ap_perms_mutex);
  EXPORT_SYMBOL(ap_perms_mutex);
  
  static struct ap_config_info *ap_qci_info;

+static struct ap_config_info *ap_qci_info_old;
  
  /*

   * AP bus related debug feature things.
@@ -1440,6 +1441,52 @@ static int __match_queue_device_with_queue_id(struct 
device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
  }
  
+/* Helper function for notify_config_changed */

+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_config_changed)
+   ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about an qci config change */
+static inline void notify_config_changed(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_scan_complete)
+   ap_drv->on_scan_complete(ap_qci_info,
+ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_scan_complete);
+}
+
+
+
  /*
   * Helper function for ap_scan_bus().
   * Remove card device and associated queue devices.
@@ -1718,15 +1765,43 @@ static inline void ap_scan_adapter(int ap)
put_device(&ac->ap_dev.device);
  }
  
+static int ap_get_configuration(voi

Re: [PATCH v12 14/17] s390/zcrypt: Notify driver on config changed and scan complete callbacks

2020-12-16 Thread Tony Krowiak




On 12/9/20 2:20 AM, Harald Freudenberger wrote:

On 30.11.20 10:18, h...@d06av26.portsmouth.uk.ibm.com wrote:

On Tue, 24 Nov 2020 16:40:13 -0500
Tony Krowiak  wrote:


This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

   void (*on_config_changed)(struct ap_config_info *new_config_info,
 struct ap_config_info *old_config_info);

  This callback is invoked at the start of the AP bus scan
  function when it determines that the host AP configuration information
  has changed since the previous scan. This is done by storing
  an old and current QCI info struct and comparing them. If there is any
  difference, the callback is invoked.

  Note that when the AP bus scan detects that AP adapters, domains or
  control domains have been removed from the host's AP configuration, it
  will remove the associated devices from the AP bus subsystem's device
  model. This callback gives the device driver a chance to respond to
  the removal of the AP devices from the host configuration prior to
  calling the device driver's remove callback. The primary purpose of
  this callback is to allow the vfio_ap driver to do a bulk unplug of
  all affected adapters, domains and control domains from affected
  guests rather than unplugging them one at a time when the remove
  callback is invoked.

   void (*on_scan_complete)(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);

  The on_scan_complete callback is invoked after the ap bus scan is
  complete if the host AP configuration data has changed.

  Note that when the AP bus scan detects that adapters, domains or
  control domains have been added to the host's configuration, it will
  create new devices in the AP bus subsystem's device model. The primary
  purpose of this callback is to allow the vfio_ap driver to do a bulk
  plug of all affected adapters, domains and control domains into
  affected guests rather than plugging them one at a time when the
  probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.

Signed-off-by: Harald Freudenberger 
Signed-off-by: Tony Krowiak 
---
  drivers/s390/crypto/ap_bus.c  | 83 ++-
  drivers/s390/crypto/ap_bus.h  | 12 
  drivers/s390/crypto/vfio_ap_private.h | 14 -
  3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 593573740981..3a63f6b33d8a 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -75,6 +75,7 @@ DEFINE_MUTEX(ap_perms_mutex);
  EXPORT_SYMBOL(ap_perms_mutex);
  
  static struct ap_config_info *ap_qci_info;

+static struct ap_config_info *ap_qci_info_old;
  
  /*

   * AP bus related debug feature things.
@@ -1440,6 +1441,52 @@ static int __match_queue_device_with_queue_id(struct 
device *dev, const void *da
&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
  }
  
+/* Helper function for notify_config_changed */

+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_config_changed)
+   ap_drv->on_config_changed(ap_qci_info,
+ ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about an qci config change */
+static inline void notify_config_changed(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+   struct ap_driver *ap_drv = to_ap_drv(drv);
+
+   if (try_module_get(drv->owner)) {
+   if (ap_drv->on_scan_complete)
+   ap_drv->on_scan_complete(ap_qci_info,
+ap_qci_info_old);
+   module_put(drv->owner);
+   }
+
+   return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+   bus_for_each_drv(&ap_bus_type, NULL, NULL,
+__drv_notify_scan_complete);
+}
+
+
+
  /*
   * Helper function for ap_scan_bus().
   * Remove card device and associated queue devices.
@@ -1718,15 +1765,43 @@ static inline void ap_scan_adapter(int ap)
put_device(

Re: [PATCH v12 10/17] s390/vfio-ap: initialize the guest apcb

2020-12-16 Thread Tony Krowiak




On 11/28/20 8:09 PM, Halil Pasic wrote:

On Tue, 24 Nov 2020 16:40:09 -0500
Tony Krowiak  wrote:


The APCB is a control block containing the masks that specify the adapters,
domains and control domains to which a KVM guest is granted access. When
the vfio_ap device driver is notified that the KVM pointer has been set,
the guest's APCB is initialized from the AP configuration of adapters,
domains and control domains assigned to the matrix mdev. The Linux device
model, however, precludes passing through to a guest any devices that
are not bound to the device driver facilitating the pass-through.
Consequently, APQNs assigned to the matrix mdev that do not reference
AP queue devices must be filtered before assigning them to the KVM guest's
APCB; however, the AP architecture precludes filtering individual APQNs, so
the APQNs will be filtered by APID. That is, if a given APQN does not
reference a queue device bound to the vfio_ap driver, its APID will not
get assigned to the guest's APCB. For example:

Queues bound to vfio_ap:
04.0004
04.0022
04.0035
05.0004
05.0022

Adapters/domains assigned to the matrix mdev:
04 0004
   0022
   0035
05 0004
   0022
   0035

APQNs assigned to APCB:
04.0004
04.0022
04.0035

The APID 05 was filtered from the matrix mdev's matrix because
queue device 05.0035 is not bound to the vfio_ap device driver.
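
The filtering by APID described above can be illustrated with a minimal
sketch in plain C. This is not the vfio_ap code: the matrix_sketch and
bound_sketch types, the SKETCH_MAX_ID limit and filter_matrix_sketch() are
hypothetical names made up for illustration, and boolean arrays stand in
for the bitmaps the driver uses. Passing the domains through unchanged is
also an assumption; it matches the example but nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_ID 256	/* hypothetical bound on APID/APQI values */

/* Hypothetical stand-in for a matrix: one flag per adapter and per domain. */
struct matrix_sketch {
	bool apm[SKETCH_MAX_ID];	/* adapters (APIDs) */
	bool aqm[SKETCH_MAX_ID];	/* domains (APQIs) */
};

/* Hypothetical record of queue devices bound to vfio_ap: q[apid][apqi]. */
struct bound_sketch {
	bool q[SKETCH_MAX_ID][SKETCH_MAX_ID];
};

/*
 * Build the guest matrix from the mdev matrix. An APID is kept only if
 * every APQN it forms with an assigned APQI references a bound queue.
 */
void filter_matrix_sketch(const struct matrix_sketch *mdev,
			  const struct bound_sketch *bound,
			  struct matrix_sketch *guest)
{
	assert(mdev && bound && guest);
	memset(guest, 0, sizeof(*guest));

	for (int apid = 0; apid < SKETCH_MAX_ID; apid++) {
		bool keep = mdev->apm[apid];

		for (int apqi = 0; keep && apqi < SKETCH_MAX_ID; apqi++) {
			/* One unbound queue drops the whole APID. */
			if (mdev->aqm[apqi] && !bound->q[apid][apqi])
				keep = false;
		}
		guest->apm[apid] = keep;
	}

	/* Assumption: domains are passed through unchanged, as in the example. */
	memcpy(guest->aqm, mdev->aqm, sizeof(guest->aqm));
}
```

With the bound queues and mdev assignments from the example, adapter 04
survives and adapter 05 is dropped, because 05.0035 is not bound.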

Signed-off-by: Tony Krowiak 

This adds filtering. So from here guest_matrix may be different
than matrix also for an mdev that is associated with a guest. I'm still
grappling with the big picture. Have you thought about testability?
How is a testcase supposed to figure out which behavior is
to be deemed correct?


This patch is going away for v13 which is forthcoming.
The filtering of the mdev's matrix will become part of the
hot plug patch and will be used whenever changes to the
mdev's matrix are made (i.e., assign/unassign), when
AP queue devices are bound to and unbound from the
vfio_ap device driver and when the host AP configuration
changes. This resolves a couple of issues that have been
brought up in these reviews:
1. Keeps the expected behavior across the various means of
    changing the guest's AP configuration.
2. Simplifies the code.



I don't like the title line. It implies that guest apcb was
uninitialized before. Which is not the case.


This patch is going away for v13.










Re: [PATCH v12 11/17] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device

2020-12-16 Thread Tony Krowiak




On 11/28/20 8:17 PM, Halil Pasic wrote:

On Tue, 24 Nov 2020 16:40:10 -0500
Tony Krowiak  wrote:


The current implementation does not allow assignment of an AP adapter or
domain to an mdev device if each APQN resulting from the assignment
does not reference an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
1. The AP architecture does not preclude assignment of APQNs to an AP
   configuration that are not available to the system.
2. APQNs that do not reference a queue device bound to the vfio_ap
   device driver will not be assigned to the guest's CRYCB, so the
   guest will not get access to queues not bound to the vfio_ap driver.
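
The two conditions above can be sketched as a standalone check in plain C.
This is only an illustration under stated assumptions, not the patch's
code: the mdev_sketch and perms_sketch types and the
validate_assign_adapter_sketch() function are hypothetical, boolean arrays
replace the real bitmaps, and the choice of -EADDRNOTAVAIL and -EADDRINUSE
as return codes is an assumption. Note that the check never asks whether a
queue device is bound to vfio_ap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ID 256	/* hypothetical bound on APID/APQI values */

/* Hypothetical matrix of one mdev: assigned adapters and domains. */
struct mdev_sketch {
	bool apm[SKETCH_MAX_ID];
	bool aqm[SKETCH_MAX_ID];
};

/* Hypothetical AP bus masks: a set flag means "reserved for zcrypt". */
struct perms_sketch {
	bool apm[SKETCH_MAX_ID];
	bool aqm[SKETCH_MAX_ID];
};

/*
 * Check whether adapter 'apid' may be assigned to 'self'. 'others' is an
 * array of the other mdevs' matrices. Returns 0 if the assignment is
 * allowed, a negative errno value otherwise (error codes are assumptions).
 */
int validate_assign_adapter_sketch(const struct mdev_sketch *self,
				   unsigned int apid,
				   const struct perms_sketch *perms,
				   const struct mdev_sketch *others,
				   size_t n_others)
{
	assert(self && perms && apid < SKETCH_MAX_ID);

	for (unsigned int apqi = 0; apqi < SKETCH_MAX_ID; apqi++) {
		if (!self->aqm[apqi])
			continue;

		/* 1. APQN reserved by the AP bus for the zcrypt drivers? */
		if (perms->apm[apid] && perms->aqm[apqi])
			return -EADDRNOTAVAIL;

		/* 2. APQN already assigned to another matrix mdev? */
		for (size_t i = 0; i < n_others; i++) {
			if (others[i].apm[apid] && others[i].aqm[apqi])
				return -EADDRINUSE;
		}
	}

	return 0;
}
```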

Signed-off-by: Tony Krowiak 

Again code looks good. I'm still worried about all the incremental
changes (good for review) and their testability.


I'm not sure what your concern is here. Is there an expectation
that each patch be testable by itself, or is the question whether
the functionality in the patches can be easily tested en masse?

I'm not sure some of these changes can be tested with an
automated test because the test code would have to be able to
dynamically change the host's AP configuration and I don't know
if there is currently a way to do this programmatically. In order to
test the effects of dynamic host crypto configuration manually, one
needs access to an SE or HMC with DPM.




Re: [PATCH v12 09/17] s390/vfio-ap: sysfs attribute to display the guest's matrix

2020-12-16 Thread Tony Krowiak

Thanks for the review.

On 11/28/20 7:49 PM, Halil Pasic wrote:

On Tue, 24 Nov 2020 16:40:08 -0500
Tony Krowiak  wrote:


The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix

Signed-off-by: Tony Krowiak 

Code looks good, but it may be a little early, since the treatment of
guest_matrix is changed by the following patches.




Re: [PATCH v12 07/17] s390/vfio-ap: implement in-use callback for vfio_ap driver

2020-12-14 Thread Tony Krowiak




On 11/26/20 10:54 AM, Halil Pasic wrote:

On Tue, 24 Nov 2020 16:40:06 -0500
Tony Krowiak  wrote:


Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
which tries to take ap_perms_mutex

BANG!

To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EBUSY.
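
The trylock approach can be sketched in userspace C with POSIX mutexes.
This is an analogy only and an assumption-laden one: pthread_mutex_trylock()
stands in for the kernel's mutex_trylock(), and matrix_dev_lock_sketch and
assign_adapter_sketch() are hypothetical names, not the driver's. The point
is that a caller that cannot get the lock backs off with -EBUSY instead of
blocking while it may already hold the other lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for matrix_dev->lock. */
pthread_mutex_t matrix_dev_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/*
 * Set the bit for 'apid' in the caller's adapter mask. Returns 0 on
 * success, or -EBUSY without touching the mask if the lock is held.
 */
int assign_adapter_sketch(unsigned int apid, unsigned long *apm_bits)
{
	assert(apm_bits && apid < 8 * sizeof(*apm_bits));

	/* Never block here: blocking is what allows the deadlock. */
	if (pthread_mutex_trylock(&matrix_dev_lock_sketch) != 0)
		return -EBUSY;

	*apm_bits |= 1UL << apid;

	pthread_mutex_unlock(&matrix_dev_lock_sketch);
	return 0;
}
```

The caller is expected to retry the operation after seeing -EBUSY.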

Good news is: the final product is OK with regards to in_use(). Bad news
is: this patch does not do enough. At this stage we are still racy.

The problem is that the assign operations don't bother to take the
ap_perms_mutex lock under the matrix_dev->lock.

The scenario is the following:
1) apmask_store() takes ap_perms_mutex
2) apmask_store() calls vfio_ap_mdev_resource_in_use() which
  takes matrix_dev->lock
3) vfio_ap_mdev_resource_in_use() releases matrix_dev->lock
and returns 0
4) assign_adapter_store() takes matrix_dev->lock, does the
assign (the queues are still bound to vfio_ap) and releases
matrix_dev->lock
5) apmask_store() carries on, does the update to apmask and releases
ap_perms_mutex
6) The queues get 'stolen' from vfio ap while used.


You're missing an interim step between 5 and 6 where the apmask_store()
function executes the device_reprobe() function, which results in the
queues to be taken from vfio_ap getting unbound. In this case, the
vfio_ap_mdev_remove_queue() function gets called to remove the
queues resulting in unplugging



This gets fixed with "s390/vfio-ap: allow assignment of unavailable AP
queues to mdev device". Maybe we can reorder these patches. I didn't
look into that.

We could also just ignore the problem, because it is just for a couple
of commits, but I would prefer it gone.


Reordering the patches is not a trivial task; I prefer not to do it.



Regards,
Halil








Re: [PATCH v12 05/17] s390/vfio-ap: manage link between queue struct and matrix mdev

2020-12-14 Thread Tony Krowiak




On 11/26/20 9:08 AM, Halil Pasic wrote:

On Tue, 24 Nov 2020 16:40:04 -0500
Tony Krowiak  wrote:


@@ -1155,6 +1243,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 matrix_mdev->matrix.apm_max + 1) {
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
 matrix_mdev->matrix.aqm_max + 1) {
+   q = vfio_ap_mdev_get_queue(matrix_mdev,
+  AP_MKQID(apid, apqi));
+   if (!q)
+   continue;
+
ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
/*
 * Regardless whether a queue turns out to be busy, or
@@ -1164,9 +1257,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
if (ret)
rc = ret;
  
-   q = vfio_ap_get_queue(matrix_mdev, AP_MKQID(apid, apqi));

-   if (q)
-   vfio_ap_free_aqic_resources(q);
+   vfio_ap_free_aqic_resources(q);
}
}

During the review of v11 we discussed this. Introducing this one way
around, just to change it in the next patch, which should deal
with something different, makes no sense to me.


This is handled by the vfio_ap_mdev_reset_queue() function in the
next version.



BTW I've provided a ton of feedback for '[PATCH v11 03/14]
s390/vfio-ap: manage link between queue struct and matrix mdev', but I
can't find your response to that. Some of the things resurface here, and
I don't feel like repeating myself. Can you provide me an answer to
the v11 version?


I can.



