On Tue, Jul 19 2022, Jason Gunthorpe <j...@nvidia.com> wrote:

> On Thu, Jul 07, 2022 at 03:37:16PM -0600, Alex Williamson wrote:
>> On Mon,  4 Jul 2022 21:59:03 -0300
>> Jason Gunthorpe <j...@nvidia.com> wrote:
>> > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
>> > index b49e2e9db2dc6f..09e0ce7b72324c 100644
>> > --- a/drivers/s390/cio/vfio_ccw_ops.c
>> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
>> > @@ -44,31 +44,19 @@ static int vfio_ccw_mdev_reset(struct vfio_ccw_private *private)
>> >    return ret;
>> >  }
>> >  
>> > -static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
>> > -                            unsigned long action,
>> > -                            void *data)
>> > +static void vfio_ccw_dma_unmap(struct vfio_device *vdev, u64 iova, u64 length)
>> >  {
>> >    struct vfio_ccw_private *private =
>> > -          container_of(nb, struct vfio_ccw_private, nb);
>> > -
>> > -  /*
>> > -   * Vendor drivers MUST unpin pages in response to an
>> > -   * invalidation.
>> > -   */
>> > -  if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>> > -          struct vfio_iommu_type1_dma_unmap *unmap = data;
>> > -
>> > -          if (!cp_iova_pinned(&private->cp, unmap->iova))
>> > -                  return NOTIFY_OK;
>> > +          container_of(vdev, struct vfio_ccw_private, vdev);
>> >  
>> > -          if (vfio_ccw_mdev_reset(private))
>> > -                  return NOTIFY_BAD;
>> > +  /* Drivers MUST unpin pages in response to an invalidation. */
>> > +  if (!cp_iova_pinned(&private->cp, iova))
>> > +          return;
>> >  
>> > -          cp_free(&private->cp);
>> > -          return NOTIFY_OK;
>> > -  }
>> > +  if (vfio_ccw_mdev_reset(private))
>> > +          return;
>> >  
>> > -  return NOTIFY_DONE;
>> > +  cp_free(&private->cp);
>> >  }
>> 
>> 
>> The cp_free() call is gone here with [1], so I think this function now
>> just ends with:
>> 
>>      ...
>>      vfio_ccw_mdev_reset(private);
>> }
>> 
>> There are also minor contextual differences elsewhere from that series,
>> so a quick respin to record the changes on list would be appreciated.
>> 
>> However the above kind of highlights that NOTIFY_BAD that silently gets
>> dropped here.  I realize we weren't testing the return value of the
>> notifier call chain and really we didn't intend that notifiers could
>> return a failure here, but does this warrant some logging or suggest
>> future work to allow a device to go offline here?  Thanks.
>
> It looks like no.
>
> If the FSM is trapped in a bad state here, such as
> VFIO_CCW_STATE_NOT_OPER, then it means it should have already unpinned
> the pages, and this is considered a success for this purpose.

A rather pathological case would be a subchannel that cannot be
quiesced and does not end up being non-operational; in theory, the
hardware could still try to access the buffers we provided for I/O. I'd
say that is extremely unlikely; we might log it, but we really cannot do
anything else.
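
To make the "we might log it" option concrete, here is a rough userspace
sketch of the shape such a callback could take. It is not kernel code and
not a proposed patch: every mock_* name, the single-IOVA "pinned" model,
the failure counter and the use of stderr are assumptions made up for
illustration. In the driver this would presumably be a dev_warn-style
message, and the pin tracking is far more involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the channel program; tracks one pinned IOVA. */
struct mock_cp {
	uint64_t pinned_iova;
	bool has_pinned;
};

/* Hypothetical stand-in for struct vfio_ccw_private. */
struct mock_private {
	struct mock_cp cp;
	int reset_result;		/* value the mocked reset returns */
	unsigned int unmap_failures;	/* failed resets seen during unmap */
};

/* Mirrors the role of cp_iova_pinned() in the quoted diff. */
static bool mock_cp_iova_pinned(const struct mock_cp *cp, uint64_t iova)
{
	return cp->has_pinned && cp->pinned_iova == iova;
}

/* Mirrors the role of vfio_ccw_mdev_reset(); nonzero means failure. */
static int mock_mdev_reset(struct mock_private *private)
{
	return private->reset_result;
}

/*
 * The callback returns void, so a failed reset cannot be propagated.
 * The only thing left to do is record and log it.
 */
static void mock_dma_unmap(struct mock_private *private, uint64_t iova,
			   uint64_t length)
{
	int ret;

	(void)length;	/* unused in this simplified model */

	if (!mock_cp_iova_pinned(&private->cp, iova))
		return;

	ret = mock_mdev_reset(private);
	if (ret) {
		private->unmap_failures++;
		fprintf(stderr, "dma_unmap: reset failed (%d), iova 0x%llx\n",
			ret, (unsigned long long)iova);
	}
}
```

Calling mock_dma_unmap() on an IOVA that is not pinned does nothing; on a
pinned IOVA with a failing reset it bumps the counter and prints one line,
which is about all the real callback could do as well.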

>
> The return code here exists only to return to userspace so it can
> detect during a VFIO_DEVICE_RESET that the device has crashed
> irrecoverably.

Does it imply only that ("it's dead, Jim"), or can it also imply a
runaway device? Not that userspace can do much in any case.

>
> Thus just continuing to silently ignore it seems like the best thing.
>
> Jason
