On Tue, Sep 14 2021, Jason Gunthorpe <j...@nvidia.com> wrote:

> On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote:
>> > I rebased it and fixed it up here:
>> > 
>> > https://github.com/jgunthorpe/linux/tree/vfio_ccw
>> > 
>> > Can you try again?
>> 
>> That does address the crash, but then why is it processing a BROKEN
>> event? Seems problematic. 
>
> The stuff related to the NOT_OPER looked really wonky to me. I'm
> guessing this is the issue - not sure about the pmcw.ena either..

[I have still not been able to digest the whole series, sorry.]

>
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index 5ea392959c0711..0d4d4f425befac 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private,
>       spin_unlock_irq(sch->lock);
>  }
>  
> -static void fsm_close(struct vfio_ccw_private *private,
> -                   enum vfio_ccw_event event)
> +static int flush_sch(struct vfio_ccw_private *private)
>  {
>       struct subchannel *sch = private->sch;
>       DECLARE_COMPLETION_ONSTACK(completion);
>       int iretry, ret = 0;
>  
> -     spin_lock_irq(sch->lock);
> -     if (!sch->schib.pmcw.ena)
> -             goto err_unlock;
> -     ret = cio_disable_subchannel(sch);
> -     if (ret != -EBUSY)
> -             goto err_unlock;
> -
>       iretry = 255;
>       do {
> -
>               ret = cio_cancel_halt_clear(sch, &iretry);
> -
>               if (ret == -EIO) {
>                       pr_err("vfio_ccw: could not quiesce subchannel 
> 0.%x.%04x!\n",
>                              sch->schid.ssid, sch->schid.sch_no);
> -                     break;
> +                     return ret;

Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
we should be done as well, as then the device is dead and we do not need
to disable it.

>               }
>  
>               /*
> @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>               spin_unlock_irq(sch->lock);
>  
>               if (ret == -EBUSY)
> -                     wait_for_completion_timeout(&completion, 3*HZ);
> +                     wait_for_completion_timeout(&completion, 3 * HZ);
>  
>               private->completion = NULL;
>               flush_workqueue(vfio_ccw_work_q);
>               spin_lock_irq(sch->lock);
>               ret = cio_disable_subchannel(sch);
>       } while (ret == -EBUSY);
> +     return ret;
> +}
> +
> +static void fsm_close(struct vfio_ccw_private *private,
> +                   enum vfio_ccw_event event)
> +{
> +     struct subchannel *sch = private->sch;
> +     int ret;
> +
> +     spin_lock_irq(sch->lock);
> +     if (!sch->schib.pmcw.ena)
> +             goto err_unlock;
> +     ret = cio_disable_subchannel(sch);

cio_disable_subchannel() should be happy to disable an already disabled
subchannel, so I guess we can just walk through this and end up in
CLOSED state... unless entering with !ena actually indicates that we
messed up somewhere else in the state machine. I still need to find time
to read the patches.

> +     if (ret == -EBUSY)
> +             ret = flush_sch(private);
>       if (ret)
>               goto err_unlock;
>       private->state = VFIO_CCW_STATE_CLOSED;

Reply via email to