On Wed, 23 Jan 2019 11:34:47 +0100
Cornelia Huck <coh...@redhat.com> wrote:

> On Tue, 22 Jan 2019 20:03:31 +0100
> Halil Pasic <pa...@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 18:26:17 +0100
> > Cornelia Huck <coh...@redhat.com> wrote:
> > 
> > > On Tue, 22 Jan 2019 13:46:12 +0100
> > > Halil Pasic <pa...@linux.ibm.com> wrote:
> 
> > > > Unsolicited interrupts are another
> > > > piece of cake -- I have no idea how many of those we get.  
> > > 
> > > They at least don't have the "free the cp before we got final state"
> > > bug. But I think both are reasons to get away from "use the BUSY state
> > > to ensure the right sequence".
> > >   
> > 
> > I'm not sure I understand you correctly. I was under the impression that
> > the whole point in having a state machine was to ensure the states are
> > traversed in the right sequence with the right stuff being done on each
> > transition. At least in theory.
> 
> Sequence in user space programs, not in the state machine.
> 

I'm a bit confused.

> > 
> > You've probably figured out that IMHO vfio-ccw is not in a good shape
> > (to put it mildly). I have a hard time reviewing a non-holistic
> > concurrency fix. Please tell me should I get perceived as non-constructive,
> > I will try to cut back on criticism. 
> 
> I'm afraid this is just confusing me :(
> 
> > 
> > > > And because
> > > > of this the broken sequencing in userspace could actually be the kernel's
> > > > fault.  
> > > 
> > > Here, I can't follow you at all :(
> > >   
> > 
> > Should we ever deliver a zeroed out IRB to the userspace, for the next
> > ioinst it would look like we have no status nor FC bit set. That is, the
> > guest could end up with stuff in parallel that was never supposed to
> > be in parallel (i.e. broken sequencing because kernel feeds false
> > information due to race with unsolicited interrupt).
> > 
> > Does that help?
> 
> Not at all, I'm afraid :( User space programs still need to make sure
> they poke the interfaces in the right order IMO...
> 

Yes, one can usually think of interfaces as contracts: both sides need
to keep their end for things to work as intended. Unfortunately the
vfio-ccw interface is not a very well specified one, and that makes
reasoning about the right order so much harder.

I was under the impression that the right ordering is dictated by the
SCSW in userspace. E.g. if there is an FC bit set there, userspace ought
not to issue an SSCH request (write to the io_region). The kernel part
however may say 'userspace, read the actual SCSW' by signaling
the io_trigger eventfd. Userspace is then supposed to read the IRB from
the region and update its SCSW.
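
To make the ordering rule above concrete, here is a minimal sketch in C
of how I picture the userspace side of the contract. All names
(scsw_shadow, may_issue_ssch, update_from_irb, the FC_* masks) are made
up for illustration and are not the actual vfio-ccw or QEMU interface;
the real SCSW has many more fields than the function control bits
modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for the function control (FC) bits. */
#define FC_CLEAR 0x1u
#define FC_HALT  0x2u
#define FC_START 0x4u

/* Userspace's private copy of the subchannel status (assumption:
 * only the function control bits matter for this rule). */
struct scsw_shadow {
    uint8_t fctl;
};

/* Userspace side of the contract: do not write an SSCH request to
 * the I/O region while any function is still indicated. */
static bool may_issue_ssch(const struct scsw_shadow *s)
{
    return (s->fctl & (FC_CLEAR | FC_HALT | FC_START)) == 0;
}

/* To be called after io_trigger was signaled and the IRB was read
 * from the region; irb_fctl stands for the FC bits found in it. */
static void update_from_irb(struct scsw_shadow *s, uint8_t irb_fctl)
{
    s->fctl = irb_fctl;
}
```

The point of the sketch is only that the decision in may_issue_ssch()
is exactly as good as the last IRB userspace was handed.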

Now if userspace reads a broken SCSW from the IRB, because of a race
(due to a poorly written kernel part -- userspace not at fault), it is
going to make wrong assumptions about which operations are currently
legal and which are illegal (ordering).

Previously I described a scenario where the IRB can break without
userspace being at fault (a race between an unsolicited interrupt --
which can happen at any time -- and a legit io request). I was under the
impression we agreed on this.

This in turn could lead to userspace violating the contract, as perceived
by the kernel side.
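
A toy model of that scenario, as a sketch only: the structures and
function names below are hypothetical and model the sequence of events,
not the driver code. It assumes the race results in a zeroed-out IRB
being visible in the region while the kernel is still busy with a start
request.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced IRB: only the FC and status bits. */
struct irb_model {
    uint8_t fctl;
    uint8_t stctl;
};

struct model {
    bool kernel_busy;        /* kernel still processing a start request */
    struct irb_model region; /* what userspace reads from the region */
    uint8_t user_fctl;       /* FC bits in userspace's SCSW copy */
};

/* Userspace issues SSCH: it records the start function as pending. */
static void user_ssch(struct model *m)
{
    m->user_fctl = 0x4;
    m->kernel_busy = true;
}

/* Unsolicited interrupt racing with the request (assumed outcome):
 * the region ends up holding a zeroed-out IRB. */
static void racy_unsolicited_irq(struct model *m)
{
    memset(&m->region, 0, sizeof(m->region));
}

/* Userspace reacts to io_trigger by copying the FC bits it reads. */
static void user_handle_io_trigger(struct model *m)
{
    m->user_fctl = m->region.fctl;
}

/* True when userspace believes a new SSCH is legal although the
 * kernel side is still busy with the previous one. */
static bool contract_broken(const struct model *m)
{
    return m->user_fctl == 0 && m->kernel_busy;
}
```

Running user_ssch(), racy_unsolicited_irq() and user_handle_io_trigger()
in that order leaves contract_broken() true, although userspace followed
the rules as it understood them.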

> At this point, I'm mostly confused... I'd prefer to simply fix things
> as they come up so that we can finally move forward with the halt/clear
> handling (and probably rework the state machine on top of that.)
> 

I understand. I guess you will want to send a new version because of the
stuff that got lost in the rebase, or?

Regards,
Halil

