On Wed, 23 Jan 2019 11:34:47 +0100 Cornelia Huck <coh...@redhat.com> wrote:
> On Tue, 22 Jan 2019 20:03:31 +0100 > Halil Pasic <pa...@linux.ibm.com> wrote: > > > On Tue, 22 Jan 2019 18:26:17 +0100 > > Cornelia Huck <coh...@redhat.com> wrote: > > > > > On Tue, 22 Jan 2019 13:46:12 +0100 > > > Halil Pasic <pa...@linux.ibm.com> wrote: > > > > > Unsolicited interrupts are another > > > > piece of cake -- I have no idea how may of those do we get. > > > > > > They at least don't have the "free the cp before we got final state" > > > bug. But I think both are reasons to get away from "use the BUSY state > > > to ensure the right sequence". > > > > > > > I'm not sure I understand you correctly. I was under the impression that > > the whole point in having a state machine was to ensure the states are > > traversed in the right sequence with the right stuff being done on each > > transition. At least in theory. > > Sequence in user space programs, not in the state machine. > I'm a bit confused. > > > > You've probably figured out that IMHO vfio-ccw is not in a good shape > > (to put it mildly). I have a hard time reviewing a non-holistic > > concurrency fix. Please tell sould I get perceived as non-constructive, > > I will try to cut back on criticism. > > I'm afraid this is just confusing me :( > > > > > > > And because > > > > of this the broken sequencing in userspace could actually be the kernels > > > > fault. > > > > > > Here, I can't follow you at all :( > > > > > > > Should we ever deliver a zeroed out IRB to the userspace, for the next > > ioinst it would look like we have no status nor FC bit set. That is, the > > guest could end up with stuff in parallel that was never supposed to > > be in parallel (i.e. broken sequencing because kernel feeds false > > information due to race with unsolicited interrupt). > > > > Does that help? > > Not at all, I'm afraid :( User space programs still need to make sure > they poke the interfaces in the right order IMO... > Yes, one can usually think of interfaces as contracts: both sides need to keep their end for things to work as intended. Unfortunately the vfio-ccw iterface is not a very well specified one, and that makes reasoning about right order so much harder. I was under the impression that the right ordering is dictated by the SCSW in userspace. E.g. if there is an FC bit set there userspace is not ought to issue a SSCH request (write to the io_region). The kernel part however may say 'userspace read the actual SCSW' by signaling the io_trigger eventfd. Userspace is supposed to read the IRB from the region and update it's SCSW. Now if userspace reads a broken SCSW from the IRB, because of a race (due to poorly written kernel part -- userspace not at fault), it is going to make wrong assumptions about currently legal and illegal operations (ordering). Previously I described a scenario where IRB can break without userspace being at fault (race between unsolicited interrupt -- can happen at any time -- and a legit io request). I was under the impression we agreed on this. This in turn could lead to userspace violating the contract, as perceived by the kernel side. > At this point, I'm mostly confused... I'd prefer to simply fix things > as they come up so that we can finally move forward with the halt/clear > handling (and probably rework the state machine on top of that.) > I understand. I guess you will want to send a new version because of the stuff that got lost in the rebase, or? Regards, Halil