On Tue, Apr 04, 2017 at 07:06:00PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> > > Signed-off-by: Andrew Jones <drjo...@redhat.com>
> > > ---
> > >  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
> > >  1 file changed, 114 insertions(+)
> > >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > > 
> > > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> > > new file mode 100644
> > > index 000000000000..ea4a966d5c8a
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > > @@ -0,0 +1,114 @@
> > > +=================
> > > +KVM VCPU Requests
> > > +=================
> > > +
> > > +Overview
> > > +========
> > > +
> > > +KVM supports an internal API enabling threads to request a VCPU thread to
> > > +perform some activity.  For example, a thread may request a VCPU to flush
> > > +its TLB with a VCPU request.  The API consists of only four calls::
> > > +
> > > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Check if any requests are pending for VCPU @vcpu. */
> > > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of VCPU @vcpu. */
> > > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > > +
> > > +Typically a requester wants the VCPU to perform the activity as soon
> > > +as possible after making the request.  This means most requests,
> > > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > > +into it.
> > > +
> > > +VCPU Kicks
> > > +----------
> > > +
> > > +A VCPU kick does one of three things:
> > > +
> > > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> > 
> > You could clarify this to say that a sleeping VCPU is a VCPU thread
> > which is not runnable and placed on waitqueue, and waking it makes
> > the thread runnable again.
> > 
> > > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > > +    out.
> > > + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> > > +
> > > +VCPU Request Internals
> > > +======================
> > > +
> > > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > > +may also be used.  The first 8 bits are reserved for architecture
> > > +independent requests, all additional bits are available for architecture
> > > +dependent requests.
> > 
> > Should we explain the ones that are generically defined and how they're
> > supposed to be used?  For example, we don't use them on ARM, and I don't
> > think I understand why another thread would ever make a PENDING_TIMER
> > request on a vcpu?
> 
> Yes, I agree the general requests should be described.  I'll have to
> figure out how :-)  Describing KVM_REQ_UNHALT will likely lead to a
> subsection on kvm_vcpu_block(), as you bring up below.
> 
> > 
> > > +
> > > +VCPU Requests with Associated State
> > > +===================================
> > > +
> > > +Requesters that want the requested VCPU to handle new state need to ensure
> > > +the state is observable to the requested VCPU thread's CPU at the time the
> > 
> > nit: need to ensure that the newly written state is observable ... by
> > the time it observed the request.
> > 
> > > +CPU observes the request.  This means a write memory barrier should be
> >                                                                  ^^^
> >                                                              must
> > 
> > > +insert between the preparation of the state and the write of the VCPU
> >     ^^^
> >    inserted
> > 
> > I would rephrase this as: '... after writing the new state to memory and
> > before setting the VCPU request bit.'
> > 
> > 
> > > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > > +corresponding read barrier should be issued after reading the request bit
> >                                 ^^^       ^^^
> >                            must      inserted (for consistency)
> > 
> > 
> > 
> > > +and before proceeding to use the state associated with it.  See the kernel
> >                             ^^^    ^
> >                        read    new
> > 
> > 
> > > +memory barrier documentation [2].
> > 
> > I think it would be great if this document explains if this is currently
> > taken care of by the API you explain above or if there are cases where
> > people have to explicitly insert these barriers, and in that case, which
> > barriers they should use (if we know at this point already).
> 
> Will do.  The current API does take care of it.  I'll state that.  I'd
> have to grep around to see if there are any non-API users that also need
> barriers, but as they could change, I probably wouldn't want to call them
> out in the doc.  So I guess I'll still just wave my hand at that type of
> use.
> 

Sounds good.
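
As a rough illustration of the pairing discussed above (state written
before the request bit, request bit read before the state), here is a
userspace sketch using C11 atomics.  It is only an analogy: the kernel
uses its own barrier primitives, and toy_vcpu, toy_make_request() and
toy_check_request() are made-up names, not the KVM API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu; not the kernel's layout. */
struct toy_vcpu {
	int new_state;          /* state associated with the request */
	atomic_ulong requests;  /* one bit per request */
};

/* Requester side: write the state, then publish the request bit. */
static void toy_make_request(struct toy_vcpu *vcpu, int req, int state)
{
	assert(vcpu != NULL);
	vcpu->new_state = state;
	/* Release ordering plays the role of the write barrier here. */
	atomic_fetch_or_explicit(&vcpu->requests, 1UL << req,
				 memory_order_release);
}

/* VCPU side: test and clear the request bit, then read the state. */
static bool toy_check_request(struct toy_vcpu *vcpu, int req, int *state)
{
	unsigned long bit = 1UL << req;
	unsigned long old;

	assert(vcpu != NULL);
	/* Acquire ordering plays the role of the read barrier here. */
	old = atomic_fetch_and_explicit(&vcpu->requests, ~bit,
					memory_order_acquire);
	if (!(old & bit))
		return false;
	*state = vcpu->new_state;
	return true;
}
```

A VCPU that sees the bit set through toy_check_request() is then
guaranteed to also see the state stored before the bit was set, which
is the property the quoted section asks for.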

> > 
> > > +
> > > +VCPU Requests and Guest Mode
> > > +============================
> > > +
> > 
> > I feel like an intro about the overall goal here is missing.  How about
> > something like this:
> > 
> >   When making requests to VCPUs, we want to avoid the receiving VCPU
> >   executing inside the guest for an arbitrarily long time without handling
> >   the request.  The way we prevent this from happening is by keeping
> >   track of when a VCPU is running and sending an IPI to the physical CPU
> >   running the VCPU when that is the case.  However, each architecture
> >   implementation of KVM must take great care to ensure that requests are
> >   not missed when a VCPU stops running at the same time when a request
> >   is received.
> > 
> > Also, I'm not sure what the semantics are with kvm_vcpu_block().  Is it
> > ok to send a request to a VCPU and then the VCPU blocks and goes to
> > sleep forever even though there are pending requests?
> > kvm_vcpu_check_block() doesn't seem to check vcpu->requests which would
> > indicate that this is the case, but maybe architectures that actually do
> > use requests implement something else themselves?
> 
> I'll add a kvm_vcpu_block() subsection as part of the KVM_REQ_UNHALT
> documentation.
> 
> > 
> > > +As long as the guest is either in guest mode, in which case it gets an IPI
> > 
> > guest is in guest mode?
> 
> oops, s/guest/vcpu/
> 
> > 
> > Perhaps this could be more clearly written as:
> > 
> > As long as the VCPU is running, it is marked as having vcpu->mode =
> > IN_GUEST_MODE.  A requesting thread observing IN_GUEST_MODE will send an
> > IPI to the CPU running the VCPU thread.  On the other hand, when a
> > requesting thread observes vcpu->mode == OUTSIDE_GUEST_MODE, it will not
> > send any IPIs, but will simply set the request bit, and the VCPU thread
> > will be able to check the requests before running the VCPU again.
> > However, the transition...
> > 
> > > +and will definitely see the request, or is outside guest mode, but has yet
> > > +to do its final request check, and therefore when it does, it will see the
> > > +request, then things will work.  However, the transition from outside to
> > > +inside guest mode, after the last request check has been made, opens a
> > > +window where a request could be made, but the VCPU would not see until it
> > > +exits guest mode some time later.  See the table below.
> > 
> > This text, and the table below, only deals with the details of entering
> > the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
> > anything related to exiting the guest?
> 
> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
> hard to address, but then I'd also have to address
> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
> other than making sure it's clear that a VCPU that exits must check
> requests before it enters again.

But the problem is that kvm_make_all_cpus_request() only sends IPIs to
CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE.  That is a
subtlety which I feel is dangerous to paper over.
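
To make the distinction concrete, here is a small standalone sketch
(not kernel code) comparing the two rules.  The mode names are the ones
mentioned in this thread, their numeric values are arbitrary, and the
toy_ helpers are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names as used in this thread; the values are arbitrary. */
enum toy_mode {
	OUTSIDE_GUEST_MODE,
	IN_GUEST_MODE,
	EXITING_GUEST_MODE,
	READING_SHADOW_PAGE_TABLES,
};

/* Rule from the document's table: IPI only when IN_GUEST_MODE. */
static bool toy_doc_sends_ipi(enum toy_mode mode)
{
	return mode == IN_GUEST_MODE;
}

/* Rule described above for kvm_make_all_cpus_request(). */
static bool toy_all_cpus_sends_ipi(enum toy_mode mode)
{
	return mode != OUTSIDE_GUEST_MODE;
}

/* True when the two rules disagree for the given mode. */
static bool toy_rules_differ(enum toy_mode mode)
{
	assert(mode <= READING_SHADOW_PAGE_TABLES);
	return toy_doc_sends_ipi(mode) != toy_all_cpus_sends_ipi(mode);
}
```

The two rules agree for OUTSIDE_GUEST_MODE and IN_GUEST_MODE and
disagree for the two remaining modes, which is where a table written
purely in terms of !IN_GUEST_MODE stops matching that code path.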

> 
> > 
> > > +
> > > ++------------------+-----------------+----------------+--------------+
> > > +| vcpu->mode       | done last check | kick sends IPI | request seen |
> > > ++==================+=================+================+==============+
> > > +| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
> > > ++------------------+-----------------+----------------+--------------+
> > > +| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
> > > ++------------------+-----------------+----------------+--------------+
> > > +| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
> > > ++------------------+-----------------+----------------+--------------+
> > > +
> > > +To ensure the third scenario shown in the table above cannot happen, we
> > > +need to ensure the VCPU's mode change is observable by all CPUs prior to
> > > +its final request check and that a requester's request is observable by
> > > +the requested VCPU prior to the kick.  To do that we need general memory
> > > +barriers between each pair of operations involving mode and requests, i.e.
> > > +
> > > +  CPU_i                                  CPU_j
> > > +-------------------------------------------------------------------------
> > > +  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
> > > +  smp_mb();                              smp_mb();
> > > +  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
> > > +      handle_requests();                     send_IPI(vcpu->cpu);
> > > +
> > > +Whether explicit barriers are needed, or reliance on implicit barriers is
> > > +sufficient, is architecture dependent.  Alternatively, an architecture may
> > > +choose to just always send the IPI, as not sending it, when it's not
> > > +necessary, is just an optimization.
> > 
> > Is this universally true?  This is certainly true on ARM, because we
> > disable interrupts before doing all this, so the IPI remains pending and
> > causes an immediate exit, but if any of the above is done with
> > interrupts enabled, just sending an IPI does nothing to ensure the
> > request is observed.  Perhaps this is not a case we should care about.
> 
> I'll try to make this less generic, as some architectures may not work
> this way.  Indeed, s390 doesn't seem to have kvm_vcpu_kick(), so I guess
> things don't work this way for them.
> 
> > 
> > > +
> > > +Additionally, the error prone third scenario described above also exhibits
> > > +why a request-less VCPU kick is almost never correct.  Without the
> > > +assurance that a non-IPI generating kick will still result in an action by
> > > +the requested VCPU, as the final kvm_request_pending() check does, then
> > > +the kick may not initiate anything useful at all.  If, for instance, a
> > > +request-less kick was made to a VCPU that was just about to set its mode
> > > +to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
> > > +entry without actually having done whatever it was the kick was meant to
> > > +initiate.
> > 
> > Indeed.
> > 
> > 
> > > +
> > > +References
> > > +==========
> > > +
> > > +[1] Documentation/core-api/atomic_ops.rst
> > > +[2] Documentation/memory-barriers.txt
> > > -- 
> > > 2.9.3
> > > 
> > 
> > This is a great writeup!  I enjoyed reading it and it made me think more
> > carefully about a number of things, so I definitely think we should
> > merge this.
> >
> 
> Thanks Christoffer!  I'll take all your suggestions above and try to
> answer your questions for v2.
> 

Awesome, I hope Radim finds this useful for his series and the rework
later on.

Thanks,
-Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
