On Tue, Feb 09, 2010 at 08:38:56AM +0200, Avi Kivity wrote:
> On 02/09/2010 12:41 AM, Marcelo Tosatti wrote:
> >On Thu, Feb 04, 2010 at 11:46:25PM +0200, Avi Kivity wrote:
> >>On 02/04/2010 11:36 PM, Marcelo Tosatti wrote:
> >>>On Thu, Feb 04, 2010 at 09:16:47PM +0200, Avi Kivity wrote:
> >>>>On 01/28/2010 09:03 PM, Marcelo Tosatti wrote:
> >>>>>A vcpu can be stopped after handling IO in userspace,
> >>>>>but before returning to kernel to finish processing.
> >>>>>
> >>>>Is this strictly needed?  If we teach qemu to migrate before
> >>>>executing the pio request, I think we'll be all right?  should work
> >>>>at least for IN/INS, not sure about OUT/OUTS.
> >>>It would be nice (instead of more state to keep track of between
> >>>kernel/user) but the drawbacks i see are:
> >>>
> >>>You'd have to add a limitation so that any IN which was processed
> >>>by device emulation has to re-entry kernel to complete it (so it
> >>>complicates vcpu stop in userspace).
> >>>
> >>You could fix that by moving the IN emulation to before guest entry.
> >>It complicates the vcpu loop a bit, but is backwards compatible and
> >>all that.
> >Under such scheme, to avoid a stream of IN's from temporarily blocking
> >vcpu stop capability, you'd have to requeue a signal to stop the vcpu
> >(so the next IN in the stream is not executed, but complete_pio does).
> >
> >Or not process the stop signal in the first place (new state for main
> >loop, "pending pio/mmio").
>
> Why?  you would handle stops exactly the same way:
>
>   vcpu_loop:
>       while running:
>           process_last_in()
>           run_vcpu()
>           handle_exit_except_in()
>
> An IN that is stopped would simply be unprocessed, and the next
> entry, if at a new host, will simply re-execute it.

It's not so simple. The kernel advances RIP before exiting to userspace
with EXIT_IO (for IN). So simply skipping an IN exit is not possible.

In the case of an IN, you have to make sure kernel re-entry is performed
(to complete the operation). This is what complicates vcpu stop (you need
a new state which says "do not stop vcpu, re-enter kernel first"). And
then you must re-raise the stop signal before entering the kernel.

Does that make sense?

> >Or even just copy the result from QEMU device to RAX in userspace, which
> >is somewhat nasty since you'd have either userspace or kernel finishing
> >the op.
>
> Definitely bad.
>
> >For REP OUTS larger than page size, the current position is held in RCX,
> >but complete_pio uses vcpu->arch.pio.cur_count and count to hold the
> >position. So you either make it possible to writeback vcpu->arch.pio
> >to the kernel, or wait for the operation to finish (with similar
> >complications regarding signal processing).
>
> RCX is always consistent, no?  So if we migrate in the middle of REP
> OUTS, the operation will restart at the correct place?

On second thought, yeah, the state held in vcpu->arch.pio will be
reinstantiated on the destination with updated values from RCX.

> >As i see it, the benefit of backward compatibility is not worthwhile
> >compared to the complications introduced to vcpu loop processing (and
> >potential for damaging vcpu stop -> vcpu stopped latency).
> >
> >Are you certain its worth avoiding the restore ioctl for pio/mmio?
>
> First, let's see if it's feasible or not.  If it's feasible, it's
> probably just a matter of rearranging things to get userspace sane.
> A small price to pay for backward compatibility.
>
>
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
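
To make the RIP point above concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not the KVM interface: ToyVcpu, run(),
vcpu_loop(), must_reenter and the string exit reasons are all hypothetical
names made up for illustration. It assumes a guest executing a stream of INs,
that kernel entry completes a pending IN before anything else, and that a
pending stop signal prevents the next IN from executing.

```python
from dataclasses import dataclass


@dataclass
class ToyVcpu:
    """Toy stand-in for the kernel side; not the real KVM interface."""
    rip: int = 0
    rax: int = 0
    pio_pending: bool = False  # an IN has exited but RAX is not yet updated
    pio_data: int = 0          # value supplied by userspace device emulation

    def run(self, signal_pending: bool) -> str:
        # Kernel entry first completes a pending IN (the complete_pio step).
        if self.pio_pending:
            self.rax = self.pio_data
            self.pio_pending = False
        # A pending stop signal keeps the next instruction from executing.
        if signal_pending:
            return "interrupted"
        # Next IN in the stream: RIP is advanced before the exit to userspace.
        self.rip += 1
        self.pio_pending = True
        return "exit_io"


def vcpu_loop(vcpu, device_values, stop_after, reenter_before_stop=True):
    """Hypothetical userspace loop; the stop request arrives right after
    the stop_after-th IN has been handled by device emulation."""
    stop_requested = False
    must_reenter = False  # the "do not stop vcpu, re-enter kernel first" state
    handled = 0
    while True:
        if stop_requested and not (reenter_before_stop and must_reenter):
            return
        # Passing stop_requested models re-raising the stop signal on entry.
        exit_reason = vcpu.run(signal_pending=stop_requested)
        must_reenter = False
        if exit_reason == "exit_io":
            vcpu.pio_data = device_values[handled]  # device emulation
            handled += 1
            must_reenter = True
            if handled == stop_after:
                stop_requested = True


if __name__ == "__main__":
    for reenter in (True, False):
        vcpu = ToyVcpu()
        vcpu_loop(vcpu, [0x11, 0x22, 0x33], stop_after=2,
                  reenter_before_stop=reenter)
        print(reenter, vcpu.rip, hex(vcpu.rax), vcpu.pio_pending)
```

With reenter_before_stop=False the vcpu stops with RIP already past the second
IN while RAX still holds the first result, so a destination that only sees
RIP/RAX cannot re-execute the IN. With the extra state, the loop re-enters
once, the IN completes, and the re-raised stop keeps the third IN from running.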