2010/12/26 Michael S. Tsirkin <m...@redhat.com>:
> On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
>> >> If qemu_aio_flush() is responsible for flushing the outstanding
>> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
>> >> As I described in the previous message, Kemari queues the
>> >> requests first.  So in you example above, it should start with
>> >>
>> >> virtio-net: last_avail_idx 0 inuse 2
>> >> event-tap: {A,B}
>> >>
>> >> As you know, the requests are still in order because the net
>> >> layer initiates them in order; this is about initiation order,
>> >> not completion order.
>> >>
>> >> In the first synchronization, the status above is transferred.  In
>> >> the next synchronization, the status will be as following.
>> >>
>> >> virtio-net: last_avail_idx 1 inuse 1
>> >> event-tap: {B}
>> >
>> > OK, this answers the ordering question.
>>
>> Glad to hear that!
>>
>> > Another question: at this point we transfer this status: both
>> > event-tap and virtio ring have the command B,
>> > so the remote will have:
>> >
>> > virtio-net: inuse 0
>> > event-tap: {B}
>> >
>> > Is this right? This already seems to be a problem: when B
>> > completes, inuse will go negative?
>>
>> I think the state above is wrong.  inuse 0 means there shouldn't
>> be any requests left in event-tap.  Note that the completion
>> callback is called only when event-tap flushes the requests.
>>
>> > Next it seems that the remote virtio will resubmit B to event-tap. The
>> > remote will then have:
>> >
>> > virtio-net: inuse 1
>> > event-tap: {B, B}
>> >
>> > This looks kind of wrong ... will two packets go out?
>>
>> No.  Currently, we're just replaying the requests with pio/mmio.
>> In the situation above, it should be,
>>
>> virtio-net: inuse 1
>> event-tap: {B}
>> >> Why? Because Kemari flushes the first virtio-net request using
>> >> qemu_aio_flush() before each synchronization.  If
>> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
>> >> out would be problematic.  So in the final synchronization, the
>> >> state should be,
>> >>
>> >> virtio-net: last_avail_idx 2 inuse 0
>> >> event-tap: {}
>> >>
>> >> where A,B were completed in order.
>> >>
>> >> Yoshi
>> >
>> >
>> > It might be better to discuss block because that's where
>> > requests can complete out of order.
>>
>> It's the same as net.  We queue requests and call bdrv_flush
>> before passing requests to the block layer, so there shouldn't be
>> any inversion.
>>
>> > So let me see if I understand:
>> > - each command passed to event tap is queued by it,
>> >  it is not passed directly to the backend
>> > - later requests are passed to the backend,
>> >  always in the same order that they were submitted
>> > - each synchronization point flushes all requests
>> >  passed to the backend so far
>> > - each synchronization transfers all requests not passed to the backend,
>> >  to the remote, and they are replayed there
>>
>> Correct.
>>
>> > Now, to analyse this for correctness I am looking at the original
>> > patch, because it is smaller and so easier to analyse, and I think
>> > it is functionally equivalent; correct me if I am wrong about this.
>>
>> So you think decreasing last_avail_idx upon save is better than
>> updating it in the callback?
>>
>> > So the reason there's no out-of-order issue is this
>> > (and it might be a good thing to put in the commit log
>> > or a comment somewhere):
>>
>> I've done some of that in the latest patch.  Please point it out
>> if it isn't enough.
>>
>> > At the point of the save callback, event-tap has already flushed
>> > the commands it passed to the backend.  Thus, at the point of the
>> > save callback, if a command has completed, all previous commands
>> > have been flushed and completed.
>> >
>> >
>> > Therefore inuse is in fact the number of requests passed to
>> > event-tap but not yet passed to the backend (in the non-event-tap
>> > case all commands are passed to the backend immediately, and
>> > because of this inuse is 0), and these are the last inuse commands
>> > submitted.
>> >
>> >
>> > Right?
>>
>> Yep.
>>
>> > Now a question:
>> >
>> > When we pass last_avail_idx - inuse to the remote,
>> > the remote virtio will resubmit the request.
>> > Since the request is also passed by event-tap, we get
>> > the request twice; why is this not a problem?
>>
>> It's not a problem because event-tap currently replays with
>> pio/mmio only, as I mentioned above.  Although event-tap receives
>> information about the queued requests, it won't pass them to the
>> backend.  The reason is the difficulty of restoring, on the
>> secondary, the completion callbacks that are specific to each
>> device.  These are function pointers, and even worse, usually
>> point to static functions, so event-tap has no way to restore them
>> upon failover.  I do want to change event-tap replay to work that
>> way in the future, but pio/mmio replay is what is implemented for
>> now.
>>
>> Thanks,
>>
>> Yoshi
>>
>
> Then I am still confused, sorry.  inuse != 0 means that some requests
> were passed to the backend but did not complete.  I think that if you do
> a flush, this waits until all requests passed to the backend
> complete.  Why doesn't this guarantee inuse = 0 on the origin at the
> synchronization point?

The synchronization is done before event-tap releases requests to
the backend, so there are two kinds of flush: the event-tap flush
and the backend (block/net) flush.  I assume you're confused by the
fact that flushing the backend with qemu_aio_flush/bdrv_flush
doesn't necessarily decrease inuse while event-tap still has queued
requests: those requests were never passed to the backend, so the
backend flush can't complete them.  Below is a minimal sketch of the
two flush paths, and then let me do the case study again.
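
The names here (EventTapEntry, backend_flush, event_tap_flush_one)
are made up for illustration; the actual patch is organized
differently, this is only meant to show the accounting:

/* inuse counts requests popped from the virtqueue but not yet
 * completed, whether they sit in event-tap's queue or in the
 * backend. */
typedef struct EventTapEntry {
    struct EventTapEntry *next;
    void (*submit)(void *opaque);   /* hands the request to the backend */
    void *opaque;
} EventTapEntry;

static EventTapEntry *event_tap_queue; /* {A,B,C} in the example below */

void qemu_aio_flush(void);             /* from qemu-aio.h */

/* Backend flush: waits only for requests already passed to the
 * backend.  It never touches event_tap_queue, so queued requests
 * stay queued and inuse stays above zero. */
static void backend_flush(void)
{
    qemu_aio_flush();   /* block; net is drained analogously */
}

/* Event-tap flush: release the oldest queued request to the backend
 * and wait for it, so completions happen in submission order and
 * inuse drops by one. */
static void event_tap_flush_one(void)
{
    EventTapEntry *e = event_tap_queue;
    event_tap_queue = e->next;
    e->submit(e->opaque);
    backend_flush();
}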

virtio: inuse 4
event-tap: {A,B,C}
backend: {D}

Synchronization starts; the backend gets flushed.

virtio: inuse 3
event-tap: {A,B,C}
backend: {}

Synchronization gets done.
# the secondary now has virtio: inuse 3

Event-tap flushes one request.

virtio: inuse 2
event-tap: {B,C}
backend: {}

The above repeats, and finally it should be,

virtio: inuse 0
event-tap: {}
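
In code, reusing the made-up names from the sketch above (and a
made-up transfer_state_to_peer() standing in for the checkpoint
itself), each round looks roughly like this:

void transfer_state_to_peer(void);     /* checkpoint to the secondary */

static void kemari_checkpoint_rounds(void)
{
    while (event_tap_queue) {
        backend_flush();           /* e.g. {D} completes: inuse 4 -> 3 */
        transfer_state_to_peer();  /* the secondary now sees inuse 3   */
        event_tap_flush_one();     /* release A: inuse 3 -> 2, {B,C}   */
    }
    /* finally: virtio inuse 0, event-tap {} */
}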

Hope this helps.

Yoshi

>
> --
> MST
>
>