2011/1/4 Michael S. Tsirkin <m...@redhat.com>:
> On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
>> 2011/1/4 Michael S. Tsirkin <m...@redhat.com>:
>> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
>> >> 2010/11/29 Stefan Hajnoczi <stefa...@gmail.com>:
>> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
>> >> > <tamura.yoshi...@lab.ntt.co.jp> wrote:
>> >> >> event-tap controls when to start an FT transaction, and provides
>> >> >> proxy functions to be called from net/block devices.  During an FT
>> >> >> transaction, it queues up net/block requests, and flushes them when
>> >> >> the transaction completes.
>> >> >>
>> >> >> Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>
>> >> >> Signed-off-by: OHMURA Kei <ohmura....@lab.ntt.co.jp>
>> >> >> ---
>> >> >>  Makefile.target |    1 +
>> >> >>  block.h         |    9 +
>> >> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>  event-tap.h     |   34 +++
>> >> >>  net.h           |    4 +
>> >> >>  net/queue.c     |    1 +
>> >> >>  6 files changed, 843 insertions(+), 0 deletions(-)
>> >> >>  create mode 100644 event-tap.c
>> >> >>  create mode 100644 event-tap.h
>> >> >
>> >> > event_tap_state is checked at the beginning of several functions.  If
>> >> > there is an unexpected state the function silently returns.  Should
>> >> > these checks really be assert() so there is an abort and backtrace if
>> >> > the program ever reaches this state?
>> >> >
>> >> >> +typedef struct EventTapBlkReq {
>> >> >> +    char *device_name;
>> >> >> +    int num_reqs;
>> >> >> +    int num_cbs;
>> >> >> +    bool is_multiwrite;
>> >> >
>> >> > Is multiwrite logging necessary?  If event tap is called from within
>> >> > the block layer then multiwrite is turned into one or more
>> >> > bdrv_aio_writev() calls.
>> >> >
>> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
>> >> >> +{
>> >> >> +    EventTapLog *log, *next;
>> >> >> +
>> >> >> +    if (!running) {
>> >> >> +        return;
>> >> >> +    }
>> >> >> +
>> >> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
>> >> >> +        return;
>> >> >> +    }
>> >> >> +
>> >> >> +    event_tap_state = EVENT_TAP_REPLAY;
>> >> >> +
>> >> >> +    QTAILQ_FOREACH(log, &event_list, node) {
>> >> >> +        EventTapBlkReq *blk_req;
>> >> >> +
>> >> >> +        /* event resume */
>> >> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
>> >> >> +        case EVENT_TAP_NET:
>> >> >> +            event_tap_net_flush(&log->net_req);
>> >> >> +            break;
>> >> >> +        case EVENT_TAP_BLK:
>> >> >> +            blk_req = &log->blk_req;
>> >> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
>> >> >> +                switch (log->ioport.index) {
>> >> >> +                case 0:
>> >> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
>> >> >> +                    break;
>> >> >> +                case 1:
>> >> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
>> >> >> +                    break;
>> >> >> +                case 2:
>> >> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
>> >> >> +                    break;
>> >> >> +                }
>> >> >> +            } else {
>> >> >> +                /* EVENT_TAP_MMIO */
>> >> >> +                cpu_physical_memory_rw(log->mmio.address,
>> >> >> +                                       log->mmio.buf,
>> >> >> +                                       log->mmio.len, 1);
>> >> >> +            }
>> >> >> +            break;
>> >> >
>> >> > Why are net tx packets replayed at the net level but blk requests are
>> >> > replayed at the pio/mmio level?
>> >> >
>> >> > I expected everything to replay either as pio/mmio or as net/block.
>> >>
>> >> Stefan,
>> >>
>> >> After doing some heavy load tests, I realized that we have to
>> >> take a hybrid approach to replay for now.  This is because the
>> >> point at which a device moves to the next state (e.g. virtio
>> >> decreases inuse) differs between net and block.  For example,
>> >> virtio-net decreases inuse upon returning from the net layer,
>> >> but virtio-blk does that inside of the callback.
>> >
>> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
>> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
>> > Both are invoked from a callback.
>> >
>> >> If we only use pio/mmio
>> >> replay, even though event-tap tries to replay net requests, some
>> >> get lost because the state has proceeded already.
>> >
>> > It seems that all you need to do to avoid this is to
>> > delay the callback?
>>
>> Yeah, if it's possible.  But if you take a look at virtio-net,
>> you'll see that virtqueue_push is called immediately after calling
>> qemu_sendv_packet, while virtio-blk does that in the callback.
>
> This is only if the packet was sent immediately.
> I was referring to the case where the packet is queued.

I see.  I usually don't see packets get queued in the net layer.
What would be the effect on devices?  Would they hold back from
sending further packets?
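
To make sure we are talking about the same two cases, here is a toy
model of them.  This is not QEMU code: ToyDev, toy_send, toy_flush and
toy_tx_complete are hypothetical names made up for illustration, and
"inuse" only loosely mirrors the virtio counter.  It shows inuse
dropping right after an immediate send, but only in the completion
callback when the packet is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct ToyDev {
    int inuse;          /* buffers handed out and not yet completed */
    int queued;         /* packets parked in the (toy) net queue */
    bool backend_busy;  /* when true, sends are queued, not sent */
} ToyDev;

/* Completion: the point where the device gets its buffer back. */
static void toy_tx_complete(ToyDev *dev)
{
    assert(dev->inuse > 0);
    dev->inuse--;
}

/* Returns true if the packet was sent immediately, false if queued. */
static bool toy_send(ToyDev *dev)
{
    dev->inuse++;
    if (dev->backend_busy) {
        /* Queued case: completion is deferred to toy_flush(). */
        dev->queued++;
        return false;
    }
    /* Immediate case: completion happens before we return. */
    toy_tx_complete(dev);
    return true;
}

/* Backend is ready again: drain the queue, run deferred completions. */
static void toy_flush(ToyDev *dev)
{
    dev->backend_busy = false;
    while (dev->queued > 0) {
        dev->queued--;
        toy_tx_complete(dev);
    }
}
```

In this model, a replay taken between toy_send() and toy_flush() still
sees the queued packet as in use, which is the property pio/mmio
replay would need.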

>
>> >
>> >> This doesn't
>> >> happen with block, because the state is still old enough to
>> >> replay.  Note that using the hybrid approach won't cause duplicated
>> >> requests on the secondary.
>> >
>> > An assumption devices make is that a buffer is unused once the
>> > completion callback has been invoked. Does this violate that assumption?
>>
>> No, it shouldn't.  In case of net with net layer replay, we copy
>> the content of the requests, and in case of block, because we
>> haven't called the callback yet, the requests remain fresh.
>>
>> Yoshi
>>
>
> Yes, as long as you copy it should be fine.  Maybe it's a good idea for
> event-tap to queue all packets to avoid the copy and avoid the need to
> replay at the net level.

If queuing works fine for the devices, it seems to be a good
idea.  I think the ordering issue still won't happen.
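
As a rough sketch of what I understand by queuing everything, assuming
the device keeps each buffer alive until its completion callback runs:
the queue below borrows the buffer instead of copying it and invokes
the callbacks in FIFO order at flush time.  All names here (TapQueue,
tap_enqueue, tap_flush, ToyReq, toy_done) are hypothetical and not
part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>

#define TAP_QUEUE_MAX 16

typedef void TapDoneFn(void *opaque);

typedef struct TapPacket {
    const void *buf;    /* borrowed from the device, not copied */
    size_t len;
    TapDoneFn *done;    /* device's completion callback */
    void *opaque;
} TapPacket;

typedef struct TapQueue {
    TapPacket pkts[TAP_QUEUE_MAX];   /* ring buffer */
    size_t head, count;
} TapQueue;

/* Park a packet until the transaction completes; -1 if full. */
static int tap_enqueue(TapQueue *q, const void *buf, size_t len,
                       TapDoneFn *done, void *opaque)
{
    if (q->count == TAP_QUEUE_MAX) {
        return -1;
    }
    size_t tail = (q->head + q->count) % TAP_QUEUE_MAX;
    q->pkts[tail] = (TapPacket){ buf, len, done, opaque };
    q->count++;
    return 0;
}

/* Transaction done: complete packets oldest first; returns how many. */
static size_t tap_flush(TapQueue *q)
{
    size_t n = 0;
    while (q->count > 0) {
        TapPacket p = q->pkts[q->head];   /* copy the slot, not the data */
        q->head = (q->head + 1) % TAP_QUEUE_MAX;
        q->count--;
        /* A real implementation would hand p.buf to the backend here. */
        p.done(p.opaque);
        n++;
    }
    return n;
}

/* Example callback that records the order of completions. */
typedef struct ToyLog { int order[TAP_QUEUE_MAX]; size_t n; } ToyLog;
typedef struct ToyReq { int id; ToyLog *log; } ToyReq;

static void toy_done(void *opaque)
{
    ToyReq *req = opaque;
    if (req->log->n < TAP_QUEUE_MAX) {
        req->log->order[req->log->n++] = req->id;
    }
}
```

Since no callback runs before tap_flush(), the device would still
treat every queued buffer as in use, and the flush order matches the
submission order.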

Yoshi

>
>> >
>> > --
>> > MST
>> >
>> >
>
>
