
Pavel Dovgalyuk Wed, 24 Feb 2016 04:00:13 -0800

> From: Kevin Wolf [mailto:kw...@redhat.com]
> On 20.02.2016 at 08:11, Pavel Dovgalyuk wrote:
> > > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > On 16.02.2016 at 12:20, Pavel Dovgalyuk wrote:
> > > > > Coroutine                                    Replay
> > > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > > >                                              co = req_completed_list_get(e.reqid); // NULL
> > > >
> > > > There was no yield, this context switch is impossible to happen. Same
> > > > for the switch back.
> > > >
> > > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > qemu_coroutine_yield();
> > > >
> > > > This is the point at which a context switch happens. The only other
> > > > point in my code is the qemu_coroutine_enter() in the other function.
> > >
> > > I've fixed aio_poll problem by disabling mutex lock for the
> > > replay_run_block_event() execution. Now virtual machine
> > > deterministically runs 4e8 instructions of Windows XP booting.
> > > But then one non-deterministic event happens.
> > > Callback after finishing coroutine may be called from different contexts.
> 
> How does this happen? I'm not aware of callbacks being processed by any
> thread other than the I/O thread for that specific block device (unless
> you use dataplane, this is the main loop thread).
> 
> > > apic_update_irq() function behaves differently being called from vcpu
> > > and io threads.
> > > In one case it sets CPU_INTERRUPT_POLL and in other - nothing happens.
> >
> > Kevin, do you have some ideas how to fix this issue?
> > This happens because coroutines may be assigned to different threads.
> > Maybe there is some way of making this assignment more deterministic?
> 
> Coroutines aren't randomly assigned to threads, but threads actively
> enter coroutines. To my knowledge this happens only when starting a
> request (either vcpu or I/O thread; consistent per device) or by a
> callback when some event happens (only I/O thread). I can't see any
> non-determinism here.


The behavior of coroutines looks strange to me.
Consider the code below (the co_readv function of the replay driver).
In record mode it somehow changes the thread it is assigned to:
the code at point A is executed in the CPU thread, and the code at point B
in some other thread.
Could this happen because the coroutine yields somewhere and its execution
is resumed by aio_poll, which is called from the iothread?
In that case the event completion callback cannot be executed
deterministically (always in the CPU thread or always in the I/O thread).

static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
    BDRVBlkreplayState *s = bs->opaque;
    uint32_t reqid = request_id++;
    Request *req;
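    int ret;

    /* A: in record mode this was observed to run in the CPU thread */
    ret = bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);

    if (replay_mode == REPLAY_MODE_RECORD) {
        replay_save_block_event(reqid);
    } else {
        assert(replay_mode == REPLAY_MODE_PLAY);
        if (reqid == current_request) {
            current_finished = true;
        } else {
            /* Park this coroutine until replay reaches the matching event */
            req = block_request_insert(reqid, bs, qemu_coroutine_self());
            qemu_coroutine_yield();
            block_request_remove(req);
        }
    }
    /* B: in record mode this was observed to run in some other thread */
    return ret; /* propagate the result of the underlying read */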
}
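
To make the hypothesis above concrete, here is a minimal standalone sketch.
It is not QEMU code: ToyCoroutine, toy_coroutine_enter(), toy_io_thread() and
toy_demo() are hypothetical names, and the "coroutine" is only a hand-written
state machine on top of pthreads. It shows that the code after a yield runs
in whichever thread re-enters the coroutine, so point A and point B can end
up in different threads.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical toy coroutine: a state machine, not QEMU's Coroutine. */
typedef struct {
    int state;            /* 0 = new, 1 = yielded, 2 = finished */
    pthread_t ran_a_in;   /* thread that executed point A */
    pthread_t ran_b_in;   /* thread that executed point B */
} ToyCoroutine;

/* Run the coroutine until its next yield, in the calling thread. */
static void toy_coroutine_enter(ToyCoroutine *co)
{
    switch (co->state) {
    case 0:
        co->ran_a_in = pthread_self();   /* point A */
        co->state = 1;                   /* "yield" while the I/O is in flight */
        break;
    case 1:
        co->ran_b_in = pthread_self();   /* point B */
        co->state = 2;
        break;
    default:
        break;                           /* already finished */
    }
}

/* Stand-in for an I/O thread whose completion path re-enters the coroutine. */
static void *toy_io_thread(void *opaque)
{
    toy_coroutine_enter(opaque);
    return NULL;
}

/* Returns 1 if points A and B ran in different threads, 0 otherwise. */
static int toy_demo(ToyCoroutine *co)
{
    pthread_t io;
    int rc;

    toy_coroutine_enter(co);             /* request started by the caller */
    rc = pthread_create(&io, NULL, toy_io_thread, co);
    assert(rc == 0);
    rc = pthread_join(io, NULL);         /* join also orders the memory accesses */
    assert(rc == 0);
    return !pthread_equal(co->ran_a_in, co->ran_b_in);
}
```

Starting from a zero-initialized ToyCoroutine, toy_demo() returns 1: point A
runs in the thread that started the request and point B in the thread that
resumed it.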

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
