On 24.02.2016 at 12:59, Pavel Dovgalyuk wrote:
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > On 20.02.2016 at 08:11, Pavel Dovgalyuk wrote:
> > > > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > On 16.02.2016 at 12:20, Pavel Dovgalyuk wrote:
> > > > > > Coroutine                                   Replay
> > > > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > > > >
> > > > > >                                             co = req_completed_list_get(e.reqid); // NULL
> > > > >
> > > > > There was no yield, this context switch is impossible to happen. Same
> > > > > for the switch back.
> > > > >
> > > > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > > qemu_coroutine_yield();
> > > > >
> > > > > This is the point at which a context switch happens. The only other
> > > > > point in my code is the qemu_coroutine_enter() in the other function.
> > > >
> > > > I've fixed aio_poll problem by disabling mutex lock for the
> > > > replay_run_block_event() execution. Now virtual machine
> > > > deterministically runs 4e8 instructions of Windows XP booting.
> > > > But then one non-deterministic event happens.
> > > > Callback after finishing coroutine may be called from different
> > > > contexts.
> >
> > How does this happen? I'm not aware of callbacks being processed by any
> > thread other than the I/O thread for that specific block device (unless
> > you use dataplane, this is the main loop thread).
> >
> > > > apic_update_irq() function behaves differently being called from vcpu
> > > > and io threads.
> > > > In one case it sets CPU_INTERRUPT_POLL and in other - nothing happens.
> > >
> > > Kevin, do you have some ideas how to fix this issue?
> > > This happens because of coroutines may be assigned to different threads.
> > > Maybe there is some way of making this assignment more deterministic?
> >
> > Coroutines aren't randomly assigned to threads, but threads actively
> > enter coroutines. To my knowledge this happens only when starting a
> > request (either vcpu or I/O thread; consistent per device) or by a
> > callback when some event happens (only I/O thread). I can't see any
> > non-determinism here.
>
> Behavior of coroutines looks strange for me.
> Consider the code below (co_readv function of the replay driver).
> In record mode it somehow changes the thread it assigned to.
> Code in point A is executed in CPU thread and code in point B - in some other
> thread.
> May this happen because this coroutine yields somewhere and its execution is
> restored by aio_poll, which is called from iothread?
> In this case event finishing callback cannot be executed deterministically
> (always in CPU thread or always in IO thread).
>
> static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> {
>     BDRVBlkreplayState *s = bs->opaque;
>     uint32_t reqid = request_id++;
>     Request *req;
>     // A
>     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
>
>     if (replay_mode == REPLAY_MODE_RECORD) {
>         replay_save_block_event(reqid);
>     } else {
>         assert(replay_mode == REPLAY_MODE_PLAY);
>         if (reqid == current_request) {
>             current_finished = true;
>         } else {
>             req = block_request_insert(reqid, bs, qemu_coroutine_self());
>             qemu_coroutine_yield();
>             block_request_remove(req);
>         }
>     }
>     // B
>     return 0;
> }

Yes, I guess this can happen. As I described above, the coroutine can be
entered from a vcpu thread initially. After yielding for the first time,
it is resumed from the I/O thread. So if there are paths where the
coroutine never yields, the coroutine completes in the original vcpu
thread. (It's not the common case that bdrv_co_readv() doesn't yield,
but it happens e.g. with unallocated sectors in qcow2.)

If this is a problem for you, you need to force the coroutine into the
I/O thread. You can do that by scheduling a BH, then yield, and then let
the BH reenter the coroutine.

Kevin
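
To illustrate the "schedule a BH, yield, let the BH reenter" idea, here is a
rough standalone sketch. It is not QEMU code: it models the coroutine with
POSIX ucontext, models "which thread is running" with a plain variable, and
uses a one-slot callback queue in place of a real bottom half. All names
(sim_thread, sim_enter, sim_yield, io_loop_poll, simulate_request) are made
up for the example. In QEMU the corresponding steps would presumably be
qemu_bh_schedule(), qemu_coroutine_yield() and qemu_coroutine_enter() from
the BH callback.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* Hypothetical stand-in for "which thread runs this code"; not a QEMU type. */
enum sim_thread { SIM_VCPU = 1, SIM_IO = 2 };

static enum sim_thread current_thread;   /* set by whoever drives the code */
static ucontext_t caller_ctx, co_ctx;    /* one caller, one coroutine */
static char co_stack[64 * 1024];

/* One-slot "bottom half" queue: a callback the I/O loop runs later. */
static void (*pending_bh)(void);

/* Thread observed at points A and B of the request coroutine. */
static enum sim_thread seen_at_a, seen_at_b;

/* Switch from the coroutine back to whoever entered it. */
static void sim_yield(void) { swapcontext(&co_ctx, &caller_ctx); }

/* Switch from the caller into the coroutine (start or resume). */
static void sim_enter(void) { swapcontext(&caller_ctx, &co_ctx); }

/* The bottom half only reenters the waiting coroutine. */
static void reenter_bh(void) { sim_enter(); }

static void request_coroutine(void)
{
    seen_at_a = current_thread;      /* point A: still in the entering thread */

    /* Assume the inner read finished without yielding. Force a hop:
     * schedule the BH, then yield until the I/O loop runs it. */
    assert(pending_bh == NULL);
    pending_bh = reenter_bh;
    sim_yield();

    seen_at_b = current_thread;      /* point B: now in the I/O loop */
}

/* One iteration of the simulated I/O loop: run the pending BH, if any. */
static void io_loop_poll(void)
{
    current_thread = SIM_IO;
    if (pending_bh) {
        void (*bh)(void) = pending_bh;
        pending_bh = NULL;
        bh();
    }
}

/* Returns 1 if point A ran in the vcpu context and point B in the I/O one. */
int simulate_request(void)
{
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &caller_ctx;    /* where to go when the coroutine ends */
    makecontext(&co_ctx, request_coroutine, 0);

    current_thread = SIM_VCPU;
    sim_enter();                     /* vcpu starts the request */
    io_loop_poll();                  /* I/O loop runs the BH, resumes it */
    return seen_at_a == SIM_VCPU && seen_at_b == SIM_IO;
}
```

Calling simulate_request() from a small main() is enough to try it. The point
of the pattern is that the code after the yield always runs from the I/O
loop, whether or not the inner read yielded on its own.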