On 15.02.2016 at 14:54, Pavel Dovgalyuk wrote:
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > On 15.02.2016 at 10:14, Pavel Dovgalyuk wrote:
> > > > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > > >
> > > > > > > int blkreplay_co_readv()
> > > > > > > {
> > > > > > >     BlockReplayState *s = bs->opaque;
> > > > > > >     int reqid = s->reqid++;
> > > > > > >
> > > > > > >     bdrv_co_readv(bs->file, ...);
> > > > > > >
> > > > > > >     if (mode == record) {
> > > > > > >         log(reqid, time);
> > > > > > >     } else {
> > > > > > >         assert(mode == replay);
> > > > > > >         bool *done = req_replayed_list_get(reqid)
> > > > > > >         if (done) {
> > > > > > >             *done = true;
> > > > > > >         } else {
> > > > > > point A
> > > > > > >             req_completed_list_insert(reqid,
> > > > > > >                                       qemu_coroutine_self());
> > > > > > >             qemu_coroutine_yield();
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > /* called by replay.c */
> > > > > > > int blkreplay_run_event()
> > > > > > > {
> > > > > > >     if (mode == replay) {
> > > > > > >         co = req_completed_list_get(e.reqid);
> > > > > > >         if (co) {
> > > > > > >             qemu_coroutine_enter(co);
> > > > > > >         } else {
> > > > > > >             bool done = false;
> > > > > > >             req_replayed_list_insert(reqid, &done);
> > > > > > point B
> > > > > > >             /* wait synchronously for completion */
> > > > > > >             while (!done) {
> > > > > > >                 aio_poll();
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > One more question about coroutines.
> > > > > > Are race conditions possible in this sample?
> > > > > > In replay mode we may call readv, and reach point A.
> > > > > > At the same time, we will reach point B in another thread.
> > > > > > Then readv will yield and nobody will start it back?
> > > > >
> > > > > There are two aspects to this:
> > > > >
> > > > > * Real multithreading doesn't exist in the block layer.
> > > > >   All block driver functions are only called with the mutex in the
> > > > >   AioContext held. There is exactly one AioContext per BDS, so no two
> > > > >   threads can possibly be operating on the same BDS at the same time.
> > > > >
> > > > > * Coroutines are different from threads in that they aren't preemptive.
> > > > >   They are only interrupted in places where they explicitly yield.
> > > > >
> > > > > Of course, in order for this to work, we actually need to take the mutex
> > > > > before calling blkreplay_run_event(), which is called directly from the
> > > > > replay code (which runs in the mainloop thread? Or vcpu?).
> > > >
> > > > blkreplay_run_event() is called from replay code which is protected by mutex.
> > > > This function may be called from io and vcpu threads, because both of them
> > > > have replay functions invocations.
> > >
> > > Now I've encountered a situation where blkreplay_run_event is called from
> > > read coroutine:
> > > bdrv_prwv_co -> aio_poll -> qemu_clock_get_ns -> replay_read_clock -> blkreplay_run_event
> > >       \--> bdrv_co_readv -> blkreplay_co_readv -> bdrv_co_readv(lower layer)
> > >
> > > bdrv_co_readv inside blkreplay_co_readv can't proceed in this situation.
> > > This is probably because aio_poll has taken the aio context?
> > > How can I resolve this?
> >
> > First of all, I'm not sure if running replay events from
> > qemu_clock_get_ns() is such a great idea. This is not a function that
> > callers expect to change any state. If you absolutely have to do it
> > there instead of in the clock device emulations, maybe restricting it to
> > replaying clock events could make it a bit more harmless.
>
> Only virtual clock is emulated, and host clock is read from the host
> real time sources and therefore has to be saved into the log.

Isn't the host clock invisible to the guest anyway?

> There could be asynchronous events that occur in non-cpu threads.
> For now these events are shutdown request and block task execution.
> They may "hide" following clock (or another one) events. That is why
> we process them until synchronous event (like clock, instructions
> execution, or checkpoint) is met.
>
> > Anyway, what does "can't proceed" mean? The coroutine yields because
> > it's waiting for I/O, but it is never reentered? Or is it hanging while
> > trying to acquire a lock?
>
> I've solved this problem by slightly modifying the queue.
> I haven't yet made BlockDriverState assignment to the request ids.
> Therefore aio_poll was temporarily replaced with usleep.
> Now execution starts and hangs at some random moment of OS loading.
>
> Here is the current version of blkreplay functions:
>
> static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> {
>     uint32_t reqid = request_id++;
>     Request *req;
>     req = block_request_insert(reqid, bs, qemu_coroutine_self());
>     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
>
>     if (replay_mode == REPLAY_MODE_RECORD) {
>         replay_save_block_event(reqid);
>     } else {
>         assert(replay_mode == REPLAY_MODE_PLAY);
>         qemu_coroutine_yield();
>     }
>     block_request_remove(req);
>
>     return 0;
> }
>
> void replay_run_block_event(uint32_t id)
> {
>     Request *req;
>     if (replay_mode == REPLAY_MODE_PLAY) {
>         while (!(req = block_request_find(id))) {
>             //aio_poll(bdrv_get_aio_context(req->bs), true);
>             usleep(1);
>         }

How is this loop supposed to make any progress?

And I still don't understand why aio_poll() doesn't work and where it
hangs.

Kevin

>         qemu_coroutine_enter(req->co, NULL);
>     }
> }
>
> > Can you provide more detail about the exact place where it's hanging,
> > both in the coroutine and in the main "coroutine" that executes
> > aio_poll()?
>
> In this version replay_run_block_event() executes while loop.
> I haven't found what other threads do, because the debugger doesn't show me
> call stack when thread is waiting in some blocking function.
>
> Pavel Dovgalyuk
>
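
To make the question about the waiting loop concrete, here is a minimal, self-contained sketch in plain C. It is not QEMU code: toy_schedule(), toy_poll(), wait_for() and complete_request() are hypothetical names, and the model assumes a single thread in which queued work only runs when the waiter itself dispatches it. Under that assumption a loop that only sleeps can never see its condition change, while a loop that polls can. Whether this is what happens in the usleep() version above is not established by the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy event loop; none of these names are QEMU APIs. */
typedef void (*ToyCallback)(void *opaque);

typedef struct {
    ToyCallback fn;
    void *opaque;
} ToyWork;

#define TOY_MAX_PENDING 8

static ToyWork toy_queue[TOY_MAX_PENDING];
static size_t toy_head, toy_tail;

/* Queue a callback for a later toy_poll(); returns false if the queue is full. */
bool toy_schedule(ToyCallback fn, void *opaque)
{
    assert(fn != NULL);
    if (toy_tail == TOY_MAX_PENDING) {
        return false;
    }
    toy_queue[toy_tail].fn = fn;
    toy_queue[toy_tail].opaque = opaque;
    toy_tail++;
    return true;
}

/* Dispatch at most one queued callback; returns true if one ran. */
bool toy_poll(void)
{
    ToyWork work;

    if (toy_head == toy_tail) {
        return false;
    }
    work = toy_queue[toy_head++];
    if (toy_head == toy_tail) {
        toy_head = toy_tail = 0;    /* queue drained, reuse the slots */
    }
    work.fn(work.opaque);
    return true;
}

/* Stand-in for a request completion: it only sets the caller's flag. */
void complete_request(void *opaque)
{
    *(bool *)opaque = true;
}

/*
 * Wait for *done by driving the loop. Returns the number of callbacks
 * dispatched, or -1 if nothing is queued while *done is still false.
 * A sleep-only loop would spin forever in that case, because in this
 * single-threaded model nothing else can set the flag.
 */
int wait_for(bool *done)
{
    int dispatched = 0;

    assert(done != NULL);
    while (!*done) {
        if (!toy_poll()) {
            return -1;
        }
        dispatched++;
    }
    return dispatched;
}
```

With nothing queued, wait_for(&done) returns -1 instead of spinning; after toy_schedule(complete_request, &done), the same call dispatches the callback and returns 1. If another thread could set the flag, the single-thread assumption no longer holds and a sleeping loop could still finish.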