On Wed, Aug 21, 2013 at 05:33:41PM +0800, Wenchao Xia wrote: > 于 2013-8-21 16:45, Stefan Hajnoczi 写道: > >On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote: > >>于 2013-8-16 16:12, Wenchao Xia 写道: > >>>于 2013-8-16 15:15, Wenchao Xia 写道: > >>>>于 2013-8-16 0:32, Michael Roth 写道: > >>>>>Quoting Michael Roth (2013-08-15 10:23:20) > >>>>>>Quoting Wenchao Xia (2013-08-13 03:44:39) > >>>>>>>于 2013-8-13 1:01, Michael Roth 写道: > >>>>>>>>Quoting Paolo Bonzini (2013-08-12 02:30:28) > >>>>>>>>>>1) rename AioContext to AioSource. > >>>>>>>>>> This is my major purpose, which declare it is not a "context" > >>>>>>>>>>concept, > >>>>>>>>>>and GMainContext is the entity represent the thread's activity. > >>>>>>>>> > >>>>>>>>>Note that the nested event loops in QEMU are _very_ different from > >>>>>>>>>glib nested event loops. In QEMU, nested event loops only run block > >>>>>>>>>layer events. In glib, they run all events. That's why you need > >>>>>>>>>AioContext. > >>>>>>>> > >>>>>>>>Would it be possible to use glib for our nested loops as well by just > >>>>>>>>setting a higher priority for the AioContext GSource? > >>>>>>>> > >>>>>>>>Stefan and I were considering how we could make use of his "drop > >>>>>>>>ioflush" patches to use a common mechanism to register fd events, but > >>>>>>>>still allow us to distinguish between AioContext and non-AioContext > >>>>>>>>for nested loops. I was originally thinking of using prepare() > >>>>>>>>functions > >>>>>>>>to filter out non-AioContext events, but that requires we implement > >>>>>>>>on GSource's with that in mind, and non make use of pre-baked ones > >>>>>>>>like GIOChannel's, and bakes block stuff into every event source > >>>>>>>>implementation. > >>>>>>>> > >>>>>>> Besides priority, also g_source_set_can_recurse() can help. > >>>>>>> With a deeper think, I found a harder problem: > >>>>>>>g_main_context_acquire() and g_main_context_release(). In release, > >>>>>>>pending BH/IO call back need to be cleared, but this action can't > >>>>>>>be triggered automatically when user call g_main_context_release(). > >>>>>> > >>>>>>I don't understand why this is a requirement, gmctx_acquire/release > >>>>>>ensure > >>>>>>that only one thread attempts to iterate the main loop at a time. this > >>>>>>isn't currently an issue in qemu, and if we re-implemented > >>>>>>qemu_aio_wait() > >>>>>>to use the same glib interfaces, the tracking of in-flight requests > >>>>>>would > >>>>>>be moved to the block layer via Stefan's 'drop io_flush' patches, which > >>>>>>moves that block-specific logic out of the main loop/AioContext GSource > >>>>>>by design. Are there other areas where you see this as a problem? > >>>>> > >>>>>I think I understand better what you're referring to, you mean that > >>>>>if qemu_aio_wait was called, and was implementated to essentially call > >>>>>g_main_context_iterate(), that after, say, 1 iteration, the > >>>>>iothread/dataplane thread might acquire the main loop and dispatch > >>>>>block/non-block events between qemu_aio_wait() returned? The simple > >>>>>approach would be to have qemu_aio_wait() call > >>>>>g_main_context_acquire/release > >>>>>at the start end of the function, which would ensure that this never > >>>>>happens. > >>>>> > >>>> qemu_aio_wait() is relative simple to resolve by > >>>>g_main_context_acquire(), but also shows additional code needed > >>>>for a thread switch: > >>>>(guess following is the plan to implement qemu_aio_wait()) > >>>>qemu_aio_wait(GMainContext *ctx) > >>>>{ > >>>> return g_main_context(ctx, PRI_BLOCK); > >>>>} > >>>>at caller: > >>>>{ > >>>> while (qemu_aio_wait(ctx) { > >>>> ; > >>>> } > >>>> g_main_context_release(ctx); > >>>>} > >>>> The above code shows that, in *ctx switch operation, there is > >>>>more than glib's own event loop API envolved, qemu_aio_wait(). So > >>>>it referenced to a question: what data structure > >>>>should be used to represent context concept and control the thread > >>>>switching behavior? It is better to have a clear layer, GMainContext or > >>>>GlibQContext, instead of GMainContext plus custom function. The caller > >>>>reference to at least two: nested user block layer, flat user above > >>>>block layer. > >>>> In my opinion, this problem is brought by Gsource AioContext, Take > >>>>the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as > >>>>an example, there are aync following operations involved for AioContext: > >>>>1 the bdrv_cb() will be executed in bdrv_co_em_bh(). > >>>>2 bdrv_co_io_em_complete() will be executed in event_notfier_ready(). > >>>>3 aio_worker() will be executed in worker_thread(). > >>>> Operation 2 and 3 are tracked by block layer's queue after Stefan's > >>>>dropping io_flush() series. > >>>> Now if we stick to GMainContext to represent context concept, > >>>>then when thread B want to aquire GMainContext used by thread A, > >>>>all works setupped by A should be finished before B aquire it, > >>>>otherwise B will execute some function supposed to work in A. The > >>>>work refers to the three kind of aync functions above. > >>>> For this issue, we can solve it by different means: > >>>>1 the event loop API doesn't guarentee work setupped by thread A > >>>>will always be finished in A. This put a limitation to caller to > >>>>consider thread switching. I talked on IRC with Stefan, and thinks > >>>>it is possible for the nested user block layer, but I still want to > >>>>avoid this to the flat user above block layer. > >>>>2 ask caller to finish all pending operations. > >>>> 2.1 glib GMainContext API plus custom API such as aio_wait(). This is > >>>>bad that detail under GMainContext is exposed to caller. Since > >>>>operation 1 mentioned before is not tracked yet, to make sure bdrv_cb() > >>>>is called in thread setupped it, 1 need to be added in the track > >>>>queue, or in the call chain of aio_wait(), check the queue plus check > >>>>operation 1. Perhaps add a custom function ask caller to call as > >>>>context_work_flush()? > >>> If a well named API do the flush work present, using Glib API plus > >>>it seems also OK, and can avoid wrapper. I guess > >>>bdrv_drain_all(GMainContext *ctx, ...) can do it. > >>> > >> I haven't found a good answer in gstream, but want to show > >>some idea from my understanding. > >> > >>Following is a brief picture of the current event loop in qemu, > >>Alex's timer for AioContext is also drawn here: > >> > >> ======================== > >> || I/O thread in vl.c || > >> ======================== > >> | > >> run loop | > >> | > >>==================== | > >>|| other || qemu_set_fd_handler2() ===================== > >>|| ||-----------------------------|| Main_loop || > >>||(vnc, migration)|| | ===================== > >>==================== | GLib | > >> | event loop API| > >> qemu_set_fd_handler()| | > >> ----------------- ==================== > >> | || GMainContext || > >> | ==================== > >>========== | (should it be removed?) | > >>|| hw ||-------------------------------------- |GSouce > >>========== | | |Attach > >> | main_loop_tlg| | > >> qemu_bh_***()| | | > >> | | | > >> ======|===============|======================= > >> || | | || > >>=========== || ====== ================== ======= || > >>|| block ||---------------|| | BH | | TimerListGroup | | Fd | || > >>=========== qemu_bh_***()|| ====== ================== ======= || > >> qemu_aio_wait()|| || > >> qemu_aio_set_fd_handler()|| AioContext || > >> || (block layer's event collection) || > >> ============================================= > >> > >> > >> The main issue here is that components are tightly bind together and > >>no clear layer represent the thread and event loop API. Block and hw > >>code are inter acting with AioContext, so both GMainContext and > >>AioContext are playing the role. I hope to form a library for block, > >>So need to pick up one to provide event loop, the choice seems to be: > >>1 GMainContext. > >>2 AioContext. > >>3 Encapsulation, such as GlibQContext. > >> > >> 1) and 2) would not be perfect since non standard glib event loop will > >>be exposed, 3) will shows a unified interface similar to glib main loop, > >>but more code adjust. After some thinking, I guess AioContext is the > >>easiest way, which represent the block's own event loop, and give up > >>using glib event loop at this level, just add custom API as > >>block_iterate(). Briefly, bdrv_read will becomes: > >>int bdrv_read(AioContext *ctx, ....); > > > >I don't understand why you want to add AioContext *ctx to bdrv_read(). > >The synchronous APIs already work fine since no event loop integration > >is necessary at all (the event loop is hidden inside the synchronous > >function). > > > OK... I used a wrong example. It should be bdrv_aio_readv(). At this > level, do you think AioContext * should be used, instead of > GMainContext *?
Maybe but... > >Since AioContext provides a GSource, integrating with an application's > >glib event loop should also be easy. The only hard part is timers, > >since we use nanosecond timers - there we should just round up to > >millisecond granularity to libqblock. The nanosecond timers aren't > >critical in libqblock, only for running guests. ...we already have a GSource that applications can integrate into their event loop. Why do you want to modify the code instead of just exposing the GSource? Stefan