Am 26.01.2011 16:40, schrieb Avi Kivity: > On 01/22/2011 11:29 AM, Stefan Hajnoczi wrote: >> Converting qcow2 to use coroutines is fairly simple since most of qcow2 >> is synchronous. The synchronous I/O functions likes bdrv_pread() now >> transparently work when called from a coroutine, so all the synchronous >> code just works. >> >> The explicitly asynchronous code is adjusted to repeatedly call >> qcow2_aio_read_cb() or qcow2_aio_write_cb() until the request completes. >> At that point the coroutine will return from its entry function and its >> resources are freed. >> >> The bdrv_aio_readv() and bdrv_aio_writev() user callback is now invoked >> from a BH. This is necessary since the user callback code does not >> expect to be executed from a coroutine. >> >> This conversion is not completely correct because the safety the >> synchronous code does not carry over to the coroutine version. >> Previously, a synchronous code path could assume that it will never be >> interleaved with another request executing. This is no longer true >> because bdrv_pread() and bdrv_pwrite() cause the coroutine to yield and >> other requests can be processed during that time. >> >> The solution is to carefully introduce checks so that pending requests >> do not step on each other's toes. That is left for a future patch... > > The way I thought of doing this is: > > qcow_aio_write(...) > { > execute_in_coroutine { > co_mutex_lock(&bs->mutex); > do_qcow_aio_write(...); // original qcow code > co_mutex_release(&bs->mutex); > } > } > > (similar changes for the the other callbacks) > > if the code happens to be asynchronous (no metadata changes), we'll take > the mutex and release it immediately after submitting the I/O, so no > extra serialization happens. If the code does issue a synchronous > metadata write, we'll lock out all other operations on the same block > device, but still allow the vcpu to execute, since all the locking > happens in a coroutine. > > Essentially, a mutex becomes the dependency tracking mechnism. A global > mutex means all synchronous operations are dependent. Later, we can > convert the metadata cache entry dependency lists to local mutexes > inside the cache entry structures.
I thought a bit about it since you mentioned it in the call yesterday and I think this approach makes sense. Even immediately after the conversion we should be in a better state than with Stefan's approach because I/O without metadata disk access won't be serialized. In the other thread you mentioned that you have written some code independently. Do you have it in some public git repository? Kevin