On 01/26/2011 10:38 AM, Avi Kivity wrote:
On 01/26/2011 06:28 PM, Anthony Liguori wrote:
On 01/26/2011 10:13 AM, Avi Kivity wrote:
Serializing against a global mutex has the advantage that it can be
treated as a global lock that is decomposed into fine-grained locks.
For example, we can start the code conversion from an explicit async
model to a threaded sync model by converting the mutex into a
shared/exclusive lock. Operations like read and write take the lock
for shared access (and take a fine-grained mutex on the metadata
cache entry), while operations like creating a snapshot take the lock
for exclusive access. That doesn't work with freeze/thaw.
The trouble with this is that you increase the amount of re-entrance
whereas freeze/thaw doesn't.
The code from the beginning of the request to where the mutex is
acquired will be executed for every single request even while
requests are blocked at the mutex acquisition.
It's just a few instructions.
With freeze/thaw, you freeze the queue and prevent any request from
starting until you thaw. You only thaw and return control to allow
another request to execute when you begin executing an asynchronous
I/O callback.
What do you actually save? The longjmp() to the coroutine code,
linking in to the mutex wait queue, and another switch back to the
main coroutine? Given that we don't expect to block often, it seems
hardly a cost worth optimizing.
It's a matter of correctness, not optimization.
Consider the following example:
coroutine {
    l2 = find_l2(offset);
    // allocates but does not increment max cluster offset
    l2[l2_offset(offset)] = alloc_cluster();

    co_mutex_lock(&lock);
    write_l2(l2);
    co_mutex_unlock(&lock);

    l1[l1_offset(offset)] = l2;

    co_mutex_lock(&lock);
    write_l1(l1);
    co_mutex_unlock(&lock);

    commit_cluster(l2[l2_offset(offset)]);
}
This code is incorrect. The code before the first co_mutex_lock() may
be executed a dozen times before anyone finishes this routine.
That means the same cluster is reused multiple times.
This code was correct before we introduced coroutines because we
effectively had one big lock around the whole thing.
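For reference, restoring the big-lock semantics in the example above would mean widening the critical section so that cluster allocation and the commit also happen under the lock (a sketch in the same pseudocode, using the same hypothetical co_mutex_* primitives):

```
coroutine {
    co_mutex_lock(&lock);
    l2 = find_l2(offset);
    l2[l2_offset(offset)] = alloc_cluster();
    write_l2(l2);
    l1[l1_offset(offset)] = l2;
    write_l1(l1);
    commit_cluster(l2[l2_offset(offset)]);
    co_mutex_unlock(&lock);
}
```

With the lock held across the whole sequence, a second request cannot observe the allocated-but-uncommitted cluster, so no cluster is handed out twice.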
Regards,
Anthony Liguori
I think my previous example was wrong, you really want to do:
qcow2_aio_writev() {
    coroutine {
        freeze();
        sync_io(); // existing qcow2 code
        thaw();
        // existing non I/O code
        bdrv_aio_writev(callback); // no explicit freeze/thaw needed
    }
}
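A bare-bones sketch of what freeze/thaw could mean here (hypothetical, single-threaded: while frozen, newly submitted requests queue up instead of starting, and thaw() drains the queue):

```c
#include <assert.h>
#include <stdbool.h>

#define MAXQ 16

static bool frozen;
static void (*pending[MAXQ])(void);
static int npending;
static int completed;

static void do_request(void)
{
    completed++;
}

/* Entry point for new requests: run immediately, or park
 * the request if the queue is frozen. */
static void submit(void (*fn)(void))
{
    if (frozen) {
        pending[npending++] = fn;
    } else {
        fn();
    }
}

static void freeze(void)
{
    frozen = true;
}

/* Unfreeze and run everything that queued up in the meantime. */
static void thaw(void)
{
    frozen = false;
    for (int i = 0; i < npending; i++) {
        pending[i]();
    }
    npending = 0;
}
```

This shows why freeze/thaw introduces no new re-entrance: nothing between freeze() and thaw() ever yields to another request.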
This is equivalent to our existing code because no new re-entrance is
introduced. The only re-entrancy points are in the
bdrv_aio_{readv,writev} calls.
This requires you to know which code is sync, and which code is
async. My conversion allows you to wrap the code blindly with a
mutex, and have it do the right thing automatically. This is most
useful where the code can be sync or async depending on data (which is
the case for qcow2).
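To illustrate the data-dependent case (a sketch in the thread's pseudocode; the function and helper names are hypothetical): whether a metadata lookup blocks depends on whether the L2 table happens to be cached, and the mutex handles both cases without the caller knowing which one occurred.

```
qcow2_get_cluster_offset(offset) {
    co_mutex_lock(&lock);
    if (!l2_cached(offset)) {
        // blocks: this coroutine yields, other requests run,
        // but they stop at the co_mutex_lock() above
        load_l2_from_disk(offset);
    }
    ret = l2_lookup(offset);
    co_mutex_unlock(&lock);
    return ret;
}
```

In the cached case the lock is uncontended and the code behaves like the old synchronous path; in the uncached case the yield point is covered by the mutex automatically.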