On Wed, Nov 23, 2011 at 11:47:53AM +0000, Stefan Hajnoczi wrote: > The block layer does not know about pending requests. This information > is necessary for copy-on-read since overlapping requests must be > serialized to prevent races that corrupt the image. > > The BlockDriverState gets a new tracked_request list field which > contains all pending requests. Each request is a BdrvTrackedRequest > record with sector_num, nb_sectors, and is_write fields. > > Note that request tracking is always enabled but hopefully this extra > work is so small that it doesn't justify adding an enable/disable flag. > > Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com> > --- > block.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++- > block_int.h | 4 ++++ > 2 files changed, 51 insertions(+), 1 deletions(-) > > diff --git a/block.c b/block.c > index 0df7eb9..27c4e84 100644 > --- a/block.c > +++ b/block.c > @@ -1071,6 +1071,42 @@ void bdrv_commit_all(void) > } > } > > +struct BdrvTrackedRequest { > + BlockDriverState *bs; > + int64_t sector_num; > + int nb_sectors; > + bool is_write; > + QLIST_ENTRY(BdrvTrackedRequest) list; > +}; > + > +/** > + * Remove an active request from the tracked requests list > + * > + * This function should be called when a tracked request is completing. > + */ > +static void tracked_request_end(BdrvTrackedRequest *req) > +{ > + QLIST_REMOVE(req, list); > +} > + > +/** > + * Add an active request to the tracked requests list > + */ > +static void tracked_request_begin(BdrvTrackedRequest *req, > + BlockDriverState *bs, > + int64_t sector_num, > + int nb_sectors, bool is_write) > +{ > + *req = (BdrvTrackedRequest){ > + .bs = bs, > + .sector_num = sector_num, > + .nb_sectors = nb_sectors, > + .is_write = is_write, > + }; > + > + QLIST_INSERT_HEAD(&bs->tracked_requests, req, list); > +} > + > /* > * Return values: > * 0 - success > @@ -1350,6 +1386,8 @@ static int coroutine_fn > bdrv_co_do_readv(BlockDriverState *bs, > int64_t sector_num, int nb_sectors, QEMUIOVector *qiov) > { > BlockDriver *drv = bs->drv; > + BdrvTrackedRequest req; > + int ret; > > if (!drv) { > return -ENOMEDIUM; > @@ -1363,7 +1401,10 @@ static int coroutine_fn > bdrv_co_do_readv(BlockDriverState *bs, > bdrv_io_limits_intercept(bs, false, nb_sectors); > } > > - return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + tracked_request_begin(&req, bs, sector_num, nb_sectors, false); > + ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + tracked_request_end(&req); > + return ret; > }
Stefan, On earlier discussion, you replied to me: " >> req = tracked_request_add(bs, sector_num, nb_sectors, false); > > The tracked request should include cluster round info? When checking A and B for overlap, only one of them needs to be rounded in order for overlap detection to be correct. Therefore we can store the original request [start, length) in tracked_requests and only round the new request. " The problem AFAICS is this: - Store a non-cluster-aligned request in the tracked request list. - Wait on that non-cluster-aligned request (wait_for_overlapping_requests). - Submit cluster-aligned request for COR request. So, the tracked request list does not properly reflect the in-flight COR requests. Which can result in: 1) guest reads sector 10. 2) <sector_num=10,nb_sectors=2> added to tracked request list. 3) COR code submits read for <sector_num=10,nb_sectors=2+cluster_align> 4) unrelated guest operation writes to sector 13, nb_sectors=1. That is allowed to proceed without waiting because tracked request list does not reflect what COR in-flight requests. Am i missing something here?