Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
Okay, thanks all of your comments, if no other comments, I will write next version. On Wed, May 30, 2012 at 4:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, May 30, 2012 at 2:50 AM, Dong Xu Wang wdon...@linux.vnet.ibm.com wrote: On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote: I thought a bit more about locking. Because the metadata is simple not much locking is necessary except when fetching new bitmap clusters from the image file into the cache and when populating untouched sectors during data cluster allocation. Those are the two cases where parallel requests could put the block driver or image file into a bad state if allowed to run without any locking. Another way of describing the consequences of parallelism: 1. Coroutines must not duplicate the same add-cow bitmap cluster into the cache if they run at the same time. 2. Coroutines must not hold bitmap tables across blocking operations since the cache entry has no reference count and might be evicted from the cache. 3. Coroutines must not allocate the same data cluster simultaneously because untouched head/tail sectors must never race with guest writes. +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size) +{ + BDRVAddCowState *s = bs-opaque; + int sector_per_byte = SECTORS_PER_CLUSTER * 8; + int ret; + int64_t old_image_sector = s-image_hd-total_sectors; + int64_t bitmap_size = + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte; + + ret = bdrv_truncate(bs-file, + sizeof(AddCowHeader) + bitmap_size); + if (ret 0) { + bdrv_truncate(s-image_hd, old_image_sector * BDRV_SECTOR_SIZE); Why truncate image_hd on failure? We never touch the image_hd size on success either. I think we can just leave it alone. That means whether we truncate add-cow fails or not ,we should not never touch image_hd size? I thought about this more and I think we should truncate image_hd in the success case only. In order to resize the image we need to resize the cow bitmap and then resize image_hd. If resizing the add-cow file failed, then we haven't changed the cow bitmap and we don't need to truncate image_hd. Do you agree with this or have I missed something? @@ -828,6 +832,41 @@ static int img_convert(int argc, char **argv) } /* Create the new image */ + + if (0 == strcmp(out_fmt, add-cow)) { + image_drv = bdrv_find_format(raw); + if (!drv) { + ret = -1; + goto out; + } + snprintf(image_filename, sizeof(image_filename), + %s.ct.raw, out_filename); + ret = bdrv_create(image_drv, image_filename, image_param); + if (ret 0) { + error_report(%s: error while creating image_file: %s, + image_filename, strerror(-ret)); + goto out; + } + set_option_parameter(param, BLOCK_OPT_IMAGE_FILE, image_filename); + + if (!out_baseimg) { + backing_drv = bdrv_find_format(qcow2); + if (!drv) { + ret = -1; + goto out; + } + snprintf(backing_filename, sizeof(backing_filename), + %s.ct.qcow2, out_filename); + ret = bdrv_create(backing_drv, backing_filename, image_param); + if (ret 0) { + error_report(%s: error while creating backing_file: %s, + backing_filename, strerror(-ret)); + goto out; + } + set_option_parameter(param, BLOCK_OPT_BACKING_FILE, + backing_filename); + } + } If this diff hunk is dropped then the user needs to manually create the raw file before running qemu-img convert? qemu-img convert -O add-cow seems like a very rare case. I'm not sure we should add special user-friend hacks for this. I'm not sure I understand why you create a qcow2 file either. Yes, if we use qemu-img convert -O add-cow, we should create 2 other files, raw file and qcow2(I just picked up qcow2, other formats is also okay) file, as image_file and backing_file, without the two files, .add-cow file can not work properly. Although it will occour in very rare cases, I wish to pass all qemu-iotests cases, so I added these code. Do you think these are not necessary? And some qemu-iotests cases are using convert operation, If I do not write previous code, these cases will fail. Can I let these cases do not support add-cow? If a test uses qemu-img convert then it's probably not that interesting for add-cow. Converting is not a useful operation because add-cow is an add-on block driver that adds a feature on top of raw, rather than a format like vmdk or qcow2 which is used to share disk images. I see why you did this to make qemu-iotests work, but personally I would drop this special case code and skip those tests. Stefan
Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
Am 30.05.2012 04:10, schrieb Anthony Liguori: On 05/08/2012 01:34 AM, Dong Xu Wang wrote: Provide a new file format: add-cow. The usage can be found in add-cow.txt of this patch. CC: Kevin Wolfkw...@redhat.com CC: Stefan Hajnoczistefa...@linux.vnet.ibm.com Signed-off-by: Dong Xu Wangwdon...@linux.vnet.ibm.com You should split out the spec to be the first patch. That makes it easier for people to review the specification independent of the code and also makes subsequent code review easier. Yes, please. --- Makefile.objs |1 + block.c|2 +- block.h|1 + block/add-cow-cache.c | 193 + block/add-cow.c| 446 block/add-cow.h| 83 + block_int.h|1 + docs/specs/add-cow.txt | 68 qemu-img.c | 39 + 9 files changed, 833 insertions(+), 1 deletion(-) create mode 100644 block/add-cow-cache.c create mode 100644 block/add-cow.c create mode 100644 block/add-cow.h create mode 100644 docs/specs/add-cow.txt diff --git a/Makefile.objs b/Makefile.objs index 70c5c79..10c5c52 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o +block-nested-y += add-cow.o add-cow-cache.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-nested-y += stream.o block-nested-$(CONFIG_WIN32) += raw-win32.o diff --git a/block.c b/block.c index 43c794c..206860c 100644 --- a/block.c +++ b/block.c @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs, } /* check if the path starts with protocol: */ -static int path_has_protocol(const char *path) +int path_has_protocol(const char *path) { #ifdef _WIN32 if (is_windows_drive(path) || diff --git a/block.h b/block.h index f163e54..f74c79e 100644 --- a/block.h +++ b/block.h @@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn); char *get_human_readable_size(char *buf, int buf_size, int64_t size); int path_is_absolute(const char *path); +int path_has_protocol(const char *path); void path_combine(char *dest, int dest_size, const char *base_path, const char *filename); diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c new file mode 100644 index 000..3ae23c1 --- /dev/null +++ b/block/add-cow-cache.c @@ -0,0 +1,193 @@ +/* + * Cache For QEMU ADD-COW Disk Format + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ This is not a valid copyright block. Look at hw/virtio.c for an example. If you want to do LGPL instead of GPL, it should also be 2.1 or later. You mean because the name of the copyright holder is missing? Would be nice to include indeed, but I don't think it makes a legal difference. And the vast majority of source files doesn't have the names of all copyright holders there anyway. +#include block_int.h +#include qemu-common.h +#include add-cow.h + +/* Based on qcow2-cache.c */ If the code is based on qcow2, then you need to preserve the qcow2 copyrights in this file. Or ideally we would generalise qcow2-cache.c so that it can be used by both. Not sure how easy it would be, though. +AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables, +bool writethrough) +{ +AddCowCache *c; +int i; + +c = g_malloc0(sizeof(*c)); +c-size = num_tables; +c-entries = g_malloc0(sizeof(*c-entries) * num_tables); +c-writethrough = writethrough; +c-entry_size = ADD_COW_CACHE_ENTRY_SIZE; + +for (i = 0; i c-size; i++) { +c-entries[i].table = qemu_blockalign(bs, c-entry_size); +c-entries[i].offset = -1; +} + +return c; +} + +int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c) +{ +int i; + +for (i = 0; i c-size; i++) { +qemu_vfree(c-entries[i].table); +} + +g_free(c-entries); +g_free(c); + +return 0; +} + +static int add_cow_cache_entry_flush(BlockDriverState *bs, +AddCowCache *c, +int i) +{ +BDRVAddCowState *s = bs-opaque; +int ret = 0; + +if (!c-entries[i].dirty || -1 == c-entries[i].offset) { +return ret; +} + +ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset, +c-entries[i].table, +MIN(c-entry_size, s-bitmap_size - c-entries[i].offset)); This is a synchronous I/O function. We shouldn't introduce new formats
Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
On Wed, May 30, 2012 at 2:50 AM, Dong Xu Wang wdon...@linux.vnet.ibm.com wrote: On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote: I thought a bit more about locking. Because the metadata is simple not much locking is necessary except when fetching new bitmap clusters from the image file into the cache and when populating untouched sectors during data cluster allocation. Those are the two cases where parallel requests could put the block driver or image file into a bad state if allowed to run without any locking. Another way of describing the consequences of parallelism: 1. Coroutines must not duplicate the same add-cow bitmap cluster into the cache if they run at the same time. 2. Coroutines must not hold bitmap tables across blocking operations since the cache entry has no reference count and might be evicted from the cache. 3. Coroutines must not allocate the same data cluster simultaneously because untouched head/tail sectors must never race with guest writes. +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size) +{ + BDRVAddCowState *s = bs-opaque; + int sector_per_byte = SECTORS_PER_CLUSTER * 8; + int ret; + int64_t old_image_sector = s-image_hd-total_sectors; + int64_t bitmap_size = + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte; + + ret = bdrv_truncate(bs-file, + sizeof(AddCowHeader) + bitmap_size); + if (ret 0) { + bdrv_truncate(s-image_hd, old_image_sector * BDRV_SECTOR_SIZE); Why truncate image_hd on failure? We never touch the image_hd size on success either. I think we can just leave it alone. That means whether we truncate add-cow fails or not ,we should not never touch image_hd size? I thought about this more and I think we should truncate image_hd in the success case only. In order to resize the image we need to resize the cow bitmap and then resize image_hd. If resizing the add-cow file failed, then we haven't changed the cow bitmap and we don't need to truncate image_hd. Do you agree with this or have I missed something? @@ -828,6 +832,41 @@ static int img_convert(int argc, char **argv) } /* Create the new image */ + + if (0 == strcmp(out_fmt, add-cow)) { + image_drv = bdrv_find_format(raw); + if (!drv) { + ret = -1; + goto out; + } + snprintf(image_filename, sizeof(image_filename), + %s.ct.raw, out_filename); + ret = bdrv_create(image_drv, image_filename, image_param); + if (ret 0) { + error_report(%s: error while creating image_file: %s, + image_filename, strerror(-ret)); + goto out; + } + set_option_parameter(param, BLOCK_OPT_IMAGE_FILE, image_filename); + + if (!out_baseimg) { + backing_drv = bdrv_find_format(qcow2); + if (!drv) { + ret = -1; + goto out; + } + snprintf(backing_filename, sizeof(backing_filename), + %s.ct.qcow2, out_filename); + ret = bdrv_create(backing_drv, backing_filename, image_param); + if (ret 0) { + error_report(%s: error while creating backing_file: %s, + backing_filename, strerror(-ret)); + goto out; + } + set_option_parameter(param, BLOCK_OPT_BACKING_FILE, + backing_filename); + } + } If this diff hunk is dropped then the user needs to manually create the raw file before running qemu-img convert? qemu-img convert -O add-cow seems like a very rare case. I'm not sure we should add special user-friend hacks for this. I'm not sure I understand why you create a qcow2 file either. Yes, if we use qemu-img convert -O add-cow, we should create 2 other files, raw file and qcow2(I just picked up qcow2, other formats is also okay) file, as image_file and backing_file, without the two files, .add-cow file can not work properly. Although it will occour in very rare cases, I wish to pass all qemu-iotests cases, so I added these code. Do you think these are not necessary? And some qemu-iotests cases are using convert operation, If I do not write previous code, these cases will fail. Can I let these cases do not support add-cow? If a test uses qemu-img convert then it's probably not that interesting for add-cow. Converting is not a useful operation because add-cow is an add-on block driver that adds a feature on top of raw, rather than a format like vmdk or qcow2 which is used to share disk images. I see why you did this to make qemu-iotests work, but personally I would drop this special case code and skip those tests. Stefan
Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
+ image_sectors = image_bs-total_sectors; Please use bdrv_getlength() instead of accessing total_sectors directly. + image_drv = bdrv_find_format(raw); + ret = bdrv_open(s-image_hd, image_filename, flags, image_drv); + if (ret 0) { + bdrv_delete(s-image_hd); + goto fail; + } + bs-total_sectors = s-image_hd-total_sectors; bdrv_getlength() +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs, + int64_t sector_num, int nb_sectors, int *num_same) +{ + int changed; + int64_t cluster_num; + + if (nb_sectors == 0) { + *num_same = 0; + return 0; + } + + cluster_num = sector_num / SECTORS_PER_CLUSTER; + changed = is_allocated(bs, sector_num); Do we need to hold the lock before (indirectly) accessing the cache? + *num_same = + MIN(nb_sectors, (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num); + + for (cluster_num = sector_num / SECTORS_PER_CLUSTER + 1; + cluster_num = (sector_num + nb_sectors - 1) / SECTORS_PER_CLUSTER; + cluster_num++) { + if (is_allocated(bs, cluster_num * SECTORS_PER_CLUSTER) != changed) { + break; + } + *num_same = MIN(nb_sectors, + (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num); + } I think this makes sense but it would be easier to use a loop counter that is in sector units instead of clusters. Then you can calculate *num_name after the loop simply by subtracting the starting sector_num from the final cur_sector value. And it saves you from constantly converting back and forth between clusters and sectors. +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs, + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov) +{ + BDRVAddCowState *s = bs-opaque; + int ret = 0, i; + QEMUIOVector hd_qiov; + uint8_t *table; + bool changed = false; + + qemu_co_mutex_lock(s-lock); + qemu_iovec_init(hd_qiov, qiov-niov); + ret = bdrv_co_writev(s-image_hd, + sector_num, + remaining_sectors, qiov); We don't need to lock across all writes. lock() if write requires allocation: ...do allocation stuff under lock... unlock() write data Internal data structure (cache, metadata, and copying unmodified sectors) access needs to be locked during allocating writes. The guest's data can be written without the lock held. This means that most writes will only lock for a short time to check that the clusters are already allocated. Then they will be able to write in parallel. If any cluster is not yet allocated then the allocation needs to happen under lock. This ensures that we don't copy backing file sectors while processing another write request that touches those sectors. + + if (ret 0) { + goto fail; + } + /* copy content of unmodified sectors */ + if (!is_cluster_head(sector_num) !is_allocated(bs, sector_num)) { + qemu_co_mutex_unlock(s-lock); + ret = copy_sectors(bs, sector_num ~(SECTORS_PER_CLUSTER - 1), + sector_num); As mentioned above, I think we need to lock during copy_sectors() so that other requests cannot race with this. (The guest's writes must always take priority over copying unmodified sectors.) + qemu_co_mutex_lock(s-lock); + if (ret 0) { + goto fail; + } + } + + if (!is_cluster_tail(sector_num + remaining_sectors - 1) + !is_allocated(bs, sector_num + remaining_sectors - 1)) { + qemu_co_mutex_unlock(s-lock); + ret = copy_sectors(bs, sector_num + remaining_sectors, + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1); + qemu_co_mutex_lock(s-lock); + if (ret 0) { + goto fail; + } + } + + for (i = sector_num / SECTORS_PER_CLUSTER; + i = (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER; + i++) { + ret = add_cow_cache_get(bs, s-bitmap_cache, + i * SECTORS_PER_CLUSTER, (void **)table); + if (ret 0) { + return 0; return ret? + } + if ((table[i / 8] (1 (i % 8))) == 0) { + table[i / 8] |= (1 (i % 8)); + changed = true; + add_cow_cache_entry_mark_dirty(s-bitmap_cache, table); + } + + } + ret = 0; +fail: + if (changed) { + ret = add_cow_cache_flush(bs, s-bitmap_cache); + } + qemu_co_mutex_unlock(s-lock); + qemu_iovec_destroy(hd_qiov); + return ret; +} + +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size) +{ + BDRVAddCowState *s = bs-opaque; + int sector_per_byte = SECTORS_PER_CLUSTER * 8; + int ret; + int64_t old_image_sector = s-image_hd-total_sectors; + int64_t bitmap_size = + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte; + + ret =
Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote: + image_sectors = image_bs-total_sectors; Please use bdrv_getlength() instead of accessing total_sectors directly. + image_drv = bdrv_find_format(raw); + ret = bdrv_open(s-image_hd, image_filename, flags, image_drv); + if (ret 0) { + bdrv_delete(s-image_hd); + goto fail; + } + bs-total_sectors = s-image_hd-total_sectors; bdrv_getlength() Okay. +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs, + int64_t sector_num, int nb_sectors, int *num_same) +{ + int changed; + int64_t cluster_num; + + if (nb_sectors == 0) { + *num_same = 0; + return 0; + } + + cluster_num = sector_num / SECTORS_PER_CLUSTER; + changed = is_allocated(bs, sector_num); Do we need to hold the lock before (indirectly) accessing the cache? + *num_same = + MIN(nb_sectors, (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num); + + for (cluster_num = sector_num / SECTORS_PER_CLUSTER + 1; + cluster_num = (sector_num + nb_sectors - 1) / SECTORS_PER_CLUSTER; + cluster_num++) { + if (is_allocated(bs, cluster_num * SECTORS_PER_CLUSTER) != changed) { + break; + } + *num_same = MIN(nb_sectors, + (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num); + } I think this makes sense but it would be easier to use a loop counter that is in sector units instead of clusters. Then you can calculate *num_name after the loop simply by subtracting the starting sector_num from the final cur_sector value. And it saves you from constantly converting back and forth between clusters and sectors. +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs, + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov) +{ + BDRVAddCowState *s = bs-opaque; + int ret = 0, i; + QEMUIOVector hd_qiov; + uint8_t *table; + bool changed = false; + + qemu_co_mutex_lock(s-lock); + qemu_iovec_init(hd_qiov, qiov-niov); + ret = bdrv_co_writev(s-image_hd, + sector_num, + remaining_sectors, qiov); We don't need to lock across all writes. lock() if write requires allocation: ...do allocation stuff under lock... unlock() write data Internal data structure (cache, metadata, and copying unmodified sectors) access needs to be locked during allocating writes. The guest's data can be written without the lock held. This means that most writes will only lock for a short time to check that the clusters are already allocated. Then they will be able to write in parallel. If any cluster is not yet allocated then the allocation needs to happen under lock. This ensures that we don't copy backing file sectors while processing another write request that touches those sectors. + + if (ret 0) { + goto fail; + } + /* copy content of unmodified sectors */ + if (!is_cluster_head(sector_num) !is_allocated(bs, sector_num)) { + qemu_co_mutex_unlock(s-lock); + ret = copy_sectors(bs, sector_num ~(SECTORS_PER_CLUSTER - 1), + sector_num); As mentioned above, I think we need to lock during copy_sectors() so that other requests cannot race with this. (The guest's writes must always take priority over copying unmodified sectors.) + qemu_co_mutex_lock(s-lock); + if (ret 0) { + goto fail; + } + } + + if (!is_cluster_tail(sector_num + remaining_sectors - 1) + !is_allocated(bs, sector_num + remaining_sectors - 1)) { + qemu_co_mutex_unlock(s-lock); + ret = copy_sectors(bs, sector_num + remaining_sectors, + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1); + qemu_co_mutex_lock(s-lock); + if (ret 0) { + goto fail; + } + } + + for (i = sector_num / SECTORS_PER_CLUSTER; + i = (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER; + i++) { + ret = add_cow_cache_get(bs, s-bitmap_cache, + i * SECTORS_PER_CLUSTER, (void **)table); + if (ret 0) { + return 0; return ret? Yes.. + } + if ((table[i / 8] (1 (i % 8))) == 0) { + table[i / 8] |= (1 (i % 8)); + changed = true; + add_cow_cache_entry_mark_dirty(s-bitmap_cache, table); + } + + } + ret = 0; +fail: + if (changed) { + ret = add_cow_cache_flush(bs, s-bitmap_cache); + } + qemu_co_mutex_unlock(s-lock); + qemu_iovec_destroy(hd_qiov); + return ret; +} + +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size) +{ + BDRVAddCowState *s = bs-opaque; + int sector_per_byte = SECTORS_PER_CLUSTER * 8; + int ret; + int64_t old_image_sector = s-image_hd-total_sectors; + int64_t
Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format
On 05/08/2012 01:34 AM, Dong Xu Wang wrote: Provide a new file format: add-cow. The usage can be found in add-cow.txt of this patch. CC: Kevin Wolfkw...@redhat.com CC: Stefan Hajnoczistefa...@linux.vnet.ibm.com Signed-off-by: Dong Xu Wangwdon...@linux.vnet.ibm.com You should split out the spec to be the first patch. That makes it easier for people to review the specification independent of the code and also makes subsequent code review easier. --- Makefile.objs |1 + block.c|2 +- block.h|1 + block/add-cow-cache.c | 193 + block/add-cow.c| 446 block/add-cow.h| 83 + block_int.h|1 + docs/specs/add-cow.txt | 68 qemu-img.c | 39 + 9 files changed, 833 insertions(+), 1 deletion(-) create mode 100644 block/add-cow-cache.c create mode 100644 block/add-cow.c create mode 100644 block/add-cow.h create mode 100644 docs/specs/add-cow.txt diff --git a/Makefile.objs b/Makefile.objs index 70c5c79..10c5c52 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o +block-nested-y += add-cow.o add-cow-cache.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-nested-y += stream.o block-nested-$(CONFIG_WIN32) += raw-win32.o diff --git a/block.c b/block.c index 43c794c..206860c 100644 --- a/block.c +++ b/block.c @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs, } /* check if the path starts with protocol: */ -static int path_has_protocol(const char *path) +int path_has_protocol(const char *path) { #ifdef _WIN32 if (is_windows_drive(path) || diff --git a/block.h b/block.h index f163e54..f74c79e 100644 --- a/block.h +++ b/block.h @@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn); char *get_human_readable_size(char *buf, int buf_size, int64_t size); int path_is_absolute(const char *path); +int path_has_protocol(const char *path); void path_combine(char *dest, int dest_size, const char *base_path, const char *filename); diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c new file mode 100644 index 000..3ae23c1 --- /dev/null +++ b/block/add-cow-cache.c @@ -0,0 +1,193 @@ +/* + * Cache For QEMU ADD-COW Disk Format + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ This is not a valid copyright block. Look at hw/virtio.c for an example. If you want to do LGPL instead of GPL, it should also be 2.1 or later. +#include block_int.h +#include qemu-common.h +#include add-cow.h + +/* Based on qcow2-cache.c */ If the code is based on qcow2, then you need to preserve the qcow2 copyrights in this file. +AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables, +bool writethrough) +{ +AddCowCache *c; +int i; + +c = g_malloc0(sizeof(*c)); +c-size = num_tables; +c-entries = g_malloc0(sizeof(*c-entries) * num_tables); +c-writethrough = writethrough; +c-entry_size = ADD_COW_CACHE_ENTRY_SIZE; + +for (i = 0; i c-size; i++) { +c-entries[i].table = qemu_blockalign(bs, c-entry_size); +c-entries[i].offset = -1; +} + +return c; +} + +int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c) +{ +int i; + +for (i = 0; i c-size; i++) { +qemu_vfree(c-entries[i].table); +} + +g_free(c-entries); +g_free(c); + +return 0; +} + +static int add_cow_cache_entry_flush(BlockDriverState *bs, +AddCowCache *c, +int i) +{ +BDRVAddCowState *s = bs-opaque; +int ret = 0; + +if (!c-entries[i].dirty || -1 == c-entries[i].offset) { +return ret; +} + +ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset, +c-entries[i].table, +MIN(c-entry_size, s-bitmap_size - c-entries[i].offset)); This is a synchronous I/O function. We shouldn't introduce new formats that have *any* synchronous I/O in them. Honestly, I think using coroutines for this format is a mistake. It's simple enough that doing a state machine would be straight forward enough. +if (ret 0) { +return ret; +} + +c-entries[i].dirty = false; + +return 0; +} + +int add_cow_cache_flush(BlockDriverState *bs, AddCowCache *c) +{ +BDRVAddCowState *s = bs-opaque; +int result = 0; +int ret; +int i; + +ret = bdrv_flush(s-image_hd); +if (ret 0) { +return result; +} + +for (i = 0; i
[Qemu-devel] [PATCH 1/3 v9] add-cow file format
Provide a new file format: add-cow. The usage can be found in add-cow.txt of this patch. CC: Kevin Wolf kw...@redhat.com CC: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com --- Makefile.objs |1 + block.c|2 +- block.h|1 + block/add-cow-cache.c | 193 + block/add-cow.c| 446 block/add-cow.h| 83 + block_int.h|1 + docs/specs/add-cow.txt | 68 qemu-img.c | 39 + 9 files changed, 833 insertions(+), 1 deletion(-) create mode 100644 block/add-cow-cache.c create mode 100644 block/add-cow.c create mode 100644 block/add-cow.h create mode 100644 docs/specs/add-cow.txt diff --git a/Makefile.objs b/Makefile.objs index 70c5c79..10c5c52 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o +block-nested-y += add-cow.o add-cow-cache.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o block-nested-y += stream.o block-nested-$(CONFIG_WIN32) += raw-win32.o diff --git a/block.c b/block.c index 43c794c..206860c 100644 --- a/block.c +++ b/block.c @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs, } /* check if the path starts with protocol: */ -static int path_has_protocol(const char *path) +int path_has_protocol(const char *path) { #ifdef _WIN32 if (is_windows_drive(path) || diff --git a/block.h b/block.h index f163e54..f74c79e 100644 --- a/block.h +++ b/block.h @@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn); char *get_human_readable_size(char *buf, int buf_size, int64_t size); int path_is_absolute(const char *path); +int path_has_protocol(const char *path); void path_combine(char *dest, int dest_size, const char *base_path, const char *filename); diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c new file mode 100644 index 000..3ae23c1 --- /dev/null +++ b/block/add-cow-cache.c @@ -0,0 +1,193 @@ +/* + * Cache For QEMU ADD-COW Disk Format + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#include block_int.h +#include qemu-common.h +#include add-cow.h + +/* Based on qcow2-cache.c */ +AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables, +bool writethrough) +{ +AddCowCache *c; +int i; + +c = g_malloc0(sizeof(*c)); +c-size = num_tables; +c-entries = g_malloc0(sizeof(*c-entries) * num_tables); +c-writethrough = writethrough; +c-entry_size = ADD_COW_CACHE_ENTRY_SIZE; + +for (i = 0; i c-size; i++) { +c-entries[i].table = qemu_blockalign(bs, c-entry_size); +c-entries[i].offset = -1; +} + +return c; +} + +int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c) +{ +int i; + +for (i = 0; i c-size; i++) { +qemu_vfree(c-entries[i].table); +} + +g_free(c-entries); +g_free(c); + +return 0; +} + +static int add_cow_cache_entry_flush(BlockDriverState *bs, +AddCowCache *c, +int i) +{ +BDRVAddCowState *s = bs-opaque; +int ret = 0; + +if (!c-entries[i].dirty || -1 == c-entries[i].offset) { +return ret; +} + +ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset, +c-entries[i].table, +MIN(c-entry_size, s-bitmap_size - c-entries[i].offset)); +if (ret 0) { +return ret; +} + +c-entries[i].dirty = false; + +return 0; +} + +int add_cow_cache_flush(BlockDriverState *bs, AddCowCache *c) +{ +BDRVAddCowState *s = bs-opaque; +int result = 0; +int ret; +int i; + +ret = bdrv_flush(s-image_hd); +if (ret 0) { +return result; +} + +for (i = 0; i c-size; i++) { +ret = add_cow_cache_entry_flush(bs, c, i); +if (ret 0 result != -ENOSPC) { +result = ret; +} +} + +if (result == 0) { +ret = bdrv_flush(bs-file); +if (ret 0) { +result = ret; +} +} + +return result; +} + +static int add_cow_cache_find_entry_to_replace(AddCowCache *c) +{ +int i; +int min_count = INT_MAX; +int min_index = -1; + + +for (i = 0; i c-size; i++) { +if (c-entries[i].cache_hits min_count) { +min_index = i; +min_count = c-entries[i].cache_hits; +} + +c-entries[i].cache_hits /= 2; +} + +if (min_index == -1) { +abort(); +} +return min_index; +} + +static int