Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-06-04 Thread Dong Xu Wang
Okay, thanks all of your comments, if no other comments, I will write
next version.


On Wed, May 30, 2012 at 4:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Wed, May 30, 2012 at 2:50 AM, Dong Xu Wang
 wdon...@linux.vnet.ibm.com wrote:
 On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

 I thought a bit more about locking.  Because the metadata is simple
 not much locking is necessary except when fetching new bitmap clusters
 from the image file into the cache and when populating untouched
 sectors during data cluster allocation.  Those are the two cases where
 parallel requests could put the block driver or image file into a bad
 state if allowed to run without any locking.

 Another way of describing the consequences of parallelism:
 1. Coroutines must not duplicate the same add-cow bitmap cluster into
 the cache if they run at the same time.
 2. Coroutines must not hold bitmap tables across blocking operations
 since the cache entry has no reference count and might be evicted from
 the cache.
 3. Coroutines must not allocate the same data cluster simultaneously
 because untouched head/tail sectors must never race with guest writes.

 +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
 +    int ret;
 +    int64_t old_image_sector = s-image_hd-total_sectors;
 +    int64_t bitmap_size =
 +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
 +
 +    ret = bdrv_truncate(bs-file,
 +        sizeof(AddCowHeader) + bitmap_size);
 +    if (ret  0) {
 +        bdrv_truncate(s-image_hd, old_image_sector * BDRV_SECTOR_SIZE);

 Why truncate image_hd on failure?  We never touch the image_hd size on
 success either.  I think we can just leave it alone.

 That means whether we truncate add-cow fails or not ,we should not never 
 touch
 image_hd size?

 I thought about this more and I think we should truncate image_hd in
 the success case only.  In order to resize the image we need to resize
 the cow bitmap and then resize image_hd.  If resizing the add-cow file
 failed, then we haven't changed the cow bitmap and we don't need to
 truncate image_hd.  Do you agree with this or have I missed something?

 @@ -828,6 +832,41 @@ static int img_convert(int argc, char **argv)
     }

     /* Create the new image */
 +
 +    if (0 == strcmp(out_fmt, add-cow)) {
 +        image_drv = bdrv_find_format(raw);
 +        if (!drv) {
 +            ret = -1;
 +            goto out;
 +        }
 +        snprintf(image_filename, sizeof(image_filename),
 +            %s.ct.raw, out_filename);
 +        ret = bdrv_create(image_drv, image_filename, image_param);
 +        if (ret  0) {
 +            error_report(%s: error while creating image_file: %s,
 +                     image_filename, strerror(-ret));
 +            goto out;
 +        }
 +        set_option_parameter(param, BLOCK_OPT_IMAGE_FILE, image_filename);
 +
 +        if (!out_baseimg) {
 +            backing_drv = bdrv_find_format(qcow2);
 +            if (!drv) {
 +                ret = -1;
 +                goto out;
 +            }
 +            snprintf(backing_filename, sizeof(backing_filename),
 +                %s.ct.qcow2, out_filename);
 +            ret = bdrv_create(backing_drv, backing_filename, image_param);
 +            if (ret  0) {
 +                error_report(%s: error while creating backing_file: %s,
 +                         backing_filename, strerror(-ret));
 +                goto out;
 +            }
 +            set_option_parameter(param, BLOCK_OPT_BACKING_FILE,
 +                backing_filename);
 +        }
 +    }

 If this diff hunk is dropped then the user needs to manually create
 the raw file before running qemu-img convert?

 qemu-img convert -O add-cow seems like a very rare case.  I'm not sure
 we should add special user-friend hacks for this.

 I'm not sure I understand why you create a qcow2 file either.

 Yes, if we use qemu-img convert -O add-cow, we should create 2 other files,
 raw file and qcow2(I just picked  up qcow2, other formats is also okay) file,
 as image_file and backing_file, without the two files, .add-cow file can not
 work properly.

 Although it will occour in very rare cases, I wish to pass all qemu-iotests
 cases, so I added these code.

 Do you think these are not necessary? And some qemu-iotests cases are
 using convert operation, If I do not write previous code, these cases will
 fail. Can I let these cases do not support add-cow?

 If a test uses qemu-img convert then it's probably not that
 interesting for add-cow.  Converting is not a useful operation because
 add-cow is an add-on block driver that adds a feature on top of raw,
 rather than a format like vmdk or qcow2 which is used to share disk
 images.  I see why you did this to make qemu-iotests work, but
 personally I would drop this special case code and skip those tests.

 Stefan




Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-30 Thread Kevin Wolf
Am 30.05.2012 04:10, schrieb Anthony Liguori:
 On 05/08/2012 01:34 AM, Dong Xu Wang wrote:
 Provide a new file format: add-cow. The usage can be found in add-cow.txt of
 this patch.

 CC: Kevin Wolfkw...@redhat.com
 CC: Stefan Hajnoczistefa...@linux.vnet.ibm.com
 Signed-off-by: Dong Xu Wangwdon...@linux.vnet.ibm.com
 
 You should split out the spec to be the first patch.  That makes it easier 
 for 
 people to review the specification independent of the code and also makes 
 subsequent code review easier.

Yes, please.

 ---
   Makefile.objs  |1 +
   block.c|2 +-
   block.h|1 +
   block/add-cow-cache.c  |  193 +
   block/add-cow.c|  446 
 
   block/add-cow.h|   83 +
   block_int.h|1 +
   docs/specs/add-cow.txt |   68 
   qemu-img.c |   39 +
   9 files changed, 833 insertions(+), 1 deletion(-)
   create mode 100644 block/add-cow-cache.c
   create mode 100644 block/add-cow.c
   create mode 100644 block/add-cow.h
   create mode 100644 docs/specs/add-cow.txt

 diff --git a/Makefile.objs b/Makefile.objs
 index 70c5c79..10c5c52 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
 dmg.o bochs.o vpc.o vv
   block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
 qcow2-snapshot.o qcow2-cache.o
   block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o 
 qed-cluster.o
   block-nested-y += qed-check.o
 +block-nested-y += add-cow.o add-cow-cache.o
   block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
   block-nested-y += stream.o
   block-nested-$(CONFIG_WIN32) += raw-win32.o
 diff --git a/block.c b/block.c
 index 43c794c..206860c 100644
 --- a/block.c
 +++ b/block.c
 @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState 
 *bs,
   }

   /* check if the path starts with protocol: */
 -static int path_has_protocol(const char *path)
 +int path_has_protocol(const char *path)
   {
   #ifdef _WIN32
   if (is_windows_drive(path) ||
 diff --git a/block.h b/block.h
 index f163e54..f74c79e 100644
 --- a/block.h
 +++ b/block.h
 @@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
 QEMUSnapshotInfo *sn);

   char *get_human_readable_size(char *buf, int buf_size, int64_t size);
   int path_is_absolute(const char *path);
 +int path_has_protocol(const char *path);
   void path_combine(char *dest, int dest_size,
 const char *base_path,
 const char *filename);
 diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c
 new file mode 100644
 index 000..3ae23c1
 --- /dev/null
 +++ b/block/add-cow-cache.c
 @@ -0,0 +1,193 @@
 +/*
 + * Cache For QEMU ADD-COW Disk Format
 + *
 + * This work is licensed under the terms of the GNU LGPL, version 2 or 
 later.
 + * See the COPYING.LIB file in the top-level directory.
 + *
 + */

 
 This is not a valid copyright block.  Look at hw/virtio.c for an example.  If 
 you want to do LGPL instead of GPL, it should also be 2.1 or later.

You mean because the name of the copyright holder is missing? Would be
nice to include indeed, but I don't think it makes a legal difference.
And the vast majority of source files doesn't have the names of all
copyright holders there anyway.

 +#include block_int.h
 +#include qemu-common.h
 +#include add-cow.h
 +
 +/* Based on qcow2-cache.c */
 
 If the code is based on qcow2, then you need to preserve the qcow2 copyrights 
 in 
 this file.

Or ideally we would generalise qcow2-cache.c so that it can be used by
both. Not sure how easy it would be, though.

 +AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables,
 +bool writethrough)
 +{
 +AddCowCache *c;
 +int i;
 +
 +c = g_malloc0(sizeof(*c));
 +c-size = num_tables;
 +c-entries = g_malloc0(sizeof(*c-entries) * num_tables);
 +c-writethrough = writethrough;
 +c-entry_size = ADD_COW_CACHE_ENTRY_SIZE;
 +
 +for (i = 0; i  c-size; i++) {
 +c-entries[i].table = qemu_blockalign(bs, c-entry_size);
 +c-entries[i].offset = -1;
 +}
 +
 +return c;
 +}
 +
 +int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c)
 +{
 +int i;
 +
 +for (i = 0; i  c-size; i++) {
 +qemu_vfree(c-entries[i].table);
 +}
 +
 +g_free(c-entries);
 +g_free(c);
 +
 +return 0;
 +}
 +
 +static int add_cow_cache_entry_flush(BlockDriverState *bs,
 +AddCowCache *c,
 +int i)
 +{
 +BDRVAddCowState *s = bs-opaque;
 +int ret = 0;
 +
 +if (!c-entries[i].dirty || -1 == c-entries[i].offset) {
 +return ret;
 +}
 +
 +ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset,
 +c-entries[i].table,
 +MIN(c-entry_size, s-bitmap_size - c-entries[i].offset));
 
 This is a synchronous I/O function.  We shouldn't introduce new formats 

Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-30 Thread Stefan Hajnoczi
On Wed, May 30, 2012 at 2:50 AM, Dong Xu Wang
wdon...@linux.vnet.ibm.com wrote:
 On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

I thought a bit more about locking.  Because the metadata is simple
not much locking is necessary except when fetching new bitmap clusters
from the image file into the cache and when populating untouched
sectors during data cluster allocation.  Those are the two cases where
parallel requests could put the block driver or image file into a bad
state if allowed to run without any locking.

Another way of describing the consequences of parallelism:
1. Coroutines must not duplicate the same add-cow bitmap cluster into
the cache if they run at the same time.
2. Coroutines must not hold bitmap tables across blocking operations
since the cache entry has no reference count and might be evicted from
the cache.
3. Coroutines must not allocate the same data cluster simultaneously
because untouched head/tail sectors must never race with guest writes.

 +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
 +    int ret;
 +    int64_t old_image_sector = s-image_hd-total_sectors;
 +    int64_t bitmap_size =
 +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
 +
 +    ret = bdrv_truncate(bs-file,
 +        sizeof(AddCowHeader) + bitmap_size);
 +    if (ret  0) {
 +        bdrv_truncate(s-image_hd, old_image_sector * BDRV_SECTOR_SIZE);

 Why truncate image_hd on failure?  We never touch the image_hd size on
 success either.  I think we can just leave it alone.

 That means whether we truncate add-cow fails or not ,we should not never touch
 image_hd size?

I thought about this more and I think we should truncate image_hd in
the success case only.  In order to resize the image we need to resize
the cow bitmap and then resize image_hd.  If resizing the add-cow file
failed, then we haven't changed the cow bitmap and we don't need to
truncate image_hd.  Do you agree with this or have I missed something?

 @@ -828,6 +832,41 @@ static int img_convert(int argc, char **argv)
     }

     /* Create the new image */
 +
 +    if (0 == strcmp(out_fmt, add-cow)) {
 +        image_drv = bdrv_find_format(raw);
 +        if (!drv) {
 +            ret = -1;
 +            goto out;
 +        }
 +        snprintf(image_filename, sizeof(image_filename),
 +            %s.ct.raw, out_filename);
 +        ret = bdrv_create(image_drv, image_filename, image_param);
 +        if (ret  0) {
 +            error_report(%s: error while creating image_file: %s,
 +                     image_filename, strerror(-ret));
 +            goto out;
 +        }
 +        set_option_parameter(param, BLOCK_OPT_IMAGE_FILE, image_filename);
 +
 +        if (!out_baseimg) {
 +            backing_drv = bdrv_find_format(qcow2);
 +            if (!drv) {
 +                ret = -1;
 +                goto out;
 +            }
 +            snprintf(backing_filename, sizeof(backing_filename),
 +                %s.ct.qcow2, out_filename);
 +            ret = bdrv_create(backing_drv, backing_filename, image_param);
 +            if (ret  0) {
 +                error_report(%s: error while creating backing_file: %s,
 +                         backing_filename, strerror(-ret));
 +                goto out;
 +            }
 +            set_option_parameter(param, BLOCK_OPT_BACKING_FILE,
 +                backing_filename);
 +        }
 +    }

 If this diff hunk is dropped then the user needs to manually create
 the raw file before running qemu-img convert?

 qemu-img convert -O add-cow seems like a very rare case.  I'm not sure
 we should add special user-friend hacks for this.

 I'm not sure I understand why you create a qcow2 file either.

 Yes, if we use qemu-img convert -O add-cow, we should create 2 other files,
 raw file and qcow2(I just picked  up qcow2, other formats is also okay) file,
 as image_file and backing_file, without the two files, .add-cow file can not
 work properly.

 Although it will occour in very rare cases, I wish to pass all qemu-iotests
 cases, so I added these code.

 Do you think these are not necessary? And some qemu-iotests cases are
 using convert operation, If I do not write previous code, these cases will
 fail. Can I let these cases do not support add-cow?

If a test uses qemu-img convert then it's probably not that
interesting for add-cow.  Converting is not a useful operation because
add-cow is an add-on block driver that adds a feature on top of raw,
rather than a format like vmdk or qcow2 which is used to share disk
images.  I see why you did this to make qemu-iotests work, but
personally I would drop this special case code and skip those tests.

Stefan



Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-29 Thread Stefan Hajnoczi
 +    image_sectors = image_bs-total_sectors;

Please use bdrv_getlength() instead of accessing total_sectors directly.

 +    image_drv = bdrv_find_format(raw);
 +    ret = bdrv_open(s-image_hd, image_filename, flags, image_drv);
 +    if (ret  0) {
 +        bdrv_delete(s-image_hd);
 +        goto fail;
 +    }
 +    bs-total_sectors = s-image_hd-total_sectors;

bdrv_getlength()

 +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
 +        int64_t sector_num, int nb_sectors, int *num_same)
 +{
 +    int changed;
 +    int64_t cluster_num;
 +
 +    if (nb_sectors == 0) {
 +        *num_same = 0;
 +        return 0;
 +    }
 +
 +    cluster_num = sector_num / SECTORS_PER_CLUSTER;
 +    changed = is_allocated(bs, sector_num);

Do we need to hold the lock before (indirectly) accessing the cache?

 +    *num_same =
 +        MIN(nb_sectors, (cluster_num + 1) * SECTORS_PER_CLUSTER - 
 sector_num);
 +
 +    for (cluster_num = sector_num / SECTORS_PER_CLUSTER + 1;
 +        cluster_num = (sector_num + nb_sectors - 1) / SECTORS_PER_CLUSTER;
 +        cluster_num++) {
 +        if (is_allocated(bs, cluster_num * SECTORS_PER_CLUSTER) != changed) {
 +            break;
 +        }
 +        *num_same = MIN(nb_sectors,
 +            (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num);
 +    }

I think this makes sense but it would be easier to use a loop counter
that is in sector units instead of clusters.  Then you can calculate
*num_name after the loop simply by subtracting the starting sector_num
from the final cur_sector value.  And it saves you from constantly
converting back and forth between clusters and sectors.

 +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
 +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int ret = 0, i;
 +    QEMUIOVector hd_qiov;
 +    uint8_t *table;
 +    bool changed = false;
 +
 +    qemu_co_mutex_lock(s-lock);
 +    qemu_iovec_init(hd_qiov, qiov-niov);
 +    ret = bdrv_co_writev(s-image_hd,
 +                     sector_num,
 +                     remaining_sectors, qiov);

We don't need to lock across all writes.

lock()
if write requires allocation:
...do allocation stuff under lock...
unlock()
write data

Internal data structure (cache, metadata, and copying unmodified
sectors) access needs to be locked during allocating writes.  The
guest's data can be written without the lock held.

This means that most writes will only lock for a short time to check
that the clusters are already allocated.  Then they will be able to
write in parallel.

If any cluster is not yet allocated then the allocation needs to
happen under lock.  This ensures that we don't copy backing file
sectors while processing another write request that touches those
sectors.

 +
 +    if (ret  0) {
 +        goto fail;
 +    }
 +    /* copy content of unmodified sectors */
 +    if (!is_cluster_head(sector_num)  !is_allocated(bs, sector_num)) {
 +        qemu_co_mutex_unlock(s-lock);
 +        ret = copy_sectors(bs, sector_num  ~(SECTORS_PER_CLUSTER - 1),
 +            sector_num);

As mentioned above, I think we need to lock during copy_sectors() so
that other requests cannot race with this.  (The guest's writes must
always take priority over copying unmodified sectors.)

 +        qemu_co_mutex_lock(s-lock);
 +        if (ret  0) {
 +            goto fail;
 +        }
 +    }
 +
 +    if (!is_cluster_tail(sector_num + remaining_sectors - 1)
 +         !is_allocated(bs, sector_num + remaining_sectors - 1)) {
 +        qemu_co_mutex_unlock(s-lock);
 +        ret = copy_sectors(bs, sector_num + remaining_sectors,
 +            ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 
 1);
 +        qemu_co_mutex_lock(s-lock);
 +        if (ret  0) {
 +            goto fail;
 +        }
 +    }
 +
 +    for (i = sector_num / SECTORS_PER_CLUSTER;
 +        i = (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
 +        i++) {
 +        ret = add_cow_cache_get(bs, s-bitmap_cache,
 +            i * SECTORS_PER_CLUSTER, (void **)table);
 +        if (ret  0) {
 +            return 0;

return ret?

 +        }
 +        if ((table[i / 8]  (1  (i % 8))) == 0) {
 +            table[i / 8] |= (1  (i % 8));
 +            changed = true;
 +            add_cow_cache_entry_mark_dirty(s-bitmap_cache, table);
 +        }
 +
 +    }
 +    ret = 0;
 +fail:
 +    if (changed) {
 +        ret = add_cow_cache_flush(bs, s-bitmap_cache);
 +    }
 +    qemu_co_mutex_unlock(s-lock);
 +    qemu_iovec_destroy(hd_qiov);
 +    return ret;
 +}
 +
 +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
 +    int ret;
 +    int64_t old_image_sector = s-image_hd-total_sectors;
 +    int64_t bitmap_size =
 +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
 +
 +    ret = 

Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-29 Thread Dong Xu Wang
On Tue, May 29, 2012 at 11:50 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 +    image_sectors = image_bs-total_sectors;

 Please use bdrv_getlength() instead of accessing total_sectors directly.

 +    image_drv = bdrv_find_format(raw);
 +    ret = bdrv_open(s-image_hd, image_filename, flags, image_drv);
 +    if (ret  0) {
 +        bdrv_delete(s-image_hd);
 +        goto fail;
 +    }
 +    bs-total_sectors = s-image_hd-total_sectors;

 bdrv_getlength()
Okay.

 +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
 +        int64_t sector_num, int nb_sectors, int *num_same)
 +{
 +    int changed;
 +    int64_t cluster_num;
 +
 +    if (nb_sectors == 0) {
 +        *num_same = 0;
 +        return 0;
 +    }
 +
 +    cluster_num = sector_num / SECTORS_PER_CLUSTER;
 +    changed = is_allocated(bs, sector_num);

 Do we need to hold the lock before (indirectly) accessing the cache?

 +    *num_same =
 +        MIN(nb_sectors, (cluster_num + 1) * SECTORS_PER_CLUSTER - 
 sector_num);
 +
 +    for (cluster_num = sector_num / SECTORS_PER_CLUSTER + 1;
 +        cluster_num = (sector_num + nb_sectors - 1) / SECTORS_PER_CLUSTER;
 +        cluster_num++) {
 +        if (is_allocated(bs, cluster_num * SECTORS_PER_CLUSTER) != changed) 
 {
 +            break;
 +        }
 +        *num_same = MIN(nb_sectors,
 +            (cluster_num + 1) * SECTORS_PER_CLUSTER - sector_num);
 +    }

 I think this makes sense but it would be easier to use a loop counter
 that is in sector units instead of clusters.  Then you can calculate
 *num_name after the loop simply by subtracting the starting sector_num
 from the final cur_sector value.  And it saves you from constantly
 converting back and forth between clusters and sectors.

 +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
 +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int ret = 0, i;
 +    QEMUIOVector hd_qiov;
 +    uint8_t *table;
 +    bool changed = false;
 +
 +    qemu_co_mutex_lock(s-lock);
 +    qemu_iovec_init(hd_qiov, qiov-niov);
 +    ret = bdrv_co_writev(s-image_hd,
 +                     sector_num,
 +                     remaining_sectors, qiov);

 We don't need to lock across all writes.

 lock()
 if write requires allocation:
    ...do allocation stuff under lock...
 unlock()
 write data

 Internal data structure (cache, metadata, and copying unmodified
 sectors) access needs to be locked during allocating writes.  The
 guest's data can be written without the lock held.

 This means that most writes will only lock for a short time to check
 that the clusters are already allocated.  Then they will be able to
 write in parallel.

 If any cluster is not yet allocated then the allocation needs to
 happen under lock.  This ensures that we don't copy backing file
 sectors while processing another write request that touches those
 sectors.

 +
 +    if (ret  0) {
 +        goto fail;
 +    }
 +    /* copy content of unmodified sectors */
 +    if (!is_cluster_head(sector_num)  !is_allocated(bs, sector_num)) {
 +        qemu_co_mutex_unlock(s-lock);
 +        ret = copy_sectors(bs, sector_num  ~(SECTORS_PER_CLUSTER - 1),
 +            sector_num);

 As mentioned above, I think we need to lock during copy_sectors() so
 that other requests cannot race with this.  (The guest's writes must
 always take priority over copying unmodified sectors.)

 +        qemu_co_mutex_lock(s-lock);
 +        if (ret  0) {
 +            goto fail;
 +        }
 +    }
 +
 +    if (!is_cluster_tail(sector_num + remaining_sectors - 1)
 +         !is_allocated(bs, sector_num + remaining_sectors - 1)) {
 +        qemu_co_mutex_unlock(s-lock);
 +        ret = copy_sectors(bs, sector_num + remaining_sectors,
 +            ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) 
 + 1);
 +        qemu_co_mutex_lock(s-lock);
 +        if (ret  0) {
 +            goto fail;
 +        }
 +    }
 +
 +    for (i = sector_num / SECTORS_PER_CLUSTER;
 +        i = (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
 +        i++) {
 +        ret = add_cow_cache_get(bs, s-bitmap_cache,
 +            i * SECTORS_PER_CLUSTER, (void **)table);
 +        if (ret  0) {
 +            return 0;

 return ret?
Yes..

 +        }
 +        if ((table[i / 8]  (1  (i % 8))) == 0) {
 +            table[i / 8] |= (1  (i % 8));
 +            changed = true;
 +            add_cow_cache_entry_mark_dirty(s-bitmap_cache, table);
 +        }
 +
 +    }
 +    ret = 0;
 +fail:
 +    if (changed) {
 +        ret = add_cow_cache_flush(bs, s-bitmap_cache);
 +    }
 +    qemu_co_mutex_unlock(s-lock);
 +    qemu_iovec_destroy(hd_qiov);
 +    return ret;
 +}
 +
 +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
 +{
 +    BDRVAddCowState *s = bs-opaque;
 +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
 +    int ret;
 +    int64_t old_image_sector = s-image_hd-total_sectors;
 +    int64_t 

Re: [Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-29 Thread Anthony Liguori

On 05/08/2012 01:34 AM, Dong Xu Wang wrote:

Provide a new file format: add-cow. The usage can be found in add-cow.txt of
this patch.

CC: Kevin Wolfkw...@redhat.com
CC: Stefan Hajnoczistefa...@linux.vnet.ibm.com
Signed-off-by: Dong Xu Wangwdon...@linux.vnet.ibm.com


You should split out the spec to be the first patch.  That makes it easier for 
people to review the specification independent of the code and also makes 
subsequent code review easier.



---
  Makefile.objs  |1 +
  block.c|2 +-
  block.h|1 +
  block/add-cow-cache.c  |  193 +
  block/add-cow.c|  446 
  block/add-cow.h|   83 +
  block_int.h|1 +
  docs/specs/add-cow.txt |   68 
  qemu-img.c |   39 +
  9 files changed, 833 insertions(+), 1 deletion(-)
  create mode 100644 block/add-cow-cache.c
  create mode 100644 block/add-cow.c
  create mode 100644 block/add-cow.h
  create mode 100644 docs/specs/add-cow.txt

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..10c5c52 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
dmg.o bochs.o vpc.o vv
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
  block-nested-y += qed-check.o
+block-nested-y += add-cow.o add-cow-cache.o
  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
  block-nested-y += stream.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block.c b/block.c
index 43c794c..206860c 100644
--- a/block.c
+++ b/block.c
@@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
  }

  /* check if the path starts with protocol: */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
  {
  #ifdef _WIN32
  if (is_windows_drive(path) ||
diff --git a/block.h b/block.h
index f163e54..f74c79e 100644
--- a/block.h
+++ b/block.h
@@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
QEMUSnapshotInfo *sn);

  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
  int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
  void path_combine(char *dest, int dest_size,
const char *base_path,
const char *filename);
diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c
new file mode 100644
index 000..3ae23c1
--- /dev/null
+++ b/block/add-cow-cache.c
@@ -0,0 +1,193 @@
+/*
+ * Cache For QEMU ADD-COW Disk Format
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */



This is not a valid copyright block.  Look at hw/virtio.c for an example.  If 
you want to do LGPL instead of GPL, it should also be 2.1 or later.



+#include block_int.h
+#include qemu-common.h
+#include add-cow.h
+
+/* Based on qcow2-cache.c */


If the code is based on qcow2, then you need to preserve the qcow2 copyrights in 
this file.



+AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables,
+bool writethrough)
+{
+AddCowCache *c;
+int i;
+
+c = g_malloc0(sizeof(*c));
+c-size = num_tables;
+c-entries = g_malloc0(sizeof(*c-entries) * num_tables);
+c-writethrough = writethrough;
+c-entry_size = ADD_COW_CACHE_ENTRY_SIZE;
+
+for (i = 0; i  c-size; i++) {
+c-entries[i].table = qemu_blockalign(bs, c-entry_size);
+c-entries[i].offset = -1;
+}
+
+return c;
+}
+
+int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c)
+{
+int i;
+
+for (i = 0; i  c-size; i++) {
+qemu_vfree(c-entries[i].table);
+}
+
+g_free(c-entries);
+g_free(c);
+
+return 0;
+}
+
+static int add_cow_cache_entry_flush(BlockDriverState *bs,
+AddCowCache *c,
+int i)
+{
+BDRVAddCowState *s = bs-opaque;
+int ret = 0;
+
+if (!c-entries[i].dirty || -1 == c-entries[i].offset) {
+return ret;
+}
+
+ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset,
+c-entries[i].table,
+MIN(c-entry_size, s-bitmap_size - c-entries[i].offset));


This is a synchronous I/O function.  We shouldn't introduce new formats that 
have *any* synchronous I/O in them.


Honestly, I think using coroutines for this format is a mistake.  It's simple 
enough that doing a state machine would be straight forward enough.



+if (ret  0) {
+return ret;
+}
+
+c-entries[i].dirty = false;
+
+return 0;
+}
+
+int add_cow_cache_flush(BlockDriverState *bs, AddCowCache *c)
+{
+BDRVAddCowState *s = bs-opaque;
+int result = 0;
+int ret;
+int i;
+
+ret = bdrv_flush(s-image_hd);
+if (ret  0) {
+return result;
+}
+
+for (i = 0; i  

[Qemu-devel] [PATCH 1/3 v9] add-cow file format

2012-05-07 Thread Dong Xu Wang
Provide a new file format: add-cow. The usage can be found in add-cow.txt of
this patch.

CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
---
 Makefile.objs  |1 +
 block.c|2 +-
 block.h|1 +
 block/add-cow-cache.c  |  193 +
 block/add-cow.c|  446 
 block/add-cow.h|   83 +
 block_int.h|1 +
 docs/specs/add-cow.txt |   68 
 qemu-img.c |   39 +
 9 files changed, 833 insertions(+), 1 deletion(-)
 create mode 100644 block/add-cow-cache.c
 create mode 100644 block/add-cow.c
 create mode 100644 block/add-cow.h
 create mode 100644 docs/specs/add-cow.txt

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..10c5c52 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -52,6 +52,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
+block-nested-y += add-cow.o add-cow-cache.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-y += stream.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block.c b/block.c
index 43c794c..206860c 100644
--- a/block.c
+++ b/block.c
@@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with protocol: */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 #ifdef _WIN32
 if (is_windows_drive(path) ||
diff --git a/block.h b/block.h
index f163e54..f74c79e 100644
--- a/block.h
+++ b/block.h
@@ -319,6 +319,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c
new file mode 100644
index 000..3ae23c1
--- /dev/null
+++ b/block/add-cow-cache.c
@@ -0,0 +1,193 @@
+/*
+ * Cache For QEMU ADD-COW Disk Format
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include block_int.h
+#include qemu-common.h
+#include add-cow.h
+
+/* Based on qcow2-cache.c */
+AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables,
+bool writethrough)
+{
+AddCowCache *c;
+int i;
+
+c = g_malloc0(sizeof(*c));
+c-size = num_tables;
+c-entries = g_malloc0(sizeof(*c-entries) * num_tables);
+c-writethrough = writethrough;
+c-entry_size = ADD_COW_CACHE_ENTRY_SIZE;
+
+for (i = 0; i  c-size; i++) {
+c-entries[i].table = qemu_blockalign(bs, c-entry_size);
+c-entries[i].offset = -1;
+}
+
+return c;
+}
+
+int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c)
+{
+int i;
+
+for (i = 0; i  c-size; i++) {
+qemu_vfree(c-entries[i].table);
+}
+
+g_free(c-entries);
+g_free(c);
+
+return 0;
+}
+
+static int add_cow_cache_entry_flush(BlockDriverState *bs,
+AddCowCache *c,
+int i)
+{
+BDRVAddCowState *s = bs-opaque;
+int ret = 0;
+
+if (!c-entries[i].dirty || -1 == c-entries[i].offset) {
+return ret;
+}
+
+ret = bdrv_pwrite(bs-file, sizeof(AddCowHeader) + c-entries[i].offset,
+c-entries[i].table,
+MIN(c-entry_size, s-bitmap_size - c-entries[i].offset));
+if (ret  0) {
+return ret;
+}
+
+c-entries[i].dirty = false;
+
+return 0;
+}
+
+int add_cow_cache_flush(BlockDriverState *bs, AddCowCache *c)
+{
+BDRVAddCowState *s = bs-opaque;
+int result = 0;
+int ret;
+int i;
+
+ret = bdrv_flush(s-image_hd);
+if (ret  0) {
+return result;
+}
+
+for (i = 0; i  c-size; i++) {
+ret = add_cow_cache_entry_flush(bs, c, i);
+if (ret  0  result != -ENOSPC) {
+result = ret;
+}
+}
+
+if (result == 0) {
+ret = bdrv_flush(bs-file);
+if (ret  0) {
+result = ret;
+}
+}
+
+return result;
+}
+
+static int add_cow_cache_find_entry_to_replace(AddCowCache *c)
+{
+int i;
+int min_count = INT_MAX;
+int min_index = -1;
+
+
+for (i = 0; i  c-size; i++) {
+if (c-entries[i].cache_hits  min_count) {
+min_index = i;
+min_count = c-entries[i].cache_hits;
+}
+
+c-entries[i].cache_hits /= 2;
+}
+
+if (min_index == -1) {
+abort();
+}
+return min_index;
+}
+
+static int