On Mon, Jan 24, 2011 at 3:36 PM, Kevin Wolf <kw...@redhat.com> wrote: > Am 24.01.2011 16:26, schrieb Stefan Hajnoczi: >> On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <kw...@redhat.com> wrote: >>> [ Re-adding qemu-devel to CC ] >>> >>> Am 24.01.2011 15:34, schrieb Stefan Hajnoczi: >>>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <kw...@redhat.com> wrote: >>>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState >>>>> *bs, QCowL2Meta *m) >>>>> >>>>> if (m->nb_available & (s->cluster_sectors - 1)) { >>>>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - >>>>> 1); >>>>> + cow = true; >>>>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << >>>>> 9), >>>>> m->nb_available - end, s->cluster_sectors); >>>>> if (ret < 0) >>>>> goto err; >>>>> } >>>>> >>>>> - /* update L2 table */ >>>>> + /* >>>>> + * Update L2 table. >>>>> + * >>>>> + * Before we update the L2 table to actually point to the new >>>>> cluster, we >>>>> + * need to be sure that the refcounts have been increased and COW was >>>>> + * handled. >>>>> + */ >>>>> + if (cow) { >>>>> + bdrv_flush(bs->file); >>>> >>>> Just bdrv_flush(bs->file) and not a refcounts cache flush? >>> >>> Refcounts and data need not to be ordered against each other. They only >>> must both be on disk when we write the L2 table. >> >> Have I missed where refcounts actually get flushed from the cache out >> to the disk because bdrv_flush(bs->file) only syncs the file but >> doesn't write out dirty data held in cache. > > The bdrv_flush isn't supposed to flush the refcounts, but the data > copied during COW (what you call pre/postfill in QED) > > The refcounts are handled by the qcow2_cache_set_dependency below, so > that before writing the L2 tables we always write the refcounts first. > >>>>> + } >>>>> + >>>>> + qcow2_cache_set_dependency(bs, s->l2_table_cache, >>>>> s->refcount_block_cache); >>>>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, >>>>> &l2_index); >>>>> if (ret < 0) { >>>>> goto err; >>>>> } >>>>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table); >>>>> >>>>> for (i = 0; i < m->nb_clusters; i++) { >>>>> /* if two concurrent writes happen to the same unallocated cluster >>>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState >>>>> *bs, QCowL2Meta *m) >>>>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED); >>>>> } >>>>> >>>>> - /* >>>>> - * Before we update the L2 table to actually point to the new >>>>> cluster, we >>>>> - * need to be sure that the refcounts have been increased and COW was >>>>> - * handled. >>>>> - */ >>>>> - bdrv_flush(bs->file); >>>>> >>>>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, >>>>> m->nb_clusters); >>>>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table); >>>>> if (ret < 0) { >>>>> - qcow2_l2_cache_reset(bs); >>>>> goto err; >>>>> } >>>>> >>>> >>>> The function continues like this: >>>> >>>> /* >>>> * If this was a COW, we need to decrease the refcount of the old cluster. >>>> * Also flush bs->file to get the right order for L2 and refcount update. >>>> */ >>>> if (j != 0) { >>>> bdrv_flush(bs->file); >>>> for (i = 0; i < j; i++) { >>>> qcow2_free_any_clusters(bs, >>>> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1); >>>> } >>>> } >>>> >>>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount >>>> update"? We've just put an L2 table, should this be an L2 table >>>> flush? >>> >>> I agree, this looks wrong. What we need is a dependency to ensure that >>> first L2 is flushed and then the refcount block. >>> qcow2_free_any_clusters() calls update_refcount() indirectly, which >>> takes care of setting this dependency. >>> >>> So in the end I think it's just an unnecessary bdrv_flush. Makes sense? >> >> I don't understand this fully. I've noticed that the cache isn't the >> only mechanism for making changes to tables - there are functions like >> write_l2_entries() that directly write out parts of tables without >> honoring dependencies or using the dirty bit on the cache entry. I >> probably need to look at this more carefully to fully understand >> whether or not it is correct. > > No, the cache should be the only mechanism that is used for accessing L2 > tables and refcount blocks. write_l2_entries() is an old function that > is removed by the patch. > > Direct accesses should only be left for L1 tables and refcount tables.
Agreed. I was switching between branches and had looked at a non-qcowcache world! Stefan