On Tue, May 19, 2026 at 5:49 PM Stefan Hajnoczi <[email protected]> wrote:
>
> On Mon, May 18, 2026 at 12:21:55AM +0200, Sam Li wrote:
> > On Thu, May 14, 2026 at 9:49 PM Stefan Hajnoczi <[email protected]> wrote:
> > > On Sun, May 10, 2026 at 07:50:57PM +0200, Sam Li wrote:
> > > > +          48 - 55:  zonedmeta_offset
> > > > +                    The offset of zoned metadata structure in the contained
> > > > +                    image, in bytes.
> > >
> > > Do you want to say anything about the order in which metadata is
> > > persisted to disk when zones are used? I guess the data is written into the
> > > image file first, then the non-zoned qcow2 L1/L2/refcount metadata is
> > > updated, and finally the write pointer is written. Write pointers are
> > > not guaranteed to be updated on disk until the write request followed by
> > > a flush request are both completed.
> >
> > The current ordering is not like that. The write pointer is written
> > persistently first, then the data writes and the non-zoned qcow2
> > L1/L2/refcount metadata updates. On IO failure, the corresponding
> > write pointer is re-read from disk. As noted in the previous comment,
> > the wp must be updated when issuing the IO, under the assumption that
> > the write IO will succeed.
> >
> > The ordering has been settled this way since v7 to deal with
> > concurrent zone append writes. If the wp was only updated after data
> > I/O, two concurrent appends would both have read the same wp and tried
> > to write to the same position.
> >
> > >
> > > (The idea is that the data must be visible in the qcow2 file before it
> > > is safe to update the write pointer. Otherwise a power failure would
> > > leave the file in an inconsistent state where the write pointer has
> > > advanced but the data was not written.)
> >
> > The crash-consistency is a concern...
>
> Yes, I'm thinking about crash-consistency. The ordering you described
> can result in qcow2 images where the write pointer is ahead of the
> actually written data after a power failure or maybe a QEMU crash.
>
> QEMU's block layer must follow the same data integrity behavior that
> real devices guarantee.

I may have found a solution that handles both cases. The fix is to
update the wp in memory only, instead of flushing it to disk before the
qcow2 metadata and data writes. The zone append write path would become:

On submission:
1) wp_lock()
2) Check write alignment
3) wp_update (in memory)
4) wp_unlock()
5) Issue write

And on completion:
1) If no error: wp_flush with locks and return success
2) else, wp_lock()
3) read_wp (from disk) and use the read wp value as the current wp
4) wp_unlock()
5) return IO error

Sam

>
> Damien: Do real zoned block devices guarantee that the updated write
> pointer is persisted only after appended data has been persisted?
>
> Stefan
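
To make the submission and completion steps proposed above easier to
review, here is a rough standalone sketch in C. It is not code from the
patch series: the Zone struct, zone_init(), zone_append_submit(),
zone_append_complete() and the 512-byte SECTOR_SIZE are all hypothetical
names and values chosen for illustration. A pthread mutex stands in for
the wp lock, and the wp_disk field only models the persisted write
pointer. It does not model real I/O or the qcow2 metadata updates.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed write granularity, not from the patch */

/* Hypothetical per-zone state; not the structures used in the series. */
typedef struct Zone {
    pthread_mutex_t lock; /* stands in for the wp lock */
    uint64_t start;       /* first byte of the zone */
    uint64_t capacity;    /* writable bytes in the zone */
    uint64_t wp_mem;      /* in-memory write pointer */
    uint64_t wp_disk;     /* models the write pointer persisted on disk */
} Zone;

void zone_init(Zone *z, uint64_t start, uint64_t capacity)
{
    assert(z != NULL);
    pthread_mutex_init(&z->lock, NULL);
    z->start = start;
    z->capacity = capacity;
    z->wp_mem = start;
    z->wp_disk = start;
}

void wp_lock(Zone *z)   { pthread_mutex_lock(&z->lock); }
void wp_unlock(Zone *z) { pthread_mutex_unlock(&z->lock); }

/*
 * Submission: check alignment and space, then reserve the range
 * [*offset, *offset + len) by advancing the wp in memory only.
 * On success the caller issues the data write at *offset.
 */
int zone_append_submit(Zone *z, uint64_t len, uint64_t *offset)
{
    int ret = 0;

    wp_lock(z);
    if (len == 0 || len % SECTOR_SIZE != 0) {
        ret = -EINVAL;                 /* misaligned request */
    } else if (len > z->start + z->capacity - z->wp_mem) {
        ret = -ENOSPC;                 /* would cross the end of the zone */
    } else {
        *offset = z->wp_mem;
        z->wp_mem += len;              /* wp_update, in memory only */
    }
    wp_unlock(z);
    return ret;
}

/*
 * Completion: io_ret is the result of the data write.
 * Success persists the wp; failure falls back to the on-disk value.
 */
int zone_append_complete(Zone *z, int io_ret)
{
    wp_lock(z);
    if (io_ret == 0) {
        z->wp_disk = z->wp_mem;        /* models wp_flush */
    } else {
        z->wp_mem = z->wp_disk;        /* models read_wp from disk */
        io_ret = -EIO;
    }
    wp_unlock(z);
    return io_ret;
}
```

With this shape, two appends submitted back to back receive distinct
offsets because each one advances wp_mem under the lock, while wp_disk
does not move until a write completes successfully.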
