Re: Fwd: [Qemu-devel] bdrv_flush for qemu block drivers nbd, rbd and sheepdog
On 22.10.2010 18:22, Sage Weil wrote:
> On Fri, 22 Oct 2010, Kevin Wolf wrote:
>> [ Adding qemu-devel to CC again ]
>>
>> On 21.10.2010 20:59, Sage Weil wrote:
>>> On Thu, 21 Oct 2010, Christian Brunner wrote:
>>>> Hi,
>>>>
>>>> is there a flush operation in librados? I guess the only way to
>>>> handle this would be to wait until all aio requests are finished?
>>
>> That's not the semantics of bdrv_flush: you don't need to wait for
>> running requests. You just need to make sure that all completed
>> requests are safe on disk, so that they would persist even in case of
>> a crash/power failure.
>
> Okay, in that case we're fine. librados doesn't declare a write
> committed until it is safely on disk on multiple backend nodes. There
> is a mechanism to get an ack sooner, but the qemu storage driver does
> not use it.
>
>>> There is no flush currently. But librados does no caching, so in this
>>> case at least silently upgrading to cache=writethrough should work.
>>
>> You're making sure that the data can't be cached in the server's page
>> cache or volatile disk cache either, e.g. by using O_SYNC for the
>> image file? If so, upgrading would be safe.
>
> Right.

Okay, implementing bdrv_flush as a nop is fine then.

>>> If that's a problem, we can implement a flush. Just let us know.
>>
>> Presumably providing a writeback mode with explicit flushes could
>> improve performance. Upgrading to writethrough is not a correctness
>> problem, though, so it's your decision if you want to implement it.
>
> So is a bdrv_flush generated when e.g. the guest filesystem issues a
> barrier, or would it otherwise normally ask a SATA disk to flush its
> cache?

Right, this is the implementation for things like the FLUSH CACHE command
in ATA. It's also used for ordering of writes to image metadata in formats
like qcow2, but that's probably an unusual scenario for the Ceph backend.

Kevin
Re: Fwd: [Qemu-devel] bdrv_flush for qemu block drivers nbd, rbd and sheepdog
[ Adding qemu-devel to CC again ]

On 21.10.2010 20:59, Sage Weil wrote:
> On Thu, 21 Oct 2010, Christian Brunner wrote:
>> Hi,
>>
>> is there a flush operation in librados? I guess the only way to handle
>> this would be to wait until all aio requests are finished?

That's not the semantics of bdrv_flush: you don't need to wait for running
requests. You just need to make sure that all completed requests are safe
on disk, so that they would persist even in case of a crash/power failure.

> There is no flush currently. But librados does no caching, so in this
> case at least silently upgrading to cache=writethrough should work.

You're making sure that the data can't be cached in the server's page
cache or volatile disk cache either, e.g. by using O_SYNC for the image
file? If so, upgrading would be safe.

> If that's a problem, we can implement a flush. Just let us know.

Presumably providing a writeback mode with explicit flushes could improve
performance. Upgrading to writethrough is not a correctness problem,
though, so it's your decision if you want to implement it.

Kevin

-- Forwarded message --
From: Kevin Wolf kw...@redhat.com
Date: 2010/10/21
Subject: [Qemu-devel] bdrv_flush for qemu block drivers nbd, rbd and sheepdog
To: Christian Brunner c...@muc.de, Laurent Vivier laur...@vivier.eu, MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
Cc: Qemu-devel@nongnu.org

Hi all,

I'm currently looking into adding a return value to qemu's bdrv_flush
function and I noticed that your block drivers (nbd, rbd and sheepdog)
don't implement bdrv_flush at all. bdrv_flush is going to return -ENOTSUP
for any block driver not implementing this, effectively breaking these
three drivers for anything but cache=unsafe.

Is there a specific reason why your drivers don't implement this? I think
I remember that one of the drivers always provides cache=writethrough
semantics. It would be okay to silently upgrade to cache=writethrough, so
in this case I'd just need to add an empty bdrv_flush implementation.
Otherwise, we really cannot allow any option except cache=unsafe, because
that's the semantics provided by the driver.

In any case, I think it would be a good idea to implement a real
bdrv_flush function to allow the write-back cache modes cache=off and
cache=writeback in order to improve performance over writethrough. Is this
possible with your protocols, or can the protocol be changed to support
this?

Any hints on how to proceed?

Kevin

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: [Qemu-devel] bdrv_flush for qemu block drivers nbd, rbd and sheepdog
On Fri, 22 Oct 2010, Kevin Wolf wrote:
> [ Adding qemu-devel to CC again ]
>
> On 21.10.2010 20:59, Sage Weil wrote:
>> On Thu, 21 Oct 2010, Christian Brunner wrote:
>>> Hi,
>>>
>>> is there a flush operation in librados? I guess the only way to
>>> handle this would be to wait until all aio requests are finished?
>
> That's not the semantics of bdrv_flush: you don't need to wait for
> running requests. You just need to make sure that all completed
> requests are safe on disk, so that they would persist even in case of
> a crash/power failure.

Okay, in that case we're fine. librados doesn't declare a write committed
until it is safely on disk on multiple backend nodes. There is a mechanism
to get an ack sooner, but the qemu storage driver does not use it.

>> There is no flush currently. But librados does no caching, so in this
>> case at least silently upgrading to cache=writethrough should work.
>
> You're making sure that the data can't be cached in the server's page
> cache or volatile disk cache either, e.g. by using O_SYNC for the image
> file? If so, upgrading would be safe.

Right.

>> If that's a problem, we can implement a flush. Just let us know.
>
> Presumably providing a writeback mode with explicit flushes could
> improve performance. Upgrading to writethrough is not a correctness
> problem, though, so it's your decision if you want to implement it.

So is a bdrv_flush generated when e.g. the guest filesystem issues a
barrier, or would it otherwise normally ask a SATA disk to flush its
cache?

sage