We currently don't enforce that the sparse segments we detect during convert are
aligned. This leads to unnecessary and costly read-modify-write cycles, either
internally in QEMU or in the background on the storage device, as nearly all
modern filesystems and hardware use 4k alignment internally.

Since we set the min_sparse size to 4k by default, it makes perfect sense to
ensure that these sparse holes in the file are placed at 4k boundaries.

When converting an example image [1] to a raw device that has a 4k sector size,
about 4600 4k read requests are issued to perform a total of about 15000 write
requests. With this patch, the 4600 additional read requests are eliminated.

[1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk

Signed-off-by: Peter Lieven <p...@kamp.de>
---
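For reviewers who want to try the detection logic outside of QEMU, here is a
minimal standalone sketch of the patched loop. It is an illustration under
stated assumptions, not the code being merged: SECTOR_SIZE stands in for
BDRV_SECTOR_SIZE, and sketch_buffer_is_zero() is a hypothetical byte-wise
replacement for QEMU's buffer_is_zero(). Both sketch_* names are invented
for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: stands in for BDRV_SECTOR_SIZE */

/* Hypothetical stand-in for buffer_is_zero(): true if all bytes are 0. */
static bool sketch_buffer_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Returns 1 if the first chunk of 'buf' contains data, 0 if it is zero.
 * '*pnum' receives the number of sectors sharing that state. If 'n' is a
 * multiple of 8, the buffer is compared in 8-sector (4k) chunks, so
 * '*pnum' is a multiple of 8 as well.
 */
static int sketch_is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
{
    bool is_zero;
    int i, alignment = 1;
    size_t chunk;

    assert(buf != NULL && pnum != NULL);

    if (n <= 0) {
        *pnum = 0;
        return 0;
    }

    if (!(n & 7)) {
        /* the sector count is a multiple of 8, compare 4k at a time */
        alignment = 8;
        n /= 8;
    }
    chunk = (size_t)SECTOR_SIZE * alignment;

    is_zero = sketch_buffer_is_zero(buf, chunk);
    for (i = 1; i < n; i++) {
        buf += chunk;
        if (is_zero != sketch_buffer_is_zero(buf, chunk)) {
            break;
        }
    }
    *pnum = i * alignment;
    return !is_zero;
}
```

With a 16-sector buffer that is zero except for one byte in sector 9, the
sketch reports 8 zero sectors (one full 4k chunk) for n = 16, whereas for
n = 15 it falls back to single-sector granularity and reports 9. Note that
the alignment is relative to the start of the buffer passed in, so the
caller is assumed to hand over buffers that start on a 4k boundary.
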
 qemu-img.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 75f1610..68eefba 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1096,24 +1096,33 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n)
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  * the first one) that are known to be in the same allocated/unallocated state.
+ * The function will try to align 'pnum' to 8 sectors (4k) to avoid unnecessary
+ * RMW cycles on modern hardware.
  */
 static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
 {
     bool is_zero;
-    int i;
+    int i, alignment = 1;
 
     if (n <= 0) {
         *pnum = 0;
         return 0;
     }
-    is_zero = buffer_is_zero(buf, 512);
-    for(i = 1; i < n; i++) {
-        buf += 512;
-        if (is_zero != buffer_is_zero(buf, 512)) {
+
+    if (!(n & 7)) {
+        /* the buffer size is divisible by 4k */
+        alignment = 8;
+        n /= 8;
+    }
+
+    is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment);
+    for (i = 1; i < n; i++) {
+        buf += BDRV_SECTOR_SIZE * alignment;
+        if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment)) {
             break;
         }
     }
-    *pnum = i;
+    *pnum = i * alignment;
     return !is_zero;
 }
 
-- 
2.7.4


