On 20/07/16 at 17:46, Dietmar Maurer wrote:
>> This is called from restore_extents, where a comment says precisely "try
>> to write whole clusters to speedup restore", so this means we're writing
>> 64KB-8Byte chunks, which gives Ceph RBD a hard time because it means
>> lots of ~64KB IOPS.
>>
>> So, I suggest the following solution for your consideration:
>> - Create a write buffer on startup (let's assume it's 4MB, for example,
>>   a size Ceph RBD would like much better than 64KB). This could even be
>>   configurable, and the buffer could be skipped altogether if
>>   buffer_size=cluster_size.
>> - Wrap the current "restore_write_data" with a
>>   "restore_write_data_with_buffer" that copies into the 4MB buffer and
>>   only calls "restore_write_data" when it is full.
>>       * Create a new "flush_restore_write_data_buffer" to flush the write
>>         buffer when device restore reading is complete.
>>
>> Do you think this is a good idea? If so, I will find time to implement
>> and test this to check whether restore time improves.
> We store those 64KB blocks out of order, so your suggestion will not work
> in general.
But I suppose they're mostly ordered?
> But you can try to assemble larger blocks, and write them once you get
> an out-of-order block...
Yes, this is the plan.
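
Roughly along these lines. This is only a quick sketch of the idea, not the
real patch: blob_write stands in for the existing restore_write_data path
(whose real signature is different), the buffer struct and helper names are
made up, and I use a plain 64KB cluster size even though the real payload is
slightly smaller:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLUSTER_SIZE (64 * 1024)        /* illustration; real payload is a bit smaller */
#define BUFFER_SIZE  (4 * 1024 * 1024)  /* proposed write-combining buffer */

typedef struct {
    uint8_t  data[BUFFER_SIZE];
    uint64_t start;  /* device offset of data[0] */
    size_t   used;   /* bytes currently buffered */
} WriteBuffer;

/* stand-in for the real low-level write (restore_write_data) */
static void blob_write(uint64_t offset, const uint8_t *buf, size_t len)
{
    (void)buf;
    printf("write %zu bytes at offset %llu\n", len, (unsigned long long)offset);
}

static void buffer_flush(WriteBuffer *wb)
{
    if (wb->used) {
        blob_write(wb->start, wb->data, wb->used);
        wb->used = 0;
    }
}

/* Collect sequential clusters; flush when the buffer is full or an
 * out-of-order cluster arrives, then restart at the new offset. */
static void buffered_write(WriteBuffer *wb, uint64_t offset,
                           const uint8_t *buf, size_t len)
{
    int contiguous = wb->used && (wb->start + wb->used == offset);

    if ((wb->used && !contiguous) || (wb->used + len > BUFFER_SIZE)) {
        buffer_flush(wb);
    }
    if (wb->used == 0) {
        wb->start = offset;
    }
    memcpy(wb->data + wb->used, buf, len);  /* len <= CLUSTER_SIZE <= BUFFER_SIZE */
    wb->used += len;
}

int main(void)
{
    static WriteBuffer wb;
    static uint8_t cluster[CLUSTER_SIZE];

    /* two sequential clusters get combined into one write; the third
     * (out of order) forces a flush before being buffered again */
    buffered_write(&wb, 0 * CLUSTER_SIZE, cluster, CLUSTER_SIZE);
    buffered_write(&wb, 1 * CLUSTER_SIZE, cluster, CLUSTER_SIZE);
    buffered_write(&wb, 9 * CLUSTER_SIZE, cluster, CLUSTER_SIZE);
    buffer_flush(&wb);  /* what "flush_restore_write_data_buffer" would do */
    return 0;
}

The final buffer_flush after the last extent corresponds to the
"flush_restore_write_data_buffer" from my earlier proposal.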
> I always thought the Ceph libraries do (or should do) that anyway?
> (write combining)
Reading the docs:
http://docs.ceph.com/docs/hammer/rbd/rbd-config-ref/

Write combining should happen when the write-back RBD cache is enabled. That seems to be the default, but maybe we're applying the disk's cache setting on restore too?

I'll try to change the disk cache setting and will report the results.
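
For reference, these are the cache settings from that page I'll be looking
at (values are the Hammer defaults as documented there):

[client]
rbd cache = true
rbd cache size = 33554432                  # 32MB (default)
rbd cache max dirty = 25165824             # 24MB (default); 0 = write-through
rbd cache max dirty age = 1.0
rbd cache writethrough until flush = true

If I understand the QEMU side correctly, when the image is accessed through
QEMU the drive's cache= option overrides ceph.conf (cache=none disables the
RBD cache), so the first thing to check is which cache mode the restore path
opens the RBD image with.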

Thanks
Eneko


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
      943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
