On 18/12/2018 18:28, Oliver Freyermuth wrote:
> We have yet to observe these hangs; we've been running this setup with
> ~5 VMs and ~10 disks for about half a year now, with daily snapshots.
> But all of these VMs have very "low" I/O, since we put anything
> I/O-intensive on bare metal (but with automated provisioning, of
> course).
> So I'll chime in on your question, especially since there may be VMs
> on our cluster in the future whose guest OS is not running an agent.
> Since we have not observed this yet, I'll also ask: what's your
> "scale"? Hundreds of VMs/disks? Hourly snapshots? I/O-intensive VMs?
5 hosts, 15 VMs, daily snapshots. I/O is variable (customer workloads);
usually not that high, but it can easily peak at 100% when certain
things happen. We don't have great I/O performance (RBD over 1 Gbps
links to HDD OSDs).
I'm poring through monitoring graphs now, and I think the issue this
time around was simply too much dirty data in a guest's page cache. The
VM that failed spent 3 minutes flushing writes to disk before its I/O
was quiesced, at around 100 IOPS (actual data throughput was low,
though, so these were small writes). That exceeded our timeout, and
things went south from there.
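
If the bottleneck really is accumulated dirty pages, one mitigation
inside a Linux guest is to cap how much dirty data the kernel may
buffer, which bounds the amount a freeze-triggered sync has to flush.
A minimal sketch (requires root; the byte values are arbitrary
examples, not recommendations):

    # Cap the guest's dirty page cache so a freeze-triggered sync has
    # a bounded amount of data to flush. Values are examples only.
    from pathlib import Path

    # Start background writeback once 64 MiB of dirty data piles up...
    Path("/proc/sys/vm/dirty_background_bytes").write_text("67108864\n")
    # ...and throttle writers outright beyond 256 MiB of dirty data.
    Path("/proc/sys/vm/dirty_bytes").write_text("268435456\n")

Setting the *_bytes variants disables the corresponding *_ratio
sysctls; to persist across reboots, the same keys would go into
/etc/sysctl.conf.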
I wasn't sure whether fsfreeze did a full sync to disk, but given the
I/O behavior I'm seeing, that seems to be the case. Unfortunately,
coming up with an upper bound for the freeze time seems tricky now. I'm
increasing our timeout to 15 minutes; we'll see if the problem recurs.
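
For what it's worth, fsfreeze(8) is essentially a thin wrapper around
the FIFREEZE ioctl, and the kernel syncs the filesystem as part of the
freeze before blocking new writes, which is why freeze time scales with
the amount of dirty data. A minimal Python sketch of the same calls
(requires root; the mountpoint is a placeholder):

    # What fsfreeze(8) boils down to: the FIFREEZE/FITHAW ioctls.
    # FIFREEZE syncs the filesystem first, so it blocks for as long
    # as there is dirty data left to flush.
    import fcntl
    import os

    FIFREEZE = 0xC0045877  # _IOWR('X', 119, int) from <linux/fs.h>
    FITHAW = 0xC0045878    # _IOWR('X', 120, int)

    fd = os.open("/mnt/data", os.O_RDONLY)  # placeholder mountpoint
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)  # returns once the fs is frozen
        # ... take the snapshot here ...
    finally:
        fcntl.ioctl(fd, FITHAW, 0)
        os.close(fd)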
Given this, it makes even more sense to just avoid the freeze if at all
reasonable. As far as I can tell, there's no real way to guarantee that
an fsfreeze will complete in a "reasonable" amount of time.
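
One way to fail safe, as a sketch: wrap the freeze in a hard timeout
and fall back to a crash-consistent snapshot when it expires. This
assumes libvirt's virsh domfsfreeze/domfsthaw; the domain name and
timeout are placeholders, and take_snapshot is a hypothetical stand-in
for the actual RBD snapshot step:

    # Sketch: try an agent freeze with a hard timeout; on timeout,
    # thaw (the agent may still complete the freeze after we give up)
    # and settle for a crash-consistent snapshot instead.
    import subprocess

    def freeze_with_timeout(domain: str, timeout_s: float) -> bool:
        try:
            subprocess.run(["virsh", "domfsfreeze", domain],
                           check=True, timeout=timeout_s)
            return True
        except (subprocess.TimeoutExpired,
                subprocess.CalledProcessError):
            subprocess.run(["virsh", "domfsthaw", domain], check=False)
            return False

    def take_snapshot(domain: str, consistent: bool) -> None:
        # Hypothetical placeholder for the actual snapshot step.
        print(f"snapshot {domain} (consistent={consistent})")

    if freeze_with_timeout("vm01", 900):  # placeholder domain/timeout
        take_snapshot("vm01", consistent=True)
        subprocess.run(["virsh", "domfsthaw", "vm01"], check=True)
    else:
        take_snapshot("vm01", consistent=False)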
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc