On 14-10-28 10:11 AM, Eneko Lacunza wrote:
> Hi Adam,
>
> You only have 3 OSDs in the Ceph cluster?
>
> What about the journals? Are they inline, or on a separate (SSD?) disk?
>
> What about the network? Do you have a physically independent network for Proxmox/VMs and Ceph?
>
> We currently have a 6-OSD, 3-node Ceph cluster; doing an out/in of an OSD doesn't have a very high impact. If you bring in a new OSD (replacing a disk), the impact is noticeable, but our ~30 VMs remained workable. We do have physically separate networks for Proxmox/VMs and Ceph (1 Gbit).

4 nodes.
2 OSDs per node.
Journals are on the same drive as the OSDs, unfortunately... the nodes only have 3 drive bays each.

Each node has 4 x 1Gb NICs in an LACP bond, using Open vSwitch, with VLANs on top of that and a dedicated VLAN for Ceph and Proxmox management. Total network bandwidth in use from each node during the rebuild is only ~1.5Gbps, with no single LACP member ever bursting higher than ~600Mbps. I believe it's unlikely to be a network problem; I've stress-tested OVS at much higher data rates than this.
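(For the curious: I'm just eyeballing the per-member rates with plain sysstat, nothing fancy:)

    # per-interface throughput, sampled every second (sysstat package)
    sar -n DEV 1
    # or read the raw byte counters directly
    cat /proc/net/dev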

You mention setting 'noout'; is there a way to do that in the GUI, or should I just do it at the Ceph CLI with "ceph osd set noout"? I can see that this would skip one rebalancing step, but I still have to rebalance after I replace each disk, don't I?
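For reference, the sequence I have in mind is roughly the following; the OSD number is just an example, and the exact service invocation depends on the init setup:

    ceph osd set noout           # down OSDs won't be marked "out", so no rebalance while the disk is out
    service ceph stop osd.3      # stop the OSD whose disk is being replaced
    # ...swap the drive and recreate the OSD on the new disk...
    service ceph start osd.3     # backfill onto the new disk starts once it's up and "in"
    ceph osd unset noout         # back to normal failure handling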

FWIW, I'm replacing 8x 250GB disks with 8x 500GB disks that became available from another storage cluster. I'm almost done at this point... just want to know how to avoid the massive performance hit next time.
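One thing I plan to try next time, in case it helps anyone else: throttling backfill so recovery doesn't starve client I/O. Untested on my cluster, and the values are only guesses, but something like:

    # limit each OSD to 1 concurrent backfill and 1 active recovery op (defaults are much higher)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # the persistent equivalents in the [osd] section of ceph.conf are
    # "osd max backfills" and "osd recovery max active"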

Oh, and on the node with the new disk, I see IOWAIT times of ~15%, which makes sense IMHO, since I'm writing a ton of data to the new disk.
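(That's just iostat from the sysstat package, if anyone wants to compare numbers:)

    # extended per-device stats every 5 seconds; await and %util spike on the new disk
    iostat -x 5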

--
-Adam Thompson
 [email protected]

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
