Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:
> > ________________________________________
> > From: Xav Paice <xavpa...@gmail.com>
> > Sent: Monday, October 10, 2016 8:41 PM
> > To: openstack-operators@lists.openstack.org
> > Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do
> > you handle Nova on Ceph?
> >
> > On Mon, 2016-10-10 at 13:29 +0000, Adam Kijak wrote:
> > > Hello,
> > >
> > > We use a Ceph cluster for Nova (Glance and Cinder as well) and over
> > > time, more and more data is stored there. We can't keep the cluster
> > > so big because of Ceph's limitations. Sooner or later it needs to be
> > > closed for adding new instances, images and volumes. Not to mention
> > > it's a big failure domain.
> >
> > I'm really keen to hear more about those limitations.
>
> Basically it's all related to the failure domain ("blast radius") and risk
> management.
> Bigger Ceph cluster means more users.

Are these risks well documented? Since Ceph is specifically designed
_not_ to have the kind of large blast radius that one might see with,
say, a centralized SAN, I'm curious to hear what events trigger
cluster-wide blasts.

> Growing the Ceph cluster temporarily slows it down, so many users will be
> affected.

One might say that a Ceph cluster that can't be grown without the users
noticing is an over-subscribed Ceph cluster. My understanding is that
one is always advised to provision a certain amount of cluster capacity
for growing and replicating to replaced drives.

> There are bugs in Ceph which can cause data corruption. It's rare, but when
> it happens it can affect many (maybe all) users of the Ceph cluster.

:(
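
To make the headroom point above a bit more concrete, here is a rough
back-of-the-envelope sketch in Python. It is not a Ceph tool and calls no
Ceph API; the function names and the example numbers are made up for
illustration. It assumes raw usage (replicas included) stays constant
after recovery, that data spreads evenly over the surviving hosts, and
that 0.85 is the nearfull threshold to stay under (that is, I believe,
the usual default, but check what your own cluster is configured with).

```python
def utilization_after_loss(raw_used_tb, host_capacities_tb, lost_host):
    """Raw utilization once the data from one lost host has been
    re-replicated onto the remaining hosts (even spread assumed)."""
    remaining = sum(
        cap for i, cap in enumerate(host_capacities_tb) if i != lost_host
    )
    if remaining <= 0:
        raise ValueError("no capacity left after losing that host")
    return raw_used_tb / remaining


def has_headroom(raw_used_tb, host_capacities_tb, nearfull_ratio=0.85):
    """True if losing any single host still leaves the cluster at or
    below nearfull_ratio. The 0.85 default is an assumption."""
    worst = max(
        utilization_after_loss(raw_used_tb, host_capacities_tb, i)
        for i in range(len(host_capacities_tb))
    )
    return worst <= nearfull_ratio


if __name__ == "__main__":
    hosts = [100] * 10  # hypothetical: ten hosts, 100 TB raw each
    for used in (700, 800):
        print(used, "TB raw used, survives one host loss:",
              has_headroom(used, hosts))
```

With these made-up numbers, 700 TB of raw usage still fits after losing
one host (700/900 is about 0.78), while 800 TB does not (800/900 is about
0.89). Real clusters are never perfectly balanced, so treat this as an
optimistic bound rather than a sizing rule.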