I have not run into anyone replicating volumes or creating redundancy at the VM level (beyond, as you point out, HDFS, etc.).
R

On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <[email protected]> wrote:

> This is a great conversation and I really appreciate everyone's input.
> Though, I agree, we wandered off the original question, and that's my
> fault for mentioning various storage backends.
>
> For the sake of conversation, let's just say the user has no knowledge of
> the underlying storage technology. They're presented with a Block Storage
> service and the rest is up to them. What known, working options does the
> user have to build their own block storage resilience? (Ignoring "obvious"
> solutions where the application has native replication, such as Galera,
> Elasticsearch, etc.)
>
> I have seen references to Cinder supporting replication, but I'm not able
> to find a lot of information about it. The support matrix [1] lists very
> few drivers that actually implement replication -- is this true, or is
> there a trove of replication docs that I just haven't been able to find?
>
> Amazon AWS publishes instructions on how to use mdadm with EBS [2]. One
> might interpret that to mean mdadm is a supported solution within
> EC2-based instances.
>
> There are also references to DRBD on EC2, though I could not find
> anything as "official" as mdadm and EC2.
>
> Does anyone have experience (or know users) doing either? (Specifically
> with libvirt/KVM, but I'd be curious to know in general.)
>
> Or is it more advisable to create multiple instances where data is
> replicated instance-to-instance, rather than a single instance with
> multiple volumes where data is replicated volume-to-volume (by way of
> that single instance)? And if so, why? Is a lack of stable
> volume-to-volume replication a limitation of certain hypervisors?
>
> Or has this area just not been explored in depth within OpenStack
> environments yet?
>
> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>
> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <[email protected]> wrote:
>
>> I'm not against Ceph, but even 2 machines (and really, 2 machines with
>> enough storage to be meaningful, e.g. not the all-blade environments I've
>> built some o7k systems on) may not be available for storage, so there are
>> cases where that's not necessarily the solution. I built resiliency in
>> one environment with a 2-node controller/Glance/db system with Gluster,
>> which enabled enough middleware resiliency to meet the customer's
>> recovery expectations. Regardless, even with a cattle application model,
>> the infrastructure middleware still needs to be able to provide some
>> level of resiliency.
>>
>> But we've kind of wandered off the original question. To bring this back
>> on topic, I think users can build resilience into their own storage
>> construction, but there are still use cases where the middleware either
>> needs to use its own resiliency layer, and/or may end up providing it for
>> the end user.
>>
>> R
>>
>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <[email protected]> wrote:
>>
>>> We've used Ceph to address the storage requirement in small clouds
>>> pretty well. It works pretty well with only two storage nodes with
>>> replication set to 2, and because of the radosgw, you can share your
>>> small amount of storage between the object store and the block store,
>>> avoiding the need to overprovision Swift-only or Cinder-only capacity to
>>> handle usage unknowns. It's just one pool of storage.
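(A rough sketch of what Kevin's two-copy setup might look like from the Ceph
CLI, wrapped in Python for scripting. The pool names here are assumptions and
need to match whatever your RBD and radosgw pools are actually called.)

    # Hedged sketch: keep two replicas of every object on a two-node cluster.
    # Pool names ("volumes", "images", "default.rgw.buckets.data") are
    # assumptions -- substitute the pools your Cinder/Glance/radosgw use.
    import subprocess

    POOLS = ["volumes", "images", "default.rgw.buckets.data"]

    def set_two_copies(pool):
        # size 2 = two replicas; min_size 1 lets I/O continue with one node down.
        subprocess.run(["ceph", "osd", "pool", "set", pool, "size", "2"], check=True)
        subprocess.run(["ceph", "osd", "pool", "set", pool, "min_size", "1"], check=True)

    if __name__ == "__main__":
        for pool in POOLS:
            set_two_copies(pool)
            out = subprocess.run(["ceph", "osd", "pool", "get", pool, "size"],
                                 capture_output=True, text=True)
            print(out.stdout.strip())

min_size 1 is what actually lets the cluster keep serving I/O while one of
the two nodes is being burned down and rebuilt, as Kevin describes below.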
>>>
>>> You're right: using LVM is like telling your users "don't do pets," but
>>> then having pets at the heart of your system. When you lose one, you
>>> lose a lot. With a small Ceph cluster, you can take out one of the
>>> nodes, burn it to the ground, put it back, and it just works. No pets.
>>>
>>> Do consider Ceph for the small use case.
>>>
>>> Thanks,
>>> Kevin
>>>
>>> ------------------------------
>>> *From:* Robert Starmer [[email protected]]
>>> *Sent:* Monday, February 08, 2016 1:30 PM
>>> *To:* Ned Rhudy
>>> *Cc:* OpenStack Operators
>>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage volumes
>>>
>>> Ned's model is the model I meant by "multiple underlying storage
>>> services". Most of the systems I've built are LV/LVM only, a few added
>>> Ceph as an alternative/live-migration option, and in one we used Gluster
>>> due to size. Note that the environments I have worked with are generally
>>> small (~20 compute nodes), so huge Ceph environments aren't common. I am
>>> also working on a project where the storage backend is entirely NFS...
>>>
>>> And I think users are increasingly educated to assume that nothing is
>>> guaranteed. There is the realization, at least for a good set of the
>>> customers I've worked with (and I try to educate the non-believers),
>>> that the way you get the best effect from a system like OpenStack is to
>>> consider everything disposable. The one gap I've seen is that there are
>>> plenty of folks who don't deploy Swift, and without some form of object
>>> store there's still the question of where you place your datasets so
>>> that they can be quickly recovered (and how you keep them up to date if
>>> you do have one). With VMs, there's the notion that you can recover
>>> quickly because the "dataset", e.g. your OS, is already there for you,
>>> and in plenty of small environments that's only as true as the Glance
>>> repository (guess what's usually backing that when there's no Swift
>>> around...).
>>>
>>> So I see the issue as a holistic one. How do you show operators/users
>>> that they should consider everything disposable if we only look at the
>>> current running instance as the "thing"? Somewhere you still likely need
>>> some form of distributed resilience (and yes, I can see using the
>>> distributed Canonical, CentOS, Red Hat, Fedora, Debian, etc. mirrors as
>>> your distributed image backup, but what about the database content,
>>> etc.).
>>>
>>> Robert
>>>
>>> On Mon, Feb 8, 2016 at 1:44 PM, Ned Rhudy (BLOOMBERG/ 731 LEX)
>>> <[email protected]> wrote:
>>>
>>>> In our environments, we offer two types of storage. Tenants can either
>>>> use Ceph/RBD and trade speed/latency for reliability and protection
>>>> against physical disk failures, or they can launch instances that are
>>>> realized as LVs on an LVM VG that we create on top of a RAID 0 spanning
>>>> all but the OS disk on the hypervisor. This lets users elect to go
>>>> all-in on speed and sacrifice reliability for applications where
>>>> replication/HA is handled at the app level, where the data on the
>>>> instance is sourced from elsewhere, or where they just don't care much
>>>> about the data.
>>>>
>>>> There are some further changes to our approach that we would like to
>>>> make down the road, but in general our users seem to like the current
>>>> system and being able to forgo reliability or speed as their
>>>> circumstances demand.
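(Before Joe's reply below, a rough tenant-side sketch of what "push the
failure handling onto the user" could look like with openstacksdk: create one
volume per backend, then mirror them inside the guest. The cloud name and
volume type names are assumptions; this only works if the operator exposes
each backend or failure domain as its own volume type.)

    # Hedged sketch: request two volumes intended to land on different
    # backends, so the guest can mirror across them. "mycloud" must exist in
    # clouds.yaml; the volume type names are made up for illustration.
    import openstack

    conn = openstack.connect(cloud="mycloud")

    volumes = []
    for vol_type in ("lvm-node-a", "lvm-node-b"):
        vol = conn.block_storage.create_volume(
            name=f"data-{vol_type}",
            size=100,               # GB
            volume_type=vol_type,   # one type per backend / failure domain
        )
        conn.block_storage.wait_for_status(vol, status="available")
        volumes.append(vol)

    # Attach both to the same server afterwards, e.g. with
    #   openstack server add volume <server> <volume>
    # and assemble the mirror inside the guest (sketch further down).
    for vol in volumes:
        print(vol.id, vol.name)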
>>>>
>>>> From: [email protected]
>>>> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>>>>
>>>> Hi Robert,
>>>>
>>>> Can you elaborate on "multiple underlying storage services"?
>>>>
>>>> The reason I asked the initial question is that historically we've made
>>>> our block storage service resilient to failure. We also made our compute
>>>> environment resilient to failure, but over time we've seen users become
>>>> better educated about coping with compute failure. As a result, we've
>>>> been able to relax how much resilience we build into the compute
>>>> environment.
>>>>
>>>> We've been discussing how possible it would be to translate that same
>>>> idea to block storage. Rather than have a large HA storage cluster
>>>> (whether Ceph, Gluster, NetApp, etc.), is it possible to offer simple
>>>> single-LVM volume servers and push the failure handling onto the user?
>>>>
>>>> Of course, this doesn't work for all types of use cases and
>>>> environments. We still have projects which require the cloud to own
>>>> more of the responsibility for failure than the users do.
>>>>
>>>> But for environments where we offer general-purpose / best-effort
>>>> compute and storage, what methods are available to help the user be
>>>> resilient to block storage failures?
>>>>
>>>> Joe
>>>>
>>>> On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer <[email protected]> wrote:
>>>>
>>>>> I've always recommended providing multiple underlying storage services
>>>>> to provide this, rather than adding the overhead to the VM. So, not in
>>>>> any of my systems or any I've worked with.
>>>>>
>>>>> R
>>>>>
>>>>> On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Does anyone have users RAID'ing or striping multiple block storage
>>>>>> volumes from within an instance?
>>>>>>
>>>>>> If so, what was the experience? Good, bad, possible but with caveats?
>>>>>>
>>>>>> Thanks,
>>>>>> Joe
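(And the in-guest half of that experiment, roughly: assemble the two attached
volumes into an md mirror. Device names, the md device, the mount point, and
the mdadm.conf path are assumptions that vary by distro and hypervisor; swap
--level=1 for --level=0 if, like Ned's LVM tier, you only want striping for
speed.)

    # Hedged sketch: build a RAID 1 mirror inside the guest across two
    # attached Cinder volumes, then format and mount it. Run as root.
    # /dev/vdb and /dev/vdc are assumptions based on virtio attach order.
    import subprocess

    DEVICES = ["/dev/vdb", "/dev/vdc"]   # the two attached volumes
    MD_DEVICE = "/dev/md0"
    MOUNT_POINT = "/mnt/data"

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # RAID 1 so either volume (and therefore either backend) can fail.
    # --run skips mdadm's interactive confirmation prompt.
    run(["mdadm", "--create", MD_DEVICE, "--run", "--level=1",
         "--raid-devices=" + str(len(DEVICES))] + DEVICES)
    run(["mkfs.ext4", MD_DEVICE])
    run(["mkdir", "-p", MOUNT_POINT])
    run(["mount", MD_DEVICE, MOUNT_POINT])
    # Persist the array definition (Debian/Ubuntu path; RHEL uses /etc/mdadm.conf).
    run(["sh", "-c", "mdadm --detail --scan >> /etc/mdadm/mdadm.conf"])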
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
