Re: [Gluster-users] about HA infrastructure for hypervisors
On Thu, Jun 28, 2012 at 10:40:43AM -0500, Nathan Stratton wrote:
> But wait, yes, I have 16 physical disks, but I am running distribute
> + replicate, so the 8 physical boxes are broken up into 4 pairs of
> redundant boxes. When I do a write, I am writing to two servers, or
> 4 physical disks. So in my case, 31.1 MB/s vs about 200 MB/s native
> is not that bad.
>
> DRBD is MUCH faster, but you're not comparing apples to apples. DRBD
> has worked great for me in the past when I only needed two storage
> nodes to be mirrored in active/active, but as soon as you grow past
> that you need to look at something like Gluster.

But we're talking about different things here:

* VM image (i.e. the root filesystem it boots from; where the O/S sits; logs and scratch space)
* Application data storage

You'd be mad to have terabytes of data sitting inside a single VM image file. It's unshareable, the VM image is one big humongous blob, and to back it up effectively you need to run the backup tools within the VM itself.

Furthermore, the performance of GlusterFS is excellent when you mount it directly. It only sucks when you're using a gluster-mounted file as a KVM virtual disk.

So what I'm suggesting is, if you need performance today:

* Use DRBD+LVM for your VM filesystem storage
* Use glusterfs for your "big data", and attach it to those VM(s) which need to access it - leveraging the naturally shared nature of glusterfs.

And eventually you'll be able to simplify your system by migrating your VM images to glusterfs, when performance catches up. Ganeti can manage both types of cluster, so you don't lose out by learning it up-front.

Regards,

Brian.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] about HA infrastructure for hypervisors
On Thu, 28 Jun 2012, Brian Candler wrote:
> As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed
> have poor latency. As I understand it: in order to get such high data
> rates over copper, it has to employ mechanisms similar to DSL lines,
> like interleaving, which means 10G has comparable or even higher latency
> than 1G. Switches with all 10Gbase-T ports are expensive, only available
> from a couple of vendors, and consume a lot of power.

One more thing on the "don't use 10GE over CAT6/RJ45" debate. Where do you get that 10GBase-T has poor latency? If you look at 1000Base-T, latency ranges from 1us to over 12us, while 10GBase-T ranges from just over 2us to 4us - a MUCH tighter latency range. When you look at large packets this is even more pronounced. So yes, for very small packets 1000Base-T is faster, but we are talking about a 1-2 us difference, and only on small packets.

As far as power, the original PHYs were almost 7 Watts per port! Today, almost every switch out there is less than 1 Watt per port, and falling in line with Moore's Law.

Nathan Stratton
nathan at robotics.net
http://www.robotics.net
Re: [Gluster-users] about HA infrastructure for hypervisors
On Thu, 28 Jun 2012, Brian Candler wrote:
> On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
> > We excluded ethernet due to searches on the web. It appeared that
> > ethernet has bad latency.
>
> I read it on the web, it must be true :-)
>
> As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed
> have poor latency. As I understand it: in order to get such high data
> rates over copper, it has to employ mechanisms similar to DSL lines,
> like interleaving, which means 10G has comparable or even higher latency
> than 1G. Switches with all 10Gbase-T ports are expensive, only available
> from a couple of vendors, and consume a lot of power.

First gen PHYs did suck a LOT of power, but this has changed. With the next gen PHYs the cost of 10GBase-T switches has also dropped; they are now the lowest cost switches out there. I like Arista Networks the best, but you can get a switch for well under 10K from guys like Interface Masters. New servers from SuperMicro and others now offer 10GBase-T natively, drastically lowering the cost per port.

> I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S
> switch, and direct-attach cables (3m Netgear AXC763, 5m Intel). This all
> works fine, although with older versions of Linux I had to build the
> latest Intel drivers from their website to fix problems with the links
> going down every day or two.

Ya, I ran into an odd problem with my X520s where everything worked except VLANs on bonded interfaces. It took me a full day to figure out it was not my switch or config, but the stupid ixgbe driver!

Nathan Stratton
nathan at robotics.net
http://www.robotics.net
Re: [Gluster-users] about HA infrastructure for hypervisors
On Thu, 28 Jun 2012, Brian Candler wrote:
> On Wed, Jun 27, 2012 at 05:28:43PM -0500, Nathan Stratton wrote:
> > [root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
> > 5120+0 records in
> > 5120+0 records out
> > 5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s
> >
> > > But doing a dd if=/dev/zero bs=1024k within a VM, whose image was
> > > mounted on glusterfs, I was getting only 6-25MB/s.
> >
> > [root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
> > 5120+0 records in
> > 5120+0 records out
> > 5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s
>
> That's what I consider unimpressive - slower than a single disk, when
> you have an array of 16. I should try a pair of drbd nodes as a fair
> comparison though.

But wait, yes, I have 16 physical disks, but I am running distribute + replicate, so the 8 physical boxes are broken up into 4 pairs of redundant boxes. When I do a write, I am writing to two servers, or 4 physical disks. So in my case, 31.1 MB/s vs about 200 MB/s native is not that bad.

DRBD is MUCH faster, but you're not comparing apples to apples. DRBD has worked great for me in the past when I only needed two storage nodes to be mirrored in active/active, but as soon as you grow past that you need to look at something like Gluster. With GlusterFS my single write is slower at 31.1 MB/s, but I can do that many more times over my 8 nodes without losing I/O.

> Having said that, multiple clients running concurrently should be able
> to use the remaining bandwidth, so the aggregate throughput should be
> fine.

Correct.

Nathan Stratton
nathan at robotics.net
http://www.robotics.net
Re: [Gluster-users] about HA infrastructure for hypervisors
It would be interesting if it could read in a round-robin manner from wherever it holds the data. If the local storage is too busy (and therefore showing higher latency), it would be good to read some of the data from another, quieter node which, even over the network, could provide better latency. In short: distribute the I/O load across the whole cluster whenever it contains multiple copies of the data.

What are others' opinions on that?

Regards,

Fernando

-----Original Message-----
From: Tim Bell [mailto:tim.b...@cern.ch]
Sent: 28 June 2012 11:55
To: Fernando Frediani (Qube); 'Nicolas Sebrecht'; 'Thomas Jackson'
Cc: 'gluster-users'
Subject: RE: [Gluster-users] about HA infrastructure for hypervisors

Assuming that we use a 3-copy approach across the hypervisors, does Gluster favour the local copy on the hypervisor if the data is on distributed/replicated? It would be good to avoid the network hop when the data is on the local disk.

Tim

> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
> Sent: 28 June 2012 11:43
> To: 'Nicolas Sebrecht'; 'Thomas Jackson'
> Cc: 'gluster-users'
> Subject: Re: [Gluster-users] about HA infrastructure for hypervisors
>
> You should indeed use the same server running as a storage brick as a
> KVM host, to maximize hardware and power usage. The only thing I am not
> sure about is whether you can limit the amount of host memory Gluster
> can use, so that most of it stays reserved for the virtual machines.
>
> Fernando
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Nicolas Sebrecht
> Sent: 28 June 2012 10:31
> To: Thomas Jackson
> Cc: 'gluster-users'
> Subject: [Gluster-users] Re: about HA infrastructure for hypervisors
>
> The 28/06/12, Thomas Jackson wrote:
>
> > Why don't you have KVM running on the Gluster bricks as well?
>
> Good point. While abstracting we decided to separate KVM & Gluster, but
> I can't remember why. We'll think about that again.
>
> > We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> > RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> > node participates in a distribute+replicate Gluster namespace and runs
> > KVM. We found this to be the most efficient (and fastest) way to run
> > the cluster.
> >
> > This works well for us, although (due to Gluster using fuse) it isn't
> > as fast as we would like. Currently waiting for the KVM driver that
> > has been discussed a few times recently; that should make a huge
> > difference to the performance for us.
>
> Ok! Thanks.
>
> --
> Nicolas Sebrecht
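For what it's worth, the read-placement behaviour discussed above is at least partly tunable: the replicate translator exposes volume options that influence which copy serves reads. A sketch only - option availability and value syntax vary across the 3.x releases (and the volume name "share" and subvolume name are examples), so check `gluster volume set help` before relying on either:

```shell
# Sketch: options that influence which replica serves reads.
# Names/values vary by release - verify with `gluster volume set help`.
gluster volume set share cluster.read-hash-mode 2        # hash reads across replicas
gluster volume set share cluster.read-subvolume share-client-0  # pin reads to one brick
```

Neither option implements true load-aware round-robin; they only change the static/hashed choice of read child.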
Re: [Gluster-users] about HA infrastructure for hypervisors
No, I saw a patch to have it behave like this, but I can't find it right now.

On 6/28/12 6:54 AM, Tim Bell wrote:
> Assuming that we use a 3-copy approach across the hypervisors, does
> Gluster favour the local copy on the hypervisor if the data is on
> distributed/replicated? It would be good to avoid the network hop when
> the data is on the local disk.
>
> Tim
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
> Sent: 28 June 2012 11:43
> To: 'Nicolas Sebrecht'; 'Thomas Jackson'
> Cc: 'gluster-users'
> Subject: Re: [Gluster-users] about HA infrastructure for hypervisors
>
> You should indeed use the same server running as a storage brick as a
> KVM host, to maximize hardware and power usage. The only thing I am not
> sure about is whether you can limit the amount of host memory Gluster
> can use, so that most of it stays reserved for the virtual machines.
>
> Fernando
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Nicolas Sebrecht
> Sent: 28 June 2012 10:31
> To: Thomas Jackson
> Cc: 'gluster-users'
> Subject: [Gluster-users] Re: about HA infrastructure for hypervisors
>
> The 28/06/12, Thomas Jackson wrote:
>
> > Why don't you have KVM running on the Gluster bricks as well?
>
> Good point. While abstracting we decided to separate KVM & Gluster, but
> I can't remember why. We'll think about that again.
>
> > We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> > RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> > node participates in a distribute+replicate Gluster namespace and runs
> > KVM. We found this to be the most efficient (and fastest) way to run
> > the cluster.
> >
> > This works well for us, although (due to Gluster using fuse) it isn't
> > as fast as we would like. Currently waiting for the KVM driver that
> > has been discussed a few times recently; that should make a huge
> > difference to the performance for us.
>
> Ok! Thanks.
--
Nicolas Sebrecht
Re: [Gluster-users] about HA infrastructure for hypervisors
Assuming that we use a 3-copy approach across the hypervisors, does Gluster favour the local copy on the hypervisor if the data is on distributed/replicated? It would be good to avoid the network hop when the data is on the local disk.

Tim

> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
> Sent: 28 June 2012 11:43
> To: 'Nicolas Sebrecht'; 'Thomas Jackson'
> Cc: 'gluster-users'
> Subject: Re: [Gluster-users] about HA infrastructure for hypervisors
>
> You should indeed use the same server running as a storage brick as a
> KVM host, to maximize hardware and power usage. The only thing I am not
> sure about is whether you can limit the amount of host memory Gluster
> can use, so that most of it stays reserved for the virtual machines.
>
> Fernando
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Nicolas Sebrecht
> Sent: 28 June 2012 10:31
> To: Thomas Jackson
> Cc: 'gluster-users'
> Subject: [Gluster-users] Re: about HA infrastructure for hypervisors
>
> The 28/06/12, Thomas Jackson wrote:
>
> > Why don't you have KVM running on the Gluster bricks as well?
>
> Good point. While abstracting we decided to separate KVM & Gluster, but
> I can't remember why. We'll think about that again.
>
> > We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> > RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> > node participates in a distribute+replicate Gluster namespace and runs
> > KVM. We found this to be the most efficient (and fastest) way to run
> > the cluster.
> >
> > This works well for us, although (due to Gluster using fuse) it isn't
> > as fast as we would like. Currently waiting for the KVM driver that
> > has been discussed a few times recently; that should make a huge
> > difference to the performance for us.
>
> Ok! Thanks.
> --
> Nicolas Sebrecht
Re: [Gluster-users] about HA infrastructure for hypervisors
Interesting info, Brian. I am surprised by this, actually; I would always have expected 10Gig to have very good, low latency. Obviously I wouldn't expect copper to be exactly the same as fibre due to the losses, but not much behind either.

Please share any future results you get, as it's quite valuable information for people designing their systems.

Regards,

Fernando

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Brian Candler
Sent: 28 June 2012 11:33
To: Nicolas Sebrecht
Cc: gluster-users
Subject: Re: [Gluster-users] about HA infrastructure for hypervisors

On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
> We excluded ethernet due to searches on the web. It appeared that
> ethernet has bad latency.

I read it on the web, it must be true :-)

As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have poor latency. As I understand it: in order to get such high data rates over copper, it has to employ mechanisms similar to DSL lines, like interleaving, which means 10G has comparable or even higher latency than 1G. Switches with all 10Gbase-T ports are expensive, only available from a couple of vendors, and consume a lot of power.

However, switches with SFP+ ports don't have these problems. For short reach you can use SFP+ direct-attach cables, and for long reach use fibre.

http://en.wikipedia.org/wiki/10-gigabit_Ethernet

I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S switch, and direct-attach cables (3m Netgear AXC763, 5m Intel). This all works fine, although with older versions of Linux I had to build the latest Intel drivers from their website to fix problems with the links going down every day or two.

Regards,

Brian.
Re: [Gluster-users] about HA infrastructure for hypervisors
On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
> We excluded ethernet due to searches on the web. It appeared that
> ethernet has bad latency.

I read it on the web, it must be true :-)

As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have poor latency. As I understand it: in order to get such high data rates over copper, it has to employ mechanisms similar to DSL lines, like interleaving, which means 10G has comparable or even higher latency than 1G. Switches with all 10Gbase-T ports are expensive, only available from a couple of vendors, and consume a lot of power.

However, switches with SFP+ ports don't have these problems. For short reach you can use SFP+ direct-attach cables, and for long reach use fibre.

http://en.wikipedia.org/wiki/10-gigabit_Ethernet

I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S switch, and direct-attach cables (3m Netgear AXC763, 5m Intel). This all works fine, although with older versions of Linux I had to build the latest Intel drivers from their website to fix problems with the links going down every day or two.

Regards,

Brian.
Re: [Gluster-users] about HA infrastructure for hypervisors
You should indeed use the same server running as a storage brick as a KVM host, to maximize hardware and power usage. The only thing I am not sure about is whether you can limit the amount of host memory Gluster can use, so that most of it stays reserved for the virtual machines.

Fernando

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: 28 June 2012 10:31
To: Thomas Jackson
Cc: 'gluster-users'
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 28/06/12, Thomas Jackson wrote:

> Why don't you have KVM running on the Gluster bricks as well?

Good point. While abstracting we decided to separate KVM & Gluster, but I can't remember why. We'll think about that again.

> We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> node participates in a distribute+replicate Gluster namespace and runs
> KVM. We found this to be the most efficient (and fastest) way to run
> the cluster.
>
> This works well for us, although (due to Gluster using fuse) it isn't
> as fast as we would like. Currently waiting for the KVM driver that
> has been discussed a few times recently; that should make a huge
> difference to the performance for us.

Ok! Thanks.

--
Nicolas Sebrecht
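On the memory question raised above: GlusterFS does not appear to offer a single knob capping its total memory, but the caches that account for most of its footprint are tunable per volume. A sketch only - option names are from the 3.x series and "share" is an example volume name; verify with `gluster volume set help`:

```shell
# Sketch: shrink the glusterfs client/brick caches so more host RAM
# stays free for VMs.  Values are illustrative, not recommendations.
gluster volume set share performance.cache-size 256MB    # io-cache limit
gluster volume set share performance.io-thread-count 16  # fewer worker threads
```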
Re: [Gluster-users] about HA infrastructure for hypervisors
On Wed, Jun 27, 2012 at 05:28:43PM -0500, Nathan Stratton wrote:
> [root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s
>
> > But doing a dd if=/dev/zero bs=1024k within a VM, whose image was
> > mounted on glusterfs, I was getting only 6-25MB/s.
>
> [root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s

That's what I consider unimpressive - slower than a single disk, when you have an array of 16. I should try a pair of drbd nodes as a fair comparison though.

Having said that, multiple clients running concurrently should be able to use the remaining bandwidth, so the aggregate throughput should be fine.

BTW, there used to be an issue with glusterfs 3.2 in the way it scaled its thread pool, which limited concurrency:

http://gluster.org/pipermail/gluster-users/2012-February/009590.html

But that has been fixed in 3.3 (commit 2d836326).

Regards,

Brian.
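The aggregate-throughput point above is easy to check with a handful of concurrent writers. A rough sketch, assuming GNU dd; DIR, WRITERS and the 64 MiB per-writer size are illustrative (point DIR at the glusterfs mount to test for real):

```shell
# Run several dd writers in parallel against one directory and report
# a crude aggregate throughput figure.
DIR=${DIR:-/tmp/gluster-aggr-test}   # set to the glusterfs mount for a real test
WRITERS=${WRITERS:-4}
mkdir -p "$DIR"
start=$(date +%s)
i=1
while [ "$i" -le "$WRITERS" ]; do
    dd if=/dev/zero of="$DIR/foo.$i" bs=1M count=64 2>/dev/null &
    i=$((i + 1))
done
wait                                  # block until all writers finish
end=$(date +%s)
elapsed=$((end - start))
[ "$elapsed" -eq 0 ] && elapsed=1     # avoid division by zero on fast runs
total=$((WRITERS * 64))
echo "wrote ${total} MiB total in ${elapsed}s (~$((total / elapsed)) MiB/s aggregate)"
```

Wall-clock granularity is one second, so treat the number as a rough figure, not a benchmark.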
Re: [Gluster-users] about HA infrastructure for hypervisors
Why don't you have KVM running on the Gluster bricks as well? We have a 4 node cluster (each with 4x 300GB 15k SAS drives in RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each node participates in a distribute+replicate Gluster namespace and runs KVM. We found this to be the most efficient (and fastest) way to run the cluster. This works well for us, although (due to Gluster using fuse) it isn't as fast as we would like. Currently waiting for the KVM driver that has been discussed a few times recently, that should make a huge difference to the performance for us. Cheers, Thomas -Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht Sent: Wednesday, 27 June 2012 9:13 PM To: Gerald Brandt Cc: gluster-users Subject: [Gluster-users] Re: about HA infrastructure for hypervisors The 27/06/12, Gerald Brandt wrote: > Hi, > > If your switch breaks, you are done. Put each Gluster server on it's own switch. Right. Handling switch failures isn't what I'm most worried about but I guess that I'll need to add a network link between KVM hypervisors, too. Thanks for this tip, though. > > ++ ++ > > ||--|| > > | KVM hypervisor |---+ +---| KVM hypervisor | > > || | | || > > ++ | | ++ > >| | > > +--+ +--+ > > |switch| |switch| > > +--+ +--+ > > | | | | > > +---+ | | | | +---+ > > | | | | | +-| | > > | Glusterfs 3.3 |--+ +--)| Glusterfs 3.3 | > > | server A| || server B| > > | |---+| | > > +---++---+ -- Nicolas Sebrecht ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users -- Message protected by MailGuard: e-mail anti-virus, anti-spam and content filtering. http://www.mailguard.com.au ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] about HA infrastructure for hypervisors
On Wed, 27 Jun 2012, Brian Candler wrote:
> For a 16-disk array, your IOPS is not bad. But are you actually storing
> a VM image on it, and then doing lots of I/O within that VM (as opposed
> to mounting the volume from within the VM)? If so, can you specify your
> exact configuration, including OS and kernel versions?

2.6.32-220.23.1.el6.x86_64

[root@virt01 ~]# gluster volume info share

Volume Name: share
Type: Distributed-Replicate
Volume ID: 09bfc0c3-e3d4-441b-af6f-acd263884920
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.59.0.11:/export
Brick2: 10.59.0.12:/export
Brick3: 10.59.0.13:/export
Brick4: 10.59.0.14:/export
Brick5: 10.59.0.15:/export
Brick6: 10.59.0.16:/export
Brick7: 10.59.0.17:/export
Brick8: 10.59.0.18:/export
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.nlm: off
auth.allow: *
nfs.disable: off

> I did my tests on two quad-core/8GB nodes, 12 disks in each (md RAID10),
> running ubuntu 12.04, and 10GE RJ45 direct connection. The disk arrays
> locally perform at 350MB/s for streaming writes.

Well, I would first ditch Ubuntu and install CentOS, but that aside: my disk arrays are slower:

[root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s

> But doing a dd if=/dev/zero bs=1024k within a VM, whose image was
> mounted on glusterfs, I was getting only 6-25MB/s.

[root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s

While this is slower than what I would like to see, it's faster than what I was getting to my NetApp, and it scales better! :)

Nathan Stratton
nathan at robotics.net
http://www.robotics.net
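Since the volume info above shows diagnostics.latency-measurement and diagnostics.count-fop-hits already enabled, per-operation latency can be read back with the profile commands, a feature present since the GlusterFS 3.x line (output format varies by release; "share" is the volume name from this setup):

```shell
# Sketch: inspect per-FOP latency/hit counters on the "share" volume.
gluster volume profile share start
gluster volume profile share info
```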
Re: [Gluster-users] about HA infrastructure for hypervisors
On Wed, Jun 27, 2012 at 03:07:21PM -0500, Nathan Stratton wrote:
> > I've made a test setup like this, but unfortunately I haven't yet been
> > able to get half-decent performance out of glusterfs 3.3 as a KVM
> > backend. It may work better if you use local disk for the VM images,
> > and within the VM mount the glusterfs volume for application data.
>
> What is considered half-decent? I have an 8-node distribute+replicate
> cluster and I am getting about 65MB/s and about 1.5K IOPS. Considering
> that I am only using a single two-disk SAS stripe in each host, I think
> that is not bad.

For a 16-disk array, your IOPS is not bad. But are you actually storing a VM image on it, and then doing lots of I/O within that VM (as opposed to mounting the volume from within the VM)? If so, can you specify your exact configuration, including OS and kernel versions?

I did my tests on two quad-core/8GB nodes, 12 disks in each (md RAID10), running ubuntu 12.04, and a 10GE RJ45 direct connection. The disk arrays locally perform at 350MB/s for streaming writes. But doing a dd if=/dev/zero bs=1024k within a VM, whose image was mounted on glusterfs, I was getting only 6-25MB/s.

http://gluster.org/pipermail/gluster-users/2012-June/010553.html
http://gluster.org/pipermail/gluster-users/2012-June/010560.html
http://gluster.org/pipermail/gluster-users/2012-June/010570.html

I get much better performance on locally-attached storage with O_DIRECT (kvm option "cache=none"), but have been unable to get O_DIRECT to work with glusterfs. After a kernel upgrade (to a 3.4+ kernel which supports O_DIRECT for fuse), and using the mount option direct-io-mode=enable, the VM simply wouldn't boot:

http://gluster.org/pipermail/gluster-users/2012-June/010572.html
http://gluster.org/pipermail/gluster-users/2012-June/010573.html

Hence I'm keen to learn the recipe for good performance with glusterfs storing VM images; if it exists, it doesn't seem to be well documented at all.

Regards,

Brian.
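For reference, the cache=none setting mentioned above is requested per drive on the QEMU command line; it makes QEMU open the image with O_DIRECT on the host side. A minimal sketch - the image path, memory/CPU values and the remaining flags are illustrative, not a tested recipe:

```shell
# Sketch: boot a guest with its virtio disk opened O_DIRECT (cache=none).
qemu-kvm -m 2048 -smp 2 \
  -drive file=/mnt/gluster/images/test.img,if=virtio,cache=none
```

Under libvirt the same thing is expressed as cache='none' on the disk's <driver> element.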
Re: [Gluster-users] about HA infrastructure for hypervisors
On Wed, 27 Jun 2012, Brian Candler wrote:
> I've made a test setup like this, but unfortunately I haven't yet been
> able to get half-decent performance out of glusterfs 3.3 as a KVM
> backend. It may work better if you use local disk for the VM images, and
> within the VM mount the glusterfs volume for application data.

What is considered half-decent? I have an 8-node distribute+replicate cluster and I am getting about 65MB/s and about 1.5K IOPS. Considering that I am only using a single two-disk SAS stripe in each host, I think that is not bad.

> Alternatively, look at something like ganeti (which by default runs on
> top of drbd+LVM, although you can also use it to manage a cluster which
> uses a shared file store backend like gluster)
>
> Maybe 3.3.1 will be better. But today, your investment in SSDs is quite
> likely to be wasted :-(
>
> > The idea is to have HA if either one KVM hypervisor or one Glusterfs
> > server stops working (failure, maintenance, etc).
>
> You'd also need some mechanism for starting each VM on node B if node A
> fails. You can probably script that, although there are lots of hazards
> for the unwary. Maybe better to have the failover done manually.

Also check out oVirt; it integrates with Gluster and provides HA.

> > 2. We still didn't decide what physical network to choose between FC,
> > FCoE and Infiniband.
>
> Have you ruled out 10G ethernet? If so, why?

I agree, we went all 10GBase-T.

> (note: using SFP+ ports, either with fibre SFP+s or SFP+ coax cables,
> gives much better latency than 10G over RJ45/CAT6)

Actually, with the new switches like Arista this is less of an issue.

> > 3. Would it be better to split the Glusterfs namespace into two gluster
> > volumes (one for each hypervisor), each running on a Glusterfs server
> > (for the normal case where all servers are running)?
>
> I don't see how that would help - I expect you would mount both volumes
> on both KVM nodes anyway, to allow you to do live migration.

Yep.

Nathan Stratton
nathan at robotics.net
http://www.robotics.net
Re: [Gluster-users] about HA infrastructure for hypervisors
On Wed, Jun 27, 2012 at 10:06:30AM +0200, Nicolas Sebrecht wrote:
> We are going to try glusterfs for our new HA servers.
>
> To get full HA, I'm thinking of building it this way:
>
> +----------------+                +----------------+
> |                |                |                |
> | KVM hypervisor |-----+    +-----| KVM hypervisor |
> |                |     |    |     |                |
> +----------------+     |    |     +----------------+
>                        |    |
>                     +--------+
>                     | switch |
>                     +--------+
>                        |    |
> +----------------+     |    |     +----------------+
> |                |     |    |     |                |
> | Glusterfs 3.3  |-----+    +-----| Glusterfs 3.3  |
> |    server A    |                |    server B    |
> |                |                |                |
> +----------------+                +----------------+

I've made a test setup like this, but unfortunately I haven't yet been able to get half-decent performance out of glusterfs 3.3 as a KVM backend. It may work better if you use local disk for the VM images, and within the VM mount the glusterfs volume for application data.

Alternatively, look at something like ganeti (which by default runs on top of drbd+LVM, although you can also use it to manage a cluster which uses a shared file store backend like gluster).

Maybe 3.3.1 will be better. But today, your investment in SSDs is quite likely to be wasted :-(

> The idea is to have HA if either one KVM hypervisor or one Glusterfs
> server stops working (failure, maintenance, etc).

You'd also need some mechanism for starting each VM on node B if node A fails. You can probably script that, although there are lots of hazards for the unwary. Maybe better to have the failover done manually.

> 2. We still didn't decide what physical network to choose between FC,
> FCoE and Infiniband.

Have you ruled out 10G ethernet? If so, why?

(note: using SFP+ ports, either with fibre SFP+s or SFP+ coax cables, gives much better latency than 10G over RJ45/CAT6)

> 3. Would it be better to split the Glusterfs namespace into two gluster
> volumes (one for each hypervisor), each running on a Glusterfs server
> (for the normal case where all servers are running)?

I don't see how that would help - I expect you would mount both volumes on both KVM nodes anyway, to allow you to do live migration.
Re: [Gluster-users] about HA infrastructure for hypervisors
If you do decide to use 2 switches:

1) For the KVM hosts, use 2 NICs, bridge them, and run KVM on the bridge (usually br0), with each NIC linked to a different switch.
2) Interlink those switches.

Dan

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: Wednesday, June 27, 2012 4:13 AM
To: Gerald Brandt
Cc: gluster-users
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 27/06/12, Gerald Brandt wrote:

> Hi,
>
> If your switch breaks, you are done. Put each Gluster server on its own
> switch.

Right. Handling switch failures isn't what I'm most worried about, but I guess that I'll need to add a network link between the KVM hypervisors, too. Thanks for this tip, though.

> > +----------------+                      +----------------+
> > |                |----------------------|                |
> > | KVM hypervisor |-----+          +-----| KVM hypervisor |
> > |                |     |          |     |                |
> > +----------------+     |          |     +----------------+
> >                        |          |
> >                   +--------+  +--------+
> >                   | switch |  | switch |
> >                   +--------+  +--------+
> >                     |    |      |    |
> > +----------------+  |    |      |    |  +----------------+
> > |                |--+    |      |    +--|                |
> > | Glusterfs 3.3  |-------)------+       | Glusterfs 3.3  |
> > |    server A    |       |              |    server B    |
> > |                |       +--------------|                |
> > +----------------+                      +----------------+

--
Nicolas Sebrecht
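Dan's two points can be sketched as RHEL/CentOS-style network config. Note this sketch puts a bond underneath the bridge rather than adding both NICs to the bridge directly (two uplinks straight into one bridge across interlinked switches would form a loop unless STP intervenes); device names, bond mode and the address are all examples:

```shell
# Sketch: eth0/eth1 cabled to different switches, bonded active-backup,
# bond enslaved to bridge br0 which the KVM guests attach to.
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 <<'EOF'
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
EOF
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<'EOF'
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
EOF
cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<'EOF'
DEVICE=bond0
BONDING_OPTS="mode=active-backup miimon=100"
BRIDGE=br0
ONBOOT=yes
EOF
cat > /etc/sysconfig/network-scripts/ifcfg-br0 <<'EOF'
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=10.59.0.21
NETMASK=255.255.255.0
ONBOOT=yes
EOF
service network restart
```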
Re: [Gluster-users] about HA infrastructure for hypervisors
----- Original Message -----
> From: "Nicolas Sebrecht"
> To: "gluster-users"
> Sent: Wednesday, June 27, 2012 3:06:30 AM
> Subject: [Gluster-users] about HA infrastructure for hypervisors
>
> Hi,
>
> We are going to try glusterfs for our new HA servers.
>
> To get full HA, I'm thinking of building it this way:
>
> +----------------+                +----------------+
> |                |                |                |
> | KVM hypervisor |-----+    +-----| KVM hypervisor |
> |                |     |    |     |                |
> +----------------+     |    |     +----------------+
>                        |    |
>                     +--------+
>                     | switch |
>                     +--------+
>                        |    |
> +----------------+     |    |     +----------------+
> |                |     |    |     |                |
> | Glusterfs 3.3  |-----+    +-----| Glusterfs 3.3  |
> |    server A    |                |    server B    |
> |                |                |                |
> +----------------+                +----------------+
>
> The idea is to have HA if either one KVM hypervisor or one Glusterfs
> server stops working (failure, maintenance, etc).
>
> Some points:
> - We don't care much about duplicating the network (we're going to have
>   spare materials only).
> - Glusterfs servers will use gluster replication to get HA.
> - Each Glusterfs server will have SSD disks in a RAID (1 or 10, I
>   guess).
> - Most of the time, both KVM hypervisors will have VMs running.
>
> 1. Is this a correct/typical infrastructure?
>
> 2. We still didn't decide what physical network to choose between FC,
> FCoE and Infiniband. What would you suggest for both performance and
> easy configuration?
>
> Is it possible to use FC or FCoE for a HA Glusterfs cluster? If so, how
> to configure the Glusterfs nodes?
>
> 3. Would it be better to split the Glusterfs namespace into two gluster
> volumes (one for each hypervisor), each running on a Glusterfs server
> (for the normal case where all servers are running)?
>
> Thanks,
>
> --
> Nicolas Sebrecht

Hi,

If your switch breaks, you are done. Put each Gluster server on its own switch.

> +----------------+                      +----------------+
> |                |----------------------|                |
> | KVM hypervisor |-----+          +-----| KVM hypervisor |
> |                |     |          |     |                |
> +----------------+     |          |     +----------------+
>                        |          |
>                   +--------+  +--------+
>                   | switch |  | switch |
>                   +--------+  +--------+
>                     |    |      |    |
> +----------------+  |    |      |    |  +----------------+
> |                |--+    |      |    +--|                |
> | Glusterfs 3.3  |-------)------+       | Glusterfs 3.3  |
> |    server A    |       |              |    server B    |
> |                |       +--------------|                |
> +----------------+                      +----------------+