Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Brian Candler
On Thu, Jun 28, 2012 at 10:40:43AM -0500, Nathan Stratton wrote:
> But wait, yes, I have 16 physical disks, but I am running distribute
> + replicate so the 8 physical boxes are broken up into 4 pairs of
> redundant boxes. When I do a write, I am writing on two servers, or
> 4 physical disks. So in my case, 31.1 MB/s vs about 200 MB/s native
> is not that bad.
> 
> DRBD is MUCH faster, but you're not comparing apples to apples. DRBD
> has worked great for me in the past when I only needed two storage
> nodes to be mirrored in active/active, but as soon as you grow past
> that you need to look at something like Gluster.

But we're talking about two different things here:

* VM image (i.e. the root filesystem it boots from; where the O/S sits;
  logs and scratch space)
* Application data storage

You'd be mad to have terabytes of data sitting inside a single VM image
file: it's unshareable, the image is one big humongous blob, and to back
it up effectively you have to run the backup tools within the VM itself.

Furthermore, the performance of GlusterFS is excellent when you mount it
directly.  It only sucks when you're using a gluster-mounted file as a KVM
virtual disk.

So what I'm suggesting is, if you need performance today:

* Use DRBD+LVM for your VM filesystem storage
* Use glusterfs for your "big data", and attach it to those VM(s) which
  need to access it - leveraging the naturally shared nature of glusterfs.

And eventually you'll be able to simplify your system by migrating your VM
images to glusterfs, when performance catches up.
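
For the data-volume side, a minimal sketch of what that looks like from
inside a VM (server names, volume name and mount point are hypothetical):

  # /etc/fstab inside the VM: attach the shared gluster volume with the
  # native FUSE client; backupvolfile-server gives the client a second
  # server to fetch the volume file from if serverA is down
  serverA:/bigdata  /srv/bigdata  glusterfs  defaults,_netdev,backupvolfile-server=serverB  0 0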

Ganeti can manage both types of cluster, so you don't lose out by learning
it up-front.

Regards,

Brian.


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Nathan Stratton

On Thu, 28 Jun 2012, Brian Candler wrote:


> As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have
> poor latency.  As I understand it: in order to get such high data rates over
> copper, it has to employ mechanisms similar to DSL lines, like interleaving,
> which means 10G has comparable or even higher latency than 1G.  Switches
> with all 10Gbase-T ports are expensive, only available from a couple of
> vendors, and consume a lot of power.


One more thing on the "don't use 10GE over CAT6/RJ45" debate. Where do you
get that 10GBase-T has poor latency? 1000Base-T latency ranges from 1us to
over 12us, while 10GBase-T ranges from just over 2us to 4us - a MUCH tighter
latency range. When you look at large packets this is even more pronounced.
So yes, for very small packets 1000Base-T is faster, but we are talking about
a 1 - 2 us difference, and only on small packets.


As far as power goes, the original PHYs were almost 7 watts per port! Today
almost every switch out there is less than 1 watt per port, and falling in
line with Moore's Law.




Nathan Stratton
nathan at robotics.net
http://www.robotics.net


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Nathan Stratton

On Thu, 28 Jun 2012, Brian Candler wrote:


> On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
>
>> We excluded ethernet due to searches on the web. It appeared that
>> ethernet has bad latency.
>
> I read it on the web, it must be true :-)
>
> As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have
> poor latency.  As I understand it: in order to get such high data rates over
> copper, it has to employ mechanisms similar to DSL lines, like interleaving,
> which means 10G has comparable or even higher latency than 1G.  Switches
> with all 10Gbase-T ports are expensive, only available from a couple of
> vendors, and consume a lot of power.


First-gen PHYs did suck a LOT of power, but this has changed. With the
next-gen PHYs the cost of 10GBase-T switches has also dropped; they are
now the lowest-cost switches out there. I like Arista Networks the best,
but you can get a switch for well under $10K from guys like Interface
Masters.


New servers from SuperMicro and others now offer 10GBase-T native,
drastically lowering the cost per port.



> I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S switch,
> and direct-attach cables (3m Netgear AXC763, 5m Intel)
>
> This all works fine, although with older versions of Linux I had to build
> the latest Intel drivers from their website to fix problems with the links
> going down every day or two.


Ya, I ran into an odd problem with my X520s where everything worked except
VLANs on bonded interfaces. It took me a full day to figure out it was not
my switch or config, but the stupid ixgbe driver!
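
For anyone trying to reproduce it: the failing combination was simply a
tagged interface stacked on a bond, e.g. on RHEL/CentOS something along
these lines (device names and VLAN ID are made up):

  # /etc/sysconfig/network-scripts/ifcfg-bond0.100
  DEVICE=bond0.100
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=static
  IPADDR=10.59.100.11
  NETMASK=255.255.255.0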




Nathan Stratton
nathan at robotics.net
http://www.robotics.net


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Nathan Stratton

On Thu, 28 Jun 2012, Brian Candler wrote:


> On Wed, Jun 27, 2012 at 05:28:43PM -0500, Nathan Stratton wrote:
>
>> [root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
>> 5120+0 records in
>> 5120+0 records out
>> 5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s


>>> But doing a dd if=/dev/zero bs=1024k within a VM, whose image was mounted on
>>> glusterfs, I was getting only 6-25MB/s.


>> [root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
>> 5120+0 records in
>> 5120+0 records out
>> 5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s
>
> That's what I consider unimpressive - slower than a single disk, when you
> have an array of 16.  I should try a pair of drbd nodes as a fair comparison
> though.


But wait, yes, I have 16 physical disks, but I am running distribute + 
replicate so the 8 physical boxes are broken up into 4 pairs of redundant 
boxes. When I do a write, I am writing on two servers, or 4 physical 
disks. So in my case, 31.1 MB/s vs about 200 MB/s native is not that bad.


DRBD is MUCH faster, but you're not comparing apples to apples. DRBD has
worked great for me in the past when I only needed two storage nodes to be
mirrored in active/active, but as soon as you grow past that you need to
look at something like Gluster. With GlusterFS my single write is slower
at 31.1 MB/s, but I can do that many more times over my 8 nodes without
losing I/O.



> Having said that, multiple clients running concurrently should be able to
> use the remaining bandwidth, so the aggregate throughput should be fine.


Correct.



Nathan Stratton
nathan at robotics.net
http://www.robotics.net


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Fernando Frediani (Qube)
It would be interesting if it could read in a round-robin manner from wherever
it holds the data. If the local storage is too busy (and therefore showing
higher latency), it would be good to read some of the data from another,
quieter node which, even over the network, could provide better latency. In
short: distribute the IO load across the whole cluster whenever it holds
multiple copies of the data.
What are others' opinions on that?

Regards,

Fernando

-Original Message-
From: Tim Bell [mailto:tim.b...@cern.ch] 
Sent: 28 June 2012 11:55
To: Fernando Frediani (Qube); 'Nicolas Sebrecht'; 'Thomas Jackson'
Cc: 'gluster-users'
Subject: RE: [Gluster-users] about HA infrastructure for hypervisors


Assuming that we use a 3 copy approach across the hypervisors, does Gluster
favour the local copy on the hypervisor if the data is on
distributed/replicated ? 

It would be good to avoid the network hop when the data is on the local
disk.

Tim

> -Original Message-
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
> Sent: 28 June 2012 11:43
> To: 'Nicolas Sebrecht'; 'Thomas Jackson'
> Cc: 'gluster-users'
> Subject: Re: [Gluster-users] about HA infrastructure for hypervisors
> 
> You should indeed use the same server running as a storage brick as a
> KVM host, to maximize hardware and power usage. The only thing I am not
> sure about is whether you can limit the amount of host memory Gluster can
> eat, so that most of it stays reserved for the Virtual Machines.
> 
> Fernando
> 
> -Original Message-
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Nicolas Sebrecht
> Sent: 28 June 2012 10:31
> To: Thomas Jackson
> Cc: 'gluster-users'
> Subject: [Gluster-users] Re: about HA infrastructure for hypervisors
> 
> The 28/06/12, Thomas Jackson wrote:
> 
> > Why don't you have KVM running on the Gluster bricks as well?
> 
> Good point. While abstracting, we decided to separate KVM & Gluster but I
> can't remember why.
> We'll think about that again.
> 
> > We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> > RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> > node participates in a distribute+replicate Gluster namespace and runs
> > KVM. We found this to be the most efficient (and fastest) way to run the
> cluster.
> >
> > This works well for us, although (due to Gluster using fuse) it isn't
> > as fast as we would like. Currently waiting for the KVM driver that
> > has been discussed a few times recently, that should make a huge
> > difference to the performance for us.
> 
> Ok! Thanks.
> 
> --
> Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread David Coulson

No.

I saw a patch to have it behave like this, but I can't find it right now.
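
The closest existing knob I know of is the AFR read-subvolume option, which
statically pins reads to one replica rather than automatically preferring
the local brick - roughly like this, with illustrative names (check
`gluster volume set help` on your version):

  gluster volume set myvol cluster.read-subvolume myvol-client-0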

On 6/28/12 6:54 AM, Tim Bell wrote:

Assuming that we use a 3 copy approach across the hypervisors, does Gluster
favour the local copy on the hypervisor if the data is on
distributed/replicated ?

It would be good to avoid the network hop when the data is on the local
disk.

Tim


-Original Message-
From: gluster-users-boun...@gluster.org [mailto:gluster-users-
boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
Sent: 28 June 2012 11:43
To: 'Nicolas Sebrecht'; 'Thomas Jackson'
Cc: 'gluster-users'
Subject: Re: [Gluster-users] about HA infrastructure for hypervisors

You should indeed use the same server running as a storage brick as a
KVM host, to maximize hardware and power usage. The only thing I am not
sure about is whether you can limit the amount of host memory Gluster can
eat, so that most of it stays reserved for the Virtual Machines.

Fernando

-Original Message-
From: gluster-users-boun...@gluster.org [mailto:gluster-users-
boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: 28 June 2012 10:31
To: Thomas Jackson
Cc: 'gluster-users'
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 28/06/12, Thomas Jackson wrote:


Why don't you have KVM running on the Gluster bricks as well?

Good point. While abstracting, we decided to separate KVM & Gluster but I
can't remember why.
We'll think about that again.


We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
node participates in a distribute+replicate Gluster namespace and runs
KVM. We found this to be the most efficient (and fastest) way to run
the cluster.

This works well for us, although (due to Gluster using fuse) it isn't
as fast as we would like. Currently waiting for the KVM driver that
has been discussed a few times recently, that should make a huge
difference to the performance for us.

Ok! Thanks.

--
Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Tim Bell

Assuming that we use a 3 copy approach across the hypervisors, does Gluster
favour the local copy on the hypervisor if the data is on
distributed/replicated ? 

It would be good to avoid the network hop when the data is on the local
disk.

Tim

> -Original Message-
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Fernando Frediani (Qube)
> Sent: 28 June 2012 11:43
> To: 'Nicolas Sebrecht'; 'Thomas Jackson'
> Cc: 'gluster-users'
> Subject: Re: [Gluster-users] about HA infrastructure for hypervisors
> 
> You should indeed use the same server running as a storage brick as a
> KVM host, to maximize hardware and power usage. The only thing I am not
> sure about is whether you can limit the amount of host memory Gluster can
> eat, so that most of it stays reserved for the Virtual Machines.
> 
> Fernando
> 
> -Original Message-
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-
> boun...@gluster.org] On Behalf Of Nicolas Sebrecht
> Sent: 28 June 2012 10:31
> To: Thomas Jackson
> Cc: 'gluster-users'
> Subject: [Gluster-users] Re: about HA infrastructure for hypervisors
> 
> The 28/06/12, Thomas Jackson wrote:
> 
> > Why don't you have KVM running on the Gluster bricks as well?
> 
> Good point. While abstracting, we decided to separate KVM & Gluster but I
> can't remember why.
> We'll think about that again.
> 
> > We have a 4 node cluster (each with 4x 300GB 15k SAS drives in
> > RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each
> > node participates in a distribute+replicate Gluster namespace and runs
> > KVM. We found this to be the most efficient (and fastest) way to run the
> cluster.
> >
> > This works well for us, although (due to Gluster using fuse) it isn't
> > as fast as we would like. Currently waiting for the KVM driver that
> > has been discussed a few times recently, that should make a huge
> > difference to the performance for us.
> 
> Ok! Thanks.
> 
> --
> Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Fernando Frediani (Qube)
Interesting info Brian,
I am surprised by this, actually. I would always expect 10Gig to have very
good, low latency. Obviously I wouldn't expect copper to be exactly the same
as fibre due to the losses, but not much behind either.

Please share any future results you get, as it's quite valuable information
for people designing their systems.

Regards,

Fernando

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Brian Candler
Sent: 28 June 2012 11:33
To: Nicolas Sebrecht
Cc: gluster-users
Subject: Re: [Gluster-users] about HA infrastructure for hypervisors

On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
> We excluded ethernet due to searches on the web. It appeared that 
> ethernet has bad latency.

I read it on the web, it must be true :-)

As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have 
poor latency.  As I understand it: in order to get such high data rates over 
copper, it has to employ mechanisms similar to DSL lines, like interleaving, 
which means 10G has comparable or even higher latency than 1G.  Switches with 
all 10Gbase-T ports are expensive, only available from a couple of vendors, and 
consume a lot of power.

However switches with SFP+ ports don't have these problems. For short reach you 
can use SFP+ direct attach cables, and for long reach use fibre.

http://en.wikipedia.org/wiki/10-gigabit_Ethernet

I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S switch, and 
direct-attach cables (3m Netgear AXC763, 5m Intel)

This all works fine, although with older versions of Linux I had to build the 
latest Intel drivers from their website to fix problems with the links going 
down every day or two.

Regards,

Brian.


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Brian Candler
On Thu, Jun 28, 2012 at 11:25:20AM +0200, Nicolas Sebrecht wrote:
> We excluded ethernet due to searches on the web. It appeared that
> ethernet has bad latency.

I read it on the web, it must be true :-)

As I said: don't use 10GE over CAT6/RJ45 (10Gbase-T). That does indeed have
poor latency.  As I understand it: in order to get such high data rates over
copper, it has to employ mechanisms similar to DSL lines, like interleaving,
which means 10G has comparable or even higher latency than 1G.  Switches
with all 10Gbase-T ports are expensive, only available from a couple of
vendors, and consume a lot of power.

However switches with SFP+ ports don't have these problems. For short reach
you can use SFP+ direct attach cables, and for long reach use fibre.

http://en.wikipedia.org/wiki/10-gigabit_Ethernet

I have been working with Intel X520-DA2 NICs and a Netgear XSM7224S switch,
and direct-attach cables (3m Netgear AXC763, 5m Intel)

This all works fine, although with older versions of Linux I had to build
the latest Intel drivers from their website to fix problems with the links
going down every day or two.
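
The out-of-tree driver build is the usual dance, roughly like this (the
version number is whatever Intel currently ships):

  tar xzf ixgbe-3.9.15.tar.gz
  cd ixgbe-3.9.15/src
  make install            # builds against the running kernel's headers
  rmmod ixgbe; modprobe ixgbe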

Regards,

Brian.


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Fernando Frediani (Qube)
You should indeed use the same server running as a storage brick as a KVM
host, to maximize hardware and power usage. The only thing I am not sure
about is whether you can limit the amount of host memory Gluster can eat, so
that most of it stays reserved for the Virtual Machines.
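
One thing worth testing is capping the gluster daemons with the cgroup
memory controller - a rough sketch, assuming the memory controller is
mounted under /sys/fs/cgroup and with an arbitrary 2G limit:

  mkdir /sys/fs/cgroup/memory/gluster
  echo 2G > /sys/fs/cgroup/memory/gluster/memory.limit_in_bytes
  for pid in $(pidof glusterfsd glusterd); do
      echo $pid > /sys/fs/cgroup/memory/gluster/tasks
  done

How gracefully Gluster degrades when its caches are squeezed like that
would need testing.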

Fernando

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: 28 June 2012 10:31
To: Thomas Jackson
Cc: 'gluster-users'
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 28/06/12, Thomas Jackson wrote:

> Why don't you have KVM running on the Gluster bricks as well?

Good point. While abstracting, we decided to separate KVM & Gluster but I
can't remember why.
We'll think about that again.

> We have a 4 node cluster (each with 4x 300GB 15k SAS drives in 
> RAID10), 10 gigabit SFP+ Ethernet (with redundant switching). Each 
> node participates in a distribute+replicate Gluster namespace and runs 
> KVM. We found this to be the most efficient (and fastest) way to run the 
> cluster.
> 
> This works well for us, although (due to Gluster using fuse) it isn't 
> as fast as we would like. Currently waiting for the KVM driver that 
> has been discussed a few times recently, that should make a huge 
> difference to the performance for us.

Ok! Thanks.

--
Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-28 Thread Brian Candler
On Wed, Jun 27, 2012 at 05:28:43PM -0500, Nathan Stratton wrote:
> [root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s
> 
> >But doing a dd if=/dev/zero bs=1024k within a VM, whose image was mounted on
> >glusterfs, I was getting only 6-25MB/s.
> 
> [root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
> 5120+0 records in
> 5120+0 records out
> 5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s

That's what I consider unimpressive - slower than a single disk, when you
have an array of 16.  I should try a pair of drbd nodes as a fair comparison
though.
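
(As an aside, for write tests like these it is worth adding conv=fdatasync
so the timing includes flushing the page cache, e.g.

  dd if=/dev/zero of=foo bs=1M count=5k conv=fdatasync

otherwise short runs can overstate the sustained rate.)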

Having said that, multiple clients running concurrently should be able to
use the remaining bandwidth, so the aggregate throughput should be fine.

BTW, there used to be an issue with glusterfs 3.2 in the way it scaled its
thread pool, which limited concurrency:
http://gluster.org/pipermail/gluster-users/2012-February/009590.html
But that has been fixed in 3.3 (commit 2d836326)
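If you are stuck on 3.2 in the meantime, raising the io-threads ceiling may
help a little - the option name, per `gluster volume set help`, with an
illustrative volume name:

  gluster volume set myvol performance.io-thread-count 32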

Regards,

Brian.


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Thomas Jackson
Why don't you have KVM running on the Gluster bricks as well?

We have a 4 node cluster (each with 4x 300GB 15k SAS drives in RAID10), 10
gigabit SFP+ Ethernet (with redundant switching). Each node participates in
a distribute+replicate Gluster namespace and runs KVM. We found this to be
the most efficient (and fastest) way to run the cluster.

This works well for us, although (due to Gluster using fuse) it isn't as
fast as we would like. Currently waiting for the KVM driver that has been
discussed a few times recently, that should make a huge difference to the
performance for us.

Cheers,

Thomas

-Original Message-
From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: Wednesday, 27 June 2012 9:13 PM
To: Gerald Brandt
Cc: gluster-users
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 27/06/12, Gerald Brandt wrote:

> Hi,
> 
> If your switch breaks, you are done.  Put each Gluster server on its own
> switch.

Right. Handling switch failures isn't what I'm most worried about but I
guess that I'll need to add a network link between KVM hypervisors, too.

Thanks for this tip, though.

> >   +----------------+                    +----------------+
> >   |                |--------------------|                |
> >   | KVM hypervisor |------+      +------| KVM hypervisor |
> >   |                |      |      |      |                |
> >   +----------------+      |      |      +----------------+
> >                           |      |
> >                      +--------+ +--------+
> >                      | switch | | switch |
> >                      +--------+ +--------+
> >                        |    |     |    |
> >   +---------------+    |    |     |    |    +---------------+
> >   |               |----+    |     |    +----|               |
> >   | Glusterfs 3.3 |---------)-----+         | Glusterfs 3.3 |
> >   |   server A    |         |               |   server B    |
> >   |               |         +---------------|               |
> >   +---------------+                         +---------------+

--
Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Nathan Stratton

On Wed, 27 Jun 2012, Brian Candler wrote:


> For a 16-disk array, your IOPS is not bad.  But are you actually storing a
> VM image on it, and then doing lots of I/O within that VM (as opposed to
> mounting the volume from within the VM)?  If so, can you specify your exact
> configuration, including OS and kernel versions?


2.6.32-220.23.1.el6.x86_64

[root@virt01 ~]# gluster volume info share

Volume Name: share
Type: Distributed-Replicate
Volume ID: 09bfc0c3-e3d4-441b-af6f-acd263884920
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.59.0.11:/export
Brick2: 10.59.0.12:/export
Brick3: 10.59.0.13:/export
Brick4: 10.59.0.14:/export
Brick5: 10.59.0.15:/export
Brick6: 10.59.0.16:/export
Brick7: 10.59.0.17:/export
Brick8: 10.59.0.18:/export
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.nlm: off
auth.allow: *
nfs.disable: off


> I did my tests on two quad-core/8GB nodes, 12 disks in each (md RAID10),
> running ubuntu 12.04, and 10GE RJ45 direct connection.  The disk arrays
> locally perform at 350MB/s for streaming writes.


Well, I would first ditch Ubuntu and install CentOS, but anyway: my disk
arrays are slower:


[root@virt01 ~]# dd if=/dev/zero of=foo bs=1M count=5k
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 26.8408 s, 200 MB/s


> But doing a dd if=/dev/zero bs=1024k within a VM, whose image was mounted on
> glusterfs, I was getting only 6-25MB/s.


[root@test ~]# dd if=/dev/zero of=foo bs=1M count=5k
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 172.706 s, 31.1 MB/s

While this is slower than what I would like to see, it's faster than what I
was getting to my NetApp, and it scales better! :)




Nathan Stratton
nathan at robotics.net
http://www.robotics.net


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Brian Candler
On Wed, Jun 27, 2012 at 03:07:21PM -0500, Nathan Stratton wrote:
> >I've made a test setup like this, but unfortunately I haven't yet been able
> >to get half-decent performance out of glusterfs 3.3 as a KVM backend.  It
> >may work better if you use local disk for the VM images, and within the VM
> >mount the glusterfs volume for application data.
> 
> What is considered half-decent? I have an 8 node
> distribute+replicate setup and I am getting about 65MB/s and about
> 1.5K IOPS. Considering that I am only using a single two disk SAS
> stripe in each host I think that is not bad.

For a 16-disk array, your IOPS is not bad.  But are you actually storing a
VM image on it, and then doing lots of I/O within that VM (as opposed to
mounting the volume from within the VM)?  If so, can you specify your exact
configuration, including OS and kernel versions?

I did my tests on two quad-core/8GB nodes, 12 disks in each (md RAID10),
running ubuntu 12.04, and 10GE RJ45 direct connection.  The disk arrays
locally perform at 350MB/s for streaming writes.

But doing a dd if=/dev/zero bs=1024k within a VM, whose image was mounted on
glusterfs, I was getting only 6-25MB/s.

http://gluster.org/pipermail/gluster-users/2012-June/010553.html
http://gluster.org/pipermail/gluster-users/2012-June/010560.html
http://gluster.org/pipermail/gluster-users/2012-June/010570.html

I get much better performance on locally-attached storage with O_DIRECT (kvm
option "cache=none"), but have been unable to get O_DIRECT to work with
glusterfs.

After a kernel upgrade (to a 3.4+ kernel which supports O_DIRECT for fuse),
and using the mount option direct-io-mode=enable, the VM simply wouldn't
boot:

http://gluster.org/pipermail/gluster-users/2012-June/010572.html
http://gluster.org/pipermail/gluster-users/2012-June/010573.html
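
For concreteness, the combination being attempted looks like this (paths
and names are hypothetical):

  # mount the volume with FUSE direct I/O enabled (needs a 3.4+ kernel)
  mount -t glusterfs -o direct-io-mode=enable server:/vmvol /var/lib/images

  # then start the guest with host caching disabled
  kvm -drive file=/var/lib/images/vm1.img,if=virtio,cache=none ...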

Hence I'm keen to learn the recipe for good performance with glusterfs
storing VM images; if it exists, it doesn't seem to be well documented at
all.

Regards,

Brian.


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Nathan Stratton

On Wed, 27 Jun 2012, Brian Candler wrote:


> I've made a test setup like this, but unfortunately I haven't yet been able
> to get half-decent performance out of glusterfs 3.3 as a KVM backend.  It
> may work better if you use local disk for the VM images, and within the VM
> mount the glusterfs volume for application data.


What is considered half-decent? I have an 8 node distribute+replicate
setup and I am getting about 65MB/s and about 1.5K IOPS. Considering that
I am only using a single two disk SAS stripe in each host, I think that is
not bad.



> Alternatively, look at something like ganeti (which by default runs on top
> of drbd+LVM, although you can also use it to manage a cluster which uses a
> shared file store backend like gluster)
>
> Maybe 3.3.1 will be better. But today, your investment in SSDs is quite
> likely to be wasted :-(
>
>> The idea is to have HA if either one KVM hypervisor or one Glusterfs
>> server stop working (failure, maintenance, etc).
>
> You'd also need some mechanism for starting each VM on node B if node A
> fails.  You can probably script that, although there are lots of hazards for
> the unwary.  Maybe better to have the failover done manually.


Also check out oVirt; it integrates with Gluster and provides HA.


>> 2. We still didn't decide what physical network to choose between FC, FCoE
>> and Infiniband.
>
> Have you ruled out 10G ethernet? If so, why?


I agree, we went all 10GBase-T.


> (note: using SFP+ ports, either with fibre SFP+s or SFP+ coax cables, gives
> much better latency than 10G over RJ45/CAT6)


Actually with the new switches like Arista this is less of an issue.


>> 3. Would it be better to split the Glusterfs namespace into two gluster
>> volumes (one for each hypervisor), each running on a Glusterfs server
>> (for the normal case where all servers are running)?
>
> I don't see how that would help - I expect you would mount both volumes on
> both KVM nodes anyway, to allow you to do live migration.


Yep




Nathan Stratton
nathan at robotics.net
http://www.robotics.net


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Brian Candler
On Wed, Jun 27, 2012 at 10:06:30AM +0200, Nicolas Sebrecht wrote:
> We are going to try glusterfs for our new HA servers.
> 
> To get full HA, I'm thinking of building it this way:
> 
>    +----------------+                 +----------------+
>    |                |                 |                |
>    | KVM hypervisor |-------+ +-------| KVM hypervisor |
>    |                |       | |       |                |
>    +----------------+       | |       +----------------+
>                             | |
>                         +--------+
>                         | switch |
>                         +--------+
>                             | |
>    +---------------+        | |        +---------------+
>    |               |        | |        |               |
>    | Glusterfs 3.3 |--------+ +--------| Glusterfs 3.3 |
>    |   server A    |                   |   server B    |
>    |               |                   |               |
>    +---------------+                   +---------------+

I've made a test setup like this, but unfortunately I haven't yet been able
to get half-decent performance out of glusterfs 3.3 as a KVM backend.  It
may work better if you use local disk for the VM images, and within the VM
mount the glusterfs volume for application data.

Alternatively, look at something like ganeti (which by default runs on top
of drbd+LVM, although you can also use it to manage a cluster which uses a
shared file store backend like gluster)
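
(Creating a DRBD-backed instance under ganeti is roughly the following;
node and instance names are made up, and the disk syntax varies by
version, so check `gnt-instance add --help`:

  gnt-instance add -t drbd -n node1:node2 -o debootstrap+default \
      -s 10G vm1.example.com
)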

Maybe 3.3.1 will be better. But today, your investment in SSDs is quite
likely to be wasted :-(

> The idea is to have HA if either one KVM hypervisor or one Glusterfs
> server stop working (failure, maintenance, etc).

You'd also need some mechanism for starting each VM on node B if node A
fails.  You can probably script that, although there are lots of hazards for
the unwary.  Maybe better to have the failover done manually.
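
A deliberately naive sketch of such a script, just to show the shape of the
problem (hostnames and the VM list file are made up, and a real setup needs
fencing or this will happily start the same VM twice):

  #!/bin/sh
  # poor man's failover: if node A stops answering pings, start its VMs here
  while sleep 10; do
      if ! ping -c 3 -W 2 nodeA >/dev/null 2>&1; then
          while read vm; do
              virsh start "$vm"
          done < /etc/cluster/vms-on-nodeA
          break
      fi
  done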

> 2. We still didn't decide what physical network to choose between FC, FCoE
> and Infiniband.

Have you ruled out 10G ethernet? If so, why?

(note: using SFP+ ports, either with fibre SFP+s or SFP+ coax cables, gives
much better latency than 10G over RJ45/CAT6)

> 3. Would it be better to split the Glusterfs namespace into two gluster
> volumes (one for each hypervisor), each running on a Glusterfs server
> (for the normal case where all servers are running)?

I don't see how that would help - I expect you would mount both volumes on
both KVM nodes anyway, to allow you to do live migration.
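
For reference, the replicated volume in the diagram above would be created
along these lines (peer, volume and brick names are illustrative):

  gluster peer probe serverB                  # run once, from server A
  gluster volume create vmstore replica 2 transport tcp \
      serverA:/export/vmstore serverB:/export/vmstore
  gluster volume start vmstore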


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Dan Cyr
If you do decide to use 2 switches:
1) For the KVM hosts, use 2 NICs, bridge them, and run KVM on the bridge
(usually br0); link each NIC to a different switch.
2) Interlink those switches.
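
One concrete way to wire that up on a RHEL-style host is an active-backup
bond underneath the bridge, which avoids bridge loops across the two
switches (interface names and addresses are made up):

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  BONDING_OPTS="mode=active-backup miimon=100"
  BRIDGE=br0
  ONBOOT=yes

  # /etc/sysconfig/network-scripts/ifcfg-br0
  DEVICE=br0
  TYPE=Bridge
  BOOTPROTO=static
  IPADDR=10.0.0.11
  NETMASK=255.255.255.0
  ONBOOT=yes

  # eth0 and eth1 each get MASTER=bond0, SLAVE=yes and ONBOOT=yes, and are
  # cabled to different switches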

Dan

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Nicolas Sebrecht
Sent: Wednesday, June 27, 2012 4:13 AM
To: Gerald Brandt
Cc: gluster-users
Subject: [Gluster-users] Re: about HA infrastructure for hypervisors

The 27/06/12, Gerald Brandt wrote:

> Hi,
> 
> If your switch breaks, you are done.  Put each Gluster server on its own
> switch.

Right. Handling switch failures isn't what I'm most worried about but I guess 
that I'll need to add a network link between KVM hypervisors, too.

Thanks for this tip, though.

> >   +----------------+                    +----------------+
> >   |                |--------------------|                |
> >   | KVM hypervisor |------+      +------| KVM hypervisor |
> >   |                |      |      |      |                |
> >   +----------------+      |      |      +----------------+
> >                           |      |
> >                      +--------+ +--------+
> >                      | switch | | switch |
> >                      +--------+ +--------+
> >                        |    |     |    |
> >   +---------------+    |    |     |    |    +---------------+
> >   |               |----+    |     |    +----|               |
> >   | Glusterfs 3.3 |---------)-----+         | Glusterfs 3.3 |
> >   |   server A    |         |               |   server B    |
> >   |               |         +---------------|               |
> >   +---------------+                         +---------------+

--
Nicolas Sebrecht


Re: [Gluster-users] about HA infrastructure for hypervisors

2012-06-27 Thread Gerald Brandt


- Original Message -
> From: "Nicolas Sebrecht" 
> To: "gluster-users" 
> Sent: Wednesday, June 27, 2012 3:06:30 AM
> Subject: [Gluster-users] about HA infrastructure for hypervisors
> 
> Hi,
> 
> We are going to try glusterfs for our new HA servers.
> 
> To get full HA, I'm thinking of building it this way:
> 
>    +----------------+                 +----------------+
>    |                |                 |                |
>    | KVM hypervisor |-------+ +-------| KVM hypervisor |
>    |                |       | |       |                |
>    +----------------+       | |       +----------------+
>                             | |
>                         +--------+
>                         | switch |
>                         +--------+
>                             | |
>    +---------------+        | |        +---------------+
>    |               |        | |        |               |
>    | Glusterfs 3.3 |--------+ +--------| Glusterfs 3.3 |
>    |   server A    |                   |   server B    |
>    |               |                   |               |
>    +---------------+                   +---------------+
> 
> 
> The idea is to have HA if either one KVM hypervisor or one Glusterfs
> server stop working (failure, maintenance, etc).
> 
> Some points:
> - We don't care much about duplicating the network (we're going to have
>   spare materials only).
> - Glusterfs servers will use gluster replication to get HA.
> - Each Glusterfs server will have SSD disks in a RAID (1 or 10, I
> guess).
> - Most of the time, both KVM hypervisor will have VM running.
> 
> 1. Is this a correct/typical infrastructure?
> 
> 2. We still didn't decide what physical network to choose between FC,
> FCoE and Infiniband. What would you suggest for both performance and
> easy configuration?
> 
> Is it possible to use FC or FCoE for a HA Glusterfs cluster? If so, how
> to configure Glusterfs nodes?
> 
> 3. Would it be better to split the Glusterfs namespace into two gluster
> volumes (one for each hypervisor), each running on a Glusterfs server
> (for the normal case where all servers are running)?
> 
> 
> Thanks,
> 
> --
> Nicolas Sebrecht

Hi,

If your switch breaks, you are done.  Put each Gluster server on its own
switch.

>   +----------------+                    +----------------+
>   |                |--------------------|                |
>   | KVM hypervisor |------+      +------| KVM hypervisor |
>   |                |      |      |      |                |
>   +----------------+      |      |      +----------------+
>                           |      |
>                      +--------+ +--------+
>                      | switch | | switch |
>                      +--------+ +--------+
>                        |    |     |    |
>   +---------------+    |    |     |    |    +---------------+
>   |               |----+    |     |    +----|               |
>   | Glusterfs 3.3 |---------)-----+         | Glusterfs 3.3 |
>   |   server A    |         |               |   server B    |
>   |               |         +---------------|               |
>   +---------------+                         +---------------+