Re: [Gluster-users] how well will this work

2013-01-06 Thread Shawn Heisey

On 1/2/2013 4:01 AM, Brian Candler wrote:

Aside: what is the reason for creating four separate logical volumes/bricks
on the same node, and then combining them together using gluster
distribution?  Also, why are you combining all your disks into a single
volume group (clustervg), but then allocating each logical volume from only
a single disk within that VG?


I've got a deployment I'm working on where each server has twelve 4TB 
drives.  I've split those into two 6-drive RAID5 arrays, each of which is 
a 20TB LVM PV/VG.  Each volume group is then split into four 5TB 
logical volumes, each of which is used as a glusterfs brick.
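
As a rough sketch, the per-array layout might look something like the
following (device names, sizes, and mount points are illustrative, not
taken from the actual deployment):

# one 20TB RAID5 array becomes a PV/VG, carved into four 5TB bricks
pvcreate /dev/sdb
vgcreate vg_bricks0 /dev/sdb
for n in 0 1 2 3; do
    lvcreate -L 5T -n brick${n} vg_bricks0
    mkfs.xfs -i size=512 /dev/vg_bricks0/brick${n}
    mkdir -p /data/glusterfs/brick${n}
    mount /dev/vg_bricks0/brick${n} /data/glusterfs/brick${n}
done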


I have chosen this particular brick size for compatibility with other 
physical drive sizes.  Once we have the gluster volume up and running, 
we'll be able to free up some 2TB drives from other storage devices. 
Those drives will be used in future servers to be added to the gluster 
volume(s).  With similar 12-bay hardware and two RAID5 arrays per 
server, this 5TB brick size will work with any size drive that is a 
multiple of 1TB.


Thanks,
Shawn



Re: [Gluster-users] how well will this work

2013-01-02 Thread Brian Candler
On Fri, Dec 28, 2012 at 10:14:19AM -0800, Joe Julian wrote:
 In my configuration, 1 server has 4 drives (well, 5, but one's the
 OS). Each drive has one gpt partition. I create an lvm volume group
 that holds all four huge partitions. For any one GlusterFS volume I
 create 4 lvm logical volumes:
 
 lvcreate -n a_vmimages clustervg /dev/sda1
 lvcreate -n b_vmimages clustervg /dev/sdb1
 lvcreate -n c_vmimages clustervg /dev/sdc1
 lvcreate -n d_vmimages clustervg /dev/sdd1
 
 then format them xfs and (I) mount them under
 /data/glusterfs/vmimages/{a,b,c,d}. These four lvm partitions are
 bricks for the new GlusterFS volume.
 
 As glusterbot would say if asked for the glossary:
 A server hosts bricks (ie. server1:/foo) which belong to a
 volume  which is accessed from a client.
 
 My volume would then look like
 gluster volume create replica 3
 server{1,2,3}:/data/glusterfs/vmimages/a/brick
 server{1,2,3}:/data/glusterfs/vmimages/b/brick
 server{1,2,3}:/data/glusterfs/vmimages/c/brick
 server{1,2,3}:/data/glusterfs/vmimages/d/brick

Aside: what is the reason for creating four separate logical volumes/bricks
on the same node, and then combining them together using gluster
distribution?  Also, why are you combining all your disks into a single
volume group (clustervg), but then allocating each logical volume from only
a single disk within that VG?

Snapshots perhaps?

Regards,

Brian.


Re: [Gluster-users] how well will this work

2013-01-02 Thread Jeff Darcy
On 1/2/13 6:01 AM, Brian Candler wrote:
 On Fri, Dec 28, 2012 at 10:14:19AM -0800, Joe Julian wrote:
 My volume would then look like
 gluster volume create replica 3
 server{1,2,3}:/data/glusterfs/vmimages/a/brick
 server{1,2,3}:/data/glusterfs/vmimages/b/brick
 server{1,2,3}:/data/glusterfs/vmimages/c/brick
 server{1,2,3}:/data/glusterfs/vmimages/d/brick
 
 Aside: what is the reason for creating four separate logical volumes/bricks
 on the same node, and then combining them together using gluster
 distribution?

I'm not Joe, but I can think of two reasons why this might be a good idea.  One
is superior fault isolation.  With a single concatenated or striped LV (i.e. no
redundancy as with true RAID), a failure of any individual disk will appear as
a failure of the entire brick, forcing *all* traffic to the peers.  With
multiple LVs, that same failure will cause only 1/4 of the traffic to fail
over.  The other reason is performance.  I've found that it's very hard to
predict whether letting LVM schedule across disks or letting GlusterFS do so
will perform better for any given workload, but IMX the latter tends to win
slightly more often than not.

 Also, why are you combining all your disks into a single
 volume group (clustervg), but then allocating each logical volume from only
 a single disk within that VG?

That part's a bit unclear to me as well.  There doesn't seem to be any
immediate benefit, but perhaps it's more an issue of preparing for possible
future change by adding an extra level of naming/indirection.  That way, if the
LVs need to be reconfigured some day, the change will be pretty transparent to
anything that was addressing them by ID anyway.


Re: [Gluster-users] how well will this work

2013-01-02 Thread Joe Julian

On 01/02/2013 03:37 AM, Jeff Darcy wrote:

On 1/2/13 6:01 AM, Brian Candler wrote:

On Fri, Dec 28, 2012 at 10:14:19AM -0800, Joe Julian wrote:

My volume would then look like
gluster volume create replica 3
server{1,2,3}:/data/glusterfs/vmimages/a/brick
server{1,2,3}:/data/glusterfs/vmimages/b/brick
server{1,2,3}:/data/glusterfs/vmimages/c/brick
server{1,2,3}:/data/glusterfs/vmimages/d/brick

Aside: what is the reason for creating four separate logical volumes/bricks
on the same node, and then combining them together using gluster
distribution?

I'm not Joe, but I can think of two reasons why this might be a good idea.  One
is superior fault isolation.  With a single concatenated or striped LV (i.e. no
redundancy as with true RAID), a failure of any individual disk will appear as
a failure of the entire brick, forcing *all* traffic to the peers.  With
multiple LVs, that same failure will cause only 1/4 of the traffic to fail
over.  The other reason is performance.  I've found that it's very hard to
predict whether letting LVM schedule across disks or letting GlusterFS do so
will perform better for any given workload, but IMX the latter tends to win
slightly more often than not.

Fault isolation is, indeed, why I do that. I don't need any faster reads 
than my network will handle, so raid isn't going to help me there. When 
a drive fails, gluster's (mostly) been good about handling that failure 
transparently to my services.



Also, why are you combining all your disks into a single
volume group (clustervg), but then allocating each logical volume from only
a single disk within that VG?

That part's a bit unclear to me as well.  There doesn't seem to be any
immediate benefit, but perhaps it's more an issue of preparing for possible
future change by adding an extra level of naming/indirection.  That way, if the
LVs need to be reconfigured some day, the change will be pretty transparent to
anything that was addressing them by ID anyway.

Aha! Because when a drive's in pre-failure I can pvmove the lv's onto 
the new drive, or onto the other drives temporarily.
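
For reference, a hedged sketch of what that migration looks like (the
replacement device name is made up; clustervg and /dev/sdb1 are from the
setup described above):

pvcreate /dev/sde1                # prepare the replacement drive
vgextend clustervg /dev/sde1      # add it to the volume group
pvmove /dev/sdb1 /dev/sde1        # move all LV extents off the failing drive, online
vgreduce clustervg /dev/sdb1      # finally drop the old PV from the VG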



Re: [Gluster-users] how well will this work

2012-12-30 Thread Jeff Darcy
On 12/27/12 6:47 PM, Miles Fidelman wrote:
 John Mark Walker wrote:
 In general, I don't recommend any distributed filesystems for VM
 images, but I can also see that this is the wave of the future.
 Ok.  I can see that.
 
 Let's say that I take a slightly looser approach to high-availability:
 - keep the static parts of my installs on local disk
 - share and replicate dynamic data using gluster

That, in a nutshell, is the approach that I (and others) often advocate.  Block
storage should be used sparingly, e.g. for booting and for data served to
others at a higher level.  I'd say that's true in general, but it's especially
true for any kind of network block storage.  When network latencies are
involved, going up the stack where operations are expressed at a high
semantic level will almost always work out better than blocks and locks.

 In this scenario, how well does gluster work when:
 - storage and processing are inter-mixed on the same nodes

That works fine and is a common deployment model for the community code, though
RHS demands a separate server and client model.  The main thing to watch out
for is CPU/memory contention between application and Gluster processes.  Those
can be addressed in all the standard ways, from cgroups to containers to
virtualization.
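
As a minimal sketch of the cgroups option, assuming the libcgroup tools
and made-up limits (this is not a recommendation from the thread):

cgcreate -g memory,cpu:glusterfs          # a cgroup for the brick daemons
cgset -r memory.limit_in_bytes=4G glusterfs
cgset -r cpu.shares=512 glusterfs
for pid in $(pgrep glusterfsd); do        # move the running bricks into it
    cgclassify -g memory,cpu:glusterfs "$pid"
done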

 - data is triply replicated (allow for 2-node failures)

Unfortunately, three-way replication is still a bit of a work in progress.
Some (such as Joe Julian) use it successfully, but they also use it very
carefully.  I've had to make a few fixes in this area myself recently, and I
expect to make a few more before I'd say that it's really up to snuff for
general use.



Re: [Gluster-users] how well will this work

2012-12-30 Thread Miles Fidelman

Jeff Darcy wrote:

On 12/27/12 6:47 PM, Miles Fidelman wrote:

John Mark Walker wrote:

In general, I don't recommend any distributed filesystems for VM
images, but I can also see that this is the wave of the future.

Ok.  I can see that.

Let's say that I take a slightly looser approach to high-availability:
- keep the static parts of my installs on local disk
- share and replicate dynamic data using gluster

That, in a nutshell, is the approach that I (and others) often advocate.  Block
storage should be used sparingly, e.g. for booting and for data served to
others at a higher level.  I'd say that's true in general, but it's especially
true for any kind of network block storage.  When network latencies are
involved, going up the stack where operations are expressed at a high
semantic level will almost always work out better than blocks and locks.


What's the alternative, though?  Ok, for application files (say a word 
processing document) that works, but what about spools, databases, and 
such?  Seems like blocks are the common denominator.

- data is triply replicated (allow for 2-node failures)

Unfortunately, three-way replication is still a bit of a work in progress.
Some (such as Joe Julian) use it successfully, but they also use it very
carefully.  I've had to make a few fixes in this area myself recently, and I
expect to make a few more before I'd say that it's really up to snuff for
general use.


That's a bit disappointing.   For high-availability applications (like 
mine), 3-way replication would seem to be the major advantage of a 
cluster file system over DRBD.


Thanks,

Miles Fidelman





--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra



Re: [Gluster-users] how well will this work

2012-12-30 Thread Jeff Darcy
On 12/30/12 1:33 PM, Miles Fidelman wrote:
 What's the alternative, though?  Ok, for application files (say a word
 processing document) that works, but what about spools, databases, and
 such?  Seems like blocks are the common denominator.

It's all blocks underneath; it's a matter of how you get to those blocks.  If
you use a simulated block device which is actually a GlusterFS file, then
you'll be going through both FUSE and the loopback driver.  That actually works
OK for many things, but latency will be a bit high e.g. for databases.  One
option is to use the qemu interface, which avoids both sources of overhead.  In
fact, the overhead from virtualizing your database server is likely to be lower
than FUSE+loopback because our esteemed kernel colleagues seem a lot more
interested in making virtual I/O work better than in doing the same for FUSE.
It's still a tiny hit compared to running a DB on bare metal, but the value of
being able to survive a failure should more than outweigh that.
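
For illustration, with a QEMU build that includes the GlusterFS block
driver (libgfapi), attaching an image through that interface looks
roughly like this; the host, volume, and image names are made up:

# create the image directly on the gluster volume, then boot against it,
# bypassing both FUSE and the loopback driver
qemu-img create -f qcow2 gluster://server1/vmimages/db01.qcow2 20G
qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=gluster://server1/vmimages/db01.qcow2,if=virtio,cache=none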

 For high-availability applications (like
 mine), 3-way replication would seem to be the major advantage of a
 cluster file system over DRBD.

It all depends on how many failures, of which type, you need to handle, and
what price you're willing to pay in terms of storage utilization.  It's easy to
get protection against two disk failures or one server/network failure using
replication plus RAID on the servers.  If you want protection against two
server failures, then there's geosync.  You could also try using local sync
(AFR) and it would probably work for you (as it does for Joe), but there's the
caveat that we're still working on some of the more unusual edge cases.  Only
you and your tests can say whether that's good enough.


Re: [Gluster-users] how well will this work

2012-12-30 Thread Miles Fidelman

Jeff,

Thanks for the details.  If I might trouble you for a few more...

Jeff Darcy wrote:

On 12/30/12 1:33 PM, Miles Fidelman wrote:

What's the alternative, though?  Ok, for application files (say a word
processing document) that works, but what about spools, databases, and
such?  Seems like blocks are the common denominator.

It's all blocks underneath; it's a matter of how you get to those blocks.  If
you use a simulated block device which is actually a GlusterFS file, then
you'll be going through both FUSE and the loopback driver.  That actually works
OK for many things, but latency will be a bit high e.g. for databases.  One
option is to use the qemu interface, which avoids both sources of overhead.  In
fact, the overhead from virtualizing your database server is likely to be lower
than FUSE+loopback because our esteemed kernel colleagues seem a lot more
interested in making virtual I/O work better than in doing the same for FUSE.
It's still a tiny hit compared to running a DB on bare metal, but the value of
being able to survive a failure should more than outweigh that.



I'm running Xen virtualization, and I understand how all the pieces fit 
together for running paravirtualized hosts over a local disk, software 
raid, LVM, and DRBD - but none of those involve qemu.  I wonder if you 
could say a little bit about how all the pieces wire together, if I 
wanted to mount a Gluster filesystem from a paravirtualized VM, through 
the qemu interface?


Thanks again,

Miles Fidelman




--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra



Re: [Gluster-users] how well will this work

2012-12-30 Thread Joe Julian


On 12/30/2012 12:31 PM, William Muriithi wrote:

For mysql, I set up my innodb store to use 4 files (I don't do 1 file
per table), each file distributes to each of the 4 replica subvolumes.
This balances the load pretty nicely.

It's not so much a question of how glusterfs works as of how
innodb works. By configuring innodb_data_file_path to start with a
number of files that matches a multiple of your bricks (and carefully
choosing filenames to ensure they're distributed evenly), records seem
to be accessed evenly over the distribute set (I have only tested this
through actual use and have no idea if this is how it's supposed to work).


Hmm, have you checked on the gluster servers that these four files are
in separate bricks?  As far as I understand, if you have not done
anything with the GlusterFS scheduler (default ALU on version 3.3), it is
likely that is not what's happening. Or you are using a version that
has a different scheduler.  Interesting though.  Poke around and
update us, please.
Not just checked, but engineered. At the time, I created each file, 
checked which dht subvolume it was on using getfattr -n 
trusted.glusterfs.pathinfo $file, and then incremented the 
filename until it was created on the subvolume I wanted.
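
For anyone wanting to repeat that check, the query is a single xattr
lookup on the client mount (the path here is illustrative):

# reports the backend brick path(s) that actually hold the file
getfattr -n trusted.glusterfs.pathinfo /mnt/vmimages/mysql/ibdata1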


As an aside, I'm referring to DHT (distribute) /subvolumes/ rather than 
bricks because AFR (replicate) is under DHT, meaning that replicate 
actually is the translator whose subvolumes map 1:1 to bricks in my setup.

Re: [Gluster-users] how well will this work

2012-12-28 Thread Miles Fidelman

Joe,

Thanks for the details, but I'm having just a bit of a problem parsing 
and picturing the description.  Any possibility of walking this through 
from the bottom up - hardware (server + n drives) - drive partitioning 
- lvm setup - gluster config - VM setup?


Thanks again,

Miles

Joe Julian wrote:
I have 3 servers with replica 3 volumes, 4 bricks per server on lvm 
partitions that are placed on each of 4 hard drives, 15 volumes 
resulting in 60 bricks per server. One of my servers is also a kvm 
host running (only) 24 vms.


Each vm image is only 6 gig, enough for the operating system and 
applications and is hosted on one volume. The data for each 
application is hosted on its own GlusterFS volume.


For mysql, I set up my innodb store to use 4 files (I don't do 1 file 
per table), each file distributes to each of the 4 replica subvolumes. 
This balances the load pretty nicely.


I don't really do anything special for anything else, other than the 
php app recommendations I make on my blog (http://joejulian.name) 
which all have nothing to do with the actual filesystem.


The thing that I think some people (even John Mark) misapply is that 
this is just a tool. You have to engineer a solution using the tools 
you have available. If you feel the positives that GlusterFS provides 
outweigh the negatives, then you will simply have to engineer a 
solution that suits your end goal using this tool. It's not a question 
of whether it works, it's whether you can make it work for your use case.


On 12/27/2012 03:00 PM, Miles Fidelman wrote:
Ok... now that's diametrically the opposite response from Dan Cyr's 
of a few minutes ago.


Can you say just a bit more about your configuration - how many 
nodes, do you have storage and processing combined or separated, how 
do you have your drives partitioned, and so forth?


Thanks,

Miles


Joe Julian wrote:
Trying to return to the actual question, the way I handle those is 
to mount gluster volumes that host the data for those tasks from 
within the vm. I've done that successfully since 2.0 with all of 
those services.


The limitations that others are expressing have as much to do with 
limitations placed on their own designs as with their hardware. 
Sure, there are other less stable and/or scalable systems that are 
faster, but with proper engineering you should be able to build a 
system that meets those design requirements.


The one piece that wasn't there before, but is now in 3.3, is a fix for 
the locking and performance problems during disk rebuilds: self-heal is 
now done at a much more granular level, and I have successfully 
self-healed several vm images simultaneously, doing it on all 
of them, without any measurable delays.


Miles Fidelman mfidel...@meetinghouse.net wrote:

Joe Julian wrote:

It would probably be better to ask this with end-goal
questions instead of with an unspecified critical feature
list and performance problems.


Ok... I'm running a 2-node cluster that's essentially a mini 
cloud stack
- with storage and processing combined on the same boxes. I'm 
running a
production VM that hosts a mail server, list server, web server, 
and

database; another production VM providing a backup server for the
cluster and for a bunch of desktop machines; and several VMs 
used for a

variety of development and testing purposes. It's all backed by a
storage stack consisting of linux raid10 - lvm - drbd, and uses
pacemaker for high-availability failover of the
production VMs.  It all
performs reasonably well under moderate load (mail flows, web 
servers

respond, database transactions complete, without notable user-level
delays; queues don't back up; cpu and io loads stay within 
reasonable

bounds).

The goals are to:
- add storage and processing capacity by adding two more nodes - 
each

consisting of several CPU cores and 4 disks each
- maintain the flexibility to create/delete/migrate/failover 
virtual

machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations 
by moving

to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, 
except in

a Xen environment

Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's
preferable
to separate storage nodes from processing nodes

It looks like Gluster 3.2 and 3.3 have addressed some of these 
issues,
and I'm trying to get a general read on whether it's worth 
putting in
the effort of moving forward with some experimentation, or 
whether this
is a non-starter.  Is there anyone out there who's tried to run 
this
kind of mini-cloud with gluster?  What kind of results have you 
had?




On 12/26/2012 08:24 PM, Miles 

Re: [Gluster-users] how well will this work

2012-12-28 Thread William Muriithi
Joe,
 I have 3 servers with replica 3 volumes, 4 bricks per server on lvm
 partitions that are placed on each of 4 hard drives, 15 volumes
 resulting in 60 bricks per server. One of my servers is also a kvm host
 running (only) 24 vms.


Mind explaining your setup again?  I kind of could not follow,
probably because of terminology issues.  For example:

 4 bricks per server  - I don't understand this part; I assumed a brick
== 1 physical server (okay, it could also be one VM, but I don't see how
that would help unless it's a test environment).  The way you put it,
though, means I have issues with my terminology.

Isn't there a 1:1 relationship between brick and server?

 Each vm image is only 6 gig, enough for the operating system and
 applications and is hosted on one volume. The data for each application
 is hosted on its own GlusterFS volume.
Hmm, pretty good idea, especially security-wise.  It means one VM cannot
mess with another VM's files.  Is it possible to extend a gluster volume
without destroying and recreating it with a bigger peer storage setting?

 For mysql, I set up my innodb store to use 4 files (I don't do 1 file
 per table), each file distributes to each of the 4 replica subvolumes.
 This balances the load pretty nicely.

I thought lots of small files would be better than 4 huge files?  I
mean, why does this work out better performance-wise?  Not saying it's
wrong, I am just trying to learn from you as I am looking for a
similar setup. However, I could not think of why using 4 files would be
better, but that may be because I don't understand how glusterfs works.

 I don't really do anything special for anything else, other than the php
 app recommendations I make on my blog (http://joejulian.name) which all
 have nothing to do with the actual filesystem.

Thanks for the link
 The thing that I think some people (even John Mark) misapply is that
 this is just a tool. You have to engineer a solution using the tools you
 have available. If you feel the positives that GlusterFS provides
 outweigh the negatives, then you will simply have to engineer a solution
 that suits your end goal using this tool. It's not a question of whether
 it works, it's whether you can make it work for your use case.

 On 12/27/2012 03:00 PM, Miles Fidelman wrote:
 Ok... now that's diametrically the opposite response from Dan Cyr's of
 a few minutes ago.
William


Re: [Gluster-users] how well will this work

2012-12-28 Thread Joe Julian

On 12/28/2012 08:54 AM, William Muriithi wrote:

Joe,

I have 3 servers with replica 3 volumes, 4 bricks per server on lvm
partitions that are placed on each of 4 hard drives, 15 volumes
resulting in 60 bricks per server. One of my servers is also a kvm host
running (only) 24 vms.


Mind explaining your setup again?  I kind of could not follow,
probably because of terminology issues.  For example:

  4 bricks per server  - I don't understand this part; I assumed a brick
== 1 physical server (okay, it could also be one VM, but I don't see how
that would help unless it's a test environment).  The way you put it,
though, means I have issues with my terminology.

Isn't there a 1:1 relationship between brick and server?
In my configuration, 1 server has 4 drives (well, 5, but one's the OS). 
Each drive has one gpt partition. I create an lvm volume group that 
holds all four huge partitions. For any one GlusterFS volume I create 4 
lvm logical volumes:


lvcreate -n a_vmimages clustervg /dev/sda1
lvcreate -n b_vmimages clustervg /dev/sdb1
lvcreate -n c_vmimages clustervg /dev/sdc1
lvcreate -n d_vmimages clustervg /dev/sdd1

then format them xfs and (I) mount them under 
/data/glusterfs/vmimages/{a,b,c,d}. These four lvm partitions are bricks 
for the new GlusterFS volume.
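
Spelled out, the format-and-mount step might look roughly like this
(note that the lvcreate commands above still need a size via -L or -l;
everything here is a sketch, not copied from the original setup):

for d in a b c d; do
    mkfs.xfs -i size=512 /dev/clustervg/${d}_vmimages
    mkdir -p /data/glusterfs/vmimages/${d}
    mount /dev/clustervg/${d}_vmimages /data/glusterfs/vmimages/${d}
done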


As glusterbot would say if asked for the glossary:
A server hosts bricks (ie. server1:/foo) which belong to a 
volume  which is accessed from a client.


My volume would then look like
gluster volume create replica 3 
server{1,2,3}:/data/glusterfs/vmimages/a/brick 
server{1,2,3}:/data/glusterfs/vmimages/b/brick 
server{1,2,3}:/data/glusterfs/vmimages/c/brick 
server{1,2,3}:/data/glusterfs/vmimages/d/brick

Each vm image is only 6 gig, enough for the operating system and
applications and is hosted on one volume. The data for each application
is hosted on its own GlusterFS volume.

Hmm, pretty good idea, especially security-wise.  It means one VM cannot
mess with another VM's files.  Is it possible to extend a gluster volume
without destroying and recreating it with a bigger peer storage setting?
I can do that two ways. I can add servers with storage and then 
add-brick to expand, or I can resize the lvm partitions and grow xfs 
(which I have done live several times).
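
Roughly, and assuming the volume in the create command above is named
vmimages, the two expansion paths look like this (server names and
sizes are made up):

# path 1: add a matching set of bricks from new servers, then rebalance
gluster volume add-brick vmimages \
    server{4,5,6}:/data/glusterfs/vmimages/a/brick
gluster volume rebalance vmimages start

# path 2: grow an existing brick in place, online
lvextend -L +1T /dev/clustervg/a_vmimages
xfs_growfs /data/glusterfs/vmimages/a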



For mysql, I set up my innodb store to use 4 files (I don't do 1 file
per table), each file distributes to each of the 4 replica subvolumes.
This balances the load pretty nicely.

I thought lots of small files would be better than 4 huge files?  I
mean, why does this work out better performance-wise?  Not saying it's
wrong, I am just trying to learn from you as I am looking for a
similar setup. However, I could not think of why using 4 files would be
better, but that may be because I don't understand how glusterfs works.
It's not so much a question of how glusterfs works as of how 
innodb works. By configuring innodb_data_file_path to start with a 
number of files that matches a multiple of your bricks (and carefully 
choosing filenames to ensure they're distributed evenly), records seem 
to be accessed evenly over the distribute set (I have only tested this 
through actual use and have no idea if this is how it's supposed to work).


With a one file per table model, all records read from any specific 
table will be read from only one distribute subvolume. At least with my 
data set, that would hit one distribute subvolume really heavily while 
leaving the rest fairly idle.
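
As an illustration of that idea, a my.cnf fragment for four fixed-size
InnoDB data files (the file names and sizes are hypothetical) could look
like:

[mysqld]
# four separate ibdata files, so that with four distribute subvolumes
# each file can land on a different subvolume and spread the I/O
innodb_data_file_path = ibdata1:4G;ibdata2:4G;ibdata3:4G;ibdata4:4G:autoextend
innodb_file_per_table = 0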

I don't really do anything special for anything else, other than the php
app recommendations I make on my blog (http://joejulian.name) which all
have nothing to do with the actual filesystem.


Thanks for the link

The thing that I think some people (even John Mark) misapply is that
this is just a tool. You have to engineer a solution using the tools you
have available. If you feel the positives that GlusterFS provides
outweigh the negatives, then you will simply have to engineer a solution
that suits your end goal using this tool. It's not a question of whether
it works, it's whether you can make it work for your use case.

On 12/27/2012 03:00 PM, Miles Fidelman wrote:

Ok... now that's diametrically the opposite response from Dan Cyr's of
a few minutes ago.






Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman

Joe Julian wrote:
It would probably be better to ask this with end-goal questions 
instead of with an unspecified critical feature list and performance 
problems.


Ok... I'm running a 2-node cluster that's essentially a mini cloud stack 
- with storage and processing combined on the same boxes.  I'm running a 
production VM that hosts a mail server, list server, web server, and 
database; another production VM providing a backup server for the 
cluster and for a bunch of desktop machines; and several VMs used for a 
variety of development and testing purposes. It's all backed by a 
storage stack consisting of linux raid10 - lvm - drbd, and uses 
pacemaker for high-availability failover of the production VMs.  It all 
performs reasonably well under moderate load (mail flows, web servers 
respond, database transactions complete, without notable user-level 
delays; queues don't back up; cpu and io loads stay within reasonable 
bounds).


The goals are to:
- add storage and processing capacity by adding two more nodes - each 
consisting of several CPU cores and 4 disks each
- maintain the flexibility to create/delete/migrate/failover virtual 
machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations by moving 
to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, except in 
a Xen environment


Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's preferable 
to separate storage nodes from processing nodes


It looks like Gluster 3.2 and 3.3 have addressed some of these issues, 
and I'm trying to get a general read on whether it's worth putting in 
the effort of moving forward with some experimentation, or whether this 
is a non-starter.  Is there anyone out there who's tried to run this 
kind of mini-cloud with gluster?  What kind of results have you had?





On 12/26/2012 08:24 PM, Miles Fidelman wrote:

Hi Folks,

I find myself trying to expand a 2-node high-availability cluster 
to a 4-node cluster.  I'm running Xen virtualization, and 
currently using DRBD to mirror data, and pacemaker to failover cleanly.


The thing is, I'm trying to add 2 nodes to the cluster, and DRBD 
doesn't scale.  Also, as a function of rackspace limits, and the 
hardware at hand, I can't separate storage nodes from compute nodes - 
instead, I have to live with 4 nodes, each with 4 large drives (but 
also w/ 4 gigE ports per server).


The obvious thought is to use Gluster to assemble all the drives into 
one large storage pool, with replication.  But.. last time I looked 
at this (6 months or so back), it looked like some of the critical 
features were brand new, and performance seemed to be a problem in 
the configuration I'm thinking of.


Which leads me to my question:  Has the situation improved to the 
point that I can use Gluster this way?


Thanks very much,

Miles Fidelman







--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra



Re: [Gluster-users] how well will this work

2012-12-27 Thread Gerald Brandt

On 12-12-26 10:24 PM, Miles Fidelman wrote:

Hi Folks,

I find myself trying to expand a 2-node high-availability cluster 
to a 4-node cluster.  I'm running Xen virtualization, and currently 
using DRBD to mirror data, and pacemaker to failover cleanly.


The thing is, I'm trying to add 2 nodes to the cluster, and DRBD 
doesn't scale.  Also, as a function of rackspace limits, and the 
hardware at hand, I can't separate storage nodes from compute nodes - 
instead, I have to live with 4 nodes, each with 4 large drives (but 
also w/ 4 gigE ports per server).


The obvious thought is to use Gluster to assemble all the drives into 
one large storage pool, with replication.  But.. last time I looked at 
this (6 months or so back), it looked like some of the critical 
features were brand new, and performance seemed to be a problem in the 
configuration I'm thinking of.


Which leads me to my question:  Has the situation improved to the 
point that I can use Gluster this way?


Thanks very much,

Miles Fidelman



Hi,

I have a XenServer pool (3 servers) talking to a GlusterFS replicate 
server over NFS with uCARP for IP failover.
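
As a sketch of that kind of floating-IP setup, ucarp on each NFS server
would be started along these lines (addresses, VHID, and scripts are
placeholders, not Gerald's actual configuration):

ucarp --interface=eth0 --srcip=192.168.1.11 --vhid=10 --pass=secret \
      --addr=192.168.1.10 \
      --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh &
# the up/down scripts add or remove the virtual IP that the XenServer
# pool mounts NFS from, so clients follow whichever server is master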


The system was put in place in May 2012, using GlusterFS 3.3.  It ran 
very well, with speeds comparable to my existing iSCSI solution 
(http://majentis.com/2011/09/21/xenserver-iscsi-and-glusterfsnfs/).


I was quite pleased with the system, it worked flawlessly.  Until 
November.  At that point, the Gluster NFS server started stalling under 
load.  It would become unresponsive for a long enough period of time 
that the VM's under XenServer would lose their drives. Linux would 
remount the drives read-only and then eventually lock up, while Windows 
would just lock up.  In this case, Windows was more resilient to the 
transient disk loss.


I have been unable to solve the problem, and am now switching back to a 
DRBD/iSCSI setup.  I'm not happy about it, but we were losing NFS 
connectivity nightly, during backups.  Life was hell for a long time 
while I was trying to fix things.


Gerald

Re: [Gluster-users] how well will this work

2012-12-27 Thread Stephan von Krawczynski
On Wed, 26 Dec 2012 22:04:09 -0800
Joe Julian j...@julianfamily.org wrote:

 It would probably be better to ask this with end-goal questions instead 
 of with an unspecified critical feature list and performance problems.
 
 6 months ago, for myself and quite an extensive (and often impressive) 
 list of users there were no missing critical features nor were there any 
 problems with performance. That's not to say that they did not meet your 
 design specifications, but without those specs you're the only one who 
 could evaluate that.

Well, then the list of users does obviously not contain me ;-)
The damn thing will only become impressive if a native kernel client module is
done. FUSE is really a pain.
And read my lips: the NFS implementation has general load/performance problems.
Don't be surprised if it jumps into your face.
Why on earth do they think linux has NFS as kernel implementation?
-- 
Regards,
Stephan


Re: [Gluster-users] how well will this work

2012-12-27 Thread Brian Candler
On Wed, Dec 26, 2012 at 11:24:25PM -0500, Miles Fidelman wrote:
 I find myself trying to expand a 2-node high-availability cluster
 to a 4-node cluster.  I'm running Xen virtualization, and
 currently using DRBD to mirror data, and pacemaker to failover
 cleanly.

Not answering your question directly, but have you looked at Ganeti? This is
a front-end to Xen+LVM+DRBD (open source, written by Google) which makes it
easy to manage such a cluster, assuming DRBD is meeting your needs well at
the moment.

With Ganeti each VM image is its own logical volume, with its own DRBD
instance sitting on top, so you can have different VMs mirrored between
different pairs of machines.  You can migrate storage, albeit slowly (e.g. 
starting with A mirrored to B, you can break the mirroring then re-mirror A
to C, and then mirror C to D). Ganeti automates all this for you.

Another option to look at is Sheepdog, which is a clustered block-storage
device, but this would require you to switch from Xen to KVM.

 and performance seemed to be a
 problem in the configuration I'm thinking of.

With KVM at least, last time I tried performance was still very poor when
a VM image was being written to a file over gluster - I measured about
6MB/s.

However remember that each VM can directly mount glusterfs volumes
internally, and the performance of this is fine - and it also means you can
share data between the VMs.  So with some rearchitecture of your application
you may get sufficient performance for your needs.
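
For completeness, mounting a volume from inside a guest is just the
normal native-client mount (the names here are illustrative):

# FUSE client mount from within the VM; add a matching fstab entry with
# _netdev to make it persistent across reboots
mount -t glusterfs server1:/shared-data /mnt/shared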

Regards,

Brian.


Re: [Gluster-users] how well will this work

2012-12-27 Thread John Mark Walker
Look, fuse has its issues that we all know about. Either it works for you or it 
doesn't. If fuse bothers you that much, look into libgfapi. 

Re: NFS - I'm trying to help track this down. Please either add your comment to 
an existing bug or create a new ticket. 

Either way, ranting won't solve your problem or inspire anyone to fix it. 

-JM


Stephan von Krawczynski sk...@ithnet.com wrote:

On Wed, 26 Dec 2012 22:04:09 -0800
Joe Julian j...@julianfamily.org wrote:

 It would probably be better to ask this with end-goal questions instead 
 of with an unspecified critical feature list and performance problems.
 
 6 months ago, for myself and quite an extensive (and often impressive) 
 list of users there were no missing critical features nor were there any 
 problems with performance. That's not to say that they did not meet your 
 design specifications, but without those specs you're the only one who 
 could evaluate that.

Well, then the list of users does obviously not contain me ;-)
The damn thing will only become impressive if a native kernel client module is
done. FUSE is really a pain.
And read my lips: the NFS implementation has general load/performance problems.
Don't be surprised if it jumps into your face.
Why on earth do they think linux has NFS as kernel implementation?
-- 
Regards,
Stephan


Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman

Gerald Brandt wrote:

On 12-12-26 10:24 PM, Miles Fidelman wrote:


The thing is, I'm trying to add 2 nodes to the cluster, and DRBD 
doesn't scale.  Also, as a function of rackspace limits, and the 
hardware at hand, I can't separate storage nodes from compute nodes - 
instead, I have to live with 4 nodes, each with 4 large drives (but 
also w/ 4 gigE ports per server).




I have a XenServer pool (3 servers) talking to an GlusterFS replicate 
server over NFS with uCARP for IP failover.


If I read this properly, you have 3 compute servers, and a separate box 
with all your storage - which is quite different from my setup (4 nodes, 
all will be both compute and storage).  Or am I reading this wrong?


Thanks though.

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra



Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman

Brian Candler wrote:

On Wed, Dec 26, 2012 at 11:24:25PM -0500, Miles Fidelman wrote:

I find myself trying to expand a 2-node high-availability cluster
to a 4-node cluster.  I'm running Xen virtualization, and
currently using DRBD to mirror data, and pacemaker to failover
cleanly.

Not answering your question directly, but have you looked at Ganeti? This is
a front-end to Xen+LVM+DRBD (open source, written by Google) which makes it
easy to manage such a cluster, assuming DRBD is meeting your needs well at
the moment.


I keep looking at Ganeti, played with it a bit in a test installation.  
It does a lot, but it falls short in two regards:


- it doesn't really have an auto-failover function (it keeps getting 
closer, but no cigar, at least last time I looked) - you either need to 
intervene manually on a node failure, or you need to add something like 
pacemaker, and the plumbing starts to get very confused


- the second, you've identified

With Ganeti each VM image is its own logical volume, with its own DRBD
instance sitting on top, so you can have different VMs mirrored between
different pairs of machines.  You can migrate storage, albeit slowly (e.g.
starting with A mirrored to B, you can break the mirroring then re-mirror A
to C, and then mirror C to D). Ganeti automates all this for you.


This is precisely what I'm hoping to get past with a cluster file-system.

Another option to look at is Sheepdog, which is a clustered block-storage
device, but this would require you to switch from Xen to KVM.


You nailed it.  Sheepdog is architected for nodes that combine storage 
and processing.  Sheepdog on Xen would be ideal.  Sigh



With KVM at least, last time I tried performance was still very poor when
a VM image was being written to a file over gluster - I measured about
6MB/s.

However remember that each VM can directly mount glusterfs volumes
internally, and the performance of this is fine - and it also means you can
share data between the VMs.  So with some rearchitecture of your application
you may get sufficient performance for your needs.



Thanks!

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra



Re: [Gluster-users] how well will this work

2012-12-27 Thread Stephan von Krawczynski
Dear JM,

unfortunately one has to tell openly that the whole concept that is tried here
is simply wrong. The problem is not the next-bug-to-fix. The problem is the
client strategy in user space. It is broken by design. You can either believe
this or go ahead ignoring it and never really get a good and stable setup.
Really, the whole we-close-our-eyes-and-hope-it-will-turn-out-well strategy
looks just like btrfs. Read the archives, I told them years ago it will not
work out in our life time. And today, still they have no ready-for-production
fs, and believe me: it never will be there.
And the same goes for glusterfs. It _could_ be the greatest fs on earth, but
only if you accept:

1) Throw away all non-linux code. Because this war is over since long.
2) Make a kernel based client/server implementation. Because it is the only
way to acceptable performance.
3) Implement true undelete feature. Make delete a move to a deleted-files area.

These are the minimal steps to take for a real success, everything else is
just beating the dead horse. 

Regards,
Stephan



On Thu, 27 Dec 2012 10:03:10 -0500 (EST)
John Mark Walker johnm...@redhat.com wrote:

 Look, fuse has its issues that we all know about. Either it works for you or it 
 doesn't. If fuse bothers you that much, look into libgfapi. 
 
 Re: NFS - I'm trying to help track this down. Please either add your comment 
 to an existing bug or create a new ticket. 
 
 Either way, ranting won't solve your problem or inspire anyone to fix it. 
 
 -JM
 
 
 Stephan von Krawczynski sk...@ithnet.com wrote:
 
 On Wed, 26 Dec 2012 22:04:09 -0800
 Joe Julian j...@julianfamily.org wrote:
 
  It would probably be better to ask this with end-goal questions instead 
  of with an unspecified critical feature list and performance problems.
  
  6 months ago, for myself and quite an extensive (and often impressive) 
  list of users there were no missing critical features nor were there any 
  problems with performance. That's not to say that they did not meet your 
  design specifications, but without those specs you're the only one who 
  could evaluate that.
 
 Well, then the list of users does obviously not contain me ;-)
 The damn thing will only become impressive if a native kernel client module is
 done. FUSE is really a pain.
 And read my lips: the NFS implementation has general load/performance 
 problems.
 Don't be surprised if it jumps into your face.
 Why on earth do they think linux has NFS as kernel implementation?
 -- 
 Regards,
 Stephan
 




Re: [Gluster-users] how well will this work

2012-12-27 Thread John Mark Walker
Stephan,

I'm going to make this as simple as possible. Every message to this list
should follow these rules:

1. be helpful
2. be constructive
3. be respectful

I will not tolerate ranting that serves no purpose. If your message doesn't
follow any of the rules above, then you shouldn't be posting it.

This is your 2nd warning.

-JM

Re: [Gluster-users] how well will this work

2012-12-27 Thread Sean Fulton
I didn't think his message violated any of your rules. Seems to me he 
has some disagreements with the approach being used to develop Gluster. 
I think you should listen to people who disagree with you.


Having monitored this list for more than a year and 
tried--unsuccessfully--to put Gluster into production use, I think there 
are a lot of people who have problems with stability.


So please, can you respond to his comments with why his suggestions are 
invalid?


sean


On 12/27/2012 03:39 PM, John Mark Walker wrote:


Stephan,

I'm going to make this as simple as possible. Every message to this 
list should follow these rules:


1. be helpful
2. be constructive
3. be respectful

I will not tolerate ranting that serves no purpose. If your message 
doesn't follow any of the rules above, then you shouldn't be posting it.


This is your 2nd warning.

-JM





--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203


Re: [Gluster-users] how well will this work

2012-12-27 Thread Mario Kadastik
 I'm going to make this as simple as possible. Every message to this list 
 should follow these rules:
 
 1. be helpful
 2. be constructive
 3. be respectful
 
 I will not tolerate ranting that serves no purpose. If your message doesn't 
 follow any of the rules above, then you shouldn't be posting it.

Might be jumping in here at a random spot, but looking at Stephan's e-mail it 
was all three. It was helpful and constructive by outlining a concrete strategy 
that would make glusterfs greater in his opinion, and to an extent that's 
something I share: performance IS an issue and makes me hesitate in moving 
glusterfs to the next level at our site (right now we have a 6-node, 12-brick 
configuration that's used extensively as /home; the target would be a 180-node, 
2PB distributed, 2-way replicated installation). We hit FUSE snags from day 2 and 
are running on NFS right now because negative lookup caching is not in FUSE; in 
fact there is no caching. And NFS has hiccups that cause issues, especially for 
us, because we use vz containers with bind mounting, so if the head node's NFS goes 
stale we have to hack a lot to get the stale mount remounted in all the VZ 
images. I've had at least two or three instances where I had to stop all the 
containers, killing user tasks, to remount stably. 

And to be fair at least in this particular e-mail I didn't really see much 
disrespect, just some comparisons that I think still remained in respectful 
range. 

Mario Kadastik, PhD
Researcher

---
 Physics is like sex, sure it may have practical reasons, but that's not why 
we do it 
-- Richard P. Feynman



Re: [Gluster-users] how well will this work

2012-12-27 Thread Dan Cyr
I also don’t think this is a rant. I, as well, have been following this list 
for a few years, and have been waiting for GlusterFS to stabilize for VM 
deployment. I hope this discussion helps the devs understand areas that people 
are waiting for.

We have 2 SAN servers with Infiniband connections to a Blade Center. I would 
like all the KVM VMs hosted on the SAN with the ability to add more SAN servers 
in the future. – Currently Gluster allows this via NFS but I’ve read about 
performance issues. – So, right now, after 2 years of not deploying this gear 
(and running the VM images on each blade), I am looking for an expandable 
solution for the backend storage so I stop manually babying this network and 
install OpenNebula so I’m not the only person in our office who can manage our 
VM infrastructure.

This does fit into the OP’s question because I would love to see GlusterFS work 
like this.

Miles - As is right now GlusterFS is not what you want for backend VM storage.
  Question: “how well will this work”
  Answer: “horribly”

Dan


From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of John Mark Walker
Sent: Thursday, December 27, 2012 12:39 PM
To: Stephan von Krawczynski
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] how well will this work


Stephan,

I'm going to make this as simple as possible. Every message to this list should 
follow these rules:

1. be helpful
2. be constructive
3. be respectful

I will not tolerate ranting that serves no purpose. If your message doesn't 
follow any of the rules above, then you shouldn't be posting it.

This is your 2nd warning.

-JM


Re: [Gluster-users] how well will this work

2012-12-27 Thread Stephan von Krawczynski
On Thu, 27 Dec 2012 13:24:55 -0800
Dan Cyr d...@truenorthmanagement.com wrote:

 I also don’t think this is a rant. I, as well, have been following this list 
 for a few years, and have been waiting for GlusterFS to stabilize for VM 
 deployment. I hope this discussion helps the devs understand areas that 
 people are waiting for.
 
 We have 2 SAN servers with Infiniband connections to a Blade Center. I would 
 like all the KVM VMs hosted on the SAN with the ability to add more SAN 
 servers in the future. – Currently Gluster allows this via NFS but I’ve read 
 about performance issues. – So, right now, after 2 years of not deploying 
 this gear (and running the VM images on each blade), I am looking for an 
 expandable solution for the backend storage so I stop manually babying this 
 network and install OpenNebula so I’m not the only person in our office who 
 can manage our VM infrastructure.
 
 This does fit into the OP’s question because I would love to see GlusterFS 
 work like this.
 
 Miles - As is right now GlusterFS is not what you want for backend VM storage.
   Question: “how well will this work”
   Answer: “horribly”
 
 Dan
 
 
 From: gluster-users-boun...@gluster.org 
 [mailto:gluster-users-boun...@gluster.org] On Behalf Of John Mark Walker
 Sent: Thursday, December 27, 2012 12:39 PM
 To: Stephan von Krawczynski
 Cc: gluster-users@gluster.org
 Subject: Re: [Gluster-users] how well will this work
 
 
 Stephan,
 
 I'm going to make this as simple as possible. Every message to this list 
 should follow these rules:
 
 1. be helpful
 2. be constructive
 3. be respectful
 
 I will not tolerate ranting that serves no purpose. If your message doesn't 
 follow any of the rules above, then you shouldn't be posting it.
 
 This is your 2nd warning.
 
 -JM


Hola JM,

are you aware that your above message has neither arrived at my side through 
the list, nor through personal mail?
Does this mean I got deleted from the list by you?

-- 
Regards,
Stephan


Re: [Gluster-users] how well will this work

2012-12-27 Thread Joe Julian
Trying to return to the actual question, the way I handle those is to mount 
gluster volumes that host the data for those tasks from within the vm. I've 
done that successfully since 2.0 with all of those services. 

The limitations that others are expressing have as much to do with limitations 
placed on their own designs as with their hardware. Sure, there are other less 
stable and/or scalable systems that are faster, but with proper engineering you 
should be able to build a system that meets those design requirements. 

The one piece that wasn't there before, but is now in 3.3, is a fix for the locking and 
performance problems during disk rebuilds: self-heal is now done at a much more 
granular level, and I have successfully self-healed several vm images 
simultaneously, doing it on all of them, without any measurable delays. 

Miles Fidelman mfidel...@meetinghouse.net wrote:

Joe Julian wrote:
 It would probably be better to ask this with end-goal questions 
 instead of with an unspecified critical feature list and
performance 
 problems.

Ok... I'm running a 2-node cluster that's essentially a mini cloud
stack 
- with storage and processing combined on the same boxes.  I'm running
a 
production VM that hosts a mail server, list server, web server, and 
database; another production VM providing a backup server for the 
cluster and for a bunch of desktop machines; and several VMs used for a

variety of development and testing purposes. It's all backed by a 
storage stack consisting of linux raid10 - lvm - drbd, and uses 
pacemaker for high-availability failover of the production VMs.  It all

performs reasonably well under moderate load (mail flows, web servers 
respond, database transactions complete, without notable user-level 
delays; queues don't back up; cpu and io loads stay within reasonable 
bounds).

The goals are to:
- add storage and processing capacity by adding two more nodes - each 
consisting of several CPU cores and 4 disks each
- maintain the flexibility to create/delete/migrate/failover virtual 
machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations by
moving 
to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, except in

a Xen environment

Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's preferable

to separate storage nodes from processing nodes

It looks like Gluster 3.2 and 3.3 have addressed some of these issues, 
and I'm trying to get a general read on whether it's worth putting in 
the effort of moving forward with some experimentation, or whether this

is a non-starter.  Is there anyone out there who's tried to run this 
kind of mini-cloud with gluster?  What kind of results have you had?



 On 12/26/2012 08:24 PM, Miles Fidelman wrote:
 Hi Folks,

 I find myself trying to expand a 2-node high-availability cluster 
 to a 4-node cluster.  I'm running Xen virtualization, and 
 currently using DRBD to mirror data, and pacemaker to failover
cleanly.

 The thing is, I'm trying to add 2 nodes to the cluster, and DRBD 
 doesn't scale.  Also, as a function of rackspace limits, and the 
 hardware at hand, I can't separate storage nodes from compute nodes
- 
 instead, I have to live with 4 nodes, each with 4 large drives (but 
 also w/ 4 gigE ports per server).

 The obvious thought is to use Gluster to assemble all the drives
into 
 one large storage pool, with replication.  But.. last time I looked 
 at this (6 months or so back), it looked like some of the critical 
 features were brand new, and performance seemed to be a problem in 
 the configuration I'm thinking of.

 Which leads me to my question:  Has the situation improved to the 
 point that I can use Gluster this way?

 Thanks very much,

 Miles Fidelman





-- 
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra


Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman

Dan Cyr wrote:


Miles - As it is right now, GlusterFS is not what you want for backend VM 
storage.


Question: “how well will this work”

Answer: “horribly”




Ok... that's the kind of answer I was looking for (though a 
disappointing one).


Thanks,

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is. -- Yogi Berra

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman
Ok... now that's the diametric opposite of Dan Cyr's response from a 
few minutes ago.


Can you say just a bit more about your configuration - how many nodes, 
do you have storage and processing combined or separated, how do you 
have your drives partitioned, and so forth?


Thanks,

Miles


Joe Julian wrote:
Trying to return to the actual question, the way I handle those is to 
mount gluster volumes that host the data for those tasks from within 
the vm. I've done that successfully since 2.0 with all of those services.


The limitations that others are expressing have as much to do with 
limitations placed on their own designs as with their hardware. Sure, 
there are other less stable and/or scalable systems that are faster, 
but with proper engineering you should be able to build a system that 
meets those design requirements.


The one piece that wasn't there before but is now in 3.3 is a fix for 
the locking and performance problems during disk rebuilds: self-heal is 
now done at a much more granular level, and I have successfully 
self-healed several vm images simultaneously, while doing it on all of 
them, without any measurable delays.
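
For anyone who wants to watch that more granular self-heal from the 
command line, GlusterFS 3.3 exposes it through the volume heal commands. 
A minimal sketch, assuming a hypothetical volume name myvol (substitute 
your own volume name):

  # heal only the files flagged as needing it (index-based heal)
  gluster volume heal myvol
  # heal every file in the volume, not just the flagged ones
  gluster volume heal myvol full
  # show the files currently queued for healing
  gluster volume heal myvol info
  # show what has already been healed, or failed to heal
  gluster volume heal myvol info healed
  gluster volume heal myvol info heal-failed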


Miles Fidelman mfidel...@meetinghouse.net wrote:

Joe Julian wrote:

It would probably be better to ask this with end-goal
questions instead of with an unspecified critical feature
list and performance problems.


Ok... I'm running a 2-node cluster that's essentially a mini cloud stack
- with storage and processing combined on the same boxes.  I'm running a
production VM that hosts a mail server, list server, web server, and
database; another production VM providing a backup server for the
cluster and for a bunch of desktop machines; and several VMs used for a
variety of development and testing purposes. It's all backed by a
storage stack consisting of linux raid10 - lvm - drbd, and uses
pacemaker for high-availability failover of the
production VMs.  It all
performs reasonably well under moderate load (mail flows, web servers
respond, database transactions complete, without notable user-level
delays; queues don't back up; cpu and io loads stay within reasonable
bounds).

The goals are to:
- add storage and processing capacity by adding two more nodes - each
consisting of several CPU cores and 4 disks each
- maintain the flexibility to create/delete/migrate/failover virtual
machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations by moving
to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, except in
a Xen environment

Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's
preferable
to separate storage nodes from processing nodes

It looks like Gluster 3.2 and 3.3 have addressed some of these issues,
and I'm trying to get a general read on whether it's worth putting in
the effort of moving forward with some experimentation, or whether this
is a non-starter.  Is there anyone out there who's tried to run this
kind of mini-cloud with gluster?  What kind of results have you had?



On 12/26/2012 08:24 PM, Miles Fidelman wrote:

Hi Folks, I find myself trying to expand a 2-node
high-availability cluster to a 4-node cluster. I'm
running Xen virtualization, and currently using DRBD to
mirror data, and pacemaker to failover cleanly. The thing
is, I'm trying to add 2 nodes to the cluster, and DRBD
doesn't scale. Also, as a function of rackspace limits,
and the hardware at hand, I can't separate storage nodes
from compute nodes - instead, I have to live with 4 nodes,
each with 4 large drives (but also w/ 4 gigE ports per
server). The obvious thought is to use Gluster to assemble
all the drives into one large storage pool, with
replication. But.. last time I looked at this (6 months or
so back), it looked like some of the critical features
were brand new, and performance seemed to be a problem in
the configuration I'm thinking of. Which leads me to my
question: Has the situation improved to the point that I
can use Gluster this way? Thanks very much, Miles Fidelman


Gluster-users mailing list Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users





--
In theory, there is no difference between theory and practice.
In practice, there is. -- Yogi Berra

___
Gluster-users mailing list

Re: [Gluster-users] how well will this work

2012-12-27 Thread Miles Fidelman

John Mark Walker wrote:

In general, I don't recommend any distributed filesystems for VM images, but I 
can also see that this is the wave of the future.

Ok.  I can see that.

Let's say that I take a slightly looser approach to high-availability:
- keep the static parts of my installs on local disk
- share and replicate dynamic data using gluster
- failover by rebooting on a different node (no image to worry about 
migrating)


In this scenario, how well does gluster work when:
- storage and processing are inter-mixed on the same nodes
- data is triply replicated (allow for 2-node failures), as in the sketch below
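
One way to lay out a volume for that scenario - purely a sketch, with 
made-up hostnames and brick paths, not something anyone on this thread 
has prescribed - is to keep the brick count a multiple of the replica 
count and list the bricks so that each replica set lands on three 
different nodes; that way losing any two nodes still leaves one copy of 
every replica set:

  gluster volume create shared replica 3 \
      node1:/bricks/b1/data node2:/bricks/b1/data node3:/bricks/b1/data \
      node2:/bricks/b2/data node3:/bricks/b2/data node4:/bricks/b2/data \
      node3:/bricks/b3/data node4:/bricks/b3/data node1:/bricks/b3/data \
      node4:/bricks/b4/data node1:/bricks/b4/data node2:/bricks/b4/data

Bricks are grouped into replica sets in the order they are listed, so 
each line above becomes one three-way mirror, and the four mirrors are 
then distributed into a single pool. Each node ends up carrying 3 
bricks of this volume, leaving the fourth drive free for bricks of 
another volume.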

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. -- Yogi Berra

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] how well will this work

2012-12-27 Thread Joe Julian
I have 3 servers with replica 3 volumes: 4 bricks per server, on lvm 
partitions placed one on each of the 4 hard drives; with 15 volumes 
that comes to 60 bricks per server. One of my servers is also a kvm 
host running (only) 24 vms.


Each vm image is only 6 gig, enough for the operating system and 
applications, and is hosted on one volume. The data for each application 
is hosted on its own GlusterFS volume.
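
Seen from inside a guest, that pattern is just a normal GlusterFS 
client mount. A minimal sketch, assuming a hypothetical volume named 
appdata served by a host called gluster1 and mounted with the native 
FUSE client:

  # one-off mount
  mount -t glusterfs gluster1:/appdata /srv/appdata

  # or the equivalent /etc/fstab entry inside the vm
  gluster1:/appdata  /srv/appdata  glusterfs  defaults,_netdev  0  0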


For mysql, I set up my innodb store to use 4 files (I don't do 1 file 
per table), so the files get distributed one to each of the 4 replicated 
subvolumes. This balances the load pretty nicely.
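
A rough illustration of that, with hypothetical file names and sizes 
(the exact figures are not from this thread), is a my.cnf along these 
lines; the point is simply that several fixed-size ibdata files can 
hash to different distributed subvolumes instead of one huge file 
sitting on a single brick:

  [mysqld]
  # four InnoDB data files instead of one; only the last may autoextend
  innodb_data_file_path = ibdata1:10G;ibdata2:10G;ibdata3:10G;ibdata4:10G:autoextend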


I don't really do anything special for anything else, other than the php 
app recommendations I make on my blog (http://joejulian.name), none of 
which have anything to do with the actual filesystem.


The thing that I think some people (even John Mark) miss is that 
this is just a tool. You have to engineer a solution using the tools you 
have available. If you feel the positives that GlusterFS provides 
outweigh the negatives, then you will simply have to engineer a solution 
that suits your end goal using this tool. It's not a question of whether 
it works, it's whether you can make it work for your use case.


On 12/27/2012 03:00 PM, Miles Fidelman wrote:
Ok... now that's the diametric opposite of Dan Cyr's response from 
a few minutes ago.


Can you say just a bit more about your configuration - how many nodes, 
do you have storage and processing combined or separated, how do you 
have your drives partitioned, and so forth?


Thanks,

Miles


Joe Julian wrote:
Trying to return to the actual question, the way I handle those is to 
mount gluster volumes that host the data for those tasks from within 
the vm. I've done that successfully since 2.0 with all of those 
services.


The limitations that others are expressing have as much to do with 
limitations placed on their own designs as with their hardware. Sure, 
there are other less stable and/or scalable systems that are faster, 
but with proper engineering you should be able to build a system that 
meets those design requirements.


The one piece that wasn't there before but is now in 3.3 is a fix for 
the locking and performance problems during disk rebuilds: self-heal is 
now done at a much more granular level, and I have successfully 
self-healed several vm images simultaneously, while doing it on all of 
them, without any measurable delays.


Miles Fidelman mfidel...@meetinghouse.net wrote:

Joe Julian wrote:

It would probably be better to ask this with end-goal
questions instead of with an unspecified critical feature
list and performance problems.


Ok... I'm running a 2-node cluster that's essentially a mini 
cloud stack - with storage and processing combined on the same boxes. 
I'm running a production VM that hosts a mail server, list server, web 
server, and database; another production VM providing a backup server 
for the cluster and for a bunch of desktop machines; and several VMs 
used for a variety of development and testing purposes. It's all backed 
by a storage stack consisting of linux raid10 - lvm - drbd, and uses 
pacemaker for high-availability failover of the production VMs.  It all 
performs reasonably well under moderate load (mail flows, web servers 
respond, database transactions complete, without notable user-level 
delays; queues don't back up; cpu and io loads stay within reasonable 
bounds).

The goals are to:
- add storage and processing capacity by adding two more nodes - each 
consisting of several CPU cores and 4 disks each
- maintain the flexibility to create/delete/migrate/failover virtual 
machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations by 
moving to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, 
except in a Xen environment

Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's 
preferable to separate storage nodes from processing nodes

It looks like Gluster 3.2 and 3.3 have addressed some of these 
issues, and I'm trying to get a general read on whether it's worth 
putting in the effort of moving forward with some experimentation, or 
whether this is a non-starter.  Is there anyone out there who's tried 
to run this kind of mini-cloud with gluster?  What kind of results have 
you had?



On 12/26/2012 08:24 PM, Miles Fidelman wrote:

Hi Folks, I find myself trying to expand a 2-node
high-availability cluster to a 4-node cluster. I'm
running Xen virtualization, and currently using DRBD to
mirror data, and pacemaker to failover cleanly. The thing
is, I'm trying to 

Re: [Gluster-users] how well will this work

2012-12-26 Thread Joe Julian
It would probably be better to ask this with end-goal questions instead 
of with an unspecified critical feature list and performance problems.


6 months ago, for myself and quite an extensive (and often impressive) 
list of users there were no missing critical features nor were there any 
problems with performance. That's not to say that they did not meet your 
design specifications, but without those specs you're the only one who 
could evaluate that.


On 12/26/2012 08:24 PM, Miles Fidelman wrote:

Hi Folks,

I find myself trying to expand a 2-node high-availability cluster 
to a 4-node cluster.  I'm running Xen virtualization, and currently 
using DRBD to mirror data, and pacemaker to failover cleanly.


The thing is, I'm trying to add 2 nodes to the cluster, and DRBD 
doesn't scale.  Also, as a function of rackspace limits, and the 
hardware at hand, I can't separate storage nodes from compute nodes - 
instead, I have to live with 4 nodes, each with 4 large drives (but 
also w/ 4 gigE ports per server).


The obvious thought is to use Gluster to assemble all the drives into 
one large storage pool, with replication.  But.. last time I looked at 
this (6 months or so back), it looked like some of the critical 
features were brand new, and performance seemed to be a problem in the 
configuration I'm thinking of.


Which leads me to my question:  Has the situation improved to the 
point that I can use Gluster this way?


Thanks very much,

Miles Fidelman




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users