Re: [Gluster-users] how well will this work
On 1/2/2013 4:01 AM, Brian Candler wrote:
> Aside: what is the reason for creating four separate logical volumes/bricks on the same node, and then combining them together using gluster distribution? Also, why are you combining all your disks into a single volume group (clustervg), but then allocating each logical volume from only a single disk within that VG?

I've got a deployment I'm working on where each server has twelve 4TB drives. I've split those into two 6-drive RAID5 arrays, each of which is a 20TB PV/VG under LVM. Each of those I have then split into four 5TB logical volumes, each of which is used as a glusterfs brick.

I chose this particular brick size for compatibility with other physical drive sizes. Once we have the gluster volume up and running, we'll be able to free up some 2TB drives from other storage devices. Those drives will be used in future servers to be added to the gluster volume(s). With similar 12-bay hardware and two RAID5 arrays per server, this 5TB brick size will work with any drive size that is a multiple of 1TB.

Thanks,
Shawn
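[For illustration, a layout like Shawn's might be built roughly as follows. This is a sketch only: the md device and VG names are hypothetical, and the mkfs options follow common Gluster guidance of the time rather than anything stated in the thread.]

    # Two 6-drive RAID5 arrays (md devices hypothetical), each ~20TB usable
    pvcreate /dev/md0 /dev/md1
    vgcreate vg_r5a /dev/md0
    vgcreate vg_r5b /dev/md1

    # Four 5TB logical volumes per array, each becoming one brick
    for vg in vg_r5a vg_r5b; do
        for n in 1 2 3 4; do
            lvcreate -L 5T -n brick$n $vg
            # 512-byte inodes leave room for gluster's extended attributes
            mkfs.xfs -i size=512 /dev/$vg/brick$n
        done
    done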
Re: [Gluster-users] how well will this work
On Fri, Dec 28, 2012 at 10:14:19AM -0800, Joe Julian wrote:
> In my configuration, 1 server has 4 drives (well, 5, but one's the OS). Each drive has one gpt partition. I create an LVM volume group that holds all four huge partitions. For any one GlusterFS volume I create 4 LVM logical volumes:
>
>     lvcreate -L <size> -n a_vmimages clustervg /dev/sda1
>     lvcreate -L <size> -n b_vmimages clustervg /dev/sdb1
>     lvcreate -L <size> -n c_vmimages clustervg /dev/sdc1
>     lvcreate -L <size> -n d_vmimages clustervg /dev/sdd1
>
> then format them xfs and mount them under /data/glusterfs/vmimages/{a,b,c,d}. These four logical volumes are bricks for the new GlusterFS volume. As glusterbot would say if asked for the glossary: a server hosts bricks (i.e. server1:/foo) which belong to a volume which is accessed from a client. My volume would then look like:
>
>     gluster volume create vmimages replica 3 \
>         server{1,2,3}:/data/glusterfs/vmimages/a/brick \
>         server{1,2,3}:/data/glusterfs/vmimages/b/brick \
>         server{1,2,3}:/data/glusterfs/vmimages/c/brick \
>         server{1,2,3}:/data/glusterfs/vmimages/d/brick

Aside: what is the reason for creating four separate logical volumes/bricks on the same node, and then combining them together using gluster distribution?

Also, why are you combining all your disks into a single volume group (clustervg), but then allocating each logical volume from only a single disk within that VG? Snapshots, perhaps?

Regards,
Brian.
Re: [Gluster-users] how well will this work
On 1/2/13 6:01 AM, Brian Candler wrote:
> On Fri, Dec 28, 2012 at 10:14:19AM -0800, Joe Julian wrote:
>> My volume would then look like:
>>
>>     gluster volume create vmimages replica 3 \
>>         server{1,2,3}:/data/glusterfs/vmimages/a/brick \
>>         server{1,2,3}:/data/glusterfs/vmimages/b/brick \
>>         server{1,2,3}:/data/glusterfs/vmimages/c/brick \
>>         server{1,2,3}:/data/glusterfs/vmimages/d/brick
>
> Aside: what is the reason for creating four separate logical volumes/bricks on the same node, and then combining them together using gluster distribution?

I'm not Joe, but I can think of two reasons why this might be a good idea. One is superior fault isolation. With a single concatenated or striped LV (i.e. without the redundancy of true RAID), a failure of any individual disk will appear as a failure of the entire brick, forcing *all* traffic to the peers. With multiple LVs, that same failure will cause only 1/4 of the traffic to fail over.

The other reason is performance. I've found that it's very hard to predict whether letting LVM schedule I/O across disks or letting GlusterFS do so will perform better for any given workload, but in my experience the latter tends to win slightly more often than not.

> Also, why are you combining all your disks into a single volume group (clustervg), but then allocating each logical volume from only a single disk within that VG?

That part's a bit unclear to me as well. There doesn't seem to be any immediate benefit, but perhaps it's more an issue of preparing for possible future change by adding an extra level of naming/indirection. That way, if the LVs need to be reconfigured some day, the change will be pretty transparent to anything that was addressing them by ID anyway.
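[To make that contrast concrete, a sketch of the two allocation styles; names and sizes are hypothetical:]

    # Alternative: one LV striped across all four disks -- a single big
    # brick, but any one disk failure takes the whole brick offline
    lvcreate -i 4 -I 64 -l 100%FREE -n vmimages_striped clustervg

    # Joe's approach: pin each LV to one PV -- four independent bricks,
    # so one disk failure costs only that brick
    lvcreate -L <size> -n a_vmimages clustervg /dev/sda1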
Re: [Gluster-users] how well will this work
On 01/02/2013 03:37 AM, Jeff Darcy wrote:
> On 1/2/13 6:01 AM, Brian Candler wrote:
>> Aside: what is the reason for creating four separate logical volumes/bricks on the same node, and then combining them together using gluster distribution?
>
> I'm not Joe, but I can think of two reasons why this might be a good idea. One is superior fault isolation. With a single concatenated or striped LV (i.e. without the redundancy of true RAID), a failure of any individual disk will appear as a failure of the entire brick, forcing *all* traffic to the peers. With multiple LVs, that same failure will cause only 1/4 of the traffic to fail over. The other reason is performance. I've found that it's very hard to predict whether letting LVM schedule I/O across disks or letting GlusterFS do so will perform better for any given workload, but in my experience the latter tends to win slightly more often than not.

Fault isolation is, indeed, why I do that. I don't need any faster reads than my network will handle, so RAID isn't going to help me there. When a drive fails, gluster's (mostly) been good about handling that failure transparently to my services.

>> Also, why are you combining all your disks into a single volume group (clustervg), but then allocating each logical volume from only a single disk within that VG?
>
> That part's a bit unclear to me as well. There doesn't seem to be any immediate benefit, but perhaps it's more an issue of preparing for possible future change by adding an extra level of naming/indirection.

Aha! Because when a drive's in pre-failure I can pvmove the LVs onto the new drive, or onto the other drives temporarily.
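[A sketch of that evacuation, assuming a replacement disk has already been added; device names are hypothetical. pvmove works while the LVs stay mounted, so the bricks remain online throughout.]

    # Bring the replacement disk into the volume group
    pvcreate /dev/sde1
    vgextend clustervg /dev/sde1

    # Migrate every extent off the failing disk, live
    pvmove /dev/sdb1 /dev/sde1

    # Drop the failing disk from the VG once it is empty
    vgreduce clustervg /dev/sdb1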
Re: [Gluster-users] how well will this work
On 12/27/12 6:47 PM, Miles Fidelman wrote:
> John Mark Walker wrote:
>> In general, I don't recommend any distributed filesystems for VM images, but I can also see that this is the wave of the future.
>
> Ok. I can see that. Let's say that I take a slightly looser approach to high-availability:
> - keep the static parts of my installs on local disk
> - share and replicate dynamic data using gluster

That, in a nutshell, is the approach that I (and others) often advocate. Block storage should be used sparingly, e.g. for booting, with data served to others at a higher level. I'd say that's true in general, but it's especially true for any kind of network block storage. When network latencies are involved, going up the stack to where operations are expressed at a high semantic level will almost always work out better than blocks and locks.

> In this scenario, how well does gluster work when:
> - storage and processing are inter-mixed on the same nodes

That works fine, and is a common deployment model for the community code, though RHS requires a separate server and client model. The main thing to watch out for is CPU/memory contention between application and Gluster processes. That can be addressed in all the standard ways, from cgroups to containers to virtualization.

> - data is triply replicated (allow for 2-node failures)

Unfortunately, three-way replication is still a bit of a work in progress. Some (such as Joe Julian) use it successfully, but they also use it very carefully. I've had to make a few fixes in this area myself recently, and I expect to make a few more before I'd say that it's really up to snuff for general use.
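[As a sketch of the cgroups option mentioned above, using the cgroup v1 interface as commonly mounted under /sys/fs/cgroup on distributions of the era; the group name and limits are hypothetical:]

    # Create a cgroup for the gluster brick daemons
    mkdir -p /sys/fs/cgroup/cpu/gluster /sys/fs/cgroup/memory/gluster

    # Give gluster half the default CPU weight and cap it at 2GB of RAM
    echo 512 > /sys/fs/cgroup/cpu/gluster/cpu.shares
    echo 2G  > /sys/fs/cgroup/memory/gluster/memory.limit_in_bytes

    # Move the running brick processes into the group
    for pid in $(pgrep glusterfsd); do
        echo $pid > /sys/fs/cgroup/cpu/gluster/tasks
        echo $pid > /sys/fs/cgroup/memory/gluster/tasks
    done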
Re: [Gluster-users] how well will this work
Jeff Darcy wrote:
> On 12/27/12 6:47 PM, Miles Fidelman wrote:
>> John Mark Walker wrote:
>>> In general, I don't recommend any distributed filesystems for VM images, but I can also see that this is the wave of the future.
>>
>> Ok. I can see that. Let's say that I take a slightly looser approach to high-availability:
>> - keep the static parts of my installs on local disk
>> - share and replicate dynamic data using gluster
>
> That, in a nutshell, is the approach that I (and others) often advocate. Block storage should be used sparingly, e.g. for booting, with data served to others at a higher level. I'd say that's true in general, but it's especially true for any kind of network block storage. When network latencies are involved, going up the stack to where operations are expressed at a high semantic level will almost always work out better than blocks and locks.

What's the alternative, though? Ok, for application files (say a word processing document) that works, but what about spools, databases, and such? Seems like blocks are the common denominator.

>> - data is triply replicated (allow for 2-node failures)
>
> Unfortunately, three-way replication is still a bit of a work in progress. Some (such as Joe Julian) use it successfully, but they also use it very carefully. I've had to make a few fixes in this area myself recently, and I expect to make a few more before I'd say that it's really up to snuff for general use.

That's a bit disappointing. For high-availability applications (like mine), 3-way replication would seem to be the major advantage of a cluster file system over DRBD.

Thanks,
Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
On 12/30/12 1:33 PM, Miles Fidelman wrote:
> What's the alternative, though? Ok, for application files (say a word processing document) that works, but what about spools, databases, and such? Seems like blocks are the common denominator.

It's all blocks underneath; it's a matter of how you get to those blocks. If you use a simulated block device which is actually a GlusterFS file, then you'll be going through both FUSE and the loopback driver. That actually works OK for many things, but latency will be a bit high, e.g. for databases. One option is to use the qemu interface, which avoids both sources of overhead. In fact, the overhead from virtualizing your database server is likely to be lower than FUSE+loopback, because our esteemed kernel colleagues seem a lot more interested in making virtual I/O work better than in doing the same for FUSE. It's still a tiny hit compared to running a DB on bare metal, but the value of being able to survive a failure should more than outweigh that.

> For high-availability applications (like mine), 3-way replication would seem to be the major advantage of a cluster file system over DRBD.

It all depends on how many failures, of which type, you need to handle, and what price you're willing to pay in terms of storage utilization. It's easy to get protection against two disk failures or one server/network failure using replication plus RAID on the servers. If you want protection against two server failures, then there's geo-replication. You could also try using local sync (AFR) and it would probably work for you (as it does for Joe), but there's the caveat that we're still working on some of the more unusual edge cases. Only you and your tests can say whether that's good enough.
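[A sketch of that qemu route, assuming a qemu built with the GlusterFS block driver (which appeared around qemu 1.3); the server, volume, and image names are hypothetical:]

    # Create a guest image directly on the gluster volume, no FUSE mount needed
    qemu-img create -f qcow2 gluster://server1/vmimages/db1.qcow2 6G

    # Boot a guest whose disk I/O goes straight to the gluster servers
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=gluster://server1/vmimages/db1.qcow2,if=virtio,format=qcow2,cache=none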
Re: [Gluster-users] how well will this work
Jeff,

Thanks for the details. If I might trouble you for a few more...

Jeff Darcy wrote:
> On 12/30/12 1:33 PM, Miles Fidelman wrote:
>> What's the alternative, though? Ok, for application files (say a word processing document) that works, but what about spools, databases, and such? Seems like blocks are the common denominator.
>
> It's all blocks underneath; it's a matter of how you get to those blocks. If you use a simulated block device which is actually a GlusterFS file, then you'll be going through both FUSE and the loopback driver. That actually works OK for many things, but latency will be a bit high, e.g. for databases. One option is to use the qemu interface, which avoids both sources of overhead. In fact, the overhead from virtualizing your database server is likely to be lower than FUSE+loopback, because our esteemed kernel colleagues seem a lot more interested in making virtual I/O work better than in doing the same for FUSE. It's still a tiny hit compared to running a DB on bare metal, but the value of being able to survive a failure should more than outweigh that.

I'm running Xen virtualization, and I understand how all the pieces fit together for running paravirtualized hosts over a local disk, software RAID, LVM, and DRBD - but none of those involve qemu. I wonder if you could say a little bit about how all the pieces wire together, if I wanted to mount a Gluster filesystem from a paravirtualized VM, through the qemu interface?

Thanks again,
Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
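[For what it's worth, the non-qemu half of this is straightforward: a guest, paravirtualized or otherwise, can mount a volume over the network like any other client. A sketch, with hypothetical server and volume names; it assumes the gluster client packages are installed in the guest:]

    # Inside the guest, using the native FUSE client
    mount -t glusterfs server1:/maildata /srv/mail

    # or via the volume's built-in NFS server (NFSv3 over TCP only)
    mount -t nfs -o vers=3,tcp server1:/maildata /srv/mail

    # /etc/fstab equivalent; _netdev defers the mount until networking is up
    server1:/maildata  /srv/mail  glusterfs  defaults,_netdev  0 0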
Re: [Gluster-users] how well will this work
On 12/30/2012 12:31 PM, William Muriithi wrote:
>> For mysql, I set up my innodb store to use 4 files (I don't do 1 file per table); each file distributes to each of the 4 replica subvolumes. This balances the load pretty nicely. It's not so much a "how glusterfs works" question as a "how innodb works" question. By configuring the innodb_data_file_path to start with a multiple of your bricks (and carefully choosing some filenames to ensure they're distributed evenly), records seem to be (and I have only tested this through actual use and have no idea if this is how it's supposed to work) accessed evenly over the distribute set.
>
> Hmm, have you checked on the gluster servers that these four files are in separate bricks? As far as I understand, if you have not done anything with the GlusterFS scheduler (default ALU on version 3.3), it is likely that's not what's happening. Or you are using a version that has a different scheduler. Interesting though. Poke around and update us please.

Not just checked, but engineered. At the time, I created each file, checked which DHT subvolume it was on using

    getfattr -n trusted.glusterfs.pathinfo $file

and then incremented the filename until it was created on the subvolume I wanted.

As an aside, I'm referring to DHT (distribute) *subvolumes* rather than bricks, because AFR (replicate) sits under DHT, meaning that replicate is actually the translator whose subvolumes map 1:1 to bricks in my setup.
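[A sketch of that placement check, run against files on a client mount; the paths and filenames are hypothetical:]

    # Ask each innodb file which subvolume/bricks it actually lives on
    for f in /mnt/mysqldata/ibdata1 /mnt/mysqldata/ibdata2 \
             /mnt/mysqldata/ibdata3 /mnt/mysqldata/ibdata4; do
        getfattr -n trusted.glusterfs.pathinfo "$f"
    done
    # The output names the replicate subvolume and its backing bricks; if
    # two files land on the same subvolume, rename one and check again.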
Re: [Gluster-users] how well will this work
Joe,

Thanks for the details, but I'm having just a bit of a problem parsing and picturing the description. Any possibility of walking this through from the bottom up?
- hardware (server + n drives)
- drive partitioning
- LVM setup
- gluster config
- VM setup

Thanks again,
Miles

Joe Julian wrote:
> I have 3 servers with replica 3 volumes: 4 bricks per server, on LVM partitions placed on each of 4 hard drives; 15 volumes, resulting in 60 bricks per server. One of my servers is also a KVM host running (only) 24 VMs. Each VM image is only 6 gig, enough for the operating system and applications, and is hosted on one volume. The data for each application is hosted on its own GlusterFS volume.
>
> For mysql, I set up my innodb store to use 4 files (I don't do 1 file per table); each file distributes to each of the 4 replica subvolumes. This balances the load pretty nicely.
>
> I don't really do anything special for anything else, other than the php app recommendations I make on my blog (http://joejulian.name), which have nothing to do with the actual filesystem.
>
> The thing that I think some people (even John Mark) misapply is that this is just a tool. You have to engineer a solution using the tools you have available. If you feel the positives that GlusterFS provides outweigh the negatives, then you will simply have to engineer a solution that suits your end goal using this tool. It's not a question of whether it works, it's whether you can make it work for your use case.
>
> On 12/27/2012 03:00 PM, Miles Fidelman wrote:
>> Ok... now that's diametrically the opposite response from Dan Cyr's of a few minutes ago. Can you say just a bit more about your configuration - how many nodes, do you have storage and processing combined or separated, how do you have your drives partitioned, and so forth?
>>
>> Thanks,
>> Miles
>>
>> Joe Julian wrote:
>>> Trying to return to the actual question: the way I handle those is to mount gluster volumes that host the data for those tasks from within the VM. I've done that successfully since 2.0 with all of those services. The limitations that others are expressing have as much to do with limitations placed on their own designs as with their hardware. Sure, there are other less stable and/or scalable systems that are faster, but with proper engineering you should be able to build a system that meets those design requirements. The one piece that wasn't there before, but is now in 3.3, is a fix for the locking and performance problems during disk rebuilds, which are now handled at a much more granular level; I have successfully self-healed several VM images simultaneously, while doing I/O on all of them, without any measurable delays.
>>>
>>> Miles Fidelman <mfidel...@meetinghouse.net> wrote:
>>>> Joe Julian wrote:
>>>>> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems.
>>>> Ok... I'm running a 2-node cluster that's essentially a mini cloud stack - with storage and processing combined on the same boxes. I'm running a production VM that hosts a mail server, list server, web server, and database; another production VM providing a backup server for the cluster and for a bunch of desktop machines; and several VMs used for a variety of development and testing purposes. It's all backed by a storage stack consisting of linux raid10 - lvm - drbd, and uses pacemaker for high-availability failover of the production VMs. It all performs reasonably well under moderate load (mail flows, web servers respond, database transactions complete, without notable user-level delays; queues don't back up; cpu and io loads stay within reasonable bounds).
>>>> The goals are to:
>>>> - add storage and processing capacity by adding two more nodes - each consisting of several CPU cores and 4 disks
>>>> - maintain the flexibility to create/delete/migrate/failover virtual machines - across 4 nodes instead of 2
>>>> - avoid having to play games with pairwise DRBD configurations by moving to a clustered filesystem
>>>> - in essence, I'm looking to do what Sheepdog purports to do, except in a Xen environment
>>>> Earlier versions of gluster had reported problems with:
>>>> - supporting databases
>>>> - supporting VMs
>>>> - locking and performance problems during disk rebuilds
>>>> - and... most of the gluster documentation implies that it's preferable to separate storage nodes from processing nodes
>>>> It looks like Gluster 3.2 and 3.3 have addressed some of these issues, and I'm trying to get a general read on whether it's worth putting in the effort of moving forward with some experimentation, or whether this is a non-starter. Is there anyone out there who's tried to run this kind of mini-cloud with gluster? What kind of results have you had?
>>>>> On 12/26/2012 08:24 PM, Miles
Re: [Gluster-users] how well will this work
Joe,

> I have 3 servers with replica 3 volumes: 4 bricks per server, on LVM partitions placed on each of 4 hard drives; 15 volumes, resulting in 60 bricks per server. One of my servers is also a KVM host running (only) 24 VMs.

Mind explaining your setup again? I kind of could not follow, probably because of terminology issues. For example, "4 bricks per server" - I don't understand this part. I assumed a brick == 1 physical server (okay, it could also be one VM, but I don't see how that would help unless it's a test environment). The way you put it, though, means I have issues with my terminology. Isn't there a 1:1 relationship between brick and server?

> Each VM image is only 6 gig, enough for the operating system and applications, and is hosted on one volume. The data for each application is hosted on its own GlusterFS volume.

Hmm, pretty good idea, especially security-wise. It means one VM cannot mess with another VM's files. Is it possible to extend a gluster volume without destroying and recreating it with a bigger peer storage setting?

> For mysql, I set up my innodb store to use 4 files (I don't do 1 file per table); each file distributes to each of the 4 replica subvolumes. This balances the load pretty nicely.

I thought lots of small files would be better than 4 huge files? I mean, why does this work out better performance-wise? Not saying it's wrong, I am just trying to learn from you as I am looking for a similar setup. However, I could not think why using 4 files would be better, but that may be because I don't understand how glusterfs works.

> I don't really do anything special for anything else, other than the php app recommendations I make on my blog (http://joejulian.name), which have nothing to do with the actual filesystem.

Thanks for the link.

> The thing that I think some people (even John Mark) misapply is that this is just a tool. You have to engineer a solution using the tools you have available. If you feel the positives that GlusterFS provides outweigh the negatives, then you will simply have to engineer a solution that suits your end goal using this tool. It's not a question of whether it works, it's whether you can make it work for your use case.
>
> On 12/27/2012 03:00 PM, Miles Fidelman wrote:
>> Ok... now that's diametrically the opposite response from Dan Cyr's of a few minutes ago.

William
Re: [Gluster-users] how well will this work
On 12/28/2012 08:54 AM, William Muriithi wrote:
> Joe,
>
>> I have 3 servers with replica 3 volumes: 4 bricks per server, on LVM partitions placed on each of 4 hard drives; 15 volumes, resulting in 60 bricks per server. One of my servers is also a KVM host running (only) 24 VMs.
>
> Mind explaining your setup again? I kind of could not follow, probably because of terminology issues. For example, "4 bricks per server" - I don't understand this part. I assumed a brick == 1 physical server (okay, it could also be one VM, but I don't see how that would help unless it's a test environment). The way you put it, though, means I have issues with my terminology. Isn't there a 1:1 relationship between brick and server?

In my configuration, 1 server has 4 drives (well, 5, but one's the OS). Each drive has one gpt partition. I create an LVM volume group that holds all four huge partitions. For any one GlusterFS volume I create 4 LVM logical volumes:

    lvcreate -L <size> -n a_vmimages clustervg /dev/sda1
    lvcreate -L <size> -n b_vmimages clustervg /dev/sdb1
    lvcreate -L <size> -n c_vmimages clustervg /dev/sdc1
    lvcreate -L <size> -n d_vmimages clustervg /dev/sdd1

then format them xfs and mount them under /data/glusterfs/vmimages/{a,b,c,d}. These four logical volumes are bricks for the new GlusterFS volume. As glusterbot would say if asked for the glossary: a server hosts bricks (i.e. server1:/foo) which belong to a volume which is accessed from a client. My volume would then look like:

    gluster volume create vmimages replica 3 \
        server{1,2,3}:/data/glusterfs/vmimages/a/brick \
        server{1,2,3}:/data/glusterfs/vmimages/b/brick \
        server{1,2,3}:/data/glusterfs/vmimages/c/brick \
        server{1,2,3}:/data/glusterfs/vmimages/d/brick

>> Each VM image is only 6 gig, enough for the operating system and applications, and is hosted on one volume. The data for each application is hosted on its own GlusterFS volume.
>
> Hmm, pretty good idea, especially security-wise. It means one VM cannot mess with another VM's files. Is it possible to extend a gluster volume without destroying and recreating it with a bigger peer storage setting?

I can do that two ways. I can add servers with storage and then add-brick to expand, or I can resize the LVM partitions and grow xfs (which I have done live several times).

>> For mysql, I set up my innodb store to use 4 files (I don't do 1 file per table); each file distributes to each of the 4 replica subvolumes. This balances the load pretty nicely.
>
> I thought lots of small files would be better than 4 huge files? I mean, why does this work out better performance-wise? Not saying it's wrong, I am just trying to learn from you as I am looking for a similar setup. However, I could not think why using 4 files would be better, but that may be because I don't understand how glusterfs works.

It's not so much a "how glusterfs works" question as a "how innodb works" question. By configuring the innodb_data_file_path to start with a multiple of your bricks (and carefully choosing some filenames to ensure they're distributed evenly), records seem to be (and I have only tested this through actual use and have no idea if this is how it's supposed to work) accessed evenly over the distribute set. With a one-file-per-table model, all records read from any specific table will be read from only one distribute subvolume. At least with my data set, that would hit one distribute subvolume really heavily while leaving the rest fairly idle.
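[As an illustrative sketch of the mysql side of that: a my.cnf excerpt where the file count matches the four distribute subvolumes. The sizes are hypothetical, and the filenames would be whatever the getfattr trial-and-error described elsewhere in this thread settles on.]

    [mysqld]
    datadir = /var/lib/mysql    # a gluster client mount in this scenario
    # Four fixed-size files, one per distribute subvolume; only the last
    # entry may carry :autoextend
    innodb_data_file_path = ibdata1:2G;ibdata2:2G;ibdata3:2G;ibdata4:2G:autoextend
    innodb_file_per_table = 0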
>> I don't really do anything special for anything else, other than the php app recommendations I make on my blog (http://joejulian.name), which have nothing to do with the actual filesystem.
>
> Thanks for the link.
>
>> The thing that I think some people (even John Mark) misapply is that this is just a tool. You have to engineer a solution using the tools you have available. If you feel the positives that GlusterFS provides outweigh the negatives, then you will simply have to engineer a solution that suits your end goal using this tool. It's not a question of whether it works, it's whether you can make it work for your use case.
>>
>> On 12/27/2012 03:00 PM, Miles Fidelman wrote:
>>> Ok... now that's diametrically the opposite response from Dan Cyr's of a few minutes ago.
Re: [Gluster-users] how well will this work
Joe Julian wrote:
> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems.

Ok... I'm running a 2-node cluster that's essentially a mini cloud stack - with storage and processing combined on the same boxes. I'm running a production VM that hosts a mail server, list server, web server, and database; another production VM providing a backup server for the cluster and for a bunch of desktop machines; and several VMs used for a variety of development and testing purposes. It's all backed by a storage stack consisting of linux raid10 - lvm - drbd, and uses pacemaker for high-availability failover of the production VMs. It all performs reasonably well under moderate load (mail flows, web servers respond, database transactions complete, without notable user-level delays; queues don't back up; cpu and io loads stay within reasonable bounds).

The goals are to:
- add storage and processing capacity by adding two more nodes - each consisting of several CPU cores and 4 disks
- maintain the flexibility to create/delete/migrate/failover virtual machines - across 4 nodes instead of 2
- avoid having to play games with pairwise DRBD configurations by moving to a clustered filesystem
- in essence, I'm looking to do what Sheepdog purports to do, except in a Xen environment

Earlier versions of gluster had reported problems with:
- supporting databases
- supporting VMs
- locking and performance problems during disk rebuilds
- and... most of the gluster documentation implies that it's preferable to separate storage nodes from processing nodes

It looks like Gluster 3.2 and 3.3 have addressed some of these issues, and I'm trying to get a general read on whether it's worth putting in the effort of moving forward with some experimentation, or whether this is a non-starter. Is there anyone out there who's tried to run this kind of mini-cloud with gluster? What kind of results have you had?

On 12/26/2012 08:24 PM, Miles Fidelman wrote:
> Hi Folks,
>
> I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly. The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. Also, as a function of rackspace limits, and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also w/ 4 gigE ports per server). The obvious thought is to use Gluster to assemble all the drives into one large storage pool, with replication. But... last time I looked at this (6 months or so back), it looked like some of the critical features were brand new, and performance seemed to be a problem in the configuration I'm thinking of. Which leads me to my question: Has the situation improved to the point that I can use Gluster this way?
>
> Thanks very much,
> Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
On 12-12-26 10:24 PM, Miles Fidelman wrote:
> Hi Folks,
>
> I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly. The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. Also, as a function of rackspace limits, and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also w/ 4 gigE ports per server). The obvious thought is to use Gluster to assemble all the drives into one large storage pool, with replication. But... last time I looked at this (6 months or so back), it looked like some of the critical features were brand new, and performance seemed to be a problem in the configuration I'm thinking of. Which leads me to my question: Has the situation improved to the point that I can use Gluster this way?
>
> Thanks very much,
> Miles Fidelman

Hi,

I have a XenServer pool (3 servers) talking to a GlusterFS replicate server over NFS, with uCARP for IP failover. The system was put in place in May 2012, using GlusterFS 3.3. It ran very well, with speeds comparable to my existing iSCSI solution (http://majentis.com/2011/09/21/xenserver-iscsi-and-glusterfsnfs/).

I was quite pleased with the system; it worked flawlessly. Until November. At that point, the Gluster NFS server started stalling under load. It would become unresponsive for a long enough period of time that the VMs under XenServer would lose their drives. Linux would remount the drives read-only and then eventually lock up, while Windows would just lock up. In this case, Windows was more resilient to the transient disk loss.

I have been unable to solve the problem, and am now switching back to a DRBD/iSCSI setup. I'm not happy about it, but we were losing NFS connectivity nightly, during backups. Life was hell for a long time while I was trying to fix things.

Gerald
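[For reference, the uCARP side of a setup like Gerald's is typically one daemon per NFS server plus up/down scripts that move the shared IP. A sketch only; the addresses, interface, VHID, and password are hypothetical:]

    # On each NFS server: advertise virtual IP 192.168.1.100 with VHID 42
    ucarp -i eth0 -s 192.168.1.11 -v 42 -p secretpw -a 192.168.1.100 \
          --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh -B

    # /etc/vip-up.sh: run on whichever node becomes master
    #!/bin/sh
    /sbin/ip addr add 192.168.1.100/24 dev eth0

[Clients mount the virtual IP, so when the master stalls or dies, the backup claims the address and NFS traffic follows it.]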
Re: [Gluster-users] how well will this work
On Wed, 26 Dec 2012 22:04:09 -0800, Joe Julian <j...@julianfamily.org> wrote:
> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features, nor were there any problems with performance. That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that.

Well, then the list of users obviously does not contain me ;-)

The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps into your face. Why on earth do they think Linux has NFS as a kernel implementation?

--
Regards,
Stephan
Re: [Gluster-users] how well will this work
On Wed, Dec 26, 2012 at 11:24:25PM -0500, Miles Fidelman wrote:
> I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly.

Not answering your question directly, but have you looked at Ganeti? This is a front-end to Xen+LVM+DRBD (open source, written by Google) which makes it easy to manage such a cluster, assuming DRBD is meeting your needs well at the moment. With Ganeti, each VM image is its own logical volume, with its own DRBD instance sitting on top, so you can have different VMs mirrored between different pairs of machines. You can migrate storage, albeit slowly (e.g. starting with A mirrored to B, you can break the mirroring, then re-mirror A to C, and then mirror C to D). Ganeti automates all this for you.

Another option to look at is Sheepdog, which is a clustered block-storage device, but this would require you to switch from Xen to KVM.

> and performance seemed to be a problem in the configuration I'm thinking of.

With KVM at least, last time I tried, performance was still very poor when a VM image was being written to a file over gluster - I measured about 6MB/s. However, remember that each VM can directly mount glusterfs volumes internally, and the performance of this is fine - and it also means you can share data between the VMs. So with some rearchitecture of your application you may get sufficient performance for your needs.

Regards,
Brian.
Re: [Gluster-users] how well will this work
Look, fuse has issues that we all know about. Either it works for you or it doesn't. If fuse bothers you that much, look into libgfapi.

Re: NFS - I'm trying to help track this down. Please either add your comment to an existing bug or create a new ticket. Either way, ranting won't solve your problem or inspire anyone to fix it.

-JM

Stephan von Krawczynski <sk...@ithnet.com> wrote:
> On Wed, 26 Dec 2012 22:04:09 -0800, Joe Julian <j...@julianfamily.org> wrote:
>> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features, nor were there any problems with performance. That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that.
>
> Well, then the list of users obviously does not contain me ;-)
>
> The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps into your face. Why on earth do they think Linux has NFS as a kernel implementation?
>
> --
> Regards,
> Stephan
Re: [Gluster-users] how well will this work
Gerald Brandt wrote:
> On 12-12-26 10:24 PM, Miles Fidelman wrote:
>> The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. Also, as a function of rackspace limits, and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also w/ 4 gigE ports per server).
>
> I have a XenServer pool (3 servers) talking to a GlusterFS replicate server over NFS, with uCARP for IP failover.

If I read this properly, you have 3 compute servers, and a separate box with all your storage - which is quite different from my setup (4 nodes, all of which will be both compute and storage). Or am I reading this wrong?

Thanks though.

Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
Brian Candler wrote:
> On Wed, Dec 26, 2012 at 11:24:25PM -0500, Miles Fidelman wrote:
>> I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly.
>
> Not answering your question directly, but have you looked at Ganeti? This is a front-end to Xen+LVM+DRBD (open source, written by Google) which makes it easy to manage such a cluster, assuming DRBD is meeting your needs well at the moment.

I keep looking at Ganeti, and played with it a bit in a test installation. It does a lot, but it falls short in two regards:
- it doesn't really have an auto-failover function (it keeps getting closer, but no cigar, at least last time I looked) - you either need to intervene manually on a node failure, or you need to add something like pacemaker, and the plumbing starts to get very confused
- the second you've identified yourself, below

> With Ganeti, each VM image is its own logical volume, with its own DRBD instance sitting on top, so you can have different VMs mirrored between different pairs of machines. You can migrate storage, albeit slowly (e.g. starting with A mirrored to B, you can break the mirroring, then re-mirror A to C, and then mirror C to D). Ganeti automates all this for you.

This is precisely what I'm hoping to get past with a cluster file-system.

> Another option to look at is Sheepdog, which is a clustered block-storage device, but this would require you to switch from Xen to KVM.

You nailed it. Sheepdog is architected for nodes that combine storage and processing. Sheepdog on Xen would be ideal. Sigh.

> With KVM at least, last time I tried, performance was still very poor when a VM image was being written to a file over gluster - I measured about 6MB/s. However, remember that each VM can directly mount glusterfs volumes internally, and the performance of this is fine - and it also means you can share data between the VMs. So with some rearchitecture of your application you may get sufficient performance for your needs.

Thanks!

Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
Dear JM,

Unfortunately, one has to say openly that the whole concept being tried here is simply wrong. The problem is not the next-bug-to-fix. The problem is the client strategy in user space. It is broken by design. You can either believe this, or go ahead ignoring it and never really get a good and stable setup.

Really, the whole we-close-our-eyes-and-hope-it-will-turn-out-well strategy looks just like btrfs. Read the archives; I told them years ago it will not work out in our lifetime. And today they still have no ready-for-production fs, and believe me: it never will be there. And the same goes for glusterfs. It _could_ be the greatest fs on earth, but only if you accept:

1) Throw away all non-linux code. Because that war has long since been over.
2) Make a kernel-based client/server implementation. Because it is the only way to acceptable performance.
3) Implement a true undelete feature. Make delete a move to a deleted-files area.

These are the minimal steps to take for a real success; everything else is just beating a dead horse.

Regards,
Stephan

On Thu, 27 Dec 2012 10:03:10 -0500 (EST), John Mark Walker <johnm...@redhat.com> wrote:
> Look, fuse has issues that we all know about. Either it works for you or it doesn't. If fuse bothers you that much, look into libgfapi.
>
> Re: NFS - I'm trying to help track this down. Please either add your comment to an existing bug or create a new ticket. Either way, ranting won't solve your problem or inspire anyone to fix it.
>
> -JM
>
> Stephan von Krawczynski <sk...@ithnet.com> wrote:
>> On Wed, 26 Dec 2012 22:04:09 -0800, Joe Julian <j...@julianfamily.org> wrote:
>>> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features, nor were there any problems with performance. That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that.
>>
>> Well, then the list of users obviously does not contain me ;-)
>>
>> The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps into your face. Why on earth do they think Linux has NFS as a kernel implementation?
>>
>> --
>> Regards,
>> Stephan
Re: [Gluster-users] how well will this work
Stephan,

I'm going to make this as simple as possible. Every message to this list should follow these rules:

1. be helpful
2. be constructive
3. be respectful

I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it.

This is your 2nd warning.

-JM
Re: [Gluster-users] how well will this work
I didn't think his message violated any of your rules. It seems to me he has some disagreements with the approach being used to develop Gluster. I think you should listen to people who disagree with you. Having monitored this list for more than a year and tried--unsuccessfully--to put Gluster into production use, I think there are a lot of people who have problems with stability.

So please, can you respond to his comments with why his suggestions are invalid?

sean

On 12/27/2012 03:39 PM, John Mark Walker wrote:
> Stephan,
>
> I'm going to make this as simple as possible. Every message to this list should follow these rules:
>
> 1. be helpful
> 2. be constructive
> 3. be respectful
>
> I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it.
>
> This is your 2nd warning.
>
> -JM

--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203
Re: [Gluster-users] how well will this work
> I'm going to make this as simple as possible. Every message to this list should follow these rules:
>
> 1. be helpful
> 2. be constructive
> 3. be respectful
>
> I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it.

I might be jumping in here at a random spot, but looking at Stephan's e-mail, it was all three. It was helpful and constructive by outlining a concrete strategy that would make glusterfs greater in his opinion, and to an extent that's something I share. Performance IS an issue, and makes me hesitate in moving glusterfs to the next level at our site (right now we have a 6-node, 12-brick configuration that's used extensively as /home; the target would be a 180-node, 2PB distributed 2-way replicated installation).

We hit FUSE snags from day 2 and are running on NFS right now because negative lookup caching is not in FUSE. In fact, there is no caching. And NFS has hiccups that cause issues, especially for us because we use vz containers with bind mounting, so if the head node's NFS mount goes stale we have to hack a lot to get the stale mount remounted in all the VZ images. I've had at least two or three instances where I had to stop all the containers, killing user tasks, to remount stably.

And to be fair, at least in this particular e-mail I didn't really see much disrespect, just some comparisons that I think still remained in respectful range.

Mario Kadastik, PhD
Researcher

---
Physics is like sex, sure it may have practical reasons, but that's not why we do it
 -- Richard P. Feynman
Re: [Gluster-users] how well will this work
I also don’t think this is a rant. I, as well, have been following this list for a few years, and have been waiting for GlusterFS to stabilize for VM deployment. I hope this discussion helps the devs understand areas that people are waiting for.

We have 2 SAN servers with Infiniband connections to a Blade Center. I would like all the KVM VMs hosted on the SAN, with the ability to add more SAN servers in the future. Currently Gluster allows this via NFS, but I’ve read about performance issues. So, right now, after 2 years of not deploying this gear (and running the VM images on each blade), I am looking for an expandable solution for the backend storage, so that I can stop manually babying this network and install OpenNebula, so I’m not the only person in our office who can manage our VM infrastructure.

This does fit into the OP’s question, because I would love to see GlusterFS work like this.

Miles - As it is right now, GlusterFS is not what you want for backend VM storage.
Question: “how well will this work”
Answer: “horribly”

Dan

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of John Mark Walker
Sent: Thursday, December 27, 2012 12:39 PM
To: Stephan von Krawczynski
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] how well will this work

> Stephan,
>
> I'm going to make this as simple as possible. Every message to this list should follow these rules:
>
> 1. be helpful
> 2. be constructive
> 3. be respectful
>
> I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it.
>
> This is your 2nd warning.
>
> -JM
Re: [Gluster-users] how well will this work
On Thu, 27 Dec 2012 13:24:55 -0800, Dan Cyr <d...@truenorthmanagement.com> wrote:
> I also don’t think this is a rant. I, as well, have been following this list for a few years, and have been waiting for GlusterFS to stabilize for VM deployment. I hope this discussion helps the devs understand areas that people are waiting for.
>
> We have 2 SAN servers with Infiniband connections to a Blade Center. I would like all the KVM VMs hosted on the SAN, with the ability to add more SAN servers in the future. Currently Gluster allows this via NFS, but I’ve read about performance issues. So, right now, after 2 years of not deploying this gear (and running the VM images on each blade), I am looking for an expandable solution for the backend storage, so that I can stop manually babying this network and install OpenNebula, so I’m not the only person in our office who can manage our VM infrastructure.
>
> This does fit into the OP’s question, because I would love to see GlusterFS work like this.
>
> Miles - As it is right now, GlusterFS is not what you want for backend VM storage.
> Question: “how well will this work”
> Answer: “horribly”
>
> Dan

Hola JM,

Are you aware that your above message has arrived at my side neither through the list, nor through personal mail? Does this mean I got deleted from the list by you?

--
Regards,
Stephan
Re: [Gluster-users] how well will this work
Trying to return to the actual question: the way I handle those is to mount gluster volumes that host the data for those tasks from within the VM. I've done that successfully since 2.0 with all of those services. The limitations that others are expressing have as much to do with limitations placed on their own designs as with their hardware. Sure, there are other less stable and/or scalable systems that are faster, but with proper engineering you should be able to build a system that meets those design requirements. The one piece that wasn't there before, but is now in 3.3, is a fix for the locking and performance problems during disk rebuilds, which are now handled at a much more granular level; I have successfully self-healed several VM images simultaneously, while doing I/O on all of them, without any measurable delays.

Miles Fidelman <mfidel...@meetinghouse.net> wrote:
> Joe Julian wrote:
>> It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems.
>
> Ok... I'm running a 2-node cluster that's essentially a mini cloud stack - with storage and processing combined on the same boxes. I'm running a production VM that hosts a mail server, list server, web server, and database; another production VM providing a backup server for the cluster and for a bunch of desktop machines; and several VMs used for a variety of development and testing purposes. It's all backed by a storage stack consisting of linux raid10 - lvm - drbd, and uses pacemaker for high-availability failover of the production VMs. It all performs reasonably well under moderate load (mail flows, web servers respond, database transactions complete, without notable user-level delays; queues don't back up; cpu and io loads stay within reasonable bounds).
>
> The goals are to:
> - add storage and processing capacity by adding two more nodes - each consisting of several CPU cores and 4 disks
> - maintain the flexibility to create/delete/migrate/failover virtual machines - across 4 nodes instead of 2
> - avoid having to play games with pairwise DRBD configurations by moving to a clustered filesystem
> - in essence, I'm looking to do what Sheepdog purports to do, except in a Xen environment
>
> Earlier versions of gluster had reported problems with:
> - supporting databases
> - supporting VMs
> - locking and performance problems during disk rebuilds
> - and... most of the gluster documentation implies that it's preferable to separate storage nodes from processing nodes
>
> It looks like Gluster 3.2 and 3.3 have addressed some of these issues, and I'm trying to get a general read on whether it's worth putting in the effort of moving forward with some experimentation, or whether this is a non-starter. Is there anyone out there who's tried to run this kind of mini-cloud with gluster? What kind of results have you had?
>
> On 12/26/2012 08:24 PM, Miles Fidelman wrote:
>> Hi Folks,
>>
>> I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly. The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. Also, as a function of rackspace limits, and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also w/ 4 gigE ports per server). The obvious thought is to use Gluster to assemble all the drives into one large storage pool, with replication. But... last time I looked at this (6 months or so back), it looked like some of the critical features were brand new, and performance seemed to be a problem in the configuration I'm thinking of. Which leads me to my question: Has the situation improved to the point that I can use Gluster this way?
>>
>> Thanks very much,
>> Miles Fidelman

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
Dan Cyr wrote:
> Miles - As it is right now, GlusterFS is not what you want for backend VM storage.
> Question: “how well will this work”
> Answer: “horribly”

Ok... that's the kind of answer I was looking for (though a disappointing one).

Thanks,
Miles

--
In theory, there is no difference between theory and practice. In practice, there is. -- Yogi Berra
Re: [Gluster-users] how well will this work
Ok... now that's diametrically the opposite response from Dan Cyr's of a few minutes ago. Can you say just a bit more about your configuration - how many nodes, do you have storage and processing combined or separated, how do you have your drives partitioned, and so forth? Thanks, Miles Joe Julian wrote: Trying to return to the actual question, the way I handle those is to mount gluster volumes that host the data for those tasks from within the vm. I've done that successfully since 2.0 with all of those services. The limitations that others are expressing have as much to do with limitations placed on their own designs as with their hardware. Sure, there are other less stable and/or scalable systems that are faster, but with proper engineering you should be able to build a system that meets those design requirements. The one piece that wasn't there before but now is in 3.3 is the locking and performance problems during disk rebuilds which is now done at a much more granular level and I have successfully self-healed several vm images simultaneously while doing it on all of them without any measurable delays. Miles Fidelman mfidel...@meetinghouse.net wrote: Joe Julian wrote: It would probably be better to ask this with end-goal questions instead of with a unspecified critical feature list and performance problems. Ok... I'm running a 2-node cluster that's essentially a mini cloud stack - with storage and processing combined on the same boxes. I'm running a production VM that hosts a mail server, list server, web server, and database; another production VM providing a backup server for the cluster and for a bunch of desktop machines; and several VMs used for a variety of development and testing purposes. It's all backed by a storage stack consisting of linux raid10 - lvm - drbd, and uses pacemaker for high-availability failover of the production VMs. It all performs reasonably well under moderate load (mail flows, web servers respond, database transactions complete, without notable user-level delays; queues don't back up; cpu and io loads stay within reasonable bounds). The goals are to: - add storage and processing capacity by adding two more nodes - each consisting of several CPU cores and 4 disks each - maintain the flexibility to create/delete/migrate/failover virtual machines - across 4 nodes instead of 2 - avoid having to play games with pairwise DRBD configurations by moving to a clustered filesystem - in essence, I'm looking to do what Sheepdog purports to do, except in a Xen environment Earlier versions of gluster had reported problems with: - supporting databases - supporting VMs - locking and performance problems during disk rebuilds - and... most of the gluster documentation implies that it's preferable to separate storage nodes from processing nodes It looks like Gluster 3.2 and 3.3 have addressed some of these issues, and I'm trying to get a general read on whether it's worth putting in the effort of moving forward with some experimentation, or whether this is a non-starter. Is there anyone out there who's tried to run this kind of mini-cloud with gluster? What kind of results have you had? On 12/26/2012 08:24 PM, Miles Fidelman wrote: Hi Folks, I find myself trying to expand a 2-node high-availability cluster from to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data, and pacemaker to failover cleanly. The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. 
Also, as a function of rackspace limits, and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also w/ 4 gigE ports per server). The obvious thought is to use Gluster to assemble all the drives into one large storage pool, with replication. But... last time I looked at this (6 months or so back), it looked like some of the critical features were brand new, and performance seemed to be a problem in the configuration I'm thinking of. Which leads me to my question: Has the situation improved to the point that I can use Gluster this way? Thanks very much, Miles Fidelman
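For concreteness, a pool like the one described here - 4 nodes with 4 large drives each, with replication - would usually be built as a distribute/replicate volume with one brick per drive. A minimal sketch, assuming hypothetical hostnames node1-node4 and one xfs-formatted partition per drive mounted under /data/d1 through /data/d4 (all names illustrative, not from this thread):

    # On each node, one brick per drive (shown for one drive; the
    # other three follow the same pattern)
    mkfs.xfs -i size=512 /dev/sdb1
    mkdir -p /data/d1
    mount /dev/sdb1 /data/d1
    mkdir -p /data/d1/brick

    # From any one node: with "replica 2", bricks are grouped into
    # replica pairs in the order listed, so each pair below spans
    # two different nodes
    gluster volume create vmpool replica 2 \
        node1:/data/d1/brick node2:/data/d1/brick \
        node3:/data/d1/brick node4:/data/d1/brick \
        node1:/data/d2/brick node2:/data/d2/brick \
        node3:/data/d2/brick node4:/data/d2/brick \
        node1:/data/d3/brick node2:/data/d3/brick \
        node3:/data/d3/brick node4:/data/d3/brick \
        node1:/data/d4/brick node2:/data/d4/brick \
        node3:/data/d4/brick node4:/data/d4/brick
    gluster volume start vmpool

This yields eight two-way replica sets distributed into one pool; losing any single node leaves every replica set with a surviving copy.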
Re: [Gluster-users] how well will this work
John Mark Walker wrote: In general, I don't recommend any distributed filesystems for VM images, but I can also see that this is the wave of the future.

Ok. I can see that. Let's say that I take a slightly looser approach to high-availability:
- keep the static parts of my installs on local disk
- share and replicate dynamic data using gluster
- failover by rebooting on a different node (no image to worry about migrating)

In this scenario, how well does gluster work when:
- storage and processing are inter-mixed on the same nodes
- data is triply replicated (to allow for 2-node failures)

Miles Fidelman
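A sketch of what that looser layout could look like, assuming three hypothetical nodes each exporting one brick for the dynamic data (hostnames and paths are illustrative, not from the thread):

    gluster volume create dyndata replica 3 \
        node1:/data/glusterfs/dyndata/brick \
        node2:/data/glusterfs/dyndata/brick \
        node3:/data/glusterfs/dyndata/brick
    gluster volume start dyndata

    # Each node (or vm) then mounts the volume with the native client:
    mount -t glusterfs node1:/dyndata /srv/dyndata

With replica 3, every file is written to all three bricks, so the dynamic data remains readable from a surviving node even after a 2-node failure.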
Re: [Gluster-users] how well will this work
I have 3 servers with replica 3 volumes, 4 bricks per server on lvm partitions that are placed on each of 4 hard drives - 15 volumes, resulting in 60 bricks per server. One of my servers is also a kvm host running (only) 24 vms. Each vm image is only 6 gig, enough for the operating system and applications, and is hosted on one volume. The data for each application is hosted on its own GlusterFS volume.

For mysql, I set up my innodb store to use 4 files (I don't do 1 file per table), so the four files are distributed, one to each of the 4 replica subvolumes. This balances the load pretty nicely. I don't really do anything special for anything else, other than the php app recommendations I make on my blog (http://joejulian.name), which have nothing to do with the actual filesystem.

The thing that I think some people (even John Mark) miss is that this is just a tool. You have to engineer a solution using the tools you have available. If you feel the positives that GlusterFS provides outweigh the negatives, then you will simply have to engineer a solution that suits your end goal using this tool. It's not a question of whether it works, it's whether you can make it work for your use case.

On 12/27/2012 03:00 PM, Miles Fidelman wrote: Can you say just a bit more about your configuration - how many nodes, do you have storage and processing combined or separated, how do you have your drives partitioned, and so forth? [...]
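A rough sketch of the mysql arrangement described above. The volume name, mount point, and file sizes are assumptions for illustration, not taken from the post; the idea is simply that splitting the InnoDB system tablespace into four files lets the distribute translator hash each file onto a different replica subvolume:

    # /etc/fstab inside the vm - the application's gluster volume is
    # mounted as the mysql data directory (backupvolfile-server gives
    # the client a fallback server for the initial mount)
    server1:/mysqldata /var/lib/mysql glusterfs defaults,_netdev,backupvolfile-server=server2 0 0

    # /etc/my.cnf - four InnoDB data files instead of one
    [mysqld]
    innodb_data_file_path = ibdata1:512M;ibdata2:512M;ibdata3:512M;ibdata4:512M:autoextend

With four files spread over four replica subvolumes, InnoDB write load lands on all of the bricks rather than on a single replica set.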
Re: [Gluster-users] how well will this work
It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features, nor were there any problems with performance. That's not to say that they would or would not have met your design specifications, but without those specs you're the only one who could evaluate that.

On 12/26/2012 08:24 PM, Miles Fidelman wrote: Hi Folks, I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. [...] Which leads me to my question: Has the situation improved to the point that I can use Gluster this way? Thanks very much, Miles Fidelman