Re: [Gluster-users] [Gluster-devel] Need testers for GlusterFS 3.4.4
Ack, I tend to use the (your ;-) knowledge for a cfengine promise once I get the time... best regards Bernhard

On 04.06.2014, at 20:51, James purplei...@gmail.com wrote:
On Wed, Jun 4, 2014 at 2:43 PM, BGM bernhard.gl...@ecologic.eu wrote: we might get a cfengine/puppet framework to easily
https://github.com/purpleidea/puppet-gluster
Re: [Gluster-users] Best Practices for different failure scenarios?
thnx Vijay, will drill my head into it. Bernhard
Sent from my iPad

On 24.02.2014, at 17:26, Vijay Bellur vbel...@redhat.com wrote:

On 02/21/2014 10:27 PM, BGM wrote: It might be very helpful to have a wiki next to this mailing list, where all the good experience, all the proven solutions for situations that are brought up here, could be gathered in a more permanent and straightforward way.

+1. It would be very useful to evolve an operations guide for GlusterFS.

To your questions I would add: what's best practice in setting options for performance and/or integrity... (yeah, well, for which use case under which conditions). A mailing list is very helpful for ad-hoc problems and questions, but it would be nice to distill the knowledge into a permanent, searchable form. Sure, anybody could set up a wiki, but... it would need the acceptance and participation of an active group to get best results. So IMO the appropriate place would be somewhere close to gluster.org?

Would be happy to carry this in the doc/ folder of glusterfs.git and collaborate on it if a lightweight documentation format like markdown or asciidoc is used for evolving this guide.

I haven't worked with either of them; at the very first glance asciidoc looks easier to me (assuming it is either/or?), and (sorry for being blunt, I'm ops, not dev ;-) you suggest everybody sets up a git from where you pull, right?

No need to set up a git of your own. We use the development workflow [1] for submitting patches to documentation too.

Well, wouldn't a wiki be much easier? Both to contribute to and to access the information? (like wiki.debian.org?) The git-based solution might be easier to start off with, but would it reach a big enough community?

Documentation in markdown or asciidoc is rendered well by github. One of the chapters in our admin guide does get rendered like this [2].

Wouldn't a wiki also have a better PR/marketing effect (by being easier to access)? Just a thought...

We can roll out the content from git in various formats (like pdf, html etc.) as both asciidoc/markdown can be converted to various formats. The advantage of a git-based workflow is that it becomes easy to review changes through tools like gerrit, and it can also help in keeping false content/spam out of the way. Having said that, feel free to use tools of your choice. We can just go ahead and use whatever is easy for most of us :). At the end of the day, evolving this guide is more important than the tools that we choose to use in the process. Cheers, Vijay
[1] http://www.gluster.org/community/documentation/index.php/Development_Work_Flow
[2] https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_setting_volumes.md

Bernhard
-Vijay
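For illustration only: both formats can be rendered locally with common off-the-shelf converters. A minimal sketch, assuming pandoc and asciidoctor are installed and using a chapter name from the admin guide as an example (this is not a documented Gluster workflow):

# markdown -> HTML / PDF (PDF output needs a LaTeX toolchain)
pandoc admin_setting_volumes.md -o admin_setting_volumes.html
pandoc admin_setting_volumes.md -o admin_setting_volumes.pdf
# asciidoc -> HTML
asciidoctor -b html5 admin_guide.adoc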
Re: [Gluster-users] Problems to work with mounted directory in Gluster 3.2.7 - switch to 3.4.2 ;-)
well, note:
- you don't need zfs on the hardware machines, xfs or ext3 or ext4 would do it too
- for production you wouldn't run a glusterfs on top of a glusterfs but rather give the vm access to a real blockdevice, like a whole harddisk or at least a partition of it, although migration of the vm wouldn't be possible then... therefore: a VM as a glusterserver might not be the best idea.
- remember to peer probe the glusterserver partner from both sides!
As mentioned below, for a first setup you should be fine with that. regards

On 19.02.2014, at 19:32, Targino Silveira targinosilve...@gmail.com wrote: Thanks Bernhard I will do this. Regards, Targino Silveira +55-85-8626-7297 www.twitter.com/targinosilveira

2014-02-19 14:43 GMT-03:00 Bernhard Glomm bernhard.gl...@ecologic.eu:
I would strongly recommend to restart fresh with gluster 3.4.2 from http://download.gluster.org/pub/gluster/glusterfs/3.4/
It works totally fine for me. (Reinstall the vms as slim as possible if you can.) As a quick howto consider this:
- We have 2 hardware machines (just desktop machines for dev-env)
- both running zol
- create a zpool and zfs filesystem
- create a gluster replica 2 volume between hostA and hostB
- install 3 VMs vmachine0{4,5,6}
- vmachine0{4,5} each have a 100GB diskimage file as /dev/vdb which also resides on the glustervolume
- create ext3 filesystem on vmachine0{4,5}:/dev/vdb1
- create gluster replica 2 between vmachine04 and vmachine05 as shown below (!!!obviously nobody would do that in any serious environment, just to show that even a setup like that _would_ be possible!!!)
- run some benchmarks on that volume and compare the results to others

So:
root@vmachine04[/0]:~ # mkdir -p /srv/vdb1/gf_brick
root@vmachine04[/0]:~ # mount /dev/vdb1 /srv/vdb1/
root@vmachine04[/0]:~ # gluster peer probe vmachine05
peer probe: success
# now switch over to vmachine05 and do
root@vmachine05[/1]:~ # mkdir -p /srv/vdb1/gf_brick
root@vmachine05[/1]:~ # mount /dev/vdb1 /srv/vdb1/
root@vmachine05[/1]:~ # gluster peer probe vmachine04
peer probe: success
root@vmachine05[/1]:~ # gluster peer probe vmachine04
peer probe: success: host vmachine04 port 24007 already in peer list
# the peer probe from BOTH sides is often forgotten
# switch back to vmachine04 and continue with
root@vmachine04[/0]:~ # gluster peer status
Number of Peers: 1
Hostname: vmachine05
Port: 24007
Uuid: 085a1489-dabf-40bb-90c1-fbfe66539953
State: Peer in Cluster (Connected)
root@vmachine04[/0]:~ # gluster volume info layer_cake_volume
Volume Name: layer_cake_volume
Type: Replicate
Volume ID: ef5299db-2896-4631-a2a8-d0082c1b25be
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: vmachine04:/srv/vdb1/gf_brick
Brick2: vmachine05:/srv/vdb1/gf_brick
root@vmachine04[/0]:~ # gluster volume status layer_cake_volume
Status of volume: layer_cake_volume
Gluster process                             Port    Online  Pid
----------------------------------------------------------------
Brick vmachine04:/srv/vdb1/gf_brick         49152   Y       12778
Brick vmachine05:/srv/vdb1/gf_brick         49152   Y       16307
NFS Server on localhost                     2049    Y       12790
Self-heal Daemon on localhost               N/A     Y       12791
NFS Server on vmachine05                    2049    Y       16320
Self-heal Daemon on vmachine05              N/A     Y       16319
There are no active volume tasks
# set any option you might like
root@vmachine04[/1]:~ # gluster volume set layer_cake_volume network.remote-dio enable
volume set: success
# go to vmachine06 and mount the volume
root@vmachine06[/1]:~ # mkdir /srv/layer_cake
root@vmachine06[/1]:~ # mount -t glusterfs -o backupvolfile-server=vmachine05 vmachine04:/layer_cake_volume /srv/layer_cake
root@vmachine06[/1]:~ # mount
vmachine04:/layer_cake_volume on /srv/layer_cake type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
root@vmachine06[/1]:~ # df -h
Filesystem                     Size  Used  Avail  Use%  Mounted on
...
vmachine04:/layer_cake_volume  97G   188M  92G    1%    /srv/layer_cake
All fine and stable

# now let's see how it tastes
# note this is postmark on / NOT on the glustermounted layer_cake_volume!
# the postmark results might be available tomorrow ;-)))
root@vmachine06[/1]:~ # postmark
PostMark v1.51 : 8/14/01
pmset transactions 50
pmset number 20
pmset subdirectories 1
pmrun
Creating subdirectories...Done
Creating files...Done
Performing transactions..Done
Deleting files...Done
Deleting subdirectories...Done
Time:
    2314 seconds total
    2214 seconds of transactions (225 per second)
Files:
    450096 created (194 per second)
Creation
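One step is not shown in the transcript above: actually creating and starting the replica volume between the two probed peers. A minimal sketch of that missing step, using the brick paths from above (not part of the original session):

# on vmachine04, after both peer probes succeeded
gluster volume create layer_cake_volume replica 2 vmachine04:/srv/vdb1/gf_brick vmachine05:/srv/vdb1/gf_brick
gluster volume start layer_cake_volume
gluster volume info layer_cake_volume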
Re: [Gluster-users] Problems to work with mounted directory in Gluster 3.2.7 - switch to 3.4.2 ;-)
... keep it simple, make it robust ... use raid1 (or raidz if you can) for the bricks. hth

On 19.02.2014, at 20:32, Targino Silveira targinosilve...@gmail.com wrote: Sure, I will use XFS; as I said before it's for old data, so we don't need great performance, we only need to store data. regards, Targino Silveira +55-85-8626-7297 www.twitter.com/targinosilveira

2014-02-19 16:11 GMT-03:00 BGM bernhard.gl...@ecologic.eu:
[... full howto quoted above ...]
Re: [Gluster-users] Best Practices for different failure scenarios?
On 19.02.2014, at 21:15, James purplei...@gmail.com wrote:
On Wed, Feb 19, 2014 at 3:07 PM, Michael Peek p...@nimbios.org wrote: Is there a best practices document somewhere for how to handle standard problems that crop up?
Short answer: it sounds like you'd benefit from playing with a test cluster... Would I be correct in guessing that you haven't set up a gluster pool yet? You might want to look at: https://ttboj.wordpress.com/2014/01/08/automatically-deploying-glusterfs-with-puppet-gluster-vagrant/ This way you can try them out easily... For some of those points... solve them with... Sort of crib notes for things like:

1) What do you do if you see that a drive is about to fail?
RAID6
or: zol, raidzx (open for critical comments)
or: brick remove, brick add, volume heal (it's really just three commands, at least in my experience so far, touch wood; a short sketch of these follows after this message)

but Michael, I appreciate your _original_ question: Is there a best practice document? Nope, not that I am aware of.
It might be very helpful to have a wiki next to this mailing list, where all the good experience, all the proven solutions for situations that are brought up here, could be gathered in a more permanent and straightforward way.
To your questions I would add: what's best practice in setting options for performance and/or integrity... (yeah, well, for which use case under which conditions). A mailing list is very helpful for ad-hoc problems and questions, but it would be nice to distill the knowledge into a permanent, searchable form.
Sure, anybody could set up a wiki, but... it would need the acceptance and participation of an active group to get best results. So IMO the appropriate place would be somewhere close to gluster.org?
regards Bernhard

2) What do you do if a drive has already failed?
RAID6
3) What do you do if a peer is about to fail?
Get a new peer ready...
4) What do you do if a peer has failed?
Replace with new peer...
5) What do you do to reinstall a peer from scratch (i.e. what configuration files/directories do you need to restore to get the host back up and talking to the rest of the cluster)?
Bring up a new peer. Add to cluster... Same as failed peer...
6) What do you do with failed-heals?
7) What do you do with split-brains?
These are more complex issues and a number of people have written about them... E.g.: http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
Cheers, James
Michael
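As a rough illustration of the "three commands" mentioned above, here is a hedged sketch for swapping a dead brick out of a replica-2 volume; the volume and brick names are made up and the exact syntax may differ between GlusterFS releases:

# drop the failed brick (temporarily reduces the replica count)
gluster volume remove-brick myvol replica 1 server2:/bricks/old_brick force
# add the replacement brick and restore replica 2
gluster volume add-brick myvol replica 2 server2:/bricks/new_brick
# trigger a full self-heal so the new brick gets populated
gluster volume heal myvol full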
Re: [Gluster-users] [Bug 1057645] ownership of diskimage changes during livemigration, livemigration with kvm/libvirt fails
Hi Paul, all, I'm really keen on getting this solved; right now it's a nasty showstopper. I could try different gluster versions, as long as I can get the .debs for them; I wouldn't want to start compiling (although maybe a config option changed in the package build?). You reported that 3.4.0 on ubuntu 13.04 was working, right? Code diff, config options for the package build. Another approach: can anyone verify or falsify https://bugzilla.redhat.com/show_bug.cgi?id=1057645 on another distro than ubuntu/debian? Thinking of it... could it be apparmor interference? I had fun with apparmor and mysql on ubuntu 12.04 once... will have a look at that tomorrow. As mentioned before, a straight drbd/ocfs2 setup works (with only 1/4 of the speed and the pain of maintenance), so AFAIK I have to blame the ownership change on gluster, not on an issue with my general setup. best regards Bernhard
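A few commands that could help check the AppArmor suspicion on Ubuntu; this is only a sketch, and the profile path is the stock Ubuntu one, which may differ on other systems:

# list loaded profiles and their modes
aa-status
# look for recent denials involving libvirt/qemu
dmesg | grep -i 'apparmor.*denied'
# temporarily put the libvirtd profile into complain mode (needs apparmor-utils)
aa-complain /etc/apparmor.d/usr.sbin.libvirtd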
Re: [Gluster-users] gluster and kvm livemigration
Hi Paul, thnx, nice report, did you file the bug? Can you run a
watch tree -pfungiA /path/to/your/vm/image/pool
on both hosts, with some vms running, some stopped. Start a machine, trigger the migration; at some point the ownership of the vmimage file flips from libvirtd (running machine) to root (normal permission, but only when stopped). If the ownership/permission flips that way, libvirtd on the receiving side can't write to that file... does group/acl permission flip likewise? Regards Bernhard

On 23.01.2014, at 16:49, Paul Boven bo...@jive.nl wrote: Hi Bernhard, I'm having exactly the same problem on Ubuntu 13.04 with the 3.4.1 packages from semiosis. It worked fine with glusterfs-3.4.0. We've been trying to debug this on the list, but haven't found the smoking gun yet. Please have a look at the URL below, and see if it matches what you are experiencing? http://epboven.home.xs4all.nl/gluster-migrate.html Regards, Paul Boven.

On 01/23/2014 04:27 PM, Bernhard Glomm wrote: I had/have problems with live-migrating a virtual machine on a 2-sided replica volume. I run ubuntu 13.04 and gluster 3.4.2 from semiosis. With network.remote-dio set to enable I can use cache mode = none as the performance option for the virtual disks, so live migration works without --unsafe. I'm triggering the migration now through the Virtual Machine Manager as an unprivileged user which is a group member of libvirtd. After migration the disks become read-only, because on migration the disk files change ownership from libvirt-qemu to root. What am I missing? TIA Bernhard

-- Paul Boven bo...@jive.nl +31 (0)521-596547 Unix/Linux/Networking specialist Joint Institute for VLBI in Europe - www.jive.nl VLBI - It's a fringe science
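To answer the group/ACL question in a reproducible way, one could watch owner, group and mode of the image files on both hosts while the migration runs. A sketch; the pool path is just a placeholder:

# run on both the source and the destination host during the migration
watch -n1 'stat -c "%U:%G %a %n" /var/lib/libvirt/images/*.img'
# if ACLs are in use, watch those as well
watch -n1 'getfacl -p /var/lib/libvirt/images/*.img'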
Re: [Gluster-users] GlusterFS share authentication?
On 22.01.2014, at 16:43, Peter B. p...@das-werkstatt.com wrote:
On 01/21/2014 10:31 PM, Dan Mons wrote: On 22 January 2014 05:19, Peter B. p...@das-werkstatt.com wrote: The clients in fact *do* only access it over Samba. I just figured that *if* one user connected a GNU/Linux machine to the LAN, he could simply connect with write permissions using the GlusterFS Linux client. All he'd have to do for authenticating is to spoof one of the storage-IPs.
man iptables
I've been working with iptables for many years, but in this particular case, I fail to see how they would help. Maybe I'm overlooking something very obvious? Could you please elaborate your suggestion a bit?

I would suggest not connecting the dedicated storage nic(s) to the LAN but to a physically separated network, a vlan, or, if all that is not possible, through a vpn. Could be wrong, but IMHO with ip_forward off you should be fine? regards Bernhard
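A sketch of what that could look like on the storage nodes; addresses, interface name and the brick port range are examples only (glusterd listens on 24007, brick processes on 49152 and up):

# make sure the storage box does not route between its NICs
sysctl -w net.ipv4.ip_forward=0
# only accept Gluster traffic from the dedicated storage subnet (example: 10.10.10.0/24 on eth1)
iptables -A INPUT -i eth1 -s 10.10.10.0/24 -p tcp --dport 24007:24008 -j ACCEPT
iptables -A INPUT -i eth1 -s 10.10.10.0/24 -p tcp --dport 49152:49251 -j ACCEPT
iptables -A INPUT -p tcp --dport 24007:24008 -j DROP
iptables -A INPUT -p tcp --dport 49152:49251 -j DROP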
Re: [Gluster-users] Design/HW for cost-efficient NL archive = 0.5PB?
Sent from my iPad

On 02.01.2014, at 18:06, Justin Dossey j...@podomatic.com wrote:
1) It depends on the number of drives per chassis, your tolerance for risk, and the speed of rebuilds. I'd recommend doing a couple of test rebuilds with different array sizes to see how fast your controller and drives can complete them, and then comparing the rebuild completion times to your SLA -- if a rebuild takes two days to complete, is that good enough for you (especially given the chances of another failure occurring during the rebuild)? All other things being equal, the smaller the array, the faster the rebuild, but the more wasted space in the array. Also note that many controllers have tunable rebuild algorithms, so you can divert more resources to completing rebuilds faster at the cost of performance. One data point from me: my last 16-2T-SATA RAID-6 rebuild took about 58 hours to complete.
2) My understanding is that the way file reads work on GlusterFS, read requests are sent to all nodes and the data is used from the first node to respond to the request. So if one node is busier than others, it is likely to respond more slowly and thus receive a lower portion of the read activity, as long as the files being read are larger than a single response.

On Wed, Jan 1, 2014 at 12:21 PM, Fredrik Häll hall.fred...@gmail.com wrote:
Thanks for all the input! It sure sounds like RAID-6 for disk failures and Gluster for the spanning and high-level redundancy parts is a good candidate. Some final questions:
1) How big can one comfortably go in terms of RAID-6 array size? Given 4TB SATA/SAS drives. On the one hand much points to keeping as few RAIDs as possible, and disk usage is of course maximized. But there are complications in terms of rebuild times and the risk of losing the 2 drives. Hot spares may also be an option. Your reflections?
2) Is there any intelligence or automation in Gluster that makes smart use of dual (or multiple) replicas? Say that I have 2 replicas, and one of them is spending some effort on a RAID rebuild, is there functionality for manually or automatically preferring the other (healthy) replica?
Best regards, Fredrik

On Tue, Dec 31, 2013 at 10:27 PM, Justin Dossey j...@podomatic.com wrote:
Yes, RAID-6 is better than RAID-5 in most cases. I agonized over the decision to deploy 5 for my Gluster cluster, and the reason I went with 5 is that the number of drives in the brick was (IMO) acceptably low. I use 6 for my 16-drive arrays, which means I have to lose 3 disks out of the 16 to lose my data. With 2x8-drive arrays in 5, I also have to lose 3 disks to lose data, but if I do lose data, I only lose 50% of the data on the server, and all these bricks are distribute-replicate anyway, so I wouldn't actually lose any data at all. That consideration, paired with the fact that I keep spares on hand and replace failed drives within a day or two, means that I'm okay with running 2x RAID-5 instead of 1x RAID-6. (2x RAID-6 would put me below my storage target, forcing additional hardware purchases.) I suppose the short answer is: evaluate your storage needs carefully.

On Tue, Dec 31, 2013 at 11:19 AM, James purplei...@gmail.com wrote:
On Tue, Dec 31, 2013 at 11:33 AM, Justin Dossey j...@podomatic.com wrote: Yes, I'd recommend sticking with RAID in addition to GlusterFS. The cluster I'm mid-build on (it's a live migration) is 18x RAID-5 bricks on 9 servers. Each RAID-5 brick is 8 2T drives, so about 13T usable. It's better to deal with a RAID when a disk fails than to have to pull and replace the brick, and I believe Red Hat's official recommendation is still to minimize the number of bricks per server (which makes me a rebel for having two, I suppose). 9 (slow-ish, SATA RAID) servers easily saturate 1Gbit on a busy day.
I think RedHat also recommends RAID6 instead of RAID5. In any case, I sure do, at least. James

On Mon, Dec 30, 2013 at 5:54 AM, bernhard glomm bernhard.gl...@ecologic.eu wrote:
some years ago I had a similar task. I did:
- We had disk arrays with 24 slots, with optional 4 JBODs (each 24 slots) stacked on top, dual LWL controller 4GB (costs ;-)
- creating raids (6) with not more than 7 disks each
- as far as I remember I had one hot spare per each 4 raids
- connecting as many of these raid bricks together with striped glusterfs as needed
- as for replication, I was planning for an offsite duplicate of this architecture and, because losing data was REALLY not an option, writing it all off at a second offsite location onto LTFS tapes. As the original version of the LTFS library edition was far too expensive for us I found an alternative solution that does the same thing but for a much more reasonable price. LTFS is still a big thing in digital archiving. Give me a note if you like more details on
Re: [Gluster-users] Design/HW for cost-efficient NL archive = 0.5PB?
thnx Justin for an accurate example figure on this! (58 hours rebuild time on a 16 x 2TB SATA RAID-6 with one HD failing, not two, I suppose that was.)
Again I would emphasize going for maximum (affordable) brick security (raid6/raidz2 or better) and using gluster to expand the available space and/or for replication on a higher level (i.e. replicate the whole dataset to a second site of servers, either for redundancy or access speed).
There are different aspects that make up the rebuild time of a failed raid:
- raid level
- disk speed
- controller performance
- active read/write usage of the system
Tests (because the above mentioned aspects are difficult to bring into a simple math formula) and considering a proper SLA surely help. Though "hi boss, all data is lost" isn't really an option, is it? Build stronger to last longer. have a good one ;-) Bernhard

On 02.01.2014, at 18:06, Justin Dossey j...@podomatic.com wrote: to completing rebuilds faster at the cost of performance. One data point from me: my last 16-2T-SATA RAID-6 rebuild took about 58 hours to complete.
Re: [Gluster-users] Three nodes cluster with 2 replicas
It's logically impossible, what you ask for! On a raid5, if two drives (you said two) fail, ALL data IS lost. Think harder! As Justin suggested, go for securing your bricks with (soft)raid (or maybe raidz) and extend the space by striping it onto several servers. In a disaster a hard disk fails, and 10 hours later a second one, while you are sick and out of office and your apprentice has to handle it: out with the bad HD, in with the good one, cross fingers (that is, that the new HDs are good) and rock on. Simple solutions for complex problems! I don't argue that it's totally impossible to build, with gluster, what you are asking for, but to my taste it wouldn't be nice and easy (i.e. rocksolid). Bernhard

On 01.01.2014, at 22:06, shacky shack...@gmail.com wrote: Hi. I have three servers with 7 hard drives (without HW RAID controller) that I wish to use to create a Gluster cluster. I am looking for a way to have 2 replicas with 3 nodes, because I need much storage space and 2 nodes are not enough, but I wish to have the same security I'd have using a RAID5 on a node. So I wish my data to be protected if one (or two) of the 7 hard drives fail on the same node and if an entire node of the three fails. Is it possible? Could you help me to find out the correct way? Thank you very much! Bye.
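A sketch of the "RAID per node, Gluster on top" layout suggested here, with made-up device and host names: software RAID 6 over the seven drives of each node, then one brick per node combined into a plain distribute volume (note that a plain distribute volume does not by itself survive the loss of a whole node):

# on each of the three nodes: RAID 6 over the seven data disks
mdadm --create /dev/md0 --level=6 --raid-devices=7 /dev/sd[b-h]
mkfs.xfs /dev/md0
mkdir -p /bricks/md0 && mount /dev/md0 /bricks/md0
mkdir -p /bricks/md0/brick
# on one node: combine the three bricks into a distributed volume
gluster volume create archive node1:/bricks/md0/brick node2:/bricks/md0/brick node3:/bricks/md0/brick
gluster volume start archive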
Re: [Gluster-users] Design/HW for cost-efficient NL archive = 0.5PB?
Yepp, and still I would be VERY uncomfortable with raid5 and >= 1TB disks. Read this
https://prestoprimews.ina.fr/public/deliverables/PP_WP3_ID3.2.1_ThreatsMassStorage_R0_v1.00.pdf
and maybe other deliverables from PrestoPrime to understand why. In short (I think it was in the above mentioned deliverable): PrestoPrime was looking into patterns of disk failure. For that they worked together with google and others, who run A LOT of spinning disks. The clearest pattern they found was not with certain HD types or companies but with batches of HDs produced at the same time at the same facility. If one out of the batch failed, it was more likely that other HDs from the same batch would fail soon after too. So if you order 50 disks at once to build your storage -- something admins normally like to do -- it turns out to become a kind of russian roulette. The paper also points out the fact that it takes more and more time to a) detect a failure on a given HD and b) recover from a disk failure, the bigger the disks become. From that point of view PrestoPrime strongly recommended against raid5 and for AT LEAST raid6 for any archiving (long term storage) purpose.
My opinion/advice is: build VERY strong bricks (raid6 or zfs raidz2/3?) and use gluster to:
- expand the space
- increase performance
- replicate the whole thing to an offsite vault
- think of LTFS to get your valuable data resting on non-spinning media
hth Bernhard

On 31.12.2013, at 17:33, Justin Dossey j...@podomatic.com wrote:
Yes, I'd recommend sticking with RAID in addition to GlusterFS. The cluster I'm mid-build on (it's a live migration) is 18x RAID-5 bricks on 9 servers. Each RAID-5 brick is 8 2T drives, so about 13T usable. It's better to deal with a RAID when a disk fails than to have to pull and replace the brick, and I believe Red Hat's official recommendation is still to minimize the number of bricks per server (which makes me a rebel for having two, I suppose). 9 (slow-ish, SATA RAID) servers easily saturate 1Gbit on a busy day.
The following is opinion only, so make up your own mind: If I had a big pile of RAID-5 or RAID-6 bricks, I would not want to spend extra money for replica-3. Instead, I would go replica-2 and use the leftover money to build in additional redundancy on the hardware (e.g. redundant power, redundant 10gigE). If money were not an object, of course there's no harm in going replica-3 or more. But every build I've ever done has a budget that seems slightly small for the desired outcome.

On Mon, Dec 30, 2013 at 5:54 AM, bernhard glomm bernhard.gl...@ecologic.eu wrote:
some years ago I had a similar task. I did:
- We had disk arrays with 24 slots, with optional 4 JBODs (each 24 slots) stacked on top, dual LWL controller 4GB (costs ;-)
- creating raids (6) with not more than 7 disks each
- as far as I remember I had one hot spare per each 4 raids
- connecting as many of these raid bricks together with striped glusterfs as needed
- as for replication, I was planning for an offsite duplicate of this architecture and, because losing data was REALLY not an option, writing it all off at a second offsite location onto LTFS tapes. As the original version of the LTFS library edition was far too expensive for us I found an alternative solution that does the same thing but for a much more reasonable price. LTFS is still a big thing in digital archiving. Give me a note if you like more details on that.
- This way I could fsck all (not too big) raids in parallel (sped things up)
- proper robustness against disk failure
- space that could grow infinitely in size (add more and bigger disks) and keep up with access speed (add more servers) at a pretty foreseeable price
- LTFS in the vault provided just the finishing touch, having data accessible even if two out of three sites are down, at a reasonable price (for instance no heat problem at the tape location)
Nowadays I would go for the same approach except with zfs raidz3 bricks (at least do a thorough test on it) instead of (small) hardware raid bricks -- see the sketch after this message. As for simplicity and robustness I wouldn't like to end up with several hundred glusterfs bricks, each on one individual disk, but rather leave disk failure prevention either to hardware raid or zfs and use gluster to connect these bricks into the fs size I need (and for mirroring the whole thing to a second site if needed).
hth Bernhard

Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134
Fax: +49 (30) 86880 100
Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH

On Dec 25, 2013, at 8:47 PM, Fredrik Häll hall.fred...@gmail.com wrote: I am new to Gluster, but so far it seems
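Here is the sketch referred to above: what a raidz2-backed brick pair could look like. Pool, dataset and host names are made up, ZFS on Linux is assumed, and this is only an illustration of the layout, not a tested recipe:

# on each of the two storage hosts: one raidz2 pool over eight disks, one dataset as the brick
zpool create tank raidz2 sdb sdc sdd sde sdf sdg sdh sdi
zfs create -o mountpoint=/bricks/tank tank/brick
mkdir -p /bricks/tank/gf
# then replicate the two bricks into one Gluster volume
gluster volume create archive replica 2 hostA:/bricks/tank/gf hostB:/bricks/tank/gf
gluster volume start archive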
Re: [Gluster-users] [Gluster-devel] glusterfs-3.4.2qa4 BUG 987555 not fixed?
thanks Niels, will try that tomorrow, and let you know of course. Bernhard

On 19.12.2013, at 17:34, Niels de Vos nde...@redhat.com wrote:
On Thu, Dec 19, 2013 at 03:44:26PM +0000, Bernhard Glomm wrote:
hi all, I'm testing
SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.4.2qa4.tar.gz
on ubuntu 13.04. Previously I had gluster 3.2.7 (the one from the ubuntu 13.04 repository) installed. I use a two-sided gluster mirror to host the imagefiles of my VMs. With gluster 3.2.7 all worked fine. I upgraded to gluster 3.4.2qa4 (see above). The VMs still worked fine, bonnie++ tests from inside the VM instances showing similar results to before, but then I hit the 987555 bug again.

The change for that bug introduces an option to the /etc/glusterfs/glusterd.vol configuration file. You can now add the following line to that file:
volume management
    ...
    option base-port 50152
    ...
end-volume
By default this is commented out with the default port (49152). In the line above, 50152 is just an example, you can pick any port you like. GlusterFS tries to detect if a port is in use; if it is, it'll try the next one (and so on). Also note that QEMU had a fix for this as well. With the right version of QEMU, there should be no need to change this option from the default. Details on the fixes for QEMU are referenced in Bug 1019053. Can you let us know if setting this option and restarting all the glusterfsd processes helps? Thanks, Niels

root@ping[/1]:~ # time virsh migrate --verbose --live --unsafe --p2p --domain atom01 --desturi qemu+ssh://192.168.242.93/system
error: Unable to read from monitor: Connection reset by peer
root@ping[/0]:~ # netstat -tulpn | egrep 49152
tcp  0  0  0.0.0.0:49152  0.0.0.0:*  LISTEN  3924/glusterfsd
or
root@ping[/0]:~ # netstat -tulpn | egrep gluster
tcp  0  0  0.0.0.0:49155  0.0.0.0:*  LISTEN  4031/glusterfsd
tcp  0  0  0.0.0.0:38468  0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:49156  0.0.0.0:*  LISTEN  4067/glusterfsd
tcp  0  0  0.0.0.0:933    0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:38469  0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:49157  0.0.0.0:*  LISTEN  4109/glusterfsd
tcp  0  0  0.0.0.0:49158  0.0.0.0:*  LISTEN  4155/glusterfsd
tcp  0  0  0.0.0.0:49159  0.0.0.0:*  LISTEN  4197/glusterfsd
tcp  0  0  0.0.0.0:24007  0.0.0.0:*  LISTEN  2682/glusterd
tcp  0  0  0.0.0.0:49160  0.0.0.0:*  LISTEN  4237/glusterfsd
tcp  0  0  0.0.0.0:49161  0.0.0.0:*  LISTEN  4280/glusterfsd
tcp  0  0  0.0.0.0:49162  0.0.0.0:*  LISTEN  4319/glusterfsd
tcp  0  0  0.0.0.0:49163  0.0.0.0:*  LISTEN  4360/glusterfsd
tcp  0  0  0.0.0.0:49165  0.0.0.0:*  LISTEN  5408/glusterfsd
tcp  0  0  0.0.0.0:49152  0.0.0.0:*  LISTEN  3924/glusterfsd
tcp  0  0  0.0.0.0:2049   0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:38465  0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:49153  0.0.0.0:*  LISTEN  3959/glusterfsd
tcp  0  0  0.0.0.0:38466  0.0.0.0:*  LISTEN  5418/glusterfs
tcp  0  0  0.0.0.0:49154  0.0.0.0:*  LISTEN  3996/glusterfsd
udp  0  0  0.0.0.0:931    0.0.0.0:*          5418/glusterfs

is there a compile option work_together_with_libvirt ;-) Can anyone confirm this or has a workaround? best Bernhard
P.S.: As I learned in the discussion before, libvirt counts up the ports when it finds that the ones it needs are already blocked. So after the 12th migration attempt the VM finally WAS migrated. IMHO there should/could be an option to configure the start port/port range, and yes, that could/should ALSO be done for libvirt. Fact is, gluster 3.2.7 works (for me), 3.4.2 doesn't :-(( I really would like to try the gfapi, but not for the price of no live migration.
-- Bernhard Glomm IT Administration Phone: +49 (30) 86880 134 Fax: +49 (30) 86880 100 Skype: bernhard.glomm.ecologic Ecologic Institut gemeinnützige GmbH |
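A sketch of the change Niels describes plus one way to pick it up at runtime. The service name is the one used by the Ubuntu/Debian packages, and the volume restart is an assumption on my side (source builds may need the daemons restarted by hand):

# /etc/glusterfs/glusterd.vol: uncomment/adjust the line
#   option base-port 50152
# then restart the management daemon and the affected volume so the brick processes re-bind
service glusterfs-server restart
gluster volume stop <volname>
gluster volume start <volname>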
Re: [Gluster-users] Mount GlusterFS from localhost
hi, I'm experiencing a similar issue. I'm testing glusterfs on a two-sided mirror, on zol. I've got several volumes (each on its own zfs filesystem). mount -a works fine, but after a reboot all the zfs filesystems are present while only an arbitrary number of the glusterfs volumes get mounted (like 2 random ones out of 6). So far I've circumvented that by abandoning fstab for that purpose and using an rc script (S99, K20) for the glustermount, allowing a sleep 30 grace time and then mounting the glustervolumes (on a two-sided mirror I mount my-ip:/volumename /my-mountpoint, in contrast to mirror-ip:/volumename /my-mountpoint). All glustervolumes get mounted fine then. AFAIK the _netdev option in fstab should have done the job, but in my case it didn't; I thought maybe due to zol. Anyhow, a 30 sec grace time does the job; I picked 30 sec arbitrarily, it might even be too much... best Bernhard

On 13.12.2013, at 17:40, Joel Young j...@cryregarder.com wrote: I use a systemd file in /etc/systemd/system such as the attached. You also want to make sure you've done:
systemctl enable NetworkManager-wait-online.service
systemctl enable work.mount
systemctl start work.mount
Joel

On Sat, Dec 7, 2013 at 9:33 AM, Vadim Nevorotin mala...@ubuntu.com wrote: Hello! I need to mount glusterfs from localhost. So both server and client are on the same host. I've added to fstab:
localhost:/srv_tftp /srv/tftp glusterfs defaults,_netdev 0 0
Then mount -a. Ok, in this case all works great. But after reboot nothing is mounted, because the GlusterFS server starts after the network and after remote FS. Is there any solution to fix this problem? I use Debian, but I think the same problem exists in all other distros. As I understand it's impossible to execute an init script after the network is ready but before remote fs are mounted. That could fix the problem, but maybe there is some different solution?
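For what it's worth, a stripped-down sketch of the rc-script workaround described above; the paths, volume name and the 30-second grace period are just examples:

#!/bin/sh
# /etc/init.d/glustermount -- mount local Gluster volumes late in the boot sequence
# (linked as S99glustermount / K20glustermount in the relevant runlevels)
case "$1" in
  start)
    sleep 30   # give glusterd (and zol) time to come up
    mount -t glusterfs 192.168.1.10:/myvol /srv/myvol
    ;;
  stop)
    umount /srv/myvol
    ;;
esac
exit 0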
Re: [Gluster-users] Ubuntu GlusterFS in Production
Josh, although it might be more bleeding than cutting edge: is there, or could you provide, some howto for getting libgfapi working on ubuntu 13.04, given that it'll have to work with the port problem on live migration mentioned earlier? gluster 3.4.2 fixed the port problem, I hope? I'd be willing to do some testing/feedback on that. best Bernhard

On 05.12.2013, at 20:07, Josh Boon glus...@joshboon.com wrote: We're using it in production too. We're on KVM 1.6, gluster 3.4.1 running on Ubuntu 13.04. No problems, but do be aware that you'll want fast links if you actually want to saturate your disk bandwidth. We've bonded 10gbps links and we still saturate those before we fully utilize our IO. I've not tested NFS specifically, but if performance is something you're looking for I'd strongly suggest gfapi, which gets us near 400MBps writes in a replica 2 config across the above mentioned interfaces. You'll have to do some work on the KVM sources for gfapi though, as it hasn't even made it into the debian unstable packages. Best, Josh

----- Original Message -----
From: Jiri Hoogeveen j.hoogev...@bluebillywig.com
To: Gerald Brandt g...@majentis.com
Cc: gluster-users@gluster.org List gluster-users@gluster.org
Sent: Thursday, December 5, 2013 11:31:34 AM
Subject: Re: [Gluster-users] Ubuntu GlusterFS in Production
Hi Gerald, Yes, we are using GlusterFS 3.3.2 with Ubuntu 12.04, KVM and bonding 802.3ad on 2 x 1Gbps nics. This way every tcp session can go over a different nic. For VMware vSphere we use the NFS of GlusterFS and for KVM the native glusterfs client. This setup is working nicely. Grtz, Jiri

On 05 Dec 2013, at 14:49, Gerald Brandt g...@majentis.com wrote: Hi, Is anyone using GlusterFS on Ubuntu in production? Specifically, I'm looking at using the NFS portion of it over a bonded interface. I believe I'll get better speed than using the gluster client across a single interface. Setup: 3 servers running KVM (about 24 VM's), 2 NAS boxes running Ubuntu (13.04 and 13.10). Since Gluster NFS does server-side replication, I'll put replication data over a different nic than user data. Gerald ps: I had this setup with 3.2, but it proved unstable under load.
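Once QEMU is built with gfapi support, using it is mostly a matter of pointing the disk at a gluster:// URL. A rough sketch; host, volume and image names are placeholders:

# create an image directly on the volume via libgfapi
qemu-img create -f qcow2 gluster://storage1/vmvol/test.qcow2 10G
# boot a guest from it, bypassing the FUSE mount
qemu-system-x86_64 -enable-kvm -m 2048 -drive file=gluster://storage1/vmvol/test.qcow2,if=virtio,cache=none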