Re: [Gluster-users] Newbie questions on HPC cluster + gluster configuration
+Raghavendra, one of the maintainers of the distribute xlator.

Pranith

On 01/29/2016 03:28 PM, Fedele Stabile wrote:
> [snip]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Newbie questions on HPC cluster + gluster configuration
Hello,

I want to discuss with you the configuration I plan to set up on my HPC cluster. The cluster is 32 worker nodes (wn1 ... wn32), each with a local disk; the interconnect is 40 Gb InfiniBand, and I also have a login server (login-server).

I would create a glusterfs distributed volume using the worker-node disks (32 disks of 1 TB each) by running this command on login-server:

login-server# gluster volume create scratch wn1:/brick wn2:/brick ... wn32:/brick

After this I would mount the volume on each node of the cluster, so that, for example, if I write file1 into scratch on node wn1 I will be writing to the local disk of wn1. The question is whether I can mount scratch on wn1 using this command:

wn1# mount -t glusterfs wn1:/scratch /scratch

This would let me write file1 locally, without using the network channel, wouldn't it?

Thank you for your attention and your contribution.

Fedele Stabile
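One caveat on the locality assumption in the question above: with a plain distributed volume, the distribute translator picks the brick for each file from a hash of the file name, not from which node the writing client runs on. A toy sketch of that idea follows — this is NOT Gluster's actual hash function, just an illustration of name-based placement under that assumption:

```shell
# Toy placement sketch -- not Gluster's real hash. The point: the brick
# is chosen from the file NAME alone, independent of which node
# creates the file, so a write on wn1 usually lands on another node.
brick_for() {
    # hash the file name, then map it onto one of the 32 bricks
    h=$(printf '%s' "$1" | md5sum | cut -c1-8)
    echo "wn$(( (0x$h % 32) + 1 )):/brick"
}

brick_for file1   # prints some wnN:/brick, not necessarily wn1's
```

Whichever node runs this, the same file name maps to the same brick — which is exactly why a local mount does not imply local writes.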
Re: [Gluster-users] newbie questions + rpc_client_ping_timer_expired error
Looking at the first mail in the thread, after 'creating' the volume you are trying to mount it. Please run 'gluster volume start kvm' before running mount. If you have already done that, check whether the brick process (glusterfsd) is actually still running with 'ps ax | grep glusterfsd' on the server. If it is not, please go through the brick log file for more information. I suspect that the brick here may be a read-only backend, which would have killed the glusterfsd process even when you do 'gluster volume start'.

Regards,
Amar

On Thu, May 12, 2011 at 5:46 AM, Chris Haumesser wrote:
> [snip]
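For the archives, the checks described above as a rough sequence (volume name 'kvm' as in this thread; these need a live glusterd, and the brick log path is an assumption — it varies with the install prefix):

```shell
# Creating a volume alone does not spawn brick processes; start it first
gluster volume start kvm

# Is the brick daemon alive? The [g] trick keeps grep from matching itself.
ps ax | grep '[g]lusterfsd'

# If not, look at the brick logs for the reason (path is an assumption;
# it depends on the --prefix the packages were built with)
tail -n 50 /var/log/glusterfs/bricks/*.log
```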
Re: [Gluster-users] newbie questions + rpc_client_ping_timer_expired error
Replying to my own thread ...

After reading more mailing list archives and docs, I tried disabling stat-prefetch, to no avail.

I next disabled all of the other performance-related features (write-behind, read-ahead, io-cache, quick-read), and now my debootstrap appears to be (albeit slowly) going about its business without issue.

I also noticed that 42 seconds is the default value for network.ping-timeout, which corresponds to the error I was seeing in syslog.

Which of the above options, now disabled, is most likely to have triggered the network ping timeouts that I was consistently seeing before? (I did not change anything on the network.)

What other side-effects and performance hits will I incur with the above options disabled?

Finally, I do not see descriptions of what the io-cache or quick-read options do in the 3.2 docs. Can someone elucidate? I would also love more thorough explanations of how write-behind and read-ahead work (the docs are pretty terse).

Thanks everyone.

Cheers,

-C-
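For readers finding this thread later: the toggles described above are plain volume options, so the experiment can be reproduced with something like the following (volume name 'kvm' as earlier in the thread; this needs a running cluster, so it is shown only as an ops sketch):

```shell
# Disable the performance translators one at a time, retesting between
# changes, to isolate which one triggers the ping timeouts.
gluster volume set kvm performance.stat-prefetch off
gluster volume set kvm performance.write-behind off
gluster volume set kvm performance.read-ahead off
gluster volume set kvm performance.io-cache off
gluster volume set kvm performance.quick-read off

# Show which options are now set on the volume
gluster volume info kvm
```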
[Gluster-users] newbie questions + rpc_client_ping_timer_expired error
Greetings,

I'm trying to replace an NFS server, serving (currently) about a dozen clients, with a gluster cluster. Ultimately, I'd like to use gluster as read-only nfs to net-boot a number of clients in my cluster, using something like openSIS. (Or better yet, natively booting glusterfs, as I saw in the mailing list archives from last year.) I'm having some trouble getting up and running.

First question: should I be using 3.1.4 or 3.2? I notice that 3.2 is listed as the latest release, but the LATEST folder on the ftp server still points to 3.1.4. Confused. I have been testing with 3.2, both pre-compiled debs and my own build.

Particular to my nfsroot problem, I have created a gluster volume called 'kvm' and replicated it across two nodes, e.g.:

gluster volume create kvm replica 2 transport tcp util.office:/gluster/kvm1 admin.office:/gluster/kvm2

I then mount the volume on util.office:

mount -t glusterfs util.office:kvm /mnt/kvm

Then I attempt to use debootstrap to set up my nfs image at this mountpoint. Debootstrap consistently fails at 'Installing core packages ... ', and if I wait long enough, I am rewarded with this terse nugget in syslog:

GlusterFS[1223]: [2011-05-11 21:32:06.376514] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-kvm-client-1: server 10.11.12.44:24010 has not responded in the last 42 seconds, disconnecting.

I get this result whether using the pre-compiled debs or my own build (just in slightly different locations). I am using all default options on the volume at this point. The output of 'gluster peer status' on each end continues to show that the peers are connected, and all other network communication between the hosts seems normal. The volume definitions produced by the gluster cli are here: http://pastebin.com/W7R1n4UD, but they're using all default options.

I'd appreciate any guidance on how to move forward. I have read mention of other users net-booting from glusterfs, so I guess I must be missing or misusing some configuration parameter(s), or perhaps using the wrong release.

Thanks!

-C-
Re: [Gluster-users] newbie questions.
On 04/19/2011 09:20 AM, Fyodor Ustinov wrote:
> 1. Possible create new volume by 'gluster' command stripe and replica
> simultaneously?

This came up at my job recently too. I was surprised to find that, although the code and the volfile syntax both support this, the CLI syntax has no way to express it.

> 2. File stored on glusterfs can not be greater size than the brick?

Without striping, yes, the size of a file cannot be greater than the (remaining) size of a brick - or the smallest of all replica bricks if you're using replication. IMO this is one of the main reasons to use striping, since I've never seen it provide any performance benefit.

> 3. I have two bricks of size 10G each. And 2 files of 4G each on one
> brick. Perform glusterfs self-balancing and migrate one file to another
> brick? I must/can do it "by hand"?

It's probably better to let GlusterFS do the rebalancing if possible, but it might not work with very small numbers of files.
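For reference, letting GlusterFS do the move rather than copying files by hand looks roughly like this on the 3.x CLI (the volume name 'vol0' is a placeholder, and the commands need a live cluster):

```shell
# Kick off a rebalance so the distribute translator migrates files
# toward the bricks their names hash to
gluster volume rebalance vol0 start

# Poll until the status reports the rebalance has completed
gluster volume rebalance vol0 status
```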
[Gluster-users] newbie questions.
Hi!

1. Is it possible to create a new volume with the 'gluster' command using stripe and replica simultaneously?

2. Is it true that a file stored on glusterfs cannot be larger than the brick?

3. I have two bricks of 10G each, and 2 files of 4G each on one brick. Will glusterfs self-balance and migrate one file to the other brick, or must/can I do it "by hand"?

To begin, I think that's enough. :)

WBR,
Fyodor.
[Gluster-users] Newbie questions
Hello,

I'm new to Gluster and am trying to understand it better before I roll it into production. I looked at the FAQs and they didn't seem to answer my questions, so please pardon my ignorance.

For now I have set up two servers, gluster1 and gluster2. The clients are set up using the cluster/replicate translator:

volume mirror-0
  type cluster/replicate
  subvolumes gluster1-1 gluster2-1
end-volume

I have a few questions about this setup.

1. If one of the mirror nodes goes down (for minutes, hours, or even needs to be completely rebuilt), how is recovery/resyncing handled? Do I need to do "ls -laR" in the directory from a client to force it to check all of the files?

2. When growth happens, I would like to add servers in pairs: add another mirror, and stripe across the mirrors. I understand that gluster needs to be restarted to add storage nodes, but assuming that is done, is there any more to it than updating the client volume files and restarting gluster? If a new mirror volume is added and it can be striped, is it possible for gluster to rebalance the data across the stripe?

Thanks,
Kris
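On question 2, a sketch of what the grown client volfile could look like, in the same volfile conventions as above — a second replicate pair aggregated by cluster/distribute (the subvolume names gluster3-1 and gluster4-1 are hypothetical; this is an illustration, not a tested configuration):

```
volume mirror-0
  type cluster/replicate
  subvolumes gluster1-1 gluster2-1
end-volume

volume mirror-1
  type cluster/replicate
  subvolumes gluster3-1 gluster4-1
end-volume

volume distribute-0
  type cluster/distribute
  subvolumes mirror-0 mirror-1
end-volume
```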
Re: [Gluster-users] Newbie questions
On 04.05.2010 14:34, Count Zero wrote:
> On May 4, 2010, at 3:25 PM, pkoelle wrote:
>> From our testing we found gluster with many small files to be rather slow (GigE). Each open() will go over the network and will effectively kill read performance (5-7 MB/sec). We tried to serve webapps with many small files and startup time was not tolerable.
>
> How about performance in 'replicate' mode (AFR), where you set a preferred volume to be the local volume? Would you still get the same bad performance with that? It's just unclear to me in which configuration you experienced the sub-optimal performance.

Sorry, I should have provided more details. The version was GlusterFS 3.0.3 from a git checkout. We tried 4 nodes (2 servers / 2 clients) with favorite-child, and 2 nodes (client/server on the same node) with read-subvolume pointing to the local node (plus a boatload of variations with translators). But as I said, and as you can gather from the list archives, reported performance differs wildly, so there is no way around testing on your own platform.

cheers
Paul
Re: [Gluster-users] Newbie questions
I don't recall the numbers, but when I did that, reads were about as fast as regular local reads. Also, if the post below refers to an older version of glusterfs, then things might have changed a lot since then. The quick-read translator combined with a 3.x version was supposed to help a lot in this situation, and it's a relatively recent addition to the project.

Chris

----- "Count Zero" wrote:
> On May 4, 2010, at 3:25 PM, pkoelle wrote:
> [snip]
Re: [Gluster-users] Newbie questions
On May 4, 2010, at 3:25 PM, pkoelle wrote:
> From our testing we found gluster with many small files to be rather slow (GigE). Each open() will go over the network and will effectively kill read performance (5-7 MB/sec). We tried to serve webapps with many small files and startup time was not tolerable.

How about performance in 'replicate' mode (AFR), where you set a preferred volume to be the local volume? Would you still get the same bad performance with that?

It's just unclear to me in which configuration you experienced the sub-optimal performance.
Re: [Gluster-users] Newbie questions
On 03.05.2010 21:50, Joshua Baker-LePain wrote:
[snip]
> I'm looking at Gluster for 2 purposes:
> 1) To host our "database" volume. This volume has copies of several protein and gene databases (PDB, UniProt, etc). The databases generally consist of tens of thousands of small (a few hundred KB at most) files. Users often start array jobs with hundreds or thousands of tasks, each task of which accesses many of these files.

From our testing we found gluster with many small files to be rather slow (GigE). Each open() will go over the network and will effectively kill read performance (5-7 MB/sec). We tried to serve webapps with many small files and startup time was not tolerable.

Of course, you need to test yourself ;)

hth
Paul
Re: [Gluster-users] Newbie questions
On 05/03/2010 09:50 PM, Joshua Baker-LePain wrote:
> For purpose 1, clearly I'm looking at a replicated volume. For purpose 2, I'm assuming that distributed is the way to go (rather than striped), although for reliability reasons I'd likely go replicated then distributed. For storage bricks, I'm looking at something like HP's

1. Yes.

2. Your call - both will work, but as you said, it's a question of in how many places you want the data to be. :)

> 2) Is it frowned upon to create 2 volumes out of the same physical set of disks? I'd like to maximize the spindle count in both volumes (especially the scratch volume), but will it overly degrade performance? Would it be better to simply create one replicated and distributed volume and use that for both of the above purposes?

I don't know about « frowned », but my knee-jerk response would be to avoid that scenario. That said, it really all comes down to usage patterns; if you're only serving data out of one volume at a time, then there's no problem, but if you're constantly using both...

> 3) Is it crazy to think of doing a distributed (or NUFA) volume with the scratch disks in the whole cluster? Especially given that we have nodes of many ages and see not infrequent node crashes due to bad memory/HDDs/user code?

Again, « crazy » is a little strong, but it might not hurt to review your usage patterns before diving into the architecture. Who will access what, in what amounts, and at what speed, when? Once this has been established, you can make better informed decisions about where to put the data, and how to let people access it (in fact, I would submit that many of your questions will answer themselves :) ).

--
Daniel Maher
Re: [Gluster-users] Newbie questions
Jon,

Stripe should be used only if the workload is a very small number of files, each very, very large (many GBs in size). Everything else can use distribute.

Regards,
Tejas.

----- Original Message -----
From: "Jon Tegner"
To: "Joshua Baker-LePain"
Cc: "gluster-users"
Sent: Tuesday, May 4, 2010 11:00:57 AM
Subject: Re: [Gluster-users] Newbie questions

[snip]
Re: [Gluster-users] Newbie questions
Hi,

I'm also a newbie, and I'm looking forward to answers to your questions. Just one question: why would distributed be preferable over striped (I'm probably the bigger newbie here)?

> For purpose 1, clearly I'm looking at a replicated volume. For
> purpose 2, I'm assuming that distributed is the way to go (rather than
> striped), although for

Regards,

/jon
[Gluster-users] Newbie questions
I'm a Gluster newbie trying to get myself up to speed. I've been through the bulk of the website docs and I'm in the midst of some small (although increasing) scale test setups. But I wanted to poll the list's collective wisdom on how best to fit Gluster into my setup.

As background, I currently have over 550 nodes with over 3000 cores in my (SGE scheduled) cluster, and we expand on a roughly biannual basis. The cluster is all gigabit ethernet -- each rack has a switch, and these switches each have 4-port trunks to our central switch. Despite the number of nodes in each rack, these trunks are not currently oversubscribed. The cluster is shared among many research groups and the vast majority of the jobs are embarrassingly parallel. Our current storage is an active-active pair of NetApp FAS3070s with a total of 8 shelves of disks. Unsurprisingly, it's fairly easy for any one user to flatten either head (or both) of the NetApp.

I'm looking at Gluster for 2 purposes:

1) To host our "database" volume. This volume has copies of several protein and gene databases (PDB, UniProt, etc). The databases generally consist of tens of thousands of small (a few hundred KB at most) files. Users often start array jobs with hundreds or thousands of tasks, each task of which accesses many of these files.

2) To host a cluster-wide scratch space. Users waste a lot of time (and bandwidth) copying (often temporary) results back and forth between the network storage and the nodes' scratch disks. And scaling the NetApp is difficult, not least because it is rather difficult to convince PIs to spring for storage rather than more cores.

For purpose 1, clearly I'm looking at a replicated volume. For purpose 2, I'm assuming that distributed is the way to go (rather than striped), although for reliability reasons I'd likely go replicated then distributed. For storage bricks, I'm looking at something like HP's DL180 G6, where I would have 25 internal SAS disks (or alternatively, I could put the same number in a SAS-attached external chassis).

In addition to any general advice folks could give, I have these specific questions:

1) My initial leaning would be to RAID10 the disks at the server level, and then use the RAID volumes as gluster exports. But I could also see running the disks in JBOD mode and doing all the redundancy at the Gluster level. The latter would seem to make management (and, e.g., hot swap) more difficult, but is it preferred from a Gluster perspective? How difficult would it make disk and/or brick maintenance?

2) Is it frowned upon to create 2 volumes out of the same physical set of disks? I'd like to maximize the spindle count in both volumes (especially the scratch volume), but will it overly degrade performance? Would it be better to simply create one replicated and distributed volume and use that for both of the above purposes?

3) Is it crazy to think of doing a distributed (or NUFA) volume with the scratch disks in the whole cluster? Especially given that we have nodes of many ages and see not infrequent node crashes due to bad memory/HDDs/user code?

If you've made it this far, thanks very much for reading. Any and all advice (and/or pointers at more documentation) would be much appreciated.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
Re: [Gluster-users] Newbie questions :-)
Philipp Huber wrote:
> Daniel,
> Fantastic, thanks very much for your reply. We are very excited about GlusterFS and are working on a business case for a Cloud Storage product that would complement our Cloud Computing platform. One quick question re your #4 answer: does that mean you will have to take the volume down for a re-sync?
> Thanks for your reply,
> Phil

Please direct your replies to the list, mate. :)

As for question #4:

> 4) Is it correct to assume that after a failed 'brick' comes back
> online, the auto-heal functionality will take care of the re-sync'ing?

The volume doesn't need to be taken down, no, but replication won't happen by magic either. Basically, for a node to realise that its copy of a file is no longer current (or that it shouldn't be there, or should be there, or whatever), the file has to be accessed. On a webserver or something like that, the access might easily occur organically (a graphic or an html page being served). On file servers where there's less interactivity, running a simple script that will find and, say, stat the files in the exported tree will ensure coherency.

--
Daniel Maher
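The "simple script" mentioned above is usually just a stat walk over the mountpoint. A self-contained sketch follows — a throwaway directory under /tmp stands in for the real glusterfs mountpoint here, since the walk itself is all that matters:

```shell
# Stand-in for the glusterfs mountpoint -- replace with e.g. /mnt/kvm
MOUNT=/tmp/gluster-heal-demo
mkdir -p "$MOUNT/subdir"
touch "$MOUNT/file1" "$MOUNT/subdir/file2"

# Touch every entry with stat; the access itself is what prompts the
# replicate translator to check (and heal) each file. Output is
# discarded -- only the access matters.
find "$MOUNT" -print0 | xargs -0 stat >/dev/null

echo "walked $(find "$MOUNT" -type f | wc -l) files"
```

On a real replicated volume you would run this against the client mountpoint after the failed brick rejoins, not against the brick directory itself.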
Re: [Gluster-users] Newbie questions :-)
Hello!

Philipp Huber wrote:
> 1) Can I configure GlusterFS so it can withstand a complete 'brick' failure without users losing access to their data?

Yes.

> 2) If Yes, can I configure how many redundant copies of the files are stored, e.g. 2x, 3x?

Yes.

> 3) Can I control the amount of replication per user?

No.

> 4) Is it correct to assume that after a failed 'brick' comes back online, the auto-heal functionality will take care of the re-sync'ing?

Yes (but not in the background...)

> 5) As GlusterFS stores metadata along with the normal data, what is the capacity overhead in %?

That's a good question. :)

--
Daniel Maher
[Gluster-users] Newbie questions :-)
Hi guys,

My first post, so sorry if this is something that was covered before. I read quite a bit of the documentation and archived posts, but couldn't find the answers:

1) Can I configure GlusterFS so it can withstand a complete 'brick' failure without users losing access to their data?

2) If Yes, can I configure how many redundant copies of the files are stored, e.g. 2x, 3x?

3) Can I control the amount of replication per user?

4) Is it correct to assume that after a failed 'brick' comes back online, the auto-heal functionality will take care of the re-sync'ing?

5) As GlusterFS stores metadata along with the normal data, what is the capacity overhead in %?

Any feedback is hugely appreciated.

Phil Huber