[Gluster-users] enabling NFS on a running gluster system
Hi all,

We currently have a two-node distributed-replicated Gluster system (version 3.2.2) where all clients connect via the native Gluster client. There is now a requirement to connect to the existing Gluster volumes via NFS, and I would like to ask whether NFS can be enabled dynamically. Is it required to restart services on the servers? Is it required to remount the existing clients? There is also a geo-replication backend, which I guess will not be affected, but is it required to restart the replication? As a side effect, would the existing Gluster performance be degraded by enabling NFS compatibility?

Thank you in advance.
Samuel.
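In case it helps anyone searching the archive later: in the 3.2 series the Gluster NFS server is managed by glusterd and exports started volumes over NFSv3, and it can be toggled per volume with the nfs.disable option. A minimal sketch, assuming a volume named vol0 and a server called server1 (both names are hypothetical):

    # on any server in the trusted pool: make sure NFS export is not disabled
    gluster volume set vol0 nfs.disable off
    # confirm the option took effect
    gluster volume info vol0
    # check what the Gluster NFS server is exporting
    showmount -e server1
    # on a client: the built-in NFS server speaks NFSv3 over TCP
    mount -t nfs -o vers=3,tcp server1:/vol0 /mnt/vol0

Existing native-client mounts should not need to be remounted for this, but it is worth trying on a non-production volume first.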
[Gluster-users] Performance issues with striped volume over Infiniband
Dear Gluster Users,

We are facing some severe performance issues with GlusterFS and we would very much appreciate any help in identifying the cause. Our setup is extremely simple: 2 nodes interconnected with 40 Gb/s InfiniBand and also 1 Gb/s Ethernet, running CentOS 6.2 and GlusterFS 3.2.6. Each node has 4 SATA drives in a RAID0 array that gives ~750 MB/s random-read bandwidth. The tool that we used for measuring IO performance relies on O_DIRECT access, so we patched the FUSE kernel module: http://marc.info/?l=linux-fsdevel&m=132950081331043&w=2.

We created the following volume and mounted it at /mnt/gfs/:

Volume Name: GFS_RDMA_VOLUME
Type: Stripe
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: node01:/mnt/md0/gfs_storage
Brick2: node02:/mnt/md0/gfs_storage
Options Reconfigured:
cluster.stripe-block-size: *:2MB
performance.quick-read: on
performance.io-cache: on
performance.cache-size: 256MB
performance.cache-max-file-size: 128MB

We expected to see an IO bandwidth of about 1500 MB/s (measured with the exact same tool and parameters), but unfortunately we only get ~100 MB/s, which is very disappointing. Please find below the output of "cat /var/log/glusterfs/mnt-gfs-.log". If you need any other information that I forgot to mention, please let me know.

Thanks,
Adrian

[2012-04-18 11:59:42.847818] I [glusterfsd.c:1493:main] 0-/opt/glusterfs/3.2.6/sbin/glusterfs: Started running /opt/glusterfs/3.2.6/sbin/glusterfs version 3.2.6
[2012-04-18 11:59:42.862610] W [write-behind.c:3023:init] 0-GFS_RDMA_VOLUME-write-behind: disabling write-behind for first 0 bytes
[2012-04-18 11:59:43.318188] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-0: parent translators are ready, attempting connect on transport
[2012-04-18 11:59:43.321287] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+--+
 1: volume GFS_RDMA_VOLUME-client-0
 2:     type protocol/client
 3:     option remote-host node01
 4:     option remote-subvolume /mnt/md0/gfs_storage
 5:     option transport-type rdma
 6: end-volume
 7:
 8: volume GFS_RDMA_VOLUME-client-1
 9:     type protocol/client
10:     option remote-host node02
11:     option remote-subvolume /mnt/md0/gfs_storage
12:     option transport-type rdma
13: end-volume
14:
15: volume GFS_RDMA_VOLUME-stripe-0
16:     type cluster/stripe
17:     option block-size *:2MB
18:     subvolumes GFS_RDMA_VOLUME-client-0 GFS_RDMA_VOLUME-client-1
19: end-volume
20:
21: volume GFS_RDMA_VOLUME-write-behind
22:     type performance/write-behind
23:     subvolumes GFS_RDMA_VOLUME-stripe-0
24: end-volume
25:
26: volume GFS_RDMA_VOLUME-read-ahead
27:     type performance/read-ahead
28:     subvolumes GFS_RDMA_VOLUME-write-behind
29: end-volume
30:
31: volume GFS_RDMA_VOLUME-io-cache
32:     type performance/io-cache
33:     option max-file-size 128MB
34:     option cache-size 256MB
35:     subvolumes GFS_RDMA_VOLUME-read-ahead
36: end-volume
37:
38: volume GFS_RDMA_VOLUME-quick-read
39:     type performance/quick-read
40:     option cache-size 256MB
41:     subvolumes GFS_RDMA_VOLUME-io-cache
42: end-volume
43:
44: volume GFS_RDMA_VOLUME-stat-prefetch
45:     type performance/stat-prefetch
46:     subvolumes GFS_RDMA_VOLUME-quick-read
47: end-volume
48:
49: volume GFS_RDMA_VOLUME
50:     type debug/io-stats
51:     option latency-measurement off
52:     option count-fop-hits off
53:     subvolumes GFS_RDMA_VOLUME-stat-prefetch
54: end-volume
+--+
[2012-04-18 11:59:43.326287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-1: failed to get the port number for remote subvolume
[2012-04-18 11:59:43.764287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-0: failed to get the port number for remote subvolume
[2012-04-18 11:59:46.868595] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-0: changing port to 24009 (from 0)
[2012-04-18 11:59:46.879292] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-1: changing port to 24009 (from 0)
[2012-04-18 11:59:50.872346] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-0: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.872760] I [client-handshake.c:913:client_setvolume_cbk] 0-GFS_RDMA_VOLUME-client-0: Connected to 192.168.0.101:24009, attached to remote volume '/mnt/md0/gfs_storage'.
[2012-04-18 11:59:50.874975] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-1: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.875290] I [client-handshake.c:913:client_setvolume_cbk]
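For anyone trying to reproduce this measurement without the poster's patched-FUSE tool, a rough streaming-bandwidth check through the mount point can be done with dd and O_DIRECT. This is only a sketch; the test file name is made up, and the 2M block size simply matches the stripe-block-size above. If the FUSE mount refuses direct IO, drop the direct flags and make the file much larger than RAM so the page cache does not distort the result.

    # write a ~4 GiB file through the Gluster mount, bypassing the page cache
    dd if=/dev/zero of=/mnt/gfs/ddtest.bin bs=2M count=2048 oflag=direct
    # read it back with O_DIRECT to estimate streaming read bandwidth
    dd if=/mnt/gfs/ddtest.bin of=/dev/null bs=2M iflag=direct
    # run the same test against the local RAID0 array on one server for comparison
    dd if=/dev/zero of=/mnt/md0/ddtest.bin bs=2M count=2048 oflag=direct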
Re: [Gluster-users] Performance issues with striped volume over Infiniband
I've seen the same 100 MB/s limit (depending on the block size of the transfer) with 5 bricks in a stripe, and have yet to try IPoIB, which I hear improves performance over RDMA for some reason.

On Wed, Apr 18, 2012 at 5:05 AM, Ionescu, A. <a.ione...@student.vu.nl> wrote:
> Dear Gluster Users,
> We are facing some severe performance issues with GlusterFS and we would very much appreciate any help in identifying the cause.
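Switching to IPoIB means running the ordinary tcp transport over the ib0 interface rather than the rdma transport. A sketch of what that could look like for the volume above, assuming node01-ib and node02-ib are hostnames (hypothetical here) that resolve to the IPoIB addresses; note the existing rdma volume would have to be deleted first, or different brick directories used, since a brick can only belong to one volume:

    # create a tcp-transport stripe volume whose bricks are reached via IPoIB
    gluster volume create GFS_TCP_VOLUME stripe 2 transport tcp \
        node01-ib:/mnt/md0/gfs_storage node02-ib:/mnt/md0/gfs_storage
    gluster volume start GFS_TCP_VOLUME
    # mount over IPoIB from a client
    mount -t glusterfs node01-ib:/GFS_TCP_VOLUME /mnt/gfs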
Re: [Gluster-users] Frequent glusterd restarts needed to avoid NFS performance degradation
On 04/18/2012 01:48 PM, gluster-users-requ...@gluster.org wrote:
> Date: Tue, 17 Apr 2012 19:06:31 -0500 (CDT)
> From: Gerald Brandt <g...@majentis.com>
> Subject: Re: [Gluster-users] Frequent glusterd restarts needed to avoid NFS performance degradation
> To: Dan Bretherton <d.a.brether...@reading.ac.uk>
> Cc: gluster-users <gluster-users@gluster.org>
> Message-ID: <22749685.104.1334707572319.JavaMail.gbr@thinkpad>
> Content-Type: text/plain; charset=utf-8
>
> Hi,
>
> ----- Original Message -----
>> Dear All-
>> I find that I have to restart glusterd every few days on my servers to stop NFS performance from becoming unbearably slow. When the problem occurs, volumes can take several minutes to mount and there are long delays responding to "ls". Mounting from a different server, i.e. one not normally used for NFS export, results in normal NFS access speeds. This doesn't seem to have anything to do with load, because it happens whether or not there is anything running on the compute servers. Even when the system is mostly idle there are often a lot of glusterfsd processes running, and on several of the servers I looked at this evening there is a process called glusterfs using 100% of one CPU. I can't find anything unusual in nfs.log or etc-glusterfs-glusterd.vol.log on the servers affected. Restarting glusterd seems to stop this strange behaviour and make NFS access run smoothly again, but this usually only lasts for a day or two. This behaviour is not necessarily related to the length of time since glusterd was started, but has more to do with the amount of work the GlusterFS processes on each server have to do. I use a different server to export each of my 8 different volumes, and the NFS performance degradation seems to affect the most heavily used volumes more than the others. I really need to find a solution to this problem; all I can think of doing is setting up a cron job on each server to restart glusterd every day, but I am worried about what side effects that might have. I am using GlusterFS version 3.2.5. All suggestions would be much appreciated.
>> Regards,
>> Dan.
>
> I run GlusterFS 3.2.5 and the only access is via NFS. I'm running Citrix XenServer with about 23 VMs off of it. I haven't seen any degradation at all. One thing I don't have is replication or anything else set up. The server is ready to replicate, but I'm waiting for 3.3.
>
> Gerald

Hello Gerald,

Thanks for your comments. I should have mentioned that I do use replication in my cluster, but I'm not sure that the replication is causing the problem. Another thing to mention about my system is that there is a lot of data transfer going on most of the time, including models and data-processing applications running on the compute cluster and data transfers from other sites. I wouldn't be surprised if the Gluster NFS server handles several terabytes of data before it starts to grind to a halt. Perhaps this problem hasn't been noticed before because my usage isn't typical. However, it should be fairly easy to reproduce if it's just a matter of transferring a large volume of data.

-Dan.
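The cron-based workaround Dan mentions would look something like the sketch below. This is only illustrative (the file name and schedule are made up, and the init-script path assumes a SysV-style install); restarting glusterd appears to recycle the built-in NFS server process, which is presumably why it helps here, but it is worth verifying the behaviour on one server before rolling it out everywhere.

    # /etc/cron.d/restart-glusterd  (hypothetical file)
    # restart the Gluster management daemon every night at 04:00
    0 4 * * * root /etc/init.d/glusterd restart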
[Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable
I was successfully running an IPoIB Gluster testbed (v3.3b3 on Ubuntu 10.04.4) and brought it down smoothly to adjust some parameters. It now looks like this (the reconfigured options were just added):

# gluster volume info

Volume Name: gli
Type: Distribute
Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
Status: Started
Number of Bricks: 5
Transport-type: tcp,rdma
Bricks:
Brick1: pbs1ib:/bducgl
Brick2: pbs2ib:/bducgl
Brick3: pbs2ib:/bducgl1
Brick4: pbs3ib:/bducgl
Brick5: pbs4ib:/bducgl
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.255.77.*, 128.200.15.*, 10.255.78.*, 10.255.89.*

However, a status query gives this:

# gluster volume status
Status of volume: gli
Gluster process                              Port    Online  Pid
-----------------------------------------------------------------
Brick pbs1ib:/bducgl                         24016   N       N/A
Brick pbs2ib:/bducgl                         24023   N       N/A
Brick pbs2ib:/bducgl1                        24025   N       N/A
Brick pbs3ib:/bducgl                         24016   N       N/A
Brick pbs4ib:/bducgl                         24016   N       N/A
NFS Server on localhost                      38467   N       N/A
NFS Server on pbs4ib                         38467   N       N/A
NFS Server on pbs3ib                         38467   N       N/A
NFS Server on pbs2ib                         38467   N       N/A

(I didn't want the NFS Server entries - is starting them a default?) But the operative bit is that the volume is not online, despite being started. What could cause this situation? As might be expected, clients can't mount the Gluster volume. The last part of etc-glusterfs-glusterd.vol.log is many lines like this:

[2012-04-18 11:36:57.456318] E [socket.c:2115:socket_connect] 0-management: connection attempt failed (Connection refused)

and the last lines before that are a number of stanzas like this:

[2012-04-18 11:31:14.698184] I [glusterd-op-sm.c::glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers
[2012-04-18 11:31:14.698379] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 2a593581-bf45-446c-8f7c-212c53297803
[2012-04-18 11:31:14.698496] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
[2012-04-18 11:31:14.698581] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 26de63bd-c5b7-48ba-b81d-5d77a533d077
[2012-04-18 11:31:14.698834] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 2a593581-bf45-446c-8f7c-212c53297803
[2012-04-18 11:31:14.698879] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 26de63bd-c5b7-48ba-b81d-5d77a533d077
[2012-04-18 11:31:14.698910] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
[2012-04-18 11:31:14.698929] I [glusterd-op-sm.c:2491:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-04-18 11:31:15.410106] E [socket.c:2115:socket_connect] 0-management: connection attempt failed (Connection refused)

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
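For anyone who hits the same "Started but Online: N" state, one hedged first step (a sketch only; in this particular case the cause turned out to be the auth.allow bug discussed below) is to ask glusterd to respawn the brick processes and then read the per-brick logs. The log file name follows the brick path and may differ on other layouts:

    # ask glusterd to (re)spawn any brick processes that are not running
    gluster volume start gli force
    # check whether the bricks came up
    gluster volume status gli
    # if a brick still shows Online: N, read its log on the server hosting it,
    # e.g. on pbs1ib for the /bducgl brick:
    less /var/log/glusterfs/bricks/bducgl.log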
[Gluster-users] Bricks suggestions
Hi all,
we are planning a new infrastructure based on Gluster to be used by some mail servers and some web servers. We plan 4 servers, each with 6x 2TB SATA disks in hardware RAID-5. In a distributed-replicated volume we will have 20TB of available space. What do you suggest: a single volume on XFS, with web storage and mail storage split by directory, or two different mount points backed by two different distributed-replicated volumes? Is there any performance degradation in making two or more volumes instead of a single one?
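To make the two-volume option concrete, a minimal sketch is below. The server names (srv1..srv4) and the brick layout (one XFS filesystem per server mounted at /data, with one subdirectory per volume) are assumptions for illustration, not a recommendation from this thread; the main practical advantage of separate volumes is that options such as caching can then be tuned per workload.

    # two distributed-replicated (2x2) volumes carved out of the same
    # XFS filesystem on each of the 4 servers
    gluster volume create mailstore replica 2 \
        srv1:/data/mail srv2:/data/mail srv3:/data/mail srv4:/data/mail
    gluster volume create webstore replica 2 \
        srv1:/data/web srv2:/data/web srv3:/data/web srv4:/data/web
    gluster volume start mailstore
    gluster volume start webstore
    # mount them at separate mount points on the clients
    mount -t glusterfs srv1:/mailstore /srv/mail
    mount -t glusterfs srv1:/webstore /srv/www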
Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable
With JoeJulian's help, I tracked this down to what looks like a bug in the IP address format which causes glusterfsd to crash. The bug is:
https://bugzilla.redhat.com:443/show_bug.cgi?id=813937
If anyone has an immediate workaround or correction, I'd be glad to hear of it.

hjm

--
Harry Mangalam - Research Computing, OIT, UC Irvine
--
Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable
The interim fix is to use ONLY commas, with no spaces allowed (this used to be OK previously).

    gluster volume set gli auth.allow \
        '10.255.77.*,128.200.15.*,10.255.78.*,10.255.89.*'

is OK (glusterfsd starts correctly), but

    gluster volume set gli auth.allow \
        '10.255.77.*, 128.200.15.*, 10.255.78.*, 10.255.89.*'

is NOT OK (glusterfsd will not start).

hjm

On Wednesday 18 April 2012 12:56:08 Harry Mangalam wrote:
> With JoeJulian's help, I tracked this down to what looks like a bug in the IP address format which causes glusterfsd to crash. The bug is: https://bugzilla.redhat.com:443/show_bug.cgi?id=813937
> If anyone has an immediate workaround or correction, I'd be glad to hear of it.
> hjm

--
Harry Mangalam - Research Computing, OIT, UC Irvine
--
Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable
And one more observation that will probably be obvious in retrospect: if you enable auth.allow (on 3.3b3), it will do reverse lookups to verify hostnames, so it is more complicated to share an IPoIB Gluster volume with IPoEth clients. I had been overriding DNS entries with /etc/hosts entries, but the auth.allow option prevents that hack. If anyone knows how to share an IPoIB volume with Ethernet clients in a more formally correct way, I'd be happy to learn of it.

--
Harry Mangalam - Research Computing, OIT, UC Irvine
--
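For context, the /etc/hosts override described above looks roughly like the sketch below; the addresses are made-up placeholders. The idea is that an Ethernet-only client resolves the server names embedded in the volume to their Ethernet addresses instead of the IPoIB ones returned by DNS.

    # /etc/hosts on an Ethernet-only client (addresses are hypothetical)
    192.168.10.11   pbs1ib
    192.168.10.12   pbs2ib
    192.168.10.13   pbs3ib
    192.168.10.14   pbs4ib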
Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable
On 04/18/2012 06:58 PM, Harry Mangalam wrote:
> And one more observation that will probably be obvious in retrospect: if you enable auth.allow (on 3.3b3), it will do reverse lookups to verify hostnames, so it is more complicated to share an IPoIB Gluster volume with IPoEth clients. I had been overriding DNS entries with /etc/hosts entries, but the auth.allow option prevents that hack. If anyone knows how to share an IPoIB volume with Ethernet clients in a more formally correct way, I'd be happy to learn of it.

After dealing with problems in multi-modal networks with slightly different naming schemes, I don't recommend using tcp and RDMA together (or even IPoIB with eth) for Gluster. Very long, very painful saga. Executive summary: here be dragons.

Also, IPoIB is very leaky, so under heavy load you can find your servers starting to run out of memory. We've seen this with OFED through 1.5.3.x and Gluster versions as late as 3.2.6.

We'd recommend sticking to one fabric for the moment with Gluster. Use real tcp with a 10 or 40 GbE backbone. Far fewer problems. Much less excitement.

Regards,
Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615