[Gluster-users] enabling NFS on a running gluster system

2012-04-18 Thread samuel
Hi all,

We currently have a two-node replicated-distributed Gluster system
(version 3.2.2) where all the clients connect via the native Gluster
client. There is now a requirement to connect to the existing Gluster
storage via NFS, and I'd like to ask you whether NFS can be enabled
dynamically.

Is it required to restart services on the server?
Is it required to remount existing clients?
There's a geo-replication backend which I guess will not be affected, but is it
required to restart the replication?


As a side effect, would the existing Gluster performance be degraded by
enabling NFS compatibility?
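
For reference, this is roughly what I have in mind (a minimal sketch only,
with made-up volume and host names "myvol" and "server1", assuming the
built-in Gluster NFS server):

# on a server: the built-in NFS server is controlled per volume, and the
# option can be changed on a running volume
gluster volume set myvol nfs.disable off   # re-enable NFS if it was disabled
gluster volume info myvol                  # check the current options

# on an NFS client: Gluster NFS speaks NFSv3 over TCP, and rpcbind/portmap
# must be running on the server
mount -t nfs -o vers=3,tcp server1:/myvol /mnt/myvol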

Thank you in advance.

Samuel.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Performance issues with striped volume over Infiniband

2012-04-18 Thread Ionescu, A.
Dear Gluster Users,

We are facing some severe performance issues with GlusterFS and we would very 
much appreciate any help on identifying the cause of this.

Our setup is extremely simple: 2 nodes interconnected with 40 Gb/s InfiniBand
and also 1 Gb/s Ethernet, running CentOS 6.2 and GlusterFS 3.2.6.
Each node has 4 SATA drives in a RAID0 array that gives ~750 MB/s random
read bandwidth. The tool that we used for measuring I/O performance relies on
O_DIRECT access, so we patched the FUSE kernel module:
http://marc.info/?l=linux-fsdevel&m=132950081331043&w=2.

We created the following volume and mounted it at /mnt/gfs/.

Volume Name: GFS_RDMA_VOLUME
Type: Stripe
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: node01:/mnt/md0/gfs_storage
Brick2: node02:/mnt/md0/gfs_storage
Options Reconfigured:
cluster.stripe-block-size: *:2MB
performance.quick-read: on
performance.io-cache: on
performance.cache-size: 256MB
performance.cache-max-file-size: 128MB

We expected to see an I/O bandwidth of about 1500 MB/s (measured with the exact
same tool and parameters), but unfortunately we only get ~100 MB/s, which is
very disappointing.
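
For reference, a rough way we can cross-check sequential throughput directly on
the mount with dd (the file name and sizes below are arbitrary; the block size
matters a lot with a 2MB stripe size):

# sequential write and read of ~4 GB using O_DIRECT, bypassing the page cache
dd if=/dev/zero of=/mnt/gfs/ddtest bs=1M count=4096 oflag=direct
dd if=/mnt/gfs/ddtest of=/dev/null bs=1M iflag=direct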

Please find below the output of # cat /var/log/glusterfs/mnt-gfs-.log. If you
need any other information that I forgot to mention, please let me know.

Thanks,
Adrian


[2012-04-18 11:59:42.847818] I [glusterfsd.c:1493:main] 0-/opt/glusterfs/3.2.6/sbin/glusterfs: Started running /opt/glusterfs/3.2.6/sbin/glusterfs version 3.2.6
[2012-04-18 11:59:42.862610] W [write-behind.c:3023:init] 0-GFS_RDMA_VOLUME-write-behind: disabling write-behind for first 0 bytes
[2012-04-18 11:59:43.318188] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-0: parent translators are ready, attempting connect on transport
[2012-04-18 11:59:43.321287] I [client.c:1935:notify] 0-GFS_RDMA_VOLUME-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+--+
  1: volume GFS_RDMA_VOLUME-client-0
  2: type protocol/client
  3: option remote-host node01
  4: option remote-subvolume /mnt/md0/gfs_storage
  5: option transport-type rdma
  6: end-volume
  7:
  8: volume GFS_RDMA_VOLUME-client-1
  9: type protocol/client
 10: option remote-host node02
 11: option remote-subvolume /mnt/md0/gfs_storage
 12: option transport-type rdma
 13: end-volume
 14:
 15: volume GFS_RDMA_VOLUME-stripe-0
 16: type cluster/stripe
 17: option block-size *:2MB
 18: subvolumes GFS_RDMA_VOLUME-client-0 GFS_RDMA_VOLUME-client-1
 19: end-volume
 20:
 21: volume GFS_RDMA_VOLUME-write-behind
 22: type performance/write-behind
 23: subvolumes GFS_RDMA_VOLUME-stripe-0
 24: end-volume
 25:
 26: volume GFS_RDMA_VOLUME-read-ahead
 27: type performance/read-ahead
 28: subvolumes GFS_RDMA_VOLUME-write-behind
 29: end-volume
 30:
 31: volume GFS_RDMA_VOLUME-io-cache
 32: type performance/io-cache
 33: option max-file-size 128MB
 34: option cache-size 256MB
 35: subvolumes GFS_RDMA_VOLUME-read-ahead
 36: end-volume
 37:
 38: volume GFS_RDMA_VOLUME-quick-read
 39: type performance/quick-read
 40: option cache-size 256MB
 41: subvolumes GFS_RDMA_VOLUME-io-cache
 42: end-volume
 43:
 44: volume GFS_RDMA_VOLUME-stat-prefetch
 45: type performance/stat-prefetch
 46: subvolumes GFS_RDMA_VOLUME-quick-read
 47: end-volume
 48:
 49: volume GFS_RDMA_VOLUME
 50: type debug/io-stats
 51: option latency-measurement off
 52: option count-fop-hits off
 53: subvolumes GFS_RDMA_VOLUME-stat-prefetch
 54: end-volume

+--+
[2012-04-18 11:59:43.326287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-1: failed to get the port number for remote subvolume
[2012-04-18 11:59:43.764287] E [client-handshake.c:1171:client_query_portmap_cbk] 0-GFS_RDMA_VOLUME-client-0: failed to get the port number for remote subvolume
[2012-04-18 11:59:46.868595] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-0: changing port to 24009 (from 0)
[2012-04-18 11:59:46.879292] I [rpc-clnt.c:1536:rpc_clnt_reconfig] 0-GFS_RDMA_VOLUME-client-1: changing port to 24009 (from 0)
[2012-04-18 11:59:50.872346] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-0: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.872760] I [client-handshake.c:913:client_setvolume_cbk] 0-GFS_RDMA_VOLUME-client-0: Connected to 192.168.0.101:24009, attached to remote volume '/mnt/md0/gfs_storage'.
[2012-04-18 11:59:50.874975] I [client-handshake.c:1090:select_server_supported_programs] 0-GFS_RDMA_VOLUME-client-1: Using Program GlusterFS 3.2.6, Num (1298437), Version (310)
[2012-04-18 11:59:50.875290] I [client-handshake.c:913:client_setvolume_cbk] 

Re: [Gluster-users] Performance issues with striped volume over Infiniband

2012-04-18 Thread Sabuj Pattanayek
I've seen the same ~100 MB/s limit (depending on the transfer block size)
with 5 bricks in a stripe, and have yet to try IPoIB, which I hear
improves performance over RDMA for some reason.
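
In case it helps, switching to IPoIB would look roughly like the sketch below
(the addresses, host names and brick paths are made up, and the volume has to
be created with tcp transport so the traffic runs IP-over-IB):

# give each node's IB interface an IP address (repeat per node)
ip addr add 10.10.10.1/24 dev ib0

# create the striped volume over tcp and mount it via the IB hostnames
gluster volume create GFS_TCP_VOLUME stripe 2 transport tcp \
    node01-ib:/mnt/md0/gfs_storage node02-ib:/mnt/md0/gfs_storage
gluster volume start GFS_TCP_VOLUME
mount -t glusterfs node01-ib:/GFS_TCP_VOLUME /mnt/gfs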

On Wed, Apr 18, 2012 at 5:05 AM, Ionescu, A. a.ione...@student.vu.nl wrote:
 Dear Gluster Users,

 We are facing some severe performance issues with GlusterFS and we would
 very much appreciate any help on identifying the cause of this.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Frequent glusterd restarts needed to avoid NFS performance degradation

2012-04-18 Thread Dan Bretherton

On 04/18/2012 01:48 PM, gluster-users-requ...@gluster.org wrote:

Date: Tue, 17 Apr 2012 19:06:31 -0500 (CDT)
From: Gerald Brandt <g...@majentis.com>
Subject: Re: [Gluster-users] Frequent glusterd restarts needed to
        avoid NFS performance degradation
To: Dan Bretherton <d.a.brether...@reading.ac.uk>
Cc: gluster-users <gluster-users@gluster.org>
Message-ID: <22749685.104.1334707572319.JavaMail.gbr@thinkpad>
Content-Type: text/plain; charset=utf-8

Hi,

- Original Message -

  Dear All-
  I find that I have to restart glusterd every few days on my servers to
  stop NFS performance from becoming unbearably slow.  When the problem
  occurs, volumes can take several minutes to mount and there are long
  delays responding to ls.  Mounting from a different server, i.e. one
  not normally used for NFS export, results in normal NFS access speeds.
  This doesn't seem to have anything to do with load because it happens
  whether or not there is anything running on the compute servers.  Even
  when the system is mostly idle there are often a lot of glusterfsd
  processes running, and on several of the servers I looked at this
  evening there is a process called glusterfs using 100% of one CPU.  I
  can't find anything unusual in nfs.log or
  etc-glusterfs-glusterd.vol.log
  on the servers affected.  Restarting glusterd seems to stop this strange
  behaviour and make NFS access run smoothly again, but this usually only
  lasts for a day or two.
  
  This behaviour is not necessarily related to the length of time since
  glusterd was started, but has more to do with the amount of work the
  GlusterFS processes on each server have to do.  I use a different server
  to export each of my 8 different volumes, and the NFS performance
  degradation seems to affect the most heavily used volumes more than the
  others.  I really need to find a solution to this problem; all I can
  think of doing is setting up a cron job on each server to restart
  glusterd every day, but I am worried about what side effects that might
  have.  I am using GlusterFS version 3.2.5.  All suggestions would be
  much appreciated.
  
  Regards,

  Dan.

I run GlusterFS 3.2.5 and access is only via NFS.  I'm running Citrix XenServer
with about 23 VMs off of it.  I haven't seen any degradation at all.

One thing I don't have is replication or anything else set up.  The server is
ready to replicate, but I'm waiting for 3.3.

Gerald


Hello Gerald,
Thanks for your comments.  I should have mentioned that I do use 
replication in my cluster, but I'm not sure that the replication is 
causing the problem.  Another thing to mention about my system is that 
there is a lot of data transfer going on most of the time, including 
models and data processing applications running on the compute cluster 
and data transfers from other sites.  I wouldn't be surprised if the 
Gluster-NFS handles several terabytes of data before it starts to grind 
to a halt.  Perhaps this problem hasn't been noticed before because my 
usage isn't typical.  However, it should be fairly easy to reproduce if 
it's just a matter of transferring a large volume of data.
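
For anyone wanting to try, a crude reproduction sketch on an NFS client might
be something like the following (the paths, counts and sizes are made up):

# push a large amount of data through the Gluster NFS mount...
for i in $(seq 1 2000); do
    dd if=/dev/zero of=/mnt/nfsvol/stress/file$i bs=1M count=1024
done
# ...then check whether metadata operations have slowed down
time ls -l /mnt/nfsvol/stress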

-Dan.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable

2012-04-18 Thread Harry Mangalam
I was successfully running an IPoIB Gluster testbed (v3.3b3 on Ubuntu
10.04.4) and brought it down smoothly to adjust some parameters.
It now looks like this (the reconfigured options were just added):

# gluster volume info
 
Volume Name: gli
Type: Distribute
Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
Status: Started
Number of Bricks: 5
Transport-type: tcp,rdma
Bricks:
Brick1: pbs1ib:/bducgl
Brick2: pbs2ib:/bducgl
Brick3: pbs2ib:/bducgl1
Brick4: pbs3ib:/bducgl
Brick5: pbs4ib:/bducgl
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.255.77.*, 128.200.15.*, 10.255.78.*, 10.255.89.*


however, a status query gives this:

# gluster volume status

Status of volume: gli
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick pbs1ib:/bducgl                                    24016   N       N/A
Brick pbs2ib:/bducgl                                    24023   N       N/A
Brick pbs2ib:/bducgl1                                   24025   N       N/A
Brick pbs3ib:/bducgl                                    24016   N       N/A
Brick pbs4ib:/bducgl                                    24016   N       N/A
NFS Server on localhost                                 38467   N       N/A
NFS Server on pbs4ib                                    38467   N       N/A
NFS Server on pbs3ib                                    38467   N       N/A
NFS Server on pbs2ib                                    38467   N       N/A

(I didn't want the NFS server processes - is it the default to start
them?)
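
As far as I can tell the NFS server is started by default for every started
volume; assuming nfs.disable behaves the same way in 3.3b3, it can be turned
off per volume like this:

gluster volume set gli nfs.disable on
gluster volume info gli    # the option should appear under "Options Reconfigured"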

But the operative bit is that the volume is not online, despite being started.
What could cause this situation?


As might be expected, clients can't mount the gluster vol.

The last part of etc-glusterfs-glusterd.vol.log is many lines like this:
[2012-04-18 11:36:57.456318] E [socket.c:2115:socket_connect] 0-management: connection attempt failed (Connection refused)
and the last lines before that are a number of stanzas like this:

[2012-04-18 11:31:14.698184] I [glusterd-op-sm.c::glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers
[2012-04-18 11:31:14.698379] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 2a593581-bf45-446c-8f7c-212c53297803
[2012-04-18 11:31:14.698496] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
[2012-04-18 11:31:14.698581] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 26de63bd-c5b7-48ba-b81d-5d77a533d077
[2012-04-18 11:31:14.698834] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 2a593581-bf45-446c-8f7c-212c53297803
[2012-04-18 11:31:14.698879] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 26de63bd-c5b7-48ba-b81d-5d77a533d077
[2012-04-18 11:31:14.698910] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
[2012-04-18 11:31:14.698929] I [glusterd-op-sm.c:2491:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-04-18 11:31:15.410106] E [socket.c:2115:socket_connect] 0-management: connection attempt failed (Connection refused)

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Bricks suggestions

2012-04-18 Thread Gandalf Corvotempesta
Hi all,
we are planning a new infrastructure based on gluster to be used by some
mail servers and some web servers.
We plan 4 servers, each with 6x 2TB SATA disks in hardware RAID-5.
In a replicated-distributed volume we will have 20TB of available space.

What do you suggest: a single XFS-backed volume split into web storage and
mail storage by directory, or two different mount points backed by separate
replicated-distributed volumes?

Is there any performance degradation in creating 2 or more volumes instead of a single one?
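
For concreteness, the two-volume option would be roughly the following (server
and brick names are invented; each volume gets its own brick directory on each
server's RAID-5 array, and each is mounted separately by the mail and web
servers):

gluster volume create mailvol replica 2 transport tcp \
    srv1:/bricks/mail srv2:/bricks/mail srv3:/bricks/mail srv4:/bricks/mail
gluster volume create webvol replica 2 transport tcp \
    srv1:/bricks/web srv2:/bricks/web srv3:/bricks/web srv4:/bricks/web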
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable

2012-04-18 Thread Harry Mangalam
With JoeJulian's help, tracked this down to what looks like a bug in 
the IP# format which causes glusterfsd to crash.  The bug is: 
https://bugzilla.redhat.com:443/show_bug.cgi?id=813937
 
If anyone has an immediate workaround or correction, be glad to hear 
of it.

hjm
-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable

2012-04-18 Thread Harry Mangalam
The interim fix is to use ONLY commas, with no spaces allowed (this used to be
OK previously):

gluster volume set gli auth.allow \
 '10.255.77.*,128.200.15.*,10.255.78.*,10.255.89.*'

is OK (glusterfsd starts correctly)

but 
gluster volume set gli auth.allow '10.255.77.*, 128.200.15.*, 
10.255.78.*, 10.255.89.*'

is NOT OK (glusterfsd will not start).
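
To double-check which form the volume actually stored after setting the option:

gluster volume info gli | grep auth.allow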

hjm

On Wednesday 18 April 2012 12:56:08 Harry Mangalam wrote:
 With JoeJulian's help, tracked this down to what looks like a bug
 in the IP# format which causes glusterfsd to crash.  The bug is:
 https://bugzilla.redhat.com:443/show_bug.cgi?id=813937
 
 If anyone has an immediate workaround or correction, be glad to
 hear of it.
 
 hjm

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable

2012-04-18 Thread Harry Mangalam
And one more observation that will probably be obvious in retrospect.  
If you enable auth.allow (on 3.3b3), it will do reverse lookups to 
verify hostnames so it will be more complicated to share an IPoIB 
gluster volume to IPoEth clients.

I had been overriding DNS entries with /etc/hosts entries, but the 
auth.allow option will prevent that hack.

If anyone knows how to share an IPoIB volume to ethernet clients in a 
more formally correct way, I'd be happy to learn of it.


-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] IPoIB Volume (3.3b3) started but not online, not mountable

2012-04-18 Thread Joe Landman

On 04/18/2012 06:58 PM, Harry Mangalam wrote:

And one more observation that will probably be obvious in retrospect. If
you enable auth.allow (on 3.3b3), it will do reverse lookups to verify
hostnames so it will be more complicated to share an IPoIB gluster
volume to IPoEth clients.

I had been overriding DNS entries with /etc/hosts entries, but the
auth.allow option will prevent that hack.

If anyone knows how to share an IPoIB volume to ethernet clients in a
more formally correct way, I'd be happy to learn of it.


After dealing with problems in multi-modal networks with slightly 
different naming schemes, I don't recommend using tcp and RDMA together 
(or even IPoIB with eth) for Gluster.  Very long, very painful saga. 
Executive summary:  here be dragons.


Also, IPoIB is very leaky.  So under heavy load, you can find your
servers starting to run out of memory.  We've seen this with OFED
releases through 1.5.3.x and Gluster versions as late as 3.2.6.


We'd recommend sticking to one fabric for the moment with Gluster.  Use 
real tcp with a 10 or 40 GbE backbone.  Far fewer problems.  Much less 
excitement.


Regards,

Joe


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
   http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users