Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 11.07.2012 21:35, Brian Candler wrote:
> On Wed, Jul 11, 2012 at 12:55:50PM -0500, Mark Nipper wrote:
>> Would that be using something like O_DIRECT which FUSE doesn't support at the moment?
> Yes. FUSE does support it in recent kernels (3.4), and I tried it. Nothing happened until I also mounted with -o direct-io-mode=enable; with that and cache=none, the VM was unable to start up at all.

This FUSE patch has been backported into RHEL 6.3 and should also work with the latest 6.2 kernels. IIRC it also worked with cache=none without any special mount options, but unfortunately I don't have the possibility to test whether that's true :(

-samuli
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On Thu, Jul 12, 2012 at 09:01:48AM +0300, Samuli Heinonen wrote:
> On 11.07.2012 21:35, Brian Candler wrote:
>> On Wed, Jul 11, 2012 at 12:55:50PM -0500, Mark Nipper wrote:
>>> Would that be using something like O_DIRECT which FUSE doesn't support at the moment?
>> Yes. FUSE does support it in recent kernels (3.4), and I tried it. Nothing happened until I also mounted with -o direct-io-mode=enable; with that and cache=none, the VM was unable to start up at all.
> This FUSE patch has been backported into RHEL 6.3 and should also work with the latest 6.2 kernels. IIRC it also worked with cache=none without any special mount options, but unfortunately I don't have the possibility to test whether that's true :(

When I tested it, it did work with cache=none without any special mount options and a kernel which supports FUSE+O_DIRECT, but the performance was no better than without. With -o direct-io-mode=enable the VM wouldn't start at all.
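For anyone wanting to reproduce the test, the combination under discussion looks roughly like this (a sketch only; the volume, server and image names are made up, and the direct-io mount option requires a FUSE with O_DIRECT support):

# mount the volume with direct I/O enabled in the FUSE layer
mount -t glusterfs -o direct-io-mode=enable server1:/vmvol /mnt/vmvol
# start the guest with the host page cache bypassed
qemu-kvm -m 1024 -drive file=/mnt/vmvol/guest.img,if=virtio,cache=none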
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On Wed, Jul 11, 2012 at 06:06:44PM +0100, Brian Candler wrote:
>> But my understanding from reading previous posts on this list is that using something other than a cache mode of none is acceptable and safe with Gluster at least.
> cache=none is definitely what we want, but doesn't currently work with glusterfs.

And I forgot to add: since a KVM VM is a userland process anyway, I'd expect a big performance gain when KVM gets the ability to talk to libglusterfs to send its disk I/O directly, without going through a kernel mount (and hence bypassing the kernel cache). It looks like this is being developed now:
http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01745.html
You can see the performance figures at the bottom of that post.

Regards, Brian.
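For reference, the patch series linked above teaches QEMU to open images through libglusterfs itself. A hypothetical invocation (the URI syntax shown is the one the QEMU GlusterFS block driver eventually documented; the in-flight patches may have used a different form):

qemu-kvm -m 1024 -drive file=gluster://server1/vmvol/guest.img,if=virtio,cache=none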
Re: [Gluster-users] Monitor replicated Gluster setup ?
On Wed, Jul 11, 2012 at 03:06:19PM +, Stefan Schloesser wrote:
>> My idea is to use GlusterFS as a replicated filesystem (for apache) and built-in mysql replication for the database. In the event of a failure I need to run a script to implement an IP switch.
> To switch IP for what? glusterfs in replicated mode, using the native (FUSE) client, doesn't need this. The client talks to both backends, and if either backend fails, it continues to work.

[...] I am slightly confused here: I don't have a client (at least in the sense of a different machine); it's only 2 servers, each running an apache which uses the filesystem (simply mounted). The reason for the IP switch is the apache: if one fails, the other should take over the workload and continue operation, and this is done via the IP switch. How should I monitor such a system?

> I'm not sure if there's a proper API. As a starting point, try running 'gluster volume status' as root and parsing the results. e.g. here the bricks on one server are unavailable:

Not that I'm aware of; as far as I can see it's not needed. Volume status information is synchronised between the peers in the cluster using some internal protocol, and I'm not exactly sure how it deals with split-brain scenarios.

As a minimum I need to monitor whether the file system is locally accessible, so that if not I can trigger the failover. E.g. if I manually stop the glusterd service, the mount point is gone and hence the apache cannot access its web root any more, or not?

Stefan
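Since the immediate need here is "is the filesystem locally accessible", a minimal local watchdog can sidestep the API question entirely. A sketch (the mount point and timeout are assumptions; wire the non-zero exit into the IP-failover script):

#!/bin/sh
# exit 0 if the glusterfs mount answers within 5 seconds, non-zero otherwise
MOUNT=/shared
timeout 5 stat -t "$MOUNT" >/dev/null 2>&1 || exit 2
exit 0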
[Gluster-users] NFS mounts with glusterd on localhost - reliable or not?
Hi, are NFS mounts made on a single server (i.e. where glusterd is running) supposed to be stable (with gluster 3.2.6)? I'm using the following line in /etc/fstab:

localhost:/sites /var/ftp/sites nfs _netdev,mountproto=tcp,nfsvers=3,bg 0 0

The problem is, after some time (~1-6 hours), I'm no longer able to access this mount. dmesg says:

[49609.832274] nfs: server localhost not responding, still trying
[49910.639351] nfs: server localhost not responding, still trying
[50211.446433] nfs: server localhost not responding, still trying

What's worse, whenever this happens, *all* other servers in the cluster (it's a 10-server distributed volume) will destabilise - their load average will grow, and eventually their gluster mount becomes unresponsive, too (the other servers use normal gluster mounts). At this point, I have to kill all gluster processes, start glusterd again, and re-mount (on the servers using the gluster mount).

Is this expected behaviour with gluster and NFS mounts on localhost? Can it be caused by some kind of deadlock? Any workarounds?

-- Tomasz Chmielewski http://www.ptraveler.com
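Mounting a kernel NFS client against an NFS server running on the same host is a classic deadlock pattern (under memory pressure the kernel client waits on the userspace server, which in turn waits on memory), so a loopback gluster NFS mount is worth avoiding regardless. An untested alternative sketch for the same fstab entry, using the native FUSE client instead:

localhost:/sites /var/ftp/sites glusterfs defaults,_netdev 0 0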
Re: [Gluster-users] Performance translators - an overview.
Hi,

Is there a way to get good performance when an application does small writes? Most of the applications using NetCDF write big files (up to 100GB) but use small block-sized writes (block size less than 1KB):

--
[root@cola5 scripts]# dd if=/dev/zero of=/h1/junk bs=512 count=1024000
1024000+0 records in
1024000+0 records out
524288000 bytes (524 MB) copied, 70.7706 seconds, 7.4 MB/s
[root@cola5 scripts]# dd if=/dev/zero of=/h1/junk bs=1k count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 59.6961 seconds, 17.6 MB/s
[root@cola5 scripts]# dd if=/dev/zero of=/h1/junk bs=16k count=64000
64000+0 records in
64000+0 records out
1048576000 bytes (1.0 GB) copied, 4.42826 seconds, 237 MB/s
---

For very small block-sized writes, write-behind does not seem to help. How can small write caching be improved?

Al

On Mon, Jun 4, 2012 at 11:13 AM, Raghavendra G raghaven...@gluster.com wrote:

Hi,

The purpose of performance translators is to decrease the system call latency seen by applications and to increase the responsiveness of glusterfs. The standard approach used within glusterfs to decrease system call latency is to avoid network round-trip time as part of fop processing. Depending on which fop we are dealing with, we have different translators: read-ahead, io-cache, write-behind, quick-read and md-cache.

- Though read-ahead and io-cache both serve read calls, the difference is that read-ahead can serve even the first read at an offset (since it would have read ahead on a read at a lower offset), while io-cache can only serve requests from its cache after the first read at an offset. read-ahead can have a negative performance impact, in the form of cache maintenance, on random reads. read-ahead maintains its cache per fd; io-cache maintains a per-inode cache. Ceilings for the cache sizes can be configured.

- write-behind takes the responsibility of storing writes in its cache and syncing them to disk in the background. Because of this, we may not be able to report the fate of a write from an application in the return value of that write. However, write-behind communicates errors to the application in the return value of a current or future write, or of the close call. Paranoid applications which need to know about errors in any previously done writes should do an fsync. There is another option, flush-behind, which when turned on makes the flush call (sent as part of close) happen in the background. The consequence of doing flush in the background is that posix locks on that fd might not be cleared as soon as close returns.

- quick-read optimizes reads by storing small files in its cache. It gets the contents of the entire file as part of the lookup call done during path-to-inode conversion. It assumes that all opens are done with the intention of reading, and hence doesn't actually send the open to the translators below if the file is cached. However, it maintains the abstraction by doing the open as part of other fd-based fops (like fstat, fsync etc.). Because of this, read-intensive applications, such as a web server serving lots of small files, can save the network round trip for two fops - open and read (it used to save the close round trip too, but with close implemented as a callback of fd-destroy, that network round-trip time has been eliminated altogether).

- md-cache is a translator that caches metadata such as stats and certain extended attributes of files.

One of the strategies to increase responsiveness is to introduce an asynchronous nature - one doesn't block on a single operation to complete before taking another - during fop processing.
Again, this asynchronous nature can be achieved using single or multiple threads. The single-threaded approach is effective only when there are blocking components in the system, like I/O with network or disk. Performance translators do not do anything helpful in this aspect (the STACK_WIND and STACK_UNWIND macros, non-blocking sockets etc. help here). It is in introducing parallel processing, as a call proceeds through the gluster translator graph, that io-threads (a performance translator) comes into the picture. Apart from introducing parallelism, io-threads implements priority-based processing of fops, which helps to increase responsiveness. There are other threads within a glusterfs process which are not managed by io-threads, such as the fuse reader, the posix janitor, the thread which polls on network sockets, the threads processing send/receive completion queues in infiniband, threads introduced by syncops, the thread processing timer events, etc.

regards,
-- Raghavendra G
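To Al's question about small writes: the knobs that exist are the write-behind ones. A hedged example of what to try (these are real gluster 3.x option names, but whether they help sub-1KB writes is exactly what's in question, and "myvol" is a placeholder):

# let write-behind aggregate more data before flushing to the bricks
gluster volume set myvol performance.write-behind-window-size 4MB
# decouple close() from flush, so small-file close doesn't block
gluster volume set myvol performance.flush-behind on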
[Gluster-users] One file incessant self heal
Hi Pranith,

Thanks for your reply. I checked the md5sums too; they are different. Here is my output:

# getfattr -d -m . -e hex /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
getfattr: Removing leading '/' from absolute path names
# file: export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
trusted.afr.gvol1-client-2=0x0363
trusted.afr.gvol1-client-3=0x0001
trusted.gfid=0xd9b0c35033ba4090ab08f91f30dd661f
trusted.glusterfs.quota.4111d3b4-7e06-483f-aae8-fbefe9e55843.contri=0x0001c0d15000

# getfattr -d -m . -e hex /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
getfattr: Removing leading '/' from absolute path names
# file: export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
trusted.afr.gvol1-client-2=0x0001
trusted.afr.gvol1-client-3=0x0001
trusted.gfid=0xd9b0c35033ba4090ab08f91f30dd661f
trusted.glusterfs.quota.4111d3b4-7e06-483f-aae8-fbefe9e55843.contri=0x0001c0d15000

# stat /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
File: `/export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f'
Size: 7530086400  Blocks: 14706856  IO Block: 4096  regular file
Device: 6900h/26880d  Inode: 81395728  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 107/ UNKNOWN)  Gid: ( 107/ UNKNOWN)
Access: 2012-06-21 09:58:59.242136421 +0800
Modify: 2012-07-10 13:42:04.381141510 +0800
Change: 2012-07-12 16:23:15.884163991 +0800

# stat /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
File: `/export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f'
Size: 7530086400  Blocks: 14706856  IO Block: 4096  regular file
Device: 6910h/26896d  Inode: 17956874  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 107/ UNKNOWN)  Gid: ( 107/ UNKNOWN)
Access: 2012-06-21 09:58:59.242136421 +0800
Modify: 2012-07-10 13:42:04.381141510 +0800
Change: 2012-07-12 16:23:15.885163872 +0800

# md5sum /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
7fba61af476bf379c50f7429c89449ee /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
# md5sum /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
9b08b7145c171afff863c4ae5884fa01 /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f

2012/7/12 Pranith Kumar Karampuri pkara...@redhat.com:
> Homer, could you give the output of
> getfattr -d -m . -e hex /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
> getfattr -d -m . -e hex /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
> and also 'stat' of these files.
> Pranith.

- Original Message -
From: Homer Li 01jay...@gmail.com
To: gluster-users gluster-users@gluster.org
Sent: Thursday, July 12, 2012 7:54:24 AM
Subject: [Gluster-users] One file incessant self heal

Hello;

I found many self-heal triggered log entries, one every 10 minutes, for only one file, gfid d9b0c350-33ba-4090-ab08-f91f30dd661f. heal-failed and split-brain have not displayed anything. Is there any problem with this file?
GlusterFS config:

OS: 2.6.32-220.17.1.el6.x86_64 Scientific Linux release 6.2 (Carbon)

# rpm -qa | grep glusterfs
glusterfs-3.3.0-2.el6.x86_64
glusterfs-devel-3.3.0-2.el6.x86_64
glusterfs-fuse-3.3.0-2.el6.x86_64
glusterfs-geo-replication-3.3.0-2.el6.x86_64
glusterfs-rdma-3.3.0-2.el6.x86_64
glusterfs-server-3.3.0-2.el6.x86_64
glusterfs-debuginfo-3.3.0-2.el6.x86_64

# gluster volume info
Volume Name: gvol1
Type: Distributed-Replicate
Volume ID: a7d8ffdf-7296-404b-aeab-824ee853ec59
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.30.1.125:/export/data00
Brick2: 172.30.1.125:/export/data01
Brick3: 172.30.1.125:/export/data10
Brick4: 172.30.1.125:/export/data11
Options Reconfigured:
features.limit-usage: /source:500GB
features.quota: on
performance.cache-refresh-timeout: 30
performance.io-thread-count: 32
nfs.disable: off
cluster.min-free-disk: 5%
performance.cache-size: 128MB

# gluster volume heal gvol1 info
Heal operation on volume gvol1 has been successful
Brick 172.30.1.125:/export/data00
Number of entries: 0
Brick 172.30.1.125:/export/data01
Number of entries: 0
Brick 172.30.1.125:/export/data10
Number of entries: 1
/fs126/Graphite-monitor_vdb.qcow2
Brick 172.30.1.125:/export/data11
Number of entries: 1
/fs126/Graphite-monitor_vdb.qcow2

# gluster volume heal gvol1 info heal-failed
Heal operation on volume gvol1 has been successful
Brick 172.30.1.125:/export/data00
Number of entries: 0
Brick 172.30.1.125:/export/data01
Number of entries: 0
Brick 172.30.1.125:/export/data10
Number of entries: 0
Brick 172.30.1.125:/export/data11
Number of entries: 0

# gluster volume heal gvol1 info split-brain
Heal operation on volume gvol1 has been successful
Brick 172.30.1.125:/export/data00
Number of entries: 0
Brick
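Since both bricks show non-zero trusted.afr pending counters pointing at each other and the md5sums differ, this looks like a split-brain that the 3.3 self-heal daemon will keep retrying. A hedged sketch of the usual manual recovery (back everything up first; it assumes you have decided data11's copy is the stale one - only you can make that call - and /mnt/gvol1 is a placeholder for a client mount):

# remove the bad replica and its .glusterfs gfid hard link on the brick
rm /export/data11/fs126/Graphite-monitor_vdb.qcow2
rm /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
# then touch the file through a client mount to trigger a fresh self-heal
stat /mnt/gvol1/fs126/Graphite-monitor_vdb.qcow2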
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On Thu, Jul 12, 2012 at 03:40:14AM -0500, Mark Nipper wrote:
> On 12 Jul 2012, Brian Candler wrote:
>> And I forgot to add: since a KVM VM is a userland process anyway, I'd expect a big performance gain when KVM gets the ability to talk to libglusterfs to send its disk I/O directly, without going through a kernel mount (and hence bypassing the kernel cache). It looks like this is being developed now:
>> http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01745.html
>> You can see the performance figures at the bottom of that post.
> Something concerns me about those performance figures. If I'm reading them correctly, the normal fuse mount performance is about what I was seeing, 2-3MB/s. And now, bypassing everything, libglusterfs is still capping out a little under 20MB/s.

I read it as:

aggrb: base 72.9MB/s - fuse bypass (libglusterfs) 66.MB/s
minb: base 18.2MB/s - fuse bypass 16.6MB/s
maxb: base 18.9MB/s - fuse bypass 17.8MB/s

etc.
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
Folks,

Gluster is not ready to run virtual machines at all. Yes, you can build a 2-node cluster and live-migrate machines, but the performance is poor and they still need to do a lot of work on it. I wouldn't put even a cluster of low-performance web server VMs into production until this is solved. For archive or general multimedia storage, maybe, but not to run VMs. Perhaps someone is intending to integrate with RHEV (it seems they are, as it's going to be in oVirt 3.1 now), so they will put more effort into solving this problem, which 10 out of 10 of those who tested are reporting.

Fernando

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Mark Nipper
Sent: 12 July 2012 09:40
To: Brian Candler
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability

On 12 Jul 2012, Brian Candler wrote:
> And I forgot to add: since a KVM VM is a userland process anyway, I'd expect a big performance gain when KVM gets the ability to talk to libglusterfs to send its disk I/O directly, without going through a kernel mount (and hence bypassing the kernel cache). It looks like this is being developed now:
> http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01745.html
> You can see the performance figures at the bottom of that post.

Something concerns me about those performance figures. If I'm reading them correctly, the normal fuse mount performance is about what I was seeing, 2-3MB/s. And now, bypassing everything, libglusterfs is still capping out a little under 20MB/s. So am I kidding myself that approaching 45-50MB/s with a FUSE-based Gluster mount and cache=writethrough is actually a safe thing to do? I know the performance is abysmal without setting the cache mode, but is using writethrough really safe, or is it a recipe for disaster waiting to happen?

-- Mark Nipper ni...@bitgnome.net (XMPP) +1 979 575 3193
- I cannot tolerate intolerant people.
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 12 Jul 2012, Brian Candler wrote:
>> On 12 Jul 2012, Brian Candler wrote:
>>> http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01745.html
> I read it as:
> aggrb: base 72.9MB/s - fuse bypass (libglusterfs) 66.MB/s
> minb: base 18.2MB/s - fuse bypass 16.6MB/s
> maxb: base 18.9MB/s - fuse bypass 17.8MB/s

I was trying to figure out what the aggregate consisted of exactly, but I'm assuming it's referring to the 4 files:

---
; Read 4 files with aio at different depths
---

So it looks like you're right; the combined, total throughput was around 65-70MB/s, and the min and max refer to individual file performance at any given point during the test. That makes me feel much better! :)

-- Mark Nipper ni...@bitgnome.net (XMPP) +1 979 575 3193
- It is better to sit in silence and appear ignorant, than to open your mouth and remove all doubt. -- attributed to many
Re: [Gluster-users] Monitor replicated Gluster setup ?
On Thu, Jul 12, 2012 at 07:38:18AM +, Stefan Schloesser wrote:
>> To switch IP for what? glusterfs in replicated mode, using the native (FUSE) client, doesn't need this. The client talks to both backends, and if either backend fails, it continues to work.
> [...] I am slightly confused here: I don't have a client (at least in the sense of a different machine); it's only 2 servers, each running an apache which uses the filesystem (simply mounted).

You mean Apache is reading the glusterfs bricks locally? That's wrong; any writes would screw up replication. You should mount the glusterfs volume via a FUSE mount, and have Apache access files through that mountpoint. That's what I mean by a client. The fact that it happens to be on the same server where one of the glusterfs storage bricks runs is irrelevant.

Even if Apache is only reading files, going through the glusterfs mountpoint will ensure that you see the correct version of each file when the two replicas are out of sync (which will happen after one node has been down for a while, writes have happened to the other node, the first node has come back up again, and background healing is taking place).

> The reason for the IP switch is the apache: if one fails, the other should take over the workload and continue operation; this is done via the IP switch.

Then that's just switching the public IP which apache listens on, and it has nothing to do with glusterfs. Both servers can have the glusterfs volume mounted all the time.

Regards, Brian.
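A sketch of the layout Brian is describing, on each of the two web nodes (the volume name and paths are placeholders):

# each node mounts the replicated volume through the native client...
mount -t glusterfs localhost:/webvol /var/www/shared
# ...and Apache serves from the mountpoint, never from the brick directory
# e.g. DocumentRoot /var/www/shared, NOT /export/brick1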
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 12.07.2012 11:40, Mark Nipper wrote:
> Something concerns me about those performance figures. If I'm reading them correctly, the normal fuse mount performance is about what I was seeing, 2-3MB/s. And now, bypassing everything, libglusterfs is still capping out a little under 20MB/s.

It's running tests on four files at the same time. minb shows the speed of the slowest test, maxb the fastest, and aggrb shows all four tests aggregated.

> So am I kidding myself that approaching 45-50MB/s with a FUSE-based Gluster mount and cache=writethrough is actually a safe thing to do? I know the performance is abysmal without setting the cache mode, but is using writethrough really safe, or is it a recipe for disaster waiting to happen?

With writethrough, data is written to the cache but a write is not reported complete until it has hit the disk. The data in the cache can then be used to speed up read operations. So I would consider it pretty safe, but perhaps someone else can explain it better?
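For concreteness, the cache mode under discussion is selected per disk on the qemu command line (an illustrative invocation only; writeback, by contrast, acknowledges writes once they reach the host page cache):

qemu-kvm -drive file=/mnt/gluster/guest.img,if=virtio,cache=writethrough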
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 12 Jul 2012, Fernando Frediani (Qube) wrote:
> Gluster is not ready to run virtual machines at all. Yes, you can build a 2-node cluster and live-migrate machines, but the performance is poor and they still need to do a lot of work on it. I wouldn't put even a cluster of low-performance web server VMs into production until this is solved. For archive or general multimedia storage, maybe, but not to run VMs. Perhaps someone is intending to integrate with RHEV (it seems they are, as it's going to be in oVirt 3.1 now), so they will put more effort into solving this problem, which 10 out of 10 of those who tested are reporting.

I realize that this is a major new release, so buyer beware and all that. But I'm actually very happy with the performance so far. Pretty much none of our VMs are very heavyweight from a disk perspective. And anything that requires a massive amount of storage is using separate systems via NFS or iSCSI, not the underlying Gluster file system, which only contains the VM images themselves. So the active services are mostly cached in RAM anyway while these VMs are running. It could certainly lend itself to a perfect storm of sorts if all the VMs suddenly started thrashing their OS disks doing updates, for example. But we can stagger that easily enough to avoid it.

My only concern is whether using the writethrough cache mode is actually considered safe by people in the know around here. Gluster is sufficiently magical (and this particular release is sufficiently new) that everything is a moving target right now. Especially with Red Hat themselves busily backporting functionality from 3.x kernels and newer versions of libvirt and KVM/qemu into RHEL 6.x (for those of us using it) to gain all of these whizbang features now, as opposed to waiting forever for RHEL 7. That unfortunately means that rebooting to a new kernel version or updating a few key packages can also fundamentally change a previously stable and working setup into a nightmarish house of cards just waiting to collapse.

-- Mark Nipper ni...@bitgnome.net (XMPP) +1 979 575 3193
- If Pac-Man had affected us as kids, we'd all be running around in dark rooms, munching pills and listening to repetitive electronic music. -- Marcus Brigstocke, British comedian
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On Thu, Jul 12, 2012 at 04:01:29AM -0500, Mark Nipper wrote:
>> I read it as:
>> aggrb: base 72.9MB/s - fuse bypass (libglusterfs) 66.MB/s
>> minb: base 18.2MB/s - fuse bypass 16.6MB/s
>> maxb: base 18.9MB/s - fuse bypass 17.8MB/s
> I was trying to figure out what the aggregate consisted of exactly, but I'm assuming it's referring to the 4 files:
> ---
> ; Read 4 files with aio at different depths

From the fio(1) manpage, the group statistics show:

io     Number of megabytes I/O performed.
aggrb  Aggregate bandwidth of threads in the group.
minb   Minimum average bandwidth a thread saw.
maxb   Maximum average bandwidth a thread saw.
mint   Shortest runtime of threads in the group.
maxt   Longest runtime of threads in the group.

Regards, Brian.
[Gluster-users] Transport endpoint is not connected
Hi group,

I've been in production with gluster for the last 2 weeks. No problems until today. As of today I get the "Transport endpoint is not connected" problem on the client, maybe once every hour:

df: `/services/users/6': Transport endpoint is not connected

Here is my setup: I have 1 client and 2 servers with 2 disks each for bricks, Glusterfs 3.3 compiled from source.

# gluster volume info
Volume Name: freecloud
Type: Distributed-Replicate
Volume ID: 1cf4804f-12aa-4cd1-a892-cec69fc2cf22
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15
Brick2: YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886
Brick3: XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72
Brick4: YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e
Options Reconfigured:
cluster.self-heal-daemon: on
performance.cache-size: 256MB
performance.io-thread-count: 32
features.quota: on

Quota is ON but not used.

# gluster volume status all detail
Status of volume: freecloud
--
Brick: Brick XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15
Port             : 24009
Online           : Y
Pid              : 29221
File System      : xfs
Device           : /dev/sdd1
Mount Options    : rw
Inode Size       : 256
Disk Space Free  : 659.7GB
Total Disk Space : 698.3GB
Inode Count      : 732571968
Free Inodes      : 730418928
--
Brick: Brick YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886
Port             : 24009
Online           : Y
Pid              : 15496
File System      : xfs
Device           : /dev/sdc1
Mount Options    : rw
Inode Size       : 256
Disk Space Free  : 659.7GB
Total Disk Space : 698.3GB
Inode Count      : 732571968
Free Inodes      : 730410396
--
Brick: Brick XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72
Port             : 24010
Online           : Y
Pid              : 29227
File System      : xfs
Device           : /dev/sdc1
Mount Options    : rw
Inode Size       : 256
Disk Space Free  : 659.9GB
Total Disk Space : 698.3GB
Inode Count      : 732571968
Free Inodes      : 730417864
--
Brick: Brick YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e
Port             : 24010
Online           : Y
Pid              : 15502
File System      : xfs
Device           : /dev/sdb1
Mount Options    : rw
Inode Size       : 256
Disk Space Free  : 659.9GB
Total Disk Space : 698.3GB
Inode Count      : 732571968
Free Inodes      : 730409337

On server1 I mount the volume and start copying files to it; server1 is used as storage:

209.25.137.252:freecloud 1.4T 78G 1.3T 6% /home/freecloud

One thing to mention is that I have a large list of subdirectories in the main directory, and the list keeps getting bigger:

client1# ls | wc -l
42424

I have one client server that mounts glusterfs and uses the files directly, as the files are for low-traffic web sites. On the client there is no gluster daemon, just the mount:

client1# mount -t glusterfs rscloud1.domain.net:/freecloud /services/users/6/

This all worked fine for the last 2-3 weeks.
Here is a log from the crash, client1:/var/log/glusterfs/services-users-6-.log:

pending frames:
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-07-12 14:51:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0
/lib/x86_64-linux-gnu/libc.so.6(+0x32480)[0x7f1e0e9f0480]
/services/glusterfs//lib/libglusterfs.so.0(uuid_unpack+0x0)[0x7f1e0f79d760]
/services/glusterfs//lib/libglusterfs.so.0(+0x4c526)[0x7f1e0f79d526]
/services/glusterfs//lib/libglusterfs.so.0(uuid_utoa+0x26)[0x7f1e0f77ca66]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/features/quota.so(quota_rename_cbk+0x308)[0x7f1e09b940c8]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_rename_unlink_cbk+0x454)[0x7f1e09dad264]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_unwind+0xf7)[0x7f1e09ff23c7]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_wind_cbk+0xb6)[0x7f1e09ff43d6]
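The backtrace dies inside quota_rename_cbk, and the volume has features.quota on even though quota is "ON but not used". A hedged workaround to test (this only narrows the problem down; it is not a root-cause fix):

# turn the quota translator off entirely, then re-try the rename-heavy workload
gluster volume quota freecloud disable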
Re: [Gluster-users] Monitor replicated Gluster setup ?
On Thu, Jul 12, 2012 at 01:13:26PM +, Stefan Schloesser wrote:
> I am mounting it via
> mount -t glusterfs -o log-level=WARNING,log-file=/var/log/gluster.log cluster-1:/shared /shared
> and sure, the apache will write to it ...

OK. Then that's the client I was talking about.
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 12 Jul 2012, Brian Candler wrote:
> I don't believe he is getting that speed with cache=writethrough. I think he means cache=writeback.

No, I meant writethrough. Initially I had just done dd using something like:

---
$ dd if=/dev/zero of=/home/user/testing bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 20.1434 s, 52.1 MB/s
---

This is on a guest with 512MB of RAM. But then I added sync and made the file bigger for a little more accuracy:

---
$ dd if=/dev/zero of=/home/user/testing bs=1024k count=2000 conv=sync
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 53.372 s, 39.3 MB/s
---

And finally I just tried fio using:

---
[global]
ioengine=sync
direct=1
rw=write
bs=128k
size=512m
directory=/home/user/data1

[file1]
iodepth=4

[file2]
iodepth=32

[file3]
iodepth=8

[file4]
iodepth=16
---

which gave:

---
WRITE: io=2048.0MB, aggrb=32217KB/s, minb=8054KB/s, maxb=8212KB/s, mint=63842msec, maxt=65093msec
---

That's probably a lot closer to accurate, 32MB/s give or take. Sorry for the less than scientific approach initially. :)

-- Mark Nipper ni...@bitgnome.net (XMPP) +1 979 575 3193
- LITTLE GIRL: But which cookie will you eat first? COOKIE MONSTER: Me think you have misconception of cookie-eating process.
Re: [Gluster-users] Gluster-3.3 Puzzler
Hi Harry,

Running Gluster-3.3 on Redhat 6.2 with the RPMs provided by gluster.org. I rpm -e'd the previous gluster install; it could be possible that it's picking up older files. I did a find across the entire system for '*gluster*' and cleaned those out. So it's a bit weird to me.

Robin

On 6/27/12 8:13 PM, Harry Mangalam hjmanga...@gmail.com wrote:
> which OS are you using? I believe 3.3 will install but won't run on older CentOSs (5.7/5.8) due to libc skew. And you did 'modprobe fuse' before you tried to mount it...?
> hjm

On Wed, Jun 27, 2012 at 12:46 PM, Robin, Robin rob...@muohio.edu wrote:

Hi,

Just updated to Gluster-3.3; I can't seem to mount my initial test volume. I did the mount on the gluster server itself (which works on Gluster-3.2).

# rpm -qa | grep -i gluster
glusterfs-fuse-3.3.0-1.el6.x86_64
glusterfs-server-3.3.0-1.el6.x86_64
glusterfs-3.3.0-1.el6.x86_64

# gluster volume info all
Volume Name: vmvol
Type: Replicate
Volume ID: b105560a-e157-4b94-bac9-39378db6c6c9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: mualglup01:/mnt/gluster/vmvol001
Brick2: mualglup02:/mnt/gluster/vmvol001
Options Reconfigured:
auth.allow: 127.0.0.1,134.53.*,10.*

## mount -t glusterfs mualglup01.mcs.muohio.edu:vmvol /mnt/test
(did this on the gluster machine itself)

I'm getting the following in the logs:

[2012-06-27 15:40:52.116160] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-vmvol-client-0: changing port to 24009 (from 0)
[2012-06-27 15:40:52.116479] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-vmvol-client-1: changing port to 24009 (from 0)
[2012-06-27 15:40:56.055124] I [client-handshake.c:1636:select_server_supported_programs] 0-vmvol-client-0: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-27 15:40:56.055575] I [client-handshake.c:1433:client_setvolume_cbk] 0-vmvol-client-0: Connected to 10.0.72.132:24009, attached to remote volume '/mnt/gluster/vmvol001'.
[2012-06-27 15:40:56.055610] I [client-handshake.c:1445:client_setvolume_cbk] 0-vmvol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-06-27 15:40:56.055682] I [afr-common.c:3627:afr_notify] 0-vmvol-replicate-0: Subvolume 'vmvol-client-0' came back up; going online.
[2012-06-27 15:40:56.055871] I [client-handshake.c:453:client_set_lk_version_cbk] 0-vmvol-client-0: Server lk version = 1
[2012-06-27 15:40:56.057871] I [client-handshake.c:1636:select_server_supported_programs] 0-vmvol-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-27 15:40:56.058277] I [client-handshake.c:1433:client_setvolume_cbk] 0-vmvol-client-1: Connected to 10.0.72.133:24009, attached to remote volume '/mnt/gluster/vmvol001'.
[2012-06-27 15:40:56.058304] I [client-handshake.c:1445:client_setvolume_cbk] 0-vmvol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-06-27 15:40:56.063514] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-06-27 15:40:56.063638] I [client-handshake.c:453:client_set_lk_version_cbk] 0-vmvol-client-1: Server lk version = 1
[2012-06-27 15:40:56.063802] I [fuse-bridge.c:4093:fuse_thread_proc] 0-fuse: unmounting /mnt/test
[2012-06-27 15:40:56.064207] W [glusterfsd.c:831:cleanup_and_exit] (--/lib64/libc.so.6(clone+0x6d) [0x35f0ce592d] (--/lib64/libpthread.so.0() [0x35f14077f1] (--/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405cfd]))) 0-: received signum (15), shutting down
[2012-06-27 15:40:56.064250] I [fuse-bridge.c:4643:fini] 0-fuse: Unmounting '/mnt/test'.
The server and client should be the same version (as attested by the rpm output). I've seen that some other people are getting the same errors in the archive; no solutions were offered. Any help is appreciated.

Thanks,
Robin

-- Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
[Gluster-users] RDMA high cpu usage and poor performance
I see both glusterfsd and glusterfs eat a good 70-100% of CPU while dd runs (see below).

[root@lab0 ~]# gluster volume info
Volume Name: testrdma
Type: Replicate
Volume ID: bf7b42aa-5680-4f5c-8027-d0a56cc5e65d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: 10.1.0.10:/glust
Brick2: 10.1.0.11:/glust
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.io-thread-count: 64

[root@lab0 ~]# output from dd bs=1M if=/dev/zero of=/test/test.file :
29992+0 records in
29992+0 records out
31448891392 bytes (31 GB) copied, 319.076 s, 98.6 MB/s

[root@lab0 ~]# gluster volume profile testrdma info
Brick: 10.1.0.10:/glust
---
Cumulative Stats:
Block Size: 131072b+
No. of Reads: 0
No. of Writes: 245984

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
 0.00        0.00 us      0.00 us      0.00 us         1        FORGET
 0.00        0.00 us      0.00 us      0.00 us         5        RELEASE
 0.00        0.00 us      0.00 us      0.00 us         1        RELEASEDIR
 0.00       10.50 us      9.00 us     12.00 us         2        FLUSH
 0.00       13.00 us     11.00 us     15.00 us         2        ENTRYLK
 0.00       28.00 us     28.00 us     28.00 us         1        GETXATTR
 0.00       59.50 us     18.00 us    101.00 us         2        READDIR
 0.00      201.00 us    201.00 us    201.00 us         1        CREATE
 0.00       56.00 us     18.00 us     93.00 us         5        LOOKUP
17.89      123.50 us     47.00 us  29371.00 us    239936        WRITE
23.06       79.60 us      2.00 us  41325.00 us    479877        FINODELK
59.06      227.42 us     18.00 us  52835.00 us    430226        FXATTROP

Duration: 3790 seconds
Data Read: 0 bytes
Data Written: 32241614848 bytes

Interval 3 Stats:
Block Size: 131072b+
No. of Reads: 0
No. of Writes: 82837

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
 0.00        0.00 us      0.00 us      0.00 us         1        RELEASE
 0.00        9.00 us      9.00 us      9.00 us         1        FLUSH
 0.00       18.00 us     18.00 us     18.00 us         1        LOOKUP
 0.00       28.00 us     28.00 us     28.00 us         1        GETXATTR
 0.00       59.50 us     18.00 us    101.00 us         2        READDIR
16.61      129.30 us     47.00 us  23023.00 us     82837        WRITE
19.79       77.02 us      3.00 us  30051.00 us    165668        FINODELK
63.61      282.00 us     37.00 us  48149.00 us    145458        FXATTROP

Duration: 472 seconds
Data Read: 0 bytes
Data Written: 10857611264 bytes

Brick: 10.1.0.11:/glust
---
Cumulative Stats:
Block Size: 131072b+
No. of Reads: 0
No. of Writes: 245984

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
 0.00        0.00 us      0.00 us      0.00 us         1        FORGET
 0.00        0.00 us      0.00 us      0.00 us         5        RELEASE
 0.00        0.00 us      0.00 us      0.00 us         1        RELEASEDIR
 0.00       13.00 us     12.00 us     14.00 us         2        FLUSH
 0.00       28.00 us     28.00 us     28.00 us         1        GETXATTR
 0.00       32.00 us     25.00 us     39.00 us         2        ENTRYLK
 0.00       63.50 us     32.00 us     95.00 us         2        READDIR
 0.00      179.00 us    179.00 us    179.00 us         1        CREATE
 0.00       57.40 us     27.00 us     87.00 us         5        LOOKUP
 7.76      177.75 us     40.00 us  54211.00 us    239936        WRITE
44.67      512.94 us     17.00 us  75783.00 us    478774        FXATTROP
47.57      544.98 us      2.00 us  99430.00 us    479877        FINODELK

Duration: 3790 seconds
Data Read: 0 bytes
Data Written: 32241614848 bytes

Interval 3 Stats:
Block Size: 131072b+
No. of Reads: 0
No. of Writes: 82837

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
 0.00        0.00 us      0.00 us      0.00 us         1        RELEASE
 0.00       14.00 us     14.00 us     14.00 us         1        FLUSH
 0.00       27.00 us     27.00 us     27.00 us         1        LOOKUP
 0.00       28.00 us     28.00
Re: [Gluster-users] Gluster-3.3 Puzzler [Solved]
I was using the ISO from Oracle, and it boots the Oracle Unbreakable Kernel (rather than the upstream Red Hat one). Switching to the upstream Red Hat kernel fixed the issue.

Robin

On 7/12/12 2:30 PM, Robin, Robin rob...@muohio.edu wrote:
> Hi Harry,
> Running Gluster-3.3 on Redhat 6.2 with the RPMs provided by gluster.org. I rpm -e'd the previous gluster install; it could be possible that it's picking up older files. I did a find across the entire system for '*gluster*' and cleaned those out. So it's a bit weird to me.
> Robin
> On 6/27/12 8:13 PM, Harry Mangalam hjmanga...@gmail.com wrote:
>> which OS are you using? I believe 3.3 will install but won't run on older CentOSs (5.7/5.8) due to libc skew. And you did 'modprobe fuse' before you tried to mount it...?
>> hjm
>> [...]
[Gluster-users] Need advice on how to export directories
Hello everybody,

I would like to export, through NFS or eventually the native GlusterFS protocol, some directories from a volume, but each directory should only be exported to a specific IP address. As an example:

/data1 would be exported only to 10.0.0.1
/data2 would be exported only to 10.0.0.2
/dataX would be exported only to 10.0.0.X

with /data1, /data2 and /dataX being subdirectories in the same gluster volume. How can I do that? I've read the parameters of the gluster NFS server but I did not see a way to do this.

Thanks for your help.

Michel.
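One direction to investigate is subdirectory exports on the built-in NFS server. A hedged sketch (these are the nfs.export-dir options from the Gluster admin guide, but whether your version supports the per-client syntax needs checking, and "myvol" is a placeholder):

# allow subdirectory exports, then restrict each subdirectory to one client
gluster volume set myvol nfs.export-dirs on
gluster volume set myvol nfs.export-dir "/data1(10.0.0.1),/data2(10.0.0.2)"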