Re: [Gluster-users] How to stop glusterfsd ?
The service is already halted. Note that by "service" you would mean the /usr/sbin/glusterd process. The other processes are specific to the volume. If you wish to stop them, you must stop the volume using:

# gluster volume stop <VOLNAME>

You could also kill them, but that may come with additional repercussions such as data loss.

From: Merlin Morgenstern
To: gluster-users
Date: 09/09/2015 05:27 PM
Subject: [Gluster-users] How to stop glusterfsd ?
Sent by: gluster-users-boun...@gluster.org

I am running Gluster 3.7.x on 3 nodes and want to stop the service. Unfortunately this does not seem to work:

sudo /usr/sbin/glusterd stop

user@fx2:~$ ps -ef | grep gluster
root 2334 1 0 Sep08 ? 00:00:03 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/f66ad4ca3f2b040a2b828e28e9648b0d.socket
root 2342 1 0 Sep08 ? 00:00:04 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/673e39003114a00621cd86113e27d107.socket --xlator-option *replicate*.node-uuid=1a401094-307d-4ada-b710-58a906e97e66
root 2348 1 0 Sep08 ? 00:00:06 /usr/sbin/glusterfsd -s node2 --volfile-id vol1.node2.bricks-brick1 -p /var/lib/glusterd/vols/vol1/run/node2-bricks-brick1.pid -S /var/run/gluster/8b23a0563fdaecb0c7023644ffb933f1.socket --brick-name /bricks/brick1 -l /var/log/glusterfs/bricks/bricks-brick1.log --xlator-option *-posix.glusterd-uuid=1a401094-307d-4ada-b710-58a906e97e66 --brick-port 49152 --xlator-option vol1-server.listen-port=49152
user 8189 6703 0 13:51 pts/0 00:00:00 grep --color=auto gluster

I tried all sorts of stop commands on gluster-server without success. How can I stop Gluster?
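A minimal shutdown sequence along the lines suggested above (a sketch: the volume name vol1 is taken from the ps output in the question, and the service command varies by distribution - it may be an /etc/init.d script instead):

# stop the volume; this stops its brick process (glusterfsd), and the NFS
# server and self-heal daemon exit once no started volumes remain
gluster volume stop vol1
# then stop the management daemon itself
sudo service glusterd stop
# verify nothing gluster-related is left running
ps -ef | grep -i gluster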
Re: [Gluster-users] Can a gluster server be an NFS client ?
Check whether another NFS service is already running. Basically, ps -ef | grep nfs at the time of failure should tell you something.

From: Prasun Gera prasun.g...@gmail.com
To: gluster-users@gluster.org
Date: 05/19/2015 02:17 AM
Subject: [Gluster-users] Can a gluster server be an NFS client ?
Sent by: gluster-users-boun...@gluster.org

I am seeing some erratic behavior w.r.t. the NFS service on the gluster servers (RHS 3.0). The NFS service fails to start occasionally and randomly with:

Could not register with portmap 100021 4 38468
Program NLM4 registration failed

This appears to be related to http://www.gluster.org/pipermail/gluster-users/2014-October/019215.html, although I'm not sure what the resolution is. The gluster servers use autofs to mount user home directories and other sundry directories. I could verify that stopping autofs and then starting the gluster volume seems to solve the problem. Starting autofs after gluster seems to work fine too. What's the right way to handle this?
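A quick way to look for a competing registration at the time of failure (a sketch; rpcinfo and the kernel NFS initscript are assumed to be available on the server):

# NLM is RPC program 100021 (the number in the error above); NFS is 100003
rpcinfo -p | grep -E '100021|100003'
ps -ef | grep -E 'nfsd|lockd|statd'
# the kernel NFS server must not run alongside the gluster NFS server
service nfs status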
Re: [Gluster-users] client is terrible with large amount of small files
Performance would largely depend upon the setup. While I cannot think of any setup that would cause writes to be this slow, it would help if you share the following details:

A) glusterfs version
B) volume configuration (gluster volume info <volname>)
C) host Linux version
D) details about the kind of network you use to connect the servers making up your storage pool.

Thanks, Anirban

From: gjprabu gjpr...@zohocorp.com
To: gluster-users@gluster.org
Date: 04/29/2015 05:52 PM
Subject: Re: [Gluster-users] client is terrible with large amount of small files
Sent by: gluster-users-boun...@gluster.org

Hi Team,

If anybody knows the solution, please share it with us.

Regards, Prabu

On Tue, 28 Apr 2015 19:32:40 +0530 gjprabu gjpr...@zohocorp.com wrote:

Hi Team,

We are newly using glusterfs and testing data transfer on a client using the fuse.glusterfs file system, but it is terrible with a large amount of small files (writing roughly 150 MB of small files takes around 18 minutes). I am able to copy small files, and syncing between the server bricks works fine, but it is terrible with a large amount of small files. If anybody has a solution for the above issue, please share it.

Regards, Prabu
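A sketch of commands that would collect the requested details (the volume name and network interface are placeholders):

glusterfs --version                 # A) glusterfs version
gluster volume info <VOLNAME>       # B) volume configuration
uname -a                            # C) host kernel/Linux version
ethtool eth0 | grep -i speed        # D) link speed; eth0 is assumed to be the storage NIC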
Re: [Gluster-users] gluster probe node by hostname
Hi, do you probe by hostname or by IP? We probe our servers by hostname. Provided both servers are up and each one knows which IPs the hostnames resolve to, gluster peer status displays only hostnames.

Thanks, Anirban

From: 可樂我 colacolam...@gmail.com
To: gluster-users@gluster.org
Date: 03/25/2015 12:54 PM
Subject: [Gluster-users] gluster probe node by hostname
Sent by: gluster-users-boun...@gluster.org

Hi all,

I have a problem with probing a new node by hostname. I have three nodes (Node1, Node2, Node3):

Node1 hostname: node1
Node2 hostname: node2
Node3 hostname: node3

Step 1: Node1 probes Node2: # gluster peer probe node2
Step 2: modify the peer file on Node2: hostname1=<IP of Node1> => hostname1=node1 (hostname of Node1)
Step 3: Node1 probes Node3: # gluster peer probe node3
Step 4: modify the peer file on Node3: hostname1=<IP of Node1> => hostname1=node1 (hostname of Node1)

But gluster peer status still shows the IP of Node1 in the hostname field. If I want the peers on every node in the cluster to show up by hostname only, what should I do? Is there any solution to fix the problem? Thank you very much!
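One commonly suggested fix for the first node showing up by IP is to probe it back by hostname from one of the other peers, rather than editing the peer files by hand; a sketch (not verified against this exact setup):

# run on node2: probing node1 back by hostname lets the pool record a
# hostname for node1 instead of the raw IP used during the first probe
gluster peer probe node1
gluster peer status    # node1 should now be listed by its hostname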
[Gluster-users] NFS File handles change during upgrade from glusterfs version 3.4.2 to 3.5.3
Hello,

After upgrading from 3.4.2 to 3.5.3, I noticed that all my NFS clients (mounted over tcp) on remote nodes turned stale. I investigated this and got the following recurrent logs from the NFS server:

[2015-02-23 20:40:25.834071] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7ead95cb, GETATTR: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)
[2015-02-23 20:40:25.834167] E [nfs3.c:301:__nfs3_get_volume_id] (--/usr/lib64/glusterfs/3.5.3/xlator/nfs/server.so(nfs3_getattr+0x4cb) [0x7fc728b3631a] (--/usr/lib64/glusterfs/3.5.3/xlator/nfs/server.so(nfs3_getattr_reply+0x37) [0x7fc728b35873] (--/usr/lib64/glusterfs/3.5.3/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0xb0) [0x7fc728b357ba]))) 0-nfs-nfsv3: invalid argument: xl
[2015-02-23 20:40:25.834801] E [nfs3.c:840:nfs3_getattr] 0-nfs-nfsv3: Bad Handle

Upon investigation, it seems to me that the trouble is with the procedure nfs3_fh_validate(). In 3.4.2, the validation is against the following identifiers:

#define GF_NFSFH_IDENT0 ':'
#define GF_NFSFH_IDENT1 'O'
#define GF_NFSFH_IDENT_SIZE (sizeof(char) * 2)
#define GF_NFSFH_STATIC_SIZE (GF_NFSFH_IDENT_SIZE + (2*sizeof (uuid_t)))

While on 3.5.3 this has expanded to:

#define GF_NFSFH_IDENT0 ':'
#define GF_NFSFH_IDENT1 'O'
#define GF_NFSFH_IDENT2 'G'
#define GF_NFSFH_IDENT3 'L'
#define GF_NFSFH_IDENT_SIZE (sizeof(char) * 4)
#define GF_NFSFH_STATIC_SIZE (GF_NFSFH_IDENT_SIZE + (2*sizeof (uuid_t)))

Due to this, I have to unmount and remount all my NFS clients to get them back into service. Could somebody help me understand why this change was introduced (maybe a reference to the bug ID)? Also, is there any chance of a way to work around this?

Thank you in advance for your suggestions.

Anirban
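If the identifier change above is indeed the cause, handles minted by 3.4.2 (":O" prefix) can no longer validate against 3.5.3's four-byte ":OGL" prefix, so each client must obtain fresh handles via a remount. A sketch of the remount the author describes (server, export name, and mount point are hypothetical):

# on every affected NFS client; the old file handles cannot be revalidated
umount -f /mnt/gluster_nfs
mount -t nfs -o vers=3,proto=tcp server:/volname /mnt/gluster_nfs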
Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
Correction on the mail below - I forgot to mention the relevant log.

From: A Ghoshal/MUM/TCS
To: Pierre Léonard pleon...@jouy.inra.fr
Cc: gluster-users@gluster.org, gluster-users-boun...@gluster.org
Date: 02/21/2015 03:44 PM
Subject: Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
Sent by: A Ghoshal

Hi Pierre,

I looked up the following log in the source code:

[2015-02-20 14:31:24.969984] E [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore

It looks like what glusterd is trying to do is map all peers to their UUIDs. This information is generally stored in /var/lib/glusterd/peers. If you look in there, you will find one file for each peer. Here's an example from my system:

root@serv0:/var/lib/glusterd/peers# ls
da6b79c8-38c2-411d-b522-30229a9e907f
root@serv0:/var/lib/glusterd/peers# cat da6b79c8-38c2-411d-b522-30229a9e907f
uuid=da6b79c8-38c2-411d-b522-30229a9e907f
state=3
hostname1=serv1

So you should have such a file there for each host present in your pool. If not, there may be a problem. Also, I think the error with rdma.so shouldn't be a problem - it's just glusterd's way of checking whether your build supports rdma or tcp. Your system must be using TCP sockets to communicate among peers instead.

Thanks, Anirban

P.S. This is kind of a disclaimer - I am NOT a Red Hat developer, and not associated with the glusterfs development team in any official capacity. It's just that I use glusterfs from time to time.

From: Pierre Léonard pleon...@jouy.inra.fr
To: gluster-users@gluster.org
Date: 02/20/2015 10:25 PM
Subject: Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
Sent by: gluster-users-boun...@gluster.org

Hi Ghoshal,

> That's funny. What's your glusterd version? glusterd --version

glusterfs 3.5.3 built on Nov 13 2014 11:06:04

It seems that I have a different release of glusterfs; that could be a problem. I also know that I updated that computer with a new kernel, and new openssl and glibc. I remember I once had a problem with the peers files; a guy named Kaushal helped me - the files were not good on that same computer. So I found the good peers files by analysing all my 14 nodes, and the server restarted. Today I have checked the peers files but find no evidence of a mistake.

Sincerely,
-- Pierre Léonard, Senior IT Manager, MetaGenoPolis, pierre.leon...@jouy.inra.fr, Tél. : +33 (0)1 34 65 29 78, Centre de recherche INRA, Domaine de Vilvert, Bât. 325 R+1, 78 352 Jouy-en-Josas CEDEX, France, www.mgps.eu
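A sketch for auditing the peer files across the pool (assuming the layout shown above: one file per remote peer, with the local node's own UUID in glusterd.info):

ls /var/lib/glusterd/peers/
for f in /var/lib/glusterd/peers/*; do echo "== $f"; cat "$f"; done
cat /var/lib/glusterd/glusterd.info   # this node's own uuid
# each peer file's uuid should match the glusterd.info on that peer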
Re: [Gluster-users] [Gluster-devel] In a replica 2 server, file-updates on one server missing on the other server #Personal#
I found out the reason this happens a few days back - just to let you know. It seems it has partly to do with the way we handle reboots on our setup. When we take down one of our replica servers (for testing/maintenance), to ensure that the bricks are unmounted correctly, we kill off the glusterfsd processes (short of stopping the volume and causing service disruption to the mount clients). Let us assume that serv1 is being rebooted. When we kill off glusterfsd:

For file-systems that are normally not accessed:
1. The ping between the mount client on serv0 and the brick's glusterfsd on serv1 times out. On our system, this ping is configured at 10 seconds.
2. At this point, the mount client on serv0 destroys the now-defunct TCP connection and starts querying the remote glusterd process for the remote brick's port.
3. But since by this time serv1 is already down, no response arrives, and the local mount client retries the query until serv1 is up once more, upon which the glusterd on serv1 responds with the newly allocated port number for the brick, and a new connection is thus established.

For frequently accessed file-systems:
1. It is one of the file operations (read/write) that times out. This happens much earlier than 10 seconds. This results in the connection being destroyed and the mount client on serv0 querying the remote glusterd for the remote brick's port number.
2. Because this happens so quickly, glusterd on serv1 is not yet down, and is also unaware that the local brick is no longer alive. So it returns the port number of the dead process.
3. For the mount client on serv0, since the query succeeded, it does not attempt another port query, but instead tries to connect to the stale port number ad infinitum.

Our solution to this problem is simple - before we kill glusterfsd and unmount the bricks, we stop glusterd:

/etc/init.d/glusterd stop

This ensures that the portmap queries by the mount client on serv0 are never honored.

Thanks, Anirban
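The maintenance ordering described above, as a sketch (the initscript path is the one used in this thread and the brick mount point is illustrative; both will differ per setup):

/etc/init.d/glusterd stop    # first, so brick-port queries from remote mount
                             # clients fail fast instead of being answered
                             # with a soon-to-be-stale port
pkill glusterfsd             # then kill the brick processes
umount /mnt/bricks/replicated_vol   # and unmount the bricks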
Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
Something's wrong with the configuration data in /var/lib/glusterd. Try running glusterd with debug:

glusterd --debug

It might give more details.

From: Pierre Léonard pleon...@jouy.inra.fr
To: gluster-users@gluster.org
Date: 02/20/2015 08:08 PM
Subject: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
Sent by: gluster-users-boun...@gluster.org

Hi All,

I have a problem restarting the glusterd service, release 3.4.2. Some of my 14 nodes (CentOS 6.5 and 6.6) have stopped the service, and when I try to restart it I get this message in etc-glusterfs-glusterd.vol.log:

[root@xstoocky10 glusterfs]# cat etc-glusterfs-glusterd.vol.log
[2015-02-20 14:31:22.094851] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.2 (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2015-02-20 14:31:22.099381] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
[2015-02-20 14:31:22.103021] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
[2015-02-20 14:31:22.103056] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
[2015-02-20 14:31:22.103949] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-02-20 14:31:22.103980] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device
[2015-02-20 14:31:22.103995] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-02-20 14:31:22.104080] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-02-20 14:31:24.177993] I [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
[2015-02-20 14:31:24.189994] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2015-02-20 14:31:24.190047] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2015-02-20 14:31:24.190070] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2015-02-20 14:31:24.190090] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2015-02-20 14:31:24.190109] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2015-02-20 14:31:24.190128] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
[2015-02-20 14:31:24.190147] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-6
[2015-02-20 14:31:24.190166] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-7
[2015-02-20 14:31:24.190185] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-8
[2015-02-20 14:31:24.190203] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-9
[2015-02-20 14:31:24.190222] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-10
[2015-02-20 14:31:24.190242] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-11
[2015-02-20 14:31:24.190261] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-12
[2015-02-20 14:31:24.190280] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-13
[2015-02-20 14:31:24.630365] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2015-02-20 14:31:24.630416] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2015-02-20 14:31:24.630439] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2015-02-20 14:31:24.630460] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2015-02-20 14:31:24.630479] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2015-02-20 14:31:24.630499] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
[2015-02-20 14:31:24.630518] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-6
[2015-02-20 14:31:24.630538] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-7
[2015-02-20 14:31:24.630557] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-8
[2015-02-20 14:31:24.630577] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-9
[2015-02-20 14:31:24.630597] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-10
[2015-02-20 14:31:24.630617] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-11
[2015-02-20 14:31:24.630636] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-12
[2015-02-20 14:31:24.630668] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-:
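The "Unknown key: brick-N" messages come from glusterd parsing the volume's stored configuration on restore, so comparing that file against a healthy node's copy is a reasonable first check; a sketch (volume name and peer hostname are placeholders):

cat /var/lib/glusterd/vols/<VOLNAME>/info
# diff against the same file on a node whose glusterd starts cleanly
ssh <healthy-node> cat /var/lib/glusterd/vols/<VOLNAME>/info > /tmp/info.good
diff /tmp/info.good /var/lib/glusterd/vols/<VOLNAME>/info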
Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05
That's funny. What's your glusterd version? glusterd --version

-----Pierre Léonard pleon...@jouy.inra.fr wrote: -----
To: A Ghoshal a.ghos...@tcs.com, gluster-users@gluster.org
From: Pierre Léonard pleon...@jouy.inra.fr
Date: 02/20/2015 09:19 PM
Subject: Re: [Gluster-users] pb glusterfs 3.4.2 built on Jan 3 2014 12:38:05

Hi Ghoshal,

> Something's wrong with the configuration data in /var/lib/glusterd. Try running glusterd with debug: glusterd --debug. It might give more details.

OK, I found some bad iptables rules and stopped them, but that does not solve the problem. It seems to load an rdma transport, /usr/lib64/glusterfs/3.5.3/rpc-transport/rdma.so, which is not the right release; I also have a folder with /usr/lib64/glusterfs/3.6.1/rpc-transport/rdma.so. The debug listing follows:

[root@xstoocky06 ~]# glusterd --debug
[2015-02-20 15:34:46.367984] I [glusterfsd.c:1959:main] 0-glusterd: Started running glusterd version 3.5.3 (glusterd --debug)
[2015-02-20 15:34:46.368295] D [glusterfsd.c:596:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2015-02-20 15:34:46.419687] I [glusterd.c:1122:init] 0-management: Using /var/lib/glusterd as working directory
[2015-02-20 15:34:46.419820] D [glusterd.c:345:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128
[2015-02-20 15:34:46.420519] D [rpcsvc.c:2183:rpcsvc_init] 0-rpc-service: RPC service inited.
[2015-02-20 15:34:46.420544] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2015-02-20 15:34:46.420601] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.5.3/rpc-transport/socket.so
[2015-02-20 15:34:46.424797] I [socket.c:3645:socket_init] 0-socket.management: SSL support is NOT enabled
[2015-02-20 15:34:46.424831] I [socket.c:3660:socket_init] 0-socket.management: using system polling thread
[2015-02-20 15:34:46.424853] D [name.c:557:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet
[2015-02-20 15:34:46.424968] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.5.3/rpc-transport/rdma.so
[2015-02-20 15:34:46.448374] D [rpc-transport.c:300:rpc_transport_load] 0-rpc-transport: dlsym (gf_rpc_transport_reconfigure) on /usr/lib64/glusterfs/3.5.3/rpc-transport/rdma.so: undefined symbol: reconfigure
librdmacm: Warning: couldn't read ABI version.
librdmacm: Warning: assuming: 4
librdmacm: Fatal: unable to get RDMA device list
[2015-02-20 15:34:46.448538] W [rdma.c:4194:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-02-20 15:34:46.448557] E [rdma.c:4482:init] 0-rdma.management: Failed to initialize IB Device
[2015-02-20 15:34:46.448570] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-02-20 15:34:46.448680] W [rpcsvc.c:1535:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-02-20 15:34:46.448703] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[2015-02-20 15:34:46.448718] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli read-only, Num: 1238463, Ver: 2, Port: 0
[2015-02-20 15:34:46.448731] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[2015-02-20 15:34:46.448744] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[2015-02-20 15:34:46.448757] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Handshake, Num: 14398633, Ver: 2, Port: 0
[2015-02-20 15:34:46.448769] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster MGMT Handshake, Num: 1239873, Ver: 1, Port: 0
[2015-02-20 15:34:46.448828] D [rpcsvc.c:2183:rpcsvc_init] 0-rpc-service: RPC service inited.
[2015-02-20 15:34:46.448843] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2015-02-20 15:34:46.448868] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.5.3/rpc-transport/socket.so
[2015-02-20 15:34:46.448956] D [socket.c:3533:socket_init] 0-socket.management: disabling nodelay
[2015-02-20 15:34:46.448980] I [socket.c:3645:socket_init] 0-socket.management: SSL support is NOT enabled
[2015-02-20 15:34:46.448996] I [socket.c:3660:socket_init] 0-socket.management: using system polling thread
[2015-02-20 15:34:46.449100] D [rpcsvc.c:1812:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli
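Given that a 3.5.3 glusterd is loading xlators while a 3.6.1 directory also exists on disk, checking for a partial upgrade seems worthwhile; a sketch for an RPM-based system such as CentOS:

glusterd --version
rpm -qa | grep -i glusterfs     # all installed gluster packages
ls /usr/lib64/glusterfs/        # more than one version directory here
                                # suggests a mixed installation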
Re: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
Thank you, Krutika. We are currently planning to migrate our system to 3.5.3; it should be done in a month.

If you look at my follow-up mail, though, and also at http://www.gluster.org/pipermail/gluster-users/2015-February/020519.html (another thread I started some time back), I have since found out that they're basically the same problem. What I found out was this. I have the following setup:

Volume Name: replicated_vol
Type: Replicate
Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serv0:/mnt/bricks/replicated_vol/brick
Brick2: serv1:/mnt/bricks/replicated_vol/brick
Options Reconfigured:
diagnostics.client-log-level: INFO
network.ping-timeout: 10
nfs.enable-ino32: on
cluster.self-heal-daemon: on
nfs.disable: off

replicated_vol is mounted using mount.glusterfs at /mnt/replicated_vol on both servers. Using netstat, I found that while the mount client (/usr/sbin/glusterfs) on serv1 was connected to three ports (local glusterd, and local and remote glusterfsd), the mount client on serv0 was connected only to the local glusterfsd and glusterd. In effect, none of the write requests serviced by the mount client on serv0 were being sent to the glusterfsd on serv1. All writes were transferred from serv0 to serv1 only later, by the self-heal daemon, once every cluster.heal-timeout.

More investigation revealed the following: the mount client on serv0 had stale port information for the listen port of glusterfsd on serv1. On Jan 30, serv1 underwent a reboot, following which the brick port on it changed, but the mount client on serv0 was never made aware of it and continued to attempt connection on the old port number every 3 seconds (also filling up my /var/log in the process). More technical details may be found in the email link I pasted above. I'd greatly appreciate some advice on what the next thing to look for should be. Also, we do not have a firewall on our servers - they're only test setups, not production.

Thanks again, Anirban

From: Krutika Dhananjay kdhan...@redhat.com
To: A Ghoshal a.ghos...@tcs.com
Cc: gluster-users@gluster.org
Date: 02/05/2015 05:44 PM
Subject: Re: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)

> From: A Ghoshal a.ghos...@tcs.com
> To: gluster-users@gluster.org
> Sent: Tuesday, February 3, 2015 12:00:15 AM
> Subject: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
>
> Hello, I have a replica-2 volume in which I store a large number of files that are updated frequently (critical log files, etc). My files are generally stable, but one thing that does worry me from time to time is that files show up on one of the bricks in the output of gluster v <volname> heal info. These entries disappear on their own after a while (I am guessing when cluster.heal-timeout expires and another heal by the self-heal daemon is triggered). For certain files, this could be a bit of a bother - in terms of fault tolerance...

In 3.4.x, even files that are currently undergoing modification will be listed in heal-info output. So this could be the reason why the file(s) disappear from the output after a while, in which case reducing cluster.heal-timeout might not solve the problem. Since 3.5.1, heal-info _only_ reports those files which are truly undergoing heal.

> I was wondering if there is a way I could force AFR to return write-completion to the application only _after_ the data is written to both replicas successfully (kind of, like, atomic writes) - even if it were at the cost of performance. This way I could ensure that my bricks shall always be in sync.

AFR has always returned write-completion status to the application only _after_ the data is written to all replicas. The appearance of files under modification in heal-info output might have led you to think the changes have not (yet) been synced to the other replica(s).

> The other thing I could possibly do is reduce my cluster.heal-timeout (it is 600 currently). Is it a bad idea to set it to something as small as, say, 60 seconds for volumes where redundancy is a prime concern? One question, though - is heal through the self-heal daemon accomplished using separate threads for each replicated volume, or is it a single thread for every volume? The reason I ask is I have a large number of replicated file-systems on each volume (17, to be precise) but I do have a reasonably powerful multicore processor array and large RAM, and top indicates the load on the system resources is quite moderate.

There is an infra piece called syncop in gluster using which multiple heal jobs are handled by a handful of threads. The maximum it can scale up to is 16, depending on the load. It is safe to assume that there will be one healer thread per replica set. But if the load is not too high, just 1 thread may do all
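The netstat check described above, as a sketch: a healthy fuse client of this replica-2 volume should hold three ESTABLISHED connections - local glusterd on 24007 plus one per brick (the pgrep pattern assumes the mount command line shown in this thread):

MNT_PID=$(pgrep -f 'glusterfs --volfile-id=replicated_vol' | head -1)
netstat -tnp 2>/dev/null | grep "$MNT_PID/glusterfs"
# expect: one line to <host>:24007 and one line per brick port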
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Sorry for spamming you guys, but this is kind of important for me to debug, so if you have seen anything like this before, do let me know. Here's an update: it seems the mount client is attempting connection with an invalid port number - 49175 is NOT the port number of glusterfsd on serv1 (192.168.24.81). I got an strace:

[pid 31026] open("/proc/sys/net/ipv4/ip_local_reserved_ports", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31026] write(4, "[2015-02-04 20:39:02.793154] W [...", 215) = 215
[pid 31026] write(4, "[2015-02-04 20:39:02.793289] W [...", 194) = 194
[pid 31026] bind(10, {sa_family=AF_INET, sin_port=htons(1023), sin_addr=inet_addr("192.168.24.80")}, 16) = 0
[pid 31026] fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
[pid 31026] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 31026] connect(10, {sa_family=AF_INET, sin_port=htons(49175), sin_addr=inet_addr("192.168.24.81")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 31026] fcntl(10, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 31026] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 31026] epoll_ctl(3, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLPRI|EPOLLOUT, {u32=10, u64=8589934602}}) = 0
[pid 31026] nanosleep({1, 0}, <unfinished ...>
[pid 31021] <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=10, u64=8589934602}}}, 257, 4294967295) = 1
[pid 31021] getsockopt(10, SOL_SOCKET, SO_ERROR, [29422518842425455], [4]) = 0
[pid 31021] shutdown(10, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)

This basically told me that the connection is attempted via a non-blocking socket to port 49175. The errno from the failure is -ECONNREFUSED, which is what is expected:

807 in socket.c
(gdb) bt
#0 __socket_connect_finish (this=0x6887a0) at socket.c:807
#1 socket_connect_finish (this=0x6887a0) at socket.c:2147
#2 0x7fc863de4c04 in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x6887a0, poll_in=1, poll_out=4, poll_err=<value optimized out>) at socket.c:2223
#3 0x7fc867f7919f in event_dispatch_epoll_handler (event_pool=0x62db70) at event-epoll.c:384
#4 event_dispatch_epoll (event_pool=0x62db70) at event-epoll.c:445
#5 0x00406b06 in main (argc=4, argv=0x7fff25302c38) at glusterfsd.c:1934
(gdb) print *optval
Cannot access memory at address 0x6f
(gdb) print optval
$1 = 111

Note that this agrees with the following debug log:

[2015-02-03 12:11:33.833647] D [socket.c:1962:__socket_proto_state_machine] 0-replicated_vol-1: reading from socket failed. Error (No data available), peer (192.168.24.81:49175)

There is, of course, no service running on port 49175. In fact, the listen port for the corresponding glusterfsd on serv1 is 49206. Where does the mount client pick this port number from? I know that if I kill and restart the mount client on serv0 from the command line, the problem will disappear. So it's not something that is up with the processes on serv1...

Thanks, Anirban

From: A Ghoshal/MUM/TCS
To: A Ghoshal a.ghos...@tcs.com
Cc: gluster-users@gluster.org, gluster-users-boun...@gluster.org, Pranith Kumar Karampuri pkara...@redhat.com
Date: 02/05/2015 02:03 AM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Sent by: A Ghoshal

Ok, more updates here: I turned on trace, and it seems the bind to a secure port by the mount client for the remote brick is successful - afterwards the connect() fails to complete. I saw these logs:

[2015-02-03 12:11:33.832615] T [rpc-clnt.c:422:rpc_clnt_reconnect] 0-replicated_vol-1: attempting reconnect
[2015-02-03 12:11:33.832666] D [name.c:155:client_fill_address_family] 0-replicated_vol-1: address-family not specified, guessing it to be inet from (remote-host: serv1)
[2015-02-03 12:11:33.832683] T [name.c:225:af_inet_client_get_remote_sockaddr] 0-replicated_vol-1: option remote-port missing in volume replicated_vol-1. Defaulting to 24007
[2015-02-03 12:11:33.833083] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-192.168.24.81 (port-24007) for hostname: serv1 and port: 24007
[2015-02-03 12:11:33.833113] T [socket.c:731:__socket_nodelay] 0-replicated_vol-1: NODELAY enabled for socket 10
[2015-02-03 12:11:33.833128] T [socket.c:790:__socket_keepalive] 0-replicated_vol-1: Keep-alive enabled for socket 10, interval 2, idle: 20
[2015-02-03 12:11:33.833188] W [common-utils.c:2247:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info (No such file or directory)
[2015-02-03 12:11:33.833204] W [common-utils.c:2280:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
[2015-02-03 12:11:33.833560] D [socket.c:605:__socket_shutdown] 0-replicated_vol-1: shutdown() returned -1. Transport endpoint is not connected
[2015-02-03 12:11
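To reproduce a capture like the one above on another setup, attaching strace to the mount client and filtering for socket calls works; a sketch (the PID is that of the /usr/sbin/glusterfs mount process for the volume):

strace -f -e trace=network -p <MOUNT_CLIENT_PID> 2>&1 | grep -E 'bind|connect'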
Re: [Gluster-users] [Gluster-devel] In a replica 2 server, file-updates on one server missing on the other server #Personal#
CC gluster-users. No, there aren't any firewall rules on our server. As I wrote in one of my earlier emails, if I kill the mount client and remount the volume, the problem disappears. That is to say, this causes the client to refresh the remote port data, and from there everything's fine. Also, we don't use gfapi - and bind() is always good.

From: Ben England bengl...@redhat.com
To: A Ghoshal a.ghos...@tcs.com
Date: 02/05/2015 04:40 AM
Subject: Re: [Gluster-devel] [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

Could it be a problem with iptables blocking connections? Do iptables --list and make sure gluster ports are allowed through, at both ends. Also, if you are using libgfapi, be sure you use rpc-auth-allow-insecure if you have a lot of gfapi instances, or else you'll run into problems.

----- Original Message -----
From: A Ghoshal a.ghos...@tcs.com
To: Ben England bengl...@redhat.com
Sent: Wednesday, February 4, 2015 6:07:10 PM
Subject: Re: [Gluster-devel] [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

Thanks, Ben, same here :/ I actually get port numbers for glusterfsd in any of three ways:

1. gluster volume status VOLNAME
2. the command line of glusterfsd on the target server.
3. if you're really paranoid, get the glusterfsd PID and use netstat.

Looking at the code, it seems to me that the whole thing operates on a statd-notify paradigm. Your local mount client registers for notification on all remote glusterfsd's. When a remote brick goes down and comes back up, you are notified, and the client then calls portmap to obtain the remote glusterfsd port. I see here that both glusterd's are up, but somehow the port number of the remote glusterfsd held by the mount client is now stale - I'm not sure how that happens. Now the client keeps trying to connect on the stale port every 3 seconds. It gets the return errno -111 (-ECONNREFUSED), which clearly indicates that there is no listener at this port on the remote host's IP. Design-wise, could this indicate to the mount client that the port number information needs to be refreshed? Would you say this is a bug of sorts?

From: Ben England bengl...@redhat.com
To: A Ghoshal a.ghos...@tcs.com
Date: 02/05/2015 03:59 AM
Subject: Re: [Gluster-devel] [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

I thought Gluster was based on ONC RPC, which means there are no fixed port numbers except for glusterd (24007). The client connects to glusterd, reads the volfile, and gets the port numbers of the registered glusterfsd processes at that time, then it connects to glusterfsd. Make sense? What you need to know is whether glusterfsd is running or not, and whether glusterd is finding out the current state of glusterfsd. /var/log/glusterfs/bricks/*log has log files for each glusterfsd process; you might be able to see from those what the glusterfsd port number is. /var/log/glusterfs/etc*log is glusterd's log file; it might say whether glusterd knows about glusterfsd. I'm not as good at troubleshooting as some of the other people are, so don't take my word for it.

-ben
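A sketch combining Ben's two suggestions - confirm what port the brick actually advertises, and that nothing filters it (the volume name is the one from this thread):

gluster volume status replicated_vol   # authoritative list of brick ports
iptables -L -n                         # nothing should block 24007 or the
                                       # brick ports (49152 and up)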
Re: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
It seems I found out what goes wrong here - and this was useful learning for me. On one of the replica servers, the client mount did not have an open port to communicate with the other glusterfsd process. To illustrate:

root@serv1:/root# ps -ef | grep replicated_vol
root 30627 1 0 Jan29 ? 00:17:30 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv1 /mnt/replicated_vol
root 31132 18322 0 23:04 pts/1 00:00:00 grep _opt_kapsch_cnp_data_memusage
root 31280 1 0 06:32 ? 00:09:10 /usr/sbin/glusterfsd -s serv1 --volfile-id replicated_vol.serv1.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv1-mnt-bricks-replicated_vol-brick.pid -S /var/run/4d70e99b47c1f95cc2eab1715d3a9b67.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=c7930be6-969f-4f62-b119-c5bbe4df22a3 --brick-port 49172 --xlator-option replicated_vol.listen-port=49172

root@serv1:/root# netstat -p | grep 30627
tcp 0 0 serv1:715 serv1:24007 ESTABLISHED 30627/glusterfs   <= client - local glusterd
tcp 0 0 serv1:863 serv1:49172 ESTABLISHED 30627/glusterfs   <= client - local brick
root@serv1:/root#

However, the client on the other server did have a port open to the mate brick, and so whatever one wrote on the other server synced over immediately:

root@serv0:/root# ps -ef | grep replicated_vol
root 12761 7556 0 23:05 pts/1 00:00:00 replicated_vol
root 15067 1 0 06:32 ? 00:04:50 /usr/sbin/glusterfsd -s serv1 --volfile-id replicated_vol.serv1.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv1-mnt-bricks-replicated_vol-brick.pid -S /var/run/f642d7dbff0ab7a475a23236f6f50b33.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=13df1bd2-6dc8-49fa-ade0-5cd95f6b1f19 --brick-port 49209 --xlator-option replicated_vol.listen-port=49209
root 30587 1 0 Jan30 ? 00:12:17 /usr/sbin/glusterfs --volfile-id=serv --volfile-server=serv0 /mnt/replicated_vol

root@serv0:/root# netstat -p | grep 30587
tcp 0 0 serv0:859 serv1:49172 ESTABLISHED 30587/glusterfs   <= client - remote brick
tcp 0 0 serv0:746 serv0:24007 ESTABLISHED 30587/glusterfs   <= client - glusterd
tcp 0 0 serv0:857 serv0:49209 ESTABLISHED 30587/glusterfs   <= client - local brick
root@serv0:/root#

So the client has no open TCP link with the mate brick - which is why it cannot write to the mate brick directly, and instead has to rely on the self-heal daemon to do the job. Of course, I now need to debug why the connection fails, but at least AFR is in the clear. Thanks, everyone.

From: A Ghoshal a.ghos...@tcs.com
To: gluster-users@gluster.org
Date: 02/03/2015 12:00 AM
Subject: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
Sent by: gluster-users-boun...@gluster.org

Hello, I have a replica-2 volume in which I store a large number of files that are updated frequently (critical log files, etc). My files are generally stable, but one thing that does worry me from time to time is that files show up on one of the bricks in the output of gluster v <volname> heal info. These entries disappear on their own after a while (I am guessing when cluster.heal-timeout expires and another heal by the self-heal daemon is triggered). For certain files, this could be a bit of a bother - in terms of fault tolerance...

I was wondering if there is a way I could force AFR to return write-completion to the application only _after_ the data is written to both replicas successfully (kind of, like, atomic writes) - even if it were at the cost of performance. This way I could ensure that my bricks shall always be in sync. The other thing I could possibly do is reduce my cluster.heal-timeout (it is 600 currently). Is it a bad idea to set it to something as small as, say, 60 seconds for volumes where redundancy is a prime concern? One question, though - is heal through the self-heal daemon accomplished using separate threads for each replicated volume, or is it a single thread for every volume? The reason I ask is I have a large number of replicated file-systems on each volume (17, to be precise) but I do have a reasonably powerful multicore processor array and large RAM, and top indicates the load on the system resources is quite moderate.
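Until the root cause is found, the workaround mentioned elsewhere in this thread is to restart the stale mount client so it re-queries the brick port; a sketch (this briefly interrupts applications using the mount):

umount /mnt/replicated_vol
mount -t glusterfs serv0:/replicated_vol /mnt/replicated_vol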
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Hi Pranith,

I finally understood what you meant about the secure ports, because the issue occurred on one of our setups once more. It seems one of the clients on serv1 could not open a connection to the glusterfsd running on serv0. I'd actually started a mail trail about it (believing, initially, that it might be something else) here: http://www.gluster.org/pipermail/gluster-users/2015-February/020465.html

I think I can write a rudimentary patch altering af_inet_bind_to_port_lt_ceiling() to call bind with port 0, rather than specify a port explicitly, when client.bind-insecure is specified... Then I'd need a way to set server.allow-insecure using the cli (or, if you have already sent around the patch to do that as you said in the earlier mail, do let me know). I'll keep you posted about it here or at [gluster-devel] if I can get it to work.

Thanks a lot, Anirban

From: A Ghoshal/MUM/TCS
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com
Date: 01/23/2015 02:45 PM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Sent by: A Ghoshal

Oh, I didn't. I only read a fragment of the IRC log and assumed --xlator-option would be enough. Apparently it's a lot more work. I do have a query, though: these connections, from one of our setups - are these on secure ports? Or maybe I didn't get it the first time.

root@serv0:/root# ps -ef | grep replicated_vol
root 8851 25307 0 10:03 pts/2 00:00:00 grep replicated_vol
root 29751 1 4 Jan21 ? 01:47:20 /usr/sbin/glusterfsd -s serv0 --volfile-id replicated_vol.serv0.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/_replicated_vol/run/serv0-mnt-bricks-replicated_vol-brick.pid -S /var/run/dff9fa3c93e82f20103f2a3d91adc4a8.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-brick.log --xlator-option *-posix.glusterd-uuid=1a1d1ebc-4b92-428f-b66b-9c5efa49574d --brick-port 49185 --xlator-option replicated_vol-server.listen-port=49185
root 30399 1 0 Jan21 ? 00:19:06 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv0 /mnt/replicated_vol

root@serv0:/root# netstat -p | grep 30399
tcp 0 0 serv0:969 serv0:49185 ESTABLISHED 30399/glusterfs
tcp 0 0 serv0:999 serv1:49159 ESTABLISHED 30399/glusterfs
tcp 0 0 serv0:1023 serv0:24007 ESTABLISHED 30399/glusterfs
root@serv0:/root#

Thanks again, Anirban

From: Pranith Kumar Karampuri pkara...@redhat.com
To: A Ghoshal a.ghos...@tcs.com
Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com
Date: 01/23/2015 01:58 PM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

On 01/23/2015 01:54 PM, A Ghoshal wrote:
> Thanks a lot, Pranith. We'll set this option on our test servers and keep the setup under observation. How did you get the bind-insecure option working?

I guess I will send a patch to make it a 'volume set' option.

Pranith

> Thanks, Anirban

From: Pranith Kumar Karampuri pkara...@redhat.com
To: A Ghoshal a.ghos...@tcs.com
Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com
Date: 01/23/2015 01:28 PM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

On 01/22/2015 02:07 PM, A Ghoshal wrote:
> Hi Pranith, Yes, the very same (chalcogen_eg_oxy...@yahoo.com). Justin Clift sent me a mail a while back telling me that it is better if we all use our business email addresses, so I made myself a new profile. Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports because we use a really old Linux kernel (2.6.34) in which this feature is not present. We keep planning to upgrade our Linux, but each time we are dissuaded from it by some compatibility issue or other. So we get this log every time - on both good volumes and bad ones. What bothered me was this (on serv1):

Basically, to make connections to servers (i.e. bricks), clients need to choose secure ports, i.e. ports less than 1024. Since this file is not present, it is not binding to any port, as per the code I just checked. There is an option called client-bind-insecure which bypasses this check. I feel that is one way (probably the only way) to get around this. You have to set the server.allow-insecure on option and the bind-insecure option. CC ndevos, who seems to have helped someone set the bind-insecure option correctly here (http://irclog.perlgeek.de/gluster/2014-04-09/text).

Pranith

> [2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> [2015-01-20 09:37
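For reference, a sketch of the server-side settings discussed above (the per-volume option is set with the CLI; the glusterd-level option goes in its volfile):

# per volume:
gluster volume set replicated_vol server.allow-insecure on
# for glusterd itself, add to /etc/glusterfs/glusterd.vol and restart glusterd:
#   option rpc-auth-allow-insecure on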
[Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
Hello,

I have a replica-2 volume in which I store a large number of files that are updated frequently (critical log files, etc). My files are generally stable, but one thing that does worry me from time to time is that files show up on one of the bricks in the output of gluster v <volname> heal info. These entries disappear on their own after a while (I am guessing when cluster.heal-timeout expires and another heal by the self-heal daemon is triggered). For certain files, this could be a bit of a bother - in terms of fault tolerance...

I was wondering if there is a way I could force AFR to return write-completion to the application only _after_ the data is written to both replicas successfully (kind of, like, atomic writes) - even if it were at the cost of performance. This way I could ensure that my bricks shall always be in sync. The other thing I could possibly do is reduce my cluster.heal-timeout (it is 600 currently). Is it a bad idea to set it to something as small as, say, 60 seconds for volumes where redundancy is a prime concern?

One question, though - is heal through the self-heal daemon accomplished using separate threads for each replicated volume, or is it a single thread for every volume? The reason I ask is that I have a large number of replicated file-systems on each volume (17, to be precise), but I do have a reasonably powerful multicore processor array and large RAM, and top indicates the load on the system resources is quite moderate.

Thanks, Anirban
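A sketch of the commands behind the two ideas above (the volume name is a placeholder; lowering cluster.heal-timeout is the option being weighed in the question, not a recommendation):

gluster volume heal <VOLNAME> info                     # entries currently pending heal
gluster volume set <VOLNAME> cluster.heal-timeout 60   # the 60-second value discussed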
Re: [Gluster-users] Mount failed
The logs you show me are reminiscent of an issue I once faced, and all it turned out to be was that service glusterd was not running. Did you check that out? Sorry if I'm stating the obvious, though. ;) Thanks, Anirban -Bartłomiej Syryjczyk bsyryjc...@kamsoft.pl wrote: - === To: From: Bartłomiej Syryjczyk bsyryjc...@kamsoft.pl Date: 01/26/2015 02:28PM Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Mount failed === On 2015-01-22 at 17:37, A Ghoshal wrote: Maybe start the mount daemon from the shell, like this? /usr/sbin/glusterfs --debug --volfile-server=glnode1 --volfile-id=/testvol /mnt/gluster You could get some useful debug data on your terminal. However, it's more likely you have a configuration-related problem here. So the output of the following might also help: ls -la /var/lib/glusterd/vols/glnode1/ When I start the mount daemon from the shell, it works fine. After mounting with -o log-level=DEBUG, I can see this: --- [...] [2015-01-26 08:19:48.191675] D [fuse-bridge.c:4817:fuse_thread_proc] 0-glusterfs-fuse: terminating upon getting ENODEV when reading /dev/fuse [2015-01-26 08:19:48.191702] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /mnt/gluster [2015-01-26 08:19:48.191759] D [MSGID: 0] [dht-diskusage.c:96:dht_du_info_cbk] 0-testvol-dht: subvolume 'testvol-replicate-0': avail_percent is: 83.00 and avail_space is: 19188232192 and avail_inodes is: 99.00 [2015-01-26 08:19:48.191801] D [logging.c:1740:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages [2015-01-26 08:19:48.191822] D [logging.c:1743:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages [2015-01-26 08:19:48.192118] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down [2015-01-26 08:19:48.192137] D [glusterfsd-mgmt.c:2244:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given [2015-01-26 08:19:48.192145] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/mnt/gluster'. --- And it won't work. Output of a few commands: --- [root@apache2 ~]# lsmod|grep fuse fuse 75687 3 [root@apache2 ~]# ls -l /dev/fuse crw-rw-rw- 1 root root 10, 229 Jan 26 09:19 /dev/fuse [root@apache2 ~]# ls -la /var/lib/glusterd/vols/testvol/ total 48 drwxr-xr-x 4 root root 4096 Jan 26 09:09 . drwxr-xr-x. 3 root root 20 Jan 26 07:11 .. drwxr-xr-x 2 root root 48 Jan 26 08:02 bricks -rw------- 1 root root 16 Jan 26 09:09 cksum -rw------- 1 root root 545 Jan 26 08:02 info -rw------- 1 root root 93 Jan 26 08:02 node_state.info -rw------- 1 root root 18 Jan 26 09:09 quota.cksum -rw------- 1 root root 0 Jan 26 07:37 quota.conf -rw------- 1 root root 12 Jan 26 08:02 rbstate drwxr-xr-x 2 root root 30 Jan 26 09:09 run -rw------- 1 root root 13 Jan 26 08:02 snapd.info -rw------- 1 root root 1995 Jan 26 08:02 testvol.apache1.brick.vol -rw------- 1 root root 1995 Jan 26 08:02 testvol.apache2.brick.vol -rw------- 1 root root 1392 Jan 26 08:02 testvol-rebalance.vol -rw------- 1 root root 1392 Jan 26 08:02 testvol.tcp-fuse.vol -rw------- 1 root root 1620 Jan 26 08:02 trusted-testvol.tcp-fuse.vol --- -- Regards, Bartłomiej Syryjczyk
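[Editor's sketch] A quick sketch of the check suggested in the reply above, before retrying the mount (the init-script name may vary by distribution):

ps -ef | grep glusterd          # the management daemon should show up
service glusterd start          # or: /etc/init.d/glusterd start, if it is down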
Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
I am plagued with something of this sort, too! What I mostly see when I explore these things is that A) it's a split-brain, and B) the split-brain is because the gfids on the two replicas are at odds. You could check that out by: 1. On each server, first 'cd' to where your brick is mounted. 2. getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png You will see a trusted.gfid kind of extended attribute. If it's not the same on both servers, there's a problem. Thanks, Anirban -Tiago Santos ti...@musthavemenus.com wrote: - === To: gluster-users@gluster.org From: Tiago Santos ti...@musthavemenus.com Date: 01/26/2015 09:38PM Subject: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while === Hey guys, I'm experiencing this weird case for pretty much any command (ls, df, find, etc) I try to run against a Gluster client filesystem. Just for you guys to understand what I'm talking about, here is an easy and simple test I just ran: root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png Mon Jan 26 07:00:27 PST 2015 -rwx---r-- 1 mhmadmin mhmadmin 61K Jan 22 14:37 /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png real 0m33.651s user 0m0.001s sys 0m0.004s root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png Mon Jan 26 07:01:03 PST 2015 ls: cannot access /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png: Input/output error real 1m40.241s user 0m0.000s sys 0m0.003s root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png Mon Jan 26 07:02:51 PST 2015 ls: cannot access /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png: Input/output error real 0m12.834s user 0m0.000s sys 0m0.003s root@web3:~# date; time ls -ltrh /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png Mon Jan 26 07:03:10 PST 2015 -rwx---r-- 1 mhmadmin mhmadmin 61K Jan 22 14:37 /var/www/site-images/templates/assets/prod/temporary/13/user_1339200.png real 2m10.150s user 0m0.000s sys 0m0.005s Sometimes it passes, but takes a really long time to run a simple command (this is a 61K file); sometimes I see the Input/output error. The important thing to mention is that this behavior happens almost all the time. I can quickly reproduce it if asked. This is a 2-node gluster setup. Both VMs act as Client and Server (sorry if I'm not using the correct gluster naming.. I've been getting to know it for weeks now). More info: # gluster --version glusterfs 3.5.3 built on Nov 18 2014 03:53:25 Repository revision: git://git.gluster.com/glusterfs.git # df -Th Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/data_vg-data_lv ext4 1007G 506G 451G 53% /export/images1-1 images1.mydomain.com:/site-images fuse.glusterfs 1007G 506G 451G 53% /var/www/site-images # uname -a Linux web3 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux # gluster volume info Volume Name: site-images Type: Replicate Volume ID: 68bca3c9-210c-45a9-b2bc-6a0e2ee630bb Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: images1.mydomain.com:/export/images1-1/brick Brick2: images2.mydomain.com:/export/images2-1/brick Would anyone help me identify what is going on here? Thanks in advance! 
-- Tiago Santos MustHaveMenus.com
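[Editor's sketch] A sketch of the gfid comparison described in the reply above, using the brick paths from the volume info; run it on each server and compare the trusted.gfid values byte for byte:

cd /export/images1-1/brick      # /export/images2-1/brick on the other server
getfattr -n trusted.gfid -e hex templates/assets/prod/temporary/13/user_1339200.png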
Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Actually, you ran getfattr on the volume - which is why the requisite extended attributes never showed up... Your bricks are mounted elsewhere: /export/images1-1/brick and /export/images2-1/brick. Btw, what version of Linux do you use? And, are the files you observe the input/output errors on soft-links? -Tiago Santos ti...@musthavemenus.com wrote: - === To: A Ghoshal a.ghos...@tcs.com From: Tiago Santos ti...@musthavemenus.com Date: 01/27/2015 12:20AM Cc: gluster-users gluster-users@gluster.org Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while === Thanks for your input, Anirban. I ran the commands on both servers, with the following results: root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png real 0m34.524s user 0m0.004s sys 0m0.000s root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error real 0m11.315s user 0m0.001s sys 0m0.003s root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error Not sure if it elucidates the issue.. Also, I saw at /var/log/gluster.log a zillion entries like these: [2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9616964 (00000000-0000-0000-0000-000000000000) [2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9594915 (00000000-0000-0000-0000-000000000000) [2015-01-26 17:35:39.973571] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9681971 (00000000-0000-0000-0000-000000000000) [2015-01-26 17:35:39.973686] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/19615 (00000000-0000-0000-0000-000000000000) [2015-01-26 17:35:39.973802] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/130392 (00000000-0000-0000-0000-000000000000) I have talked with some guys at #gluster who pointed out it could be network issues. I'm still looking into it, but since the issue also happens locally (within the same server), would that still be a valid point? Also, less often, I see entries like these: [2015-01-26 17:41:25.956418] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png [2015-01-26 17:41:26.588753] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png Are those a definitive indication of a split-brain? Or just something usual until self-heal takes care of recently updated files? On Mon, Jan 26, 2015 at 2:25 PM, A Ghoshal a.ghos...@tcs.com wrote: I am plagued with something of this sort, too! What I mostly see when I explore these things is that A) it's a split-brain. 
B) the split-brain is because the gfid's on the two replicas are at odds. You could check that out by 1. On each server, first 'cd' to where your brick is mounted. 2. getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png You will see a trusted.gfid kind of extended attribute. If it's not the same on both servers, there's a problem. Thanks, Anirban Regards, -- Tiago Santos MustHaveMenus.com
Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while
Yep, so it is indeed a split-brain caused by a mismatch of the trusted.gfid attribute. Sadly, I don't know precisely what causes it. Communication loss might be one of the triggers. I am guessing the files with the problem are dynamic, correct? In our setup (also replica 2), communication is never a problem, but we do see this when one of the servers takes a reboot. Maybe some obscure and difficult-to-understand race between background self-heal and the self-heal daemon... In any case, a normal procedure for split-brain recovery would work for you if you wish to get your files back in working order. It's easy to find on Google. I use the instructions on Joe Julian's blog page myself. -Tiago Santos ti...@musthavemenus.com wrote: - === To: A Ghoshal a.ghos...@tcs.com From: Tiago Santos ti...@musthavemenus.com Date: 01/27/2015 02:11AM Cc: gluster-users gluster-users@gluster.org Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while === Oh, right! Here are the outputs: root@web3:/export/images1-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png # file: templates/assets/prod/temporary/13/user_1339200.png trusted.afr.site-images-client-0=0x0004 trusted.afr.site-images-client-1=0x00020009 trusted.gfid=0x10e5894c474a4cb1898b71e872cdf527 real 0m0.024s user 0m0.001s sys 0m0.001s root@web4:/export/images2-1/brick# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png # file: templates/assets/prod/temporary/13/user_1339200.png trusted.afr.site-images-client-0=0x trusted.afr.site-images-client-1=0x trusted.gfid=0xd02f14fcb6724ceba4a330eb606910f3 real 0m0.003s user 0m0.000s sys 0m0.006s Not sure exactly what that means. I'm googling, and would appreciate it if you guys can shed some light. Thanks! -- Tiago On Mon, Jan 26, 2015 at 6:16 PM, A Ghoshal a.ghos...@tcs.com wrote: Actually, you ran getfattr on the volume - which is why the requisite extended attributes never showed up... Your bricks are mounted elsewhere: /export/images1-1/brick and /export/images2-1/brick. Btw, what version of Linux do you use? And, are the files you observe the input/output errors on soft-links? -Tiago Santos ti...@musthavemenus.com wrote: - === To: A Ghoshal a.ghos...@tcs.com From: Tiago Santos ti...@musthavemenus.com Date: 01/27/2015 12:20AM Cc: gluster-users gluster-users@gluster.org Subject: Re: [Gluster-users] Pretty much any operation related to Gluster mounted fs hangs for a while === Thanks for your input, Anirban. I ran the commands on both servers, with the following results: root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png real 0m34.524s user 0m0.004s sys 0m0.000s root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error real 0m11.315s user 0m0.001s sys 0m0.003s root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error
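[Editor's sketch] For completeness, a hedged sketch of the usual gfid split-brain recovery referred to above: pick the copy to discard, delete both the file and its .glusterfs hard-link on that brick only, and trigger a heal. The paths below assume web4's replica is the one being discarded; the hard-link location is derived from its trusted.gfid value shown above:

rm /export/images2-1/brick/templates/assets/prod/temporary/13/user_1339200.png
rm /export/images2-1/brick/.glusterfs/d0/2f/d02f14fc-b672-4ceb-a4a3-30eb606910f3
gluster volume heal site-images full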
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Oh, I didn't. I only read a fragment of the IRC log and assumed --xlator-option would be enough. Apparently it's a lot more work. I do have a query, though. These connections, from one of our setups - are these on secure ports? Or, maybe I didn't get it the first time. root@serv0:/root ps -ef | grep replicated_vol root 8851 25307 0 10:03 pts/2 00:00:00 grep replicated_vol root 29751 1 4 Jan21 ? 01:47:20 /usr/sbin/glusterfsd -s serv0 --volfile-id replicated_vol.serv0.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/_replicated_vol/run/serv0-mnt-bricks-replicated_vol-brick.pid -S /var/run/dff9fa3c93e82f20103f2a3d91adc4a8.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-brick.log --xlator-option *-posix.glusterd-uuid=1a1d1ebc-4b92-428f-b66b-9c5efa49574d --brick-port 49185 --xlator-option replicated_vol-server.listen-port=49185 root 30399 1 0 Jan21 ? 00:19:06 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv0 /mnt/replicated_vol root@serv0:/root netstat -p | grep 30399 tcp 0 0 serv0:969 serv0:49185 ESTABLISHED 30399/glusterfs tcp 0 0 serv0:999 serv1:49159 ESTABLISHED 30399/glusterfs tcp 0 0 serv0:1023 serv0:24007 ESTABLISHED 30399/glusterfs root@serv0:/root Thanks again, Anirban From: Pranith Kumar Karampuri pkara...@redhat.com To: A Ghoshal a.ghos...@tcs.com Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com Date: 01/23/2015 01:58 PM Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal# On 01/23/2015 01:54 PM, A Ghoshal wrote: Thanks a lot, Pranith. We'll set this option on our test servers and keep the setup under observation. How did you get the bind-insecure option working? I guess I will send a patch to make it a 'volume set' option Pranith Thanks, Anirban From: Pranith Kumar Karampuri pkara...@redhat.com To: A Ghoshal a.ghos...@tcs.com Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com Date: 01/23/2015 01:28 PM Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal# On 01/22/2015 02:07 PM, A Ghoshal wrote: Hi Pranith, Yes, the very same (chalcogen_eg_oxy...@yahoo.com). Justin Clift sent me a mail a while back telling me that it is better if we all use our business email addresses, so I made myself a new profile. Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports because we use a really old Linux kernel (2.6.34) wherein this feature is not present. We plan to upgrade our Linux every so often, but each time we are dissuaded from it by some compatibility issue or other. So, we get this log every time - on both good volumes and bad ones. What bothered me was this (on serv1): Basically, to make connections to the servers (i.e. the bricks), clients need to choose secure ports, i.e. ports less than 1024. Since this file is not present, it is not binding to any port as per the code I just checked. There is an option called client-bind-insecure which bypasses this check. I feel that is one way (probably the only way) to get around this. You have to set the server.allow-insecure on option via volume set, and the bind-insecure option. 
CC ndevos who seemed to have helped someone set bind-insecure option correctly here (http://irclog.perlgeek.de/gluster/2014-04-09/text) Pranith [2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0) [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: [2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0) When I write on the good server (serv1), we see that an RPC request is sent to both client
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Thanks a lot, Pranith. We'll set this option on our test servers and keep the setup under observation. Thanks, Anirban From: Pranith Kumar Karampuri pkara...@redhat.com To: A Ghoshal a.ghos...@tcs.com Cc: gluster-users@gluster.org, Niels de Vos nde...@redhat.com Date: 01/23/2015 01:28 PM Subject:Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal# On 01/22/2015 02:07 PM, A Ghoshal wrote: Hi Pranith, Yes, the very same (chalcogen_eg_oxy...@yahoo.com). Justin Clift sent me a mail a while back telling me that it is better if we all use our business email addresses so I made me a new profile. Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports because we use a really old Linux kernel (2.6.34) wherein this feature is not present. We plan to upgrade our Linux so often but each time we are dissuaded from it by some compatibility issue or the other. So, we get this log every time - on both good volumes and bad ones. What bothered me was this (on serv1): Basically to make the connections to servers i.e. bricks clients need to choose secure ports i.e. port less than 1024. Since this file is not present, it is not binding to any port as per the code I just checked. There is an option called client-bind-insecure which bypasses this check. I feel that is one (probably only way) to get around this. You have to volume set server.allow-insecure on option and bind-insecure option. CC ndevos who seemed to have helped someone set bind-insecure option correctly here (http://irclog.perlgeek.de/gluster/2014-04-09/text) Pranith [2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0) [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: [2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0) When I write on the good server (serv1), we see that an RPC request is sent to both client-0 and client-1. While, when I write on the bad server (serv0), the RPC request is sent only to client-0, which is why it is no wonder that the writes are not synced over to serv1. Somehow I could not make the daemon on serv0 understand that there are two up-children and not just one. One additional detail - since we are using a kernel that's too old, we do not have the (Anand Avati's?) FUse readdirplus patches, either. I've noticed that the fixes in the readdirplus version of glusterfs aren't always guaranteed to be present on the non-readdirplus version of the patches. 
I'd filed a bug around one such anomaly a while back, but never got around to writing a patch for it (sorry!) Here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1062287 I don't think this has anything to do with readdirplus. Maybe something on similar lines here? Thanks, Anirban P.s. Please ignore the #Personal# in the subject line - we need to do that to push mails to the public domain past the email filter safely. From: Pranith Kumar Karampuri pkara...@redhat.com To: A Ghoshal a.ghos...@tcs.com, gluster-users@gluster.org Date: 01/22/2015 12:09 AM Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server hi, Responses inline. PS: You are chalkogen_oxygen? Pranith On 01/20/2015 05:34 PM, A Ghoshal wrote: Hello, I am using the following replicated volume: root@serv0:~ gluster v info replicated_vol Volume Name: replicated_vol Type: Replicate Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: serv0:/mnt/bricks/replicated_vol/brick Brick2: serv1:/mnt/bricks/replicated_vol/brick Options Reconfigured: diagnostics.client-log-level: INFO network.ping-timeout: 10 nfs.enable-ino32: on cluster.self-heal-daemon: on nfs.disable: off replicated_vol
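[Editor's sketch] As an aside, the secure-port question running through this thread can be settled from netstat on a node where the fuse client runs; a sketch, keying on the client process name:

netstat -ntp | grep glusterfs   # client source ports below 1024 indicate secure binds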
Re: [Gluster-users] NFS share issue #Personal#
Hello, I tried using the Linux kernel's NFS server by exporting mount.glusterfs mounts, and man, it didn't go well either (similar, stale file handle issues from time to time). Actually, it's not just glusterfs, but all FUSE-based file-systems that encounter problems with the kernel's NFS server. Which is why we finally moved to the glusterfs NFS server. It was a lot smoother afterwards. In your volume configuration I can see that nfs.disable is ON. To start glusterfs's NFS server, you would need to shut down your current NFS service and set nfs.disable to OFF, just so: gluster volume set vol_home nfs.disable off Then, your mountable export (say, on promethee in your example) would be hades:vol_home, and your mount command would look like this: mount -otcp,nfsvers=3 -t nfs hades:vol_home /hades Thanks, Anirban From: Geoffrey Letessier geoffrey.letess...@cnrs.fr To: gluster-users@gluster.org Date: 01/22/2015 05:05 PM Subject: [Gluster-users] NFS share issue Sent by: gluster-users-boun...@gluster.org Dear all, Over the last few days, some users have raised IT issues concerning GlusterFS NFS shares and, more specifically, NFS shares available on my HPC clusters. Indeed, from their workstations, if they try to copy some directories/files from the NFS share into somewhere else, they get some Stale NFS file handle errors. Here is an example: my machine is promethee and the remote machine where the NFS share is located is hades. The mount point of my NFS share on my machine is /hades. [me@promethee ~]$ cp -r /hades/Gquads/ /home/me/Gquads/ = here all is OK [me@promethee ~]$ cd /hades [me@promethee ~]$ cp -r Gquads/ /home/me/Gquads/ cp: reading `2KF8/dihedrals/traj.pdb': Stale NFS file handle cp: failed to extend `/data/pasquali/Gquads/2KF8/dihedrals/traj.pdb': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/line_14.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/polar_log_14.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/polar_9.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/polar_log_13.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/line_15.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/polar_log_12.png': Stale NFS file handle cp: cannot stat `2KF8/dihedrals/line_21.png': Stale NFS file handle It looks like it has problems resolving relative paths... For information: I use an NFS export over my GlusterFS volume remote mount. In other words: hades is a master whose /home is a mount of a GlusterFS volume. 
Here are my volume settings: [root@hades ~]# gluster volume info vol_home Volume Name: vol_home Type: Distributed-Replicate Volume ID: f6ebcfc1-b735-4a0e-b1d7-47ed2d2e7af6 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp,rdma Bricks: Brick1: ib-storage1:/export/brick_home/brick1 Brick2: ib-storage2:/export/brick_home/brick1 Brick3: ib-storage3:/export/brick_home/brick1 Brick4: ib-storage4:/export/brick_home/brick1 Brick5: ib-storage1:/export/brick_home/brick2 Brick6: ib-storage2:/export/brick_home/brick2 Brick7: ib-storage3:/export/brick_home/brick2 Brick8: ib-storage4:/export/brick_home/brick2 Options Reconfigured: features.default-soft-limit: 90% features.quota: on diagnostics.brick-log-level: CRITICAL auth.allow: localhost,127.0.0.1,10.* nfs.disable: on performance.cache-size: 64MB performance.write-behind-window-size: 1MB performance.quick-read: on performance.io-cache: on performance.io-thread-count: 64 [root@hades ~]# cat /etc/exports /home *.lbt.ibpc.fr(fsid=0,rw,root_squash) [root@lucifer ~]# mount |grep home ib-storage1:vol_home.rdma on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) Any idea? Am I the only one with this problem? Thanks in advance, Geoffrey -- --- Geoffrey Letessier Responsable informatique CNRS - UPR 9080 - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr
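[Editor's sketch] Once nfs.disable is off, a quick way to verify that the gluster NFS server came up and registered, using standard ONC RPC tools (hades is the server name used in the reply above):

rpcinfo -p hades | grep -E 'nfs|mountd'
showmount -e hades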
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
Hi Pranith, Yes, the very same (chalcogen_eg_oxy...@yahoo.com). Justin Clift sent me a mail a while back telling me that it is better if we all use our business email addresses so I made me a new profile. Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports because we use a really old Linux kernel (2.6.34) wherein this feature is not present. We plan to upgrade our Linux so often but each time we are dissuaded from it by some compatibility issue or the other. So, we get this log every time - on both good volumes and bad ones. What bothered me was this (on serv1): [2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0) [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: [2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1) [2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0) When I write on the good server (serv1), we see that an RPC request is sent to both client-0 and client-1. While, when I write on the bad server (serv0), the RPC request is sent only to client-0, which is why it is no wonder that the writes are not synced over to serv1. Somehow I could not make the daemon on serv0 understand that there are two up-children and not just one. One additional detail - since we are using a kernel that's too old, we do not have the (Anand Avati's?) FUse readdirplus patches, either. I've noticed that the fixes in the readdirplus version of glusterfs aren't always guaranteed to be present on the non-readdirplus version of the patches. I'd filed a bug around one such anomaly back, but never got around to writing a patch for it (sorry!) Here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1062287 Maybe something on similar lines here? Thanks, Anirban P.s. Please ignore the #Personal# in the subject line - we need to do that to push mails to the public domain past the email filter safely. From: Pranith Kumar Karampuri pkara...@redhat.com To: A Ghoshal a.ghos...@tcs.com, gluster-users@gluster.org Date: 01/22/2015 12:09 AM Subject:Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server hi, Responses inline. PS: You are chalkogen_oxygen? 
Pranith On 01/20/2015 05:34 PM, A Ghoshal wrote: Hello, I am using the following replicated volume: root@serv0:~ gluster v info replicated_vol Volume Name: replicated_vol Type: Replicate Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: serv0:/mnt/bricks/replicated_vol/brick Brick2: serv1:/mnt/bricks/replicated_vol/brick Options Reconfigured: diagnostics.client-log-level: INFO network.ping-timeout: 10 nfs.enable-ino32: on cluster.self-heal-daemon: on nfs.disable: off replicated_vol is mounted at /mnt/replicated_vol on both serv0 and serv1. If I do the following on serv0: root@serv0:~ echo cranberries > /mnt/replicated_vol/testfile root@serv0:~ echo tangerines >> /mnt/replicated_vol/testfile And then I check for the state of the replicas in the bricks, then I find that root@serv0:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines root@serv0:~ root@serv1:~ cat /mnt/bricks/replicated_vol/brick/testfile root@serv1:~ As may be seen, the replica on serv1 is blank when I write into testfile from serv0 (even though the file is created on both bricks). Interestingly, if I write something to the file at serv1, then the two replicas become identical. root@serv1:~ echo artichokes >> /mnt/replicated_vol/testfile root@serv1:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines artichokes root@serv1:~ root@serv0:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines artichokes root@serv0:~ So, I dabbled into the logs a little bit, after upping the diagnostic level, and this is what I saw: When I write on serv0 (bad case
Re: [Gluster-users] Mount failed
Maybe start the mount daemon from the shell, like this? /usr/sbin/glusterfs --debug --volfile-server=glnode1 --volfile-id=/testvol /mnt/gluster You could get some useful debug data on your terminal. However, it's more likely you have a configuration-related problem here. So the output of the following might also help: ls -la /var/lib/glusterd/vols/glnode1/ Thanks, Anirban From: Bartłomiej Syryjczyk bsyryjc...@kamsoft.pl To: gluster-users@gluster.org Date: 01/22/2015 09:52 PM Subject: [Gluster-users] Mount failed Sent by: gluster-users-boun...@gluster.org I've got a problem with mount. Can anyone help? # mount -t glusterfs apache1:/testvol /mnt/gluster Mount failed. Please check the log file for more details. Log: http://pastebin.com/GzkbEGCw Oracle Linux Server release 7.0 Kernel 3.8.13-55.1.2.el7uek.x86_64 glusterfs packages from official yum repository Name: glusterfs Arch: x86_64 Version : 3.6.1 Release : 1.el7 -- Regards, Bartłomiej Syryjczyk
Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server
No, this is the fuse mount for the glusterfs volume: root@serv0:~ df -TH /mnt/replicated_vol/ Filesystem Type Size Used Avail Use% Mounted on serv0:replicated_vol fuse.glusterfs 138M 15M 124M 11% /mnt/replicated_vol root@serv0:~ Thanks, Anirban From: Lindsay Mathieson lindsay.mathie...@gmail.com To: gluster-users@gluster.org Date: 01/20/2015 05:56 PM Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server Sent by: gluster-users-boun...@gluster.org replicated_vol is mounted at /mnt/replicated_vol on both serv0 and serv1. The mounts - these are the base disk mounts? To access the replicated file system you need to mount the gluster file system itself. mount -t glusterfs HOSTNAME-OR-IPADDRESS:/VOLNAME MOUNTDIR e.g. mount -t glusterfs server1:/test-volume /mnt/glusterfs Good blog post for Ubuntu here: http://www.jamescoyle.net/how-to/439-mount-a-glusterfs-volume
[Gluster-users] In a replica 2 server, file-updates on one server missing on the other server
Hello, I am using the following replicated volume: root@serv0:~ gluster v info replicated_vol Volume Name: replicated_vol Type: Replicate Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: serv0:/mnt/bricks/replicated_vol/brick Brick2: serv1:/mnt/bricks/replicated_vol/brick Options Reconfigured: diagnostics.client-log-level: INFO network.ping-timeout: 10 nfs.enable-ino32: on cluster.self-heal-daemon: on nfs.disable: off replicated_vol is mounted at /mnt/replicated_vol on both serv0 and serv1. If I do the following on serv0: root@serv0:~ echo cranberries > /mnt/replicated_vol/testfile root@serv0:~ echo tangerines >> /mnt/replicated_vol/testfile And then I check for the state of the replicas in the bricks, then I find that root@serv0:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines root@serv0:~ root@serv1:~ cat /mnt/bricks/replicated_vol/brick/testfile root@serv1:~ As may be seen, the replica on serv1 is blank when I write into testfile from serv0 (even though the file is created on both bricks). Interestingly, if I write something to the file at serv1, then the two replicas become identical. root@serv1:~ echo artichokes >> /mnt/replicated_vol/testfile root@serv1:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines artichokes root@serv1:~ root@serv0:~ cat /mnt/bricks/replicated_vol/brick/testfile cranberries tangerines artichokes root@serv0:~ So, I dabbled into the logs a little bit, after upping the diagnostic level, and this is what I saw: When I write on serv0 (bad case): [2015-01-20 09:21:52.197704] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 53027: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b) [2015-01-20 09:21:52.197959] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict [2015-01-20 09:21:52.198006] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 28151, uid: 0, gid: 0, owner: [2015-01-20 09:21:52.198024] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:21:52.198108] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0) [2015-01-20 09:21:52.198565] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0) [2015-01-20 09:21:52.198640] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ] [2015-01-20 09:21:52.198669] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ] [2015-01-20 09:21:52.198681] D [afr-self-heal-common.c:887:afr_mark_sources] 0-replicated_vol-replicate-0: Number of sources: 1 [2015-01-20 09:21:52.198694] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-replicated_vol-replicate-0: returning read_child: 0 [2015-01-20 09:21:52.198705] D [afr-common.c:1380:afr_lookup_select_read_child] 0-replicated_vol-replicate-0: Source selected as 0 for /testfl [2015-01-20 09:21:52.198720] D [afr-common.c:1117:afr_lookup_build_response_params] 0-replicated_vol-replicate-0: Building lookup response from 0 [2015-01-20 09:21:52.198732] D [afr-common.c:1732:afr_lookup_perform_self_heal] 
0-replicated_vol-replicate-0: Only 1 child up - do not attempt to detect self heal When I write on serv1 (good case): [2015-01-20 09:37:49.151506] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 31212: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b) [2015-01-20 09:37:49.151683] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict [2015-01-20 09:37:49.151726] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 7599, uid: 0, gid: 0, owner: [2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0) [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: [2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96 [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to
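[Editor's sketch] For reference, the "upping the diagnostic level" step in the post above maps to an ordinary volume option; a sketch (TRACE is extremely verbose, so revert once done):

gluster volume set replicated_vol diagnostics.client-log-level TRACE
gluster volume set replicated_vol diagnostics.client-log-level INFO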
Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors
Ok, no problem. The issue is very rare, even with our setup - we have seen it only once on one site even though we have been in production for several months now. For now, we can live with that IMO. And, thanks again. Anirban
Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors
It is possible, yes, because these are actually a kind of log files. I suppose, like with other logging frameworks, these files can remain open for a considerable period, and then get renamed to support log-rotate semantics. That said, I might need to check with the team that actually manages the logging framework to be sure. I only take care of the file-system stuff. I can tell you for sure Monday. If it is the same race that you mention, is there a fix for it? Thanks, Anirban
Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors
I see. Thanks a tonne for the thorough explanation! :) I can see that our setup would be vulnerable here because the logger on one server is not generally aware of the state of the replica on the other server. So, it is possible that the log files may have been renamed before heal had a chance to kick in. Could I also request the bug ID (should there be one) against which you are coding up the fix, so that we could get a notification once it is passed? Also, as an aside, is O_DIRECT supposed to prevent this from occurring if one were to make allowance for the performance hit? Thanks again, Anirban
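[Editor's sketch] On the O_DIRECT aside above: with the fuse client, the closest knob is a mount option rather than a volume option; a sketch, with placeholder server and volume names:

mount -t glusterfs -o direct-io-mode=enable server1:/testvol /testvol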
Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors
Hi, Yes, they do, and considerably. I'd forgotten to mention that in my last email. Their mtimes, however, as far as I could tell on separate servers, seemed to coincide. Thanks, Anirban
Re: [Gluster-users] NFS not start on localhost
Maybe share the last 15-20 lines of your /var/log/glusterfs/nfs.log for the consideration of everyone on the list? Thanks.
Re: [Gluster-users] NFS not start on localhost
It happens with me sometimes. Try `tail -n 20 /var/log/glusterfs/nfs.log`. You will probably find something out that will help your cause. In general, if you just wish to start the thing up without going into the why of it, try `gluster volume set engine nfs.disable on` followed by `gluster volume set engine nfs.disable off`. It does the trick quite often for me because it is a polite way to ask mgmt/glusterd to try and respawn the NFS server process if need be. But keep in mind that this will cause an (albeit small) service interruption to all clients accessing volume engine over NFS. Thanks, Anirban On Saturday, 18 October 2014 1:03 AM, Demeter Tibor tdeme...@itsmart.hu wrote: Hi, I have set up glusterfs with NFS support. I don't know why, but after a reboot the NFS server does not listen on localhost, only on gs01. [root@node0 ~]# gluster volume info engine Volume Name: engine Type: Replicate Volume ID: 2ea009bf-c740-492e-956d-e1bca76a0bd3 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: gs00.itsmart.cloud:/gluster/engine0 Brick2: gs01.itsmart.cloud:/gluster/engine1 Options Reconfigured: storage.owner-uid: 36 storage.owner-gid: 36 performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off cluster.eager-lock: enable network.remote-dio: enable cluster.quorum-type: auto cluster.server-quorum-type: server auth.allow: * nfs.disable: off [root@node0 ~]# gluster volume status engine Status of volume: engine Gluster process Port Online Pid -- Brick gs00.itsmart.cloud:/gluster/engine0 50158 Y 3250 Brick gs01.itsmart.cloud:/gluster/engine1 50158 Y 5518 NFS Server on localhost N/A N N/A Self-heal Daemon on localhost N/A Y 3261 NFS Server on gs01.itsmart.cloud 2049 Y 5216 Self-heal Daemon on gs01.itsmart.cloud N/A Y 5223 Can anybody help me? Thanks in advance. Tibor
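[Editor's sketch] A conflicting kernel NFS service is a common reason the gluster NFS server shows N/A, as in the status output above; a quick check, assuming standard tooling (service names vary by distribution):

rpcinfo -p | grep -w nfs        # registrations not owned by gluster suggest a conflict
service nfs status              # stop the kernel NFS server before toggling nfs.disable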
[Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors
Hi everyone, I have this really confusing split-brain here that's bothering me. I am running glusterfs 3.4.2 over Linux 2.6.34. I have a replica 2 volume 'testvol'. It seems I cannot read/stat/edit the file in question, and `gluster volume heal testvol info split-brain` shows nothing. Here are the logs from the fuse-mount for the volume: [2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 4560969: FLUSH() ERR => -1 (Input/output error) [2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8529d20 waitq = 0x7fd5c8067d40 [2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561103: READ => -1 (Input/output error) [2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8607ee0 waitq = 0x7fd5c8067d40 [2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561104: READ => -1 (Input/output error) [2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c8066f30 waitq = 0x7fd5c8067d40 [2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561105: READ => -1 (Input/output error) [2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c860bcf0 waitq = 0x7fd5c863b1f0 [2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561107: READ => -1 (Input/output error) [2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 0-testvol-io-cache: page error for page = 0x7fd5c85fd120 waitq = 0x7fd5c8067d40 [2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 0-glusterfs-fuse: 4561109: READ => -1 (Input/output error) [2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 0-testvol-replicate-0: failed to open as split brain seen, returning EIO [2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 0-glusterfs-fuse: 4561142: OPEN() /SECLOG/20140908.d/SECLOG_00427425_.log => -1 (Input/output error) Could somebody please give me some clue on where to begin? I checked the xattrs on /SECLOG/20140908.d/SECLOG_00427425_.log and it seems the changelogs are [0, 0] on both replicas, and the gfids match. Thank you very much for any help on this. Anirban
[Gluster-users] Need help: Mount -t glusterfs hangs
Hello, I am facing an intermittent issue with the mount.glusterfs program freezing up. Investigation reveals that it is not exactly the mount system call but the 'stat' call from within mount.glusterfs that is actually hung. Any other command such as df or ls also hangs. I opened a Red Hat Bugzilla bug against this. It has more details on the precise observation and steps followed. What intrigues me is that as soon as I set a volume option such as one for diagnostics or maybe the statedump path, the problem goes away. Any advice would be greatly appreciated as this is causing problems with some server replacement procedures I am trying out. Thanks a lot, Anirban
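[Editor's sketch] For reference, the statedump-path workaround mentioned above is an ordinary volume option; a sketch with a hypothetical volume name and path:

gluster volume set testvol server.statedump-path /var/run/gluster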
Re: [Gluster-users] Need help: Mount -t glusterfs hangs
Yes, very sorry. Actually, I kind of missed adding the bug ID, which made it all look so incomplete. Fact is, I am totally caught up in work today, so am more absent-minded than usual.. Thanks for pointing that out! As to the other details: Bug #1141940 Glusterfs 3.4.2 Linux 2.6.34 Also, I was kind of hoping that the fact that setting a volume option sort of fixes it might ring a bell with someone.. Someone who knows precisely what they do.. Thanks again, Anirban
[Gluster-users] Enforce direct-io for volume rather than mount
Suppose I have a replica 2 setup using XFS for my volume, and that I also export my files over NFS, and write access is expected. Now, if I wish to avoid split-brains at all cost, even during server crashes and such, I am given to understand that direct-io mode would help. I know that there is a direct-io option with my mount.glusterfs program, but I wish for a direct-io enforcement for my entire volume, so as to enforce uncached writes to the underlying XFS. Any ideas on how I could achieve that? Thanks for your replies. Anirban
Re: [Gluster-users] : On breaking the connection between replicated volumes certain files return -ENOTCONN
We migrated to stable version 3.4.2 and confirmed that the error occurs with that as well. I reported this over bug 1062287. Thanks again, Anirban -- On Tue 4 Feb, 2014 2:27 PM MST Anirban Ghoshal wrote: Hi everyone, Here's a strange issue. I am using glusterfs 3.4.0 alpha. We need to move to a stable version ASAP, but I am telling you this just on the off chance that it might be interesting for somebody from the glusterfs development team. Please excuse the sheer length of this mail, but I am new to browsing such massive code, and not good at presenting my ideas very clearly. Here's a set of observations: 1. You have a replica 2 volume (testvol) on server1 and server2. You assume that on either server, it is also locally mounted via mount.glusterfs at /testvol. 2. You have a large number of soft-linked files within the volume. 3. You check heal info (all its facets) to ensure not a single file is out of sync (also, verify md5sum or such, if possible). 4. You abruptly take down the ethernet device over which the servers are connected (ip link set eth-dev down). 5. On one of the servers (say, server1 for definiteness), if you do an 'ls -l', readlink returns 'Transport endpoint is not connected'. 6. The error resolves all by itself if you bring the eth-link back up. Here's some additional detail: 7. The error is intermittent, and not all soft-linked files have the issue. 8. If you take a directory containing soft-linked files, and if you do a ls -l _on_the_directory_, like so, server1$ ls -l /testvol/somedir/bin/ ls: cannot read symbolic link /testvol/somedir/bin/reset: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/bzless: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/i386: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/kill: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/linux32: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/linux64: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/logger: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/x86_64: Transport endpoint is not connected ls: cannot read symbolic link /testvol/somedir/bin/python2: Transport endpoint is not connected 9. If, however, you take a faulty soft-link and do an ls -l on it directly, then it rights itself immediately. server1$ ls -l /testvol/somedir/bin/x86_64 lrwxrwxrwx 1 root root 7 May 7 23:11 /testvol/somedir/bin/x86_64 -> setarch I tried raising the client log level to 'trace'. 
Here's what I saw. Upon READLINK failures (ls -l /testvol/somedir/bin/):

[2010-05-09 01:13:28.140265] T [fuse-bridge.c:2453:fuse_readdir_cbk] 0-glusterfs-fuse: 2783484: READDIR => 23/4096,1380
[2010-05-09 01:13:28.140444] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 45
[2010-05-09 01:13:28.140477] T [fuse-bridge.c:708:fuse_getattr_resume] 0-glusterfs-fuse: 2783485: GETATTR 140299577689176 (/testvol/somedir/bin)
[2010-05-09 01:13:28.140618] T [fuse-bridge.c:641:fuse_attr_cbk] 0-glusterfs-fuse: 2783485: STAT() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:28.140722] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140737] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2783486: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:28.140851] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2783486: LOOKUP() /testvol/somedir/bin/x86_64 => -4857810743645185021
[2010-05-09 01:13:28.140954] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140975] T [fuse-bridge.c:1296:fuse_readlink_resume] 0-glusterfs-fuse: 2783487 READLINK /testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03
[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 0-_testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 => -1 (Transport endpoint is not connected)

Upon successful readlink (ls -l /testvol/somedir/bin/x86_64):

[2010-05-09 01:13:37.717904] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2790073: LOOKUP() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:37.718070] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:37.718127] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2790074: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:37.718306] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-_testvol-replicate-0: /testvol/somedir/bin/x86_64: failed to get the gfid from dict
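For anyone who wants to try this, a minimal sketch of the reproduction steps above. Brick paths and the device name eth1 are assumptions; the volume layout and the failure commands follow the report:

server1$ gluster volume create testvol replica 2 server1:/bricks/testvol server2:/bricks/testvol
server1$ gluster volume start testvol
server1$ mount -t glusterfs localhost:/testvol /testvol      # repeat on server2
server1$ mkdir -p /testvol/somedir/bin
server1$ ln -s setarch /testvol/somedir/bin/x86_64           # populate soft-links
server1$ gluster volume heal testvol info                    # confirm nothing is out of sync
server1$ ip link set eth1 down                               # cut the inter-server link
server1$ ls -l /testvol/somedir/bin/                         # readlink now fails intermittently with ENOTCONN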
[Gluster-users] On breaking the connection between replicated volumes certain files return -ENOTCONN
Hi everyone, Here's a strange issue. I am using glusterfs 3.4.0 alpha. We need to move to a stable version ASAP, but I am telling you this on the off chance that it might be interesting for somebody from the glusterfs development team. Please excuse the sheer length of this mail, but I am new to browsing such massive code, and not good at presenting my ideas very clearly. Here's a set of observations:

1. You have a replica 2 volume (testvol) on server1 and server2. Assume that on either server it is also locally mounted via mount.glusterfs at /testvol.
2. You have a large number of soft-linked files within the volume.
3. You check heal info (all its facets) to ensure not a single file is out of sync (also verify md5sums or such, if possible).
4. You abruptly take down the ethernet device over which the servers are connected (ip link set eth-dev down).
5. On one of the servers (say, server1 for definiteness), if you do an 'ls -l', readlink returns 'Transport endpoint is not connected'.
6. The error resolves all by itself if you bring the eth-link back up.

Here's some additional detail:

7. The error is intermittent, and not all soft-linked files have the issue.
8. If you take a directory containing soft-linked files and do an ls -l on the directory itself, like so:

server1$ ls -l /testvol/somedir/bin/
ls: cannot read symbolic link /testvol/somedir/bin/reset: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/bzless: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/i386: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/kill: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux32: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/linux64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/logger: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/x86_64: Transport endpoint is not connected
ls: cannot read symbolic link /testvol/somedir/bin/python2: Transport endpoint is not connected

9. If, however, you take a faulty soft-link and do an ls -l on it directly, it rights itself immediately:

server1$ ls -l /testvol/somedir/bin/x86_64
lrwxrwxrwx 1 root root 7 May 7 23:11 /testvol/somedir/bin/x86_64 -> setarch

I tried raising the client log level to 'trace'.
Here's what I saw. Upon READLINK failures (ls -l /testvol/somedir/bin/):

[2010-05-09 01:13:28.140265] T [fuse-bridge.c:2453:fuse_readdir_cbk] 0-glusterfs-fuse: 2783484: READDIR => 23/4096,1380
[2010-05-09 01:13:28.140444] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 45
[2010-05-09 01:13:28.140477] T [fuse-bridge.c:708:fuse_getattr_resume] 0-glusterfs-fuse: 2783485: GETATTR 140299577689176 (/testvol/somedir/bin)
[2010-05-09 01:13:28.140618] T [fuse-bridge.c:641:fuse_attr_cbk] 0-glusterfs-fuse: 2783485: STAT() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:28.140722] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140737] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2783486: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:28.140851] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2783486: LOOKUP() /testvol/somedir/bin/x86_64 => -4857810743645185021
[2010-05-09 01:13:28.140954] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:28.140975] T [fuse-bridge.c:1296:fuse_readlink_resume] 0-glusterfs-fuse: 2783487 READLINK /testvol/somedir/bin/x86_64/025d1c57-865f-4f1f-bc95-96ddcef3dc03
[2010-05-09 01:13:28.141090] D [afr-common.c:760:afr_get_call_child] 0-_testvol-replicate-0: Returning -107, call_child: -1, last_index: -1
[2010-05-09 01:13:28.141120] W [fuse-bridge.c:1271:fuse_readlink_cbk] 0-glusterfs-fuse: 2783487: /testvol/somedir/bin/x86_64 => -1 (Transport endpoint is not connected)

Upon successful readlink (ls -l /testvol/somedir/bin/x86_64):

[2010-05-09 01:13:37.717904] T [fuse-bridge.c:376:fuse_entry_cbk] 0-glusterfs-fuse: 2790073: LOOKUP() /testvol/somedir/bin => -5626802993936595428
[2010-05-09 01:13:37.718070] T [fuse-resolve.c:51:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 52
[2010-05-09 01:13:37.718127] T [fuse-bridge.c:506:fuse_lookup_resume] 0-glusterfs-fuse: 2790074: LOOKUP /testvol/somedir/bin/x86_64(025d1c57-865f-4f1f-bc95-96ddcef3dc03)
[2010-05-09 01:13:37.718306] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-_testvol-replicate-0: /testvol/somedir/bin/x86_64: failed to get the gfid from dict
[2010-05-09 01:13:37.718355] T [rpc-clnt.c:1301:rpc_clnt_record] 0-_testvol-client-1: Auth Info: pid: 3343, uid: 0, gid: 0, owner:
[2010-05-09 01:13:37.718383] T
Re: [Gluster-users] File (setuid) permission changes during volume heal - possible bug?
Hi Ravi, Many thanks for the super-quick turnaround on this! I didn't know about this quirk of chown, so thanks for that as well. Anirban

On Thursday, 30 January 2014 9:22 AM, Ravishankar N ravishan...@redhat.com wrote:

Hi Anirban, Thanks for taking the time off to file the bugzilla bug report. The fix has been sent for review upstream (http://review.gluster.org/#/c/6862/). Once it is merged, I will backport it to 3.4 as well. Regards, Ravi

On 01/28/2014 02:07 AM, Chalcogen wrote:

Hi, I am working on a twin-replicated setup (server1 and server2) with glusterfs 3.4.0. I perform the following steps:

1. Create a distributed volume 'testvol' with the XFS brick server1:/brick/testvol on server1, and mount it using the glusterfs native client at /testvol.
2. I copy the following file to /testvol:
server1:~$ ls -l /bin/su
-rwsr-xr-x 1 root root 84742 Jan 17 2014 /bin/su
server1:~$ cp -a /bin/su /testvol
3. Within /testvol, if I list out the file I just copied, I find its attributes intact.
4. Now, I add the XFS brick server2:/brick/testvol:
server2:~$ gluster volume add-brick testvol replica 2 server2:/brick/testvol
At this point, heal kicks in and the file is replicated on server2.
5. If I list out su in testvol on either server now, this is what I see:
server1:~$ ls -l /testvol/su
-rwsr-xr-x 1 root root 84742 Jan 17 2014 /testvol/su
server2:~$ ls -l /testvol/su
-rwxr-xr-x 1 root root 84742 Jan 17 2014 /testvol/su

That is, the 's' file mode gets changed to plain 'x' - meaning, not all attributes are preserved upon heal completion. Would you consider this a bug? Is the behavior different on a higher release? Thanks a lot. Anirban
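The chown quirk in question is easy to demonstrate on a plain Linux filesystem: on most kernels, a successful chown() of an executable clears the setuid/setgid bits even when root performs it, so a heal path that sets ownership after writing the file would need to re-apply the mode afterwards (that last inference about the fix is my assumption; the demonstration itself is standard behavior):

server1$ touch /tmp/suidtest && chmod 4755 /tmp/suidtest
server1$ ls -l /tmp/suidtest      # -rwsr-xr-x: setuid bit set
server1$ sudo chown root:root /tmp/suidtest
server1$ ls -l /tmp/suidtest      # -rwxr-xr-x: setuid bit silently cleared by chown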
[Gluster-users] Mounting soft-linked paths over nfs
Dear gluster users, I am facing a bit of an issue with the glusterfs NFS server. Suppose I export a volume testvol, and within testvol I have a path, say, dir1/dir2. If dir1 and dir2 are actual directories, then one can simply mount testvol/dir1/dir2 over NFS. However, if either dir1 or dir2 is a soft-link, then mount.nfs returns -EINVAL. Would you say that this is normal behavior for this NFS server? Also, I am using the 3.4.0 release. Would it help if I upgraded? Thanks a lot, Anirban
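No authoritative answer here, but a hedged workaround: NFS resolves the export path on the server at mount time, where it will not traverse a symlink, so mounting the nearest real directory and following the link on the client side usually avoids the EINVAL - provided the link target is itself visible under the mount (path names are taken from the question):

client$ mount -t nfs server1:/testvol/dir1/dir2 /mnt   # fails with EINVAL when dir1 is a soft-link
client$ mount -t nfs server1:/testvol /mnt             # mount the volume root instead
client$ ls /mnt/dir1/dir2                              # the soft-link resolves on the client side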
Re: [Gluster-users] Passing noforget option to glusterfs native client mounts
Hi, and thanks a lot, Anand! I was initially searching for a good answer to why the glusterfs site lists knfsd as NOT compatible with glusterfs. So, now I know. :) Funnily enough, we didn't have a problem with failover during our testing. We passed constant fsids (fsid=xxx) while exporting our mounts, and NFS mounts on client applications haven't called any of the file handles stale while migrating the NFS service from one server to the other. Not sure why this happens. Do nodeids and generation numbers remain invariant across storage servers in glusterfs-3.4.0? We, for our part, have a pretty small amount of data in our filesystem (that is, compared with the petabyte-sized volumes glusterfs commonly manages). Our total volume size would be somewhere around 4 GB, and some 50,000 files is all they contain. Each server has around 16 GB of RAM, so space is not at a premium for this project. However, saying that, if the glusterfs NFS server does maintain identical file handles across all its servers and does not alter file handles upon failover, then in the long run it might be prudent to switch to glusterfs NFS as the cleaner solution. Thanks again! Anirban

On Tuesday, 24 December 2013 1:58 PM, Anand Avati av...@gluster.org wrote:

Hi, Allowing the noforget option to FUSE will not help your cause. Gluster presents the address of the inode_t as the nodeid to FUSE. In turn, FUSE creates a filehandle using this nodeid for knfsd to export to the NFS client. When knfsd fails over to another server, FUSE will decode the handle encoded by the other NFS server and try to use the nodeid of the other server - which will obviously not work, as the virtual address of the glusterfs process on the other server is not valid here. Short version: the file handle generated through FUSE is not durable. The noforget option in FUSE is a hack to avoid ESTALE messages because of dcache pruning. If you have enough inodes in your volume, your system will go OOM at some point. noforget is NOT a solution for providing NFS failover to a different server. For reasons such as these, we ended up implementing our own NFS server, where we encode a filehandle using the GFID (which is durable across reboots and server failovers). I would strongly recommend NOT using knfsd with any FUSE-based filesystem (not just glusterfs) for serious production use, and it will just not work if you are designing for NFS high availability/failover. Thanks, Avati

On Sat, Dec 21, 2013 at 8:52 PM, Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com wrote:

If somebody has an idea on how this could be done, could you please help out? I am still stuck on this, apparently... Thanks, Anirban

On Thursday, 19 December 2013 1:40 AM, Chalcogen chalcogen_eg_oxy...@yahoo.com wrote:

P.S. I think I need to clarify this: I am only reading from the mounts, and not modifying anything on the server, so the commonest causes of stale file handles do not apply. Anirban

On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:

Hi everybody, A few months back I joined a project where people want to replace their legacy fuse-based (twin-server) replicated file-system with GlusterFS. They also have high-availability NFS server code tied to the kernel NFSD that they would wish to retain (the nfs-kernel-server, I mean).
The reason they wish to retain the kernel NFS, and not use the NFS server that comes with GlusterFS, is mainly that there's a bit of code that allows NFS IPs to be migrated from one host server to the other in case one happens to go down, and tweaks to the export server configuration allow the file handles to remain identical on the new host server. The solution was to mount gluster volumes using the mount.glusterfs native client program and then export the directories over the kernel NFS server. This seems to work most of the time, but on rare occasions 'stale file handle' is reported off certain clients, which really puts a damper on the 'high-availability' thing. After suitably instrumenting the nfsd/fuse code in the kernel, it seems that decoding of the file handle fails on the server because the inode record corresponding to the nodeid in the handle cannot be looked up. Combining this with the fact that a second attempt by the client to execute lookup on the same file passes, one might suspect that the problem is identical to what many people attempting to export fuse mounts over the kernel's NFS server are facing; viz., fuse 'forgets' the inode records, thereby causing ilookup5() to fail. Miklos and other fuse developers/hackers would point towards '-o noforget' while mounting their fuse file-systems. I tried passing '-o noforget' to mount.glusterfs, but it does not seem to recognize it.
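Avati's point about GFID-based handles being durable can be seen directly on the bricks: every file carries a trusted.gfid extended attribute that is identical on both replicas, unlike the in-memory inode_t address used as the FUSE nodeid. For instance (the brick path below is an assumed example):

server1$ sudo getfattr -n trusted.gfid -e hex /bricks/testvol/somedir/bin/x86_64
server2$ sudo getfattr -n trusted.gfid -e hex /bricks/testvol/somedir/bin/x86_64   # same value on the replica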
Re: [Gluster-users] Passing noforget option to glusterfs native client mounts
Thanks, Harshavardhana and Anand, for the tips! I checked out parts of the linux-2.6.34 (the one we are using down here) knfsd/fuse code. I understood (hopefully rightly) that when we export a fuse directory over NFS and specify an fsid, the handle is constructed somewhat like this: fh_size (4 bytes) + fh_version and flags (4 bytes) + fsid, from export parms (4 bytes) + nodeid (8 bytes) + generation number (4 bytes) + parent nodeid (8 bytes) + parent generation (4 bytes). So, since Anand mentions that nodeids for glusterfs are just the inode_t addresses on the servers, I can now relate to the fact that the file handles might not survive failovers in every case, even with the fsid constant. That's why I was so confused: I never faced an issue with stale file handles during failover yet! Maybe it is something to do with the order in which files were created on the replica server following heal commencement (our data is quite static, btw) - like, if you malloc identical things on two identical platforms by running the same executable on each, you get allocations at the exact same virtual addresses. However, now that I understand at least in part how this works, glusterfs NFS does seem a lot cleaner. Will also try out Ganesha. Thanks!

On Tuesday, 24 December 2013 11:04 PM, Harshavardhana har...@harshavardhana.net wrote:

On Tue, Dec 24, 2013 at 8:21 AM, Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com wrote:

Hi, and thanks a lot, Anand! I was initially searching for a good answer to why the glusterfs site lists knfsd as NOT compatible with glusterfs. So, now I know. :) Funnily enough, we didn't have a problem with failover during our testing. We passed constant fsids (fsid=xxx) while exporting our mounts, and NFS mounts on client applications haven't called any of the file handles stale while migrating the NFS service from one server to the other. Not sure why this happens.

Using fsid is just a workaround, always used to solve ESTALE on file handles. The device major/minor numbers are embedded in the NFS file handle; the problem when an NFS export is failed over or moved to another node is that these numbers change when the resource is exported on the new node, resulting in the client seeing a 'Stale NFS file handle' error. We need to make sure the embedded number stays the same - that is where the fsid export option comes in, allowing us to specify a coherent number across various clients. The GlusterFS NFS server is a way cleaner solution for such consistency. Another thing would be to take the next step and give 'NFS-Ganesha' and 'GlusterFS' integration a go:
https://forge.gluster.org/nfs-ganesha-and-glusterfs-integration
http://www.gluster.org/2013/09/gluster-ganesha-nfsv4-initial-impressions/
Cheers

-- Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes
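For completeness, the fsid workaround discussed above looks something like this in /etc/exports (the export path and fsid value are illustrative). Keeping the value identical on both servers keeps the fsid field of the handle stable across a failover, though, as established above, the embedded nodeid can still go stale:

server1$ grep testvol /etc/exports
/testvol  *(rw,fsid=7,no_subtree_check)
server1$ sudo exportfs -ra    # re-export after editing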
Re: [Gluster-users] Passing noforget option to glusterfs native client mounts
If somebody has an idea on how this could be done, could you please help out? I am still stuck on this, apparently... Thanks, Anirban

On Thursday, 19 December 2013 1:40 AM, Chalcogen chalcogen_eg_oxy...@yahoo.com wrote:

P.S. I think I need to clarify this: I am only reading from the mounts, and not modifying anything on the server, so the commonest causes of stale file handles do not apply. Anirban

On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:

Hi everybody, A few months back I joined a project where people want to replace their legacy fuse-based (twin-server) replicated file-system with GlusterFS. They also have high-availability NFS server code tied to the kernel NFSD that they would wish to retain (the nfs-kernel-server, I mean). The reason they wish to retain the kernel NFS, and not use the NFS server that comes with GlusterFS, is mainly that there's a bit of code that allows NFS IPs to be migrated from one host server to the other in case one happens to go down, and tweaks to the export server configuration allow the file handles to remain identical on the new host server. The solution was to mount gluster volumes using the mount.glusterfs native client program and then export the directories over the kernel NFS server. This seems to work most of the time, but on rare occasions 'stale file handle' is reported off certain clients, which really puts a damper on the 'high-availability' thing. After suitably instrumenting the nfsd/fuse code in the kernel, it seems that decoding of the file handle fails on the server because the inode record corresponding to the nodeid in the handle cannot be looked up. Combining this with the fact that a second attempt by the client to execute lookup on the same file passes, one might suspect that the problem is identical to what many people attempting to export fuse mounts over the kernel's NFS server are facing; viz., fuse 'forgets' the inode records, thereby causing ilookup5() to fail. Miklos and other fuse developers/hackers would point towards '-o noforget' while mounting their fuse file-systems. I tried passing '-o noforget' to mount.glusterfs, but it does not seem to recognize it. Could somebody help me out with the correct syntax to pass noforget to gluster volumes? Or is there something we could pass to glusterfs that would instruct fuse to allocate a bigger cache for our inodes? Additionally, should you think that something else might be behind our problems, please do let me know. Here's my configuration:

Linux kernel version: 2.6.34.12
GlusterFS version: 3.4.0
nfs.disable option for volumes: OFF on all volumes

Thanks a lot for your time! Anirban

P.S. I found quite a few pages on the web that admonish users that GlusterFS is not compatible with the kernel NFS server, but do not really give much detail. Is this one of the reasons for saying so?
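One configuration note on the setup above: with nfs.disable left OFF, the built-in gluster NFS server and knfsd can contend for the same portmap/NLM registrations on the host, so when exporting FUSE mounts over knfsd it is usually safer to disable the built-in server per volume (a hedged suggestion for this setup, not a fix for the stale-handle problem itself):

server1$ gluster volume set testvol nfs.disable on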