Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
On 07/19/2014 11:25 AM, Andrew Lau wrote:

On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 07/18/2014 05:43 PM, Andrew Lau wrote:

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur <vbel...@redhat.com> wrote:

[Adding gluster-devel]

On 07/18/2014 05:20 PM, Andrew Lau wrote:

Hi all,

As most of you have got hints from previous messages, hosted engine won't work on gluster. A quote from BZ1097639:

"Using hosted engine with Gluster backed storage is currently something we really warn against. I think this bug should be closed or re-targeted at documentation, because there is nothing we can do here. Hosted engine assumes that all writes are atomic and (immediately) available for all hosts in the cluster. Gluster violates those assumptions."

[Vijay:] I tried going through BZ1097639 but could not find much detail with respect to gluster there. A few questions around the problem:

1. Can somebody please explain in detail the scenario that causes the problem?
2. Is hosted engine performing synchronous writes to ensure that writes are durable?

Also, any documentation that details the hosted engine architecture would help in enhancing our understanding of its interactions with gluster.

[Andrew:] Now my question: does this theory prevent a scenario of, say, a gluster replicated volume being mounted as a glusterfs filesystem and then re-exported as a native kernel NFS share for the hosted-engine to consume? It could then be possible to chuck ctdb in there to provide a last-resort failover solution. I have tried this myself and suggested it to two people who are running a similar setup. They are now using the native kernel NFS server for hosted-engine and haven't reported as many issues. Curious, could anyone validate my theory on this?

[Vijay:] If we obtain more details on the use case and gluster logs from the failed scenarios, we should be able to understand the problem better. That could be the first step in validating your theory or evolving further recommendations :).

[Andrew:] I'm not sure how useful this is, but Jiri Moskovcak tracked this down in an off-list message.

Message Quote:
==
We were able to track it down to this (thanks Andrew for providing the testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = success + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'

[Pranith:] Andrew/Jiri, would it be possible to post gluster logs of both the mount and bricks on the BZ? I can take a look at it once. If I gather nothing, then I will probably ask for your help in re-creating the issue.

Pranith

[Andrew:] Unfortunately, I don't have the logs for that setup any more. I'll try to replicate it when I get a chance. If I understand the comment from the BZ, I don't think it's a gluster bug per se, more just how gluster does its replication.

[Pranith:] Hi Andrew, thanks for that. I couldn't come to any conclusions because no logs were available. It is unlikely that self-heal is involved, because there were no bricks going down/up according to the bug description.

Pranith

[Jiri's quoted message, continued:] It's definitely connected to the storage, which leads us to gluster. I'm not very familiar with gluster, so I need to check this with our gluster gurus.
==

Thanks,
Vijay
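For anyone tracing the stack above: the broker opens the shared metadata file with O_DIRECT, and the stale handle surfaces as OSError errno 116 (ESTALE). Below is a minimal, hypothetical sketch of that failure mode with a retry-on-ESTALE workaround; the helper name, default flags and retry policy are illustrative assumptions, not the actual ovirt-hosted-engine-ha code.

import errno
import os
import time

def read_metadata(path, length=4096, use_direct=False, retries=3, delay=1.0):
    """Read the hosted-engine metadata file, retrying on ESTALE.

    The real broker opens with O_DIRECT (see the traceback above); direct I/O
    needs aligned buffers on most local filesystems, so it is optional here.
    """
    flags = os.O_RDONLY
    if use_direct and hasattr(os, "O_DIRECT"):
        flags |= os.O_DIRECT
    last_err = None
    for _ in range(retries):
        try:
            fd = os.open(path, flags)
            try:
                return os.read(fd, length)
            finally:
                os.close(fd)
        except OSError as err:
            if err.errno != errno.ESTALE:
                raise
            # Errno 116 (ESTALE): the cached file handle no longer maps to a
            # file on the server, e.g. the file was replaced or is now served
            # differently. Sleep and retry so a fresh lookup can pick up the
            # new handle.
            last_err = err
            time.sleep(delay)
    raise last_err

# Hypothetical usage; the mount point below is made up for illustration.
# data = read_metadata("/rhev/data-center/mnt/server:_engine/ha_agent/hosted-engine.metadata")

On a replicated gluster or NFS mount an ESTALE typically clears once the client performs a fresh lookup, which is why a short retry loop can be enough for monitoring-style reads.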
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 07/18/2014 07:57 PM, Anders Blomdell wrote:

During testing of a 3*4 gluster (from master as of yesterday), I encountered two major weirdnesses:

1. A 'rm -rf some_dir' needed several invocations to finish, each time reporting a number of lines like these:
rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty

[Ben:] This is reproducible for me when running dbench on nfs mounts. I think I may have seen it on glusterfs mounts as well, but it seems more reproducible on nfs. I should have caught it sooner, but it doesn't error out client side when cleaning up, and in the next test I run the deletes are successful. When this happens, in the nfs.log I see the following.

This spams the log; from what I can tell it happens when dbench is creating the files:

[2014-07-19 13:37:03.271651] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

Then, when the deletes fail, I see the following while the client is removing the files:

[2014-07-18 23:31:44.272465] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 74a6541a: /run8063_dbench/clients = -1 (Directory not empty)
...
[2014-07-18 23:31:44.452988] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7ea9541a: /run8063_dbench/clients = -1 (Directory not empty)
[2014-07-18 23:31:45.262651] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-18 23:31:45.263151] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-18 23:31:45.264196] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 32ac541a: gfid:b073a189-91ea-46b2-b757-5b320591b848 = -1 (Stale file handle)
[2014-07-18 23:31:45.264217] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 32ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-18 23:31:45.266818] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 33ac541a: gfid:b073a189-91ea-46b2-b757-5b320591b848 = -1 (Stale file handle)
[2014-07-18 23:31:45.266853] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 33ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)

Occasionally I see:

[2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref] (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

I'm opening a BZ now; I'll leave the systems up and put the repro steps + hostnames in the BZ in case anyone wants to poke around.

-b

[Anders:] 2. After having successfully deleted all files from the volume, I have a single directory that is duplicated in gluster-fuse, like this:

# ls -l /mnt/gluster
total 24
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/

Any idea on how to debug this issue?

[Pranith:] What are the steps to recreate? We need to first find what led to this, then probably which xlator leads to this.

[Ben:] I have not seen this, but I am running on a 6x2 volume. I wonder if this may only happen with replica 2?

Pranith
/Anders
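A debugging angle for the duplicated work2/ entry (a sketch only, not an established procedure): a directory usually shows up twice on a fuse mount when the bricks disagree about its gfid, so comparing the trusted.gfid xattr of that directory across all bricks is a quick first check. The brick paths below are placeholders; run it as root on the brick servers, since trusted.* xattrs are not visible through the mount.

import os
import uuid

# Hypothetical brick paths; substitute the real ones from `gluster volume info`.
BRICKS = [
    "/bricks/brick1/vol",
    "/bricks/brick2/vol",
]

def gfid_of(path):
    """Return the gfid stored by the posix xlator, or None if unset/unreachable."""
    try:
        raw = os.getxattr(path, "trusted.gfid")  # Python 3.3+, Linux only
    except OSError:
        return None
    return str(uuid.UUID(bytes=raw))

def compare_dir(relpath):
    """Print the gfid of one directory as seen by every brick."""
    seen = {}
    for brick in BRICKS:
        gfid = gfid_of(os.path.join(brick, relpath.lstrip("/")))
        seen.setdefault(gfid, []).append(brick)
        print("%-30s %s" % (brick, gfid))
    if len([g for g in seen if g is not None]) > 1:
        print("=> bricks disagree on the gfid of %s (a likely cause of a duplicate listing)" % relpath)

if __name__ == "__main__":
    compare_dir("work2")  # the directory that shows up twice in `ls -l /mnt/gluster`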
Re: [Gluster-devel] [Gluster-users] Random and frequent split brain
On Sat, Jul 19, 2014 at 08:23:29AM +0530, Pranith Kumar Karampuri wrote:

Guys,
Does anyone know why the device-id can be different even though it is all a single xfs filesystem? We see the following log in the brick log:

[2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard] 0-home-posix: mismatching ino/dev between file /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old (1077282838/2431) and handle /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0 (1077282836/2431)
[2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix: setting gfid on /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old failed

Pranith

[Niels:] The device-id (major:minor number) of a block device can change, but will not change while the device is in use. Device-mapper (DM) is part of the stack that includes multipath and LVM (and more, but these are the most common). The stack for the block devices is built dynamically, and the device-id is assigned when the block device is made active. The ordering of making devices active can change, hence the device-id too. It is also possible to deactivate some logical volumes and activate them in a different order. (You cannot deactivate a dm-device while it is in use, for example mounted.) Without device-mapper in the I/O stack, re-ordering disks is possible too, but requires a little more (advanced sysadmin) work.

So, the main questions I'd ask would be:

1. What kind of block storage is used: LVM, multipath, ...?
2. Were there any issues on the block layer: SCSI errors, reconnects?
3. Were there changes in the underlying disks or their structure? Disks added, removed, or new partitions created?
4. Were disks deactivated and activated again, for example for creating backups or snapshots at a level below the (XFS) filesystem?

HTH,
Niels

On 07/17/2014 07:06 PM, Nilesh Govindrajan wrote:

log1 was the log from the client of node2. The filesystems are mounted locally. /data is a RAID10 array and /data/gluster contains 4 volumes, one of which is home, a high read/write one (the log of which was attached here).

On Thu, Jul 17, 2014 at 11:54 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 07/17/2014 08:41 AM, Nilesh Govindrajan wrote:

log1 and log2 are brick logs. The others are client logs.

[Pranith:] I see a lot of logs as below in 'log1' you attached. It seems like the device ID of where the file is actually stored and where the gfid-link of the same file is stored (i.e. inside brick-dir/.glusterfs/) are different. Which devices/filesystems are present inside the brick represented by 'log1'?

[2014-07-16 00:00:24.358628] W [posix-handle.c:586:posix_handle_hard] 0-home-posix: mismatching ino/dev between file /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old (1077282838/2431) and handle /data/gluster/home/.glusterfs/ae/f0/aef0404b-e084-4501-9d0f-0e6f5bb2d5e0 (1077282836/2431)
[2014-07-16 00:00:24.358646] E [posix.c:823:posix_mknod] 0-home-posix: setting gfid on /data/gluster/home/techiebuzz/techie-buzz.com/wp-content/cache/page_enhanced/techie-buzz.com/social-networking/facebook-will-permanently-remove-your-deleted-photos.html/_index.html.old failed

Pranith

On Thu, Jul 17, 2014 at 8:08 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 07/17/2014 07:28 AM, Nilesh Govindrajan wrote:

On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <m...@nileshgr.com> wrote:

Hello,

I'm having a weird issue. I have this config:

node2 ~ # gluster peer status
Number of Peers: 1

Hostname: sto1
Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
State: Peer in Cluster (Connected)

node1 ~ # gluster peer status
Number of Peers: 1

Hostname: sto2
Port: 24007
Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
State: Peer in Cluster (Connected)

Volume Name: home
Type: Replicate
Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sto1:/data/gluster/home
Brick2: sto2:/data/gluster/home
Options Reconfigured:
performance.write-behind-window-size: 2GB
performance.flush-behind: on
performance.cache-size: 2GB
cluster.choose-local: on
storage.linux-aio: on
transport.keepalive: on
performance.quick-read: on
performance.io-cache: on
performance.stat-prefetch: on
performance.read-ahead: on
cluster.data-self-heal-algorithm: diff
nfs.disable: on

sto1/2 are aliases for node1/2 respectively. As you can see, NFS is disabled, so I'm using the native fuse mount on both nodes. The volume contains files and php scripts that
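To make the posix_handle_hard warning above concrete: the posix xlator keeps a hard link for every file under brick-dir/.glusterfs/aa/bb/gfid, and it warns when the file and that handle no longer report the same device/inode pair. The sketch below redoes that comparison from user space; the brick and file paths are placeholders, and this is only an approximation of what the xlator checks, run on the brick as root.

import os
import uuid

def handle_path(brick, file_path):
    """Build brick/.glusterfs/aa/bb/gfid for a file on a brick, from its
    trusted.gfid xattr (readable only on the brick, as root)."""
    raw = os.getxattr(file_path, "trusted.gfid")  # 16-byte gfid
    gfid = str(uuid.UUID(bytes=raw))
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

def check(brick, file_path):
    """Compare (st_dev, st_ino) of a file and its gfid handle, roughly the
    comparison posix_handle_hard() logs a warning about."""
    f = os.stat(file_path)
    h = os.stat(handle_path(brick, file_path))
    same = (f.st_dev, f.st_ino) == (h.st_dev, h.st_ino)
    print("file   dev/ino: %d/%d" % (f.st_dev, f.st_ino))
    print("handle dev/ino: %d/%d" % (h.st_dev, h.st_ino))
    print("match" if same else "MISMATCH -- same situation the brick log reports")

# Hypothetical invocation; placeholders for the brick and the file named in
# the log above.
# check("/data/gluster/home", "/data/gluster/home/techiebuzz/.../_index.html.old")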
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Sat, Jul 19, 2014 at 10:02:33AM -0400, Benjamin Turner wrote:

On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

On 07/18/2014 07:57 PM, Anders Blomdell wrote:

During testing of a 3*4 gluster (from master as of yesterday), I encountered two major weirdnesses:

1. A 'rm -rf some_dir' needed several invocations to finish, each time reporting a number of lines like these:
rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty

[Ben:] This is reproducible for me when running dbench on nfs mounts. I think I may have seen it on glusterfs mounts as well, but it seems more reproducible on nfs. I should have caught it sooner, but it doesn't error out client side when cleaning up, and in the next test I run the deletes are successful. When this happens, in the nfs.log I see the following. This spams the log; from what I can tell it happens when dbench is creating the files:

[2014-07-19 13:37:03.271651] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

[Niels:] My guess is that DHT/AFR fail to create the whole directory structure on all bricks (remember that directories should get created on all bricks, even for a dht-only volume). If creating a directory fails on a particular brick, self-heal should pick it up... But maybe self-heal is not run when deleting directories, causing some directories to be non-empty on some bricks but empty on others. It may be that this conflict is not handled correctly.

You could maybe test with different volumes and narrow down where the issue occurs:

- a volume of one brick
- a replicate volume with two bricks
- a distribute volume with two bricks

Potentially increase the number of bricks when a 2-brick afr-only or dht-only volume does not trigger the issue reliably or quickly.

[Ben:] Occasionally I see:

[2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref] (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (--/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

[Niels:] This looks like a bug in the NFS server; I suggest filing it independently from the directory-tree create/delete issue.

[Ben:] I'm opening a BZ now; I'll leave the systems up and put the repro steps + hostnames in the BZ in case anyone wants to poke around.

[Niels:] Thanks! The NFS problem does not need any checking on the running system.

Niels

-b

[Anders:] 2. After having successfully deleted all files from the volume, I have a single directory that is duplicated in gluster-fuse, like this:

# ls -l /mnt/gluster
total 24
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/

Any idea on how to debug this issue?

[Pranith:] What are the steps to recreate? We need to first find what led to this, then probably which xlator leads to this.

[Ben:] I have not seen this, but I am running on a 6x2 volume. I wonder if this may only happen with replica 2?

Pranith
/Anders
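A rough way to script the narrowing-down Niels suggests above (one brick, a two-brick replicate, a two-brick distribute), assuming a single test host with a running glusterd and throwaway brick directories; the hostname, paths and the reproducer command are all placeholders, and this simply drives the stock gluster CLI, nothing project-specific.

import subprocess

HOST = "gluster-host1"        # placeholder hostname running glusterd
BRICK_DIR = "/bricks/narrow"  # placeholder parent directory for throwaway bricks
MOUNT = "/mnt/narrow-test"    # placeholder mount point

# Volume layouts to try, simplest first, as suggested in the thread.
LAYOUTS = {
    "single":      ["{0}:{1}/single-b1".format(HOST, BRICK_DIR)],
    "replica2":    ["replica", "2",
                    "{0}:{1}/rep-b1".format(HOST, BRICK_DIR),
                    "{0}:{1}/rep-b2".format(HOST, BRICK_DIR)],
    "distribute2": ["{0}:{1}/dht-b1".format(HOST, BRICK_DIR),
                    "{0}:{1}/dht-b2".format(HOST, BRICK_DIR)],
}

def run(cmd):
    """Echo and run a command, failing loudly on errors."""
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

def try_layout(name, create_args):
    vol = "narrow-" + name
    run(["mkdir", "-p", BRICK_DIR, MOUNT])
    # "force" allows bricks on the root filesystem / same host for a quick test.
    run(["gluster", "volume", "create", vol] + create_args + ["force"])
    run(["gluster", "volume", "start", vol])
    run(["mount", "-t", "glusterfs", "{0}:/{1}".format(HOST, vol), MOUNT])
    try:
        # Placeholder reproducer: swap in the dbench run + rm -rf sequence that
        # produced "Directory not empty" on the 3x4 volume.
        run(["bash", "-c",
             "mkdir -p {0}/a/b/c/d/e/f && rm -rf {0}/a".format(MOUNT)])
    finally:
        run(["umount", MOUNT])
        run(["gluster", "--mode=script", "volume", "stop", vol])
        run(["gluster", "--mode=script", "volume", "delete", vol])

if __name__ == "__main__":
    for layout_name, args in LAYOUTS.items():
        try_layout(layout_name, args)

If the "Directory not empty" failures only appear once replication is in the mix, that would point at AFR rather than DHT, and vice versa.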