Re: [Gluster-users] KVM lockups on Gluster 4.1.1
I upgraded late last week to 4.1.2. Since then I've seen several posix health checks fail and bricks drop offline, but I'm not sure whether that's related or a different root issue. I haven't seen the issue described below recur on 4.1.2 yet, but it was intermittent to begin with, so I'll probably need to run for a week or more to be confident.

-Walter Deignan
-Uline IT, Systems Architect

From: "Claus Jeppesen"
To: wdeig...@uline.com
Cc: gluster-users@gluster.org
Date: 08/20/2018 07:20 AM
Subject: Re: [Gluster-users] KVM lockups on Gluster 4.1.1

I think I have seen this also on our CentOS 7.5 systems using GlusterFS 4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.

Thanx,
Claus.

(*) libvirt/qemu log:

[2018-08-19 16:45:54.275830] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276156] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276159] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=----: unlock failed on subvolume glu-vol01-lab-client-0 with lock owner 28ae49704956 [Invalid argument]
[2018-08-19 16:45:54.276183] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=----: unlock failed on subvolume glu-vol01-lab-client-1 with lock owner 28ae49704956 [Invalid argument]
[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 17:16:03.691113] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 17:46:03.855909] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d0f sent = 2018-08-19 17:16:03.691174. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 17:46:03.856170] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
... many repeats ...
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-08-19 18:16:04.022526] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x307221 sent = 2018-08-19 17:46:03.861005. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 18:16:04.022788] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 18:46:04.195590] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d8a sent = 2018-08-19 18:16:04.022838. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 18:46:04.195881] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
qemu: terminating on signal 15 from pid 507
2018-08-19 19:36:59.065+0000: shutting down, reason=destroyed
2018-08-19 19:37:08.059+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-156.el7_5.3)

At 19:37 the VM was restarted.

On Wed, Aug 15, 2018 at 8:25 PM Walter Deignan wrote:

I am using gluster to host KVM/QEMU images. I am seeing an intermittent issue where access to an image will hang. I have to do a lazy dismount of the gluster volume in order to break the lock and then reset the impacted virtual machine. It happened again today and I caught the events below in the client-side logs. Any thoughts on what might cause this? It seemed to begin after I upgraded from 3.12.10 to 4.1.1 a few weeks ago.

[2018-08-14 14:22:15.549501] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Invalid argument]
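For anyone else hitting this, here is a minimal sketch of the recovery procedure Walter describes (lazy dismount to break the stuck lock, remount, then reset the affected guest). The mount point, server name, and domain name below are hypothetical placeholders:

# Lazy-unmount detaches the hung FUSE mount immediately, even with the
# stuck FINODELK still outstanding
umount -l /mnt/gv1

# Remount the volume (server and mount point are assumptions)
mount -t glusterfs gluster-node1:/gv1 /mnt/gv1

# Hard-reset the affected guest so qemu reopens its disk image
virsh reset vm-guest01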
[Gluster-users] KVM lockups on Gluster 4.1.1
I am using gluster to host KVM/QEMU images. I am seeing an intermittent issue where access to an image will hang. I have to do a lazy dismount of the gluster volume in order to break the lock and then reset the impacted virtual machine. It happened again today and I caught the events below in the client-side logs. Any thoughts on what might cause this? It seemed to begin after I upgraded from 3.12.10 to 4.1.1 a few weeks ago.

[2018-08-14 14:22:15.549501] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Invalid argument]
[2018-08-14 14:22:15.549576] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Invalid argument]
[2018-08-14 14:22:15.549583] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=----: unlock failed on subvolume gv1-client-4 with lock owner d89caca92b7f [Invalid argument]
[2018-08-14 14:22:15.549615] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=----: unlock failed on subvolume gv1-client-5 with lock owner d89caca92b7f [Invalid argument]
[2018-08-14 14:52:18.726219] E [rpc-clnt.c:184:call_bail] 2-gv1-client-4: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc5e00 sent = 2018-08-14 14:22:15.699082. timeout = 1800 for 10.35.20.106:49159
[2018-08-14 14:52:18.726254] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962546] E [rpc-clnt.c:184:call_bail] 2-gv1-client-5: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc4a6d sent = 2018-08-14 14:52:18.726329. timeout = 1800 for 10.35.20.107:49164
[2018-08-14 15:22:25.962587] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962618] W [MSGID: 108019] [afr-lk-common.c:601:is_blocking_locks_count_sufficient] 2-gv1-replicate-2: Unable to obtain blocking inode lock on even one child for gfid:24a48cae-53fe-4634-8fb7-0254c85ad672.
[2018-08-14 15:22:25.962668] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 3715808: FSYNC() ERR => -1 (Transport endpoint is not connected)

Volume configuration -

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 66ad703e-3bae-4e79-a0b7-29ea38e8fcfc
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: dc-vihi44:/gluster/bricks/megabrick/data
Brick2: dc-vihi45:/gluster/bricks/megabrick/data
Brick3: dc-vihi44:/gluster/bricks/brick1/data
Brick4: dc-vihi45:/gluster/bricks/brick1/data
Brick5: dc-vihi44:/gluster/bricks/brick2_1/data
Brick6: dc-vihi45:/gluster/bricks/brick2/data
Brick7: dc-vihi44:/gluster/bricks/brick3/data
Brick8: dc-vihi45:/gluster/bricks/brick3/data
Brick9: dc-vihi44:/gluster/bricks/brick4/data
Brick10: dc-vihi45:/gluster/bricks/brick4/data
Options Reconfigured:
cluster.min-free-inodes: 6%
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
user.cifs: off
cluster.choose-local: off
features.shard: on
cluster.server-quorum-ratio: 51%

-Walter Deignan
-Uline IT, Systems Architect
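The call_bail entries in the logs above are the signature of this hang: a FINODELK that sits until the 1800-second frame timeout instead of completing. A rough way to watch for it, assuming the default FUSE client log location (/var/log/glusterfs/<mount-path>.log, with slashes replaced by dashes; the path below is a placeholder):

# Count lock bail-outs in the FUSE client log; a growing count means
# lock requests are timing out rather than completing
grep -c 'call_bail.*FINODELK' /var/log/glusterfs/mnt-gv1.log

# And check for bricks knocked offline by failed posix health checks
# (they show up with Online = N and no TCP port)
gluster volume status gv1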
Re: [Gluster-users] "Input/output error" on mkdir for PPC64 based client
8 and avail_inodes is: 99.00
[2017-09-20 13:34:23.353086] D [MSGID: 0] [afr-transaction.c:1934:afr_post_nonblocking_entrylk_cbk] 0-gv0-replicate-0: Non blocking entrylks done. Proceeding to FOP
[2017-09-20 13:34:23.353722] D [MSGID: 0] [dht-selfheal.c:1879:dht_selfheal_layout_new_directory] 0-gv0-dht: chunk size = 0xffffffff / 20466 = 209858.658018
[2017-09-20 13:34:23.353748] D [MSGID: 0] [dht-selfheal.c:1920:dht_selfheal_layout_new_directory] 0-gv0-dht: assigning range size 0xffffffff to gv0-replicate-0
[2017-09-20 13:34:23.353897] D [MSGID: 0] [afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a transaction
[2017-09-20 13:34:23.354052] D [MSGID: 0] [afr-transaction.c:1883:afr_post_nonblocking_inodelk_cbk] 0-gv0-replicate-0: Non blocking inodelks done. Proceeding to FOP
[2017-09-20 13:34:23.354453] D [MSGID: 0] [afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a transaction
[2017-09-20 13:34:23.354969] D [MSGID: 109036] [dht-common.c:9527:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting layout of /tempdir3 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2017-09-20 13:34:23.355226] D [MSGID: 0] [afr-transaction.c:1883:afr_post_nonblocking_inodelk_cbk] 0-gv0-replicate-0: Non blocking inodelks done. Proceeding to FOP
[2017-09-20 13:34:23.355714] D [MSGID: 0] [afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a transaction

-Walter Deignan
-Uline IT, Systems Architect

From: Amar Tumballi
To: Walter Deignan
Cc: "gluster-users@gluster.org List"
Date: 09/20/2017 01:23 PM
Subject: Re: [Gluster-users] "Input/output error" on mkdir for PPC64 based client

Looks like it is an issue with architecture compatibility in the RPC layer (i.e., with XDRs and how they are used). Glance through the logs of the client process where you saw the errors; they could give some hints. If you don't understand the logs, share them and we will try to look into it.

-Amar

On Wed, Sep 20, 2017 at 2:40 AM, Walter Deignan wrote:

I recently compiled the 3.10-5 client from source on a few PPC64 systems running RHEL 7.3. They are mounting a Gluster volume which is hosted on more traditional x86 servers. Everything seems to be working properly except for creating new directories from the PPC64 clients. The mkdir command gives an "Input/output error" and for the first few minutes the new directory is inaccessible. I checked the backend bricks and confirmed the directory was created properly on all of them. After waiting for 2-5 minutes the directory magically becomes accessible.

This inaccessible-directory issue only appears on the client which created it. When creating the directory from client #1 I can immediately see it with no errors from client #2. Using a pre-compiled 3.10-5 package on an x86 client doesn't show the issue. I poked around bugzilla but couldn't seem to find anything which matches this.

[root@mqdev1 hafsdev1_gv0]# ls -lh
total 8.0K
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
[root@mqdev1 hafsdev1_gv0]# mkdir testdir2
mkdir: cannot create directory 'testdir2': Input/output error
[root@mqdev1 hafsdev1_gv0]# ls
ls: cannot access testdir2: No such file or directory
data  testdir  testdir2
[root@mqdev1 hafsdev1_gv0]# ls -lht
ls: cannot access testdir2: No such file or directory
total 8.0K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
d????????? ? ?    ?       ?            ? testdir2
[root@mqdev1 hafsdev1_gv0]# cd testdir2
-bash: cd: testdir2: No such file or directory

*Wait a few minutes...*

[root@mqdev1 hafsdev1_gv0]# ls -lht
total 12K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:50 testdir2
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
[root@mqdev1 hafsdev1_gv0]#

My volume config...

[root@dc-hafsdev1a bricks]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: a2d37705-05cb-4700-8ed8-2cb89376faf0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dc-hafsdev1a.ulinedm.com:/gluster/bricks/brick1/data
Brick2: dc-hafsdev1b.ulinedm.com:/gluster/bricks/brick1/data
Brick3: dc-hafsdev1c.ulinedm.com:/gluster/bricks/brick1/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
network.ping-timeout: 2
features.bitrot: on
features.scrub: Active
cluster.server-quorum-ratio: 51%

-Walter Deignan
-Uline IT, Systems Architect

--
Amar Tumballi (amarts)
[Gluster-users] "Input/output error" on mkdir for PPC64 based client
I recently compiled the 3.10-5 client from source on a few PPC64 systems running RHEL 7.3. They are mounting a Gluster volume which is hosted on more traditional x86 servers. Everything seems to be working properly except for creating new directories from the PPC64 clients. The mkdir command gives an "Input/output error" and for the first few minutes the new directory is inaccessible. I checked the backend bricks and confirmed the directory was created properly on all of them. After waiting for 2-5 minutes the directory magically becomes accessible.

This inaccessible-directory issue only appears on the client which created it. When creating the directory from client #1 I can immediately see it with no errors from client #2. Using a pre-compiled 3.10-5 package on an x86 client doesn't show the issue. I poked around bugzilla but couldn't seem to find anything which matches this.

[root@mqdev1 hafsdev1_gv0]# ls -lh
total 8.0K
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
[root@mqdev1 hafsdev1_gv0]# mkdir testdir2
mkdir: cannot create directory 'testdir2': Input/output error
[root@mqdev1 hafsdev1_gv0]# ls
ls: cannot access testdir2: No such file or directory
data  testdir  testdir2
[root@mqdev1 hafsdev1_gv0]# ls -lht
ls: cannot access testdir2: No such file or directory
total 8.0K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
d????????? ? ?    ?       ?            ? testdir2
[root@mqdev1 hafsdev1_gv0]# cd testdir2
-bash: cd: testdir2: No such file or directory

*Wait a few minutes...*

[root@mqdev1 hafsdev1_gv0]# ls -lht
total 12K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:50 testdir2
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
[root@mqdev1 hafsdev1_gv0]#

My volume config...

[root@dc-hafsdev1a bricks]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: a2d37705-05cb-4700-8ed8-2cb89376faf0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dc-hafsdev1a.ulinedm.com:/gluster/bricks/brick1/data
Brick2: dc-hafsdev1b.ulinedm.com:/gluster/bricks/brick1/data
Brick3: dc-hafsdev1c.ulinedm.com:/gluster/bricks/brick1/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
network.ping-timeout: 2
features.bitrot: on
features.scrub: Active
cluster.server-quorum-ratio: 51%

-Walter Deignan
-Uline IT, Systems Architect
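A sketch of one way to capture the client-side detail Amar asks for in the reply above: raise the client log level, reproduce the failing mkdir, then read the FUSE client log on the PPC64 box. The mount point and log path are assumptions based on defaults:

# Bump client-side logging to DEBUG for the volume
gluster volume set gv0 diagnostics.client-log-level DEBUG

# Reproduce the failure from the PPC64 client
mkdir /mnt/gv0/testdir3

# The FUSE client log is named after the mount path by default
less /var/log/glusterfs/mnt-gv0.log

# Drop the log level back when done
gluster volume set gv0 diagnostics.client-log-level INFO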
Re: [Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards
Sorry for the slow response. Your hunch was right. It seems to be a problem between tiering and sharding. I untiered the volume and the symptom vanished. I then deleted and recreated the volume entirely (without tiering) in order to clean up the orphaned shards.

-Walter Deignan
-Uline IT, Systems Architect

From: Nithya Balachandran
To: Walter Deignan
Cc: gluster-users
Date: 05/17/2017 10:17 PM
Subject: Re: [Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards

I don't think we have tested shards with a tiered volume. Do you see such issues on non-tiered sharded volumes?

Regards,
Nithya

On 18 May 2017 at 00:51, Walter Deignan wrote:

I have a reproducible issue where attempting to delete a file large enough to have been sharded hangs. I can't kill the 'rm' command and eventually am forced to reboot the client (which in this case is also part of the gluster cluster). After the node finishes rebooting I can see that while the file front-end is gone, the back-end shards are still present.

Is this a known issue? Any way to get around it?

--

[root@dc-vihi19 ~]# gluster volume info gv0
Volume Name: gv0
Type: Tier
Volume ID: d42e366f-381d-4787-bcc5-cb6770cb7d58
Status: Started
Snapshot Count: 0
Number of Bricks: 24
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: dc-vihi71:/gluster/bricks/brick4/data
Brick2: dc-vihi19:/gluster/bricks/brick4/data
Brick3: dc-vihi70:/gluster/bricks/brick4/data
Brick4: dc-vihi19:/gluster/bricks/brick3/data
Brick5: dc-vihi71:/gluster/bricks/brick3/data
Brick6: dc-vihi19:/gluster/bricks/brick2/data
Brick7: dc-vihi70:/gluster/bricks/brick3/data
Brick8: dc-vihi19:/gluster/bricks/brick1/data
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 8 x 2 = 16
Brick9: dc-vihi19:/gluster/bricks/brick5/data
Brick10: dc-vihi70:/gluster/bricks/brick1/data
Brick11: dc-vihi19:/gluster/bricks/brick6/data
Brick12: dc-vihi71:/gluster/bricks/brick1/data
Brick13: dc-vihi19:/gluster/bricks/brick7/data
Brick14: dc-vihi70:/gluster/bricks/brick2/data
Brick15: dc-vihi19:/gluster/bricks/brick8/data
Brick16: dc-vihi71:/gluster/bricks/brick2/data
Brick17: dc-vihi19:/gluster/bricks/brick9/data
Brick18: dc-vihi70:/gluster/bricks/brick5/data
Brick19: dc-vihi19:/gluster/bricks/brick10/data
Brick20: dc-vihi71:/gluster/bricks/brick5/data
Brick21: dc-vihi19:/gluster/bricks/brick11/data
Brick22: dc-vihi70:/gluster/bricks/brick6/data
Brick23: dc-vihi19:/gluster/bricks/brick12/data
Brick24: dc-vihi71:/gluster/bricks/brick6/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
features.shard: on
features.shard-block-size: 512MB
network.ping-timeout: 5
cluster.server-quorum-ratio: 51%

[root@dc-vihi19 temp]# ls -lh
total 26G
-rw-rw-rw-. 1 root root 31G May 17 10:38 win7.qcow2
[root@dc-vihi19 temp]# getfattr -n glusterfs.gfid.string win7.qcow2
# file: win7.qcow2
glusterfs.gfid.string="7f4a0fea-72c0-41e4-97a5-6297be0a9142"
[root@dc-vihi19 temp]# rm win7.qcow2
rm: remove regular file 'win7.qcow2'? y

*Process hangs and can't be killed. A reboot later...*

login as: root
Authenticating with public key "rsa-key-20170510"
Last login: Wed May 17 14:04:29 2017 from **
[root@dc-vihi19 ~]# find /gluster/bricks -name "7f4a0fea-72c0-41e4-97a5-6297be0a9142*"
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.23
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.35
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.52
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.29
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.22
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.24

and so on...

-Walter Deignan
-Uline IT, Systems Architect
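For reference, untiering a volume as described above is a multi-step operation on the tier CLI. A sketch against the 3.10-era syntax (older releases also had a separate "gluster volume detach-tier" spelling):

# Start migrating data off the hot tier
gluster volume tier gv0 detach start

# Poll until the migration shows completed
gluster volume tier gv0 detach status

# Then remove the hot tier bricks from the volume
gluster volume tier gv0 detach commit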
[Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards
I have a reproducible issue where attempting to delete a file large enough to have been sharded hangs. I can't kill the 'rm' command and eventually am forced to reboot the client (which in this case is also part of the gluster cluster). After the node finishes rebooting I can see that while the file front-end is gone, the back-end shards are still present.

Is this a known issue? Any way to get around it?

--

[root@dc-vihi19 ~]# gluster volume info gv0
Volume Name: gv0
Type: Tier
Volume ID: d42e366f-381d-4787-bcc5-cb6770cb7d58
Status: Started
Snapshot Count: 0
Number of Bricks: 24
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: dc-vihi71:/gluster/bricks/brick4/data
Brick2: dc-vihi19:/gluster/bricks/brick4/data
Brick3: dc-vihi70:/gluster/bricks/brick4/data
Brick4: dc-vihi19:/gluster/bricks/brick3/data
Brick5: dc-vihi71:/gluster/bricks/brick3/data
Brick6: dc-vihi19:/gluster/bricks/brick2/data
Brick7: dc-vihi70:/gluster/bricks/brick3/data
Brick8: dc-vihi19:/gluster/bricks/brick1/data
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 8 x 2 = 16
Brick9: dc-vihi19:/gluster/bricks/brick5/data
Brick10: dc-vihi70:/gluster/bricks/brick1/data
Brick11: dc-vihi19:/gluster/bricks/brick6/data
Brick12: dc-vihi71:/gluster/bricks/brick1/data
Brick13: dc-vihi19:/gluster/bricks/brick7/data
Brick14: dc-vihi70:/gluster/bricks/brick2/data
Brick15: dc-vihi19:/gluster/bricks/brick8/data
Brick16: dc-vihi71:/gluster/bricks/brick2/data
Brick17: dc-vihi19:/gluster/bricks/brick9/data
Brick18: dc-vihi70:/gluster/bricks/brick5/data
Brick19: dc-vihi19:/gluster/bricks/brick10/data
Brick20: dc-vihi71:/gluster/bricks/brick5/data
Brick21: dc-vihi19:/gluster/bricks/brick11/data
Brick22: dc-vihi70:/gluster/bricks/brick6/data
Brick23: dc-vihi19:/gluster/bricks/brick12/data
Brick24: dc-vihi71:/gluster/bricks/brick6/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
features.shard: on
features.shard-block-size: 512MB
network.ping-timeout: 5
cluster.server-quorum-ratio: 51%

[root@dc-vihi19 temp]# ls -lh
total 26G
-rw-rw-rw-. 1 root root 31G May 17 10:38 win7.qcow2
[root@dc-vihi19 temp]# getfattr -n glusterfs.gfid.string win7.qcow2
# file: win7.qcow2
glusterfs.gfid.string="7f4a0fea-72c0-41e4-97a5-6297be0a9142"
[root@dc-vihi19 temp]# rm win7.qcow2
rm: remove regular file 'win7.qcow2'? y

*Process hangs and can't be killed. A reboot later...*

login as: root
Authenticating with public key "rsa-key-20170510"
Last login: Wed May 17 14:04:29 2017 from **
[root@dc-vihi19 ~]# find /gluster/bricks -name "7f4a0fea-72c0-41e4-97a5-6297be0a9142*"
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.23
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.35
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.52
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.29
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.22
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.24

and so on...

-Walter Deignan
-Uline IT, Systems Architect
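With features.shard-block-size at 512MB, a 31G image should break into roughly 31 * 1024 / 512 = 62 blocks: the first block stays in the base file, and shards .1 through .61 live under .shard on the bricks (that numbering is the shard translator's usual layout, stated here as an assumption). A quick way to count how many orphans remain, reusing the gfid from the getfattr output above:

# Count the leftover shard files for the deleted image's gfid
find /gluster/bricks -name '7f4a0fea-72c0-41e4-97a5-6297be0a9142.*' | wc -l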
Re: [Gluster-users] Arbiter and Hot Tier
Thank you very much for the assistance.

-Walter Deignan
-Uline IT, Systems Architect

From: Ravishankar N
To: Walter Deignan
Cc: gluster-users@gluster.org
Date: 05/05/2017 11:43 AM
Subject: Re: [Gluster-users] Arbiter and Hot Tier

Okay, just tried it in 3.9. Even though attaching a non-arbiter volume (say a replica-2) as hot tier succeeds at the CLI level, the brick volfiles seem to be generated incorrectly (I see that the arbiter translator is getting loaded in one of the replica-2 bricks too, which is incorrect). I'd recommend not using arbiter volumes with tiering, irrespective of hot or cold tier.

On 05/05/2017 09:42 PM, Walter Deignan wrote:

Did that change between 3.9 and 3.10? When I originally saw some references in the Red Hat storage packaged solution to a possible incompatibility, I assumed it just meant that the hot tier itself couldn't be an arbiter volume. I was tripped up by the change in apparent support for the cold tier between 3.9 and 3.10. But maybe that was just a fixed oversight which never should have worked in the first place?

-Walter Deignan
-Uline IT, Systems Architect

From: Ravishankar N
To: Walter Deignan, gluster-users@gluster.org
Date: 05/05/2017 11:08 AM
Subject: Re: [Gluster-users] Arbiter and Hot Tier

Hi Walter,

Yes, arbiter volumes are currently not supported with tiering.

-Ravi

On 05/05/2017 08:54 PM, Walter Deignan wrote:

I've been googling this to no avail, so apologies if this is explained somewhere I missed. Is there a known incompatibility between using arbiters and hot tiering?

Experience on 3.9:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - success

Experience on 3.10:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - failure
Attach hot tier without specifying replica - success, but it comes in as a distributed tier, which I would assume totally negates the point of having a replicated cold tier?

The specific error message I get is "volume attach-tier: failed: Increasing replica count for arbiter volumes is not supported."

-Walter Deignan
-Uline IT, Systems Architect
Re: [Gluster-users] Arbiter and Hot Tier
Did that change between 3.9 and 3.10? When I originally saw some references in the Red Hat storage packaged solution to a possible incompatibility, I assumed it just meant that the hot tier itself couldn't be an arbiter volume. I was tripped up by the change in apparent support for the cold tier between 3.9 and 3.10. But maybe that was just a fixed oversight which never should have worked in the first place?

-Walter Deignan
-Uline IT, Systems Architect

From: Ravishankar N
To: Walter Deignan, gluster-users@gluster.org
Date: 05/05/2017 11:08 AM
Subject: Re: [Gluster-users] Arbiter and Hot Tier

Hi Walter,

Yes, arbiter volumes are currently not supported with tiering.

-Ravi

On 05/05/2017 08:54 PM, Walter Deignan wrote:

I've been googling this to no avail, so apologies if this is explained somewhere I missed. Is there a known incompatibility between using arbiters and hot tiering?

Experience on 3.9:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - success

Experience on 3.10:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - failure
Attach hot tier without specifying replica - success, but it comes in as a distributed tier, which I would assume totally negates the point of having a replicated cold tier?

The specific error message I get is "volume attach-tier: failed: Increasing replica count for arbiter volumes is not supported."

-Walter Deignan
-Uline IT, Systems Architect
[Gluster-users] Arbiter and Hot Tier
I've been googling this to no avail, so apologies if this is explained somewhere I missed. Is there a known incompatibility between using arbiters and hot tiering?

Experience on 3.9:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - success

Experience on 3.10:
Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - failure
Attach hot tier without specifying replica - success, but it comes in as a distributed tier, which I would assume totally negates the point of having a replicated cold tier?

The specific error message I get is "volume attach-tier: failed: Increasing replica count for arbiter volumes is not supported."

-Walter Deignan
-Uline IT, Systems Architect
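For anyone reproducing this, the attempts above map to commands along these lines. This is a sketch against the 3.10-era tier CLI; the brick hosts and paths are made up:

# Fails on an arbiter base volume with "Increasing replica count for
# arbiter volumes is not supported"
gluster volume tier gv0 attach replica 3 n1:/bricks/hot/data n2:/bricks/hot/data n3:/bricks/hot/data

# Succeeds, but the hot tier comes in as plain distribute - no
# redundancy in front of the replicated cold tier
gluster volume tier gv0 attach n1:/bricks/hot/data n2:/bricks/hot/data n3:/bricks/hot/data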