Re: [Gluster-users] KVM lockups on Gluster 4.1.1

2018-08-20 Thread Walter Deignan
I upgraded late last week to 4.1.2. Since then I've seen several posix 
health checks fail and bricks drop offline but I'm not sure if that's 
related or a different root issue.

I haven't seen the issue described below re-occur on 4.1.2 yet but it was 
intermittent to begin with so I'll probably need to run for a week or more 
to be confident.
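For anyone watching for the same symptom, the brick drops and posix health-check failures can be spotted with something like the following. This is a hedged sketch, not from the original post; the exact health-check log text may vary by Gluster version.

```shell
# Check whether any brick has dropped offline (volume name from this thread).
gluster volume status gv1          # a brick showing "N" under Online has dropped

# Look for posix health-check failures in the brick logs.
grep -i "health.check" /var/log/glusterfs/bricks/*.log | tail -n 20
```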

-Walter Deignan
-Uline IT, Systems Architect



From:   "Claus Jeppesen" 
To: wdeig...@uline.com
Cc: gluster-users@gluster.org
Date:   08/20/2018 07:20 AM
Subject:Re: [Gluster-users] KVM lockups on Gluster 4.1.1



I think I have seen this also on our CentOS 7.5 systems using GlusterFS 
4.1.1 (*). Has an upgrade to 4.1.2 helped? I'm trying it now.

Thanx,

Claus.

(*) libvirt/qemu log:
[2018-08-19 16:45:54.275830] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-0: remote operation failed [Invalid argument] 
[2018-08-19 16:45:54.276156] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-1: remote operation failed [Invalid argument] 
[2018-08-19 16:45:54.276159] E [MSGID: 108010] 
[afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: 
path=(null) gfid=----: unlock failed on 
subvolume glu-vol01-lab-client-0 with lock owner 28ae49704956 [Invalid argument] 
[2018-08-19 16:45:54.276183] E [MSGID: 108010] 
[afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: 
path=(null) gfid=----: unlock failed on 
subvolume glu-vol01-lab-client-1 with lock owner 28ae49704956 [Invalid argument] 
[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] 
0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) 
op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. timeout 
= 1800 for 192.168.13.131:49152 
[2018-08-19 17:16:03.691113] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is 
not connected] 
[2018-08-19 17:46:03.855909] E [rpc-clnt.c:184:call_bail] 
0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) 
op(FINODELK(30)) xid = 0x301d0f sent = 2018-08-19 17:16:03.691174. timeout 
= 1800 for 192.168.13.132:49152 
[2018-08-19 17:46:03.856170] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is 
not connected] 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
... many repeats ... 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
[2018-08-19 18:16:04.022526] E [rpc-clnt.c:184:call_bail] 
0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) 
op(FINODELK(30)) xid = 0x307221 sent = 2018-08-19 17:46:03.861005. timeout 
= 1800 for 192.168.13.131:49152 
[2018-08-19 18:16:04.022788] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is 
not connected] 
[2018-08-19 18:46:04.195590] E [rpc-clnt.c:184:call_bail] 
0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) 
op(FINODELK(30)) xid = 0x301d8a sent = 2018-08-19 18:16:04.022838. timeout 
= 1800 for 192.168.13.132:49152 
[2018-08-19 18:46:04.195881] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 
0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is 
not connected] 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
block I/O error in device 'drive-virtio-disk0': Operation not permitted 
(1) 
qemu: terminating on signal 15 from pid 507 
2018-08-19 19:36:59.065+: shutting down, reason=destroyed 
2018-08-19 19:37:08.059+: starting up libvirt version: 3.9.0, package: 
14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 
2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 1.5.3 
(qemu-kvm-1.5.3-156.el7_5.3)

At 19:37 the VM was restarted.
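A side note on the cadence: each bail-out lands one full frame timeout (1800 s) after the FINODELK was sent, which is why the errors recur every ~30 minutes. Checking the intervals with timestamps copied from the log above:

```python
from datetime import datetime

# (sent, bailed-out) timestamp pairs taken from the call_bail log lines above
events = [
    ("2018-08-19 16:45:54.276560", "2018-08-19 17:16:03.690808"),
    ("2018-08-19 17:16:03.691174", "2018-08-19 17:46:03.855909"),
    ("2018-08-19 17:46:03.861005", "2018-08-19 18:16:04.022526"),
]

fmt = "%Y-%m-%d %H:%M:%S.%f"
gaps = [(datetime.strptime(bail, fmt) - datetime.strptime(sent, fmt)).total_seconds()
        for sent, bail in events]
print(gaps)  # each gap is ~1800 s, i.e. one full frame timeout
```

So the repeated "Transport endpoint is not connected" errors are the same stuck lock request being re-issued and timing out again, not new failures.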



On Wed, Aug 15, 2018 at 8:25 PM Walter Deignan  wrote:
I am using gluster to host KVM/QEMU images. I am seeing an intermittent 
issue where access to an image will hang. I have to do a lazy dismount of 
the gluster volume in order to break the lock and then reset the impacted 
virtual machine. 

It happened again today and I caught the events below in the client side 
logs. Any thoughts on what might cause this? It seemed to begin after I 
upgraded from 3.12.10 to 4.1.1 a few weeks ago. 

[2018-08-14 14:22:15.549501] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1

[Gluster-users] KVM lockups on Gluster 4.1.1

2018-08-15 Thread Walter Deignan
I am using gluster to host KVM/QEMU images. I am seeing an intermittent 
issue where access to an image will hang. I have to do a lazy dismount of 
the gluster volume in order to break the lock and then reset the impacted 
virtual machine.
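For anyone hitting the same hang, the recovery described above can be sketched as below. The mount point, server name, and libvirt domain name are placeholders, not details from this thread.

```shell
# Lazy unmount detaches the hung FUSE mount even while I/O is stuck on it.
umount -l /mnt/vmstore
mount -t glusterfs gluster-server:/gv1 /mnt/vmstore

# Then reset the impacted guest.
virsh destroy stuck-vm && virsh start stuck-vm
```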

It happened again today and I caught the events below in the client side 
logs. Any thoughts on what might cause this? It seemed to begin after I 
upgraded from 3.12.10 to 4.1.1 a few weeks ago.

[2018-08-14 14:22:15.549501] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote 
operation failed [Invalid argument]
[2018-08-14 14:22:15.549576] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote 
operation failed [Invalid argument]
[2018-08-14 14:22:15.549583] E [MSGID: 108010] 
[afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: 
path=(null) gfid=----: unlock failed on 
subvolume gv1-client-4 with lock owner d89caca92b7f [Invalid argument]
[2018-08-14 14:22:15.549615] E [MSGID: 108010] 
[afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: 
path=(null) gfid=----: unlock failed on 
subvolume gv1-client-5 with lock owner d89caca92b7f [Invalid argument]
[2018-08-14 14:52:18.726219] E [rpc-clnt.c:184:call_bail] 2-gv1-client-4: 
bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc5e00 
sent = 2018-08-14 14:22:15.699082. timeout = 1800 for 10.35.20.106:49159
[2018-08-14 14:52:18.726254] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote 
operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962546] E [rpc-clnt.c:184:call_bail] 2-gv1-client-5: 
bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc4a6d 
sent = 2018-08-14 14:52:18.726329. timeout = 1800 for 10.35.20.107:49164
[2018-08-14 15:22:25.962587] E [MSGID: 114031] 
[client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote 
operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962618] W [MSGID: 108019] 
[afr-lk-common.c:601:is_blocking_locks_count_sufficient] 
2-gv1-replicate-2: Unable to obtain blocking inode lock on even one child 
for gfid:24a48cae-53fe-4634-8fb7-0254c85ad672.
[2018-08-14 15:22:25.962668] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 3715808: FSYNC() ERR => -1 (Transport endpoint is not 
connected)

Volume configuration -

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 66ad703e-3bae-4e79-a0b7-29ea38e8fcfc
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: dc-vihi44:/gluster/bricks/megabrick/data
Brick2: dc-vihi45:/gluster/bricks/megabrick/data
Brick3: dc-vihi44:/gluster/bricks/brick1/data
Brick4: dc-vihi45:/gluster/bricks/brick1/data
Brick5: dc-vihi44:/gluster/bricks/brick2_1/data
Brick6: dc-vihi45:/gluster/bricks/brick2/data
Brick7: dc-vihi44:/gluster/bricks/brick3/data
Brick8: dc-vihi45:/gluster/bricks/brick3/data
Brick9: dc-vihi44:/gluster/bricks/brick4/data
Brick10: dc-vihi45:/gluster/bricks/brick4/data
Options Reconfigured:
cluster.min-free-inodes: 6%
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
user.cifs: off
cluster.choose-local: off
features.shard: on
cluster.server-quorum-ratio: 51%

-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] "Input/output error" on mkdir for PPC64 based client

2017-09-21 Thread Walter Deignan
8 
and avail_inodes is: 99.00
[2017-09-20 13:34:23.353086] D [MSGID: 0] 
[afr-transaction.c:1934:afr_post_nonblocking_entrylk_cbk] 
0-gv0-replicate-0: Non blocking entrylks done. Proceeding to FOP
[2017-09-20 13:34:23.353722] D [MSGID: 0] 
[dht-selfheal.c:1879:dht_selfheal_layout_new_directory] 0-gv0-dht: chunk 
size = 0x / 20466 = 209858.658018
[2017-09-20 13:34:23.353748] D [MSGID: 0] 
[dht-selfheal.c:1920:dht_selfheal_layout_new_directory] 0-gv0-dht: 
assigning range size 0x to gv0-replicate-0
[2017-09-20 13:34:23.353897] D [MSGID: 0] 
[afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a 
transaction
[2017-09-20 13:34:23.354052] D [MSGID: 0] 
[afr-transaction.c:1883:afr_post_nonblocking_inodelk_cbk] 
0-gv0-replicate-0: Non blocking inodelks done. Proceeding to FOP
[2017-09-20 13:34:23.354453] D [MSGID: 0] 
[afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a 
transaction
[2017-09-20 13:34:23.354969] D [MSGID: 109036] 
[dht-common.c:9527:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting 
layout of /tempdir3 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 
, Stop: 4294967295 , Hash: 1 ],
[2017-09-20 13:34:23.355226] D [MSGID: 0] 
[afr-transaction.c:1883:afr_post_nonblocking_inodelk_cbk] 
0-gv0-replicate-0: Non blocking inodelks done. Proceeding to FOP
[2017-09-20 13:34:23.355714] D [MSGID: 0] 
[afr-lk-common.c:448:transaction_lk_op] 0-gv0-replicate-0: lk op is for a 
transaction

-Walter Deignan
-Uline IT, Systems Architect



From:   Amar Tumballi 
To: Walter Deignan 
Cc: "gluster-users@gluster.org List" 
Date:   09/20/2017 01:23 PM
Subject:Re: [Gluster-users] "Input/output error" on mkdir for 
PPC64 based client



Looks like it is an issue with architecture compatibility in the RPC layer 
(i.e., with XDRs and how they are used). Glance through the logs of the 
client process where you saw the errors; they could give some hints. If 
you don't understand the logs, share them and we will try to look into it.
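Not the actual Gluster code path, but a quick illustration of the class of mismatch Amar is pointing at: XDR serializes integers in big-endian (network) order, which matches a big-endian PPC64 host natively but must be byte-swapped on little-endian x86. A bug in that swapping shows up exactly as cross-architecture incompatibility.

```python
import struct

value = 0x12345678
xdr_form = struct.pack(">I", value)   # XDR wire form: big-endian network order
x86_form = struct.pack("<I", value)   # little-endian host order on x86

print(xdr_form.hex())  # 12345678
print(x86_form.hex())  # 78563412
```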

-Amar

On Wed, Sep 20, 2017 at 2:40 AM, Walter Deignan  
wrote:
I recently compiled the 3.10-5 client from source on a few PPC64 systems 
running RHEL 7.3. They are mounting a Gluster volume which is hosted on 
more traditional x86 servers. 

Everything seems to be working properly except for creating new 
directories from the PPC64 clients. The mkdir command gives an 
"Input/output error" and for the first few minutes the new directory is 
inaccessible. I checked the backend bricks and confirmed the directory was 
created properly on all of them. After waiting for 2-5 minutes the 
directory magically becomes accessible. 

This inaccessible directory issue only appears from the client which 
created it. When creating the directory from client #1 I can immediately 
see it with no errors from client #2. 

Using a pre-compiled 3.10-5 package on an x86 client doesn't show the 
issue. 

I poked around bugzilla but couldn't seem to find anything which matches 
this. 

[root@mqdev1 hafsdev1_gv0]# ls -lh 
total 8.0K 
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data 
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir 
[root@mqdev1 hafsdev1_gv0]# mkdir testdir2 
mkdir: cannot create directory 'testdir2': Input/output error 
[root@mqdev1 hafsdev1_gv0]# ls 
ls: cannot access testdir2: No such file or directory 
data  testdir  testdir2 
[root@mqdev1 hafsdev1_gv0]# ls -lht 
ls: cannot access testdir2: No such file or directory 
total 8.0K 
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir 
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data 
d?? ? ??   ?? testdir2 
[root@mqdev1 hafsdev1_gv0]# cd testdir2 
-bash: cd: testdir2: No such file or directory 

*Wait a few minutes...* 

[root@mqdev1 hafsdev1_gv0]# ls -lht 
total 12K 
drwxr-xr-x. 2 root root 4.0K Sep 19 15:50 testdir2 
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir 
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data 
[root@mqdev1 hafsdev1_gv0]# 

My volume config... 

[root@dc-hafsdev1a bricks]# gluster volume info 

Volume Name: gv0 
Type: Replicate 
Volume ID: a2d37705-05cb-4700-8ed8-2cb89376faf0 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 1 x 3 = 3 
Transport-type: tcp 
Bricks: 
Brick1: dc-hafsdev1a.ulinedm.com:/gluster/bricks/brick1/data 
Brick2: dc-hafsdev1b.ulinedm.com:/gluster/bricks/brick1/data 
Brick3: dc-hafsdev1c.ulinedm.com:/gluster/bricks/brick1/data 
Options Reconfigured: 
nfs.disable: on 
transport.address-family: inet 
network.ping-timeout: 2 
features.bitrot: on 
features.scrub: Active 
cluster.server-quorum-ratio: 51% 

-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] "Input/output error" on mkdir for PPC64 based client

2017-09-20 Thread Walter Deignan
I recently compiled the 3.10-5 client from source on a few PPC64 systems 
running RHEL 7.3. They are mounting a Gluster volume which is hosted on 
more traditional x86 servers.

Everything seems to be working properly except for creating new 
directories from the PPC64 clients. The mkdir command gives an 
"Input/output error" and for the first few minutes the new directory is 
inaccessible. I checked the backend bricks and confirmed the directory was 
created properly on all of them. After waiting for 2-5 minutes the 
directory magically becomes accessible.

This inaccessible directory issue only appears from the client which 
created it. When creating the directory from client #1 I can immediately 
see it with no errors from client #2.

Using a pre-compiled 3.10-5 package on an x86 client doesn't show the 
issue.

I poked around bugzilla but couldn't seem to find anything which matches 
this.

[root@mqdev1 hafsdev1_gv0]# ls -lh
total 8.0K
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
[root@mqdev1 hafsdev1_gv0]# mkdir testdir2
mkdir: cannot create directory 'testdir2': Input/output error
[root@mqdev1 hafsdev1_gv0]# ls
ls: cannot access testdir2: No such file or directory
data  testdir  testdir2
[root@mqdev1 hafsdev1_gv0]# ls -lht
ls: cannot access testdir2: No such file or directory
total 8.0K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
d?? ? ??   ?? testdir2
[root@mqdev1 hafsdev1_gv0]# cd testdir2
-bash: cd: testdir2: No such file or directory

*Wait a few minutes...*

[root@mqdev1 hafsdev1_gv0]# ls -lht
total 12K
drwxr-xr-x. 2 root root 4.0K Sep 19 15:50 testdir2
drwxr-xr-x. 2 root root 4.0K Sep 19 15:47 testdir
drwxrwxr-x. 4 mqm  mqm  4.0K Sep 19 15:47 data
[root@mqdev1 hafsdev1_gv0]#

My volume config...

[root@dc-hafsdev1a bricks]# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: a2d37705-05cb-4700-8ed8-2cb89376faf0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dc-hafsdev1a.ulinedm.com:/gluster/bricks/brick1/data
Brick2: dc-hafsdev1b.ulinedm.com:/gluster/bricks/brick1/data
Brick3: dc-hafsdev1c.ulinedm.com:/gluster/bricks/brick1/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
network.ping-timeout: 2
features.bitrot: on
features.scrub: Active
cluster.server-quorum-ratio: 51%

-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards

2017-05-24 Thread Walter Deignan
Sorry for the slow response. Your hunch was right. It seems to be a 
problem between tiering and sharding.

I untiered the volume and the symptom vanished. I then deleted and 
recreated the volume entirely (without tiering) in order to cleanup the 
orphaned shards.
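For reference, a way to confirm which shards are orphaned before rebuilding, mirroring the getfattr/find approach used later in this thread. The file path is a placeholder.

```shell
# Grab the file's GFID from the FUSE mount...
getfattr -n glusterfs.gfid.string /mnt/gv0/large-file.qcow2

# ...then look for leftover shards under .shard on each brick,
# substituting the GFID printed above.
find /gluster/bricks/*/data/.shard -name 'GFID.*'
```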

-Walter Deignan
-Uline IT, Systems Architect



From:   Nithya Balachandran 
To: Walter Deignan 
Cc: gluster-users 
Date:   05/17/2017 10:17 PM
Subject:Re: [Gluster-users] Deleting large files on sharded volume 
hangs and doesn't delete shards



I don't think we have tested shards with a tiered volume.  Do you see such 
issues on non-tiered sharded volumes?

Regards,
Nithya

On 18 May 2017 at 00:51, Walter Deignan  wrote:
I have a reproducible issue where attempting to delete a file large enough 
to have been sharded hangs. I can't kill the 'rm' command and eventually 
am forced to reboot the client (which in this case is also part of the 
gluster cluster). After the node finishes rebooting I can see that while 
the file front-end is gone, the back-end shards are still present. 

Is this a known issue? Any way to get around it? 

-- 

[root@dc-vihi19 ~]# gluster volume info gv0 

Volume Name: gv0 
Type: Tier 
Volume ID: d42e366f-381d-4787-bcc5-cb6770cb7d58 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 24 
Transport-type: tcp 
Hot Tier : 
Hot Tier Type : Distributed-Replicate 
Number of Bricks: 4 x 2 = 8 
Brick1: dc-vihi71:/gluster/bricks/brick4/data 
Brick2: dc-vihi19:/gluster/bricks/brick4/data 
Brick3: dc-vihi70:/gluster/bricks/brick4/data 
Brick4: dc-vihi19:/gluster/bricks/brick3/data 
Brick5: dc-vihi71:/gluster/bricks/brick3/data 
Brick6: dc-vihi19:/gluster/bricks/brick2/data 
Brick7: dc-vihi70:/gluster/bricks/brick3/data 
Brick8: dc-vihi19:/gluster/bricks/brick1/data 
Cold Tier: 
Cold Tier Type : Distributed-Replicate 
Number of Bricks: 8 x 2 = 16 
Brick9: dc-vihi19:/gluster/bricks/brick5/data 
Brick10: dc-vihi70:/gluster/bricks/brick1/data 
Brick11: dc-vihi19:/gluster/bricks/brick6/data 
Brick12: dc-vihi71:/gluster/bricks/brick1/data 
Brick13: dc-vihi19:/gluster/bricks/brick7/data 
Brick14: dc-vihi70:/gluster/bricks/brick2/data 
Brick15: dc-vihi19:/gluster/bricks/brick8/data 
Brick16: dc-vihi71:/gluster/bricks/brick2/data 
Brick17: dc-vihi19:/gluster/bricks/brick9/data 
Brick18: dc-vihi70:/gluster/bricks/brick5/data 
Brick19: dc-vihi19:/gluster/bricks/brick10/data 
Brick20: dc-vihi71:/gluster/bricks/brick5/data 
Brick21: dc-vihi19:/gluster/bricks/brick11/data 
Brick22: dc-vihi70:/gluster/bricks/brick6/data 
Brick23: dc-vihi19:/gluster/bricks/brick12/data 
Brick24: dc-vihi71:/gluster/bricks/brick6/data 
Options Reconfigured: 
nfs.disable: on 
transport.address-family: inet 
features.ctr-enabled: on 
cluster.tier-mode: cache 
features.shard: on 
features.shard-block-size: 512MB 
network.ping-timeout: 5 
cluster.server-quorum-ratio: 51% 

[root@dc-vihi19 temp]# ls -lh 
total 26G 
-rw-rw-rw-. 1 root root 31G May 17 10:38 win7.qcow2 
[root@dc-vihi19 temp]# getfattr -n glusterfs.gfid.string win7.qcow2 
# file: win7.qcow2 
glusterfs.gfid.string="7f4a0fea-72c0-41e4-97a5-6297be0a9142" 

[root@dc-vihi19 temp]# rm win7.qcow2 
rm: remove regular file 'win7.qcow2'? y 

*Process hangs and can't be killed. A reboot later...* 

login as: root 
Authenticating with public key "rsa-key-20170510" 
Last login: Wed May 17 14:04:29 2017 from ** 
[root@dc-vihi19 ~]# find /gluster/bricks -name 
"7f4a0fea-72c0-41e4-97a5-6297be0a9142*" 
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.23 
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.35 
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.52 
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.29 
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.22 
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.24 

and so on... 


-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Deleting large files on sharded volume hangs and doesn't delete shards

2017-05-17 Thread Walter Deignan
I have a reproducible issue where attempting to delete a file large enough 
to have been sharded hangs. I can't kill the 'rm' command and eventually 
am forced to reboot the client (which in this case is also part of the 
gluster cluster). After the node finishes rebooting I can see that while 
the file front-end is gone, the back-end shards are still present.

Is this a known issue? Any way to get around it?

--

[root@dc-vihi19 ~]# gluster volume info gv0

Volume Name: gv0
Type: Tier
Volume ID: d42e366f-381d-4787-bcc5-cb6770cb7d58
Status: Started
Snapshot Count: 0
Number of Bricks: 24
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: dc-vihi71:/gluster/bricks/brick4/data
Brick2: dc-vihi19:/gluster/bricks/brick4/data
Brick3: dc-vihi70:/gluster/bricks/brick4/data
Brick4: dc-vihi19:/gluster/bricks/brick3/data
Brick5: dc-vihi71:/gluster/bricks/brick3/data
Brick6: dc-vihi19:/gluster/bricks/brick2/data
Brick7: dc-vihi70:/gluster/bricks/brick3/data
Brick8: dc-vihi19:/gluster/bricks/brick1/data
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 8 x 2 = 16
Brick9: dc-vihi19:/gluster/bricks/brick5/data
Brick10: dc-vihi70:/gluster/bricks/brick1/data
Brick11: dc-vihi19:/gluster/bricks/brick6/data
Brick12: dc-vihi71:/gluster/bricks/brick1/data
Brick13: dc-vihi19:/gluster/bricks/brick7/data
Brick14: dc-vihi70:/gluster/bricks/brick2/data
Brick15: dc-vihi19:/gluster/bricks/brick8/data
Brick16: dc-vihi71:/gluster/bricks/brick2/data
Brick17: dc-vihi19:/gluster/bricks/brick9/data
Brick18: dc-vihi70:/gluster/bricks/brick5/data
Brick19: dc-vihi19:/gluster/bricks/brick10/data
Brick20: dc-vihi71:/gluster/bricks/brick5/data
Brick21: dc-vihi19:/gluster/bricks/brick11/data
Brick22: dc-vihi70:/gluster/bricks/brick6/data
Brick23: dc-vihi19:/gluster/bricks/brick12/data
Brick24: dc-vihi71:/gluster/bricks/brick6/data
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
features.ctr-enabled: on
cluster.tier-mode: cache
features.shard: on
features.shard-block-size: 512MB
network.ping-timeout: 5
cluster.server-quorum-ratio: 51%

[root@dc-vihi19 temp]# ls -lh
total 26G
-rw-rw-rw-. 1 root root 31G May 17 10:38 win7.qcow2
[root@dc-vihi19 temp]# getfattr -n glusterfs.gfid.string win7.qcow2
# file: win7.qcow2
glusterfs.gfid.string="7f4a0fea-72c0-41e4-97a5-6297be0a9142"

[root@dc-vihi19 temp]# rm win7.qcow2
rm: remove regular file 'win7.qcow2'? y

*Process hangs and can't be killed. A reboot later...*

login as: root
Authenticating with public key "rsa-key-20170510"
Last login: Wed May 17 14:04:29 2017 from **
[root@dc-vihi19 ~]# find /gluster/bricks -name 
"7f4a0fea-72c0-41e4-97a5-6297be0a9142*"
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.23
/gluster/bricks/brick1/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.35
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.52
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.29
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.22
/gluster/bricks/brick2/data/.shard/7f4a0fea-72c0-41e4-97a5-6297be0a9142.24

and so on...


-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Arbiter and Hot Tier

2017-05-05 Thread Walter Deignan
Thank you very much for the assistance.

-Walter Deignan
-Uline IT, Systems Architect



From:   Ravishankar N 
To: Walter Deignan 
Cc: gluster-users@gluster.org
Date:   05/05/2017 11:43 AM
Subject:Re: [Gluster-users] Arbiter and Hot Tier



Okay, just tried it in 3.9. Even though attaching a non-arbiter volume 
(say a replica-2) as a hot tier succeeds at the CLI level, the brick 
volfiles seem to be generated incorrectly (I see that the arbiter 
translator is getting loaded in one of the replica-2 bricks too, which is 
incorrect). I'd recommend not using arbiter for tiering, irrespective of 
hot or cold tier.

 
On 05/05/2017 09:42 PM, Walter Deignan wrote:
Did that change between 3.9 and 3.10? When I originally saw some 
references on the Redhat storage packaged solution about a possible 
incompatibility I assumed it just meant that the hot tier itself couldn't 
be an arbiter volume. 

I was tripped up by the change in apparent support for the cold tier 
between 3.9 and 3.10. But maybe that was just a fixed oversight which 
never should have worked in the first place? 

-Walter Deignan
-Uline IT, Systems Architect 



From:   Ravishankar N 
To: Walter Deignan , gluster-users@gluster.org 
Date:   05/05/2017 11:08 AM 
Subject:Re: [Gluster-users] Arbiter and Hot Tier 



Hi Walter,
Yes, arbiter volumes are currently not supported with tiering.
-Ravi

On 05/05/2017 08:54 PM, Walter Deignan wrote: 
I've been googling this to no avail so apologies if this is explained 
somewhere I missed. 

Is there a known incompatibility between using arbiters and hot tiering? 

Experience on 3.9 

Original volume - replica 3 arbiter 1 
Attach replica 2 arbiter 1 hot tier - failure 
Attach replica 3 hot tier - success 

Experience on 3.10 

Original volume - replica 3 arbiter 1 
Attach replica 2 arbiter 1 hot tier - failure 
Attach replica 3 hot tier - failure 
Attach hot tier without specifying replica - success but comes in as a 
distributed tier which I would assume totally negates the point of having 
a replicated cold tier? 

The specific error message I get is "volume attach-tier: failed: 
Increasing replica count for arbiter volumes is not supported." 

-Walter Deignan
-Uline IT, Systems Architect 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Arbiter and Hot Tier

2017-05-05 Thread Walter Deignan
Did that change between 3.9 and 3.10? When I originally saw some 
references on the Redhat storage packaged solution about a possible 
incompatibility I assumed it just meant that the hot tier itself couldn't 
be an arbiter volume.

I was tripped up by the change in apparent support for the cold tier 
between 3.9 and 3.10. But maybe that was just a fixed oversight which 
never should have worked in the first place?

-Walter Deignan
-Uline IT, Systems Architect



From:   Ravishankar N 
To: Walter Deignan , gluster-users@gluster.org
Date:   05/05/2017 11:08 AM
Subject:Re: [Gluster-users] Arbiter and Hot Tier



Hi Walter,
Yes, arbiter volumes are currently not supported with tiering.
-Ravi

On 05/05/2017 08:54 PM, Walter Deignan wrote:
I've been googling this to no avail so apologies if this is explained 
somewhere I missed. 

Is there a known incompatibility between using arbiters and hot tiering? 

Experience on 3.9 

Original volume - replica 3 arbiter 1 
Attach replica 2 arbiter 1 hot tier - failure 
Attach replica 3 hot tier - success 

Experience on 3.10 

Original volume - replica 3 arbiter 1 
Attach replica 2 arbiter 1 hot tier - failure 
Attach replica 3 hot tier - failure 
Attach hot tier without specifying replica - success but comes in as a 
distributed tier which I would assume totally negates the point of having 
a replicated cold tier? 

The specific error message I get is "volume attach-tier: failed: 
Increasing replica count for arbiter volumes is not supported." 

-Walter Deignan
-Uline IT, Systems Architect 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Arbiter and Hot Tier

2017-05-05 Thread Walter Deignan
I've been googling this to no avail so apologies if this is explained 
somewhere I missed.

Is there a known incompatibility between using arbiters and hot tiering?

Experience on 3.9

Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - success

Experience on 3.10

Original volume - replica 3 arbiter 1
Attach replica 2 arbiter 1 hot tier - failure
Attach replica 3 hot tier - failure
Attach hot tier without specifying replica - success but comes in as a 
distributed tier which I would assume totally negates the point of having 
a replicated cold tier?

The specific error message I get is "volume attach-tier: failed: 
Increasing replica count for arbiter volumes is not supported."
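For context, the attempts described above would have looked something like the following. This is a hedged sketch: the brick paths are placeholders, and the exact attach-tier syntax may differ between 3.9 and 3.10.

```shell
# Attach an arbiter hot tier -- fails with the error quoted above.
gluster volume tier gv0 attach replica 2 arbiter 1 h1:/b/hot h2:/b/hot h3:/b/arb

# Attach a plain replica-3 hot tier -- worked on 3.9, fails on 3.10.
gluster volume tier gv0 attach replica 3 h1:/b/hot h2:/b/hot h3:/b/hot

# Attach without specifying replica -- succeeds, but as a distributed
# (non-replicated) tier in front of the replicated cold tier.
gluster volume tier gv0 attach h1:/b/hot h2:/b/hot
```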

-Walter Deignan
-Uline IT, Systems Architect
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users