[Gluster-users] Registration for mountpoint is available!
mountpoint (https://mountpoint.io/), an open source software storage conference, is a co-located event with Open Source Summit North America, August 27-28, 2018:
https://events.linuxfoundation.org/events/open-source-summit-north-america-2018/program/co-located-events/

Registration for just mountpoint is now available at:
https://www.regonline.com/registration/Checkin.aspx?EventID=2447527

Looking forward to seeing you there!
- amye

--
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Rebalance state stuck or corrupted
We have had a rebalance operation going on for a few days. After a couple of days the rebalance status said "failed". We stopped the rebalance operation with "gluster volume rebalance gv0 stop", and the rebalance log indicates gluster did try to stop it. However, when we now try to stop the volume or restart the rebalance, it says a rebalance operation is in progress and the volume can't be stopped. I tried restarting the glusterfs-server service on all nodes (we're using Gluster 3.8.15 on Ubuntu) but that did not help.

user@gfs-vm000:~$ sudo gluster volume stop gv0
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: gv0: failed: Staging failed on gfs-vm001. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm017. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm011. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm006. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm003. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm004. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on 10.0.13.9. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm014. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm013. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm002. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm016. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm007. Error: rebalance session is in progress for the volume 'gv0'
Staging failed on gfs-vm010. Error: rebalance session is in progress for the volume 'gv0'

user@gfs-vm000:~$ sudo gluster volume rebalance gv0 stop
volume rebalance: gv0: failed: Rebalance not started.
Tail of gv0-rebalance.log:

[2018-05-23 17:32:55.262168] I [MSGID: 109029] [dht-rebalance.c:4260:gf_defrag_stop] 0-: Received stop command on rebalance
[2018-05-23 17:32:55.262221] I [MSGID: 109028] [dht-rebalance.c:4079:gf_defrag_status_get] 0-glusterfs: Rebalance is stopped. Time taken is 749380.00 secs
[2018-05-23 17:32:55.262234] I [MSGID: 109028] [dht-rebalance.c:4083:gf_defrag_status_get] 0-glusterfs: Files migrated: 821417, size: 25797609415002, lookups: 1162021, failures: 0, skipped: 1814
[2018-05-23 17:32:55.777149] I [MSGID: 109022] [dht-rebalance.c:1703:dht_migrate_file] 0-gv0-dht: completed migration of /pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill/transformer_nat-transformer_nat_base_v1-id016_lr0.1_4000_reg5.0_neighbor_hinge0.5_exp_distill_2.0_no_average_kl/model.ckpt-50724.data-2-of-3 from subvolume gv0-replicate-0 to gv0-replicate-3
[2018-05-23 17:32:55.782048] W [dht-rebalance.c:2826:gf_defrag_process_dir] 0-gv0-dht: Found error from gf_defrag_get_entry
[2018-05-23 17:32:55.782358] E [MSGID: 109111] [dht-rebalance.c:3123:gf_defrag_fix_layout] 0-gv0-dht: gf_defrag_process_dir failed for directory: /pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill/transformer_nat-transformer_nat_base_v1-id016_lr0.1_4000_reg5.0_neighbor_hinge0.5_exp_distill_2.0_no_average_kl
[2018-05-23 17:32:56.115106] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill/transformer_nat-transformer_nat_base_v1-id016_lr0.1_4000_reg5.0_neighbor_hinge0.5_exp_distill_2.0_no_average_kl
[2018-05-23 17:32:56.115586] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2/generated/ende_with_teacher/model/translate_ende_wmt32k_distill
[2018-05-23 17:32:56.115849] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2/generated/ende_with_teacher/model
[2018-05-23 17:32:56.116141] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2/generated/ende_with_teacher
[2018-05-23 17:32:56.116237] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2/generated
[2018-05-23 17:32:56.116393] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy/v-zhli2
[2018-05-23 17:32:56.116625] E [MSGID: 109016] [dht-rebalance.c:3334:gf_defrag_fix_layout] 0-gv0-dht: Fix layout failed for /pnrsy
[2018-05-23 17:32:56.129836] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 7
[2018-05-23 17:32:56.130072] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 8
[2018-05-23 17:32:56.130567] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 9
[2018-05-23 17:32:56.131273] I
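A common cause of this symptom is one or more glusterd daemons holding stale in-memory or on-disk rebalance state. The following is a rough recovery sketch, not an official procedure: the volume name gv0 comes from the thread, but the service name and the `node_state.info` path are assumptions that may vary by version and distro.

```shell
# 1) Ask every node what it thinks the rebalance state is.
gluster volume rebalance gv0 status

# 2) Restart the management daemon on *all* peers, not just one;
#    a single node holding stale state is enough to block "volume stop".
systemctl restart glusterfs-server   # Ubuntu service name for glusterd

# 3) If the state survives a restart everywhere, inspect glusterd's
#    persisted per-volume state (path is an assumption; check your install):
grep -H rebalance /var/lib/glusterd/vols/gv0/node_state.info
```

If `node_state.info` still records an in-progress rebalance on some node after a full glusterd restart cycle, that node is likely the one causing the "rebalance session is in progress" staging failures.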
Re: [Gluster-users] [Nfs-ganesha-support] [SOLVED] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
Thanks, Tom. Good to know.

Daniel

On 05/22/2018 01:43 AM, TomK wrote:
This list has been deprecated. Please subscribe to the new support list at lists.nfs-ganesha.org.

Hey All,

Appears I solved this one, and NFS mounts now work on all my clients. No issues since fixing it a few hours back.

RESOLUTION

Auditd is to blame for the trouble. Noticed this in the logs on 2 of the 3 NFS servers (nfs01, nfs02, nfs03):

type=AVC msg=audit(1526965320.850:4094): avc: denied { write } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c03e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc: denied { unlink } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c03e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

Fix was to adjust the SELinux rules using audit2allow. All the errors below, including the one in the link below, were due to that. Turns out that whenever it worked, it hit the only working server in the system, nfs03; whenever it didn't work, it was hitting the non-working servers. So sometimes it worked, and other times it didn't. It looked like it had to do with HAProxy / Keepalived as well, since I couldn't mount using the VIP but could using the host, but that wasn't the case either. I've also added the third brick to the GlusterFS volume, nfs03, to see if the backend FS was to blame, since GlusterFS recommends 3 bricks minimum for replication, but that had no effect.

In case anyone runs into this, I've added notes here as well:
http://microdevsys.com/wp/kernel-nfs-nfs4_discover_server_trunking-unhandled-error-512-exiting-with-error-eio-and-mount-hangs/
http://microdevsys.com/wp/nfs-reply-xid-3844308326-reply-err-20-auth-rejected-credentials-client-should-begin-new-session/

The errors thrown included:

NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)
kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO

and mount hangs, plus the kernel exception below.
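As a side note, the `proctitle=` field in those audit records is just the hex-encoded command line of the denied process, with NUL bytes separating the argv entries. Decoding it confirms which process and invocation triggered the AVC:

```shell
# Decode the hex proctitle field from an audit record; audit separates
# argv entries with NUL bytes, so replace them with spaces for display.
echo '2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54' \
  | xxd -r -p | tr '\0' ' '
# -> /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
```

The actual fix Tom describes then amounts to roughly `ausearch -m avc -c ganesha.nfsd | audit2allow -M ganesha_local && semodule -i ganesha_local.pp` as root on each affected server (the module name `ganesha_local` is arbitrary). Review the generated `.te` file before loading it, since audit2allow permits everything it saw denied.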
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
On 05/23/2018 12:47 PM, mabi wrote:

> Hello,
>
> I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do? For now, in order to get rid of these 3 unsynched files, shall I use the same method that was suggested to me in this thread?

Sorry Mabi, I haven't had a chance to dig deeper into this. The workaround of resetting the xattrs should be fine though.

Thanks,
Ravi

> Thanks,
> Mabi
>
> ‐‐‐ Original Message ‐‐‐
> On May 17, 2018 11:07 PM, mabi wrote:
>
> Hi Ravi,
>
> Please find below the answers to your questions:
>
> 1) I have never touched the cluster.quorum-type option. Currently it is set as follows for this volume:
>
> Option                 Value
> cluster.quorum-type    none
>
> 2) The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (a PGP secret sub-key). I am not in a position to say why it is 0 bytes in this specific case, or whether it is the fault of the software (Nextcloud) or of GlusterFS. I can just say that I have another file server, a simple NFS server with another Nextcloud installation, where I never saw any 0-byte .shareKey files being created.
>
> 3) It seems to be quite random, and I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time, but I guess uploading files or moving files around. Basically, I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).
>
> Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log files of all 3 nodes. Let me know if you need any other log files. Also, if you know of any "log file sanitizer tool" which can replace sensitive file names with random ones in log files, I would like to use it, as right now I have to do that manually.
> NODE 1 brick log:
>
> [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>
> NODE 2 brick log:
>
> [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>
> NODE 3 (arbiter) brick log:
>
> [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
> [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
> [2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
> [2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
>
> Best regards,
>
> Mabi
>
> ‐‐‐ Original Message ‐‐‐
> On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote:
>
>> Hi mabi,
>>
>> Some questions:
>>
>> - Did you by any chance change the cluster.quorum-type option from the default values?
>> - Is filename.shareKey supposed to be an empty file? Looks like the file was fallocated with the keep-size option but never written to. (On the 2 data bricks, stat output shows Size = 0, but non-zero Blocks, and yet a 'regular empty file'.)
>> - Do you have some sort of a reproducer / steps that you perform when the issue occurs? Please also share the logs from all 3 nodes and the client(s).
>>
>> Thanks,
>> Ravi
>>
>> On 05/15/2018 05:26 PM, mabi wrote:
>>
>>> Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
>>>
>>> NODE 1:
>>>
>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
>>> Size: 0  Blocks: 38  IO Block: 131072  regular empty file
>>> Device: 23h/35d  Inode: 744413  Links: 2
>>> Access: (0644
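For readers hitting the same state, the "resetting xattrs" workaround Ravi refers to is typically done directly on the bricks. A hedged sketch follows; the brick path and volume name are taken from this thread but the exact client index (`client-0` here) and the decision of which copy to keep depend on your own `getfattr` output, so treat this as illustrative only.

```shell
# Inspect the AFR changelog xattrs of the problem file on each data brick
# (run as root on the brick servers; path is from this thread's example):
getfattr -d -m . -e hex \
  /data/myvol-private/brick/cloud/data/admin/filename.shareKey

# A non-zero trusted.afr.<volume>-client-N value records pending heals
# against brick N. Resetting it to all zeroes marks that copy clean
# (12 bytes: data / metadata / entry changelog counters, 4 bytes each):
setfattr -n trusted.afr.myvol-private-client-0 \
  -v 0x000000000000000000000000 \
  /data/myvol-private/brick/cloud/data/admin/filename.shareKey

# Then trigger a heal and verify nothing is left pending:
gluster volume heal myvol-private
gluster volume heal myvol-private info
```

Only reset the changelog on the brick whose copy you have decided is stale; zeroing the wrong side can make self-heal propagate the bad copy.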
Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3
Hello,

I just wanted to ask if you had time to look into this bug I am encountering and if there is anything else I can do? For now, in order to get rid of these 3 unsynched files, shall I use the same method that was suggested to me in this thread?

Thanks,
Mabi

‐‐‐ Original Message ‐‐‐
On May 17, 2018 11:07 PM, mabi wrote:

> Hi Ravi,
>
> Please find below the answers to your questions:
>
> 1) I have never touched the cluster.quorum-type option. Currently it is set as follows for this volume:
>
> Option                 Value
> cluster.quorum-type    none
>
> 2) The .shareKey files are not supposed to be empty. They should be 512 bytes big and contain binary data (a PGP secret sub-key). I am not in a position to say why it is 0 bytes in this specific case, or whether it is the fault of the software (Nextcloud) or of GlusterFS. I can just say that I have another file server, a simple NFS server with another Nextcloud installation, where I never saw any 0-byte .shareKey files being created.
>
> 3) It seems to be quite random, and I am not the person who uses the Nextcloud software, so I can't say what it was doing at that specific time, but I guess uploading files or moving files around. Basically, I use GlusterFS to store the files/data of the Nextcloud web application, where I have it mounted using a FUSE mount (mount -t glusterfs).
>
> Regarding the logs, I have attached the mount log file from the client, and below are the relevant log entries from the brick log files of all 3 nodes. Let me know if you need any other log files. Also, if you know of any "log file sanitizer tool" which can replace sensitive file names with random ones in log files, I would like to use it, as right now I have to do that manually.
>
> NODE 1 brick log:
>
> [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>
> NODE 2 brick log:
>
> [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE [No such file or directory]
>
> NODE 3 (arbiter) brick log:
>
> [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155, removing it. [Stale file handle]
> [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid handle /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13, removing it. [Stale file handle]
> [2018-05-15 06:54:20.172823] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
> [2018-05-15 06:54:20.190911] I [MSGID: 115056] [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 2013.pdf), client: nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0, error-xlator: myvol-private-posix [Directory not empty]
>
> Best regards,
>
> Mabi
>
> ‐‐‐ Original Message ‐‐‐
> On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote:
>
>> Hi mabi,
>>
>> Some questions:
>>
>> - Did you by any chance change the cluster.quorum-type option from the default values?
>> - Is filename.shareKey supposed to be an empty file? Looks like the file was fallocated with the keep-size option but never written to. (On the 2 data bricks, stat output shows Size = 0, but non-zero Blocks, and yet a 'regular empty file'.)
>> - Do you have some sort of a reproducer / steps that you perform when the issue occurs? Please also share the logs from all 3 nodes and the client(s).
>>
>> Thanks,
>> Ravi
>>
>> On 05/15/2018 05:26 PM, mabi wrote:
>>
>>> Thank you Ravi for your fast answer. As requested you will find below the "stat" and "getfattr" of one of the files and its parent directory from all three nodes of my cluster.
>>>
>>> NODE 1:
>>>
>>> File: ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAU