[Gluster-users] issues with replicating data to a new brick
Hello everybody,

I have a bit of a situation here: I want to move some volumes to new hosts. The idea is to add the new bricks to the volume, let them sync, and then drop the old bricks.

The starting point is:

Volume Name: Server_Monthly_02
Type: Replicate
Volume ID: 0ada8e12-15f7-42e9-9da3-2734b04e04e9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick2: chglbcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Options Reconfigured:
features.scrub: Inactive
features.bitrot: off
nfs.disable: on
auth.allow: 127.0.0.1,10.30.28.43,10.30.28.44,10.30.28.17,10.30.28.18,10.8.13.132,10.30.28.30,10.30.28.31
performance.readdir-ahead: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

root@chastcvtprd04:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

root@chastcvtprd04:~# uname -a
Linux chastcvtprd04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

root@chastcvtprd04:~# dpkg -l | grep gluster
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)

root@chastcvtprd04:~# df -h /data/glusterfs/Server_Monthly/2I-1-40/brick
Filesystem     Size  Used Avail Use% Mounted on
/dev/bcache47  7.3T  7.3T   45G 100% /data/glusterfs/Server_Monthly/2I-1-40

Then I add the new brick:

Volume Name: Server_Monthly_02
Type: Replicate
Volume ID: 0ada8e12-15f7-42e9-9da3-2734b04e04e9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick2: chglbcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick3: chglbglsprd02:/data/glusterfs/Server_Monthly/1I-1-51/brick
Options Reconfigured:
features.scrub: Inactive
features.bitrot: off
nfs.disable: on
auth.allow: 127.0.0.1,10.30.28.43,10.30.28.44,10.30.28.17,10.30.28.18,10.8.13.132,10.30.28.30,10.30.28.31
performance.readdir-ahead: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

root@chglbglsprd02:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

root@chglbglsprd02:~# uname -a
Linux chglbglsprd02 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

root@chglbglsprd02:~# dpkg -l | grep gluster
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)

Then healing kicks in and the cluster starts copying data to the new brick. Unfortunately, after a while it starts complaining:

[2018-04-10 14:39:32.057443] E [MSGID: 113072] [posix.c:3457:posix_writev] 0-Server_Monthly_02-posix: write failed: offset 0, [No space left on device]
[2018-04-10 14:39:32.057538] E [MSGID: 115067] [server-rpc-fops.c:1346:server_writev_cbk] 0-Server_Monthly_02-server: 22835126: WRITEV 0 (48949669-ba1c-4735-b83c-71340f1bb64f) ==> (No space left on device) [No space left on device]

root@chglbglsprd02:~# df -h /data/glusterfs/Server_Monthly/1I-1-51/brick
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdaq   7.3T  7.3T   20K 100% /data/glusterfs/Server_Monthly/1I-1-51

There's no other I/O going on on this volume, so the copy process should be straightforward. BUT: I noticed that there are a lot of sparse files on this volume.

Any ideas on how to make it work? If you need more details, please let me know and I'll try to make them available.

Kind Regards
Bernhard

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
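The new brick filling up even though the source brick still has 45G free is consistent with sparse files being written out dense during the heal. The apparent-vs-allocated gap can be demonstrated locally, without Gluster (the /tmp path is arbitrary):

```shell
# Create a 1 GiB sparse file: logical size 1 GiB, almost no blocks allocated.
truncate -s 1G /tmp/sparse_demo

apparent=$(stat -c %s /tmp/sparse_demo)     # logical size in bytes
# bytes actually on disk: %b = allocated blocks, %B = bytes per block
allocated=$(( $(stat -c %b /tmp/sparse_demo) * $(stat -c %B /tmp/sparse_demo) ))
echo "apparent=$apparent allocated=$allocated"

rm -f /tmp/sparse_demo
```

If the heal path expands holes into real blocks, the target brick needs capacity for the fully-expanded data. Pre-seeding the new brick with `rsync --sparse` from an old brick is sometimes suggested as a workaround, but whether that is safe for your setup is an assumption to verify, not something self-heal does for you on 3.8.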
Re: [Gluster-users] Rebooting cluster nodes - GFS3.8
Hi,

I just wanted to write the same thing. There was once a post that suggested killing the gluster processes manually, but I guess rebooting the machine will do the same. The clients will stall for a while and then continue to access the volume from the remaining node.

It is very important that you check the heal status before you bring down the next node, otherwise you could end up in a split-brain situation.

Hope this helps
Bernhard

2017-12-05 18:36 GMT+01:00 Andrew Kester:
> On my setup at least, just issuing the reboot command works without any
> issue. I've done a number of rolling reboots for software / kernel upgrades
> in the manner you've described this way.
>
> The one gotcha I've found is when the node comes back online. I manually
> check healing to ensure that everything is synced and back online before
> taking other nodes offline.
>
> ---
> Thanks,
>
> Andrew Kester
> The Storehouse
> https://sthse.co
>
> On 12/5/17 10:40 AM, Mark Connor wrote:
>>
>> I am running gluster ver 3.8 in a distributed replica 2 config. I need to
>> reboot all my 8 cluster nodes to update my bios firmware. I would like to
>> do a rolling update to my bios and keep up my cluster so my clients don't
>> take an outage. Do I need to shutdown all gluster services on each node
>> before I reboot? Or just issue the reboot.
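The pre-reboot heal check described above can be scripted. A hedged sketch (the volume name is a placeholder; the awk expression simply sums the per-brick "Number of entries:" lines printed by `gluster volume heal <vol> info`):

```shell
VOLNAME=myvol   # placeholder: substitute your volume name

# Block until self-heal reports zero pending entries on every brick;
# only then is it safe to take the next node down.
while :; do
    pending=$(gluster volume heal "$VOLNAME" info \
              | awk '/Number of entries:/ {sum += $NF} END {print sum + 0}')
    [ "$pending" -eq 0 ] && break
    echo "still healing: $pending entries pending"
    sleep 60
done
```

Running this between reboots automates the "check heal status before the next node" rule rather than eyeballing the output.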
[Gluster-users] move brick to new location
Hello everybody,

we have a number of "replica 3 arbiter 1" or (2 + 1) volumes. Because we're running out of space on some volumes, I need to optimize the usage of the physical disks; that means I want to consolidate volumes with low usage onto the same physical disk.

I can do it with "replace-brick commit force", but that looks a bit drastic to me because it immediately drops the current brick and rebuilds the new one from the remaining bricks. Is there an option that builds the new brick in the background and changes the config only when it's fully in sync?

I was thinking about:
- dropping the arbiter brick => replica 2
- adding a new brick => replica 3
- dropping the old brick => replica 2
- re-adding the arbiter brick => replica 3 arbiter 1

About 20 years ago, I was managing Veritas Volume Manager. To move a sub-disk (similar to a brick), VxVM temporarily upgraded the sub-disk to a mirrored volume, synced both sides of the mirror and then downgraded the construct to the new sub-disk. It was impressive and scary at the same time, but we never had an outage.

BTW: I'm running Gluster 3.8.15
BTW: new storage is ordered but the reseller fucked up and now we have to wait 2 months for the delivery

Kind Regards
Bernhard
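A hedged sketch of the four-step sequence above as gluster CLI calls. Host names, brick paths and the volume name are placeholders, and the sequence has not been tested here; in particular, `gluster volume heal <vol> info` must drain to zero between every step, and whether 3.8 accepts the final `replica 3 arbiter 1` add-brick conversion should be verified first:

```shell
VOL=myvol   # placeholder volume name

# 1. drop the arbiter brick: replica 3 arbiter 1 -> replica 2
gluster volume remove-brick "$VOL" replica 2 arbhost:/data/arbiter/brick force

# 2. add the new data brick: replica 2 -> replica 3 (wait for heal to finish!)
gluster volume add-brick "$VOL" replica 3 newhost:/data/new/brick

# 3. drop the old data brick: replica 3 -> replica 2
gluster volume remove-brick "$VOL" replica 2 oldhost:/data/old/brick force

# 4. re-add the arbiter: replica 2 -> replica 3 arbiter 1
gluster volume add-brick "$VOL" replica 3 arbiter 1 arbhost:/data/arbiter/brick
```

Note the exposure window: between steps 1 and 2 (and 3 and 4) the volume runs as plain replica 2 with no arbiter, so a node failure there risks exactly the split-brain the arbiter exists to prevent.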
Re: [Gluster-users] nfs-ganesha locking problems
Hi Soumya,

what I can say so far: it is working on a standalone system but not on the clustered system.

From reading the ganesha wiki I have the impression that it is possible to change the log level without restarting ganesha. I was playing with dbus-send but so far was unsuccessful. If you can help me with that, this would be great.

Here some details about the tested machines; the NFS client was always the same.

THIS SYSTEM IS WORKING

[root@chvirnfstst01 ~]# uname -a
Linux chvirnfstst01 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@chvirnfstst01 ~]# cd /etc/
[root@chvirnfstst01 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Aug 30 17:53 centos-release
-rw-r--r--. 1 root root  51 Aug 30 17:53 centos-release-upstream
-rw-r--r--. 1 root root 393 Aug 30 17:53 os-release
drwxr-xr-x. 2 root root  78 Oct  1 15:52 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 system-release -> centos-release
-rw-r--r--. 1 root root  23 Aug 30 17:53 system-release-cpe

[root@chvirnfstst01 etc]# cat centos-release
CentOS Linux release 7.4.1708 (Core)

[root@chvirnfstst01 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@chvirnfstst01 etc]# rpm -qa | grep ganesha | sort
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64

[root@chvirnfstst01 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
glusterfs-api-3.8.15-2.el7.x86_64
glusterfs-client-xlators-3.8.15-2.el7.x86_64
glusterfs-libs-3.8.15-2.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64

[root@chvirnfstst01 etc]# cat /etc/ganesha/ganesha.conf
EXPORT {
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 77;
    # Exported path (mandatory)
    Path = /ora_dump;
    # Pseudo Path (required for NFS v4)
    Pseudo = /ora_dump;
    # Exporting FSAL
    FSAL {
        Name = GLUSTER;
        Hostname = 10.30.28.43;
        Volume = ora_dump;
    }
    CLIENT {
        # Oracle Servers
        Clients = 10.30.29.125,10.30.28.25,10.30.28.64,10.30.29.123,10.30.28.21,10.30.28.81,10.30.29.124,10.30.28.82,10.30.29.111;
        Access_Type = RW;
    }
}
EXPORT {
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 88;
    # Exported path (mandatory)
    Path = /chzrhcvtprd04;
    # Pseudo Path (required for NFS v4)
    Pseudo = /chzrhcvtprd04;
    # Exporting FSAL
    FSAL {
        Name = GLUSTER;
        Hostname = 10.30.28.43;
        Volume = chzrhcvtprd04;
    }
    CLIENT {
        # everybody
        Clients = 10.30.0.0/16,10.40.0.0/16,10.50.0.0/16;
        Access_Type = RW;
    }
}

THIS SYSTEM IS NOT WORKING

You can find the details about the shared volume in my previous mail.

[root@chvirnfsprd12 ~]# uname -a
Linux chvirnfsprd12 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@chvirnfsprd12 ~]# cd /etc/
[root@chvirnfsprd12 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Nov 29  2016 centos-release
-rw-r--r--. 1 root root  51 Nov 29  2016 centos-release-upstream
-rw-r--r--. 1 root root 393 Nov 29  2016 os-release
drwxr-xr-x. 2 root root  78 Sep  2 08:54 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 system-release -> centos-release
-rw-r--r--. 1 root root  23 Nov 29  2016 system-release-cpe

[root@chvirnfsprd12 etc]# cat centos-release
CentOS Linux release 7.3.1611 (Core)

[root@chvirnfsprd12 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@chvirnfsprd12 etc]# rpm -qa | grep ganesha | sort
glusterfs-ganesha-3.8.15-2.el7.x86_64
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64

[root@chvirnfsprd12 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
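On the runtime log-level change mentioned above: the nfs-ganesha wiki describes it as a D-Bus property set on the log object. A hedged sketch of the documented form; `COMPONENT_ALL` and `FULL_DEBUG` are the component/level names from that documentation, and whether the 2.3.3 build on these machines actually exposes the interface should be verified:

```shell
# Raise all ganesha components to full debug at runtime (no restart).
dbus-send --system --print-reply --dest=org.ganesha.nfsd \
  /org/ganesha/nfsd/log \
  org.freedesktop.DBus.Properties.Set \
  string:org.ganesha.nfsd.log.component \
  string:COMPONENT_ALL \
  variant:string:FULL_DEBUG
```

Reading the property back with `org.freedesktop.DBus.Properties.Get` (same object path and interface) should confirm the change took effect.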
[Gluster-users] nfs-ganesha locking problems
Hi,

I have a problem with nfs-ganesha serving gluster volumes. I can read and write files, but then one of the DBAs tried to dump an Oracle DB onto the NFS share and got the following errors:

Export: Release 11.2.0.4.0 - Production on Wed Sep 27 23:27:48 2017
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/u00/app/oracle/DB_BACKUPS/FPESSP11/riskdw_prod_tabs_28092017_01.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

The file exists and is accessible.

Details: there are 2 gluster clusters involved. The first cluster hosts a number of "replica 3 arbiter 1" volumes. The second cluster only hosts the cluster.enable-shared-storage volume across 3 nodes; it also runs nfs-ganesha in cluster configuration (pacemaker, corosync). nfs-ganesha serves the volumes from the first cluster.

Any idea what's wrong?

Kind Regards
Bernhard

CLUSTER 1 info
==

root@chglbcvtprd04:/etc# cat os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

root@chglbcvtprd04:/etc# cat lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

root@chglbcvtprd04:/etc# dpkg -l | grep gluster | sort
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)

root@chglbcvtprd04:~# gluster volume status ora_dump
Status of volume: ora_dump
Gluster process                                                     TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick chastcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick          49772     0          Y       11048
Brick chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick          50108     0          Y       9990
Brick chealglaprd01:/data/glusterfs/arbiter/vol01/ora_dump.2I-1-39  49200     0          Y       3114
Brick chastcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick          49773     0          Y       11085
Brick chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick          50109     0          Y       1
Brick chealglaprd01:/data/glusterfs/arbiter/vol02/ora_dump.1I-1-18  49201     0          Y       3080
Brick chastcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick          49774     0          Y       11091
Brick chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick          50110     0          Y       10007
Brick chealglaprd01:/data/glusterfs/arbiter/vol03/ora_dump.2I-1-48  49202     0          Y       3070
Brick chastcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick          49775     0          Y       11152
Brick chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick          50111     0          Y       10012
Brick chealglaprd01:/data/glusterfs/arbiter/vol04/ora_dump.1I-1-25  49203     0          Y       3090
Self-heal Daemon on localhost                                       N/A       N/A        Y       27438
Self-heal Daemon on chealglaprd01                                   N/A       N/A        Y       32209
Self-heal Daemon on chastcvtprd04.fpprod.corp                       N/A       N/A        Y       27378

root@chglbcvtprd04:~# gluster volume info ora_dump
Volume Name: ora_dump
Type: Distributed-Replicate
Volume ID: b26e649d-d1fe-4ebc-aa03-b196c8925466
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick2: chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick3: chealglaprd01:/data/glusterfs/arbiter/vol01/ora_dump.2I-1-39 (arbiter)
Brick4: chastcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick5: chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick6: chealglaprd01:/data/glusterfs/arbiter/vol02/ora_dump.1I-1-18 (arbiter)
Brick7:
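The ORA-27086 / "No locks available" (ENOLCK) pair points at the NFS lock path rather than Oracle itself, so the failure can be reproduced with any advisory lock. A hedged sketch using flock(1) from util-linux; /mnt/ora_dump is an assumed client-side mount point for the export above:

```shell
# Take and release an exclusive advisory lock on a file on the NFS mount.
# On a healthy lock path this succeeds and prints the message; when NLM /
# NFSv4 locking between client and ganesha is broken, the same ENOLCK
# ("No locks available") error that Oracle reported shows up here too.
touch /mnt/ora_dump/locktest
flock --timeout 5 /mnt/ora_dump/locktest -c 'echo lock acquired'
```

Running this from the same client against both the working standalone server and the broken clustered one isolates the problem from Oracle entirely.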
Re: [Gluster-users] Bug 1374166 or similar
Hi Jiffin, thank you for the explanation Kind Regards Bernhard 2017-07-18 8:53 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>: > > > On 16/07/17 20:11, Bernhard Dübi wrote: >> >> Hi, >> >> both Gluster servers were rebooted and now the unlink directory is clean. > > > Following should have happened, If delete operation is performed gluster > keeps file in .unlink directory if it has open fd. > In this case since lazy umount is performed, ganesha server may still keep > the fd's open by that client so gluster keeps > the unlink directory even though it is removed from fuse mount. > > -- > Jiffin > > >> Best Regards >> Bernhard >> >> 2017-07-14 12:43 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>: >>> >>> Hi, >>> >>> yes, I mounted the Gluster volume and deleted the files from the >>> volume not the brick >>> >>> mount -t glusterfs hostname:volname /mnt >>> cd /mnt/some/directory >>> rm -rf * >>> >>> restart of nfs-ganesha is planned for tomorrow. I'll keep you posted >>> BTW: nfs-ganesha is running on a separate server in standalone >>> configuration >>> >>> Best Regards >>> Bernhard >>> >>> 2017-07-14 10:43 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>: >>>> >>>> >>>> On 14/07/17 13:06, Bernhard Dübi wrote: >>>>> >>>>> Hello everybody, >>>>> >>>>> I'm in a similar situation as described in >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1374166 >>>> >>>> >>>> The issue got fixed by https://review.gluster.org/#/c/14820 and is >>>> already >>>> available in 3.8 branch >>>> >>>>> I have a gluster volume exported through ganesha. we had some problems >>>>> on the gluster server and the NFS mount on the client was hanging. >>>>> I did a lazy umount of the NFS mount on the client, then went to the >>>>> Gluster server, mounted the Gluster volume and deleted a bunch of >>>>> files. >>>>> When I mounted the volume again on the client I noticed that the space >>>>> was not freed. 
Now I find them in $brick/.glusterfs/unlink >>>> >>>> Here you have mounted the volume via glusterfs fuse mount and deleted >>>> those >>>> files >>>> right(not directly from the bricks)? >>>> Can you restart nfs-ganesha server and see what happens ? >>>> What type of volume are you using? >>>> -- >>>> Jiffin >>>> >>>>> OS: Ubuntu 16.04 >>>>> Gluster: 3.8.13 >>>>> Ganesha: 2.4.5 >>>>> >>>>> Let me know if you need more info >>>>> >>>>> Best Regards >>>>> Bernhard >>>>> ___ >>>>> Gluster-users mailing list >>>>> Gluster-users@gluster.org >>>>> http://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> > ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Bug 1374166 or similar
Hi, both Gluster servers were rebooted and now the unlink directory is clean. Best Regards Bernhard 2017-07-14 12:43 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>: > Hi, > > yes, I mounted the Gluster volume and deleted the files from the > volume not the brick > > mount -t glusterfs hostname:volname /mnt > cd /mnt/some/directory > rm -rf * > > restart of nfs-ganesha is planned for tomorrow. I'll keep you posted > BTW: nfs-ganesha is running on a separate server in standalone configuration > > Best Regards > Bernhard > > 2017-07-14 10:43 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>: >> >> >> On 14/07/17 13:06, Bernhard Dübi wrote: >>> >>> Hello everybody, >>> >>> I'm in a similar situation as described in >>> https://bugzilla.redhat.com/show_bug.cgi?id=1374166 >> >> >> The issue got fixed by https://review.gluster.org/#/c/14820 and is already >> available in 3.8 branch >> >>> >>> I have a gluster volume exported through ganesha. we had some problems >>> on the gluster server and the NFS mount on the client was hanging. >>> I did a lazy umount of the NFS mount on the client, then went to the >>> Gluster server, mounted the Gluster volume and deleted a bunch of >>> files. >>> When I mounted the volume again on the client I noticed that the space >>> was not freed. Now I find them in $brick/.glusterfs/unlink >> >> Here you have mounted the volume via glusterfs fuse mount and deleted those >> files >> right(not directly from the bricks)? >> Can you restart nfs-ganesha server and see what happens ? >> What type of volume are you using? >> -- >> Jiffin >> >>> OS: Ubuntu 16.04 >>> Gluster: 3.8.13 >>> Ganesha: 2.4.5 >>> >>> Let me know if you need more info >>> >>> Best Regards >>> Bernhard >>> ___ >>> Gluster-users mailing list >>> Gluster-users@gluster.org >>> http://lists.gluster.org/mailman/listinfo/gluster-users >> >> ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Bug 1374166 or similar
Hi, yes, I mounted the Gluster volume and deleted the files from the volume not the brick mount -t glusterfs hostname:volname /mnt cd /mnt/some/directory rm -rf * restart of nfs-ganesha is planned for tomorrow. I'll keep you posted BTW: nfs-ganesha is running on a separate server in standalone configuration Best Regards Bernhard 2017-07-14 10:43 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>: > > > On 14/07/17 13:06, Bernhard Dübi wrote: >> >> Hello everybody, >> >> I'm in a similar situation as described in >> https://bugzilla.redhat.com/show_bug.cgi?id=1374166 > > > The issue got fixed by https://review.gluster.org/#/c/14820 and is already > available in 3.8 branch > >> >> I have a gluster volume exported through ganesha. we had some problems >> on the gluster server and the NFS mount on the client was hanging. >> I did a lazy umount of the NFS mount on the client, then went to the >> Gluster server, mounted the Gluster volume and deleted a bunch of >> files. >> When I mounted the volume again on the client I noticed that the space >> was not freed. Now I find them in $brick/.glusterfs/unlink > > Here you have mounted the volume via glusterfs fuse mount and deleted those > files > right(not directly from the bricks)? > Can you restart nfs-ganesha server and see what happens ? > What type of volume are you using? > -- > Jiffin > >> OS: Ubuntu 16.04 >> Gluster: 3.8.13 >> Ganesha: 2.4.5 >> >> Let me know if you need more info >> >> Best Regards >> Bernhard >> ___ >> Gluster-users mailing list >> Gluster-users@gluster.org >> http://lists.gluster.org/mailman/listinfo/gluster-users > > ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Bug 1374166 or similar
Hello everybody,

I'm in a similar situation as described in https://bugzilla.redhat.com/show_bug.cgi?id=1374166

I have a gluster volume exported through ganesha. We had some problems on the gluster server and the NFS mount on the client was hanging. I did a lazy umount of the NFS mount on the client, then went to the Gluster server, mounted the Gluster volume and deleted a bunch of files. When I mounted the volume again on the client I noticed that the space was not freed. Now I find them in $brick/.glusterfs/unlink

OS: Ubuntu 16.04
Gluster: 3.8.13
Ganesha: 2.4.5

Let me know if you need more info

Best Regards
Bernhard
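A hedged sketch for sizing up this situation on the brick host (BRICK is a placeholder for a real brick path). Files deleted while some client still holds an open fd get parked under `.glusterfs/unlink`, and `lsof +L1` lists processes that keep deleted-but-open files alive on the brick filesystem:

```shell
BRICK=/data/glusterfs/myvol/brick   # placeholder brick path

# Space still pinned by unlinked-but-open files
du -sh "$BRICK/.glusterfs/unlink"

# Which local processes (e.g. glusterfsd serving the lingering ganesha fds)
# hold deleted files open on the brick filesystem
lsof +L1 "$BRICK" 2>/dev/null | head
```

If `lsof` shows glusterfsd holding the deleted files, that matches the explanation later in this thread: the lazy-umounted client never closed its fds, so restarting nfs-ganesha (or rebooting) is what finally releases the space.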
Re: [Gluster-users] total outage - almost
Hi, I just remembered that I posted once a bug at redhat https://bugzilla.redhat.com/show_bug.cgi?id=1434000 could this be the same problem? but this time it's not a few files but hundreds of thousands BTW: I tried to disable bitrot but it didn't help Best Regards Bernhard 2017-06-19 16:51 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>: > Hi, > > I checked the attributes of one of the files with I/O errors > > root@chastcvtprd04:~# getfattr -d -e hex -m - > /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 > getfattr: Removing leading '/' from absolute path names > # file: > data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 > trusted.afr.dirty=0x > trusted.bit-rot.bad-file=0x3100 > trusted.bit-rot.signature=0x011400ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276 > trusted.bit-rot.version=0x14005841bb3c000ac813 > trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b > > > > > root@chglbcvtprd04:~# getfattr -d -e hex -m - > /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 > getfattr: Removing leading '/' from absolute path names > # file: > data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 > trusted.afr.dirty=0x > trusted.bit-rot.bad-file=0x3100 > trusted.bit-rot.signature=0x011300ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276 > trusted.bit-rot.version=0x13005841b921000c222f > trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b > > > > the "dirty" is 0, that's good, isn't it? > what's the "trusted.bit-rot.bad-file=0x3100" information? 
> > Best Regards > Bernhard Dübi > > BTW: I saved all logs, maybe I can upload them somewhere > > 2017-06-19 15:55 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>: >> Hi, >> >> we use a bunch of replicated gluster volumes as a backend for our >> backup. Yesterday I noticed that some synthetic backups failed because >> of I/O errors. >> >> Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads >> of I/O errors. >> The brick log file shows the below errors >> >> [2017-06-19 13:42:33.554875] E [MSGID: 116020] >> [bit-rot-stub.c:566:br_stub_check_bad_object] >> 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba >> is a bad object. Returning >> [2017-06-19 13:42:33.554923] E [MSGID: 116020] >> [bit-rot-stub.c:566:br_stub_check_bad_object] >> 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba >> is a bad object. Returning >> [2017-06-19 13:42:33.554931] E [MSGID: 115081] >> [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: >> 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> >> (Input/output error) [Input/output error] >> [2017-06-19 13:42:33.554940] E [MSGID: 115081] >> [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: >> 21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> >> (Input/output error) [Input/output error] >> [2017-06-19 13:42:33.555655] E [MSGID: 116020] >> [bit-rot-stub.c:566:br_stub_check_bad_object] >> 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba >> is a bad object. Returning >> [2017-06-19 13:42:33.555697] E [MSGID: 115081] >> [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: >> 21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> >> (Input/output error) [Input/output error] >> [2017-06-19 13:42:33.555950] E [MSGID: 116020] >> [bit-rot-stub.c:566:br_stub_check_bad_object] >> 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba >> is a bad object. 
Returning >> [2017-06-19 13:42:33.555983] E [MSGID: 115081] >> [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: >> 21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> >> (Input/output error) [Input/output error] >> [2017-06-19 13:42:33.556604] E [MSGID: 116020] >> [bit-rot-stub.c:566:br_stub_check_bad_object] >> 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba >> is a bad object. Returning >> >> >> >> >> Any idea what's wrong? >> >> >> BTW: I'm running gluster 3.8.12 on Ubuntu 16.04 - 4.4.0-79 >> >> many thanks for your help >> Bernhard ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
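On the question quoted above about `trusted.bit-rot.bad-file=0x3100`: the value is just the two bytes "1" and NUL, i.e. the scrubber's flag that this file failed checksum verification against its stored `trusted.bit-rot.signature`; once set, the bitrot-stub translator answers EIO for that gfid, which is exactly the FSTAT error pattern in the logs. A quick decode, using only coreutils:

```shell
# 0x3100 = byte 0x31 (ASCII "1") followed by a NUL terminator.
printf '\061' | od -An -c
```

Note that in the getfattr output above BOTH replicas carry the bad-file flag, so self-heal has no clean copy to repair from; that suggests restoring those files from backup rather than the usual delete-the-bad-copy-and-heal recovery (an assumption to confirm before deleting anything).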
Re: [Gluster-users] total outage - almost
Hi, I checked the attributes of one of the files with I/O errors root@chastcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 trusted.afr.dirty=0x trusted.bit-rot.bad-file=0x3100 trusted.bit-rot.signature=0x011400ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276 trusted.bit-rot.version=0x14005841bb3c000ac813 trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b root@chglbcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014 trusted.afr.dirty=0x trusted.bit-rot.bad-file=0x3100 trusted.bit-rot.signature=0x011300ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276 trusted.bit-rot.version=0x13005841b921000c222f trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b the "dirty" is 0, that's good, isn't it? what's the "trusted.bit-rot.bad-file=0x3100" information? Best Regards Bernhard Dübi BTW: I saved all logs, maybe I can upload them somewhere 2017-06-19 15:55 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>: > Hi, > > we use a bunch of replicated gluster volumes as a backend for our > backup. Yesterday I noticed that some synthetic backups failed because > of I/O errors. > > Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads > of I/O errors. > The brick log file shows the below errors > > [2017-06-19 13:42:33.554875] E [MSGID: 116020] > [bit-rot-stub.c:566:br_stub_check_bad_object] > 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba > is a bad object. 
Returning > [2017-06-19 13:42:33.554923] E [MSGID: 116020] > [bit-rot-stub.c:566:br_stub_check_bad_object] > 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba > is a bad object. Returning > [2017-06-19 13:42:33.554931] E [MSGID: 115081] > [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: > 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> > (Input/output error) [Input/output error] > [2017-06-19 13:42:33.554940] E [MSGID: 115081] > [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: > 21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> > (Input/output error) [Input/output error] > [2017-06-19 13:42:33.555655] E [MSGID: 116020] > [bit-rot-stub.c:566:br_stub_check_bad_object] > 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba > is a bad object. Returning > [2017-06-19 13:42:33.555697] E [MSGID: 115081] > [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: > 21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> > (Input/output error) [Input/output error] > [2017-06-19 13:42:33.555950] E [MSGID: 116020] > [bit-rot-stub.c:566:br_stub_check_bad_object] > 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba > is a bad object. Returning > [2017-06-19 13:42:33.555983] E [MSGID: 115081] > [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: > 21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> > (Input/output error) [Input/output error] > [2017-06-19 13:42:33.556604] E [MSGID: 116020] > [bit-rot-stub.c:566:br_stub_check_bad_object] > 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba > is a bad object. Returning > > > > > Any idea what's wrong? > > > BTW: I'm running gluster 3.8.12 on Ubuntu 16.04 - 4.4.0-79 > > many thanks for your help > Bernhard ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] total outage - almost
Hi,

we use a bunch of replicated gluster volumes as a backend for our
backup. Yesterday I noticed that some synthetic backups failed because
of I/O errors.

Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads of
I/O errors. The brick log file shows the errors below:

[2017-06-19 13:42:33.554875] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.554923] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.554931] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.554940] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.555655] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.555697] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.555950] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
[2017-06-19 13:42:33.555983] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
[2017-06-19 13:42:33.556604] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning

Any idea what's wrong?

BTW: I'm running gluster 3.8.12 on Ubuntu 16.04 - 4.4.0-79

many thanks for your help
Bernhard
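With many repeated entries like these, it helps to reduce the brick log
to the distinct affected files first. A minimal sketch (a plain grep
over the log format shown above; the log path in the usage line is an
example, not the actual path on these hosts):

```shell
# Extract the distinct GFIDs that the bitrot stub flagged as bad objects
# from a brick log file passed as the first argument.
bad_gfids() {
  grep 'is a bad object' "$1" \
    | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
    | sort -u
}

# usage (example path):
#   bad_gfids /var/log/glusterfs/bricks/data-glusterfs-Server_Standard.log
```

Each resulting GFID can then be mapped back to a file via the brick's
.glusterfs/<aa>/<bb>/<gfid> hardlink tree to see which files are affected.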
[Gluster-users] ganesha.nfsd: `NTIRPC_1.4.3' not found
Hi,

is this list also dealing with nfs-ganesha problems?

I just ran a dist-upgrade on my Ubuntu 16.04 machine and now nfs-ganesha
doesn't start anymore:

May 20 10:00:15 chastcvtprd03 bash[5720]: /usr/bin/ganesha.nfsd: /lib/x86_64-linux-gnu/libntirpc.so.1.4: version `NTIRPC_1.4.3' not found (required by /usr/bin/ganesha.nfsd)

Any hints? Here is some info about my system:

# uname -a
Linux hostname 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

/etc/apt/sources.list.d# head *.list
==> gluster-ubuntu-glusterfs-3_8-xenial.list <==
deb http://ppa.launchpad.net/gluster/glusterfs-3.8/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/glusterfs-3.8/ubuntu xenial main

==> gluster-ubuntu-libntirpc-xenial.list <==
deb http://ppa.launchpad.net/gluster/libntirpc/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/libntirpc/ubuntu xenial main

==> gluster-ubuntu-nfs-ganesha-xenial.list <==
deb http://ppa.launchpad.net/gluster/nfs-ganesha/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/nfs-ganesha/ubuntu xenial main

# dpkg -l | grep -E 'gluster|ganesha|libntirpc'
ii glusterfs-common        3.8.12-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii libntirpc1:amd64        1.4.4-ubuntu1~xenial1   amd64  new transport-independent RPC library
ii nfs-ganesha             2.4.5-ubuntu1~xenial1   amd64  nfs-ganesha is a NFS server in User Space
ii nfs-ganesha-fsal:amd64  2.4.5-ubuntu1~xenial1   amd64  nfs-ganesha fsal libraries

Best Regards
Bernhard
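The error means ganesha.nfsd was linked against a libntirpc that exported
the NTIRPC_1.4.3 symbol version, while the installed libntirpc.so.1.4
does not provide it. One way to confirm the mismatch (assuming binutils
is installed; the svc_register symbol in the usage comment is only an
illustration) is to compare the version tags the binary references with
the tags the library exports:

```shell
# Filter symbol-version tags with a given prefix out of `objdump -T`
# output read from stdin, e.g. NTIRPC_1.4.3.
version_tags() {
  grep -oE "${1}_[0-9.]+" | sort -u
}

# usage:
#   what the daemon needs:
#     objdump -T /usr/bin/ganesha.nfsd | version_tags NTIRPC
#   what the installed library provides:
#     objdump -T /lib/x86_64-linux-gnu/libntirpc.so.1.4 | version_tags NTIRPC
```

If the first list contains a tag missing from the second, the library
package is older than the one the daemon was built against; updating
libntirpc1 from the same PPA that provided nfs-ganesha would likely
resolve it.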
Re: [Gluster-users] unlimited memory usage
Hi,

one more question: if I can convince my boss to buy another machine so
that the Gluster and backup workloads run on different machines, will
that solve my problem, or will the Gluster client also eat up all the
memory it can get?

Best Regards
Bernhard

2017-02-13 21:53 GMT+01:00 Bernhard Dübi <1linuxengin...@gmail.com>:
> Hi,
>
> I'm running Gluster 3.8.8 on Ubuntu 16.04 on 2 HP Apollo 4510 with
> 60 x 8TB each. The machines are used as Backup Media Agents for
> CommVault Simpana V11. I have been running this combination since
> Gluster 3.7. Lately I noticed that Gluster is using almost all
> available memory, starving the other applications. I tried to put
> some memory limitations on Gluster using cgroups, but that didn't
> work out. Any other idea to make Gluster less greedy on memory?
>
> Best Regards
> Bernhard
[Gluster-users] unlimited memory usage
Hi,

I'm running Gluster 3.8.8 on Ubuntu 16.04 on 2 HP Apollo 4510 with
60 x 8TB each. The machines are used as Backup Media Agents for
CommVault Simpana V11. I have been running this combination since
Gluster 3.7.

Lately I noticed that Gluster is using almost all available memory,
starving the other applications. I tried to put some memory limitations
on Gluster using cgroups, but that didn't work out.

Any other idea to make Gluster less greedy on memory?

Best Regards
Bernhard
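One way to retry the cgroup approach on 16.04 is to let systemd manage
the limit via a drop-in for glusterd.service. This is a sketch, not a
tested recipe: it assumes the brick processes (glusterfsd) stay in
glusterd's cgroup after being spawned, which is worth verifying with
systemd-cgls before relying on it, and the 48G figure is an arbitrary
example:

```ini
# /etc/systemd/system/glusterd.service.d/memory.conf  (hypothetical drop-in)
[Service]
MemoryAccounting=yes
# Ubuntu 16.04 ships systemd 229, which uses the cgroup-v1 era
# MemoryLimit= directive; newer systemd versions use MemoryMax=.
MemoryLimit=48G
```

Then apply it with "systemctl daemon-reload" followed by
"systemctl restart glusterd". Note that hitting the limit triggers the
kernel OOM killer inside the cgroup rather than making Gluster shrink
its caches gracefully, so tuning volume cache options may still be
needed alongside it.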