[Gluster-users] issues with replicating data to a new brick

2018-04-12 Thread Bernhard Dübi
Hello everybody,

I have some kind of a situation here

I want to move some volumes to new hosts. the idea is to add the new
bricks to the volume, sync and then drop the old bricks.

starting point is:


Volume Name: Server_Monthly_02
Type: Replicate
Volume ID: 0ada8e12-15f7-42e9-9da3-2734b04e04e9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick2: chglbcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Options Reconfigured:
features.scrub: Inactive
features.bitrot: off
nfs.disable: on
auth.allow: 127.0.0.1,10.30.28.43,10.30.28.44,10.30.28.17,10.30.28.18,10.8.13.132,10.30.28.30,10.30.28.31
performance.readdir-ahead: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on


root@chastcvtprd04:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@chastcvtprd04:~# uname -a
Linux chastcvtprd04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@chastcvtprd04:~# dpkg -l | grep gluster
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)



root@chastcvtprd04:~# df -h  /data/glusterfs/Server_Monthly/2I-1-40/brick
Filesystem  Size  Used Avail Use% Mounted on
/dev/bcache47   7.3T  7.3T   45G 100% /data/glusterfs/Server_Monthly/2I-1-40


then I add the new brick


Volume Name: Server_Monthly_02
Type: Replicate
Volume ID: 0ada8e12-15f7-42e9-9da3-2734b04e04e9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick2: chglbcvtprd04:/data/glusterfs/Server_Monthly/2I-1-40/brick
Brick3: chglbglsprd02:/data/glusterfs/Server_Monthly/1I-1-51/brick
Options Reconfigured:
features.scrub: Inactive
features.bitrot: off
nfs.disable: on
auth.allow: 127.0.0.1,10.30.28.43,10.30.28.44,10.30.28.17,10.30.28.18,10.8.13.132,10.30.28.30,10.30.28.31
performance.readdir-ahead: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on


root@chglbglsprd02:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@chglbglsprd02:~# uname -a
Linux chglbglsprd02 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@chglbglsprd02:~# dpkg -l | grep gluster
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)


Then healing kicks in and the cluster starts copying data to the new brick.
Unfortunately, after a while it starts complaining:

[2018-04-10 14:39:32.057443] E [MSGID: 113072] [posix.c:3457:posix_writev] 0-Server_Monthly_02-posix: write failed: offset 0, [No space left on device]
[2018-04-10 14:39:32.057538] E [MSGID: 115067] [server-rpc-fops.c:1346:server_writev_cbk] 0-Server_Monthly_02-server: 22835126: WRITEV 0 (48949669-ba1c-4735-b83c-71340f1bb64f) ==> (No space left on device) [No space left on device]


root@chglbglsprd02:~# df -h /data/glusterfs/Server_Monthly/1I-1-51/brick
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdaq   7.3T  7.3T   20K 100% /data/glusterfs/Server_Monthly/1I-1-51



There's no other I/O going on on this volume, so the copy process
should be straightforward.
BUT I noticed that there are a lot of sparse files on this volume.
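A quick way to test the sparse-file theory is to compare, on the old brick, the apparent size of the files (st_size) with the space they actually occupy (st_blocks, which Linux reports in 512-byte units). The following is a minimal Python sketch, not a Gluster tool; the function name sparse_usage is hypothetical. It rests on an assumption this thread does not confirm: that self-heal writes the holes out as real data on the new brick. If the apparent total is far above the allocated total, a new brick of the same 7.3T size would not be able to hold a fully expanded copy.

```python
import os
import stat


def sparse_usage(root):
    """Total the apparent and allocated bytes of regular files under root.

    Returns (apparent_bytes, allocated_bytes). st_blocks is counted in
    512-byte units on Linux, regardless of the filesystem block size.
    """
    apparent = 0
    allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Gluster's internal .glusterfs tree: it holds hard links to
        # the same files, which would otherwise be counted twice.
        if ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable; ignore it
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks, sockets, device nodes, ...
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Run it on the source server against the brick path, e.g. sparse_usage("/data/glusterfs/Server_Monthly/2I-1-40/brick"), and compare both totals with the df output above. A full walk of a 7.3T brick will take a while.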



Any ideas on how to make it work?
If you need more details, please let me know and I'll try to make
them available.


Kind Regards
Bernhard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebooting cluster nodes - GFS3.8

2017-12-05 Thread Bernhard Dübi
Hi,
I just wanted to write the same thing.
There was once a post that suggested killing the gluster processes
manually, but I guess rebooting the machine does the same. The
clients will stall for a while and then continue to access the volume
from the remaining node.
It is very important that you check the heal status before you bring
down the next node; otherwise you could end up in a split-brain
situation.
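The "check the heal status first" step can be scripted so that nobody reboots the next node too early. A minimal Python sketch, assuming that gluster volume heal <VOLNAME> info prints one "Number of entries: N" line per brick (the format of the 3.x CLI; verify it on your version). The helper names, poll interval and timeout are hypothetical.

```python
import re
import subprocess
import time

# Assumed format: one "Number of entries: N" line per brick.
ENTRIES_RE = re.compile(r"^Number of entries:\s*(\S+)", re.MULTILINE)


def pending_heals(heal_info_text):
    """Sum the pending-heal counters over all bricks.

    Returns None when no counter is found or when a brick reports a
    non-numeric value (e.g. "-" for a brick that is down), so the caller
    treats that state as "not safe to reboot the next node".
    """
    values = ENTRIES_RE.findall(heal_info_text)
    if not values or not all(v.isdigit() for v in values):
        return None
    return sum(int(v) for v in values)


def wait_until_healed(volume, poll_seconds=60, timeout_seconds=6 * 3600):
    """Poll `gluster volume heal <volume> info` until nothing is pending."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        if pending_heals(result.stdout) == 0:
            return True
        time.sleep(poll_seconds)
    return False
```

Call wait_until_healed(volume) for every volume on the node that just came back, and only reboot the next node once it returns True for all of them. Entries stuck in split-brain also keep the count above zero and have to be resolved by hand.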
hope this helps
Bernhard

2017-12-05 18:36 GMT+01:00 Andrew Kester :
> On my setup at least, just issuing the reboot command works without any
> issue.  I've done a number of rolling reboots for software / kernel upgrades
> in the manner you've described this way.
>
> The one gotcha I've found is when the node comes back online.  I manually
> check healing to ensure that everything is synced and back online before
> taking other nodes offline.
>
> ---
> Thanks,
>
> Andrew Kester
> The Storehouse
> https://sthse.co
>
> On 12/5/17 10:40 AM, Mark Connor wrote:
>>
>> I am running gluster ver 3.8 in a distributed replica 2 config. I need to
>> reboot all my 8 cluster nodes to update my bios firmware.  I would like to
>> do a rolling update to my bios and keep up my cluster so my clients don't
>> take an outage. Do I need to shutdown all gluster services on each node
>> before I reboot? Or just issue the reboot.


[Gluster-users] move brick to new location

2017-11-28 Thread Bernhard Dübi
Hello everybody,

We have a number of "replica 3 arbiter 1" (2 + 1) volumes.
Because we're running out of space on some volumes, I need to optimize
the usage of the physical disks; that means I want to consolidate
volumes with low usage onto the same physical disk. I can do it with
"replace-brick commit force", but that looks a bit drastic to me
because it immediately drops the current brick and rebuilds the new
one from the remaining bricks. Is there a way to build the
new brick in the background and change the config only when it's fully in
sync?

I was thinking about
- dropping arbiter brick => replica 2
- adding a new brick => replica 3
- dropping old brick => replica 2
- re-adding arbiter brick => replica 2 arbiter 1
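The four steps above could be written down as a dry-run list of CLI calls, with a heal in between so that the old brick is only dropped once the new one is in sync. This is a Python sketch under several assumptions: a single replica set (for a distributed-replicate volume, every remove-brick/add-brick call has to name one brick per subvolume), placeholder brick names, and the add-brick/remove-brick syntax of the 3.8 CLI, which should be checked with gluster volume help before running anything. The function only builds the commands; it does not execute them.

```python
def move_brick_plan(volume, arbiter, old_brick, new_brick):
    """Return the gluster CLI calls (as argv lists) for the arbiter shuffle.

    Hypothetical helper; bricks are "host:/path" strings. Steps:
    drop the arbiter (replica 2), add the new data brick (replica 3),
    heal, drop the old data brick (replica 2), re-add the arbiter
    (replica 3 arbiter 1). Nothing is executed here.
    """
    g = ["gluster", "volume"]
    return [
        g + ["remove-brick", volume, "replica", "2", arbiter, "force"],
        g + ["add-brick", volume, "replica", "3", new_brick],
        # Start a full heal; wait until `heal info` shows 0 entries
        # before moving on to the next command.
        g + ["heal", volume, "full"],
        g + ["remove-brick", volume, "replica", "2", old_brick, "force"],
        g + ["add-brick", volume, "replica", "3", "arbiter", "1", arbiter],
    ]
```

Printing " ".join(argv) for each entry gives a runbook to walk through by hand, waiting for gluster volume heal <VOLNAME> info to report 0 entries after the heal step. Re-adding the arbiter at its old path is likely to be refused because the directory still carries the volume's extended attributes, so a fresh directory is the safer choice.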

About 20 years ago, I was managing Veritas Volume Manager (VxVM). To move a
sub-disk (similar to a brick), VxVM temporarily upgraded the sub-disk to
a mirrored volume, synced both sides of the mirror, and then downgraded
the construct to the new sub-disk. It was impressive and scary at the
same time, but we never had an outage.

BTW: I'm running Gluster 3.8.15
BTW: new storage is ordered but the reseller fucked up and now we have
to wait for the delivery for 2 months

Kind Regards
Bernhard


Re: [Gluster-users] nfs-ganesha locking problems

2017-10-02 Thread Bernhard Dübi
Hi Soumya,

What I can say so far:

It works on a standalone system but not on the clustered system.

From reading the ganesha wiki, I have the impression that it is
possible to change the log level without restarting ganesha. I was
playing with dbus-send but so far without success. If you can help me
with that, that would be great.
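For reference, one way to do this (to be checked against the ganesha wiki mentioned above) is to set a log component's level through the standard org.freedesktop.DBus.Properties.Set method on the /org/ganesha/nfsd/admin object. The Python sketch below only builds and runs that dbus-send call. The bus name org.ganesha.nfsd, the interface org.ganesha.nfsd.log.component, and the component and level names are assumptions recalled from the ganesha documentation and have not been verified against 2.3.3.

```python
import subprocess


def ganesha_log_level_cmd(component="COMPONENT_ALL", level="FULL_DEBUG"):
    """Build a dbus-send argv that sets an nfs-ganesha log component level.

    Assumes ganesha owns org.ganesha.nfsd on the system bus and exposes
    its log components as properties of org.ganesha.nfsd.log.component
    (unverified for 2.3.x).
    """
    return [
        "dbus-send", "--system", "--print-reply", "--type=method_call",
        "--dest=org.ganesha.nfsd",
        "/org/ganesha/nfsd/admin",
        "org.freedesktop.DBus.Properties.Set",
        "string:org.ganesha.nfsd.log.component",
        "string:" + component,
        "variant:string:" + level,
    ]


def set_ganesha_log_level(component="COMPONENT_ALL", level="FULL_DEBUG"):
    # Run as root on the ganesha node; raises CalledProcessError on failure.
    subprocess.run(ganesha_log_level_cmd(component, level), check=True)
```

Calling set_ganesha_log_level("COMPONENT_NLM") would raise only the NFSv3 lock manager's logging, which may be enough for the locking problem without flooding the log.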

Here are some details about the tested machines. The NFS client was always the same.


THIS SYSTEM IS WORKING


[root@chvirnfstst01 ~]# uname -a
Linux chvirnfstst01 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@chvirnfstst01 ~]# cd /etc/
[root@chvirnfstst01 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Aug 30 17:53 centos-release
-rw-r--r--. 1 root root  51 Aug 30 17:53 centos-release-upstream
-rw-r--r--. 1 root root 393 Aug 30 17:53 os-release
drwxr-xr-x. 2 root root  78 Oct  1 15:52 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 system-release -> centos-release
-rw-r--r--. 1 root root  23 Aug 30 17:53 system-release-cpe
[root@chvirnfstst01 etc]# cat centos-release
CentOS Linux release 7.4.1708 (Core)
[root@chvirnfstst01 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@chvirnfstst01 etc]#
[root@chvirnfstst01 etc]# rpm -qa | grep ganesha | sort
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root@chvirnfstst01 etc]#
[root@chvirnfstst01 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
glusterfs-api-3.8.15-2.el7.x86_64
glusterfs-client-xlators-3.8.15-2.el7.x86_64
glusterfs-libs-3.8.15-2.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root@chvirnfstst01 etc]#
[root@chvirnfstst01 etc]# cat /etc/ganesha/ganesha.conf
EXPORT
{
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 77;

    # Exported path (mandatory)
    Path = /ora_dump;

    # Pseudo Path (required for NFS v4)
    Pseudo = /ora_dump;

    # Exporting FSAL
    FSAL {
        Name = GLUSTER;
        Hostname = 10.30.28.43;
        Volume = ora_dump;
    }

    CLIENT {
        # Oracle Servers
        Clients = 10.30.29.125,10.30.28.25,10.30.28.64,10.30.29.123,10.30.28.21,10.30.28.81,10.30.29.124,10.30.28.82,10.30.29.111;
        Access_Type = RW;
    }
}

EXPORT
{
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 88;

    # Exported path (mandatory)
    Path = /chzrhcvtprd04;

    # Pseudo Path (required for NFS v4)
    Pseudo = /chzrhcvtprd04;

    # Exporting FSAL
    FSAL {
        Name = GLUSTER;
        Hostname = 10.30.28.43;
        Volume = chzrhcvtprd04;
    }

    CLIENT {
        # everybody
        Clients = 10.30.0.0/16,10.40.0.0/16,10.50.0.0/16;
        Access_Type = RW;
    }
}
[root@chvirnfstst01 etc]#



THIS SYSTEM IS NOT WORKING

you can find the details about the shared volume in my previous mail

[root@chvirnfsprd12 ~]# uname -a
Linux chvirnfsprd12 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@chvirnfsprd12 ~]# cd /etc/
[root@chvirnfsprd12 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Nov 29  2016 centos-release
-rw-r--r--. 1 root root  51 Nov 29  2016 centos-release-upstream
-rw-r--r--. 1 root root 393 Nov 29  2016 os-release
drwxr-xr-x. 2 root root  78 Sep  2 08:54 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 system-release -> centos-release
-rw-r--r--. 1 root root  23 Nov 29  2016 system-release-cpe
[root@chvirnfsprd12 etc]# cat centos-release
CentOS Linux release 7.3.1611 (Core)
[root@chvirnfsprd12 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@chvirnfsprd12 etc]# rpm -qa | grep ganesha | sort
glusterfs-ganesha-3.8.15-2.el7.x86_64
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root@chvirnfsprd12 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64

[Gluster-users] nfs-ganesha locking problems

2017-09-29 Thread Bernhard Dübi
Hi,

I have a problem with nfs-ganesha serving gluster volumes

I can read and write files, but when one of the DBAs tried to dump an
Oracle DB onto the NFS share, the export failed with the following errors:


Export: Release 11.2.0.4.0 - Production on Wed Sep 27 23:27:48 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release
11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file
"/u00/app/oracle/DB_BACKUPS/FPESSP11/riskdw_prod_tabs_28092017_01.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


the file exists and is accessible.
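Error 37 on Linux is ENOLCK ("No locks available"), so one way to take Oracle out of the picture is to request a POSIX record lock on a scratch file in the same share from the same client. A minimal Python sketch; fcntl.lockf uses fcntl() record locks on Linux, but whether Oracle's datapump locks its dump files in exactly this way is an assumption. The function name is hypothetical.

```python
import errno
import fcntl
import os


def try_posix_lock(path):
    """Try to take and release an exclusive fcntl record lock on path.

    Creates the file if needed. Returns "ok" on success, otherwise the
    errno name, e.g. "ENOLCK" (errno 37 on Linux, as in the Oracle error).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)
```

For example, try_posix_lock("/u00/app/oracle/DB_BACKUPS/FPESSP11/lock.test") on the Oracle server; "ENOLCK" there but "ok" against the standalone ganesha export would point at lock handling in the clustered setup.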


Details:
There are 2 gluster clusters involved.
The first cluster hosts a number of "replica 3 arbiter 1" volumes.
The second cluster only hosts the cluster.enable-shared-storage volume
across 3 nodes. It also runs nfs-ganesha in a cluster configuration
(pacemaker, corosync). nfs-ganesha serves the volumes from the first
cluster.

Any idea what's wrong?

Kind Regards
Bernhard


CLUSTER 1 info
==

root@chglbcvtprd04:/etc# cat os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root@chglbcvtprd04:/etc# cat lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
root@chglbcvtprd04:/etc# dpkg -l | grep gluster | sort
ii  glusterfs-client  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.8.15-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.8.15-ubuntu1~xenial1  amd64  clustered file-system (server package)

root@chglbcvtprd04:~# gluster volume status ora_dump
Status of volume: ora_dump
Gluster process                                                     TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick chastcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick          49772     0          Y       11048
Brick chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick          50108     0          Y       9990
Brick chealglaprd01:/data/glusterfs/arbiter/vol01/ora_dump.2I-1-39  49200     0          Y       3114
Brick chastcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick          49773     0          Y       11085
Brick chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick          50109     0          Y       1
Brick chealglaprd01:/data/glusterfs/arbiter/vol02/ora_dump.1I-1-18  49201     0          Y       3080
Brick chastcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick          49774     0          Y       11091
Brick chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick          50110     0          Y       10007
Brick chealglaprd01:/data/glusterfs/arbiter/vol03/ora_dump.2I-1-48  49202     0          Y       3070
Brick chastcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick          49775     0          Y       11152
Brick chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick          50111     0          Y       10012
Brick chealglaprd01:/data/glusterfs/arbiter/vol04/ora_dump.1I-1-25  49203     0          Y       3090
Self-heal Daemon on localhost                                       N/A       N/A        Y       27438
Self-heal Daemon on chealglaprd01                                   N/A       N/A        Y       32209
Self-heal Daemon on chastcvtprd04.fpprod.corp                       N/A       N/A        Y       27378

root@chglbcvtprd04:~# gluster volume info ora_dump

Volume Name: ora_dump
Type: Distributed-Replicate
Volume ID: b26e649d-d1fe-4ebc-aa03-b196c8925466
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick2: chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick3: chealglaprd01:/data/glusterfs/arbiter/vol01/ora_dump.2I-1-39 (arbiter)
Brick4: chastcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick5: chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick6: chealglaprd01:/data/glusterfs/arbiter/vol02/ora_dump.1I-1-18 (arbiter)
Brick7: 

Re: [Gluster-users] Bug 1374166 or similar

2017-07-18 Thread Bernhard Dübi
Hi Jiffin,

thank you for the explanation

Kind Regards
Bernhard

2017-07-18 8:53 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>:
>
>
> On 16/07/17 20:11, Bernhard Dübi wrote:
>>
>> Hi,
>>
>> both Gluster servers were rebooted and now the unlink directory is clean.
>
>
> The following should have happened: if a delete operation is performed, gluster
> keeps the file in the .unlink directory as long as it has an open fd.
> In this case, since a lazy umount was performed, the ganesha server may still keep
> the fds opened by that client open, so gluster keeps the file in
> the unlink directory even though it was removed from the fuse mount.
>
> --
> Jiffin
>
>
>> Best Regards
>> Bernhard

Re: [Gluster-users] Bug 1374166 or similar

2017-07-16 Thread Bernhard Dübi
Hi,

both Gluster servers were rebooted and now the unlink directory is clean.

Best Regards
Bernhard


Re: [Gluster-users] Bug 1374166 or similar

2017-07-14 Thread Bernhard Dübi
Hi,

Yes, I mounted the Gluster volume and deleted the files through the
volume, not on the brick:

mount -t glusterfs hostname:volname /mnt
cd /mnt/some/directory
rm -rf *

A restart of nfs-ganesha is planned for tomorrow. I'll keep you posted.
BTW: nfs-ganesha is running on a separate server in a standalone configuration.

Best Regards
Bernhard

2017-07-14 10:43 GMT+02:00 Jiffin Tony Thottan <jthot...@redhat.com>:
>
>
> On 14/07/17 13:06, Bernhard Dübi wrote:
>>
>> Hello everybody,
>>
>> I'm in a similar situation as described in
>> https://bugzilla.redhat.com/show_bug.cgi?id=1374166
>
>
> The issue was fixed by https://review.gluster.org/#/c/14820, and the fix is already
> available in the 3.8 branch.
>
>>
>> I have a gluster volume exported through ganesha. we had some problems
>> on the gluster server and the NFS mount on the client was hanging.
>> I did a lazy umount of the NFS mount on the client, then went to the
>> Gluster server, mounted the Gluster volume and deleted a bunch of
>> files.
>> When I mounted the volume again on the client I noticed that the space
>> was not freed. Now I find them in $brick/.glusterfs/unlink
>
> You mounted the volume via the glusterfs fuse mount and deleted those
> files there, right (not directly from the bricks)?
> Can you restart the nfs-ganesha server and see what happens?
> What type of volume are you using?
> --
> Jiffin
>
>> OS: Ubuntu 16.04
>> Gluster: 3.8.13
>> Ganesha: 2.4.5
>>
>> Let me know if you need more info
>>
>> Best Regards
>> Bernhard

[Gluster-users] Bug 1374166 or similar

2017-07-14 Thread Bernhard Dübi
Hello everybody,

I'm in a similar situation as described in
https://bugzilla.redhat.com/show_bug.cgi?id=1374166


I have a gluster volume exported through ganesha. We had some problems
on the gluster server, and the NFS mount on the client was hanging.
I did a lazy umount of the NFS mount on the client, then went to the
Gluster server, mounted the Gluster volume, and deleted a bunch of
files.
When I mounted the volume again on the client, I noticed that the space
was not freed. Now I find the deleted files in $brick/.glusterfs/unlink.
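To see how much space such files still hold, the unlink directory can be totalled on each brick. A minimal Python sketch; the function name is hypothetical, and the brick path is whatever $brick is on your servers. Per Jiffin's reply in this thread, files stay there while something (here, presumably ganesha) still has them open, so the totals should drop once those file descriptors are closed.

```python
import os


def unlink_usage(brick):
    """Return (file_count, allocated_bytes) under <brick>/.glusterfs/unlink."""
    unlink_dir = os.path.join(brick, ".glusterfs", "unlink")
    try:
        entries = list(os.scandir(unlink_dir))
    except FileNotFoundError:
        return 0, 0  # no unlink directory: nothing is being held
    count = 0
    allocated = 0
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            count += 1
            allocated += st.st_blocks * 512  # st_blocks: 512-byte units on Linux
    return count, allocated
```

Running it before and after the planned nfs-ganesha restart would show whether the restart alone releases the space.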

OS: Ubuntu 16.04
Gluster: 3.8.13
Ganesha: 2.4.5

Let me know if you need more info

Best Regards
Bernhard


Re: [Gluster-users] total outage - almost

2017-06-19 Thread Bernhard Dübi
Hi,


I just remembered that I once filed a bug in Red Hat Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1434000

Could this be the same problem? This time, though, it's not a few files but
hundreds of thousands.


BTW: I tried to disable bitrot, but it didn't help.

Best Regards
Bernhard



Re: [Gluster-users] total outage - almost

2017-06-19 Thread Bernhard Dübi
Hi,

I checked the attributes of one of the files with I/O errors

root@chastcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
trusted.afr.dirty=0x
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x011400ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276
trusted.bit-rot.version=0x14005841bb3c000ac813
trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b




root@chglbcvtprd04:~# getfattr -d -e hex -m - /data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/Server_Standard/1I-1-14/brick/Server_Standard/CV_MAGNETIC/V_1050932/CHUNK_11126559/SFILE_CONTAINER_014
trusted.afr.dirty=0x
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x011300ee3e3ac6a79b8efc42d0904ca431cb20d01890d300c041e905d9d78a562bf276
trusted.bit-rot.version=0x13005841b921000c222f
trusted.gfid=0x1427a79086f14ed2902e3c18e133d02b



the "dirty" is 0, that's good, isn't it?
what's the "trusted.bit-rot.bad-file=0x3100" information?
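The hex values from getfattr -e hex can be decoded by hand. In the Python sketch below, bad_file_flag treats trusted.bit-rot.bad-file as a NUL-terminated string, so 0x3100 decodes to "1" (ASCII 0x31 followed by a NUL byte); reading that "1" as "this copy was marked bad by the scrubber" is an assumption that fits the "is a bad object" lines in the brick log. is_all_zero is a hypothetical helper for checking counters such as trusted.afr.dirty.

```python
def decode_xattr(hex_value):
    """Turn getfattr's `-e hex` output (e.g. '0x3100') into raw bytes."""
    if not hex_value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    return bytes.fromhex(hex_value[2:])


def bad_file_flag(hex_value):
    """Decode trusted.bit-rot.bad-file as a NUL-terminated ASCII string."""
    return decode_xattr(hex_value).split(b"\x00", 1)[0].decode("ascii")


def is_all_zero(hex_value):
    """True if the xattr holds only zero bytes (an empty value counts too)."""
    return not any(decode_xattr(hex_value))
```

Both getfattr dumps show bad-file = "1" on chastcvtprd04 and chglbcvtprd04, so if that reading is right, both copies of the file are flagged, which would be consistent with reads failing on either replica.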

Best Regards
Bernhard Dübi

BTW: I saved all logs, maybe I can upload them somewhere

2017-06-19 15:55 GMT+02:00 Bernhard Dübi <1linuxengin...@gmail.com>:
> Hi,
>
> we use a bunch of replicated gluster volumes as a backend for our
> backup. Yesterday I noticed that some synthetic backups failed because
> of I/O errors.
>
> Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads
> of I/O errors.
> The brick log file shows the below errors
>
> [2017-06-19 13:42:33.554875] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
> [2017-06-19 13:42:33.554923] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
> [2017-06-19 13:42:33.554931] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
> [2017-06-19 13:42:33.554940] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
> [2017-06-19 13:42:33.555655] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
> [2017-06-19 13:42:33.555697] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
> [2017-06-19 13:42:33.555950] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
> [2017-06-19 13:42:33.555983] E [MSGID: 115081] [server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server: 21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==> (Input/output error) [Input/output error]
> [2017-06-19 13:42:33.556604] E [MSGID: 116020] [bit-rot-stub.c:566:br_stub_check_bad_object] 0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba is a bad object. Returning
>
>
>
>
> Any idea what's wrong?
>
>
> BTW: I'm running gluster 3.8.12 on Ubuntu 16.04 - 4.4.0-79
>
> many thanks for your help
> Bernhard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] total outage - almost

2017-06-19 Thread Bernhard Dübi
Hi,

We use a number of replicated Gluster volumes as the backend for our
backups. Yesterday I noticed that some synthetic backups had failed
because of I/O errors.

Today I ran "find /gluster_vol -type f | xargs md5sum" and got loads
of I/O errors.
The brick log file shows the following errors:

[2017-06-19 13:42:33.554875] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object]
0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba
is a bad object. Returning
[2017-06-19 13:42:33.554923] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object]
0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba
is a bad object. Returning
[2017-06-19 13:42:33.554931] E [MSGID: 115081]
[server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server:
21461: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==>
(Input/output error) [Input/output error]
[2017-06-19 13:42:33.554940] E [MSGID: 115081]
[server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server:
21462: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==>
(Input/output error) [Input/output error]
[2017-06-19 13:42:33.555655] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object]
0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba
is a bad object. Returning
[2017-06-19 13:42:33.555697] E [MSGID: 115081]
[server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server:
21463: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==>
(Input/output error) [Input/output error]
[2017-06-19 13:42:33.555950] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object]
0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba
is a bad object. Returning
[2017-06-19 13:42:33.555983] E [MSGID: 115081]
[server-rpc-fops.c:1201:server_fstat_cbk] 0-Server_Standard_05-server:
21464: FSTAT -2 (c75016a9-95c1-4819-b24a-e5d77107c4ba) ==>
(Input/output error) [Input/output error]
[2017-06-19 13:42:33.556604] E [MSGID: 116020]
[bit-rot-stub.c:566:br_stub_check_bad_object]
0-Server_Standard_05-bitrot-stub: c75016a9-95c1-4819-b24a-e5d77107c4ba
is a bad object. Returning
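As a sketch of the read check described in this post, assuming POSIX `find` and coreutils `md5sum`: the function below checksums each regular file and prints only the ones that cannot be read. Unlike `find /gluster_vol -type f | xargs md5sum`, `find -exec ... {} +` hands file names over as separate arguments, so names containing spaces or newlines are not split. The function name `scan_unreadable` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print every regular file under a directory that
# md5sum cannot read (for example because the brick answers with EIO).
# find -exec ... {} + passes names as separate arguments, so spaces and
# newlines in file names are handled safely, unlike "find | xargs".
scan_unreadable() {
  find "$1" -type f -exec sh -c '
    for f in "$@"; do
      md5sum -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}
```

For example, `scan_unreadable /gluster_vol > unreadable-files.txt` would collect the affected paths in one list instead of scattering I/O errors across the terminal.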




Any idea what's wrong?


BTW: I'm running Gluster 3.8.12 on Ubuntu 16.04 (kernel 4.4.0-79).

Many thanks for your help,
Bernhard


[Gluster-users] ganesha.nfsd: `NTIRPC_1.4.3' not found

2017-05-20 Thread Bernhard Dübi
Hi,

Is this list also the right place for nfs-ganesha problems?

I just ran a dist-upgrade on my Ubuntu 16.04 machine, and now
nfs-ganesha doesn't start anymore.

May 20 10:00:15 chastcvtprd03 bash[5720]: /usr/bin/ganesha.nfsd:
/lib/x86_64-linux-gnu/libntirpc.so.1.4: version `NTIRPC_1.4.3' not
found (required by /usr/bin/ganesha.nfsd)

Any hints?


Here is some information about my system:

# uname -a
Linux hostname 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial


/etc/apt/sources.list.d# head *.list
==> gluster-ubuntu-glusterfs-3_8-xenial.list <==
deb http://ppa.launchpad.net/gluster/glusterfs-3.8/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/glusterfs-3.8/ubuntu xenial main

==> gluster-ubuntu-libntirpc-xenial.list <==
deb http://ppa.launchpad.net/gluster/libntirpc/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/libntirpc/ubuntu xenial main

==> gluster-ubuntu-nfs-ganesha-xenial.list <==
deb http://ppa.launchpad.net/gluster/nfs-ganesha/ubuntu xenial main
# deb-src http://ppa.launchpad.net/gluster/nfs-ganesha/ubuntu xenial main


# dpkg -l | grep -E 'gluster|ganesha|libntirpc'
ii  glusterfs-common        3.8.12-ubuntu1~xenial1  amd64  GlusterFS common libraries and translator modules
ii  libntirpc1:amd64        1.4.4-ubuntu1~xenial1   amd64  new transport-independent RPC library
ii  nfs-ganesha             2.4.5-ubuntu1~xenial1   amd64  nfs-ganesha is a NFS server in User Space
ii  nfs-ganesha-fsal:amd64  2.4.5-ubuntu1~xenial1   amd64  nfs-ganesha fsal libraries
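A minimal diagnostic sketch, not a fix: the loader error names both the library and the symbol-version tag it could not find, so one way to narrow the problem down is to extract those two fields and then ask the library which version tags it actually defines. The helper names `parse_missing_version` and `lib_defines_version` are hypothetical. The second helper assumes binutils' `objdump` is installed and that its `-p` output lists version definitions with the tag name in the last column.

```sh
#!/bin/sh
# Hypothetical helper: pull the library path and the missing symbol-version
# tag out of a loader error like the ganesha.nfsd one quoted above.
# Prints "<library> <version>", or nothing if the line does not match.
# The single "." before and after the tag matches the ` and ' quote marks.
parse_missing_version() {
  printf '%s\n' "$1" | sed -n \
    's/.*[[:space:]]\(\/[^[:space:]:]*\.so[^[:space:]:]*\): version .\([A-Za-z0-9_.]*\). not found.*/\1 \2/p'
}

# Hypothetical helper: succeed if the library mentions that version tag as
# the last column of an objdump -p line (e.g. under "Version definitions").
# Assumption: binutils objdump is available.
lib_defines_version() {
  objdump -p "$1" | awk -v v="$2" '$NF == v { found = 1 } END { exit !found }'
}
```

Usage, with `line` holding the journal message: `set -- $(parse_missing_version "$line"); lib_defines_version "$1" "$2" || echo "$1 does not define $2"`. If the tag is missing, that points at a mismatch between the installed libntirpc1 and nfs-ganesha packages rather than at the Gluster configuration.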


Best Regards
Bernhard


Re: [Gluster-users] unlimited memory usage

2017-02-13 Thread Bernhard Dübi
Hi,

One more question: if I can convince my boss to buy another machine so
that Gluster and the backup software run on separate machines, will this
solve my problem, or will the Gluster client also eat up all the memory
it can get?

Best Regards
Bernhard


[Gluster-users] unlimited memory usage

2017-02-13 Thread Bernhard Dübi
Hi,

I'm running Gluster 3.8.8 on Ubuntu 16.04 on 2 HP Apollo 4510 servers
with 60 x 8 TB disks each.
The machines are used as Backup Media Agents for CommVault Simpana V11.
I have been running this combination since Gluster 3.7. Lately I noticed
that Gluster is using almost all available memory, starving the other
applications. I tried to put some memory limits on Gluster using cgroups,
but that didn't work out.
Any other ideas for making Gluster less greedy with memory?
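To put a number on "almost all available memory", a sketch like the following sums the resident set size of the Gluster processes on one host. It assumes procps `ps` (as shipped with Ubuntu 16.04) and that the daemons are named glusterd, glusterfsd and glusterfs; the helper names are hypothetical. For a view of Gluster's own allocations, `gluster volume statedump <VOLNAME>` writes state dumps that may also help.

```sh
#!/bin/sh
# Hypothetical helper: add up RSS values (KiB, one per line on stdin).
sum_rss_kib() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Total resident memory of the Gluster daemons on this host, in KiB.
# Assumption: the processes are named glusterd, glusterfsd and glusterfs.
gluster_rss_kib() {
  ps -C glusterd,glusterfsd,glusterfs -o rss= | sum_rss_kib
}
```

For example, `echo "Gluster RSS: $(( $(gluster_rss_kib) / 1024 )) MiB"` gives a figure that can be tracked over time or compared before and after a cgroup limit is applied.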

Best Regards
Bernhard