Re: [Gluster-users] Stale locks on shards

2018-01-24 Thread Pranith Kumar Karampuri
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen 
wrote:

> Hi!
>
> Thank you very much for your help so far. Could you please tell an example
> command how to use aux-gid-mount to remove locks? "gluster vol clear-locks"
> seems to mount volume by itself.
>

You are correct, sorry; this was implemented around 7 years back and I
forgot that bit about it :-(. Essentially it becomes a getxattr syscall on
the file.
Could you give me the clear-locks command you were trying to execute, and I
can probably convert it to the equivalent getfattr command?
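
In the meantime, here is a rough sketch of what I have in mind (untested,
and the exact virtual xattr key for clearing locks is from memory, so treat
it as an assumption):

# mount the volume with gfid access enabled
mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

# resolve the gfid of a path, or reuse the gfid you already have from the statedump
getfattr -n glusterfs.gfid.string /mnt/testvol/some/file

# clear-locks boils down to a getxattr with a glusterfs.clrlk.* key on the file,
# addressed here through the .gfid/ namespace (key name assumed, e.g. blocked inode locks)
getfattr -n glusterfs.clrlk.tinode.kblocked /mnt/testvol/.gfid/<gfid-string>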


>
> Best regards,
> Samuli Heinonen
>
> Pranith Kumar Karampuri 
>> 23 January 2018 at 10.30
>>
>>
>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen wrote:
>>
>> Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:
>>
>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen wrote:
>>
>> Hi again,
>>
>> here is more information regarding issue described earlier
>>
>> It looks like self healing is stuck. According to "heal
>> statistics"
>> crawl began at Sat Jan 20 12:56:19 2018 and it's still
>> going on
>> (It's around Sun Jan 21 20:30 when writing this). However
>> glustershd.log says that last heal was completed at
>> "2018-01-20
>> 11:00:13.090697" (which is 13:00 UTC+2). Also "heal info"
>> has been
>> running now for over 16 hours without any information. In
>> statedump
>> I can see that storage nodes have locks on files and some
>> of those
>> are blocked. Ie. Here again it says that ovirt8z2 is
>> having active
>> lock even ovirt8z2 crashed after the lock was granted.:
>>
>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>> mandatory=0
>> inodelk-count=3
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> granted at 2018-01-20 10:59:52
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>> pid = 3420, owner=d8b9372c397f, client=0x7f8858410be0,
>> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,
>> granted at 2018-01-20 08:57:23
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>> pid = 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0,
>> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,
>> blocked at 2018-01-20 10:59:52
>>
>> I'd also like to add that volume had arbiter brick before
>> crash
>> happened. We decided to remove it because we thought that
>> it was
>> causing issues. However now I think that this was
>> unnecessary. After
>> the crash arbiter logs had lots of messages like this:
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>> [server-rpc-fops.c:1640:server_setattr_cbk]
>> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>> 
>> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
>> permitted)
>> [Operation not permitted]
>>
>> Is there anyways to force self heal to stop? Any help
>> would be very
>> much appreciated :)
>>
>>
>> Exposing .shard to a normal mount is opening a can of worms. You
>> should probably look at mounting the volume with gfid
>> aux-mount where
>> you can access a file with
>> /.gfid/<gfid-string> to clear
>> locks on it.
>>
>> Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test
>> /mnt/testvol
>>
>> A gfid string will have some hyphens like:
>> 8443-1894-4273-9340-4b212fa1c0e4
>>
>> That said, the next disconnect on the brick where you successfully
>> did the clear-locks will crash the brick. There was a bug in the 3.8.x
>> series with clear-locks which was fixed in 3.9.0 with a feature. The
>> self-heal deadlocks that you witnessed are also fixed in the 3.10
>> release.

Re: [Gluster-users] geo-replication command rsync returned with 3

2018-01-24 Thread Kotresh Hiremath Ravishankar
It is clear that rsync is failing. Are the rsync versions the same on all
master and slave nodes?
I have seen mismatched versions cause problems sometimes.
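
A quick way to compare them across nodes (the host names below are just
placeholders for your master and slave nodes):

for h in master1 master2 slave1 slave2; do
    echo -n "$h: "; ssh "$h" rsync --version | head -1
done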

-Kotresh HR

On Wed, Jan 24, 2018 at 10:29 PM, Dietmar Putz 
wrote:

> Hi all,
> I have run some tests on the latest Ubuntu 16.04.3 server image. Upgrades
> were disabled...
> The configuration was always the same... a distributed replicated volume on
> 4 VMs with geo-replication to a distributed replicated volume on 4 VMs.
> I started with 3.7.20 and upgraded to 3.8.15, then 3.10.9, then 3.12.5. After each
> upgrade I tested geo-replication, which worked fine every time.
> Then I ran an update/upgrade on the first master node. Directly
> after the upgrade the error shown below appeared on that node.
> After the upgrade on the second master node the error appeared there as well...
> geo-replication is now faulty.
>
> This error affects gfs 3.7.20, 3.8.15, 3.10.9 and 3.12.5 on Ubuntu 16.04.3.
> In one test I updated rsync from 3.1.1 to 3.1.2, but with no effect.
>
> Has anyone else experienced this behaviour... any idea?
>
> best regards
> Dietmar
>
>
> gfs 3.12.5 geo-rep log on master :
>
> [2018-01-24 15:50:35.347959] I [master(/brick1/mvol1):1385:crawl]
> _GMaster: slave's time stime=(1516808792, 0)
> [2018-01-24 15:50:35.604094] I [master(/brick1/mvol1):1863:syncjob]
> Syncer: Sync Time Taken duration=0.0294 num_files=1 job=2 return_code=3
> [2018-01-24 15:50:35.605490] E [resource(/brick1/mvol1):210:errlog]
> Popen: command returned error cmd=rsync -aR0 --inplace --files-from=-
> --super --stats --numeric-ids --no-implied-dirs --existing --xattrs --acls
> --ignore-missing-args . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto
> -S /tmp/gsyncd-aux-ssh-MZwEp2/cbad1c5f88978ecd713bdb1478fbabbe.sock
> --compress root@gl-node5-int:/proc/2013/cwd error=3
> [2018-01-24 15:50:35.628978] I [syncdutils(/brick1/mvol1):271:finalize]
> : exiting.
>
>
>
> after this upgrade one server fails :
> Start-Date: 2018-01-18  04:33:52
> Commandline: /usr/bin/unattended-upgrade
> Upgrade:
> libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libc6:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> locales:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libc-bin:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> multiarch-support:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
> libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10),
> libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8,
> 1:9.10.3.dfsg.P4-8ubuntu1.10)
> End-Date: 2018-01-18  04:34:32
>
>
>
> strace rsync :
>
> 30743 23:34:47 newfstatat(3, "6737", {st_mode=S_IFDIR|0755, st_size=4096,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> 30743 23:34:47 newfstatat(3, "6741", {st_mode=S_IFDIR|0755, st_size=4096,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> 30743 23:34:47 getdents(3, /* 0 entries */, 131072) = 0
> 30743 23:34:47 munmap(0x7fa4feae7000, 135168) = 0
> 30743 23:34:47 close(3) = 0
> 30743 23:34:47 write(2, "rsync: getcwd(): No such file or directory (2)",
> 46) = 46
> 30743 23:34:47 write(2, "\n", 1)= 1
> 30743 23:34:47 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER,
> 0x7fa4fdf404b0}, NULL, 8) = 0
> 30743 23:34:47 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER,
> 0x7fa4fdf404b0}, NULL, 8) = 0
> 30743 23:34:47 write(2, "rsync error: errors selecting input/output files,
> dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96
> 30743 23:34:47 write(2, "\n", 1)= 1
> 30743 23:34:47 exit_group(3)= ?
> 30743 23:34:47 +++ exited with 3 +++
>
>
>
>
> On 19.01.2018 at 17:27, Joe Julian wrote:
>
> ubuntu 16.04
>
>
> --
> Dietmar Putz
> 3Q GmbH
> Kurfürstendamm 102
> D-10711 Berlin
>
> Mobile:   +49 171 / 90 160 39
> Mail: dietmar.p...@3qsdn.com
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Thanks and Regards,
Kotresh H R
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Stale locks on shards

2018-01-24 Thread Samuli Heinonen

Hi!

Thank you very much for your help so far. Could you please tell an 
example command how to use aux-gid-mount to remove locks? "gluster vol 
clear-locks" seems to mount volume by itself.


Best regards,
Samuli Heinonen


Pranith Kumar Karampuri 
23 January 2018 at 10.30


On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen wrote:


Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:

On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen wrote:

Hi again,

here is more information regarding issue described earlier

It looks like self healing is stuck. According to "heal
statistics"
crawl began at Sat Jan 20 12:56:19 2018 and it's still
going on
(It's around Sun Jan 21 20:30 when writing this). However
glustershd.log says that last heal was completed at
"2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also "heal info"
has been
running now for over 16 hours without any information. In
statedump
I can see that storage nodes have locks on files and some
of those
are blocked. Ie. Here again it says that ovirt8z2 is
having active
lock even ovirt8z2 crashed after the lock was granted.:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 18446744073709551610, owner=d0c6d857a87f,
client=0x7f885845efa0,


connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,

granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid
= 3420, owner=d8b9372c397f, client=0x7f8858410be0,

connection-id=ovirt8z2.xxx.com

-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0,

granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0,
len=0,
pid = 18446744073709551610, owner=d0c6d857a87f,
client=0x7f885845efa0,


connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0,

blocked at 2018-01-20 10:59:52

I'd also like to add that volume had arbiter brick before
crash
happened. We decided to remove it because we thought that
it was
causing issues. However now I think that this was
unnecessary. After
the crash arbiter logs had lots of messages like this:
[2018-01-20 10:19:36.515717] I [MSGID: 115072]
[server-rpc-fops.c:1640:server_setattr_cbk]
0-zone2-ssd1-vmstor1-server: 37374187: SETATTR

(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
permitted)
[Operation not permitted]

Is there anyways to force self heal to stop? Any help
would be very
much appreciated :)


Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with gfid
aux-mount where
you can access a file with
/.gfid/<gfid-string> to clear
locks on it.

Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test
/mnt/testvol

A gfid string will have some hyphens like:
8443-1894-4273-9340-4b212fa1c0e4

That said, the next disconnect on the brick where you successfully
did the clear-locks will crash the brick. There was a bug in the 3.8.x
series with clear-locks which was fixed in 3.9.0 with a feature. The self-heal
deadlocks that you witnessed are also fixed in the 3.10 version of the
release.



Thank you for the answer. Could you please tell me more about the crash? What
will actually happen, or is there a bug report about it? I just want
to make sure that we can do everything to secure the data on the bricks.
We will look into upgrading, but we have to make sure that the new
version works for us and, of course, get self-healing working before
doing anything :)


The locks xlator/module maintains a list of locks that are granted to a
client. clear-locks had an issue where it forgot to remove the lock
from this list. So the connection list ends up 

[Gluster-users] Is it necessary to make a backup of the .glusterfs directory?

2018-01-24 Thread César E . Portela

Hi All,

I have two glusterfs servers and backing them up is very slow, when it
does not fail outright.

I have thousands and thousands and thousands of files...

Apparently the .glusterfs directory bears some of the responsibility for the
backup failures.


Is it necessary to make a backup of the .glusterfs directory?
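
To be concrete, this is roughly the kind of backup command I have in mind,
with the exclude I am wondering about (the paths are only examples, not my
real setup):

rsync -aAX --exclude='.glusterfs' /bricks/brick1/ /backup/brick1/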

Thanks in advance.


--
Cease
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] geo-replication command rsync returned with 3

2018-01-24 Thread Dietmar Putz

Hi all,

I have run some tests on the latest Ubuntu 16.04.3 server image.
Upgrades were disabled...
The configuration was always the same... a distributed replicated volume
on 4 VMs with geo-replication to a distributed replicated volume on 4 VMs.
I started with 3.7.20 and upgraded to 3.8.15, then 3.10.9, then 3.12.5. After
each upgrade I tested geo-replication, which worked fine every time.
Then I ran an update/upgrade on the first master node. Directly
after the upgrade the error shown below appeared on that node.
After the upgrade on the second master node the error appeared there as well...
geo-replication is now faulty.


This error affects gfs 3.7.20, 3.8.15, 3.10.9 and 3.12.5 on Ubuntu 16.04.3.
In one test I updated rsync from 3.1.1 to 3.1.2, but with no effect.

Has anyone else experienced this behaviour... any idea?

best regards
Dietmar


gfs 3.12.5 geo-rep log on master :

[2018-01-24 15:50:35.347959] I [master(/brick1/mvol1):1385:crawl] 
_GMaster: slave's time stime=(1516808792, 0)
[2018-01-24 15:50:35.604094] I [master(/brick1/mvol1):1863:syncjob] 
Syncer: Sync Time Taken duration=0.0294    num_files=1    job=2    
return_code=3
[2018-01-24 15:50:35.605490] E [resource(/brick1/mvol1):210:errlog] 
Popen: command returned error    cmd=rsync -aR0 --inplace --files-from=- 
--super --stats --numeric-ids --no-implied-dirs --existing --xattrs 
--acls --ignore-missing-args . -e ssh -oPasswordAuthentication=no 
-oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto 
-S /tmp/gsyncd-aux-ssh-MZwEp2/cbad1c5f88978ecd713bdb1478fbabbe.sock 
--compress root@gl-node5-int:/proc/2013/cwd    error=3
[2018-01-24 15:50:35.628978] I [syncdutils(/brick1/mvol1):271:finalize] 
: exiting.




after this upgrade one server fails :
Start-Date: 2018-01-18  04:33:52
Commandline: /usr/bin/unattended-upgrade
Upgrade:
libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),
libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),
bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),

dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 1:9.10.3.dfsg.P4-8ubuntu1.10),
libc6:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),

locales:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),

libc-bin:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),
libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),

multiarch-support:amd64 (2.23-0ubuntu9, 2.23-0ubuntu10),
libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10),
libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.8, 
1:9.10.3.dfsg.P4-8ubuntu1.10)

End-Date: 2018-01-18  04:34:32



strace rsync :

30743 23:34:47 newfstatat(3, "6737", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
30743 23:34:47 newfstatat(3, "6741", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0

30743 23:34:47 getdents(3, /* 0 entries */, 131072) = 0
30743 23:34:47 munmap(0x7fa4feae7000, 135168) = 0
30743 23:34:47 close(3) = 0
30743 23:34:47 write(2, "rsync: getcwd(): No such file or directory 
(2)", 46) = 46

30743 23:34:47 write(2, "\n", 1)    = 1
30743 23:34:47 rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 
0x7fa4fdf404b0}, NULL, 8) = 0
30743 23:34:47 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 
0x7fa4fdf404b0}, NULL, 8) = 0
30743 23:34:47 write(2, "rsync error: errors selecting input/output 
files, dirs (code 3) at util.c(1056) [Receiver=3.1.1]", 96) = 96

30743 23:34:47 write(2, "\n", 1)    = 1
30743 23:34:47 exit_group(3)    = ?
30743 23:34:47 +++ exited with 3 +++
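
Given that getcwd() fails on the receiver, I also want to check whether the
working directory gsyncd points rsync at still exists on the slave, for
example (the pid 2013 is just the one from the log above and changes with
every run):

ssh root@gl-node5-int 'ls -ld /proc/2013/cwd; readlink /proc/2013/cwd'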




On 19.01.2018 at 17:27, Joe Julian wrote:

ubuntu 16.04


--
Dietmar Putz
3Q GmbH
Kurfürstendamm 102
D-10711 Berlin
 
Mobile:   +49 171 / 90 160 39

Mail: dietmar.p...@3qsdn.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Replacing a third data node with an arbiter one

2018-01-24 Thread Hoggins!
Hello,

The subject says it all. I have a replica 3 cluster:

gluster> volume info thedude
 
Volume Name: thedude
Type: Replicate
Volume ID: bc68dfd3-94e2-4126-b04d-77b51ec6f27e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ngluster-1.network.hoggins.fr:/export/brick/thedude
Brick2: ngluster-2.network.hoggins.fr:/export/brick/thedude
Brick3: ngluster-3.network.hoggins.fr:/export/brick/thedude
Options Reconfigured:
cluster.server-quorum-type: server
transport.address-family: inet
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 8
server.event-threads: 15


... and I would like to replace, say, ngluster-2 with an arbiter-only
node, without any data. Is that possible? How?
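
To make it concrete, this is the kind of sequence I imagine (an untested
sketch; I am not sure it is the supported procedure, and the arbiter brick
path is just an example):

# drop ngluster-2 from the replica set, going from replica 3 to replica 2
gluster volume remove-brick thedude replica 2 \
    ngluster-2.network.hoggins.fr:/export/brick/thedude force

# add it back as an arbiter brick, going to replica 3 arbiter 1
gluster volume add-brick thedude replica 3 arbiter 1 \
    ngluster-2.network.hoggins.fr:/export/brick/thedude-arbiter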

Thanks !

    Hoggins!



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] fault tolerancy in glusterfs distributed volume

2018-01-24 Thread Aravinda

The volume will remain available even if one brick in each subvolume goes down (each replica-3 subvolume needs at least two of its three bricks up to keep quorum).

Sub volume 1 bricks:
Brick1: 10.0.0.2:/brick
Brick2: 10.0.0.3:/brick
Brick3: 10.0.0.1:/brick

Subvolume 2 bricks:
Brick4: 10.0.0.5:/brick
Brick5: 10.0.0.6:/brick
Brick6: 10.0.0.7:/brick
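
To check how much margin is left at any moment, the per-brick status can be
inspected, for example:

gluster volume status testvol | grep -E '^Brick'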

On Wednesday 24 January 2018 04:36 PM, atris adam wrote:

I have made a distributed replica 3 volume with 6 nodes. I mean this:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f271a9bd-6599-43e7-bc69-26695b55d206
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.0.0.2:/brick
Brick2: 10.0.0.3:/brick
Brick3: 10.0.0.1:/brick
Brick4: 10.0.0.5:/brick
Brick5: 10.0.0.6:/brick
Brick6: 10.0.0.7:/brick
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
transport.address-family: inet

I have set quorum on both the client and the server side. I want to know
about the fault tolerance of this distributed-replicated volume: how many
bricks can go down while the volume is still available?



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



--
regards
Aravinda VK

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Split brain directory

2018-01-24 Thread Karthik Subrahmanya
Hey,

From the getfattr output you have provided, the directory is clearly not in
split brain.
It is called split brain only when all the bricks are blamed by the others.
In your case only client-13, that is Brick14 in the volume info output, had
a pending entry heal on the directory.
That is the last replica subvolume, which consists of the bricks:

Brick13: glusterserver03.mydomain.local:/bricks/video/brick3/safe
Brick14: glusterserver04.mydomain.local:/bricks/video/brick3/safe
Brick15: glusterserver05.mydomain.local:/bricks/video/brick3/safe (arbiter)

That entry got healed as part of the heal you ran, or as part of the self-heal
crawl, and the pending xattrs got reset to all zeros.
Which file are you not able to access? Can you give the getfattr output of
that file, along with the shd log
and the mount log from the time you were not able to access it?
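
For clarity, this is the getfattr command I mean, run against the file
directly on each of the bricks (the path below is only an example based on
your directory):

getfattr -d -m . -e hex /bricks/video/brick3/safe/video.mysite.it/htdocs/<file>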

Regards,
Karthik

On Wed, Jan 24, 2018 at 2:00 PM, Luca Gervasi 
wrote:

> Hello,
> I'm trying to fix an issue with a directory split brain on gluster 3.10.3. The
> effect is that a specific file in this split directory randomly becomes
> unavailable on some clients.
> I have gathered all the information in this gist:
> https://gist.githubusercontent.com/lucagervasi/534e0024d349933eef44615fa8a5c374/raw/52ff8dd6a9cc8ba09b7f258aa85743d2854f9acc/splitinfo.txt
>
> I discovered the split directory via its extended attributes (lines
> 172-173 and 291-292 of the gist):
> trusted.afr.dirty=0x
> trusted.afr.vol-video-client-13=0x
> seen on the bricks
> * /bricks/video/brick3/safe/video.mysite.it/htdocs/ on glusterserver05
> (lines 278 to 294)
> * /bricks/video/brick3/safe/video.mysite.it/htdocs/ on glusterserver03
> (lines 159 to 175)
>
> Reading the documentation about AFR extended attributes (docs [1] and [2]),
> this situation seems unclear: the brick's own changelog is 0, the same as for
> client-13 (glusterserver02.mydomain.local:/bricks/video/brick3/safe),
> and to my understanding such "dirty" attributes seem to indicate no split
> brain at all (feel free to correct me).
>
> Some days ago I issued a "gluster volume heal vol-video full", which
> (probably) ended that day, leaving no info in /var/log/gluster/glustershd.log
> and not fixing this split.
> I tried to trigger a self heal using "stat" and "ls -l" on the split
> directory from a glusterfs-mounted client directory, without the pending bits
> being cleared.
> The heal info split-brain output itself shows zero items to be healed
> (lines 388 to 446).
>
> All the clients mount this volume using glusterfs-fuse.
>
> I don't know what to do, please help.
>
> Thanks.
>
> Luca Gervasi
>
> References:
> [1] https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
> [2] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-managing_split-brain
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] fault tolerancy in glusterfs distributed volume

2018-01-24 Thread atris adam
I have made a distributed replica 3 volume with 6 nodes. I mean this:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: f271a9bd-6599-43e7-bc69-26695b55d206
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.0.0.2:/brick
Brick2: 10.0.0.3:/brick
Brick3: 10.0.0.1:/brick
Brick4: 10.0.0.5:/brick
Brick5: 10.0.0.6:/brick
Brick6: 10.0.0.7:/brick
Options Reconfigured:
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
transport.address-family: inet

I have set quorum on both the client and the server side. I want to know about
the fault tolerance of this distributed-replicated volume: how many bricks can
go down while the volume is still available?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Split brain directory

2018-01-24 Thread Luca Gervasi
Hello,
I'm trying to fix an issue with a directory split brain on gluster 3.10.3. The
effect is that a specific file in this split directory randomly becomes
unavailable on some clients.
I have gathered all the information in this gist:
https://gist.githubusercontent.com/lucagervasi/534e0024d349933eef44615fa8a5c374/raw/52ff8dd6a9cc8ba09b7f258aa85743d2854f9acc/splitinfo.txt

I discovered the split directory via its extended attributes (lines
172-173 and 291-292 of the gist):
trusted.afr.dirty=0x
trusted.afr.vol-video-client-13=0x
seen on the bricks
* /bricks/video/brick3/safe/video.mysite.it/htdocs/ on glusterserver05
(lines 278 to 294)
* /bricks/video/brick3/safe/video.mysite.it/htdocs/ on glusterserver03
(lines 159 to 175)

Reading the documentation about AFR extended attributes (docs [1] and [2]),
this situation seems unclear: the brick's own changelog is 0, the same as for
client-13 (glusterserver02.mydomain.local:/bricks/video/brick3/safe),
and to my understanding such "dirty" attributes seem to indicate no split
brain at all (feel free to correct me).

Some days ago I issued a "gluster volume heal vol-video full", which
(probably) ended that day, leaving no info in
/var/log/gluster/glustershd.log and not fixing this split.
I tried to trigger a self heal using "stat" and "ls -l" on the split
directory from a glusterfs-mounted client directory, without the pending bits
being cleared.
The heal info split-brain output itself shows zero items to be healed
(lines 388 to 446).

All the clients mount this volume using glusterfs-fuse.

I don't know what to do, please help.

Thanks.

Luca Gervasi

References:
[1]
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
[2]
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/sect-managing_split-brain
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users