Re: [Gluster-users] Issues in AFR and self healing

2018-08-21 Thread Pablo Schandin
I couldn't find any disconnections yet. We analyzed the port's
traffic to see if there was too much data going through, but that
was OK. I also cannot see any other disconnections, so for now we
will continue checking the network in case I missed something.


Thanks for all the help! If I have any other news I will let you know.

Pablo.

Re: [Gluster-users] Issues in AFR and self healing

2018-08-15 Thread Ravishankar N



On 08/15/2018 11:07 PM, Pablo Schandin wrote:


I found another log that I wasn't aware of in
/var/log/glusterfs/brick; I had confused it with the mount log. In
this file I see a lot of entries like this one:


[2018-08-15 16:41:19.568477] I [addr.c:55:compare_addr_and_update] 
0-/mnt/brick1/gv1: allowed = "172.20.36.10", received addr = 
"172.20.36.11"
[2018-08-15 16:41:19.568527] I [addr.c:55:compare_addr_and_update] 
0-/mnt/brick1/gv1: allowed = "172.20.36.11", received addr = 
"172.20.36.11"
[2018-08-15 16:41:19.568547] I [login.c:76:gf_auth] 0-auth/login: 
allowed user names: 7107ccfa-0ba1-4172-aa5a-031568927bf1
[2018-08-15 16:41:19.568564] I [MSGID: 115029] 
[server-handshake.c:793:server_setvolume] 0-gv1-server: accepted 
client from 
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0 
(version: 3.12.6)
[2018-08-15 16:41:19.582710] I [MSGID: 115036] 
[server.c:527:server_rpc_notify] 0-gv1-server: disconnecting 
connection from 
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0
[2018-08-15 16:41:19.582830] I [MSGID: 101055] 
[client_t.c:443:gf_client_unref] 0-gv1-server: Shutting down 
connection 
physinfra-hb2.xcade.net-21091-2018/08/15-16:41:03:103872-gv1-client-0-0-0


So I am seeing a lot of disconnections, right? Might this be why the
self-healing is triggered all the time?


Not necessarily. These disconnects could also be due to the glfsheal
binary, which is invoked when you run `gluster vol heal volname info`
and similar commands; such disconnects do not cause heals. It would
be better to check your client mount logs for disconnect messages
like these:


[2018-08-16 03:59:32.289763] I [MSGID: 114018] 
[client.c:2285:client_rpc_notify] 0-testvol-client-4: disconnected from 
testvol-client-0. Client process will keep trying to connect to glusterd 
until brick's port is available


If there are no disconnects and you are still seeing files undergoing 
heal, then you might want to check the brick logs to see if there are 
any write failures.
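
For example, a quick scan along these lines should surface both (a
sketch assuming the default /var/log/glusterfs locations; the client
log file is named after your mount point, so the name below is
hypothetical):

# count client-side disconnects in the mount log
grep -c "disconnected from" /var/log/glusterfs/mnt-gv1.log

# look for error-level write failures in the brick logs
grep -E " E .*(writev|fsync)" /var/log/glusterfs/bricks/*.log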

Thanks,
Ravi


Re: [Gluster-users] Issues in AFR and self healing

2018-08-14 Thread Pablo Schandin

Thanks for the info!

I cannot see anything in the mount log besides one line every time
it rotates:


[2018-08-13 06:25:02.246187] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile,continuing


But in the glfsheal-gv1.log of the volumes I did find some kind of
server-client connection that was disconnected and now connects
using a different port. The log block for each run is rather long,
so I'm copying it into a pastebin:


https://pastebin.com/bp06rrsT

Maybe this has something to do with it?

Thanks!

Pablo.


Re: [Gluster-users] Issues in AFR and self healing

2018-08-10 Thread Ravishankar N



On 08/10/2018 11:25 PM, Pablo Schandin wrote:


Hello everyone!

I'm having some trouble with something, but I'm not quite sure what
yet. I'm running GlusterFS 3.12.6 on Ubuntu 16.04. I have two
servers (nodes) in the cluster in replica mode, and each server has
2 bricks. As the servers are KVM hosts running several VMs, one
brick on each server holds locally defined VMs and the other brick
is the replica from the other server; it holds data, but no actual
writing is done to it except for the replication.


                Server 1                              Server 2
Volume 1 (gv1): Brick 1 defined VMs (read/write)  ->  Brick 1 replicated qcow2 files
Volume 2 (gv2): Brick 2 replicated qcow2 files    <-  Brick 2 defined VMs (read/write)


So, the main issue arose when I got a Nagios alarm that warned about
a file listed to be healed, and then the alarm disappeared. I came
to find out that every 5 minutes the self-heal daemon triggers the
healing and fixes it. But looking at the logs, I see a lot of
entries in the glustershd.log file like this:


[2018-08-09 14:23:37.689403] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv1-replicate-0: 
Completed data selfheal on 407bd97b-e76c-4f81-8f59-7dae11507b0c. 
sources=[0]  sinks=1
[2018-08-09 14:44:37.933143] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-gv2-replicate-0: 
Completed data selfheal on 73713556-5b63-4f91-b83d-d7d82fee111f. 
sources=[0]  sinks=1


The qcow2 files are being healed several times a day (up to 30 times
on occasion). As I understand it, this means that a data heal
occurred on the files with gfids 407b... and 7371..., from source to
sink. Local server to replica server? Is it OK for the shd to heal
files in the replicated brick that supposedly has no writing on it
besides the mirroring? How does that work?


In AFR, for writes, there is no notion of a local/remote brick. No
matter which client you write to the volume from, the write is sent
to both bricks, i.e. the replication is synchronous and real time.
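
For example (a minimal sketch; the client mount point /mnt/gv1 below
is hypothetical, while the brick path is the one from your logs):

# write through the client mount on either server...
echo hello > /mnt/gv1/test.txt

# ...and the same file shows up on the gv1 brick of *both* servers
md5sum /mnt/brick1/gv1/test.txt   # run on server 1
md5sum /mnt/brick1/gv1/test.txt   # run on server 2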


How does AFR replication work? The file with gfid 7371... is the
qcow2 root disk of an ownCloud server with 17 GB of data. It does
not seem big enough to be a bottleneck of some sort, I think.


Also, I was investigating the directory tree in
brick/.glusterfs/indices and I noticed that in both xattrop and
dirty there is always a file named xattrop-xx and dirty-xx. I
read that the xattrop file is like a parent file, or handle, used to
reference the other files created there as hard links with gfid
names for the shd to heal. Is it the same for the ones in the dirty
dir?


Yes, before the write, the gfid gets captured inside dirty on all 
bricks. If the write is successful, it gets removed. In addition, if the 
write fails on one brick, the other brick will capture the gfid inside 
xattrop.
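
You can watch this on the bricks if you like (a sketch; the brick
path is the one from your logs):

# gfids listed here have writes in flight on this brick
ls /mnt/brick1/gv1/.glusterfs/indices/dirty/

# gfids listed here need heal because the write failed on the other brick
ls /mnt/brick1/gv1/.glusterfs/indices/xattrop/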


Any help will be greatly appreciated. Thanks!

If frequent heals are triggered, it could mean there are frequent
network disconnects from the clients to the bricks as writes happen.
You can check the mount logs to see if that is the case and
investigate possible network issues.
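
One quick way to check for that correlation (the mount log name
below is hypothetical; both logs carry UTC timestamps, so entries
line up directly):

# timestamps of the most recent completed heals
grep "Completed data selfheal" /var/log/glusterfs/glustershd.log | tail

# client disconnects around the same times?
grep "disconnected from" /var/log/glusterfs/mnt-gv1.log | tail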


HTH,
Ravi


Pablo.





___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

