Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-28 Thread Ravishankar N

On 02/28/2014 01:09 PM, Zhang Huan wrote:
On Fri, Feb 28, 2014 at 12:07 PM, Ravishankar N <ravishan...@redhat.com> wrote:


On 02/28/2014 07:28 AM, Zhang Huan wrote:

Hello Ravi,

Thanks for your reply.

Sorry, there was a typo in my mail. It should be "underlying
corruption" instead of "underlying correction".

I guess the logic of eliminating zero-byte files when all nodes are
innocent is there to prevent underlying corruption from propagating
to other bricks. To ask it another way: if the underlying brick
finds that some file is corrupted, is there anything it can do to
tell glusterfs to fix it?


Hi Zhang,
If all nodes are innocent (from AFR's point of view), then AFR
cannot use the changelog attributes to determine which is the source.
In this case, the safest bet is to mark all zero-byte files as
sinks, so that we don't end up healing in the wrong direction.
Like I said earlier, AFR can only use the changelog attributes
(xattrs) to determine the sources/sinks. It cannot detect
underlying on-disk file system corruption outside the scope of
the xattrs.

If you are sure that a particular brick is the right source
despite the xattrs saying otherwise, you can manually change the
attributes of the file on all bricks so that AFR now sees that
brick as the source and heals in the expected direction.

-Ravi


Hello Ravi,

IMO, changing the attributes might be dangerous, since it introduces
access to the file that is concurrent with glusterfs. I am not sure
whether glusterfs already provides some mechanism for this.


You are right, Zhang. My assumption was that the file wouldn't be
modified from the mount point while you are modifying the xattrs at the
bricks.

My suggestion is to eliminate the zero-byte file from the heal sources
even if it is marked as a source. If the underlying filesystem finds some
corruption (detected by a scrubbing daemon that checks data checksums), it
could truncate the file to 0 and let glusterfs do the healing job.

If there is underlying FS corruption and we need to make gluster aware
of it, then something like bit rot detection would be the way to go. You
can find more information about some work in progress on the gluster
website and mailing list archives:

http://www.gluster.org/community/documentation/index.php/Arch/BitRot_Detection
http://lists.nongnu.org/archive/html/gluster-devel/2014-01/msg00209.html
https://lists.gnu.org/archive/html/gluster-devel/2014-01/msg6.html
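
To make the idea of checksum-based scrubbing concrete, here is a minimal sketch. It is not the BitRot design described at the links above; the function names and the in-memory dictionaries are hypothetical stand-ins for whatever a real scrubber would persist and read from disk.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def scrub(recorded: Dict[str, str], contents: Dict[str, bytes]) -> List[str]:
    """Return the names whose current content no longer matches
    the checksum recorded when the file was last written."""
    return sorted(
        name
        for name, data in contents.items()
        if recorded.get(name) != checksum(data)
    )


# Checksum recorded at write time (hypothetical file name).
recorded = {"file-a": checksum(b"hello")}

# Unchanged content passes; content truncated behind the scrubber's back is flagged.
print(scrub(recorded, {"file-a": b"hello"}))
print(scrub(recorded, {"file-a": b""}))
```

A detector like this only tells you which copy is bad. Something still has to communicate that to AFR, which is the gap discussed in this thread.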

-Ravi


Here are the cases I have analyzed.
1. If this corrupted file is marked as the only source, then there is
no correct replica in the filesystem (effectively all are fools), so
picking any one as the source to heal from is OK.
2. If the corrupted file is one of the potential sources, eliminating
it should keep healing in the right direction without further
corrupting the other correct replicas.
3. If the corrupted file is not marked as a source, some other replica
will be chosen as the source and this file will be overwritten with
correct data.
4. If no replica is marked as clean by its attributes, it is quite
unlikely that this file is chosen as the source, as its size is 0. Even
if it is chosen as the source, there is no further corruption of file
content after the heal.


Zhang Huan



___
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-27 Thread Ravishankar N

On 02/26/2014 07:42 PM, Zhang Huan wrote:

Hello guys,

Anyone know about my question?

Zhang Huan


On Sun, Feb 23, 2014 at 11:28 AM, Zhang Huan <zhh...@gmail.com> wrote:


Hello all,

While reading the code that chooses the healing source, there is
one thing that confuses me. Say we have 3 replicas; 2 of them
are OK and the remaining one is outdated due to a temporary IO failure.
For some reason, one of the 2 correct replicas is truncated to 0
due to some underlying correction. Will glusterfs kick the 0-size
file out? Or will it still consider it a correct one, and possibly
corrupt the remaining correct replica by healing?

Out of the two correct replicas, gluster will pick the first healthy
replica brick as the source [see afr_sh_select_source()]. If that brick is
truncated at the back-end due to 'underlying correction' (not sure what
that means), then yes, I'm afraid it will still be considered the correct
source and you would get a zero-byte file on the other 2 bricks because of
the healing.


The function afr_mark_sources() kicks 0-size files out when all
nodes are innocent. Even when all nodes are fools, the file with the
largest size is chosen as the source. But when there are wise
nodes, it does not check file size any further.
Considering that differing file sizes among replicas trigger healing,
I am wondering whether there is a reason behind this code?

The changelog extended attributes are set by AFR based on whether the
file operation succeeded on each of the replicas. It uses those
attributes to determine the source/sink. Direct modification of the
file at the brick will invalidate any meaning that the changelog holds.
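
A rough sketch of how the terms used in this thread (innocent, fool, wise) can be derived from the changelog counters. It is a simplified model and not the actual afr_mark_sources() code: the assumption is that pending[i][j] holds the number of operations brick i has recorded as unconfirmed on brick j, and the Python names are made up for illustration.

```python
from enum import Enum
from typing import List


class Character(Enum):
    INNOCENT = "innocent"  # accuses nobody
    FOOL = "fool"          # accuses itself
    WISE = "wise"          # accuses others, but not itself


def classify(pending: List[List[int]]) -> List[Character]:
    """Classify each brick from its row of the pending matrix."""
    result = []
    for i, row in enumerate(pending):
        if not any(row):
            result.append(Character.INNOCENT)
        elif row[i]:
            result.append(Character.FOOL)
        else:
            result.append(Character.WISE)
    return result


# The scenario from the question: bricks 0 and 1 saw a write fail on brick 2.
pending = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print([c.value for c in classify(pending)])
```

Truncating the file on brick 0 at the back-end changes none of these counters, so in this model brick 0 is still classified as wise and remains a candidate source.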

Thanks,
Ravi



Thanks.

Zhang Huan






Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-27 Thread Zhang Huan
Hello Ravi,

Thanks for your reply.

Sorry, there was a typo in my mail. It should be "underlying corruption"
instead of "underlying correction".

I guess the logic of eliminating zero-byte files when all nodes are
innocent is there to prevent underlying corruption from propagating to
other bricks. To ask it another way: if the underlying brick finds that
some file is corrupted, is there anything it can do to tell glusterfs to
fix it?

Zhang Huan




Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-27 Thread Ravishankar N

On 02/28/2014 07:28 AM, Zhang Huan wrote:

Hello Ravi,

Thanks for your reply.

Sorry, there was a typo in my mail. It should be "underlying
corruption" instead of "underlying correction".

I guess the logic of eliminating zero-byte files when all nodes are
innocent is there to prevent underlying corruption from propagating to
other bricks. To ask it another way: if the underlying brick finds that
some file is corrupted, is there anything it can do to tell glusterfs to
fix it?



Hi Zhang,
If all nodes are innocent (from AFR's point of view), then AFR cannot
use the changelog attributes to determine which is the source. In this
case, the safest bet is to mark all zero-byte files as sinks, so that we
don't end up healing in the wrong direction. Like I said earlier, AFR can
only use the changelog attributes (xattrs) to determine the
sources/sinks. It cannot detect underlying on-disk file system
corruption outside the scope of the xattrs.


If you are sure that a particular brick is the right source despite the 
xattrs saying otherwise, you can manually change the attributes of the 
file on all bricks so that AFR now sees that brick as the source and 
heals in the expected direction.
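
As a rough illustration of what "changing the attributes" involves, the sketch below assumes each AFR changelog xattr value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry). That layout and the helper names are assumptions for illustration and are not taken from this thread, so check them against your gluster version before touching a brick.

```python
import struct
from typing import Tuple


def encode_changelog(data: int, metadata: int, entry: int) -> str:
    """Pack three pending counters into a hex string (assumed layout)."""
    return "0x" + struct.pack(">III", data, metadata, entry).hex()


def decode_changelog(value: str) -> Tuple[int, int, int]:
    """Unpack a hex string back into (data, metadata, entry)."""
    raw = bytes.fromhex(value[2:] if value.startswith("0x") else value)
    return struct.unpack(">III", raw)


# One pending data operation, nothing pending for metadata or entry.
value = encode_changelog(1, 0, 0)
print(value)
print(decode_changelog(value))
```

A value like this is what you would write on the brick you trust so that it accuses the other bricks, while clearing the counters that accuse it on the other bricks. Do this only while nothing is writing to the file through a mount point.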


-Ravi




Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-27 Thread Zhang Huan
Hello Ravi,

IMO, changing the attributes might be dangerous, since it introduces
access to the file that is concurrent with glusterfs. I am not sure
whether glusterfs already provides some mechanism for this.

My suggestion is to eliminate the zero-byte file from the heal sources even
if it is marked as a source. If the underlying filesystem finds some
corruption (detected by a scrubbing daemon that checks data checksums), it
could truncate the file to 0 and let glusterfs do the healing job. Here are
the cases I have analyzed.
1. If this corrupted file is marked as the only source, then there is no
correct replica in the filesystem (effectively all are fools), so picking
any one as the source to heal from is OK.
2. If the corrupted file is one of the potential sources, eliminating it
should keep healing in the right direction without further corrupting
the other correct replicas.
3. If the corrupted file is not marked as a source, some other replica will
be chosen as the source and this file will be overwritten with correct data.
4. If no replica is marked as clean by its attributes, it is quite unlikely
that this file is chosen as the source, as its size is 0. Even if it is
chosen as the source, there is no further corruption of file content after
the heal.
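
The four cases above can be sketched as a small selection function. This models the proposal only and is not existing glusterfs behaviour; pick_source, marked_sources and sizes are hypothetical names, and the largest-size fallback for case 4 follows the all-fools rule mentioned earlier in the thread.

```python
from typing import List


def pick_source(marked_sources: List[int], sizes: List[int]) -> int:
    """Return the index of the replica to heal from under the proposal.

    marked_sources: replica indices the xattrs mark as sources.
    sizes: file size on each replica, indexed by replica.
    """
    if not marked_sources:
        # Case 4: nobody is marked clean, so fall back to the largest file.
        return max(range(len(sizes)), key=lambda i: sizes[i])
    non_empty = [i for i in marked_sources if sizes[i] > 0]
    if non_empty:
        # Cases 2 and 3: zero-byte candidates are dropped from the sources.
        return non_empty[0]
    # Case 1: every marked source is empty, so any of them will do.
    return marked_sources[0]


sizes = [0, 100, 50]  # replica 0 was truncated by the underlying filesystem
print(pick_source([0, 1], sizes))  # case 2
print(pick_source([1], sizes))     # case 3
print(pick_source([0], sizes))     # case 1
print(pick_source([], sizes))      # case 4
```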

Zhang Huan


Re: [Gluster-devel] Question on choosing source of replica to heal with AFR

2014-02-26 Thread Zhang Huan
Hello guys,

Anyone know about my question?

Zhang Huan


On Sun, Feb 23, 2014 at 11:28 AM, Zhang Huan zhh...@gmail.com wrote:

 Hello all,

 While reading the code that chooses the healing source, there is one thing
 that confuses me. Say we have 3 replicas; 2 of them are OK and the remaining
 one is outdated due to a temporary IO failure. For some reason, one of the 2
 correct replicas is truncated to 0 due to some underlying correction. Will
 glusterfs kick the 0-size file out? Or will it still consider it a correct
 one, and possibly corrupt the remaining correct replica by healing?

 The function afr_mark_sources() kicks 0-size files out when all nodes
 are innocent. Even when all nodes are fools, the file with the largest size
 is chosen as the source. But when there are wise nodes, it does not check
 file size any further. Considering that differing file sizes among replicas
 trigger healing, I am wondering whether there is a reason behind this
 code?

 Thanks.

 Zhang Huan
