# getfattr -m . -d -e hex
/opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file:
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000000000000000000000
trusted.afr.c_glusterfs-client-2=0x000000000000000000000000
trusted.afr.c_glusterfs-client-4=0x000000000000000000000000
trusted.afr.c_glusterfs-client-6=0x000000000000000000000000
trusted.afr.c_glusterfs-client-8=0x000000060000000000000000
(client-8 is the latest client in our case; the first 8 hex digits,
00000006, indicate that there is something pending in the changelog data.)
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001356d86c0c000217fd
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
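For reference, a trusted.afr.* value packs three 32-bit big-endian counters (data, metadata and entry pending operations, in that order, per AFR's changelog layout). A minimal sketch that decodes the client-8 value above:

```shell
# Decode a trusted.afr.* xattr value (without the 0x prefix) into its
# three 32-bit big-endian counters: data, metadata, entry pending ops.
val=000000060000000000000000   # trusted.afr.c_glusterfs-client-8 above
data=$((16#${val:0:8}))        # first 8 hex digits  -> pending data ops
meta=$((16#${val:8:8}))        # next 8 hex digits   -> pending metadata ops
entry=$((16#${val:16:8}))      # last 8 hex digits   -> pending entry ops
echo "data=$data metadata=$meta entry=$entry"   # data=6 metadata=0 entry=0
```

A non-zero data counter means that brick recorded unacknowledged writes destined for the peer named in the xattr.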
# lhsh 002500 getfattr -m . -d -e hex
/opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
getfattr: Removing leading '/' from absolute path names
# file:
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
(here we can say that there is no split-brain, but the file is out of sync.)
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000001156d86c290005735c
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
# gluster volume info
Volume Name: c_glusterfs
Type: Replicate
Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on
# gluster volume info
Volume Name: c_glusterfs
Type: Replicate
Volume ID: c6a61455-d378-48bf-ad40-7a3ce897fc9c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on
# gluster --version
glusterfs 3.7.8 built on Feb 17 2016 07:49:49
Repository revision: git://git.gluster.com/glusterfs.git
<http://git.gluster.com/glusterfs.git>
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.
# gluster volume heal info heal-failed
Usage: volume heal <VOLNAME> [enable | disable | full |statistics
[heal-count [replica <HOSTNAME:BRICKNAME>]] |info [healed |
heal-failed | split-brain] |split-brain {bigger-file <FILE>
|source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]
# gluster volume heal c_glusterfs info heal-failed
Command not supported. Please use "gluster volume heal c_glusterfs
info" and logs to find the heal information.
# lhsh 002500
002500> gluster --version
glusterfs 3.7.8 built on Feb 17 2016 07:49:49
Repository revision: git://git.gluster.com/glusterfs.git
<http://git.gluster.com/glusterfs.git>
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.
002500>
Regards,
Abhishek
On Thu, Mar 3, 2016 at 4:54 PM, ABHISHEK PALIWAL
<abhishpali...@gmail.com <mailto:abhishpali...@gmail.com>> wrote:
On Thu, Mar 3, 2016 at 4:10 PM, Ravishankar N
<ravishan...@redhat.com <mailto:ravishan...@redhat.com>> wrote:
Hi,
On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:
Hi Ravi,
As discussed earlier, I investigated this issue and found that healing
is not triggered because the "gluster volume heal c_glusterfs info
split-brain" command shows no entries, even though the file is in a
split-brain state.
Couple of observations from the 'commands_output' file.
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
The afr xattrs do not indicate that the file is in split brain:
# file:
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000080000000000000000
trusted.afr.c_glusterfs-client-2=0x000000020000000000000000
trusted.afr.c_glusterfs-client-4=0x000000020000000000000000
trusted.afr.c_glusterfs-client-6=0x000000020000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
1. There doesn't seem to be a split-brain going by the
trusted.afr* xattrs.
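To make point 1 concrete: a data split-brain requires the data counter to be non-zero on *both* bricks, each blaming the other. A hedged sketch using the two values from the outputs above (which xattr corresponds to which peer is an assumption based on this thread):

```shell
# Classify the file's state from the data counters each brick records
# about its peer. Values taken verbatim from the getfattr outputs above.
x1=000000000000000000000000   # one brick's trusted.afr...client-1 (about its peer)
x2=000000080000000000000000   # other brick's trusted.afr...client-0 (about its peer)
d1=$((16#${x1:0:8}))          # pending data ops recorded by brick 1
d2=$((16#${x2:0:8}))          # pending data ops recorded by brick 2
if [ "$d1" -gt 0 ] && [ "$d2" -gt 0 ]; then
  verdict="data split-brain: both bricks blame each other"
elif [ "$d1" -gt 0 ] || [ "$d2" -gt 0 ]; then
  verdict="pending heal: one brick blames the other"
else
  verdict="clean"
fi
echo "$verdict"
```

With these values only one side has a non-zero counter, which is consistent with "out of sync but not split-brain".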
If it is not a split-brain problem, then how can I resolve this?
2. You seem to have re-used the bricks from another
volume/setup. For replica 2, only
trusted.afr.c_glusterfs-client-0 and
trusted.afr.c_glusterfs-client-1 should be present, but I see
four xattrs: client-0, 2, 4 and 6.
Could you please suggest why these entries are there? I am not
able to work out the scenario. I am rebooting one board multiple
times to reproduce the issue, and after every reboot I do a
remove-brick and add-brick on the same volume for the second
board.
3. On the rebooted node, do you have ssl enabled by any
chance? There is a bug, "Not able to fetch volfile", when
ssl is enabled:
https://bugzilla.redhat.com/show_bug.cgi?id=1258931
Btw, for data and metadata split-brains you can use the
gluster CLI
https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md
instead of modifying the file from the back end.
But you are saying it is not a split-brain problem, and the
split-brain command is not showing any file, so how can I find the
bigger file? Also, in my case the file size is fixed at 2MB; it
is overwritten every time.
-Ravi
So what I have done is manually delete the gfid entry of
that file from the .glusterfs directory and follow the
instructions in the following link to heal:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
and this works fine for me.
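For anyone following the same manual procedure: the gfid hard link to remove lives under .glusterfs at a path derived from the gfid (first two hex bytes as directory levels, then the gfid in canonical UUID form). A small sketch using the trusted.gfid from the outputs above:

```shell
# Build the .glusterfs hard-link path for a gfid, relative to the brick
# root: .glusterfs/<hex 1-2>/<hex 3-4>/<gfid as a dashed UUID>.
gfid=9f5e354ecfda40149ddce7d5ffe760ae   # trusted.gfid from the outputs above
uuid="${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"
path=".glusterfs/${gfid:0:2}/${gfid:2:2}/$uuid"
echo "$path"   # .glusterfs/9f/5e/9f5e354e-cfda-4014-9ddc-e7d5ffe760ae
```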
But my question is: why is the split-brain command not showing
any file in its output?
Here I am attaching all the logs I got from the node, as well as
the output of the commands from both boards.
In this tar file two directories are present:
000300 - logs for the board which is running continuously
002500 - logs for the board which was rebooted
I am waiting for your reply; please help me out on this issue.
Thanks in advance.
Regards,
Abhishek
On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL
<abhishpali...@gmail.com <mailto:abhishpali...@gmail.com>> wrote:
On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N
<ravishan...@redhat.com <mailto:ravishan...@redhat.com>>
wrote:
On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:
Yes correct
Okay, so when you say the files are not in sync until
some time, are you getting stale data when accessing
from the mount?
I'm not able to figure out why heal info shows zero
when the files are not in sync, despite all IO
happening from the mounts. Could you provide the
output of getfattr -d -m . -e hex /brick/file-name
from both bricks when you hit this issue?
I'll provide the logs once I get them. Here, the delay means we
are powering on the second board after 10 minutes.
On Feb 26, 2016 9:57 AM, "Ravishankar N"
<ravishan...@redhat.com
<mailto:ravishan...@redhat.com>> wrote:
Hello,
On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
Hi Ravi,
Thanks for the response.
We are using GlusterFS 3.7.8.
Here is the use case:
We have a logging file which saves event logs for every
board of a node, and these files are kept in sync using
GlusterFS. The system is in replica-2 mode, which means
that when one brick in a replicated volume goes offline,
the glusterd daemons on the other nodes keep track of all
the files that are not replicated to the offline brick.
When the offline brick becomes available again, the
cluster initiates a healing process, replicating the
updated files to that brick. But in our case, we see that
the log file of one board is not in sync and its format
is corrupted.
Just to understand you correctly: you have mounted
the 2-node replica-2 volume on both these nodes and
are writing to a logging file from the mounts, right?
Even the output of "gluster volume heal c_glusterfs
info" shows that there are no pending heals.
Also, the logging file is of fixed size; new entries
wrap around, overwriting the old entries.
This way we have seen that after a few restarts, the
contents of the same file on the two bricks are
different, but volume heal info shows zero entries.
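The wraparound behaviour described above can be illustrated with a toy fixed-size file (the sizes and messages here are made up; the real file is 2MB):

```shell
# Toy model of a fixed-size wraparound log: each write lands at an
# offset modulo the file size, overwriting the oldest entry in place.
size=16
log=$(mktemp)
printf '%*s' "$size" '' | tr ' ' '.' > "$log"   # preallocate 16 bytes of dots
offset=0
for msg in AAAA BBBB CCCC DDDD EEEE; do
  printf '%s' "$msg" | dd of="$log" bs=1 seek=$((offset % size)) conv=notrunc 2>/dev/null
  offset=$((offset + 4))
done
content=$(cat "$log")
echo "$content"   # EEEEBBBBCCCCDDDD: the fifth write wrapped over the first
rm -f "$log"
```

Because the file size never changes, the two replicas can hold different generations of the same byte range after such writes.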
Solution:
When we put a delay of more than 5 minutes before the
healing, everything works fine.
Regards,
Abhishek
On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N
<ravishan...@redhat.com
<mailto:ravishan...@redhat.com>> wrote:
On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
Hi,
Here, I have one query regarding the time
taken by the healing process.
In the current two-node setup, when we reboot one
node, the self-healing process starts within a
5-minute interval on the board, which results in
corruption of some files' data.
Heal should start immediately after the
brick process comes up. What version of
gluster are you using? What do you mean by
corruption of data? Also, how did you
observe that the heal started after 5 minutes?
-Ravi
To resolve it I searched on Google and found the
following link:
https://support.rackspace.com/how-to/glusterfs-troubleshooting/
It mentions that the healing process can take up to
10 minutes to start.
Here is the statement from the link:
"Healing replicated volumes
When any brick in a replicated volume goes
offline, the glusterd daemons on the
remaining nodes keep track of all the
files that are not replicated to the
offline brick. When the offline brick
becomes available again, the cluster
initiates a healing process, replicating
the updated files to that brick. *The
start of this process can take up to 10
minutes, based on observation.*"
After allowing more than 5 minutes, the file
corruption problem was resolved.
So here my question is: is there any way to reduce
the time the healing process takes to start?
Regards,
Abhishek Paliwal
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
<mailto:Gluster-devel@gluster.org>
http://www.gluster.org/mailman/listinfo/gluster-devel
--
Regards
Abhishek Paliwal