On 03/20/2017 06:31 PM, Bernhard Dübi wrote:
Hi Ravi,
Thank you very much for looking into this.
The gluster volumes are used by CommVault Simpana to store backup
data. Nothing/Nobody should access the underlying infrastructure.
While looking at the xattrs of the files, I noticed that the only
difference was the bit-rot.version. So I assume that something went
wrong in the synchronization of the bit-rot data, that differing
bit-rot.versions are treated like a split-brain situation, and that
access is denied because there is no guarantee of correctness. This is
just a wild guess.
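The xattr comparison described above could be scripted roughly as follows. This is only a sketch, not a Gluster tool: the helper names and the brick paths in the comment are hypothetical, it relies on Python's Linux-only os.listxattr and os.getxattr, and reading the trusted.* namespace on a brick normally requires root.

```python
import os


def read_xattrs(path):
    """Return {name: value} for every extended attribute on path (Linux only)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


def diff_xattrs(a, b):
    """Return {name: (value_a, value_b)} for attributes that differ.

    An attribute missing on one side is reported with None for that side.
    """
    names = set(a) | set(b)
    return {n: (a.get(n), b.get(n)) for n in sorted(names) if a.get(n) != b.get(n)}


# Hypothetical usage, run as root on a host that can see both brick copies:
#   delta = diff_xattrs(read_xattrs("/bricks/b1/some/file"),
#                       read_xattrs("/bricks/b2/some/file"))
#   for name, (left, right) in delta.items():
#       print(name, left, right)
```

An empty result means the two copies carry identical xattrs; a result containing only the bit-rot version attribute would match the observation above.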
Hi Bernhard,
The bit-rot version can differ between bricks of a replica when I/O
succeeds on only one brick while the other brick is down. (AFR
self-heal will later heal the contents, but it does not modify the
bitrot xattrs.) So that is not a problem.
Over the weekend I identified hundreds of files with input/output
errors. I compared the sha256sum of the copies on both bricks; they
were always the same. I then deleted the affected files from gluster
and recreated them. This should have fixed the issue. Verification is
still running.
If you're interested in the root cause, I can send you more log files
and the xattrs of some files.
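The brick-to-brick checksum comparison mentioned above could look something like this. It is a sketch under assumptions: the function names are made up for illustration, and the two paths are whatever brick-side copies of the same file you want to compare.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bricks_agree(path_a, path_b):
    """True if the two brick-side copies have identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Matching checksums only show that the two copies are identical to each other, not that either one matches the data originally written.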
If you did not access the underlying bricks directly, as you said, then
it could possibly be a bitrot bug. If you don't mind, please raise a BZ
under the bitrot component for the appropriate gluster version, with all
client and brick logs attached.
Also if you do have some kind of reproducer, that would help a lot.
-Ravi
Best Regards
Bernhard
2017-03-20 12:57 GMT+01:00 Ravishankar N <ravishan...@redhat.com>:
SFILE_CONTAINER_080 is the one which seems to be in split-brain.
SFILE_CONTAINER_046, for which you have provided the getfattr
output, hard links, etc., doesn't seem to be in split-brain. We do
see that the fops on SFILE_CONTAINER_046 are failing on the client
translator itself due to EIO:
[2017-03-17 19:49:56.088867] E [MSGID: 114031]
[client-rpc-fops.c:444:client3_3_open_cbk]
0-Server_Legal_01-client-0: remote operation failed. Path:
/Server_Legal/CV_MAGNETIC/V_944453/CHUNK_9291168/SFILE_CONTAINER_046
(bfdfe21a-1af3-474b-a6a4-bc0e17edb529) [Input/output error]
[2017-03-17 19:49:56.089012] E [MSGID: 114031]
[client-rpc-fops.c:444:client3_3_open_cbk]
0-Server_Legal_01-client-1: remote operation failed. Path:
/Server_Legal/CV_MAGNETIC/V_944453/CHUNK_9291168/SFILE_CONTAINER_046
(bfdfe21a-1af3-474b-a6a4-bc0e17edb529) [Input/output error]
which is why the sha256sum on the mount gave EIO. That in turn is
because the file appears to be corrupt on both bricks: the
'trusted.bit-rot.bad-file' xattr is set.
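To find every file hitting this error, the client log could be scanned for entries shaped like the two above. This is a rough sketch: the regular expression is derived only from those two entries, the function name is hypothetical, and it assumes each entry sits on a single line in the actual log file (the line breaks above come from mail wrapping).

```python
import re
from collections import defaultdict

# Pattern modelled on the two log entries quoted above; other message
# variants are not covered.
EIO_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[MSGID: (?P<msgid>\d+)\] "
    r"\[[^\]]*\] (?P<xlator>\S+): remote operation failed\. "
    r"Path: (?P<path>.+?) \((?P<gfid>[0-9a-f-]{36})\) \[Input/output error\]"
)


def eio_failures(lines):
    """Map each failing path to the set of client translators that reported EIO."""
    failures = defaultdict(set)
    for line in lines:
        match = EIO_RE.search(line)
        if match:
            failures[match.group("path")].add(match.group("xlator"))
    return dict(failures)
```

A path reported by both client-0 and client-1 fails on both bricks of the replica, which matches the case described here.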
Did you write to the files directly on the backend? What is
interesting is that the sha256sum is the same on both bricks
despite both copies being marked as bad by bitrot.
-Ravi
On 03/18/2017 03:20 AM, Bernhard Dübi wrote:
Hi,
I have a situation:
the volume logfile reports a possible split-brain, but when I try
to heal it, the heal fails because the file is not in split-brain. Any ideas?
Regards
Bernhard
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users