On 03/20/2017 06:31 PM, Bernhard Dübi wrote:
Hi Ravi,

Thank you very much for looking into this.
The gluster volumes are used by CommVault Simpana to store backup data. Nothing/Nobody should access the underlying infrastructure.

While looking at the xattrs of the files, I noticed that the only difference was the bit-rot.version. So I assume that something went wrong in the synchronization of the bit-rot data, that differing bit-rot.versions are treated like a split-brain situation, and that access is denied because there is no guarantee of correctness. This is just a wild guess.
Hi Bernhard,

The bit-rot version can differ between the bricks of a replica when I/O succeeded on only one brick while the other brick was down. (AFR self-heal will later heal the contents, but it does not modify the bitrot xattrs.) So that is not a problem.


Over the weekend I identified hundreds of files with input/output errors. I compared the sha256sum of the copies on both bricks; they were always the same. I then deleted the affected files from gluster and recreated them. This should have fixed the issue. Verification is still running.
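For anyone repeating the brick-to-brick comparison described above, here is a minimal Python sketch. It only reads the files. The brick roots and the relative path passed in are hypothetical placeholders, not the paths from this setup, and the helper names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_brick_copies(brick_roots, relative_path):
    """Hash the same relative path under each brick root (read-only).

    brick_roots is an assumed list of brick directories, one per replica.
    Returns (digests_by_brick, all_match).
    """
    digests = {
        str(root): sha256_of(Path(root) / relative_path)
        for root in brick_roots
    }
    all_match = len(set(digests.values())) == 1
    return digests, all_match
```

Matching digests only show that the replicas agree with each other; they say nothing about whether the content matches the signature that bitrot stored.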

If you're interested in the root cause, I can send you more log files and the xattrs of some files.

If you did not access the underlying bricks directly, as you said, then it could possibly be a bitrot bug. If you don't mind, please raise a BZ under the bitrot component and the appropriate gluster version, with all client and brick logs attached.
Also, if you do have some kind of reproducer, that would help a lot.
-Ravi



Best Regards
Bernhard


2017-03-20 12:57 GMT+01:00 Ravishankar N <ravishan...@redhat.com>:

    SFILE_CONTAINER_080 is the one which seems to be in split-brain.
    SFILE_CONTAINER_046, for which you have provided the getfattr
    output, hard links, etc., doesn't seem to be in split-brain. We do
    see that the fops on SFILE_CONTAINER_046 are failing on the client
    translator itself due to EIO:

    [2017-03-17 19:49:56.088867] E [MSGID: 114031]
    [client-rpc-fops.c:444:client3_3_open_cbk]
    0-Server_Legal_01-client-0: remote operation failed. Path:
    /Server_Legal/CV_MAGNETIC/V_944453/CHUNK_9291168/SFILE_CONTAINER_046
    (bfdfe21a-1af3-474b-a6a4-bc0e17edb529) [Input/output error]

    [2017-03-17 19:49:56.089012] E [MSGID: 114031]
    [client-rpc-fops.c:444:client3_3_open_cbk]
    0-Server_Legal_01-client-1: remote operation failed. Path:
    /Server_Legal/CV_MAGNETIC/V_944453/CHUNK_9291168/SFILE_CONTAINER_046
    (bfdfe21a-1af3-474b-a6a4-bc0e17edb529) [Input/output error]

    which is why the sha256sum on the mount gave EIO. And that is
    because the file is marked as corrupt on both bricks: the
    'trusted.bit-rot.bad-file' xattr is set.
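To collect every path that fails this way from a client log, something along these lines could work. It is a sketch in Python, and it assumes that each entry sits on a single line in the actual log file (the entries quoted above are wrapped by the mail client) and that the field layout matches those entries. The regex and the group names are illustrative, not an official log grammar.

```python
import re

# Pattern modelled on the two MSGID 114031 entries quoted above.
# Assumption: one log entry per line in the real log file.
FAILED_OP = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+E\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<subvol>\S+):\s+remote operation failed\.\s+"
    r"Path:\s+(?P<path>.+?)\s+"
    r"\((?P<gfid>[0-9a-f-]{36})\)\s+"
    r"\[(?P<error>[^\]]+)\]"
)


def eio_failures(lines):
    """Yield one dict per 'remote operation failed' entry that ended in EIO."""
    for line in lines:
        match = FAILED_OP.match(line.strip())
        if match and match.group("error") == "Input/output error":
            yield match.groupdict()


def paths_failing_on_all(lines, subvols):
    """Return the paths that hit EIO on every client subvolume in subvols."""
    seen = {}
    for entry in eio_failures(lines):
        seen.setdefault(entry["path"], set()).add(entry["subvol"])
    wanted = set(subvols)
    return sorted(path for path, hit in seen.items() if wanted <= hit)
```

A path that fails on every client subvolume, as SFILE_CONTAINER_046 does here, is a candidate for the bad-file check rather than for split-brain resolution.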

    Did you write to the files directly on the backend? What is
    interesting is that the sha256sum is the same on both bricks
    despite both copies being marked as bad by bitrot.
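Whether a brick copy carries the bad-file marker can also be checked from a script. The sketch below is Python and assumes a Linux host, where `os.getxattr` is available, and enough privileges to read the `trusted.` xattr namespace on the brick (typically root). The function name and the injectable `getxattr` parameter are illustrative choices; the check only reads the xattr and never modifies the brick.

```python
import errno
import os

BAD_FILE_XATTR = "trusted.bit-rot.bad-file"


def is_marked_bad(path, getxattr=None):
    """Return True if the bad-file xattr is present on the given brick file.

    getxattr can be swapped out for testing; by default os.getxattr is used,
    which exists on Linux only.
    """
    if getxattr is None:
        getxattr = os.getxattr
    try:
        getxattr(path, BAD_FILE_XATTR)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            # The attribute is not set on this file.
            return False
        # Permission problems, missing files, etc. are passed on unchanged.
        raise
    return True
```

Running it against the same relative path on each brick shows whether both replicas are flagged, which is the situation described above.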

    -Ravi


    On 03/18/2017 03:20 AM, Bernhard Dübi wrote:
    Hi,

    I have a situation:

    The volume logfile reports a possible split-brain, but when I try
    to heal, it fails because the file is not in split-brain. Any ideas?




    Regards

    Bernhard



    _______________________________________________
    Gluster-users mailing list
    Gluster-users@gluster.org
    http://lists.gluster.org/mailman/listinfo/gluster-users
