On 07/16/2015 01:28 AM, Игорь Бирюлин wrote:
I have studied the information on this page:
https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md
but I cannot resolve the split-brain by following those instructions.

I have tested it on gluster 3.6, where it does not work; it only works on gluster 3.7.


Right. We need to explicitly mention in the .md that it is supported from 3.7 onwards.

I am trying to use it on gluster 3.7.2.
I have a gluster share in replicate mode:
root@dist-gl2:/# gluster volume info

Volume Name: repofiles
Type: Replicate
Volume ID: 1d5d5d7d-39f2-4011-9fc8-d73c29495e7c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dist-gl1:/brick1
Brick2: dist-gl2:/brick1
Options Reconfigured:
performance.readdir-ahead: on
server.allow-insecure: on
root@dist-gl2:/#

And I have one file in split-brain (the file "test"):
root@dist-gl2:/# gluster volume heal repofiles info
Brick dist-gl1:/brick1/
/test
/ - Is in split-brain

Number of entries: 2

Brick dist-gl2:/brick1/
/ - Is in split-brain

/test
Number of entries: 2

root@dist-gl2:/# gluster volume heal repofiles info split-brain
Brick dist-gl1:/brick1/
/
Number of entries in split-brain: 1

Brick dist-gl2:/brick1/
/
Number of entries in split-brain: 1

root@dist-gl2:/#

I don't know why these commands show only the directory ("/") as being in split-brain.

That is because the file is in a gfid split-brain. As the .md file says, "for a gfid split-brain, the parent directory of the file is shown to be in split-brain and the file itself is shown to be needing heal". You cannot resolve gfid split-brains using these commands; you need to resolve them manually. See "Fixing Directory entry split-brain" in https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
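
For reference, that manual procedure boils down to roughly the sketch below. It assumes you want to keep dist-gl1's copy of "test" and discard dist-gl2's (which copy is the good one is your decision), and that repofiles-client-0/-1 in the logs correspond to the bricks in the order listed in volume info, so the gfid to remove on dist-gl2 would be e42d3f03-0633-4954-95ce-5cd8710e595e:

# 1. Confirm the gfid of "test" on each brick (run on both dist-gl1 and dist-gl2)
getfattr -d -m . -e hex /brick1/test

# 2. On the brick whose copy you are discarding (dist-gl2 in this sketch), remove
#    the file AND its gfid hard link under .glusterfs; the link path is built from
#    the first two and the next two hex digits of the gfid
rm /brick1/test
rm /brick1/.glusterfs/e4/2d/e42d3f03-0633-4954-95ce-5cd8710e595e

# 3. Trigger a heal so the surviving copy is recreated on dist-gl2
gluster volume heal repofiles

The .md above describes the same steps in more detail, including how to read the gfid from the getfattr output.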


I tried to resolve the split-brain with the gluster CLI commands (on the directory from the previous commands' output and on the file), but it did not help:
root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /
Healing / failed:Operation not permitted.
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl1:/brick1 /
Healing / failed:Operation not permitted.
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl1:/brick1 /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /
Healing / failed:Operation not permitted.
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/#

Relevant parts of the glfsheal-repofiles.log logs.
When trying to resolve the split-brain on the directory ("/"):
[2015-07-15 19:45:30.508670] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-15 19:45:30.516662] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-15 19:45:30.517201] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-303634362d32 (0) coming up
[2015-07-15 19:45:30.517227] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport
[2015-07-15 19:45:30.525457] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport
[2015-07-15 19:45:30.526788] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)
[2015-07-15 19:45:30.534012] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)
[2015-07-15 19:45:30.536252] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:45:30.536606] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to repofiles-client-0, attached to remote volume '/brick1'.
[2015-07-15 19:45:30.536621] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:45:30.536679] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume 'repofiles-client-0' came back up; going online.
[2015-07-15 19:45:30.536819] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1
[2015-07-15 19:45:30.543712] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:45:30.543919] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume '/brick1'.
[2015-07-15 19:45:30.543933] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:45:30.554650] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1
[2015-07-15 19:45:30.557628] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:45:30.560002] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
[2015-07-15 19:45:30.561582] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:30.561604] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1
[2015-07-15 19:45:30.561900] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:30.561962] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-303634362d32 (0)
[2015-07-15 19:45:30.562259] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:32.563285] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:32.564898] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:45:32.566693] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
When trying to resolve the split-brain on the file ("/test"):
[2015-07-15 19:48:45.910819] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-15 19:48:45.919854] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-15 19:48:45.920434] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-313133392d32 (0) coming up
[2015-07-15 19:48:45.920481] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport
[2015-07-15 19:48:45.996442] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport
[2015-07-15 19:48:45.997892] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)
[2015-07-15 19:48:46.005153] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)
[2015-07-15 19:48:46.007437] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:48:46.007928] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to repofiles-client-0, attached to remote volume '/brick1'.
[2015-07-15 19:48:46.007945] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:48:46.008020] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume 'repofiles-client-0' came back up; going online.
[2015-07-15 19:48:46.008189] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1
[2015-07-15 19:48:46.014313] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:48:46.014536] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume '/brick1'.
[2015-07-15 19:48:46.014550] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:48:46.026828] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1
[2015-07-15 19:48:46.029357] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:48:46.031719] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
[2015-07-15 19:48:46.033222] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:46.033224] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1
[2015-07-15 19:48:46.033569] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:46.033624] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-313133392d32 (0)
[2015-07-15 19:48:46.033906] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:48.036482] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-repofiles-replicate-0: GFID mismatch for <gfid:00000000-0000-0000-0000-000000000001>/test e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0

Where did I make a mistake when trying to resolve the split-brain?

Best regards,
Igor

2015-07-14 22:11 GMT+03:00 Roman <rome...@gmail.com>:

    Never mind. I do not have enough time to debug why basic gluster
    commands do not work on a production server. Tonight's system freeze,
    caused by undocumented XFS settings that are required to run glusterfs
    on XFS, was enough. I'll stick with EXT4. In any case, XFS for the
    bricks did not solve my previous problem.

    To resolve the split-brain this time, I restored the VM from a backup.

    2015-07-14 21:55 GMT+03:00 Roman <rome...@gmail.com>:

        Thanks for pointing that out...
        but it doesn't seem to work... or I am too sleepy due to the
        problems with glusterfs and debian8 in the other topic, which I
        have been fighting for a month..

        root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster split-brain source-brick stor1:HA-2TB-TT-Proxmox-cluster/2TB /images/124/vm-124-disk-1.qcow2
        Usage: volume heal <VOLNAME> [{full | statistics {heal-count {replica <hostname:brickname>}} |info {healed | heal-failed | split-brain}}]

        Seems like the wrong command...
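
        (The Usage text above looks like what a pre-3.7 gluster prints; the "split-brain source-brick" sub-command only exists from 3.7 onwards. There, the invocation would look roughly like the sketch below, with the brick spelled exactly as heal info reports it, i.e. stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB rather than stor1:HA-2TB-TT-Proxmox-cluster/2TB.)

        gluster volume heal HA-2TB-TT-Proxmox-cluster split-brain source-brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB /images/124/vm-124-disk-1.qcow2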

        2015-07-14 21:23 GMT+03:00 Joe Julian <j...@julianfamily.org>:

            On 07/14/2015 11:19 AM, Roman wrote:

                Hi,

                Played with glusterfs tonight and tried to use the
                recommended XFS for gluster.. the first try was pretty bad
                and all of my VMs hung (XFS wants allocsize=64k to create
                qcow2 files, which I didn't know about, and I tried to
                create a VM on XFS without this option in fstab, which led
                to a lot of IO and qemu said it timed out while creating
                the file)..

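                (allocsize is an XFS mount option; a minimal fstab sketch, assuming the brick filesystem is the one mounted at /exports/HA-2TB-TT-Proxmox-cluster/2TB shown in the heal output below, with a purely hypothetical device name:)

                # hypothetical device; the relevant part is allocsize=64k
                /dev/sdb1  /exports/HA-2TB-TT-Proxmox-cluster/2TB  xfs  defaults,allocsize=64k  0  0
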
                Now I've got this:
                Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
                /images/124/vm-124-disk-1.qcow2 - Is in split-brain

                Number of entries: 1

                Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
                /images/124/vm-124-disk-1.qcow2 - Is in split-brain

                OK, what next?
                I've deleted one of the files, but it didn't help. Even
                worse, self-heal restored the file on the node where I
                deleted it... and it is still in split-brain.

                How do I fix this?

                --
                Best regards,
                Roman.



            
https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md


            or

            
https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
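
            Roughly the procedure those links describe, as a sketch only (brick paths are taken from the heal info output above; <GFID> stands for the trusted.gfid value read with getfattr, and which copy to keep is your call). Deleting the file alone tends to get undone because the gfid hard link under .glusterfs on that brick typically still carries the old data, so both have to go on the brick you are discarding:

            # on the node whose copy you discard, e.g. stor2
            getfattr -d -m . -e hex /exports/HA-2TB-TT-Proxmox-cluster/2TB/images/124/vm-124-disk-1.qcow2
            rm /exports/HA-2TB-TT-Proxmox-cluster/2TB/images/124/vm-124-disk-1.qcow2
            rm /exports/HA-2TB-TT-Proxmox-cluster/2TB/.glusterfs/<first two hex chars of GFID>/<next two>/<GFID>
            # then trigger a heal (or stat the file from a client mount)
            gluster volume heal HA-2TB-TT-Proxmox-cluster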




        --
        Best regards,
        Roman.




    --
    Best regards,
    Roman.






_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
