I have studied the information on this page: https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md but I cannot resolve the split-brain by following those instructions.
I have tested this on gluster 3.6, where it does not work; it works only on gluster 3.7, so I am now trying it on gluster 3.7.2. I have a gluster volume in replicate mode:

root@dist-gl2:/# gluster volume info

Volume Name: repofiles
Type: Replicate
Volume ID: 1d5d5d7d-39f2-4011-9fc8-d73c29495e7c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: dist-gl1:/brick1
Brick2: dist-gl2:/brick1
Options Reconfigured:
performance.readdir-ahead: on
server.allow-insecure: on
root@dist-gl2:/#

And I have one file ("test") in split-brain:

root@dist-gl2:/# gluster volume heal repofiles info
Brick dist-gl1:/brick1/
/test
/ - Is in split-brain

Number of entries: 2

Brick dist-gl2:/brick1/
/ - Is in split-brain

/test
Number of entries: 2

root@dist-gl2:/# gluster volume heal repofiles info split-brain
Brick dist-gl1:/brick1/
/
Number of entries in split-brain: 1

Brick dist-gl2:/brick1/
/
Number of entries in split-brain: 1

root@dist-gl2:/#

I don't know why these commands show only the directory ("/") in split-brain. I tried to resolve the split-brain with the gluster CLI commands, both on the directory reported by the commands above and on the file itself, but it did not help:

root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /
Healing / failed:Operation not permitted.
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain bigger-file /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl1:/brick1 /
Healing / failed:Operation not permitted.
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl1:/brick1 /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /
Healing / failed:Operation not permitted.
Volume heal failed.
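At this point it can help to look at the replication metadata directly on the bricks. A minimal sketch, assuming the brick paths from the volume info above; the getfattr commands are illustrative and would be run as root on each server, while the small helper below only reformats hex output and runs anywhere:

```shell
# Inspect the AFR changelog and GFID xattrs directly on each brick
# (illustrative; run as root on dist-gl1 and dist-gl2):
#
#   getfattr -d -m . -e hex /brick1/test
#
# getfattr prints trusted.gfid as raw hex. This helper reformats such a
# hex value into the UUID form that appears in the glfsheal logs, so the
# GFIDs reported by the two bricks can be compared by eye:
gfid_to_uuid() {
    h=${1#0x}
    printf '%s-%s-%s-%s-%s\n' \
        "$(printf %s "$h" | cut -c1-8)" \
        "$(printf %s "$h" | cut -c9-12)" \
        "$(printf %s "$h" | cut -c13-16)" \
        "$(printf %s "$h" | cut -c17-20)" \
        "$(printf %s "$h" | cut -c21-32)"
}

gfid_to_uuid 0xe42d3f030633495495ce5cd8710e595e
# e42d3f03-0633-4954-95ce-5cd8710e595e
gfid_to_uuid 0x16da31788a6e4010b8747f11449d1993
# 16da3178-8a6e-4010-b874-7f11449d1993
```

If the two bricks report different trusted.gfid values for the same path, the file is in GFID split-brain rather than plain data/metadata split-brain.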
root@dist-gl2:/# gluster v heal repofiles split-brain source-brick dist-gl2:/brick1 /test
Lookup failed on /test:Input/output error
Volume heal failed.
root@dist-gl2:/#

Here are parts of the glfsheal-repofiles.log. When trying to resolve the split-brain on the directory ("/"):

[2015-07-15 19:45:30.508670] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-15 19:45:30.516662] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-15 19:45:30.517201] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-303634362d32 (0) coming up
[2015-07-15 19:45:30.517227] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport
[2015-07-15 19:45:30.525457] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport
[2015-07-15 19:45:30.526788] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)
[2015-07-15 19:45:30.534012] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)
[2015-07-15 19:45:30.536252] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:45:30.536606] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to repofiles-client-0, attached to remote volume '/brick1'.
[2015-07-15 19:45:30.536621] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:45:30.536679] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume 'repofiles-client-0' came back up; going online.
[2015-07-15 19:45:30.536819] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1
[2015-07-15 19:45:30.543712] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:45:30.543919] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume '/brick1'.
[2015-07-15 19:45:30.543933] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:45:30.554650] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1
[2015-07-15 19:45:30.557628] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:45:30.560002] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
[2015-07-15 19:45:30.561582] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:30.561604] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1
[2015-07-15 19:45:30.561900] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:30.561962] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-303634362d32 (0)
[2015-07-15 19:45:30.562259] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:32.563285] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:45:32.564898] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:45:32.566693] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
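The repeated "Gfid mismatch detected" errors above show that /test carries a different GFID on each brick, i.e. GFID split-brain, which the CLI resolution policies (bigger-file / source-brick) do not handle; that is consistent with the entry heal being skipped and the lookup failing with Input/output error. A hedged sketch of the usual manual fix follows (verify it against the documentation for your version, and copy both brick copies somewhere safe first): on the brick whose copy is being discarded, remove both the file and its hardlink under .glusterfs, then let self-heal recreate it from the other brick. The hardlink's location follows mechanically from the GFID:

```shell
# Each file on a brick has a hardlink under .glusterfs, keyed by the
# first two byte-pairs of its GFID: .glusterfs/<xx>/<yy>/<full-gfid>.
gfid_backing_path() {
    gfid=$1
    printf '.glusterfs/%s/%s/%s\n' \
        "$(printf %s "$gfid" | cut -c1-2)" \
        "$(printf %s "$gfid" | cut -c3-4)" \
        "$gfid"
}

gfid_backing_path 16da3178-8a6e-4010-b874-7f11449d1993
# .glusterfs/16/da/16da3178-8a6e-4010-b874-7f11449d1993

# Per the log, that GFID belongs to the copy on repofiles-client-0, i.e.
# dist-gl1:/brick1. If (hypothetically) that copy were the one to discard,
# one would run, as root on dist-gl1:
#   rm /brick1/test
#   rm /brick1/.glusterfs/16/da/16da3178-8a6e-4010-b874-7f11449d1993
#   gluster volume heal repofiles
```

Which copy is the "good" one is a judgment call only the administrator can make; the commands in the comments are not run here.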
When trying to resolve the split-brain on the file ("/test"):

[2015-07-15 19:48:45.910819] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-15 19:48:45.919854] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-15 19:48:45.920434] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 64697374-2d67-6c32-2d32-313133392d32 (0) coming up
[2015-07-15 19:48:45.920481] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-0: parent translators are ready, attempting connect on transport
[2015-07-15 19:48:45.996442] I [MSGID: 114020] [client.c:2118:notify] 0-repofiles-client-1: parent translators are ready, attempting connect on transport
[2015-07-15 19:48:45.997892] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-0: changing port to 49152 (from 0)
[2015-07-15 19:48:46.005153] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-repofiles-client-1: changing port to 49152 (from 0)
[2015-07-15 19:48:46.007437] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:48:46.007928] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-0: Connected to repofiles-client-0, attached to remote volume '/brick1'.
[2015-07-15 19:48:46.007945] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:48:46.008020] I [MSGID: 108005] [afr-common.c:3883:afr_notify] 0-repofiles-replicate-0: Subvolume 'repofiles-client-0' came back up; going online.
[2015-07-15 19:48:46.008189] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-0: Server lk version = 1
[2015-07-15 19:48:46.014313] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-repofiles-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-15 19:48:46.014536] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-repofiles-client-1: Connected to repofiles-client-1, attached to remote volume '/brick1'.
[2015-07-15 19:48:46.014550] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-repofiles-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-15 19:48:46.026828] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-repofiles-client-1: Server lk version = 1
[2015-07-15 19:48:46.029357] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-repofiles-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-07-15 19:48:46.031719] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-repofiles-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/test>, e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0. Skipping conservative merge on the file.
[2015-07-15 19:48:46.033222] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:46.033224] I [afr-common.c:1673:afr_local_discovery_cbk] 0-repofiles-replicate-0: selecting local read_child repofiles-client-1
[2015-07-15 19:48:46.033569] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:46.033624] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-repofiles: switched to graph 64697374-2d67-6c32-2d32-313133392d32 (0)
[2015-07-15 19:48:46.033906] W [afr-common.c:1985:afr_discover_done] 0-repofiles-replicate-0: no read subvols for /
[2015-07-15 19:48:48.036482] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-repofiles-replicate-0: GFID mismatch for <gfid:00000000-0000-0000-0000-000000000001>/test e42d3f03-0633-4954-95ce-5cd8710e595e on repofiles-client-1 and 16da3178-8a6e-4010-b874-7f11449d1993 on repofiles-client-0

Where did I make a mistake in trying to resolve the split-brain?

Best regards,
Igor

2015-07-14 22:11 GMT+03:00 Roman <rome...@gmail.com>:

> Never mind. I do not have enough time to debug why basic gluster commands
> do not work on a production server. Tonight's system freeze, caused by
> undocumented XFS settings that are a must for running glusterfs on XFS,
> was enough. I'll stick with EXT4. In any case, XFS for the bricks did not
> solve my previous problem.
>
> To resolve the split-brain this time, I restored the VM from a backup.
>
> 2015-07-14 21:55 GMT+03:00 Roman <rome...@gmail.com>:
>
>> Thanks for pointing that out...
>> but it doesn't seem to work... or I am too sleepy due to the problems
>> with glusterfs and debian8 in the other topic, which I've been fighting
>> with for a month...
>>
>> root@stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster split-brain source-brick stor1:HA-2TB-TT-Proxmox-cluster/2TB /images/124/vm-124-disk-1.qcow2
>> Usage: volume heal <VOLNAME> [{full | statistics {heal-count {replica <hostname:brickname>}} |info {healed | heal-failed | split-brain}}]
>>
>> seems like the wrong command...
>>
>> 2015-07-14 21:23 GMT+03:00 Joe Julian <j...@julianfamily.org>:
>>
>>> On 07/14/2015 11:19 AM, Roman wrote:
>>>
>>>> Hi,
>>>>
>>>> I played with glusterfs tonight and tried to use the recommended XFS
>>>> for gluster. The first try was pretty bad and all of my VMs hung
>>>> (XFS wants allocsize=64k to create qcow2 files, which I didn't know
>>>> about; I tried to create a VM on XFS without this option in fstab,
>>>> which led to a lot of IO, and qemu reported a timeout while creating
>>>> the file).
>>>>
>>>> Now I've got this:
>>>> Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> /images/124/vm-124-disk-1.qcow2 - Is in split-brain
>>>>
>>>> Number of entries: 1
>>>>
>>>> Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>>> /images/124/vm-124-disk-1.qcow2 - Is in split-brain
>>>>
>>>> OK, what next?
>>>> I deleted one of the files, but it didn't help. Worse, self-heal
>>>> restored the file on the node where I had deleted it... and it is
>>>> still in split-brain.
>>>>
>>>> How do I fix this?
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>
>>> https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md
>>>
>>> or
>>>
>>> https://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Best regards,
>> Roman.
>
> --
> Best regards,
> Roman.
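A note on Roman's failed attempt in the quoted thread: the CLI answered with a usage text that lists no split-brain resolution subcommands at all, which suggests that installation's CLI predates 3.7 (matching my observation above that these commands exist only from 3.7). Separately, the brick argument omits part of the path: heal commands expect the brick exactly as "gluster volume info" prints it, host:/absolute/path, which per the quoted heal info output would be stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB. A tiny illustrative format check (the corrected gluster command is left as a comment, since it needs a live cluster):

```shell
# A brick spec for the heal CLI must be host:/absolute/path, exactly as
# printed by "gluster volume info".
is_brick_spec() { case $1 in *:/*) return 0 ;; *) return 1 ;; esac; }

is_brick_spec "stor1:HA-2TB-TT-Proxmox-cluster/2TB" || echo "missing leading / in brick path"
is_brick_spec "stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB" && echo "ok"

# With a 3.7 CLI, the corrected invocation would presumably be (not run here):
#   gluster volume heal HA-2TB-TT-Proxmox-cluster split-brain \
#       source-brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB \
#       /images/124/vm-124-disk-1.qcow2
```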