Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
On 10/04/2013 12:54 AM, Gianluca Cecchi wrote:
> In the messages log of the vdsm hosts:
>
> Oct 3 23:05:57 f18ovn03 sanlock[1146]: 2013-10-03 23:05:57+0200 16624 [13543]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
> Oct 3 23:05:58 f18ovn03 sanlock[1146]: 2013-10-03 23:05:58+0200 16625 [1155]: s2142 add_lockspace fail result -5
> Oct 3 23:04:24 f18ovn01 sanlock[1166]: 2013-10-03 23:04:24+0200 16154 [1172]: s2688 add_lockspace fail result -5
> Oct 3 23:04:24 f18ovn01 vdsm TaskManager.Task ERROR Task=`bd6b0848-4550-483e-9002-e3051a2e1074`::Unexpected error
> Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`f2ac595d-cc9d-4125-a7be-7b8706cc9ee3`::Unexpected error
> Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`6c03be35-57f1-405e-b127-bd708defad67`::Unexpected error
> Oct 3 23:04:26 f18ovn01 sanlock[1166]: 2013-10-03 23:04:26+0200 16157 [21348]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
> Oct 3 23:04:27 f18ovn01 sanlock[1166]: 2013-10-03 23:04:27+0200 16158 [1172]: s2689 add_lockspace fail result -5
>
> But the origin is possibly a split brain detected at the gluster level. I see it in rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log this afternoon, around the time I had installed the first guest and ran a shutdown and a power on.
> see: https://docs.google.com/file/d/0BwoPbcrMv8mvNHNOVlNrOFFabjQ/edit?usp=sharing
>
> Why are the gluster logs two hours behind? UTC? Is there any way to set them to the current system time?
>
> Gianluca

Do you happen to know why it got into a split brain status?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
On Fri, Oct 4, 2013 at 11:30 AM, Itamar Heim wrote:
> do you happen to know why it got into a split brain status?

I suppose it happened with the first migration attempt. Last night I reproduced it, but I only got a heal for the ids file, which gluster apparently resolved by itself.

Could the bug about IDE be influencing the migration problem and the possible gluster split brain?

What does the dom_md/ids file contain, and when does it get updated?

During the OS install and the yum update everything seemed OK, with no heal on the disk file itself. So I presume that a metadata update that was somehow not managed correctly could have caused the gluster coherence problem on this ids file.

Gianluca
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
In the messages log of the vdsm hosts:

Oct 3 23:05:57 f18ovn03 sanlock[1146]: 2013-10-03 23:05:57+0200 16624 [13543]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct 3 23:05:58 f18ovn03 sanlock[1146]: 2013-10-03 23:05:58+0200 16625 [1155]: s2142 add_lockspace fail result -5
Oct 3 23:04:24 f18ovn01 sanlock[1166]: 2013-10-03 23:04:24+0200 16154 [1172]: s2688 add_lockspace fail result -5
Oct 3 23:04:24 f18ovn01 vdsm TaskManager.Task ERROR Task=`bd6b0848-4550-483e-9002-e3051a2e1074`::Unexpected error
Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`f2ac595d-cc9d-4125-a7be-7b8706cc9ee3`::Unexpected error
Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`6c03be35-57f1-405e-b127-bd708defad67`::Unexpected error
Oct 3 23:04:26 f18ovn01 sanlock[1166]: 2013-10-03 23:04:26+0200 16157 [21348]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct 3 23:04:27 f18ovn01 sanlock[1166]: 2013-10-03 23:04:27+0200 16158 [1172]: s2689 add_lockspace fail result -5

But the origin is possibly a split brain detected at the gluster level. I see it in rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log this afternoon, around the time I had installed the first guest and ran a shutdown and a power on.
see: https://docs.google.com/file/d/0BwoPbcrMv8mvNHNOVlNrOFFabjQ/edit?usp=sharing

Why are the gluster logs two hours behind? UTC? Is there any way to set them to the current system time?

Gianluca
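On the two-hour offset: if the gluster log timestamps are in UTC (an assumption here, consistent with the +0200 offset shown in the sanlock lines but not confirmed in this thread), they can at least be converted when reading the log. A minimal sketch, assuming GNU date; the function name gluster_ts_to_local is made up for illustration:

```shell
# Convert a gluster log timestamp to local time.
# ASSUMPTION: the timestamp in the log is UTC.
# Requires GNU date (uses the -d option).
gluster_ts_to_local() {
    ts="${1%%.*}"    # drop the fractional seconds, e.g. ".543730"
    if [ -n "$2" ]; then
        # optional second argument: a TZ value to convert into
        TZ="$2" date -d "$ts UTC" '+%Y-%m-%d %H:%M:%S %z'
    else
        # default: the time zone configured on this host
        date -d "$ts UTC" '+%Y-%m-%d %H:%M:%S %z'
    fi
}

# Timestamp taken from a gluster log line; output depends on the host's zone
gluster_ts_to_local "2013-10-03 22:06:33.543730"
```

This only changes how the timestamps are read; it does not change what gluster writes to its logs.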
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
And in fact, after solving the split brain, the gluster domain activated automatically.

From rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log under /var/log/glusterfs I found that the ids file was the one not in sync. As the VM had only started on f18ovn03 and I was not able to migrate it to f18ovn01, I decided to delete the file from f18ovn01.
BTW: what does dom_md/ids contain?

[2013-10-03 22:06:33.543730] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gvdata-replicate-0: Unable to self-heal contents of '/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-10-03 22:06:33.544013] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gvdata-replicate-0: background data self-heal failed on /d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
[2013-10-03 22:06:33.544522] W [afr-open.c:213:afr_open] 0-gvdata-replicate-0: failed to open as split brain seen, returning EIO
[2013-10-03 22:06:33.544603] W [page.c:991:__ioc_page_error] 0-gvdata-io-cache: page error for page = 0x7f4b80004910 waitq = 0x7f4b8001da60
[2013-10-03 22:06:33.544635] W [fuse-bridge.c:2049:fuse_readv_cbk] 0-glusterfs-fuse: 132995: READ = -1 (Input/output error)
[2013-10-03 22:06:33.545070] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-0: fdctx not valid
[2013-10-03 22:06:33.545118] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-1: fdctx not valid

I found that gluster creates hard links, so you have to delete all copies of the conflicting file from the brick directory of the node you choose to delete from.
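To see which files the client log flags, the paths can be pulled out of the "Unable to self-heal contents of" lines quoted above. A minimal sketch: the function name split_brain_paths is made up for illustration, and it only matches the exact message wording shown in this log, so other kinds of split-brain messages would be missed.

```shell
# Print the distinct paths a gluster client log reports as possible split-brain.
# Matches only the "Unable to self-heal contents of '<path>'" message format
# shown in this thread (an assumption about what the log contains).
split_brain_paths() {
    log="$1"    # path to the gluster client (mount) log
    grep -F "possible split-brain" "$log" |
        sed -n "s/.*Unable to self-heal contents of '\([^']*\)'.*/\1/p" |
        sort -u
}

# Usage, with the log file named in this thread:
#   split_brain_paths "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log"
```

The printed paths are relative to the volume root, so they have to be prefixed with the brick directory before looking at the file on a node.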
Thanks very much to this link:
http://inuits.eu/blog/fixing-glusterfs-split-brain

So these were my steps:

locate

[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print
/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids

and then delete both

[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print -delete
/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids

And after this step there were no more E lines in the gluster log, and the gluster domain was automatically activated by the engine.

Gianluca
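The "locate" step above can be wrapped as a dry run, so that the list of hard links is reviewed before anything is removed. A minimal sketch using the same find -samefile test as in the commands above; the function name brick_links is made up for illustration, and it deletes nothing.

```shell
# List every hard link to one file inside a brick directory (dry run only).
# To be run on the node whose copy is going to be discarded.
brick_links() {
    brick="$1"    # brick root, e.g. /gluster/DATA_GLUSTER/brick1
    rel="$2"      # file path relative to the brick root
    find "$brick" -samefile "$brick/$rel" -print
}

# Usage, with the paths from this thread:
#   brick_links /gluster/DATA_GLUSTER/brick1 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
```

If the output shows only the file itself and its entry under .glusterfs, the same find command with -delete appended, as in the steps above, removes both.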