[Users] oVirt 3.3 gluster volume active but unable to activate domain
One engine with f19 and two nodes with f19, all with the oVirt stable repo for f19. The DC is defined as GlusterFS. The volume is ok, but I can't activate the domain. Relevant logs from when I click activate are below.

On engine:

2013-10-03 23:05:10,332 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServicesListVDSCommand] (pool-6-thread-50) START, GlusterServicesListVDSCommand(HostName = f18ovn03, HostId = b67bcfd4-f868-49d5-8704-4936ee922249), log id: 5704c54f
2013-10-03 23:05:12,121 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) hostFromVds::selectedVds - f18ovn01, spmStatus Free, storage pool Gluster
2013-10-03 23:05:12,142 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) SpmStatus on vds 80188ccc-83b2-4bc8-9385-8d07f7458a3c: Free
2013-10-03 23:05:12,144 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-83) starting spm on vds f18ovn01, storage pool Gluster, prevId 1, LVER 0
2013-10-03 23:05:12,148 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServicesListVDSCommand] (pool-6-thread-46) FINISH, GlusterServicesListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@955283ba, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@1ef87397, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@c1b996b6, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@30199726, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@606c4879, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@2b860d38, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@f69fd1f7], log id: 4a1b4d33
2013-10-03 23:05:12,159 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-83) START, SpmStartVDSCommand(HostName = f18ovn01, HostId = 80188ccc-83b2-4bc8-9385-8d07f7458a3c, storagePoolId = eb679feb-4da2-4fd0-a185-abbe459ffa70, prevId=1, prevLVER=0, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 62f11f2d
2013-10-03 23:05:12,169 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-83) spmStart polling started: taskId = ab9f2f84-f89b-44e9-b508-a904420635f4
2013-10-03 23:05:12,232 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServicesListVDSCommand] (pool-6-thread-50) FINISH, GlusterServicesListVDSCommand, return: [org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@b624c19b, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@3fcab178, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@e28bd497, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@50ebd507, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@813e865a, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@4c584b19, org.ovirt.engine.core.common.businessentities.gluster.GlusterServerService@17720fd8], log id: 5704c54f
2013-10-03 23:05:12,512 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-6) START, GlusterVolumesListVDSCommand(HostName = f18ovn01, HostId = 80188ccc-83b2-4bc8-9385-8d07f7458a3c), log id: 39a3f45d
2013-10-03 23:05:12,595 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-6) FINISH, GlusterVolumesListVDSCommand, return: {97873e57-0cc2-4740-ae38-186a8dd94718=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@a82da199, d055b38c-2754-4e53-af5c-69cc0b8bf31c=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@ef0c0180}, log id: 39a3f45d
2013-10-03 23:05:14,182 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Failed in HSMGetTaskStatusVDS method
2013-10-03 23:05:14,184 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-83) Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2013-10-03 23:05:14,186 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-83) spmStart polling ended: taskId = ab9f2f84-f89b-44e9-b508-a904420635f4 task status = finished
2013-10-03 23:05:14,188 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-83) Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = Cannot acquire host id
2013-10-03 23:05:14,214 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-83) spmStart polling ended, spm statu
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
In messages on the vdsm hosts:

Oct 3 23:05:57 f18ovn03 sanlock[1146]: 2013-10-03 23:05:57+0200 16624 [13543]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct 3 23:05:58 f18ovn03 sanlock[1146]: 2013-10-03 23:05:58+0200 16625 [1155]: s2142 add_lockspace fail result -5
Oct 3 23:04:24 f18ovn01 sanlock[1166]: 2013-10-03 23:04:24+0200 16154 [1172]: s2688 add_lockspace fail result -5
Oct 3 23:04:24 f18ovn01 vdsm TaskManager.Task ERROR Task=`bd6b0848-4550-483e-9002-e3051a2e1074`::Unexpected error
Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`f2ac595d-cc9d-4125-a7be-7b8706cc9ee3`::Unexpected error
Oct 3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`6c03be35-57f1-405e-b127-bd708defad67`::Unexpected error
Oct 3 23:04:26 f18ovn01 sanlock[1166]: 2013-10-03 23:04:26+0200 16157 [21348]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct 3 23:04:27 f18ovn01 sanlock[1166]: 2013-10-03 23:04:27+0200 16158 [1172]: s2689 add_lockspace fail result -5

But the origin is possibly a split brain detected at the gluster level. I see it in rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log this afternoon, around the time I had installed the first guest and ran a shutdown and a power on. See: https://docs.google.com/file/d/0BwoPbcrMv8mvNHNOVlNrOFFabjQ/edit?usp=sharing

Why are the gluster logs two hours behind? UTC? Is there any way to set them to the current system time?

Gianluca
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
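To line the sanlock entries up against a log that may be written in UTC, it can help to normalise each sanlock timestamp and decode its return value. The following is only a sketch: `parse_sanlock` and `SANLOCK_RE` are hypothetical names, the pattern is written against the lines quoted in this message only, and it assumes that a negative `rv` or `result` is a negated errno number.

```python
import errno
import re
from datetime import datetime, timezone

# Matches the sanlock part of a syslog line as quoted in this thread, e.g.
# "2013-10-03 23:05:57+0200 16624 [13543]: read_sectors ... rv -5 /path"
SANLOCK_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) "
    r"\d+ \[\d+\]: (?P<msg>.*?(?:rv|result) (?P<rv>-?\d+).*)"
)


def parse_sanlock(line):
    """Return (utc_time, errno_name, message), or None if the line does not match."""
    m = SANLOCK_RE.search(line)
    if m is None:
        return None
    # %z understands the "+0200" offset that sanlock prints.
    local = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S%z")
    rv = int(m.group("rv"))
    # Assumption: a negative value is a negated errno number.
    name = errno.errorcode.get(-rv, "unknown") if rv < 0 else "ok"
    return local.astimezone(timezone.utc), name, m.group("msg")
```

Feeding it one of the `read_sectors` lines above gives a timezone-aware UTC time plus the symbolic errno name, which makes it easier to compare against gluster log entries if those really are in UTC.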
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
And in fact, after solving the split brain, the gluster domain automatically activated.

From rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log under /var/log/glusterfs I found that the ids file was the one not in sync. As the VM only started on f18ovn03 and I was not able to migrate it to f18ovn01, I decided to delete the file from f18ovn01. BTW: what does dom_md/ids contain?

[2013-10-03 22:06:33.543730] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gvdata-replicate-0: Unable to self-heal contents of '/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-10-03 22:06:33.544013] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gvdata-replicate-0: background data self-heal failed on /d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
[2013-10-03 22:06:33.544522] W [afr-open.c:213:afr_open] 0-gvdata-replicate-0: failed to open as split brain seen, returning EIO
[2013-10-03 22:06:33.544603] W [page.c:991:__ioc_page_error] 0-gvdata-io-cache: page error for page = 0x7f4b80004910 & waitq = 0x7f4b8001da60
[2013-10-03 22:06:33.544635] W [fuse-bridge.c:2049:fuse_readv_cbk] 0-glusterfs-fuse: 132995: READ => -1 (Input/output error)
[2013-10-03 22:06:33.545070] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-0: fdctx not valid
[2013-10-03 22:06:33.545118] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-1: fdctx not valid

I found that gluster creates hard links, so you have to delete all copies of the conflicting file from the brick directory of the node you choose to delete from. Thanks very much to this link: http://inuits.eu/blog/fixing-glusterfs-split-brain

So these are my steps. First locate the copies:

[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print
/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids

and then delete both:

[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print -delete
/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids

And after this step there were no more " E " lines in the gluster log, and the gluster domain was automatically activated by the engine.

Gianluca
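For anyone who prefers to script the locate-then-delete step, here is a rough Python equivalent of the two `find -samefile` commands. It is a sketch, not a gluster tool: `find_hard_links` and `remove_split_brain_copy` are hypothetical helper names, and the deletion is a dry run unless explicitly enabled, so the list of paths can be checked first.

```python
import os


def find_hard_links(brick_root, target):
    """Yield every path under brick_root that shares an inode with target."""
    wanted = os.stat(target)
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if (st.st_dev, st.st_ino) == (wanted.st_dev, wanted.st_ino):
                yield path


def remove_split_brain_copy(brick_root, target, dry_run=True):
    """Return all hard links of target; delete them only if dry_run is False."""
    links = sorted(find_hard_links(brick_root, target))
    if not dry_run:
        for path in links:
            os.remove(path)
    return links
```

Called with the brick directory and the path of the ids file on that brick, the dry run should list the same two paths that `find ... -print` showed above. As in the steps above, this is meant to be run only on the one node whose copy is being discarded.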
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
On 10/04/2013 12:54 AM, Gianluca Cecchi wrote:
> But the origin possibly is a split brain detected at gluster level

do you happen to know why it got into a split brain status?
Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain
On Fri, Oct 4, 2013 at 11:30 AM, Itamar Heim wrote:
> do you happen to know why it got into a split brain status?

I suppose it happened with the first migration attempt. But last night I replicated it and only got the heal for the ids file, which gluster apparently resolved by itself. Could the bug about ide possibly influence the migration problem and the possible gluster split brain?

What does the dom_md/ids file contain, and when does it get updated?

During the OS install and "yum update" all seemed ok, with no heal of the disk file itself. So I presume that a metadata update that was somehow not managed correctly could have caused the gluster coherence problem on this "ids" file.

Gianluca