Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain

2013-10-04 Thread Itamar Heim

On 10/04/2013 12:54 AM, Gianluca Cecchi wrote:

In the messages log on the vdsm hosts:

Oct  3 23:05:57 f18ovn03 sanlock[1146]: 2013-10-03 23:05:57+0200 16624 [13543]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct  3 23:05:58 f18ovn03 sanlock[1146]: 2013-10-03 23:05:58+0200 16625 [1155]: s2142 add_lockspace fail result -5

Oct  3 23:04:24 f18ovn01 sanlock[1166]: 2013-10-03 23:04:24+0200 16154 [1172]: s2688 add_lockspace fail result -5
Oct  3 23:04:24 f18ovn01 vdsm TaskManager.Task ERROR Task=`bd6b0848-4550-483e-9002-e3051a2e1074`::Unexpected error
Oct  3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`f2ac595d-cc9d-4125-a7be-7b8706cc9ee3`::Unexpected error
Oct  3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`6c03be35-57f1-405e-b127-bd708defad67`::Unexpected error
Oct  3 23:04:26 f18ovn01 sanlock[1166]: 2013-10-03 23:04:26+0200 16157 [21348]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct  3 23:04:27 f18ovn01 sanlock[1166]: 2013-10-03 23:04:27+0200 16158 [1172]: s2689 add_lockspace fail result -5

But the root cause is possibly a split brain detected at the Gluster level.
I see it in rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log
this afternoon, around the time I had installed the first guest and ran
a shutdown and a power on.

see:
https://docs.google.com/file/d/0BwoPbcrMv8mvNHNOVlNrOFFabjQ/edit?usp=sharing

Why are the Gluster logs two hours behind? UTC? Is there any way to have
them use the current system time?

Gianluca
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



Do you happen to know why it got into a split-brain state?


Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain

2013-10-04 Thread Gianluca Cecchi
On Fri, Oct 4, 2013 at 11:30 AM, Itamar Heim  wrote:

 do you happen to know why it got into a split brain status?

I suppose it happened with the first migration attempt.

But last night I repeated the test, and I only got a heal for the ids
file, which Gluster apparently resolved by itself.
Could the bug about IDE be influencing both the migration problem and the
possible Gluster split brain?
What does the dom_md/ids file contain, and when does it get updated?
During the OS install and the yum update everything seemed OK, with no
heal on the disk file itself.
So I presume that some metadata update that is not handled correctly
could induce a Gluster coherence problem on this ids file.
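As far as I understand it, dom_md/ids is the sanlock lockspace of the storage domain: each host owns a delta lease, one sector indexed by its host id, which it rewrites periodically to renew. Because every host keeps writing to this one small file, it would be a plausible first victim of a split brain. A minimal sketch of the offset arithmetic, assuming 512-byte sectors and host id 1 at offset 0; this assumption is consistent with f18ovn01 reading "offset 0" and f18ovn03 reading "offset 512" in the sanlock messages of this thread, but the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: locate a sanlock host's delta lease inside dom_md/ids.
# Assumptions: 512-byte sectors, one delta-lease sector per host id,
# host id 1 at offset 0.
SECTOR_SIZE=512

# Offset (in bytes) of the delta lease owned by a given host id
delta_lease_offset() {
    local host_id=$1
    if ! [[ $host_id =~ ^[0-9]+$ ]] || (( host_id < 1 )); then
        echo "invalid host id: $host_id" >&2
        return 1
    fi
    echo $(( (host_id - 1) * SECTOR_SIZE ))
}

# Map an offset from a "read_sectors delta_leader offset N" log line
# back to the host id that owns that sector
host_id_for_offset() {
    echo $(( $1 / SECTOR_SIZE + 1 ))
}
```

With these assumptions, `host_id_for_offset 512` gives 2, suggesting f18ovn03 is host id 2. A single sector can be dumped with `dd if=<path>/dom_md/ids bs=512 skip=1 count=1 | hexdump -C`, although while the file is in split brain the read through the Gluster mount fails with EIO.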

Gianluca


Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain

2013-10-03 Thread Gianluca Cecchi
In the messages log on the vdsm hosts:

Oct  3 23:05:57 f18ovn03 sanlock[1146]: 2013-10-03 23:05:57+0200 16624 [13543]: read_sectors delta_leader offset 512 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct  3 23:05:58 f18ovn03 sanlock[1146]: 2013-10-03 23:05:58+0200 16625 [1155]: s2142 add_lockspace fail result -5

Oct  3 23:04:24 f18ovn01 sanlock[1166]: 2013-10-03 23:04:24+0200 16154 [1172]: s2688 add_lockspace fail result -5
Oct  3 23:04:24 f18ovn01 vdsm TaskManager.Task ERROR Task=`bd6b0848-4550-483e-9002-e3051a2e1074`::Unexpected error
Oct  3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`f2ac595d-cc9d-4125-a7be-7b8706cc9ee3`::Unexpected error
Oct  3 23:04:25 f18ovn01 vdsm TaskManager.Task ERROR Task=`6c03be35-57f1-405e-b127-bd708defad67`::Unexpected error
Oct  3 23:04:26 f18ovn01 sanlock[1166]: 2013-10-03 23:04:26+0200 16157 [21348]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
Oct  3 23:04:27 f18ovn01 sanlock[1166]: 2013-10-03 23:04:27+0200 16158 [1172]: s2689 add_lockspace fail result -5
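sanlock reports failures as negative errno values, so "rv -5" and "result -5" are -EIO on Linux: the same Input/output error that the Gluster client returns when it refuses to open a file it considers split-brained ("returning EIO" in the gluster log later in this thread). A minimal sketch that pulls such lines out of a syslog stream and names the errno; the function names are hypothetical and only a few Linux errno values are mapped.

```bash
#!/usr/bin/env bash
# Sketch: extract sanlock lockspace/read failures from syslog lines on
# stdin and name the errno. Assumes Linux errno numbering (5 = EIO).

errno_name() {
    case $1 in
        2)   echo ENOENT ;;
        5)   echo EIO ;;
        13)  echo EACCES ;;
        110) echo ETIMEDOUT ;;
        *)   echo "errno $1" ;;
    esac
}

sanlock_failures() {
    local line
    # Matches "add_lockspace fail result -5" and "read_sectors ... rv -5"
    local re='(add_lockspace fail result| rv) -([0-9]+)'
    while IFS= read -r line; do
        if [[ $line == *sanlock* && $line =~ $re ]]; then
            printf '%s: %s\n' "$(errno_name "${BASH_REMATCH[2]}")" "$line"
        fi
    done
}
```

Usage: `sanlock_failures < /var/log/messages` prints each matching sanlock line prefixed with its errno name (EIO for all the lines above); the vdsm TaskManager lines are skipped.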

But the root cause is possibly a split brain detected at the Gluster level.
I see it in rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log
this afternoon, around the time I had installed the first guest and ran
a shutdown and a power on.

see:
https://docs.google.com/file/d/0BwoPbcrMv8mvNHNOVlNrOFFabjQ/edit?usp=sharing

Why are the Gluster logs two hours behind? UTC? Is there any way to have
them use the current system time?
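The sanlock lines carry a "+0200" offset, which fits the guess that the Gluster timestamps are simply UTC. Assuming that, a minimal sketch for reading a Gluster timestamp in local time with GNU date; the function name is hypothetical, and "UTC-2" is a POSIX fixed-offset TZ string meaning UTC+02:00 (POSIX inverts the sign). It does not change what Gluster writes.

```bash
#!/usr/bin/env bash
# Sketch: show a GlusterFS log timestamp in local time, assuming the log
# is written in UTC. Requires GNU date. The default TZ "UTC-2" is a POSIX
# fixed offset equal to UTC+02:00; a tzdata zone name can be passed instead.
gluster_ts_to_local() {
    local ts=$1 tz=${2:-UTC-2}
    ts=${ts#\[}     # strip the leading bracket of "[2013-10-03 22:06:33.543730]"
    ts=${ts%\]}     # ... and the trailing one
    ts=${ts%.*}     # drop the microseconds
    TZ=$tz date -d "$ts UTC" '+%F %T %z'
}
```

For example, `gluster_ts_to_local '[2013-10-03 22:06:33.543730]'` prints `2013-10-04 00:06:33 +0200`.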

Gianluca


Re: [Users] oVirt 3.3 gluster volume active but unable to activate domain

2013-10-03 Thread Gianluca Cecchi
And in fact, after resolving the split brain, the Gluster domain was
activated automatically.

From rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log under
/var/log/glusterfs I found that the ids file was the one not in sync.
As the VM only started on f18ovn03 and I was not able to migrate it to
f18ovn01, I decided to delete the file from f18ovn01.

BTW: what does dom_md/ids contain?


[2013-10-03 22:06:33.543730] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gvdata-replicate-0: Unable to self-heal contents of '/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]
[2013-10-03 22:06:33.544013] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gvdata-replicate-0: background  data self-heal failed on /d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
[2013-10-03 22:06:33.544522] W [afr-open.c:213:afr_open] 0-gvdata-replicate-0: failed to open as split brain seen, returning EIO
[2013-10-03 22:06:33.544603] W [page.c:991:__ioc_page_error] 0-gvdata-io-cache: page error for page = 0x7f4b80004910  waitq = 0x7f4b8001da60
[2013-10-03 22:06:33.544635] W [fuse-bridge.c:2049:fuse_readv_cbk] 0-glusterfs-fuse: 132995: READ = -1 (Input/output error)
[2013-10-03 22:06:33.545070] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-0: fdctx not valid
[2013-10-03 22:06:33.545118] W [client-lk.c:367:delete_granted_locks_owner] 0-gvdata-client-1: fdctx not valid
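The "Pending matrix" in the first E line is what makes this a split brain. As I read AFR's convention, row i holds the pending-operation counts that brick i (gvdata-client-i) has recorded against each brick; [ [ 0 2 ] [ 2 0 ] ] therefore means each brick blames the other, and neither can be picked as the heal source. A minimal sketch for the two-brick case, under that assumption; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: interpret a two-brick AFR "Pending matrix" from a self-heal error.
# Assumption: row i lists the pending counts brick i records against each
# brick, so non-zero values in both off-diagonal cells mean mutual blame.
classify_pending_2x2() {
    local s=${1//\[/ }    # "[ [ 0 2 ] [ 2 0 ] ]" -> "  0 2   2 0  "
    s=${s//\]/ }
    local -a m
    read -r -a m <<< "$s"
    if (( ${#m[@]} != 4 )); then
        echo "expected a 2x2 matrix" >&2
        return 2
    fi
    local b0_blames_b1=${m[1]} b1_blames_b0=${m[2]}
    if (( b0_blames_b1 > 0 && b1_blames_b0 > 0 )); then
        echo split-brain
    elif (( b0_blames_b1 > 0 )); then
        echo "heal brick 1 from brick 0"
    elif (( b1_blames_b0 > 0 )); then
        echo "heal brick 0 from brick 1"
    else
        echo clean
    fi
}
```

Applied to the matrix above, `classify_pending_2x2 '[ [ 0 2 ] [ 2 0 ] ]'` prints `split-brain`, which is why the log asks to delete the file from all but the preferred subvolume.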

I found that Gluster creates hard links, so you have to delete all
copies of the conflicting file from the brick directory of the node you
choose to delete from.
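The second copy lives under the brick's .glusterfs directory and is named after the file's GFID, split into two directory levels by its first two pairs of hex digits (compare the .glusterfs/ae/27/ae27eb8d-... path in the find output of this message). A minimal sketch that builds that path from a GFID, for example the 0x-prefixed value printed by `getfattr -n trusted.gfid -e hex <file>` run against the brick path; the function name is hypothetical and bash 4 is assumed.

```bash
#!/usr/bin/env bash
# Sketch: compute the .glusterfs hard-link path for a GFID on a brick.
# Accepts the dashed UUID form or getfattr's 0x-prefixed hex form.
# Requires bash 4+ for ${var,,}.
gluster_gfid_link() {
    local brick=${1%/} gfid=${2,,}   # drop trailing slash; lower-case GFID
    gfid=${gfid#0x}                  # drop getfattr's hex prefix, if present
    gfid=${gfid//-/}                 # normalise to 32 bare hex digits
    if ! [[ $gfid =~ ^[0-9a-f]{32}$ ]]; then
        echo "not a GFID: $2" >&2
        return 1
    fi
    # Re-insert dashes in the 8-4-4-4-12 UUID layout
    local uuid="${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"
    # Two directory levels named after the first two pairs of hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$uuid"
}
```

For example, `gluster_gfid_link /gluster/DATA_GLUSTER/brick1/ 0xae27eb8dc6534cc0a054ea376ce8097d` prints `/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d`. The `find -samefile` approach used here reaches the same file without needing the GFID at all.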

Thanks very much to this link:
http://inuits.eu/blog/fixing-glusterfs-split-brain

So these were my steps:

Locate:
[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print

/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
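Before adding -delete, one possible cross-check is that find located every hard link: the inode's link count reported by stat should equal the number of paths printed. A minimal sketch, assuming GNU find and GNU coreutils stat; the function name is hypothetical, and paths containing newlines would be miscounted.

```bash
#!/usr/bin/env bash
# Sketch: verify that `find -samefile` under the brick found every hard
# link of a file, by comparing with the inode's link count from stat.
links_all_found() {
    local brick=$1 file=$2 nlink found
    nlink=$(stat -c %h -- "$file") || return 2
    found=$(find "$brick" -samefile "$file" -print | wc -l)
    if (( found == nlink )); then
        echo "ok: $found of $nlink links under $brick"
    else
        # A link outside the brick (or a missed one) would survive -delete
        echo "mismatch: $found of $nlink links under $brick" >&2
        return 1
    fi
}
```

For the ids file above, two paths were printed, so the expected result would be `ok: 2 of 2 links`.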

And then delete both:
[root@f18ovn01 d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291]# find /gluster/DATA_GLUSTER/brick1/ -samefile /gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids -print -delete

/gluster/DATA_GLUSTER/brick1/.glusterfs/ae/27/ae27eb8d-c653-4cc0-a054-ea376ce8097d
/gluster/DATA_GLUSTER/brick1/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids

And after this step there were no more E lines in the Gluster log, and
the Gluster domain was automatically activated by the engine.
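To confirm that the E lines really stopped, a minimal sketch that counts error-level entries in a Gluster client log and reports the timestamp of the last one, assuming the "[timestamp] E [...]" layout of the log lines quoted above; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: summarise error-level entries in a GlusterFS client log.
# Assumes lines of the form "[2013-10-03 22:06:33.543730] E [file:line:fn] ...".
gluster_error_summary() {
    local log=$1 n=0 last='' line
    local re='^\[([^]]+)\] E '
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            n=$((n + 1))
            last=${BASH_REMATCH[1]}   # timestamp of the latest E line so far
        fi
    done < "$log"
    echo "$n error lines, last at ${last:-never}"
}
```

Usage: `gluster_error_summary "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-f18ovn01.mydomain:gvdata.log"`; if the last E timestamp stays before the time of the fix (keeping in mind the log is apparently in UTC), the split brain is no longer being reported.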

Gianluca