yes, they do.

getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2

root@stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
getfattr: Removing leading '/' from absolute path names
# file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa

root@stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
root@stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2


2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchir...@redhat.com>:
>
> ----- Original Message -----
> | From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> | To: "Roman" <rome...@gmail.com>
> | Cc: gluster-users@gluster.org, "Niels de Vos" <nde...@redhat.com>, "Humble Chirammal" <hchir...@redhat.com>
> | Sent: Wednesday, August 6, 2014 12:09:57 PM
> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
> |
> | Roman,
> |     The file went into split-brain. I think we should do these tests
> | with 3.5.2, where monitoring the heals is easier. Let me also come up
> | with a document about how to do this testing you are trying to do.
> |
> | Humble/Niels,
> |     Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
> | issue where /usr/bin/glfsheal is not packaged along with the deb. I
> | think that should be fixed now as well?
> |
> Pranith,
>
> The 3.5.2 packages for debian are not available yet. We are co-ordinating
> internally to get it processed. I will update the list once it's available.
>
> --Humble
> |
> | On 08/06/2014 11:52 AM, Roman wrote:
> | > good morning,
> | >
> | > root@stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > getfattr: Removing leading '/' from absolute path names
> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
> | > trusted.gfid=0x23c79523075a4158bea38078da570449
> | >
> | > getfattr: Removing leading '/' from absolute path names
> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
> | > trusted.gfid=0x23c79523075a4158bea38078da570449
> | >
> | >
> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
> | >
> | >     On 08/06/2014 11:30 AM, Roman wrote:
> | >> Also, this time files are not the same!
> | >> > | >> root@stor1:~# md5sum > | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >> 32411360c53116b96a059f17306caeda > | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >> > | >> root@stor2:~# md5sum > | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9 > | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | > What is the getfattr output? > | > > | > Pranith > | > > | >> > | >> > | >> 2014-08-05 16:33 GMT+03:00 Roman <rome...@gmail.com > | >> <mailto:rome...@gmail.com>>: > | >> > | >> Nope, it is not working. But this time it went a bit other way > | >> > | >> root@gluster-client:~# dmesg > | >> Segmentation fault > | >> > | >> > | >> I was not able even to start the VM after I done the tests > | >> > | >> Could not read qcow2 header: Operation not permitted > | >> > | >> And it seems, it never starts to sync files after first > | >> disconnect. VM survives first disconnect, but not second (I > | >> waited around 30 minutes). Also, I've > | >> got network.ping-timeout: 2 in volume settings, but logs > | >> react on first disconnect around 30 seconds. Second was > | >> faster, 2 seconds. > | >> > | >> Reaction was different also: > | >> > | >> slower one: > | >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] > | >> 0-glusterfs: readv failed (Connection timed out) > | >> [2014-08-05 13:26:19.558485] W > | >> [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: > | >> reading from socket failed. Error (Connection timed out), > | >> peer (10.250.0.1:24007 <http://10.250.0.1:24007>) > | >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] > | >> 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed > out) > | >> [2014-08-05 13:26:21.281474] W > | >> [socket.c:1962:__socket_proto_state_machine] > | >> 0-HA-fast-150G-PVE1-client-0: reading from socket failed. > | >> Error (Connection timed out), peer (10.250.0.1:49153 > | >> <http://10.250.0.1:49153>) > | >> [2014-08-05 13:26:21.281507] I > | >> [client.c:2098:client_rpc_notify] > | >> 0-HA-fast-150G-PVE1-client-0: disconnected > | >> > | >> the fast one: > | >> 2014-08-05 12:52:44.607389] C > | >> [client-handshake.c:127:rpc_client_ping_timer_expired] > | >> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 > | >> <http://10.250.0.2:49153> has not responded in the last 2 > | >> seconds, disconnecting. > | >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] > | >> 0-HA-fast-150G-PVE1-client-1: readv failed (No data available) > | >> [2014-08-05 12:52:44.607585] E > | >> [rpc-clnt.c:368:saved_frames_unwind] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) > | >> [0x7fcb1b4b0558] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) > | >> [0x7fcb1b4aea63] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) > | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced > | >> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at > | >> 2014-08-05 12:52:42.463881 (xid=0x381883x) > | >> [2014-08-05 12:52:44.607604] W > | >> [client-rpc-fops.c:2624:client3_3_lookup_cbk] > | >> 0-HA-fast-150G-PVE1-client-1: remote operation failed: > | >> Transport endpoint is not connected. 
Path: / > | >> (00000000-0000-0000-0000-000000000001) > | >> [2014-08-05 12:52:44.607736] E > | >> [rpc-clnt.c:368:saved_frames_unwind] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) > | >> [0x7fcb1b4b0558] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) > | >> [0x7fcb1b4aea63] > | >> > (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) > | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced > | >> unwinding frame type(GlusterFS Handshake) op(PING(3)) called > | >> at 2014-08-05 12:52:42.463891 (xid=0x381884x) > | >> [2014-08-05 12:52:44.607753] W > | >> [client-handshake.c:276:client_ping_cbk] > | >> 0-HA-fast-150G-PVE1-client-1: timer must have expired > | >> [2014-08-05 12:52:44.607776] I > | >> [client.c:2098:client_rpc_notify] > | >> 0-HA-fast-150G-PVE1-client-1: disconnected > | >> > | >> > | >> > | >> I've got SSD disks (just for an info). > | >> Should I go and give a try for 3.5.2? > | >> > | >> > | >> > | >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri > | >> <pkara...@redhat.com <mailto:pkara...@redhat.com>>: > | >> > | >> reply along with gluster-users please :-). May be you are > | >> hitting 'reply' instead of 'reply all'? > | >> > | >> Pranith > | >> > | >> On 08/05/2014 03:35 PM, Roman wrote: > | >>> To make sure and clean, I've created another VM with raw > | >>> format and goint to repeat those steps. So now I've got > | >>> two VM-s one with qcow2 format and other with raw > | >>> format. I will send another e-mail shortly. > | >>> > | >>> > | >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri > | >>> <pkara...@redhat.com <mailto:pkara...@redhat.com>>: > | >>> > | >>> > | >>> On 08/05/2014 03:07 PM, Roman wrote: > | >>>> really, seems like the same file > | >>>> > | >>>> stor1: > | >>>> a951641c5230472929836f9fcede6b04 > | >>>> > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >>>> > | >>>> stor2: > | >>>> a951641c5230472929836f9fcede6b04 > | >>>> > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >>>> > | >>>> > | >>>> one thing I've seen from logs, that somehow proxmox > | >>>> VE is connecting with wrong version to servers? > | >>>> [2014-08-05 09:23:45.218550] I > | >>>> > [client-handshake.c:1659:select_server_supported_programs] > | >>>> 0-HA-fast-150G-PVE1-client-0: Using Program > | >>>> GlusterFS 3.3, Num (1298437), Version (330) > | >>> It is the rpc (over the network data structures) > | >>> version, which is not changed at all from 3.3 so > | >>> thats not a problem. So what is the conclusion? Is > | >>> your test case working now or not? > | >>> > | >>> Pranith > | >>> > | >>>> but if I issue: > | >>>> root@pve1:~# glusterfs -V > | >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57 > | >>>> seems ok. > | >>>> > | >>>> server use 3.4.4 meanwhile > | >>>> [2014-08-05 09:23:45.117875] I > | >>>> [server-handshake.c:567:server_setvolume] > | >>>> 0-HA-fast-150G-PVE1-server: accepted client from > | >>>> > stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 > | >>>> (version: 3.4.4) > | >>>> [2014-08-05 09:23:49.103035] I > | >>>> [server-handshake.c:567:server_setvolume] > | >>>> 0-HA-fast-150G-PVE1-server: accepted client from > | >>>> > stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 > | >>>> (version: 3.4.4) > | >>>> > | >>>> if this could be the reason, of course. 
> | >>>> I did restart the Proxmox VE yesterday (just for an > | >>>> information) > | >>>> > | >>>> > | >>>> > | >>>> > | >>>> > | >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri > | >>>> <pkara...@redhat.com <mailto:pkara...@redhat.com>>: > | >>>> > | >>>> > | >>>> On 08/05/2014 02:33 PM, Roman wrote: > | >>>>> Waited long enough for now, still different > | >>>>> sizes and no logs about healing :( > | >>>>> > | >>>>> stor1 > | >>>>> # file: > | >>>>> > exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >>>>> > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000 > | >>>>> > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000 > | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921 > | >>>>> > | >>>>> root@stor1:~# du -sh > | >>>>> /exports/fast-test/150G/images/127/ > | >>>>> 1.2G /exports/fast-test/150G/images/127/ > | >>>>> > | >>>>> > | >>>>> stor2 > | >>>>> # file: > | >>>>> > exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 > | >>>>> > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000 > | >>>>> > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000 > | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921 > | >>>>> > | >>>>> > | >>>>> root@stor2:~# du -sh > | >>>>> /exports/fast-test/150G/images/127/ > | >>>>> 1.4G /exports/fast-test/150G/images/127/ > | >>>> According to the changelogs, the file doesn't > | >>>> need any healing. Could you stop the operations > | >>>> on the VMs and take md5sum on both these > machines? > | >>>> > | >>>> Pranith > | >>>> > | >>>>> > | >>>>> > | >>>>> > | >>>>> > | >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar > | >>>>> Karampuri <pkara...@redhat.com > | >>>>> <mailto:pkara...@redhat.com>>: > | >>>>> > | >>>>> > | >>>>> On 08/05/2014 02:06 PM, Roman wrote: > | >>>>>> Well, it seems like it doesn't see the > | >>>>>> changes were made to the volume ? I > | >>>>>> created two files 200 and 100 MB (from > | >>>>>> /dev/zero) after I disconnected the first > | >>>>>> brick. Then connected it back and got > | >>>>>> these logs: > | >>>>>> > | >>>>>> [2014-08-05 08:30:37.830150] I > | >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] > | >>>>>> 0-glusterfs: No change in volfile, > continuing > | >>>>>> [2014-08-05 08:30:37.830207] I > | >>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: changing > | >>>>>> port to 49153 (from 0) > | >>>>>> [2014-08-05 08:30:37.830239] W > | >>>>>> [socket.c:514:__socket_rwv] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: readv > | >>>>>> failed (No data available) > | >>>>>> [2014-08-05 08:30:37.831024] I > | >>>>>> > [client-handshake.c:1659:select_server_supported_programs] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: Using > | >>>>>> Program GlusterFS 3.3, Num (1298437), > | >>>>>> Version (330) > | >>>>>> [2014-08-05 08:30:37.831375] I > | >>>>>> > [client-handshake.c:1456:client_setvolume_cbk] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: Connected > | >>>>>> to 10.250.0.1:49153 > | >>>>>> <http://10.250.0.1:49153>, attached to > | >>>>>> remote volume '/exports/fast-test/150G'. 
> | >>>>>> [2014-08-05 08:30:37.831394] I > | >>>>>> > [client-handshake.c:1468:client_setvolume_cbk] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: Server and > | >>>>>> Client lk-version numbers are not same, > | >>>>>> reopening the fds > | >>>>>> [2014-08-05 08:30:37.831566] I > | >>>>>> > [client-handshake.c:450:client_set_lk_version_cbk] > | >>>>>> 0-HA-fast-150G-PVE1-client-0: Server lk > | >>>>>> version = 1 > | >>>>>> > | >>>>>> > | >>>>>> [2014-08-05 08:30:37.830150] I > | >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] > | >>>>>> 0-glusterfs: No change in volfile, > continuing > | >>>>>> this line seems weird to me tbh. > | >>>>>> I do not see any traffic on switch > | >>>>>> interfaces between gluster servers, which > | >>>>>> means, there is no syncing between them. > | >>>>>> I tried to ls -l the files on the client > | >>>>>> and servers to trigger the healing, but > | >>>>>> seems like no success. Should I wait more? > | >>>>> Yes, it should take around 10-15 minutes. > | >>>>> Could you provide 'getfattr -d -m. -e hex > | >>>>> <file-on-brick>' on both the bricks. > | >>>>> > | >>>>> Pranith > | >>>>> > | >>>>>> > | >>>>>> > | >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar > | >>>>>> Karampuri <pkara...@redhat.com > | >>>>>> <mailto:pkara...@redhat.com>>: > | >>>>>> > | >>>>>> > | >>>>>> On 08/05/2014 01:10 PM, Roman wrote: > | >>>>>>> Ahha! For some reason I was not able > | >>>>>>> to start the VM anymore, Proxmox VE > | >>>>>>> told me, that it is not able to read > | >>>>>>> the qcow2 header due to permission > | >>>>>>> is denied for some reason. So I just > | >>>>>>> deleted that file and created a new > | >>>>>>> VM. And the nex message I've got was > | >>>>>>> this: > | >>>>>> Seems like these are the messages > | >>>>>> where you took down the bricks before > | >>>>>> self-heal. Could you restart the run > | >>>>>> waiting for self-heals to complete > | >>>>>> before taking down the next brick? > | >>>>>> > | >>>>>> Pranith > | >>>>>> > | >>>>>>> > | >>>>>>> > | >>>>>>> [2014-08-05 07:31:25.663412] E > | >>>>>>> > [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] > | >>>>>>> 0-HA-fast-150G-PVE1-replicate-0: > | >>>>>>> Unable to self-heal contents of > | >>>>>>> '/images/124/vm-124-disk-1.qcow2' > | >>>>>>> (possible split-brain). Please > | >>>>>>> delete the file from all but the > | >>>>>>> preferred subvolume.- Pending > | >>>>>>> matrix: [ [ 0 60 ] [ 11 0 ] ] > | >>>>>>> [2014-08-05 07:31:25.663955] E > | >>>>>>> > [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] > | >>>>>>> 0-HA-fast-150G-PVE1-replicate-0: > | >>>>>>> background data self-heal failed on > | >>>>>>> /images/124/vm-124-disk-1.qcow2 > | >>>>>>> > | >>>>>>> > | >>>>>>> > | >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith > | >>>>>>> Kumar Karampuri <pkara...@redhat.com > | >>>>>>> <mailto:pkara...@redhat.com>>: > | >>>>>>> > | >>>>>>> I just responded to your earlier > | >>>>>>> mail about how the log looks. > | >>>>>>> The log comes on the mount's > logfile > | >>>>>>> > | >>>>>>> Pranith > | >>>>>>> > | >>>>>>> On 08/05/2014 12:41 PM, Roman > wrote: > | >>>>>>>> Ok, so I've waited enough, I > | >>>>>>>> think. Had no any traffic on > | >>>>>>>> switch ports between servers. > | >>>>>>>> Could not find any suitable log > | >>>>>>>> message about completed > | >>>>>>>> self-heal (waited about 30 > | >>>>>>>> minutes). 
Plugged out the other > | >>>>>>>> server's UTP cable this time > | >>>>>>>> and got in the same situation: > | >>>>>>>> root@gluster-test1:~# cat > | >>>>>>>> /var/log/dmesg > | >>>>>>>> -bash: /bin/cat: Input/output > error > | >>>>>>>> > | >>>>>>>> brick logs: > | >>>>>>>> [2014-08-05 07:09:03.005474] I > | >>>>>>>> [server.c:762:server_rpc_notify] > | >>>>>>>> 0-HA-fast-150G-PVE1-server: > | >>>>>>>> disconnecting connectionfrom > | >>>>>>>> > pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0 > | >>>>>>>> [2014-08-05 07:09:03.005530] I > | >>>>>>>> > [server-helpers.c:729:server_connection_put] > | >>>>>>>> 0-HA-fast-150G-PVE1-server: > | >>>>>>>> Shutting down connection > | >>>>>>>> > pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0 > | >>>>>>>> [2014-08-05 07:09:03.005560] I > | >>>>>>>> > [server-helpers.c:463:do_fd_cleanup] > | >>>>>>>> 0-HA-fast-150G-PVE1-server: fd > | >>>>>>>> cleanup on > | >>>>>>>> /images/124/vm-124-disk-1.qcow2 > | >>>>>>>> [2014-08-05 07:09:03.005797] I > | >>>>>>>> > [server-helpers.c:617:server_connection_destroy] > | >>>>>>>> 0-HA-fast-150G-PVE1-server: > | >>>>>>>> destroyed connection of > | >>>>>>>> > pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0 > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> 2014-08-05 9:53 GMT+03:00 > | >>>>>>>> Pranith Kumar Karampuri > | >>>>>>>> <pkara...@redhat.com > | >>>>>>>> <mailto:pkara...@redhat.com>>: > | >>>>>>>> > | >>>>>>>> Do you think it is possible > | >>>>>>>> for you to do these tests > | >>>>>>>> on the latest version > | >>>>>>>> 3.5.2? 'gluster volume heal > | >>>>>>>> <volname> info' would give > | >>>>>>>> you that information in > | >>>>>>>> versions > 3.5.1. > | >>>>>>>> Otherwise you will have to > | >>>>>>>> check it from either the > | >>>>>>>> logs, there will be > | >>>>>>>> self-heal completed message > | >>>>>>>> on the mount logs (or) by > | >>>>>>>> observing 'getfattr -d -m. > | >>>>>>>> -e hex > <image-file-on-bricks>' > | >>>>>>>> > | >>>>>>>> Pranith > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> On 08/05/2014 12:09 PM, > | >>>>>>>> Roman wrote: > | >>>>>>>>> Ok, I understand. I will > | >>>>>>>>> try this shortly. > | >>>>>>>>> How can I be sure, that > | >>>>>>>>> healing process is done, > | >>>>>>>>> if I am not able to see > | >>>>>>>>> its status? > | >>>>>>>>> > | >>>>>>>>> > | >>>>>>>>> 2014-08-05 9:30 GMT+03:00 > | >>>>>>>>> Pranith Kumar Karampuri > | >>>>>>>>> <pkara...@redhat.com > | >>>>>>>>> <mailto: > pkara...@redhat.com>>: > | >>>>>>>>> > | >>>>>>>>> Mounts will do the > | >>>>>>>>> healing, not the > | >>>>>>>>> self-heal-daemon. The > | >>>>>>>>> problem I feel is that > | >>>>>>>>> whichever process does > | >>>>>>>>> the healing has the > | >>>>>>>>> latest information > | >>>>>>>>> about the good bricks > | >>>>>>>>> in this usecase. Since > | >>>>>>>>> for VM usecase, mounts > | >>>>>>>>> should have the latest > | >>>>>>>>> information, we should > | >>>>>>>>> let the mounts do the > | >>>>>>>>> healing. If the mount > | >>>>>>>>> accesses the VM image > | >>>>>>>>> either by someone > | >>>>>>>>> doing operations > | >>>>>>>>> inside the VM or > | >>>>>>>>> explicit stat on the > | >>>>>>>>> file it should do the > | >>>>>>>>> healing. > | >>>>>>>>> > | >>>>>>>>> Pranith. > | >>>>>>>>> > | >>>>>>>>> > | >>>>>>>>> On 08/05/2014 10:39 > | >>>>>>>>> AM, Roman wrote: > | >>>>>>>>>> Hmmm, you told me to > | >>>>>>>>>> turn it off. Did I > | >>>>>>>>>> understood something > | >>>>>>>>>> wrong? 
After I issued > | >>>>>>>>>> the command you've > | >>>>>>>>>> sent me, I was not > | >>>>>>>>>> able to watch the > | >>>>>>>>>> healing process, it > | >>>>>>>>>> said, it won't be > | >>>>>>>>>> healed, becouse its > | >>>>>>>>>> turned off. > | >>>>>>>>>> > | >>>>>>>>>> > | >>>>>>>>>> 2014-08-05 5:39 > | >>>>>>>>>> GMT+03:00 Pranith > | >>>>>>>>>> Kumar Karampuri > | >>>>>>>>>> <pkara...@redhat.com > | >>>>>>>>>> <mailto: > pkara...@redhat.com>>: > | >>>>>>>>>> > | >>>>>>>>>> You didn't > | >>>>>>>>>> mention anything > | >>>>>>>>>> about > | >>>>>>>>>> self-healing. Did > | >>>>>>>>>> you wait until > | >>>>>>>>>> the self-heal is > | >>>>>>>>>> complete? > | >>>>>>>>>> > | >>>>>>>>>> Pranith > | >>>>>>>>>> > | >>>>>>>>>> On 08/04/2014 > | >>>>>>>>>> 05:49 PM, Roman > | >>>>>>>>>> wrote: > | >>>>>>>>>>> Hi! > | >>>>>>>>>>> Result is pretty > | >>>>>>>>>>> same. I set the > | >>>>>>>>>>> switch port down > | >>>>>>>>>>> for 1st server, > | >>>>>>>>>>> it was ok. Then > | >>>>>>>>>>> set it up back > | >>>>>>>>>>> and set other > | >>>>>>>>>>> server's port > | >>>>>>>>>>> off. and it > | >>>>>>>>>>> triggered IO > | >>>>>>>>>>> error on two > | >>>>>>>>>>> virtual > | >>>>>>>>>>> machines: one > | >>>>>>>>>>> with local root > | >>>>>>>>>>> FS but network > | >>>>>>>>>>> mounted storage. > | >>>>>>>>>>> and other with > | >>>>>>>>>>> network root FS. > | >>>>>>>>>>> 1st gave an > | >>>>>>>>>>> error on copying > | >>>>>>>>>>> to or from the > | >>>>>>>>>>> mounted network > | >>>>>>>>>>> disk, other just > | >>>>>>>>>>> gave me an error > | >>>>>>>>>>> for even reading > | >>>>>>>>>>> log.files. > | >>>>>>>>>>> > | >>>>>>>>>>> cat: > | >>>>>>>>>>> > /var/log/alternatives.log: > | >>>>>>>>>>> Input/output > error > | >>>>>>>>>>> then I reset the > | >>>>>>>>>>> kvm VM and it > | >>>>>>>>>>> said me, there > | >>>>>>>>>>> is no boot > | >>>>>>>>>>> device. Next I > | >>>>>>>>>>> virtually > | >>>>>>>>>>> powered it off > | >>>>>>>>>>> and then back on > | >>>>>>>>>>> and it has > booted. > | >>>>>>>>>>> > | >>>>>>>>>>> By the way, did > | >>>>>>>>>>> I have to > | >>>>>>>>>>> start/stop > volume? > | >>>>>>>>>>> > | >>>>>>>>>>> >> Could you do > | >>>>>>>>>>> the following > | >>>>>>>>>>> and test it > again? > | >>>>>>>>>>> >> gluster volume > | >>>>>>>>>>> set <volname> > | >>>>>>>>>>> > cluster.self-heal-daemon > | >>>>>>>>>>> off > | >>>>>>>>>>> > | >>>>>>>>>>> >>Pranith > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> 2014-08-04 14:10 > | >>>>>>>>>>> GMT+03:00 > | >>>>>>>>>>> Pranith Kumar > | >>>>>>>>>>> Karampuri > | >>>>>>>>>>> < > pkara...@redhat.com > | >>>>>>>>>>> <mailto: > pkara...@redhat.com>>: > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> On > | >>>>>>>>>>> 08/04/2014 > | >>>>>>>>>>> 03:33 PM, > | >>>>>>>>>>> Roman wrote: > | >>>>>>>>>>>> Hello! > | >>>>>>>>>>>> > | >>>>>>>>>>>> Facing the > | >>>>>>>>>>>> same > | >>>>>>>>>>>> problem as > | >>>>>>>>>>>> mentioned > | >>>>>>>>>>>> here: > | >>>>>>>>>>>> > | >>>>>>>>>>>> > http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html > | >>>>>>>>>>>> > | >>>>>>>>>>>> my set up > | >>>>>>>>>>>> is up and > | >>>>>>>>>>>> running, so > | >>>>>>>>>>>> i'm ready > | >>>>>>>>>>>> to help you > | >>>>>>>>>>>> back with > | >>>>>>>>>>>> feedback. 
> | >>>>>>>>>>>> > | >>>>>>>>>>>> setup: > | >>>>>>>>>>>> proxmox > | >>>>>>>>>>>> server as > | >>>>>>>>>>>> client > | >>>>>>>>>>>> 2 gluster > | >>>>>>>>>>>> physical > | >>>>>>>>>>>> servers > | >>>>>>>>>>>> > | >>>>>>>>>>>> server side > | >>>>>>>>>>>> and client > | >>>>>>>>>>>> side both > | >>>>>>>>>>>> running atm > | >>>>>>>>>>>> 3.4.4 > | >>>>>>>>>>>> glusterfs > | >>>>>>>>>>>> from > | >>>>>>>>>>>> gluster > repo. > | >>>>>>>>>>>> > | >>>>>>>>>>>> the problem > is: > | >>>>>>>>>>>> > | >>>>>>>>>>>> 1. craeted > | >>>>>>>>>>>> replica > bricks. > | >>>>>>>>>>>> 2. mounted > | >>>>>>>>>>>> in proxmox > | >>>>>>>>>>>> (tried both > | >>>>>>>>>>>> promox > | >>>>>>>>>>>> ways: via > | >>>>>>>>>>>> GUI and > | >>>>>>>>>>>> fstab (with > | >>>>>>>>>>>> backup > | >>>>>>>>>>>> volume > | >>>>>>>>>>>> line), btw > | >>>>>>>>>>>> while > | >>>>>>>>>>>> mounting > | >>>>>>>>>>>> via fstab > | >>>>>>>>>>>> I'm unable > | >>>>>>>>>>>> to launch a > | >>>>>>>>>>>> VM without > | >>>>>>>>>>>> cache, > | >>>>>>>>>>>> meanwhile > | >>>>>>>>>>>> > direct-io-mode > | >>>>>>>>>>>> is enabled > | >>>>>>>>>>>> in fstab > line) > | >>>>>>>>>>>> 3. > installed VM > | >>>>>>>>>>>> 4. bring > | >>>>>>>>>>>> one volume > | >>>>>>>>>>>> down - ok > | >>>>>>>>>>>> 5. bringing > | >>>>>>>>>>>> up, waiting > | >>>>>>>>>>>> for sync is > | >>>>>>>>>>>> done. > | >>>>>>>>>>>> 6. bring > | >>>>>>>>>>>> other > | >>>>>>>>>>>> volume down > | >>>>>>>>>>>> - getting > | >>>>>>>>>>>> IO errors > | >>>>>>>>>>>> on VM guest > | >>>>>>>>>>>> and not > | >>>>>>>>>>>> able to > | >>>>>>>>>>>> restore the > | >>>>>>>>>>>> VM after I > | >>>>>>>>>>>> reset the > | >>>>>>>>>>>> VM via > | >>>>>>>>>>>> host. It > | >>>>>>>>>>>> says (no > | >>>>>>>>>>>> bootable > | >>>>>>>>>>>> media). > | >>>>>>>>>>>> After I > | >>>>>>>>>>>> shut it > | >>>>>>>>>>>> down > | >>>>>>>>>>>> (forced) > | >>>>>>>>>>>> and bring > | >>>>>>>>>>>> back up, it > | >>>>>>>>>>>> boots. > | >>>>>>>>>>> Could you do > | >>>>>>>>>>> the > | >>>>>>>>>>> following > | >>>>>>>>>>> and test it > | >>>>>>>>>>> again? > | >>>>>>>>>>> gluster > | >>>>>>>>>>> volume set > | >>>>>>>>>>> <volname> > | >>>>>>>>>>> > cluster.self-heal-daemon > | >>>>>>>>>>> off > | >>>>>>>>>>> > | >>>>>>>>>>> Pranith > | >>>>>>>>>>>> > | >>>>>>>>>>>> Need help. > | >>>>>>>>>>>> Tried > | >>>>>>>>>>>> 3.4.3, > 3.4.4. > | >>>>>>>>>>>> Still > | >>>>>>>>>>>> missing > | >>>>>>>>>>>> pkg-s for > | >>>>>>>>>>>> 3.4.5 for > | >>>>>>>>>>>> debian and > | >>>>>>>>>>>> 3.5.2 > | >>>>>>>>>>>> (3.5.1 > | >>>>>>>>>>>> always > | >>>>>>>>>>>> gives a > | >>>>>>>>>>>> healing > | >>>>>>>>>>>> error for > | >>>>>>>>>>>> some reason) > | >>>>>>>>>>>> > | >>>>>>>>>>>> -- > | >>>>>>>>>>>> Best > regards, > | >>>>>>>>>>>> Roman. > | >>>>>>>>>>>> > | >>>>>>>>>>>> > | >>>>>>>>>>>> > _______________________________________________ > | >>>>>>>>>>>> > Gluster-users > | >>>>>>>>>>>> mailing list > | >>>>>>>>>>>> > Gluster-users@gluster.org > | >>>>>>>>>>>> <mailto: > Gluster-users@gluster.org> > | >>>>>>>>>>>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> > | >>>>>>>>>>> -- > | >>>>>>>>>>> Best regards, > | >>>>>>>>>>> Roman. > | >>>>>>>>>> > | >>>>>>>>>> > | >>>>>>>>>> > | >>>>>>>>>> > | >>>>>>>>>> -- > | >>>>>>>>>> Best regards, > | >>>>>>>>>> Roman. > | >>>>>>>>> > | >>>>>>>>> > | >>>>>>>>> > | >>>>>>>>> > | >>>>>>>>> -- > | >>>>>>>>> Best regards, > | >>>>>>>>> Roman. 
> | >>>>>>>> > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> > | >>>>>>>> -- > | >>>>>>>> Best regards, > | >>>>>>>> Roman. > | >>>>>>> > | >>>>>>> > | >>>>>>> > | >>>>>>> > | >>>>>>> -- > | >>>>>>> Best regards, > | >>>>>>> Roman. > | >>>>>> > | >>>>>> > | >>>>>> > | >>>>>> > | >>>>>> -- > | >>>>>> Best regards, > | >>>>>> Roman. > | >>>>> > | >>>>> > | >>>>> > | >>>>> > | >>>>> -- > | >>>>> Best regards, > | >>>>> Roman. > | >>>> > | >>>> > | >>>> > | >>>> > | >>>> -- > | >>>> Best regards, > | >>>> Roman. > | >>> > | >>> > | >>> > | >>> > | >>> -- > | >>> Best regards, > | >>> Roman. > | >> > | >> > | >> > | >> > | >> -- > | >> Best regards, > | >> Roman. > | >> > | >> > | >> > | >> > | >> -- > | >> Best regards, > | >> Roman. > | > > | > > | > > | > > | > -- > | > Best regards, > | > Roman. > | > | > -- Best regards, Roman.
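For anyone who finds this thread later: the check Pranith describes comes down
to reading the trusted.afr changelog xattrs for the same file on both bricks
and treating the file as healthy only when every counter is zero. Below is a
minimal sketch of that check as a bash script; the file path and volume name
are the ones used in this thread, and the decoding assumes the usual AFR
changelog layout of three 4-byte counters (data, metadata, entry). It is an
illustration of the idea, not an official tool.

#!/bin/bash
# Sketch: dump and decode AFR changelog xattrs on a brick.
# Run directly on each storage server against the file on the brick.

BRICK_FILE=/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2  # path from this thread
VOLNAME=HA-fast-150G-PVE1                                          # volume from this thread

# Dump the raw xattrs (same command used throughout the thread):
getfattr -d -m. -e hex "$BRICK_FILE"

# Decode one trusted.afr value into its pending-operation counters.
# Assumed layout: 4 bytes data | 4 bytes metadata | 4 bytes entry, hex encoded.
decode_afr() {
    v=${1#0x}
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
}

decode_afr 0x000001320000000000000000   # value seen above -> data=306 metadata=0 entry=0
decode_afr 0x000000000000000000000000   # all zero -> nothing pending, no heal needed

# Per Pranith's note above, versions newer than 3.5.1 expose the same information per volume:
gluster volume heal "$VOLNAME" info

If each brick shows non-zero data counters blaming the other one (as in the
pending matrix [ [ 0 60 ] [ 11 0 ] ] earlier in the thread), the file is in
split-brain and one copy has to be picked manually, which is the situation the
qcow2 image ended up in here.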
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users