On 08/06/2014 11:30 AM, Roman wrote:
Also, this time the files are not the same!
root@stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
root@stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
What is the getfattr output?
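I mean something like this, on both bricks (same path as in your md5sum output above):
getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2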
Pranith
2014-08-05 16:33 GMT+03:00 Roman <rome...@gmail.com>:
Nope, it is not working. But this time it went a bit differently.
root@gluster-client:~# dmesg
Segmentation fault
I was not even able to start the VM after I had done the tests:
Could not read qcow2 header: Operation not permitted
And it seems it never starts to sync the files after the first disconnect. The VM survives the first disconnect, but not the second (I waited around 30 minutes). Also, I've got network.ping-timeout: 2 in the volume settings, but the logs reacted to the first disconnect in around 30 seconds; the second was faster, 2 seconds.
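(For reference, the timeout option in question is set with:
gluster volume set <volname> network.ping-timeout 2
with <volname> being HA-fast-150G-PVE1 here, going by the client names in the logs.)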
The reactions were also different.
The slow one:
[2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
0-glusterfs: readv failed (Connection timed out)
[2014-08-05 13:26:19.558485] W
[socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading
from socket failed. Error (Connection timed out), peer
(10.250.0.1:24007)
[2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
[2014-08-05 13:26:21.281474] W
[socket.c:1962:__socket_proto_state_machine]
0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error
(Connection timed out), peer (10.250.0.1:49153)
[2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify]
0-HA-fast-150G-PVE1-client-0: disconnected
The fast one:
[2014-08-05 12:52:44.607389] C
[client-handshake.c:127:rpc_client_ping_timer_expired]
0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not
responded in the last 2 seconds, disconnecting.
[2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
[2014-08-05 12:52:44.607585] E
[rpc-clnt.c:368:saved_frames_unwind]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
[0x7fcb1b4b0558]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
[0x7fcb1b4aea63]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding
frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05
12:52:42.463881 (xid=0x381883x)
[2014-08-05 12:52:44.607604] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport
endpoint is not connected. Path: /
(00000000-0000-0000-0000-000000000001)
[2014-08-05 12:52:44.607736] E
[rpc-clnt.c:368:saved_frames_unwind]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
[0x7fcb1b4b0558]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
[0x7fcb1b4aea63]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding
frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05
12:52:42.463891 (xid=0x381884x)
[2014-08-05 12:52:44.607753] W
[client-handshake.c:276:client_ping_cbk]
0-HA-fast-150G-PVE1-client-1: timer must have expired
[2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify]
0-HA-fast-150G-PVE1-client-1: disconnected
I've got SSD disks (just for info).
Should I give 3.5.2 a try?
2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
Please reply along with gluster-users :-). Maybe you are hitting 'reply' instead of 'reply all'?
Pranith
On 08/05/2014 03:35 PM, Roman wrote:
To make sure and keep things clean, I've created another VM with raw format and am going to repeat those steps. So now I've got two VMs, one with qcow2 format and the other with raw format. I will send another e-mail shortly.
2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
On 08/05/2014 03:07 PM, Roman wrote:
Really, it seems to be the same file:
stor1:
a951641c5230472929836f9fcede6b04
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
stor2:
a951641c5230472929836f9fcede6b04
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
One thing I've seen from the logs: somehow Proxmox VE is connecting to the servers with the wrong version?
[2014-08-05 09:23:45.218550] I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS
3.3, Num (1298437), Version (330)
It is the RPC (over-the-network data structures) version, which has not changed at all since 3.3, so that's not a problem. So what is the conclusion? Is your test case working now or not?
Pranith
But if I issue:
root@pve1:~# glusterfs -V
glusterfs 3.4.4 built on Jun 28 2014 03:44:57
Seems OK. The servers use 3.4.4 meanwhile:
[2014-08-05 09:23:45.117875] I
[server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
(version: 3.4.4)
[2014-08-05 09:23:49.103035] I
[server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)
If this could be the reason, of course. I did restart the Proxmox VE yesterday (just for information).
2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
On 08/05/2014 02:33 PM, Roman wrote:
I've waited long enough for now; still different sizes and no logs about healing :(
stor1
# file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
root@stor1:~# du -sh
/exports/fast-test/150G/images/127/
1.2G /exports/fast-test/150G/images/127/
stor2
# file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
root@stor2:~# du -sh
/exports/fast-test/150G/images/127/
1.4G /exports/fast-test/150G/images/127/
According to the changelogs, the file doesn't need any healing. Could you stop the operations on the VMs and take md5sums on both these machines?
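Something like this, on both stor1 and stor2, with the VM stopped:
md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2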
Pranith
2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
On 08/05/2014 02:06 PM, Roman wrote:
Well, it seems like it doesn't see that changes were made to the volume? I created two files, 200 MB and 100 MB (from /dev/zero), after I disconnected the first brick. Then I connected it back and got these logs:
[2014-08-05 08:30:37.830150] I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2014-08-05 08:30:37.830207] I
[rpc-clnt.c:1676:rpc_clnt_reconfig]
0-HA-fast-150G-PVE1-client-0: changing port to
49153 (from 0)
[2014-08-05 08:30:37.830239] W
[socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-0: readv failed (No
data available)
[2014-08-05 08:30:37.831024] I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0: Using Program
GlusterFS 3.3, Num (1298437), Version (330)
[2014-08-05 08:30:37.831375] I
[client-handshake.c:1456:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0: Connected to
10.250.0.1:49153 <http://10.250.0.1:49153>,
attached to remote volume
'/exports/fast-test/150G'.
[2014-08-05 08:30:37.831394] I
[client-handshake.c:1468:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0: Server and
Client lk-version numbers are not same,
reopening the fds
[2014-08-05 08:30:37.831566] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-HA-fast-150G-PVE1-client-0: Server lk
version = 1
[2014-08-05 08:30:37.830150] I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
This line seems weird to me, to be honest.
I do not see any traffic on the switch interfaces between the gluster servers, which means there is no syncing between them. I tried to ls -l the files on the client and servers to trigger the healing, but it seems there was no success. Should I wait more?
Yes, it should take around 10-15 minutes. Could you provide 'getfattr -d -m. -e hex <file-on-brick>' on both the bricks?
Pranith
2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
On 08/05/2014 01:10 PM, Roman wrote:
Ahha! For some reason I was not able to start the VM anymore; Proxmox VE told me that it was not able to read the qcow2 header because permission was denied for some reason. So I just deleted that file and created a new VM. And the next message I've got was this:
Seems like these are the messages from when you took down the bricks before the self-heal. Could you restart the run, waiting for self-heals to complete before taking down the next brick?
Pranith
[2014-08-05 07:31:25.663412] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-HA-fast-150G-PVE1-replicate-0: Unable
to self-heal contents of
'/images/124/vm-124-disk-1.qcow2'
(possible split-brain). Please delete the
file from all but the preferred
subvolume.- Pending matrix: [ [ 0 60 ] [
11 0 ] ]
[2014-08-05 07:31:25.663955] E
[afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
0-HA-fast-150G-PVE1-replicate-0:
background data self-heal failed on
/images/124/vm-124-disk-1.qcow2
2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
I just responded to your earlier mail about how the log looks. The log appears in the mount's logfile.
On 08/05/2014 12:41 PM, Roman wrote:
Ok, so I've waited enough, I think. There was no traffic at all on the switch ports between the servers, and I could not find any suitable log message about a completed self-heal (I waited about 30 minutes). I plugged out the other server's UTP cable this time and got into the same situation:
root@gluster-test1:~# cat /var/log/dmesg
-bash: /bin/cat: Input/output error
brick logs:
[2014-08-05 07:09:03.005474] I
[server.c:762:server_rpc_notify]
0-HA-fast-150G-PVE1-server:
disconnecting connectionfrom
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005530] I
[server-helpers.c:729:server_connection_put]
0-HA-fast-150G-PVE1-server: Shutting
down connection
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005560] I
[server-helpers.c:463:do_fd_cleanup]
0-HA-fast-150G-PVE1-server: fd
cleanup on
/images/124/vm-124-disk-1.qcow2
[2014-08-05 07:09:03.005797] I
[server-helpers.c:617:server_connection_destroy]
0-HA-fast-150G-PVE1-server:
destroyed connection of
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
Do you think it is possible for you to do these tests on the latest version, 3.5.2? 'gluster volume heal <volname> info' would give you that information in versions > 3.5.1. Otherwise you will have to check it either from the logs (there will be a self-heal completed message in the mount logs) or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
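For example (the volume name below is taken from the client translator names in your logs):
gluster volume heal HA-fast-150G-PVE1 info   # on 3.5.1 and later
grep -i self-heal /var/log/glusterfs/<your-mount-log>.log   # on 3.4.x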
Pranith
On 08/05/2014 12:09 PM, Roman wrote:
Ok, I understand. I will try this shortly. How can I be sure that the healing process is done if I am not able to see its status?
2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
Mounts will do the healing, not the self-heal daemon. The point is that whichever process does the healing must have the latest information about the good bricks. Since for the VM use case the mounts have the latest information, we should let the mounts do the healing. If the mount accesses the VM image, either by someone doing operations inside the VM or by an explicit stat on the file, it should do the healing.
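For example, from the client (the mount point below is an assumption on my part):
stat /mnt/pve/<storage>/images/127/vm-127-disk-1.qcow2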
Pranith.
On 08/05/2014 10:39 AM, Roman wrote:
Hmmm, you told me to turn it off. Did I understand something wrong? After I issued the command you sent me, I was not able to watch the healing process; it said it won't be healed, because it's turned off.
2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
You didn't mention anything about self-healing. Did you wait until the self-heal was complete?
Pranith
On 08/04/2014 05:49 PM, Roman wrote:
Hi!
The result is pretty much the same. I set the switch port down for the 1st server; it was ok. Then I set it back up and set the other server's port off, and it triggered an IO error on two virtual machines: one with a local root FS but network-mounted storage, and the other with a network root FS. The 1st gave an error on copying to or from the mounted network disk; the other just gave me an error for even reading log files:
cat: /var/log/alternatives.log: Input/output error
Then I reset the KVM VM and it told me there is no boot device. Next I virtually powered it off and then back on, and it booted.
By the way, did I have to start/stop the volume?
>> Could you do the following and test it again?
>> gluster volume set <volname> cluster.self-heal-daemon off
>> Pranith
2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkara...@redhat.com>:
On 08/04/2014 03:33 PM, Roman wrote:
Hello!
Facing the same problem as mentioned here:
http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
My setup is up and running, so I'm ready to help you back with feedback.
Setup: a Proxmox server as the client and 2 physical gluster servers; server side and client side are both running glusterfs 3.4.4 atm, from the gluster repo.
The problem is:
1. created replica bricks
2. mounted in Proxmox (tried both Proxmox ways: via GUI and via fstab with a backup volume line; btw, while mounting via fstab I'm unable to launch a VM without cache, even though direct-io-mode is enabled in the fstab line; see the example line below)
3. installed a VM
4. brought one volume down - ok
5. brought it back up, waited for the sync to be done
6. brought the other volume down - got IO errors on the VM guest and was not able to restore the VM after I reset it via the host; it says "no bootable media". After I shut it down (forced) and brought it back up, it boots.
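For reference, the fstab line mentioned in step 2 looks roughly like this (the volume name and mount point are assumptions; hostnames as elsewhere in this thread):
stor1:/HA-fast-150G-PVE1 /mnt/pve/gluster glusterfs defaults,backupvolfile-server=stor2,direct-io-mode=enable 0 0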
Could you do the following and test it again?
gluster volume set <volname> cluster.self-heal-daemon off
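With your volume name filled in (assuming it is HA-fast-150G-PVE1) that would be:
gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
You can re-enable it later the same way with 'on'.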
Pranith
Need help. Tried 3.4.3 and 3.4.4. Still missing packages for 3.4.5 for Debian and for 3.5.2 (3.5.1 always gives a healing error for some reason).
--
Best regards,
Roman.
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users