Re: [Gluster-users] Reliably mounting a gluster volume

2016-10-24 Thread Paul Boven

Hi Kevin, everyone,

On 10/21/2016 03:19 PM, Kevin Lemonnier wrote:

As we were discussing in the "gluster volume not mounted on boot" thread,
you should probably just go with AutoFS. It's not ideal, but I don't see
any other reliable solution.


Apologies, I had checked about half a year of Gluster mailing list 
postings before I started working on this, but hadn't re-checked the 
last few days.


I rather like the x-systemd.automount solution, because it works equally 
well on a gluster server as on a gluster client. I can confirm that it 
works perfectly in our case. The virtual machines also get properly 
started on boot once the /gluster filesystem is there.
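
For the archives, the kind of fstab entry this amounts to is roughly the
following (the volume name gv0 is just an example, the mount point is ours):

localhost:/gv0  /gluster  glusterfs  defaults,_netdev,x-systemd.automount,x-systemd.requires=glusterd.service  0  0

With x-systemd.automount the filesystem is only mounted on first access,
which in practice happens well after glusterd is ready to serve it.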


Regards, Paul Boven.
--
Paul Boven <bo...@jive.eu> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.eu
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Reliably mounting a gluster volume

2016-10-21 Thread Paul Boven

Hi everyone,

For the past few days I've been experimenting with Gluster and systemd. 
The issue I'm trying to solve is that my gluster servers always fail to 
self-mount their gluster volume locally on boot. Apparently this is 
because the mount happens right after glusterd has been started, but 
before it is ready to serve the volume.


I'm doing a refresh of our internal gluster-based KVM system, bringing 
it to Ubuntu 16.04 LTS, which uses systemd. As the Ubuntu gluster package 
as shipped still has this boot/mount issue, and to simplify things a bit, 
I've removed all the SystemV and Upstart scripts that ship with the 
current Ubuntu gluster package, aiming for a systemd-only solution.


The problem, in my opinion, stems from the fact that in the Unit file 
for glusterd, it is declared as a 'forking' kind of service. This means 
that as soon as the double fork happens, systemd has no option but to 
consider the service available, and continues with the rest of its 
work. I try to delay the mounting of my /gluster by adding 
"x-systemd.requires=glusterd.service" to its fstab entry, but for the 
reasons above the mount still happens immediately after glusterd has 
started, and then fails.
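
For reference, the corresponding fstab entry looks roughly like this (the
volume name gv0 is just an example):

localhost:/gv0  /gluster  glusterfs  defaults,_netdev,x-systemd.requires=glusterd.service  0  0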


Is there a way for systemd to know when the gluster service is actually 
able to service a mount request, so one can delay this step of the boot 
process?


In the Unit file, I have:
[Unit]
Requires=rpcbind.service
After=network.target rpcbind.service network-online.target
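
A possible refinement (untested sketch; gv0 is a placeholder for the
volume name) would be a drop-in that only lets glusterd count as
'started' once the volume actually answers:

# /etc/systemd/system/glusterd.service.d/wait-for-volume.conf
[Service]
ExecStartPost=/bin/sh -c 'for i in $(seq 1 30); do gluster volume status gv0 >/dev/null 2>&1 && exit 0; sleep 1; done; exit 1'

Since units ordered After=glusterd.service wait until ExecStartPost has
finished, this should hold back the mount until the volume responds.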

The curious thing is that, according to gluster.log, the gluster client 
does find out on which hostnames the subvolumes are available. However, 
it seems that talking to both the local (0-gv0-client-0) and the remote 
(0-gv0-client-1) brick fails. For the service on localhost, the error is 
'failed to get the port number for remote subvolume'. For the remote 
volume, it is 'no route to host'. But at this stage, local networking 
(which is fully static and on the same network) should already be up.


Some error messages during the mount:

[12:15:50.749137] E [MSGID: 114058] 
[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-0: 
failed to get the port number for remote subvolume. Please run 'gluster 
volume status' on server to see if brick process is running.
[12:15:50.749178] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 
0-gv0-client-0: disconnected from gv0-client-0. Client process will keep 
trying to connect to glusterd until brick's port is available
[12:15:53.679570] E [socket.c:2278:socket_connect_finish] 
0-gv0-client-1: connection to 10.0.0.3:24007 failed (No route to host)
[12:15:53.679611] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 
0-gv0-replicate-0: All subvolumes are down. Going offline until atleast 
one of them comes back up.


Once the machine has fully booted and I log in, simply typing 'mount 
/gluster' always succeeds. I would really appreciate your help in making 
this happen on boot without intervention.


Regards, Paul Boven.
--
Paul Boven <bo...@jive.eu> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.eu
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Does QEMU offer High Availability

2014-03-18 Thread Paul Boven

Hi Daniel,

I'm using KVM/Qemu with GlusterFS. The virtual machine images are stored 
on the Gluster filesystem, so there is always a copy of the virtual 
machine image on both cluster nodes. And you can even do a live migrate 
of a running guest from one node to the other.


So if a host fails, all its guests will die with it, but you can restart 
them on the other node if you have enough resources.


Regards, Paul Boven.

On 03/18/2014 01:51 PM, Daniel Baker wrote:


Hi Lucian,

but Glusterfs is at least trying to replicate the QEMU image onto the
other node ?  That's the whole point of the replication isn’t it ?


Thanks for the help,

Dan


--

Message: 1
Date: Mon, 17 Mar 2014 20:29:30 +0700
From: Daniel Baker i...@collisiondetection.biz
To: gluster-users@gluster.org
Subject: [Gluster-users] Does QEMU offer High Availability
Message-ID: 5326f8ba.7010...@collisiondetection.biz
Content-Type: text/plain; charset=ISO-8859-1

HI all,

If I use QEMU in Glusterfs 3.4.2 can I achieve true High Availability.

If I have two nodes that are clustering a QEMU image and one of them
goes down will the other QEMU image take its place.

Can I really achieve that ?



Thanks,

Dan


--

Message: 2
Date: Mon, 17 Mar 2014 13:48:45 +
From: Nux! n...@li.nux.ro
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] Does QEMU offer High Availability
Message-ID: c5ba49ac0de0faf0ab1257a7f1800...@li.nux.ro
Content-Type: text/plain; charset=UTF-8; format=flowed

On 17.03.2014 13:29, Daniel Baker wrote:

HI all,

If I use QEMU in Glusterfs 3.4.2 can I achieve true High Availability.

If I have two nodes that are clustering a QEMU image and one of them
goes down will the other QEMU image take its place.

Can I really achieve that ?


Dan,

This is not really a discussion for the gluster lists.

You should ask KVM people, but to answer you briefly:
- No, KVM is just a hypervisor, it runs virtual machines and that's all
it does.
You need to run additional RedHat clustering stuff around it to achieve
HA.

HTH
Lucian


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Bug 1057645] ownership of diskimage changes during livemigration, livemigration with kvm/libvirt fails

2014-02-18 Thread Paul Boven

Hi Adam,

This is rather odd - the RedHat-specific clone of the bug we filed is 
now marked private, even when I log in with my RH bugtracker account. 
However, the original bug is still accessible:


https://bugzilla.redhat.com/show_bug.cgi?id=1057645

There's not been any progress as far as I know. We are using the 
workaround (which stops libvirt/qemu from doing the chown) in 
production. With the release of Ubuntu 14.04 LTS, I hope to be able to 
use libgfapi on our setup.


Perhaps the fact that the RedHat-specific bug is now private means that 
they're actually doing something with it, but I wouldn't know.


Regards, Paul Boven.

On 02/18/2014 02:59 PM, Adam Huffman wrote:

Hi Paul,

Could you keep the list updated? That bug has been marked private, so
I can't see it.

Best Wishes,
Adam

On Tue, Jan 28, 2014 at 9:29 AM, Paul Boven bo...@jive.nl wrote:

Hi Bernhard, everyone,

The same problem has now been reproduced on RedHat, please see:

https://bugzilla.redhat.com/show_bug.cgi?id=1058032

With 3.4.0 and Ubuntu 13.04, live migrations worked fine. For me it broke
when the packages were upgraded to 3.4.1.

I've set AppArmor to 'complain' as part of the debugging, so that's not the
issue.

I'm still not convinced that the file ownership itself is the root cause of
this issue, it could well be just a symptom. Libvirt/qemu is perfectly happy
to start a VM when its image file is owned root:root, and change ownership
to libvirt-qemu:kvm. So I see no reason why it couldn't do the same during a
live migration.

In my opinion the real issue is the failure at the fuse level, that makes
file access to the image on the destination impossible, even for root.

Regards, Paul Boven.


On 01/27/2014 07:51 PM, BGM wrote:


Hi Paul  all
I'm really keen on getting this solved,
right now it's a nasty show stopper.
I could try different gluster versions,
as long as I can get the .debs for it,
wouldn't want to start compiling
(although does a config option have changed on package build?)
you reported that 3.4.0 on ubuntu 13.04 was working, right?
code diff, config options for package build.
Another approach: can anyone verify or falsify
https://bugzilla.redhat.com/show_bug.cgi?id=1057645
on another distro than ubuntu/debian?
thinking of it... could it be an apparmor interference?
I had fun with apparmor and mysql on ubuntu 12.04 once...
will have a look at that tomorrow.
As mentioned before, a straight drbd/ocfs2 works (with only 1/4 speed
and the pain of maintenance) so AFAIK I have to blame the ownership change
on gluster, not on an issue with my general setup
best regards
Bernhard




--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Bug 1057645] ownership of diskimage changes during livemigration, livemigration with kvm/libvirt fails

2014-01-28 Thread Paul Boven

Hi Bernhard, everyone,

The same problem has now been reproduced on RedHat, please see:

https://bugzilla.redhat.com/show_bug.cgi?id=1058032

With 3.4.0 and Ubuntu 13.04, live migrations worked fine. For me it 
broke when the packages were upgraded to 3.4.1.


I've set AppArmor to 'complain' as part of the debugging, so that's not 
the issue.


I'm still not convinced that the file ownership itself is the root cause 
of this issue, it could well be just a symptom. Libvirt/qemu is 
perfectly happy to start a VM when its image file is owned root:root, 
and change ownership to libvirt-qemu:kvm. So I see no reason why it 
couldn't do the same during a live migration.


In my opinion the real issue is the failure at the fuse level, that 
makes file access to the image on the destination impossible, even for root.


Regards, Paul Boven.

On 01/27/2014 07:51 PM, BGM wrote:

Hi Paul  all
I'm really keen on getting this solved,
right now it's a nasty show stopper.
I could try different gluster versions,
as long as I can get the .debs for it,
wouldn't want to start compiling
(although does a config option have changed on package build?)
you reported that 3.4.0 on ubuntu 13.04 was working, right?
code diff, config options for package build.
Another approach: can anyone verify or falsify
https://bugzilla.redhat.com/show_bug.cgi?id=1057645
on another distro than ubuntu/debian?
thinking of it... could it be an apparmor interference?
I had fun with apparmor and mysql on ubuntu 12.04 once...
will have a look at that tomorrow.
As mentioned before, a straight drbd/ocfs2 works (with only 1/4 speed
and the pain of maintenance) so AFAIK I have to blame the ownership change
on gluster, not on an issue with my general setup
best regards
Bernhard



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-28 Thread Paul Boven

Hi everyone,

On the libvirt Wiki, I found the text below which might well apply to 
our live-migration issue:


The directory used for storing disk images has to be mounted from 
shared storage on both hosts. Otherwise, the domain may lose access to 
its disk images during migration because source libvirtd may change the 
owner, permissions, and SELinux labels on the disk images once it 
successfully migrates the domain to its destination. Libvirt avoids 
doing such things if it detects that the disk images are mounted from a 
shared storage. 


So perhaps libvirtd fails to recognize that it is on shared storage, and 
it is the originating libvirt that throws a wrench in the wheels by 
changing the ownership?


http://wiki.libvirt.org/page/Migration_fails_because_disk_image_cannot_be_found

Regards, Paul Boven.
--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-28 Thread Paul Boven

Hi again,

First the good news: I found a way to make live migrations work again.

As quoted below, libvirt changes the ownership of the guest image, 
unless it detects that the image is on a shared filesystem. After 
looking at the code for libvirt, they have code to detect NFS, GFS2 and 
SMB/CIFS, but not Gluster. As libvirt does not detect that the storage 
is on a shared file system, the originating host will perform a chown 
back to root:root at the end of a successful migration, whereas the 
destination host will do a chown to libvirt-qemu:kvm. This is in fact a 
race condition, so the difference in behaviour between 3.4.0 and 3.4.1 
could be down to timing differences.


Workaround: stop your guests, then stop libvirt, and edit 
/etc/libvirt/qemu.conf - this contains a commented-out entry 
'dynamic_ownership=1', which is the default. Change this to 0, and 
remove the comment. Then do a chown to libvirt-qemu:kvm for all your 
stopped images. Then you can start the libvirt-bin service again and 
bring up the guests. Repeat on the other half of your cluster, and test 
a live migration. For me, they work again.
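
In shell terms the whole sequence is roughly this (guest and image names
are placeholders, and the sed line is just one way to flip the setting):

virsh shutdown guest1                    # stop all guests first
service libvirt-bin stop
sed -i 's/^#\?dynamic_ownership *= *1/dynamic_ownership = 0/' /etc/libvirt/qemu.conf
chown libvirt-qemu:kvm /gluster/*.raw    # stopped images are owned by root:root at this point
service libvirt-bin start
virsh start guest1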


The workaround works fine for me, but now you have to take care of 
properly setting the ownership of a guest image yourself (presumably 
only once, when you create it).


Other possible solutions:

JoeJulian suggested using libgfapi, giving libvirt direct access without 
having to go through the filesystem. This is the preferred setup for 
libvirt+gluster and should also result in better I/O performance. I 
haven't tested this yet, but it's high on my to-do list.
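
For reference, the disk definition in the guest XML would then look
something like this (sketch only, not tested here; it needs a qemu build
with the gluster block driver, and it reuses the volume and brick names
from the 'gluster volume info' output quoted elsewhere in this thread):

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='gv0/kvmtest.raw'>
    <host name='10.88.4.0' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>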


Submit a patch to libvirt so it can detect that the filesystem is 
Gluster. statfs() will only show 'FUSE', but we could then use getxattr 
to see if there is a gluster-specific attribute set (suggested by 
kkeithley). This could be trusted.glusterfs.volume-id, for example.
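
A quick way to check whether that attribute is visible (run as root; the
brick path is the one from our volume info, and whether the FUSE mount
point also reports it may depend on the gluster version):

getfattr -n trusted.glusterfs.volume-id -e hex /export/brick0/sdb1   # on the brick
getfattr -n trusted.glusterfs.volume-id -e hex /gluster              # on the client mount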


Regards, Paul Boven.


On 01/28/2014 01:57 PM, Paul Boven wrote:

Hi everyone,

On the libvirt Wiki, I found the text below which might well apply to
our live-migration issue:

The directory used for storing disk images has to be mounted from
shared storage on both hosts. Otherwise, the domain may lose access to
its disk images during migration because source libvirtd may change the
owner, permissions, and SELinux labels on the disk images once it
successfully migrates the domain to its destination. Libvirt avoids
doing such things if it detects that the disk images are mounted from a
shared storage. 

So perhaps libvirtd fails to recognize that it is on shared storage, and
it is the originating libvirt that throws a wrench in the wheels by
changing the ownership?

http://wiki.libvirt.org/page/Migration_fails_because_disk_image_cannot_be_found


Regards, Paul Boven.



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterd dont start

2014-01-28 Thread Paul Boven

Hi Jefferson,

I've seen such differences in df, too. They are not necessarily a cause 
for alarm, as sometimes sparse files can be identical (verified through 
md5sum) on both bricks, but not use the same number of disk blocks.


You should instead try an ls -l of the files on both bricks, and see if 
they are different. If they're exactly the same, you could still do an 
md5sum; I did that on my bricks (without gluster running) to make 
100% sure that the interesting events of the past few days hadn't 
corrupted my storage.


The difference in disk usage can also be down to the content of the 
hidden .glusterfs-directory in your bricks. That's where the main 
difference is on my machines.
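
For example (run on each server, against the brick directory itself
rather than the mounted volume; the path below is a guess based on the
df output further down):

ls -lR /gv/html
md5sum /gv/html/somefile            # only for files you suspect differ
du -sh /gv/html/.glusterfs          # the hidden bookkeeping directory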


Regards, Paul Boven.

On 01/28/2014 05:54 PM, Jefferson Carlos Machado wrote:

Hi,

Thank you so much.
After this all sounds good, but I am not sure because df is different on
nodes.

[root@srvhttp0 results]# df
Filesystem              1K-blocks    Used  Available Use% Mounted on
/dev/mapper/fedora-root   2587248 2128160     307948  88% /
devtmpfs                   493056       0     493056   0% /dev
tmpfs                      506240   50648     455592  11% /dev/shm
tmpfs                      506240     236     506004   1% /run
tmpfs                      506240       0     506240   0% /sys/fs/cgroup
tmpfs                      506240      12     506228   1% /tmp
/dev/xvda1                 487652  106846     351110  24% /boot
/dev/xvdb1                2085888  551292    1534596  27% /gv
localhost:/gv_html        2085888  587776    1498112  29% /var/www/html
[root@srvhttp0 results]# cd /gv
[root@srvhttp0 gv]# ls -la
total 8
drwxr-xr-x   3 root root   17 Jan 28 14:43 .
dr-xr-xr-x. 19 root root 4096 Jan 26 10:10 ..
drwxr-xr-x   4 root root   37 Jan 28 14:43 html
[root@srvhttp0 gv]#


[root@srvhttp1 html]# df
Filesystem              1K-blocks    Used  Available Use% Mounted on
/dev/mapper/fedora-root   2587248 2355180      80928  97% /
devtmpfs                   126416       0     126416   0% /dev
tmpfs                      139600   35252     104348  26% /dev/shm
tmpfs                      139600     208     139392   1% /run
tmpfs                      139600       0     139600   0% /sys/fs/cgroup
tmpfs                      139600       8     139592   1% /tmp
/dev/xvda1                 487652  106846     351110  24% /boot
/dev/xvdb1                2085888  587752    1498136  29% /gv
localhost:/gv_html        2085888  587776    1498112  29% /var/www/html
[root@srvhttp1 html]#
[root@srvhttp1 html]# cd /gv
[root@srvhttp1 gv]# ll -a
total 12
drwxr-xr-x   3 root root   17 Jan 28 14:42 .
dr-xr-xr-x. 19 root root 4096 Out 18 11:16 ..
drwxr-xr-x   4 root root   37 Jan 28 14:42 html
[root@srvhttp1 gv]#

Em 28-01-2014 12:01, Franco Broi escreveu:


Every peer has a copy of the files but I'm not sure it's 100% safe to
remove them entirely. I've never really got a definitive answer from
the Gluster devs but if your files were trashed anyway you don't have
anything to lose.

This is what I did.

On the bad node stop glusterd

Make a copy of the /var/lib/glusterd dir, then remove it.

Start glusterd

peer probe the good node.

Restart glusterd

And that should be it. Check the files are there.

If it doesn't work you can restore the files from the backup copy.

On 28 Jan 2014 21:48, Jefferson Carlos Machado
lista.li...@results.com.br wrote:
Hi,

I have only 2 nodes in this cluster.
So can I remove the config files?

Regards,
Em 28-01-2014 04:17, Franco Broi escreveu:
 I think Jefferson's problem might have been due to corrupted config
 files, maybe because the /var partition was full as suggested by Paul
 Boven but as has been pointed out before, the error messages don't make
 it obvious what's wrong.

 He got glusterd started but now the peers can't communicate, probably
 because a uuid is wrong. This is a weird problem to debug because the
 clients can see the data but df may not show the full size, and you
 wouldn't know anything was wrong until, like Jefferson, you looked in the
 gluster log file.

 [2014-01-27 15:48:19.580353] E [socket.c:2788:socket_connect]
0-management: connection attempt failed (Connection refused)
 [2014-01-27 15:48:19.583374] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
Found brick
 [2014-01-27 15:48:22.584029] E [socket.c:2788:socket_connect]
0-management: connection attempt failed (Connection refused)
 [2014-01-27 15:48:22.607477] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
Found brick
 [2014-01-27 15:48:25.608186] E [socket.c:2788:socket_connect]
0-management: connection attempt failed (Connection refused)
 [2014-01-27 15:48:25.612032] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
Found brick
 [2014-01-27 15:48:28.612638] E [socket.c:2788:socket_connect]
0-management: connection attempt failed (Connection refused)
 [2014-01-27 15:48:28.615509] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management:
Found brick

 I think the advice should

Re: [Gluster-users] Glusterd dont start

2014-01-27 Thread Paul Boven

Hi Jefferson,

Did you perhaps run out of diskspace at some point in the past? I've had 
a similar thing happen to me this weekend: the machine had worked fine for 
weeks after fixing the diskspace issue, but glusterd wouldn't start 
after a reboot. Here's how I managed to get things working again.


Amongst all the noise, the error message that 'resolve brick failed' 
seems to be the key here.


Check these files by comparing them between your gluster servers:

/var/lib/glusterd/peers/<uuid>
This file should contain the uuid, IP-address etc. of the peer. Look at 
how it is laid out on your other node, and adapt the UUID and IP 
address as appropriate.


/var/lib/glusterd/glusterd.info
This file should contain the UUID of the host, and you might be able to 
retrieve it from the other side.
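
For comparison, on a healthy 3.4.x node the two files look roughly like
this (the UUIDs and address are invented for illustration, and the exact
set of keys can vary between gluster versions):

# /var/lib/glusterd/glusterd.info  (this server's own identity)
UUID=11111111-2222-3333-4444-555555555555
operating-version=2

# /var/lib/glusterd/peers/<uuid-of-the-other-server>
uuid=66666666-7777-8888-9999-000000000000
state=3
hostname1=192.168.1.2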


I got mine back in working order after fixing these two files.

Regards, Paul Boven.

On 01/27/2014 12:53 PM, Jefferson Carlos Machado wrote:

Hi,

Please, help me!!

After reboot my system the service glusterd dont start.

the /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

[2014-01-27 09:27:02.898807] I [glusterfsd.c:1910:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.2
(/usr/sbin/glusterd -p /run/glusterd.pid)
[2014-01-27 09:27:02.909147] I [glusterd.c:961:init] 0-management: Using
/var/lib/glusterd as working directory
[2014-01-27 09:27:02.913247] I [socket.c:3480:socket_init]
0-socket.management: SSL support is NOT enabled
[2014-01-27 09:27:02.913273] I [socket.c:3495:socket_init]
0-socket.management: using system polling thread
[2014-01-27 09:27:02.914337] W [rdma.c:4197:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event channel creation failed (No such
device)
[2014-01-27 09:27:02.914359] E [rdma.c:4485:init] 0-rdma.management:
Failed to initialize IB Device
[2014-01-27 09:27:02.914375] E [rpc-transport.c:320:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2014-01-27 09:27:02.914535] W [rpcsvc.c:1389:rpcsvc_transport_create]
0-rpc-service: cannot create listener, initing the transport failed
[2014-01-27 09:27:05.337557] I
[glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd:
retrieved op-version: 2
[2014-01-27 09:27:05.373853] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-0
[2014-01-27 09:27:05.373927] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-1
[2014-01-27 09:27:06.166721] I [glusterd.c:125:glusterd_uuid_init]
0-management: retrieved UUID: 28f232e9-564f-4866-8014-32bb020766f2
[2014-01-27 09:27:06.169422] E
[glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve
brick failed in restore
[2014-01-27 09:27:06.169491] E [xlator.c:390:xlator_init] 0-management:
Initialization of volume 'management' failed, review your volfile again
[2014-01-27 09:27:06.169516] E [graph.c:292:glusterfs_graph_init]
0-management: initializing translator failed
[2014-01-27 09:27:06.169532] E [graph.c:479:glusterfs_graph_activate]
0-graph: init failed
[2014-01-27 09:27:06.169769] W [glusterfsd.c:1002:cleanup_and_exit]
(--/usr/sbin/glusterd(main+0x3df) [0x7f23c76588ef]
(--/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7f23c765b6e0]
(--/usr/sbin/glusterd(glusterfs_process_volfp+0x103)
[0x7f23c765b5f3]))) 0-: received signum (0), shutting down

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster and kvm livemigration

2014-01-26 Thread Paul Boven

Hi James, everyone,

When debugging things, I already came across this bug. It is unlikely to 
be the cause of our issues:


Firstly, we migrated from 3.4.0 to 3.4.1, so we already had the possible 
port number conflict, but things worked fine with 3.4.0.


Secondly, I don't see the 'address already in use' messages in my 
logfiles (see the qemu logfiles I posted).


Also, the migration itself doesn't fail: it works fine, the guest ends 
up on the other server, it's just that the migrated guest loses 
read/write access to its filesystem.


Regards, Paul Boven.

On 01/24/2014 01:19 AM, James wrote:

Not sure if it's related at all, but is there any chance this has
anything to do with:

https://bugzilla.redhat.com/show_bug.cgi?id=987555

It came to mind as something to do with glusterfs+libvirt+migration.

HTH,
James



On Thu, Jan 16, 2014 at 5:52 AM, Bernhard Glomm
bernhard.gl...@ecologic.eu mailto:bernhard.gl...@ecologic.eu wrote:

I experienced a strange behavior of glusterfs during livemigration
of a qemu-kvm guest
using a 10GB file on a mirrored gluster 3.4.2 volume
(both on ubuntu 13.04)
I run
virsh migrate --verbose --live --unsafe --p2p --domain atom01
--desturi qemu+ssh://target_ip/system
and the migration works,
the running machine is pingable and keeps sending pings.
nevertheless, when I let the machine touch a file during migration
it stops, complaining that it's filesystem is read only (from that
moment that
migration finished)
A reboot from inside the machine failes,
machine goes down and comes up with an error
unable to write to sector xx on hd0
(than falling into the initrd).
a
virsh destroy VM  virsh start VM
leads to a perfect running VM again,
no matter on which of the two hosts I start the machine
anybody better experience with livemigration?
any hint on a procedure how to debug that?
TIA
Bernhard

--

*Ecologic Institute**Bernhard Glomm*
IT Administration

Phone:  +49 (30) 86880 134 tel:%2B49%20%2830%29%2086880%20134
Fax:+49 (30) 86880 100 tel:%2B49%20%2830%29%2086880%20100
Skype:  bernhard.glomm.ecologic

Website: http://ecologic.eu | Video:
http://www.youtube.com/v/hZtiK04A9Yo | Newsletter:
http://ecologic.eu/newsletter/subscribe | Facebook:
http://www.facebook.com/Ecologic.Institute | Linkedin:
http://www.linkedin.com/company/ecologic-institute-berlin-germany
| Twitter: http://twitter.com/EcologicBerlin | YouTube:
http://www.youtube.com/user/EcologicInstitute | Google+:
http://plus.google.com/113756356645020994482
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 |
10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 |
USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH



___
Gluster-users mailing list
Gluster-users@gluster.org mailto:Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster and kvm livemigration

2014-01-26 Thread Paul Boven

Hi Bernhard,

Indeed I see the same behaviour:
When a guest is running, it is owned by libvirt-qemu:kvm on both servers.
When a guest is stopped, it is owned by root:root on both servers.
In a failed migration, the ownership changes to root:root.

I'm not convinced though that it is a simple unix permission problem, 
because after a failed migration, the guest.raw image is completely 
unreadable on the destination machine, even for root (permission 
denied), whereas I can still read it fine (e.g. dd or md5sum) on the 
originating server.


Regards, Paul Boven.

On 01/23/2014 08:10 PM, BGM wrote:

Hi Paul,
thnx, nice report,
u file(d) the bug?
can u do a
watch tree -pfungiA <path to ur vm images pool>
on both hosts
some vm running, some stopped.
start a machine
trigger the migration
at some point, the ownership of the vmimage.file flips from
libvirtd (running machine) to root (normal permission, but only when stopped).
If the ownership/permission flips that way,
libvirtd on the reciving side
can't write that file ...
does group/acl permission flip likewise?
Regards
Bernhard

On 23.01.2014, at 16:49, Paul Boven bo...@jive.nl wrote:


Hi Bernhard,

I'm having exactly the same problem on Ubuntu 13.04 with the 3.4.1 packages 
from semiosis. It worked fine with glusterfs-3.4.0.

We've been trying to debug this on the list, but haven't found the smoking gun 
yet.

Please have a look at the URL below, and see if it matches what you are 
experiencing?

http://epboven.home.xs4all.nl/gluster-migrate.html

Regards, Paul Boven.

On 01/23/2014 04:27 PM, Bernhard Glomm wrote:


I had/have problems with live-migrating a virtual machine on a 2sided
replica volume.

I run ubuntu 13.04 and gluster 3.4.2 from semiosis


with network.remote-dio set to enable I can use cache mode = none as a
performance option for the virtual disks,

so live migration works without --unsafe

I'm triggering the migration now through the Virtual Machine Manager as an

unprivileged user which is group member of libvirtd.


After migration the disks become read-only because

on migration the disk files changes ownership from

libvirt-qemu to root


What am I missing?


TIA


Bernhard



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Strange side-effect to option base-port 50152

2014-01-26 Thread Paul Boven

Hi folks,

While debugging the migration issue, I noticed that sometimes we did in 
fact hit bug 987555 (Address already in use) when doing a 
live migration, so I decided to implement the advice in the aforementioned bug.


So I set 'option base-port 50152' in /etc/glusterfs/glusterd.vol (note 
that the bug talks about /etc/glusterfs/gluster.vol, which doesn't 
exist). The result of this was that migrations completely stopped 
working. Trying to do a migration would cause the process on the sending 
machine to hang, and on the receiving machine libvirt became completely 
unresponsive; even 'virsh list' would simply hang.


Curiously, 'gluster volume status' showed that, despite setting the 
base-port to 50152, the bricks were still listening on 49152, as before 
the config change & reboot.
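
For reference, the change amounted to adding a single line to the
management volume definition in /etc/glusterfs/glusterd.vol, roughly like
this (the other options are abbreviated here):

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option base-port 50152
end-volume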


I reverted my change to 'option-base-port', did another set of reboots, 
and now migration is 'working' again, as in the guest gets moved across, 
but then still loses access to its image (see my other mails).


Ubuntu 13.04, glusterfs-3.4.1.

Regards, Paul Boven.
--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Strange side-effect to option base-port 50152

2014-01-26 Thread Paul Boven

Hi Vijay, everyone,

On 01/26/2014 04:22 PM, Vijay Bellur wrote:

On 01/26/2014 07:57 PM, Paul Boven wrote:



The result of this was that migrations completely stopped
working. Trying to do a migration would cause the process on the sending
machine to hang, and on the receiving machine, libvirt became completely
unresponsive, even 'virsh list' would simply hang.

Curiously, 'gluster volume status' showed that, despite setting the
base-port to 50152, the bricks were still listening on 49152, as before
the config change & reboot.


Restart of both glusterd and the volume would be necessary to change the
ports where the bricks listen.


As I wrote, I did a complete reboot of both machines, to make sure that 
this configuration change was accepted.



I am still not certain as to what caused the migration to hang. Do you
notice anything unusual in the log files when the hang happens?


The message I get after setting the base-port is still:
-incoming tcp:0.0.0.0:49152: Failed to bind socket: Address already in use

Regards, Paul Boven.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster and kvm livemigration

2014-01-23 Thread Paul Boven

Hi Bernhard,

I'm having exactly the same problem on Ubuntu 13.04 with the 3.4.1 
packages from semiosis. It worked fine with glusterfs-3.4.0.


We've been trying to debug this on the list, but haven't found the 
smoking gun yet.


Please have a look at the URL below, and see if it matches what you are 
experiencing?


http://epboven.home.xs4all.nl/gluster-migrate.html

Regards, Paul Boven.

On 01/23/2014 04:27 PM, Bernhard Glomm wrote:


I had/have problems with live-migrating a virtual machine on a 2sided
replica volume.

I run ubuntu 13.04 and gluster 3.4.2 from semiosis


with network.remote-dio set to enable I can use cache mode = none as a
performance option for the virtual disks,

so live migration works without --unsafe

I'm triggering the migration now through the Virtual Machine Manager as an

unprivileged user which is group member of libvirtd.


After migration the disks become read-only because

on migration the disk files changes ownership from

libvirt-qemu to root


What am I missing?


TIA


Bernhard



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-22 Thread Paul Boven

Hi Josh, everyone,

I've just tried the server.allow-insecure option, and it makes no 
difference.
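
For the record, the option can be set and reverted like this (gv0 is our
volume name):

gluster volume set gv0 server.allow-insecure on
gluster volume reset gv0 server.allow-insecure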


You can find a summary and the logfiles at this URL:

http://epboven.home.xs4all.nl/gluster-migrate.html

The migration itself happens at 14:00:00, with the first write access 
attempt by the migrated guest at 14:00:25 which results in the 
'permission denied' errors in the gluster.log. Some highlights from 
gluster.log:


[2014-01-22 14:00:00.779741] D 
[afr-common.c:131:afr_lookup_xattr_req_prepare] 0-gv0-replicate-0: 
/kvmtest.raw: failed to get the gfid from dict


[2014-01-22 14:00:00.780458] D 
[afr-common.c:1380:afr_lookup_select_read_child] 0-gv0-replicate-0: 
Source selected as 1 for /kvmtest.raw


[2014-01-22 14:00:25.176181] W 
[client-rpc-fops.c:471:client3_3_open_cbk] 0-gv0-client-1: remote 
operation failed: Permission denied. Path: /kvmtest.raw 
(f7ed9edd-c6bd-4e86-b448-1d98bb38314b)


[2014-01-22 14:00:25.176322] W [fuse-bridge.c:2167:fuse_writev_cbk] 
0-glusterfs-fuse: 2494829: WRITE = -1 (Permission denied)


Regards, Paul Boven.

On 01/21/2014 05:35 PM, Josh Boon wrote:

Hey Paul,


Have you tried server.allow-insecure: on as a volume option? If that doesn't 
work we'll need the logs for both bricks.

Best,
Josh

- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 11:12:03 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage   
inaccessible

Hi Josh, everyone,

Glad you're trying to help, so no need to apologize at all.

mount output:
/dev/sdb1 on /export/brick0 type xfs (rw)

localhost:/gv0 on /gluster type fuse.glusterfs
(rw,default_permissions,allow_other,max_read=131072)

gluster volume info all:
Volume Name: gv0
Type: Replicate
Volume ID: ee77a036-50c7-4a41-b10d-cc0703769df9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.88.4.0:/export/brick0/sdb1
Brick2: 10.88.4.1:/export/brick0/sdb1
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO

Regards, Paul Boven.




On 01/21/2014 05:02 PM, Josh Boon wrote:

Hey Paul,

Definitely looks to be gluster. Sorry about the wrong guess on UID/GID.  What's the output of 
mount and gluster volume info all?

Best,
Josh


- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 10:56:34 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage   
inaccessible

Hi Josh,

I've taken great care that /etc/passwd and /etc/group are the same on
both machines. When the problem occurs, even root gets 'permission
denied' when trying to read /gluster/guest.raw. So my first reaction was
that it cannot be a uid problem.

In the normal situation, the storage for a running guest is owned by
libvirt-qemu:kvm. When I shut a guest down (virsh destroy), the
ownership changes to root:root on both cluster servers.

During a migration (that fails), the ownership also ends up as root:root
on both, which I hadn't noticed before. Filemode is 0644.

On the originating server, root can still read /gluster/guest.raw,
whereas on the destination, this gives me 'permission denied'.

The qemu logfile for the guest doesn't show much interesting
information, merely 'shutting down' on the originating server, and the
startup on de destination server. Libvirt/qemu does not seem to be aware
of the situation that the guest ends up in. I'll post the gluster logs
somewhere, too.

   From the destination server:

LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
/usr/bin/kvm -name kvmtest -S -M pc-i440fx-1.4 -m 1024 -smp
1,sockets=1,cores=1,threads=1 -uuid 97db2d3f-c8e4-31de-9f89-848356b20da5
-nographic -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvmtest.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/gluster/kvmtest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:01:01:11,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -incoming tcp:0.0.0.0:49166
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
W: kvm binary is deprecated, please use qemu-system-x86_64 instead
char device redirected to /dev/pts/4 (label charserial0)

Regards, Paul Boven.






On 01/21/2014 04:22 PM, Josh Boon wrote:


Paul,

Sounds like a potential uid/gid problem.  Would you be able to update with the 
logs from cd /var/log/libvirt/qemu/ for the guest from both source and 
destination? Also the gluster logs for the volume would be awesome.


Best,
Josh

- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster

Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-22 Thread Paul Boven

Hi Paul, everyone,

Thanks for your reply. The networking setup for these is very simple, 
and does not involve NAT, only statically assigned addresses.


Each host has a 1G interface on a private address, set up as a bridge 
interface which carries the IP address of the host itself and of its 
guest VMs. The hosts are also connected together through a 10G link in 
a separate private /31. Dnsmasq and NAT are not in use.

Is there any way in which I could debug whether this is a networking issue?

This is the network configuration file /etc/network/interfaces:

# The primary network interface
auto p2p1
iface p2p1 inet manual

# Virtual bridge interface on p2p1 (control net)
auto br0
iface br0 inet static
address 10.0.0.10
netmask 255.255.255.0
gateway 10.0.0.1
dns-nameservers 10.0.0.100 10.0.0.101
dns-search example.com
bridge_ports p2p1
bridge_fd 9
bridge_hello 2
bridge_maxage 12
bridge_stp off

# 10G cluster interconnect
auto p1p1
iface p1p1 inet static
address 10.0.1.0
netmask 255.255.255.254
mtu 9000

Regards, Paul Boven.



On 01/22/2014 05:20 PM, Paul Robert Marino wrote:

are you doing anything like NATing the VMs on the physical host, or
do you have any iptables forward rules on the physical host?
If so you may have a connection tracking issue.
There are a couple of ways you can fix that if that's the case, the
easiest of which is to install conntrackd on the physical hosts and
configure it to sync directly into the live connection tracking table;
however it does limit your scaling capabilities for your fail over
zones.
The second is not to do that any more.



On Wed, Jan 22, 2014 at 9:38 AM, Paul Boven bo...@jive.nl wrote:

Hi Josh, everyone,

I've just tried the server.allow-insecure option, and it makes no
difference.

You can find a summary and the logfiles at this URL:

http://epboven.home.xs4all.nl/gluster-migrate.html

The migration itself happens at 14:00:00, with the first write access
attempt by the migrated guest at 14:00:25 which results in the 'permission
denied' errors in the gluster.log. Some highlights from gluster.log:

[2014-01-22 14:00:00.779741] D
[afr-common.c:131:afr_lookup_xattr_req_prepare] 0-gv0-replicate-0:
/kvmtest.raw: failed to get the gfid from dict

[2014-01-22 14:00:00.780458] D
[afr-common.c:1380:afr_lookup_select_read_child] 0-gv0-replicate-0: Source
selected as 1 for /kvmtest.raw

[2014-01-22 14:00:25.176181] W [client-rpc-fops.c:471:client3_3_open_cbk]
0-gv0-client-1: remote operation failed: Permission denied. Path:
/kvmtest.raw (f7ed9edd-c6bd-4e86-b448-1d98bb38314b)

[2014-01-22 14:00:25.176322] W [fuse-bridge.c:2167:fuse_writev_cbk]
0-glusterfs-fuse: 2494829: WRITE = -1 (Permission denied)

Regards, Paul Boven.


On 01/21/2014 05:35 PM, Josh Boon wrote:


Hey Paul,


Have you tried server.allow-insecure: on as a volume option? If that
doesn't work we'll need the logs for both bricks.

Best,
Josh

- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 11:12:03 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage
inaccessible

Hi Josh, everyone,

Glad you're trying to help, so no need to apologize at all.

mount output:
/dev/sdb1 on /export/brick0 type xfs (rw)

localhost:/gv0 on /gluster type fuse.glusterfs
(rw,default_permissions,allow_other,max_read=131072)

gluster volume info all:
Volume Name: gv0
Type: Replicate
Volume ID: ee77a036-50c7-4a41-b10d-cc0703769df9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.88.4.0:/export/brick0/sdb1
Brick2: 10.88.4.1:/export/brick0/sdb1
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO

Regards, Paul Boven.




On 01/21/2014 05:02 PM, Josh Boon wrote:


Hey Paul,

Definitely looks to be gluster. Sorry about the wrong guess on UID/GID.
What's the output of mount and gluster volume info all?

Best,
Josh


- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 10:56:34 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage
inaccessible

Hi Josh,

I've taken great care that /etc/passwd and /etc/group are the same on
both machines. When the problem occurs, even root gets 'permission
denied' when trying to read /gluster/guest.raw. So my first reaction was
that it cannot be a uid problem.

In the normal situation, the storage for a running guest is owned by
libvirt-qemu:kvm. When I shut a guest down (virsh destroy), the
ownership changes to root:root on both cluster servers.

During a migration (that fails), the ownership also ends up as root:root
on both, which I hadn't noticed before. Filemode is 0644.

On the originating server, root can still read /gluster/guest.raw,
whereas on the destination, this gives me 'permission denied'.

The qemu logfile for the guest doesn't

[Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-21 Thread Paul Boven

Hi everyone

We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis' 
packages. We're using kvm (libvirt) to host guest installs, and thanks to 
gluster and libvirt, we can live-migrate guests between the two hosts.


Recently I ran an apt-get update/upgrade to stay up-to-date with 
security patches, and this also upgraded our glusterfs to the 3.4.1 
version of the packages.


Since this upgrade (which updated the gluster packages, but also the 
Ubuntu kernel package), kvm live migration fails in a most unusual 
manner. The live migration itself succeeds, but on the receiving 
machine, the vm-storage for that machine becomes inaccessible. This in 
turn causes the guest OS to no longer be able to read or write its 
filesystem, with of course fairly disastrous consequences for such a guest.


So before a migration, everything is running smoothly. The two cluster 
nodes are 'cl0' and 'cl1', and we do the migration like this:


virsh migrate --live --persistent --undefinesource guest 
qemu+tls://cl1/system


The migration itself works, but soon as you do the migration, the 
/gluster/guest.raw file (which holds the filesystem for the guest) 
becomes completely inaccessible: trying to read it (e.g. with dd or 
md5sum) results in a 'permission denied' on the destination cluster 
node, whereas the file is still perfectly fine on the machine that the 
migration originated from.
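
(The quick check, for reference; the path is the example used above:)

dd if=/gluster/guest.raw of=/dev/null bs=1M count=1   # 'Permission denied' on the destination node
md5sum /gluster/guest.raw                             # still works fine on the originating node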


As soon as I stop the guest (virsh destroy), the /gluster/guest.raw file 
becomes readable again and I can start up the guest on either server 
without further issues. It does not affect any of the other files in 
/gluster/.


The problem seems to be in the gluster or fuse part, because once this 
error condition is triggered, the /gluster/guest.raw cannot be read by 
any application on the destination server. This situation is 100% 
reproducible, every attempted live migration fails in this way.


Has anyone else experienced this? Is this a known or new bug?

We've done some troubleshooting already in the irc channel (thanks to 
everyone for their help) but haven't found the smoking gun yet. I would 
appreciate any help in debugging and resolving this.


Regards, Paul Boven.
--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-21 Thread Paul Boven

Hi James,

Thanks for the quick reply.

We are only using the fuse mounted paths at the moment. So libvirt/qemu 
simply know of these files as /gluster/guest.raw, and the guests are not 
aware of libgluster.


Some version numbers:

Kernel: Ubuntu 3.8.0-35-generic (13.04, Raring)
Glusterfs: 3.4.1-ubuntu1~raring1
qemu: 1.4.0+dfsg-1expubuntu4
libvirt0: 1.0.2-0ubuntu11.13.04.4
The gluster bricks are on xfs.

Regards, Paul Boven.


On 01/21/2014 03:25 PM, James wrote:

Are you using the qemu gluster:// storage or are you using a fuse
mounted file path?

I would actually expect it to work with either, however I haven't had
a chance to test this yet.

It's probably also useful if you post your qemu versions...

James

On Tue, Jan 21, 2014 at 9:15 AM, Paul Boven bo...@jive.nl wrote:

Hi everyone

We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis'
packages. We're using kvm (libvirt) to host guest installs, and thanks to
gluster and libvirt, we can live-migrate guests between the two hosts.

Recently I ran an apt-get update/upgrade to stay up-to-date with security
patches, and this also upgraded our glusterfs to the 3.4.1 version of the
packages.

Since this upgrade (which updated the gluster packages, but also the Ubuntu
kernel package), kvm live migration fails in a most unusual manner. The live
migration itself succeeds, but on the receiving machine, the vm-storage for
that machine becomes inaccessible. Which in turn causes the guest OS to no
longer be able to read or write its filesystem, with of course fairly
disastrous consequences for such a guest.

So before a migration, everything is running smoothly. The two cluster nodes
are 'cl0' and 'cl1', and we do the migration like this:

virsh migrate --live --persistent --undefinesource guest
qemu+tls://cl1/system

The migration itself works, but soon as you do the migration, the
/gluster/guest.raw file (which holds the filesystem for the guest) becomes
completely inaccessible: trying to read it (e.g. with dd or md5sum) results
in a 'permission denied' on the destination cluster node, whereas the file
is still perfectly fine on the machine that the migration originated from.

As soon as I stop the guest (virsh destroy), the /gluster/guest.raw file
becomes readable again and I can start up the guest on either server without
further issues. It does not affect any of the other files in /gluster/.

The problem seems to be in the gluster or fuse part, because once this error
condition is triggered, the /gluster/guest.raw cannot be read by any
application on the destination server. This situation is 100% reproducible,
every attempted live migration fails in this way.

Has anyone else experienced this? Is this a known or new bug?

We've done some troubleshooting already in the irc channel (thanks to
everyone for their help) but haven't found the smoking gun yet. I would
appreciate any help in debugging and resolving this.

Regards, Paul Boven.
--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-21 Thread Paul Boven

Hi Josh,

I've taken great care that /etc/passwd and /etc/group are the same on 
both machines. When the problem occurs, even root gets 'permission 
denied' when trying to read /gluster/guest.raw. So my first reaction was 
that it cannot be a uid problem.


In the normal situation, the storage for a running guest is owned by 
libvirt-qemu:kvm. When I shut a guest down (virsh destroy), the 
ownership changes to root:root on both cluster servers.


During a migration (that fails), the ownership also ends up as root:root 
on both, which I hadn't noticed before. Filemode is 0644.


On the originating server, root can still read /gluster/guest.raw, 
whereas on the destination, this gives me 'permission denied'.


The qemu logfile for the guest doesn't show much interesting 
information, merely 'shutting down' on the originating server, and the 
startup on de destination server. Libvirt/qemu does not seem to be aware 
of the situation that the guest ends up in. I'll post the gluster logs 
somewhere, too.


From the destination server:

LC_ALL=C 
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin 
/usr/bin/kvm -name kvmtest -S -M pc-i440fx-1.4 -m 1024 -smp 
1,sockets=1,cores=1,threads=1 -uuid 97db2d3f-c8e4-31de-9f89-848356b20da5 
-nographic -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvmtest.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc 
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/gluster/kvmtest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:01:01:11,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device 
isa-serial,chardev=charserial0,id=serial0 -incoming tcp:0.0.0.0:49166 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

W: kvm binary is deprecated, please use qemu-system-x86_64 instead
char device redirected to /dev/pts/4 (label charserial0)

Regards, Paul Boven.






On 01/21/2014 04:22 PM, Josh Boon wrote:


Paul,

Sounds like a potential uid/gid problem.  Would you be able to update with the 
logs from cd /var/log/libvirt/qemu/ for the guest from both source and 
destination? Also the gluster logs for the volume would be awesome.


Best,
Josh

- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 9:36:06 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage   
inaccessible

Hi James,

Thanks for the quick reply.

We are only using the fuse mounted paths at the moment. So libvirt/qemu
simply know of these files as /gluster/guest.raw, and the guests are not
aware of libgluster.

Some version numbers:

Kernel: Ubuntu 3.8.0-35-generic (13.04, Raring)
Glusterfs: 3.4.1-ubuntu1~raring1
qemu: 1.4.0+dfsg-1expubuntu4
libvirt0: 1.0.2-0ubuntu11.13.04.4
The gluster bricks are on xfs.

Regards, Paul Boven.


On 01/21/2014 03:25 PM, James wrote:

Are you using the qemu gluster:// storage or are you using a fuse
mounted file path?

I would actually expect it to work with either, however I haven't had
a chance to test this yet.

It's probably also useful if you post your qemu versions...

James

On Tue, Jan 21, 2014 at 9:15 AM, Paul Boven bo...@jive.nl wrote:

Hi everyone

We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis'
packages. We're using kvm (libvirt) to host guest installs, and thanks to
gluster and libvirt, we can live-migrate guests between the two hosts.

Recently I ran an apt-get update/upgrade to stay up-to-date with security
patches, and this also upgraded our glusterfs to the 3.4.1 version of the
packages.

Since this upgrade (which updated the gluster packages, but also the Ubuntu
kernel package), kvm live migration fails in a most unusual manner. The live
migration itself succeeds, but on the receiving machine, the vm-storage for
that machine becomes inaccessible. Which in turn causes the guest OS to no
longer be able to read or write its filesystem, with of course fairly
disastrous consequences for such a guest.

So before a migration, everything is running smoothly. The two cluster nodes
are 'cl0' and 'cl1', and we do the migration like this:

virsh migrate --live --persistent --undefinesource guest
qemu+tls://cl1/system

The migration itself works, but soon as you do the migration, the
/gluster/guest.raw file (which holds the filesystem for the guest) becomes
completely inaccessible: trying to read it (e.g. with dd or md5sum) results
in a 'permission denied' on the destination cluster node, whereas the file
is still perfectly fine on the machine that the migration originated from.

As soon as I stop the guest (virsh destroy), the /gluster/guest.raw file
becomes readable again and I can start up the guest on either server without

Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible

2014-01-21 Thread Paul Boven

Hi Josh, everyone,

Glad you're trying to help, so no need to apologize at all.

mount output:
/dev/sdb1 on /export/brick0 type xfs (rw)

localhost:/gv0 on /gluster type fuse.glusterfs 
(rw,default_permissions,allow_other,max_read=131072)


gluster volume info all:
Volume Name: gv0
Type: Replicate
Volume ID: ee77a036-50c7-4a41-b10d-cc0703769df9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.88.4.0:/export/brick0/sdb1
Brick2: 10.88.4.1:/export/brick0/sdb1
Options Reconfigured:
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO

Regards, Paul Boven.




On 01/21/2014 05:02 PM, Josh Boon wrote:

Hey Paul,

Definitely looks to be gluster. Sorry about the wrong guess on UID/GID.  What's the output of 
mount and gluster volume info all?

Best,
Josh


- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 10:56:34 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage   
inaccessible

Hi Josh,

I've taken great care that /etc/passwd and /etc/group are the same on
both machines. When the problem occurs, even root gets 'permission
denied' when trying to read /gluster/guest.raw. So my first reaction was
that it cannot be a uid problem.

In the normal situation, the storage for a running guest is owned by
libvirt-qemu:kvm. When I shut a guest down (virsh destroy), the
ownership changes to root:root on both cluster servers.

During a migration (that fails), the ownership also ends up as root:root
on both, which I hadn't noticed before. Filemode is 0644.

On the originating server, root can still read /gluster/guest.raw,
whereas on the destination, this gives me 'permission denied'.

The qemu logfile for the guest doesn't show much interesting
information, merely 'shutting down' on the originating server, and the
startup on de destination server. Libvirt/qemu does not seem to be aware
of the situation that the guest ends up in. I'll post the gluster logs
somewhere, too.

  From the destination server:

LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
/usr/bin/kvm -name kvmtest -S -M pc-i440fx-1.4 -m 1024 -smp
1,sockets=1,cores=1,threads=1 -uuid 97db2d3f-c8e4-31de-9f89-848356b20da5
-nographic -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvmtest.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/gluster/kvmtest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:01:01:11,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -incoming tcp:0.0.0.0:49166
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
W: kvm binary is deprecated, please use qemu-system-x86_64 instead
char device redirected to /dev/pts/4 (label charserial0)

Regards, Paul Boven.






On 01/21/2014 04:22 PM, Josh Boon wrote:


Paul,

Sounds like a potential uid/gid problem.  Would you be able to update with the 
logs from cd /var/log/libvirt/qemu/ for the guest from both source and 
destination? Also the gluster logs for the volume would be awesome.


Best,
Josh

- Original Message -
From: Paul Boven bo...@jive.nl
To: gluster-users@gluster.org
Sent: Tuesday, January 21, 2014 9:36:06 AM
Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage   
inaccessible

Hi James,

Thanks for the quick reply.

We are only using the fuse mounted paths at the moment. So libvirt/qemu
simply know of these files as /gluster/guest.raw, and the guests are not
aware of libgluster.

Some version numbers:

Kernel: Ubuntu 3.8.0-35-generic (13.04, Raring)
Glusterfs: 3.4.1-ubuntu1~raring1
qemu: 1.4.0+dfsg-1expubuntu4
libvirt0: 1.0.2-0ubuntu11.13.04.4
The gluster bricks are on xfs.

Regards, Paul Boven.


On 01/21/2014 03:25 PM, James wrote:

Are you using the qemu gluster:// storage or are you using a fuse
mounted file path?

I would actually expect it to work with either, however I haven't had
a chance to test this yet.

It's probably also useful if you post your qemu versions...

James

On Tue, Jan 21, 2014 at 9:15 AM, Paul Boven bo...@jive.nl wrote:

Hi everyone

We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis'
packages. We're using kvm (libvirt) to host guest installs, and thanks to
gluster and libvirt, we can live-migrate guests between the two hosts.

Recently I ran an apt-get update/upgrade to stay up-to-date with security
patches, and this also upgraded our glusterfs to the 3.4.1 version of the
packages.

Since this upgrade (which updated the gluster packages, but also the Ubuntu
kernel package), kvm live migration fails in a most unusual manner. The live