Re: [Gluster-users] Problem with creating constraints
Oops, sorry, wrong list. Please forget it.

Uwe

___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Problem with creating constraints
Hello List,

I have a problem creating a constraint and hope that someone can help me and give me a hint. I have three resources (A, B, C) and two cluster nodes (node0, node1). Resource A can run only on node0, and resource B can run only on node1; I managed this with a resource-stickiness of INFINITY for each resource. Resource C can run on both nodes, but on node0 only if resource A is up and running, and on node1 only if resource B is up and running. Additionally, resource C may only start after resource A or B. How do I have to create the resource order constraints and resource location constraints to meet these requirements for resource C?

Thx in advance
Uwe
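A rough sketch of this setup in crm shell syntax (Pacemaker 1.0/1.1 era) might look as follows. The resource and node ids are the ones from the question; the scores are illustrative. Note that a plain colocation cannot express a strict "run with A or with B" condition: the two finite-score colocations below only make C prefer a node where A or B is running, they do not forbid other placements, so the exact semantics should be checked against the Pacemaker documentation for your version.

```
# Pin A and B to their nodes (an alternative to stickiness):
location loc-A A inf: node0
location loc-B B inf: node1
# Make C prefer a node with a running A or B (advisory, not mandatory):
colocation col-C-A 1000: C A
colocation col-C-B 1000: C B
# Advisory ordering (score 0): applied only when both resources
# are being started, loosely matching "C starts after A or B":
order ord-A-C 0: A C
order ord-B-C 0: B C
```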
[Gluster-users] glusterfs, pacemaker and Filesystem RA
Hello List,

due to a mistake my post from yesterday was cut off. That is why I am sending it again as a new thread. I hope it will work this time.

< Original posted mail starts here >

Hello Marcel, hello Samuel,

sorry for my late answer, but I was away for two months and could only continue my tests last week. First of all, thank you for your patch of the Filesystem RA. It works like a charm, but I have some small remarks.

What I found out is that the filesystem access test via OCF_CHECK_LEVEL does not work with glusterfs. If I use the nvpair OCF_CHECK_LEVEL with a value of 10, I get an err_message with the content:

'192.168.51.1:/gl_vol0 is not a block device, monitor 10 is noop'

If I use the nvpair OCF_CHECK_LEVEL with a value of 20, I get an err_message with the content:

'ERROR: dd said: dd: opening `/virtfs0/.Filesystem_status/res_glusterfs_sp0:0_0_vmhost1': Invalid argument'

After that the resource tries to restart permanently. Unfortunately I am not familiar enough with scripting to fix it myself and contribute the change.

Another item I would like to discuss is a bit more general. As Samuel pointed out, the Filesystem RA (with the native client) needs the gluster node it connects to (via the device attribute of the Filesystem RA) up and running only on startup of the client. After that, the native client detects by itself whether a gluster node is gone. This is correct so far, but in my setup it could be a SPOF. I would like to build a cluster of three machines (A, B, C) and start a Filesystem RA clone on all three cluster nodes. Each of these nodes is a glusterfs server offering a replicated glusterfs share and also a glusterfs client which mounts that share (from server A initially). If all three servers are up, there is no problem. Even if one of the servers goes down, everything still works fine.
But if the node the clients connected to on startup (server A) crashes and I afterwards need to reboot one of the remaining servers (B or C), that server cannot reconnect as a client because node A is still down. From my point of view, a solution could be to allow several nvpairs (e.g. glusterhost1=IPorName1ofNodeA:/glustervolume, glusterhost2=IPorName2ofNodeB:/glustervolume, ... glusterhostN=IPorNameNofNodeN:/glustervolume). These pairs could be pinged and/or tested before the Filesystem RA tries to connect to them. If one of these nodes is not reachable or does not respond to the connection attempt, the RA could try a connection with the next nvpair.

Background: I would like to build an openais/pacemaker cluster consisting of three nodes. On each node should run a gluster server providing a replicated glusterfs share, a glusterfs client (Filesystem RA clone) connected to this share, and one or more KVM VMs. For load reasons the VMs should be distributed over the cluster. If one of the servers crashes, the affected VMs shall fail over to the remaining nodes.

I hope I was able to explain my concerns and that you or anybody else can give me a hint to solve my problem.

Thx in advance
Uwe

-Original Message-
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On behalf of Marcel Pennewiß
Sent: Monday, 18 July 2011 14:00
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] glusterfs and pacemaker

On Monday 18 July 2011 13:26:00 samuel wrote:
> I don't know from which version on but, if you use the native client
> for mounting the volumes, it's only required to have the IP active at
> the moment of mounting. After that, the native client will transparently
> manage node failures.

ACK, that's why we use this shared IP (e.g. for backup issues via nfs). AFAIR glusterFS retrieves the volfile (via the shared IP) and connects to the nodes.
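The fallback idea proposed above can be sketched as a small shell helper. This is hypothetical and not part of the real Filesystem RA; the hosts, volume, and probe command are illustrative. (Later GlusterFS releases also added a backupvolfile-server mount option for the native client that addresses the same problem, which may be worth checking for your version.)

```shell
#!/bin/sh
# Hypothetical sketch of the proposed nvpair fallback -- not part of the
# actual Filesystem RA. pick_server echoes the first candidate host for
# which the probe command succeeds.

pick_server() {
    probe=$1; shift            # probe is run as: $probe <host>
    for host in "$@"; do
        if $probe "$host" >/dev/null 2>&1; then
            echo "$host"
            return 0
        fi
    done
    return 1                   # no candidate reachable
}

# Example probe; a real agent might also test the glusterd TCP port.
ping_probe() { ping -c 1 -W 1 "$1"; }

# Usage (commented out -- needs a live cluster; IPs and the volume are
# the ones used elsewhere in this thread):
# server=$(pick_server ping_probe 192.168.50.1 192.168.50.2) &&
#     mount -t glusterfs "$server:/gl_vol1" /virtfs0
```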
Marcel
Re: [Gluster-users] glusterfs and pacemaker
Hello Marcel,

thank you very much for the patch, great job. It works on the first shot: mounting and migration of the filesystem work. So far I could not test a hard reset of a cluster node, because a colleague is currently using the cluster. I applied the following parameters:

Fstype: glusterfs
Mountdir: /virtfs
Glustervolume: 192.168.50.1:/gl_vol1

Maybe you can answer a question for my better understanding? My second node is 192.168.50.2, but in the Filesystem RA I have referenced 192.168.50.1 (see above). During my first test node1 was up and running, but what happens if node1 is completely away and the address is inaccessible?

Thx
Uwe

Uwe Weiss
weiss edv-consulting
Lattenkamp 14
22299 Hamburg
Phone: +49 40 51323431
Fax: +49 40 51323437
eMail: u.we...@netz-objekte.de

-Original Message-
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On behalf of Marcel Pennewiß
Sent: Sunday, 17 July 2011 17:37
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] glusterfs and pacemaker

On Friday 15 July 2011 13:12:02 Marcel Pennewiß wrote:
> > My idea is that pacemaker starts and monitors the glusterfs
> > mountpoints and migrates some resources to the remaining node if
> > one or more mountpoint(s) fails.
>
> For using mountpoints, please have a look at the OCF Filesystem agent.

Uwe informed me (via PM) that this didn't work - we had not used it until now. After some investigation you'll see that ocf::Filesystem does not detect/work with glusterfs shares :( A few changes are necessary to create basic support for glusterfs.

@Uwe: Please have a look at [1] and try to patch your "Filesystem" OCF script (which may be located in /usr/lib/ocf/resource.d/heartbeat).
[1] http://subversion.fem.tu-ilmenau.de/repository/fem-overlay/trunk/sys-cluster/resource-agents/files/filesystem-glusterfs-support.patch

best regards
Marcel
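For reference, the parameters Uwe reports could be expressed in crm shell syntax roughly as below. This is a sketch: the resource ids, operation timeouts, and the clone are illustrative, and fstype="glusterfs" only works once the patched Filesystem RA is installed.

```
primitive res_glusterfs ocf:heartbeat:Filesystem \
        params device="192.168.50.1:/gl_vol1" directory="/virtfs" \
               fstype="glusterfs" \
        op monitor interval="20s" timeout="60s"
# Run the mount on every cluster node:
clone cl_glusterfs res_glusterfs
```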
[Gluster-users] glusterfs and pacemaker
Hello List,

I have a running cluster (two nodes) with openais, pacemaker and glusterfs. Currently glusterfs is started and mounted manually. Now I would like to integrate glusterfs into this pacemaker environment, but I can't find a glusterfs resource agent for pacemaker. Has anyone an idea how to integrate glusterfs into pacemaker? My idea is that pacemaker starts and monitors the glusterfs mountpoints and migrates some resources to the remaining node if one or more mountpoints fail.

Thx in advance
ewuewu
[Gluster-users] glusterfs 3.2.1 nfs hangs on opensuse 11.4
Hi list,

I have some trouble with glusterfs 3.2.1, compiled from the tarball on two openSUSE 11.4 boxes. Compiling and installing glusterfs went fine. Afterwards I configured a distributed replicated volume named gluster_vol1 (the peers are of course connected to each other). As long as I only use the native gluster client, everything works fine and I am really happy with glusterfs. But if I try to use nfs, I run into trouble.

In detail: I have two hosts called vmhost1 (192.168.50.1) and vmhost2 (192.168.50.2). Glusterfs is started on both hosts, and on both I have mounted a directory named /nfs_mount with:

mount -t nfs -o mountproto=tcp 192.168.50.1:/gluster_vol1 /nfs_mount

Afterwards I can ls, df, du and so on in this directory on both hosts. Even copying small files works mostly, but not every time. But if I try to create big files with dd, for example [], the connection hangs after one or two minutes. Sometimes it helps to restart glusterfs on the mounting node; sometimes only a reset of the master node solves the problem. This happens when I run the copying or the creation of a big file on any node that has mounted the gluster nfs share, regardless of whether I run the command from a single node or concurrently from multiple nodes.
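Condensed into a script, the reproduction looks roughly like this. It is a sketch: the mount command, server, volume and mountpoint are taken from the report above, the 2 GB file size is illustrative, and a guard lets it exit harmlessly on a machine without the cluster.

```shell
#!/bin/sh
# Reproduction sketch for the reported NFS hang. Server, volume and
# mountpoint default to the values from the report.

repro() {
    server=${1:-192.168.50.1}
    volume=${2:-gluster_vol1}
    mnt=${3:-/nfs_mount}

    # Guard: needs root and an existing mountpoint directory.
    if [ "$(id -u)" -ne 0 ] || [ ! -d "$mnt" ]; then
        echo "skipping: needs root and an existing $mnt"
        return 0
    fi

    # Mount Gluster's built-in NFS server, forcing TCP for the mount
    # protocol as in the original report.
    mount -t nfs -o mountproto=tcp "$server:/$volume" "$mnt"

    # Small files usually succeed; a large sequential write is what
    # reportedly hangs after one or two minutes.
    dd if=/dev/zero of="$mnt/bigfile" bs=1M count=2048

    umount "$mnt"
}

repro "$@"
```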
Here is the logfile from my master node after starting glusterfs on both nodes:

cat etc-glusterfs-glusterd.vol.log
[2011-07-12 15:19:08.455046] I [glusterd.c:564:init] 0-management: Using /etc/glusterd as working directory
[2011-07-12 15:19:08.467129] E [rpc-transport.c:676:rpc_transport_load] 0-rpc-transport: /usr/local/lib/glusterfs/3.2.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2011-07-12 15:19:08.467151] E [rpc-transport.c:680:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2011-07-12 15:19:08.467164] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2011-07-12 15:19:08.475337] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 358770a7-d1ee-4fbd-8552-b354a2295d79
[2011-07-12 15:19:09.852048] E [glusterd-store.c:1567:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2011-07-12 15:19:09.852093] E [glusterd-store.c:1567:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2011-07-12 15:19:10.72382] I [glusterd-handler.c:3399:glusterd_friend_add] 0-glusterd: connect returned 0
[2011-07-12 15:19:10.74324] I [glusterd-utils.c:1092:glusterd_volume_start_glusterfs] 0-: About to start glusterfs for brick 192.168.50.1:/sp1/glusterfs
[2011-07-12 15:19:10.98987] I [glusterd-utils.c:2217:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2011-07-12 15:19:10.100043] I [glusterd-utils.c::glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2011-07-12 15:19:10.101123] I [glusterd-utils.c:2227:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
Given volfile:
+--- ---+
 1: volume management
 2: type mgmt/glusterd
 3: option working-directory /etc/glusterd
 4: option transport-type socket,rdma
 5: option transport.socket.keepalive-time 10
 6: option transport.socket.keepalive-interval 2
 7: end-volume
 8:
+--- ---+
[2011-07-12 15:19:10.115334] E [socket.c:1685:socket_connect_finish] 0-management: connection to failed (Connection refused)
[2011-07-12 15:19:10.193987] I [glusterd-pmap.c:237:pmap_registry_bind] 0-pmap: adding brick /sp1/glusterfs on port 24010
[2011-07-12 15:19:10.224800] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (192.168.50.1:1023)
[2011-07-12 15:21:39.983189] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (192.168.50.2:1021)
[2011-07-12 15:21:40.318042] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)

Here is my nfs.log before mounting anything:

cat nfs.log
[2011-07-12 15:19:10.213621] I [nfs.c:675:init] 0-nfs: NFS service started
[2011-07-12 15:19:10.213743] W [write-behind.c:3029:init] 0-gluster_vol1-write-behind: disabling write-behind for first 0 bytes
[2011-07-12 15:19:10.216610] I [client.c:1935:notify] 0-gluster_vol1-client-0: parent translators are ready, attempting connect on transport
[2011-07-12 15:19:10.220675] I [client.c:1935:notify] 0-gluster_vol1-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+--- ---+
 1: volume gluster_vol1-client-0
 2: type protocol/client
 3: option remote-host 192.168.50.1
 4: option remote-subvolume /sp1/g