On Wed, Jun 16, 2010 at 1:44 PM, <patrik.rappo...@knapp.com> wrote:
>
> Hi,
>
> the other problems are still open, but I found another problem.
>
> We configured stonith so that it should power off one node. This didn't
> work and resulted in the following error message:
>
> "Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: ERROR: log_operation:
> Operation 'poweroff' [12614] for host 'kxxxxkc2' with device
> 'kill-kxxxxkc2-fire-from-kxxxxkc1' returned: 1 (call 0 from (null))
> Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: ERROR: stonith_command:
> Unknown st_fence reply from kxxxxkc1
> Jun 16 09:00:38 kxxxxkc1 stonith-ng: [3871]: WARN: log_data_element:
> stonith_command: UnknownOp <st-reply
> st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_fence"
> st_remote_op="e814c6ce-41f3-4e8b-b5b6-301d8056f37b" st_callid="0"
> st_callopt="0" st_rc="1" st_output="failed: unrecognised action: poweroff "
> src="kxxxxkc1" seq="60" />"
>
> We played around a little and extended the external ibmrsa plugin with an
> echo that shows us the value the ibmrsa plugin gets from stonithd. We
> found out that it doesn't get an "off" message as it should.

Hmmm. Could you include an hb_report for this, please? I'd need to see more
than just those two log lines.

> But if you configure reboot (the default value) for stonith, the ibmrsa
> plugin gets the reset value, as it should, and reboots the faulty node.
>
> So I guess this is probably a bug in stonithd, because it can't
> handle the poweroff command.
>
> For our needs we changed the condition in the plugin, so that the reset
> value issues the mpcli command to power off the node.
>
> Is this a known issue? We didn't find anything about it.
>
> Does anyone have a clue about our other problems? --> 1. fiber channel
> connection loss to the storage, 2. a hang when re-enabling a ring with
> corosync-cfgtool -r.
>
> thx for replies.
>
> kr
>
>
> Mit freundlichen Grüßen / Best Regards
>
> Patrik Rapposch
> System Administration
>
> KNAPP Systemintegration GmbH
> Waltenbachstraße 9
> 8700 Leoben, Austria
> Phone: +43 3842 805-915
> Fax: +43 3842 82930-500
> peter.wratit...@knapp.com
> www.KNAPP.com
>
> Commercial register number: FN 138870x
> Commercial register court: Leoben
>
> The information in this e-mail (including any attachment) is confidential
> and intended to be for the use of the addressee(s) only. If you have
> received the e-mail by mistake, any disclosure, copy, distribution or use
> of the contents of the e-mail is prohibited, and you must delete the e-mail
> from your system. As e-mail can be changed electronically KNAPP assumes no
> responsibility for any alteration to this e-mail or its attachments. KNAPP
> has taken every reasonable precaution to ensure that any attachment to this
> e-mail has been swept for virus. However, KNAPP does not accept any
> liability for damage sustained as a result of such attachment being virus
> infected and strongly recommend that you carry out your own virus check
> before opening any attachment.
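The mismatch described above (the device is asked to "poweroff", while a heartbeat-style external plugin only knows "off", yet "reboot" arrives as "reset") can be illustrated with a minimal Python sketch. It assumes the external plugin convention of dispatching on an action name such as on, off, reset or status; the alias table and all function names are hypothetical and are not taken from ibmrsa, stonithd or stonith-ng.

```python
# Hypothetical sketch only: this is NOT the ibmrsa plugin or stonith-ng code.
# It assumes an external-style plugin that dispatches on an action name.

# Actions this hypothetical plugin implements itself.
KNOWN_ACTIONS = {"on", "off", "reset", "status"}

# Hypothetical alias table: requested names mapped onto the plugin's own.
ACTION_ALIASES = {"poweroff": "off", "reboot": "reset"}


def normalize_action(action):
    """Map a requested action onto one the plugin knows, if an alias exists."""
    action = action.strip().lower()
    return ACTION_ALIASES.get(action, action)


def run_action(action, host, use_aliases=True):
    """Return (rc, output) the way a plugin's case statement might.

    With use_aliases=False, "poweroff" is rejected with rc 1, which mirrors
    the st_rc="1" / "unrecognised action: poweroff" reply in the log above.
    """
    requested = normalize_action(action) if use_aliases else action
    if requested not in KNOWN_ACTIONS:
        return 1, "failed: unrecognised action: %s" % action
    # A real plugin would talk to the management module (e.g. via mpcli)
    # here; this sketch only reports what it would do.
    return 0, "would run '%s' for host %s" % (requested, host)


if __name__ == "__main__":
    print(run_action("poweroff", "kxxxxkc2", use_aliases=False))
    print(run_action("poweroff", "kxxxxkc2"))
    print(run_action("reboot", "kxxxxkc2"))
```

In this sketch the alias is accepted inside the plugin, so "reset" still reboots; that differs from the workaround described above, where reset was changed to power the node off.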
>
> From: patrik.rappo...@knapp.com
> Date: 15.06.2010 11:11
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
> Subject: [Pacemaker] UPDATE...2 node cluster with clvm, configuration help needed...
>
> Hi guys,
>
> my colleague gave me a tip: when node 2 is offline, the stonith resource
> on node 1 won't work because of a failed state (it can't reach the ASM
> module of node 2), and so the other resources (vg, lv) can't start.
> Based on this, I modified the ibmrsa plugin in the following way:
>
> I changed the return value of "/usr/lib64/stonith/plugins/external/ibmrsa"
> in line 44 to 0, so that the stonith device no longer ends up in a failed
> state and the remaining node (node 1) can start the resources.
>
> So this problem is fixed for our needs.
>
> The other question concerning the storage is still open.
>
> Further, I noticed that a node losing the connection to the gateway (ping
> resource) is not a problem in itself, but what happens afterwards is: when
> the connection is up again, the ring stays faulty and won't recover, not
> even when I manually try to clear the fault with "corosync-cfgtool -r". I
> have also opened a call with Novell concerning this problem.
>
> The strace output from "corosync-cfgtool -r" can be found in the
> attachment.
>
> (See attached file: strace_output_corosync-cfgtool_-r.txt)
>
> thx for replies.
>
> kr patrik
>
> From: patrik.rappo...@knapp.com
> Date: 15.06.2010 09:12
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
> Subject: [Pacemaker] 2 node cluster with clvm, configuration help needed...
>
> Hi,
>
> as I told you, I am going to test the clvm cluster with the new service
> packs for SLES11 and the HA edition.
>
> The versions in there are the following:
> "pacemaker-1.1.2-0.2.1"
> "corosync-1.2.1-0.5.1"
> "openais-1.1.2-0.5.19".
>
> The problem that only one ring is supported by the dlm is now gone, and I
> have it running with 2 rings right now.
>
> With a ping resource included, the loss of connection is also covered and
> works fine.
>
> The only problem I have is when I power off the node which holds the
> volume group and logical volume resources: the resources in the cluster
> go into an unclean state (stonith, vg, lv resources).
> The failover of the resources then doesn't work until the node gets power
> again. I suspect this has something to do with my stonith resource,
> because as soon as the ASM module gets power again, the failover of the
> resources to the running node works. We already updated the ASM module to
> the newest version, but this didn't help.
>
> Another question I have is the following: is it possible for the cluster
> to detect the loss of the fiber channel connection to the storage? (We are
> connected to the storage via FC switches and have 2 paths.) We tried
> pulling the fiber channel connection and could see that the volume group
> we defined fails. The group fails, but no failover happens, nor anything
> else.
>
> I have attached my configuration; maybe you can spot a configuration
> error. If you need log files, please tell me.
>
> Thx for your replies.
>
> kr patrik
>
> (See attached file: cib_150610_0909.xml)
>
> From: patrik.rappo...@knapp.com
> Date: 07.06.2010 07:44
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
> Subject: [Pacemaker] 2 node cluster with clvm, configuration help needed...
>
> Hi,
>
> thx for your answers.
> I tried modifying the crm file, but didn't get any new output. I wanted to
> use the openSUSE packages because they were newer than the SLES11 packages
> in the HAE extension.
>
> Novell has finally made SP1 for SLES11 and the HAE extension available.
> I'll download it and try it out in the next few hours; I hope it works
> with the new versions.
> We'll see, I'll let you know.
>
> thx.
>
> kr patrik ;)
>
> From: Dejan Muhamedagic <deja...@fastmail.fm>
> Date: 04.06.2010 13:14
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] 2 node cluster with clvm, configuration help needed...
>
> On Fri, Jun 04, 2010 at 10:03:09AM +0200, Dejan Muhamedagic wrote:
>> On Thu, Jun 03, 2010 at 07:57:59AM +0200, Andrew Beekhof wrote:
>> > On Wed, Jun 2, 2010 at 1:25 PM, <patrik.rappo...@knapp.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > thx for your reply.
>> > >
>> > > I installed python-curses and xml, but it didn't help.
>> >
>> > Dejan? Thoughts?
>>
>> For whatever reason "import crm.main" fails. Patrik, could you
>> remove the try/except around it (in /usr/sbin/crm) and try again?
>> Perhaps it'll show a more specific error message.
>
> Looking at the code again, it is most probable that the package
> just can't be used on SLES, i.e. that the python paths for
> modules differ. You can verify that with 'rpm -ql | grep /crm/'
> and compare the output to the paths from the error message.
>
> Thanks,
>
> Dejan
>
>> Otherwise, why do you want to install openSUSE 11.0 packages on
>> SLES11? It probably won't work, and in any case you definitely won't
>> get any support for that.
>>
>> Thanks,
>>
>> Dejan
>>
>> > > Yeah, first we used the HAE extension, but since you told us that the
>> > > versions we use are really old and this could be the problem, we tried
>> > > to upgrade to newer versions to get it running.
>> > >
>> > > Is there maybe another way to get it running with newer versions?
>> >
>> > Was there nothing newer from yum?
>> > I'm pretty sure the packages have been updated since then.
>> >
>> > > Or could you please have a look at my config, which I had in the old
>> > > running versions? I'm reattaching it right now.
>> > >
>> > > thx.
>> > >
>> > > kr, patrik
>> > >
>> > > (See attached file: cib_aktuell.xml)
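Dejan's two suggestions above (let the real import error through instead of the generic abort, and compare where the crm package's files are installed with the search path printed in the error message) can be sketched in Python. This is a minimal, hypothetical helper, not part of the crm shell; the installed-file list stands in for the `rpm -ql` output he mentions, and the `/usr/lib/...` example path in it is made up for illustration.

```python
import importlib
import traceback


def try_import(name):
    """Import module `name`; return None on success, else the full traceback.

    Unlike a blanket try/except that prints a generic message, this keeps
    the original ImportError text, which usually names what is missing.
    """
    try:
        importlib.import_module(name)
        return None
    except ImportError:
        return traceback.format_exc()


def crm_dirs_missing_from_path(installed_files, search_path):
    """Return parent dirs of installed 'crm' package dirs not on search_path.

    A package directory .../X/crm/ is importable as `crm` only if .../X is
    on the module search path (sys.path / PYTHONPATH).
    """
    missing = []
    for path in installed_files:
        parts = path.rstrip("/").split("/")
        # Only look at files that live inside a directory called 'crm'.
        if "crm" not in parts[:-1]:
            continue
        parent = "/".join(parts[:parts.index("crm")]) or "/"
        if parent not in search_path and parent not in missing:
            missing.append(parent)
    return missing


if __name__ == "__main__":
    # Search path copied from the "couldn't find crm libraries" error.
    search_path = [
        "/usr/sbin", "/usr/lib/python26.zip", "/usr/lib64/python2.6",
        "/usr/lib64/python2.6/plat-linux2", "/usr/lib64/python2.6/lib-tk",
        "/usr/lib64/python2.6/lib-old", "/usr/lib64/python2.6/lib-dynload",
        "/usr/lib64/python2.6/site-packages",
        "/usr/lib64/python2.6/site-packages/Numeric",
        "/usr/local/lib64/python2.6/site-packages",
        "/usr/lib64/python2.6/site-packages/gtk-2.0",
    ]
    # Hypothetical rpm -ql output; the real package paths are not known here.
    installed = [
        "/usr/sbin/crm",
        "/usr/lib/python2.6/site-packages/crm/__init__.py",
        "/usr/lib/python2.6/site-packages/crm/main.py",
    ]
    print(crm_dirs_missing_from_path(installed, search_path))
    print(try_import("crm.main") or "crm.main imported fine")
```

A non-empty result from `crm_dirs_missing_from_path` would support Dejan's guess that the package installs its modules somewhere this Python does not look.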
>> > >
>> > > From: Andrew Beekhof <and...@beekhof.net>
>> > > Date: 02.06.2010 12:53
>> > > To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
>> > > Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
>> > > Subject: Re: [Pacemaker] Reply: Re: Reply: Re: 2 node cluster with clvm, configuration help needed...
>> > >
>> > > On Wed, Jun 2, 2010 at 7:50 AM, <patrik.rappo...@knapp.com> wrote:
>> > >>
>> > >> Hi,
>> > >>
>> > >> so yesterday I tried to update to a newer version. I am using SLES11.
>> > >> At least it worked with the openSUSE 11.0 repo
>> > >> (http://www.clusterlabs.org/rpm/opensuse-11.0/x86_64/) and one
>> > >> additional library, which I got as an rpm.
>> > >>
>> > >> The problem I have now is that if I want to run the crm command, I get
>> > >> the following error:
>> > >>
>> > >> "abort: couldn't find crm libraries in [/usr/sbin /usr/lib/python26.zip
>> > >> /usr/lib64/python2.6 /usr/lib64/python2.6/plat-linux2
>> > >> /usr/lib64/python2.6/lib-tk /usr/lib64/python2.6/lib-old
>> > >> /usr/lib64/python2.6/lib-dynload /usr/lib64/python2.6/site-packages
>> > >> /usr/lib64/python2.6/site-packages/Numeric
>> > >> /usr/local/lib64/python2.6/site-packages
>> > >> /usr/lib64/python2.6/site-packages/gtk-2.0]
>> > >> (check your install and PYTHONPATH)"
>> > >>
>> > >> I don't know exactly which libraries it is searching for,
>> > >
>> > > you might be missing python-curses and python-xml
>> > >
>> > >> I tried adding some directories to my PYTHONPATH, but had no success.
>> > >> The next thing I noticed was that it now works with corosync (I had to
>> > >> configure it) instead of openais, and that the GUI has completely
>> > >> disappeared, so I have no commands like "crm_gui" or "hb_gui".
>> > >
>> > > Since you're on SLES, have you thought about using the HAE extension?
>> > > It has all the above plus the GUI.
>> > >
>> > >>
>> > >> Do you maybe know how to fix this, or do you know a successful way to
>> > >> get a newer version onto SLES11? The service pack for SLES11 should be
>> > >> available today, but they haven't made it available yet, so I don't
>> > >> know if there is also an HAE SP1 with newer versions in it.
>> > >>
>> > >> Thx for your help.
>> > >>
>> > >> From: Andrew Beekhof <and...@beekhof.net>
>> > >> Date: 31.05.2010 08:46
>> > >> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
>> > >> Reply-To: The Pacemaker cluster resource manager <pacema...@oss.clusterlabs.org>
>> > >> Subject: Re: [Pacemaker] Reply: Re: 2 node cluster with clvm, configuration help needed...
>> > >>
>> > >> On Mon, May 31, 2010 at 8:37 AM, <patrik.rappo...@knapp.com> wrote:
>> > >>>
>> > >>> Hi,
>> > >>>
>> > >>> thx for your reply.
>> > >>> OK, I'll try that in the next few hours.
>> > >>>
>> > >>> Is there any other possible reason for such strange behaviour in
>> > >>> the cluster?
>> > >>>
>> > >>> To briefly redescribe the main problem:
>> > >>>
>> > >>> Failover between the nodes works fine; the resources get started on
>> > >>> the remaining node (let it be node2).
>> > >>> When node1 comes back online, the resources on node2 get stopped
>> > >>> and started again on node2. ---> very strange. I have already tried
>> > >>> a lot, but didn't find a solution.
>> > >>>
>> > >>> So the failback has a bug.
>> > >>
>> > >> Or it did a year ago, when 1.0.3 was out... you might have more luck
>> > >> with something a little more recent.
>> > >>
>> > >> _______________________________________________
>> > >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >>
>> > >> Project Home: http://www.clusterlabs.org
>> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf