Re: [Linux-HA] samba lsb script
On Thu, Jul 31, 2008 at 19:51, Serge Dubrouski [EMAIL PROTECTED] wrote:

> One more thing to learn about Pacemaker :-) It looks like it runs the monitor/status action for all configured resources before trying to start any of those resources. Then modifying that init script is your only option.

100% correct

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] samba lsb script
On Thu, Jul 31, 2008 at 15:41, Thibaut Perrin [EMAIL PROTECTED] wrote:

> Why don't you put the samba and drbd resources in a resource group, as the samba will always be launched AFTER the drbd resource and filesystem?

because we also check the status of resources _before_ we start anything (doing so afterwards would defeat the point of doing so)
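Since status is checked before anything is started, an init script's status action has to give a sensible answer even when the service's data (for example a DRBD-backed filesystem) is not available yet. A minimal sketch, assuming a pid-file based daemon: `lsb_status` and the pid file path are hypothetical names, not taken from the samba script discussed in this thread. The exit codes follow the LSB convention (0 = running, 1 = dead but pid file exists, 3 = not running).

```sh
# Hypothetical helper: map a pid file to an LSB status exit code.
lsb_status() {
    pidfile=$1
    # No pid file at all (e.g. its filesystem is not mounted yet):
    # report "not running" (3) instead of failing with an error.
    [ -f "$pidfile" ] || return 3
    pid=$(cat "$pidfile" 2>/dev/null)
    # Pid file exists but is empty: dead, but pid file exists.
    [ -n "$pid" ] || return 1
    # kill -0 only tests whether we may signal the process; it sends nothing.
    # Caveat: this also fails for a live process owned by another user.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Usage inside an init script (path is an assumption):
#   status) lsb_status /var/run/demo-daemon.pid; exit $? ;;
```

The point of the sketch is only that a stopped resource must report 3 rather than an error, so that the pre-start status check sees a cleanly stopped service.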
Re: [Linux-HA] resource keep restarting on standby node
On Fri, Aug 1, 2008 at 03:35, jijun gao [EMAIL PROTECTED] wrote:

> hi, Andreas
>
>> very short interval and timeout
>>
>>> Jul 31 16:24:37 node2 last message repeated 9 times
>>> Jul 31 16:24:37 node2 setroubleshoot: SELinux is preventing ifconfig (ifconfig_t) read write to socket:[136168] (initrc_t). For complete SELinux messages. run sealert -l 0db84664-2bd3-4f8f-a10e-1e0641417484
>>
>> hmmm ... I'm not familiar with SELinux, but that looks suspicious to me. I assume on node1 SELinux is disabled?
>
> actually, on node1 SELinux is enabled, but I don't find similar log information on node1. anyway, the two nodes don't have completely the same software environment, and I disabled SELinux on node2.
>
>>> Jul 31 16:24:37 node2 lrmd: [29544]: WARN: asterisk_2:monitor process (PID 23374) timed out (try 1). Killing with signal SIGTERM (15).
>>
>> ... and because of the monitoring timeout the resource is declared dead and restarted.
>
> you got it. when I set timeout=10, resources don't restart as they used to. but I am still not quite sure what timeout means.

it means that the operation has 10s to complete before we assume it failed, and the operation is performed every {interval} seconds.

> here is my understanding: the monitor action is actually a process that runs again and again, the process takes some time to execute, and every interval a new process runs. Is that true?
>
> still, there is something else I don't understand. why does the 'restarting' only happen on the standby node? (as far as I know, it has nothing to do with SELinux)
>
> Thanks a bunch
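To make the difference between timeout and interval concrete, here is a rough sketch of a single monitor run with a time limit. It is an illustration only, not how lrmd is implemented: `run_monitor_once` is a hypothetical name, and the sketch assumes the coreutils `timeout` command is installed (it exits with 124 when the limit is hit).

```sh
# Hypothetical helper: run one monitor command with a time limit in seconds.
run_monitor_once() {
    limit=$1; shift
    rc=0
    # coreutils timeout sends SIGTERM once the limit expires and exits 124.
    timeout "$limit" "$@" || rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "timed out" ;;
        *)   echo "failed rc=$rc" ;;
    esac
}

# Rough approximation of "every {interval} seconds" (values are assumptions):
#   while :; do
#       run_monitor_once 10 /etc/init.d/asterisk status
#       sleep 30
#   done
```

With a limit of 10, a status check that needs a few seconds on a loaded node still reports "ok"; with a very short limit the same check is reported as timed out, which matches the lrmd warning quoted above.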
Re: [Linux-HA] colocation constraint dependencies
On Wed, Jul 30, 2008 at 19:18, daniel peess [EMAIL PROTECTED] wrote:

> hello andreas,
>
> On Wed, Jul 30, 2008 at 11:04:10AM +0200, Andreas Kurz wrote:
>
>> Ok .. I see. Try to set the 'default-resource-stickiness' to a positive value and give each of your groups a different 'priority'. That should do the trick.
>
> setting the 'default-resource-stickiness' to a positive value now prevents the restart behavior when a node returns, thanks.
>
> but this is only half of a workaround for the problem below. this doesn't help if you crash/standby/stop a node. resources that were running on this node are pushing away other resources, although both of their scores are equal/unset. if other free nodes are available the failing resource should start there, and if none are available it shouldn't start at all. instead heartbeat restarts resources depending on the colocation constraints, although those should just distribute the resources across all nodes.
>
> again, all groups shall be treated equal if they have the same score, the failure of one shall never affect other ones. IMHO this is a bug.

then please submit one
Re: [Linux-HA] Resources starting twice
On 2008-07-31T16:44:31, Angel Rengifo Cancino [EMAIL PROTECTED] wrote:

> Yep, it's because I'm first trying to understand very well heartbeat 1.x before learning 2.x style. Using haresources it seems easier for my simple requirements.

That's not necessarily helpful, as v2 is very different, and knowledge from v1 almost does not apply at all.

The errors you have are very likely related to the LSB scripts not being quite LSB compliant. However, v1 and even v2 have some scenarios which might cause the start or even stop action to be issued twice. The scripts must be able to handle this, as they are defined to be idempotent.

Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
Re: [Linux-HA] Resources starting twice
Thanks Lars and Michael:

The squid script from CentOS 5.2 wasn't working correctly when started twice. I edited /etc/init.d/squid and now starting twice always returns code 0. Now heartbeat doesn't give up when it tries to start an already running service. I'll check every init script before using it with heartbeat.

Hmm, this is a different question: do I really need to start/stop services with haresources? Why can't I simply keep my services always running (chkconfig services on)? Is it not enough to move the IP alias between nodes?

On Fri, Aug 1, 2008 at 5:00 AM, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:

> That's not necessarily helpful, as v2 is very different, and knowledge from v1 almost does not apply at all.
>
> The errors you have are very likely related to the LSB scripts not being quite LSB compliant. However, v1 and even v2 have some scenarios which might cause the start or even stop action to be issued twice. The scripts must be able to handle this, as they are defined to be idempotent.
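For anyone checking their own init scripts, the idempotent behaviour described here (a second start or stop still returns 0) can be sketched as follows. This is an illustration under assumptions, not the actual CentOS squid script: `STATE_FILE`, `is_running`, `start` and `stop` are hypothetical stand-ins, and a real script would check a pid or process instead of a marker file.

```sh
# Hypothetical marker file standing in for a real "is the daemon up?" check.
STATE_FILE=${STATE_FILE:-/tmp/demo-service.running}

is_running() {
    [ -f "$STATE_FILE" ]
}

start() {
    if is_running; then
        echo "already running"
        return 0          # starting twice is success, not an error
    fi
    : > "$STATE_FILE"     # stand-in for launching the daemon
    echo "started"
}

stop() {
    if ! is_running; then
        echo "already stopped"
        return 0          # stopping twice is success as well
    fi
    rm -f "$STATE_FILE"   # stand-in for terminating the daemon
    echo "stopped"
}

# Usage: calling start twice, then stop twice, returns 0 every time.
```

The design choice is simply to test the current state first and treat "already in the requested state" as success.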
Re: [Linux-HA] mgmtd not starting on opensuse 11 i386 (unresolved symbol)
Dejan Muhamedagic [EMAIL PROTECTED] wrote:
> On Wed, Jul 30, 2008 at 08:55:53AM +0200, Sebastian Reitenbach wrote:
>> General Linux-HA mailing list linux-ha@lists.linux-ha.org wrote:
>>> On Mon, Jul 28, 2008 at 05:52:10PM -, root wrote:
>>>> Dejan Muhamedagic [EMAIL PROTECTED] wrote:
>>>>> On Mon, Jul 28, 2008 at 04:41:27PM +0200, Sebastian Reitenbach wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I just upgraded my desktop to opensuse 11.0 i586, and updated the box, then installed the heartbeat rpm's 2.1.3 from download.opensuse.org. I've these rpm's installed right now:
>>>>>>
>>>>>> pacemaker-heartbeat-0.6.5-8.2
>>>>>> heartbeat-common-2.1.3-23.1
>>>>>> heartbeat-resources-2.1.3-23.1
>>>>>> heartbeat-2.1.3-23.1
>>>>>> pacemaker-pygui-1.4-1.3
>>>>>>
>>>>>> I've added these lines to /etc/ha.d/ha.cf to start mgmtd automatically:
>>>>>>
>>>>>> apiauth mgmtd uid=root
>>>>>> respawn root /usr/lib/heartbeat/mgmtd -v
>>>>>>
>>>>>> but mgmtd fails to start. when I try to start it on the commandline, I see the following output:
>>>>>>
>>>>>> /usr/lib/heartbeat/mgmtd: symbol lookup error: /usr/lib/libpe_status.so.2: undefined symbol: stdscr
>>>>>>
>>>>>> As far as I researched now, the stdscr symbol is expected to come from ncurses?
>>>>>
>>>>> Looks like a dependency problem. Does the package containing mgmtd depend on the ncurses library? Though I don't understand why mgmtd needs ncurses.
>>>>
>>>> I found this out, in a thread in some m/l, regarding the error message about the undefined symbol, but maybe this is just wrong.
>>>
>>> stdscr is an external variable defined in ncurses.h which is included from ./lib/crm/pengine/unpack.h which is part of the code that gets built in libpe_status. The pacemaker rpm, which includes that library, does depend on libncurses. Is that the case with the pacemaker you downloaded?
>>
>> I've these installed:
>>
>> rpm -qa | grep -i ncurs
>> ncurses-utils-5.6-83.1
>> libncurses5-5.6-83.1
>> yast2-ncurses-pkg-2.16.14-0.1
>> yast2-ncurses-2.16.27-8.1
>>
>> rpm -q --requires pacemaker-heartbeat
>> /bin/sh
>> /bin/sh
>> /sbin/ldconfig
>> /sbin/ldconfig
>> rpmlib(PayloadFilesHavePrefix) = 4.0-1
>> rpmlib(CompressedFileNames) = 3.0.4-1
>> /bin/sh
>> /usr/bin/python
>> libbz2.so.1
>> libc.so.6
>> libc.so.6(GLIBC_2.0)
>> libc.so.6(GLIBC_2.1)
>> libc.so.6(GLIBC_2.1.3)
>> libc.so.6(GLIBC_2.2)
>> libc.so.6(GLIBC_2.3)
>> libc.so.6(GLIBC_2.3.4)
>> libc.so.6(GLIBC_2.4)
>> libccmclient.so.1
>> libcib.so.1
>> libcrmcluster.so.1
>> libcrmcommon.so.2
>> libdl.so.2
>> libgcrypt.so.11
>> libglib-2.0.so.0
>> libgnutls.so.26
>> libgnutls.so.26(GNUTLS_1_4)
>> libgpg-error.so.0
>> libhbclient.so.1
>> liblrm.so.0
>> libltdl.so.3
>> libm.so.6
>> libncurses.so.5
>> libpam.so.0
>> libpam.so.0(LIBPAM_1.0)
>> libpcre.so.0
>> libpe_rules.so.2
>> libpe_status.so.2
>> libpengine.so.3
>> libplumb.so.1
>> librt.so.1
>> libstonithd.so.0
>> libtransitioner.so.1
>> libxml2.so.2
>> libz.so.1
>> rpmlib(PayloadIsLzma) = 4.4.2-1
>>
>> rpm -ql libncurses5-5.6-83.1
>> /lib/libncurses.so.5
>> /lib/libncurses.so.5.6
>> ...
>>
>> so it does require ncurses, and ncurses is installed. but:
>>
>> nm /lib/libncurses.so.5.6
>> nm: /lib/libncurses.so.5.6: no symbols
>
> That's fine, it means that the binary is stripped. If you take a look at libncurses.a (which is probably only in the development package), you should see some symbols. BTW, you can also try objdump with -T:
>
> $ objdump -T libncurses.so.5 | grep stdscr
> 0015a630 gDO .bss 0008 Base stdscr

here I have:

objdump -T /lib64/libncurses.so.5 | grep stdscr
002465e8 gDO .bss 0008 Base stdscr

Meanwhile I observed the problem on an opensuse 10.3 i386 and on an opensuse 11 x86_64 too. Seems like there is a general problem with this version.

kind regards
Sebastian
Re: [Linux-HA] external/ipmi problems
Hi Brock Palen,

Good, finally there's someone who got the same thing as me. I just don't know if there's any chance that stonith/external would turn a return value of 0 into 256, or whether ipmitool itself has bugs when doing a reset.

Brock, can I ask your machine type and model? I met some non-zero return values when using ipmitool reset on some HP ProLiant DL140/145 servers.

Regards,
Chun Tian (binghe)

> I have had some luck getting STONITH to work but am still running into a problem I can not figure out how to debug.
>
> In the ha.cf on each host I have:
>
> stonith_host mds2.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu root PASSWORD
> stonith_host mds1.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD
>
> Now heartbeat does try to kill the node where I kill heartbeat. In the log I see:
>
> heartbeat[12013]: 2008/07/31_15:47:56 info: Resetting node mds1.engin.umich.edu with [IPMI STONITH device]
> heartbeat[12013]: 2008/07/31_15:47:57 info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset mds1.engin.umich.edu' returned 256
> heartbeat[12013]: 2008/07/31_15:47:57 ERROR: glib: external_reset_req: 'ipmi reset' for host mds1.engin.umich.edu failed with rc 256
>
> I can run:
>
> stonith -t external/ipmi -p mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD -T reset mds1.engin.umich.edu
>
> and the dead node will restart. So from the documentation of 1.x style configs I am not sure where to debug why the stonith_host lines do not work.
>
> mds1 and mds2 are the nodes of the cluster, mds1-m and mds2-m are the hostnames of the IPMI devices which have lan configs set up.
>
> Note how stonith from the cmd line works just fine, just not in heartbeat.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> [EMAIL PROTECTED]
> (734)936-1985
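On the question of whether 0 could turn into 256: one possible reading, and it is only an assumption about how the plugin runner logs its result, is that 256 is a raw wait status as returned by system() or wait() rather than the script's own exit code. Under the traditional Linux encoding the exit code sits in the high byte, so 256 would mean exit code 1, not 0. A small sketch (`decode_wait_status` is a hypothetical helper, simplified to ignore the "stopped" case):

```sh
# Hypothetical helper: decode a raw wait status using the traditional
# Linux layout (low 7 bits = terminating signal, next byte = exit code).
decode_wait_status() {
    status=$1
    sig=$(( status & 127 ))
    if [ "$sig" -eq 0 ]; then
        echo "exited with code $(( (status >> 8) & 255 ))"
    else
        echo "killed by signal $sig"
    fi
}

decode_wait_status 256   # prints: exited with code 1
```

If that reading is right, the plugin itself returned a generic failure (1), which would fit a configuration problem rather than a number-parsing bug.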
Re: [Linux-HA] external/ipmi problems
I did figure it out. The problem is that some of the docs out there are not very clear. I will blog this at www.mlds-networks.com if you need it later, or just look in the mailing list archives.

The format is:

stonith_host HOST SENDING external/ipmi HOST TO CONTROL

So what I really needed:

stonith_host mds2.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu USER PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu USER PASSWORD

Notice how the first host, the second host and the IPMI host are different. The first line tells mds2 how to kill mds1 using the mds1-m IPMI device. The second one tells mds1 how to kill mds2 using mds2-m, etc.

I hope that helps. Good luck,

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Aug 1, 2008, at 8:37 AM, Chun Tian (binghe) wrote:

> Hi Brock Palen,
>
> Good, finally there's someone who got the same thing as me. I just don't know if there's any chance that stonith/external would turn a return value of 0 into 256, or whether ipmitool itself has bugs when doing a reset.
>
> Brock, can I ask your machine type and model? I met some non-zero return values when using ipmitool reset on some HP ProLiant DL140/145 servers.
>
> Regards,
> Chun Tian (binghe)
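Because the argument order is easy to get wrong, the two lines can be generated rather than typed. A small sketch based only on the format described in this message; `stonith_host_line` is a hypothetical helper, and USER and PASSWORD are placeholders.

```sh
# Hypothetical helper: print one ha.cf stonith_host line for external/ipmi.
#   $1 = node that performs the reset (host sending)
#   $2 = node to be reset (host to control)
#   $3 = IPMI hostname of the node to be reset
#   $4 = IPMI user, $5 = IPMI password
stonith_host_line() {
    printf 'stonith_host %s external/ipmi %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Each node gets the line that describes how to reset the *other* node.
stonith_host_line mds2.engin.umich.edu mds1.engin.umich.edu mds1-m.engin.umich.edu USER PASSWORD
stonith_host_line mds1.engin.umich.edu mds2.engin.umich.edu mds2-m.engin.umich.edu USER PASSWORD
```

The two calls print the same pair of lines as the corrected configuration above.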