[Linux-ha-dev] make rpm failed
gmake[2]: Entering directory `/home/gshi/linux-ha/mgmt/client'
msgfmt not found -o haclient.zh_CN.mo haclient.zh_CN.po
gmake[2]: msgfmt: Command not found
gmake[2]: *** [haclient.zh_CN.mo] Error 127

-Guochun

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
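The failure above means msgfmt (from GNU gettext) is not on PATH. A minimal pre-flight check before running "make rpm" could look like this; it is a sketch, and package names vary by distribution:

```shell
# check_tool NAME: succeed silently if NAME is on $PATH, otherwise
# print a hint and fail.  (gettext is the typical package providing
# msgfmt, but the exact package name is distribution-specific.)
check_tool() {
    command -v "$1" >/dev/null 2>&1 && return 0
    echo "$1 not found; install the package that provides it (e.g. gettext for msgfmt)" >&2
    return 1
}
```

Run it as `check_tool msgfmt` before the build to fail early with a readable message instead of an Error 127 deep inside gmake.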
[Linux-ha-dev] BSC failure due to CTSproxy.py permission
It comes from CTSproxy.py not being executable.

[EMAIL PROTECTED] cts]# /usr/bin/python /usr/lib/heartbeat/cts/CTSlab.py --bsc
Jul 24 11:29:35 Random seed is: 1153758575
Jul 24 11:29:35 BEGINNING 2 TESTS
Jul 24 11:29:35 HA configuration directory: /etc/ha.d
Jul 24 11:29:35 System log files: /var/log/ha-log-local7
Jul 24 11:29:35 Enable Stonith: 1
Jul 24 11:29:35 Enable Fencing: 1
Jul 24 11:29:35 Enable Standby: 1
Jul 24 11:29:35 Cluster nodes:
/bin/sh: line 1: /usr/lib/heartbeat/cts/CTSproxy.py: Permission denied
Traceback (most recent call last):
  File "/usr/lib/heartbeat/cts/CTSlab.py", line 749, in ?
    /usr/sbin/crm_uuid)
  File "/usr/lib/heartbeat/cts/CTS.py", line 204, in remote_py
    result.pop()
IndexError: pop from empty list

After I manually ran chmod +x CTSproxy.py, it works fine. I don't know how to change this in Makefile.am. Can someone fix it?

thanks
-Guochun
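One way to fix this in Makefile.am (a sketch; the actual directory variable names in the heartbeat tree may differ): automake installs files listed under a `_SCRIPTS` primary with `INSTALL_SCRIPT`, which sets the execute bit, whereas files listed under `_DATA` are installed non-executable.

```makefile
# Hypothetical cts/Makefile.am excerpt: listing CTSproxy.py under a
# SCRIPTS primary (rather than DATA) makes "make install" use
# INSTALL_SCRIPT, so the installed file keeps its execute permission.
# "ctsdir" is an illustrative name.
ctsdir      = $(libdir)/heartbeat/cts
cts_SCRIPTS = CTSproxy.py
```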
Re: [Linux-ha-dev] MAXMSG too small
Andrew Beekhof wrote:

On 5/29/06, Alan Robertson wrote:

Andrew Beekhof wrote:

Running CTS on 6 nodes has shown MAXMSG to be too small - the PE cannot send its transition graph and the cluster stalls indefinitely.

So, that means the CIB is 256K compressed? Or is it 256K uncompressed?

It's being added with ha_msg_addstruct_compress(msg, field, xml); and sent via IPC to the crmd (from the pengine). Whether it's actually been compressed or not, I don't know.

It should be compressed if you have specified a compression method in ha.cf. However, it would be good to have some proof that it is compressed. A message that is 256K after compression probably means the uncompressed one is 1M-2M.

Another approach that might be interesting is to provide an API with a much higher bound, suited for local usage only.

We could increase the value, but looking through the code this seems to be an artificial limitation to various degrees...

* In some cases it's used as a substitute for get_netstringlen(msg) - I believe these should be fixed.
* In some cases it's used to pre-empt checks by child functions - I believe these should be removed.

The two cases that seem to legitimately use MAXMSG are the HBcomm plugins and the decompression code (though even that could retry a couple of times with larger buffers).

Alan, can you please take a look at the use of MAXMSG in the IPC layer, which is really not my area of expertise (especially the HBcomm plugins), and verify that my assessment is correct (and possibly get someone to look at fixing it)?

Unfortunately, this means various buffers get locked into memory at this size. Our processes are already pretty huge. get_netstringlen() is an expensive call.

That's basically the tradeoff... either we increase MAXMSG and take a hit on the process size, or we do more dynamically and take a runtime hit. Not being a guru in the IPC layer, I don't know which is worse.

However, my suspicion was that get_(net)stringlen was not too bad for flat messages and would therefore be preferred.

Why do you think that predicting that child buffers will be too large is a bad idea? How do you understand that removing it will help?

For low values of MAXMSG I think it's fine to do that. But we keep upping the value, and allocating 256K for regular heartbeat packets seems like a real waste.

Is your concern related to compressed/uncompressed sizes?

As above. I'm doing my part and indicating that it can/should be compressed, but I don't know the internals well enough to say for sure.

Andrew, if you can send the log/debug file to me, I may (or may not) find some clue.

-Guochun
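Andrew's aside that the decompression code "could retry a couple of times with larger buffers" can be sketched as follows. This is a hypothetical helper, not heartbeat's actual API: instead of a fixed MAXMSG-sized allocation, the buffer is doubled until the data fits or a hard cap is hit.

```c
#include <stdlib.h>

/* Hypothetical "retry with a larger buffer" helper: double the buffer
 * size until need_len fits, giving up at a hard cap.  Returns NULL on
 * failure; on success the caller owns the returned buffer and
 * *out_len holds the size actually allocated. */
static char *alloc_grow_to_fit(size_t need_len, size_t start, size_t cap,
                               size_t *out_len)
{
    size_t len = start;

    while (len < need_len) {
        if (len >= cap)
            return NULL;    /* would exceed the hard upper bound */
        len *= 2;
    }
    *out_len = len;
    return malloc(len);
}
```

The tradeoff discussed in the thread is visible here: the process only pays for the memory a message actually needs, at the cost of occasional re-allocation.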
Re: [Linux-ha-dev] MAXMSG too small
Alan Robertson wrote:

[snip - quoted MAXMSG discussion from the previous message]

I think that MAXMSG is inappropriately used for the size of IPC messages - which would prevent messages from being sent in some cases.

Are you saying that there should be a higher limit, or no limit at all, for IPC-only messages? I think the message layer can provide another API for that.

-Guochun
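The "separate, higher limit for IPC-only messages" idea could look like the sketch below. The constants and the function are illustrative assumptions, not heartbeat's actual API: the wire limit stays at MAXMSG, while purely local IPC traffic gets a much larger bound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative limits: network messages keep the wire bound, local
 * IPC messages (e.g. pengine -> crmd transition graphs) get a much
 * higher one.  Values are placeholders, not heartbeat's real limits. */
#define MAXMSG_WIRE  (256 * 1024)
#define MAXMSG_LOCAL (16 * 1024 * 1024)

/* Check a message length against the limit appropriate to its scope. */
static bool msg_len_ok(size_t len, bool local_only)
{
    return len <= (local_only ? MAXMSG_LOCAL : MAXMSG_WIRE);
}
```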
[Linux-ha-dev] Re: [Linux-ha-cvs] Linux-HA CVS: lib by alan from
there is an error from mcast.c:

cc1: warnings being treated as errors
mcast.c: In function 'if_getaddr':
mcast.c:703: warning: 'err' may be used uninitialized in this function
gmake[4]: *** [mcast.lo] Error 1

linux-ha-cvs@lists.linux-ha.org wrote:

linux-ha CVS committal

Author : alan
Host:
Project : linux-ha
Module : lib
Dir : linux-ha/lib/plugins/HBcomm

Modified Files:
	mcast.c

Log Message:
Increased how long we'll wait for the network interface to get an address...

===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/lib/plugins/HBcomm/mcast.c,v
retrieving revision 1.27
retrieving revision 1.28
diff -u -3 -r1.27 -r1.28
--- mcast.c	24 Feb 2006 00:14:59 -0000	1.27
+++ mcast.c	24 Feb 2006 02:20:24 -0000	1.28
@@ -1,4 +1,4 @@
-/* $Id: mcast.c,v 1.27 2006/02/24 00:14:59 alan Exp $ */
+/* $Id: mcast.c,v 1.28 2006/02/24 02:20:24 alan Exp $ */
 /*
  * mcast.c: implements hearbeat API for UDP multicast communication
  *
@@ -696,10 +696,9 @@
 static int
 if_getaddr(const char *ifname, struct in_addr *addr)
 {
-	int		fd;
 	struct ifreq	if_info;
 	int		j;
-	int		maxtry = 30;
+	int		maxtry = 120;
 	gboolean	gotaddr = FALSE;
 	int		err;
 
@@ -716,28 +715,37 @@
 		return 0;
 	}
 
-	if ((fd=socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
-		PILCallLog(LOG, PIL_CRIT, "Error getting socket");
-		return -1;
-	}
 	if (Debug > 0) {
 		PILCallLog(LOG, PIL_DEBUG, "looking up address for %s"
 		,	if_info.ifr_name);
 	}
 	for (j=0; j < maxtry && !gotaddr; ++j) {
-		if (ioctl(fd, SIOCGIFADDR, &if_info) < 0) {
-			err = errno;
-			sleep(1);
-		}else{
+		int fd;
+		if ((fd=socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
+			PILCallLog(LOG, PIL_CRIT, "Error getting socket");
+			return -1;
+		}
+		if (ioctl(fd, SIOCGIFADDR, &if_info) >= 0) {
 			gotaddr = TRUE;
+		}else{
+			err = errno;
+			switch(err) {
+			case EADDRNOTAVAIL:
+				sleep(1);
+				break;
+			default:
+				close(fd);
+				goto getout;
+			}
 		}
+		close(fd);
 	}
+getout:
 	if (!gotaddr) {
 		PILCallLog(LOG, PIL_CRIT
 		,	"Unable to retrieve local interface address"
 		" for interface [%s] using ioctl(SIOCGIFADDR): %s"
 		,	ifname, strerror(err));
-		close(fd);
 		return -1;
 	}
@@ -750,7 +758,6 @@
 	memcpy(addr, &(SOCKADDR_IN(if_info.ifr_addr)->sin_addr)
 	,	sizeof(struct in_addr));
-	close(fd);
 	return 0;
 }
@@ -813,6 +820,9 @@
 /*
  * $Log: mcast.c,v $
+ * Revision 1.28  2006/02/24 02:20:24  alan
+ * Increased how long we'll wait for the network interface to get an address...
+ *
  * Revision 1.27  2006/02/24 00:14:59  alan
  * Put code into mcast.c to make it retry retrieving the address from
  * the interface if it fails...

___
Linux-ha-cvs mailing list
Linux-ha-cvs@lists.linux-ha.org
http://lists.community.tummy.com/mailman/listinfo/linux-ha-cvs
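The gcc warning quoted above arises because `err` is assigned only on the ioctl-failure path but read after the retry loop. A minimal illustration of the pattern and its fix, with illustrative stand-in functions rather than the real ioctl(SIOCGIFADDR) call:

```c
#include <errno.h>

/* Retry try_op() up to maxtry times; on total failure, report the
 * last errno via *errp.  Initializing err at its declaration both
 * silences "may be used uninitialized" under -Werror and gives the
 * caller a defined value when the loop never runs. */
static int retry_op(int (*try_op)(void), int maxtry, int *errp)
{
    int err = 0;    /* initialized: fixes the uninitialized-use warning */
    int j;

    for (j = 0; j < maxtry; ++j) {
        if (try_op() == 0)
            return 0;           /* success */
        err = errno;            /* remember why this attempt failed */
    }
    *errp = err;
    return -1;
}

/* Illustrative stand-ins for an ioctl that fails or succeeds. */
static int always_fails(void) { errno = EADDRNOTAVAIL; return -1; }
static int always_works(void) { errno = 0; return 0; }
```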
Re: [Linux-ha-dev] different bug fix for bug described in archives
A grep shows MSG_DONTWAIT is still used in ipcsocket.c and IPV6addr.c. I will have it fixed. Thanks for the reminder.

-Guochun

Steven Dake wrote:

Folks,

Joe is porting openais to BSD (and Darwin). During this process, we found a problem with the portability of our IPC layer, because sendmsg() doesn't honor the MSG_DONTWAIT flag there. A quick Google search brought up this thread with the same problem in linux-ha:

http://www.gossamer-threads.com/lists/linuxha/dev/0

The fix in this thread was to increase the buffer size of the send queue in the kernel. A more portable fix is to set the O_NONBLOCK flag via the fcntl syscall. This seems to work properly on Linux and Darwin (Joe's Darwin port now works for me on Mac OS X). Alan mentioned rewriting the code - I'm not sure if this has been done yet, but if it hasn't you might keep this tip in mind.

Regards
-steve
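The portable fix Steve describes looks like this in practice: rather than passing MSG_DONTWAIT to each sendmsg() call (which some platforms ignore), the descriptor itself is put into non-blocking mode once. A minimal sketch; error handling in a real IPC layer would be more thorough:

```c
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode via fcntl(F_SETFL, O_NONBLOCK).
 * Preserves the other status flags by reading them first.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

After this, send/sendmsg on the descriptor return EAGAIN instead of blocking, with no per-call flag needed - which is exactly what makes it portable to platforms where sendmsg() ignores MSG_DONTWAIT.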
[Linux-ha-dev] CTS result ---- Overall Results:{'failure': 0, 'success': 1000, 'BadNews': 0}
[EMAIL PROTECTED] cts_ha_test]# rpm -qi heartbeat
Name        : heartbeat                   Relocations: (not relocatable)
Version     : 2.0.3                       Vendor: (none)
Release     : 1                           Build Date: Wed 30 Nov 2005 03:15:51 PM CST
Install Date: Wed 30 Nov 2005 03:15:21 PM CST    Build Host: posic066.ncsa.uiuc.edu
Group       : Utilities                   Source RPM: heartbeat-2.0.3-1.src.rpm
Size        : 13641396                    License: GPL/LGPL

Dec 02 12:29:36 Overall Results: {'failure': 0, 'success': 1000, 'BadNews': 0}
Dec 02 12:29:36 Detailed Results
Dec 02 12:29:36 Test Flip: {'elapsed_time': 3056.7287585735321, 'skipped': 0, 'calls': 66, 'success': 66, 'started': 12, 'down-up': 12, 'auditfail': 0, 'failure': 0, 'stopped': 54, 'max_time': 69.569756031036377, 'min_time': 22.751783132553101, 'up-down': 54}
Dec 02 12:29:36 Test Restart: {'elapsed_time': 2781.6241755485535, 'skipped': 0, 'calls': 65, 'success': 65, 'min_time': 23.147539138793945, 'auditfail': 0, 'failure': 0, 'node:posic043': 15, 'node:posic042': 23, 'node:posic045': 16, 'node:posic044': 11, 'max_time': 60.437736034393311, 'WasStopped': 48}
Dec 02 12:29:36 Test Stonithd: {'elapsed_time': 20226.280859470367, 'skipped': 0, 'calls': 77, 'success': 77, 'auditfail': 0, 'failure': 0, 'max_time': 305.45212197303772, 'min_time': 237.37988209724426}
Dec 02 12:29:36 Test StartOnebyOne: {'elapsed_time': 10851.152466773987, 'skipped': 0, 'calls': 86, 'success': 86, 'auditfail': 0, 'failure': 0, 'max_time': 128.33868718147278, 'min_time': 107.48243713378906}
Dec 02 12:29:36 Test SimulStart: {'elapsed_time': 4848.6989839076996, 'skipped': 0, 'calls': 70, 'success': 70, 'auditfail': 0, 'failure': 0, 'max_time': 74.827316045761108, 'min_time': 52.387298107147217}
Dec 02 12:29:36 Test SimulStop: {'elapsed_time': 1881.3365070819855, 'skipped': 0, 'calls': 66, 'success': 66, 'auditfail': 0, 'failure': 0, 'max_time': 69.126533985137939, 'min_time': 15.12415599822998}
Dec 02 12:29:36 Test StopOnebyOne: {'elapsed_time': 7284.4506158828735, 'skipped': 0, 'calls': 80, 'success': 80, 'auditfail': 0, 'failure': 0, 'max_time': 107.76936388015747, 'min_time': 51.32683801651001}
Dec 02 12:29:36 Test RestartOnebyOne: {'elapsed_time': 20264.56384563446, 'skipped': 0, 'calls': 98, 'success': 98, 'auditfail': 0, 'failure': 0, 'max_time': 256.44388389587402, 'min_time': 167.01707005500793}
Dec 02 12:29:36 Test standby2: {'elapsed_time': 9871.682421207428, 'skipped': 0, 'calls': 89, 'success': 89, 'auditfail': 0, 'failure': 0, 'max_time': 150.31842494010925, 'min_time': 94.204804182052612}
Dec 02 12:29:36 Test Bandwidth: {'elapsed_time': 2135.1637029647827, 'skipped': 12, 'calls': 73, 'success': 61, 'min': 7464.737207460571, 'max': 8226.9630687767694, 'totalbandwidth': 479334.04886258673, 'auditfail': 0, 'failure': 0, 'max_time': 72.851320028305054, 'min_time': 0.00011706352233886719}
Dec 02 12:29:36 Test ResourceRecover: {'elapsed_time': 2640.5746552944183, 'skipped': 0, 'calls': 65, 'success': 65, 'auditfail': 0, 'failure': 0, 'max_time': 89.023792028427124, 'min_time': 23.00553297996521}
Dec 02 12:29:36 Test SpecialTest1: {'elapsed_time': 8043.8690595626831, 'skipped': 0, 'calls': 88, 'success': 0, 'auditfail': 0, 'failure': 0, 'max_time': 99.962066173553467, 'min_time': 75.700264930725098}
Dec 02 12:29:36 Test NearQuorumPoint: {'elapsed_time': 2532.2484936714172, 'skipped': 7, 'calls': 77, 'success': 70, 'auditfail': 0, 'failure': 0, 'max_time': 212.76548600196838, 'min_time': 0.0004940032958984375}
Dec 02 12:29:36 TESTS COMPLETED
[Linux-ha-dev] [Fwd: Re: [Linux-HA] hb 2.0.3 cvs does not start anymore with config of 2.0.2]
OK, I forward this mail to the dev list. Who is in charge of the scripts? Sunxun?

---Begin Message---

Hi Andrew, Serge and Alan,

On Wednesday, 9 November 2005 06:37, Alan Robertson wrote:

[EMAIL PROTECTED] wrote:

It is very possible that the problem is that your OCF start scripts don't return the $OCF_NOT_RUNNING value when the monitor function is called and the resources are down. I had a similar problem when I moved my config files from 2.0.2 to 2.0.3.

BTW: in CVS 2.0.3, some of heartbeat's OCF scripts did not comply with this rule. And I think that this restriction is a problem. I know that it's the LSB spec - but I doubt it's very often (if at all) followed.

All three of you were right! I had just trusted the OCF compliance of the delivered scripts. The resources start just fine with my patched versions.

The problem with the stop order remains. For me, it seems the stop starts as expected. In parallel to the correctly ordered stop operation, the DC role is released, and as soon as the election timeout pops, another round of stop operations is started, this time all operations at once.

It would definitely help if the lrm would print the exact commands it executes and their return codes.

regards,
Joachim Banzhaf

--- db2.cvs	2005-11-09 18:31:51.000000000 +0100
+++ db2	2005-11-09 21:32:36.000000000 +0100
@@ -87,7 +87,6 @@
 END
 }
 
-
 #
 # methods: What methods/operations do we support?
 #
@@ -104,9 +103,9 @@
 	!
 }
 
-
-# Gather up information about our db2 instance
-
+#
+# Gather up information about our db2 instance
+#
 db2info() {
 	instance=$1
 	db2admin=$instance
@@ -118,7 +117,7 @@
 	db2bin=$db2sql/bin
 	db2db2=$db2bin/db2
 
-	# Let's make sure a few important things are there...
+	# Let's make sure a few important things are there...
 	if
 	  [ -d $db2sql -a -d $db2bin -a -f $db2profile -a \
 	    -x $db2profile -a -x $db2db2 ]
@@ -138,7 +137,7 @@
 }
 
 #
-# Run the given command in the db2 admin environment...
+# Run the given command in the db2 admin environment...
 runasdb2() {
 	if
@@ -151,7 +150,7 @@
 }
 
 #
-# Run a command as the DB2 admin, and log the output
+# Run a command as the DB2 admin, and log the output
 #
 logasdb2() {
 	output=`runasdb2 $*`
@@ -166,7 +165,18 @@
 	return $rc
 }
 
-
+#
+# db2 returncodes 2 and 4 are just warnings
+#
+filterdb2rc() {
+	if
+	  [ $1 == 2 -o $1 == 4 ]
+	then
+		return 0
+	fi
+	return $1
+}
+
 #
 # db2_start: Start the given db2 instance
 #
@@ -193,6 +203,7 @@
 	for DB in `db2_dblist`
 	do
 		runasdb2 $db2db2 activate database $DB
+		filterdb2rc $?
 	done
 	fi
 	return $?
@@ -233,7 +244,6 @@
 	return $rc
 }
 
-
 #
 # db2_status: is the given db2 instance running?
 #
@@ -243,6 +253,7 @@
 	test $pscount -ge 5
 }
 
+
 our_db2_ps() {
 	ps -u $db2admin | grep db2
 }
@@ -250,10 +261,9 @@
 db2_dblist() {
 	runasdb2 $db2db2 list database directory \
-	| grep -i 'Database name.*=' | sed 's%.*= *%%'
+	| awk -F'=' '$1 ~ /lias/ { db = $2 } $1 ~ /o[ck]al/ { print db }'
 }
 
-
#
 # db2_monitor: Can the given db2 instance do anything useful?
 #
@@ -287,9 +297,8 @@
 }
 
 #
-# 'main' starts here...
+# 'main' starts here...
 #
-
 if
   ( [ $# -ne 1 ] )
 then
@@ -329,45 +338,55 @@
 	exit $OCF_ERR_PERM
 fi
 
-#
-# Grab common db2 information...
-#
-if
-  db2info $instance
-then
-  : DB2 info is OK!
-else
-  exit $OCF_ERR_GENERIC
-fi
-
-
 # What kind of method was invoked?
 case $1 in
-  start)	db2_start $instance
-		exit $?;;
-
-  stop)		db2_stop $instance
-		exit $?;;
+  start)	if
+		  db2info $instance
+		then
+		  db2_start $instance
+		  exit $?
+		fi
+		exit $OCF_ERR_GENERIC
+		;;
+  stop)		if
+		  db2info $instance
+		then
+		  db2_stop $instance
+		  exit $?
+		fi
+		exit $OCF_SUCCESS
+		;;
+
  status)	if
-		  db2_status $instance
+		  db2info $instance >/dev/null 2>&1 && db2_status $instance
 		then
			echo DB2 UDB instance $instance is running
			exit $OCF_SUCCESS
-		else
-			echo DB2 UDB instance $instance is stopped
-			exit $OCF_NOT_RUNNING
 		fi
+		echo DB2 UDB instance $instance is stopped
+		exit $OCF_NOT_RUNNING
 		;;
-  monitor)	db2_monitor $instance
-		exit $?;;
+  monitor)	if
+		  db2info $instance
+		then
+		  db2_monitor $instance
+		  exit $?
+		fi
+		exit $OCF_NOT_RUNNING
+		;;
-  validate-all)	# OCF_RESKEY_instance has already checked within db2info(),
-		# just exit successfully here.
-		exit $OCF_SUCCESS;;
+  validate-all)	if
+		  db2info $instance
+		then
+		  exit $OCF_SUCCESS
+		fi
+		exit $OCF_ERR_GENERIC
+		;;
   *)		db2_methods
-		exit $OCF_ERR_UNIMPLEMENTED;;
+		exit $OCF_ERR_UNIMPLEMENTED
+		;;
 esac

--- drbddisk.orig	2005-11-09 21:36:36.000000000 +0100
+++ drbddisk	2005-11-09 17:04:08.000000000 +0100
@@ -33,8 +33,10 @@
 	done
 	;;
     stop)
-	# exec, so the exit code of drbdadm propagates
-	exec $DRBDADM secondary $RES
+	$DRBDADM secondary $RES
+	if [ $? = 20 ]; then
+		exit 20
+	fi
 	;;
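The filterdb2rc() helper added by the db2 patch - mapping DB2's warning return codes 2 and 4 to success - can be written standalone like this. This is a sketch of the patch's logic, not the exact script text:

```shell
# Treat DB2 return codes 2 and 4 (warnings) as success; pass all other
# codes through unchanged so real errors still propagate.
filterdb2rc() {
    case "$1" in
        2|4) return 0 ;;
        *)   return "$1" ;;
    esac
}
```

Typical use, as in the patch: `runasdb2 $db2db2 activate database $DB; filterdb2rc $?`, so a mere warning from `db2 activate database` no longer makes the start operation look failed.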
Re: [Linux-ha-dev] UUID bug?
David Lee wrote:

Heartbeat is failing. I'm chasing a really weird problem (CVS/HEAD), and the finger of suspicion is pointing at the way we handle UUIDs. I suspect that in at least one context we are passing the address of the UUID rather than the UUID itself.

Top-down:

1. On Solaris 8, I was getting error messages:

   /etc/opt/LXHAhb/ha.d/harc: ha_log: not found

   (This was then stopping IPaddr being called.) But on Solaris 9 and Linux (FC4) things are fine.

2. Noticing that the harc script and the OCF rely heavily on environment variables in the shell, I then inserted into it a diagnostic:

   env | grep HA_ > /dev/console

   Ouch!

   Linux: lots of good-looking strings. Fine.
   S8: lots of good-looking strings, but also:
       HA_srcuuid=horrible binary-looking thing
       HA_dstuuid=horrible binary-looking thing
   S9: very few HA_* strings at all. (In particular HA_FUNCS was absent, which explains my S9 "ha_log: not found" failure sequence.)

So it looks as if HA_srcuuid and HA_dstuuid are being incorrectly set. Might the cause be around heartbeat/heartbeat.c:770? Let's assume this (or something like it) to be the case. What then happens when these HA_* are setenv'd in preparation for harc?

Is it S9 or S8 which has the problem? You first said it was S8, then switched to S9. :)

Anyway, the problem is caused by heartbeat trying to run a script derived from the message type. Before running the script, all message fields are set in the environment, since each field is a (name, value) pair. In this case, obviously we should not set a binary value as an environment variable. I will commit a fix soon.

-Guochun
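The fix Guochun describes amounts to a guard like the one below before each setenv() call: skip field values that contain non-printable bytes (such as raw binary UUIDs). The function name and shape are illustrative, not the actual heartbeat code:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true only if every byte of value[0..len) is printable, so
 * the value is safe to export as an environment variable.  Raw binary
 * fields (e.g. 16-byte UUIDs) fail this check and should be skipped. */
static bool is_env_safe(const char *value, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        unsigned char c = (unsigned char)value[i];
        if (!isprint(c))
            return false;   /* embedded NUL, control, or binary byte */
    }
    return true;
}
```

A binary UUID would then simply never reach setenv(), which also avoids the garbled HA_srcuuid/HA_dstuuid output David saw in `env | grep HA_`.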
Re: [Linux-ha-dev] UUID bug?
The fix is in CVS now. Let me know if it works or not.

thanks
-Guochun

Guochun Shi wrote:

[snip - previous message quoted in full]