[Linux-ha-dev] [PATCH] pgsql resource improvement

2006-08-15 Thread Keisuke MORI
Hi,

The attachment is a patch for pgsql.in to fix an annoying
problem and some typos.

* Problem:

The pgsql RA can fail to launch PostgreSQL in a non-obvious way even
when it is configured properly.

The cause is that the PostgreSQL log file /var/log/pgsql.log, which
is hard-coded in the script, is usually not writable by the PGDBA
user (at least on RedHat EL4). When pgsql fails to create the log
file it just returns with no obvious log message, so users are left
puzzled as to why the start failed.

* Solution:

This patch fixes it as follows, and also fixes some typos:

 - make the log file path configurable as an OCF parameter
   "logfile" in cib.xml.

 - try to create the log file as the PGDBA user and log an error
   message if that fails.

 - the default logfile path is now /dev/null. I chose this because
   1) an appropriate path largely depends on the distribution's or
   the user's policy, and 2) the PostgreSQL init.d script that comes
   with RedHat also defaults to /dev/null.


The patch is against 2.0.7.
I would appreciate it if this could be incorporated into a future release.

Sincerely,

Keisuke MORI
NTT DATA Intellilink Corporation
Index: resources/OCF/pgsql.in
===
--- resources/OCF/pgsql.in	(.../heartbeat-tags/heartbeat-2.0.7)	(revision 81)
+++ resources/OCF/pgsql.in	(.../heartbeat-branches/heartbeat-2.0.7-ksk)	(revision 81)
@@ -11,10 +11,11 @@
 # OCF parameters:
 #  OCF_RESKEY_pgctl  - Path to pg_ctl. Default /usr/bin/pg_ctl
 #  OCF_RESKET_start_opt - Startup options. Default ""
-#  OCF_RESKEY_psql   - Path to psql. Defauilt is /usr/bin/psql
+#  OCF_RESKEY_psql   - Path to psql. Default is /usr/bin/psql
 #  OCF_RESKEY_pgdata - PGDATA directory. Default is /var/lib/pgsql/data
 #  OCF_RESKEY_pgdba  - userID that manages DB. Default is postgres
 #  OCF_RESKEY_pgdb   - database to monitor. Default is template1
+#  OCF_RESKEY_logfile - Path to PostgreSQL log file. Default is /dev/null
 ###
 # Initialization:
 
@@ -58,7 +59,7 @@
 Path to pg_ctl command.
 
 pgctl
-
+
 
 
 
@@ -95,6 +96,13 @@
 pgdb
 
 
+
+
+Path to PostgreSQL server log file.
+
+logfile
+
+
 
 
 
@@ -143,10 +151,13 @@
 if [ -x $PGCTL ]
 then
 # Check if we need to create a log file
-[ ! -f /var/log/pgsql.log ] && touch /var/log/pgsql.log
-# Set the right ownership
+if ! check_log_file $LOGFILE
+	then
+ocf_log err "PostgreSQL can't create a log file: $LOGFILE"
+	return $OCF_ERR_GENERIC
+	fi
 
-	if runasowner "$PGCTL -D $PGDATA -l /var/log/pgsql.log -o "\'$STARTOPT\'" start > /dev/null 2>&1"
+	if runasowner "$PGCTL -D $PGDATA -l $LOGFILE -o "\'$STARTOPT\'" start > /dev/null 2>&1"
 	then
 	   # Probably started.
 ocf_log info "PostgreSQL start command sent."
@@ -237,6 +248,22 @@
 }
 
 #
+# Check if we need to create a log file
+#
+
+check_log_file() {
+if [ ! -w "$1" ]
+then
+if ! runasowner "touch $1 > /dev/null 2>&1"
+then
+	return 1
+fi
+fi
+
+return 0
+}
+
+#
 #   'main' starts here...
 #
 
@@ -258,9 +285,10 @@
 PGCTL=${OCF_RESKEY_pgctl:-/usr/bin/pg_ctl}
 STARTOPT=${OCF_RESKEY_start_opt:-""}
 PSQL=${OCF_RESKEY_psql:-/usr/bin/psql}
-PGDATA=${OCF_RESKEY_pgdata:-/var/lib/pgctl/data}
+PGDATA=${OCF_RESKEY_pgdata:-/var/lib/pgsql/data}
 PGDBA=${OCF_RESKEY_pgdba:-postgres}
 PGDB=${OCF_RESKEY_pgdb:-template1}
+LOGFILE=${OCF_RESKEY_logfile:-/dev/null}
 PIDFILE=${PGDATA}/postmaster.pid
 
 case "$1" in
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Re: [PATCH] pgsql resource improvement

2006-08-15 Thread Keisuke MORI
Hi Serge,

"Serge Dubrouski" <[EMAIL PROTECTED]> writes:
> Actually I do not like the check_log_file function very much. It'll fail
> if the log file does not exist and the admin configured /var/log/pgsql.log
> or so as a log file.

It works as long as the log file path is creatable by the PGDBA user.

If you want to use a log file under /var/log (i.e. writable only by
root), then the admin should create the file and grant permission to
the PGDBA user before starting heartbeat. This is the same
prerequisite as in the first version of pgsql, I think.
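
For example, the manual preparation could look like this (just a
minimal sketch, assuming the default PGDBA user "postgres" and the
/var/log/pgsql.log path discussed here):

 # touch /var/log/pgsql.log
 # chown postgres /var/log/pgsql.log
 # chmod 640 /var/log/pgsql.log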

>The better approach, IMHO, would be to create the
> file as root and run chown then. I'll prepare a patch later if
> somebody else won't beat me.

Yeah, I also thought of that idea, but I wondered what the
group ownership should be.

Anyway, either approach is OK for me.


Lars,
Horms,
Thank you for your comments and applying the patch!


Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Heartbeat API for Version 2 ?

2006-08-22 Thread Keisuke MORI
Hi,

I'm looking for APIs to obtain V2 node/resource information
from my application. I assume that the CIB library (cib.h and
libcib.so) is meant for this, but I'm wondering how I can use it.

 - What functions are "public" to users and what header files
   should I refer to?  I understand that the cib_t object is the
   one that provides the main functionality, but it seems to
   require many utility functions defined in crm/common and
   crm/pengine to manipulate the crm_data_t data structure.
   Are those all considered part of the "API"?

 - What is the license of the library? It seems to be GPL; is that
   by design? I see that some files under lib/crm/cib/ are
   LGPL'ed but the rest of the files under lib/crm/ are
   GPL'ed (as well as the related header files).

   IMHO, it would be nice if the library were LGPL like the other
   ones so that it could be used widely by various users...

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Heartbeat API for Version 2 ?

2006-08-22 Thread Keisuke MORI
Andrew,

Thank you for your reply.

"Andrew Beekhof" <[EMAIL PROTECTED]> writes:

> There are two levels of the CRM API.
>
> The first is a basic API that exchanges XML between the user and the CRM.
> There is a DTD which describes the contents.
> You can manipulate it any way you like and then upload it again using the API.
>
> The second is a library of C functions that help interpret and update the XML.
>
> I guess you could also consider the various CLIs as and API of sorts too.
>
> I wasn't aware any of it is currently LGPL but I'm prepared to revisit
> the issue.
> (Looking now at the file changes it appears I accidentally copied in
> the LGPL instead of the GPL on some occasions)
>
> What are you building and what are your requirements?


Actually, I don't have any concrete plan to build something at
present.  I'm just studying what I can do, and whether it's
possible to link with another system, which may be a
proprietary one if my company wants :)

In my understanding, the HB API V1 libraries (libhbclient and
libclplumb) are LGPL, and I was just curious to know whether
the CRM library is different or not.


I think I can use CLIs and XML/DTD as an API.

Thanks,




>
> On 8/22/06, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I'm looking for APIs to obtain V2 nodes/resources information
>> from my application. I assumes that the CIB library (cib.h and
>> libcib.so) is meant for it, but I'm wondering how can I use it.
>>
>>  - What functions are "public" to users and what hearder files
>>should I refer to?  I understand that the cib_t object is the
>>one which provides its main functionalities but it seems to
>>require many utility functions that defined in crm/common and
>>crm/pengine to manipulate crm_data_t data structure.
>>Are those all considered as the "API"?
>>
>>  - What is the license of the library? It seems GPL, and is it
>>designed so? I see that some files under lib/crm/cib/ are
>>LGPL'ed but the rest of the files under lib/crm/ are
>>GPL'ed(as well as related header files).
>>
>>IMHO, It would be nice if the library is LGPL as the other
>>ones so that it can be used widely by variant users...
>>
>> Regards,
>>
>> Keisuke MORI
>> NTT DATA Intellilink Corporation
>>
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Heartbeat API for Version 2 ?

2006-08-23 Thread Keisuke MORI
Alan,

Thank you for your comments.
I understand the situation.

Alan Robertson <[EMAIL PROTECTED]> writes:
> Alan Robertson wrote:

>> 
>> By project policy, ALL APIs to the project are LGPL.
>> http://linux-ha.org/FileCopyrightPolicy
>> 
>> The developer of the software in question is in violation of
>> long-standing, documented project policy.
>> 
>> I have talked to him about it before.  I hate to have to rewrite the
>> libraries in question, but that may be what we have to do :-(.
>> 
>> The management daemon may also do what you want - and it is licensed LGPL.
>
> To make it clear...  As far as I am personally concerned, a person has
> the right to license their software in any way they wish.  So, although
> I might disagree with him regarding the choice of license for this
> purpose, I personally view that I have no authority to demand that a
> person change their license, nor do I hold him in lower esteem for his
> decision.
>
> My desire for consistent licensing comes from my opinion that our
> customers should not have to read all every source file to discover
> which portions of the system are under which licenses.  For this and
> other reasons, it is my belief that all projects need consistent license
> policies.

I totally agree with this opinion. I was lucky to notice the
difference in licenses this time, but the inconsistency will
confuse users, particularly newbies like me, and it may lead to
license violations. With this kind of risk, I might hesitate to
recommend Heartbeat to some customers... and that's not what I want.

Of course I also respect the developer's choice of license,
but I think it would be preferable to keep a uniform,
consistent policy within one project.


> Of course, I still hold hope that persuasion might hold out, and he
> might be convinced to license these library interfaces under the LGPL.
>

And I would also prefer that all libraries be under the LGPL.

Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] thoughts about library naming conventions

2006-08-24 Thread Keisuke MORI
Alan Robertson <[EMAIL PROTECTED]> writes:
> Matthew Soffen wrote:
>> I don't see it being any problem.
>> Is there any "convention" that other projects use ?
>
>
> Not that I know of.  I just thought this might make things clearer to
> those who want to link to our libraries - which ones are under which
> license.

I think changing library names to distinguish their licenses is
not very common, and I feel a bit uncomfortable with it...

Other options that came to my mind are:

1. Clearly note it somewhere in a document, probably in the
   "Exceptions" section of http://linux-ha.org/FileCopyrightPolicy .

   This needs less work, but users tend to miss it, and documents
   tend to go out of date :-)


2. Separate the directories for LGPL and other licenses in the source tree.
   Only LGPL files (those that conform to the project policy) would
   be put into the lib/ directory; for other directories it
   depends on each module or developer. For example, the GPL'ed
   lib/crm might live in, say, crm/lib, and when the developer
   decides to change some of the functions to LGPL, he can
   move those into the lib/ directory with the new license
   statement.

   Users who are concerned about the license only need to check
   whether a library is in the lib/ directory; they don't have to
   look through all the source code in the tree.


3. A separate RPM package for the LGPL libraries, say heartbeat-lib-*.rpm,
   which is intended to be used by users according to the project policy.

   Users who are concerned about the license should only use the
   libraries in this RPM. A drawback is that, because the lib RPM is
   always required, all users would have to take extra steps to
   download and install it.


I personally think that option 2 is clear enough and good for developers,
although I'm not sure how big the impact would be for the library in question.

I hope it helps.

BTW, if this topic came up because of my post about the license,
it is not an urgent issue for me. Sorry if I confused you.

Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] RPM install does not create hacluster user correctly on RedHat

2006-11-15 Thread Keisuke MORI
Hi,

I built the RPMs with the following configure options to specify
the uid/gid for hacluster/haclient on RedHat Linux, but
the user is created with a different uid/gid when the RPMs are installed.

./ConfigureMe rpm --with-group-id=90  --with-ccmuser-id=90

/etc/passwd ends up like this:
hacluster:x:1001:1001::/home/hacluster:/bin/bash


The attached patch fixes this.


The cause of this problem is the default behavior of the
useradd command. There are at least two kinds of 'useradd' for
Linux, and the one used by RedHat tries to create the home
directory by default; it therefore fails during %pre script
execution because the directory
'/var/lib/heartbeat/cores/hacluster' does not exist yet.

FYI, the difference between the two useradds can be summarized as follows:

1) pwdutils version (used by SuSE)
  - it does not create the home directory unless the '-m' option is specified.
  - ref. http://rpmfind.net/linux/rpm2html/search.php?query=pwdutils

2) shadow-utils version (used by RedHat etc.)
  - whether it creates the home directory by default depends on the
    /etc/login.defs configuration.
  - in the case of RedHat, it is configured to create the home
    directory by default.
  - the '-M' option is available if you don't want to create it.
  - ref. http://rpmfind.net/linux/rpm2html/search.php?query=shadow-utils

I hope it helps.
Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation
diff -r 12246952c083 heartbeat.spec.in
--- a/heartbeat.spec.in	Wed Nov 15 14:09:13 2006 +0900
+++ b/heartbeat.spec.in	Wed Nov 15 14:53:56 2006 +0900
@@ -1529,7 +1529,9 @@ else
 else
   USEROPT="-g @HA_APIGROUP@ -u @HA_CCMUID@ -d @HA_COREDIR@/@HA_CCMUSER@"
   if
-/usr/sbin/useradd $USEROPT @HA_CCMUSER@ 2>/dev/null
+/usr/sbin/useradd $USEROPT @HA_CCMUSER@ 2>/dev/null \
+|| /usr/sbin/useradd -M $USEROPT @HA_CCMUSER@ 2>/dev/null
+# -M force to suppress to create the home directory on RedHat
   then
 : OK we were able to add user @HA_CCMUSER@
   else
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPaddr in -dev does not work

2006-11-22 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>> 1. When 'start', setting the netmask causes an error when I
>>specified a CIDR netmask. It was converted by findif in 2.0.7.
>
> Fixed as:
>   http://hg.beekhof.net/lha/crm-stable/rev/b4f60d5ded4b

Thanks, but I think this fix is unrelated to my problem...

>
>>
>> 2. 'stop' fails when I specified 'nic' parameter.
>> From the sh -x option output, the following 'if' condition
>> failed at L531.
>> 
>> + '[' xeth0:0 '!=' x -a eth0 '!=' eth0:0 ']'
>> 
>>
>> 3. 'stop' fails with another error when the interface was
>>already down-ed.
>
> I ran the exact commands you listed for 2 and 3 on Gentoo and SLES10
> - it ran fine on both platforms
>
> Alas I dont have access to a redhat box... if you attach the complete
> "-x" output i could probably figure something out...


I've attached my complete log with the -x option.
The log was taken after I applied your patch to findif.


I've also attached my suggested patch to fix these problems.

My patch includes the following fixes:

 - use the calculated netmask instead of the specified netmask param
   at ip_validate_all(): L806.
   This will fix problem 1.

 - remove a sanity check at ip_stop(): L531.
   This will fix problems 2 and 3.
   I personally think that this sanity check is not
   necessary, as there was no such check in 2.0.7...

 - a variable name is obviously wrong for SunOS at
   add_interface(): L500 (I haven't tested it, though).

 - leave some info logs as in 2.0.7 rather than using echo at L512.


>> My environment is:
>>
>> OS: RedHat ES4 Update 3
>> Heartbeat: see below (is this the right way to identify the revision?)
>
> "hg id" is usually enough

Thanks!

$ hg id
3d89093dd920 tip


-- 
Keisuke MORI
NTT DATA Intellilink Corporation


IPaddr-20061122.log.gz
Description: Binary data
diff -r 3d89093dd920 resources/OCF/IPaddr.in
--- a/resources/OCF/IPaddr.in	Mon Nov 20 14:34:48 2006 -0700
+++ b/resources/OCF/IPaddr.in	Wed Nov 22 18:03:38 2006 +0900
@@ -464,6 +464,7 @@ delete_interface () {
 		CMD="$IFCONFIG $ifname down";;
   esac
 
+  ocf_log info "$CMD"
   $CMD
 
   return $?
@@ -497,7 +498,7 @@ add_interface () {
 	  #	CMD="$IFCONFIG $iface inet $ipaddr $netmask_text up"
 	  # So hack the following workaround:
 	  CMD="$IFCONFIG $iface inet $ipaddr"
-	  CMD="$CMD && $IFCONFIG $iface $netmask_text"
+	  CMD="$CMD && $IFCONFIG $iface $netmask"
 	  CMD="$CMD && $IFCONFIG $iface up"
 	  ;;
   
@@ -509,7 +510,7 @@ add_interface () {
   esac
 
   # Use "eval $CMD" (not "$CMD"): it might be a chain of two or more commands.
-  echo $CMD
+  ocf_log info "eval $CMD"
   eval $CMD
   rc=$?
   if [ $rc != 0 ]; then
@@ -526,12 +527,6 @@ ip_stop() {
 
   SENDARPPIDFILE="$SENDARPPIDDIR/send_arp-$OCF_RESKEY_ip"
   NIC=`find_interface $OCF_RESKEY_ip`
-  : ${OCF_RESKEY_nic=$NIC}
-
-  if [ x$NIC != x -a $OCF_RESKEY_nic != $NIC ]; then
-  ocf_log err "Attempt to remove $OCF_RESKEY_ip from an interface other than the one supplied"
-  return $OCF_ERR_ARGS
-  fi
 
   if [ -f "$SENDARPPIDFILE" ]; then
 	cat "$SENDARPPIDFILE" | xargs kill
@@ -803,14 +798,19 @@ ip_validate_all() {
 fi
 
 tmp=`echo "$NICINFO" | cut -f2 | cut -d ' ' -f2`
-if [ "x$OCF_RESKEY_netmask" = "x" ]; then
-	ocf_log info "Using calculated netmask for ${OCF_RESKEY_ip}: $tmp"
-	OCF_RESKEY_netmask=$tmp
+# use calculated netmask to take the CIDR form
+if [ x$tmp != x${OCF_RESKEY_netmask} ]; then
+ocf_log info "Using calculated netmask for ${OCF_RESKEY_ip}: $tmp"
+OCF_RESKEY_netmask=$tmp
+fi
+#if [ "x$OCF_RESKEY_netmask" = "x" ]; then
+#	ocf_log info "Using calculated netmask for ${OCF_RESKEY_ip}: $tmp"
+#	OCF_RESKEY_netmask=$tmp
 # We cant do this because netmask used to take the CIDR form...
 #elif [ x$tmp != x${OCF_RESKEY_netmask} ]; then
 #	ocf_log err "Invalid parameter value: netmask [$OCF_RESKEY_netmask [Calculated netmask: $tmp]"
 #	return $OCF_ERR_ARGS
-fi
+#fi
 
 tmp=`echo "$NICINFO" | cut -f3 | cut -d ' ' -f2`
 if [ "x$OCF_RESKEY_broadcast" = "x" ]; then
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPaddr in -dev does not work

2006-11-26 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>>  - use caluclated netmask instead of the specified netmask param
>>at ip_validate_all(): L806.
>>will fix problem 1.
>
> this would only serve to mask underlying problems
> either both forms are equivalent and the copy does nothing, or there
> is a bug in findif (like the one i found yesterday)
>
> if you want the automatic value, then don't specify one in the
> resource configuration

Hmm, OK, let me clarify "the spec." of IPaddr:

 - "netmask" param for IPaddr takes only dot notation, and
does not take CIDR format.
 - Specifying CIDR format to netmask might be working in 2.0.7,
but it should not have been.
 - "cidr_netmask" can be used for CIDR format instead.

I will change my cib.xml to use the dot notation from 2.0.8.
It shouldn't be a big problem...
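
For example, a quick by-hand test with the dot notation would look
like this (the same style of invocation as the CIDR example quoted
later in this thread; the address and interface are only illustrative):

 # env OCF_RESKEY_ip=192.168.20.210 OCF_RESKEY_netmask=255.255.255.0 \
       OCF_RESKEY_nic=eth0 /usr/lib/ocf/resource.d/heartbeat/IPaddr start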

>
>>  - remove a sanity check at ip_stop(): L531
>>will fix problem 2 and 3.
>>I personally think that this sanity check is not
>>necessary, as there was no such check in 2.0.7...
>
> yeah, it unfortunately means we cant recover an IP is some scenarios
> - so i've moved it to the status check which will now report
> "running/failed" if the NICs dont match (which will invoke recovery
> actions).
>
>>  - a variable name is obviously wrong for SunOS at
>>add_interface(): L500 (I haven't test it though)
>
> not sure what you mean here

I meant that the variable "netmask_text" is used there, but it is no
longer valid in this version of the script; the variable name was
used in the 2.0.7 version.

>
>>  - leave some info logs as 2.0.7 rather than using echo at L512.
>
> included
>
>
> latest patch is http://hg.beekhof.net/lha/crm-stable/rev/026bab6b8384

Great! I tried this version and confirmed that everything works fine
(specifying the netmask in dot notation).
Thank you for your help!

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RPM install does not create hacluster user correctly on RedHat

2006-11-26 Thread Keisuke MORI
I realized that OSDL bugzilla #960 is about the same thing.
The patch should fix this bug, too.

I added a comment to the bugzilla.


Ragnar Kjørstad <[EMAIL PROTECTED]> writes:

> It seems to me that the default %preinstall scripts can not work
> on Red Hat unless heartbeat has been installed on the systems already,
> so that /var/lib/heartbeat/cores/ exists prior to installing the
> package.
>
> The patch works ok for us. Will it be applied to the development branch?
>
>
> On Wed, Nov 15, 2006 at 10:06:48PM +0900, Keisuke MORI wrote:
>> Hi,
>> 
>> I built the RPMs with the following configure options to specify
>> the uid/gid for hacluster/haclient on RedHat Linux, but 
>> the user is created with the different uid/gid with the RPMs.
>> 
>> ./ConfigureMe rpm --with-group-id=90  --with-ccmuser-id=90
>> 
>> /etc/passwd is like this.
>> hacluster:x:1001:1001::/home/hacluster:/bin/bash
>> 
>> 
>> The attached patch fixes this.
>> 
>> 
>> The cause of this problem comes from the default behavior of
>> useradd command. There are at least two kinds of 'useradd' for
>> Linux and the useradd used by RedHat trys to create the home
>> directory by default, therefore it fails during the %pre scripts
>> execution because the directory
>> '/var/lib/heartbeat/cores/hacluster' does not exist yet.
>> 
>> FYI, the difference of the two useradds is summarized as:
>> 
>> 1) pwdutils version (used by SuSE)
>>   - it does not create the home directory unless '-m' option is specified.
>>   - ref. http://rpmfind.net/linux/rpm2html/search.php?query=pwdutils
>> 
>> 2) shadow-utils version (used by RedHat etc.)
>>   - it depends on /etc/login.defs configuration whether if it
>> creates the home directory by default.
>>   - in the case of RedHat, it's configured to create the home
>> directory by default.
>>   - '-M' option is available if you don't want to create it.
>>   - ref. http://rpmfind.net/linux/rpm2html/search.php?query=shadow-utils
>> 
>> I hope it helps.
>> Thanks,
>> -- 
>> Keisuke MORI
>> NTT DATA Intellilink Corporation
>
>
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
>
> -- 
> Ragnar Kjørstad
> Software Engineer
> Scali - http://www.scali.com
> Scaling the Linux Datacenter
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

-- 
Keisuke MORI
Open Source Business Division
NTT DATA Intellilink Corporation
Tel: +81-3-3534-4811 / Fax: +81-3-3534-4814

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPaddr in -dev does not work

2006-11-28 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>>  - "netmask" param for IPaddr takes only dot notation, and
>> does not take CIDR format.
>
> nope, the addition of cidr_netmask was a mistake that we're all going
> to try and forget ever happened.
>
> netmask can take either form (now that findif can too)

OK, so this is "the spec."
Then this is still NOT working.

Here's the command output again:
              
[EMAIL PROTECTED] ~]# env OCF_RESKEY_ip=192.168.20.210 OCF_RESKEY_netmask=24 
OCF_RESKEY_nic=eth0   /usr/lib/ocf/resource.d/heartbeat/IPaddr start
2006/11/28_17:46:06 INFO: Using calculated broadcast for 192.168.20.210: 
192.168.20.255
2006/11/28_17:46:07 INFO: eval /sbin/ifconfig eth0:0 192.168.20.210 netmask 24 
broadcast 192.168.20.255
SIOCSIFNETMASK: Invalid argument
2006/11/28_17:46:07 DEBUG: Sending Gratuitous Arp for 192.168.20.210 on eth0:0 
[eth0]
              
(the latest IPaddr pulled from crm-stable is used)

Please look at the ifconfig command line carefully.
The script passes "24" to the ifconfig netmask option parameter;
"24" is the CIDR value specified in the OCF_RESKEY_netmask parameter, passed through as-is.

Now, looking at the IPaddr script code, see lines 820-823 in
ip_validate_all(). The if condition at line 821 is false in the case
above because OCF_RESKEY_netmask was specified, so the
OCF_RESKEY_netmask value remains "24", which is eventually
passed to the ifconfig command. The dot-notation value
calculated by findif is assigned to $tmp but the value is
never used in this scenario, so a bug in findif is not the cause.


I would suggest another patch. In this patch:

 - The netmask value is replaced by the calculated one if the
   specified netmask value and the calculated value do not match.

   In the original IPaddr script there are the comment lines below,
   and this is the case where we should use the calculated value.

   # We cant do this because netmask used to take the CIDR form...
   #elif [ x$tmp != x${OCF_RESKEY_netmask} ]; then


 - The description in the meta-data is modified so that it reflects
   the specification, to avoid confusion.


Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r f220e888f07d resources/OCF/IPaddr.in
--- a/resources/OCF/IPaddr.in	Mon Nov 27 10:48:28 2006 +0100
+++ b/resources/OCF/IPaddr.in	Tue Nov 28 20:20:44 2006 +0900
@@ -121,8 +121,8 @@ routing table.
 
 
 
-The netmask for the interface in CIDR format.
-(e.g., 255.255.255.0 and not 24)
+The netmask for the interface either in dot notation or in CIDR format.
+(e.g., 255.255.255.0 or 24)
 
 If unspecified, the script will also try to determine this from the
 routing table.
@@ -818,13 +818,10 @@ ip_validate_all() {
 fi
 
 tmp=`echo "$NICINFO" | cut -f2 | cut -d ' ' -f2`
-if [ "x$OCF_RESKEY_netmask" = "x" ]; then
+# Replace the value when unmatched in order to accept netmask in the CIDR form.
+if [ "x$OCF_RESKEY_netmask" = "x" -o x$tmp != x${OCF_RESKEY_netmask} ]; then
 	ocf_log info "Using calculated netmask for ${OCF_RESKEY_ip}: $tmp"
 	OCF_RESKEY_netmask=$tmp
-# We cant do this because netmask used to take the CIDR form...
-#elif [ x$tmp != x${OCF_RESKEY_netmask} ]; then
-#	ocf_log err "Invalid parameter value: netmask [$OCF_RESKEY_netmask [Calculated netmask: $tmp]"
-#	return $OCF_ERR_ARGS
 fi
 
 tmp=`echo "$NICINFO" | cut -f3 | cut -d ' ' -f2`
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Filesystem RA for V1 in -dev does not work

2006-12-20 Thread Keisuke MORI
Just my two cents...

Filesystem RA for V1 (native heartbeat RA) in the current -dev tree
fails with the following error:

[EMAIL PROTECTED] resource.d]# /etc/ha.d/resource.d/Filesystem /dev/sdc2 
/mnt/shared-disk/ start
/etc/ha.d/resource.d/Filesystem: line 58: syntax error near unexpected token 
`;;'
/etc/ha.d/resource.d/Filesystem: line 58: `export OCF_RESKEY_fstype;;'


Besides this, it wrongly calls the OCF IPaddr RA at the end.
The attached patch fixes both issues.

Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r 171967143fa0 resources/heartbeat/Filesystem.in
--- a/resources/heartbeat/Filesystem.in	Tue Dec 19 11:54:14 2006 +0100
+++ b/resources/heartbeat/Filesystem.in	Wed Dec 20 16:37:23 2006 +0900
@@ -55,15 +55,15 @@ fi
 
 if [ "x$2" != "x" ]; then
 OCF_RESKEY_fstype=$1; shift
-export OCF_RESKEY_fstype;;
+export OCF_RESKEY_fstype
 fi
 
 if [ "x$2" != "x" ]; then
 OCF_RESKEY_options="$1";  shift
-export OCF_RESKEY_options;;
+export OCF_RESKEY_options
 fi
 
-OCF_TYPE=IPaddr
+OCF_TYPE=Filesystem
 OCF_RESOURCE_INSTANCE=${OCF_TYPE}_$OCF_RESKEY_device
 export OCF_TYPE OCF_RESOURCE_INSTANCE
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] IPaddr in -dev does not work

2006-12-20 Thread Keisuke MORI
Andrew,

"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
> On 12/14/06, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote:
>> On 2006-12-14T13:34:41, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
>>
>> > actually it wont work if you supply a CIDR netmask as "netmask", thats
>> > one case we dont handle.
>> >
>> > if you want that, you'll have to pull "OCF_RESKEY_netmask=$tmp" out of
>> > it's if-block
>>
>> I think it should work - regardless of which format the netmask is
>> specified in. Path of least surprise and so on.
>>
>> I've got to run off now, can you make that fix and check that it works
>> please?
>
> sure

I confirmed that all my problems are resolved with the latest -dev.

Thank you for fixing them!
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Filesystem RA for V1 in -dev does not work

2006-12-21 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
> I've made a fix locally - I'll push it up later today
> Thanks for spotting it

Andrew,

Thanks for updating it, but one more line needs to be fixed.

-OCF_TYPE=IPaddr
+OCF_TYPE=Filesystem

Otherwise, it will try to assign an IP address instead of mounting :-)



>
> On 12/20/06, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>> Just my two cents...
>>
>> Filesystem RA for V1 (native heartbeat RA) in the current -dev tree
>> fails with the following error:
>>
>> [EMAIL PROTECTED] resource.d]# /etc/ha.d/resource.d/Filesystem /dev/sdc2 
>> /mnt/shared-disk/ start
>> /etc/ha.d/resource.d/Filesystem: line 58: syntax error near unexpected token 
>> `;;'
>> /etc/ha.d/resource.d/Filesystem: line 58: `export OCF_RESKEY_fstype;;'
>>
>>
>> Beside this, it wrongly calls OCF IPaddr RA at last.
>> The attached patch will fix them.
>>
>> Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Announcing! Release 2.0.8 of Linux-HA is now available!

2007-01-12 Thread Keisuke MORI
Alan and all developers,

Alan Robertson <[EMAIL PROTECTED]> writes:

> The Linux-HA team proudly announces feature and bug-fix release 2.0.8
> of the Linux-HA (aka "heartbeat", aka "OpenHA") software.

Thank you for your great work for Release 2.0.8!

But I am having trouble building the RPM from the tarball.

$ sudo  env LANG=C ./ConfigureMe rpm --with-group-id=90  --with-ccmuser-id=90 
--enable-snmp-subagent 2>&1   | tee configure.log
(...)
gmake[1]: Entering directory `/home/ksk/proj/LinuxHA/heartbeat-2.0.8/tsa_plugin'
gmake[1]: *** No rule to make target `ha_mgmt_client.i', needed by 
`ha_mgmt_client_wrap.c'.  Stop.
gmake[1]: Leaving directory `/home/ksk/proj/LinuxHA/heartbeat-2.0.8/tsa_plugin'
gmake: *** [distdir] Error 1


There's no problem just doing make and make install, nor with
building the RPM from the Hg repository,
but the problem arises when doing 'make rpm' from the tarball.
I'm working on RedHat 4.

Maybe some files are missing from the tarball.

Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] pgsql RA improvements

2007-02-23 Thread Keisuke MORI
Hi,

We have found several problems with the pgsql RA through our testing:
it fails to fail over in some scenarios.
I'm proposing a patch to fix them.

Problem description:

1) The first 'monitor' may fail even if the postmaster was
   successfully launched.

   This is because 'start' of the pgsql RA may return before the
   postmaster is ready to answer the psql query issued by
   'monitor', since it only checks for the existence of the postmaster
   process. The postmaster can take a few minutes to become ready
   to answer, particularly when it needs to recover the database
   after a crash. Even if no recovery is necessary, we observed
   that it sometimes fails in some of our test cases.

2) The postmaster fails to start up when the 'postmaster.pid' file
   was left over from a previous crash.

3) 'stop' does not execute the fast-mode shutdown effectively,
   because it executes the immediate-mode shutdown at the very
   next moment.  The fast-mode shutdown can take a few minutes
   to complete while it flushes the database log.

   This isn't a critical problem, but it may make the failover
   take longer to complete (according to our
   database team). It is preferable to wait for the fast-mode
   shutdown to complete for as long as possible.


Proposals to fix:

1) In 'start', wait until the postmaster is ready to answer, using
   the same check that 'monitor' performs.
   The maximum time to wait for startup to complete can be
   customized with an additional parameter, 'start_wait'.

2) Add cleanup code for 'postmaster.pid' on stop and before starting.

3) In 'stop', wait until the postmaster completes the fast-mode
   shutdown.
   The maximum time to wait for shutdown to complete can be
   customized with an additional parameter, 'stop_wait'.


The attached patch is for the latest -dev.

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

diff -r 7dbd2d974acc resources/OCF/pgsql.in
--- a/resources/OCF/pgsql.in	Mon Feb 19 15:25:07 2007 +0100
+++ b/resources/OCF/pgsql.in	Tue Feb 20 21:25:52 2007 +0900
@@ -19,6 +19,8 @@
 #  OCF_RESKEY_pgport - Port where PostgreSQL is listening
 #  OCF_RESKEY_pgdb   - database to monitor. Default is template1
 #  OCF_RESKEY_logfile - Path to PostgreSQL log file. Default is /dev/null
+#  OCF_RESKEY_start_wait - Start waiting time. Default is 30
+#  OCF_RESKEY_stop_wait - Stop waiting time. Default is 30
 ###
 # Initialization:
 
@@ -127,6 +129,20 @@ Path to PostgreSQL server log output fil
 
 logfile
 
+
+
+
+Start waiting time.
+
+start_wait
+
+
+
+
+Stop waiting time.
+
+stop_wait
+
 
 
 
@@ -178,6 +194,9 @@ pgsql_start() {
 
 if [ -x $PGCTL ]
 then
+	# Remove postmastre.pid if it exists
+	rm -f $PIDFILE
+
 # Check if we need to create a log file
 if ! check_log_file $LOGFILE
 	then
@@ -196,15 +215,35 @@ pgsql_start() {
 	ocf_log err "$PGCTL not found!"
 	return $OCF_ERR_GENERIC
 fi
-	
-if ! pgsql_status
-then
-	sleep 5
-	if ! pgsql_status
-	then	
-	echo "ERROR: PostgreSQL is not running!"
-return $OCF_ERR_GENERIC
-	fi
+
+# start waiting
+count=0
+PRESULT=1
+while [ $count -lt $START_WAIT ]
+do
+if pgsql_status
+then
+if [ -z "$PGHOST" ]
+then
+   $PSQL -p $PGPORT -U $PGDBA $PGDB -c 'select now();' >/dev/null 2>&1
+else
+   $PSQL -h $PGHOST -p $PGPORT -U $PGDBA $PGDB -c 'select now();' >/dev/null 2>&1
+fi
+PRESULT=$?
+
+if [ $PRESULT -eq 0 ]
+	then
+break;
+fi
+fi
+count=`expr $count + 1`
+sleep 1
+done
+
+if [ $PRESULT -ne  0 ]
+then
+	ocf_log err "PostgreSQL is not running!"
+	return $OCF_ERR_GENERIC
 fi
 
 return $OCF_SUCCESS
@@ -221,11 +260,27 @@ pgsql_stop() {
 # Stop PostgreSQL do not wait for clients to disconnect
 runasowner "$PGCTL -D $PGDATA stop -m fast > /dev/null 2>&1"
 
+# stop waiting
+count=0
+while [ $count -lt $STOP_WAIT ]
+do
+if ! pgsql_status
+then
+#PostgreSQL stopped
+break;
+fi
+count=`expr $count + 1`
+sleep 1
+done
+
 if pgsql_status
 then
#PostgreSQL is still up. Use another shutdown mode.
runasowner "$PGCTL -D $PGDATA stop -m immediate > /dev/null 2>&1"
 fi
+	
+# Remove postmastre.pid if it exists
+rm -f $PIDFILE
 
 return $OCF_SUCCESS
 }
@@ -348,6 +403,8 @@ PGDB=${OCF_RESKEY_pgdb:-template1}
 PGDB=${OCF_RESKEY_pgdb:-template1}
 LOGFILE=${OCF_RESKEY_logfile:-/dev/null}
 PIDFILE=${PGDATA}/postmaster.pid
+START_WAIT=${OCF_RESKEY_start_wait:-"30"}

Re: [Linux-ha-dev] pgsql RA improvements

2007-02-25 Thread Keisuke MORI
Serge,

Thanks for reviewing the patch.

"Serge Dubrouski" <[EMAIL PROTECTED]> writes:

> And I don't like the idea of removing PID in "start" function. The
> standard approach is to remove it after stopping the application. Otherwise
> it could lead to an attempt to start a second copy of the application.

This is necessary for recovery from a power failure of the
primary node, for example. There is no chance for 'stop' to clean up
in such cases.

Starting a duplicate copy is avoided by checking whether the postmaster
process exists beforehand, as the original script does.


>
> On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>> I like the idea of the patch, but honestly I don't like how it's
>> implemented. It shall call (as Andrew suggested) "monitor" function to
>> check that pgsql is up or down instead of spreading the same code all
>> around the script. I'd like to review the idea and prepare another
>> patch if everybody is agree.

Yes, using the same monitor function would be better.
I didn't do that only because it will dump many log messages every
second when startup takes a while.
It is OK if you don't mind that.

Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] pgsql RA improvements

2007-02-25 Thread Keisuke MORI
"Serge Dubrouski" <[EMAIL PROTECTED]> writes:
>> "Serge Dubrouski" <[EMAIL PROTECTED]> writes:
>>
>> > And I don't like the idea of removing PID in "start" function. The
>> > standard approach if to remove it after stopping application. Other
>> > way it could lead to attempt of starting a second copy of application.
>>
>> This is necessary for the recovery from the power failure of the
>> primary node, for example. There is no chance to cleanup by stop
>> in such cases.
>>
>> Duplicate starting is avoided by checking if the postmaster
>> process exists beforehand, as the original script does.
>
> Yes, but in this case you remove the legitimate pid file from the
> running instance. You remove it before checking for the
> postmaster.

Well, I think the script does the check for the postmaster first
and removes the file second (it removes it only when no postmaster process exists).

Here's the code snippet with my patch applied.
pgsql_status checks for it, and I think that should be good enough.
8<8<8<8<8<8<8<8<
pgsql_start() {
if pgsql_status
then
ocf_log info "PostgreSQL is already running. PID=`cat $PIDFILE`"
return $OCF_SUCCESS
fi

if [ -x $PGCTL ]
then
# Remove postmastre.pid if it exists
rm -f $PIDFILE
8<8<8<8<8<8<8<8<


> Let me think about it, I don't know what is worse in such a
> case. Probably you are right and we have the right to think that
> Postgres shouldn't be started outside of cluster control.

If the postmaster was already started outside of heartbeat's control,
then 'start' should return OCF_SUCCESS and the postmaster should
continue to run.

A power failure is one of the most typical situations that we want
to survive with HA software, so this 'cleanup in start' is
important, I think.

Maybe it would be nice to log a WARN message before removing the file.

Thanks,

>
>>
>>
>> >
>> > On 2/23/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>> >> I like the idea of the patch, but honestly I don't like how it's
>> >> implemented. It shall call (as Andrew suggested) "monitor" function to
>> >> check that pgsql is up or down instead of spreading the same code all
>> >> around the script. I'd like to review the idea and prepare another
>> >> patch if everybody is agree.
>>
>> Yes, using the same monitor function would be better.
>> I didn't do that just because it will dump many logs every
>> seconds when it takes time to start.
>> It is OK if you don't mind it.
>
> Don't think that this is a problem. Those files are big even without
> those records.
>
> Thanks for all these proposals.
>
>>
>> Thanks,
>> --
>> Keisuke MORI
>> NTT DATA Intellilink Corporation
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

-- 
Keisuke MORI
Open Source Business Division
NTT DATA Intellilink Corporation
Tel: +81-3-3534-4811 / Fax: +81-3-3534-4814
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] trouble building 2.0.8 RPM from tarball

2007-03-06 Thread Keisuke MORI
Brian Reichert <[EMAIL PROTECTED]> writes:
> On Tue, Mar 06, 2007 at 03:51:28PM -0500, Brian Reichert wrote:
>> I'm running into the same problem as is this fellow:
>> 
>>   <http://www.mail-archive.com/linux-ha-dev@lists.linux-ha.org/msg01856.html>
>> 
>> Unfortunately, I can't find any archive where his issue is addressed. :/

My workaround is this:

# touch tsa_plugin/ha_mgmt_client.i
# touch tsa_plugin/ha_mgmt_client_wrap.c
# ./ConfigureMe rpm (with options you want) 

http://www.gossamer-threads.com/lists/linuxha/users/36629#36629


Is this already fixed in the -dev tree?
-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] trouble building 2.0.8 RPM from tarball

2007-03-11 Thread Keisuke MORI
Keisuke MORI <[EMAIL PROTECTED]> writes:
> Brian Reichert <[EMAIL PROTECTED]> writes:
>> On Tue, Mar 06, 2007 at 03:51:28PM -0500, Brian Reichert wrote:
>>> I'm running into the same problem as is this fellow:
>>> 
>>>   
>>> <http://www.mail-archive.com/linux-ha-dev@lists.linux-ha.org/msg01856.html>
>>> 
>>> Unfortunately, I can't find any archive where his issue is addressed. :/
>
> My workaround is this:
>
> # touch tsa_plugin/ha_mgmt_client.i
> # touch tsa_plugin/ha_mgmt_client_wrap.c
> # ./ConfigureMe rpm (with options you want) 
>
> http://www.gossamer-threads.com/lists/linuxha/users/36629#36629
>
>
> Is this fixed with the -dev tree already?

I confirmed that the same problem still exists in the -dev tree.
$ hg id
e665c504932d tip

Filed as #1514.
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Proposed fixes for release 2.1.1

2007-07-17 Thread Keisuke MORI
Hi,

Could anyone take care of the following bugs?

Please find details and my proposed patches on each bugzilla item.

1) LF #1513 building RPM from the 2.0.8 tarball fails
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1513

2) LF #1540 MgmtGUI requires explicit port number when connecting
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1540

3) LF #960  rpm: first-time install fails
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=960


I believe that these patches would not break anything
because they are all trivial and not tightly related to the core functionality.


Fixing #1513 is very important, I think, because it would be sad if
users could not build from the tarball obtained from the
official web site.

As for #1540, it was closed but I reopened it because it still
seems to have a problem.


I would appreciate it if those fixes could be in the next release.

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Proposed fixes for release 2.1.1

2007-07-17 Thread Keisuke MORI
Hi,

(I apologize if you see this message twice. I'm resending this
because I've got an error mail for my first post)


Could anyone take care of the following bugs?

Please find details and my proposed patches on each bugzilla item.

1) LF #1513 building RPM from the 2.0.8 tarball fails
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1513

2) LF #1540 MgmtGUI requires explicit port number when connecting
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1540

3) LF #960  rpm: first-time install fails
  http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=960


I believe that these patches would not break anything
because they are all trivial and not tightly related to the core functionality.


Fixing #1513 is very important, I think, because it would be sad if
users could not build from the tarball obtained from the
official web site.

As for #1540, it was closed but I reopened it because it still
seems to have a problem.


I would appreciate it if those fixes could be in the next release.

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Proposed fixes for release 2.1.1

2007-07-18 Thread Keisuke MORI
Alan,

Thank you for taking care of them.

Alan Robertson <[EMAIL PROTECTED]> writes:
>> As for #1540, it was closed but I reopened it because it still
>> seems having a problem.
>
> For 1540, I applied your fix.  Did I mess it up?

No, I don't think so.
The problem I pointed out had been there before, and
I just happened to find it while reviewing the code
with regard to this bug.


> I will double check these and make sure they go in.

Thanks again for doing this.

My colleagues and I are now working on the regression test for
our system with the 2.1.1 candidate. We will report back to you if
something "interesting" happens.

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crm_resource in 2.1.1 prints info messages

2007-07-19 Thread Keisuke MORI
Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:

> On 2007-07-19T13:42:53, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>
>> In the latest 2.1.1 crm_resource prints info messages with every invocation:
>> 
>> crm_resource -W -r PTEGroup
>> crm_resource[9813]: 2007/07/19_13:39:05 info: Invoked: crm_resource -W
>> -r PTEGroup
>> resource PTEGroup is running on: goodman
>> 
>> crm_resource -U -r PTEGroup
>> crm_resource[10223]: 2007/07/19_13:41:48 info: Invoked: crm_resource
>> -U -r PTEGroup
>> 
>> And so on. Not a big deal, just annoying. It wasn't like this in
>> previous releases.
>
> That is quite intentional. It shouldn't necessarily print it, but you
> wouldn't believe how helpful it is to be able to see the actual user
> commands executed in the logs ;-)
>
> "Honest I didn't do anything!"
>
> If you have specific suggestions about commands which needn't be logged,
> lets discuss that.


I also find that log message annoying.

We are using the attrd_updater command in the monitor operation of
an RA to control resources through our own attribute values and rules.
In that case the message is logged every 10 seconds, for example :-(
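
To illustrate the pattern, a stripped-down sketch of such a monitor
hook (the attribute name and the health-check helper are made up for
this example, and the attrd_updater options are written from memory,
so they may differ between versions):

my_app_monitor() {
    # ... the usual health checks of the resource itself go here ...

    # push our own attribute value so that CIB rules can react to it;
    # "my_app_health" and get_health_score are purely illustrative
    score=`get_health_score`
    attrd_updater -n my_app_health -v "$score"

    return $OCF_SUCCESS
}

Every such invocation currently emits the "Invoked: ..." info line,
hence the noise at every monitor interval.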

We might also run crm_resource periodically for monitoring
from another system administration tool.

How about changing the log level for this message down to debug level?

Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Bug 1655 crmd stucks in S_TERMINATE when pengine was killed

2007-07-20 Thread Keisuke MORI
Hi,

I filed a bug on this subject and am CCing the ML and Alan,
just in case it might affect the upcoming release.
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1655

As I wrote in the bugzilla, we found this during our regression test.
We first hit this problem on 2.0.8, and it then seemed to be solved
in the dev version, so we have really been longing and waiting
for the fix to be in an official release.

Is there any chance this bug can be fixed very soon?

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: Bug 1655 crmd stucks in S_TERMINATE when pengine was killed

2007-07-23 Thread Keisuke MORI
Alan Robertson <[EMAIL PROTECTED]> writes:
>> I filed a bug on the subject and cc-ing to the ML and Alan
>> just in case if this might affect to the upcoming release.
>> http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1655
>> 
>> As I wroted on the bugzilla, we found this during our regression test.
>> We suffered this problem on 2.0.8 first and then it seemed to be solved
>> with the dev version, so we really have been longing and waiting
>> for the fix is in the official release.
>> 
>> Is there any chance to fix this bug very soon?
>
> Hi,
>
> If I delay the release by one week, I'll have to delay it by probably
> about 3 weeks.  But, if the fix is in 'dev', it's probably in 'test' -
> unless it went in during the last week.
>
> I think a better approach would be to go ahead and put out the release,
> and then put out another release in 4-6 weeks or so.

Delaying the release is not what I want either,
so I agree with your approach; please just go ahead with the release.


>
> Everyone agrees a different release methodology is needed.  I have some
> suggestions for how to improve things, but I need to think about them
> some more.  I'll put out my proposal later this week.   Hopefully in a
> week or two of discussion, we can agree on some improvements.
> Surprisingly, I think most people are likely to agree with the most
> important of my suggestions.

This is great.
We will continue our testing anyway, and
I believe we can help you better along the release schedule
once the release methodology becomes clearer.


Thank you very much.

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] -dev compilate error (raexecocf.c)

2007-08-30 Thread Keisuke MORI
Dejan,

The latest -dev tree (a47dd11d40ef) cannot be built on RedHat;
it fails with the following error:

raexecocf.c: In function `get_resource_meta':
raexecocf.c:325: warning: implicit declaration of function `nanosleep'

This seems to come from your changeset b591600fbc15.
The attached patch appears to be needed along with that changeset.

Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r a47dd11d40ef lib/plugins/lrm/raexecocf.c
--- a/lib/plugins/lrm/raexecocf.c	Thu Aug 30 22:54:48 2007 +0200
+++ b/lib/plugins/lrm/raexecocf.c	Fri Aug 31 15:25:09 2007 +0900
@@ -37,6 +37,9 @@
 #include 
 #include 
 #include   /* Add it for compiling on OSX */
+#ifdef HAVE_TIME_H
+#include <time.h>
+#endif
 
 #include 
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [Bug 1722] First item in a group is not stopped when the second fails (and can't be migrated)

2007-10-29 Thread Keisuke MORI
Hi,

From the discussion in bugzilla #1722,
the behavior in the following situation has changed
in the recent version.

* When the second resource in a group fails and there is no node
  available to run it:

  2.1.2: all resources in the group are stopped.

  obs-2.1.2-15: the second resource is stopped, but the first resource
is not stopped and continues to run.

cf. http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1722

Andrew told me that this is a bug fix and that 2.1.2's behavior was wrong.

But for me (and my customers) this is a big, fundamental
change from an operational point of view.


Andrew,
May I ask a favor of you:

1) Is it possible to modify this so that compatibility with the old
   behavior is preserved?

   I think that the old behavior is preferable because running
   "a part of the group" is pointless from the service-availability
   point of view and confusing to users.


2) If that is not possible, would you tell me
   what the "correct" configuration is to achieve the same
   result as 2.1.2 in the new version?
   (By "correct" I mean "by design" and "unlikely to change in
   the near future".)

   I'm also wondering how anybody else configures this behavior.


Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] enabling valgrind option

2007-10-29 Thread Keisuke MORI
Hi,

The attached patch is needed to enable the valgrind option.

Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
diff -r 8b07ee716560 heartbeat/config.c
--- a/heartbeat/config.c	Thu Oct 25 10:05:45 2007 +0200
+++ b/heartbeat/config.c	Mon Oct 29 18:11:47 2007 +0900
@@ -2542,7 +2542,7 @@ set_release2mode(const char* value)
 		r2size = DIMOF(r2minimal_dirs);
 
 	} else if (0 == strcasecmp("valgrind", value)) {
-#if ENABLE_LIBC_MALLOC	
+#if CL_USE_LIBC_MALLOC
 		r2dirs = &r2valgrind_dirs[0];
 		r2size = DIMOF(r2valgrind_dirs);
 		setenv("HA_VALGRIND_ENABLED", "1", 1);
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Bug 1722] First item in a group is not stopped when the second fails (and can't be migrated)

2007-10-29 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>>I think that the old behavior is preferable because running
>>"a part of the group" is pointless from the service
>>availability's point of view and confusing to users.
>
> no.
> just because items later in the group fail doesn't mean the rest of
> the group should be stopped.

In an HA database cluster, the database service is typically provided
by a group like:
  Filesystem + MySQL + IP

If any of these resources fails, then the database service is no
longer available. Running only the Filesystem does not contribute anything to
"the service availability."

> consider:
>   IP + Filesystem + Apache + MySQL
>
> Just because MySQL fails doesn't mean Apache, the Filesystem nor the
> IP should be stopped.

I can understand that, but in that case
I think it would be more straightforward to have
two separate groups, one for the database server and the other
for the web server, because they can run independently, right?

We usually group resources together because they need to run together to
provide "the service" (database, web server, or whatever) as a whole;
therefore running only a part of the group does not make sense.


>
>> 2) If it would not be possible, then would you tell me
>>what is the "correct" configuration to achieve the same
>>result as 2.1.2 in the new version?
>>(with "correct" I mean "by design" and "unlikely change in
>>the near future")
>>
>>I'm also wondering how anybody else configures about this behavior.
>
> Let me instead ask what you believe you gain by stopping the first resource.

Because it is just simple and intuitive for users.

And I believe that most commercial HA software also
behaves like this (at least in typical usage).
Our customers are considering migrating from a commercial HA
product to heartbeat, and so far all of them expect it to behave
like this.

At the very least, it would be nice if this behavior could be customized, I think.


Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [patch] Fix potential memory leaks in the HB client library

2007-10-30 Thread Keisuke MORI
Hi,

I've been testing the heartbeat with valgrind enabled,
and found that it reported a couple of leaks which are
in the heartbeat API client library.

I'm submitting my proposed patch to fix them, so
could somebody please review it and check its correctness?

In my understanding, these leaks are not so serious
because they only happen when heartbeat exits,
but they may be a problem if an HB client does signon()/signoff()/delete()
repeatedly in a single process.


The valgrind report is below:
(It's an excerpt. I used the hbagent to generate this, but it has
some other problems and I'm still looking into them.)

8<8<8<8<8<8<8<
(1) ==13900== 72 (28 direct, 44 indirect) bytes in 1 blocks are definitely lost 
in loss record 8 of 47
(1) ==13900==at 0x4004405: malloc (vg_replace_malloc.c:149)
(1) ==13900==by 0xAB1B72: g_malloc (in /usr/lib/libglib-2.0.so.0.400.7)
(1) ==13900==by 0xAA142E: g_hash_table_new_full (in 
/usr/lib/libglib-2.0.so.0.400.7)
(1) ==13900==by 0xAA14C4: g_hash_table_new (in 
/usr/lib/libglib-2.0.so.0.400.7)
(1) ==13900==by 0x404D5A0: hb_api_signon (client_lib.c:359)
(1) ==13900==by 0x804A12B: init_heartbeat (hbagent.c:572)
(1) ==13900==by 0x804AF1B: main (hbagent.c:1382)
(1) ==13900== 
(1) ==13900== 
(1) ==13900== 881 (12 direct, 869 indirect) bytes in 1 blocks are definitely 
lost in loss record 21 of 47
(1) ==13900==at 0x4004405: malloc (vg_replace_malloc.c:149)
(1) ==13900==by 0x4049760: enqueue_msg (client_lib.c:1535)
(1) ==13900==by 0x404B51E: read_api_msg (client_lib.c:1684)
(1) ==13900==by 0x404D680: hb_api_signon (client_lib.c:396)
(1) ==13900==by 0x804A12B: init_heartbeat (hbagent.c:572)
(1) ==13900==by 0x804AF1B: main (hbagent.c:1382)
8<8<8<----8<8<8<8<


-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r 8b07ee716560 lib/hbclient/client_lib.c
--- a/lib/hbclient/client_lib.c	Thu Oct 25 10:05:45 2007 +0200
+++ b/lib/hbclient/client_lib.c	Tue Oct 30 19:22:48 2007 +0900
@@ -161,6 +161,7 @@ static void		zap_iflist(llc_private_t*);
 static void		zap_iflist(llc_private_t*);
 static void		zap_order_seq(llc_private_t* pi);
 static void		zap_order_queue(llc_private_t* pi);
+static void		zap_msg_queue(llc_private_t* pi);
 static int		enqueue_msg(llc_private_t*,struct ha_msg*);
 static struct ha_msg*	dequeue_msg(llc_private_t*);
 static gen_callback_t*	search_gen_callback(const char * type, llc_private_t*);
@@ -361,8 +362,9 @@ hb_api_signon(struct ll_cluster* cinfo, 
 
 	/* Connect to the heartbeat API server */
 
-	if ((pi->chan = ipc_channel_constructor(IPC_ANYTYPE, wchanattrs))
-	==	NULL) {
+	pi->chan = ipc_channel_constructor(IPC_ANYTYPE, wchanattrs);
+	g_hash_table_destroy(wchanattrs);
+	if (pi->chan == NULL) {
 		ha_api_log(LOG_ERR, "hb_api_signon: Can't connect"
 		" to heartbeat");
 		ZAPMSG(request);
@@ -504,7 +506,8 @@ hb_api_delete(struct ll_cluster* ci)
 	zap_iflist(pi);
 	zap_nodelist(pi);
 
-	/* What about our message queue? */
+	/* Free up the message queue */
+	zap_msg_queue(pi);
 
 	/* Free up the private information */
 	memset(pi, 0, sizeof(*pi));
@@ -1479,6 +1482,23 @@ zap_order_queue(llc_private_t* pi)
 	}
 	pi->order_queue_head = NULL;
 }
+
+static void
+zap_msg_queue(llc_private_t* pi)
+{
+	struct MsgQueue* qelem = pi->firstQdmsg;
+	struct MsgQueue* next;
+
+	while (qelem != NULL){
+		next = qelem->next;
+		ZAPMSG(qelem->value);
+		cl_free(qelem);
+		qelem = next;	 
+	}
+	pi->firstQdmsg = NULL;
+	pi->lastQdmsg = NULL;
+}
+
 
 /*
  * Create a new stringlist.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extention for CRM resources

2007-11-15 Thread Keisuke MORI
Dejan,

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:
> Hi,
>
> On Fri, Nov 09, 2007 at 03:12:29PM +0900, Keisuke MORI wrote:
>> Hello all,
>> 
>> I would like to propose an extention for the SNMP hbagent 
>> so that it can handle the CRM resource information provided by
>> Heartbeat Version 2.
>> 
>> The attached patch is my proposal implementation.
>> The patch is already tested and debugged by our team
>> with using valgrind. 
>> 
>> 
>> I would appreciate any comments and suggestions
>> to make it more usable for everybody in the community.
>> 
>
> I'll take a look at the code.
>
> Thanks a lot for the contribution.

Thank you for taking a look at it.

Please advise me if there's any suspicious code.
We'll correct it.

Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extention for CRM resources

2007-11-15 Thread Keisuke MORI
Andrew,

Thank you for your comments.

Andrew Beekhof <[EMAIL PROTECTED]> writes:
>> I would appreciate any comments and suggestions
>> to make it more usable for everybody in the community.
>
> you might want to include so other data (such as failed, is_managed,
> etc) in the trap
>
> also, it might be an idea to include a back-link for resources that
> are in groups/clones/etc
>
>
> 
> NOTE :   This trap is sent only when the resource operation
> succeeds.
>Concretely, the extended hbagent gets the cib information
> when it
>changes, and parse it. And if the rc_code of the operation
> (like
>CRMD_ACTION_START) is "0", then the hbagent sends a trap.
> 
>
> it worries me a little that you only send the trap when rc=0...
> you don't want to know about failed actions?

The intention of the trap is to let you know the current status
of resources (such as running/stopped/etc.), not the result
of each operation. This is similar to the LHAResourceGroupStatus
object, which is the resource status in V1.
(The note above is just an explanation of how it's implemented.)

But, yeah, your point is right and it might also be useful.
Does anybody want to use this information?

We're considering extending it further, but before we proceed
I would like to design the new MIB definition first.
Does anyone have comments on this?

I would like to hear more, particularly about what kind of
information is needed by those who really want to use the SNMP agent.


Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [patch] Fix potential memory leaks in the HB client library

2007-11-18 Thread Keisuke MORI
Dejan,

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:
> Hi,
>
> On Tue, Oct 30, 2007 at 08:53:54PM +0900, Keisuke MORI wrote:
>> Hi,
>> 
>> I've been testing the heartbeat with valgrind enabled,
>> and found that it reported a couple of leaks which are
>> in the heartbeat API client library.
>> 
>> I'm submitting my proposed patch to fix them, so
>> could somebody please review it and the correctness?
>> 
>> In my understanding, these leaks are not so serious 
>> because the leaks only happens when the heartbeat exits,
>> but it may be a problem if a HB client does signon()/signoff()/delete()
>> repeatedly in a single process.
>> 
(omit)
>
> Your patch is in this changeset:
>
> http://hg.linux-ha.org/dev/rev/84e6520764bf

Thank you for taking care of it.


> BTW, do you have hg write access?

No, I don't.
Is there any authorization procedure to gain access?


Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [Q] group resources and unmanaged status

2007-12-07 Thread Keisuke MORI
Andrew,

Can I ask a question about the internal status of the PE?

My SNMP subagent code is using cluster_status(pe_working_set_t)
to analyze the current status of resources like crm_mon.

When a parent resource (group/clone/master) is unmanaged,
the 'running_on' and 'allowed_nodes' members of resource_t become NULL.
Is this an expected value, or is there some intention behind this?


If the parent resource is managed, those members have node values
according to its children.
In the case of a child resource (primitive), those members always
contain node values whether it is managed or unmanaged.


My SNMP subagent has a minor problem displaying the status
of an unmanaged group resource, and I'm now looking into how
I should fix it.


Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: AW: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-07 Thread Keisuke MORI
"Spindler Michael" <[EMAIL PROTECTED]> writes:

> Hi,
>
>> > > > This problem has been solved. My packaging box didn't have all 
>> > > > necessary packages for building GUI rpm. When I added 
>> them it was 
>> > > > able to build haclinet (GUI) and that find-lang.sh tool 
>> worked fine.
>> > > >
>> > > > I didn't find the problem with pegasus on my CentOS 5.0 
>> but I have 
>> > > > 32 bit version, and the problem was reported for 64 bit.
>> > >
>> > >
>> > > OK.
>> > >
>> > > So, this step should only be included if --enable-mgmt, I guess?
>> > >
>> > 
>> > Right. It establish language settings for the GUI, so it's 
>> not needed 
>> > if GUI isn't needed.
>> 
>> We are trying to build it on RedHat(Red Hat Enterprise Linux 
>> ES release 4 (Nahant Update 4)), and a problem remains before us.
>> Please check Mori-san's patch again.
>> http://developerbugs.linux-foundation.org//attachment.cgi?id=1109
>> 
>> -if test "x${CIMOM}" = "x"; then
>> -if test "x${CIMOM}" = "x"; then 
>> -AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
>> +if test "x${enable_cim_provider}" = "xyes"; then   # 
>> maybe, here #####
>> +if test "x${CIMOM}" = "x"; then
>> +if test "x${CIMOM}" = "x"; then
>> 
>> I attached the configure.log
>> 
>
> fyi: I was able to build the rpms on RedHat AS 4 without any problems.

The error above should occur only when the tog-pegasus package has
been installed on your RedHat.

I thought that tog-pegasus was installed by default on RedHat ES 4...

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Q] group resources and unmanaged status

2007-12-07 Thread Keisuke MORI
Andrew Beekhof <[EMAIL PROTECTED]> writes:
> On Dec 7, 2007, at 11:56 AM, Keisuke MORI wrote:
>
>> Andrew,
>>
>> Can I ask a question about the internal status of the PE?
>>
>> My SNMP subagent code is using cluster_status(pe_working_set_t)
>> to analyze the current status of resources like crm_mon.
>>
>> When a parent resource(group/clone/master) is unmanaged,
>> 'running_on' and 'allowed_nodes' member of resource_t gets NULL.
>> Is this an expected value?
>
> I thought that group/clone/master always had NULL... since they can be
> running on more than one node (especially clone and m/s resources)

Judging from the output of the SNMP agent, two pairs of a clone and a
primitive are observed, and each parent clone has the node
which its child primitive is running on.
Maybe my code is doing something wrong; I'll check it again.


>
> I recall also doing something special for unmanaged resources but I
> can probably change that behavior for you.
>
> that said, it would be better to use the recently added API call:
>   node_t *(*location)(resource_t *, GListPtr*, gboolean);
>
> eg.
>node_t *native_location(resource_t *rsc, GListPtr *list, gboolean
> current)
>

Thanks, I will look into this.
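
For reference, here is a minimal sketch of how I currently understand
that call would be used (the helper name is mine, and it assumes the
pe_working_set_t has already been populated via cluster_status()):

/* Hypothetical helper, not the actual subagent code: list the nodes
 * a resource is currently running on via the location() member. */
static void
print_running_nodes(resource_t *rsc)
{
	GListPtr running_on = NULL;

	/* Ask the resource where it runs now (current = TRUE),
	 * instead of reading rsc->running_on directly. */
	rsc->fns->location(rsc, &running_on, TRUE);

	slist_iter(node, node_t, running_on, lpc,
	{
		cl_log(LOG_INFO, "%s is running on %s",
		       rsc->id, node->details->uname);
	});

	g_list_free(running_on);
}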

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [PATCH] SNMP subagent syslog fix

2007-12-11 Thread Keisuke MORI
Hi,

The attached patch fixes the SNMP subagent so that
it obeys the syslog policy of heartbeat: 1) use logd if it's
enabled; 2) take the default syslog facility from the
configure option, as lrmd, mgmtd, etc. do.

The current SNMP subagent always writes its logs to LOG_USER,
which is hard-coded. This is not good.

This patch can be applied on its own
(i.e. independently of the SNMP extension for V2),
so please consider including it in 2.1.3.

Regards,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
diff -r 0890907b816f snmp_subagent/hbagent.c
--- a/snmp_subagent/hbagent.c	Tue Dec 11 01:10:53 2007 +0100
+++ b/snmp_subagent/hbagent.c	Tue Dec 11 17:08:47 2007 +0900
@@ -562,7 +562,10 @@ init_heartbeat(void)
 	hb = NULL;
 
 	cl_log_set_entity("lha-snmpagent");
-	cl_log_set_facility(LOG_USER);
+	cl_log_set_facility(HA_LOG_FACILITY);
+
+	/* Use logd if it's enabled by heartbeat */
+	cl_inherit_logging_environment(0);
 
 	hb = ll_cluster_new("heartbeat");
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] SNMP subagent syslog fix

2007-12-11 Thread Keisuke MORI
Hi,

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:
> Hi,
>
> On Tue, Dec 11, 2007 at 08:26:52PM +0900, Keisuke MORI wrote:
>> Hi,
>> 
>> The attached patch fixes the SNMP subagent so that
>> it obeys the syslog policy of heartbeat; 1) use logd if it's
>> enabled. 2) the default syslog facility is taken from the
>> configure option as well as lrmd, mgmtd, etc.
>> 
>> The current SNMP subagent produces its logs always into LOG_USER
>> which is hard-coded. This is not good.
>> 
>> This patch can be applied solely
>> (i.e. independent from the SNMP extention for V2),
>> so please consider including this patch into 2.1.3.
>
> Thanks for the patch. I can recall vaguely seeing the problem,
> perhaps I even filed a bugzilla for it. Or something. My memory
> isn't in the best shape today.


Grep'ing the source, there are still some hard-coded LOG_USER facilities.
Do they also need to be fixed?

In particular, send_arp.c, cl_status.c, xml_diff.c, and lrmadmin.c
are visible to end users, I think.



$ hg id
885e02e00632 tip
$ grep -R cl_log_set_facility * | grep LOG_USER
crm/pengine/ptest.c:cl_log_set_facility(LOG_USER);
crm/admin/xml_diff.c:   cl_log_set_facility(LOG_USER);
fencing/test/apitest.c: cl_log_set_facility(LOG_USER);
heartbeat/libnet_util/send_arp.c:cl_log_set_facility(LOG_USER);
lib/hbclient/api_test.c:cl_log_set_facility(LOG_USER);
lib/clplumbing/netstring_test.c:cl_log_set_facility(LOG_USER);
lrm/admin/lrmadmin.c:   cl_log_set_facility(LOG_USER);
lrm/test/apitest.c: cl_log_set_facility(LOG_USER);
membership/ccm/ccm_testclient.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbd.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbtest.c: cl_log_set_facility(LOG_USER);
telecom/recoverymgrd/recoverymgrd.c:cl_log_set_facility(LOG_USER);
tools/cl_status.c:  cl_log_set_facility(LOG_USER);


-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [PATCH] SNMP subagent fixes for 2.1.3

2007-12-18 Thread Keisuke MORI
Dejan,

Thank you for committing the SNMP extension to the repository, but
would you please include the following patches along with it?

1) SNMP: fix a problem in displaying an unmanaged group and check it
   in SNMPAgentSanityCheck.

2) SNMP: make the hbagent obey the syslog output policy.
   (This is the same one I posted here at
   http://www.gossamer-threads.com/lists/linuxha/dev/0 )

These should be needed if the SNMP extension is going to be
included in 2.1.3.
I hope it isn't too late...

Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

diff -r b7dd4d1b5d08 snmp_subagent/SNMPAgentSanityCheck.in
--- a/snmp_subagent/SNMPAgentSanityCheck.in	Wed Dec 19 13:14:14 2007 +0900
+++ b/snmp_subagent/SNMPAgentSanityCheck.in	Wed Dec 19 13:23:50 2007 +0900
@@ -59,6 +59,7 @@ sleep 3
 
 # start the linux-ha snmp subagent
 @libdir@/heartbeat/hbagent -d &
+HBAGENTPID=$!
 sleep 5
 
 # get the nodename for node0 and node1
@@ -112,6 +113,9 @@ RSC0TYPE="primitive(1)"
 RSC0TYPE="primitive(1)"
 RSC0NODE="$NODE0"
 RSC0STATUS="started(2)"
+RSC0ISMANAGED="managed(1)"
+RSC0FAILCOUNT="0"
+RSC0PARENT=""
 cibadmin --cib_create -o resources \
   -X ''
 sleep 3
@@ -125,6 +129,12 @@ rsc0node=`snmpget -v2c localhost -c publ
   | sed -ne 's/LINUX-HA-MIB::LHAResourceNode.1 = STRING: //p'`
 rsc0status=`snmpget -v2c localhost -c public LHAResourceStatus.1 \
   | sed -ne 's/LINUX-HA-MIB::LHAResourceStatus.1 = INTEGER: //p'`
+rsc0ismanaged=`snmpget -v2c localhost -c public LHAResourceIsManaged.1 \
+  | sed -ne 's/LINUX-HA-MIB::LHAResourceIsManaged.1 = INTEGER: //p'`
+rsc0failcount=`snmpget -v2c localhost -c public LHAResourceFailCount.1 \
+  | sed -ne 's/LINUX-HA-MIB::LHAResourceFailCount.1 = INTEGER: //p'`
+rsc0parent=`snmpget -v2c localhost -c public LHAResourceParent.1 \
+  | sed -ne 's/LINUX-HA-MIB::LHAResourceParent.1 = STRING: //p'`
 
 # check for LHAResourceName
 ret=0
@@ -161,19 +171,45 @@ if test $ret = 0; then
 	fi
 fi
 
+# check for LHAResourceIsManaged
+if test $ret = 0; then
+	echo "rsc0ismanaged = $rsc0ismanaged, RSC0ISMANAGED = $RSC0ISMANAGED" 
+	if test "$rsc0ismanaged" != "$RSC0ISMANAGED"; then
+		echo "failed to get resource ismanaged." >&2
+		ret=1
+	fi
+fi
+# check for LHAResourceFailCount
+if test $ret = 0; then
+echo "rsc0failcount = $rsc0failcount, RSC0FAILCOUNT = $RSC0FAILCOUNT"
+if test "$rsc0failcount" != "$RSC0FAILCOUNT"; then
+echo "failed to get resource failcount." >&2
+ret=1
+fi
+fi
+
+# check for LHAResourceParent
+if test $ret = 0; then
+echo "rsc0parent = $rsc0parent, RSC0PARENT = $RSC0PARENT"
+if test "$rsc0parent" != "$RSC0PARENT"; then
+echo "failed to get resource parent." >&2
+ret=1
+fi
+fi
+
 # show the result.
 if test $ret = 0; then
 	echo "BasicSanityCheck for SNMP Subagent about CRM resources passed."
-	exit 0
 else 
 	echo "BasicSanityCheck for SNMP Subagent about CRM resources failed."
-	exit 1
-fi
-
-
+fi
+
+kill $HBAGENTPID
 if
   [ -f $SNMPPIDFILE -a ! -z $SNMPPIDFILE ]
 then
   kill `cat $SNMPPIDFILE`
-  rm $SNMPPIDFILE
-fi
+  rm -f $SNMPPIDFILE
+fi
+
+exit $ret
diff -r b7dd4d1b5d08 snmp_subagent/hbagentv2.c
--- a/snmp_subagent/hbagentv2.c	Wed Dec 19 13:14:14 2007 +0900
+++ b/snmp_subagent/hbagentv2.c	Wed Dec 19 13:23:50 2007 +0900
@@ -69,7 +69,7 @@ init_resource_table_v2(void)
  * Return the number of resources.
  */
 static int
-update_resources_recursively(GListPtr reslist, int index)
+update_resources_recursively(GListPtr reslist, GListPtr nodelist, int index)
 {
 
 if (reslist == NULL) {
@@ -82,7 +82,7 @@ update_resources_recursively(GListPtr re
 slist_iter(rsc, resource_t, reslist, lpc1,
 {
 cl_log(LOG_DEBUG, "resource %s processing.", rsc->id);
-slist_iter(node, node_t, rsc->allowed_nodes, lpc2,
+slist_iter(node, node_t, nodelist, lpc2,
 {
 struct hb_rsinfov2 *rsinfo;
 enum rsc_role_e rsstate;
@@ -98,12 +98,19 @@ update_resources_recursively(GListPtr re
 
 /* using a temp var to suppress casting warning of the compiler */
 rsstate = rsc->fns->state(rsc, TRUE);
-if (pe_find_node_id(rsc->running_on, node->details->id) == NULL) {
-/*
- * if the resource is not running on current node,
- * its status is "stopped(1)".
- */
-rsstate = RSC_ROLE_STOPPED;
+{
+GListPtr running_on_nodes = NULL;
+
+rsc->fns->location(rsc, &running_on_nodes, TRUE);
+if (pe_find_node_id(
+running_on_nodes, node->

Re: [Linux-ha-dev] [PATCH] Process monitor daemon

2008-01-22 Thread Keisuke MORI
Hi Lars,
thank you for your comments.

Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:

> On 2008-01-16T18:48:06, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
>> Hello all,
>> 
>> We have developed a new feature that detects a process failure directly
>> to reduce the failover time.
>> 
>> If you're interested in, please try this and give me your comments.
>> 
>> See attached README for details about how to use this.
>> The patch is made for heartbeat-2.1.3.
>
> Hi Keisuke, instead of introducing a secondary process monitoring layer
> which has to be configured additionally, why not instead enhance the
> existing RAs to use a faster process checking technique? 

The background of why we developed this tool is that:
1) We want to detect a process failure asynchronously,
   not only by the periodic monitor operations, to trigger a
   failover faster and minimize the service downtime.
2) We want to make it usable as "an additional feature" for
   arbitrary applications without modifying existing RAs or
   the application itself.

As for 1), the asynchronous detection cannot be achieved by
the normal monitor operation of RAs. And some users who care about the
behavior under heavy load think that a monitor operation
itself is heavy because it forks/execs some shell commands,
and that eventually delays the detection of failures.

As for 2), the current usage of this tool is that, if you have
already configured your system, you just group your RA
with the 'procdctl' RA and add some parameters in cib.xml. You don't
need to modify and test your application/RA again in order to
use this function.

>
> If you are going this way, I think the RAs starting the individual
> [services should sign in and tell procd (if it's running) what they want
> monitored, and what rsc id it relates to - and, of course, notify procd
> before stopping the process.
>
> Instead of scanning /proc, which is very very Linux-specific, why not
> use the async waitpid call instead? Or, you might decide to
> poll()/select()/inotify() on the relevant proc dirs at least.

Yes, we're aware that the current implementation is very Linux
specific, and we're glad to modify it to be more portable.

But as for those techniques, waitpid() can handle only its own child
processes and it cannot be used to monitor a process launched
by heartbeat. With poll()/select()/inotify(), it is not possible to
detect that a process has entered "the zombie state", as far as we studied.
Please let me know if I'm wrong, or if there's a better way to do this.

For the meantime, why don't we start with "Linux only" as the
starting point and make it more portable upon user demand?
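
Just to illustrate the Linux-specific part, here is a simplified
sketch (not the actual procd code; the function name is mine) of how
the zombie state can be detected from /proc on Linux:

/* Simplified sketch, not the actual procd code: return 1 if the
 * process is a zombie, 0 if it is alive, -1 if it does not exist.
 * The state field in /proc/<pid>/stat is 'Z' for a zombie. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

static int
check_zombie(pid_t pid)
{
	char path[64], buf[256], *p;
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	fp = fopen(path, "r");
	if (fp == NULL) {
		return -1;	/* the process has disappeared */
	}
	if (fgets(buf, sizeof(buf), fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* format is "pid (comm) state ...", comm may contain spaces */
	p = strrchr(buf, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0') {
		return -1;
	}
	return (p[2] == 'Z') ? 1 : 0;
}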


>
> You can further use the async failure notification feature of the LRM to
> directly tell it when a monitored process has failed; no need to do so
> via the monitor call, which would then only need to be a backup function
> to make sure procd is still running.
>
> This would further reduce the error detection latency.

Well, probably I'm missing your point...
procd is already using the asynchronous failure notification to the
CRM, in the same manner as the 'crm_resource -F' command, and that is
the primary purpose of this tool.

Please point it out if I'm misunderstanding what you mean.


> procd also probably should be started by a RA, not by a respawn line.

It's a respawned daemon because it can be used if you want to
monitor two or more applications.



>
>> +if [ -f ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs ]; then
>> +FUNCTION_FILE="${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
>> +elif [ -f /usr/lib64/heartbeat/ocf-shellfuncs ]; then
>> +FUNCTION_FILE="/usr/lib64/heartbeat/ocf-shellfuncs"
>> +elif [ -f /usr/lib/heartbeat/ocf-shellfuncs ]; then
>> +FUNCTION_FILE="/usr/lib/heartbeat/ocf-shellfuncs"
>> +else
>> +echo "${OCF_RESOURCE_INSTANCE} ocf-shellfuncs file doesn't exist." >&2
>> +exit 1
>> +fi
>
> This seems unneeded, the other RAs have the proper code too - to just
> source a single file.

The code above is from our compatibility code with 2.0.8 (because we
have a customer already providing their service with 2.0.8),
but yes, it's not needed for the future release and I will remove it.


>
>> +PROCD="/usr/lib/heartbeat/procd"
>
> No hardcoded paths please.
>
>> +> start-delay="60" />
>
> Start-delay shouldn't be needed.
>
>> +#if 1
>> +crm_log_init(crm_system_name, LOG_INFO, TRUE, FALSE, argc, argv);
>> +#else
>> +/* for before heartbeat 2.1.2 */
>> +crm_log_init(crm_system_name, TRUE);
>> +#endif
>
> Ple

Re: [Linux-ha-dev] [PATCH] Process monitor daemon (revised)

2008-04-14 Thread Keisuke MORI
Hi Lars,

Thank you all for reviewing and making suggestions.

I think I understand your point regarding the Heartbeat architecture,
but it would require rewriting almost all of the code ;-)

I will discuss with my colleagues what we can do for procd
as the next step.



Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:

> On 2008-02-27T20:39:13, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
> Hi Keisuke-san,
>
> thanks for your patch and contribution. I have to apologize in the name
> of everyone for the late feedback.
>
> I really appreciate the idea of monitoring processes directly, and
> receiving async failure notifications to reduce fail-over times.
>
> I have just discussed this with Dejan and Andrew, and we think that the
> best path forward, alas necessary before inclusion, is to
>
> - Make procd independent of Pacemaker. It should talk only to the RAs
>   and the LRM.
>
> - RAs should "sign in" with it for the processes they want monitored,
>   instead of listing the processes in the procd configuration section
>   (means it gets decoupled from the CIB further). The RAs could write a
>   record to /var/run/heartbeat/procd/, for example. 
>   
>   The RAs would add/remove the required processes on start/promote or
>   demote/stop. (So procd itself would not need to be master-slave.)
>
>   I'm afraid that having users manually specify process lists in the CIB
>   really is not workable - the users will not be able to get this
>   right.
>
> - Instead of respawning procd, there should be a resource agent which
>   starts/stops (and monitors!) procd. You already have one, but why
>   doesn't it go into resources/OCF/ ?

We've only thought of using procd via respawning so far,
and we don't have such an RA yet.


>
> - procd should talk to the LRM to insert a "fake" failed resource
>   action, which would then cause the CRM/PE to handle the resource as
>   failed and initiate recovery. (This is not currently possible with the
>   LRM client library; you could exec crm_resource -F, which would mean
>   you no longer have a build-time dependency on the CRM.)
>
> - This would have the advantage of decoupling procd from pacemaker as
>   well as heartbeat. It could be included with the LRM/RA package build,
>   and possibly be useful with other cluster managers too.
>
> I think all that would help simplify the code.
>
>
>> +#define RSCID_LEN  128 /* ref. include/lrm/lrm_api.h */
>> +#define MAX_PID_LEN256 /* ref. lrm/lrmd/lrmd.h */
>> +#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */
>
> If you're referencing from other include files, please do include the
> includes as to avoid diverging header definitions.
>

Right.


Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4

2008-04-22 Thread Keisuke MORI
Hi,

"Andrew Beekhof" <[EMAIL PROTECTED]> writes:

> On Wed, Apr 16, 2008 at 1:31 PM, HIDEO YAMAUCHI
> <[EMAIL PROTECTED]> wrote:
>> Hi Andrew,
>>
>>
>>  > I asked for the right function but the wrong frame number - I should
>>  > have asked for frame 2.  Sorry :(
>>
>>  (gdb) frame 2
>>  #2  0x00416c74 in stop_recurring_action_by_rsc (key=0x755f60, 
>> value=0x755f40,
>>  user_data=0x545a10) at lrm.c:1442
>>  1442if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
>>  (gdb) print *rsc
>>  Variable "rsc" is not available.
>>  (gdb) print *op
>>  No symbol "op" in current context.
>>
>>  Is what or my operation a mistake?
>
> Looks like gcc is being too clever for it's own good (by optimizing
> away some of the variables) :-(
>
> Can you try the following patch please?
>
> diff -r be12cb83cd2d crmd/lrm.c
> --- a/crmd/lrm.c  Wed Apr 16 10:46:59 2008 +0200
> +++ b/crmd/lrm.c  Wed Apr 16 15:02:16 2008 +0200
> @@ -1451,7 +1451,7 @@ stop_recurring_action_by_rsc(gpointer ke
>  {
>   lrm_rsc_t *rsc = user_data;
>   struct recurring_op_s *op = (struct recurring_op_s*)value;
> - 
> + crm_info("op->rsc=%s (%p), rsc=%s (%p)", crm_str(op->rsc_id),
> op->rsc_id, crm_str(rsc->id), rsc->id);
>   if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
>   cancel_op(rsc, key, op->call_id, FALSE);
>   }



I think I found the cause of this issue.
I've attached the additional log taken with your patch (a bit different though)
and the stack trace.

Here's my observation:

 - An element of pending_ops is removed at lrm.c:L497.
 - That removal happens inside g_hash_table_foreach(), called at L1475.
 - This violates the usage of g_hash_table_foreach() according
   to the glib manual.
 - Therefore the iteration cannot proceed correctly and would
   try to refer to a removed element.

http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
(...)
946 /* not doing this will block the node from shutting down */
947 g_hash_table_remove(pending_ops, key);
(...)
1475g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, 
rsc);

http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
(...)
The hash table may not be modified while iterating over it (you can't 
add/remove items).


I've also attached my suggested patch; I cannot guarantee
its correctness, but it shows the idea.
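
Roughly, the idea is to switch from g_hash_table_foreach() to
g_hash_table_foreach_remove(), which is allowed to drop entries while
iterating. A minimal sketch of the pattern (simplified names, not the
actual lrm.c code):

#include <glib.h>

/* Callback for g_hash_table_foreach_remove(): returning TRUE tells
 * glib to remove the entry safely during the iteration. */
static gboolean
drop_if_matching(gpointer key, gpointer value, gpointer user_data)
{
	const char *wanted_id = user_data;

	if (g_str_equal((const char *)value, wanted_id)) {
		return TRUE;	/* remove this entry */
	}
	return FALSE;		/* keep this entry */
}

static void
cancel_ops_for(GHashTable *pending_ops, const char *rsc_id)
{
	/* Safe replacement for calling g_hash_table_remove() from
	 * inside a g_hash_table_foreach() callback, which glib forbids. */
	g_hash_table_foreach_remove(pending_ops, drop_if_matching,
				    (gpointer)rsc_id);
}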

Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation



ms-additional-log-20080422.tar.gz
Description: ms-additional-log-20080422.tar.gz
diff -r 333aef5bd4ed -r 36c0fd90691d crm/crmd/lrm.c
--- a/crm/crmd/lrm.c	Thu Apr 17 18:55:57 2008 +0200
+++ b/crm/crmd/lrm.c	Tue Apr 22 17:48:47 2008 +0900
@@ -943,8 +943,9 @@ cancel_op(lrm_rsc_t *rsc, const char *ke
 		if(key && remove) {
 			delete_op_entry(NULL, rsc->id, key, op);
 		}
+		/* return FALSE to be removed from pending_ops */
 		/* not doing this will block the node from shutting down */
-		g_hash_table_remove(pending_ops, key);
+		return FALSE;
 	}
 	
 	return TRUE;
@@ -954,15 +955,20 @@ gboolean cancel_done = FALSE;
 gboolean cancel_done = FALSE;
 lrm_rsc_t *cancel_rsc = NULL;
 
-static void
+static gboolean
 cancel_action_by_key(gpointer key, gpointer value, gpointer user_data)
 {
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	
 	if(safe_str_eq(op->op_key, cancel_key)) {
 		cancel_done = TRUE;
-		cancel_op(cancel_rsc, key, op->call_id, TRUE);
-	}
+		if (!cancel_op(cancel_rsc, key, op->call_id, TRUE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+	return FALSE;
 }
 
 static gboolean
@@ -976,7 +982,7 @@ cancel_op_key(lrm_rsc_t *rsc, const char
 
 	CRM_CHECK(key != NULL, return FALSE);
 	
-	g_hash_table_foreach(pending_ops, cancel_action_by_key, NULL);
+	g_hash_table_foreach_remove(pending_ops, cancel_action_by_key, NULL);
 
 	if(cancel_done == FALSE && remove) {
 		crm_err("No known %s operation to cancel", key);
@@ -1433,15 +1439,21 @@ send_direct_ack(const char *to_host, con
 	free_xml(update);
 }
 
-static void
+static gboolean
 stop_recurring_action_by_rsc(gpointer key, gpointer value, gpointer user_data)
 {
 	lrm_rsc_t *rsc = user_data;
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	
 	if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
-		cancel_op(rsc, key, op->call_id, FALSE);
-	}
+		if (!cancel_op(rsc, key, op->call_id, FALSE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+
+	return FALSE;
 }
 
 void

Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4

2008-04-22 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
(snip)
>>  Here's my observation:
>>
>>   - An element of pending_ops is removed at lrm.c:L497
>>   - It is called inside from g_has_table_foreach() at L1475
>>   - This is violating the usage of g_has_table_foreach() according
>>to the glib manual.
>>   - Therefore the iteration can not proceed correctly and would
>>try to refer to a removed element.
>
> Turns out that the Stateful resource in CTS was never getting promoted.
> Once I fixed this, I was able to trigger the bug too (in the last few 
> minutes).

A weird thing is that it is not reproducible in every environment.

As far as we've tested:
 - it _always_ happens on a RedHat 4 environment.
 - it has _never_ happened on a RedHat 5 environment.

I'm not sure if it's the only difference, but
possibly the difference in glib versions is related to
the behavior.


>
> Thanks for your diagnosis and the patch, you've certainly saved me some time 
> :-)
>
>>
>>  http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
>>  (...)
>>  946 /* not doing this will block the node from shutting down */
>>  947 g_hash_table_remove(pending_ops, key);
>>  (...)
>>  1475g_hash_table_foreach(pending_ops, 
>> stop_recurring_action_by_rsc, rsc);
>>
>>  
>> http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
>>  (...)
>>  The hash table may not be modified while iterating over it (you can't 
>> add/remove items).
>>
>>
>>  I also attached my suggested patch, although I can not guarantee
>>  the correctness but just to show you the idea.
>>
>>  Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [RFC] heartbeat-2.1.4

2008-05-21 Thread Keisuke MORI
Hi,

How is the 2.1.4 release going?
Will it be released soon, or is there any trouble with it?

I look forward to seeing it!

Thanks,

Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:

> Hi all,
>
> the Linux-HA project is undergoing some changes, as you've noticed. Not
> all of them have gone as well as expected, and it hasn't stabilized
> yet.
>
> Under guidance with Alan, the project members have met and decided to
> change the governance of the project in the future. This will be
> announced in more detail soon, stay tuned.
>
> We also want to likely make some further changing to the package layout,
> and understand that users, admins and distro maintainers dislike it when
> we do that, so we don't want to make it a habit.
>
> We recognize the needs of our users (I hope!) to receive timely updates,
> and thus have decided to go ahead and propose releasing one more 2.1.4
> (following the 2.1.x package layout) as the last release of that branch
> before the restructuring kicks in completely.
>
> (When we decided to split off pacemaker, we didn't expect that this
> would cause the upstream Linux-HA project to cease releasing
> completely, and unfortunately there's been little discussion on the
> lists regarding this since.)
>
> For SLE10 SP2, it was already too late to change the package layout, so
> I've been backporting changes (which is quite easy with Mercurial) from
> the Pacemaker project, the GUI, and heartbeat-dev into the 2.1.x
> codebase, and done a fair amount of testing on x86, x86-64, s390x.
>
> However, I've been mostly focused on cherry picking what we (as in,
> Novell) needed, so in particular the packaging for non-SUSE dists is
> somewhat neglected in this version.
>
> If other distro maintainers would please help me with fixing up the
> packaging, and more community members would pound on it, I would really
> appreciate this.
>
> My proposal would be to release 2.1.4 by the end of next week
> (2008-04-18). (Mostly because after that I go on vacation ;-)
>
> I know this is a highly condensed schedule and doesn't follow any
> "proper" release methodology. The reasons for this in bullet points:
>
> - It's been too long since the last "official" gasp from the heartbeat
>   project. The code we have is clearly better than 2.1.3, and we should
>   get it to our users ASAP.
>
> - Novell has done a fair amount of testing on it already. The code is
>   good (as in "much better than 2.1.3"), except the packaging.
>
> - The new governance will eventually decide on a new release methodology
>   for the Linux-HA project, I expect, but this will take some more
>   weeks, and I don't want to delay releasing even further.
>
> So, with the above reasoning, I'm volunteering myself - and hijacking
> the vacuum, I acknowledge - to do the 2.1.4 release, as the current
> split hasn't been adopted everywhere yet, 2.1.x is defunc, and our user
> community appears to need it "now" and not in several months.
>
> I'd plan on building the packages for all dists via OBS, if nobody holds
> any strong objections and update the DownloadSoftware page after we
> agree that the 2.1.4 release is good. 
>
> And of course would be much approving of distro maintainers pulling it
> into their official distro repositories too!
>
> So, that said, I've pushed my proposed code to
> http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely
> doesn't build yet (because the in-tree packaging is broken), but I
> wanted to share the scope of changes with you. 
>
> As a further point of reference, I'm attaching the SLES changes section
> to this mail. (bnc# refers to bugzilla.novell.com.)
>
>
> Let me emphasize strongly that I really don't want to step on anyone's
> toes, or rush the new governance board, but only fill the current void
> until that is actually operational and has settled down, as I suggest
> our users need it.
>
>
> Please comment.
>
>
> Regards,
> Lars
>
> -- 
> Team lead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [PATCH] IPv6 HBcomm plugin

2008-06-13 Thread Keisuke MORI
Hi,

I've been implementing an HBcomm plugin to enable IPv6
communication among the cluster nodes and the ping nodes.

It is still an experimental implementation and
I would appreciate any feedback.


Thanks,

            
The IPv6 HBcomm plugin usage

1. Building

Apply the attached patch and do './ConfigureMe configure' and make.

The patch is made against the dev branch at:
changeset:   11945:5c915f1d5b7b

It has been built and tested on RHEL 5.1.


2. Configuration

The following two directives are available in ha.cf:

 1) mcast6

Use IPv6 multicast for the heartbeat communication between the nodes.
The syntax is the same as 'mcast'.

Eg.
mcast6 eth1 ff02::694 694 1 0

Note: Please choose a multicast address that is available on
your subnet. The address in the example is not officially
registered with IANA.
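
(For those curious about what the plugin does with this directive: the
IPv6 group membership inside mcast6 boils down to roughly the sketch
below. This is not the actual mcast6.c code; the address and interface
are just the ones from the example above.)

#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Sketch only: join an IPv6 multicast group on a given interface
 * so that heartbeat packets sent to it can be received. */
static int
join_mcast6(int sock, const char *group, const char *ifname)
{
	struct ipv6_mreq mreq;

	memset(&mreq, 0, sizeof(mreq));
	if (inet_pton(AF_INET6, group, &mreq.ipv6mr_multiaddr) != 1) {
		return -1;	/* bad group address, e.g. "ff02::694" */
	}
	mreq.ipv6mr_interface = if_nametoindex(ifname);	/* e.g. "eth1" */
	if (mreq.ipv6mr_interface == 0) {
		return -1;	/* unknown interface */
	}
	return setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP,
			  &mreq, sizeof(mreq));
}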


 2) ping6

Use an IPv6 address as a ping node.
This is equivalent to the 'ping' directive for IPv4.
The syntax is also the same as 'ping', except that you can specify an
interface name for the address by appending it with '%'.

Eg.
ping6 fe80::1:1%eth0

Note: the interface name ("%eth0" above) is mandatory
if you want to ping a link-local address (by the design of IPv6).
You can omit this part if you're pinging a global address.


3. TODO / known issues

 - Still in experimental status and not completely tested yet.
   Please test it yourself and give me your feedback :-).

 - the 'ping_group' equivalent is not implemented.
   (is it possible to use an anycast address instead of this?)

 - ping6: the ioctl() to set an ICMPv6 filter fails.
   It can be ignored, but fixing it would be preferable for optimization.

 - mcast6: the allocated memory for the private area is never freed.
   It would not be a big problem, but fixing it would be preferable. Same in 'mcast'.

            


-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r 5c915f1d5b7b lib/plugins/HBcomm/Makefile.am
--- a/lib/plugins/HBcomm/Makefile.am	Wed May 28 09:14:21 2008 +1000
+++ b/lib/plugins/HBcomm/Makefile.am	Wed Jun 04 13:02:19 2008 +0900
@@ -46,6 +46,7 @@ halibdir		= $(libdir)/@HB_PKG@
 halibdir		= $(libdir)/@HB_PKG@
 plugindir		= $(halibdir)/plugins/HBcomm
 plugin_LTLIBRARIES	= bcast.la mcast.la ping.la serial.la ucast.la \
+			  mcast6.la ping6.la \
 			  ping_group.la  $(HBAPING) $(OPENAIS) $(TIPC)
 
 bcast_la_SOURCES	= bcast.c
@@ -80,3 +81,12 @@ tipc_la_SOURCES   	= tipc.c
 tipc_la_SOURCES   	= tipc.c
 tipc_la_LDFLAGS  	= -export-dynamic -module -avoid-version
 tipc_la_LIBADD   	= $(top_builddir)/replace/libreplace.la
+
+mcast6_la_SOURCES	= mcast6.c
+mcast6_la_LDFLAGS	= -export-dynamic -module -avoid-version 
+mcast6_la_LIBADD	= $(top_builddir)/replace/libreplace.la
+
+ping6_la_SOURCES	= ping6.c
+ping6_la_LDFLAGS	= -export-dynamic -module -avoid-version
+ping6_la_LIBADD		= $(top_builddir)/replace/libreplace.la
+
diff -r 5c915f1d5b7b lib/plugins/HBcomm/mcast6.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/lib/plugins/HBcomm/mcast6.c	Thu Jun 05 17:00:20 2008 +0900
@@ -0,0 +1,788 @@
+/*
+ * mcast6.c: implements hearbeat API for UDP/IPv6 multicast communication
+ *
+ * Author:  Keisuke MORI <[EMAIL PROTECTED]>
+ * 
+ * based on mcast.c written by the following authors.
+ * Copyright (C) 2000 Alan Robertson <[EMAIL PROTECTED]>
+ * Copyright (C) 2000 Chris Wright <[EMAIL PROTECTED]>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ * 
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ * 
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef HAVE_SYS_SOCKIO_H
+#	include 
+#endif
+
+#include 
+ 
+#define PIL_PLUGINTYPE  HB_COMM_TYPE
+#define PIL_PLUGINTYPE_S  HB_COMM_TYPE_S
+#define PIL_PLUGIN  mcast6
+#define PIL_PLUGIN_S  "mcast6"
+#define PIL_PLUGINLICENSE	LICENSE_LGPL
+#define PIL_PLUGINLICENSEURL	URL_LGPL
+#include 
+#include 
+
+struct mcast6_private {
+	char *  interface;  /* Interface name */
+	char *  mcastaddr;  /* multicast address for IPv6 in string */
+	struct  addrinfo *  ad

Re: [Linux-ha-dev] [PATCH] IPv6 HBcomm plugin

2008-06-16 Thread Keisuke MORI
Hi,

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:

> Hi Keisuke-san,
>
> On Fri, Jun 13, 2008 at 06:04:58PM +0900, Keisuke MORI wrote:
>> Hi,
>> 
>> I've been implementing HBcomm plugin to enable IPv6
>> communication among the cluster nodes and the ping nodes.
>> 
>> It is still an experimental implementation and
>> I would appreciate on any feedback.
>
> One quick question: Why did you create a new plugin instead of
> extending the existing ones with IPv6 capabilities?
>

Well, no big reasons, but:

 - I wanted to preserve the old code untouched for stability.
 - It would need many if/cases to separate the AF_INET/AF_INET6 code,
   particularly in the ping code.
   (the ping6 command exists for the same reason, IIRC)

so I thought that it's not a bad idea to separate them at the plugin level.
It doesn't necessarily have to be so, of course.

Thanks,

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6 HBcomm plugin

2008-06-17 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:
> and in case anyone cares... the new pingd tool (the stand-alone
> version that supports both stacks) also supports IPv6

It's something I'm interested in...

Do you have any plans for when it will be available?

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] sfex

2008-06-17 Thread Keisuke MORI
Dejan,

Thank you for taking care of it.

Yes, NTT is very glad and agrees to include sfex in the
heartbeat repository!

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:

> Hello,
>
> Since last year NTT designed and implemented sfex, a suite of
> programs to improve shared disk usage (see linux-ha.org/sfex)
> which unfortunately didn't attract attention it deserves. I
> reviewed the code and attached you'll find some comments and some
> simple changes. One general remark: all programs (sfex_*) are
> monolithic and, though they are not that big, it would be
> beneficial to code readers if they were split into more
> units/functions.

That sounds reasonable.
Where can I find your comments and modifications?


>
> A couple of suggestions on making sfex useful in other contexts
> were making a quorum plugin and a HBcomm plugin. Did you
> investigate further these options?


Yes, we did, but we think that
those would be a totally different approach from sfex.


 - a quorum plugin

   A quorum plugin is executed only on 'the cluster leader node' in CCM,
   and it does not care where the resource is running,
   whereas sfex should run on the same node that the resource
   in question is running on, because it protects the data
   which resides in the resource.

   In other words, sfex controls at the resource granularity,
   whereas a quorum plugin controls at 'the partition' granularity.


 - HBcomm plugin

   I remember that somebody posted this before, called 'dskcm'.
   This is also an interesting idea, but the approach is very different.

   That approach is:
- having yet another redundant communication path through
  the shared medium,
   whereas sfex's approach is:
- providing a protection method when ALL of the communication
  paths have failed.

   Even though they have a similar goal, the functionality is
   very different.


>
> Of course, if you agree, we could include sfex into the heartbeat
> repository.
>
> Cheers,
>
> Dejan
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/


Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6 HBcomm plugin

2008-06-17 Thread Keisuke MORI
"Andrew Beekhof" <[EMAIL PROTECTED]> writes:

> On Tue, Jun 17, 2008 at 09:48, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>> "Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>>> and in case anyone cares... the new pingd tool (the stand-alone
>>> version that supports both stacks) also supports IPv6
>>
>> It's something I'm interested in...
>>
>> Do you have any plan when it will be available?
>
> Its already in pacemaker-dev (which I think you're testing already).
> It will also be part of 0.7 (unstable) which will be out this month.


Ok, I didn't realize that it's already in there.
I will take a look at it.

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] sfex

2008-06-18 Thread Keisuke MORI
Hi

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:

> Hi Keisuke-san,
>
> On Tue, Jun 17, 2008 at 05:33:52PM +0900, Keisuke MORI wrote:
>> Dejan,
>> 
>> Thank you for taking care of it.
>> 
>> Yes, NTT is very glad and agrees to include sfex into the
>> heartbeat repository!
>> 
>> Dejan Muhamedagic <[EMAIL PROTECTED]> writes:
>> 
>> > Hello,
>> >
>> > Since last year NTT designed and implemented sfex, a suite of
>> > programs to improve shared disk usage (see linux-ha.org/sfex)
>> > which unfortunately didn't attract attention it deserves. I
>> > reviewed the code and attached you'll find some comments and some
>> > simple changes. One general remark: all programs (sfex_*) are
>> > monolithic and, though they are not that big, it would be
>> > beneficial to code readers if they were split into more
>> > units/functions.
>> 
>> That sounds reasonable.
>> Where can I find your comments and modifications?
>
> A reasonable question :)  Forgot to attach the file with
> comments. Sorry about that. It is in the form of a patch against
> version 1.3.


Thanks, I will look into it.


>
>> > A couple of suggestions on making sfex useful in other contexts
>> > were making a quorum plugin and a HBcomm plugin. Did you
>> > investigate further these options?
>> 
>> 
>> Yes we did but we think that
>> those would be totally different approach from sfex.
>> 
>> 
>>  - a quorum plugin
>> 
>>A quorum plugin is executed only on 'the cluster leader node' in CCM,
>
> I don't think so. CCM delivers connectivity and quorum
> information on each node. However, that's probably not relevant.
>
>>and it does not care where the resource is running on,
>>whereas sfex should run on the same node which the resource
>>in question is running on because it's for the protection of
>>the data which resides in the resource.
>> 
>>In other words, sfex is to control with resource granularity,
>>whereas a quorum plugin is to control 'the partition' granularity.
>
> Right. The point was however to use parts of sfex for the quorum
> functionality. I'll see if I can get back to you with a more
> detailed and specific proposal.


I still don't understand you very well, sorry.
I'd appreciate it if you could explain in more detail.




>
>>  - HBcomm plugin
>> 
>>I remember that somebody posted this before, called 'dskcm'.
>
> Somehow missed that one.
>
>>This is also interesting idea but the approach is very different.
>> 
>>This approach is:
>> - having yet another redundant communication path through
>>   the shared medium.
>>whereas sfex's approach is:
>> - provide a protection method when ALL of the communication
>>   paths are failed.
>> 
>>Even though they have the similar goal the functionality is
>>very different.
>
> Yes. Though again sfex would need to be twisted a bit to provide
> heartbeats over shared storage. I'll take a look at dskcm.
>

It was this:

http://www.gossamer-threads.com/lists/linuxha/dev/39716#39716

Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] sfex

2008-06-19 Thread Keisuke MORI
Hi,

"Xinwei Hu" <[EMAIL PROTECTED]> writes:
> I'm the one who opposed sfex in the previous discussion.
>
> My point was simple that:
> """"
> check-and-reserve on disk is not an atomic CAS operation. and lock
> based on that may silently cause data corruption.
> """

sfex does not rely on the atomicity of "check-and-reserve".
It always _overwrites_ the control data, and the detection of
losing ownership is timeout-based.


Indeed it can happen that two nodes try to write the control
data at the same time under particular conditions, but

1) Such a situation will not happen in the typical split-brain
   scenario with sfex. It can only happen under a particular
   condition, such as a mis-operation that tries to
   launch two nodes simultaneously _without_ fixing the
   split-brain condition.

2) Even if such a situation occurred, sfex resolves it as follows
   (a rough sketch of this write-and-verify idea is shown below):
   - sfex always writes its control data as "one sector" of data
     (512 bytes in most cases) through direct I/O.
     That would be a single write request to the disk controller.
   - If two nodes try to write the data at the same time,
     the requests will be serialized in the disk controller, so
     'the latter one' will win.
   - sfex makes sure that the written data is "mine", and
     the "loser" will return an error to prevent it from launching resources.


   
Does that explain it?

Thanks,


>
> I haven't follow the evolution of sfex though, so things might have
> been changed.
>
> Just FYI.
>
> 2008/6/17 Dejan Muhamedagic <[EMAIL PROTECTED]>:
>> Hello,
>>
>> Since last year NTT designed and implemented sfex, a suite of
>> programs to improve shared disk usage (see linux-ha.org/sfex)
>> which unfortunately didn't attract attention it deserves. I
>> reviewed the code and attached you'll find some comments and some
>> simple changes. One general remark: all programs (sfex_*) are
>> monolithic and, though they are not that big, it would be
>> beneficial to code readers if they were split into more
>> units/functions.
>>
>> A couple of suggestions on making sfex useful in other contexts
>> were making a quorum plugin and a HBcomm plugin. Did you
>> investigate further these options?
>>
>> Of course, if you agree, we could include sfex into the heartbeat
>> repository.
>>
>> Cheers,
>>
>> Dejan
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] BasicSanityCheck fails in lha-2.1

2008-07-29 Thread Keisuke MORI
Dejan,

BasicSanityCheck fails at the RA permission test
because ocf-tester returns an error at the line below (line 175)
if the nobody user is not allowed to log in.

su nobody $agent $action > /dev/null

[EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data
This account is currently not available.
[EMAIL PROTECTED] grep nobody /etc/passwd
nobody:x:99:99:Nobody:/:/sbin/nologin


How about using the hacluster user instead, as attached?

Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

diff -r a8b2fc037b29 tools/ocf-tester.in
--- a/tools/ocf-tester.in	Thu Jul 17 17:01:29 2008 +0900
+++ b/tools/ocf-tester.in	Tue Jul 29 19:58:04 2008 +0900
@@ -168,11 +168,11 @@ lrm_test_command() {
 
 test_permissions() {
 action=meta-data
-msg=${1:-"Testing permissions with uid nobody"}
+msg=${1:-"Testing permissions with uid @HA_CCMUSER@"}
 if [ $verbose -ne 0 ]; then
 	echo $msg
 fi
-su nobody $agent $action > /dev/null
+su @HA_CCMUSER@ $agent $action > /dev/null
 }
 
 test_metadata() {
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] BasicSanityCheck fails in lha-2.1

2008-07-29 Thread Keisuke MORI
Hi Dejan,

Dejan Muhamedagic <[EMAIL PROTECTED]> writes:
> Hi Keisuke-san,
>
> On Tue, Jul 29, 2008 at 08:03:18PM +0900, Keisuke MORI wrote:
>> Dejan,
>> 
>> BasicSanityCheck fails by the permission test of RA
>> because ocf-tester returns an error at below (line 175)
>> if nobody user was not allowed to login.
>> 
>> su nobody $agent $action > /dev/null
>> 
>> [EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data
>> This account is currently not available.
>> [EMAIL PROTECTED] grep nobody /etc/passwd
>> nobody:x:99:99:Nobody:/:/sbin/nologin
>> 
>> 
>> How about to use the hacluster user instead as attached?
>
> That won't help. nobody was chosen because lrmd runs the
> meta-data action as nobody. The problem here is that su(1)
> requires a shell whereas lrmd doesn't. It looks like the -s
> option could help. Just pushed a patch. Could you please test it
> too.

That works perfectly!

Thanks,

-- 
Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crm_mon doesn't exit immediately

2008-08-11 Thread Keisuke MORI
Andrew,

If there's no objection I would like to push this patch into
the lha-2.1 repository; is there any problem with that?

It seems that the latest pacemaker also shows the same behavior,
so I think both need to be fixed as well.

Thanks,


"Junko IKEDA" <[EMAIL PROTECTED]> writes:

> Hi,
>
> I found that crm_mon which is included in Pacemaker-dev(2f2343008186) can be
> quitted by Ctrl + C.
> If a back port from Pacemaker to Heartbeat 2.1.4 is better than applying the
> patch,
> We don't care about how to fix this.
>
> Thanks,
> Junko
>
>
>> Can somebody handle this issue?
>> She said that she couldn't quit the crm_mon command with Ctrl+C.
>> I usually use crm_mon with the -i option, so I hadn't noticed this
>> behavior, but it is certain that crm_mon running with no options
>> isn't stopped by SIGINT.
>> It's odd, right?
>> I think almost everyone would expect Ctrl+C to stop this command.
>> See her attached patch.
>> 
>> Thanks,
>> Junko
>> 
>> 
>> > I noticed that crm_mon doesn't exit immediately
>> > when it receives SIGINT in the mainloop.
>> > It seems that SIGINT only interrupts the sleep() function...
>> > (Is this caused by something in G_main_add_SignalHandler()?
>> >  Or by anything else?)
>> >
>> > So I modified it to exit the wait function
>> > when it is interrupted by a signal.
>> > This patch is for Heartbeat STABLE 2.1 (aae8d51d84ec).
>> > I hope it isn't too late for Heartbeat 2.1.4...
>> >
>> >
>> > Regards,
>> > Satomi Taniguchi
>
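
For illustration only: the usual shape of such a fix is to have the SIGINT
handler set a flag that the status loop checks, instead of relying on the
interrupted sleep() alone. The sketch below shows that generic pattern; it
is not Taniguchi-san's actual patch (which is attached to the original mail
and not reproduced here).

/* Generic sketch: exit the wait loop promptly on Ctrl+C. */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

static void
on_sigint(int sig)
{
	(void)sig;
	got_sigint = 1;
}

int
main(void)
{
	signal(SIGINT, on_sigint);

	while (!got_sigint) {
		/* ... refresh and print the cluster status here ... */
		sleep(15);	/* returns early when a signal arrives */
	}
	return EXIT_SUCCESS;	/* exit immediately after Ctrl+C */
}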

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: [Linux-HA] rsc_order constraints behavior changed?

2008-08-11 Thread Keisuke MORI
Andrew,

I'm also going to backport this fix into lha-2.1.
If there's any problem, please let me know.

Thanks,

"Junko IKEDA" <[EMAIL PROTECTED]> writes:

>> >>> If you don't want non_clone_group1 to be restarted when this happens,
>> >>> make the ordering constraint advisory-only by setting adding score="0"
>> >>> to the constraint.
>> >> I tried this configuration, but non_clone_group1 was restarted
>> >> when clone1 resources fail-count was cleared.
>> >
>> > you're right - this appears to be broken :(
>> 
>> fixed in:
>>http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e4b49e9f957b
>
> Thanks a lot!
> We are planning to offer this function soon,
> so could you push this change into Heartbeat 2.1.4(Stable 2.1)?
>
> Thanks,
> Junko
>

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC

2008-08-15 Thread Keisuke MORI
Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:
> On 2008-08-15T11:55:35, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
>> > I look forward to hearing from Keisuke-san whether this works for them
>> > now!
>> 
>> It does not seem to be fixed correctly.
>> 
>> It does not cause an assertion failure any more (nor a crash ;-),
>> but an invalid clone resource appears.
>
> Ah, well. Then we'll have to wait for Andrew to fix it completely.
> Otherwise, the code looks fine here.
>
> Are you using cloned groups in production, btw?

Yes.
More precisely, we once tried to use clones with 2.1.3 in production
but had to suspend using them because of some problems.
Now we want to upgrade to the coming 2.1.4 using clones.

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC

2008-08-17 Thread Keisuke MORI
Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:
> On 2008-08-15T17:52:42, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
>> More precisely, we once tried to use clones with 2.1.3 in production
>> but had to suspend using them because of some problems.
>> Now we want to upgrade to the coming 2.1.4 using clones.
>
> _Clones_ by themselves work fine, but cloned groups are the issue. You
> can work around this by not using them ;-)
>

We assume that we will use cloned groups as well, and therefore we've
been doing our tests with a configuration that uses cloned groups.
(And we didn't expect them to behave differently ;-)


-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC

2008-08-18 Thread Keisuke MORI
Andrew,

Thanks for fixing it!

In a quick try it seems to be working fine.
I (and a colleague of mine) will continue testing to make sure
that everything works.

Thanks,

"Andrew Beekhof" <[EMAIL PROTECTED]> writes:

> Fixed in:
>http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c
>
> 2008/8/15 Keisuke MORI <[EMAIL PROTECTED]>:
>>>
>>>> > But I've got PE crash now when I used with clone resources...
>>>> I think the following is the correct fix, but i need to do some more 
>>>> testing
>>>
>>> I've pushed that fix for the fatal assert to both the lha-2.1 tree and
>>> the openSUSE build service.
>>>
>>> I look forward to hearing from Keisuke-san whether this works for them
>>> now!
>>
>> It does not seem to be fixed correctly.
>>
>> It does not cause an assertion failure any more (nor a crash ;-),
>> but an invalid clone resource appears.
>>
>> Thanks,
>>
>> --
>> Keisuke MORI
>> NTT DATA Intellilink Corporation
>>
>>

-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC

2008-08-18 Thread Keisuke MORI
Keisuke MORI <[EMAIL PROTECTED]> writes:
> Andrew,
>
> Thanks for fixing it!
>
> In a quick try it seems to be working fine.
> I (and a colleague of mine) will continue testing to make sure
> that everything works.

Just to make sure...
Our tests regarding cloned groups have passed without any problem.
Thank you again for the fix!


I would also like to say thank you to _everybody_
who helped with the release in various ways.
Thank you very much!


>
> Thanks,
>
> "Andrew Beekhof" <[EMAIL PROTECTED]> writes:
>
>> Fixed in:
>>http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c
>>
>> 2008/8/15 Keisuke MORI <[EMAIL PROTECTED]>:
>>>>
>>>>> > But I've got PE crash now when I used with clone resources...
>>>>> I think the following is the correct fix, but i need to do some more 
>>>>> testing
>>>>
>>>> I've pushed that fix for the fatal assert to both the lha-2.1 tree and
>>>> the openSUSE build service.
>>>>
>>>> I look forward to hearing from Keisuke-san whether this works for them
>>>> now!
>>>
>>> It does not seem to be fixed correctly.
>>>
>>> It does not cause an assertion failure any more (nor a crash ;-),
>>> but an invalid clone resource appears.
>>>
>>> Thanks,
>>>
>>> --
>>> Keisuke MORI
>>> NTT DATA Intellilink Corporation
>>>
>>>
>
> -- 
> Keisuke MORI
> NTT DATA Intellilink Corporation
>

-- 
Keisuke MORI
Open Source Business Unit
Software Services Integration Business Division
NTT DATA Intellilink Corporation
Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] A STONITH plugin for checking whether the target node is kdumping or not.

2008-10-16 Thread Keisuke MORI
Hi Lars,

When we discussed this feature at the Cluster Summit, you mentioned
that there are some issues in stonithd regarding STONITH escalation.

Could you please summarise the issues again?
And if you have in mind any particular test cases that may not work
well, we will add them and try to fix the problems.

As far as we have tested so far, it seems to work as expected, though.

Regards,
Keisuke MORI

Satomi TANIGUCHI <[EMAIL PROTECTED]> writes:

> Hi lists,
>
> I'm posting a STONITH plugin which checks whether the target node is kdumping
> or not.
> There are some steps to use this, but I believe this plugin is helpful for
> failure analysis.
> See attached README for details about how to use this.
>
> There are 2 patches.
> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
> And the patch named "mkdumprd_for_kdumpcheck.patch" is
> for mkdumprd version 5.0.39.
>
> If you're interested, please give me your comments.
> Any comments and suggestions are really appreciated.
>
>
> Best Regards,
> Satomi TANIGUCHI


-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: [Bug 2034] New: IPv6addr aborts in start on x86_64

2009-01-13 Thread Keisuke MORI
Hi,

I've filed the following bug to the bugzilla.

Summary: IPv6addr aborts in start on x86_64
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034


If there's no objection I will commit the patch to the -dev tree.

Thanks,
-- 
Keisuke MORI
NTT DATA Intellilink Corporation

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] re:A patch of tomcat.

2009-02-23 Thread Keisuke MORI
Hi,

Will anybody review this patch?

I can commit it to the -dev repository if there are no comments.
The patch has been well tested and is used with tomcat 5.5 in our environment.

Thanks,

 writes:

> Hi All, 
>
> The patch which solved a new problem was completed.
> The change is the following point. 
>
> 1. Addition of the comment. 
>
> 2. Deletion of the garbage in the log. 
>
> 3. Optional addition. 
>  * catalina_opts - CATALINA_OPTS environment variable. Default is None 
>  * catalina_rotate_log - Control catalina.out logrotation flag. Default is 
> NO. 
>  * catalina_rotatetime - catalina.out logrotation time span(seconds). Default 
> is 86400. 
>
> 4. I summarized redundant pgrep processing in one function.
>
> 5. Revised it so that pgrep was handled in a version of new tomcat definitely.
>  * The new version of tomcat confirmed that there was not a problem with 
> 5.5.27 and version 6.0.28.
>
> 6. For unity, I revised it to use $WGET of ocf_shellfunc.
>
> I attached a patch. 
> Please reflect it in a development version. 
>
> Best Regards,
> Hideo Yamauchi.
>
> --- renayama19661...@ybb.ne.jp wrote:
>
>> Hi,
>> 
>> Sorry
>> 
>> There was a problem to the patch which I attached.
>> When used latest tomcat, RA seem not to be able to handle it well.
>> 
>> I will send the patch which I revised later.
>> 
>> Best Regards,
>> 
>> Hideo Yamauchi.
>

-- 
Sincerely,

Keisuke MORI
NTT DATA Intellilink Corporation
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] re:A patch of tomcat.

2009-02-25 Thread Keisuke MORI
Hi Dejan,

Thank you for reviewing it.

Committed the revised patch by Yamauchi-san (tomcat.patch-0225) as:
http://hg.linux-ha.org/dev/rev/6cbdca48bf88

Thanks,

Dejan Muhamedagic  writes:

> Hi,
>
> On Tue, Feb 24, 2009 at 12:20:22PM +0900, Keisuke MORI wrote:
>> Hi,
>> 
>> Will anybody review this patch?
>
> I was just reviewing it.
>
>> I can commit it to the -dev repository if there are no comments.
>> The patch has been well tested and is used with tomcat 5.5 in our environment.
>
> Great. I'm attaching a patch which contains just a few
> minor optimizations and some meta-data updates. Please apply it
> after checking it with your tomcat (no tomcats here :).
>
> Cheers,
>
> Dejan
>
>> Thanks,
>> 
>>  writes:
>> 
>> > Hi All, 
>> >
>> > The patch which solved a new problem was completed.
>> > The change is the following point. 
>> >
>> > 1. Addition of the comment. 
>> >
>> > 2. Deletion of the garbage in the log. 
>> >
>> > 3. Optional addition. 
>> >  * catalina_opts - CATALINA_OPTS environment variable. Default is None 
>> >  * catalina_rotate_log - Control catalina.out logrotation flag. Default is 
>> > NO. 
>> >  * catalina_rotatetime - catalina.out logrotation time span(seconds). 
>> > Default is 86400. 
>> >
>> > 4. I summarized redundant pgrep processing in one function.
>> >
>> > 5. Revised it so that pgrep was handled in a version of new tomcat 
>> > definitely.
>> >  * The new version of tomcat confirmed that there was not a problem with 
>> > 5.5.27 and version 6.0.28.
>> >
>> > 6. For unity, I revised it to use $WGET of ocf_shellfunc.
>> >
>> > I attached a patch. 
>> > Please reflect it in a development version. 
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> > --- renayama19661...@ybb.ne.jp wrote:
>> >
>> >> Hi,
>> >> 
>> >> Sorry
>> >> 
>> >> There was a problem to the patch which I attached.
>> >> When used latest tomcat, RA seem not to be able to handle it well.
>> >> 
>> >> I will send the patch which I revised later.
>> >> 
>> >> Best Regards,
>> >> 
>> >> Hideo Yamauchi.

-- 
Keisuke MORI
Open Source Business Unit
Software Services Integration Business Division
NTT DATA Intellilink Corporation
Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Checksum not computed in ICMPv6 neighbor advertisement

2009-06-07 Thread Keisuke MORI
Hi,

2009/6/5 Dejan Muhamedagic :
> Hi Andre,
>
> On Fri, Jun 05, 2009 at 09:34:37AM +, Andre, Pascal wrote:
>> Hi,
>>
>> On an Active/Standby platform (using Linux-HA 2.1.4 RHEL5, in
>> my case), when a fail-over/switch-over is initiated and standby
>> machine takes over the virtual IP (IPv6), IPv6addr broadcasts
>> an ICMPv6 neighbor advertisement message.
>>
>> Unfortunately, this ICMPv6 message has its checksum field set
>> to 0 (i.e. not computed). The message is thus discarded by
>> recipients.
>>
>> Maybe this computation should be done by libnet itself.
>> Unfortunately, without much time to investigate libnet, I've
>> added code in resources/OCF/IPv6addr.c in order to compute the
>> checksum and provide the result to libnet (as a parameter).
>
> Applied. Many thanks for the patch.

That problem was already fixed in:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034
so the patch should not be necessary.

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] An OCF RA for syslog-ng

2009-06-11 Thread Keisuke MORI
Hi Dejan,

Do you have any chance to take a look at the syslog-ng OCF RA which
was posted by Takenaka-san before?

http://www.gossamer-threads.com/lists/linuxha/dev/54425

If you are OK, I will commit this to the -dev repository.

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] OCF Script for Jboss

2009-06-11 Thread Keisuke MORI
Hi,

I'm posting an OCF RA for JBoss, which was originally posted by Stefan
to the users list, and includes some modifications as suggested by
Takenaka-san:

http://www.gossamer-threads.com/lists/linuxha/users/53969

Stefan,
Do you have any comments on these modifications?

Dejan,
Would you please review this RA when you have a chance?

If you are all OK, I will commit the RA to the -dev repository.

Thanks,

-- 
Keisuke MORI


jboss
Description: Binary data
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] An OCF RA for syslog-ng

2009-06-11 Thread Keisuke MORI
Hi Dejan,

Thank you for your comments.
I will repost the RA after revising it according to your comments.

Thanks,

2009/6/11 Dejan Muhamedagic :
> Hi Keisuke-san,
>
> On Thu, Jun 11, 2009 at 06:16:26PM +0900, Keisuke MORI wrote:
>> Hi Dejan,
>>
>> Do you have any chance to take a look at the syslog-ng OCF RA which
>> was posted by Takenaka-san before?
>>
>> http://www.gossamer-threads.com/lists/linuxha/dev/54425
>
> Attaching the script with comments. Please use diff.
>
> Cheers,
>
> Dejan
>
>> If you are OK, I will commit this to the -dev repository.
>>
>> Thanks,
>> --
>> Keisuke MORI
>
>
>



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Linux-HA site down?

2009-08-13 Thread Keisuke MORI
Hi,

http://www.linux-ha.org/ seems to be down today.

Maintenance? Or some kind of trouble?

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Linux-HA site down?

2009-08-16 Thread Keisuke MORI
Hi Dejan,

Thank you for your reply.
The web site has apparently recovered now.

Thanks!

2009/8/14 Dejan Muhamedagic :
> Hi,
>
> On Fri, Aug 14, 2009 at 03:11:44PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> http://www.linux-ha.org/ seems to be down today.
>>
>> Maintenance? Or some kind of trouble?
>
> No idea. Just sent message to tummy.com people. I'm sure they'll
> have an answer, but it's still early in their time zone.
>
> Thanks,
>
> Dejan
>
>> Thanks,
>> --
>> Keisuke MORI
>



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack

2010-03-16 Thread Keisuke MORI
Hi,

Sorry for the somewhat long mail.
I'm going to describe the issue in the Subject: line and would like to
suggest some changes to the agents package (and possibly to Pacemaker, too).
I would be glad if you could give me your thoughts and comments.



A pseudo RA which creates a stat file under HA_RSCTMP
(/var/run/heartbeat/rsctmp), such as Dummy, MailTo, etc., does not
work properly on the Pacemaker+Corosync stack.

When a node crashes and is rebooted, a stale stat file is left over
across the reboot, and hence the RA misbehaves as if the resource
were already started when the cluster is launched again for the
recovery.

This problem does not occur on the Heartbeat stack because Heartbeat
removes HA_RSCTMP at its startup, while on the Pacemaker stack neither
Pacemaker nor Corosync removes it.

But having Pacemaker remove the files does not seem to be correct -
if they were removed at cluster startup time then the maintenance
mode would no longer work properly.

In my understanding, the "correct" behavior is:
 - They should NOT be removed at cluster startup time.
 - They should be removed at OS bootup time.



My suggestion to address this issue is to fix it as follows:

 - 1) change the HA_RSCTMP location to /var/run/resource-agents,
  or some other subdirectory directly under /var/run.
 - 2) set the directory permissions to 01777 (with the sticky bit).
 - 3) change the IPaddr/SendArp RAs not to use their own subdirectory
  but instead add a prefix to the filename.
 - 4) make /var/run/heartbeat/rsctmp obsolete;
  Heartbeat/Pacemaker could preserve the current behavior
  for a while for compatibility.


The basic idea of the changes is that we now follow the file removal
procedure defined by the FHS (Filesystem Hierarchy Standard).

http://www.pathname.com/fhs/pub/fhs-2.3.html#VARRUNRUNTIMEVARIABLEDATA

The FHS defines that any files under a subdirectory of /var/run
should be removed at OS bootup time.

Unfortunately, second-level subdirectories are out of its scope and
you cannot rely on their removal (and that is the case for
/var/run/heartbeat/rsctmp).


I believe that the impact on existing RAs is minimal.
If your RA is implemented "correctly" then you need to do nothing -
just note that the location of the stat file changes.

If your RA has /var/run/heartbeat/rsctmp hardcoded, or it creates
its own subdirectory, you are encouraged to fix it because it may
not work well with maintenance mode, but you can continue to use
the old rsctmp directory if you would like.


I would like to hear your thoughts and comments.

Regards,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack

2010-03-24 Thread Keisuke MORI
Hi,

2010/3/24 Andrew Beekhof :
>> We'd need to coordinate this with all projects (corosync,
>> pacemaker, heartbeat, glue, agents). That would probably be the
>> most difficult part.
>
> Currently the ais plugin has:
>
>    mkdir(HA_STATE_DIR"/heartbeat", 0755); /* Used by RAs - Leave
> owned by root */
>    mkdir(HA_STATE_DIR"/heartbeat/rsctmp", 0755); /* Used by RAs -
> Leave owned by root */
>
> When you make the change, please also put it in a #define that
> pacemaker can look for during configure.
> That way I can default to the above if I can't find it.
>
> If you do that then upgrading should be pretty trivial.

OK, I will look into it when making changes.

I filed a bugzilla item for this issue:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [patch] lrmd: fix unnecessary close

2010-03-29 Thread Keisuke MORI
Hi Dejan,

Will you consider the attached patch for lrmd?

It isn't a serious problem, but when you try to test with valgrind,
it complains loudly whenever lrmd calls an RA, like this:
8<----8<----8<----8<----8<----8<----8<----
(6) ==32486== Warning: invalid file descriptor 1014 in syscall close()
(6) ==32496== Warning: invalid file descriptor 1014 in syscall close()
(6) ==32509== Warning: invalid file descriptor 1014 in syscall close()
(6) ==32517== Warning: invalid file descriptor 1014 in syscall close()
(6) ==32524== Warning: invalid file descriptor 1014 in syscall close()
(6) ==32531== Warning: invalid file descriptor 1014 in syscall close()
8<----8<----8<----8<----8<----8<----8<----


Thanks,

-- 
Keisuke MORI


glue103-close-1.patch
Description: Binary data
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Memory leaks in lrmd/cl_msg

2010-03-30 Thread Keisuke MORI
Hi,

lrmd in glue-1.0.3 has a memory leak.
To be exact, the leak is in the cl_msg library.

Please find the details in the bugzilla item:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2389

Note that the leak must have existed since the old heartbeat-2.1.4
because the code around here has not been changed in quite a while.

Thanks,

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Linux-HA] Heads up: upcoming Linux-HA releases

2010-03-30 Thread Keisuke MORI
Hi Dejan,

Will you please consider the patch to the Filesystem RA discussed in
the following mail?
http://www.gossamer-threads.com/lists/linuxha/dev/59541#59541

The fix records as many error logs as possible, which would be
beneficial for diagnosing when 'a really bad thing' has happened.

Thanks,

2010/3/26 Dejan Muhamedagic :
> Hello everyone,
>
> this is to announce that a maintenance release is upcoming for the three
> projects hosted at linux-ha.org (Heartbeat 3.0.3, cluster-glue 1.0.4,
> and resource-agents 1.0.3).
>
> Our tentative release schedule is to publish an RC on Wed, Mar 31 with
> the release following a week later (Wed, Apr 7). As always, the actual
> release dates may vary by a day or two.
>
> There's been some bugfixes particularly in cluster-glue and to
> some extent in resource-agents, plus a couple of init script and
> directory permission fixes in Heartbeat.
>
> We also have a new RA lined up (exportfs from Ben Timby), pretty
> major changes to another (mysql, which Marian Marinov is working
> on), and a new stonith plugin (ippower9258 by Helmut Weymann). We
> are currently undecided whether to include these in the upcoming release
> or defer them for the later release. Anything that the committers
> don't consider ready by the RC, won't make the release. Ben, Marian, and
> everyone reviewing their patches, if you kick in high gear now you're
> upping the chances of getting these new features upstreamed for 1.0.4.
>
> Cheers,
> Florian, Dejan, Lars
>



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] Known problem with IPaddr(2)

2010-04-26 Thread Keisuke MORI
Hi,

Regarding the discussion in the pacemaker ML below,
I would like to suggest the attached patch.

The patch includes:
1) Fix IPaddr to return the correct OCF value (it returned 255 when
delete_interface failed).
2) Add a description of the assumption to the IPaddr / IPaddr2 meta-data.

Regards,

Keisuke MORI

2010/4/14 Lars Ellenberg :
> On Tue, Apr 13, 2010 at 08:28:09PM +0200, Lars Ellenberg wrote:
>> On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote:
>> > Hi,
>> >
>> > On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote:
>> > > Markus M. wrote:
>> > > >is there a known problem with IPaddr(2) when defining many (in my
>> > > >case: 11) ip resources which are started/stopped concurrently?
>> >
>> > Don't remember any problems.
>> >
>> > > Well... some further investigation revealed that it seems to be a
>> > > problem with the way how the ip addresses are assigned.
>> > >
>> > > When looking at the output of "ip addr", the first ip address added
>> > > to the interface gets the scope "global", all further aliases gets
>> > > the scope "global secondary".
>> > >
>> > > If afterwards the first ip address is removed before the secondaries
>> > > (due to concurrently run of the scripts), ALL secondaries are
>> > > removed at the same time by the "ip" command, leading to an error
>> > > for all subsequent trials to remove the other ip addresses because
>> > > they are already gone.
>> > >
>> > > I am not sure how "ip" decides for the "secondary" scope, maybe
>> > > beacuse the other ip addresses are in the same subnet as the first
>> > > one.
>> >
>> > That sounds bad. Instances should be independent of each other.
>> > Can you please open a bugzilla and attach a hb_report.
>>
>> Oh, that is perfectly expected the way he describes it.
>> The assumption has always been that there is at least one
>> "normal", not managed by crm, address on the interface,
>> so no one will have noticed before.
>>
>> I suggest the following patch,
>> basically doing one retry.
>>
>> For the described scenario,
>> the second try will find the IP already "non existant",
>> and exit $OCF_SUCCESS.
>
> Though that obviously won't make instances independent.
>
> The typical way to achieve that is to have them all as "secondary" IPs.
> Which implies that for successful use of independent IPaddr2 resources
> on the same device, you need at least one "system" IP (as opposed to
> "managed by cluster") on that device.
>
> The first IP assigned will get "primary" status.
> Usually, if you delete a "primary" IP, the kernel will also
> delete all secondary IP addresses.
>
> If using a "system" IP is not an option, here is the alternative:
> "Recent" kernels (a quick check revealed that this setting is around
> since at least 2.6.12) can do "alias promotion", which can be enabled
> using
>        sysctl -w net.ipv4.conf.all.promote_secondaries=1
> (or per device)
>
> In both cases the previous "retry on ip_stop" patch is unnecessary.
> But won't do any harm, either. Most likely ;-)
>
> Glad that helped ;-)
>
> Somebody please add that to the man page respectively agent meta data...
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>



-- 
Keisuke MORI


agents-ipaddr-retval.patch
Description: Binary data
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-22 Thread Keisuke MORI
The attached patch removes the libnet dependency from the IPv6addr RA
by reimplementing the same functionality with the standard socket API.

Currently there are the following problems with the resource-agents package:

 - The IPv6addr RA requires an extra libnet package in the run-time
  environment. That is pretty inconvenient, particularly for RHEL users,
  because it's not included in the standard distribution.

 - The pre-built RPMs from ClusterLabs do not include the IPv6addr RA.
  This was once reported on the pacemaker list:
  http://www.gossamer-threads.com/lists/linuxha/pacemaker/64295#64295

The patch will resolve those issues.
I believe that none of the Pacemaker/Heartbeat related packages would
depend on the libnet library any more once it is applied.

Regards,

-- 
Keisuke MORI
# HG changeset patch
# User Keisuke MORI 
# Date 1279802861 -32400
# Branch ipv6
# Node ID 40d5dbdca9cc089b6514c7525cd2dbd678299711
# Parent  b3142fd9cc672f2217e632608bc986b46265b193
IPv6addr: remove libnet dependency

diff -r b3142fd9cc67 -r 40d5dbdca9cc configure.in
--- a/configure.in	Fri Jul 16 09:46:38 2010 +0200
+++ b/configure.in	Thu Jul 22 21:47:41 2010 +0900
@@ -607,6 +607,7 @@
   [new_libnet=yes; AC_DEFINE(HAVE_LIBNET_1_1_API, 1, Libnet 1.1 API)],
   [new_libnet=no; AC_DEFINE(HAVE_LIBNET_1_0_API, 1, Libnet 1.0 API)],$LIBNETLIBS)
AC_SUBST(LIBNETLIBS)
+   AC_DEFINE(HAVE_LIBNET_API, 1, Libnet API)
 fi
 
 if test "$new_libnet" = yes; then
@@ -634,7 +635,7 @@
 dnl 
 dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent
 AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include ])
-AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes -a "$new_libnet" = yes )
+AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes )
 
 dnl 
 dnl Compiler flags
diff -r b3142fd9cc67 -r 40d5dbdca9cc heartbeat/IPv6addr.c
--- a/heartbeat/IPv6addr.c	Fri Jul 16 09:46:38 2010 +0200
+++ b/heartbeat/IPv6addr.c	Thu Jul 22 21:47:41 2010 +0900
@@ -87,13 +87,25 @@
 
 #include 
 
+#include 
 #include 
+#include 
 #include 
+#include 
 #include 
+#include  /* for inet_pton */
+#include  /* for if_nametoindex */
+#include 
+#include 
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
+#ifdef HAVE_LIBNET_API
 #include 
+#endif
 
 
 #define PIDFILE_BASE HA_RSCTMPDIR  "/IPv6addr-"
@@ -400,8 +412,11 @@
 	return OCF_NOT_RUNNING;
 }
 
+#ifdef HAVE_LIBNET_API
 /* Send an unsolicited advertisement packet
  * Please refer to rfc2461
+ *
+ * Libnet based implementation.
  */
 int
 send_ua(struct in6_addr* src_ip, char* if_name)
@@ -466,6 +481,108 @@
 	libnet_destroy(l);
 	return status;
 }
+#else /* HAVE_LIBNET_API */
+/* Send an unsolicited advertisement packet
+ * Please refer to rfc4861 / rfc3542
+ *
+ * Libnet independent implementation.
+ */
+int
+send_ua(struct in6_addr* src_ip, char* if_name)
+{
+	int status = -1;
+	int fd;
+
+	int ifindex;
+	int hop;
+	struct ifreq ifr;
+#define HWADDR_LEN 6 /* mac address length */
+	u_int8_t payload[sizeof(struct nd_neighbor_advert)
+			 + sizeof(struct nd_opt_hdr) + HWADDR_LEN];
+	struct nd_neighbor_advert *na;
+	struct nd_opt_hdr *opt;
+	struct sockaddr_in6 src_sin6;
+	struct sockaddr_in6 dst_sin6;
+
+	if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == 0) {
+		cl_log(LOG_ERR, "socket(IPPROTO_ICMPV6) failed: %s",
+		   strerror(errno));
+		goto err;
+	}
+	/* set the outgoing interface */
+	ifindex = if_nametoindex(if_name);
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
+		   &ifindex, sizeof(ifindex)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_IF) failed: %s",
+		   strerror(errno));
+		goto err;
+	}
+	/* set the hop limit */
+	hop = 255; /* 255 is required. see rfc4861 7.1.2 */
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
+		   &hop, sizeof(hop)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_HOPS) failed: %s",
+		   strerror(errno));
+		goto err;
+	}
+	
+	/* set the source address */
+	memset(&src_sin6, 0, sizeof(src_sin6));
+	src_sin6.sin6_family = AF_INET6;
+	src_sin6.sin6_addr = *src_ip;
+	src_sin6.sin6_port = 0;
+	if (bind(fd, (struct sockaddr *)&src_sin6, sizeof(src_sin6)) < 0) {
+		cl_log(LOG_ERR, "bind() failed: %s", strerror(errno));
+		goto err;
+	}
+
+
+	/* get the hardware address */
+	memset(&ifr, 0, sizeof(ifr));
+	strncpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name) - 1);
+	if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) {
+		cl_log(LOG_ERR, "ioctl(SIOCGIFHWADDR) failed: %s", strerror(errno));
+		goto err;
+	}
+
+	/* build a neighbor advertisement message */
+	memset(&payload, 0, sizeof(payload));
+
+	na = (struct nd_neighbor_advert *)&payload;
+	na->nd_na_type = ND_NEIGHBOR_ADVERT;
+	na->nd_na_code = 0;
+	na-&

Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-23 Thread Keisuke MORI
Hi,

2010/7/23 Simon Horman :
> I will add that libnet seems to be more or less unmaintained.
>
> You seem to make using libnet optional, is there a reason
> not to just remove it? portability?

I just thought that some people might want to preserve the existing
behavior. OpenSUSE ships libnet, for example, and I'm not sure whether
they would agree to change the implementation or would rather keep
using libnet.

But OK, if no one has objections I'll revise the patch so that it
removes all the libnet code from IPv6addr.c and leaves a single code path.
Any other opinions?

As for portability, I believe that the new implementation is
more portable than using libnet. (cf.
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034#c10)


>> +#define HWADDR_LEN 6 /* mac address length */
>
> Personally I'd prefer the define outside of the function.

OK, I just wanted to place them close together, but I have no strong
preference. I'll move it to somewhere near the other macro definitions.

>> +     na->nd_na_target = (*src_ip);
>
> There is no need to enclose *src_ip in brackets.

Right. removing the parens.

>> +     if (sendto(fd, &payload, sizeof(payload), 0,
>> +                (struct sockaddr *)&dst_sin6, sizeof(dst_sin6))
>> +         != sizeof(payload)) {

> Is it valid to assume that there will never be a partial write?

I think that reporting an error is enough when a partial write occurs
here. The packet is very small (32 bytes) so it should rarely happen;
the send is retried 5 times when it does occur, and if it still fails
then it should be considered that "a really bad thing" happened :-)
Also, the current libnet code does exactly the same internally, so no
behavior would change with this code.

Thanks,

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-26 Thread Keisuke MORI
Hi,

2010/7/23 Lars Ellenberg :
> On Fri, Jul 23, 2010 at 03:04:20PM +0200, Andrew Beekhof wrote:
>> On Fri, Jul 23, 2010 at 5:09 AM, Simon Horman  wrote:
>> > Hi Mori-san,
>> >
>> > I will add that libnet seems to be more or less unmaintained.
>>
>> Someone recently picked it up again, but I'm in favor of the patch for
>> the reasons Mori-san already stated.
>>
>> > You seem to make using libnet optional, is there a reason
>> > not to just remove it? portability?
>>
>> Agreed, lets just drop it.
>
> Ack.

Thanks to Simon, Andrew and Lars for all of your constructive comments.

I've revised the patch so that it drops the old libnet code completely.
Please apply it to the repository.


By the way, do we have any plan to release the next
agents/glue/heartbeat packages from the Linux-HA project?
I think it's a good time to consider it, to make the best use of pacemaker-1.0.9.


> BTW, is it correct that most of it could be done by "ip", similar as
> IPaddr2 does it?  The only thing missing would be a send_arp v6.
> Anyone want to write an IPv6addr2? ;-)

"find_if" for IPv6 is also missing if you want to write a script based one.

Thanks,
-- 
Keisuke MORI
# HG changeset patch
# User Keisuke MORI 
# Date 1280134509 -32400
# Branch ipv6
# Node ID 275089e31232b870e4218f7dd930538daa438cbf
# Parent  b3142fd9cc672f2217e632608bc986b46265b193
IPv6addr: remove libnet dependency

diff -r b3142fd9cc67 -r 275089e31232 configure.in
--- a/configure.in	Fri Jul 16 09:46:38 2010 +0200
+++ b/configure.in	Mon Jul 26 17:55:09 2010 +0900
@@ -634,7 +634,7 @@
 dnl 
 dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent
 AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include ])
-AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes -a "$new_libnet" = yes )
+AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes )
 
 dnl 
 dnl Compiler flags
diff -r b3142fd9cc67 -r 275089e31232 heartbeat/IPv6addr.c
--- a/heartbeat/IPv6addr.c	Fri Jul 16 09:46:38 2010 +0200
+++ b/heartbeat/IPv6addr.c	Mon Jul 26 17:55:09 2010 +0900
@@ -87,13 +87,22 @@
 
 #include 
 
+#include 
 #include 
+#include 
 #include 
+#include 
 #include 
+#include  /* for inet_pton */
+#include  /* for if_nametoindex */
+#include 
+#include 
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
-#include 
 
 
 #define PIDFILE_BASE HA_RSCTMPDIR  "/IPv6addr-"
@@ -141,6 +150,8 @@
 const int	UA_REPEAT_COUNT	= 5;
 const int	QUERY_COUNT	= 5;
 
+#define 	HWADDR_LEN 	6 /* mac address length */
+
 struct in6_ifreq {
 	struct in6_addr ifr6_addr;
 	uint32_t ifr6_prefixlen;
@@ -401,69 +412,100 @@
 }
 
 /* Send an unsolicited advertisement packet
- * Please refer to rfc2461
+ * Please refer to rfc4861 / rfc3542
  */
 int
 send_ua(struct in6_addr* src_ip, char* if_name)
 {
 	int status = -1;
-	libnet_t *l;
-	char errbuf[LIBNET_ERRBUF_SIZE];
+	int fd;
 
-	struct libnet_in6_addr dst_ip;
-	struct libnet_ether_addr *mac_address;
-	char payload[24];
 	int ifindex;
+	int hop;
+	struct ifreq ifr;
+	u_int8_t payload[sizeof(struct nd_neighbor_advert)
+			 + sizeof(struct nd_opt_hdr) + HWADDR_LEN];
+	struct nd_neighbor_advert *na;
+	struct nd_opt_hdr *opt;
+	struct sockaddr_in6 src_sin6;
+	struct sockaddr_in6 dst_sin6;
 
-
-	if ((l=libnet_init(LIBNET_RAW6, if_name, errbuf)) == NULL) {
-		cl_log(LOG_ERR, "libnet_init failure on %s", if_name);
+	if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == 0) {
+		cl_log(LOG_ERR, "socket(IPPROTO_ICMPV6) failed: %s",
+		   strerror(errno));
 		goto err;
 	}
 	/* set the outgoing interface */
 	ifindex = if_nametoindex(if_name);
-	if (setsockopt(libnet_getfd(l), IPPROTO_IPV6, IPV6_MULTICAST_IF,
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
 		   &ifindex, sizeof(ifindex)) < 0) {
-		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_IF): %s",
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_IF) failed: %s",
 		   strerror(errno));
 		goto err;
 	}
-
-	mac_address = libnet_get_hwaddr(l);
-	if (!mac_address) {
-		cl_log(LOG_ERR, "libnet_get_hwaddr: %s", errbuf);
+	/* set the hop limit */
+	hop = 255; /* 255 is required. see rfc4861 7.1.2 */
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
+		   &hop, sizeof(hop)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_HOPS) failed: %s",
+		   strerror(errno));
+		goto err;
+	}
+	
+	/* set the source address */
+	memset(&src_sin6, 0, sizeof(src_sin6));
+	src_sin6.sin6_family = AF_INET6;
+	src_sin6.sin6_addr = *src_ip;
+	src_sin6.sin6_port = 0;
+	if (bind(fd, (struct sockaddr *)&src_sin6, sizeof(src_sin6)) < 0) {
+		cl_log(LOG_ERR, &q

Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-26 Thread Keisuke MORI
2010/7/26 Lars Ellenberg :
> On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote:
>> By the way, do we have any plan to release the next
>> agents/glue/heartbeat packages from the Linux-HA project?
>> I think it's good time to consider them for the best use of pacemaker-1.0.9.
>
> I think glue was released by dejan just before he went on vacation,
> though the release announcement is missing (1.0.6).
>
> Heartbeat does not have many changes (apart from some cleanup in the
> build dependencies), so there is no urge to release a 3.0.4, but we
> could do so any time.
>
> Agents has a few fixes, but also has some big changes.
> I have to take an other close look, but yes, I think we should release
> an agents 1.0.4 within the next few weeks.

Great! Then let's go for the next release of agents/heartbeat along with glue.

My biggest concern about agents is LF#2378:
http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378
It is a change, but a necessary one to make maintenance mode work
correctly.

For heartbeat, I personally like "pacemaker on" in ha.cf :)


>> "find_if" for IPv6 is also missing if you want to write a script based one.
>
> I'm sure that can be scripted itself around
> ip -o -f inet6 a s | grep ...
>
> but we already sort of agreed that this would
> not be development time well spent.

find_if does more than just grepping. It has to match against
"the network address" calculated from the given address and prefix,
to find out which interface is the appropriate one to assign the
virtual address to. The current IPaddr2 also relies on find_if to do
this.
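
For illustration, the check find_if has to perform could look like the
sketch below: the given address and a candidate interface address are
compared under the requested prefix length, and only an interface with
an address in the same network qualifies. This is a standalone example
written for this explanation, not code taken from findif.c.

#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>

static int
same_ipv6_prefix(const struct in6_addr *a, const struct in6_addr *b,
		 unsigned int prefix_len)
{
	unsigned int full_bytes = prefix_len / 8;
	unsigned int rest_bits  = prefix_len % 8;

	if (prefix_len > 128) {
		return 0;	/* invalid prefix length */
	}
	if (memcmp(a->s6_addr, b->s6_addr, full_bytes) != 0) {
		return 0;
	}
	if (rest_bits != 0) {
		u_int8_t mask = (u_int8_t)(0xffU << (8 - rest_bits));
		if ((a->s6_addr[full_bytes] & mask)
		    != (b->s6_addr[full_bytes] & mask)) {
			return 0;
		}
	}
	return 1;	/* same network: this interface is a candidate */
}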

But anyway, I would also agree that we are not going to develop such
a thing. Just an off-topic note.


Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-27 Thread Keisuke MORI
2010/7/27 Keisuke MORI :
> 2010/7/26 Lars Ellenberg :
>> On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote:
>> Heartbeat does not have many changes (apart from some cleanup in the
>> build dependencies), so there is no urge to release a 3.0.4, but we
>> could do so any time.
(...)
> For heartbeat, I personally like "pacemaker on" in ha.cf :)


I should have mentioned this too: the version number in the log file
from heartbeat 3.0.3 seems incorrect. I want to fix it soon to avoid
confusion.


Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration
validated. Starting heartbeat 3.0.2


Thanks,

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-07-30 Thread Keisuke MORI
2010/7/27 Andrew Beekhof :
> On Tue, Jul 27, 2010 at 8:44 AM, Keisuke MORI  
> wrote:
>> For heartbeat, I personally like "pacemaker on" in ha.cf :)
>
> One thing thats coming in 1.1.3 is an mcp (master control process) and
> associated init script for pacemaker.
> This means that Pacemaker is started/stopped independently of the
> messaging layer.
>
> Currently this is only written for corosync[1], but I've been toying
> with the idea of extending it to Heartbeat.
> In which case, if you're already changing the option, you might want
> to make it: legacy on/off
> Where "off" would be the equivalent of starting with -M (no resource
> management) but wouldn't spawn any daemons.
>
> Thoughts?

I have several concerns with that change:

1) Is it possible to recover, or to trigger a fail-over correctly,
when any of the Pacemaker/Heartbeat processes fails?
   (In particular, a failure of the new Pacemaker mcp process, or of
the current Heartbeat MCP process.)

2) Would daemons used with the respawn directive, such as hbagent (the
SNMP daemon) or pingd, keep working compatibly?

3) In the end, what would be the benefit of the change for end users?
   I feel it only adds complexity to operations and to diagnostics
for end users.

I guess that I would only use "legacy on" on the heartbeat stack...

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-08-10 Thread Keisuke MORI
2010/8/11 Simon Horman :
>> > http://hg.linux-ha.org/agents/rev/612e2966f372
>>
>> I've had to commit a small revision, because on IA64, the memory on the
>> stack is not aligned properly for the cast to struct nd_neighbor_advert
>> * - http://hg.linux-ha.org/agents/rev/d206bc8f1303
>>
>> I apologize for the uglyness; it was the only way I could make gcc
>> shutup and get the alignment right. If someone can make the alignment
>> properly on the stack, I'm all ears ...
>
> You are right, that is a bit ugly.
> But I have no better ideas at this time :-(

How about this patch, or something along this line?
It assumes GCC, but ICC should have a similar feature if you want to support it.

Alternatively, a union of a u_int8_t array and the struct should give
the correct alignment, I think.
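
A minimal sketch of that union idea, for illustration: the union makes
the buffer at least as strictly aligned as struct nd_neighbor_advert,
so the cast-from-byte-array problem on IA64 goes away. HWADDR_LEN and
the nd_* types are the ones already used in heartbeat/IPv6addr.c; the
helper below is only an example, not the committed code.

#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

#define HWADDR_LEN 6	/* mac address length, as in IPv6addr.c */

union na_payload {
	u_int8_t raw[sizeof(struct nd_neighbor_advert)
		     + sizeof(struct nd_opt_hdr) + HWADDR_LEN];
	struct nd_neighbor_advert na;	/* correctly aligned view of the header */
};

static void
fill_na(union na_payload *p, const struct in6_addr *src_ip)
{
	memset(p, 0, sizeof(*p));
	p->na.nd_na_type = ND_NEIGHBOR_ADVERT;
	p->na.nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE;
	p->na.nd_na_target = *src_ip;
	/* the option header and hardware address still follow at
	 * p->raw + sizeof(struct nd_neighbor_advert), as before;
	 * sendto() would then be given p->raw and sizeof(p->raw) */
}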

-- 
Keisuke MORI
# HG changeset patch
# User Keisuke MORI 
# Date 1281491442 -32400
# Node ID b12ca86af66197498cbf537ccc7ad4ff56cdf63b
# Parent  d206bc8f13039b332e76a93a86e8e550b67781da
[mq]: ipv6addr-alignment.patch

diff -r d206bc8f1303 -r b12ca86af661 heartbeat/IPv6addr.c
--- a/heartbeat/IPv6addr.c	Mon Aug 09 21:51:19 2010 +0200
+++ b/heartbeat/IPv6addr.c	Wed Aug 11 10:50:42 2010 +0900
@@ -89,7 +89,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -424,10 +423,17 @@
 	int ifindex;
 	int hop;
 	struct ifreq ifr;
-	u_int8_t *payload;
-	intpayload_size;
-	struct nd_neighbor_advert *na;
-	struct nd_opt_hdr *opt;
+
+	/* GCC is assumed.
+	 * If you want to port to other than GCC, make sure that
+	 * the packet is packed correctly.
+	 */ 
+	struct neighbor_advert {
+		struct nd_neighbor_advert na;
+		struct nd_opt_hdr opt;
+		u_int8_t hwaddr[HWADDR_LEN];
+	} __attribute__ ((packed)) payload;
+
 	struct sockaddr_in6 src_sin6;
 	struct sockaddr_in6 dst_sin6;
 
@@ -473,39 +479,27 @@
 	}
 
 	/* build a neighbor advertisement message */
-	payload_size = sizeof(struct nd_neighbor_advert)
-			 + sizeof(struct nd_opt_hdr) + HWADDR_LEN;
-	payload = memalign(sysconf(_SC_PAGESIZE), payload_size);
-	if (!payload) {
-		cl_log(LOG_ERR, "malloc for payload failed");
-		goto err;
-	}
-	memset(payload, 0, payload_size);
+	memset((void *)&payload, 0, sizeof(payload));
 
-	/* Ugly typecast from ia64 hell! */
-	na = (struct nd_neighbor_advert *)((void *)payload);
-	na->nd_na_type = ND_NEIGHBOR_ADVERT;
-	na->nd_na_code = 0;
-	na->nd_na_cksum = 0; /* calculated by kernel */
-	na->nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE;
-	na->nd_na_target = *src_ip;
+	payload.na.nd_na_type = ND_NEIGHBOR_ADVERT;
+	payload.na.nd_na_code = 0;
+	payload.na.nd_na_cksum = 0; /* calculated by kernel */
+	payload.na.nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE;
+	payload.na.nd_na_target = *src_ip;
 
 	/* options field; set the target link-layer address */
-	opt = (struct nd_opt_hdr *)(payload + sizeof(struct nd_neighbor_advert));
-	opt->nd_opt_type = ND_OPT_TARGET_LINKADDR;
-	opt->nd_opt_len = 1; /* The length of the option in units of 8 octets */
-	memcpy(payload + sizeof(struct nd_neighbor_advert)
-			+ sizeof(struct nd_opt_hdr),
-	   &ifr.ifr_hwaddr.sa_data, HWADDR_LEN);
+	payload.opt.nd_opt_type = ND_OPT_TARGET_LINKADDR;
+	payload.opt.nd_opt_len = 1; /* The length of the option in units of 8 octets */
+	memcpy(payload.hwaddr, &ifr.ifr_hwaddr.sa_data, HWADDR_LEN);
 
 	/* sending an unsolicited neighbor advertisement to all */
 	memset(&dst_sin6, 0, sizeof(dst_sin6));
 	dst_sin6.sin6_family = AF_INET6;
 	inet_pton(AF_INET6, BCAST_ADDR, &dst_sin6.sin6_addr); /* should not fail */
 
-	if (sendto(fd, payload, payload_size, 0,
+	if (sendto(fd, (void *)&payload, sizeof(payload), 0,
 		   (struct sockaddr *)&dst_sin6, sizeof(dst_sin6))
-	!= payload_size) {
+	!= sizeof(payload)) {
 		cl_log(LOG_ERR, "sendto(%s) failed: %s",
 		   if_name, strerror(errno));
 		goto err;
@@ -515,7 +509,6 @@
 
 err:
 	close(fd);
-	free(payload);
 	return status;
 }
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Next release from Linux-HA? (was: [PATCH] IPv6addr: removing libnet dependency)

2010-10-15 Thread Keisuke MORI
Hi Lars,

We talked about the next release of the heartbeat/resource-agents
packages a while ago.
As Pacemaker-1.0.10 is about to be released, I think it is a good time
to release those packages too, for the best use of Pacemaker.

I think that at least heartbeat-3.0.4 / resource-agents-1.0.4 should
be released, because it has already been 6 months since the last
release.

What do you think, and when can we release the packages?

Regards,
Keisuke MORI

2010/7/27 Lars Ellenberg :
> On Tue, Jul 27, 2010 at 04:12:34PM +0900, Keisuke MORI wrote:
>> 2010/7/27 Keisuke MORI :
>> > 2010/7/26 Lars Ellenberg :
>> >> On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote:
>> >> Heartbeat does not have many changes (apart from some cleanup in the
>> >> build dependencies), so there is no urge to release a 3.0.4, but we
>> >> could do so any time.
>> (...)
>> > For heartbeat, I personally like "pacemaker on" in ha.cf :)
>>
>>
>> I should have mentioned this too, the version number in the log file
>> from heartbeat 3.0.3 seems incorrect. I want to fix this soon to avoid
>> confusion.
>>
>> 
>> Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration
>> validated. Starting heartbeat 3.0.2
>
> Yes, I know. Not a problem.
> Needs to be changed in configure.ac before the 3.0.4 release.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker

2010-10-25 Thread Keisuke MORI
Hi,

The recent heartbeat tip causes an assertion failure in pacemaker-1.0
and generates a core:
{{{
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort:
crm_glib_handler: Forked child 31338 to record non-fatal assert at
utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort:
crm_glib_handler: Forked child 31339 to record non-fatal assert at
utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort:
crm_glib_handler: Forked child 31341 to record non-fatal assert at
utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort:
crm_glib_handler: Forked child 31342 to record non-fatal assert at
utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
}}}


This seems to have been introduced by the following changeset:
http://hg.linux-ha.org/dev/rev/231b0b8555be

The stack trace and my suggested patch are attached.

The changeset in question changed the code to use get_next_random()
here, which eventually calls g_main_loop_is_running(), but that may
fail because the g_main_loop is not initialized yet in cib/crmd.

My suggested patch just reverts to the old behavior, changing only
the delay (to 50 ms).

Thanks,

-- 
Keisuke MORI
(gdb) where
#0  0x00669410 in __kernel_vsyscall ()
#1  0x00692df0 in raise () from /lib/libc.so.6
#2  0x00694701 in abort () from /lib/libc.so.6
#3  0x00c0d82f in crm_abort (file=0xc26955 "utils.c", 
function=0xc26dda "crm_glib_handler", line=449, 
assert_condition=0x8933d58 "g_main_loop_is_running: assertion `loop != 
NULL' failed", do_core=1, do_fork=1) at utils.c:1382
#4  0x00c09f05 in crm_glib_handler (log_domain=0x167686 "GLib", 
flags=G_LOG_LEVEL_CRITICAL, 
message=0x8933d58 "g_main_loop_is_running: assertion `loop != NULL' 
failed", user_data=0x0) at utils.c:449
#5  0x00143b67 in g_logv () from /lib/libglib-2.0.so.0
#6  0x00143d39 in g_log () from /lib/libglib-2.0.so.0
#7  0x00143e1b in g_return_if_fail_warning () from /lib/libglib-2.0.so.0
#8  0x0013981b in g_main_loop_is_running () from /lib/libglib-2.0.so.0
#9  0x00880811 in get_more_random () at cl_random.c:95
#10 0x00880945 in cl_init_random () at cl_random.c:128
#11 0x00880644 in gen_a_random () at cl_random.c:68
#12 0x00880896 in get_next_random () at cl_random.c:106
#13 0x00fdbabb in get_clientstatus (lcl=0x8931bd8, host=0x0, 
clientid=0x805b779 "cib", timeout=-1) at client_lib.c:974
#14 0x080557ee in cib_init () at main.c:461
#15 0x08054c4b in main (argc=1, argv=0xbfcd6124) at main.c:218
(gdb) 
# HG changeset patch
# User Keisuke MORI 
# Date 1288003477 -32400
# Node ID 96b67422b12814f64dc7dd61c670801c7ba213b6
# Parent  82fc843fbcf9733e50bbc169c95e51b6c7f97c54
Medium: reduce max delay in get_client_status (revised 231b0b8555be)
revert the old code to avoid calling g_main_loop_is_running()
which may fail when used in Pacemaker cib/crmd.

diff -r 82fc843fbcf9 -r 96b67422b128 lib/hbclient/client_lib.c
--- a/lib/hbclient/client_lib.c	Mon Oct 04 22:12:37 2010 +0200
+++ b/lib/hbclient/client_lib.c	Mon Oct 25 19:44:37 2010 +0900
@@ -966,16 +966,6 @@ get_nodesite(ll_cluster_t* lcl, const ch
 * Return the status of the given client.
 */
 
-#ifndef HAVE_CL_RAND_FROM_INTERVAL
-/* you should grab latest glue headers! */
-static inline int cl_rand_from_interval(const int a, const int b)
-{
-	/* RAND_MAX may be INT_MAX, or (b-a) may be huge. */
-	long long r = get_next_random();
-	return a + (r * (b-a) + RAND_MAX/2)/RAND_MAX;
-}
-#endif
-
 static const char *
 get_clientstatus(ll_cluster_t* lcl, const char *host
 ,		const char *clientid, int timeout)
@@ -1027,8 +1017,9 @@ get_clientstatus(ll_cluster_t* lcl, cons
 		 * in a 100-node cluster, the max delay is 5 seconds
 		 */
 		num_nodes = get_num_nodes(lcl);
-		max_delay = num_nodes * 5;
-		delay = cl_rand_from_interval(0, max_delay);
+		max_delay = num_nodes * 5; /* in microsecond*/
+		srand(cl_randseed());
+		delay = (1.0* rand()/RAND_MAX)*max_delay;
 		if (ANYDEBUG){
 			cl_log(LOG_DEBUG, "Delaying cstatus request for %d ms", delay/1000);
 		}
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker

2010-11-09 Thread Keisuke MORI
Lars,

2010/10/27 Lars Ellenberg :
> On Mon, Oct 25, 2010 at 08:21:26PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> The recent heartbeat on the tip would cause an assertion fail in
>> pacemaker-1.0 and generate a core:
(snip)
> I don't care for the "get_more_random()" stuff and
> keeping 100 "random" values prepared for get_next_random,
> that is probably just academic sugar, anyways.
>
> If it does not work, we throw it all out, or fix it.

Ok, then let's just drop the changeset.

I agree that srand should not be called many times,
but I would prefer to just keep the existing behavior,
since there have been no problems with it so far.


>
> I object to calling srand many times.
> Actually we should only call it once,
> we still call it in too many places.
>
> I found the get_next_random() function to apparently properly wrap
> around a "static int inityet" and do the srand only once,
> so I just used it.
>
> Would it help to call g_main_loop_new() earlier?
> Can we more cleanly catch the "no GMainLoop there yet" in
> get_more_random()?
>
> Should we just drop get_next_random() from cl_rand_from_interval?
> Or drop it altogether along with get_more_random and its static
> array -- it's not as if generating random numbers was performance
> critical in any way, is it.

It could possibly help, but I don't think it's worth doing right now.

Are there any other backlog items for the heartbeat package release?
I look forward to seeing it released soon!

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker

2010-11-15 Thread Keisuke MORI
2010/11/14 Lars Ellenberg :
> On Tue, Nov 09, 2010 at 06:06:30PM +0900, Keisuke MORI wrote:

>> Ok, then let's just drop the changeset.
>>
>> I agree that srand should not be called many times,
>> but I would prefer to just keep the existing behavior,
>> since there have been no problems with it so far.
>
> Ok, I'll revert it for now.

Thanks, I confirmed that the problem went away.

>
> But I'd rather have it working there.
> Would this patch to the cib do the right thing?

The patch actually didn't work. I've looked into the code further, and
I now realize that the existence of the mainloop is not the issue here;
g_main_loop_is_running() _always_ fails when NULL is passed.

glue/lib/clplumbing/cl_random.c:
        
static void
get_more_random(void)
{
	if (randgen_scheduled || IS_QUEUEFULL) {
		return;
	}
	if (g_main_loop_is_running(NULL)) {
		randgen_scheduled = TRUE;
		Gmain_timeout_add_full(G_PRIORITY_LOW+1, 10, add_a_random,
			NULL, NULL);
	}
}
        


Looking at the glib source code, it is implemented like this:
http://git.gnome.org/browse/glib/tree/glib/gmain.c#n3157
        
gboolean
g_main_loop_is_running (GMainLoop *loop)
{
  g_return_val_if_fail (loop != NULL, FALSE);
  g_return_val_if_fail (g_atomic_int_get (&loop->ref_count) > 0, FALSE);

  return loop->is_running;
}
        

I'm wondering whether the 'get_more_random()' logic has ever worked at all.

So the proper fix here would be, in my opinion,
to simply remove the 'get_more_random()' logic from the cluster-glue code.
It does not make sense to me that a g_mainloop should be required just
to get a random value :)

The Heartbeat code should still support the current version of
cluster-glue, so I think the code currently in the repository is
good enough for the coming 3.0.4.


>> Are there any other backlog items for the heartbeat package release?
>> I look forward to seeing it released soon!
>
> Me too. Alas ...
> We'll try to get it out by next Friday (19th November)

Great!
Thank you for all your effort for the release!

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] nginx resource agent

2011-01-03 Thread Keisuke MORI
Hi Alan,

2011/1/2 Alan Robertson :
> On 12/14/2010 02:42 AM, Dejan Muhamedagic wrote:
>>>    #
>>>    # I'm not convinced this is a wonderful idea (AlanR)
>>>    #
>>>    for sig in SIGTERM SIGHUP SIGKILL
>>>    do
>>>      if
>>>        pgrep -f "$NGINXD.*$CONFIGFILE">/dev/null
>>>      then
>>>        pkill -$sig -f $NGINXD.*$CONFIGFILE>/dev/null
>>>        ocf_log info "nginxd children were signalled ($sig)"
>>>        sleep 1
>>>      else
>>>        break
>>>      fi
>>>    done
>> Can't recall the details anymore; there was a bit of discussion
>> on the matter a few years ago, but NTT insisted on killing httpd
>> children. Or do you mind the implementation?
>
> Hi Dejan,
>
> I know it's been a long time.  Sorry about that.  If I _hated_ the idea,
> I would have left it out.  It definitely leaves me feeling a bit
> unsettled.  If it causes a problem, it will no doubt eventually show
> up.  It looks like it's just masking a bug in Apache - that is, that
> giving it a shutdown request doesn't really work...

The relevant discussion is this:
http://www.gossamer-threads.com/lists/linuxha/dev/44395#44395
http://developerbugs.linux-foundation.org//show_bug.cgi?id=1800


The intention of the code is to allow the service to be restarted if the
Apache main process has failed for some reason (maybe a bug in Apache,
maybe the OOM killer, or whatever). It's not about masking a bug in
Apache - it just tries to clean up and continue the service with as
little manual intervention as possible.


> Perhaps I shouldn't have kept it in the nginx code - since it does seem
> to be a bit specific to some circumstance in Apache...  On the other
> hand, it shouldn't hurt anything either...

You may want to check what happens if the nginx process is accidentally killed.
I'm not familiar with nginx at all, but in the case of Apache, the
children keep running and prevent another Apache instance from being
restarted until you kill all the orphaned processes manually.
If nginx is a single-process application, then I think that the code
should not be necessary.
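For example, a rough manual check along these lines would show whether
orphaned worker processes are left behind (just a sketch; the master process
title pattern is my assumption, and $NGINXD/$CONFIGFILE are the variables
from the RA snippet quoted above):

  pkill -KILL -f "nginx: master process"      # kill only the master
  sleep 1
  pgrep -f "$NGINXD.*$CONFIGFILE" >/dev/null && echo "orphaned nginx processes still running"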


-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release

2011-06-08 Thread Keisuke MORI
Hi,

Thank you for all your efforts for the new release.


2011/6/7 Fabio M. Di Nitto :
> Several changes have been made to the build system and the spec file to
> accommodate both projects´ needs. The most noticeable change is the
> option to select "all", "linux-ha" or "rgmanager" resource agents at
> configuration time, which will also set the default for the
> spec file.

Why is the ldirectord package disabled in the RHEL environment?
I would expect it to be built just as it was in (linux-ha)
resource-agents-1.0.4,
so that we can use the upcoming 3.9.1 as an upgrade.

We still use resource-agents/ldirectord on many RHEL systems, and
if it were missing
we could not upgrade them anymore.

from resource-agents.spec.in :
 --- --- ---  --- --- ---   --- --- ---
%if %{with linuxha}
%if 0%{?rhel} == 0
%package -n ldirectord
 --- --- ---  --- --- ---   --- --- ---


> NOTE: About the 3.9.x version (particularly for linux-ha folks): This
> version was chosen simply because the rgmanager set was already at
> 3.1.x. In order to make it easier for distribution, and to keep package
> upgrades linear, we decided to bump the number higher than both
> projects. There is no other special meaning associated with it.
>
> The final 3.9.1 release will take place soon.

BTW, why not 4.0? :)
Just curious, though.


Regards,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release

2011-06-08 Thread Keisuke MORI
Hi,

2011/6/8 Fabio M. Di Nitto :
> Hi,
>
> On 6/8/2011 10:16 AM, Keisuke MORI wrote:
>> Hi,
>>
>> Thank you for all your efforts for the new release.
>>
>>
>> 2011/6/7 Fabio M. Di Nitto :
>>> Several changes have been made to the build system and the spec file to
>>> accommodate both projects´ needs. The most noticeable change is the
>>> option to select "all", "linux-ha" or "rgmanager" resource agents at
>>> configuration time, which will also set the default for the
>>> spec file.
>>
>> Why is the ldirectord package disabled in the RHEL environment?
>> I would expect it to be built just as it was in (linux-ha)
>> resource-agents-1.0.4,
>> so that we can use the upcoming 3.9.1 as an upgrade.
>
> Because ldirectord requires libnet to build and libnet is not available
> on default RHEL (unless you explicitly enable EPEL).

ldirectord requires no extra packages to build on RHEL. It is just a Perl script.
You may be thinking of the runtime environment; it requires at least
perl-MailTools, which can be obtained only from EPEL or CentOS extras, but
ldirectord users
already collect such packages when they want to use it.

I can provide a patch to the spec file if it's OK to build it.

Note that the (linux-ha) resource-agents have been completely independent
of libnet as of 1.0.4. Before that, the IPv6addr RA was the only
component depending on libnet.

Thanks,

>
> Florian, last time we spoke, we were trying to avoid adding BR on
> packages that are not part of RHEL, but then to build linux-ha agents we
> need cluster-glue* that are not part of RHEL anyway.
>
> We should be consistent here.
>
> I am ok to allow people to build ldirectord.
>
>>
>> We still use resource-agents/ldirectord on many RHEL systems, and
>> if it were missing
>> we could not upgrade them anymore.
>
> Understood, we are still smoothing a few corners after the merge. It's
> good people are spotting those bits.
>




-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release

2011-06-08 Thread Keisuke MORI
Hi,

2011/6/8 Fabio M. Di Nitto :
>>>> Why is the ldirectord package disabled in the RHEL environment?
>>>> I would expect it to be built just as it was in (linux-ha)
>>>> resource-agents-1.0.4,
>>>> so that we can use the upcoming 3.9.1 as an upgrade.
>>>
>>> Because ldirectord requires libnet to build and libnet is not available
>>> on default RHEL (unless you explicitly enable EPEL).
>>
>> ldirectord requires no extra packages to build on RHEL. It is just a
>> Perl script.
>> You may be thinking of the runtime environment; it requires at least
>> perl-MailTools,
>> which can be obtained only from EPEL or CentOS extras, but
>> ldirectord users
>> already collect such packages when they want to use it.
>>
>> I can provide a patch to the spec file if it's OK to build it.
>>
>> Note that the (linux-ha) resource-agents have been completely
>> independent
>> of libnet as of 1.0.4. Before that, the IPv6addr RA was the only
>> component depending on libnet.

> Whoops... yes, you are absolutely right. I got confused between IPAddr
> and ldirectord.
>
> Yes, you can either send me a patch, or I can do it. It's really a piece of
> cake.

OK, I would suggest the attached patch to resolve this particular issue,
but I think there are still some issues left:

1) I'm wondering why this condition is needed; I think we can always use
   %{_var}/run/resource-agents in the current version.

%if 0%{?fedora} >= 11 || 0%{?centos_version} > 5 || 0%{?rhel} > 5
%dir %{_var}/run/heartbeat/rsctmp
%else
%dir %attr (1755, root, root)   %{_var}/run/resource-agents
%endif


2) A duplicate man8/ldirectord.8.gz is included in both the resource-agents
   and ldirectord packages. It should not be a big problem, though.

%{_mandir}/man8/*.8*
(...)
%{_mandir}/man8/ldirectord.8*


3) It cannot be built on RHEL5; it fails with this error. I'd be glad if there were
   some kind of backward compatibility.

%if 0%{?suse_version} == 0 && 0%{?fedora} == 0 && 0%{?centos_version}
== 0 && 0%{?rhel} == 0
%{error:Unable to determine the distribution/version. This is
generally caused by missing /etc/rpm/macros.dist. Please install the
correct build packages or define the required macros manually.}


Regards,
-- 
Keisuke MORI
diff --git a/resource-agents.spec.in b/resource-agents.spec.in
index 8b39b3f..7dc6670 100644
--- a/resource-agents.spec.in
+++ b/resource-agents.spec.in
@@ -106,7 +106,6 @@ High Availability environment for both Pacemaker and rgmanager
 service managers.
 
 %if %{with linuxha}
-%if 0%{?rhel} == 0
 %package -n ldirectord
 License:	GPLv2+
 Summary:	A Monitoring Daemon for Maintaining High Availability Resources
@@ -136,7 +135,6 @@ lditrecord is simple to install and works with the heartbeat code
 
 See 'ldirectord -h' and linux-ha/doc/ldirectord for more information.
 %endif
-%endif
 
 %prep
 %if 0%{?suse_version} == 0 && 0%{?fedora} == 0 && 0%{?centos_version} == 0 && 0%{?rhel} == 0
@@ -194,11 +192,6 @@ make install DESTDIR=%{buildroot}
 rm -rf %{buildroot}/usr/share/doc/resource-agents
 
 %if %{with linuxha}
-%if 0%{?rhel} != 0
-# ldirectord isn't included on RHEL
-find %{buildroot} -name 'ldirectord.*' -exec rm -f {} \;
-find %{buildroot} -name 'ldirectord' -exec rm -f {} \;
-%endif
 
 %if 0%{?suse_version}
 test -d %{buildroot}/sbin || mkdir %{buildroot}/sbin
@@ -270,7 +263,6 @@ rm -rf %{buildroot}
 %{_libdir}/heartbeat/findif
 %{_libdir}/heartbeat/tickle_tcp
 
-%if 0%{?rhel} == 0
 %if 0%{?suse_version}
 %preun -n ldirectord
 %stop_on_removal ldirectord
@@ -303,7 +295,6 @@ rm -rf %{buildroot}
 /usr/lib/ocf/resource.d/heartbeat/ldirectord
 %endif
 %endif
-%endif
 
 %changelog
 * @date@ Autotools generated version  - @version@-@specver@-@numcomm@.@alphatag@.@dirty@
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release

2011-06-08 Thread Keisuke MORI
2011/6/8 Digimer :
> On 06/08/2011 09:48 AM, Florian Haas wrote:
>> I realize I'm bikeshedding, but my preference would be for 3.9 for this
>> one, and 4.0 to implement the new standard. Like Fabio originally suggested.
>>
>> Cheers,
>> Florian
>
> Given that "x.0" has long meant "new stuff", I'd like to stick with the
> 3.9.x.

About the bikeshed's color :) I don't mind either one;
I just wanted to know the reason behind it, and now it's all clear to me.

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] regressions in resource-agents 3.9.1

2011-06-22 Thread Keisuke MORI
2011/6/22 Florian Haas :
> On 2011-06-22 11:48, Dejan Muhamedagic wrote:
>> Hello all,
>>
>> Unfortunately, it turned out that there were two regressions in
>> the 3.9.1 release:
>>
>> - iscsi on platforms which run open-iscsi 2.0-872 (see
>>   http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562)
>>
>> - pgsql probes with shared storage (iirc), see
>>   http://marc.info/?l=linux-ha&m=130858569405820&w=2
>>
>> Thanks to Vadym Chepkov for finding and reporting them.
>>
>> I'd suggest to make a quick fix release 3.9.2.
>>
>> Opinions?
>
> Agree.

+1

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] regressions in resource-agents 3.9.1

2011-06-27 Thread Keisuke MORI
Hi,

Are there any remaining backlog items for the 3.9.2 release?
I'm really looking forward to seeing it soon, since 3.9.1 was not really
usable for me...

Thanks,

2011/6/22 Dejan Muhamedagic :
> Hi all,
>
> On Wed, Jun 22, 2011 at 11:22:48PM +0900, Keisuke MORI wrote:
>> 2011/6/22 Florian Haas :
>> > On 2011-06-22 11:48, Dejan Muhamedagic wrote:
>> >> Hello all,
>> >>
>> >> Unfortunately, it turned out that there were two regressions in
>> >> the 3.9.1 release:
>> >>
>> >> - iscsi on platforms which run open-iscsi 2.0-872 (see
>> >>   http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562)
>> >>
>> >> - pgsql probes with shared storage (iirc), see
>> >>   http://marc.info/?l=linux-ha&m=130858569405820&w=2
>> >>
>> >> Thanks to Vadym Chepkov for finding and reporting them.
>> >>
>> >> I'd suggest to make a quick fix release 3.9.2.
>> >>
>> >> Opinions?
>> >
>> > Agree.
>>
>> +1
>
> OK. Let's do that on Friday morning. Tomorrow is holiday here.
>
> Cheers,
>
> Dejan
>
>> --
>> Keisuke MORI
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-14 Thread Keisuke MORI
Hi,

2011/10/10 Dejan Muhamedagic :
> On Sun, Oct 09, 2011 at 11:28:41PM +1100, Andrew Beekhof wrote:
>> On Sat, Oct 8, 2011 at 6:03 AM, Digimer  wrote:
>> > On 10/07/2011 02:58 PM, Florian Haas wrote:
>> >> Vienna before the early afternoon of Saturday the 29th, so if anyone has
>> >> plans to do something interesting that Saturday morning I'd be more than
>> >> happy to join.
>> >>
>> >> Cheers,
>> >> Florian
>> >
>> > I'm going to be in the city all day Saturday as well.
>> >
>> > Knowing there will be at least a few who will have trouble making the
>> > unofficial meeting on the 26th,
>>
>> The 26th is just the meeting start.
>
> It's not 26th, but 25th. It also says so in the subject line.
> I'll be in Prague only on 25th.

I'm trying to arrange my schedule so that I can be in Prague from the 25th to the 28th.
See you everybody over there.

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] LVS support for IPv6

2011-11-15 Thread Keisuke MORI
Hello all,

I would like to use an LVS Direct Routing configuration on IPv6,
but I encountered the problem described below.

I'm now going to fix it, but I've found that there are several
points to be decided about how it should be fixed, so
I would like to ask for everybody's opinion before I proceed.

Please give me your thoughts and comments on how we should fix it.


Symptom:

On IPv4, I have been using the IPaddr2 RA for LVS DR and it works like a charm.
On IPv6, I tried to use the IPv6addr RA for the virtual IPv6
address with a configuration similar to the IPv4 one, but
the address never becomes reachable from another node.

The ip command shows that the duplicate address has been assigned to both lo
and ethX, and that one of them carries the 'dadfailed' flag (Duplicate Address
Detection, as defined in RFC 4862).

# ip addr show
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
inet6 2004::210/128 scope global
   valid_lft forever preferred_lft forever
(...)
5: eth3:  mtu 1500 qdisc mq state
UP qlen 1000
inet6 2004::210/64 scope global tentative dadfailed
   valid_lft forever preferred_lft forever


Arguments:

1) Which RA should be improved, IPaddr2 or IPv6addr?

Obviously we have two approaches to fix this, and each has its pros/cons.

 a) improve IPaddr2 to support an IPv4/IPv6 dual stack
 b) improve IPv6addr to remove the duplicate IPv6 address on the loopback.

 As for a),
   pros: easy to maintain as a single code base;
 uniform behavior between IPv4 and IPv6, since the ip command already
supports dual stack.
   cons: it changes the policy of which RA is recommended for IPv6 on Linux;
 it needs a new binary to replace send_arp for IPv6.

 As for b),
   pros: no changes to the existing IPaddr2.
   cons: an equivalent of the "lvs_support=true" feature needs to be
 implemented in C, which may make the code harder to maintain.


2) Is "lvs_support=true" functionality really necessary?

When I use IPaddr2 for LVS on IPv4, it's been working perfectly
*without* "lvs_support=true".
In this case the same IP addresses are assigned to both lo and ethX, and
still works everything fine.

Addition to this, the latest IPaddr2 has a bug and it does not
remove the IP address on lo even if "lvs_support=true".
This was reported once before:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/71106#71106
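A quick way to see that symptom (a rough sketch on my part; 192.168.0.100 is
just a hypothetical VIP, not one taken from the report above):

  ip addr add 192.168.0.100/32 dev lo         # simulate an LVS real-server setup
  # ... start the IPaddr2 resource with lvs_support=true ...
  ip addr show dev lo | grep 192.168.0.100    # the VIP should have been removed from lo, but it is still there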


On IPv6, I also tried assigning an IPv6 address to both lo and ethX manually
with the ip command, and it seems to work fine, just as on IPv4.
The weird 'dadfailed' flag did not appear when I used the ip command.
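(Roughly what I did for the manual test, reusing the addresses and interface
from the ip output above:)

  ip -6 addr add 2004::210/128 dev lo
  ip -6 addr add 2004::210/64 dev eth3
  ip -6 addr show dev eth3      # no 'dadfailed' flag here, unlike with IPv6addr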



Proposed solution:

Considering all the arguments above, I would like to suggest the
following changes:

 - improve IPaddr2 to support an IPv4/IPv6 dual stack.
 - recommend using IPaddr2 for both IPv4 and IPv6 on Linux in the future;
   IPaddr/IPv6addr would be left only for legacy and cross-platform support.
 - the "lvs_support=true" option would be deprecated and no longer necessary.

Any opinions or suggestions are appreciated.
I will start working on it once we all agree on how it should be fixed.


Regards,

Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] [GIT PULL] Medium: IPv6addr: handle a link-local address properly in send_ua

2011-11-23 Thread Keisuke MORI
Dejan,

Would you consider to pull this patch to resolve issue #29 on the github?
https://github.com/ClusterLabs/resource-agents/pull/34

Thanks,
-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] new resource agents release

2012-01-27 Thread Keisuke MORI
Hi Dejan,

Thank you for bringing up the new release.
I really look forward to seeing it very soon!


Among other issues, I consider the two below to be critical
and want to get them into the new release somehow:

1. tomcat RA: SEARCH_STR fix
http://www.gossamer-threads.com/lists/linuxha/dev/75950#75950

We can reconsider the patch if you're not comfortable with it, but
without this fix, the RA just doesn't work with a typical
catalina_opts configuration.

2. apache RA: testregex matching fix
http://www.gossamer-threads.com/lists/linuxha/dev/77619#77619

This looks like a regression since heartbeat-2.1.4 from the user's point of view;
 one of our customers reported that they had been using 2.1.4 and
apache without problems,
 and when they tried to upgrade to a recent Pacemaker without
any changes to apache,
 it failed because of this issue.


As for the other issues posted by our team,
although it would be great if they were also fixed as far as possible,
I do not want to delay the release schedule any longer for them.

Regards,
Keisuke MORI

2012/1/27 Dejan Muhamedagic :
> Hello,
>
> The resource agents release 3.9.2 was end of last June, quite a
> while ago. High time to do a new one.
>
> Any obstacles or release blockers? Or other opinions?
>
> Florian and Lars, is that fine with you in particular?
>
> Now, I know that there are some fixes and amendments and
> improvements which have been posted to this list or elsewhere and
> never made it to the repository. If there are some which didn't
> get appropriate attention, please give us a heads up and then we
> can discuss the matter. Unfortunately, I cannot guarantee that
> all are going to get due attention, in particular those which
> would be deemed to jeopardize stability.
>
> Best regards,
>
> Dejan
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] new resource agents release

2012-01-30 Thread Keisuke MORI
Hi Dejan,

2012/1/27 Dejan Muhamedagic :
>> 2. apache RA: testregex matching fix
>>     http://www.gossamer-threads.com/lists/linuxha/dev/77619#77619
>>
>>     This looks like a regression since heartbeat-2.1.4 from the user's point
>> of view;
>>      one of our customers reported that they had been using 2.1.4 and
>> apache without problems,
>>      and when they tried to upgrade to a recent Pacemaker without
>> any changes to apache,
>>      it failed because of this issue.
>
> You're referring to apache-002.patch? Well, that's unfortunate as
> the two are incompatible, i.e. if the configuration has
> 'whatever-string$' in testregex and we reintroduce
> "tr '\012' ' '" that would break such configurations.

Oops, sorry, I was only referring to apache-001.patch.

>
> This changed a bit more than three years ago and so far nobody
> complained. So, perhaps better to leave it as it is and whoever
> wants to upgrade from a 3+ year old installation should anyway do
> some good testing. What do you think?

OK, let's move the discussion to the relevant thread on this topic.

Regards,

-- 
Keisuke MORI
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra

2012-01-30 Thread Keisuke MORI
Hi,

2012/1/28 Dejan Muhamedagic :
> Hi,
>
> On Fri, Jan 20, 2012 at 02:09:13PM +0900, nozawat wrote:
>> Hi Dejan
>>
>>  I agree with that opinion.
>>  I'm sending the revised patch.
>>
>> > I'll apply this one. BTW, can you share your use case.
>> Without the -z option, the following HTML file results in an error.
>> - example ---
>> <html>
>> <body>
>> test
>> </body>
>> </html>
>> ---
>> I placed a page for checks and was going to monitor it.
>
> Even though I said I'd apply this one, I'm now rather reluctant,
> because it may break some existing configurations, for instance
> if there are anchors in the regular expression (^ or $).
>
> Why is it important to match multiple lines?
> Just curious: how do you put this string into statusurl?

The problem is that the default value of testregex assumes that
the </body> and </html> tags are on a single line,
although it is very common for the HTML content to return them
on separate lines.

 TESTREGEX=${OCF_RESKEY_testregex:-'</body>[[:space:]]*</html>'}

I think it will not be a problem when you are using apache with the
'server-status' handler enabled,
because in that case apache seems to return those tags on a single line,
but it is also a common use case for the RA to monitor, say, the
index.html on the top page.
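As a minimal illustration of the difference (my own example, assuming GNU
grep and a test page like the one quoted above):

  printf '<html>\n<body>\ntest\n</body>\n</html>\n' > /tmp/test.html
  grep    '</body>[[:space:]]*</html>' /tmp/test.html   # no match: the tags are on separate lines
  grep -z '</body>[[:space:]]*</html>' /tmp/test.html   # matches: -z lets the pattern span newlines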

As for regular expressions with anchors like ^ or $, they look like they work as
expected with the -z option in my quick tests.
Do you have any examples where it might break a configuration?

If we really should not support multi-line matching, then that is
fine for us too,
but in that case it would be preferable for the default value of
testregex to be something better suited
to single-line matching, like just ''.
(and we should also mention this in the meta-data documentation)

Regards,
Keisuke MORI

>
> Cheers,
>
> Dejan
>
>> Regards,
>> Tomo
>>
>> 2012年1月20日4:20 Dejan Muhamedagic :
>> > Hi,
>> >
>> > On Thu, Jan 19, 2012 at 11:42:07AM +0900, nozawat wrote:
>> >> Hi Dejan and Lars
>> >>
>> >>  I send the patch which settled a conventional argument.
>> >>  1)apache-001.patch
>> >>->I am the same with the patch which I sent last time.
>> >>->It is the version that I added an option of the grep to.
>> >
>> > I'll apply this one. BTW, can you share your use case.
>> >
>> >>  2) apache-002.patch
>> >>    -> It restores the processing method using tr from the HB 2.1.4 era.
>> >
>> > Can't recall or see from the history why tr(1) was dropped (and
>> > it was me who removed it :( But I guess there was a reason for
>> > that.
>> >
>> >>  3) http-mon.sh.patch
>> >>    -> It is the patch that combines my suggestion with A.
>> >
>> > After trying to rework the patch a bit, I think now that we need
>> > a different user interface, i.e. we should introduce a boolean
>> > parameter, say "use_ipv6", then fix interface bind addresses
>> > depending on that. For instance, if user wants to use curl, then
>> > we'd need to add the "-g" option to make it work with IPv6.
>> >
>> > We can also try to figure out from the "statusurl" content if
>> > it contains an IPv6 address (echo "$statusurl" | grep -qs "::")
>> > then make the http client use IPv6 automatically.
>> >
>> > Would that work for you? Opinions?
>> >
>> > Cheers,
>> >
>> > Dejan
>> >
>> >>  1) and 2) fix a malfunction during monitor processing.
>> >>  3) adds IPv6 support.
>> >>
>> >>  The malfunction is not fixed unless at least 1) or 2) is applied.
>> >>  I think plan 2) is good, but I leave the final judgment to Dejan.
>> >>
>> >> Regards,
>> >> Tomo
>> >>
>> >> 2012年1月19日1:12 Dejan Muhamedagic :
>> >> > Hi,
>> >> >
>> >> > On Wed, Jan 18, 2012 at 11:19:58AM +0900, nozawat wrote:
>> >> >> Hi Dejna and Lars
>> >> >>
>> >> >>  When, for example, it is a logic of the examples of Lars to try both,
>> >> >>  in the case of IPv6, is the check of IPv4 that I enter every time?
>> >> >>  Don't you hate that useless processing enters every time?
>> >> >>
>> >> >>  In that case, I think that I should give a parameter such as
>> >> >> OCF_RESKEY_bi
