Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-08 Thread renayama19661014
Hi Andrew, > This is normal for constraints with scores < INFINITY. > Anything < INFINITY is "preferable but not mandatory" Sorry The method of my question was bad. As of STEP9, is the setting that a resource of UMgroup01 does not start possible? I do not perform the INFINITY setting in ci

Re: [Pacemaker] [PATCH]The change of the output level of the log.(for stonithd)

2010-03-08 Thread renayama19661014
Hi Dejan, There seem to be some problems for a retouch somehow or other. An error is given unless I appoint enable-fatal-warnings=no. [r...@x3650e Pacemaker-1-0-efdc0d8143dd]# ./autogen.sh && ./configure --prefix=$PREFIX --localstatedir=/var --with-lcrso-dir=$LCRSODIR (snip) [r...@x3650e Pacem

Re: [Pacemaker] [PATCH]The change of the output level of the log.(for stonithd)

2010-03-09 Thread renayama19661014
Hi Dejan, > Andrew fixed it already. Sorry about that. All right. Thanks. Hideo Yamauchi. --- Dejan Muhamedagic wrote: > Hi, > > On Tue, Mar 09, 2010 at 04:31:22PM +0900, renayama19661...@ybb.ne.jp wrote: > > Hi Dejan, > > > > There seem to be some problems for a retouch somehow or other. >

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-09 Thread renayama19661014
Hi Andrew, Thank you for comment. I asked next question before. http://www.gossamer-threads.com/lists/linuxha/pacemaker/61484 I guessed from your this answer. When I use cib.xml of the answer of before, is the limitation that it combined a start of clnPingd with after a node rebooted unreal

[Pacemaker] [PATCH GUI]A Japanization file.

2010-03-11 Thread renayama19661014
Hi Yan, I updated the Japanese translation file for Pacemaker-Python-GUI-14fd66fafbfa.tar.gz. Please apply this patch to the latest version of the GUI. The following error occurs when building Pacemaker-Python-GUI-14fd66fafbfa.tar.gz. Please fix this error. cc1: warnings being treated as errors mgmt_c

Re: [Pacemaker] [PATCH GUI]A Japanization file.

2010-03-11 Thread renayama19661014
Hi Yan, > There are other two _fuzzy_ translations in the ja.po. You may want to > update them too:-) Thanks! I'm sorry. I send a patch again. > Prototypes of some functions have changed. You need to update to the > latest pacemaker. I used Pacemaker of the next place. * http://hg.clusterlab

Re: [Pacemaker] [PATCH GUI]A Japanization file.

2010-03-11 Thread renayama19661014
Hi Yan, > You may want to try pacemaker 1.1: > http://hg.clusterlabs.org/pacemaker/1.1 All right. Thanks!! > Otherwise you could reverse the change: > http://hg.clusterlabs.org/pacemaker/pygui/rev/4dc8cb63f29b Thanks!! Best Regards, Hideo Yamauchi. --- Yan Gao wrote: > > > On 03/12/10 12:

Re: [Pacemaker] [PATCH GUI]A Japanization file.

2010-03-14 Thread renayama19661014
Hi Yan, In the version that you taught, I confirmed display of GUI by the Japanese language. I made a patch again and attached it. Please reflect this patch in GUI for development version. Best Regards, Hideo Yamauchi. ja.po.20100316.patch Description: 4280310379-ja.po.20100316.patch __

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-16 Thread renayama19661014
Hi Andrew, Please give my question an answer. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > Thank you for comment. > > I asked next question before. > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/61484 > > I guessed from your this answer

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-18 Thread renayama19661014
Hi Andrew, > I've been extremely busy. > Sometimes I defer more complex questions until I have time to give > them my full attention. I understand that you are busy. Thank you for comment. > I don't really understand the question here. Sorry.. I made a mistake in the link of the former problem.

[Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).

2010-03-18 Thread renayama19661014
Hi, I use VirtualDomain-RA and, on KVM, constitute a cluster. However, a guest sometimes fails in start. Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: (guest-kvm1:start:stderr) error: Failed to start domain kvm1 error: internal error unable to start guest: inet_listen: bind(ipv4,127.0

Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).

2010-03-19 Thread renayama19661014
Hi Dejan, > IIRC, that port has to do with vnc and something else (another > VNC server?) has already been started on that port. Thank you for comment. I examine it a little more. Best Regards, Hideo Yamauchi. --- Dejan Muhamedagic wrote: > Hi Hideo-san, > > On Fri, Mar 19, 2010 at 10:46:1

Re: [Pacemaker] About replacement of clone and handling of the fail number of times.

2010-03-23 Thread renayama19661014
Hi Andrew, Thank you for comment. > So if I can summarize, you're saying that clnUMdummy02 should not be > allowed to run on srv01 because the combined number of failures is 6 > (and clnUMdummy02 is a non-unique clone). > > And that the current behavior is that clnUMdummy02 continues to run. >

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-23 Thread renayama19661014
Hi Andrew, Thank you for comment. > I was suggesting: > > with-rsc="clnUMgroup01" score="INFINITY"/> > > > >operation="not_defined"/> >attribute="clnPingd" operation="lt" type="integer" value="1"/> >attribute="clnPingd2" operation="not_defined"/> >attribu

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-03-23 Thread renayama19661014
Hi Andrew, I ask you a question one more. Our real resource constitution is a little more complicated. We do colocation of the clone(clnG3dummy1, clnG3dummy2) which does not treat the update of the attribute such as pingd. (snip)

Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).

2010-03-23 Thread renayama19661014
Hi Dejan, The problem seems to come from leaving the VNC setting of the guest on automatic. It seems to be avoided by specifying a port for every guest. Thanks, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Dejan, > > > IIRC, that port has to do with

Re: [Pacemaker] About replacement of clone and handling of the fail number of times.

2010-03-24 Thread renayama19661014
Hi Andrew, > Do you mean: why is the clone on srv01 always $clone:0 but on srv02 > its sometimes $clone:0 and sometimes $clone:1 ? yes. The replacement thought both nodes to be the same movement. Because it is "globally-unique=false". Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrot

Re: [Pacemaker] About replacement of clone and handling of the fail number of times.

2010-03-25 Thread renayama19661014
Hi Andrew, > "globally-unique=false" means that :0 and :1 are actually the same resource. > it's perfectly valid for entries for both to exist on the node, but the > PE should fold them together internally. > > in most ways it does, just not for failures (yet). Thank you for comment. Some we were
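The "folding" of anonymous clone instances discussed in this thread can be illustrated with a small sketch. This is only an illustration in Python, not the policy engine's actual code: the function name fold_fail_counts and the per-instance counts are hypothetical, and it assumes instance names take the form name:N as in the messages above.

```python
from collections import defaultdict


def fold_fail_counts(fail_counts):
    """Sum fail counts of clone instances that share a base name.

    With globally-unique=false, name:0 and name:1 are treated as the
    same resource, so their failures are added together.
    """
    folded = defaultdict(int)
    for instance, count in fail_counts.items():
        # Strip the ":N" instance suffix to get the clone's base name.
        base = instance.split(":", 1)[0]
        folded[base] += count
    return dict(folded)


# Hypothetical per-instance counts on one node.
counts = {"clnUMdummy02:0": 3, "clnUMdummy02:1": 3}
combined = fold_fail_counts(counts)
print(combined)
```

Under this assumption, two instances with three failures each would count as six failures for the one clone on that node.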

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-04-18 Thread renayama19661014
Hi Andrew, Are you busy? Please give my question an answer. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > I ask you a question one more. > > Our real resource constitution is a little more complicated. > > We do colocation of the clone(clnG3dummy1, c

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-04-19 Thread renayama19661014
Hi Andrew, > >> We want to realize start in order of the next. > >> 1) clnPingd, clnG3dummy1, clnG3dummy2, clnUMgroup01 (All resources start) > >> -> UMgroup01 start > >>* And the resource moves if a clone of one stops. > >> 2) clnPingd, clnG3dummy1, clnG3dummy2 (All resources start) -> >

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-04-19 Thread renayama19661014
Hi Andrew, > > Thank you for comment. > > But, does not the problem of the next email recur when I change it in > > INFINITY? > > > > * http://www.gossamer-threads.com/lists/linuxha/pacemaker/60342 > > No, as I previously explained: By an answer before you, pingd moves well. However, does set

Re: [Pacemaker] Problem : By colocations limitation, the resource appointment of the combination does not become effective.

2010-04-20 Thread renayama19661014
Hi Andrew, > Yes, because the -INFINITY + INFINITY = -INFINITY and therefore the > node wont be allowed to host the service. Thank you for comment. My worry was useless somehow or other. The initial placement of the resource went well, too. By various patterns, I test some movement. Best Reg
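The score arithmetic Andrew describes can be sketched as follows. This is a minimal illustration in Python, not Pacemaker's own code: the function name add_scores is hypothetical, and it assumes the documented convention that INFINITY is 1000000 and that -INFINITY wins whenever it is combined with any other score.

```python
INFINITY = 1_000_000  # Pacemaker's documented value for INFINITY


def add_scores(a, b):
    """Combine two scores using Pacemaker-style saturating arithmetic."""
    # -INFINITY dominates everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    # Otherwise +INFINITY dominates any finite score.
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite scores add normally, clamped to the valid range.
    return max(-INFINITY, min(INFINITY, a + b))


# A node with a -INFINITY rule score cannot host the resource,
# even if a colocation constraint contributes +INFINITY.
result = add_scores(-INFINITY, INFINITY)
print(result == -INFINITY)
```

This is why a mandatory colocation does not override a -INFINITY location rule on the same node.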

Re: [Pacemaker] About influence of resouce-stickiness which used colocation for limitation.

2010-04-23 Thread renayama19661014
Hi Andrew, > Fixed in: >http://hg.clusterlabs.org/pacemaker/1.1/rev/4c775a4abc87 Thanks! Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > Fixed in: >http://hg.clusterlabs.org/pacemaker/1.1/rev/4c775a4abc87 > > 2010/4/22 : > > Hi, > > > > We tested the cluster constitution

Re: [Pacemaker] About influence of resouce-stickiness which used colocation for limitation.

2010-04-25 Thread renayama19661014
Hi Andrew, Version 1.0 is necessary for us. Please backport your revision to version 1.0. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > > Fixed in: > >http://hg.clusterlabs.org/pacemaker/1.1/rev/4c775a4abc87 > > Thanks! > > Best Regards, > Hideo Ya

Re: [Pacemaker] About influence of resouce-stickiness which used colocation for limitation.

2010-04-26 Thread renayama19661014
Hi Andrew, > Done. >http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/f7da9d09ebd2 Thanks. Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > Done. >http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/f7da9d09ebd2 > > On Mon, Apr 26, 2010 at 4:18 AM, wrote: > > Hi Andrew, >

Re: [Pacemaker] About influence of resouce-stickiness which used colocation for limitation.

2010-04-27 Thread renayama19661014
Hi Andrew, > > Done. > >http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/f7da9d09ebd2 It seems to move with your patch definitely. But, the following error is reflected on log. Does not this error have any problem? Apr 27 16:37:00 srv01 pengine: [5839]: ERROR: native_merge_weights: Appl

Re: [Pacemaker] About influence of resouce-stickiness which used colocation for limitation.

2010-04-27 Thread renayama19661014
Hi Andrew, > Oh, that was some development logging I forgot to remove. > I'll backport that fix in a moment too. All right. Thanks! Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > On Tue, Apr 27, 2010 at 9:42 AM, wrote: > > Hi Andrew, > > > >> > Done. > >> > > >> > http://hg.c

[Pacemaker] [Problem] A fail count is up by a postponed monitor.

2010-05-11 Thread renayama19661014
Hi, In a recent test of Pacemaker, the following problem occurred. * corosync 1.2.1 * Pacemaker-1-0-8463260ff667 * Reusable-Cluster-Components-c447fc25e119 * Cluster-Resource-Agents-f92935082277 The problem is that the monitor error of the prmFsPostgreSQLDB3-2 resource that stopped o

[Pacemaker] An error of log_data_element is noisy.

2010-05-13 Thread renayama19661014
Hi, In 1.0 latest version, an error is reflected on log. Movement does not have any problem, but is very noisy. (snip) May 13 16:21:34 srv01 cib: [24342]: ERROR: log_data_element: cib_config_changed: Diff May 13 16:21:34 srv01 cib: [24342]: ERROR: log_data_element: cib_config_change

Re: [Pacemaker] An error of log_data_element is noisy.

2010-05-13 Thread renayama19661014
> Fixed: > http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/94bf2cc9219b Thanks. Hideo Yamauchi. --- Andrew Beekhof wrote: > Fixed: > http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/94bf2cc9219b > > On Thu, May 13, 2010 at 9:26 AM, wrote: > > Hi, > > > > In 1.0 latest version, an e

Re: [Pacemaker] [Problem] A fail count is up by a postponed monitor.

2010-05-16 Thread renayama19661014
Hi Andrew, I registered this problem with Bugzilla. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2417 Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > Thank you for comment. > > > After reconstructing the logs (because they were unreadabl

Re: [Pacemaker] [Problem] A fail count is up by a postponed monitor.

2010-05-16 Thread renayama19661014
Hi Andrew, The next patch seems to influence the cause of this problem. * http://hg.linux-ha.org/glue/rev/3112dd90ecd8 Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > I registered this problem with Bugzilla. > > * http://developerbugs.linux-foundation.

[Pacemaker] A mistake of the log output.

2010-06-10 Thread renayama19661014
Hi, There seems to be an error in the log output of the source of Pacemaker. void clone_expand(resource_t *rsc, pe_working_set_t *data_set) { clone_variant_data_t *clone_data = NULL; get_clone_variant_data(clone_data, rsc); crm_err("Processing actions from %s", rsc->id);

Re: [Pacemaker] A mistake of the log output.

2010-06-10 Thread renayama19661014
Hi Andrew, Thanks! Another one... Not a great problem. The next if is the same. static gboolean determine_online_status_no_fencing(pe_working_set_t *data_set, xmlNode * node_state, node_t *this_node) { (snip) if(!crm_is_true(ccm_state) || safe_str_eq(ha_state, DEADSTATUS)){

[Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-06-13 Thread renayama19661014
We tested 16 node constitution (15+1). We carried out the next procedure. Step1) Start 16 nodes. Step2) Send cib after a DC node was decided. An error occurs by the update of the attribute of pingd after Probe processing was over. ---

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-06-14 Thread renayama19661014
Hi Andrew, Thank you for comment. > More likely of the underlying messaging infrastructure, but I'll take a look. > Perhaps the default cib operation timeouts are too low for larger clusters. > > > > > The log attached it to next Bugzilla. > > * http://developerbugs.linux-foundation.org/show_bu

[Pacemaker] [PATCH]Omitted STONITH of useless broadcast.(only 2 nodes configuration)

2010-07-06 Thread renayama19661014
Hi All, I wrote a patch. This patch limited it to two node configuration and wrote it. When failed in STONITH by the configuration of two nodes, the request of STONITH to the other node is useless. Because the reason is because it can carry out STONITH only from one node. We should omit the re

Re: [Pacemaker] [PATCH]Omitted STONITH of useless broadcast.(only 2 nodes configuration)

2010-07-06 Thread renayama19661014
Hi Andrew, Thank you for comment. > stonithd should also have access to the membership list, so I don't > think clients like the crmd need to be involved here. My understanding may be wrong. When stonithd accesses memberlist, stonithd can know node configuration. It is from memberlist and does

Re: [Pacemaker] [PATCH]Omitted STONITH of useless broadcast.(only 2 nodes configuration)

2010-07-07 Thread renayama19661014
Hi Andrew, Thank you for comment. > It should be able to calculate the expected number of votes/replies > from the ais/heartbeat membership list directly. > The crmd shouldn't need to pass that info in IMHO. OK. I think about a patch of the form to acquire memberlist from stonithd. Best Regard

[Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-07-28 Thread renayama19661014
Hi All, Our user showed a demand in a level of log output after handling of pengine. When STONITH is carried out, pengine wants to output log at a warning level if a repeating resource is only an STONITH resource. Because plural STONITH may be started when STONITH is carried out. However, it

[Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-01 Thread renayama19661014
Hi, I confirmed movement when corosync1.2.7 combined Pacemaker. The combination is as follows. * corosync 1.2.7 * Pacemaker-1-0-74392a28b7f3.tar * Cluster-Resource-Agents-bfcc4e050a07.tar * Reusable-Cluster-Components-8286b46c91e3.tar I confirmed the next movement in two nodes of a virtual

Re: [Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-03 Thread renayama19661014
Hi Vladislav, Thank you for comment. > This is probably connected to > http://marc.info/?l=openais&m=127977785007234&w=2 > > Steven promised to look at that issue after his vacation. I wait for a revision of Steven. Meanwhile, I use Pacemaker1.1 to recommend of Andrew. Best Regards, Hideo Ya

Re: [Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-03 Thread renayama19661014
Hi Andrew, Thank you for comment. > No need to wait, the current tip of Pacemaker 1.1 is perfectly stable > (and included for RHEL6.0). > Almost all the testing has been done for 1.1.3, I've just been busy > helping out with some other projects at Red Hat and haven't had time > to do the actual r

Re: [Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-08-03 Thread renayama19661014
Hi Andrew, Thank you for comment. It is difficult for me to illustrate by English. This patch is a considerably special demand of our user. Even if an STONITH resource duplicated, node and STONITH done STONITH of are that the log of the node to do wants to output it by warning. L

[Pacemaker] [PATCH]A redundant if sentence.

2010-08-03 Thread renayama19661014
Hi, It is the patch of a redundant if sentence for pengine. void unpack_operation( action_t *action, xmlNode *xml_obj, pe_working_set_t* data_set) { (snip) if(safe_str_eq(class, "stonith")) { action->needs = rsc_req_nothing; value = "nothing (fenci

[Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-03 Thread renayama19661014
Hi, I compiled Pacemaker1.1. But, the next error happened. [r...@srv01 Pacemaker-1-1-5ce5b34cf3ab]# export PREFIX=/usr;export LCRSODIR=$PREFIX/libexec/lcrso;export CLUSTER_USER=hacluster;export CLUSTER_GROUP=haclient [r...@srv01 Pacemaker-1-1-5ce5b34cf3ab]# ./autogen.sh && ./configure --pref

Re: [Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-04 Thread renayama19661014
Hi Andrew, > Fixed: >http://hg.clusterlabs.org/pacemaker/1.1/rev/6bad7c6bbe7d Thanks! Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > On Wed, Aug 4, 2010 at 10:08 AM, Andrew Beekhof wrote: > > On Wed, Aug 4, 2010 at 6:34 AM, wrote: > > >> [r...@srv01 Pacemaker-1-1-5ce5b34cf3

Re: [Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-04 Thread renayama19661014
Hi Andrew, Let me ask you a question. I deleted service from corosync.conf. I started service of pacemaker after having started corosync. However, two clusters do not consist of a node. * srv01 [r...@srv01 ~]# ps -ef | grep heartbeat root 4 0 0 10:25 ?00:00:00 /usr/lib/hea

Re: [Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-08-05 Thread renayama19661014
Hi Andrew, Thank you for comment. > np :-) > > Maybe it would be easier to show the logs and/or crm_mon output with > and without the patch. However, our many users watch error log. And some users do not like trouble to be notified of in error log in this situation. After all is this patch t

Re: [Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-05 Thread renayama19661014
Hi Andrew, Thank you for comment. > Sorry, I gave you the wrong information yesterday. > Apparently working on GUIs for two weeks is enough to rot one's brain ;-) > > Since I've also been talking to other people about this, I've taken > the time to write it up here: > > http://theclusterguy.c

[Pacemaker] A demand for the expected votes indication and a question.

2010-08-05 Thread renayama19661014
Hi, Our user uses corosync and Pacemaker. Last updated: Fri Aug 6 13:25:37 2010 Stack: openais Current DC: srv01 - partition with quorum Version: 1.1.2-230655711dc7b8579747ddeafc6f39247f8e87fc 3 Nodes configured, 3 expected votes 1 Resources configured. Online: [ srv01

Re: [Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-06 Thread renayama19661014
Hi Andrew, > This looks to be the problem. > Doesn't look like @sysconfdir@ was expanded correctly in configure. > > How did you build? Sorry, I had used the same build options as before. I specified sysconfdir=/etc, and the problem was resolved. Thanks! Hideo Yamauchi. --- Andrew Beekhof wrote: > On

[Pacemaker] About specifications of on-fail="block".

2010-08-08 Thread renayama19661014
Hi, Let me confirm it about specifications of on-fail="block". I constituted the following cluster. Last updated: Mon Aug 9 11:18:29 2010 Stack: openais Current DC: srv01 - partition with quorum Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b 2 Nodes configured, 2 expected

Re: [Pacemaker] About specifications of on-fail="block".

2010-08-13 Thread renayama19661014
Hi, I compared movement in a version of pacemaker about this problem. * 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b [r...@srv02 ~]# crm_mon -1 Last updated: Fri Aug 13 13:02:24 2010 Stack: openais Current DC: srv01 - partition with quorum Version: 1.0.9-74392a28b7f31d7ddc866895

Re: [Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-08-26 Thread renayama19661014
Hi Andrew, Thank you for comment. > Why not simply remove the if(was_processing_error) block? > Its just a summary message, the place that set was_processing_error > will also have logged an error. Is this meaning to abolish the next code? - if(was_processing_error) { -

Re: [Pacemaker] A demand for the expected votes indication and a question.

2010-08-26 Thread renayama19661014
Hi Andrew, > crm_mon shouldn't really display expected votes for heartbeat > clusters... they're not used in any way when heartbeat is in use. > expected votes is only relevant for ver: 0 of the pacemaker/corosync plugin. > in the future pacemaker will obtain quorum information directly from > cor

Re: [Pacemaker] About specifications of on-fail="block".

2010-08-26 Thread renayama19661014
Hi Andrew, I registered this problem on Bugzilla. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2476 Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi, > > I compared movement in a version of pacemaker about this problem. > > * 1.0.9-74392a28b7f31d7dd

Re: [Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-09-01 Thread renayama19661014
Hi Andrew, Thank you for comment. We discussed it about this matter a little. The revision of the output of the log withdraws it for the moment. Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > On Fri, Aug 27, 2010 at 3:03 AM, wrote: > > Hi Andrew, > > > > Thank you for comment.

[Pacemaker] About Quorum control at the time of the service stop.(no-quorum-policy=freeze)

2010-09-09 Thread renayama19661014
Hi, We confirmed movement of no-quorum-policy=freeze in four node constitution. Of course we understand that quorum control does not act in Heartbeat well. We confirmed the service stop of four nodes in the next procedure. Step1) We start four nodes.(3ACT:1STB) Step2) We send cib.xml. ===

[Pacemaker] A patch of crm_mon for the trouble actions.

2010-09-12 Thread renayama19661014
Hi, I contribute the patch of the crm_mon command. A node was offline and, in the case of the shutdown, revised it not to display a trouble action. Please confirm a patch. And, without a problem, please take this revision in a development version. diff -r 9b95463fde99 tools/crm_mon.c --- a/t

Re: [Pacemaker] A patch of crm_mon for the trouble actions.

2010-09-13 Thread renayama19661014
Hi Andrew, Thank you for comment. > I assume this is for the stonith-enabled=true case, since offline > nodes are ignored for stonith-enabled=false. > Once the node is shot, then its status section is erased and no failed > actions will be shown... so why do we need this patch? I know that troub

Re: [Pacemaker] A patch of crm_mon for the trouble actions.

2010-09-13 Thread renayama19661014
Hi Andrew, Thank you for comment. > Thanks for the explanation, I think you're right that we shouldn't be > showing these failed actions. > I think we want to do it in the PE though, eg. stop them from making > it into the failed_ops list in the first place. Does your answer mean that the next p

Re: [Pacemaker] About Quorum control at the time of the service stop.(no-quorum-policy=freeze)

2010-09-13 Thread renayama19661014
Hi Andrew, Thank you for comment. As a conclusion in case of the freeze setting * At the divided point in time, the resource maintains it. * When a node shuts it down, in divided constitution, the resource does migrate. -> Maintaining a resource in divided constitution. Is my understa

Re: [Pacemaker] About Quorum control at the time of the service stop.(no-quorum-policy=freeze)

2010-09-14 Thread renayama19661014
Hi Andrew, > I'd probably summarize it as: > "resources are frozen to their current _partition_" > > They can only move around within their partition. So if the partition > does not have quorum and > * a node shuts down, the partition can reallocate any services on > that node, but > * a node

Re: [Pacemaker] A patch of crm_mon for the trouble actions.

2010-09-14 Thread renayama19661014
Hi Andrew, > Perfect. Pushed. Thanks! > >http://hg.clusterlabs.org/pacemaker/1.1/rev/d932da0b886b Thanks!! Hideo Yamauchi. --- Andrew Beekhof wrote: > Perfect. Pushed. Thanks! > >http://hg.clusterlabs.org/pacemaker/1.1/rev/d932da0b886b > > 2010/9/14 : > > Hi Andrew, > > > > Thank

[Pacemaker] About behavior in "Action Lost".

2010-09-21 Thread renayama19661014
Hi, We checked the monitor behavior of Pacemaker while a node was under very high load. An Action Lost occurred in the stop operation after a monitor error occurred. Sep 8 20:02:22 cgl54 crmd: [3507]: ERROR: print_elem: Aborting transition, action lost: [Action 9]: In-flight (id: p

Re: [Pacemaker] About behavior in "Action Lost".

2010-09-22 Thread renayama19661014
Hi Andrew, Thank you for comment. > A long time ago in a galaxy far away, some messaging layers used to > lose quite a few actions, including stops. > About the same time, we decided that fencing because a stop action was > lost wasn't a good idea. > > The rationale was that if the operation eve

[Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-26 Thread renayama19661014
Hi, When I investigated another problem, I discovered this phenomenon. If attrd causes process trouble and does not restart, the problem does not occur. Step1) After start, it causes a monitor error in UmIPaddr twice. Online: [ srv01 srv02 ] Resource Group: UMgroup01 UmVIPcheck (ocf::hea

Re: [Pacemaker] About behavior in "Action Lost".

2010-09-28 Thread renayama19661014
Hi Andrew, > Pushed as: >http://hg.clusterlabs.org/pacemaker/1.1/rev/8433015faf18 > > Not sure about applying to 1.0 though, its a dramatic change in behavior. The change of this link is not found. Where did you update it? Best Regards, Hideo Yamauchi. --- Andrew Beekhof wrote: > Pushed

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-28 Thread renayama19661014
Hi Andrew, Thank you for comment. > The problem here is that attrd is supposed to be the authoritative > source for this sort of data. Yes. I understand. > Additionally, you don't always want attrd reading from the status > section - like after the cluster restarts. The problem seems to be abl

[Pacemaker] [Problem]Lost fail-count.

2010-09-29 Thread renayama19661014
Hi, We examined the trouble outbreak of a resource during cluster division and the recovery of the cluster. However, at the time of cluster recovery, the phenomenon that fail-count disappeared occurred. Failed-Actions did not disappear then. In the next procedure, it occurred. Step1)We start

Re: [Pacemaker] About behavior in "Action Lost".

2010-09-29 Thread renayama19661014
Hi Andrew, > Sorry, it probably got rebased before I pushed it. > > http://hg.clusterlabs.org/pacemaker/1.1/rev/dd8e37df3e96 should be the > right link Thanks!! Hideo Yamauchi. --- Andrew Beekhof wrote: > Sorry, it probably got rebased before I pushed it. > > http://hg.clusterlabs.org/pacem

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-30 Thread renayama19661014
Hi Andrew, Thank you for comment. > During crmd startup, one could read all the values from attrd into the > hashtable. > So the hashtable would only do something if only attrd went down. If attrd communicates with crmd at the time of start and reads the data of the hash table, the problem seem

[Pacemaker] [GUI]Compatibility issues of Python.

2010-09-30 Thread renayama19661014
Hi Yan, I operated latest GUI for Japanization of GUI. However, on RHEL5.5, GUI causes an error by the operation of the resource. There seems to be a cause in the difference of the version of Python. (snip) def on_rsc_action(self, action) : (cur_type, cur_name) = self.ma

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-04 Thread renayama19661014
Hi Andrew, Thank you for comment. > > Is the change of this attrd and crmd difficult? > > I dont think so. > But its not a huge priority because I've never heard of attrd actually > crashing. > > So while I agree that its theoretically a problem, in practice no-one > is going to hit this in pr

[Pacemaker] "Election Timeout" and node became the "Pending" state.

2010-10-04 Thread renayama19661014
Hi, We tested complicated node trouble. An error of "Election Timeout" occurred then. * Pacemaker:pacemaker-1.0.9.1 * heartbeat-3.0.3-2.3.el5 * cluster-glue:cluster-glue-1.0.6-1.6.el5 * resource-agents-1.0.3-1.0.dev.b7a3b1973ba7 We tested it in the next procedure. Step1) Start all nodes

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-05 Thread renayama19661014
Hi Andrew, I registered these contents with Bugzilla as enhancement of the functions. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2501 Thanks, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > Thank you for comment. > > > > Is the change of this attrd a

[Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-06 Thread renayama19661014
Hi, I operated the next to confirm the contribution of the mailing list. * http://www.gossamer-threads.com/lists/linuxha/pacemaker/66939 Step1) I prepare cib.xml having monitor which set start-delay than five minutes.. Step2) I start two nodes and send cib. Last updated: Thu Oct

Re: [Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-07 Thread renayama19661014
Hi Andrew, Thank you for comment. > Funnily enough I was just looking at that message and saw that the > code relevant to this one looked wrong too. > > I believe this should fix the issue: >http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413 > > > > > I registered log and more with Bu

Re: [Pacemaker] [GUI]Compatibility issues of Python.

2010-10-09 Thread renayama19661014
Hi Yan, > It should work with python < 2.5 now: > http://hg.clusterlabs.org/pacemaker/pygui/rev/16a7d8a5d3eb Thank you for a revision. I confirm the Japanese display of a new po-file in revised GUI. If confirmation is over, I contact you. Best Regards, Hideo Yamauchi. --- Yan Gao wrote: >

Re: [Pacemaker] [Problem]The monitor that start-delay is long does not stop.

2010-10-11 Thread renayama19661014
Hi Andrew, > > Funnily enough I was just looking at that message and saw that the > > code relevant to this one looked wrong too. > > > > I believe this should fix the issue: > >http://hg.clusterlabs.org/pacemaker/1.1/rev/e06810256413 > > > > > > > > I registered log and more with Bugzilla.

Re: [Pacemaker] [GUI]Compatibility issues of Python.

2010-10-12 Thread renayama19661014
Hi Yan, > > * http://www.gossamer-threads.com/lists/linuxha/pacemaker/67046 > Appreciate your good work! Thanks! > Pushed them: > http://hg.clusterlabs.org/pacemaker/pygui/rev/9920f30d364c > http://hg.clusterlabs.org/pacemaker/pygui/rev/af237f362f13 Thank you for revision of hg-GUI. Best Rega

[Pacemaker] Time to a service stop is very long.

2010-10-21 Thread renayama19661014
Hi, We checked the behavior when no-quorum-policy is set to freeze. In a cluster where the freeze setting had taken effect, we stopped the service. However, stopping the service took a very long time. We set "shutdown-escalation" to five minutes to shorten the test. But, a stop of the servic

[Pacemaker] [Problem]Failed in node recovery in no-quorum-policy="freeze".

2010-10-26 Thread renayama19661014
Hi, We found a problem about the change of the inside number of the clone resource. This problem influences it and fails in the reconfiguration of a divided cluster. * The divided cluster cannot constitute a cluster again. Because a log file is big, we register the details with next Bugzilla.

Re: [Pacemaker] Time to a service stop is very long.

2010-10-27 Thread renayama19661014
Hi Andrew, > Wait, I think I read that wrong. > I would expect that no-matter what that pacemaker would exit after > shutdown-escalation. > > You're saying it didn't? > Better create a bug and attach the logs. At the time of Step4, srv03,srv04 requested a stop of the Heartbeat service. To see

[Pacemaker] [Question]About the recovery procedure from the state that a node was divided.

2010-11-03 Thread renayama19661014
Hi All, We tested it about the recovery procedure from the state that a node was divided. (As for four nodes, three nodes are active, and one node is constitution of the standby.) It is the restoration from a state divided by two nodes that we set in no-quorum-policy="freeze". The resource ke

[Pacemaker] [Problem]Number of times control of the fail-count is late.

2010-11-09 Thread renayama19661014
Hi, We built a two-node cluster and set migration-threshold to 2. We observed the problem with the following procedure. Step1) Start two nodes and send config5.crm. (The clnDiskd-resources is original.) Last updated: Tue Nov 9 21:10:49 2010 Stack: Heartbeat C

Re: [Pacemaker] [Problem]Number of times control of the fail-count is late.

2010-11-12 Thread renayama19661014
Hi Andrew, Thank you for comment. > > > > It seems to be a problem that update of fail-count was late. > > But, this problem seems to occur by a timing. > > > > It affects it in fail over time of the resource that the control number of > > times of fail-count > is > > wrong. > > > > Is this prob

Re: [Pacemaker] [Question]About the recovery procedure from the state that a node was divided.

2010-11-15 Thread renayama19661014
Hi Andrew, Thank you for comment. > > Step3) Make "/var/lib/heartbeat/crm/" clean. > > Make it clean in all nodes > > Step4) Start all four nodes. > > Step5) Send cib information to a cluster. > > Step6) A cluster is rebuilt. > > > > > > We do not want to take the second method. > > Be

Re: [Pacemaker] [Question]About the recovery procedure from the state that a node was divided.

2010-11-15 Thread renayama19661014
Hi Andrew, > > If there is not a procedure of Step3, I think that the bug that I reported > > before is easy to > occur. > > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2508 > > > > I think that this bug influences that a procedure of step3 is necessary. > > Hopefully we'll get

[Pacemaker] It affects it that the update of the attribute by attrd is late, and a resource starts with a standby node.

2010-11-28 Thread renayama19661014
Hi, We built a two-node cluster. The configuration is slightly complicated in that it includes two pingd. We observed the problem with the following procedure. Step1) 192.168.40.3 addresses invalidate the understanding of ping. Step2) Start two nodes and send trac1383.cr

[Pacemaker] Transition-graph of pengine does a loop when configuration set order of stonith.

2010-11-28 Thread renayama19661014
Hi, We constituted two simple nodes of the clone resource of one clone resource and stonith. But we set order of a stonith resource and the clone resource. We generated a problem in the next procedure. Step1)Start two nodes and send trac1386.clone.crm. [r...@srv01 ~]# crm_mon -1 L

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-11-28 Thread renayama19661014
Hi Andrew, Sorry My response was late. > I think the smartest thing to do here is drop the cib_scope_local flag from -f if(do_force) { crm_debug("Forcing..."); /* cib_options |= cib_scope_local|cib_quorum_override; */ cib_options |= cib_qu

Re: [Pacemaker] Transition-graph of pengine does a loop when configuration set order of stonith.

2010-12-01 Thread renayama19661014
Hi Andrew, > > It is necessary for us to set order of a stonith resource and the clone > > resource. > > > > * hb_report attached it to Bugzilla. > > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2529 > > Thanks for the report, I'll try to follow up there soon Thanks!! Hideo Ya

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-12-01 Thread renayama19661014
Hi Andrew, > > Can 1.0 reflect this revision? > > Because there is influence else, is it impossible? > > I have no objection to it being added to 1.0, it should be safe. Thanks. About 1.0, I ask Mr. Mori for backporting. Will you revise 1.1? Best Regards, Hideo Yamauchi. --- Andrew Beekhof

Re: [Pacemaker] It affects it that the update of the attribute by attrd is late, and a resource starts with a standby node.

2010-12-01 Thread renayama19661014
Hi Andrew, > > Step1) 192.168.40.3 addresses invalidate the understanding of ping. > > Not sure I understand this, can you rephrase? Sorry. For pingd, we configure the following two addresses. * 192.168.4.2 * 192.168.4.3 When one address cannot communicate, this problem occurs. When cluster can communic

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-12-01 Thread renayama19661014
Hi Andrew, I send a patch to 1.1. Mr. Mori performs the backporting for 1.0. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: > Hi Andrew, > > > > Can 1.0 reflect this revision? > > > Because there is influence else, is it impossible? > > > > I have no objection to it be

Re: [Pacemaker] [Problem]The movement of the resource is not possible.

2010-12-15 Thread renayama19661014
Hi Andrew, > > About 1.0, I ask Mr. Mori for backporting. > > Will you revise 1.1? > > Yes, I have it modified locally. I'll push it out soon. I confirmed a revision for your PM1.1. * http://hg.clusterlabs.org/pacemaker/1.1/rev/862936c5bca3 I think to have PM1.0 reflect this revision. I ask

[Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.

2011-02-15 Thread renayama19661014
Hi all, We test trouble at the time of the start of the Master/Slave resource. Step1) We start the first node and send cib. Last updated: Thu Feb 10 16:32:12 2011 Stack: Heartbeat Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum Version: 1.0.10-b0266d

Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.

2011-02-15 Thread renayama19661014
Hi Andrew, Thank you for comment. > Perhaps I misunderstood - does the node fail _while_ we're running > post_notify_start_0? > Is that the ordering you're talking about? Yes. I think that stonith do not have to wait for post_notify_start_0 of the inoperative node. > If so, then the crmd is a
