Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
Hi Andrew,

> Yes please.

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2161

Best Regards,
Hideo Yamauchi.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
2009/7/23 :
> Hi Andrew,
>
>> Can you open a bug for that?
>> I suspect the lrmd might be doing the wrong thing, but assign it to
>> pacemaker until I can prove that :-)
>
> All right.
> Should I register the problem in Bugzilla?

Yes please.
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
Hi Andrew,

> Can you open a bug for that?
> I suspect the lrmd might be doing the wrong thing, but assign it to
> pacemaker until I can prove that :-)

All right.
Should I register the problem in Bugzilla?
# For my part, I also think the problem is in the lrmd.

Best Regards,
Hideo Yamauchi.
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
2009/7/22 :
> Hi Andrew,
>
>> If the crmd dies, then (IIRC) the lrmd cancels all existing resource
>> monitoring.
>> However, when the crmd is recovered, it should set up the resource
>> monitoring again.
>>
>> Is the second part not happening?
>
> There are several patterns to the problem.
>
> When crmd/stonithd restarts in an ACT node in a DC node, the monitor
> completely stops.
> When others of lrmd/mgmtd restart in the STB node of the DC node, the monitor
> completely stops.

Can you open a bug for that?
I suspect the lrmd might be doing the wrong thing, but assign it to
pacemaker until I can prove that :-)

>> > 1) When a process related to a monitor fails, the system reboots.
>> > (Emergency Reboot)
>> That's the lazy way out.
>
> I think your opinion is right.
> If the monitor is reliably resumed, I think that the emergency reboot is not
> necessary.
>
> Best Regards,
> Hideo Yamauchi.
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
Hi Andrew,

> If the crmd dies, then (IIRC) the lrmd cancels all existing resource
> monitoring.
> However, when the crmd is recovered, it should set up the resource
> monitoring again.
>
> Is the second part not happening?

There are several patterns to the problem.

When crmd/stonithd restarts in an ACT node in a DC node, the monitor completely stops.
When others of lrmd/mgmtd restart in the STB node of the DC node, the monitor completely stops.

> > 1) When a process related to a monitor fails, the system reboots.
> > (Emergency Reboot)
> That's the lazy way out.

I think your opinion is right.
If the monitor is reliably resumed, I think that the emergency reboot is not necessary.

Best Regards,
Hideo Yamauchi.
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
2009/7/17 :
> Hi Andrew,
>
>> What do you mean by monitor here?
>> Do you mean that pacemaker would no longer detect if those two processes
>> died?
>
> By monitor I mean the monitor of the resource.
> When these processes fail, the monitoring of the resources handled by
> lrmd/stonithd stops.
> For example, the monitors of external/ssh and pgsql stop.
>
>> Do you mean the way heartbeat behaves with "crm on" instead of "crm respawn"?
>
> I do not really understand what this means.

That makes two of us :-)
I'm not sure I really understand the problem here.

If the crmd dies, then (IIRC) the lrmd cancels all existing resource monitoring.
However, when the crmd is recovered, it should set up the resource monitoring again.

Is the second part not happening?

> I think that one of the following approaches is necessary.
>
> 1) When a process related to a monitor fails, the system reboots.
> (Emergency Reboot)

That's the lazy way out.

> 2) When a process related to a monitor fails, the monitor does not stop even
> if the process restarts.
>
> I think the first approach is simple to implement.
>
> Best Regards,
> Hideo Yamauchi.
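The behaviour Andrew describes above (the lrmd cancels monitoring when the crmd dies, and a recovered crmd should set it up again) can be illustrated with a small toy model. This is only a sketch for readers of the thread, not Pacemaker code: the classes `ToyLrmd` and `ToyCrmd`, the function `simulate_crmd_crash` and the `reregister` flag are all hypothetical names invented for illustration, and the resource names are the two examples mentioned in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLrmd:
    """Hypothetical stand-in for the lrmd: holds the active recurring monitors."""
    monitors: set = field(default_factory=set)

    def client_lost(self):
        # As described in the thread ("IIRC"): when the crmd dies,
        # all existing resource monitoring is cancelled.
        self.monitors.clear()


@dataclass
class ToyCrmd:
    """Hypothetical stand-in for the crmd: knows which monitors are configured."""
    configured: tuple
    lrmd: ToyLrmd

    def start(self):
        # Expected behaviour: on (re)start, set up the resource monitoring again.
        for name in self.configured:
            self.lrmd.monitors.add(name)


def simulate_crmd_crash(configured, reregister=True):
    """Return the monitors still active after a crmd crash and respawn."""
    lrmd = ToyLrmd()
    ToyCrmd(configured, lrmd).start()      # initial start-up
    lrmd.client_lost()                     # crmd killed (kill -9 in the report)
    if reregister:
        ToyCrmd(configured, lrmd).start()  # respawned crmd restores monitoring
    return sorted(lrmd.monitors)


expected = simulate_crmd_crash(("external/ssh", "pgsql"))
reported = simulate_crmd_crash(("external/ssh", "pgsql"), reregister=False)
print("expected:", expected)
print("reported:", reported)
```

With `reregister=True` the model shows the behaviour Andrew expects (both monitors active again); with `reregister=False` it shows the symptom Hideo reports (no monitors left after the restart). Which component actually fails to restore the monitors is exactly what the thread leaves open.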
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
Hi Andrew,

> What do you mean by monitor here?
> Do you mean that pacemaker would no longer detect if those two processes died?

By monitor I mean the monitor of the resource.
When these processes fail, the monitoring of the resources handled by lrmd/stonithd stops.
For example, the monitors of external/ssh and pgsql stop.

> Do you mean the way heartbeat behaves with "crm on" instead of "crm respawn"?

I do not really understand what this means.

I think that one of the following approaches is necessary.

1) When a process related to a monitor fails, the system reboots. (Emergency Reboot)
2) When a process related to a monitor fails, the monitor does not stop even if the process restarts.

I think the first approach is simple to implement.

Best Regards,
Hideo Yamauchi.
Re: [Pacemaker] A demand to the process trouble.(OpenAIS/Corosync and Pacemaker)
On Fri, Jul 17, 2009 at 3:34 AM, wrote:
> Hi,
>
> We have now started investigating the combination of Pacemaker and
> corosync/openais.
>
> We combined Pacemaker with openais (whitetank) and checked the behavior
> when a process fails.
> (In combination with Heartbeat, this is the case where an emergency reboot
> occurred.)
>
> I made a Pacemaker process fail (kill -9 pid).
> The following behavior was then seen.
>
> * When crmd restarts on the ACT node (not DC), the monitor of the lrmd
>   resource stops, and the monitor of the stonith resource stops.

What do you mean by monitor here?
Do you mean that pacemaker would no longer detect if those two processes died?

> * When stonithd restarts on the ACT node (not DC), the monitor of the stonith
>   resource stops.
> * When crmd restarts on the STB node (DC), the monitor of the stonith
>   resource stops.
> * When pengine restarts on the STB node (DC), the monitor of the stonith
>   resource stops.
> * And more
>
> We consider it a problem that the monitor stops after the process restarts.
> With openais/corosync, we would like a function such as Heartbeat's
> emergency reboot to be included.

Do you mean the way heartbeat behaves with "crm on" instead of "crm respawn"?

> Best Regards,
> Hideo Yamauchi.