Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

Andrew Beekhof Tue, 28 Sep 2010 02:47:15 -0700

On Mon, Sep 27, 2010 at 7:26 AM,  <renayama19661...@ybb.ne.jp> wrote:
> Hi,
>
> While investigating another problem, I discovered this behavior.
> If attrd fails and is not restarted, the problem does not occur.
>
> Step1) After startup, cause a monitor error in UmIPaddr twice.
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: UMgroup01
>     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
>     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>   UmIPaddr: migration-threshold=10 fail-count=2
>
> Step2) Kill attrd; attrd is restarted.
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: UMgroup01
>     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
>     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>   UmIPaddr: migration-threshold=10 fail-count=2
>
> Step3) Cause another monitor error in UmIPaddr.
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: UMgroup01
>     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
>     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>   UmIPaddr: migration-threshold=10 fail-count=1 -----> fail-count has been 
> reset and starts again from 1.
>
> The problem is that attrd loses fail-count when it restarts (its hash 
> tables are lost).
> Resetting the number of failures in this way is a serious problem.
>
> I can think of the following methods.
>
> method 1) attrd stores fail-count in a file under the "/var/run" directory 
> and reads it back.
>
> method 2) When attrd starts, it communicates with the cib and retrieves 
> fail-count.
>
> Is there a better method?
>
> Please consider a solution to this problem.


Hmmmm... a tricky one.

The problem here is that attrd is supposed to be the authoritative
source for this sort of data.
Additionally, you don't always want attrd reading from the status
section - like after the cluster restarts.

For failcount, the crmd could keep a hashtable of the current values
which it could re-send to attrd if it detects a disconnection.
But that might not be a generic-enough solution.
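
To make the idea concrete, here is a minimal sketch in Python, not
Pacemaker code. FailCountCache, FakeAttrd and on_attrd_reconnect are
hypothetical names invented for illustration, and the attribute name
"fail-count-<resource>" is used only as a label. It assumes the crmd
side keeps absolute values and simply replays them after a reconnect.

```python
class FakeAttrd:
    """Stand-in for attrd: an in-memory table that is lost on restart."""

    def __init__(self):
        self.table = {}

    def update(self, name, value):
        self.table[name] = value

    def restart(self):
        # Simulates attrd being killed and respawned with empty hash tables.
        self.table.clear()


class FailCountCache:
    """Hypothetical crmd-side hashtable of current fail-count values."""

    def __init__(self, send):
        self._send = send      # callable(name, value) that forwards to attrd
        self._values = {}      # attribute name -> last known value

    def record_failure(self, resource):
        name = "fail-count-" + resource
        self._values[name] = self._values.get(name, 0) + 1
        self._send(name, self._values[name])

    def on_attrd_reconnect(self):
        # Replay every cached value so a fresh attrd starts from known state.
        for name, value in self._values.items():
            self._send(name, value)


def demo():
    attrd = FakeAttrd()
    cache = FailCountCache(attrd.update)
    cache.record_failure("UmIPaddr")   # Step1: two monitor failures
    cache.record_failure("UmIPaddr")
    attrd.restart()                    # Step2: attrd dies and comes back
    cache.on_attrd_reconnect()         # crmd notices and re-sends its values
    cache.record_failure("UmIPaddr")   # Step3: one more failure
    return attrd.table["fail-count-UmIPaddr"]
```

With the replay in place, demo() ends at 3 instead of falling back to 1.
The sketch only covers fail-count; other attributes that attrd holds
would still be lost, which is why it may not be generic enough.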

The chance that attrd dies _and_ there were relevant values for
fail-count is pretty remote though... is this a real problem you've
experienced or a theoretical one?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
