Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread andreas graeper
thanks a lot !


2013/7/5 Lars Ellenberg 

> On Fri, Jul 05, 2013 at 04:52:35PM +0200, andreas graeper wrote:
> > suppose i write a script, handled by ocf:heartbeat:anything, that signals
> > the cron daemon to reload its crontabs when the crontab file is enabled
> > by symlink:start or disabled by symlink:stop.
> >
> > how can i achieve that the script runs after symlink:start and after
> > symlink:stop ?
> > when i define an order constraint "R1 then R2", does this implicitly mean
> > R1:start before R2:start, and R2:stop before R1:stop ?
> >
>
> Not an answer to that specific question,
> rather a "why even bother" suggestion:
>
> You say:
> > two nodes active/passive, and a fetchmail cronjob shall run on the
> > active node only.
>
> How do you know the node is "active"?
> Maybe some specific file system is mounted?
> Great.  You have files and directories
> which are only visible on an "active" node.
>
> Why not prefix your cron job lines with
> test -e /this/file/only/visible/on/active || exit 0; real cron command follows
>  or
> cd /some/dir/only/on/active || exit 0; real cron command
>
>  or a wrapper, if that looks too ugly
> only-on-active real cron command
>
> /bin/only-on-active:
> #!/bin/sh
> same-active-test-as-above || exit 0
> "$@" # do the real cron command
>
> Lars
>
> > 2013/7/5 andreas graeper 
> >
> > > hi,
> > > two nodes active/passive, and a fetchmail cronjob shall run on the
> > > active node only.
> > >
> > > i use ocf:heartbeat:symlink to move / rename
> > >
> > > /etc/cron.d/jobs <> /etc/cron.d/jobs.disable
> > >
> > > i read somewhere that crond ignores files with a dot.
> > >
> > > but new experience: crond needs to be restarted or signalled.
> > >
> > > how is this best done within pacemaker ?
> > > is a clone what i need ?
> > >
> > >
> > > thanks in advance
> > > andreas
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>


Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes

2013-07-05 Thread David Vossel
- Original Message -
> From: "David Vossel" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Wednesday, July 3, 2013 4:20:37 PM
> Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> 
> - Original Message -
> > From: "Lindsay Todd" 
> > To: "The Pacemaker cluster resource manager"
> > 
> > Sent: Wednesday, July 3, 2013 2:12:05 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > 
> > Well, I'm not getting failures right now simply with attributes, but I can
> > induce a failure by stopping the vm-db02 (it puts db02 into an unclean
> > state, and attempts to migrate the unrelated vm-compute-test). I've
> > collected the commands from my latest interactions, a crm_report, and a gdb
> > traceback from the core file that crmd dumped, into bug 5164.
> 
> 
> Thanks, hopefully I can start investigating this Friday
> 
> -- Vossel

Yeah, this is a bad one.  Adding the node attributes using crm_attribute for 
the remote-node did some unexpected things to the crmd component.  Somehow the 
remote-node was getting entered into the cluster node cache... which made it 
look like we had both a cluster-node and remote-node named the same thing... 
not good.
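
(For context, the kind of command involved is an ordinary permanent
node-attribute update, roughly like the line below; the node and attribute
names are placeholders, not taken from the report:)

crm_attribute --type nodes --node remote-node-1 --name my-attr --update my-value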

I think I got that part worked out.  Try this patch.

https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32

Rather than trying to patch the RCs, it might be worth trying out the master
branch on github (which already has this patch).  If you aren't already using
rpms, do so to make your life easier: running 'make rpm' in the source
directory will generate them for you.
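
A rough sketch of that workflow (build dependencies and rpm tooling need to
be installed first, and depending on the checkout you may need the usual
autogen/configure step before 'make rpm'):

git clone https://github.com/ClusterLabs/pacemaker.git
cd pacemaker
make rpm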

There was another bug fixed recently in pacemaker_remote involving the 
directory created for resource agents to store their temporary data (stuff like 
pid files).  I believe the fix was not introduced until 1.1.10rc6.

-- Vossel




Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread Lars Ellenberg
On Fri, Jul 05, 2013 at 04:52:35PM +0200, andreas graeper wrote:
> suppose i write a script, handled by ocf:heartbeat:anything, that signals
> the cron daemon to reload its crontabs when the crontab file is enabled
> by symlink:start or disabled by symlink:stop.
> 
> how can i achieve that the script runs after symlink:start and after
> symlink:stop ?
> when i define an order constraint "R1 then R2", does this implicitly mean
> R1:start before R2:start, and R2:stop before R1:stop ?
> 

Not an answer to that specific question,
rather a "why even bother" suggestion:

You say:
> > two nodes active/passive, and a fetchmail cronjob shall run on the active node only.

How do you know the node is "active"?
Maybe some specific file system is mounted?
Great.  You have files and directories
which are only visible on an "active" node.

Why not prefix your cron job lines with
test -e /this/file/only/visible/on/active || exit 0; real cron command follows
 or
cd /some/dir/only/on/active || exit 0; real cron command

 or a wrapper, if that looks too ugly
only-on-active real cron command

/bin/only-on-active:
#!/bin/sh
same-active-test-as-above || exit 0
"$@" # do the real cron command

Lars

> 2013/7/5 andreas graeper 
> 
> > hi,
> > two nodes active/passive, and a fetchmail cronjob shall run on the active node only.
> >
> > i use ocf:heartbeat:symlink to move / rename
> >
> > /etc/cron.d/jobs <> /etc/cron.d/jobs.disable
> >
> > i read somewhere that crond ignores files with a dot.
> >
> > but new experience: crond needs to be restarted or signalled.
> >
> > how is this best done within pacemaker ?
> > is a clone what i need ?
> >
> >
> > thanks in advance
> > andreas


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock
Thank you for your hint.

There is a German saying which I'll try to translate:
"You don't see the forest for all the trees."

So, I'll see.

Best regards
Andreas Mock


-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, July 5, 2013 17:22
To: Andreas Mock
Cc: 'The Pacemaker cluster resource manager'; 'Marek Grac'
Subject: Re: AW: [Pacemaker] Another question about fencing/stonithing

Andrew might know the trick. In theory, putting your agent into the 
/usr/sbin or /sbin directory (wherever the other agents are) should
"just work". You're sure the exit codes are appropriate? I am sure they 
are, but just thinking out loud about too-obvious-to-see possible issues.

On 05/07/13 11:17, Andreas Mock wrote:
> Hi Digimer,
>
> sorry, I forgot to mention that I implemented the metadata call
> accordingly. But it may be the "registration" thing which
> is necessary to make it known to the stonith/fencing daemon.
>
> I don't know. I'm a bit surprised that there is no
> pointer on how to do it.
>
> Thank you for your answer!
>
> Best regards
> Andreas Mock
>
>
> -----Original Message-----
> From: Digimer [mailto:li...@alteeve.ca]
> Sent: Friday, July 5, 2013 16:52
> To: The Pacemaker cluster resource manager
> Cc: Andreas Mock; Marek Grac
> Subject: Re: [Pacemaker] Another question about fencing/stonithing
>
> On 05/07/13 03:34, Andreas Mock wrote:
>> Hi all,
>>
>> I just wrote a stonith agent which IMHO implements the
>> API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.
>>
>> But it seems it has a problem when used as a pacemaker stonith device.
>>
>> What has to be done to have a stonith/fencing agent which implements
>> both roles? I'm pretty sure something is missing.
>> It's just a guess that it has something to do with listing "registered"
>> agents.
>>
>> What is a registered stonith agent and what is done while registering it?
>>
>> When I configure my own fencing agent as a pacemaker stonith device
>> and try to do a "stonith_admin --list=nodename" I get a "no such device"
>> error.
>>
>> Any pointer appreciated.
>>
>> Best regards
>> Andreas Mock
>
> The API doesn't (yet) cover the metadata action. The agents now have to
> print out XML describing the valid attributes and elements for your
> agent. If you call any existing fence_* agent with just -o metadata, you
> will see the format.
>
> I know rhcs can be forced to see the new agent by putting it in the same
> directory as the other agents and then running 'ccs_update_schema'. If
> pacemaker doesn't immediately see it, then there might be an equivalent
> command you can run.
>
> I will try to get the API updated. I'm not an authoritative source, but
> something is better than nothing. Marek (who I have cc'ed) is, so I can
> run the changes by him when done to ensure they're accurate.
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




[Pacemaker] Pacemaker 1.1.10 rc 5 & rc 6

2013-07-05 Thread Andrii Moiseiev
Hi,

I'm trying to update pacemaker on CentOS 6.4 hosts, but each release
introduces some new problems %).

We have the CentOS 6.4 corosync and cman packages and the latest pcs /
pacemaker. The cluster is cman-based.

Pacemaker 1.1.10 rc5 was almost fine, except for this message repeating on
all of our nodes:

Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.83 -> 0.4.84 from local not applied to 0.4.83:
Failed application of an update diff
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.84 -> 0.4.85 from local not applied to 0.4.84:
Failed application of an update diff
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.85 -> 0.4.86 from local not applied to 0.4.85:
Failed application of an update diff
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.86 -> 0.4.87 from local not applied to 0.4.86:
Failed application of an update diff
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.87 -> 0.4.88 from local not applied to 0.4.87:
Failed application of an update diff
Jul  5 10:32:34 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)
Jul  5 10:32:35 devpacemaker01 stonith-ng[14501]:  warning:
cib_process_diff: Diff 0.4.88 -> 0.4.89 from local not applied to 0.4.88:
Failed application of an update diff
Jul  5 10:32:35 devpacemaker01 stonith-ng[14501]:   notice:
update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an
update diff failed (-206)

Not critical, as things still worked, but it looks strange: no matter what
you do, it keeps complaining. A full CIB resync, importing a fresh
configuration, nothing helps.

Cluster Properties:
 cluster-delay: 10s
 cluster-infrastructure: cman
 cluster-recheck-interval: 2min
 last-lrm-refresh: 1373023780
 no-quorum-policy: freeze
 start-failure-is-fatal: true
 stonith-enabled: false

Today we upgraded to 1.1.10 rc6 and it made things worse... It also broke
the 'default' fencing. Previously, even with stonith-enabled: false,
pacemaker would try to kill cman / corosync if the connection was lost or a
split brain occurred, but now that's not happening:

Jul  5 09:54:25 devpacemaker01 crmd[20840]:   notice:
tengine_stonith_notify: Peer devpacemaker03_eth1 was not terminated
(reboot) by devpacemaker02_eth1 for devpacemaker02_eth1: No such device
(ref=1fc11b87-529d-4f6c-b4e6-ffaa82c06bd8) by client stonith_admin.cman.8832
Jul  5 09:54:28 devpacemaker01 stonith-ng[20838]:   notice: remote_op_done:
Operation reboot of devpacemaker03_eth1 by devpacemaker02_eth1 for
stonith_admin.cman.8855@devpacemaker02_eth1.6e0e0da3: No such device
Jul  5 09:54:28 devpacemaker01 crmd[20840]:   notice:
tengine_stonith_notify: Peer devpacemaker03_eth1 was not terminated
(reboot) by devpacemaker02_eth1 for devpacemaker02_eth1: No such device
(ref=6e0e0da3-f9f9-43a0-933e-0ff9ec2cb390) by client stonith_admin.cman.8855
Jul  5 09:54:31 devpacemaker01 stonith-ng[20838]:   notice: remote_op_done:
Operation reboot of devpacemaker03_eth1 by devpacemaker02_eth1 for
stonith_admin.cman.9017@devpacemaker02_eth1.955b859b: No such device
Jul  5 09:54:31 devpacemaker01 crmd[20840]:   notice:
tengine_stonith_notify: Peer devpacemaker03_eth1 was not terminated
(reboot) by devpacemaker02_eth1 for devpacemaker02_eth1: No such device
(ref=955b859b-791e-4083-b760-a6f8f05ddc2f) by client stonith_admin.cman.9017
Jul  5 09:54:35 devpacemaker01 stonith-ng[20838]:   notice: remote_op_done:
Operation reboot of devpacemaker03_eth1 by devpacemaker02_eth1 for
stonith_admin.cman.9089@devpacemaker02_eth1.ede9aa4e: No such device
Jul  5 09:54:35 devpacemaker01 crmd[20840]:   notice:
tengine_stonith_notify: Peer devpacemaker03_eth1 was not terminated
(reboot) by devpacemaker02_eth1 for devpacemaker02_eth1: No such device
(ref=ede9aa4e-32e0-4f3d-bd3a-f519c1250363) by client stonith_admin.cman.9089
Jul  5 09:54:38 devpacemaker01 stonith-ng[20838]:   notice: remote_op_done:
Operation reboot of devpacemaker03_eth1 by devpacemaker02_eth1 for
stonith_admin.cman.9242@devpacemaker02_eth1.2d92ca8d: No such device
Ju

Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Digimer
Andrew might know the trick. In theory, putting your agent into the 
/usr/sbin or /sbin directory (wherever the other agents are) should
"just work". You're sure the exit codes are appropriate? I am sure they 
are, but just thinking out loud about too-obvious-to-see possible issues.


On 05/07/13 11:17, Andreas Mock wrote:

Hi Digimer,

sorry, I forgot to mention that I implemented the metadata call
accordingly. But it may be the "registration" thing which
is necessary to make it known to the stonith/fencing daemon.

I don't know. I'm a bit surprised that there is no
pointer on how to do it.

Thank you for your answer!

Best regards
Andreas Mock


-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, July 5, 2013 16:52
To: The Pacemaker cluster resource manager
Cc: Andreas Mock; Marek Grac
Subject: Re: [Pacemaker] Another question about fencing/stonithing

On 05/07/13 03:34, Andreas Mock wrote:

Hi all,

I just wrote a stonith agent which IMHO implements the
API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.

But it seems it has a problem when used as a pacemaker stonith device.

What has to be done to have a stonith/fencing agent which implements
both roles? I'm pretty sure something is missing.
It's just a guess that it has something to do with listing "registered"
agents.

What is a registered stonith agent and what is done while registering it?

When I configure my own fencing agent as a pacemaker stonith device
and try to do a "stonith_admin --list=nodename" I get a "no such device"
error.

Any pointer appreciated.

Best regards
Andreas Mock


The API doesn't (yet) cover the metadata action. The agents now have to
print out XML describing the valid attributes and elements for your
agent. If you call any existing fence_* agent with just -o metadata, you
will see the format.

I know rhcs can be forced to see the new agent by putting it in the same
directory as the other agents and then running 'ccs_update_schema'. If
pacemaker doesn't immediately see it, then there might be an equivalent
command you can run.

I will try to get the API updated. I'm not an authoritative source, but
something is better than nothing. Marek (who I have cc'ed) is, so I can
run the changes by him when done to ensure they're accurate.




--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock
Hi Digimer,

sorry, I forgot to mention that I implemented the metadata call
accordingly. But it may be the "registration" thing which
is necessary to make it known to the stonith/fencing daemon.

I don't know. I'm a bit surprised that there is no
pointer on how to do it.

Thank you for your answer!

Best regards
Andreas Mock


-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Friday, July 5, 2013 16:52
To: The Pacemaker cluster resource manager
Cc: Andreas Mock; Marek Grac
Subject: Re: [Pacemaker] Another question about fencing/stonithing

On 05/07/13 03:34, Andreas Mock wrote:
> Hi all,
>
> I just wrote a stonith agent which IMHO implements the
> API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.
>
> But it seems it has a problem when used as a pacemaker stonith device.
>
> What has to be done to have a stonith/fencing agent which implements
> both roles? I'm pretty sure something is missing.
> It's just a guess that it has something to do with listing "registered"
> agents.
>
> What is a registered stonith agent and what is done while registering it?
>
> When I configure my own fencing agent as a pacemaker stonith device
> and try to do a "stonith_admin --list=nodename" I get a "no such device"
> error.
>
> Any pointer appreciated.
>
> Best regards
> Andreas Mock

The API doesn't (yet) cover the metadata action. The agents now have to 
print out XML describing the valid attributes and elements for your
agent. If you call any existing fence_* agent with just -o metadata, you 
will see the format.

I know rhcs can be forced to see the new agent by putting it in the same 
directory as the other agents and then running 'ccs_update_schema'. If 
pacemaker doesn't immediately see it, then there might be an equivalent 
command you can run.

I will try to get the API updated. I'm not an authoritative source, but
something is better than nothing. Marek (who I have cc'ed) is, so I can 
run the changes by him when done to ensure they're accurate.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Digimer

On 05/07/13 03:34, Andreas Mock wrote:

Hi all,

I just wrote a stonith agent which IMHO implements the
API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.

But it seems it has a problem when used as a pacemaker stonith device.

What has to be done to have a stonith/fencing agent which implements
both roles? I'm pretty sure something is missing.
It's just a guess that it has something to do with listing "registered"
agents.

What is a registered stonith agent and what is done while registering it?

When I configure my own fencing agent as a pacemaker stonith device
and try to do a "stonith_admin --list=nodename" I get a "no such device"
error.

Any pointer appreciated.

Best regards
Andreas Mock


The API doesn't (yet) cover the metadata action. The agents now have to 
print out XML describing the valid attributes and elements for your
agent. If you call any existing fence_* agent with just -o metadata, you 
will see the format.
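
For example, something like the following should show it (any of the agents
shipped in the fence-agents package will do; fence_apc is just a convenient
pick):

fence_apc -o metadata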


I know rhcs can be forced to see the new agent by putting it in the same 
directory as the other agents and then running 'ccs_update_schema'. If 
pacemaker doesn't immediately see it, then there might be an equivalent 
command you can run.


I will try to get the API updated. I'm not an authoritative source, but
something is better than nothing. Marek (who I have cc'ed) is, so I can 
run the changes by him when done to ensure they're accurate.


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread andreas graeper
suppose i write a script, handled by ocf:heartbeat:anything, that signals
the cron daemon to reload its crontabs when the crontab file is enabled
by symlink:start or disabled by symlink:stop.

how can i achieve that the script runs after symlink:start and after
symlink:stop ?
when i define an order constraint "R1 then R2", does this implicitly mean
R1:start before R2:start, and R2:stop before R1:stop ?
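
A minimal crm-shell sketch of both variants, reusing the R1/R2 names from
the question (the constraint ids are made up and the exact syntax may vary
between crmsh versions); by default an order constraint is symmetrical, so
the stop order is reversed automatically, while symmetrical=false limits it
to the start ordering:

order o-r1-then-r2 inf: R1 R2
order o-r1-then-r2-starts-only inf: R1 R2 symmetrical=false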

thanks in advance
andreas


2013/7/5 andreas graeper 

> hi,
> two nodes active/passive, and a fetchmail cronjob shall run on the active node only.
>
> i use ocf:heartbeat:symlink to move / rename
>
> /etc/cron.d/jobs <> /etc/cron.d/jobs.disable
>
> i read somewhere that crond ignores files with a dot.
>
> but new experience: crond needs to be restarted or signalled.
>
> how is this best done within pacemaker ?
> is a clone what i need ?
>
>
> thanks in advance
> andreas
>
>
>


[Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread andreas graeper
hi,
two nodes active/passive, and a fetchmail cronjob shall run on the active node only.

i use ocf:heartbeat:symlink to move / rename

/etc/cron.d/jobs <> /etc/cron.d/jobs.disable

i read somewhere that crond ignores files with a dot.

but new experience: crond needs to be restarted or signalled.

how is this best done within pacemaker ?
is a clone what i need ?
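
A sketch of the kind of symlink primitive described here, in crm syntax;
the resource id and the target path are made up for illustration (the agent
creates the link on start and removes it on stop, so the real job file can
live outside /etc/cron.d):

primitive p-cron-jobs ocf:heartbeat:symlink \
        params link="/etc/cron.d/jobs" target="/etc/pacemaker/jobs.cron" \
        op monitor interval="60s"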


thanks in advance
andreas


[Pacemaker] Fwd: Java application failover problem

2013-07-05 Thread Martin Gazak
Hello,
we are facing a problem with a simple (I hope) cluster configuration
with 2 nodes, ims0 and ims1, and 3 primitives (no shared storage or
anything like that where data corruption would be a danger):

- master-slave Java application ims (normally running on both nodes as
master/slave, with our own OCF script) with an embedded web server (to
be accessed by clients)

- ims-ip and ims-ip-src: shared IP address and outgoing source address,
to be run solely on the ims master

Below are the software versions, the crm configuration and portions of
the corosync log.

The problem is that although the setup works most of the time (i.e. if
the master ims application dies, the slave one is promoted and the IP
addresses are remapped), sometimes when the master ims application stops
(fails or is killed) the failover does not occur: the slave ims
application remains the slave and the shared IP address stays mapped on
the node with the dead ims.

I even created a testbed of 2 servers, killing the ims application from
cron every 15 minutes on the supposed MAIN server to simulate the failure,
observe the failover and replicate the problem (sometimes it works
properly for hours/days).

For example, today (July 4, 23:45 local time) the ims on ims0 was killed
but remained Master; no failover of the IP addresses was performed and the
ims on ims1 remained Slave:

Last updated: Fri Jul  5 02:07:18 2013
Last change: Thu Jul  4 23:33:46 2013
Stack: openais
Current DC: ims0 - partition with quorum
Version: 1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ ims1 ims0 ]

 Master/Slave Set: ms-ims [ims]
 Masters: [ ims0 ]
 Slaves: [ ims1 ]
 Clone Set: clone-cluster-mon [cluster-mon]
 Started: [ ims0 ims1 ]
 Resource Group: on-ims-master
 ims-ip (ocf::heartbeat:IPaddr2):   Started ims0
 ims-ip-src (ocf::heartbeat:IPsrcaddr): Started ims0

The command 'crm node standby' on ims0 did not fix it: ims0
remained master (although in standby):

Node ims0: standby
Online: [ ims1 ]

 Master/Slave Set: ms-ims [ims]
 ims:0  (ocf::microstepmis:imsMS):  Slave ims0 FAILED
 Slaves: [ ims1 ]
 Clone Set: clone-cluster-mon [cluster-mon]
 Started: [ ims1 ]
 Stopped: [ cluster-mon:0 ]

Failed actions:
ims:0_demote_0 (node=ims0, call=3179, rc=7, status=complete): not
running

Stopping the openais service on ims0 completely did fix it.

Could someone give me a hint what to do?
- provide more information (logs, OCF script)?
- change something in the configuration?
- change the environment / versions?

Thanks a lot

Martin Gazak


Software versions:
--
libpacemaker3-1.1.7-42.1
pacemaker-1.1.7-42.1
corosync-1.4.3-21.1
libcorosync4-1.4.3-21.1
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2



Configuration:
--
node ims0 \
attributes standby="off"
node ims1 \
attributes standby="off"
primitive cluster-mon ocf:pacemaker:ClusterMon \
params htmlfile="/opt/ims/tomcat/webapps/ims/html/crm_status.html" \
op monitor interval="10"
primitive ims ocf:microstepmis:imsMS \
op monitor interval="1" role="Master" timeout="20" \
op monitor interval="2" role="Slave" timeout="20" \
op start interval="0" timeout="1800s" \
op stop interval="0" timeout="120s" \
op promote interval="0" timeout="180s" \
meta failure-timeout="360s"
primitive ims-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.141.13" nic="bond1" iflabel="ims"
cidr_netmask="24" \
op monitor interval="15s" \
meta failure-timeout="60s"
primitive ims-ip-src ocf:heartbeat:IPsrcaddr \
params ipaddress="192.168.141.13" cidr_netmask="24" \
op monitor interval="15s" \
meta failure-timeout="60s"
group on-ims-master ims-ip ims-ip-src
ms ms-ims ims \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" target-role="Started"
migration-threshold="1"
clone clone-cluster-mon cluster-mon
colocation ims_master inf: on-ims-master ms-ims:Master
order ms-ims-before inf: ms-ims:promote on-ims-master:start
property $id="cib-bootstrap-options" \
dc-version="1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
cluster-recheck-interval="1m" \
default-resource-stickiness="1000" \
last-lrm-refresh="1372951736" \
maintenance-mode="false"


corosync.log from ims0:
---
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM
operation ims:0_monitor_1000 (call=3046, rc=7, cib-update=6229,
confirmed=false) not running
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_graph_event: Detected
action ims:0_monitor_1000 from a different transition: 4024 vs. 4035
Jul 04 23:45:02 ims0 crmd: [3935]: i

[Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock
Hi all,

I just wrote a stonith agent which IMHO implements the
API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.

But it seems it has a problem when used as a pacemaker stonith device.

What has to be done to have a stonith/fencing agent which implements
both roles? I'm pretty sure something is missing.
It's just a guess that it has something to do with listing "registered"
agents.

What is a registered stonith agent and what is done while registering it?

When I configure my own fencing agent as a pacemaker stonith device
and try to do a "stonith_admin --list=nodename" I get a "no such device"
error.

Any pointer appreciated.

Best regards
Andreas Mock


