[Linux-HA] Antw: Re: Making crm quit (#2)

2011-04-06 Thread Ulrich Windl
>>> Dejan Muhamedagic wrote on 06.04.2011 at 16:48 in
message <20110406144816.GC3673@squib>:
> Hi,
> 
> On Wed, Apr 06, 2011 at 03:49:58PM +0200, Ulrich Windl wrote:
> > Hi!
> > 
> > I just managed to make crm of SLES11 SP1 quit again:
> 
> Amazing :)
> 
> > # crm
> > crm(live)# configure
> > crm(live)configure# primitive prm_OCF1_dlm ocf:pacemaker::controld op 
> > monitor interval="60" timeout="60"
> 
> Don't use '::', just ':'. I'll fix this too.

Hi!

I had noticed that, too. The first one was introduced by TAB-completion, while 
the other one was entered manually. I did not look closely enough before 
hitting enter...

BTW: Is anybody from Novell/SUSE listening? I found several errors in their 
"High Availability Guide" I'd like to share with them...

Regards,
Ulrich


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Linux heartbeat resources question

2011-04-06 Thread Darren.Mansell
 

 

Hi all,

I have a scenario with 2 front ends that receive traffic (HTTP server)
and should run some scripts from crontab. Some of the scripts should be
run by only 1 server at a time (the active one) and others should run
on both.

Since the HTTP service is load-shared, I think I can't use heartbeat
for it, right? Is heartbeat just for active-standby, or can we use it
in an active-active setup as a watchdog? I have a Cisco CSS load-sharing
the HTTP traffic, and I can write a watchdog script for Apache.

Regarding the cron control, I was thinking of writing a script that
replaces the crontab file with whichever one is correct. When heartbeat
starts, what parameter is passed to the resource scripts? A start on
the active node and nothing on the standby? Always start? How should I
configure haresources to do this? What is the best way?

I have another situation, which is building an NFS server on Solaris 10.
I have 2 servers with shared disks (Sun array). Can I use heartbeat for
this too? Is it possible to set it up so that, after a failover of the
NFS server, the clients don't need to reconnect?

It's a long post... sorry!

thanks!!

--

I'll have a go at this one. :)

I've got some clusters implementing both of the features you're
requesting there. In fact, I've got quite a few heavy, mission-critical
clusters with web services doing load-balancing using Pacemaker and LVS.

To load-balance using Pacemaker, although I believe there are other
ways, I've always used a combination of cloned resources on the cluster,
ldirectord, and a virtual IP address. The virtual IP and ldirectord are
standard primitive resources grouped together so they get run together
on one node like so:

Resource Group: Load-Balancing
    VIP   (ocf::heartbeat:IPaddr2):  Started NODE-01
    ldirectord   (ocf::heartbeat:ldirectord):  Started NODE-01

A configuration for this would be something like:

primitive VIP ocf:heartbeat:IPaddr2 \
   params lvs_support="true" ip="192.168.1.100" cidr_netmask="24" \
      broadcast="192.168.1.255" \
   op monitor interval="1m" timeout="10s" \
   meta migration-threshold="10"
primitive ldirectord ocf:heartbeat:ldirectord \
   params configfile="/etc/ha.d/ldirectord.cf" \
   op monitor interval="2m" timeout="20s" \
   meta migration-threshold="10" target-role="Started"
group Load-Balancing VIP ldirectord
location Prefer-Node1 ldirectord \
   rule $id="prefer-node1-rule" 100: #uname eq NODE-01

And then just put your load-balancing rules in /etc/ha.d/ldirectord.cf:

checktimeout=5
checkinterval=7
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
emailalert=yourem...@you.com
virtual=192.168.1.100:80
        fallback=192.168.1.250:80
        real=192.168.1.10:80 gate 100
        real=192.168.1.20:80 gate 100
        service=http
        scheduler=wlc
        protocol=tcp
        checktype=negotiate
        request="/"
        receive="OK"
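
For anyone unfamiliar with checktype=negotiate: as far as I understand it, ldirectord fetches the request path from each real server and only treats the server as healthy if the response matches the receive pattern. A rough Python sketch of that idea follows. It is not ldirectord's actual code, and the names body_matches and check_real_server are made up for illustration:

```python
import http.client
import re
import urllib.request


def body_matches(body, receive):
    """Return True if the response body contains the expected pattern."""
    return re.search(receive, body) is not None


def check_real_server(host, port, request="/", receive="OK", timeout=5):
    """Fetch http://host:port/request and test the body against 'receive'.

    Any connection or HTTP error counts as an unhealthy server.
    """
    url = "http://%s:%d%s" % (host, port, request)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return False
    return body_matches(body, receive)
```

With the config above, the page served at "/" on each real server would therefore need to contain the text OK for the server to stay in the pool.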

 

Pacemaker with ldirectord/LVS can make a fantastic load-balancer with HA
built using only 2 nodes. I'm surprised more people don't use it this
way: while it makes the config slightly more complicated, you can use
your passive node to run the cloned resource and maximise your
performance.

Note that to do this you'll need to put the VIP as an extra IP on your
loopback interface (I've still got to file a bug about this), and set
ARP parameters in sysctl (look on the Pacemaker and LVS wikis). You can
also configure LVS to sync the connection table so on failover you won't
lose any connections.

On to your second point. I've found that just writing a loop inside of a
shell script with the appropriate controls and adding it as an LSB-style
resource to the cluster works fine. You can then add the resource to the
load-balancing group so it will only run on one node. It's a little bit
like rewriting cron, but cron isn't cluster aware ;) My script is a bit
like:

#!/bin/sh
# description: Start or stop the task script
#
### BEGIN INIT INFO
# Provides: task
# Required-Start: $network $syslog
# Required-Stop: $network
# Default-Start: 3
# Default-Stop: 0
# Description: Start or stop the task script
### END INIT INFO

# Static variables here.
DIR=/opt/task/
BOTHER="y...@yourmail.com"
RUNFILE="/var/run/task"
RUNTIME="003000"

MAINLOOP() {
LOG="/var/log/task.log"

while true
do
   TODAY=`date +%d-%m-%Y`
   EXTRACTFILE="${DIR}OP/$TODAY.zip"
   echo `date` >> $LOG

   # Check for permission to run.
   if [ ! -f "$RUNFILE" ]
   then

[Linux-HA] When is the next release for resource agents?

2011-04-06 Thread Serge Dubrouski
Hello -

When is the next release for resource agents? Agents that come with
resource-agents-1.0.3-2.6.el5 from the clusterlabs repository are very
outdated. pgsql is at least a year old or so.

-- 
Serge Dubrouski.


Re: [Linux-HA] Making crm quit (#2)

2011-04-06 Thread Dejan Muhamedagic
Hi,

On Wed, Apr 06, 2011 at 03:49:58PM +0200, Ulrich Windl wrote:
> Hi!
> 
> I just managed to make crm of SLES11 SP1 quit again:

Amazing :)

> # crm
> crm(live)# configure
> crm(live)configure# primitive prm_OCF1_dlm ocf:pacemaker::controld op monitor 
> interval="60" timeout="60"

Don't use '::', just ':'. I'll fix this too.
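
To illustrate why the doubled colon ends in a traceback rather than an error message: the last frame unpacks the return value of disambiguate_ra_type(s) into three names, and unpacking None raises a TypeError. Below is a minimal sketch, assuming a simplified class:provider:type format. split_ra_type and parse_rsctype are hypothetical stand-ins, not the crm shell's real functions:

```python
def split_ra_type(spec):
    """Hypothetical parser: 'class:provider:type' -> 3-tuple, or None if malformed."""
    parts = spec.split(":")
    # 'ocf:pacemaker::controld' splits into four parts, one of them empty.
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)


def parse_rsctype(spec):
    """Check the result before unpacking, so bad input gives a clear error."""
    result = split_ra_type(spec)
    if result is None:
        raise ValueError("bad resource type: %r" % spec)
    ra_class, provider, rsc_type = result
    return ra_class, provider, rsc_type
```

Without the None check, the unpacking line itself would fail with a TypeError, which is the kind of failure shown in the report.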

Thanks,

Dejan

> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 45, in <module>
>     main.run()
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
>     if not parse_line(levels,shlex.split(inp)):
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
>     rv = d() # execute the command
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
>     d = lambda: cmd[0](*args)
>   File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1563, in conf_primitive
>     return self.__conf_object(cmd,*args)
>   File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1550, in __conf_object
>     return f()
>   File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1549, in <lambda>
>     f = lambda: cib_factory.create_object(cmd,*args)
>   File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 1945, in create_object
>     return self.create_from_cli(CliParser().parse(list(args))) != None
>   File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 756, in parse
>     cli_list = parser_fn(s)
>   File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 99, in parse_resource
>     cli_parse_rsctype(s[2],head)
>   File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 35, in cli_parse_rsctype
>     ra_class,provider,rsc_type = disambiguate_ra_type(s)
> TypeError: 'NoneType' object is not iterable
> 
> 
> Regards,
> Ulrich
> 
> 


[Linux-HA] HA software download

2011-04-06 Thread Ajaykumar Narayanaswamy
Hi All,

I would like to know whether Linux has any built-in HA/failover software or 
whether we should procure third-party HA software.

I came across the heartbeat package, which is an open-source application, and 
have downloaded it. Would it help in providing failover for an LDAP server 
running on Linux for about 2000 SAP users who would be using it for 
authentication?

Looking forward to hearing from you.

Regards,
Ajay





Re: [Linux-HA] Robustness of "crm"

2011-04-06 Thread Dejan Muhamedagic
Hi,

On Wed, Apr 06, 2011 at 12:23:25PM +0200, Ulrich Windl wrote:
> Hi,
> 
> found in SLES11 SP1 with all current updates (pacemaker-1.1.5-5.5.5):
> # crm
> crm(live)# configure
> crm(live)configure# template
> crm(live)configure template# list
> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 45, in <module>
>     main.run()
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
>     if not parse_line(levels,shlex.split(inp)):
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
>     rv = d() # execute the command
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
>     d = lambda: cmd[0](*args)
>   File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 631, in list
>     multicolumn(listconfigs())
> NameError: global name 'listconfigs' is not defined
> # crm
> crm(live)# configure
> crm(live)configure# template
> crm(live)configure template# list templates
> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 45, in <module>
>     main.run()
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
>     if not parse_line(levels,shlex.split(inp)):
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
>     rv = d() # execute the command
>   File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
>     d = lambda: cmd[0](*args)
>   File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 629, in list
>     multicolumn(listtemplates())
> NameError: global name 'listtemplates' is not defined
> #
> 
> I feel the crm shell should be a bit more robust, and not exit that quickly.

I feel so too :) It's a regression, obviously recently introduced.
And obviously nobody's using templates. Many thanks for reporting.
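
One way to get the robustness asked for here is to catch exceptions around each command in the interactive loop, so that a bug in one command handler prints an error instead of ending the session. This is a minimal sketch, not the crm shell's actual code. run_shell, the commands table and cmd_list are hypothetical names:

```python
import shlex


def run_shell(lines, commands, report=print):
    """Run each input line as a command; errors are reported, never fatal."""
    for line in lines:
        try:
            words = shlex.split(line)
        except ValueError as exc:  # e.g. unbalanced quotes
            report("ERROR: cannot parse input: %s" % exc)
            continue
        if not words:
            continue
        handler = commands.get(words[0])
        if handler is None:
            report("ERROR: unknown command: %s" % words[0])
            continue
        try:
            handler(*words[1:])
        except Exception as exc:  # a bug in one command must not kill the shell
            report("ERROR: %s: %s" % (type(exc).__name__, exc))


def cmd_list(*args):
    # Stand-in for the regression: calls a function that was never defined.
    return listconfigs()  # raises NameError when called
```

Running run_shell(["list", "list templates"], {"list": cmd_list}) reports a NameError for each line and then returns normally, rather than dropping back to the system prompt after the first one.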

Dejan


> Regards,
> Ulrich
> 
> 


[Linux-HA] Linux heartbeat resources question

2011-04-06 Thread blast

Hi all,

I have a scenario with 2 front ends that receive traffic (HTTP server) and
should run some scripts from crontab. Some of the scripts should be run by
only 1 server at a time (the active one) and others should run on both.

Since the HTTP service is load-shared, I think I can't use heartbeat for it,
right? Is heartbeat just for active-standby, or can we use it in an
active-active setup as a watchdog? I have a Cisco CSS load-sharing the HTTP
traffic, and I can write a watchdog script for Apache.

Regarding the cron control, I was thinking of writing a script that replaces
the crontab file with whichever one is correct. When heartbeat starts, what
parameter is passed to the resource scripts? A start on the active node and
nothing on the standby? Always start? How should I configure haresources to
do this? What is the best way?

I have another situation, which is building an NFS server on Solaris 10. I
have 2 servers with shared disks (Sun array). Can I use heartbeat for this
too? Is it possible to set it up so that, after a failover of the NFS
server, the clients don't need to reconnect?

It's a long post... sorry!
thanks!!



[Linux-HA] Making crm quit (#2)

2011-04-06 Thread Ulrich Windl
Hi!

I just managed to make crm of SLES11 SP1 quit again:
# crm
crm(live)# configure
crm(live)configure# primitive prm_OCF1_dlm ocf:pacemaker::controld op monitor 
interval="60" timeout="60"
Traceback (most recent call last):
  File "/usr/sbin/crm", line 45, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
    if not parse_line(levels,shlex.split(inp)):
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1563, in conf_primitive
    return self.__conf_object(cmd,*args)
  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1550, in __conf_object
    return f()
  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1549, in <lambda>
    f = lambda: cib_factory.create_object(cmd,*args)
  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 1945, in create_object
    return self.create_from_cli(CliParser().parse(list(args))) != None
  File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 756, in parse
    cli_list = parser_fn(s)
  File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 99, in parse_resource
    cli_parse_rsctype(s[2],head)
  File "/usr/lib64/python2.6/site-packages/crm/parse.py", line 35, in cli_parse_rsctype
    ra_class,provider,rsc_type = disambiguate_ra_type(s)
TypeError: 'NoneType' object is not iterable


Regards,
Ulrich




Re: [Linux-HA] crm commands : how to reduce the delay between two commands

2011-04-06 Thread Alain.Moulle
Done: Bug 2579 has been added to the database.
Thanks
Alain

Andrew Beekhof wrote:
> On Fri, Mar 25, 2011 at 2:07 PM, Alain.Moulle  wrote:
>   
>> Hi,
>> I tried, but it does not work:
>> crm_resource -r resname -p target-role -v started
>> because it adds target-role=started as a param,
>> whereas I already have a meta target-role=Stopped,
>> so the resource does not start.
>> So I tried:
>> crm_resource -r resname -m -p target-role -v started
>> and then the resource starts successfully.
>> But with a loop:
>> for i in {1..20}; do echo resname$i ; crm_resource -r resname$i -m -p target-role -v started; done
>> the first one is started immediately, and the other 19 are
>> started ~20s after the first one,
>> but all in one salvo.
>> So it seems to be much the same behavior as successive "crm resource
>> start resname$i" commands.
>> The first command is taken into account immediately, then there is a
>> delay, perhaps before polling for any other crm commands, but as my
>> loop has already sent 19 commands during this delay, these are
>> taken into account in one shot when the next polling occurs.
>>
>> This means that, manually, if you wait until the expected result of
>> your crm command is displayed in crm_mon before sending the second
>> one, etc., there is always this 10 to 20s latency between commands.
>> (Same behavior inside scripts if the script waits for the command to
>> really be completed by testing ...)
>>
>> Hope my description is clear enough ...
>> 
>
> Yes.  Looks like something in core pacemaker.
> Could you file a bug for this and include the output of your above
> testcase but with - added to the crm_resource command line please?


Re: [Linux-HA] crm commands : how to reduce the delay between two commands

2011-04-06 Thread Andrew Beekhof
On Fri, Mar 25, 2011 at 2:07 PM, Alain.Moulle  wrote:
> Hi,
> I tried, but it does not work:
> crm_resource -r resname -p target-role -v started
> because it adds target-role=started as a param,
> whereas I already have a meta target-role=Stopped,
> so the resource does not start.
> So I tried:
> crm_resource -r resname -m -p target-role -v started
> and then the resource starts successfully.
> But with a loop:
> for i in {1..20}; do echo resname$i ; crm_resource -r resname$i -m -p target-role -v started; done
> the first one is started immediately, and the other 19 are
> started ~20s after the first one,
> but all in one salvo.
> So it seems to be much the same behavior as successive "crm resource
> start resname$i" commands.
> The first command is taken into account immediately, then there is a
> delay, perhaps before polling for any other crm commands, but as my
> loop has already sent 19 commands during this delay, these are
> taken into account in one shot when the next polling occurs.
>
> This means that, manually, if you wait until the expected result of
> your crm command is displayed in crm_mon before sending the second
> one, etc., there is always this 10 to 20s latency between commands.
> (Same behavior inside scripts if the script waits for the command to
> really be completed by testing ...)
>
> Hope my description is clear enough ...

Yes.  Looks like something in core pacemaker.
Could you file a bug for this and include the output of your above
testcase but with - added to the crm_resource command line please?


Re: [Linux-HA] why Cluster "restarts" A, before starting B on surviving node.

2011-04-06 Thread Andrew Beekhof
I meant in the form of a hb_report which contains the necessary logs
and status information necessary to diagnose your issue.

On Mon, Apr 4, 2011 at 12:11 PM, Muhammad Sharfuddin
 wrote:
>
> On Mon, 2011-04-04 at 10:42 +0200, Andrew Beekhof wrote:
>> On Thu, Mar 24, 2011 at 7:42 PM, Muhammad Sharfuddin
>>  wrote:
>> > We have two resources, A and B.
>> > The cluster starts A on node1 and B on node2, while the failover node
>> > for A is node2 and the failover node for B is node1.
>> >
>> > B can't start without A, so I have the following order constraint:
>> >
>> >          order first_A_then_B : A  B
>> >
>> > Problem/Question
>> > 
>> > Now if B fails due to node failure, the cluster "restarts" A before
>> > starting B on the surviving node (node1).
>> >
>> > My question/problem is why the cluster restarts A.
>>
>> my question/problem, is that you've given us no information on which
>> to base a reply.
> SLES 11 SP1 updated
> SLE HAE SP1 + updated
> node1 hostname:
>
> this is a 'distributed' and/or 'Active/Active', two nodes Cluster.
>
> Scenario:
> Cluster starts resource A on node1, and resource B on node2, due to
> following location constraints:
>
>  location PrimaryLoc-of-A A +inf: node1
>  location PrimaryLoc-of-B B +inf: node2
>
>
> B is a resource that depends on resource A, therefore I have an order
> constraint:
>
>   order first_A_then_B : A  B
>
> Now node2 has failed, so the cluster starts moving resource B (i.e.
> resource 'B' fails over) to node1 (where resource A is already running),
> but during this process the cluster first stops and starts (restarts)
> resource A, and then starts B.
>
> Problem/Question:
>
> Why does the cluster restart resource 'A' during the failover of resource B?
>
>
>


[Linux-HA] Robustness of "crm"

2011-04-06 Thread Ulrich Windl
Hi,

found in SLES11 SP1 with all current updates (pacemaker-1.1.5-5.5.5):
# crm
crm(live)# configure
crm(live)configure# template
crm(live)configure template# list
Traceback (most recent call last):
  File "/usr/sbin/crm", line 45, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
    if not parse_line(levels,shlex.split(inp)):
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 631, in list
    multicolumn(listconfigs())
NameError: global name 'listconfigs' is not defined
# crm
crm(live)# configure
crm(live)configure# template
crm(live)configure template# list templates
Traceback (most recent call last):
  File "/usr/sbin/crm", line 45, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 293, in run
    if not parse_line(levels,shlex.split(inp)):
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 147, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.6/site-packages/crm/main.py", line 146, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 629, in list
    multicolumn(listtemplates())
NameError: global name 'listtemplates' is not defined
#

I feel the crm shell should be a bit more robust, and not exit that quickly.

Regards,
Ulrich




Re: [Linux-HA] heartbeat ordering

2011-04-06 Thread Andrew Beekhof
On Tue, Apr 5, 2011 at 11:58 AM, Maxim Ianoglo  wrote:
> Hello,
>
> I have four servers in an HA cluster:
> NodeA
> NodeB
> NodeC
> NodeD
>
> Three groups of resources and one inline resource are defined:
> 1. group_storage ( NFS VIP, NFS Server, DRBD )
> 2. group_apache_www (Domains VIPs and Apache)
> 3. group_nginx_www (Static files with nginx)
> 4. inline_nfs_client ( NFS client )
>
> (1) should run only on NodeC or NodeD. NodeC is preferable. NodeD for backup.
> (2) should run on NodeC and NodeD. NodeD is preferable. NodeC for backup.
> (3) should run on NodeC and NodeD. NodeC is preferable. NodeD for backup.
> (4) should run on every node except for node on which (1) is located.
>
> I have the following ordering constraints:
> (2) depends on (1) and (4)
> (3) depends on (1) and (4)
> (4) depends on (1)
>
> Colocations:
> (4) and (1) should not run on the same node.
>
> The issue is that resource (4) chooses NodeC, which is the default node for 
> (1), so (1) has to choose a node other than NodeC and goes to NodeD.
> How can I make resource (1) choose its node earlier than (4) and any 
> other resource?

Swap the order resources are listed in the colocation constraint.