Re: [Pacemaker] stonith_admin problems with Redhat6

2011-10-13 Thread Andrew Beekhof
On Thu, Oct 13, 2011 at 11:08 PM,   wrote:
> Hi,
>
> I am having a problem configuring STONITH on Red Hat 6 on my virtual
> workstation.
>
> I am trying to configure STONITH using fence_ack_manual for my VMware
> workstation, as there is no hardware fencing device.

fence_ack_manual isn't something you can configure.
It's something you need to run, and it is specific to rgmanager.

I implemented the equivalent for Pacemaker yesterday; look for it in 1.1.7.


Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
On Thu, Oct 13, 2011 at 06:35:27AM -0600, Serge Dubrouski wrote:
> On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg
> wrote:
> 
> > On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
> > > On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic 
> > wrote:
> > >
> > > > Thank you all for tips and suggestions. I managed to configure postgres
> > > > so it actually starts.
> > > >
> > > > First, I updated resource-agents (Florian thanks for the tip, still
> > > > don't know how I managed to miss that :) )
> > > > Second, I deleted postgres primitive, cleared all failcounts and
> > > > configured it again like this:
> > > >
> > > > primitive postgres_res ocf:heartbeat:pgsql \
> > > >     params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" \
> > > >         psql="/usr/bin/psql" start_opt="" pgdata="/var/lib/postgresql/8.4/main" \
> > > >         config="/etc/postgresql/8.4/main/postgresql.conf" pgdba="postgres" \
> > > >     op start interval="0" timeout="120s" \
> > > >     op stop interval="0" timeout="120s" \
> > > >     op monitor interval="30s" timeout="30s" depth="0"
> > > >
> > > > After that, it all worked like a charm.
> > > >
> > > > However, I noticed some strange output in the log file, it wasn't there
> > > > before I updated the resource-agents.
> > > >
> > > > Here is the extract from the syslog:
> > > >
> > > > http://pastebin.com/ybPi0VMp
> > > >
> > > > (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
> > > >
> > > > This error is actually reported with any operator. I tried to start the
> > > > script from CLI, I got the same thing with ./pgsql start, ./pgsql
> > > > status, ./pgsql stop
> > > >
> > >
> > > Weird. I don't know what to tell you. The RA is basically all right, it
> > > just misses one not very important fix. On my system (CentOS 5, PostgreSQL
> > > 8.4 or 9.0) it doesn't produce any errors. If I understand your log right,
> > > the problem is in line 647 of the RA, which is:
> > >
> > > [ "$1" == "validate-all" ] && exit $rc
> >
> >  "==" != "="
> >
> >
> Theoretically yes, "=" is for strings and "==" is for numbers. But why would
> it create a problem on Debian and not on CentOS, and why has nobody else
> reported this issue so far?
> 
> BTW, other RAs use the "==" operator as well: apache, LVM, portblock,

As you found out by now, if they are bash, that's ok.
If they are /bin/sh, then that's a bug.
dash, for example, does not like ==.

And no, apache and portblock use these in some embedded awk script.

LVM I fixed as well.

> > Make that [ "$1" = "validate-all" ] && exit $rc
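
The bash versus /bin/sh distinction above can be checked mechanically. The following is only a rough heuristic sketch, not a tool from resource-agents: the function name `check_sh_equality` is hypothetical, and the regular expression only looks for "==" after a single "[ " on the same line, so it can miss cases or flag harmless ones.

```shell
#!/bin/sh
# Hypothetical helper: flag "==" inside single-bracket tests in scripts
# whose first line asks for /bin/sh. Returns 1 if suspicious lines exist.
check_sh_equality() {
    file=$1
    # Scripts that request another interpreter (e.g. bash) are skipped,
    # because bash accepts "==" inside [ ].
    head -n 1 "$file" | grep -Eq '^#! ?/bin/sh([[:space:]]|$)' || return 0
    # Print offending lines with line numbers; the [[ ... ]] form is ignored.
    if grep -nE '(^|[^[])\[ [^]]*==' "$file"; then
        return 1
    fi
    return 0
}

# Example use (the directory is the usual OCF location, adjust as needed):
#   for ra in /usr/lib/ocf/resource.d/heartbeat/*; do
#       check_sh_equality "$ra" || echo "check: $ra"
#   done
```

Embedded awk code, as in apache and portblock, does not use "[ ... ]" for comparisons, so it should normally not be reported by this pattern.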

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Serge Dubrouski
On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg
wrote:

> On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
> > On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic 
> wrote:
> >
> > > Thank you all for tips and suggestions. I managed to configure postgres
> > > so it actually starts.
> > >
> > > First, I updated resource-agents (Florian thanks for the tip, still
> > > don't know how I managed to miss that :) )
> > > Second, I deleted postgres primitive, cleared all failcounts and
> > > configured it again like this:
> > >
> > > primitive postgres_res ocf:heartbeat:pgsql \
> > >     params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" \
> > >         psql="/usr/bin/psql" start_opt="" pgdata="/var/lib/postgresql/8.4/main" \
> > >         config="/etc/postgresql/8.4/main/postgresql.conf" pgdba="postgres" \
> > >     op start interval="0" timeout="120s" \
> > >     op stop interval="0" timeout="120s" \
> > >     op monitor interval="30s" timeout="30s" depth="0"
> > >
> > > After that, it all worked like a charm.
> > >
> > > However, I noticed some strange output in the log file, it wasn't there
> > > before I updated the resource-agents.
> > >
> > > Here is the extract from the syslog:
> > >
> > > http://pastebin.com/ybPi0VMp
> > >
> > > (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
> > >
> > > This error is actually reported with any operator. I tried to start the
> > > script from CLI, I got the same thing with ./pgsql start, ./pgsql
> > > status, ./pgsql stop
> > >
> >
> > Weird. I don't know what to tell you. The RA is basically all right, it
> > just misses one not very important fix. On my system (CentOS 5, PostgreSQL
> > 8.4 or 9.0) it doesn't produce any errors. If I understand your log right,
> > the problem is in line 647 of the RA, which is:
> >
> > [ "$1" == "validate-all" ] && exit $rc
>
>  "==" != "="
>
>
Theoretically yes, "=" is for strings and "==" is for numbers. But why would
it create a problem on Debian and not on CentOS, and why has nobody else
reported this issue so far?

BTW, other RAs use the "==" operator as well: apache, LVM, portblock,


> Make that [ "$1" = "validate-all" ] && exit $rc
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>



-- 
Serge Dubrouski.


[Pacemaker] stonith_admin problems with Redhat6

2011-10-13 Thread Sagar.Shimpi
Hi,

I am having a problem configuring STONITH on Red Hat 6 on my virtual
workstation.

I am trying to configure STONITH using fence_ack_manual for my VMware
workstation, as there is no hardware fencing device.

Can anyone please help me test a STONITH configuration with
fence_ack_manual? That would be great.


Following are the errors -

===
[root@node1 ~]# crm ra info fence_ack_manual
lrmadmin[12247]: 2011/10/04_15:31:53 ERROR: lrm_get_rsc_type_metadata(578): got 
a return code HA_FAIL from a reply message of rmetadata with function 
get_ret_from_msg.
ERROR: ocf:heartbeat:fence_ack_manual: could not parse meta-data:
[root@node1 ~]#

---
[root@node1 ~]# crm ra info stonith:fence_ack_manual
stonith:fence_ack_manual



Parameters (* denotes required, [] the default):

action (string, [reboot]): Fencing action (null, off, on, [reboot], status, 
hostlist, devstatus)
stonith-timeout (time, [60s]): How long to wait for the STONITH action to 
complete.
Overrides the stonith-timeout cluster property

priority (integer, [0]): The priority of the stonith resource. The lower the 
number, the higher the priority.
pcmk_arg_map (string): A mapping of host attributes to device arguments.
Eg. uname:domain would tell the cluster to pass the machines name as the 
domain argument to the device.  Useful for devices that have non-standard 
interfaces

pcmk_host_map (string): A mapping of host names to ports numbers for devices 
that do not support names.
Eg. node1:1,node2:3 would tell the cluster to use port 1 for node1 and port 
3 for node2

pcmk_host_list (string): A list of machines controlled by this device (Optional 
unless pcmk_host_check=static-list).
pcmk_host_check (string, [dynamic-list]): How to determin which machines are 
controlled by the device.
Allowed values: dynamic-list (query the device), static-list (check the 
pcmk_host_list attribute), none (assume every device can fence every machine)

pcmk_list_cmd (string, [list]): Which device operation to use for listing 
machines controlled by the device.
pcmk_status_cmd (string, [status]): Which device operation to use for testing 
the state of a machine controlled by the device.
pcmk_monitor_cmd (string, [monitor]): Which device operation to use for 
monitoring the health of the device.

Operations' defaults (advisory minimum):

    start timeout=15
    stop  timeout=15
    status    timeout=15
    monitor_0 interval=15 timeout=15 start-delay=15
[root@node1 ~]#
---

---


[root@node1 ~]# stonith_admin --metadata --agent type
stonith_admin[11852]: 2011/10/04_15:30:35 info: crm_log_init_worker: Changed 
active directory to /var/lib/heartbeat/cores/root
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: get_stonith_provider: No such 
device: type
stonith_admin[11852]: 2011/10/04_15:30:35 info: stonith_api_device_metadata: 
looking up type/(null) metadata
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_abort: crm_strdup_fn: 
Triggered assert at utils.c:822 : src != NULL
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_strdup_fn: Could not 
perform copy at st_client.c:507 (stonith_api_device_metadata)
stonith_admin[11852]: 2011/10/04_15:30:35 WARN: stonith_api_device_metadata: no 
long description in type's metadata.
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_abort: crm_strdup_fn: 
Triggered assert at utils.c:822 : src != NULL
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_strdup_fn: Could not 
perform copy at st_client.c:513 (stonith_api_device_metadata)
stonith_admin[11852]: 2011/10/04_15:30:35 info: stonith_api_device_metadata: 
short description: (null)
stonith_admin[11852]: 2011/10/04_15:30:35 WARN: stonith_api_device_metadata: no 
short description in type's metadata.
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_abort: crm_strdup_fn: 
Triggered assert at utils.c:822 : src != NULL
stonith_admin[11852]: 2011/10/04_15:30:35 ERROR: crm_strdup_fn: Could not 
perform copy at st_client.c:520 (stonith_api_device_metadata)
stonith_admin[11852]: 2011/10/04_15:30:35 WARN: stonith_api_device_metadata: no 
list of parameters in type's metadata.



  

[root@node1 ~]#
--
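
The "No such device: type" lines above suggest that the literal word "type" was passed as the agent name. A hedged sketch of the call with a real agent name follows. The wrapper name `show_agent_metadata` is hypothetical, and `fence_vmware` is only an assumed example; which agents are actually installed depends on the system.

```shell
#!/bin/sh
# Hypothetical wrapper: print the metadata of one named fencing agent.
show_agent_metadata() {
    agent=$1
    if [ -z "$agent" ]; then
        echo "usage: show_agent_metadata AGENT_NAME" >&2
        return 2
    fi
    if ! command -v stonith_admin >/dev/null 2>&1; then
        echo "stonith_admin not found in PATH" >&2
        return 127
    fi
    # --agent expects the name of an installed agent, not the word "type".
    stonith_admin --metadata --agent "$agent"
}

# Example (agent name is an assumption; list what is installed first):
#   stonith_admin --list-installed
#   show_agent_metadata fence_vmware
```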

Re: [Pacemaker] Strange failure with ocf:heartbeat:apache when corrupting config file

2011-10-13 Thread Michael Schwartzkopff
> On 2011-10-13 12:52, Michael Schwartzkopff wrote:
> > Hi,
> > 
> > I have a resApache resource defined like this:
> > 
> > primitive resApache ocf:heartbeat:apache \
> >     params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
> >     op monitor interval="30"
> > 
> > When I remove /etc/apache2/apache2.conf on the active node, the cluster
> > does not start the resource on the other node. Both nodes have a score
> > of -inf for the resource. But the resource could start happily on the
> > other node, since the config file is not corrupted there.
> > 
> > Any ideas?
> 
> Aw c'mon. You wrote a book about this; you ought to know that one. :)

Thanks. I just wanted you to confirm my idea.
 
> Take a look at the apache RA, see which error code it returns when it
> can't parse the config file. Should be obvious then. At least it seems
> obvious to me -- unless I'm missing something important, in which case
> I'd be happy to stand corrected.
> 
> I'd merge a fix, btw. I just have no time fixing that RA now.

Why do I know that "no time" problem all too well?

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98




Re: [Pacemaker] Strange failure with ocf:heartbeat:apache when corrupting config file

2011-10-13 Thread Florian Haas
On 2011-10-13 12:52, Michael Schwartzkopff wrote:
> Hi,
> 
> I have a resApache resource defined like this:
> 
> primitive resApache ocf:heartbeat:apache \
>     params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
>     op monitor interval="30"
> 
> When I remove /etc/apache2/apache2.conf on the active node, the cluster does
> not start the resource on the other node. Both nodes have a score of -inf for
> the resource. But the resource could start happily on the other node, since
> the config file is not corrupted there.
> 
> Any ideas?

Aw c'mon. You wrote a book about this; you ought to know that one. :)

Take a look at the apache RA, see which error code it returns when it
can't parse the config file. Should be obvious then. At least it seems
obvious to me -- unless I'm missing something important, in which case
I'd be happy to stand corrected.

I'd merge a fix, btw. I just have no time fixing that RA now.
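
The hint above is left implicit. One reading, offered as an assumption rather than a statement about what the apache RA did at the time: Pacemaker treats OCF_ERR_CONFIGURED (6) as fatal for the whole cluster, while OCF_ERR_INSTALLED (5) only rules out the current node, which would match both nodes ending up at -inf. A minimal sketch of a check that reports a missing config file as a node-local problem (the function name `check_config_file` is hypothetical):

```shell
#!/bin/sh
# Standard OCF exit code values.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5    # hard error: this node cannot run the resource
OCF_ERR_CONFIGURED=6   # fatal error: resource is not started anywhere

# Hypothetical check: a config file missing on this node is a local
# problem, so report OCF_ERR_INSTALLED rather than OCF_ERR_CONFIGURED.
check_config_file() {
    conf=$1
    if [ ! -r "$conf" ]; then
        echo "config file $conf missing or unreadable on this node" >&2
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}

# Example use inside an agent's validate step:
#   check_config_file /etc/apache2/apache2.conf || exit $?
```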

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] found small memory leak of pengine.

2011-10-13 Thread Yuusuke IIDA

Hi, Andrew
I confirmed the commit.

https://github.com/ClusterLabs/pacemaker/commit/b2987de0d604fe599b33d215b1f44861ad72dc46

Thanks!
Yuusuke
(2011/10/13 13:28), Andrew Beekhof wrote:
> On Thu, Oct 13, 2011 at 1:39 PM, Yuusuke IIDA wrote:
> > Hi, Andrew
> >
> > As far as I can see, this fix has not been applied yet.
> > Is there any problem with it?
>
> Sorry, slipped off my radar.
> Committed now :-)
>
> > Regards,
> > Yuusuke
> >
> > (2011/09/20 10:55), Andrew Beekhof wrote:
> > > Thanks, I'll apply it shortly
> > >
> > > 2011/9/6 Yuusuke IIDA:
> > > > Hi, Andrew
> > > >
> > > > I found a small memory leak in pengine.
> > > > It seems to affect both Pacemaker-1.1 and Pacemaker-1.0.
> > > >
> > > > Best Regards,
> > > > Yuusuke
--

METRO SYSTEMS CO., LTD

Yuusuke Iida
Mail: iiday...@intellilink.co.jp




[Pacemaker] Strange failure with ocf:heartbeat:apache when corrupting config file

2011-10-13 Thread Michael Schwartzkopff
Hi,

I have a resApache resource defined like this:

primitive resApache ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
    op monitor interval="30"

When I remove /etc/apache2/apache2.conf on the active node, the cluster does
not start the resource on the other node. Both nodes have a score of -inf for
the resource. But the resource could start happily on the other node, since the
config file is not corrupted there.

Any ideas?

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98




Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-13 Thread Yves Trudeau

On 11-10-13 06:26 AM, Lars Ellenberg wrote:

> On Wed, Oct 12, 2011 at 08:08:21PM -0400, Yves Trudeau wrote:
> > What about referring to the git repository here:
> >
> > http://www.clusterlabs.org/wiki/Get_Pacemaker#Building_from_Source
>
> http://www.clusterlabs.org/mwiki/index.php?title=Install&diff=1287&oldid=1282
>
> Lars



Awesome!



Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
> On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic  wrote:
> 
> > Thank you all for tips and suggestions. I managed to configure postgres so
> > it actually starts.
> >
> > First, I updated resource-agents (Florian thanks for the tip, still don't
> > know how I managed to miss that :) )
> > Second, I deleted postgres primitive, cleared all failcounts and configured
> > it again like this:
> >
> > primitive postgres_res ocf:heartbeat:pgsql \
> >     params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" \
> >         psql="/usr/bin/psql" start_opt="" pgdata="/var/lib/postgresql/8.4/main" \
> >         config="/etc/postgresql/8.4/main/postgresql.conf" pgdba="postgres" \
> >     op start interval="0" timeout="120s" \
> >     op stop interval="0" timeout="120s" \
> >     op monitor interval="30s" timeout="30s" depth="0"
> >
> > After that, it all worked like a charm.
> >
> > However, I noticed some strange output in the log file, it wasn't there
> > before I updated the resource-agents.
> >
> > Here is the extract from the syslog:
> >
> > http://pastebin.com/ybPi0VMp
> >
> > (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
> >
> > This error is actually reported with any operator. I tried to start the
> > script from CLI, I got the same thing with ./pgsql start, ./pgsql status,
> > ./pgsql stop
> >
> 
> Weird. I don't know what to tell you. The RA is basically all right, it just
> misses one not very important fix. On my system (CentOS 5, PostgreSQL 8.4 or
> 9.0) it doesn't produce any errors. If I understand your log right, the problem
> is in line 647 of the RA, which is:
> 
> [ "$1" == "validate-all" ] && exit $rc

 "==" != "="

Make that [ "$1" = "validate-all" ] && exit $rc
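
A minimal sketch of the portable form, not the actual pgsql RA code: POSIX defines only "=" for string equality in test and [ ], so a /bin/sh such as dash rejects "==" with the "unexpected operator" message quoted above, while bash accepts both. The helper name `is_action` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the requested action with a known name.
# "=" is the string-equality operator POSIX defines for test and [ ].
is_action() {
    [ "$1" = "$2" ]
}

# Usage inside an agent, mirroring the corrected line 647:
#   is_action "$1" validate-all && exit $rc
```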


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-13 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 08:08:21PM -0400, Yves Trudeau wrote:
> What about referring to the git repository here:
> 
> http://www.clusterlabs.org/wiki/Get_Pacemaker#Building_from_Source

http://www.clusterlabs.org/mwiki/index.php?title=Install&diff=1287&oldid=1282

Lars



Re: [Pacemaker] crm abend with message AttributeError: 'NoneType' object has no attribute 'values'

2011-10-13 Thread Florian Haas
On 2011-10-13 11:36, Kulovits Christian - OS ITSC wrote:
> Hi,
> I sent this message just to report an error.
> I come from the mainframe side, and so far I don't like vi. Normal config
> changes are done with UltraEdit and SPFLite using XML files.

Ahem, you could also use an octal editor or punchcards if so inclined.
:) Or set EDITOR={joe,nano,emacs,whathaveyou} and use your favorite
editor with the shell "configure edit" command.
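
A small sketch of that suggestion: pick the first editor from a preference list that is actually installed, export it as EDITOR, then start the crm shell editor. The function name `pick_editor` and the order of the list are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the first command from the argument list
# that exists on this system; return 1 if none is found.
pick_editor() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Set EDITOR only if one of the preferred editors is available.
if EDITOR=$(pick_editor joe nano emacs vi); then
    export EDITOR
fi

# Then edit the configuration with the chosen editor:
#   crm configure edit
```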

As far as the shell exception is concerned, I'll humbly defer to Dejan.

Cheers,
Florian

-- 
Need help with Pacemaker?
http://www.hastexo.com/now



Re: [Pacemaker] crm abend with message AttributeError: 'NoneType' object has no attribute 'values'

2011-10-13 Thread Kulovits Christian - OS ITSC
Hi,
I sent this message just to report an error.
I come from the mainframe side, and so far I don't like vi. Normal config
changes are done with UltraEdit and SPFLite using XML files. LCMC has boring
naming conventions! Maybe one day there will be an option to set masks for names.
I'd like to change the admin epoch, but there is another command available to
do this.
Cheers, Christian

-Original Message-
From: Florian Haas [mailto:flor...@hastexo.com] 
Sent: Thursday, 13 October 2011 10:13
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] crm abend with message AttributeError: 'NoneType' 
object has no attribute 'values'

On 2011-10-13 09:56, Kulovits Christian - OS ITSC wrote:
> Hello,
> 
> I tried to do an edit xml within crm and after a vi :wq the crm shell is 
> terminated with the following messages. The shadow cib is an unchanged copy 
> of the active cib.

Now, while this might indeed be a shell problem, back up a bit and explain why 
you think you need to modify the XML directly. Perhaps we can suggest a 
workaround solving your issue with a simple "edit". Thanks.

Cheers,
Florian

--
Need help with Pacemaker?
http://www.hastexo.com/now



Austrian Airlines AG, Office Park 2, P.O. Box 100, 1300 Vienna-Airport, 
Austria, registered office: Vienna, registered with Vienna Commercial Court 
under FN 111000k, DVR 0091740. This e-mail is confidential and is subject to 
disclaimers. Details can be found at: http://www.austrian.com/disclaimer.




Re: [Pacemaker] crm abend with message AttributeError: 'NoneType' object has no attribute 'values'

2011-10-13 Thread Florian Haas
On 2011-10-13 09:56, Kulovits Christian - OS ITSC wrote:
> Hello,
> 
> I tried to do an edit xml within crm and after a vi :wq the crm shell is 
> terminated with the following messages. The shadow cib is an unchanged copy 
> of the active cib.

Now, while this might indeed be a shell problem, back up a bit and
explain why you think you need to modify the XML directly. Perhaps we
can suggest a workaround solving your issue with a simple "edit". Thanks.

Cheers,
Florian

-- 
Need help with Pacemaker?
http://www.hastexo.com/now



Re: [Pacemaker] [DRBD-user] examples of dual primary DRBD

2011-10-13 Thread Bart Coninckx

On 10/13/11 09:42, Felix Frank wrote:
> On 10/13/2011 09:38 AM, Bart Coninckx wrote:
> > OK, I see. Can my workaround be considered safe? Possible pitfalls I see
> > are upgrades that overwrite the DRBD resource agent script with the 1
> > second delay. Besides that I would expect this to be fine?
> >
> > thx,
> >
> > B.
>
> Hi,
>
> could it be possible to hack something up using a delay resource?
> I'm thinking along the lines of "migrate" constraints, which work by
> querying the uname of the respective peer.
>
> If you could have pacemaker delay the promotion based on the peer's
> uname, you could avoid the risk you mention (and personally, I feel this
> is a mite less of a hack, although still bad enough).
>
> Cheers,
> Felix

It should produce a one-second delay then, as tests showed that a shorter
delay still produces a split brain.
Agreed that both are quite ugly, but if it works with no further
consequences? ...


cheers,

B.



[Pacemaker] crm abend with message AttributeError: 'NoneType' object has no attribute 'values'

2011-10-13 Thread Kulovits Christian - OS ITSC
Hello,

I tried an "edit xml" within crm, and after a vi :wq the crm shell
terminated with the following messages. The shadow CIB is an unchanged copy of
the active CIB.

shadow[HubNew] # crm
crm(HubNew)# options sort-elements no
crm(HubNew)# configure 
crm(HubNew)configure# edit xml
Traceback (most recent call last):
  File "/usr/sbin/crm", line 41, in ?
    crm.main.run()
  File "/usr/lib64/python2.4/site-packages/crm/main.py", line 283, in run
    if not parse_line(levels,shlex.split(inp)):
  File "/usr/lib64/python2.4/site-packages/crm/main.py", line 144, in parse_line
    rv = d() # execute the command
  File "/usr/lib64/python2.4/site-packages/crm/main.py", line 143, in <lambda>
    d = lambda: cmd[0](*args)
  File "/usr/lib64/python2.4/site-packages/crm/ui.py", line 1320, in edit
    return set_obj.edit()
  File "/usr/lib64/python2.4/site-packages/crm/cibconfig.py", line 153, in edit
    return self.edit_save(s)
  File "/usr/lib64/python2.4/site-packages/crm/cibconfig.py", line 138, in edit_save
    if not self.save(s):
  File "/usr/lib64/python2.4/site-packages/crm/cibconfig.py", line 390, in save
    doc.unlink()
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 1573, in unlink
    Node.unlink(self)
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 274, in unlink
    child.unlink()
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 681, in unlink
    Node.unlink(self)
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 274, in unlink
    child.unlink()
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 681, in unlink
    Node.unlink(self)
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 274, in unlink
    child.unlink()
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 681, in unlink
    Node.unlink(self)
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 274, in unlink
    child.unlink()
  File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 677, in unlink
    for attr in self._attrs.values():
AttributeError: 'NoneType' object has no attribute 'values'
shadow[HubNew] #

regards, Christian


With best regards
Christian Kulovits

AUSTRIAN AIRLINES
Christian Kulovits
ITSC Central System & Data Base Services
Senior IT System Engineer
Head Office
Office Park 2, P.O. Box 100
1300 Vienna-Airport, Austria

Phone: +43 (0)5 1766 11557
Fax: +43 (0)5 1766 511557
Mobile: +43 (0)664 80111 11557
email: christian.kulov...@austrian.com
www: www.austrian.com
 







Re: [Pacemaker] [DRBD-user] examples of dual primary DRBD

2011-10-13 Thread Bart Coninckx

On 10/12/11 09:39, Florian Haas wrote:
> On 2011-10-11 09:12, Bart Coninckx wrote:
> > Florian,
> >
> > Does this mean you thought this problem could have been the result of
> > changes done by Andrew to the DRBD RA? But since he hasn't done them
> > yet, it isn't?
>
> You're right, I had been thinking your issues may be due to a change in
> the way master/slave sets are being promoted on startup. Since that
> change was in fact never made to the Pacemaker codebase, we can rule out
> that possibility.
>
> Florian

OK, I see. Can my workaround be considered safe? Possible pitfalls I see
are upgrades that overwrite the DRBD resource agent script with the 1
second delay. Besides that I would expect this to be fine?



thx,


B.
