Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

2007-04-26 Thread Andrew Beekhof

On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:

- Forwarded message from Simon Horman [EMAIL PROTECTED] -

Date: Mon, 23 Apr 2007 11:25:36 +0900
From: Simon Horman [EMAIL PROTECTED]
To: Erich Schubert [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
 Package: heartbeat-2
 Version: 2.0.7-2
 Severity: normal

 The IPAddr2 script contains bashisms.
 /usr/lib/ocf/resource.d/heartbeat/IPaddr2:

 
IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}

 Hotfix: replace /bin/sh in the first line by /bin/bash
 Other scripts might be affected as well.

Thanks, I'll get this fixed. Please let me know if you find any more.


i'll push up a fix momentarily
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

2007-04-26 Thread Dejan Muhamedagic
On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
 i'll push up a fix momentarily

Since IPaddr2 is Linux specific, I guess it's OK to have it run by
bash.


-- 
Dejan


Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

2007-04-26 Thread Andrew Beekhof

On 4/26/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote:

On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:

 i'll push up a fix momentarily

Since IPaddr2 is Linux specific, I guess it's OK to have it run by
bash.


I believe bash is available for most platforms;
it's even the default on OS X (a BSD variant).


Re: [Linux-ha-dev] Re: Bug#420637: heartbeat-2: File descriptor leak?

2007-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 26, 2007 at 11:14:46AM +0900, Simon Horman wrote:
 On Tue, Apr 24, 2007 at 09:51:45AM +0900, Simon Horman wrote:
  forwarded 420637 [EMAIL PROTECTED]
  thanks
  
  On Mon, Apr 23, 2007 at 07:28:53PM +0200, Erich Schubert wrote:
   Package: heartbeat-2
   Version: 2.0.7-2
   Severity: normal
   
   It seems that heartbeat-2 leaks a file descriptor to its child
   processes. From the SELinux audit log:
   
   avc:  denied  { read } for  pid=2403 comm=ip name=heartbeat.pid
   dev=ida/c0d0p5 ino=86181 scontext=root:system_r:ifconfig_t:s0
   tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file
   
   avc:  denied  { read } for  pid=3210 comm=rndc name=heartbeat.pid
   dev=ida/c0d0p5 ino=86181 scontext=root:system_r:ndc_t:s0
   tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file
   
   avc:  denied  { read } for  pid=3303 comm=openvpn name=heartbeat.pid
   dev=ida/c0d0p5 ino=86181 scontext=root:system_r:openvpn_t:s0
   tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file

I don't speak SELinux: does comm= denote the program? If so, I suppose
that ip is from IPaddr2. Do you have openvpn and bind in your
heartbeat config? Perhaps you could also post your heartbeat
configuration (ha.cf and haresources/cib.xml).

Thanks.


   
   The best explanation for these errors I have is that a file descriptor
   (such as STDIN) of these processes points to the heartbeat.pid file.
   I haven't verified it in the heartbeat-2 code yet. It's not very likely
   that this is exploitable; the heartbeat scripts are started with root
   privileges anyway. But in theory it could be possible to trick one of
   these scripts into writing a different PID into the pidfile maybe?
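
The inheritance mechanism suspected here can be reproduced in a few lines of shell. This is only a generic sketch, not heartbeat's code: the temporary file stands in for heartbeat.pid, and fd 3 is an arbitrary choice. A descriptor opened in the parent is visible to a child unless it is explicitly closed for that child.

```shell
#!/bin/sh
# Stand-in for a pid file (assumption: any readable file will do).
tmp=$(mktemp)
echo 1234 > "$tmp"

# The parent opens the file on fd 3 and keeps it open.
exec 3< "$tmp"

# A child started now inherits fd 3 and can read the pid from it.
inherited=$(sh -c 'read pid <&3 && echo "$pid"')

# Closing fd 3 for this one child (3<&-) hides the file from it.
closed=$(sh -c 'read pid <&3 && echo "$pid"' 3<&- 2>/dev/null || echo closed)

# Clean up: close the descriptor in the parent and remove the file.
exec 3<&-
rm -f "$tmp"

echo "with inherited fd: $inherited"
echo "with fd closed:    $closed"
```

An SELinux policy that does not allow the child's domain to read the pid file's type would log a denial in the first case, which would match the audit messages above.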
  
  Hi Eric,
  
  that does indeed look like a bit of a problem. Thanks for reporting it.
  Hopefully it isn't too hard to track down and fix.
  
  I'm CCing the linux-ha-dev list so their eyes pass over this problem.
 
 Re CCing, as I used the wrong address the first time around.
 
 -- 
 Horms
   H: http://www.vergenet.net/~horms/
   W: http://www.valinux.co.jp/en/
 

-- 
Dejan


Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

2007-04-26 Thread David Lee
On Thu, 26 Apr 2007, Dejan Muhamedagic wrote:

 On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
 
  i'll push up a fix momentarily

 Since IPaddr2 is Linux specific, I guess it's OK to have it run by
 bash.


Executive summary of what follows: a cautious agreement with that:
  Since IPaddr2 is Linux specific, I guess it's OK to have ... bash.


Now the waffle: feel free to hit delete:

A personal view (coming from a portability angle and a Solaris angle):

In general, I would usually argue for Bourne-only, avoiding bash where
reasonably possible, in line with GNU portability recommendations.

But if IPaddr2 really is Linux specific then I'd be OK with bash in this
defined instance, if it makes the insides of the script significantly
cleaner and clearer (more understandable and more maintainable).


The GNU portability purists would argue for Bourne, and discourage bash,
based on the fact that every UN*X-like OS has Bourne, but only some have
bash.  Personally, I try to follow that where reasonably possible, to keep
things portable, including in heartbeat.

As a counter-example: They would also argue against using shell functions
because (apparently) some Bournes lack them.  Personally, I don't bother
following that one, including in heartbeat, because most real world
Bournes these days seem to have shell functions.  Indeed, I've added some
myself to heartbeat down the years (and in an email earlier this week, I
suggested adding another).


Solaris?  When I started with heartbeat, Solaris versions of the time
lacked native bash.  These days, Solaris distributions include bash,
(although not necessarily within the default installation set but at least
these days it's easily installable).

So in the heartbeat context nowadays, I would usually continue to advise
against bash-isms where reasonably possible (for OSes that may still lack
native (or natively available) bash) but in favour of shell functions
(because they tend to add significant clarity, with no apparent loss to
likely OSes).  In the case of a known Linux-only script, then bash is
probably OK if its use adds value (clarity, maintainability, etc.).

(Hope you don't mind that rambling piece of background waffle!)


-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :


Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2

2007-04-26 Thread Dejan Muhamedagic
On Thu, Apr 26, 2007 at 04:42:13PM +0100, David Lee wrote:
 (Hope you don't mind that rambling piece of background waffle!)

No, not at all. I think that this is important and I'd second your
opinion.

This is not meant as an argument, but I don't even understand the
IF_MAC line from the bug report; I just guess that it's something
about splitting something into something. However, that script is,
I believe, usable only on Linux.
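
For reference, the reported line uses bash's ${var:offset:length} substring expansion to cut a 12-digit MAC into six two-character pieces and join them with colons. A portable equivalent might look like the sketch below. The function name mac_with_colons is made up for illustration, and this is not necessarily the fix that went into IPaddr2.

```shell
#!/bin/sh
# Hypothetical helper: turn 12 hex digits into colon-separated form
# without bash-only substring expansion. sed captures six groups of
# two characters each and rejoins them with ':'.
mac_with_colons() {
    printf '%s\n' "$1" |
        sed 's/\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)/\1:\2:\3:\4:\5:\6/'
}

mac_with_colons 00137258745F   # prints 00:13:72:58:74:5F
```

This costs one extra process per call, which is the usual trade-off for staying within plain sh.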

One thing which I'm really missing is variables local to a
function (typeset or local). On the one hand, it is easy to mix up
variable names and on the other very tedious to keep track of all
the variables in a long script.
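
One workaround that stays within plain sh, offered only as a sketch: write the function body in parentheses instead of braces, so that it runs in a subshell and every assignment inside it is discarded on return. The names count and count_args are invented for the example.

```shell
#!/bin/sh
count=10            # "global" variable that must survive the call

# The body is ( ... ) rather than { ... }: it runs in a subshell,
# so its assignments never reach the caller.
count_args() (
    count=0
    for arg in "$@"; do
        count=$((count + 1))
    done
    echo "$count"
)

count_args a b c    # prints 3
echo "$count"       # prints 10: the caller's value is untouched
```

The price is that such a function can hand results back only through its output or exit status, and each call forks.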

It is not clear which standard should be followed. I think that
there's also something like POSIX shell, but no idea how
widespread.

 
 

-- 
Dejan


Re: [Linux-HA] How to add op to existing master/slave tag from command line

2007-04-26 Thread Andrew Beekhof

On 4/25/07, Doug Knight [EMAIL PROTECTED] wrote:

Can someone provide an example of xml to be used with cibadmin to add an
op tag to an existing master/slave resource? Here's my master/slave
definition:

 <master_slave notify="true" id="ms_drbd_7788">
   <instance_attributes id="ms_drbd_7788_instance_attrs">
     <attributes>
       <nvpair id="ms_drbd_7788_clone_max" name="clone_max" value="2"/>
       <nvpair id="ms_drbd_7788_clone_node_max" name="clone_node_max" value="1"/>
       <nvpair id="ms_drbd_7788_master_max" name="master_max" value="1"/>
       <nvpair id="ms_drbd_7788_master_node_max" name="master_node_max" value="1"/>
       <nvpair name="target_role" id="ms_drbd_7788_target_role" value="stopped"/>
     </attributes>
   </instance_attributes>
   <primitive class="ocf" type="drbd" provider="heartbeat" id="rsc_drbd_7788">
     <instance_attributes id="rsc_drbd_7788_instance_attrs">
       <attributes>
         <nvpair id="fdb586b1-d439-4dfb-867c-3eefbe5d585f" name="drbd_resource" value="pgsql"/>
         <nvpair name="target_role" id="rsc_drbd_7788:0_target_role" value="stopped"/>
       </attributes>
     </instance_attributes>
   </primitive>
 </master_slave>

And for example, I'd like to add:

<op id="drbd_mon_sl" name="monitor" timeout="60" role="Slave" interval="30"/>

So that I can do:

cibadmin -U -x add_mon_ssl.xml

I've been trying to add it from the command line, and some of my
attempts are core dumping with the following:

crm_abort: crm_str_eq: Triggered fatal assert at utils.c:686 : a != b


It's rather hard to comment without:
* the contents of add_mon_ssl.xml
* the stacktrace
* the version



Thanks,
Doug

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems




Re: [Linux-HA] Location constraints

2007-04-26 Thread Benjamin Watine

Dejan Muhamedagic wrote:

On Wed, Apr 25, 2007 at 05:59:12PM +0200, Benjamin Watine wrote:

Dejan Muhamedagic wrote:

On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote:
You were right: it wasn't a score problem. My IPv6 resource causes an
error, which leaves the resource group unstarted.


Without IPv6, all is OK, and Heartbeat's behaviour fits my needs (start on
the preferred node (castor), and fail over after 3 failures). So my problem
is IPv6 now.


The script seems to have a problem :

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast): 
0x0050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51:  4764 Aborted 
  $__SCRIPT_NAME start

2007/04/25_11:43:29 ERROR:  Unknown error: 134
ERROR:  Unknown error: 134

Now ifconfig shows that the IPv6 address is configured correctly, but the
script still exits with an error code.

IPv6addr aborts, hence the exit code 134 (128+signo). Somebody
recently posted a set of patches for IPv6addr... Right, I'm cc-ing
this to Horms.
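
The arithmetic behind that remark, as a small sketch: a status above 128 reported by the shell means the process was killed by signal (status - 128), so 134 corresponds to signal 6. The helper name describe_status is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: interpret an exit status as reported by the shell.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Shells report death by signal N as status 128 + N.
        signo=$((status - 128))
        echo "terminated by signal $signo"
    else
        echo "exited with status $status"
    fi
}

describe_status 134   # prints: terminated by signal 6
kill -l 6             # prints the name of signal 6 (ABRT on Linux)
```

On Linux, signal 6 is SIGABRT, which is what glibc raises through abort() after printing a "glibc detected" heap corruption message like the one above.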

Thank you so much, I'll wait for Horms then. I'll also take a look at the
list archive.


BTW, wasn't there also a core dump for this case too? Could you do
a ls -R /var/lib/heartbeat/cores and check.



I don't know how to find the core dump :/ In this case, should it be
core.22560?


# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast): 
0x0050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51: 22560 Aborted 
   $__SCRIPT_NAME start

2007/04/26_10:46:38 ERROR:  Unknown error: 134
ERROR:  Unknown error: 134
[EMAIL PROTECTED] ls -R /var/lib/heartbeat/cores
/var/lib/heartbeat/cores:
hacluster  nobody  root

/var/lib/heartbeat/cores/hacluster:
core.3620  core.4116  core.4119  core.4123  core.5262  core.5265 
core.5269  core.5272

core.3626  core.4117  core.4121  core.4124  core.5263  core.5266  core.5270
core.3829  core.4118  core.4122  core.5256  core.5264  core.5268  core.5271

/var/lib/heartbeat/cores/nobody:

/var/lib/heartbeat/cores/root:
core.10766  core.21816  core.29951  core.3642  core.3650  core.3658 
core.3667  core.4471
core.11379  core.23505  core.30813  core.3643  core.3651  core.3661 
core.3668  core.4550
core.11592  core.24403  core.31033  core.3645  core.3652  core.3663 
core.4234  core.5104
core.12928  core.24863  core.3489   core.3647  core.3653  core.3664 
core.4371  core.5761
core.15849  core.25786  core.3591   core.3648  core.3654  core.3665 
core.4394  core.6130
core.21501  core.28286  core.3610   core.3649  core.3657  core.3666 
core.4470

[EMAIL PROTECTED]



# ifconfig
eth0  Lien encap:Ethernet  HWaddr 00:13:72:58:74:5F
 inet adr:193.48.169.46  Bcast:193.48.169.63 
Masque:255.255.255.224

 adr inet6: 2001:660:6301:301:213:72ff:fe58:745f/64 Scope:Global
 adr inet6: fe80::213:72ff:fe58:745f/64 Scope:Lien
 adr inet6: 2001:660:6301:301::47:1/64 Scope:Global
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:3788 errors:0 dropped:0 overruns:0 frame:0
 TX packets:3992 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 lg file transmission:1000
 RX bytes:450820 (440.2 KiB)  TX bytes:844188 (824.4 KiB)
 Adresse de base:0xecc0 Mémoire:fe6e-fe70

And if I launch the script again, no error is returned :

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
2007/04/25_11:45:23 INFO:  Success
INFO:  Success


So, you're saying that once the resource is running, starting it
again doesn't produce an error? Did you also try to stop it and
start it from the stopped state?



Yes, but probably because the script just checks that the IPv6 address is
set, and so doesn't try to set it again. If I stop and start again, the
error occurs.


As for the other errors, I have disabled stonith for the moment, and DRBD
is built into the kernel, so the drbd module is not needed. I've seen this
message, but it's not a problem.

There's a small problem with the stonith suicide agent, which
renders it unusable, but it is soon to be fixed.

OK, that's what I had read on this list, but I wasn't sure. Is there a
patch yet?


I attached the log and config, and the stonith core file
(/var/lib/heartbeat/cores/root/core.3668). Is that what you asked for
(a backtrace from the stonith core dump)?

You shouldn't be sending core dumps to a public list: it may
contain sensitive information. What I asked for, a backtrace, you
get like this:

$ gdb /usr/lib64/heartbeat/stonithd core.3668
(gdb) bt
...   here comes the backtrace
(gdb) quit


Oops! Here it is:

#0  0x0039b9d03507 in stonith_free_hostlist () from 
/usr/lib64/libstonith.so.1

#1  0x00408a95 in ?? ()
#2  0x00407fee in ?? ()
#3  0x004073c3 in ?? ()
#4  0x0040539d in ?? ()
#5  0x00405015 in ?? ()
#6  0x0039b950abd4 in G_CH_dispatch_int () from /usr/lib64/libplumb.so.1
#7  0x003a12a266bd in g_main_context_dispatch () from 

Re: [Linux-HA] Location constraints

2007-04-26 Thread Benjamin Watine

Simon Horman wrote:

On Wed, Apr 25, 2007 at 04:25:48PM +0200, Dejan Muhamedagic wrote:


IPv6addr aborts, hence the exit code 134 (128+signo). Somebody
recently posted a set of patches for IPv6addr... Right, I'm cc-ing
this to Horms.


Hi,

thanks for CCing me on this, I don't peruse the linux-ha list very often
and I certainly would have missed it otherwise.

Looking over the patches that I applied to IPv6addr recently,
the following two fix potential crash bugs, though I don't think
either of them relate to free() calls, so I doubt that they will resolve
your problem.

http://hg.linux-ha.org/dev/rev/37271ae7f117
http://hg.linux-ha.org/dev/rev/b4bc188b4ebe

I did however find a crash bug relating to free in the version of
libnet that I was using. You can find a fairly lengthy discussion and
a proposed fix at:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418975

In summary: on Debian Etch, the problem resulted in a crash on amd64.
It did not manifest in a crash on i386.  I will raise this issue with
the upstream libnet maintainer, as I think that the problem is present
in the latest versions of his code.

Assuming that this does not solve your problem, what would help me
immensely is the following information.



I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my
problem.



1) What version of linux-ha and libnet you are using
   and where you got them from.


Heartbeat v2.0.8 x86_64 from CentOS package 
(http://mirror.centos.org/centos/4/extras/x86_64/RPMS/) before, but now 
Heartbeat v2.0.8 from sources 
(http://linux-ha.org/download/heartbeat-2.0.8.tar.gz)


Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/


2) What architecture you are using.


I'm running on RedHat ES4 x86_64


3) If you could provide a backtrace of the crash, preferably using
   versions of linux-ha and libnet that have been recompiled with
   debugging symbols.  (In the general case this means adding -g to
   CFLAGS, then rebuilding from scratch, including rerunning ./configure).


I've rebuilt Heartbeat from sources with debugging enabled (the -g option
was already in CFLAGS, if I'm not mistaken), but I don't know how to get a
backtrace :/


I've tried to do :

gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
run 2001:660:6301:301::47:1 start
Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 
2001:660:6301:301::47:1 start

[Thread debugging using libthread_db enabled]
[New Thread 47165808758720 (LWP 4360)]
usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 
{start|stop|status|monitor|validate-all|meta-data}


Program exited with code 02.

What is the usage of the IPv6addr executable? It works through its resource
agent (/etc/ha.d/resource.d/IPv6addr (IPv6) start), but not when I run the
executable directly. How can I get a backtrace of IPv6addr?



4) Please Cc me on mail regarding this :)



done :)

Thanks !


[Linux-HA] Heartbeat and EVMS cluster

2007-04-26 Thread Jose Jerez

Hello,

I'm trying to set up a cluster system with two machines and shared
storage (all on SLES10 with Heartbeat 2.0.7).

For the shared storage there is an ISCSI target available to both
machines and the  management of this common device is done through
EVMS.  So far I managed to set up a private segment container using
the cluster segment manager (CSM) in EVMS; for this container to be
available heartbeat has to be started before the evms volumes can be
activated, and the following two lines must be present in the ha.cf
config file:

respawn root /sbin/evmsd
apiauth evms uid=hacluster,root

Now the private segment container is available and accessible to one
node only, which is how it is supposed to be.

So far so good, but now my question is: how do I configure heartbeat
to move this private segment container from one node to the other? Is
there an RA for this, or am I getting ahead of the project?

I don't think that the filesystem type should be a problem since I'm
using a private container, right?

I know STONITH is important here, I'll take care of this later.

A bit lost here, any helping hand?

Kind regards

Jose Jerez.


Re: [Linux-HA] mysql drbd and SAN all together

2007-04-26 Thread Jan Kalcic
Andrew Beekhof wrote:
 On 4/25/07, Jan Kalcic [EMAIL PROTECTED] wrote:
 Hi,

 After some tests in my lab I now have a two-node cluster working
 perfectly, where I created a virtual IP resource using hb_gui. I also
 created a drbd partition which is working correctly but not yet included
 as a heartbeat resource. This is the next step I'm going to take.

 Now, following the documentation and other sources, I found a simple way
 to create a drbd resource for heartbeat, which consists of a single line
 in the haresources file. But as far as I understand, this is no longer
 used in heartbeat 2, where resources are configured in cib.xml, right? I
 also noticed it's possible to configure it with hb_gui, but I don't know
 the right steps to take, and I actually get confused by the different
 native resources for drbd.

 use haresources2cib.py to convert from the old version to the new


If it worked it would be really great. The cib.xml file created during
conversion has something wrong with it, as I can no longer connect to the
server from hb_gui.

The line in haresources looks like:

node1 IPaddr::192.168.1.93 drbddisk httpd

The output in cib.xml is:

<?xml version="1.0" ?>
<cib>
  <configuration>
    <crm_config>
      <nvpair id="transition_idle_timeout" name="transition_idle_timeout" value="120s"/>
      <nvpair id="symmetric_cluster" name="symmetric_cluster" value="true"/>
      <nvpair id="no_quorum_policy" name="no_quorum_policy" value="stop"/>
    </crm_config>
    <nodes/>
    <resources>
      <group id="group_1">
        <primitive class="ocf" id="IPaddr_1" provider="heartbeat" type="IPaddr">
          <operations>
            <op id="IPaddr_1_mon" interval="5s" name="monitor" timeout="5s"/>
          </operations>
          <instance_attributes>
            <attributes>
              <nvpair id="IPaddr_1_attr_0" name="ip" value="192.168.1.93"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="heartbeat" id="drbddisk_2" provider="heartbeat" type="drbddisk">
          <operations>
            <op id="drbddisk_2_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
        </primitive>
        <primitive class="heartbeat" id="httpd_3" provider="heartbeat" type="httpd">
          <operations>
            <op id="httpd_3_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="rsc_location_group_1" rsc="group_1">
        <rule id="prefered_location_group_1" score="100">
          <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="node1"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
  <status/>
</cib>

I've also tried with the line below in haresources but it doesn't work
anyway.

nodo1 drbddisk::r0 Filesystem::/dev/drbd0::/drbdmount::ext3 192.168.0.1 httpd


Does anybody have a similar, working configuration file that they can share?

Jan 






Re: [Linux-HA] fast hb_standby hb_takeover - lock

2007-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 26, 2007 at 09:32:42AM +0200, Hannes Dorbath wrote:
 Hello,
 
 I'm running Heartbeat 2.0.8 with a V1 style config. It works fine, 
 besides a single thing, I'd like to get some clarification about:
 
 When I do hb_standby on machine A, and then run hb_takeover again before
 the resource takeover has completed, I get the cluster into a situation
 where any standby or takeover requests are ignored on both sides.
 
 I get a message on machine B that it ignores the takeover request from 
 machine A as resources are in flux. That makes sense, but after the 
 resource takeover has completed they still ignore requests. Both sides 
 display a timer of 3600 seconds or something before they will do 
 anything again.
 
 What is the correct way to recover from that? Both sides refuse to 
 accept standby or takeover requests, and both sides refuse to stop 
 Heartbeat. I need to kill Heartbeat and restart it, so that they are 
 happy again.

Any logs out there? Configuration perhaps?

Thanks.

 
 
 Thanks.
 
 
 
 -- 
 Regards,
 Hannes Dorbath

-- 
Dejan


Re: [Linux-HA] fast hb_standby hb_takeover - lock

2007-04-26 Thread Hannes Dorbath

On 26.04.2007 15:54, Dejan Muhamedagic wrote:

Any logs out there? Configuration perhaps?


I'll post both in 1-2 hours when I'm at that location again. I just 
thought this might be something known / expected.


Thanks.


--
Regards,
Hannes Dorbath


[Linux-HA] IPv6addr fail

2007-04-26 Thread Benjamin Watine
OK. I've launched IPv6addr again (both standalone and managed by HB). It
crashes, but no core dump seems to have been generated today: there is no
file from today in the core dir. I don't know why.


The file command shows me some old IPv6addr core dumps; you can find
their backtraces in the tar.gz generated by your script.


I have attached the output of file root/* so that you can find the files
easily. All these core dumps were generated only by stonithd, IPv6addr,
and pidof.


Regards

Ben

Dejan Muhamedagic wrote:

On Thu, Apr 26, 2007 at 10:54:36AM +0200, Benjamin Watine wrote:

Dejan Muhamedagic wrote:

On Wed, Apr 25, 2007 at 05:59:12PM +0200, Benjamin Watine wrote:

Dejan Muhamedagic wrote:

On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote:
You were right: it wasn't a score problem. My IPv6 resource causes an
error, which leaves the resource group unstarted.


Without IPv6 everything is OK, and Heartbeat's behaviour fits my needs
(start on the preferred node (castor), and fail over after 3 failures).
So my problem is now IPv6.


The script seems to have a problem:

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast): 0x0050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51:  4764 Aborted   $__SCRIPT_NAME start

2007/04/25_11:43:29 ERROR:  Unknown error: 134
ERROR:  Unknown error: 134

Even so, ifconfig now shows that the IPv6 address is configured
correctly, although the script exits with an error code.

IPv6addr aborts, hence the exit code 134 (128+signo). Somebody
recently posted a set of patches for IPv6addr... Right, I'm cc-ing
this to Horms.
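
To make the 128+signo arithmetic concrete, here is a minimal sketch in
plain sh. The function name signal_from_status is made up for
illustration and is not part of Heartbeat; the only thing it relies on
is the standard kill -l, which maps a signal number to its name.

```shell
#!/bin/sh
# signal_from_status is a hypothetical helper, not shipped with Heartbeat.
# A shell reports "killed by signal N" as exit status 128+N.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the name of that signal
        kill -l $((status - 128))
    else
        echo "exited with status $status (not killed by a signal)"
    fi
}

# The status reported by the failing IPv6addr run
signal_from_status 134
```

On Linux, 134 - 128 = 6, which is SIGABRT, matching the "Aborted"
message printed after glibc detects the heap corruption.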

Thank you so much. I'll wait for Horms, then, and I'll also take a look
at the list archive.

BTW, wasn't there a core dump for this case too? Could you do
an ls -R /var/lib/heartbeat/cores and check?

I don't know how to find the core dump :/ In this case, should it be
core.22560?


Some newer releases of file(1) show the program name which dumped
the core:

$ file core.6468 
core.6468: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'gaim'


Also, you can match the timestamps of the core files against those in the logs.
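
A rough sketch of both suggestions, assuming the cores live under
/var/lib/heartbeat/cores as in this thread. list_cores is a
hypothetical helper name. It lists the core files oldest first with
their modification times, so they can be compared with log timestamps,
and then runs file(1) on each one if it is installed.

```shell
#!/bin/sh
# list_cores is a hypothetical helper, not shipped with Heartbeat.
list_cores() {
    dir=${1:-/var/lib/heartbeat/cores}
    # Long listing sorted by modification time, oldest first
    find "$dir" -type f -name 'core.*' -exec ls -ltr {} +
    # Identify the program behind each dump when file(1) is available
    if command -v file >/dev/null 2>&1; then
        find "$dir" -type f -name 'core.*' -exec file {} +
    fi
}
```

Run it as list_cores /var/lib/heartbeat/cores. Whether file(1) prints
the "from '...'" part depends on its version, as noted above.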


# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast): 0x0050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51: 22560 Aborted   $__SCRIPT_NAME start

2007/04/26_10:46:38 ERROR:  Unknown error: 134
ERROR:  Unknown error: 134
[EMAIL PROTECTED] ls -R /var/lib/heartbeat/cores
/var/lib/heartbeat/cores:
hacluster  nobody  root

/var/lib/heartbeat/cores/hacluster:
core.3620  core.4116  core.4119  core.4123  core.5262  core.5265  core.5269  core.5272
core.3626  core.4117  core.4121  core.4124  core.5263  core.5266  core.5270
core.3829  core.4118  core.4122  core.5256  core.5264  core.5268  core.5271

/var/lib/heartbeat/cores/nobody:

/var/lib/heartbeat/cores/root:
core.10766  core.21816  core.29951  core.3642  core.3650  core.3658  core.3667  core.4471
core.11379  core.23505  core.30813  core.3643  core.3651  core.3661  core.3668  core.4550
core.11592  core.24403  core.31033  core.3645  core.3652  core.3663  core.4234  core.5104
core.12928  core.24863  core.3489   core.3647  core.3653  core.3664  core.4371  core.5761
core.15849  core.25786  core.3591   core.3648  core.3654  core.3665  core.4394  core.6130
core.21501  core.28286  core.3610   core.3649  core.3657  core.3666  core.4470

[EMAIL PROTECTED]


Well, you have quite a few. Let's hope that they stem from only
those two errors.

I'll attach a script which should generate all backtraces from your
core files. It's been lightly tested but should work.


# ifconfig
eth0  Lien encap:Ethernet  HWaddr 00:13:72:58:74:5F
inet adr:193.48.169.46  Bcast:193.48.169.63  Masque:255.255.255.224
adr inet6: 2001:660:6301:301:213:72ff:fe58:745f/64 Scope:Global
adr inet6: fe80::213:72ff:fe58:745f/64 Scope:Lien
adr inet6: 2001:660:6301:301::47:1/64 Scope:Global
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:3788 errors:0 dropped:0 overruns:0 frame:0
TX packets:3992 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 lg file transmission:1000
RX bytes:450820 (440.2 KiB)  TX bytes:844188 (824.4 KiB)
Adresse de base:0xecc0 Mémoire:fe6e-fe70

And if I launch the script again, no error is returned:

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
2007/04/25_11:45:23 INFO:  Success
INFO:  Success

So, you're saying that once the resource is running, starting it
again doesn't produce an error? Did you also try to stop it and
start it from the stopped state?

Yes, but probably because the script just checks that the IPv6 address
is set, and so doesn't try to set it again. If I stop and start it
again, the error occurs.


As for the other errors, I have disabled stonith for the moment, and
DRBD is built into the kernel, so the drbd module is not needed. I've
seen this message, but it's not a problem.

There's a small problem 

[Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8

2007-04-26 Thread Alex Strachan
Error in /var/log/messages 

 

Apr 27 11:07:26 deneb crmd: [3038]: info: process_lrm_event: LRM operation resource_itsapaims_skel1_start_0 (call=134, rc=0) complete
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Expected: action
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: Mismatching close tag
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before: /actions /resourc
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: error parsing child
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before: / action name=mon
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: error parsing child
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before:  actions action
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:77 : root != NULL
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: cl_get_value: wrong arugment (__name__)
Apr 27 11:07:26 deneb crmd: [3038]: WARN: find_xml_node: Could not find actions in (null).

 

 

Definition of the primitive

 

 <primitive class="ocf" type="itsapaims_ISskel" provider="heartbeat" restart_type="ignore" id="resource_itsapaims_skel1">
   <instance_attributes id="resource_itsapaims_skel1_instance_attrs">
     <attributes>
       <nvpair id="resource_itsapaims_skel1_user" name="isskel_user" value="sadmin"/>
       <nvpair id="resource_itsapaims_skel1_id" name="isskel_id" value="Skel_Server1"/>
       <nvpair id="resource_itsapaims_skel1_log" name="isskel_log" value="isskel1"/>
       <nvpair id="resource_itsapaims_skel1_sys" name="isskel_sys" value="FIDS"/>
       <nvpair id="resource_itsapaims_skel1_port" name="isskel_port" value="11431"/>
       <nvpair id="resource_itsapaims_skel1_config" name="isskel_config" value="/u/fids/data/isskel1.cfg"/>
     </attributes>
   </instance_attributes>
   <operations>
     <op id="skel1_itsapaims_skel1_mon" interval="60s" name="monitor" timeout="60s" on_fail="restart"/>
   </operations>
   <instance_attributes id="resource_itsapaims_skel1">
     <attributes>
       <nvpair name="is_managed" id="resource_itsapaims_skel1-is_managed" value="true"/>
     </attributes>
   </instance_attributes>
 </primitive>

 

 

When I look at the output of 'cibadmin -Q' I don't see any actions tags.

 

[EMAIL PROTECTED] hb]# cibadmin -Q | grep -i actions
   <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
[EMAIL PROTECTED] hb]# cibadmin -Q | grep -i action
   <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="240"/>
   <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
   <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
   <rsc_order id="order_itsapaims_itsapaims1" from="resource_itsapaims1_aims" to="group_itsapaims" action="start" type="before" symmetrical="true"/>
   <rsc_order id="order_itsapaims_itsapaims2" from="resource_itsapaims2_aims" to="group_itsapaims" action="start" type="before" symmetrical="true"/>

 

 

Any ideas? Are the names too long?

 

 

 

Alex

saved-cib.xml.gz
Description: GNU Zip compressed data

Re: [Linux-HA] Location constraints

2007-04-26 Thread Simon Horman
On Thu, Apr 26, 2007 at 12:08:29PM +0200, Benjamin Watine wrote:
 Simon Horman wrote:

[snip]

 I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my 
 problem.
 
 1) What version of linux-ha and libnet you are using
and where you got them from.
 
 Heartbeat v2.0.8 x86_64 from CentOS package 
 (http://mirror.centos.org/centos/4/extras/x86_64/RPMS/) before, but now 
 Heartbeat v2.0.8 from sources 
 (http://linux-ha.org/download/heartbeat-2.0.8.tar.gz)
 
 Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/
 
 2) What architecture you are using.
 
 I'm running on RedHat ES4 x86_64
 
 3) If you could provide a backtrace of the crash, preferably using
versions of linux-ha and libnet that have been recompiled with
debugging symbols.  (In the general case this means adding -g to
CFLAGS, then rebuilding from scratch, including rerunning ./configure).
 
 I've rebuilt Heartbeat from source with debugging enabled (the -g option
 was already in CFLAGS, if I'm not mistaken), but I don't know how to get
 a backtrace :/
 
 I tried:
 
 gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
 run 2001:660:6301:301::47:1 start
 Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 2001:660:6301:301::47:1 start
 [Thread debugging using libthread_db enabled]
 [New Thread 47165808758720 (LWP 4360)]
 usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr {start|stop|status|monitor|validate-all|meta-data}
 
 Program exited with code 02.
 
 How is the IPv6addr executable meant to be invoked? It works through its
 resource agent wrapper (/etc/ha.d/resource.d/IPv6addr (IPv6) start), but
 not when I run the executable directly. How can I get a backtrace of
 IPv6addr?

Hi,

thanks for taking some more time to look into this.

The address is passed using the environment variable
OCF_RESKEY_ipv6addr, so you want to run something like:

OCF_RESKEY_ipv6addr=2001:660:6301:301::47:1 gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
(gdb) run start

If this doesn't provide any interesting information, valgrind often does.

OCF_RESKEY_ipv6addr=2001:660:6301:301::47:1 valgrind /usr/lib/ocf/resource.d/heartbeat/IPv6addr start

Though I did put some effort into getting rid of the valgrind errors
that I saw, and those problems should be resolved in the unstable tree.
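
If capturing the backtrace non-interactively is more convenient,
something like the following sketch may help. It assumes a gdb that
supports the -batch, -ex and --args options. backtrace_agent and the
GDB override variable are names made up here for illustration.

```shell
#!/bin/sh
# backtrace_agent is a hypothetical helper, not shipped with Heartbeat.
# Set GDB to use a different debugger binary (assumption: gdb is on PATH).
backtrace_agent() {
    agent=$1
    addr=$2
    # The agent reads its address from the environment, not from argv.
    # -batch makes gdb exit once its commands finish; "run" starts the
    # agent and "bt" prints the backtrace after it stops on the abort.
    OCF_RESKEY_ipv6addr=$addr ${GDB:-gdb} -batch -ex run -ex bt --args "$agent" start
}
```

For the case in this thread that would be:
backtrace_agent /usr/lib/ocf/resource.d/heartbeat/IPv6addr 2001:660:6301:301::47:1
If the agent exits normally instead of aborting, gdb has no stack to
print.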

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
