Re: [Pacemaker] Multi-level ACLs for the CIB

2010-03-19 Thread Yan Gao

On 03/19/10 06:22, Lars Ellenberg wrote:
 On Wed, Mar 17, 2010 at 06:12:24PM +0800, Yan Gao wrote:
 After investigating, I found that Unix domain sockets provide methods to
 identify the user on the other side of a socket. That means we don't need
 PAM to do authentication for local access, and the clients don't need
 to prompt the user for a username/password and transfer it to the server.
 And the cib daemon can still run as hacluster.
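
As a rough illustration of the peer-identification mechanism described above, here is a minimal Python sketch. It assumes Linux and its SO_PEERCRED socket option; the helper name peer_credentials is made up for this example and is not part of cluster-glue or the patch.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t (int), uid_t (unsigned int), gid_t (unsigned int)
_UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process on the far side of a Unix socket.

    Linux-specific: relies on the SO_PEERCRED socket option.
    """
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(_UCRED_FORMAT)
    )
    return struct.unpack(_UCRED_FORMAT, raw)


def demo():
    """Create a connected socket pair in this process and inspect its far side."""
    near, far = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        pid, uid, gid = peer_credentials(near)
        return {"pid": pid, "uid": uid, "gid": gid, "own_pid": os.getpid()}
    finally:
        near.close()
        far.close()
```

Because the kernel fills in these values, a server can decide what a local client may do without the client sending a username or password over the socket.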

 I've improved the ipcsocket library of cluster-glue to record user's identity
 info for cib to use.

 The behavior of remote access to the cib is still like before.

 Attached are the patch for cluster-glue and the updated patch for pacemaker.
 Looking forward to your review and comments. Thanks!
 
 diff -r 5e7284501da6 -r 699b8e950cdf include/clplumbing/ipc.h
 --- a/include/clplumbing/ipc.h   Mon Mar 15 16:03:30 2010 +0100
 +++ b/include/clplumbing/ipc.h   Wed Mar 17 15:06:08 2010 +0800
 @@ -132,6 +132,8 @@
  int ch_status;  /* identify the status of channel.*/
  int refcount;   /* reference count */
  pid_t   farside_pid;/* far side pid */
 +uid_t   farside_uid;/* far side uid */
 +gid_t   farside_gid;/* far side gid */
  void*   ch_private; /* channel private data. */
  /* (may contain conn. info.) */
  IPC_Ops*ops;/* IPC_Channel function table.*/
 
 
 If you instead add the new members
 at the _end_ of the struct(s),
 it should be easier to maintain ABI compatibility.
Right. I should have thought of this compatibility issue.
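
To make the ABI point concrete, here is a small sketch using Python's ctypes. The three structures are simplified stand-ins for the channel struct in the diff (only a few members, with types assumed for illustration), and moved_fields is a hypothetical helper. It compares the byte offsets of the existing members when the new members are inserted in the middle and when they are appended at the end.

```python
import ctypes


class ChannelOld(ctypes.Structure):
    """Simplified stand-in for the struct before the patch."""
    _fields_ = [
        ("ch_status", ctypes.c_int),
        ("refcount", ctypes.c_int),
        ("farside_pid", ctypes.c_int),
        ("ch_private", ctypes.c_void_p),
    ]


class ChannelInserted(ctypes.Structure):
    """New members inserted in the middle, as in the posted patch."""
    _fields_ = [
        ("ch_status", ctypes.c_int),
        ("refcount", ctypes.c_int),
        ("farside_pid", ctypes.c_int),
        ("farside_uid", ctypes.c_uint),
        ("farside_gid", ctypes.c_uint),
        ("ch_private", ctypes.c_void_p),
    ]


class ChannelAppended(ctypes.Structure):
    """New members appended after all existing ones, as suggested in the review."""
    _fields_ = [
        ("ch_status", ctypes.c_int),
        ("refcount", ctypes.c_int),
        ("farside_pid", ctypes.c_int),
        ("ch_private", ctypes.c_void_p),
        ("farside_uid", ctypes.c_uint),
        ("farside_gid", ctypes.c_uint),
    ]


def moved_fields(old, new):
    """Names of the members of `old` whose byte offset differs in `new`."""
    return [
        name
        for name, _ctype in old._fields_
        if getattr(old, name).offset != getattr(new, name).offset
    ]
```

With the members inserted in the middle, ch_private moves, so code compiled against the old header would read the wrong bytes. With the members appended, none of the existing offsets change. The struct still grows in both cases, so callers that allocate it themselves would need separate care.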

Thanks,
  Yan
-- 
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] logd and corosync/pacemaker

2010-03-19 Thread Andreas Mock
-Original Message-
From: Dejan Muhamedagic deja...@fastmail.fm
Sent: 15.03.2010 11:01:03
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] logd and corosync/pacemaker

Hi,

On Fri, Mar 12, 2010 at 05:24:53PM +0100, Andreas Mock wrote:
 Hi all,
 
 with heartbeat it was advised to use logd for logging.
 
 a) Is that still valid for a corosync/pacemaker combination?

Yes.

 b) If yes, how is it enabled?

Set use_logd to yes in the pacemaker service stanza in
corosync.conf.


Hi Dejan,

I can't get it to work with corosync. Probably some insight is missing.
a) Can you give me an example of that stanza?
b) Which services start to log to logd if logd is enabled?
c) Does corosync also log to logd? 
d) If I enable logd, what does the paragraph
logging {
        fileline: off
        to_syslog: no
        to_stderr: no
        to_logfile: yes
        syslog_facility: daemon
        logfile: /tmp/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
in /etc/corosync/corosync.conf mean?

Help needed.   :-)

Thank you in advance
Andreas



[Pacemaker] DRBD Management Console 0.7.0

2010-03-19 Thread Rasto Levrinc
Hi,

This is the next DRBD-MC beta release, 0.7.0. DRBD-MC, which is also a
Pacemaker GUI, is a Java application that helps to configure
DRBD/Pacemaker/Corosync/Heartbeat clusters. It is compatible with everything
from Heartbeat 2.1.3 to Pacemaker 1.0.8, with both available communication
layers, and with DRBD 8.

In this release resource defaults were added. This is one of the missing
features that normal people would use, I think. You'll find it next to
the global options if your cluster software supports it.

The great new feature is the different operating modes and advanced/not
advanced modes. Depending on the operating mode, the input fields and menus
are shown, hidden, enabled or disabled. For example, an administrator who
does not configure the cluster can quickly find what he needs without worrying
about removing or changing something he shouldn't. You can start DRBD-MC with
the maximum operating mode that is allowed for the user in the whole application
and change to the lesser modes on the fly. Of course it can be easily
circumvented, so this is not a security feature, but...

Disabling and hiding of widgets, menus and whole panels should fit nicely 
with the upcoming ACLs, as soon as they figure out how to do it.

Till then, the new operating modes make the configuration and administration
of the cluster much easier, faster and almost enjoyable.

There are currently the following operating modes:

Read-Only (cmd option --ro):
read-only access is granted, you can view the cluster, add and remove 
clusters to and from your DRBD MC, but you cannot change anything on the 
cluster. This is somewhat equivalent to watching crm_mon, but way more 
informative. Additionally you can start VNC Viewer to work with Virtual 
Machines.

Operator (--op):
you can do the basic tasks like stop, start and migrate resources and put 
nodes to and out of standby, but also resolve DRBD split-brains for example. 
All configuration options are hidden. This seems to be about the
functionality that Hawk is going to have, but without operations on DRBD and
VMs.

Administrator (--admin):
this level of access can create, configure, reconfigure and destroy, as well
as operate the whole cluster, but many of the options are hidden. This
is the default operating mode.

Administrator (--admin)/ Advanced:
here are the options that are seldom needed and/or whose purpose I am not
even sure of.

Another, hidden, operating mode is God mode. As the name suggests, it is
useful only for development and testing.

http://oss.linbit.com/drbd-mc/img/drbd-mc-0.7.0.png

You can get DRBD MC here:

http://www.drbd.org/mc/management-console/
http://oss.linbit.com/drbd-mc/DMC-0.7.0.jar
http://oss.linbit.com/drbd-mc/drbd-mc-0.7.0.tar.gz

You can start it with the help of Java Web Start, or you can download it and
start it with the command java -Xmx512m -jar DMC-0.7.0.jar. Make sure you use
the Java from Sun. OpenJDK has seemed to work fine for some time now, but it
seems to run DRBD MC much slower than the original Java.

Rasto Levrinc

Here is the changelog:
* Removing of DRBD resources was fixed.
* VNC viewer menu in the cluster view was fixed.
* stonith-timeout and priority stonith attributes were added.
* stonith_ prefix for stonith devices in their ids is used.
* When group is stopped, it is indicated in the cluster view.
* master and slave target-roles for master slave resources were added.
* All missing meta-attributes for groups and clones were added.
* Advisory values from status, meta-data and validate-all operations are not 
used anymore.
* Some global CRM parameters, that didn't have defaults, were fixed.
* Different operating modes were implemented.
* Terminal frame is started as collapsed now.
* Parsing of operation defaults was added.
* Resource defaults were added.
* Metal look-and-feel is forced so that it works on Macs.
* DRBD status after start is not delayed after start-up.
* GUI helper perl script got a version to its file name, so that different 
versions of DRBD-MC can be used at the same time on one cluster.
* Smoother and faster start-up, when there are many resources.

-- 
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.




[Pacemaker] Packaging error in cluster-glue

2010-03-19 Thread Andreas Mock
Hi all,

I don't know who feels responsible for that, but I found the following:

My external stonith script ibmrsa-telnet was enhanced in several ways which
is great. I take the chance to thank all contributors.

One of the enhancements was a change in logging. From the beginning there
was a way to log the operation of the stonith script as there was no common
service for that.

Now a subprocess call to 'ha_log.sh' is made, which is part of cluster-glue
(path /usr/share/cluster-glue/ha_log.sh in cluster-glue 1.0.3).
BUT: the script is called without any path, only as ha_log.sh, which is not
in the PATH by default, nor is it put there by a post-installation script.

So, please use a full path to ha_log.sh in ibmrsa-telnet.

As I don't know how these relocatable things are handled in the build
environment, or where and how to search for the sources in the Mercurial
repositories, I would be thankful if someone could change it.

Best regards
Andreas Mock



Re: [Pacemaker] WARNING: drbd0: default-action-timeout

2010-03-19 Thread Dejan Muhamedagic
Hi,

On Thu, Mar 18, 2010 at 10:52:24PM +0100, Andreas Mock wrote:
 -Original Message-
 From: Michael Schwartzkopff mi...@multinet.de
 Sent: 18.03.2010 21:32:22
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] WARNING: drbd0: default-action-timeout
 
 Hi,
 
 This was newly introduced in 1.0.8. If your resources work you can safely ignore this.
 
 
 
 Hi Michael,
 
 is this a new feature which can be used by other RA too? Is the RA giving the
 recommendation which leads to the warning?

Yes. The feature has always been there, it's just that the
timeouts were checked only if set explicitly, but not against
the default-action-timeout. It could get noisy I'm afraid, but
you should follow the advice and fix the timeouts. Or increase
the default-action-timeout. Of course, that depends on your resources.
What is advised in the metadata of resource agents should be the
minimum timeouts for resources of that type.
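
The check described above can be pictured with a short Python sketch. This is not the crm shell's actual code: the function name timeout_warnings, its parameters and the wording used for explicitly configured timeouts are assumptions made for illustration. Only the default-action-timeout message mirrors the warning reported in this thread.

```python
def timeout_warnings(rsc_id, default_timeout, op_timeouts, advised):
    """Return warnings for actions whose effective timeout is below the advised one.

    All values are in seconds. `op_timeouts` maps action names to explicitly
    configured timeouts; actions missing from it fall back to `default_timeout`.
    `advised` maps action names to the minimum advertised in the RA metadata.
    """
    warnings = []
    for action, advised_s in advised.items():
        if action in op_timeouts:
            effective, source = op_timeouts[action], "timeout"
        else:
            effective, source = default_timeout, "default-action-timeout"
        if effective < advised_s:
            warnings.append(
                f"WARNING: {rsc_id}: {source} {effective}s for {action} "
                f"is smaller than the advised {advised_s}"
            )
    return warnings
```

Called with a 20s default, no explicit op timeouts and advised values of 240 (start) and 100 (stop), the sketch yields the two drbd0 warnings. Setting the op timeouts to the advised values, or raising the default, makes them disappear.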

If you find these warnings showing up too often, set the
check-frequency option to on-verify:

crm options check-frequency on-verify

It is also possible to set this option to never, but I'd
strongly advise against it for production clusters.

Thanks,

Dejan

 Or is it directly programmed into 
 pacemaker (or a part of it)?

 More information welcome.
 
 Best regards
 Andreas
 


Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).

2010-03-19 Thread Dejan Muhamedagic
Hi Hideo-san,

On Fri, Mar 19, 2010 at 10:46:12AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi,
 
 I use the VirtualDomain RA to run a cluster on KVM.
 
 However, a guest sometimes fails to start.
 
 Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: 
 (guest-kvm1:start:stderr) error: Failed to
 start domain kvm1 error: internal error unable to start guest: inet_listen: 
 bind(ipv4,127.0.0.1,5900):
 Address already in use inet_listen: FAILED
 Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: 
 (guest-kvm3:start:stdout) Domain kvm3 started
 Mar 16 15:16:52 x3650e crmd: [13460]: info: abort_transition_graph: 
 te_update_diff:146 - Triggered
 transition abort (complete=0, tag=transient_attributes, id=x3650f, magic=NA, 
 cib=0.102.76) : Transient
 attribute: update
 Mar 16 15:16:52 x3650e VirtualDomain[13781]: ERROR: Failed to start virtual 
 domain kvm1.
 
 Is this a problem related to libvirt?
 
 Our environment is as follows:
  * RHEL5.4-64(kvm)
   * libvirt-0.6.3-20.el5
   * libvirt-python-0.6.3-20.el5
   * libvirt-0.6.3-20.el5
  * corosync-1.2.0.zip
  * Cluster-Resource-Agents-bb7dc7b7f6e4.tar.gz
  * Pacemaker-1-0-efdc0d8143dd.tar.gz
  * Pacemaker-Python-GUI-a05fd62b2e13.tar.gz
  * Reusable-Cluster-Components-65900eaaf453.tar.gz
 
 Do you know a solution to this problem?

 Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: 
 (guest-kvm1:start:stderr) error: Failed to
 start domain kvm1 error: internal error unable to start guest: inet_listen: 
 bind(ipv4,127.0.0.1,5900):

IIRC, that port has to do with vnc and something else (another
VNC server?) has already been started on that port.
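
One way to confirm that kind of conflict is to try binding the address yourself. The following Python sketch is only an illustration; port_in_use is a made-up helper name, and it treats "bind fails with EADDRINUSE" as "something else already holds the port".

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP bind to host:port fails with 'Address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other problem, e.g. permission denied
        return False
```

For the log above, port_in_use(5900) returning True just before the guest starts would point to another listener on 127.0.0.1:5900. The probe briefly binds the port itself, so it should not be run at the very moment the guest is starting.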

Thanks,

Dejan

 
 Best Regards,
 Hideo Yamauchi.
 
 
 


Re: [Pacemaker] logd and corosync/pacemaker

2010-03-19 Thread Dejan Muhamedagic
Hi,

On Fri, Mar 19, 2010 at 10:34:27AM +0100, Andreas Mock wrote:
 -Original Message-
 From: Dejan Muhamedagic deja...@fastmail.fm
 Sent: 15.03.2010 11:01:03
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] logd and corosync/pacemaker
 
 Hi,
 
 On Fri, Mar 12, 2010 at 05:24:53PM +0100, Andreas Mock wrote:
  Hi all,
  
  with heartbeat it was advised to use logd for logging.
  
  a) Is that still valid for a corosync/pacemaker combination?
 
 Yes.
 
  b) If yes, how is it enabled?
 
 Set use_logd to yes in the pacemaker service stanza in
 corosync.conf.
 
 
 Hi Dejan,
 
 I can't get it to work with corosync. Probably some insight is missing.
 a) Can you give me an example of that stanza?

service {
        # Default to start mgmtd with pacemaker
        use_mgmtd: yes
        # Use logd for pacemaker
        use_logd: yes
        # Version
        ver: 0
        # The name of the service
        name: pacemaker
}

 b) Which services start to log to logd if logd is enabled?

All pacemaker subsystems.

 c) Does corosync also log to logd? 

No.

 d) If I enable logd, what does the paragraph
 logging {
         fileline: off
         to_syslog: no
         to_stderr: no
         to_logfile: yes
         syslog_facility: daemon
         logfile: /tmp/corosync.log
         debug: on
         timestamp: on
         logger_subsys {
                 subsys: AMF
                 debug: off
         }
 }
 in /etc/corosync/corosync.conf mean?

That's for corosync.

Thanks,

Dejan

 Help needed.   :-)
 
 Thank you in advance
 Andreas
 


Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).

2010-03-19 Thread renayama19661014
Hi Dejan,

 IIRC, that port has to do with vnc and something else (another
 VNC server?) has already been started on that port.

Thank you for the comment.
I will examine it a little more.

Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] WARNING: drbd0: default-action-timeout

2010-03-19 Thread Ken Dechick
Hi all,

  Thanks for the responses. I actually had it figured out a few minutes after
I posted - doh! I just added:
 'op start timeout=XXs op stop timeout=XXs' for the 3 primitives that were
giving this warning - obviously using the suggested defaults for each. In the
case of DRBD I simply had to change over to using:

 primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op \
 monitor interval=15s op start timeout=240s op stop timeout=100s  

 I had to make similar changes to the IPaddr2 and Filesystem primitives as well.

 This isn't something I wanted to ignore though - every command you issue in
the crm would give you that WARNING again - so with 3 resources complaining
you get a LOT of warnings anytime you issue a verify, commit, etc. With a
warning for the start timeout and one for the stop, that's 6 warning flags! As a
cluster admin, I don't like to see WARNING!  :)


  Heads up for all though: there seem to be a lot of minor syntax changes.
Another one I ran into: 'crm resource migrate rsc node' now needs an extra
duration argument. It tells you to reference:
http://en.wikipedia.org/wiki/ISO_8601#Durations
for proper syntax. I just used a 1-year duration as I planned on unmigrating
again almost immediately.

'crm resource migrate rsc node -PY1
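
For reference, the ISO 8601 notation linked above writes a duration as P, then date fields (nY, nM, nD), optionally followed by T and time fields (nH, nM, nS). One year is therefore written P1Y. The Python sketch below checks a string against a simplified subset of that grammar (integer fields only). parse_duration is a hypothetical helper and says nothing about what the crm shell itself accepts.

```python
import re

# Simplified subset of ISO 8601 durations with integer fields only.
_DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a simple ISO 8601 duration into a dict of the fields that are present.

    Raises ValueError for strings outside the supported subset.
    """
    match = _DURATION_RE.match(text)
    # A bare "P" or a trailing "T" matches the pattern but carries no value.
    if not match or text == "P" or text.endswith("T"):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    return {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
```

Under this grammar P1Y parses as one year and PT30M as thirty minutes, while a string such as PY1 is rejected.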




Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
k...@medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
You canna change the laws of physics, Captain; I've got to have thirtyminutes! 


.

-- Original Message ---
From: Glauber Cabral glauber...@gmail.com
To: pacema...@clusterlabs.org, k...@medent.com
Sent: Thu, 18 Mar 2010 17:25:22 -0300
Subject: Re: [Pacemaker] WARNING: drbd0: default-action-timeout

 Hi Kenneth
 
 I'm new to pacemaker, but I guess that your problem is that the
 timeouts for the start and stop actions are set by default to 20s somewhere
 (it seems it's not defined in your file) and pacemaker is telling you
 their timeouts are shorter than the recommended ones.
 
 So, my suggestion is to define these timeouts by yourself, assuming
 the suggested values are OK.
 
 To do so, you should type the command below in shell to edit your
 configurations:
 # crm configure edit
 
 And change the DRBD primitive to this:
 
 primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 \
 op monitor interval=15s \
 op start timeout=240s \
 op stop timeout=100s
 
 I hope this can help you =)
 
 []s
 Glauber
 
 On Thu, Mar 18, 2010 at 4:16 PM, Ken Dechick k...@medent.com wrote:
  Hi all,
 
  Just updated my test cluster to latest 1.0.8 pacemaker (from 1.0.6)  and
  3.0.2-2 heartbeat (from 3.0.1-1). Was going through my usual configuration
  steps, when I ran into a warning I have never seen before in setup. I start
  my bare cluster and do cmd-line configuring within the crm shell from there.
 
  My first primitive device is my DRBD resource and the command I use:
   primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op monitor \
   interval=15s
 
  Today I am suddenly getting a new warning when I use this
 
   WARNING: drbd0: default-action-timeout 20s for start is smaller than the
  advised 240
  WARNING: drbd0: default-action-timeout 20s for stop is smaller than the
  advised 100
 
  But I don't know the syntax to correct this. Searching around in the lists I
  don't see anything - perhaps this is something new with pacemaker 1.0.8? Can
  anyone shed some light?
 
  -Thanks
 
 
  Kenneth M DeChick
  Linux Systems Administrator
  Community Computer Service, Inc.
  (315)-255-1751 ext154
  http://www.medent.com
  k...@medent.com
  Registered Linux User #497318
  -- -- -- -- -- -- -- -- -- -- --
  You canna change the laws of physics, Captain; I've got to have
thirtyminutes! 
 
  .
 
  This message has been scanned for viruses and dangerous content by
MailScanner, SpamAssassin & ClamAV.
 
  This message and any attachments may contain information that is protected
by law as privileged and confidential, and
  is transmitted for the sole use of the intended recipient(s). If you are
not the intended recipient, you are hereby notified
  that any use, dissemination, copying or retention of this e-mail or the
information contained herein is strictly prohibited.
  If you received this e-mail in error, please immediately notify the sender
by e-mail, and permanently delete this e-mail.
 
 
 
--- End of Original Message ---


Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later

2010-03-19 Thread Florian Haas
On 03/18/2010 10:02 AM, Andrew Beekhof wrote:
 On Wed, Mar 17, 2010 at 11:02 AM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
 Hi,

 On Wed, Mar 17, 2010 at 09:17:38AM +0100, Florian Haas wrote:
 Andrew,

 now that Pacemaker has been on a bi-monthly release schedule for a
 while, is there any chance you could consider publishing RCs before the
 actual releases, at least for the stable-1.0 branch?
 Good idea. That would give others a chance to give the RC a try
 and report any problems before the final release.
 
 I use the following for the pacemaker rpms I'm using at any given time:
http://www.clusterlabs.org/rpm/testing/
 
 They're for 64-bit F-12 but you can rebuild them for whatever platform you 
 like
 I've no intention of building them for other platforms, I'd spend my
 entire life building packages instead of getting any work done.
 
 Just be sure to refresh the metadata regularly, burning through
 release numbers and disk space isn't the goal here.

Who said you should build RC _packages_?

Tag an RC, upload a tarball, announce on mailing list, done. How is that
extra work?

No wait, Pacemaker builds directly from a Mercurial tarball. So scratch
the upload part.

Florian






Re: [Pacemaker] node states

2010-03-19 Thread Florian Haas
On 03/17/2010 09:30 PM, Andrew Beekhof wrote:
 On Wed, Mar 17, 2010 at 7:53 PM, Matthew Palmer mpal...@hezmatt.org wrote:
 On Wed, Mar 17, 2010 at 07:16:16AM -0500, Schaefer, Diane E wrote:
   We were wondering what the node state of UNCLEAN, with the three
   variations of online, offline and pending returned in crm_mon mean.  We
   had the heartbeat service off on one of our nodes and the other node
   reported UNCLEAN (online).  We seem to get it when the nodes are not
   communicating.  Thanks for any clarification.
 Unclean (online) means that the STONITH resource for that node had some
 failures, and so the cluster isn't confident that when it comes time to
 shoot that node (if required), it'll actually work.
 
 You'll also see it when any resource fails to stop _and_ stonith isn't 
 enabled.

Never seen that.

AFAICS when you disable STONITH and a resource fails on stop, then the
resource goes into the Unmanaged state, but the associated node does not
become Unclean. At least as far as crm_mon says.

Florian





Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later

2010-03-19 Thread Andrew Beekhof
On Fri, Mar 19, 2010 at 8:54 AM, Florian Haas florian.h...@linbit.com wrote:
 On 03/18/2010 10:02 AM, Andrew Beekhof wrote:
 On Wed, Mar 17, 2010 at 11:02 AM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
 Hi,

 On Wed, Mar 17, 2010 at 09:17:38AM +0100, Florian Haas wrote:
 Andrew,

 now that Pacemaker has been on a bi-monthly release schedule for a
 while, is there any chance you could consider publishing RCs before the
 actual releases, at least for the stable-1.0 branch?
 Good idea. That would give others a chance to give the RC a try
 and report any problems before the final release.

 I use the following for the pacemaker rpms I'm using at any given time:
    http://www.clusterlabs.org/rpm/testing/

 They're for 64-bit F-12 but you can rebuild them for whatever platform you 
 like
 I've no intention of building them for other platforms, I'd spend my
 entire life building packages instead of getting any work done.

 Just be sure to refresh the metadata regularly, burning through
 release numbers and disk space isn't the goal here.

 Who said you should build RC _packages_?

 Tag an RC, upload a tarball, announce on mailing list, done. How is that
 extra work?

 No wait, Pacemaker builds directly from a Mercurial tarball. So scratch
 the upload part.

What does the tag achieve apart from ensuring people waste their time
testing versions that don't have any fixes since it was created?



Re: [Pacemaker] WARNING: drbd0: default-action-timeout

2010-03-19 Thread Dejan Muhamedagic
Hi,

On Fri, Mar 19, 2010 at 09:24:16AM -0400, Ken Dechick wrote:
 Hi all,
 
   Thanks for the responses. I actually had it figured out a few minutes after
 I posted - doh! I just added:
  'op start timeout=XXs op stop timeout=XXs' for the 3 primitives that were
 giving this warning - obviously using the suggested defaults for each. In the
 case of DRBD I simply had to change over to using:
 
  primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op \
  monitor interval=15s op start timeout=240s op stop timeout=100s  
 
  I had to make similar changes to the IPaddr2 and Filesystem primitives as
 well.
 
  This isn't something I wanted to ignore though - every command you issue in
 the crm would give you that WARNING again - so with 3 resources complaining
 you get a LOT of warnings anytime you issue a verify, commit, etc. With a
 warning for the start timeout and one for the stop, that's 6 warning flags! As a
 cluster admin, I don't like to see WARNING!  :)

Well, that's a good attitude. This information is important.
Though the warnings may be annoying, they should really be addressed in
some way. It could also be that the timeouts advertised by the RA
are wrong. If you think so, then please post a question. For
instance, I can see now that those for IPaddr/IPaddr2 are really
excessive.

   Heads up for all though: there seem to be a lot of minor syntax changes.
 Another one I ran into: 'crm resource migrate rsc node' now needs an extra
 duration argument. It tells you to reference:

That's not exactly true. If it were, that would've been a
regression and in general we don't like those. It is true that
there is an extra parameter, but it's optional. If the above form
doesn't work (it really does here), then please open a bugzilla.

Cheers,

Dejan

 http://en.wikipedia.org/wiki/ISO_8601#Durations
 for proper syntax. I just used 1year duration as I planned on unmigrating
 again almost immediately.
 
 'crm resource migrate rsc node -PY1
 
 
 
 

Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later

2010-03-19 Thread Florian Haas
On 03/19/2010 03:39 PM, Andrew Beekhof wrote:
 Who said you should build RC _packages_?

 Tag an RC, upload a tarball, announce on mailing list, done. How is that
 extra work?

 No wait, Pacemaker builds directly from a Mercurial tarball. So scratch
 the upload part.
 
 What does the tag achieve apart from ensuring people waste their time
 testing versions that don't have any fixes since it was created?

Remind contributors that a release is imminent?

Florian





[Pacemaker] Building an active/passive dhcp server

2010-03-19 Thread Emmanuel Lesouef
Hello,

I'm trying to make an active/passive dhcp server.

Currently, it works with the following setup :

* 2 debian servers with pacemaker :
node1 with physical ip1 and virtual ip vip1 (managed with pacemaker)
node2 with physical ip2.

* 1 lsb dhcp3-server resource that is on node1 and migrates ok on node2,

* an rsync cron job copies the dhcp lease file from node1 to node2 so that
  node2 does not start with an empty dhcp lease file,

* the server should be delivering dhcp lease with vip1 because it
  is on a vlan and core router use cisco ip helper to send dhcp
  requests.

The problem is that when node1 comes online again, there's a difference
in the dhcp lease file.

I think that using rsync to synchronize the lease file is not the best
solution and that a clustered file system is the best solution.

What are your opinions about such a setup ? Are there some best
practices ?

Thanks for your help and information about this.

-- 
Emmanuel Lesouef



Re: [Pacemaker] Building an active/passive dhcp server

2010-03-19 Thread Matthew Palmer
On Fri, Mar 19, 2010 at 10:47:59PM +0100, Emmanuel Lesouef wrote:
 I'm trying to make an active/passive dhcp server.

[...]

 The problem is that when node1 comes online again, there's a difference
 in the dhcp lease file.
 
 I think that using rsync to synchronize the lease file is not the best
 solution and that a clustered file system is the best solution.

Yes, rsyncing your leases file around isn't going to be a win.  However, a
clustered filesystem is a really bad idea, as the complexity is far more
than you need.  Instead, a small DRBD (http://www.drbd.org/) volume with a
regular filesystem such as ext3 will work Just Fine And Dandy.

- Matt
