[ClusterLabs] Announcing the 2.1.5 release of crmsh

2016-01-12 Thread Kristoffer Grönlund
Hello everyone,

Today we are proud to announce the release of crmsh version 2.1.5!
This release mainly consists of bug fixes, along with compatibility
updates for Pacemaker 1.1.14.

For a complete list of changes since the previous version, please
refer to the changelog:

* https://github.com/ClusterLabs/crmsh/blob/2.1.5/ChangeLog

Packages for several popular Linux distributions can be downloaded
from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/ClusterLabs/crmsh/archive/2.1.5.tar.gz
* https://github.com/ClusterLabs/crmsh/archive/2.1.5.zip

Changes since the previous release:

- medium: report: Try to load source as session if possible (bsc#927407)
- medium: crm_gv: Wrap non-identifier names in quotes (bsc#931837)
- medium: crm_gv: Improved quoting of non-identifier node names (bsc#931837)
- medium: crm_pkg: Fix cluster init bug on RH-based systems
- medium: hb_report: Collect logs from pacemaker.log
- medium: constants: Add 'provides' meta attribute (bsc#936587)
- high: parse: Add attributes to terminator set (bsc#940920)
- Medium: cibconfig: skip sanity check for properties other than cib-bootstrap-options
- medium: config: Add report_tool_options (bsc#917638)
- low: main: Bash completion didn't handle sudo correctly
- high: report: New detection to fix missing transitions (bnc#917131)
- medium: report: Add pacemaker.log to find_node_log list (bsc#941734)
- high: hb_report: Prefer pacemaker.log if it exists (bsc#941681)
- high: report: Output format from pacemaker has changed (bsc#941681)
- high: report: Update transition edge regexes (bsc#942906)
- medium: report: Reintroduce empty transition pruning (bsc#943291)
- medium: log_patterns: Remove reference to function name in log patterns (bsc#942906)
- low: hb_report: Collect libqb version (bsc#943327)
- high: parse: Fix crash when referencing score types by name (bsc#940194)
- low: constants: Add meta attributes for remote nodes
- low: ui_history: Swap from and to times if to < from
- high: cibconfig: Do not fail on unknown pacemaker schemas (bsc#946893)
- high: log_patterns_118: Update the correct set of log patterns (bsc#942906)
- high: xmlutil: Order is significant in resource_set (bsc#955434)
- high: cibconfig: Fix XML import bug for cloned groups (bsc#959895)

Thank you,

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Automatic Recover for stonith:external/libvirt

2016-01-12 Thread mr
Thanks for the reply. After further unsuccessful testing of the 
automatic recovery, I read this article:


 http://clusterlabs.org/doc/crm_fencing.html

It recommends monitoring the fencing device only once every few 
hours.


I am happy with that, so I configured the monitoring interval to 
9600 seconds (roughly 2 hours 40 minutes).
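
In crm shell syntax that change looks roughly like this (only a sketch, 
reusing the primitive name and parameters from the configuration quoted 
further down; your names may differ):

  crm configure primitive p_fence_ha3 stonith:external/libvirt \
    params hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" \
    op monitor interval="9600s"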


Cheers

Michael

On 08.01.2016 16:30, Ken Gaillot wrote:

On 01/08/2016 08:56 AM, m...@inwx.de wrote:

Hello List,

I have a test environment here for evaluating pacemaker. Sometimes our
KVM hosts running libvirt have trouble responding to the stonith/libvirt
resource, so I would like to configure the resource to be treated as
failed only after three failed monitoring attempts. I was searching for
a configuration option here:


http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html


But after hours of searching I did not find one.

That's the configuration line for stonith/libvirt:

crm configure primitive p_fence_ha3 stonith:external/libvirt  params
hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" op monitor
interval="60"

Every 60 seconds pacemaker runs something like this:

  stonith -t external/libvirt hostlist="ha3"
hypervisor_uri="qemu+tls://debian1/system" -S
  ok

To simulate the unavailability of the KVM host, I remove the certificate
in /etc/libvirt/libvirtd.conf and restart libvirtd. After 60 seconds or
less I can see the error with "crm status". On the KVM host I then add
the certificate back to /etc/libvirt/libvirtd.conf and restart libvirtd.
Although libvirt is available again, the stonith resource does not
start again.

I altered the configuration line for stonith/libvirt with the following variants:

  op monitor interval="60" pcmk_status_retries="3"
  op monitor interval="60" pcmk_monitor_retries="3"
  op monitor interval="60" start-delay=180
  meta migration-threshold="200" failure-timeout="120"

But in every case, after the first failed monitor check (within 60
seconds or less), pacemaker did not resume the stonith/libvirt resource
once libvirt was available again.


Is there enough time left in the timeout for the cluster to retry? (The
interval is not the same as the timeout.) Check your pacemaker.log for
messages like "Attempted to execute agent ... the maximum number of
times (...) allowed". That will tell you whether it is retrying.

You definitely don't want start-delay, and migration-threshold doesn't
really mean much for fence devices.
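
If you do experiment with pcmk_monitor_retries again, note that it is a
fencing device attribute, so it belongs under params rather than under
op monitor. A rough sketch (the timeout value here is only an example):

  crm configure primitive p_fence_ha3 stonith:external/libvirt \
    params hostlist="ha3" hypervisor_uri="qemu+tls://debian1/system" \
    pcmk_monitor_retries="3" \
    op monitor interval="60s" timeout="120s"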

Of course, you also want to fix the underlying problem of libvirt not
being responsive. That doesn't sound like something that should
routinely happen.

BTW I haven't used stonith/external agents (which rely on the
cluster-glue package) myself. I use the fence_virtd daemon on the host
with fence_xvm as the configured fence agent.
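
For comparison, a fence_xvm device can be configured along these lines
(only a sketch; it assumes fence_virtd is already set up on the host and
that the guest name matches the node name):

  crm configure primitive fence-ha3 stonith:fence_xvm \
    params pcmk_host_list="ha3" \
    op monitor interval="3600s"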


Here is the "crm status"-output on debian 8 (Jessie):

  root@ha4:~# crm status
  Last updated: Tue Jan  5 10:04:18 2016
  Last change: Mon Jan  4 18:18:12 2016
  Stack: corosync
  Current DC: ha3 (167772400) - partition with quorum
  Version: 1.1.12-561c4cf
  2 Nodes configured
  2 Resources configured
  Online: [ ha3 ha4 ]
  Service-IP (ocf::heartbeat:IPaddr2):   Started ha3
  haproxy(lsb:haproxy):  Started ha3
  p_fence_ha3(stonith:external/libvirt): Started ha4

Kind regards

Michael R.



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Wait until resource is really ready before moving clusterip

2016-01-12 Thread Joakim Hansson
Hi!
I have a cluster running tomcat which in turn runs solr.
I use three nodes with load balancing via IPaddr2.
The thing is, when tomcat is started on a node it takes about 2 minutes
before solr is functioning correctly.

Is there a way to make the IPaddr2 clone wait 2 minutes after tomcat is
started before it moves the IP to the node?

Much appreciated!

/ Jocke
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Wait until resource is really ready before moving clusterip

2016-01-12 Thread Kristoffer Grönlund
Joakim Hansson  writes:

> Hi!
> I have a cluster running tomcat which in turn runs solr.
> I use three nodes with load balancing via IPaddr2.
> The thing is, when tomcat is started on a node it takes about 2 minutes
> before solr is functioning correctly.
>
> Is there a way to make the IPaddr2 clone wait 2 minutes after tomcat is
> started before it moves the IP to the node?
>
> Much appreciated!

Hi,

There is the ocf:heartbeat:Delay resource agent, which on one hand is
documented as a test resource, but on the other hand should do what you
need:

primitive solr ...
primitive two-minute-delay ocf:heartbeat:Delay \
  params startdelay=120 meta target-role=Started \
  op start timeout=180
group solr-then-wait solr two-minute-delay

Now the group acts basically like the solr resource, except for the
two-minute delay after starting solr before the group itself is
considered started.
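
If the IPaddr2 clone should also be ordered after the group, an order
constraint along these lines might do it (a sketch only; ip-clone is a
placeholder for whatever your IPaddr2 clone is actually named):

order ip-after-solr Mandatory: solr-then-wait ip-clone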

Cheers,
Kristoffer

>
> / Jocke

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Wait until resource is really ready before moving clusterip

2016-01-12 Thread Ken Gaillot
On 01/12/2016 07:57 AM, Kristoffer Grönlund wrote:
> Joakim Hansson  writes:
> 
>> Hi!
>> I have a cluster running tomcat which in turn runs solr.
>> I use three nodes with load balancing via IPaddr2.
>> The thing is, when tomcat is started on a node it takes about 2 minutes
>> before solr is functioning correctly.
>>
>> Is there a way to make the IPaddr2 clone wait 2 minutes after tomcat is
>> started before it moves the IP to the node?
>>
>> Much appreciated!
> 
> Hi,
> 
> There is the ocf:heartbeat:Delay resource agent, which on one hand is
> documented as a test resource, but on the other hand should do what you
> need:
> 
> primitive solr ...
> primitive two-minute-delay ocf:heartbeat:Delay \
>   params startdelay=120 meta target-role=Started \
>   op start timeout=180
> group solr-then-wait solr two-minute-delay
> 
> Now the group acts basically like the solr resource, except for the
> two-minute delay after starting solr before the group itself is
> considered started.
> 
> Cheers,
> Kristoffer
> 
>>
>> / Jocke

Another way would be to customize the tomcat resource agent so that
start doesn't return success until it's fully ready to accept requests
(which would probably be specific to whatever app you're running via
tomcat). Of course you'd need a long start timeout.
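
As a rough illustration only (not the actual tomcat agent code, and the
URL and port are placeholders for your solr instance), the start action
could poll the application before reporting success:

# wait up to ~180s for the application behind tomcat to respond
# (OCF_SUCCESS/OCF_ERR_GENERIC come from the sourced ocf-shellfuncs)
for i in $(seq 1 36); do
    if curl -sf http://localhost:8983/solr/ >/dev/null; then
        exit $OCF_SUCCESS
    fi
    sleep 5
done
exit $OCF_ERR_GENERIC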

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org