Re: [ClusterLabs] Antw: VIP monitoring failing with Timed Out error

2015-10-28 Thread Pritam Kharat
Hi Ulrich/Anyone,

Could you please reply to this mail?

On Wed, Oct 28, 2015 at 3:22 PM, Pritam Kharat <
pritam.kha...@oneconvergence.com> wrote:

> Resource migration to the STANDBY node failed, and the log on the STANDBY
> node reports a Timed Out error for VIP monitoring. What might cause the VIP
> monitor to time out?
>
> The HA configuration is as follows:
>
> root@sc-node-2:~# crm configure show
> node $id="1" sc-node-1
> node $id="2" sc-node-2
> primitive oc-service-manager upstart:oc-service-manager \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive sc_vip ocf:heartbeat:IPaddr2 \
> params ip="192.168.20.188" cidr_netmask="24" nic="eth0" \
> op monitor interval="15s"
> colocation sc-node inf: sc_vip oc-service-manager
> order SCVIP-BEFORE-SM 0: sc_vip oc-service-manager
> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> stonith-enabled="false" \
> cluster-recheck-interval="3min" \
> default-action-timeout="180s"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
>
>
> On Wed, Oct 28, 2015 at 2:50 PM, Ulrich Windl <
> ulrich.wi...@rz.uni-regensburg.de> wrote:
>
>> >>> Pritam Kharat wrote on 28.10.2015 at 09:51
>> in message
>> 

[ClusterLabs] Antw: Resource placement strategy and utilization AND resource location preference

2015-10-28 Thread Ulrich Windl
>>> "Vallevand, Mark K"  schrieb am 27.10.2015 um 
>>> 22:24
in Nachricht
<37343dddcd2d454baaea374e80d73...@us-exch13-5.na.uis.unisys.com>:
> How do the resource placement strategy and utilization AND resource location 
> preference relate?
> 
> I mean, is it one or the other?  Or both somehow?

I think it's all AND (if you use -inf): so if one constraint fails, the resource fails to run there.
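
For reference, all three mechanisms can be declared side by side in crmsh; a
minimal sketch, where the node name, resource, and capacity values are made up:

  # Balanced placement over declared capacities:
  crm configure property placement-strategy=balanced
  crm configure node node-1 utilization cpu="8"
  crm configure primitive big-rsc ocf:heartbeat:Dummy utilization cpu="4"
  # A location preference adds its score on top of what the strategy computes:
  crm configure location prefer-node1 big-rsc 100: node-1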

> 
> If I set a resource location preference, how will that affect placement 
> strategy like balanced?  Vice versa.
> 
> Here's the problem I'm looking at.
> I have a large number of resources that have very different utilization 
> values.  Say 1-10.  All my nodes have the same utilization values.  I want 
> the 
> placement to be balanced.  That works nicely.  Consider what happens when a 

That's the default AFAIK.

> node fails and then rejoins the cluster.  The balanced placement moves the 
> resources when the node fails and again when it rejoins.  It's not good to 

That depends on the stickiness of the resources.

> have resources move.  Setting a resource-stickiness helps when the node 
> fails. 

You got it!

>  Rebalancing seems to be sane.  But, when the node rejoins, the resources 
> stick where they are and the rejoining node carries no load.  If I don't do 

You'll have to decide what you want: Should resources move, or shouldn't they?


> any resource placement strategy at all and consider each resource to be 
> equal, I can set resource location preferences so that resources move when 
> the node fails and return to it when it rejoins.

You could also write a script that checks the status and issues manual 
migration commands to the cluster to do what you want.
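
A rough sketch of that idea (the resource and node names are hypothetical;
crmsh's migrate/unmigrate pair creates and later clears a temporary location
constraint):

  #!/bin/sh
  # When node-1 is back online, pull one resource over to it,
  # then drop the temporary constraint that migrate created.
  if crm_mon -1 | grep -q 'Online:.*node-1'; then
      crm resource migrate my-rsc node-1
      sleep 60
      crm resource unmigrate my-rsc
  fi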

> I want it all.  :-)
> I want the resources to be placed with balanced regard to utilization.

Utilization does not balance, but limits, IMHO.

> I want only the resources on a failed node to be reallocated to remaining 
> nodes (with balanced utilization as much as possible).

Then use high stickiness.

> I want those resources to return to the node when it rejoins.  (Or a subset 
> of them if that balances better.)

Then you'll have to use a low stickiness.
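
Both knobs are one-liners in crmsh; the values below are arbitrary examples
and "db-rsc" is a hypothetical resource name:

  crm configure rsc_defaults resource-stickiness=100     # cluster-wide default
  crm resource meta db-rsc set resource-stickiness 500   # per-resource override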

> 
> I could ignore placement strategy and script up resource location 
> preferences that mimic a balanced load.  But, I'd rather let clustering do 
> it.

Honestly: why do you care if one node has little work while the others can
handle the load? Modern hardware can save significant energy when idle.

> 
> Any ideas would be very welcome.

No more ideas ;-)

> 
> Regards.
> Mark K Vallevand   mark.vallev...@unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys the 
> pig.





___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: VIP monitoring failing with Timed Out error

2015-10-28 Thread Pritam Kharat
Resource migration to the STANDBY node failed, and the log on the STANDBY node
reports a Timed Out error for VIP monitoring. What might cause the VIP monitor
to time out?

The HA configuration is as follows:

root@sc-node-2:~# crm configure show
node $id="1" sc-node-1
node $id="2" sc-node-2
primitive oc-service-manager upstart:oc-service-manager \
meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
op monitor interval="15s" timeout="60s"
primitive sc_vip ocf:heartbeat:IPaddr2 \
params ip="192.168.20.188" cidr_netmask="24" nic="eth0" \
op monitor interval="15s"
colocation sc-node inf: sc_vip oc-service-manager
order SCVIP-BEFORE-SM 0: sc_vip oc-service-manager
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
cluster-recheck-interval="3min" \
default-action-timeout="180s"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"



On Wed, Oct 28, 2015 at 2:50 PM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> >>> Pritam Kharat wrote on 28.10.2015 at 09:51
> in message
> 

Re: [ClusterLabs] VIP monitoring failing with Timed Out error

2015-10-28 Thread Ken Gaillot
On 10/28/2015 03:51 AM, Pritam Kharat wrote:
> Hi All,
> 
> I am facing an issue in my two-node HA setup. When I stop Pacemaker on the
> ACTIVE node, it takes a long time to stop, and in the meantime migration of
> the VIP and the other resources to the STANDBY node fails. (I have seen the
> same issue when the ACTIVE node is rebooted.)

I assume STANDBY in this case is just a description of the node's
purpose, and does not mean that you placed the node in pacemaker's
standby mode. If the node really is in standby mode, it can't run any
resources.

> Last change: Wed Oct 28 02:52:57 2015 via cibadmin on node-1
> Stack: corosync
> Current DC: node-1 (1) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 2 Resources configured
> 
> 
> Online: [ node-1 node-2 ]
> 
> Full list of resources:
> 
>  resource (upstart:resource): Stopped
>  vip (ocf::heartbeat:IPaddr2): Started node-2 (unmanaged) FAILED
> 
> Migration summary:
> * Node node-1:
> * Node node-2:
> 
> Failed actions:
> vip_stop_0 (node=node-2, call=-1, rc=1, status=Timed Out,
> last-rc-change=Wed Oct 28 03:05:24 2015, queued=0ms, exec=0ms): unknown error
> 
> The VIP monitor is failing over here with a Timed Out error. What is the
> general reason for a timeout? I have kept default-action-timeout=180s, which
> should be enough for monitoring.

180s should be far more than enough, so something must be going wrong.
Notice that it is the stop operation on the active node that is failing.
Normally in such a case, pacemaker would fence that node to be sure that
it is safe to bring it up elsewhere, but you have disabled stonith.

Fencing is important in failure recovery such as this, so it would be a
good idea to try to get it implemented.
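
As a sketch only (the right agent and its parameters depend entirely on your
hardware; the IPMI address and credentials below are invented):

  crm configure primitive st-node1 stonith:external/ipmi \
      params hostname="node-1" ipaddr="192.168.20.11" \
             userid="admin" passwd="secret" interface="lanplus"
  # Keep the device off the node it is meant to fence:
  crm configure location st-node1-placement st-node1 -inf: node-1
  crm configure property stonith-enabled=true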

> I have added an order constraint so that the other resources start only
> after the VIP is started.
> Any clue how to solve this problem? Most of the time this VIP monitoring
> fails with a Timed Out error.

The "stop" in "vip_stop_0" means that the stop operation is what failed.
Have you seen timeouts on any other operations?

Look through the logs around the time of the failure, and try to see if
there are any indications as to why the stop failed.

If you can set aside some time for testing or have a test cluster that
exhibits the same issue, you can try unmanaging the resource in
pacemaker, then:

1. Try adding/removing the IP via normal system commands, and make sure
that works.

2. Try running the resource agent manually (with any verbose option) to
start/stop/monitor the IP to see if you can reproduce the problem and
get more messages.
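
Something along these lines, as a sketch; it reuses the IP parameters from the
configuration you posted and assumes the resource is named "vip":

  crm resource unmanage vip

  # 1. Add/remove the address by hand:
  ip addr add 192.168.20.188/24 dev eth0
  ip addr del 192.168.20.188/24 dev eth0

  # 2. Drive the agent directly; OCF parameters are passed as
  #    OCF_RESKEY_* environment variables:
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_ip=192.168.20.188 OCF_RESKEY_cidr_netmask=24 OCF_RESKEY_nic=eth0
  bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 start
  bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor; echo "rc=$?"
  bash -x /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop

  crm resource manage vip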



[ClusterLabs] restarting resources after configuration changes

2015-10-28 Thread - -
Hi,
I am having problems restarting resources (e.g. apache) after a
configuration file change. I have tried 'pcs resource restart resourceid',
which says 'resource successfully restarted', but the httpd process
does not restart, and hence my configuration changes in httpd.conf
do not take effect.
I am sure this scenario is quite common, as administrators need to
update httpd.conf files often; how is it done in an HA cluster?

I can send a HUP signal to the httpd process to achieve this, but I hope
there is a cluster (pcs/crm) method to do this.

Many Thanks

krishan


[ClusterLabs] [Enhancement] When STONITH is not completed, a resource moves.

2015-10-28 Thread renayama19661014
Hi All,

We ran into the following problem in Pacemaker 1.1.12:
a resource was moved while STONITH had not yet completed.

The following sequence seemed to happen in the cluster.

Step 1) Start the cluster.

Step 2) Node 1 fails.

Step 3) Node 1 rejoins before the STONITH initiated from node 2 completes.

Step 4) Steps 2 and 3 repeat.

Step 5) STONITH from node 2 has still not completed, but a resource moves to node 2.



There was no resource information for node 1 when I looked at the pe file from
the time the resource moved to node 2.
(snip)
[The CIB status XML quoted here was stripped by the mail archive.]
(snip)

The node's status information is deleted from the CIB while STONITH has not
yet completed, and the move appears to be caused by the CIB then lacking the
resource information of that node.

The trigger of the problem was that the cluster communication became
unstable. However, this behavior of the cluster is a problem in itself.

We have not reproduced this problem in Pacemaker 1.1.13 so far.
However, as far as I can see in the source code, the processing is the same.

Shouldn't the deletion of the node information be performed only after all of
the new node information has been gathered?

 * crmd/callback.c
(snip)
void
peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *data)
{
(snip)
    if (down) {
        const char *task = crm_element_value(down->xml, XML_LRM_ATTR_TASK);

        if (alive && safe_str_eq(task, CRM_OP_FENCE)) {
            crm_info("Node return implies stonith of %s (action %d) completed",
                     node->uname, down->id);

            st_fail_count_reset(node->uname);

            erase_status_tag(node->uname, XML_CIB_TAG_LRM, cib_scope_local);
            erase_status_tag(node->uname, XML_TAG_TRANSIENT_NODEATTRS, cib_scope_local);
            /* down->confirmed = TRUE; Only stonith-ng returning should imply completion */
            down->sent_update = TRUE;   /* Prevent tengine_stonith_callback()
                                           from calling send_stonith_update() */
(snip)


 * I have the logs, but cannot attach them because they contain user
   information.
 * Please contact me by email if you need them.


This issue has been registered in Bugzilla:
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5254


Best Regards,
Hideo Yamauchi.



Re: [ClusterLabs] restarting resources after configuration changes

2015-10-28 Thread GRAY Andrew G (SPARQ)
CentOS/Red Hat: try `service httpd configtest` to validate the config changes
first.
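
Something like this, as a sketch (the PID-file path varies by distribution;
this is the HUP workaround from the original mail, gated on a clean
configtest):

  service httpd configtest && kill -HUP "$(cat /var/run/httpd/httpd.pid)"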

Regards,

Andrew Gray.
RHCSA, Professional Unix Administration.
Ph: (07) 3664 5112

From: - - [mailto:kri...@googlemail.com]
Sent: Wednesday, 28 October 2015 8:21 PM
To: users@clusterlabs.org
Subject: [ClusterLabs] restarting resources after configuration changes

Hi,
I am having problems restarting resources (e.g. apache) after a
configuration file change. I have tried 'pcs resource restart resourceid',
which says 'resource successfully restarted', but the httpd process
does not restart, and hence my configuration changes in httpd.conf
do not take effect.
I am sure this scenario is quite common, as administrators need to
update httpd.conf files often; how is it done in an HA cluster?
I can send a HUP signal to the httpd process to achieve this, but I hope
there is a cluster (pcs/crm) method to do this.
Many Thanks
krishan




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Ulrich Windl
Probably if Oracle 12 is compatible with Oracle 11, the RAs will continue to 
work.
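
The versions an agent claims to support can be read from its metadata; the
generic Oracle agents may also be worth checking (a sketch):

  crm ra info ocf:heartbeat:SAPDatabase
  crm ra info ocf:heartbeat:oracle    # generic Oracle instance agent
  crm ra info ocf:heartbeat:oralsnr   # Oracle listener agent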

>>> "Cristiano Coltro"  schrieb am 28.10.2015 um 09:45 in
Nachricht <56309953026ff...@prv-mh.provo.novell.com>:
> Hi,
> most of the SLES 11 SP3 systems with HAE are migrating their Oracle DB.
> The migration will be from Oracle 11 to Oracle 12.
> 
> They have verified that the Oracle cluster resources currently support
> Oracle 10.2 and 11.2.
> Command used: "crm ra info ocf:heartbeat:SAPDatabase"
> So it seems they are out of support.
> So I would like to know which cluster/OS/agent version supports Oracle
> 12.
> AFAIK agents are typically included in the rpm:
> # rpm -qf /usr/lib/ocf/resource.d/heartbeat/SAPDatabase
> resource-agents-3.9.5-0.34.57
> and there are NO updates about that in the channel.
> 
> Any idea on that?
> Thanks,
> Cristiano
> 
> 
> 
> Cristiano Coltro
> Premium Support Engineer
>   
> mail: cristiano.col...@microfocus.com 
> phone +39 02 36634936
> mobile +39 3351435589
> 
> 
>  







[ClusterLabs] Antw: VIP monitoring failing with Timed Out error

2015-10-28 Thread Ulrich Windl
>>> Pritam Kharat wrote on 28.10.2015 at 09:51
in message

[ClusterLabs] ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Cristiano Coltro
Hi,
most of the SLES 11 SP3 systems with HAE are migrating their Oracle DB.
The migration will be from Oracle 11 to Oracle 12.

They have verified that the Oracle cluster resources currently support
Oracle 10.2 and 11.2.
Command used: "crm ra info ocf:heartbeat:SAPDatabase"
So it seems they are out of support.
So I would like to know which cluster/OS/agent version supports Oracle 12.
AFAIK agents are typically included in the rpm:
# rpm -qf /usr/lib/ocf/resource.d/heartbeat/SAPDatabase
resource-agents-3.9.5-0.34.57
and there are NO updates about that in the channel.

Any idea on that?
Thanks,
Cristiano



Cristiano Coltro
Premium Support Engineer
  
mail: cristiano.col...@microfocus.com
phone +39 02 36634936
mobile +39 3351435589


 












[ClusterLabs] VIP monitoring failing with Timed Out error

2015-10-28 Thread Pritam Kharat
Hi All,

I am facing an issue in my two-node HA setup. When I stop Pacemaker on the
ACTIVE node, it takes a long time to stop, and in the meantime migration of
the VIP and the other resources to the STANDBY node fails. (I have seen the
same issue when the ACTIVE node is rebooted.)


Last change: Wed Oct 28 02:52:57 2015 via cibadmin on node-1
Stack: corosync
Current DC: node-1 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
2 Resources configured


Online: [ node-1 node-2 ]

Full list of resources:

 resource (upstart:resource): Stopped
 vip (ocf::heartbeat:IPaddr2): Started node-2 (unmanaged) FAILED

Migration summary:
* Node node-1:
* Node node-2:

Failed actions:
vip_stop_0 (node=node-2, call=-1, rc=1, status=Timed Out,
last-rc-change=Wed Oct 28 03:05:24 2015, queued=0ms, exec=0ms): unknown error

The VIP monitor is failing over here with a Timed Out error. What is the
general reason for a timeout? I have kept default-action-timeout=180s, which
should be enough for monitoring.
I have added an order constraint so that the other resources start only
after the VIP is started.
Any clue how to solve this problem? Most of the time this VIP monitoring
fails with a Timed Out error.

-- 
Thanks and Regards,
Pritam Kharat.


Re: [ClusterLabs] ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Andrei Borzenkov
On Wed, Oct 28, 2015 at 11:45 AM, Cristiano Coltro 
wrote:

> Hi,
> most of the SLES 11 sp3 with HAE are migrating Oracle Db.
> The migration will be from Oracle 11 to Oracle 12
>
> They have verified that the Oracle cluster resources currently support
> Oracle 10.2 and 11.2.
> Command used: "crm ra info ocf:heartbeat:SAPDatabase"
> So it seems they are out of support.
> So I would like to know which cluster/OS/agent version supports Oracle
> 12.
>


SAPDatabase uses the sapagent interface to start/stop and optionally
monitor databases, so it is expected to work with any database that is
supported by sapagent. Looking at the agent, unless STRICT_MONITORING is set
it simply checks for running processes; at the least, the user may be wrong
(IIRC Oracle 12 is installed under the oracle user, not orasid, although it
can be installed under orasid as well). I do not know the Oracle 12 process
structure, so I cannot judge here.

With STRICT_MONITORING set I actually expect SAPDatabase to work, although
MONITOR_SERVICES may need adjustment.
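
As a sketch, those knobs sit in the primitive's params; the SID and the
MONITOR_SERVICES list below are placeholders that would need adjusting to the
actual Oracle 12 process names:

  primitive ora-db ocf:heartbeat:SAPDatabase \
      params SID="PRD" DBTYPE="ORA" \
             STRICT_MONITORING="true" \
             MONITOR_SERVICES="Instance|Database|Listener" \
      op monitor interval="120s" timeout="60s"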


> AFAIK agents are typically included in the rpm:
> # rpm -qf /usr/lib/ocf/resource.d/heartbeat/SAPDatabase
> resource-agents-3.9.5-0.34.57
> and there are NO updates about that in the channel.
>
> Any idea on that?
> Thanks,
> Cristiano
>
>
> Cristiano Coltro
> Premium Support Engineer
>
> mail: cristiano.col...@microfocus.com
> phone +39 02 36634936
> mobile +39 3351435589
>