Re: [Linux-HA] Master Became Slave - Cluster unstable $$$

2014-04-08 Thread Maloja01

On 04/08/2014 12:18 AM, Ammar Sheikh Saleh wrote:

yes, I have the command ... it's CentOS


Then please review the man page of crm_master and try to adjust the
scores so that the master is started where you want it and the slave
where you want it. Before you follow my general steps you could also ask
again on the list about using crm_master from the command line on CentOS -
I am not really sure it behaves exactly the same there.

1. Check the current promotion scores using the pengine:
ptest -Ls | grep promo
- You should get a list of scores per master/slave resource and node

2. Check the crm_master score that is currently set, using crm_master:
crm_master -q -G -N node -l reboot -r resource-INSIDE-masterslave

3. Adjust the master/promotion scores (this is the trickiest part):
crm_master -v NEW_MASTER_VALUE -l reboot -r resource-INSIDE-masterslave

If there are no leftover constraints from earlier bad operations, this
should help the cluster promote the preferred node.
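
Just as an illustration, using the node and resource names from your
earlier crm_mon output (a sketch only - the score value 100 is just an
example, and I am assuming SuperData is the resource inside the
SuperDataClone master/slave set):

# 1. what the policy engine currently calculates
ptest -Ls | grep promo

# 2. the crm_master score currently set for SuperData on each node
crm_master -q -G -N lws1h1.mydomain.com -l reboot -r SuperData
crm_master -q -G -N lws1h2.mydomain.com -l reboot -r SuperData

# 3. raise the score on the node that should become master
crm_master -v 100 -N lws1h1.mydomain.com -l reboot -r SuperData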

But my procedure comes without any warranty or further support, sorry.

Maloja01




On Mon, Apr 7, 2014 at 4:16 PM, Maloja01 maloj...@arcor.de wrote:


On 04/07/2014 03:00 PM, Ammar Sheikh Saleh wrote:


thanks for your help ... can you guide me to the correct commands?

I don't understand what 'rsc' is in this command:

crm(live)node# attribute
usage:
  attribute <node> set <rsc> <value>
  attribute <node> delete <rsc>
  attribute <node> show <rsc>


how can I give a node a master attribute with a high score using the above?



On SLES (SUSE) there is a command called crm_master - do you have such a command?
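
In case it is missing there: my understanding is that crm_master is only a
small wrapper around crm_attribute managing the transient master-<resource>
node attribute (you can see master-SuperData in your crm_mon output), so
something like this should be roughly equivalent - a sketch, untested on
CentOS:

# query the current master score for SuperData on a node
crm_attribute -N lws1h1.mydomain.com -n master-SuperData -l reboot -q -G

# raise it on the node that should be promoted
crm_attribute -N lws1h1.mydomain.com -n master-SuperData -l reboot -v 100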




cheers!
Ammar


On Mon, Apr 7, 2014 at 3:49 PM, Maloja01 maloj...@arcor.de wrote:

  On 04/07/2014 01:23 PM, Ammar Sheikh Saleh wrote:


thanks a million times for answering ...


I included all my software versions / OS details in the thread
(http://linux-ha.996297.n3.nabble.com/Master-Became-Slave-Cluster-unstable-td15583.html)


but here they are :
this is the SW versions:
corosync-2.3.0-1.el6.x86_64
drbd84-utils-8.4.2-1.el6.elrepo.x86_64
pacemaker-1.1.8-1.el6.x86_64
OS:  CentOS 6.4 x64bit



Ah, sorry, then I can't tell you how stonith works there, as RH has a
completely different setup around pacemaker. I do not know how they
implement fencing.

Sorry - but best regards anyway
F.Maloja





I need to correct something .. the setup is 2 nodes for HA and a third one
for quorum only ... also the config changed a little bit (attached)

looking at the config right now ... I see a suspicious line

(<rule role="Master" score="-INFINITY"
       id="drbd-fence-by-handler-r0-rule-SuperDataClone">
   <expression attribute="#uname" operation="ne" value="lws1h1.npario.com"
               id="drbd-fence-by-handler-r0-expr-SuperDataClone"/>
 </rule>)

it might be the reason why services are not starting on the first node
(not 100% sure) ... I have a feeling your answer is the right one ... but I
don't know the correct commands to do it:

1- I need to put Node1 back to Master (currently it is a slave)
2- remove any constraints or preferred locations, and also remove any
special attributes on Node1 that are making it a slave

what do you think? how can I do these? what are the commands?


cheers!
Ammar


On Mon, Apr 7, 2014 at 2:09 PM, Maloja01 maloj...@arcor.de wrote:

   hi ammar,



first we need to check:
a) which OS (which Linux distribution) are you using?
b) which cluster version/packages do you have installed?
c) what does your cluster config look like?

As my tip about crm_master/removing client-prefer rules might only help in
some combinations of a-b-c, I need that info.

Second, I need to say that free help means I cannot give any warranty
that your cluster will get better afterward. That could only be done by
cost-intensive consulting.

regards
f.maloja


On 04/07/2014 12:10 PM, aali...@gmail.com wrote:

   hi  Maloja01,



could you please help me do these steps / tell me what the commands look like:

- You should use crm_master to score the master placement
- you should remove all client-prefer location rules which you added by
your experiments using the GUI

I don't want to make this worse ... production is down here.. and I am
desperately in need of any help

Thank you,
Ammar

_
Sent from http://linux-ha.996297.n3.nabble.com

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Master Became Slave - Cluster unstable $$$

2014-04-07 Thread Maloja01
FIRST you need to set up fencing (STONITH) - I do not see any stonith
resource in your cluster - that WILL be a problem.


You cannot migrate a Master/Slave resource directly. You should use
crm_master to score the master placement. And you should remove all
client-prefer location rules which you added by your experiments using
the GUI; they might hurt the cluster in the future...
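
For cleaning those up, something along these lines usually works (a sketch
only - the constraint ids on your system may differ, so check the output
of "crm configure show" first):

# list location constraints that pin resources to a node
crm configure show | grep -i prefer

# let the cluster drop the migration constraint for a resource again
crm resource unmigrate SuperMetaService

# or delete a specific leftover constraint by its id
crm configure delete <constraint-id>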

AND, as already written in this thread, you must tell the cluster to
ignore quorum if you really only have a two-node cluster.
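
With crmsh that boils down to something like the following (a sketch - the
property names are standard pacemaker, but double-check them against your
1.1.8 documentation):

# two-node cluster: do not stop everything when one node is gone
crm configure property no-quorum-policy=ignore

# once a stonith resource is configured and tested, keep fencing enabled
crm configure property stonith-enabled=true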


FMaloja

On 04/07/2014 02:31 AM, aalishe wrote:

tell me what information is still needed to be sure / certain of solving the
problem?

I can provide it all

thanks for your time



--
View this message in context: 
http://linux-ha.996297.n3.nabble.com/Master-Became-Slave-Cluster-unstable-tp15583p15585.html
Sent from the Linux-HA mailing list archive at Nabble.com.



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Master Became Slave - Cluster unstable $$$

2014-04-06 Thread Digimer
I can't speak to your specific problem, but I can say for certain that
you need to disable quorum[1] and enable stonith (also called
fencing[2]). Once stonith is configured (and tested) in pacemaker, be
sure to set up fencing in DRBD using the 'crm-fence-peer.sh' fence
handler[3].


digimer

1. https://alteeve.ca/w/Quorum
2. https://alteeve.ca/w/AN!Cluster_Tutorial_2#Concept.3B_Fencing
3. https://alteeve.ca/w/AN!Cluster_Tutorial_2#Configuring_DRBD_Global_and_Common_Options
 (replace 'rhcs_fence' with 'crm-fence-peer.sh')
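
For reference, the DRBD side of point [3] usually ends up looking roughly
like this for DRBD 8.4 (a sketch - the handler paths are the stock ones
shipped with drbd-utils and may differ on CentOS):

# in /etc/drbd.d/global_common.conf (or the r0 resource file)
disk {
    fencing resource-only;
}
handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}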


On 06/04/14 08:02 PM, aalishe wrote:

Hi all,

I am new to corosync/pacemaker and I have a 2-node production cluster
(corosync+pacemaker+drbd)

Node1 = lws1h1.mydomain.com
Node2 = lws1h2.mydomain.com

they are in an online/online failover setup. Services are only running
where DRBD resides... the other node stays online to take over if Node1
fails

this is the SW versions:
corosync-2.3.0-1.el6.x86_64
drbd84-utils-8.4.2-1.el6.elrepo.x86_64
pacemaker-1.1.8-1.el6.x86_64
OS:  CentOS 6.4 x64bit

the cluster is configured with Quorum (not sure what that is)

A few days ago I placed one of the nodes in maintenance mode after
services were going bad due to a problem. I don't remember the details of
how I moved/migrated the resources, but I usually use the LCMC GUI tool; I
also did some restarts of corosync / pacemaker in random ways :$

after that.. Node1 became slave and Node2 became master!

services are now stuck on Node2 and I can't migrate them even by force to
Node1 (tried command line tools and the LCMC tool)


more details/outputs:


*### Start ###*
[aalishe@lws1h1 ~]$ sudo crm_mon -Afro
Last updated: Sun Apr  6 15:25:52 2014
Last change: Sun Apr  6 14:16:15 2014 via crm_resource on
lws1h2.mydomain.com
Stack: corosync
Current DC: lws1h2.mydomain.com (2) - partition with quorum
Version: 1.1.8-1.el6-394e906
2 Nodes configured, unknown expected votes
10 Resources configured.


Online: [ lws1h1.mydomain.com lws1h2.mydomain.com ]

Full list of resources:

  Resource Group: SuperMetaService
  SuperFloatIP  (ocf::heartbeat:IPaddr2):   Started lws1h2.mydomain.com
  SuperFs1   (ocf::heartbeat:Filesystem):Started lws1h2.mydomain.com
  SuperFs2   (ocf::heartbeat:Filesystem):Started lws1h2.mydomain.com
  SuperFs3   (ocf::heartbeat:Filesystem):Started lws1h2.mydomain.com
  SuperFs4   (ocf::heartbeat:Filesystem):Started lws1h2.mydomain.com
  Master/Slave Set: SuperDataClone [SuperData]
  Masters: [ lws1h2.mydomain.com ]
  Slaves: [ lws1h1.mydomain.com ]
SuperMetaSQL(ocf::mydomain:pgsql):Started lws1h2.mydomain.com
SuperGTS(ocf::mydomain:mmon): Started lws1h2.mydomain.com
SuperCQP(ocf::mydomain:mmon): Started lws1h2.mydomain.com

Node Attributes:
* Node lws1h1.mydomain.com:
* Node lws1h2.mydomain.com:
 + master-SuperData  : 1

Operations:
* Node lws1h2.mydomain.com:
SuperFs1: migration-threshold=100
 + (1241) start: rc=0 (ok)
SuperMetaSQL: migration-threshold=100
 + (1254) start: rc=0 (ok)
 + (1257) monitor: interval=3ms rc=0 (ok)
SuperFloatIP: migration-threshold=100
 + (1236) start: rc=0 (ok)
 + (1239) monitor: interval=3ms rc=0 (ok)
SuperData:0: migration-threshold=100
 + (957) probe: rc=0 (ok)
 + (1230) promot

*### End ###*



CRM Configuration
*### Start ###*
[aalishe@lws1h1 ~]$ sudo crm configure show
node $id=1 lws1h1.mydomain.com \
 attributes standby=off
node $id=2 lws1h2.mydomain.com \
 attributes standby=off
primitive SuperCQP ocf:mydomain:mmon \
 params mmond=/opt/mydomain/platform/bin/
cfgfile=/opt/mydomain/platform/etc/mmon_mydomain_cqp.xml
pidfile=/opt/mydomain/platform/var/run/mmon_mydomain_cqp.pid
user=mydomainsvc db=bigdata dbport=5434 \
 operations $id=SuperCQP-operations \
 op start interval=0 timeout=120 \
 op stop interval=0 timeout=120 \
 op monitor interval=120 timeout=120 start-delay=0 \
 meta target-role=started is-managed=true
primitive SuperData ocf:linbit:drbd \
 params drbd_resource=r0 \
 op monitor interval=60s \
 meta target-role=started
primitive SuperFloatIP ocf:heartbeat:IPaddr2 \
 params ip=10.100.0.225 cidr_netmask=24 \
 op monitor interval=30s \
 meta target-role=started
primitive SuperFs1 ocf:heartbeat:Filesystem \
 params device=/dev/drbd1 directory=/mnt/drbd1 fstype=ext4 \
 meta target-role=started
primitive SuperFs2 ocf:heartbeat:Filesystem \
 params device=/dev/drbd2 directory=/mnt/drbd2 fstype=ext4 \
 meta target-role=started
primitive SuperFs3 ocf:heartbeat:Filesystem \
 params device=/dev/drbd3 directory=/mnt/drbd3 fstype=ext4
primitive SuperFs4 ocf:heartbeat:Filesystem \
 

Re: [Linux-HA] Master Became Slave - Cluster unstable $$$

2014-04-06 Thread aalishe
tell me what information is still needed to be sure / certain of solving the
problem?

I can provide it all 

thanks for your time 



--
View this message in context: 
http://linux-ha.996297.n3.nabble.com/Master-Became-Slave-Cluster-unstable-tp15583p15585.html
Sent from the Linux-HA mailing list archive at Nabble.com.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems