Re: [ClusterLabs] Fwd: Multi cluster

2017-08-04 Thread Jan Pokorný
[addendum inline]

On 04/08/17 18:35 +0200, Jan Pokorný wrote:
> On 03/08/17 20:37 +0530, sharafraz khan wrote:
>> I am new to clustering, so please excuse me if my question sounds silly. I have
>> a requirement wherein I need to create a cluster for an ERP application with
>> Apache and a VIP component; below is the scenario.
>> 
>> We have 5 Sites,
>> 1. DC
>> 2. Site A
>> 3. Site B
>> 4. Site C
>> 5. Site D
>> 
>> We need to configure HA such that the DC would be the primary node hosting
>> the application, accessed by all users at every site. In case of failure of
>> the DC node, each site's users should automatically be switched to their
>> local ERP server, and not to the nodes at other sites, so communication
>> would be as below:
>> 
>> DC < -- > Site A
>> DC < -- > Site B
>> DC < -- > Site C
>> DC < -- > Site D
>> 
>> Now the challenge is
>> 
>> 1. If I create a cluster between, say, DC < -- > Site A, it won't let me
>> create another cluster on DC with the other sites.
>> 
>> 2. If I set up all the nodes in a single cluster, how can I ensure that, in
>> case of node failure or loss of connectivity to the DC node from any site,
>> users from that site are switched to the local ERP node and not to nodes at
>> another site?
>> 
>> An urgent response and help would be much appreciated.
> 
> From your description, I suppose you are limited to just a single
> machine per site/DC (making the overall picture prone to a double
> fault: first the DC goes down, then one of the sites goes down, and
> at least the clients of that very site experience downtime).
> Otherwise I'd suggest looking at the booth project, which facilitates
> inter-cluster (back to your "multi cluster") decisions, extending
> upon pacemaker, which performs the intra-cluster ones.
> 
> Using a single cluster approach, you should certainly be able to
> model your fallback scenario, something like:
> 
> - define a group A (VIP, apache, app), infinity-located with DC
> - define a different group B with the same content, set up as clone
>   B_clone being (-infinity)-located with DC
> - set up ordering "B_clone starts when A stops", of "Mandatory" kind
> 
> Further tweaks may be needed.

Hmm, actually a VIP would not help much here, even if "ip" were adapted
per host ("#uname"), as there are two conflicting principles ("globality"
of the network when serving from the DC vs. locality when serving from
particular sites _in parallel_).  Something more sophisticated would
likely be needed.

-- 
Poki




Re: [ClusterLabs] Fwd: Multi cluster

2017-08-04 Thread Ken Gaillot
On Fri, 2017-08-04 at 18:35 +0200, Jan Pokorný wrote:
> On 03/08/17 20:37 +0530, sharafraz khan wrote:
> > I am new to clustering, so please excuse me if my question sounds silly. I have
> > a requirement wherein I need to create a cluster for an ERP application with
> > Apache and a VIP component; below is the scenario.
> > 
> > We have 5 Sites,
> > 1. DC
> > 2. Site A
> > 3. Site B
> > 4. Site C
> > 5. Site D
> > 
> > We need to configure HA such that the DC would be the primary node hosting
> > the application, accessed by all users at every site. In case of failure of
> > the DC node, each site's users should automatically be switched to their
> > local ERP server, and not to the nodes at other sites, so communication
> > would be as below:
> > 
> > DC < -- > Site A
> > DC < -- > Site B
> > DC < -- > Site C
> > DC < -- > Site D
> > 
> > Now the challenge is
> > 
> > 1. If I create a cluster between, say, DC < -- > Site A, it won't let me
> > create another cluster on DC with the other sites.

Right, your choices (when using corosync+pacemaker) are one big cluster
with all sites (including the data center), or an independent cluster at
each site connected by booth.

It sounds like your secondary sites don't have any communication between
each other, only to the DC, so that suggests that the "one big cluster"
approach won't work.

For more details on pacemaker+booth, see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139900093104976
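If you do go the booth route, each site runs its own cluster plus a booth
daemon, and a ticket decides which site may run the service. Purely as an
illustration (addresses and ticket name invented; check the booth
documentation shipped with your distribution for exact syntax), a minimal
/etc/booth/booth.conf could look roughly like:

  transport = UDP
  port = 9929
  # one "site" line per cluster, plus an arbitrator for tie-breaking
  arbitrator = 192.0.2.100
  site = 192.0.2.1
  site = 192.0.2.2
  # the ticket a site must hold in order to run the ERP resources
  ticket = "ticket-erp"
      expire = 600

On the pacemaker side, the resources are then tied to the ticket with an
rsc_ticket constraint, and booth grants the ticket to exactly one site at
a time.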


> > 2. If I set up all the nodes in a single cluster, how can I ensure that, in
> > case of node failure or loss of connectivity to the DC node from any site,
> > users from that site are switched to the local ERP node and not to nodes at
> > another site?

The details depend on the particular service. Unfortunately I don't have
any experience with ERP, maybe someone else can jump in with tips.

How do users contact the ERP node? Via an IP address, or a list of IP
addresses that will be tried in order, or some other way?

Is the ERP service itself managed by the cluster? If so, what resource
agent are you using? Does the agent support cloning or master/slave
operation?

> > 
> > An urgent response and help would be much appreciated.
> 
> From your description, I suppose you are limited to just a single
> machine per site/DC (making the overall picture prone to a double
> fault: first the DC goes down, then one of the sites goes down, and
> at least the clients of that very site experience downtime).
> Otherwise I'd suggest looking at the booth project, which facilitates
> inter-cluster (back to your "multi cluster") decisions, extending
> upon pacemaker, which performs the intra-cluster ones.
> 
> Using a single cluster approach, you should certainly be able to
> model your fallback scenario, something like:
> 
> - define a group A (VIP, apache, app), infinity-located with DC
> - define a different group B with the same content, set up as clone
>   B_clone being (-infinity)-located with DC
> - set up ordering "B_clone starts when A stops", of "Mandatory" kind
> 
> Further tweaks may be needed.

-- 
Ken Gaillot 







Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-04 Thread Ken Gaillot
On Fri, 2017-08-04 at 18:20 +0200, Lentes, Bernd wrote:
> Hi,
> 
> first: is there a tutorial or something else that helps in understanding what
> pacemaker logs to syslog and /var/log/cluster/corosync.log?
> I am trying hard to find out what's going wrong, but the logs are difficult to
> understand, also because of the amount of information.
> Or should I work more with "crm history" or hb_report?

Unfortunately no -- logging, and troubleshooting in general, is an area
we are continually striving to improve, but there are more to-do's than
time to do them.

> 
> What happened:
> I tried to configure a simple DRBD resource following
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> I used this simple snippet from the doc:
> configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
> op monitor interval=60s
> 
> I did it on the live cluster, which is currently in testing. I will never do
> this again. Shadow will be my friend.

lol, yep

> The cluster reacted promptly:
> crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params 
> drbd_resource=idcc-devel \
>> op monitor interval=60
> WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than 
> the advised 240
> WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than 
> the advised 100
> WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it 
> may not be supported by the RA
> 
> From what I understand so far, I didn't configure start/stop operations, so
> the cluster chooses the default from default-action-timeout.
> It didn't configure the monitor operation, because it is not in the
> meta-data.
> 
> I checked it:
> crm(live)# ra info ocf:linbit:drbd
> Manages a DRBD device as a Master/Slave resource (ocf:linbit:drbd)
> 
> Operations' defaults (advisory minimum):
> 
> start timeout=240
> promote   timeout=90
> demotetimeout=90
> notifytimeout=90
> stop  timeout=100
> monitor_Slave timeout=20 interval=20
> monitor_Master timeout=20 interval=10
> 
> OK. I have to configure monitor_Slave and monitor_Master.
> 
> The log says:
> Aug  1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta 
> parameter misconfigured, expected clone-max -le 2, but found unset.
> Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation 
> prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6, 
> cib-update=37, confirmed=true)
> Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation 
> prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6, 
> cib-update=38, confirmed=true)
> 
> Why is it complaining about a missing clone-max? That is a meta attribute for
> a clone, but not for a simple resource!? This message is constantly repeated;
> it still appears although the cluster has been in standby for three days.

The "ERROR" message is coming from the DRBD resource agent itself, not
pacemaker. Between that message and the two separate monitor operations,
it looks like the agent will only run as a master/slave clone.
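
Untested, but judging from the advisory values you pasted, a configuration
closer to what the agent expects would wrap the primitive in a master/slave
resource, something along these lines (adjust names and timeouts as needed):

  configure primitive prim_drbd_idcc_devel ocf:linbit:drbd \
          params drbd_resource=idcc-devel \
          op monitor role=Master interval=10 timeout=20 \
          op monitor role=Slave interval=20 timeout=20 \
          op start timeout=240 op stop timeout=100
  configure ms ms_drbd_idcc_devel prim_drbd_idcc_devel \
          meta master-max=1 master-node-max=1 clone-max=2 \
          clone-node-max=1 notify=true

That satisfies the "clone-max -le 2" check in the agent and provides the two
role-specific monitor operations it advertises.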

> And why does it complain that stop is not configured?

A confusing error message. It's not complaining that the operations are
not configured, it's saying the operations failed because the resource
is not properly configured. What "properly configured" means is up to
the individual resource agent.

> Isn't that configured with the default of 20 sec.? That's what crm said, see
> above. This message is also repeated nearly 7000 times in 9 minutes.
> If the stop op is not configured and the cluster complains about it, why does
> it not complain about an unconfigured start op?
> That it complains about the missing monitor is clear.
> 
> The DC says:
> Aug  1 14:19:33 ha-idg-2 pengine[27043]:  warning: unpack_rsc_op_failure: 
> Processing failed op stop for prim_drbd_idcc_devel on ha-idg-1: not 
> configured (6)
> Aug  1 14:19:33 ha-idg-2 pengine[27043]:error: unpack_rsc_op: Preventing 
> prim_drbd_idcc_devel from re-starting anywhere: operation stop failed 'not 
> configured' (6)
> 
> Again it complains about a failed stop, saying it's not configured. Or is it
> complaining that the failure of a stop op is not configured?

Again, it's confusing, but you have various logs of the same event
coming from three different places.

First, DRBD logged that there is a "meta parameter misconfigured". It
then reported that error value back to the crmd cluster daemon that
called it, so the crmd logged the error as well, that the result of the
operation was "not configured".

Then (above), when the policy engine reads the current status of the
cluster, it sees that there is a failed operation, so it decides what to
do about the failure.

> The d

Re: [ClusterLabs] Notification agent and Notification recipients

2017-08-04 Thread Ken Gaillot
On Thu, 2017-08-03 at 12:31 +0530, Sriram wrote:
> 
> Hi Team,
> 
> 
> We have a four node cluster (1 active : 3 standby) in our lab for a
> particular service. If the active node goes down, one of the three
> standby nodes becomes active. Now there will be (1 active : 2
> standby : 1 offline).
> 
> 
> Is there any way for this newly elected node to send a notification to
> the remaining 2 standby nodes about its new status?

Hi Sriram,

This depends on how your service is configured in the cluster.

If you have a clone or master/slave resource, then clone notifications
are probably what you want (not alerts, which is the path you were going
down -- alerts are designed to e.g. email a system administrator after
an important event).

For details about clone notifications, see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements

The RA must support the "notify" action, which will be called when a
clone instance is started or stopped. See the similar section later for
master/slave resources for additional information. See the mysql or
pgsql resource agents for examples of notify implementations.
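
To give a rough idea (a sketch only, not a complete agent, and assuming the
usual ocf-shellfuncs helpers are sourced): inside the RA's notify action you
typically inspect the notification environment variables that pacemaker sets,
for example:

  notify() {
      # "pre" or "post", and the operation being notified about
      local type="$OCF_RESKEY_CRM_meta_notify_type"
      local op="$OCF_RESKEY_CRM_meta_notify_operation"

      if [ "$type" = "post" ] && [ "$op" = "promote" ]; then
          # space-separated list of nodes that have just been promoted
          ocf_log info "new master(s): $OCF_RESKEY_CRM_meta_notify_promote_uname"
          # here the agent could push the new-master info to the standby instances
      fi
      return $OCF_SUCCESS
  }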

> I was exploring "notification agent" and "notification recipient"
> features, but that doesn't seem to work. /etc/sysconfig/notify.sh
> doesn't get invoked even in the newly elected active node. 

Yep, that's something different altogether -- it's only enabled on RHEL
systems, and solely for backward compatibility with an early
implementation of the alerts interface. The new alerts interface is more
flexible, but it's not designed to send information between cluster
nodes -- it's designed to send information to something external to the
cluster, such as a human, or an SNMP server, or a monitoring system.


> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.17-e2e6cdce80
>  default-action-timeout: 240
>  have-watchdog: false
>  no-quorum-policy: ignore
>  notification-agent: /etc/sysconfig/notify.sh
>  notification-recipient: /var/log/notify.log
>  placement-strategy: balanced
>  stonith-enabled: false
>  symmetric-cluster: false
> 
> 
> 
> 
> I am using the following versions of pacemaker and corosync.
> 
> 
> /usr/sbin # ./pacemakerd --version
> Pacemaker 1.1.17
> Written by Andrew Beekhof
> /usr/sbin # ./corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> 
> Can you please suggest whether I am doing anything wrong, or whether there
> are any other mechanisms to achieve this?
> 
> 
> Regards,
> Sriram.
> 
> 

-- 
Ken Gaillot 







Re: [ClusterLabs] Fwd: Multi cluster

2017-08-04 Thread Jan Pokorný
On 03/08/17 20:37 +0530, sharafraz khan wrote:
> I am new to clustering, so please excuse me if my question sounds silly. I have
> a requirement wherein I need to create a cluster for an ERP application with
> Apache and a VIP component; below is the scenario.
> 
> We have 5 Sites,
> 1. DC
> 2. Site A
> 3. Site B
> 4. Site C
> 5. Site D
> 
> We need to configure HA such that the DC would be the primary node hosting
> the application, accessed by all users at every site. In case of failure of
> the DC node, each site's users should automatically be switched to their
> local ERP server, and not to the nodes at other sites, so communication
> would be as below:
> 
> DC < -- > Site A
> DC < -- > Site B
> DC < -- > Site C
> DC < -- > Site D
> 
> Now the challenge is
> 
> 1. If I create a cluster between, say, DC < -- > Site A, it won't let me
> create another cluster on DC with the other sites.
> 
> 2. If I set up all the nodes in a single cluster, how can I ensure that, in
> case of node failure or loss of connectivity to the DC node from any site,
> users from that site are switched to the local ERP node and not to nodes at
> another site?
> 
> An urgent response and help would be much appreciated.

From your description, I suppose you are limited to just a single
machine per site/DC (making the overall picture prone to a double
fault: first the DC goes down, then one of the sites goes down, and
at least the clients of that very site experience downtime).
Otherwise I'd suggest looking at the booth project, which facilitates
inter-cluster (back to your "multi cluster") decisions, extending
upon pacemaker, which performs the intra-cluster ones.

Using a single cluster approach, you should certainly be able to
model your fallback scenario, something like:

- define a group A (VIP, apache, app), infinity-located with DC
- define a different group B with the same content, set up as clone
  B_clone being (-infinity)-located with DC
- set up ordering "B_clone starts when A stops", of "Mandatory" kind

Further tweaks may be needed.
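
In crm shell syntax, a rough sketch of the above (untested; the primitive and
node names below are invented for illustration and would need to exist in
your configuration):

  # group A: the full stack, pinned to the DC node
  configure group grp_A vip_A apache_A erp_A
  configure location loc_A_on_dc grp_A inf: dc-node
  # group B: same content, cloned, kept off the DC node
  configure group grp_B vip_B apache_B erp_B
  configure clone B_clone grp_B
  configure location loc_B_not_dc B_clone -inf: dc-node
  # B_clone is only allowed to start once A has stopped
  configure order ord_fallback Mandatory: grp_A:stop B_clone:start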

-- 
Poki




[ClusterLabs] big trouble with a DRBD resource

2017-08-04 Thread Lentes, Bernd
Hi,

first: is there a tutorial or something else that helps in understanding what
pacemaker logs to syslog and /var/log/cluster/corosync.log?
I am trying hard to find out what's going wrong, but the logs are difficult to
understand, also because of the amount of information.
Or should I work more with "crm history" or hb_report?

What happened:
I tried to configure a simple DRBD resource following
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
I used this simple snippet from the doc:
configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
op monitor interval=60s

I did it on the live cluster, which is currently in testing. I will never do
this again. Shadow will be my friend.

The cluster reacted promptly:
crm(live)# configure primitive prim_drbd_idcc_devel ocf:linbit:drbd params 
drbd_resource=idcc-devel \
   > op monitor interval=60
WARNING: prim_drbd_idcc_devel: default timeout 20s for start is smaller than 
the advised 240
WARNING: prim_drbd_idcc_devel: default timeout 20s for stop is smaller than the 
advised 100
WARNING: prim_drbd_idcc_devel: action monitor not advertised in meta-data, it 
may not be supported by the RA

From what I understand so far, I didn't configure start/stop operations, so
the cluster chooses the default from default-action-timeout.
It didn't configure the monitor operation, because it is not in the meta-data.

I checked it:
crm(live)# ra info ocf:linbit:drbd
Manages a DRBD device as a Master/Slave resource (ocf:linbit:drbd)

Operations' defaults (advisory minimum):

start timeout=240
promote   timeout=90
demotetimeout=90
notifytimeout=90
stop  timeout=100
monitor_Slave timeout=20 interval=20
monitor_Master timeout=20 interval=10

OK. I have to configure monitor_Slave and monitor_Master.

The log says:
Aug  1 14:19:33 ha-idg-1 drbd(prim_drbd_idcc_devel)[11325]: ERROR: meta 
parameter misconfigured, expected clone-max -le 2, but found unset.

Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation 
prim_drbd_idcc_devel_monitor_0: not configured (node=ha-idg-1, call=73, rc=6, 
cib-update=37, confirmed=true)
Aug  1 14:19:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: Operation 
prim_drbd_idcc_devel_stop_0: not configured (node=ha-idg-1, call=74, rc=6, 
cib-update=38, confirmed=true)

Why is it complaining about a missing clone-max? That is a meta attribute for a
clone, but not for a simple resource!? This message is constantly repeated;
it still appears although the cluster has been in standby for three days.
And why does it complain that stop is not configured?
Isn't that configured with the default of 20 sec.? That's what crm said, see
above. This message is also repeated nearly 7000 times in 9 minutes.
If the stop op is not configured and the cluster complains about it, why does
it not complain about an unconfigured start op?
That it complains about the missing monitor is clear.

The DC says:
Aug  1 14:19:33 ha-idg-2 pengine[27043]:  warning: unpack_rsc_op_failure: 
Processing failed op stop for prim_drbd_idcc_devel on ha-idg-1: not configured 
(6)
Aug  1 14:19:33 ha-idg-2 pengine[27043]:error: unpack_rsc_op: Preventing 
prim_drbd_idcc_devel from re-starting anywhere: operation stop failed 'not 
configured' (6)

Again it complains about a failed stop, saying it's not configured. Or is it
complaining that the failure of a stop op is not configured?
The doc says:
"Some operations are generated by the cluster itself, for example, stopping and 
starting resources as needed."
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html
. Is the doc wrong?
What happens when I DON'T configure start/stop operations? Are they configured
automatically?
I have several primitives without a configured start/stop operation, but I have
never had any problems with them.

The fail count goes straight to INFINITY:
Aug  1 14:19:33 ha-idg-1 attrd[4690]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: fail-count-prim_drbd_idcc_devel (INFINITY)
Aug  1 14:19:33 ha-idg-1 attrd[4690]:   notice: attrd_perform_update: Sent 
update 8: fail-count-prim_drbd_idcc_devel=INFINITY


After exactly 9 minutes the complaints about the unconfigured stop operation
stopped; the complaints about the missing clone-max still appear, although both
nodes are in standby.

Now the fail-count is 1 million:
Aug  1 14:28:33 ha-idg-1 attrd[4690]:   notice: attrd_trigger_update: Sending 
flush op to all hosts for: fail-count-prim_drbd_idcc_devel (100)
Aug  1 14:28:33 ha-idg-1 attrd[4690]:   notice: attrd_perform_update: Sent 
update 7076: fail-count-prim_drbd_idcc_devel=100

and a complaint about the monitor operation appeared again:
Aug  1 14:28:33 ha-idg-1 crmd[4692]:   notice: process_lrm_event: O

Re: [ClusterLabs] Notification agent and Notification recipients

2017-08-04 Thread Jan Pokorný
On 04/08/17 11:06 +0530, Sriram wrote:
> Any idea what could have gone wrong or if there are other ways to achieve
> the same ?

Sriram, I have just answered in the original thread.  Note that it's
the part of the year when vacations are quite common, so even if you
are eager to get an answer, a reasonable wait time should be a bit
longer (and moreover, please do not start a new thread, but rather
respond to the existing one next time).

-- 
Poki




Re: [ClusterLabs] Notification agent and Notification recipients

2017-08-04 Thread Jan Pokorný
On 03/08/17 12:31 +0530, Sriram wrote:
> We have a four-node cluster (1 active : 3 standby) in our lab for a
> particular service. If the active node goes down, one of the three standby
> nodes becomes active. Now there will be (1 active : 2 standby : 1 offline).
> 
> Is there any way for this newly elected node to send a notification to the
> remaining 2 standby nodes about its new status?
> 
> I was exploring "notification agent" and "notification recipient" features,
> but that doesn't seem to work. /etc/sysconfig/notify.sh doesn't get invoked
> even in the newly elected active node.
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.17-e2e6cdce80
>  default-action-timeout: 240
>  have-watchdog: false
>  no-quorum-policy: ignore
>  *notification-agent: /etc/sysconfig/notify.sh*
>  *notification-recipient: /var/log/notify.log*

This ^ legacy approach to configure notifications ...

>  placement-strategy: balanced
>  stonith-enabled: false
>  symmetric-cluster: false
> 
> 
> I am using the following versions of pacemaker and corosync.
> 
> /usr/sbin # ./pacemakerd --version
> Pacemaker 1.1.17

... is not expected to be used with this ^ new pacemaker (or any
version 1.1.15+, for that matter, unless explicitly enabled):

 
https://github.com/ClusterLabs/pacemaker/commit/a8d8c0c2d4cad571f0746c879de4f6d0c55dd5d6

[...]

> Can you please suggest whether I am doing anything wrong, or whether there are
> any other mechanisms to achieve this?

Please, have a look at the respective chapter of Pacemaker Explained

 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_alert_agents

with the details on how to use the blessed (and, very likely in your case,
the only) approach to configuring notification scripts.
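
For example, with a reasonably recent pcs the equivalent of your two
properties would look roughly like this (exact syntax can differ between
pcs versions):

  pcs alert create path=/etc/sysconfig/notify.sh id=notify_alert
  pcs alert recipient add notify_alert value=/var/log/notify.log

The agent then receives the event details in CRM_alert_* environment
variables.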

-- 
Poki




Re: [ClusterLabs] Antw: verify status starts at 100% and stays there?

2017-08-04 Thread Eric Robinson
Yeah, UpToDate was not of concern to me. The part that threw me off was 
"done:100.00." It did eventually finish, though, and that was shown in the 
dmesg output. However, 'drbdadm status'  said "done:100.00" the whole time, 
from start to finish, which seems weird.  
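
For reference, the kernel log and drbdsetup give another view of the verify
progress (DRBD 9 tooling assumed; option support may vary between versions):

  # kernel messages mark the start and the end of the online verify
  dmesg | grep -i "online verify"
  # per-peer statistics, including out-of-sync counters
  drbdsetup status ha01_mysql --verbose --statistics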

--
Eric Robinson
   

> -Original Message-
> From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
> Sent: Thursday, August 03, 2017 11:25 PM
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Antw: verify status starts at 100% and stays there?
> 
> >>> Eric Robinson wrote on 04.08.2017 at 06:53 in message
> < d03.prod.outlook.com>:
> 
> > I have drbd 9.0.8. I started an online verify, and immediately checked
> > status, and I see...
> >
> > ha11a:/ha01_mysql/trimtester # drbdadm status ha01_mysql role:Primary
> >   disk:UpToDate
> >   ha11b role:Secondary
> > replication:VerifyT peer-disk:UpToDate done:100.00
> >
> > ...which looks like it is finished, but the tail of dmesg says...
> >
> > [336704.851209] drbd ha01_mysql/0 drbd0 ha11b: repl( Established ->
> > VerifyT ) [336704.851244] drbd ha01_mysql/0 drbd0: Online Verify start
> > sector: 0
> >
> > ...which looks like the verify is still in progress.
> >
> > So is it done, or is it still in progress? Is this a drbd bug?
> 
> I'm not deep into DRBD, but I guess "disk:UpToDate" just indicates that up to
> the present moment DRBD thinks the disks are up to date (unless verify
> would detect otherwise). Maybe there should be an additional status like
> "syncing", "verifying", etc.
> 
> Regards,
> Ulrich
> 
> 
> 
> 



Re: [ClusterLabs] Antw: LVM resource and DAS - would two resources off one DAS...

2017-08-04 Thread roger zhou



On 07/27/2017 09:20 PM, Ulrich Windl wrote:

Hi!

I think it will work, because the cluster does not monitor the PVs, partitions,
or LUNs. It just checks whether you can activate the LVs (i.e. the VG). That's
what I know...

Regards,
Ulrich


lejeczek wrote on 27.07.2017 at 15:05 in message

<636398a2-e8ea-644b-046b-ff12358de...@yahoo.co.uk>:

hi fellas

I realise this might be quite a specialized topic, as it concerns
hardware DAS (SAS2), LVM and the cluster itself, but I'm hoping that
with some luck an expert will peep over here and I'll get some or all
of the answers.

question:
Can the cluster manage two (or more) LVM resources which would be on
the same single DAS storage, and have these resources (e.g. one LVM
runs on 1&2, the other LVM runs on 3&4) run on different nodes (which
naturally all connect to that single DAS)?


Yes, it works in production environments for users.

That said, it may depend on your detailed scenario. You should further
evaluate whether you need to protect the LVM VG metadata, e.g. by resizing
an LV on multiple nodes simultaneously. If so, involving clvm or the
upcoming lvmlockd is necessary.
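
For the simple case (each VG activated exclusively on exactly one node, no
concurrent metadata changes), a sketch in crm shell syntax, with invented
resource and node names:

  configure primitive vg_one ocf:heartbeat:LVM \
          params volgrpname=vg_one exclusive=true \
          op monitor interval=60 timeout=30
  configure primitive vg_two ocf:heartbeat:LVM \
          params volgrpname=vg_two exclusive=true \
          op monitor interval=60 timeout=30
  configure location l_vg_one vg_one 100: node-a
  configure location l_vg_two vg_two 100: node-b

Both VGs live on the same DAS; the location constraints merely express a
preference for running them on different nodes.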


--Roger



Now, I guess this might be something many do already and many will
say it's trivial, in which case a few firm "yes" confirmations will
mean: typical, just do it.
Or could it be something unusual and untested that might/should work
when done with care and special "preparation"?

I understand that a lot depends on what/how the hardware+kernel do
things, but if possible I'd leave that out for now and ask only about
the cluster itself -- do you do it?

many thanks.
L.









