Re: [ClusterLabs] snapshoting of running VirtualDomain resources - OCFS2 ?

2018-03-14 Thread Gang He
Hello Lentes,


>>> 
> Hi,
> 
> i have a 2-node-cluster with my services (web, db) running in VirtualDomain 
> resources.
> I have a SAN with cLVM, each guest lies in a dedicated logical volume with 
> an ext3 fs.
> 
> Currently i'm thinking about snapshotting the guests to make a backup in the 
> background. With cLVM that's not possible, you can't snapshot a clustered LV.
> Using virsh and qemu-img i didn't find a way to do this without a shutdown 
> of the guest, which i'd like to avoid.
> 
> I found that ocfs2 is able to make snapshots, oracle calls them reflinks.
> So formatting the logical volumes for the guests with ocfs2 would give me 
> the possibility to snapshot them.
> 
> I know that using ocfs2 for the lv's is oversized, but i didn't find another 
> way to solve my problem.
Yes, OCFS2 reflink can meet your requirement; this is also why OCFS2 introduced 
file cloning. 

> 
> What do you think ? I'd like to avoid to shutdown my guests, that's too 
> risky. I experienced already several times that a shutdown can last very long
> because of problems with umounting filesystems because of open files or user 
> connected remotely (on windows guests).
Just one comment: you have to make sure of the VM file's integrity before calling 
reflink.
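
A rough sketch of what that could look like (the guest name and image path are
hypothetical; the reflink(1) utility ships with ocfs2-tools, and the guest is
paused here so the image is consistent when the clone is taken):

    # pause the guest so its disk image is in a consistent state
    virsh suspend sles_guest1
    # create a copy-on-write clone of the image on the OCFS2 volume
    reflink /vmstore/sles_guest1/disk0.img /vmstore/sles_guest1/disk0-$(date +%F).img
    # resume the guest; the clone can now be backed up in the background
    virsh resume sles_guest1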

Thanks
Gang

> 
> 
> Bernd
> 
> -- 
> 
> Bernd Lentes 
> Systemadministration 
> Institut für Entwicklungsgenetik 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum münchen 
> [ mailto:bernd.len...@helmholtz-muenchen.de | 
> bernd.len...@helmholtz-muenchen.de ] 
> phone: +49 89 3187 1241 
> fax: +49 89 3187 2294 
> [ http://www.helmholtz-muenchen.de/idg | 
> http://www.helmholtz-muenchen.de/idg ] 
> 
> no backup - no mercy
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-03-14 Thread Ken Gaillot
On Fri, 2018-03-09 at 17:26 +0100, Jan Friesse wrote:
> Thomas,
> 
> > Hi,
> > 
> > On 3/7/18 1:41 PM, Jan Friesse wrote:
> > > Thomas,
> > > 
> > > > First thanks for your answer!
> > > > 
> > > > On 3/7/18 11:16 AM, Jan Friesse wrote:
> 
> ...
> 
> > TotemConfchgCallback: ringid (1.1436)
> > active processors 3: 1 2 3
> > EXIT
> > Finalize  result is 1 (should be 1)
> > 
> > 
> > Hope I did both tests right, but as it reproduces multiple times
> > with testcpg and with our cpg usage in our filesystem, this seems
> > validly tested, not just a single occurrence.
> 
> I've tested it too and yes, you are 100% right. The bug is there and
> it's pretty easy to reproduce when the node with the lowest nodeid is
> paused. It's slightly harder when a node with a higher nodeid is paused.
> 
> Most clusters use power fencing, so they simply never see this
> problem. That may also be the reason why it wasn't reported a long
> time ago (this bug has existed virtually at least since OpenAIS
> Whitetank). So really nice work finding this bug.
> 
> What I'm not entirely sure about is the best way to solve this
> problem. What I'm sure of is that it's going to be "fun" :(
> 
> Let's start with a very high-level view of possible solutions:
> - "Ignore the problem". CPG behaves more or less correctly. The
> "current" membership really didn't change, so it doesn't make too much
> sense to inform about a change. It's possible to use
> cpg_totem_confchg_fn_t to find out when the ringid changes. I'm adding
> this solution just for completeness, because I don't prefer it at all.
> - cpg_confchg_fn_t adds all nodes that left and rejoined into the
> left/join lists.
> - cpg sends an extra cpg_confchg_fn_t call about the left and joined
> nodes. I would prefer this solution simply because it makes cpg
> behavior equal in all situations.
> 
> Which of the options would you prefer? Same question also for @Ken (-

Pacemaker should react essentially the same whichever of the last two
options is used. There could be differences due to timing (the second
solution might allow some work to be done between when the left and
join messages are received), but I think it should behave reasonably
with either approach.

Interestingly, there is some old code in Pacemaker for handling when a
node left and rejoined but "the cluster layer didn't notice", that may
have been a workaround for this case.
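
For anyone who wants to reproduce the scenario discussed above, a rough sketch
(the pause mechanism here is an assumption; testcpg comes with the corosync
test tools):

    # on the node with the lowest nodeid: freeze corosync long enough for the
    # token to be lost, then let it continue
    killall -STOP corosync; sleep 60; killall -CONT corosync
    # meanwhile run testcpg on another node: with the bug present, no
    # ConfchgCallback (join/leave lists) is delivered for the paused node,
    # only TotemConfchgCallback reports a new ring id.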

> what would you prefer for PCMK) and @Chrissie.
> 
> Regards,
>    Honza
> 
> 
> > 
> > cheers,
> > Thomas
> > 
> > > > 
> > > > > Now it's really the cpg application's problem to synchronize
> > > > > its data. Many applications (usually filesystems) use quorum
> > > > > together with fencing to find out which cluster partition is
> > > > > quorate and to clean the inquorate one.
> > > > > 
> > > > > Hopefully my explanation helps you, and feel free to ask more
> > > > > questions!
> > > > > 
> > > > 
> > > > They help, but I'm still a bit unsure about why the CB could
> > > > not happen here,
> > > > may need to dive a bit deeper into corosync :)
> > > > 
> > > > > Regards,
> > > > >    Honza
> > > > > 
> > > > > > 
> > > > > > help would be appreciated, much thanks!
> > > > > > 
> > > > > > cheers,
> > > > > > Thomas
> > > > > > 
> > > > > > [1]: https://git.proxmox.com/?p=pve-cluster.git;a=tree;f=da
> > > > > > ta/src;h=e5493468b456ba9fe3f681f387b4cd5b86e7ca08;hb=HEAD
> > > > > > [2]: https://git.proxmox.com/?p=pve-cluster.git;a=blob;f=da
> > > > > > ta/src/dfsm.c;h=cdf473e8226ab9706d693a457ae70c0809afa0fa;hb
> > > > > > =HEAD#l1096
-- 
Ken Gaillot 


[ClusterLabs] [Announce] clufter v0.77.1 released

2018-03-14 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.77.1
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The updated test suite for this version is also provided:

or alternatively:


Changelog highlights for v0.77.1 (also available as a tag message):

- bug fix and slight fine-tuning release

- bug fixes:
  . running [cp]cs2pcscmd commands in the absence of the "defaults"
    submodule (repository checkout, broken installation) no longer
    fails
  . the corosync configuration parser, employed e.g. by the
    pcs2pcscmd-needle command, no longer mistreats commented-out lines
    with spaces or tabs in front of the respective delimiter
  . some options that were mechanically introduced recently for the
    purpose of converting an existing configuration into the procedural
    steps leading there with the help of the pcs tool (i.e. the
    {cib,pcs}2pcscmd commands are affected) turned out, under closer
    examination, not to be actually supported by current pcs versions,
    specifically the "nodelist.node.name" setting in corosync.conf
    (added into the embedded corosync configuration schema regardless,
    in accord with having this value legalized in corosync proper, just
    as with "resources.watchdog_device"), and "quorum.device.votes"
    along with "quorum.device.net.tls"
[resolves/related: rhbz#1517834]
[resolves: rhbz#1552666]
- feature extensions:
  . beyond the Linux top-level system selection (via --sys) of the
    target the command output is to be tailored to (standing for a
    whole bunch of parameters that would be too unwieldy to work with
    individually), there's now also (rather preliminary) support for
    the BSD family, currently comprising only FreeBSD, for the simple
    fact that some cluster packages are downstreamed there the usual way
- internal enhancements:
  . handling of ambiguous specification of the target distribution plus
    its version has been improved, also for the input values coming
    from run-time auto-detection, leading to more widely normalized
    values internally, hence bringing more reliable outputs
    (now also with Fedora Rawhide, for instance)

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli, ...).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] Ignore lost monitoring request

2018-03-14 Thread Klecho

Hi all,

As Ken said

"Not currently, but that is planned for a future version",

just want to remind how useful it would be to have "ignore X monitoring
timeouts" as an option in the newest Pacemaker.


I'm still having big problems with resources restarting because of lost
monitoring requests, which leads to service interruptions.
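
As a coarse interim workaround (not the per-N-timeouts option being asked for),
a monitor operation can be told to ignore failures altogether with
on-fail=ignore; note that this throws away all monitor-based recovery, not just
the timeout case. A sketch with illustrative names and values, assuming p_PingD
wraps ocf:pacemaker:ping:

    crm configure primitive p_PingD ocf:pacemaker:ping \
        params host_list="192.168.1.1" dampen=10s \
        op monitor interval=30s timeout=60s on-fail=ignore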


Best regards,

Klecho


On 1.09.2017 17:52, Klechomir wrote:

On 1.09.2017 17:21, Jan Pokorný wrote:

On 01/09/17 09:48 +0300, Klechomir wrote:

I have cases when, for an unknown reason, a single monitoring request
never returns a result.
So having bigger timeouts doesn't resolve this problem.

If I get you right, the pain point here is a command called by the
resource agent during the monitor operation, where this command under
some circumstances _never_ terminates (due to a dead wait, an infinite
loop, or whatever other reason) or only terminates based on
external/asynchronous triggers (e.g. the network connection gets
reestablished).

Stating obvious, the solution should be:
- work towards fixing such particular command if blocking
   is an unexpected behaviour (clarify this with upstream
   if needed)
- find more reliable way for the agent to monitor the resource

For the planned soft-recovery options Ken talked about, I am not
sure if it would be trivially possible to differentiate an exceeded
monitor timeout from a plain monitor failure.
In any case, there is currently no differentiation between a failed
monitoring request and a timeout, so a parameter for ignoring X failures
in a row would be very welcome for me.


Here is one very fresh example, entirely unrelated to LV/O:

Aug 30 10:44:19 [1686093] CLUSTER-1 crmd: error: process_lrm_event: LRM operation p_PingD_monitor_0 (1148) Timed Out (timeout=2ms)
Aug 30 10:44:56 [1686093] CLUSTER-1 crmd: notice: process_lrm_event: LRM operation p_PingD_stop_0 (call=1234, rc=0, cib-update=40, confirmed=true) ok
Aug 30 10:45:26 [1686093] CLUSTER-1 crmd: notice: process_lrm_event: LRM operation p_PingD_start_0 (call=1240, rc=0, cib-update=41, confirmed=true) ok

In this case the PingD failure ends up fencing drbd and causes an unneeded
restart of all related resources (unneeded, because the next monitoring
request is ok).








--
Klecho



Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

2018-03-14 Thread Andreas M. Iwanowski
Thank you Andrei, and apologies for being unclear: offline in this example was 
supposed to mean stopped for maintenance, i.e. with pcs cluster stop.

So, basically, here is what's going on:
VIP 172.16.16.9; mac = 11:54:33:a8:b2:6b
redmine1 172.16.16.10, if mac = 00:0c:29:8e:0c:a4
redmine2 172.16.16.11 if mac = 00:0c:29:96:9c:c6

1. Both nodes online, as pcs status shows
[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
 RedmineIP:0(ocf::heartbeat:IPaddr2):   Started redmine1
 RedmineIP:1(ocf::heartbeat:IPaddr2):   Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry:
? (172.16.16.9) at 11:54:33:a8:b2:6b on re1_vlan6 expires in 1197 seconds

Everything correct here.

2. redmine1 is stopped with pcs cluster stop 172.16.16.10; pcs status shows

[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine2 ]
OFFLINE: [ redmine1 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
 RedmineIP:0(ocf::heartbeat:IPaddr2):   Started redmine2
 RedmineIP:1(ocf::heartbeat:IPaddr2):   Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Failover worked, both resources serviced by second host.
However, the target has now learned redmine2's interface MAC for the VIP:
ARP entry:
? (172.16.16.9) at 00:0c:29:96:9c:c6 on re1_vlan6 expires in 1155 seconds

So far not "dangerous", as all IPs are serviced by redmine2 anyway.

3. But now, after failback via pcs cluster start 172.16.16.10:
[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
 RedmineIP:0(ocf::heartbeat:IPaddr2):   Started redmine2
 RedmineIP:1(ocf::heartbeat:IPaddr2):   Started redmine1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry: ? (172.16.16.9) at 00:0c:29:8e:0c:a4 on re1_vlan6 expires in 1184
seconds
For some reason, the VIP now resolves only to redmine1's interface MAC instead
of the multicast MAC.
If a host should be serviced by redmine2 (through clusterip_hash=sourceip),
then the VIP becomes unreachable for it!

So, wouldn't the correct behavior be to always maintain the multicast MAC?
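
For context, a cloned IPaddr2 setup of this kind is typically created along
these lines (a sketch, so the netmask and operation values are illustrative;
clusterip_hash=sourceip together with the globally-unique clone is what makes
IPaddr2 use the iptables CLUSTERIP multicast-MAC mode):

    pcs resource create RedmineIP ocf:heartbeat:IPaddr2 \
        ip=172.16.16.9 cidr_netmask=24 clusterip_hash=sourceip \
        op monitor interval=30s
    pcs resource clone RedmineIP clone-max=2 clone-node-max=2 globally-unique=true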





Mit freundlichen Grüßen / With best regards

Andreas Iwanowski- IT Administrator / Software Developer
www.awato.de |namez...@afim.info
T:+49 2133 26031 55 | F: +49 (0)2133 26031 01
awato Software GmbH | Salm Reifferscheidt Allee 37 | D-41540 Dormagen

avisor-Support | T: +49 (0)621 6094 043 | F: +49 (0)621 6071 447

Geschäftsführer: Ursula Iwanowski | HRB: Neuss 7208 | VAT-no.: DE 122796158


-Original Message-
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Andrei Borzenkov
Sent: Wednesday, 14 March, 2018 8:01
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is 
revealed

On Wed, Mar 14, 2018 at 12:40 AM, Andreas M. Iwanowski  
wrote:
> Dear folks,
>
> We are currently trying to set up a multimaster cluster and use a cloned 
> ocf_heartbeat_IPaddr2 resource to share the IP address.
>
> We have, however, run into a problem that, when a cluster member is taken 
> offline, the MAC for the IP address changes from the multicast-MAC to the 
> interface mac of the remaining host.
> When the other host is put back online, pings to the cluster IP time out when 
> it changes back to multicast (until the ARP cache on the router expires).
>

What exactly does offline mean? A host failure? Did you put the node in standby in
Pacemaker? When does the MAC change - immediately, or after a host/cluster restart?

> Is there any way to prevent network devices from learning the interface MACs? 
> I.e. even if one host is servicing both resources, use the multicast MAC?
> Any help would be appreciated!
>
> Here is the pcs status:
> ===
> Cluster name: test_svc
> WARNING: corosync and pacemaker node names do not match (IPs used in
> setup?)
> Stack: corosync
> Current DC: host1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with
> quorum Last updated: Tue Mar 13 07:12:07 2018 Last change: Sun Mar 11
> 17:17:04 2018 by hacluster via crmd on host1
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ host1 host2 ]
>

I guess the output when one host is "offline" would be needed here.

> Full list of resources:
>
>  Clone Set: RedmineIP-clone [RedmineIP] (unique)
>  RedmineIP:0(ocf::heartbeat:IPaddr2):   Started host1
>  RedmineIP:1(ocf::heartbeat:IPaddr2):   Started host2
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   

[ClusterLabs] Antw: snapshoting of running VirtualDomain resources - OCFS2 ?

2018-03-14 Thread Ulrich Windl
Hi!

IMHO the only clean solution would be this procedure (a rough sketch follows
below):
1) Pause the VMs and cause them to flush their disk buffers, or at least make
sure the writes of the VM guest have arrived in the VM host's buffers
2) Cause the VM host's filesystem buffers to be flushed to disk (i.e. to the LV)
3) Make a snapshot of the LV on the host
4) Unpause the VM guest
5) Back up the LV snapshot on the host
6) Delete the LV snapshot on the host
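
A rough shell sketch of that procedure, using hypothetical guest, VG and LV
names, and assuming the backing volume supports classic LVM snapshots (which,
as Bernd notes, a cLVM volume does not):

    virsh suspend sles_guest1                                # 1) pause the guest
    sync                                                     # 2) flush host buffers
    lvcreate -s -n guest1_snap -L 5G /dev/vg_san/guest1_lv   # 3) snapshot the LV
    virsh resume sles_guest1                                 # 4) unpause the guest
    dd if=/dev/vg_san/guest1_snap bs=64M | gzip > /backup/guest1.img.gz  # 5) back up
    lvremove -f /dev/vg_san/guest1_snap                      # 6) drop the snapshot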

Truly cool would be a solution where a snapshot created inside the VM is
available as a snapshot on the host; then you could skip most steps.

Regards,
Ulrich

>>> "Lentes, Bernd"  schrieb am 14.03.2018
um
11:24 in Nachricht
<408515247.33283737.1521023085837.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
> 
> i have a 2-node-cluster with my services (web, db) running in VirtualDomain 
> resources.
> I have a SAN with cLVM, each guest lies in a dedicated logical volume with 
> an ext3 fs.
> 
> Currently i'm thinking about snapshotting the guests to make a backup in the 
> background. With cLVM that's not possible, you can't snapshot a clustered LV.
> Using virsh and qemu-img i didn't find a way to do this without a shutdown 
> of the guest, which i'd like to avoid.
> 
> I found that ocfs2 is able to make snapshots, oracle calls them reflinks.
> So formatting the logical volumes for the guests with ocfs2 would give me 
> the possibility to snapshot them.
> 
> I know that using ocfs2 for the lv's is oversized, but i didn't find another 
> way to solve my problem.
> 
> What do you think ? I'd like to avoid to shutdown my guests, that's too 
> risky. I experienced already several times that a shutdown can last very long 
> because of problems with umounting filesystems because of open files or user 
> connected remotely (on windows guests).
> 
> 
> Bernd
> 





Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-14 Thread Klaus Wenninger
On 03/14/2018 08:35 AM, Muhammad Sharfuddin wrote:
> Hi Andrei,
> >Somehow I miss the corosync configuration in this thread. Do you know
> >that wait-for-all is set (how?), or do you just assume it?
> >
> solution found: I was not using the "wait_for_all" option; I was assuming
> that "two_node: 1"
> would be sufficient:
>
> nodelist {
>     node { ring0_addr: 10.8.9.151  }
>     node { ring0_addr: 10.8.9.152  }
> }
> ###previously:
> quorum {
>     two_node:   1
>     provider:   corosync_votequorum
> }
> ###now/fix:
> quorum {
>     two_node:   1
>     provider:   corosync_votequorum
>     wait_for_all: 0  }
>
> My observation:
> when I was not using "wait_for_all: 0" in corosync.conf, only the ocfs2
> resources were not running; the rest of the resources were running fine
> because of:
>     a - "two_node: 1" in the corosync.conf file.
>     b - "no-quorum-policy=ignore" in the cib.

If you now lose the network connection between the two nodes,
one node might be lucky enough to fence the other.
If it is set to just power off the other, you are probably fine.
(With sbd you can achieve this behavior if you configure it
to only come up if the corresponding slot is clean.)
If fencing reboots the other node, that one would come up
and right away fence the first one as startup fencing.
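
For reference, the qdevice alternative mentioned in the quoted exchange below
would look roughly like this in corosync.conf (the qnetd host name is
hypothetical, and it assumes corosync-qdevice on the nodes plus corosync-qnetd
on a third machine):

    quorum {
        provider: corosync_votequorum
        # two_node is typically dropped once a qdevice supplies the third vote
        device {
            model: net
            net {
                host: qnetd.example.com
                algorithm: ffsplit
            }
        }
    }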

>
> @ Klaus
> > what I tried to point out is that "no-quorum-policy=ignore"
> > is dangerous for services that do require a resource-manager. If you
> > don't have any of those, go with a systemd startup.
> >
> running a single node is obviously unacceptable, but say both nodes crash
> and only one node comes back: if I start the resources via systemd, then
> the day the other node comes back I have to stop the services via systemd
> in order to start the resources via the cluster, while if a single-node
> cluster was running, the other node would simply join the cluster and no
> downtime would occur.

I had meant (a little bit provocatively ;-) ): consider whether you need the
resources to be started via a resource-manager at all.
Klaus
>
> -- 
> Regards,
> Muhammad Sharfuddin
>
> On 3/13/2018 11:20 PM, Andrei Borzenkov wrote:
>> 13.03.2018 17:32, Klaus Wenninger wrote:
>>> On 03/13/2018 02:30 PM, Muhammad Sharfuddin wrote:
 Yes, by saying pacemaker,  I meant to say corosync as well.

 Is there any fix ? or a two node cluster can't run ocfs2 resources
 when one node is offline ?
>>> Actually there can't be a "fix", as 2 nodes are just not enough
>>> for a partial cluster to be quorate in the classical sense
>>> (more votes than half of the cluster nodes).
>>>
>>> So to still be able to use it we have this 2-node config that
>>> permanently sets quorum. But to avoid running into issues on
>>> startup, we need it to require that both nodes see each
>>> other once.
>>>
>> I'm rather confused. I have run quite a lot of 2-node clusters, and the
>> standard way to resolve it is to require fencing on startup. Then a single
>> node may assume it can safely proceed with starting resources. So it is
>> rather unexpected to suddenly read "cannot be fixed".
>>
>>> So this is definitely nothing that is specific to ocfs2.
>>> It just looks specific to ocfs2 because you've disabled
>>> quorum for pacemaker.
>>> To be honest, doing this you wouldn't need a resource-manager
>>> at all and could just start up your services using systemd.
>>>
>>> If you don't want a full 3rd node, and still want to handle cases
>>> where one node doesn't come up after a full shutdown of
>>> all nodes, you probably could go for a setup with qdevice.
 Regards,
>>> Klaus
>>>
 -- 
 Regards,
 Muhammad Sharfuddin

 On 3/13/2018 6:16 PM, Klaus Wenninger wrote:
> On 03/13/2018 02:03 PM, Muhammad Sharfuddin wrote:
>> Hi,
>>
>> 1 - if I put a node(node2) offline; ocfs2 resources keep running on
>> online node(node1)
>>
>> 2 - while node2 was offline, via cluster I stop/start the ocfs2
>> resource group successfully so many times in a row.
>>
>> 3 - while node2 was offline; I restart the pacemaker service on the
>> node1 and then tries to start the ocfs2 resource group, dlm started
>> but ocfs2 file system resource does not start.
>>
>> Nutshell:
>>
>> a - both nodes must be online to start the ocfs2 resource.
>>
>> b - if one crashes or offline(gracefully) ocfs2 resource keeps
>> running
>> on the other/surviving node.
>>
>> c - while one node was offline, we can stop/start the ocfs2 resource
>> group on the surviving node but if we stops the pacemaker service,
>> then ocfs2 file system resource does not start with the following
>> info
>> in the logs:
> From the logs I would say startup of dlm_controld times out
> because it is waiting
> for quorum - which doesn't happen because of wait-for-all.
>> Somehow I miss the corosync configuration in this thread. Do you know
>> that wait-for-all is set (how?), or do you just assume it?
>>
>> 

Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-14 Thread Andrei Borzenkov
On Wed, Mar 14, 2018 at 10:35 AM, Muhammad Sharfuddin
 wrote:
> Hi Andrei,
>>Somehow I miss the corosync configuration in this thread. Do you know
>>that wait-for-all is set (how?), or do you just assume it?
>>
> solution found: I was not using the "wait_for_all" option; I was assuming that
> "two_node: 1"
> would be sufficient:
>
> nodelist {
> node { ring0_addr: 10.8.9.151  }
> node { ring0_addr: 10.8.9.152  }
> }
> ###previously:
> quorum {
> two_node:   1
> provider:   corosync_votequorum
> }
> ###now/fix:
> quorum {
> two_node:   1
> provider:   corosync_votequorum
> wait_for_all: 0  }
>
> My observation:
> when I was not using "wait_for_all: 0" in corosync.conf, only the ocfs2
> resources were not running; the rest of the resources were running fine
> because of:

OK, I tested it and indeed, when wait_for_all is (explicitly)
disabled, a single node comes up quorate (immediately). It still
requests fencing of the other node. So, trying to wrap my head around it:

1. two_node=1 appears to only permanently set the "in quorate" state for
each node. So whether you have 1 or 2 nodes, you are in quorum. E.g.
with expected_votes=2, even if I kill one node I am left with a single
node that believes it is in a "partition with quorum".

2. two_node=1 implicitly sets wait_for_all, which prevents corosync from
entering the quorate state until all nodes are up. Once they have been up,
we are left in quorum.

As long as OCFS2 requires quorum to be attained this also explains
your observation.

> a - "two_node: 1" in corosync.conf file.
> b - "no-quorum-policy=ignore" in cib.
>

If my reasoning above is correct, I question the value of
wait_for_all=1 with two_node. This is the difference between "pretending
we have quorum" and "ignoring that we have no quorum", just split between
different layers. The end effect is the same as long as the corosync quorum
state is not queried directly.

> @ Klaus
>> what I tried to point out is that "no-quorum-policy=ignore"
>>is dangerous for services that do require a resource-manager. If you don't
>>have any of those go with a systemd startup.
>>
> running a single node is obviously unacceptable, but say both nodes crash
> and only one node comes back: if I start the resources via systemd, then the
> day the other node comes back I have to stop the services via systemd in
> order to start the resources via the cluster, while if a single-node cluster
> was running, the other node would simply join the cluster and no downtime
> would occur.
>

Exactly. There is simply no other way to sensibly use a two-node cluster
without it, and I argue that the notion of quorum is not relevant to most
parts of pacemaker's operation at all, as long as stonith works properly.

Again - if you use two_node=1, your cluster is ALWAYS in quorum except
during initial startup. So no-quorum-policy=ignore is redundant. It is only
needed because of the implicit wait_for_all=1. But if everyone overrides the
implicit wait_for_all=1 anyway, what's the point of setting it by default?
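
A quick way to see this on a node (a sketch; the exact output layout differs
between corosync versions):

    corosync-quorumtool -s
    # with two_node: 1 and the default wait_for_all, the Flags line shows
    # "2Node Quorate WaitForAll" once both nodes have been seen at least once;
    # with wait_for_all: 0, a lone node already reports Quorate right after startup.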


Re: [ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-14 Thread Muhammad Sharfuddin

Hi Andrei,
>Somehow I miss the corosync configuration in this thread. Do you know
>that wait-for-all is set (how?), or do you just assume it?
>
solution found: I was not using the "wait_for_all" option; I was assuming 
that "two_node: 1"
would be sufficient:

nodelist {
    node { ring0_addr: 10.8.9.151  }
    node { ring0_addr: 10.8.9.152  }
}
###previously:
quorum {
    two_node:   1
    provider:   corosync_votequorum
}
###now/fix:
quorum {
    two_node:   1
    provider:   corosync_votequorum
    wait_for_all: 0  }

My observation:
when I was not using "wait_for_all: 0" in corosync.conf, only the ocfs2
resources were not running; the rest of the resources were running fine
because of:
    a - "two_node: 1" in the corosync.conf file.
    b - "no-quorum-policy=ignore" in the cib.

@ Klaus
> what I tried to point out is that "no-quorum-policy=ignore"
>is dangerous for services that do require a resource-manager. If you don't
>have any of those go with a systemd startup.
>
running a single node is obviously unacceptable, but say both nodes crash
and only one node comes back: if I start the resources via systemd, then
the day the other node comes back I have to stop the services via systemd
in order to start the resources via the cluster, while if a single-node
cluster was running, the other node would simply join the cluster and no
downtime would occur.


--
Regards,
Muhammad Sharfuddin

On 3/13/2018 11:20 PM, Andrei Borzenkov wrote:

13.03.2018 17:32, Klaus Wenninger wrote:

On 03/13/2018 02:30 PM, Muhammad Sharfuddin wrote:

Yes, by saying pacemaker,  I meant to say corosync as well.

Is there any fix ? or a two node cluster can't run ocfs2 resources
when one node is offline ?

Actually there can't be a "fix", as 2 nodes are just not enough
for a partial cluster to be quorate in the classical sense
(more votes than half of the cluster nodes).

So to still be able to use it we have this 2-node config that
permanently sets quorum. But to avoid running into issues on
startup, we need it to require that both nodes see each
other once.


I'm rather confused. I have run quite a lot of 2-node clusters, and the
standard way to resolve it is to require fencing on startup. Then a single
node may assume it can safely proceed with starting resources. So it is
rather unexpected to suddenly read "cannot be fixed".


So this is definitely nothing that is specific to ocfs2.
It just looks specific to ocfs2 because you've disabled
quorum for pacemaker.
To be honest, doing this you wouldn't need a resource-manager
at all and could just start up your services using systemd.

If you don't want a full 3rd node, and still want to handle cases
where one node doesn't come up after a full shutdown of
all nodes, you probably could go for a setup with qdevice.

Regards,

Klaus


--
Regards,
Muhammad Sharfuddin

On 3/13/2018 6:16 PM, Klaus Wenninger wrote:

On 03/13/2018 02:03 PM, Muhammad Sharfuddin wrote:

Hi,

1 - if I put a node(node2) offline; ocfs2 resources keep running on
online node(node1)

2 - while node2 was offline, via cluster I stop/start the ocfs2
resource group successfully so many times in a row.

3 - while node2 was offline; I restart the pacemaker service on the
node1 and then tries to start the ocfs2 resource group, dlm started
but ocfs2 file system resource does not start.

Nutshell:

a - both nodes must be online to start the ocfs2 resource.

b - if one crashes or offline(gracefully) ocfs2 resource keeps running
on the other/surviving node.

c - while one node was offline, we can stop/start the ocfs2 resource
group on the surviving node but if we stops the pacemaker service,
then ocfs2 file system resource does not start with the following info
in the logs:

From the logs I would say startup of dlm_controld times out because it
is waiting
for quorum - which doesn't happen because of wait-for-all.

Somehow I miss the corosync configuration in this thread. Do you know
that wait-for-all is set (how?), or do you just assume it?







Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

2018-03-14 Thread Andrei Borzenkov
On Wed, Mar 14, 2018 at 12:40 AM, Andreas M. Iwanowski
 wrote:
> Dear folks,
>
> We are currently trying to set up a multimaster cluster and use a cloned 
> ocf_heartbeat_IPaddr2 resource to share the IP address.
>
> We have, however, run into a problem that, when a cluster member is taken 
> offline, the MAC for the IP address changes from the multicast-MAC to the 
> interface mac of the remaining host.
> When the other host is put back online, pings to the cluster IP time out when 
> it changes back to multicast (until the ARP cache on the router expires).
>

What exactly does offline mean? A host failure? Did you put the node in
standby in Pacemaker? When does the MAC change - immediately, or after a
host/cluster restart?

> Is there any way to prevent network devices from learning the interface MACs? 
> I.e. even if one host is servicing both resources, use the multicast MAC?
> Any help would be appreciated!
>
> Here is the pcs status:
> ===
> Cluster name: test_svc
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: host1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum 
> Last updated: Tue Mar 13 07:12:07 2018 Last change: Sun Mar 11 17:17:04 2018 
> by hacluster via crmd on host1
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ host1 host2 ]
>

I guess the output when one host is "offline" would be needed here.

> Full list of resources:
>
>  Clone Set: RedmineIP-clone [RedmineIP] (unique)
>  RedmineIP:0(ocf::heartbeat:IPaddr2):   Started host1
>  RedmineIP:1(ocf::heartbeat:IPaddr2):   Started host2
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> ===
>