Re: [ClusterLabs] corosync doesn't start any resource

2018-06-25 Thread Stefan Krueger
Hello Andrei,

> Then you need to set symmetrical="false".
yep, it seems to work now, thank you very much!

> I assume this would be "pcs constraint order set ...
> symmetrical=false".
yes, almost:
pcs constraint order set nfs-server vm_storage ha-ip action=start setoptions symmetrical=false
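
For reference, the resulting constraint in the CIB should then look roughly like the sketch below (element and attribute names per the pacemaker schema; the ids are the ones pcs generated earlier in this thread). It can be checked with "pcs constraint --full" or "cibadmin --query --scope constraints":

<rsc_order id="pcs_rsc_order_set_nfs-server_vm_storage_ha-ip" symmetrical="false">
  <resource_set id="pcs_rsc_set_nfs-server_vm_storage_ha-ip" action="start">
    <resource_ref id="nfs-server"/>
    <resource_ref id="vm_storage"/>
    <resource_ref id="ha-ip"/>
  </resource_set>
</rsc_order>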


Thank you very very much!

best regards
Stefan

> Sent: Saturday, 23 June 2018 at 22:13
> From: "Andrei Borzenkov" 
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] corosync doesn't start any resource
>
> 22.06.2018 11:22, Stefan Krueger wrote:
> > Hello Andrei,
> > 
> > thanks for this hint, but I need this "special" order. In another setup it
> > works.
> > 
> 
> Then you need to set symmetrical="false". Otherwise pacemaker implicitly
> creates the reverse order, which leads to a deadlock. I am not intimately
> familiar with pcs; I assume this would be "pcs constraint order set ...
> symmetrical=false".
> 
> > best regards
> > Stefan
> > 
> >> Sent: Friday, 22 June 2018 at 06:57
> >> From: "Andrei Borzenkov" 
> >> To: users@clusterlabs.org
> >> Subject: Re: [ClusterLabs] corosync doesn't start any resource
> >>
> >> 21.06.2018 16:04, Stefan Krueger wrote:
> >>> Hi Ken,
> >>>
> >>>> Can you attach the pe-input file listed just above here?
> >>> done ;) 
> >>>
> >>> And thank you for your patience!
> >>>
> >>
> >> You deleted all context, which makes it hard to answer. This is not a web
> >> forum where users can simply scroll up to see the previous reply.
> >>
> >> Both your logs and pe-input show that nfs-server and vm-storage wait for
> >> each other.
> >>
> >> My best guess is that you have incorrect ordering for start and stop,
> >> which causes a loop in pacemaker's decisions. Your start order is
> >> "nfs-server vm-storage" and your stop order is also "nfs-server
> >> vm-storage", while they should normally be symmetrical. Reversing the
> >> order in one of the sets makes it work as intended (verified).
> >>
> >> I would actually expect that an asymmetrical configuration should still
> >> work, so I leave it to the pacemaker developers to comment on whether
> >> this is a bug or a feature :)
> >>


Re: [ClusterLabs] corosync doesn't start any resource

2018-06-23 Thread Andrei Borzenkov
22.06.2018 11:22, Stefan Krueger wrote:
> Hello Andrei,
> 
> thanks for this hint, but I need this "special" order. In another setup it
> works.
> 

Then you need to set symmetrical="false". Otherwise pacemaker implicitly
creates the reverse order, which leads to a deadlock. I am not intimately
familiar with pcs; I assume this would be "pcs constraint order set ...
symmetrical=false".

> best regards
> Stefan
> 
>> Sent: Friday, 22 June 2018 at 06:57
>> From: "Andrei Borzenkov" 
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] corosync doesn't start any resource
>>
>> 21.06.2018 16:04, Stefan Krueger wrote:
>>> Hi Ken,
>>>
>>>> Can you attach the pe-input file listed just above here?
>>> done ;) 
>>>
>>> And thank you for your patience!
>>>
>>
>> You deleted all context, which makes it hard to answer. This is not a web
>> forum where users can simply scroll up to see the previous reply.
>>
>> Both your logs and pe-input show that nfs-server and vm-storage wait for
>> each other.
>>
>> My best guess is that you have incorrect ordering for start and stop,
>> which causes a loop in pacemaker's decisions. Your start order is
>> "nfs-server vm-storage" and your stop order is also "nfs-server
>> vm-storage", while they should normally be symmetrical. Reversing the
>> order in one of the sets makes it work as intended (verified).
>>
>> I would actually expect that an asymmetrical configuration should still
>> work, so I leave it to the pacemaker developers to comment on whether
>> this is a bug or a feature :)
>>


Re: [ClusterLabs] corosync doesn't start any resource

2018-06-22 Thread Stefan Krueger
Hello Andrei,

thanks for this hint, but I need this "special" order. In another setup it
works.

best regards
Stefan

> Sent: Friday, 22 June 2018 at 06:57
> From: "Andrei Borzenkov" 
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] corosync doesn't start any resource
>
> 21.06.2018 16:04, Stefan Krueger wrote:
> > Hi Ken,
> > 
> >> Can you attach the pe-input file listed just above here?
> > done ;) 
> > 
> > And thank you for your patience!
> > 
> 
> You deleted all context, which makes it hard to answer. This is not a web
> forum where users can simply scroll up to see the previous reply.
> 
> Both your logs and pe-input show that nfs-server and vm-storage wait for
> each other.
> 
> My best guess is that you have incorrect ordering for start and stop,
> which causes a loop in pacemaker's decisions. Your start order is
> "nfs-server vm-storage" and your stop order is also "nfs-server
> vm-storage", while they should normally be symmetrical. Reversing the
> order in one of the sets makes it work as intended (verified).
> 
> I would actually expect that an asymmetrical configuration should still
> work, so I leave it to the pacemaker developers to comment on whether
> this is a bug or a feature :)
> 


Re: [ClusterLabs] corosync doesn't start any resource

2018-06-21 Thread Andrei Borzenkov
21.06.2018 16:04, Stefan Krueger wrote:
> Hi Ken,
> 
>> Can you attach the pe-input file listed just above here?
> done ;) 
> 
> And thank you for your patience!
> 

You deleted all context, which makes it hard to answer. This is not a web
forum where users can simply scroll up to see the previous reply.

Both your logs and pe-input show that nfs-server and vm-storage wait for
each other.

My best guess is that you have incorrect ordering for start and stop,
which causes a loop in pacemaker's decisions. Your start order is
"nfs-server vm-storage" and your stop order is also "nfs-server
vm-storage", while they should normally be symmetrical. Reversing the
order in one of the sets makes it work as intended (verified).
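
A symmetrical pair of sets would look something like the sketch below, with the stop set being the exact reverse of the start set (resource names taken from the config earlier in this thread; untested here):

pcs constraint order set nfs-server vm_storage ha-ip action=start
pcs constraint order set ha-ip vm_storage nfs-server action=stop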

I would actually expect that an asymmetrical configuration should still
work, so I leave it to the pacemaker developers to comment on whether this
is a bug or a feature :)



Re: [ClusterLabs] corosync doesn't start any resource

2018-06-21 Thread Stefan Krueger
Hi Ken,

> Can you attach the pe-input file listed just above here?
done ;) 

And thank you for your patience!

best regards
Stefan

[Attachment: pre-input-228.bz2 (application/bzip)]


Re: [ClusterLabs] corosync doesn't start any resource

2018-06-20 Thread Ken Gaillot
On Wed, 2018-06-20 at 13:30 +0200, Stefan Krueger wrote:
> Hi Ken,
> 
> I don't see any issues in the logs; this is what appears in them periodically:
> 
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: determine_online_status: Node zfs-serv3 is online
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: determine_online_status: Node zfs-serv4 is online
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: vm_storage (ocf::heartbeat:ZFS): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: ha-ip (ocf::heartbeat:IPaddr2): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: nfs-server (systemd:nfs-server): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (5s) for vm_storage on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (10s) for ha-ip on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (60s) for nfs-server on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start vm_storage (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start ha-ip (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: LogActions: Leave resIPMI-zfs4 (Started zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: LogActions: Leave resIPMI-zfs3 (Started zfs-serv4)
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start nfs-server (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: process_pe_message: Calculated transition 80, saving inputs in /var/lib/pacemaker/pengine/pe-input-228.bz2
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_te_invoke: Processing graph 80 (ref=pe_calc-dc-1529488339-113) derived from /var/lib/pacemaker/pengine/pe-input-228.bz2
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: warning: run_graph: Transition 80 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Terminated
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: warning: te_graph_trigger: Transition failed: terminated

Well, this is why nothing starts, but I don't see any reason why it happens. :-/

Can you attach the pe-input file listed just above here?
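
(pe-input files can also be replayed locally; as a sketch, assuming the path from the log above, "crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-228.bz2" prints the transition the policy engine computed from that input.)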

> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_graph: Graph 80 with 6 actions: batch-limit=0 jobs, network-delay=6ms
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 5]: Pending rsc op vm_storage_monitor_5000 on zfs-serv3 (priority: 0, waiting: 4)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 4]: Pending rsc op vm_storage_start_0 on zfs-serv3 (priority: 0, waiting: 12)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 7]: Pending rsc op ha-ip_monitor_1 on zfs-serv3 (priority: 0, waiting: 6)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 6]: Pending rsc op ha-ip_start_0 on zfs-serv3 (priority: 0, waiting: 4 12)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 13]: Pending rsc op nfs-server_monitor_6 on zfs-serv3 (priority: 0, waiting: 12)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 12]: Pending rsc op nfs-server_start_0 on zfs-serv3 (priority: 0, waiting: 4)
> Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd

Re: [ClusterLabs] corosync doesn't start any resource

2018-06-20 Thread Stefan Krueger
Hi Ken,

I don't see any issues in the logs; this is what appears in them periodically:

Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: determine_online_status: Node zfs-serv3 is online
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: determine_online_status: Node zfs-serv4 is online
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: vm_storage (ocf::heartbeat:ZFS): Stopped
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: ha-ip (ocf::heartbeat:IPaddr2): Stopped
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: native_print: nfs-server (systemd:nfs-server): Stopped
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (5s) for vm_storage on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (10s) for ha-ip on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: RecurringOp: Start recurring monitor (60s) for nfs-server on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start vm_storage (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start ha-ip (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: LogActions: Leave resIPMI-zfs4 (Started zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3   pengine: info: LogActions: Leave resIPMI-zfs3 (Started zfs-serv4)
Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: LogActions: Start nfs-server (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3   pengine: notice: process_pe_message: Calculated transition 80, saving inputs in /var/lib/pacemaker/pengine/pe-input-228.bz2
Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_te_invoke: Processing graph 80 (ref=pe_calc-dc-1529488339-113) derived from /var/lib/pacemaker/pengine/pe-input-228.bz2
Jun 20 11:52:19 [5613] zfs-serv3   crmd: warning: run_graph: Transition 80 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Terminated
Jun 20 11:52:19 [5613] zfs-serv3   crmd: warning: te_graph_trigger: Transition failed: terminated
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_graph: Graph 80 with 6 actions: batch-limit=0 jobs, network-delay=6ms
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 5]: Pending rsc op vm_storage_monitor_5000 on zfs-serv3 (priority: 0, waiting: 4)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 4]: Pending rsc op vm_storage_start_0 on zfs-serv3 (priority: 0, waiting: 12)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 7]: Pending rsc op ha-ip_monitor_1 on zfs-serv3 (priority: 0, waiting: 6)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 6]: Pending rsc op ha-ip_start_0 on zfs-serv3 (priority: 0, waiting: 4 12)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 13]: Pending rsc op nfs-server_monitor_6 on zfs-serv3 (priority: 0, waiting: 12)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: print_synapse: [Action 12]: Pending rsc op nfs-server_start_0 on zfs-serv3 (priority: 0, waiting: 4)
Jun 20 11:52:19 [5613] zfs-serv3   crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jun 20 11:52:19 [5613] zfs-serv3   crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Jun 20 12:07:19 [5613] zfs-serv3   crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)

Re: [ClusterLabs] corosync doesn't start any resource

2018-06-19 Thread Ken Gaillot
On Tue, 2018-06-19 at 16:17 +0200, Stefan Krueger wrote:
> Hi Ken,
> 
> thanks for the help!
> I created a stonith device and deleted the no-quorum-policy.
> 
> It didn't change anything, so I deleted the orders, (co)locations and
> one resource (nfs-server). At first it worked fine, but when I stopped
> the cluster via 'pcs cluster stop' it took forever; it looked like it
> had a problem with the nfs server, so I tried to stop it manually via
> systemctl stop nfs-server, but that didn't change anything - the
> nfs-server won't stop. So I reset the server; now everything should
> have moved to the other node, but that also didn't happen :(
> 
> Manually I can start/stop the nfs-server without any problems (nobody
> has mounted the nfs-share yet):
> systemctl start nfs-server.service ; sleep 5; systemctl status nfs-server.service ; sleep 5; systemctl stop nfs-server
> 
> So, again, my resources won't start:
> pcs status
> Cluster name: zfs-vmstorage
> Stack: corosync
> Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Tue Jun 19 16:15:37 2018
> Last change: Tue Jun 19 15:41:24 2018 by hacluster via crmd on zfs-serv4
> 
> 2 nodes configured
> 5 resources configured
> 
> Online: [ zfs-serv3 zfs-serv4 ]
> 
> Full list of resources:
> 
>  vm_storage (ocf::heartbeat:ZFS):   Stopped
>  ha-ip  (ocf::heartbeat:IPaddr2):   Stopped
>  resIPMI-zfs4   (stonith:external/ipmi):Started zfs-serv3
>  resIPMI-zfs3   (stonith:external/ipmi):Started zfs-serv4
>  nfs-server (systemd:nfs-server):   Stopped

I'd check the logs for more information. It's odd that status doesn't
show any failures, which suggests the cluster didn't schedule any
actions.

The system log will have the most essential information. The detail log
(usually /var/log/pacemaker.log or /var/log/cluster/corosync.log) will
have extended information. The most interesting will be messages from
the pengine with actions to be scheduled ("Start", etc.). Then there
should be messages from the crmd about "Initiating" the command and
obtaining its "Result".
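
For example, something along these lines (log path and exact message wording vary by version and distribution):

grep -E 'pengine|crmd' /var/log/cluster/corosync.log | grep -E 'LogActions|Initiating|Result'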

> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> 
> 
> pcs config
> Cluster Name: zfs-vmstorage
> Corosync Nodes:
>  zfs-serv3 zfs-serv4
> Pacemaker Nodes:
>  zfs-serv3 zfs-serv4
> 
> Resources:
>  Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
>   Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
>   Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
>   start interval=0s timeout=90 (vm_storage-start-interval-0s)
>   stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
>  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=172.16.101.73 cidr_netmask=16
>   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>   stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>   monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
>  Resource: nfs-server (class=systemd type=nfs-server)
>   Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
>   stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
>   monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)
> 
> Stonith Devices:
>  Resource: resIPMI-zfs4 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv4 ipaddr=172.xx.xx.17 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs4-monitor-interval-60s)
>  Resource: resIPMI-zfs3 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv3 ipaddr=172.xx.xx.16 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs3-monitor-interval-60s)
> Fencing Levels:
> 
> Location Constraints:
>   Resource: resIPMI-zfs3
>     Disabled on: zfs-serv3 (score:-INFINITY) (id:location-resIPMI-zfs3-zfs-serv3--INFINITY)
>   Resource: resIPMI-zfs4
>     Disabled on: zfs-serv4 (score:-INFINITY) (id:location-resIPMI-zfs4-zfs-serv4--INFINITY)
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)

I don't think your constraints are causing problems, but sets can be
difficult to follow. Your ordering/colocation constraints could be more
simply expressed as a group of nfs-server, vm_storage, and ha-ip. With a
group, the cluster will do both ordering and colocation, in forward
order for start and reverse order for stop.
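
A sketch of that simplification (the existing set constraints would need to be removed first, using the ids from the config above; the group name is illustrative):

pcs constraint remove pcs_rsc_order_set_nfs-server_vm_storage_ha-ip
pcs constraint remove pcs_rsc_order_set_ha-ip_nfs-server_vm_storage
pcs constraint remove colocation-ha-ip-nfs-server-INFINITY
pcs resource group add ha-nfs nfs-server vm_storage ha-ip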

Re: [ClusterLabs] corosync doesn't start any resource

2018-06-19 Thread Stefan Krueger
Hi Ken,

thanks for the help!
I created a stonith device and deleted the no-quorum-policy.

It didn't change anything, so I deleted the orders, (co)locations and one resource (nfs-server). At first it worked fine, but when I stopped the cluster via 'pcs cluster stop' it took forever; it looked like it had a problem with the nfs server, so I tried to stop it manually via systemctl stop nfs-server, but that didn't change anything - the nfs-server won't stop. So I reset the server; now everything should have moved to the other node, but that also didn't happen :(

Manually I can start/stop the nfs-server without any problems (nobody has mounted the nfs-share yet):
systemctl start nfs-server.service ; sleep 5; systemctl status nfs-server.service ; sleep 5; systemctl stop nfs-server

So, again, my resources won't start:
pcs status
Cluster name: zfs-vmstorage
Stack: corosync
Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
Last updated: Tue Jun 19 16:15:37 2018
Last change: Tue Jun 19 15:41:24 2018 by hacluster via crmd on zfs-serv4

2 nodes configured
5 resources configured

Online: [ zfs-serv3 zfs-serv4 ]

Full list of resources:

 vm_storage (ocf::heartbeat:ZFS):   Stopped
 ha-ip  (ocf::heartbeat:IPaddr2):   Stopped
 resIPMI-zfs4   (stonith:external/ipmi):Started zfs-serv3
 resIPMI-zfs3   (stonith:external/ipmi):Started zfs-serv4
 nfs-server (systemd:nfs-server):   Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled




pcs config
Cluster Name: zfs-vmstorage
Corosync Nodes:
 zfs-serv3 zfs-serv4
Pacemaker Nodes:
 zfs-serv3 zfs-serv4

Resources:
 Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
  Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
  Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
  start interval=0s timeout=90 (vm_storage-start-interval-0s)
  stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
 Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=172.16.101.73 cidr_netmask=16
  Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
  stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
  monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
 Resource: nfs-server (class=systemd type=nfs-server)
  Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
  stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
  monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)

Stonith Devices:
 Resource: resIPMI-zfs4 (class=stonith type=external/ipmi)
  Attributes: hostname=ipmi-zfs-serv4 ipaddr=172.xx.xx.17 userid=USER passwd=GEHEIM interface=lan
  Operations: monitor interval=60s (resIPMI-zfs4-monitor-interval-60s)
 Resource: resIPMI-zfs3 (class=stonith type=external/ipmi)
  Attributes: hostname=ipmi-zfs-serv3 ipaddr=172.xx.xx.16 userid=USER passwd=GEHEIM interface=lan
  Operations: monitor interval=60s (resIPMI-zfs3-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: resIPMI-zfs3
    Disabled on: zfs-serv3 (score:-INFINITY) (id:location-resIPMI-zfs3-zfs-serv3--INFINITY)
  Resource: resIPMI-zfs4
    Disabled on: zfs-serv4 (score:-INFINITY) (id:location-resIPMI-zfs4-zfs-serv4--INFINITY)
Ordering Constraints:
  Resource Sets:
    set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
    set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
Colocation Constraints:
  Resource Sets:
    set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: zfs-vmstorage
 dc-version: 1.1.16-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1528814481
 no-quorum-policy: stop
 stonith-enabled: false

Quorum:
  Options:



thanks for help!
best regards
Stefan


Re: [ClusterLabs] corosync doesn't start any resource

2018-06-18 Thread Ken Gaillot
On Fri, 2018-06-15 at 14:45 +0200, Stefan Krueger wrote:
> Hello, 
> 
> corosync doesn't start any resource and I don't know why. I tried to
> stop/start the cluster, and I also tried rebooting, but it doesn't help.
> Also, in the logs I don't find anything that could be useful, IMHO.
> 
> It would be very nice if someone can help me.
> 
> pcs status
> Cluster name: zfs-vmstorage
> Stack: corosync
> Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Fri Jun 15 14:42:32 2018
> Last change: Fri Jun 15 14:17:23 2018 by root via cibadmin on zfs-serv3
> 
> 2 nodes configured
> 3 resources configured
> 
> Online: [ zfs-serv3 zfs-serv4 ]
> 
> Full list of resources:
> 
>  nfs-server (systemd:nfs-server):   Stopped
>  vm_storage (ocf::heartbeat:ZFS):   Stopped
>  ha-ip  (ocf::heartbeat:IPaddr2):   Stopped
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> 
> 
> pcs config
> Cluster Name: zfs-vmstorage
> Corosync Nodes:
>  zfs-serv3 zfs-serv4
> Pacemaker Nodes:
>  zfs-serv3 zfs-serv4
> 
> Resources:
>  Resource: nfs-server (class=systemd type=nfs-server)
>   Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
>   stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
>   monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)
>  Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
>   Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
>   Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
>   start interval=0s timeout=90 (vm_storage-start-interval-0s)
>   stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
>  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=172.16.101.73 cidr_netmask=16
>   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>   stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>   monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)
> Ticket Constraints:
> 
> Alerts:
>  No alerts defined
> 
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: zfs-vmstorage
>  dc-version: 1.1.16-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1528814481
>  no-quorum-policy: ignore

It's recommended to let no-quorum-policy default when using corosync 2,
and instead set "two_node: 1" in corosync.conf. In the old days, it was
necessary for pacemaker to ignore quorum with two nodes, but now,
corosync handles it better. With two_node, both nodes will need to be
online before the cluster can run, but once up, either node can go down
and the cluster will maintain quorum.
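
A minimal sketch of that setting in /etc/corosync/corosync.conf (leaving the rest of the file as pcs generated it; corosync must be restarted for it to take effect):

quorum {
    provider: corosync_votequorum
    two_node: 1
}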

>  stonith-enabled: false

Without stonith, the cluster will be unable to recover from certain
failure scenarios, and there is a possibility of data corruption from a
split-brain situation. It's a good idea to get stonith configured and
tested before adding any resources to a cluster.
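
Once configured, fencing can be exercised from the command line (note: this really power-cycles the target node), e.g.:

pcs stonith fence zfs-serv4

or, at a lower level, "stonith_admin --reboot zfs-serv4".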

> 
> Quorum:
>   Options:
> 
> 
> 
> and here are the Log-files
> 
> https://paste.debian.net/hidden/9376add7/
> 
> best regards
> Stefan

As of the end of that log file, the cluster does intend to start the
resources:

Jun 15 14:29:11 [5623] zfs-serv3   pengine: notice: LogActions: Start nfs-server (zfs-serv3)
Jun 15 14:29:11 [5623] zfs-serv3   pengine: notice: LogActions: Start vm_storage (zfs-serv3)
Jun 15 14:29:11 [5623] zfs-serv3   pengine: notice: LogActions: Start ha-ip (zfs-serv3)

Later logs would show whether the start was successful or not.
-- 
Ken Gaillot 