Re: [ClusterLabs] corosync doesn't start any resource
Hello Andrei,

> Then you need to set symmetrical="false".

Yep, now it seems to work, thank you very much!

> I assume this would be "pcs constraint order set ...
> symmetrical=false".

Yes, almost:

pcs constraint order set nfs-server vm_storage ha-ip action=start setoptions symmetrical=false

Thank you very, very much!

best regards
Stefan
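Putting the fix together: the confirmed command, plus a verification step (the second command is an assumption about the reader's pcs version, not from the thread):

# Explicit start order with no implied reverse ordering:
pcs constraint order set nfs-server vm_storage ha-ip action=start setoptions symmetrical=false

# Confirm the stored constraint now carries symmetrical=false:
pcs constraint --full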
Re: [ClusterLabs] corosync doesn't start any resource
22.06.2018 11:22, Stefan Krueger writes:
> Hello Andrei,
>
> thanks for this hint, but I need this "special" order. In another setup it works.

Then you need to set symmetrical="false". Otherwise pacemaker implicitly creates the reverse order, which leads to a deadlock. I am not intimately familiar with pcs; I assume this would be "pcs constraint order set ... symmetrical=false".
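To make the implied ordering concrete, here is the conflict spelled out against the two sets from the pcs config quoted later in the thread; the arrows and the closing command are a sketch of Andrei's explanation, not output from the thread:

# Explicit start set: nfs-server -> vm_storage -> ha-ip  (action=start)
# Explicit stop set:  ha-ip -> nfs-server -> vm_storage  (action=stop)
#
# With the default symmetrical=true, each set also implies its reverse:
# the stop set implies a start order of vm_storage -> nfs-server -> ha-ip,
# which contradicts the explicit start set, so nfs-server and vm_storage
# each wait for the other to start first. symmetrical=false suppresses
# the implied reverse order:
pcs constraint order set nfs-server vm_storage ha-ip action=start setoptions symmetrical=false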
Re: [ClusterLabs] corosync doesn't start any resource
Hello Andrei,

thanks for this hint, but I need this "special" order. In another setup it works.

best regards
Stefan

> Sent: Friday, 22 June 2018 at 06:57
> From: "Andrei Borzenkov"
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] corosync doesn't start any resource
Re: [ClusterLabs] corosync doesn't start any resource
21.06.2018 16:04, Stefan Krueger writes:
> Hi Ken,
>
>> Can you attach the pe-input file listed just above here?
> done ;)
>
> And thank you for your patience!

You deleted all context, which makes it hard to answer. This is not a web forum where users can simply scroll up to see the previous reply.

Both your logs and the pe-input show that nfs-server and vm-storage wait for each other.

My best guess is that you have incorrect ordering for start and stop, which causes a loop in the pacemaker decision. Your start order is "nfs-server vm-storage" and your stop order is also "nfs-server vm-storage", while they should normally be symmetrical. Reversing the order in one of the sets makes it work as intended (verified).

I would actually expect an asymmetrical configuration to still work, so I leave it to the pacemaker developers to comment on whether this is a bug or a feature :)
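The alternative fix Andrei describes - reversing one of the sets so the explicit start and stop orders really are mirror images - might look like this (a sketch; the constraint id comes from the pcs config quoted later in the thread, and the exact commands are an assumption):

# Drop the conflicting explicit stop set...
pcs constraint remove pcs_rsc_order_set_ha-ip_nfs-server_vm_storage

# ...and re-add it as the mirror image of the start set
# (start: nfs-server -> vm_storage -> ha-ip, so stop the other way round):
pcs constraint order set ha-ip vm_storage nfs-server action=stop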
Re: [ClusterLabs] corosync doesn't start any resource
Hi Ken,

> Can you attach the pe-input file listed just above here?

done ;)

And thank you for your patience!

best regards
Stefan

pe-input-228.bz2
Description: application/bzip
Re: [ClusterLabs] corosync doesn't start any resource
On Wed, 2018-06-20 at 13:30 +0200, Stefan Krueger wrote:
> Hi Ken,
>
> I don't see any issues in the logs; periodically this is in the logs:
> [...]
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: run_graph: Transition 80 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Terminated
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: te_graph_trigger: Transition failed: terminated

Well, this is why nothing starts, but I don't see any reason it happens. :-/

Can you attach the pe-input file listed just above here?
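One way to dig into a terminated transition like this is to replay the saved policy-engine input offline with crm_simulate, which ships with pacemaker; a sketch, using the file path from the log above:

# Simulate the saved transition (-S) from the saved input file (-x)
# and show what the policy engine would do and why:
crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-228.bz2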
Re: [ClusterLabs] corosync doesn't start any resource
Hi Ken,

I don't see any issues in the logs; periodically this is in the logs:

Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: determine_online_status: Node zfs-serv3 is online
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: determine_online_status: Node zfs-serv4 is online
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: vm_storage (ocf::heartbeat:ZFS): Stopped
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: ha-ip (ocf::heartbeat:IPaddr2): Stopped
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: nfs-server (systemd:nfs-server): Stopped
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (5s) for vm_storage on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (10s) for ha-ip on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (60s) for nfs-server on zfs-serv3
Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start vm_storage (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start ha-ip (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: LogActions: Leave resIPMI-zfs4 (Started zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: LogActions: Leave resIPMI-zfs3 (Started zfs-serv4)
Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start nfs-server (zfs-serv3)
Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: process_pe_message: Calculated transition 80, saving inputs in /var/lib/pacemaker/pengine/pe-input-228.bz2
Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_te_invoke: Processing graph 80 (ref=pe_calc-dc-1529488339-113) derived from /var/lib/pacemaker/pengine/pe-input-228.bz2
Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: run_graph: Transition 80 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Terminated
Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: te_graph_trigger: Transition failed: terminated
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_graph: Graph 80 with 6 actions: batch-limit=0 jobs, network-delay=6ms
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 5]: Pending rsc op vm_storage_monitor_5000 on zfs-serv3 (priority: 0, waiting: 4)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 4]: Pending rsc op vm_storage_start_0 on zfs-serv3 (priority: 0, waiting: 12)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 7]: Pending rsc op ha-ip_monitor_1 on zfs-serv3 (priority: 0, waiting: 6)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 6]: Pending rsc op ha-ip_start_0 on zfs-serv3 (priority: 0, waiting: 4 12)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 13]: Pending rsc op nfs-server_monitor_6 on zfs-serv3 (priority: 0, waiting: 12)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 12]: Pending rsc op nfs-server_start_0 on zfs-serv3 (priority: 0, waiting: 4)
Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Jun 20 12:07:19 [5613] zfs-serv3 crmd: info: crm_timer_popped: PEngine
Re: [ClusterLabs] corosync doesn't start any resource
On Tue, 2018-06-19 at 16:17 +0200, Stefan Krueger wrote:
> Hi Ken,
>
> thanks for the help!
> I created a stonith device and deleted the no-quorum-policy.
>
> It didn't change anything, so I deleted the orders, (co)locations, and one resource (nfs-server). At first it worked fine, but when I stop the cluster via 'pcs cluster stop' it takes forever. It looks like it has a problem with the NFS server, so I tried to stop it manually via systemctl stop nfs-server, but that didn't change anything - the nfs-server won't stop. So I reset the server; now everything should move to the other node, but that also didn't happen :(
> [...]
> so, again, my resources won't start:
>
> pcs status
> [...]
> Full list of resources:
>
>  vm_storage (ocf::heartbeat:ZFS): Stopped
>  ha-ip (ocf::heartbeat:IPaddr2): Stopped
>  resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
>  resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
>  nfs-server (systemd:nfs-server): Stopped

I'd check the logs for more information. It's odd that status doesn't show any failures, which suggests the cluster didn't schedule any actions.

The system log will have the most essential information. The detail log (usually /var/log/pacemaker.log or /var/log/cluster/corosync.log) will have extended information. The most interesting will be messages from the pengine with actions to be scheduled ("Start", etc.). Then there should be messages from the crmd about "Initiating" the command and obtaining its "Result".
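A quick filter for pulling just those messages out of the detail log; a sketch - the exact message wording varies by pacemaker version, so treat the pattern and the path as assumptions to adjust:

# Scheduled actions from the pengine, plus the crmd's execution messages:
grep -E 'pengine.*LogActions|crmd.*(Initiating|Result)' /var/log/cluster/corosync.log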
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> pcs config
> [...]
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)

I don't think your constraints are causing problems, but sets can be difficult to follow. Your ordering/colocation constraints could be more simply expressed as a group of nfs-server, vm_storage, and ha-ip. With a group, the cluster will do both ordering and colocation, in forward order for start and in reverse order for stop.
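A sketch of the group Ken suggests; the group name is made up here, and removing the existing set constraints first, by the ids shown above, is an assumption about what pcs requires:

# Replace the order/colocation sets with a single group:
pcs constraint remove pcs_rsc_order_set_nfs-server_vm_storage_ha-ip
pcs constraint remove pcs_rsc_order_set_ha-ip_nfs-server_vm_storage
pcs constraint remove colocation-ha-ip-nfs-server-INFINITY

# Group members start in listed order, stop in reverse order,
# and are colocated on the same node:
pcs resource group add storage-group nfs-server vm_storage ha-ip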
Re: [ClusterLabs] corosync doesn't start any resource
Hi Ken,

thanks for the help!
I created a stonith device and deleted the no-quorum-policy.

It didn't change anything, so I deleted the orders, (co)locations, and one resource (nfs-server). At first it worked fine, but when I stop the cluster via 'pcs cluster stop' it takes forever. It looks like it has a problem with the NFS server, so I tried to stop it manually via systemctl stop nfs-server, but that didn't change anything - the nfs-server won't stop. So I reset the server; now everything should move to the other node, but that also didn't happen :(

Manually I can start/stop the nfs-server without any problems (nobody mounts the nfs-share yet):

systemctl start nfs-server.service ; sleep 5; systemctl status nfs-server.service ; sleep 5; systemctl stop nfs-server

So, again, my resources won't start:

pcs status
Cluster name: zfs-vmstorage
Stack: corosync
Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
Last updated: Tue Jun 19 16:15:37 2018
Last change: Tue Jun 19 15:41:24 2018 by hacluster via crmd on zfs-serv4

2 nodes configured
5 resources configured

Online: [ zfs-serv3 zfs-serv4 ]

Full list of resources:

 vm_storage (ocf::heartbeat:ZFS): Stopped
 ha-ip (ocf::heartbeat:IPaddr2): Stopped
 resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
 resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
 nfs-server (systemd:nfs-server): Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


pcs config
Cluster Name: zfs-vmstorage
Corosync Nodes:
 zfs-serv3 zfs-serv4
Pacemaker Nodes:
 zfs-serv3 zfs-serv4

Resources:
 Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
  Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
  Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
              start interval=0s timeout=90 (vm_storage-start-interval-0s)
              stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
 Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=172.16.101.73 cidr_netmask=16
  Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
              stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
              monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
 Resource: nfs-server (class=systemd type=nfs-server)
  Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
              stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
              monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)

Stonith Devices:
 Resource: resIPMI-zfs4 (class=stonith type=external/ipmi)
  Attributes: hostname=ipmi-zfs-serv4 ipaddr=172.xx.xx.17 userid=USER passwd=GEHEIM interface=lan
  Operations: monitor interval=60s (resIPMI-zfs4-monitor-interval-60s)
 Resource: resIPMI-zfs3 (class=stonith type=external/ipmi)
  Attributes: hostname=ipmi-zfs-serv3 ipaddr=172.xx.xx.16 userid=USER passwd=GEHEIM interface=lan
  Operations: monitor interval=60s (resIPMI-zfs3-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: resIPMI-zfs3
    Disabled on: zfs-serv3 (score:-INFINITY) (id:location-resIPMI-zfs3-zfs-serv3--INFINITY)
  Resource: resIPMI-zfs4
    Disabled on: zfs-serv4 (score:-INFINITY) (id:location-resIPMI-zfs4-zfs-serv4--INFINITY)
Ordering Constraints:
  Resource Sets:
    set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
    set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
Colocation Constraints:
  Resource Sets:
    set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: zfs-vmstorage
 dc-version: 1.1.16-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1528814481
 no-quorum-policy: stop
 stonith-enabled: false

Quorum:
  Options:

thanks for the help!

best regards
Stefan
Re: [ClusterLabs] corosync doesn't start any resource
On Fri, 2018-06-15 at 14:45 +0200, Stefan Krueger wrote:
> Hello,
>
> corosync doesn't start any resource and I don't know why. I tried stopping/starting the cluster, and I also tried rebooting it, but it doesn't help. I also find nothing in the logs that looks useful, IMHO.
>
> It would be very nice if someone could help me.
>
> pcs status
> Cluster name: zfs-vmstorage
> Stack: corosync
> Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Fri Jun 15 14:42:32 2018
> Last change: Fri Jun 15 14:17:23 2018 by root via cibadmin on zfs-serv3
>
> 2 nodes configured
> 3 resources configured
>
> Online: [ zfs-serv3 zfs-serv4 ]
>
> Full list of resources:
>
>  nfs-server (systemd:nfs-server): Stopped
>  vm_storage (ocf::heartbeat:ZFS): Stopped
>  ha-ip (ocf::heartbeat:IPaddr2): Stopped
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
>
> pcs config
> Cluster Name: zfs-vmstorage
> Corosync Nodes:
>  zfs-serv3 zfs-serv4
> Pacemaker Nodes:
>  zfs-serv3 zfs-serv4
>
> Resources:
>  Resource: nfs-server (class=systemd type=nfs-server)
>   Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
>               stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
>               monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)
>  Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
>   Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
>   Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
>               start interval=0s timeout=90 (vm_storage-start-interval-0s)
>               stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
>  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=172.16.101.73 cidr_netmask=16
>   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>               stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>               monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: zfs-vmstorage
>  dc-version: 1.1.16-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1528814481
>  no-quorum-policy: ignore

It's recommended to let no-quorum-policy default when using corosync 2, and instead set "two_node: 1" in corosync.conf. In the old days, it was necessary for pacemaker to ignore quorum with two nodes, but now corosync handles it better. With two_node, both nodes will need to be online before the cluster can run, but once up, either node can go down and the cluster will maintain quorum.

>  stonith-enabled: false

Without stonith, the cluster will be unable to recover from certain failure scenarios, and there is a possibility of data corruption from a split-brain situation.
It's a good idea to get stonith configured and tested before adding any resources to a cluster.

> Quorum:
>   Options:
>
> and here are the log files
>
> https://paste.debian.net/hidden/9376add7/
>
> best regards
> Stefan

As of the end of that log file, the cluster does intend to start the resources:

Jun 15 14:29:11 [5623] zfs-serv3 pengine: notice: LogActions: Start nfs-server (zfs-serv3)
Jun 15 14:29:11 [5623] zfs-serv3 pengine: notice: LogActions: Start vm_storage (zfs-serv3)
Jun 15 14:29:11 [5623] zfs-serv3 pengine: notice: LogActions: Start ha-ip (zfs-serv3)

Later logs would show whether the start was successful or not.
--
Ken Gaillot
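Ken's two_node suggestion goes in the quorum section of corosync.conf; a minimal sketch, assuming corosync 2.x with votequorum (this fragment is not from the thread):

# /etc/corosync/corosync.conf (fragment)
quorum {
    provider: corosync_votequorum
    # With exactly two nodes, retain quorum when one node is lost;
    # no-quorum-policy can then be left at its default in pacemaker.
    two_node: 1
}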