Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on
On 02/10/2017 06:49 AM, Lentes, Bernd wrote:
> - On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgail...@redhat.com wrote:
>
>> On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
>>> Hi,
>>>
>>> i have a two node cluster with a vm as a resource. Currently i'm just
>>> testing and playing. My vm boots and shuts down again in 15min gaps.
>>> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
>>> (900000ms)" found in the logs. I googled, and it is said that this
>>> is due to a time-based rule
>>> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
>>> But i don't have any time-based rules.
>>> This is the config for my vm:
>>>
>>> primitive prim_vm_mausdb VirtualDomain \
>>>     params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>>     params hypervisor="qemu:///system" \
>>>     params migration_transport=ssh \
>>>     op start interval=0 timeout=90 \
>>>     op stop interval=0 timeout=95 \
>>>     op monitor interval=30 timeout=30 \
>>>     op migrate_from interval=0 timeout=100 \
>>>     op migrate_to interval=0 timeout=120 \
>>>     meta allow-migrate=true \
>>>     meta target-role=Started \
>>>     utilization cpu=2 hv_memory=4099
>>>
>>> The only constraint concerning the vm i had was a location (which i
>>> didn't create).
>>
>> What is the constraint? If its ID starts with "cli-", it was created by
>> a command-line tool (such as crm_resource, crm shell or pcs, generally
>> for a "move" or "ban" command).
>
> I deleted the one i mentioned, but now i have two again. I didn't create
> them. Does the crm create constraints itself ?
>
> location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: ha-idg-2
> location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2

The command-line tool you use creates them. If you're using crm_resource,
they're created by crm_resource --move/--ban. If you're using pcs, they're
created by pcs resource move/ban. Etc.
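[Editor's note: the lifecycle of these "cli-" constraints can be reproduced from the shell. A minimal sketch, assuming crmsh as shipped with SLES; exact subcommand names vary between crmsh and crm_resource versions, so treat this as illustrative:]

```shell
# Moving a resource writes a "cli-prefer-<resource>" location constraint
# into the CIB; "migrate" and "move" are aliases in crmsh:
crm resource migrate prim_vm_mausdb ha-idg-1

# Banning it from a node writes a "cli-ban-<resource>-on-<node>" constraint
# (older crmsh versions may lack "ban"; crm_resource --ban is the equivalent):
crm resource ban prim_vm_mausdb ha-idg-2

# Both constraints are now part of the configuration and survive reboots:
crm configure show | grep '^location cli-'

# Remove them again ("unmigrate"/"unmove" in crmsh, --clear in crm_resource):
crm resource unmigrate prim_vm_mausdb
```

These commands require a running Pacemaker cluster, so they are shown as an ops sketch rather than something runnable standalone.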
> One location constraint inf, one -inf for the same resource on the same
> node. Isn't that senseless ?

Yes, but that's what you told it to do :-) The command-line tools move or
ban resources by setting constraints to achieve that effect. Those
constraints are permanent until you remove them. How to clear them again
depends on which tool you use ... crm_resource --clear, pcs resource
clear, etc.

> "crm resource scores" shows -inf for that resource on that node:
>
> native_color: prim_vm_mausdb allocation score on ha-idg-1: 100
> native_color: prim_vm_mausdb allocation score on ha-idg-2: -INFINITY
>
> Is -inf stronger ?
> Is it true that only the values for "native_color" are notable ?
>
> A principle question: When i have trouble to start/stop/migrate
> resources, is it sensible to do a "crm resource cleanup" before trying
> again ? (Besides finding the reason for the trouble).

It's best to figure out what the problem is first, make sure that's taken
care of, then clean up. The cluster might or might not do anything when
you clean up, depending on what stickiness you have, your failure
handling settings, etc.

> Sorry for asking basic stuff. I read a lot before, but in practice it's
> totally different.
> Although i just have a vm as a resource, and i'm only testing, i'm
> sometimes astonished about the complexity of a simple two node cluster:
> scores, failcounts, constraints, default values for a lot of variables
> ... you have to keep an eye on a lot of stuff.
>
> Bernd
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
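[Editor's note: to answer the scores question concretely: yes, -INFINITY wins. Pacemaker sums the scores for a resource on each node, and its score arithmetic defines anything plus -INFINITY as -INFINITY, so a -INFINITY score means the resource can never run there no matter what positive scores also apply. A sketch of how to inspect the computed scores with crm_simulate, which ships with Pacemaker:]

```shell
# -L (--live-check) reads the current live CIB,
# -s (--show-scores) prints the allocation scores the policy engine computed.
crm_simulate -sL | grep prim_vm_mausdb

# The "native_color" lines are the final per-node allocation scores for a
# primitive; they are the ones that decide placement.
```

This requires a running cluster, so no standalone output is shown here.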
Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on
- On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgail...@redhat.com wrote:

> On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
>> Hi,
>>
>> i have a two node cluster with a vm as a resource. Currently i'm just
>> testing and playing. My vm boots and shuts down again in 15min gaps.
>> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
>> (900000ms)" found in the logs. I googled, and it is said that this
>> is due to a time-based rule
>> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
>> But i don't have any time-based rules.
>> This is the config for my vm:
>>
>> primitive prim_vm_mausdb VirtualDomain \
>>     params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>     params hypervisor="qemu:///system" \
>>     params migration_transport=ssh \
>>     op start interval=0 timeout=90 \
>>     op stop interval=0 timeout=95 \
>>     op monitor interval=30 timeout=30 \
>>     op migrate_from interval=0 timeout=100 \
>>     op migrate_to interval=0 timeout=120 \
>>     meta allow-migrate=true \
>>     meta target-role=Started \
>>     utilization cpu=2 hv_memory=4099
>>
>> The only constraint concerning the vm i had was a location (which i
>> didn't create).
>
> What is the constraint? If its ID starts with "cli-", it was created by
> a command-line tool (such as crm_resource, crm shell or pcs, generally
> for a "move" or "ban" command).

I deleted the one i mentioned, but now i have two again. I didn't create
them. Does the crm create constraints itself ?

location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: ha-idg-2
location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2

One location constraint inf, one -inf for the same resource on the same
node. Isn't that senseless ?

"crm resource scores" shows -inf for that resource on that node:

native_color: prim_vm_mausdb allocation score on ha-idg-1: 100
native_color: prim_vm_mausdb allocation score on ha-idg-2: -INFINITY

Is -inf stronger ?
Is it true that only the values for "native_color" are notable ?

A principle question: When i have trouble to start/stop/migrate resources,
is it sensible to do a "crm resource cleanup" before trying again ?
(Besides finding the reason for the trouble).

Sorry for asking basic stuff. I read a lot before, but in practice it's
totally different. Although i just have a vm as a resource, and i'm only
testing, i'm sometimes astonished about the complexity of a simple two
node cluster: scores, failcounts, constraints, default values for a lot
of variables ... you have to keep an eye on a lot of stuff.

Bernd
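[Editor's note: for the cleanup question, one possible workflow, sketched with crmsh and the low-level Pacemaker tools; command spellings differ slightly across versions, so verify against your installed man pages:]

```shell
# 1. Inspect what actually failed before touching anything:
#    -1 gives one-shot (non-interactive) output, -f shows resource fail counts.
crm_mon -1 -f

# 2. Fix the underlying problem (libvirt, storage, network, config, ...).

# 3. Only then clear the failed-operation history and fail counts, so the
#    cluster re-evaluates the resource from a clean slate:
crm resource cleanup prim_vm_mausdb
# low-level equivalent: crm_resource --cleanup --resource prim_vm_mausdb
```

Cleaning up first can mask the evidence you need for step 2, which is why the order above matters.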
Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on
On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
> Hi,
>
> i have a two node cluster with a vm as a resource. Currently i'm just
> testing and playing. My vm boots and shuts down again in 15min gaps.
> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
> (900000ms)" found in the logs. I googled, and it is said that this
> is due to a time-based rule
> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
> But i don't have any time-based rules.
> This is the config for my vm:
>
> primitive prim_vm_mausdb VirtualDomain \
>     params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>     params hypervisor="qemu:///system" \
>     params migration_transport=ssh \
>     op start interval=0 timeout=90 \
>     op stop interval=0 timeout=95 \
>     op monitor interval=30 timeout=30 \
>     op migrate_from interval=0 timeout=100 \
>     op migrate_to interval=0 timeout=120 \
>     meta allow-migrate=true \
>     meta target-role=Started \
>     utilization cpu=2 hv_memory=4099
>
> The only constraint concerning the vm i had was a location (which i
> didn't create).

What is the constraint? If its ID starts with "cli-", it was created by
a command-line tool (such as crm_resource, crm shell or pcs, generally
for a "move" or "ban" command).

> Ok, this timer is available, i can set it to zero to disable it.

The timer is used for multiple purposes; I wouldn't recommend disabling
it. Also, this doesn't fix the problem; the problem will still occur
whenever the cluster recalculates, just not on a regular time schedule.

> But why does it influence my vm in such a manner ?
>
> Excerpt from the log:
>
> ...
> Feb 9 16:19:38 ha-idg-1 VirtualDomain(prim_vm_mausdb)[13148]: INFO: Domain mausdb_vm already stopped.
> Feb 9 16:19:38 ha-idg-1 crmd[8407]: notice: process_lrm_event: Operation prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=401, rc=0, cib-update=340, confirmed=true)
> Feb 9 16:19:38 ha-idg-1 kernel: [852506.947196] device vnet0 entered promiscuous mode
> Feb 9 16:19:38 ha-idg-1 kernel: [852507.008770] br0: port 2(vnet0) entering forwarding state
> Feb 9 16:19:38 ha-idg-1 kernel: [852507.008775] br0: port 2(vnet0) entering forwarding state
> Feb 9 16:19:38 ha-idg-1 kernel: [852507.172120] qemu-kvm: sending ioctl 5326 to a partition!
> Feb 9 16:19:38 ha-idg-1 kernel: [852507.172133] qemu-kvm: sending ioctl 80200204 to a partition!
> Feb 9 16:19:41 ha-idg-1 crmd[8407]: notice: process_lrm_event: Operation prim_vm_mausdb_start_0: ok (node=ha-idg-1, call=402, rc=0, cib-update=341, confirmed=true)
> Feb 9 16:19:41 ha-idg-1 crmd[8407]: notice: process_lrm_event: Operation prim_vm_mausdb_monitor_30000: ok (node=ha-idg-1, call=403, rc=0, cib-update=342, confirmed=false)
> Feb 9 16:19:48 ha-idg-1 kernel: [852517.049015] vnet0: no IPv6 routers present
> ...
> Feb 9 16:34:41 ha-idg-1 VirtualDomain(prim_vm_mausdb)[18272]: INFO: Issuing graceful shutdown request for domain mausdb_vm.
> Feb 9 16:35:06 ha-idg-1 kernel: [853434.550089] br0: port 2(vnet0) entering forwarding state
> Feb 9 16:35:06 ha-idg-1 kernel: [853434.550160] device vnet0 left promiscuous mode
> Feb 9 16:35:06 ha-idg-1 kernel: [853434.550165] br0: port 2(vnet0) entering disabled state
> Feb 9 16:35:06 ha-idg-1 ifdown: vnet0
> Feb 9 16:35:06 ha-idg-1 ifdown: Interface not available and no configuration found.
> Feb 9 16:35:07 ha-idg-1 crmd[8407]: notice: process_lrm_event: Operation prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=405, rc=0, cib-update=343, confirmed=true)
> ...
>
> I deleted the location and since then the vm has been running fine for 35min already.

The logs don't go far back enough to have an idea why the VM was stopped.
Also, logs from the other node might be relevant, if it was the DC
(controller) at the time.

> System is SLES 11 SP4 64bit, vm is SLES 10 SP4 64bit.
>
> Thanks.
>
> Bernd
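[Editor's note: the 15-minute gap matches Pacemaker's default cluster-recheck-interval of 15min, which is what fires the "PEngine Recheck Timer (I_PE_CALC)" message; the recheck itself only re-runs the placement calculation, so (as Ken says) it exposes rather than causes the problem. A sketch of inspecting and changing the interval, assuming crmsh:]

```shell
# Query the current value from the cluster options section of the CIB
# (an error/empty result means the built-in default of 15min is in effect):
crm_attribute --type crm_config --name cluster-recheck-interval --query

# Change it if you have a reason to; setting it to 0 disables the periodic
# recheck entirely, which is generally not recommended:
crm configure property cluster-recheck-interval=15min
```

Again an ops sketch against a live cluster, not a standalone script.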