[ovirt-users] Re: engine mails about FSM states
In broker.log I found this, just before 05:59 am:

Thread-3::INFO::2018-12-13 05:58:45,634::mem_free::51::mem_free.MemFree::(action) memFree: 82101
Thread-1::INFO::2018-12-13 05:58:46,322::ping::60::ping.Ping::(action) Successfully pinged 10.0.1.254
Thread-5::INFO::2018-12-13 05:58:46,611::engine_health::241::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine
Thread-2::INFO::2018-12-13 05:58:49,144::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports
StatusStorageThread::ERROR::2018-12-13 05:58:54,935::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
    if (self._status_broker._inquire_whiteboard_lock() or
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 190, in _inquire_whiteboard_lock
    self.host_id, self._lease_file)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 128, in host_id
    raise ex.HostIdNotLockedError("Host id is not set")
HostIdNotLockedError: Host id is not set
StatusStorageThread::ERROR::2018-12-13 05:58:54,937::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker

"Host id is not set"???
--
Regards,
Frank

On Friday, December 14, 2018 12:27 CET, Martin Sivak wrote:

Hi,

check the broker.log as well. The connect is used to talk to the
ovirt-ha-broker service socket.

Best regards

Martin Sivak

On Fri, Dec 14, 2018 at 12:20 PM fsoyer wrote:
> I think I have it in agent.log. What can this "file not found" be?
MainThread::ERROR::2018-12-13 05:59:03,909::hosted_engine::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unhandled monitoring loop exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 428, in start_monitoring
    self._monitoring_loop()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 447, in _monitoring_loop
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 736, in collect_stats
    all_stats = self._broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 843, in send
    self.connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
    self.sock.connect(base64.b16decode(self.host))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
MainThread::ERROR::2018-12-13 05:59:04,043::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 435, in start_monitoring
    self.publish(stopped)
  File "/usr/lib/python2.7/site-packages/ovirt_h
a.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2018-12-13 05:59:04,044::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2018-12-13 05:59:14,923::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.2.16 started

--
Regards,
Frank Soyer
Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34
Systea IG - Systems, network and database administration - www.systea.net
Member of the network Les Professionnels du Numérique
KoGite - Local hosting - www.kogite.fr

On Friday, December 14, 2018 12:11 CET, Martin Sivak wrote:

Hi,

no, StartState is not common, it is only ever entered when the agent
boots up. So something restarted or killed the agent process. Check
the agent log in /var/log/ovirt-hosted-engine-ha for errors.

Best regards

Martin Sivak

On Fri, Dec 14, 2018 at 12:05 PM fsoyer wrote:
>
> Hi Martin,
> my problem is that nobody restarted the agent. Do you mean that this is not
> normal behavior? Is it possible that it restarts itself?
>
> Thanks
> --
>
> Regards,
>
> Frank
>
> On Thursday, December 13, 2018 15:25 CET, Martin Sivak wrote:
>
> Hi,
>
> those are state change notifications from the hosted engine agent. It
> basically means somebody restarted the ha-agent process and it found
> out the VM is still running fine and returned to the proper state.
>
> Configuring it is possible using the broker.conf file in
> /etc/ovirt-hosted-engine-ha (look for the notification section) or the
> hosted-engine tool (search --help for set config) depending on the
> version of hosted engine you are using.
>
> Best regards
>
> --
> Martin Sivak
>
> On Thu, Dec 13, 2018 at 3:10 PM fsoyer wrote:
> >
> > Hi,
> > I can't find a relevant answer about this. Sorry if this was already asked.
> > I receive randomly (once or twice a week, at different hours) 3 mails with
> > these subjects:
> > first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
> > second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
> > third: ovirt-hosted-engine state transition EngineStarting-EngineUp
> > all at exactly the same time. The "events" in the GUI don't indicate
> > anything about this. No impact on the engine or VMs.
> > So I wonder what these messages mean? And, in case they are just "info"
> > messages, is there a way to disable them?
> >
> > Thanks.
> > --
> >
> > Regards,
> >
> > Frank
> >
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org
> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> > oVirt Code of Conduct:
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXCKOVSGK42ZNTG2KOEIBW65CD4ET6B4/
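The `error: [Errno 2] No such file or directory` in the agent traceback comes from the `unixrpc.py` frame: the hosted-engine agent talks XML-RPC to ovirt-ha-broker over a Unix socket, and the socket path is smuggled through the XML-RPC "host" field as a base16 string. `connect()` then fails with ENOENT whenever the broker's socket file is gone (e.g. while the broker is restarting, as it was here). A minimal Python 3 sketch of that encoding trick, with an illustrative socket path (the real path on an installation may differ):

```python
import base64
import errno
import socket

# Illustrative path only; not guaranteed to match a real deployment.
SOCKET_PATH = "/var/run/ovirt-hosted-engine-ha/broker.socket"

def encode_host(path):
    """Base16-encode a socket path so it can ride in an XML-RPC URL's
    host field, mirroring the trick seen in unixrpc.py."""
    return base64.b16encode(path.encode()).decode()

def connect(host_field):
    """Decode the path and connect. ENOENT here is exactly the
    'error: [Errno 2] No such file or directory' from the traceback:
    the broker's socket file does not exist (broker down/restarting)."""
    path = base64.b16decode(host_field).decode()
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "broker socket missing: %s" % path
        raise
    finally:
        sock.close()

# On a machine without the broker running, this reports the socket
# as missing rather than connecting.
print(connect(encode_host(SOCKET_PATH)))
```

So the agent error is a symptom, not the root cause: the broker had already hit `HostIdNotLockedError` and was restarting, taking its socket away from under the agent.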
[ovirt-users] Re: engine mails about FSM states
Hi Martin,
my problem is that nobody restarted the agent. Do you mean that this is not normal behavior? Is it possible that it restarts itself?

Thanks
--
Regards,
Frank

On Thursday, December 13, 2018 15:25 CET, Martin Sivak wrote:

Hi,

those are state change notifications from the hosted engine agent. It
basically means somebody restarted the ha-agent process and it found
out the VM is still running fine and returned to the proper state.

Configuring it is possible using the broker.conf file in
/etc/ovirt-hosted-engine-ha (look for the notification section) or the
hosted-engine tool (search --help for set config) depending on the
version of hosted engine you are using.

Best regards

--
Martin Sivak

On Thu, Dec 13, 2018 at 3:10 PM fsoyer wrote:
>
> Hi,
> I can't find a relevant answer about this. Sorry if this was already asked.
> I receive randomly (once or twice a week, at different hours) 3 mails with
> these subjects:
> first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
> second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
> third: ovirt-hosted-engine state transition EngineStarting-EngineUp
> all at exactly the same time. The "events" in the GUI don't indicate
> anything about this. No impact on the engine or VMs.
> So I wonder what these messages mean? And, in case they are just "info"
> messages, is there a way to disable them?
>
> Thanks.
> --
>
> Regards,
>
> Frank
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JKVPQ2ZTQHH2U4C6JJN6ZMBYHBGK2P5E/
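To make Martin's pointer to the broker.conf "notification section" concrete, here is a hedged Python sketch of how such a mail filter plausibly works: a `state_transition` regexp is matched against the `OldState-NewState` name, and only matching transitions are mailed. The key names in the sample (`smtp-server`, `source-email`, `destination-emails`, `state_transition`) and the case-insensitive matching are assumptions from memory of 4.2-era defaults, not verified facts; check your own /etc/ovirt-hosted-engine-ha/broker.conf before editing anything.

```python
import re
from configparser import ConfigParser

# Assumed 4.2-era [notifications] layout -- verify against your own
# /etc/ovirt-hosted-engine-ha/broker.conf, key names may differ.
SAMPLE = """
[notifications]
smtp-server = localhost
smtp-port = 25
source-email = root@localhost
destination-emails = root@localhost
state_transition = maintenance|start|stop|migrate|up|down
"""

def wants_mail(conf, transition):
    """True if the state_transition regexp matches an
    'OldState-NewState' string. Matching case-insensitively is an
    assumption about the broker's behavior, not a verified fact."""
    pattern = conf.get("notifications", "state_transition")
    return re.search(pattern, transition, re.IGNORECASE) is not None

conf = ConfigParser()
conf.read_string(SAMPLE)

# With the default-looking filter, an agent restart mails all three
# hops Frank describes, because 'start' and 'up' match the names:
transitions = ("StartState-ReinitializeFSM",
               "ReinitializeFSM-EngineStarting",
               "EngineStarting-EngineUp")
matched = [t for t in transitions if wants_mail(conf, t)]
# A pattern that matches no state name (e.g. state_transition = none)
# would silence these informational mails.
```

On a real deployment, Martin's other suggestion (`hosted-engine` with its set-shared-config facility, see `hosted-engine --help`) is the supported way to change these values; treat the sketch above as an explanation of the filter, not a recipe for editing files by hand.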
[ovirt-users] engine mails about FSM states
Hi,
I can't find a relevant answer about this. Sorry if this was already asked.
I receive randomly (once or twice a week, at different hours) 3 mails with these subjects:
first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
third: ovirt-hosted-engine state transition EngineStarting-EngineUp
all at exactly the same time. The "events" in the GUI don't indicate anything about this. No impact on the engine or VMs.
So I wonder what these messages mean? And, in case they are just "info" messages, is there a way to disable them?

Thanks.
--
Regards,
Frank

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/
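The three mails, always together and in that order, are the signature of one agent restart walking the hosted-engine state machine back to normal. A toy Python sketch (not the real ovirt-hosted-engine-ha code; event names are invented for illustration) of why a single restart fires exactly these three notifications:

```python
# Toy FSM: the agent always boots in StartState, re-initializes, then
# walks back to EngineUp once it sees the engine VM is already healthy.
# Each state change fires one notification -- hence three mails.
TRANSITIONS = {
    ("StartState", "vm_unknown"): "ReinitializeFSM",
    ("ReinitializeFSM", "vm_up"): "EngineStarting",
    ("EngineStarting", "engine_healthy"): "EngineUp",
}

def run(events, notify):
    state = "StartState"
    for event in events:
        new_state = TRANSITIONS.get((state, event), state)
        if new_state != state:
            notify("ovirt-hosted-engine state transition %s-%s"
                   % (state, new_state))
            state = new_state
    return state

mails = []
final = run(["vm_unknown", "vm_up", "engine_healthy"], mails.append)
# mails now holds the three transition notifications, back to back,
# matching the three messages received "all at exactly the same time".
```

The useful diagnostic consequence: seeing `StartState` at all means the agent process (re)started, which is why the thread then turns to finding out what killed it.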
[ovirt-users] Re: VM ramdomly unresponsive
Hi,
I can say now that the problem was related to storage performance, as there have been no more errors since the replacement of the RAID cards.
Thanks for all,
Frank

On Tuesday, November 27, 2018 08:30 CET, Sahina Bose wrote:

On Tue, Nov 13, 2018 at 4:46 PM fsoyer wrote:
>
> Hi all,
> I am still trying to understand my problem between (I suppose) oVirt and
> Gluster. After my recent posts titled 'VMs unexpectidly restarted', which
> provided neither a solution nor a lead, I submit another (related?)
> problem.
> In parallel with the problem of VMs going down (which has not reproduced
> since Oct 16), I randomly get events in the GUI saying "VM x is not
> responding." For example, VM "patjoub1" on 2018-11-11 14:34. Never at the
> same hour, not every day, often this VM patjoub1 but not always: I had it
> on two others. All VM disks are on a volume DATA02 (with leases on the
> same volume).
>
> Searching in engine.log, I found:
>
> 2018-11-11 14:34:32,953+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'Up' --> 'NotResponding'
> 2018-11-11 14:34:33,116+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Invalid or unknown guest architecture type '' received from guest agent
> 2018-11-11 14:34:33,176+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: VM_NOT_RESPONDING(126), VM patjoub1 is not responding.
> ...
> 2018-11-11 14:34:48,278+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-48) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'NotResponding' --> 'Up'
>
> So it comes back up 15 s later, and the VM (and the monitoring) sees no
> downtime.
> At that time, I see in the vdsm.log of the nodes:
>
> 2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
>     delay = result.delay()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
> 2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)
>
> 2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)
>
> 2018-11-11 14:34:09,480+0100 INFO (event/37) [storage.StoragePool] Linking /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937 to /rhev/data-center/6efda7f8-b62f-11e8-9d16-00163e263d21/ffc53fd8-c5d1-4070-ae51-2e91835cd937 (sp:1230)
>
> OK: so DATA02 is marked as blocked for 20 s? Then I definitely have a
> problem with gluster? I'll inevitably find the reason in the gluster logs?
> Uh: not at all.
> Please see the gluster logs here:
> https://seafile.systea.fr/d/65df86cca9d34061a1e4/
>
> Unfortunately I discovered this morning that I don't have the sanlock.log
> for that date. I don't understand why; logrotate seems OK with "rotate 3",
> but I have no backup files :(.
> But, luck in bad luck, the same event occurred this morning!
Same VM patjoub1,
> 2018-11-13 08:01:37. So I have added today's sanlock.log; maybe it can
> help.
>
> IMPORTANT NOTE: don't forget that Gluster logs with a one-hour shift. For
> this event at 14:34, search at 13:34 in the gluster logs.
> I recall my configuration:
> Gluster 3.12.13
> oVirt 4.2.3
> 3 nodes where the third is arbiter (volumes in replica 2)
>
> The nodes are never overloaded (CPU average 5%, no peak detected at the
> time of the event; 128 GB of memory used at 15% (only 10 VMs on this
> cluster)). Network underused; gluster is on a separa
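The vdsm excerpt quoted above shows the mechanism behind the "not responding" blips: vdsm periodically reads the domain's dom_md/metadata file and, when the read takes too long, declares the checker blocked and the domain INVALID. A conceptual Python sketch of that idea (not vdsm's actual implementation; the timeout value is illustrative):

```python
import time

# Illustrative timeout; vdsm's real check interval/timeout differs
# and produced the "blocked for 20.00 seconds" warning above.
TIMEOUT = 10.0

def check_path(path, read_func, timeout=TIMEOUT):
    """Read a domain's metadata file and classify the result the way a
    storage monitor conceptually would: a failed read or a read slower
    than the timeout marks the domain INVALID."""
    start = time.monotonic()
    try:
        read_func(path)
    except OSError as e:
        return ("INVALID", "read failure: %s" % e)
    delay = time.monotonic() - start
    if delay > timeout:
        return ("INVALID", "Read timeout after %.2fs" % delay)
    return ("VALID", "delay %.3fs" % delay)

def fast_read(path):
    # Stand-in for reading dom_md/metadata from the gluster mount.
    return b"metadata"

state, detail = check_path(
    "/rhev/data-center/mnt/glusterSD/.../dom_md/metadata", fast_read)
```

This is why the final diagnosis (slow RAID cards without write cache) fits the symptoms: the storage itself stayed correct, it was merely slow enough under load for the metadata read to overshoot the checker's deadline, briefly flipping the domain and the VM state.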
[ovirt-users] Re: VM ramdomly unresponsive
Hi,
questioning the whole chain oVirt -> Gluster -> hardware, I continued to check all the components, finally testing the hardware. I found some latency on storage when it was busy, and some web searches convinced me that the RAID cards could be the problem: the Dell servers shipped with H310 cards, which do not support cache... Last week we ordered H710 cards, which provide cache, installed on Saturday. Since then, storage performance is better and I have noticed no more timeouts or errors. But as it happened randomly, I'll wait a few more days before saying this is solved!
Thank you for your time,
--
Regards,
Frank

On Tuesday, November 27, 2018 08:30 CET, Sahina Bose wrote:

On Tue, Nov 13, 2018 at 4:46 PM fsoyer wrote:
>
> Hi all,
> I am still trying to understand my problem between (I suppose) oVirt and
> Gluster.
[ovirt-users] VM ramdomly unresponsive
Hi all,
I am still trying to understand my problem between (I suppose) oVirt and Gluster. After my recent posts titled 'VMs unexpectidly restarted', which provided neither a solution nor a lead, I submit another (related?) problem.
In parallel with the problem of VMs going down (which has not reproduced since Oct 16), I randomly get events in the GUI saying "VM x is not responding." For example, VM "patjoub1" on 2018-11-11 14:34. Never at the same hour, not every day, often this VM patjoub1 but not always: I had it on two others. All VM disks are on a volume DATA02 (with leases on the same volume).

Searching in engine.log, I found:

2018-11-11 14:34:32,953+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'Up' --> 'NotResponding'
2018-11-11 14:34:33,116+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Invalid or unknown guest architecture type '' received from guest agent
2018-11-11 14:34:33,176+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: VM_NOT_RESPONDING(126), VM patjoub1 is not responding.
...
2018-11-11 14:34:48,278+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-48) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'NotResponding' --> 'Up'

So it comes back up 15 s later, and the VM (and the monitoring) sees no downtime.
At that time, I see in the vdsm.log of the nodes:

2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)

2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)

2018-11-11 14:34:09,480+0100 INFO (event/37) [storage.StoragePool] Linking /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937 to /rhev/data-center/6efda7f8-b62f-11e8-9d16-00163e263d21/ffc53fd8-c5d1-4070-ae51-2e91835cd937 (sp:1230)

OK: so DATA02 is marked as blocked for 20 s? Then I definitely have a problem with gluster? I'll inevitably find the reason in the gluster logs? Uh: not at all.
Please see the gluster logs here: https://seafile.systea.fr/d/65df86cca9d34061a1e4/

Unfortunately I discovered this morning that I don't have the sanlock.log for that date. I don't understand why; logrotate seems OK with "rotate 3", but I have no backup files :(.
But, luck in bad luck, the same event occurred this morning! Same VM patjoub1, 2018-11-13 08:01:37. So I have added today's sanlock.log; maybe it can help.

IMPORTANT NOTE: don't forget that Gluster logs with a one-hour shift. For this event at 14:34, search at 13:34 in the gluster logs.
I recall my configuration:
Gluster 3.12.13
oVirt 4.2.3
3 nodes where the third is arbiter (volumes in replica 2)

The nodes are never overloaded (CPU average 5%, no peak detected at the time of the event; 128 GB of memory used at 15% (only 10 VMs on this cluster)). Network underused; gluster is on a separate network on a bond (2 NICs) of 1+1 Gb in mode 4 = 2 Gb, used at 10% at peak.
Here is the configuration for the given volume:

# gluster volume status DATA02
Status of volume: DATA02
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick victorstorage.local.systea.fr:/home/d
ata02/data02/brick                          49158     0          Y       4990
Brick gingerstorage.local.systea.fr:/home/d
ata02/data02/brick                          49153     0          Y       8460
Brick eskarinastorage.local.systea.fr:/home
/data01/data02/brick                        49158     0          Y       2470
Self-heal Daemon on localhost               N/A       N/A        Y       8771
Self-heal Daemon on eskarinastorage.local.s
ystea.fr                                    N/A
[ovirt-users] Re: VM paused then killed with "device vda reported I/O error"
Hi,
unfortunately, at the time of this error there were no messages. In fact, this file (rhev-data-center-mnt-glusterSD-victor.local.systea.fr:_DATA01.log) contains only:

[2018-10-21 01:41:11.646681] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2018-10-28 02:08:10.081010] I [MSGID: 100011] [glusterfsd.c:1446:reincarnate] 0-glusterfsd: Fetching the volume file from server...

As I said in my other post "VMs unexpectidly restarted", which is maybe related to this one (not exactly the same things, but close enough... please see those posts as they may give some additional information, especially the messages since Sunday 28 at 12:42, after I tried an export/import of a VM on Saturday 27), I see no relevant error in the gluster logs at the time the problems occur on the VMs. Though, looking at these files, I found some messages in the mount log of the second volume, DATA02, concerned by the second problem on Oct 27. I'll add these messages to the other post "VMs unexpectidly restarted" to avoid confusion. I probably should not have opened this new post after that one, but at the time I couldn't be sure this was a single problem.
Thank you for your time,
--
Regards,
Frank

On Wednesday, October 31, 2018 08:15 CET, Sahina Bose wrote:

2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') abnormal vm stop device ua-134c4848-6897-46fc-b346-dd4a180ac653 error eio (vm:5158)
2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') CPU stopped: onIOError (vm:6199)
2018-10-25 01:21:08,030+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') CPU stopped: onSuspend (vm:6199)

This most likely indicates an I/O error from the storage layer.
Can you also provide the mount logs for the gluster volume that hosts these VMs' disks (under /var/log/glusterfs/rhev-data-center-mnt-glusterSD-.log)?

On Fri, Oct 26, 2018 at 12:38 AM fsoyer wrote:
>
> Oops, reading my message I found an error: the problem occurred at 1:21 AM,
> not 1:01 :/
>
> Frank
>
> On Thursday, October 25, 2018 17:55 CEST, "fsoyer" wrote:
>
> Hi,
> related (or maybe not) to my problem "VMs unexpectidly restarted", I have
> one VM (only one) which was paused then killed this morning (1:01 AM).
> This is the second time (the first was about 15 days ago), and only this
> one (it is on a domain with 5 other VMs, and it is not the most used of
> them; and it was at night, without any particular processing at that time).
> The other VMs on the same storage were not impacted at all. And it is not
> on the same storage domain as the other VM of "VMs unexpectidly
> restarted"...
> At the same time, gluster seems to have seen absolutely nothing. Is there
> really a storage issue??
>
> Here are some relevant logs:
> /var/log/messages of the node
> vdsm.log
> engine.log
> glusterd.log
> data01-brick.log
>
> For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13
> (2 + arbiter).
> Any idea where or what I should search for?
>
> Thanks
> --
>
> Regards,
>
> Frank
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BI45NBQTKNHLOOS3TO2TAT53LREC4EF/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/J4LWH5J7PUIFOO65YIBSC2BDFT2GD3VY/
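The "abnormal vm stop ... error eio" lines Sahina quotes are the key evidence: the guest CPUs are paused because a disk operation returned EIO, which points at the storage layer rather than the guest. When chasing which device and VM are affected across many such events, the log lines can be parsed mechanically. A small Python sketch over the exact line format quoted above:

```python
import re

# One of the vdsm lines quoted in the message above.
LINE = ("2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] "
        "(vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') abnormal vm stop "
        "device ua-134c4848-6897-46fc-b346-dd4a180ac653 error eio (vm:5158)")

PATTERN = re.compile(
    r"vmId='(?P<vm>[0-9a-f-]+)'\) abnormal vm stop "
    r"device (?P<dev>\S+) error (?P<err>\w+)")

def parse_abnormal_stop(line):
    """Extract vm id, device alias and error reason from an
    'abnormal vm stop' event line; None if the line is unrelated."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None

info = parse_abnormal_stop(LINE)
# info["err"] == "eio": the pause is the hypervisor's reaction to an
# I/O error on that device, consistent with Sahina's diagnosis that
# the storage layer (here, the gluster mount) returned the error.
```

Correlating the extracted device alias and timestamp with the gluster mount log for the same volume is exactly the cross-check Sahina asks Frank for.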
[ovirt-users] Re: ETL service and winter hour
Thank you very much for all this detailed information, Shirly.
The only point is that we (should I say "I"?) never really asked for DWH when installing oVirt with hosted engine; maybe it's a lack of documentation reading on my part, but I never heard about it in the installation procedure. From which I deduce that it is installed with the oVirt engine, and since it is a Postgres functionality, and the Postgres database is automatically created in the engine VM, it is installed in the engine VM (Q.E.D. :) ).
The warning you point me to is maybe not visible enough in the installation procedures for those who live in a country subject to summer/winter time... But even apart from that, I don't remember a moment during the hosted-engine installation where it asks us for a timezone, you see? So I wonder where or when I could have forced UTC time, in fact... Can you tell me (and maybe for others installing it in France or similar countries) where this DWH-and-timezone question can be handled in the hosted-engine installation process?
This said, if I understand your answers below correctly, can I summarize it as: don't touch anything now, as the error repairs itself after this "1 hour gap / overlap" (so I have no more messages after 3:00 AM). Right?
Many thanks again,
--
Regards,
Frank

On Sunday, October 28, 2018 13:26 CET, Shirly Radco wrote:

Please see answers below and let me know if you have any other questions.

Best,
--
SHIRLY RADCO
BI SENIOR SOFTWARE ENGINEER
Red Hat Israel

On Sun, Oct 28, 2018 at 12:55 PM fsoyer wrote:

> Well, I see that I'm late to give the information :) Thank you for pointing
> me to this, but I have some other questions now...
> How can I see the timezone of the DB?

"If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone."
https://serverfault.com/questions/554359/postgresql-timezone-does-not-match-system-timezone

> When it says "all machines", do you confirm that this means physical
> machines, not VMs? I mean the machine DWH is installed on.

It can be a VM. But I'm not saying we recommend all VMs to be set to UTC.

> May I apply the solution given on access.redhat.com or not, as there have
> been no more messages since 3 AM?

No need.

> And, last question but not least, can this timezone be changed on the
> machines (and DB?) without issue?

It is possible to update it, but it's not mandatory. The 1 hour gap / overlap is expected when moving from summer to winter time and back when not using UTC, and I'm not sure it's even worth updating at this point, at the risk of ending up with a real bug.

--
Regards,
Frank

On Sunday, October 28, 2018 11:40 CET, Shirly Radco wrote:

Hi,

Please see here:
https://www.ovirt.org/documentation/data-warehouse/Data_Collection_Setup_and_Reports_Installation_Overview/
"It is recommended that you set the system time zone for all machines in your Data Warehouse deployment to UTC. This ensures that data collection is not interrupted by variations in your local time zone: for example, a change from summer time to winter time."

What timezone is your DB configured to?

Best,
--
SHIRLY RADCO
BI SENIOR SOFTWARE ENGINEER
Red Hat Israel

On Sun, Oct 28, 2018 at 12:32 PM fsoyer wrote:

> Hi all,
> Maybe it has already been posted, but I think I've discovered a little bug.
> This night I had these messages:
>
> 28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
>
> And, coincidence, here in France we changed to winter time at... 2 AM :)
> So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem
[ovirt-users] Re: VMs unexpectidly restarted
Self-heal Daemon on eskarinastorage.local.systea.fr N/A N/A Y 30725
Self-heal Daemon on victorstorage.local.systea.fr N/A N/A Y 2810
Task Status of Volume ISO ---

But a df on the nodes shows that all volumes except ENGINE are mounted over the ovirmgmt network (host names without "storage"):

gingerstorage.local.systea.fr:/ENGINE 5,0T 226G 4,7T 5% /rhev/data-center/mnt/glusterSD/gingerstorage.local.systea.fr:_ENGINE
victor.local.systea.fr:/DATA01 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA01
victor.local.systea.fr:/DATA02 5,0T 226G 4,7T 5% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02
victor.local.systea.fr:/ISO 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_ISO
victor.local.systea.fr:/EXPORT 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_EXPORT

I can't remember how it was declared at install time, and maybe I simply had not seen that, but if I try to add a domain now, Gluster-managed, it indeed proposes to me only the nodes by their ovirmgmt names, not the storage names. The names are only known in the /etc/hosts of all nodes + engine; there is no DNS for these local addresses. So: in your opinion, can this configuration be a (the) source of my problems? And do you have an idea how I could correct this now, without losing anything? Thanks for all suggestions. -- Regards, Frank

On Thursday, October 18, 2018 23:13 CEST, Nir Soffer wrote: On Thu, Oct 18, 2018 at 3:43 PM fsoyer wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem?
The problem started when sanlock could not renew storage leases held by some processes:

Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/ids
Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 renewal error -202 delta_length 25 last_success 2945539

After 80 seconds, the VMs are terminated by sanlock:

Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 check_our_lease failed 80
Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 kill 13823 sig 15 count 1

But process 13823 cannot be killed, since it is blocked on storage, so sanlock sends many more TERM signals:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: s1 kill 13823 sig 15 count 17

The VM finally dies after 17 retries:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: dead 13823 ci 10 count 17

We can see the same flow for other processes (HA VMs?). This allows the system to start the HA VM on another host, which is what we see in the events log in the first message.

Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.

If the VMs were not started successfully on the other hosts, maybe the storage domain used for the VM lease is not accessible? It is recommended to choose the same storage domain used by the other VM disks for the VM lease. Also check that all storage domains are accessible - if they are not, you will have warnings in /var/log/vdsm/vdsm.log. Nir

-- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to rest
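To make Nir's timeline concrete: in the sanlock lines quoted above, the bare integer after the clock time (e.g. 2945585) appears to be sanlock's internal seconds counter, and `last_success` is that same counter at the last successful lease renewal. A small illustrative snippet (not from the original thread) extracts those fields and computes how stale the lease already was at the first error:

```python
import re

# One of the sanlock lines quoted above (s3 = the DATA02 lockspace).
line = ("2018-10-16 11:01:46 2945585 [4167]: "
        "s3 renewal error -202 delta_length 25 last_success 2945539")

# Pull out the current counter, the lockspace name, and last_success.
m = re.search(r"(\d+) \[\d+\]: (s\d+) renewal error .* last_success (\d+)", line)
now, lockspace, last_success = int(m.group(1)), m.group(2), int(m.group(3))

age = now - last_success  # seconds since the last successful renewal
print(lockspace, age)     # s3 46
```

Once a lockspace goes about 80 seconds without a renewal (the `check_our_lease failed 80` line), sanlock starts sending SIGTERM to the processes holding leases there, which matches the kill/dead lines above.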
[ovirt-users] Re: ETL service and winter hour
Well, I see that I'm late to give the information :) Thank you for pointing me to this, but I have some other questions now... How can I see the timezone of the DB? When it says "all machines", do you confirm that this means physical machines, not VMs? May I apply the solution given on access.redhat.com or not, given that there are no more messages since 3 AM? And, last question but not least, can this timezone be changed on the machines (and DB?) without issue? -- Regards, Frank

On Sunday, October 28, 2018 11:40 CET, Shirly Radco wrote: Hi, Please see here https://www.ovirt.org/documentation/data-warehouse/Data_Collection_Setup_and_Reports_Installation_Overview/ "It is recommended that you set the system time zone for all machines in your Data Warehouse deployment to UTC. This ensures that data collection is not interrupted by variations in your local time zone: for example, a change from summer time to winter time." What timezone is your DB configured to? Best, -- SHIRLY RADCO, BI SENIOR SOFTWARE ENGINEER, Red Hat Israel. TRIED. TESTED. TRUSTED.

On Sun, Oct 28, 2018 at 12:32 PM fsoyer wrote: Hi all, Maybe it has already been posted, but I think I've discovered a little bug. Last night I got these messages:

28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

And, coincidence, here in France we changed to winter time at... 2 AM :) So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem, I supposed that this is related! No? access.redhat.com says that the cause was not yet determined, but maybe it can be interesting to propose this cause? But the bug is actually closed. Question: does this repair itself (as there are no more messages after 3 AM), or should I apply the solution with the postgres updates (I must say I'm not very enthusiastic about that...)? Regards, -- Frank

___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GNGAQKTA5PDJUYMTFJABDLZA27FHUY7O/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/D46VJJ5LHQP4JTLMWL6L2QBMPL2SCU3W/
[ovirt-users] ETL service and winter hour
Hi all, Maybe it has already been posted, but I think I've discovered a little bug. Last night I got these messages:

28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

And, coincidence, here in France we changed to winter time at... 2 AM :) So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem, I supposed that this is related! No? access.redhat.com says that the cause was not yet determined, but maybe it can be interesting to propose this cause? But the bug is actually closed. Question: does this repair itself (as there are no more messages after 3 AM), or should I apply the solution with the postgres updates (I must say I'm not very enthusiastic about that...)?
Regards, -- Frank List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GNGAQKTA5PDJUYMTFJABDLZA27FHUY7O/
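The clustering of the ETL errors between 02:00 and 03:00 matches the summer-to-winter change in France on 2018-10-28, when local clocks repeated the 02:00-03:00 hour. As an illustration (not part of the original thread), Python's zoneinfo shows that local times in that window are ambiguous, which is the "1 hour gap / overlap" discussed above:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tzdata

paris = ZoneInfo("Europe/Paris")

# 02:30 local time on 2018-10-28 happened twice in Europe/Paris:
# once in CEST (UTC+2, fold=0) and once in CET (UTC+1, fold=1).
ambiguous = datetime(2018, 10, 28, 2, 30, tzinfo=paris)
first_offset = ambiguous.replace(fold=0).utcoffset()   # CEST occurrence
second_offset = ambiguous.replace(fold=1).utcoffset()  # CET occurrence

print(first_offset, second_offset)  # 2:00:00 1:00:00
```

A database storing local timestamps therefore sees the 02:00-03:00 range twice that night; running the DWH machines and DB in UTC, as the documentation recommends, avoids the ambiguity entirely.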
[ovirt-users] Re: VM paused then killed with "device vda reported I/O error"
Oops, rereading my message I found an error: the problem occurred at 1:21 AM, not 1:01 :/ Frank

On Thursday, October 25, 2018 17:55 CEST, "fsoyer" wrote: Hi, related (or maybe not) to my problem "VMs unexpectedly restarted", I have one VM (only one) which was paused and then killed this morning (1:01 AM). This is the second time (the first was about 15 days ago), and only this one (it is on a domain with 5 other VMs, and it is not the most used of them; and it was at night, without any particular workload at this time). The other VMs on the same storage were not impacted at all. And it is not on the same storage domain as the other VMs of "VMs unexpectedly restarted"... At the same time, Gluster seems to have seen absolutely nothing. Is there really a storage issue?? Here are some relevant logs: /var/log/messages of the node, vdsm.log, engine.log, glusterd.log, data01-brick.log. For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13 (2 + arbiter). Any idea where or what I must search for? Thanks -- Regards, Frank

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BI45NBQTKNHLOOS3TO2TAT53LREC4EF/
[ovirt-users] VM paused then killed with "device vda reported I/O error"
Hi, related (or maybe not) to my problem "VMs unexpectedly restarted", I have one VM (only one) which was paused and then killed this morning (1:01 AM). This is the second time (the first was about 15 days ago), and only this one (it is on a domain with 5 other VMs, and it is not the most used of them; and it was at night, without any particular workload at this time). The other VMs on the same storage were not impacted at all. And it is not on the same storage domain as the other VMs of "VMs unexpectedly restarted"... At the same time, Gluster seems to have seen absolutely nothing. Is there really a storage issue?? Here are some relevant logs: /var/log/messages of the node, vdsm.log, engine.log, glusterd.log, data01-brick.log. For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13 (2 + arbiter). Any idea where or what I must search for? Thanks -- Regards, Frank

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JU3OTWA7HU4F5POYFR7KMNTKKWQMYEZE/
[ovirt-users] Re: re-enabling networkmanager
At least to view some graphs. Here is a screenshot of the Cockpit tab (not sure a picture can be displayed via the list; tell me if not): -- Regards, Frank

On Monday, October 22, 2018 12:42 CEST, Donny Davis wrote: So you are trying to capture utilization stats from your network?

On Mon, Oct 22, 2018, 6:37 AM fsoyer wrote: Hi Donny, thank you for this precision, but I don't want to manage the network from Cockpit, just view the graphs of network usage in it, and possibly logs (that can be interesting!). That's why I ask about the risks of activating NM after configuring the networks in the engine UI: what are the opinions? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

On Sunday, October 21, 2018 14:05 CEST, Donny Davis wrote: Use the oVirt engine to manage your networks. VDSM takes over at boot time, and the only way for this to happen is if you use the engine.

On Fri, Oct 19, 2018 at 9:26 AM fsoyer wrote: Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/EC6NWQDYU6JJDPSZJFDQRAJXLBFLEMY7/
[ovirt-users] Re: re-enabling networkmanager
Hi Donny, thank you for this precision, but I don't want to manage the network from Cockpit, just view the graphs of network usage in it, and possibly logs (that can be interesting!). That's why I ask about the risks of activating NM after configuring the networks in the engine UI: what are the opinions? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

On Sunday, October 21, 2018 14:05 CEST, Donny Davis wrote: Use the oVirt engine to manage your networks. VDSM takes over at boot time, and the only way for this to happen is if you use the engine.

On Fri, Oct 19, 2018 at 9:26 AM fsoyer wrote: Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5TK7TSIN4UEG5F3KO3NLBYVFIW57LWL5/
[ovirt-users] re-enabling networkmanager
Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/
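On the re-enabling question: as far as I know, the ifcfg files that VDSM writes for the networks it manages are marked with NM_CONTROLLED=no, so NetworkManager is supposed to leave those devices alone. A sketch of how one could check this, with made-up file names and contents (nothing here is copied from this cluster):

```python
# Sketch: report which ifcfg-style files opt out of NetworkManager.
# In initscripts semantics, NM_CONTROLLED defaults to "yes" when absent.
ifcfg_files = {
    "ifcfg-ovirmgmt": "DEVICE=ovirmgmt\nTYPE=Bridge\nNM_CONTROLLED=no\nBOOTPROTO=none\n",
    "ifcfg-eth0": "DEVICE=eth0\nBOOTPROTO=dhcp\n",
}

def nm_controlled(content: str) -> bool:
    """True unless the file explicitly sets NM_CONTROLLED=no (or =n)."""
    for raw in content.splitlines():
        line = raw.strip()
        if line.startswith("NM_CONTROLLED="):
            value = line.split("=", 1)[1].strip().strip('"').lower()
            return value not in ("no", "n")
    return True  # absent means NetworkManager may manage the device

for name, content in ifcfg_files.items():
    print(name, "NM-managed" if nm_controlled(content) else "ignored by NM")
```

If every VDSM-written file carries NM_CONTROLLED=no, re-enabling the NetworkManager service should in principle not touch those devices, but this is only a sketch of the check, not a guarantee; testing on one host first would be prudent.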
[ovirt-users] Re: VMs unexpectidly restarted
Hi Nir, thank you for this detailed analysis. As I can see, the first VM to shut down had its lease on the hosted-engine storage domain (probably not the best choice; maybe a test left over here) and its disk on DATA02. The 3 others (HA VMs) had a lease on the same domain as their disk (DATA02). So I suppose this looks like a Gluster latency on DATA02. But what I don't understand at this point is:
- if this was a lease problem on DATA02, the VM npi2 should not have been impacted... Or DATA02 was inaccessible, and the messages should have reported a storage error (something like "IO error", I suppose?)
- if this was a problem on the hosted-engine storage domain too, the engine did not restart (if the domain were off or blocked, it would have?) nor was it marked as not responding, even temporarily
- Gluster saw absolutely nothing at the same time, on the engine domain or DATA02: the logs of daemons and bricks show nothing relevant.
Unfortunately, I no longer have the vdsm log file from the time of the problem: it is rotated+compressed every 2 hours, and I discovered that if you uncompress "vdsm.log.1.xz" for example, at the next rotation the system overwrites it with the latest log :( I'm afraid I need to wait for another occurrence to re-scan all the logs and try to understand what happened... -- Regards, Frank

On Thursday, October 18, 2018 23:13 CEST, Nir Soffer wrote: On Thu, Oct 18, 2018 at 3:43 PM fsoyer wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem?
The problem started when sanlock could not renew storage leases held by some processes:

Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/ids
Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 renewal error -202 delta_length 25 last_success 2945539

After 80 seconds, the VMs are terminated by sanlock:

Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 check_our_lease failed 80
Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 kill 13823 sig 15 count 1

But process 13823 cannot be killed, since it is blocked on storage, so sanlock sends many more TERM signals:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: s1 kill 13823 sig 15 count 17

The VM finally dies after 17 retries:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: dead 13823 ci 10 count 17

We can see the same flow for other processes (HA VMs?). This allows the system to start the HA VM on another host, which is what we see in the events log in the first message.

Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.

If the VMs were not started successfully on the other hosts, maybe the storage domain used for the VM lease is not accessible? It is recommended to choose the same storage domain used by the other VM disks for the VM lease. Also check that all storage domains are accessible - if they are not, you will have warnings in /var/log/vdsm/vdsm.log. Nir

-- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique.
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
1
[ovirt-users] Re: VMs unexpectidly restarted
And I add the log of one of the restarted VMs: https://gist.github.com/fsoyer/b63daa0653d91a59ffc65f2b6ad263f6 -- Regards, Frank

On Thursday, October 18, 2018 14:41 CEST, "fsoyer" wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem? -- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique.
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.

with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef Searching for "Failed to acquire lock" I saw a post about sanlock.log. Here it is at the time of the restart: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (hope you can display these gists). First question: every day there are those "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below). Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search for? Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a Gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storages are on Gluster volumes. Around 11 AM, I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirmgmt (2*1G bond mode 1 = failover).
-- Regards, Frank List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IYQGUX7GLK7KKXOWCYCLHRMHYTH5CRKY/
[ovirt-users] Re: VMs unexpectidly restarted
Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737
At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source of the problem?
--
Regards,
Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote:

Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:
16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.
with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef
Searching for "Failed to acquire lock" I found a post about sanlock.log. Here it is at the time of the restarts: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (I hope you can display these gists).
First question: every day there are these "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below).
Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search?
Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storage domains are on gluster volumes. Around 11am I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirtmgmt (2*1G bond mode 1 = failover).
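As an aside, the "delta_renew long write time N sec" lines are sanlock reporting that a lease-renewal write to the storage took longer than it expected, so counting and sizing them is a quick way to gauge storage latency. A minimal sketch of how one might pull them out of sanlock.log — the sample lines and the 8-second threshold are illustrative, not taken from the gists above:

```shell
# Illustrative sanlock.log excerpt (format approximated from typical
# "delta_renew long write time N sec" warnings; not real data)
cat > /tmp/sanlock_sample.log <<'EOF'
2018-10-16 10:58:12+0200 711205 [4301]: s1 delta_renew long write time 11 sec
2018-10-16 11:01:40+0200 711413 [4301]: s1 delta_renew long write time 23 sec
2018-10-16 11:02:05+0200 711438 [4301]: s1 delta_renew long write time 3 sec
EOF

# Print only renewals slower than 8 s. sanlock has to keep renewing its
# lease well inside the expiry window (on the order of 80 s with default
# timeouts), so repeated long writes point at storage latency rather
# than at oVirt itself.
awk '/delta_renew long write time/ && $(NF-1) > 8' /tmp/sanlock_sample.log
```

If renewals ever take longer than the expiry window, sanlock gives up the lease and the VMs holding locks on that storage get killed, which would match the restarts above.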
--
Regards,
Frank
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XFFJT4NORIELIOAGPHU4CUPC67KY3MMP/
[ovirt-users] VMs unexpectedly restarted
Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:
16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.
with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef
Searching for "Failed to acquire lock" I found a post about sanlock.log. Here it is at the time of the restarts: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (I hope you can display these gists).
First question: every day there are these "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below).
Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search?
Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storage domains are on gluster volumes. Around 11am I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirtmgmt (2*1G bond mode 1 = failover).
--
Regards,
Frank
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/I3KRS6IESKNZNZ2UXYW356Y6QVSTUAA6/
[ovirt-users] Re: clone snapshot of running vm
Hi guys, I just hit this issue on a fresh 4.2.6 install: the snapshot of a VM cannot be cloned. It seems to be the same logs in ui.log (I paste them here to be sure), and I am unable to clone the snapshot. VM on or off doesn't change anything. This really seems to be a UI issue, because when it happens we can no longer create or clone a snapshot on any VM: the buttons just do nothing (and nothing is logged in ui.log when we hit them). We must reload the UI (F5 or Ctrl-R) to recover the functionality. Please help us? Thanks.

2018-09-15 11:37:50,589+02 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-17) [] Permutation name: 3F33631A4CFC71A7A5878CCA004CB97D
2018-09-15 11:37:50,589+02 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-17) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : Cannot read property 'K' of null
 at java.lang.Throwable.Throwable(Throwable.java:70) [rt.jar:1.8.0_181]
 at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_181]
 at com.google.web.bindery.event.shared.UmbrellaException.UmbrellaException(UmbrellaException.java:64) [gwt-servlet.jar:]
 at Unknown.new C0(webadmin-0.js)
 at com.google.gwt.event.shared.HandlerManager.$fireEvent(HandlerManager.java:117) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.$fireEvent(Widget.java:127) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.fireEvent(Widget.java:127) [gwt-servlet.jar:]
 at com.google.gwt.event.dom.client.DomEvent.fireNativeEvent(DomEvent.java:110) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.$onBrowserEvent(Widget.java:163) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.onBrowserEvent(Widget.java:163) [gwt-servlet.jar:]
 at com.google.gwt.user.client.DOM.dispatchEvent(DOM.java:1415) [gwt-servlet.jar:]
 at com.google.gwt.user.client.impl.DOMImplStandard.dispatchEvent(DOMImplStandard.java:312) [gwt-servlet.jar:]
 at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:]
 at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
 at Unknown.eval(webadmin-0.js)
Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'K' of null
 at org.ovirt.engine.ui.uicommonweb.models.vms.ExistingVmModelBehavior.updateHaAvailability(ExistingVmModelBehavior.java:481)
 at org.ovirt.engine.ui.uicommonweb.models.vms.UnitVmModel.eventRaised(UnitVmModel.java:1933)
 at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99)
 at org.ovirt.engine.ui.uicommonweb.models.ListModel.$setSelectedItem(ListModel.java:82)
 at org.ovirt.engine.ui.uicommonweb.models.ListModel.setSelectedItem(ListModel.java:78)
 at org.ovirt.engine.ui.common.editor.UiCommonEditorVisitor.$updateListEditor(UiCommonEditorVisitor.java:193)
 at org.ovirt.engine.ui.common.editor.UiCommonEditorVisitor.visit(UiCommonEditorVisitor.java:47)
 at com.google.gwt.editor.client.impl.AbstractEditorContext.$traverse(AbstractEditorContext.java:127) [gwt-servlet.jar:]
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget_UiCommonModelEditorDelegate.accept(AbstractVmPopupWidget_UiCommonModelEditorDelegate.java:502)
 at com.google.gwt.editor.client.impl.AbstractEditorContext.$traverse(AbstractEditorContext.java:127) [gwt-servlet.jar:]
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget_DriverImpl.accept(AbstractVmPopupWidget_DriverImpl.java:4)
 at org.ovirt.engine.ui.common.editor.AbstractUiCommonModelEditorDriver.$edit(AbstractUiCommonModelEditorDriver.java:32)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.$edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractModeSwitchingPopupWidget.edit(AbstractModeSwitchingPopupWidget.java:80)
 at org.ovirt.engine.ui.common.view.popup.AbstractModelBoundWidgetPopupView.edit(AbstractModelBoundWidgetPopupView.java:37)
 at org.ovirt.engine.ui.common.presenter.AbstractModelBoundPopupPresenterWidget.$init(AbstractModelBoundPopupPresenterWidget.java:105)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.$init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.uicommon.model.ModelBoundPopupHandler.$han
[ovirt-users] Re: Re : [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
At this time the engine (and the cluster) is up. No problem after activating gluster and creating the volumes, then finishing the install in the screen session. So...

On Friday, June 29, 2018 12:32 CEST, "fsoyer" wrote:

Hi, I must say it: I'm -totally- lost. To try to find a reason for this error, I re-installed the first host from scratch: CentOS 7.5-1804, oVirt 4.2.3-1, gluster 3.12.9. The first attempt was made with only em1 declared. Result = SUCCESS: the install passed "Get local VM IP", then went through "Wait for the host to be up" without difficulty and waited at "Please specify the storage...". At this point I even noticed that I had forgotten to stop/disable NetworkManager; that had no impact! So: I re-installed the host from scratch (yes, sometimes I'm a fool) to be absolutely sure that no problem came from the preceding install. Now I declared em1 (10.0.0.230) and em2 (10.0.0.229, without gateway or DNS, for a future vmnetwork). NetworkManager off and disabled. Result = SUCCESS... Oo OK: re-install the host!! Now I declared, as I did some days ago, em1, em2 and bond0 (em3+em4 with IP 192.168.0.30). Result: SUCCESS!!! Oo So I'm unable to say what happened on Tuesday. Actually I see only two differences:
- gluster is not active (I didn't configure it, to go faster)
- the version of ovirt (ovirt-release, ovirt-host, appliance...) has slightly changed.
I have no more time for another attempt at re-installing the host(s) with gluster activated; I must now go on, as I need an operational system for other tasks with VMs this afternoon. So I leave the first host waiting for the end of the install in a screen session, re-install the 2 other hosts, and activate gluster and the volumes on the 3 nodes. Then I'll finish the install on the gluster volume. I'll tell you if this finally works, but I hope so! However, I'm in doubt about this problem. I have no explanation of what happened on Tuesday; this is really annoying...
Maybe you have the ability to test the same configuration (3 hosts with 2 NICs on the same network for ovirtmgmt and a future vmnetwork, and gluster on a separate network) to try to understand? Thank you for the time spent. Frank
PS: to answer your question: yes, on Tuesday I ran ovirt-hosted-engine-cleanup between each attempt.

On Thursday, June 28, 2018 16:26 CEST, Simone Tiraboschi wrote:

On Wed, Jun 27, 2018 at 5:48 PM fso...@systea.fr wrote:
Hi again, In fact, the time in the file is exactly 2 hours behind; I guess a timezone problem (in the install process?), as the file itself is correctly timestamped at 11:17am (the correct time here in France). So the messages are in sync.

Yes, sorry, my fault. From the logs I don't see anything strange. Can you please try again on your environment and connect to the bootstrap VM via virsh console or VNC to check what's happening there? Did you also run ovirt-hosted-engine-cleanup between one attempt and the next?

Original message
Subject: Re: [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
From: Simone Tiraboschi
To: fso...@systea.fr
Cc: users

Hi, HostedEngineLocal was started at 2018-06-26 09:17:26 but /var/log/messages starts only at Jun 26 11:02:32. Can you please reattach it for the relevant time frame?

On Wed, Jun 27, 2018 at 10:54 AM fsoyer wrote:
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group defa
[ovirt-users] Re: Re : [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi, I must say it: I'm -totally- lost. To try to find a reason for this error, I re-installed the first host from scratch: CentOS 7.5-1804, oVirt 4.2.3-1, gluster 3.12.9. The first attempt was made with only em1 declared. Result = SUCCESS: the install passed "Get local VM IP", then went through "Wait for the host to be up" without difficulty and waited at "Please specify the storage...". At this point I even noticed that I had forgotten to stop/disable NetworkManager; that had no impact! So: I re-installed the host from scratch (yes, sometimes I'm a fool) to be absolutely sure that no problem came from the preceding install. Now I declared em1 (10.0.0.230) and em2 (10.0.0.229, without gateway or DNS, for a future vmnetwork). NetworkManager off and disabled. Result = SUCCESS... Oo OK: re-install the host!! Now I declared, as I did some days ago, em1, em2 and bond0 (em3+em4 with IP 192.168.0.30). Result: SUCCESS!!! Oo So I'm unable to say what happened on Tuesday. Actually I see only two differences:
- gluster is not active (I didn't configure it, to go faster)
- the version of ovirt (ovirt-release, ovirt-host, appliance...) has slightly changed.
I have no more time for another attempt at re-installing the host(s) with gluster activated; I must now go on, as I need an operational system for other tasks with VMs this afternoon. So I leave the first host waiting for the end of the install in a screen session, re-install the 2 other hosts, and activate gluster and the volumes on the 3 nodes. Then I'll finish the install on the gluster volume. I'll tell you if this finally works, but I hope so! However, I'm in doubt about this problem. I have no explanation of what happened on Tuesday; this is really annoying...
Maybe you have the ability to test the same configuration (3 hosts with 2 NICs on the same network for ovirtmgmt and a future vmnetwork, and gluster on a separate network) to try to understand? Thank you for the time spent.
Frank
PS: to answer your question: yes, on Tuesday I ran ovirt-hosted-engine-cleanup between each attempt.

On Thursday, June 28, 2018 16:26 CEST, Simone Tiraboschi wrote:

On Wed, Jun 27, 2018 at 5:48 PM fso...@systea.fr wrote:
Hi again, In fact, the time in the file is exactly 2 hours behind; I guess a timezone problem (in the install process?), as the file itself is correctly timestamped at 11:17am (the correct time here in France). So the messages are in sync.

Yes, sorry, my fault. From the logs I don't see anything strange. Can you please try again on your environment and connect to the bootstrap VM via virsh console or VNC to check what's happening there? Did you also run ovirt-hosted-engine-cleanup between one attempt and the next?

Original message
Subject: Re: [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
From: Simone Tiraboschi
To: fso...@systea.fr
Cc: users

Hi, HostedEngineLocal was started at 2018-06-26 09:17:26 but /var/log/messages starts only at Jun 26 11:02:32. Can you please reattach it for the relevant time frame?

On Wed, Jun 27, 2018 at 10:54 AM fsoyer wrote:
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
... it does NOT work this morning:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:01:c6:32 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.083587", "end": "2018-06-26 11:26:07.581706", "rc": 0, "start": "2018-06-26 11:26:07.498119", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
I'm sure the network was the same yesterday when my attempt finally passed "Get local VM IP". Why not today? After the error, the network was:
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever
8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff
9: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:01:c6:32 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe01:c632/64 scope link valid_lft forever preferred_lft forever
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
So, finally, I have no idea why this happens :(((

Can you please attach /var/log/messages and /var/log/libvirt/qemu/*?

On Tuesday, June 26, 2018 09:21 CEST, Simone Tiraboschi wrote:

On Mon, Jun 25, 2018 at 6:32 PM fsoyer wrote:
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
... it does NOT work this morning:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:01:c6:32 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.083587", "end": "2018-06-26 11:26:07.581706", "rc": 0, "start": "2018-06-26 11:26:07.498119", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
I'm sure the network was the same yesterday when my attempt finally passed "Get local VM IP". Why not today? After the error, the network was:
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever
8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff
9: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:01:c6:32 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe01:c632/64 scope link valid_lft forever preferred_lft forever
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
So, finally, I have no idea why this happens :(((

On Tuesday, June 26, 2018 09:21 CEST, Simone Tiraboschi wrote:

On Mon, Jun 25, 2018 at 6:32 PM fsoyer wrote:
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to stop the gluster volumes, stop gluster on the host, and stop bond0. So the host now had just em1 with one IP. And... the winner is... yes: the install passed "[Get local VM IP]" and continued!! I hit Ctrl-C, restarted bond0, restarted the deploy: it crashed. So it seems that more than one network is the problem. But! How do I install en
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to stop the gluster volumes, stop gluster on the host, and stop bond0. So the host now had just em1 with one IP. And... the winner is... yes: the install passed "[Get local VM IP]" and continued!! I hit Ctrl-C, restarted bond0, restarted the deploy: it crashed. So it seems that more than one network is the problem. But! How do I install the engine on gluster on a separate, bonded, jumbo-frame network in this case??? Can you reproduce this on your side? Frank

On Monday, June 25, 2018 16:50 CEST, "fsoyer" wrote:

Hi staff, installing a fresh oVirt: CentOS 7.5.1804 up to date, oVirt version:
# rpm -qa | grep ovirt
ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch
ovirt-imageio-common-1.3.1.2-0.el7.centos.noarch
ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64
ovirt-vmconsole-1.0.5-4.el7.centos.noarch
ovirt-provider-ovn-driver-1.2.10-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch
ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64
ovirt-host-deploy-1.7.3-1.el7.centos.noarch
ovirt-release42-4.2.3.1-1.el7.noarch
ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch
cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch
ovirt-host-4.2.2-2.el7.centos.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ON PHYSICAL SERVERS (not on VMware, why should I be?? ;) I got exactly the same error:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.073313", "end": "2018-06-25 16:11:36.025277", "rc": 0, "start": "2018-06-25 16:11:35.951964", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
I have 4 NICs:
- em1 10.0.0.230/8 is for ovirtmgmt; it has the gateway
- em2 10.0.0.229/8 is for a vmnetwork
- em3+em4 in bond0 (192.168.0.30) are for gluster with jumbo frames; the volumes (ENGINE, ISO, EXPORT, DATA) are up and operational.
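For context, the failing task is only scraping libvirt's DHCP leases for the bootstrap VM's MAC: empty stdout means the local engine VM never obtained (or never reported) a lease on libvirt's default network. The exact pipeline from the error can be replayed against canned `virsh -r net-dhcp-leases default` output to see what a successful run would extract — the lease line below is invented for illustration; only the MAC and the pipeline come from the error above:

```shell
# Fake `virsh -r net-dhcp-leases default` output. The MAC matches the
# one the task grepped for; the IP and hostname are made up.
leases='Expiry Time          MAC address        Protocol  IP address          Hostname  Client ID
-------------------------------------------------------------------------------------------
2018-06-25 17:11:36  00:16:3e:69:3a:c6  ipv4      192.168.122.57/24   engine    -'

# Same extraction the "Get local VM IP" task performs: select the row
# for the VM MAC, take the IP column, drop the /24 prefix length.
echo "$leases" | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'
```

With no matching lease (the failure case) the pipeline prints nothing, and the task simply retries until its 50 attempts are exhausted, which is exactly what the `"stdout": ""` in the error shows.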
I tried to stop em2 (ONBOOT=No and restart network), so the network is actually : # ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:eb:70 brd ff:ff:ff:ff:ff:ff inet 10.0.0.230/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb70/64 scope link valid_lft forever preferred_lft forever 3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:eb:71 brd ff:ff:ff:ff:ff:ff 4: em3: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 5: em4: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 9000 qdisc noqueue state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff inet 192.168.0.30/24 brd 192.168.0.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb72/64 scope link valid_lft forever preferred_lft forever # ip r default via 10.0.1.254 dev em1 10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.230 169.254.0.0/16 dev em1 scope link metric 1002 169.254.0.0/16 dev bond0 scope link metric 1006 192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.30 but same issue, after "/usr/sbin/ovirt-hosted-engine-cleanup" and restarting the deployment. NetworkManager was stopped and disabled at the node install, and it is still stopped.After the error, the network shows this after device 6 (bond0) : 7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:38:e0:
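The failing "[Get local VM IP]" task above is just polling the libvirt default network's DHCP leases for the engine VM's MAC. A minimal sketch of the same pipeline, run here against a fabricated sample lease line (the MAC is the one from the failing task; on a live host you would feed it the real `virsh -r net-dhcp-leases default` output):

```shell
# Fabricated sample of a "virsh -r net-dhcp-leases default" lease line;
# a real host only shows it once the local engine VM has obtained a lease.
lease='2018-06-25 16:11:35  00:16:3e:69:3a:c6  ipv4  192.168.122.15/24  HostedEngineLocal'
# Same pipeline as the Ansible task: select the line by MAC, take the
# "IP/prefix" field (field 5, since the expiry time spans two fields),
# then strip the prefix length.
ip=$(printf '%s\n' "$lease" | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/')
echo "$ip"   # → 192.168.122.15
```

An empty stdout, as in the error above, therefore means the VM never obtained (or libvirt never recorded) a lease within the 50 attempts; the command itself succeeded (rc is 0).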
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi staff, Installing a fresh ovirt - CentOS 7.5.1804 up to date, ovirt version : # rpm -qa | grep ovirt ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch ovirt-imageio-common-1.3.1.2-0.el7.centos.noarch ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64 ovirt-vmconsole-1.0.5-4.el7.centos.noarch ovirt-provider-ovn-driver-1.2.10-1.el7.centos.noarch ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64 ovirt-host-deploy-1.7.3-1.el7.centos.noarch ovirt-release42-4.2.3.1-1.el7.noarch ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch ovirt-setup-lib-1.1.4-1.el7.centos.noarch ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch ovirt-host-4.2.2-2.el7.centos.x86_64 ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch ON PHYSICAL SERVERS (not on VMware, why should I be ?? ;) I got exactly the same error : [ INFO ] TASK [Get local VM IP] [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.073313", "end": "2018-06-25 16:11:36.025277", "rc": 0, "start": "2018-06-25 16:11:35.951964", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} [ INFO ] TASK [include_tasks] [ INFO ] ok: [localhost] [ INFO ] TASK [Remove local vm dir] [ INFO ] changed: [localhost] [ INFO ] TASK [Notify the user about a failure] [ ERROR ] fatal: [localhost]: FAILED! 
=> {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"} [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook [ INFO ] Stage: Clean up I have 4 NIC : em1 10.0.0.230/8 is for ovirmgmt, it have the gateway em2 10.0.0.229/8 is for a vmnetwork em3+em4 in bond0 192.168.0.30 are for gluster with jumbo frames, volumes (ENGINE, ISO,EXPORT,DATA) are up and operationals. I tried to stop em2 (ONBOOT=No and restart network), so the network is actually : # ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:eb:70 brd ff:ff:ff:ff:ff:ff inet 10.0.0.230/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb70/64 scope link valid_lft forever preferred_lft forever 3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:eb:71 brd ff:ff:ff:ff:ff:ff 4: em3: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 5: em4: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 9000 qdisc noqueue state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff inet 192.168.0.30/24 brd 192.168.0.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb72/64 scope link valid_lft forever preferred_lft forever # ip r default via 10.0.1.254 dev em1 10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.230 169.254.0.0/16 dev em1 scope link metric 1002 169.254.0.0/16 dev bond0 scope link 
metric 1006 192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.30 but same issue, after "/usr/sbin/ovirt-hosted-engine-cleanup" and restarting the deployment. NetworkManager was stopped and disabled at the node install, and it is still stopped. After the error, the network shows this after device 6 (bond0) : 7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever 8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff 11: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:69:3a:c6 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe69:3ac6/64 scope link valid_lft forever preferred_lft forever I do not see ovirtmgmt... And I don't know if I can access the engine VM as I don't have its IP :( I tried to ping addresses after 192.168.122.1, but none were accessible, so I stopped at 122.10. The VM seems up (kvm process), the qemu-kvm process taking 150% of CPU in "top"... I pasted the log here : https://pastebin.com/Ebzh1uEh PLEASE ! This issue seems to be recc
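Rather than ping-scanning 192.168.122.x, the guest's MAC can be recovered from the vnet0 entry above: for KVM guests, libvirt gives the host-side tap device the guest's MAC with the first octet rewritten to fe (the sketch below assumes the guest's first octet is 00, which matches the MAC the deploy task was grepping for):

```shell
# Tap MAC as shown for vnet0 in the "ip a" output above.
tap_mac='fe:16:3e:69:3a:c6'
# Recover the guest MAC by undoing libvirt's fe rewrite (assumption:
# the guest's real first octet is 00, as in the deploy task's MAC).
guest_mac=$(printf '%s\n' "$tap_mac" | sed 's/^fe:/00:/')
echo "$guest_mac"   # → 00:16:3e:69:3a:c6
# On the host, look that MAC up instead of pinging blindly:
#   ip neigh | grep -i "$guest_mac"
#   virsh -r net-dhcp-leases default | grep -i "$guest_mac"
```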
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi Arik, unfortunately I couldn't discuss last Friday with a colleague who needed to work on the cluster this week-end. Discovering the frozen tasks, he found a workaround by deleting the lines in the job table, and the tasks disappeared in the GUI. Then... he dropped the faulty VM :( But I have shared the engine.log of April 11 and 12. The export to OVA of the "CO7_VM1" VM was launched around 17:01, with other VMs (to the same directory on host "victor"). The other VM exports ended successfully but not this one. Another export of the same VM was launched on April 12 at 10:42, to see if it would say "already working"... But it didn't, and the second task ran alongside the first, indefinitely. These links are active for 2 days : https://seafile.systea.fr/f/12d90cc5c59b488c9fde/?dl=1 https://seafile.systea.fr/f/ce7cd61231924020bc62/?dl=1 I've not seen anything relevant in them, but have a look ? This morning I tried to reproduce the problem. I've created a VM from template (50G). Migrate it : OK. Export it to OVA : OK. I then extended the disk to 100G, then tested again. Migrate it : OK. Export it to OVA... OK :( So I wasn't able to reproduce the error. Thx Frank On Friday, April 13, 2018 21:53 CEST, Arik Hadas wrote: On Fri, Apr 13, 2018 at 6:54 PM, fsoyer wrote: Hi, This task is listed (since 2 days) in the GUI / upper right "tasks" icon. It is visibly frozen as no OVA file has been created, but no errors in the GUI, just... it runs. Or : it loops :) This (test) VM is one on which I have extended the disk (50 -> 100G). Before being stopped and trying to export it to OVA, it worked fine. All other VMs around can be exported without problem, but not this one. I've tried to restart the engine, change SPM, restart each node one by one, but the task is always here. I could even restart the VM today without error and it works fine ! But... the task runs... Today also, I tried to clone the VM : same thing, now I have 2 tasks running indefinitely :( Strange bug, where no timeout stopped the tasks in error.
I can't see anything relevant in engine.log or vdsm.log, but probably I've missed it among all the messages. No problem to remove this (test) VM and try on another (test) one (extend the disk to see if this is the reason of the problem). But before that I want to kill these tasks ! Please don't remove that VM yet. It would be appreciated if you could file a bug and share the engine log that covers the attempt to export this VM to OVA + the ansible log of that operation. Thanks. Frank On Friday, April 13, 2018 16:24 CEST, Arik Hadas wrote: On Fri, Apr 13, 2018 at 11:14 AM, fsoyer wrote: Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). I'm afraid you won't find any task in VDSM for 'export a VM to OVA'. Exporting a VM to OVA is comprised of three steps: 1. Creating temporary cloned disks - you'll find a task of copy-image-group for each disk. 2. Creating the OVA file - that's done by a python script executed by ansible, there is no task for that in VDSM. 3. Removing the temporary cloned disks - you'll find a task of remove-image for each disk. Can you please elaborate on the problem you're having - where do you see that task and how can you see that it's blocked? I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5.
The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } }
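The JSON-file invocation described above can be exercised end to end like this; the IDs are the ones from the example, and the final vdsm-client call is left commented since it needs a live vdsmd (python3 is used here only for the syntax check, although hosts of that era shipped python2):

```shell
# Write the parameter file exactly as in the example above.
cat > lease.json <<'EOF'
{
    "lease": {
        "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2",
        "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5"
    }
}
EOF
# Sanity-check the JSON before handing it to the client.
python3 -m json.tool lease.json >/dev/null && echo "lease.json OK"
# On a host running vdsmd you would then invoke:
#   vdsm-client Lease info -f lease.json
```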
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi, This task is listed (since 2 days) in the GUI / up right "tasks" icon. It is visibly freezed as no OVA file has been created, but no errors in GUI, just... it runs. Or : it loops :) This (test) vm is one on which I have extended the disk (50 -> 100G). Before being stopped and trying to export it to OVA, it works fine. All other vms around can be well exported but not this one. I've tried to restart engine, change SPM, restart one by one each node, but the task is always here. I could even restart the vm today without error and it works fine ! But... the task runs... Today also, I tried to clone the vm : same thing, now I have 2 tasks running indefinitly :( Strange bug, where no timeout stopped the tasks in error. I can't see any revelant things in engine.log or vdsm.log, but probably I've not seen them in all the messages. No problem to remove this (test) vm and try on another (test) one (extend disk to see if this is the reason of the problem). But before I want to kill this tasks ! Thanks. Frank Le Vendredi, Avril 13, 2018 16:24 CEST, Arik Hadas a écrit: On Fri, Apr 13, 2018 at 11:14 AM, fsoyer wrote:Hi all, I can't find any exhaustive doc for new vdsm-client. My problem actually is a task (export a vm to OVA) blocked. I'm afraid you won't find any task in VDSM for 'export a VM to OVA'.Expoting a VM to OVA is comprised of three steps:1. Creating temporary cloned disks - you'll find a task of copy-image-group for each disk.2. Creating the OVA file - that's done by a python script executed by ansible, there is no task for that in VDSM.3. Removing the temporary cloned disks - you'll find a task of remove-image for each disk. Can you please elaborate on the problem you're having - where do you see that task and how can you see that it's blocked? I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? Old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? 
Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat < [1] https://github.com/oVirt/vdsm/blob/master/lib/api/vdsm-api.yml -- Irit Goihman, Software Engineer, Red Hat Israel Ltd. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Ok I see, thank you. vdsm-client Host getAllTasksStatuses works on the SPM. But vdsm-client Task getInfo TaskID=55cbec7f-e7dc-4431-bce9-8ec1d61a7feb returns : vdsm-client: Command Task.getInfo with args {'TaskID': '55cbec7f-e7dc-4431-bce9-8ec1d61a7feb'} failed: (code=-32603, message=Internal JSON-RPC error: {'reason': '__init__() takes exactly 2 arguments (1 given)'}) There are no examples with tasks below, and the link to github ends in a 404... I'll try to find some docs about the API and task management, if you believe that's best. Any link to share ? Thanks, Frank On Friday, April 13, 2018 14:41 CEST, Michal Skrivanek wrote: On 13 Apr 2018, at 10:14, fsoyer wrote: Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). if you want to interact with that action it would always be best to start with engine’s REST API rather than the internal host-side API I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? that’s a Host class API, vdsm-client Host getAllTasksStatuses Does someone know if a complete doc exists for vdsm-client ? the man page mentioned below and the source code. This is not a public API Thanks, Michal Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help.
If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat < [1] https://github.com/oVirt/vdsm/blob/master/lib/api/vdsm-api.yml -- Irit Goihman, Software Engineer, Red Hat Israel Ltd. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
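To get from Host getAllTasksStatuses to a UUID you can hand to Task getInfo or Task stop, the keys of the returned mapping are the task UUIDs. The JSON below is a fabricated sample shaped like the SPM output (the exact schema is an assumption; consult vdsm-api.yml [1]); note also that the schema spells the parameter taskID with a lowercase t, which may be why the TaskID= call above failed:

```shell
# Fabricated sample shaped like "vdsm-client Host getAllTasksStatuses"
# output on the SPM (assumed schema; see vdsm-api.yml).
cat > tasks.json <<'EOF'
{
    "55cbec7f-e7dc-4431-bce9-8ec1d61a7feb": {
        "taskState": "running",
        "code": 0,
        "message": "running job 1 of 1"
    }
}
EOF
# The top-level keys are the task UUIDs:
uuid=$(python3 -c 'import json; print(list(json.load(open("tasks.json")))[0])')
echo "$uuid"   # → 55cbec7f-e7dc-4431-bce9-8ec1d61a7feb
# With a UUID in hand, on the SPM you would try (note the lowercase t):
#   vdsm-client Task getInfo taskID="$uuid"
#   vdsm-client Task stop taskID="$uuid"
```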
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat
Re: [ovirt-users] VMs with multiple vdisks don't migrate
Hi Milan, I tried to activate the debug mode, but the restart of libvirt crashed something on the host : it was no longer possible to start any VM on it, and migration to it just never started. So I decided to restart it, and to be sure, I restarted all the hosts. And... now the migration of all VMs, simple or multi-disk, works ?!? So, there was probably something hidden that was reset or repaired by the global restart ! In French, we call that "tomber en marche" ;) So : solved. Thank you, and sorry for the time spent ! -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 On Monday, February 26, 2018 12:59 CET, Milan Zamazal wrote: "fsoyer" writes: > I don't believe that this is related to a host, tests have been done from > victor source to ginger dest and ginger to victor. I don't see problems on storage > (gluster 3.12 native managed by ovirt), when VMs with a single disk from 20 to > 250G migrate without error in some seconds and with no downtime. The host itself may be fine, but libvirt/QEMU running there may expose problems, perhaps just for some VMs. According to your logs something is not behaving as expected on the source host during the faulty migration. > How can I enable this libvirt debug mode ? Set the following options in /etc/libvirt/libvirtd.conf (look for examples in comments there) - log_level=1 - log_outputs="1:file:/var/log/libvirt/libvirtd.log" and restart libvirt. Then /var/log/libvirt/libvirtd.log should contain the log. It will be huge, so I suggest enabling it only for the time of reproducing the problem. > -- > > Regards, > > Frank Soyer > > > > On Friday, February 23, 2018 09:56 CET, Milan Zamazal > wrote: > Maor Lipchuk writes: > >> I encountered a bug (see [1]) which contains the same error mentioned in >> your VDSM logs (see [2]), but I doubt it is related. > > Indeed, it's not related.
> > The error in vdsm_victor.log just means that the info gathering call > tries to access libvirt domain before the incoming migration is > completed. It's ugly but harmless. > >> Milan, maybe you have any advice to troubleshoot the issue? Will the >> libvirt/qemu logs can help? > > It seems there is something wrong on (at least) the source host. There > are no migration progress messages in the vdsm_ginger.log and there are > warnings about stale stat samples. That looks like problems with > calling libvirt – slow and/or stuck calls, maybe due to storage > problems. The possibly faulty second disk could cause that. > > libvirt debug logs could tell us whether that is indeed the problem and > whether it is caused by storage or something else. > >> I would suggest to open a bug on that issue so we can track it more >> properly. >> >> Regards, >> Maor >> >> >> [1] >> https://bugzilla.redhat.com/show_bug.cgi?id=1486543 - Migration leads to >> VM running on 2 Hosts >> >> [2] >> 2018-02-16 09:43:35,236+0100 ERROR (jsonrpc/7) [jsonrpc.JsonRpcServer] >> Internal server error (__init__:577) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 572, >> in _handle_request >> res = method(**params) >> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in >> _dynamicMethod >> result = fn(*methodArgs) >> File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies >> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >> File "/usr/share/vdsm/clientIF.py", line 454, in getAllVmIoTunePolicies >> 'current_values': v.getIoTune()} >> File "/usr/share/vdsm/virt/vm.py", line 2859, in getIoTune >> result = self.getIoTuneResponse() >> File "/usr/share/vdsm/virt/vm.py", line 2878, in getIoTuneResponse >> res = self._dom.blockIoTune( >> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, >> in __getattr__ >> % self.vmid) >> NotConnectedError: VM 
u'755cf168-de65-42ed-b22f-efe9136f7594' was not >> started yet or was shut down >> >> On Thu, Feb 22, 2018 at 4:22 PM, fsoyer wrote: >> >>> Hi, >>> Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger >>> (192.168.0.6) migrated (or failed to migrate...) to victor (192.168.0.5), >>> while the engine.log in the first mail on 2018-02-12 was for VMs standing >>> on victor, migrated (or failed to migrate...) to ginger. Symptoms were >>> exactly the same, in both directions, and VMs works like a charm before, >>> and even after (migration "killed
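Milan's libvirtd.conf settings above can be applied as below; this sketch works on a scratch copy so it is safe to run anywhere, while on a real host you would edit /etc/libvirt/libvirtd.conf itself and then restart libvirtd:

```shell
# Scratch copy; on a real host this would be /etc/libvirt/libvirtd.conf.
conf=libvirtd.conf.test
: > "$conf"
# Append the two debug-logging options from the advice above.
cat >> "$conf" <<'EOF'
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
EOF
grep -q '^log_level = 1$' "$conf" && echo "debug logging configured"
# Then: systemctl restart libvirtd
# Level-1 logs grow very fast; revert once the migration has been reproduced.
```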
Re: [ovirt-users] VMs with multiple vdisks don't migrate
Hi, I don't believe that this is related to a host, tests have been done from victor source to ginger dest and ginger to victor. I don't see problems on storage (gluster 3.12 native managed by ovirt), when VMs with a single disk from 20 to 250G migrate without error in some seconds and with no downtime. How can I enable this libvirt debug mode ? -- Regards, Frank Soyer On Friday, February 23, 2018 09:56 CET, Milan Zamazal wrote: Maor Lipchuk writes: > I encountered a bug (see [1]) which contains the same error mentioned in > your VDSM logs (see [2]), but I doubt it is related. Indeed, it's not related. The error in vdsm_victor.log just means that the info gathering call tries to access the libvirt domain before the incoming migration is completed. It's ugly but harmless. > Milan, maybe you have any advice to troubleshoot the issue? Will the > libvirt/qemu logs help? It seems there is something wrong on (at least) the source host. There are no migration progress messages in the vdsm_ginger.log and there are warnings about stale stat samples. That looks like problems with calling libvirt – slow and/or stuck calls, maybe due to storage problems. The possibly faulty second disk could cause that. libvirt debug logs could tell us whether that is indeed the problem and whether it is caused by storage or something else. > I would suggest to open a bug on that issue so we can track it more > properly.
> > Regards, > Maor > > > [1] > https://bugzilla.redhat.com/show_bug.cgi?id=1486543 - Migration leads to > VM running on 2 Hosts > > [2] > 2018-02-16 09:43:35,236+0100 ERROR (jsonrpc/7) [jsonrpc.JsonRpcServer] > Internal server error (__init__:577) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 572, > in _handle_request > res = method(**params) > File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in > _dynamicMethod > result = fn(*methodArgs) > File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies > io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() > File "/usr/share/vdsm/clientIF.py", line 454, in getAllVmIoTunePolicies > 'current_values': v.getIoTune()} > File "/usr/share/vdsm/virt/vm.py", line 2859, in getIoTune > result = self.getIoTuneResponse() > File "/usr/share/vdsm/virt/vm.py", line 2878, in getIoTuneResponse > res = self._dom.blockIoTune( > File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, > in __getattr__ > % self.vmid) > NotConnectedError: VM u'755cf168-de65-42ed-b22f-efe9136f7594' was not > started yet or was shut down > > On Thu, Feb 22, 2018 at 4:22 PM, fsoyer wrote: > >> Hi, >> Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger >> (192.168.0.6) migrated (or failed to migrate...) to victor (192.168.0.5), >> while the engine.log in the first mail on 2018-02-12 was for VMs standing >> on victor, migrated (or failed to migrate...) to ginger. Symptoms were >> exactly the same, in both directions, and VMs works like a charm before, >> and even after (migration "killed" by a poweroff of VMs). >> Am I the only one experimenting this problem ? >> >> >> Thanks >> -- >> >> Cordialement, >> >> *Frank Soyer * >> >> >> >> Le Jeudi, Février 22, 2018 00:45 CET, Maor Lipchuk >> a écrit: >> >> >> Hi Frank, >> >> Sorry about the delay repond. 
>> I've been going through the logs you attached, although I could not find >> any specific indication why the migration failed because of the disk you >> were mentionning. >> Does this VM run with both disks on the target host without migration? >> >> Regards, >> Maor >> >> >> On Fri, Feb 16, 2018 at 11:03 AM, fsoyer wrote: >>> >>> Hi Maor, >>> sorry for the double post, I've change the email adress of my account and >>> supposed that I'd need to re-post it. >>> And thank you for your time. Here are the logs. I added a vdisk to an >>> existing VM : it no more migrates, needing to poweroff it after minutes. >>> Then simply deleting the second disk makes migrate it in exactly 9s without >>> problem ! >>> https://gist.github.com/fgth/4707446331d201eef574ac31b6e89561 >>> https://gist.github.com/fgth/f8de9c22664aee53722af676bff8719d >>> >>> -- >>> >>> Cordialement, >>> >>> *Frank Soyer * >>> Le Mercredi, Février 14, 2018 11:04 CET, Maor Lipchuk < >>> mlipc...@redhat.com> a écrit: >>
Re: [ovirt-users] ?==?utf-8?q? VMs with multiple vdisks don't migrate
Hi, Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger (192.168.0.6) migrated (or failing to migrate...) to victor (192.168.0.5), while the engine.log in the first mail on 2018-02-12 was for VMs standing on victor, migrated (or failing to migrate...) to ginger. Symptoms were exactly the same, in both directions, and the VMs work like a charm before, and even after (migration "killed" by a poweroff of the VMs). Am I the only one experiencing this problem ? Thanks -- Regards, Frank Soyer On Thursday, February 22, 2018 00:45 CET, Maor Lipchuk wrote: Hi Frank, Sorry about the delayed response. I've been going through the logs you attached, although I could not find any specific indication why the migration failed because of the disk you were mentioning. Does this VM run with both disks on the target host without migration? Regards, Maor On Fri, Feb 16, 2018 at 11:03 AM, fsoyer wrote: Hi Maor, sorry for the double post, I've changed the email address of my account and supposed that I'd need to re-post it. And thank you for your time. Here are the logs. I added a vdisk to an existing VM : it no longer migrates, needing a poweroff after minutes. Then simply deleting the second disk makes it migrate in exactly 9s without problem ! https://gist.github.com/fgth/4707446331d201eef574ac31b6e89561 https://gist.github.com/fgth/f8de9c22664aee53722af676bff8719d -- Regards, Frank Soyer On Wednesday, February 14, 2018 11:04 CET, Maor Lipchuk wrote: Hi Frank, I already replied to your last email. Can you provide the VDSM logs from the time of the migration failure for both hosts: ginger.local.systea.fr and victor.local.systea.fr Thanks, Maor On Wed, Feb 14, 2018 at 11:23 AM, fsoyer wrote: Hi all, I discovered yesterday a problem when migrating VMs with more than one vdisk. On our test servers (oVirt 4.1, shared storage with Gluster), I created 2 VMs needed for a test, from a template with a 20G vdisk.
On these VMs I added a 100G vdisk (for these tests I didn't want to waste time extending the existing vdisks... but I lost time in the end...). The VMs with the 2 vdisks worked well. Then I saw some updates waiting on the host and tried to put it into maintenance... but it stalled on the two VMs: they were marked "migrating" but no longer accessible. Other (small) VMs with only 1 vdisk were migrated without problem at the same time. I saw that a kvm process for the (big) VMs was launched on the source AND the destination host, but after tens of minutes the migration and the VMs were still frozen. I tried to cancel the migration for the VMs: it failed. The only way to stop it was to power off the VMs: the kvm process died on both hosts and the GUI reported a failed migration. Just to see, I deleted the second vdisk on one of these VMs: it then migrated without error, and with no access problem. I extended the first vdisk of the second VM, then deleted its second vdisk: it now migrates without problem! So after another test with a VM with 2 vdisks, I can say that this is what blocked the migration process :(

In engine.log, for a VM with 1 vdisk migrating well, we see:

2018-02-12 16:46:29,705+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-28) [2f712024-5982-46a8-82c8-fd8293da5725] Lock Acquired to object 'EngineLock:{exclusiveLocks='[3f57e669-5e4c-4d10-85cc-d573004a099d=VM]', sharedLocks=''}'
2018-02-12 16:46:29,955+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-32) [2f712024-5982-46a8-82c8-fd8293da5725] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 3f57e669-5e4c-4d10-85cc-d573004a099d Type: VM
Action group MIGRATE_VM with role type USER
2018-02-12 16:46:30,261+01 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-32) [2f712024-5982-46a8-82c8-fd8293da5725] START, MigrateVDSCommand( MigrateVDSCommandParameters:{runAsync='true', hostId='ce3938b1-b23f-4d22-840a-f17d7cd87bb1', vmId='3f57e669-5e4c-4d10-85cc-d573004a099d', srcHost='192.168.0.6', dstVdsId='d569c2dd-8f30-4878-8aea-858db285cf69', dstHost='192.168.0.5:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='500', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500
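The convergenceSchedule embedded in the MigrateVDSCommand parameters above describes how the engine reacts to a stalling migration: it starts with a low allowed downtime and raises it step by step before giving up. A minimal sketch transcribing that schedule as plain data (field names are copied from the log line, not from oVirt source code; the `downtime_steps` helper is hypothetical, for illustration only; the values are, as far as I know, milliseconds, matching libvirt's max-downtime setting):

```python
# Sketch: the migration convergence schedule from the engine.log line above,
# transcribed as plain Python data. Each "stalling" entry fires after the
# migration has stalled `limit` times; limit=-1 means "finally abort".
convergence_schedule = {
    "init": [{"name": "setDowntime", "params": [100]}],
    "stalling": [
        {"limit": 1, "action": {"name": "setDowntime", "params": [150]}},
        {"limit": 2, "action": {"name": "setDowntime", "params": [200]}},
        {"limit": 3, "action": {"name": "setDowntime", "params": [300]}},
        {"limit": 4, "action": {"name": "setDowntime", "params": [400]}},
        {"limit": 6, "action": {"name": "setDowntime", "params": [500]}},
        {"limit": -1, "action": {"name": "abort", "params": []}},
    ],
}

def downtime_steps(schedule):
    """Return the successive allowed-downtime values, in order."""
    steps = [p for action in schedule["init"]
             if action["name"] == "setDowntime"
             for p in action["params"]]
    for entry in schedule["stalling"]:
        action = entry["action"]
        if action["name"] == "setDowntime":
            steps.extend(action["params"])
    return steps

print(downtime_steps(convergence_schedule))  # → [100, 150, 200, 300, 400, 500]
```

Read this way, the log says the engine would let downtime ramp from 100 up to 500 before reaching the abort action, which may be why a migration that never makes progress (as with the two-vdisk VMs) stays "migrating" for a long time before anything visibly fails.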
Re: [ovirt-users] VMs with multiple vdisks don't migrate