[ovirt-users] Re: engine mails about FSM states
In broker.log I found this, just before 05:59 am:

Thread-3::INFO::2018-12-13 05:58:45,634::mem_free::51::mem_free.MemFree::(action) memFree: 82101
Thread-1::INFO::2018-12-13 05:58:46,322::ping::60::ping.Ping::(action) Successfully pinged 10.0.1.254
Thread-5::INFO::2018-12-13 05:58:46,611::engine_health::241::engine_health.EngineHealth::(_result_from_stats) VM is up on this host with healthy engine
Thread-2::INFO::2018-12-13 05:58:49,144::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports
StatusStorageThread::ERROR::2018-12-13 05:58:54,935::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
    if (self._status_broker._inquire_whiteboard_lock() or
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 190, in _inquire_whiteboard_lock
    self.host_id, self._lease_file)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 128, in host_id
    raise ex.HostIdNotLockedError("Host id is not set")
HostIdNotLockedError: Host id is not set
StatusStorageThread::ERROR::2018-12-13 05:58:54,937::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker

"Host id is not set"???
--
Regards,
Frank

On Friday, December 14, 2018 12:27 CET, Martin Sivak wrote:

Hi,

check the broker.log as well. The connect is used to talk to the
ovirt-ha-broker service socket.

Best regards

Martin Sivak

On Fri, Dec 14, 2018 at 12:20 PM fsoyer wrote:
> I think I have it in agent.log. What can this "file not found" be?
MainThread::ERROR::2018-12-13 05:59:03,909::hosted_engine::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unhandled monitoring loop exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 428, in start_monitoring
    self._monitoring_loop()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 447, in _monitoring_loop
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 736, in collect_stats
    all_stats = self._broker.get_stats_from_storage()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 135, in get_stats_from_storage
    result = self._proxy.get_stats()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 843, in send
    self.connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
    self.sock.connect(base64.b16decode(self.host))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
MainThread::ERROR::2018-12-13 05:59:04,043::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 435, in start_monitoring
    self.publish(stopped)
  File "/usr/lib/python2.7/site-packages/ovirt_h
a.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2018-12-13 05:59:04,044::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2018-12-13 05:59:14,923::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.2.16 started

--
Regards,
Frank Soyer
Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34
Systea IG - Systems, network and database administration - www.systea.net
Member of the network Les Professionnels du Numérique
KoGite - Local hosting - www.kogite.fr

On Friday, December 14, 2018 12:11 CET, Martin Sivak wrote:

Hi,

no, StartState is not common, it is only ever entered when the agent
boots up. So something restarted or killed the agent process. Check
the agent log in /var/log/ovirt-hosted-engine-ha for errors.

Best regards

Martin Sivak

On Fri, Dec 14, 2018 at 12:05 PM fsoyer wrote:
>
> Hi Martin,
> my problem is that nobody restarted the agent. Do you mean that this is not
> normal behavior? Is it possible that it restarts itself?
>
> Thanks
> --
>
> Regards,
>
> Frank
>
> On Thursday, December 13, 2018 15:25 CET, Martin Sivak wrote:
>
> Hi,
>
> those are state change notifications from the hosted engine agent. It
> basically means somebody restarted the ha-agent process and it found
> out the VM is still running fine and returned to the proper state.
>
> Configuring it is possible using the broker.conf file in
> /etc/ovirt-hosted-engine-ha (look for the notification section) or the
> hosted-engine tool (search --help for set config) depending on the
> version of hosted engine you are using.
>
> Best regards
>
> --
> Martin Sivak
>
> On Thu, Dec 13, 2018 at 3:10 PM fsoyer wrote:
> >
> > Hi,
> > I can't find a relevant answer about this. Sorry if this was already asked.
> > I receive randomly (once or twice a week, at different hours) 3 mails with
> > these subjects:
> > first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
> > second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
> > third: ovirt-hosted-engine state transition EngineStarting-EngineUp
> > all at exactly the same time. The "events" in the GUI don't indicate
> > anything about this. No impact on the engine or VMs.
> > So I wonder what these messages mean? And, in case they are just "info"
> > messages, is there a way to disable them?
> >
> > Thanks.
> > --
> >
> > Regards,
> >
> > Frank
> >
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org
> > Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> > oVirt Code of Conduct:
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> > https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXCKOVSGK42ZNTG2KOEIBW65CD4ET6B4/
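The `error: [Errno 2] No such file or directory` in the agent traceback comes from the `unixrpc.py` frame: the hosted-engine agent talks XML-RPC to ovirt-ha-broker over a Unix socket, and the socket path is smuggled through the XML-RPC "host" field as a base16 string. `connect()` then fails with ENOENT whenever the broker's socket file is gone (e.g. while the broker is restarting, as it was here). A minimal Python 3 sketch of that encoding trick, with an illustrative socket path (the real path on an installation may differ):

```python
import base64
import errno
import socket

# Illustrative path only; not guaranteed to match a real deployment.
SOCKET_PATH = "/var/run/ovirt-hosted-engine-ha/broker.socket"

def encode_host(path):
    """Base16-encode a socket path so it can ride in an XML-RPC URL's
    host field, mirroring the trick seen in unixrpc.py."""
    return base64.b16encode(path.encode()).decode()

def connect(host_field):
    """Decode the path and connect. ENOENT here is exactly the
    'error: [Errno 2] No such file or directory' from the traceback:
    the broker's socket file does not exist (broker down/restarting)."""
    path = base64.b16decode(host_field).decode()
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "broker socket missing: %s" % path
        raise
    finally:
        sock.close()

# On a machine without the broker running, this reports the socket
# as missing rather than connecting.
print(connect(encode_host(SOCKET_PATH)))
```

So the agent error is a symptom, not the root cause: the broker had already hit `HostIdNotLockedError` and was restarting, taking its socket away from under the agent.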
[ovirt-users] Re: engine mails about FSM states
Hi Martin,
my problem is that nobody restarted the agent. Do you mean that this is not normal behavior? Is it possible that it restarts itself?

Thanks
--
Regards,
Frank

On Thursday, December 13, 2018 15:25 CET, Martin Sivak wrote:

Hi,

those are state change notifications from the hosted engine agent. It
basically means somebody restarted the ha-agent process and it found
out the VM is still running fine and returned to the proper state.

Configuring it is possible using the broker.conf file in
/etc/ovirt-hosted-engine-ha (look for the notification section) or the
hosted-engine tool (search --help for set config) depending on the
version of hosted engine you are using.

Best regards

--
Martin Sivak

On Thu, Dec 13, 2018 at 3:10 PM fsoyer wrote:
>
> Hi,
> I can't find a relevant answer about this. Sorry if this was already asked.
> I receive randomly (once or twice a week, at different hours) 3 mails with
> these subjects:
> first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
> second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
> third: ovirt-hosted-engine state transition EngineStarting-EngineUp
> all at exactly the same time. The "events" in the GUI don't indicate
> anything about this. No impact on the engine or VMs.
> So I wonder what these messages mean? And, in case they are just "info"
> messages, is there a way to disable them?
>
> Thanks.
> --
>
> Regards,
>
> Frank
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JKVPQ2ZTQHH2U4C6JJN6ZMBYHBGK2P5E/
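To make Martin's pointer to the broker.conf "notification section" concrete, here is a hedged Python sketch of how such a mail filter plausibly works: a `state_transition` regexp is matched against the `OldState-NewState` name, and only matching transitions are mailed. The key names in the sample (`smtp-server`, `source-email`, `destination-emails`, `state_transition`) and the case-insensitive matching are assumptions from memory of 4.2-era defaults, not verified facts; check your own /etc/ovirt-hosted-engine-ha/broker.conf before editing anything.

```python
import re
from configparser import ConfigParser

# Assumed 4.2-era [notifications] layout -- verify against your own
# /etc/ovirt-hosted-engine-ha/broker.conf, key names may differ.
SAMPLE = """
[notifications]
smtp-server = localhost
smtp-port = 25
source-email = root@localhost
destination-emails = root@localhost
state_transition = maintenance|start|stop|migrate|up|down
"""

def wants_mail(conf, transition):
    """True if the state_transition regexp matches an
    'OldState-NewState' string. Matching case-insensitively is an
    assumption about the broker's behavior, not a verified fact."""
    pattern = conf.get("notifications", "state_transition")
    return re.search(pattern, transition, re.IGNORECASE) is not None

conf = ConfigParser()
conf.read_string(SAMPLE)

# With the default-looking filter, an agent restart mails all three
# hops Frank describes, because 'start' and 'up' match the names:
transitions = ("StartState-ReinitializeFSM",
               "ReinitializeFSM-EngineStarting",
               "EngineStarting-EngineUp")
matched = [t for t in transitions if wants_mail(conf, t)]
# A pattern that matches no state name (e.g. state_transition = none)
# would silence these informational mails.
```

On a real deployment, Martin's other suggestion (`hosted-engine` with its set-shared-config facility, see `hosted-engine --help`) is the supported way to change these values; treat the sketch above as an explanation of the filter, not a recipe for editing files by hand.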
[ovirt-users] engine mails about FSM states
Hi,
I can't find a relevant answer about this. Sorry if this was already asked.
I receive randomly (once or twice a week, at different hours) 3 mails with these subjects:
first: ovirt-hosted-engine state transition StartState-ReinitializeFSM
second: ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting
third: ovirt-hosted-engine state transition EngineStarting-EngineUp
all at exactly the same time. The "events" in the GUI don't indicate anything about this. No impact on the engine or VMs.
So I wonder what these messages mean? And, in case they are just "info" messages, is there a way to disable them?

Thanks.
--
Regards,
Frank

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/CVEHTWILWDEHASTCQHFHX62U4K4ZCOSK/
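The three mails, always together and in that order, are the signature of one agent restart walking the hosted-engine state machine back to normal. A toy Python sketch (not the real ovirt-hosted-engine-ha code; event names are invented for illustration) of why a single restart fires exactly these three notifications:

```python
# Toy FSM: the agent always boots in StartState, re-initializes, then
# walks back to EngineUp once it sees the engine VM is already healthy.
# Each state change fires one notification -- hence three mails.
TRANSITIONS = {
    ("StartState", "vm_unknown"): "ReinitializeFSM",
    ("ReinitializeFSM", "vm_up"): "EngineStarting",
    ("EngineStarting", "engine_healthy"): "EngineUp",
}

def run(events, notify):
    state = "StartState"
    for event in events:
        new_state = TRANSITIONS.get((state, event), state)
        if new_state != state:
            notify("ovirt-hosted-engine state transition %s-%s"
                   % (state, new_state))
            state = new_state
    return state

mails = []
final = run(["vm_unknown", "vm_up", "engine_healthy"], mails.append)
# mails now holds the three transition notifications, back to back,
# matching the three messages received "all at exactly the same time".
```

The useful diagnostic consequence: seeing `StartState` at all means the agent process (re)started, which is why the thread then turns to finding out what killed it.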
[ovirt-users] Re: VM ramdomly unresponsive
Hi,
I can say now that the problem was related to storage performance, as there have been no more errors since the replacement of the RAID cards.
Thanks for all,
Frank

On Tuesday, November 27, 2018 08:30 CET, Sahina Bose wrote:

On Tue, Nov 13, 2018 at 4:46 PM fsoyer wrote:
>
> Hi all,
> I am still trying to understand my problem between (I suppose) oVirt and
> Gluster. After my recent posts titled 'VMs unexpectidly restarted', which
> provided neither a solution nor a lead, I submit another (related?)
> problem.
> In parallel with the problem of VMs going down (which has not reproduced
> since Oct 16), I randomly get events in the GUI saying "VM x is not
> responding." For example, VM "patjoub1" on 2018-11-11 14:34. Never at the
> same hour, not every day, often this VM patjoub1 but not always: I had it
> on two others. All VM disks are on a volume DATA02 (with leases on the
> same volume).
>
> Searching in engine.log, I found:
>
> 2018-11-11 14:34:32,953+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'Up' --> 'NotResponding'
> 2018-11-11 14:34:33,116+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Invalid or unknown guest architecture type '' received from guest agent
> 2018-11-11 14:34:33,176+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: VM_NOT_RESPONDING(126), VM patjoub1 is not responding.
> ...
> 2018-11-11 14:34:48,278+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-48) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'NotResponding' --> 'Up'
>
> So it comes back up 15 s later, and the VM (and the monitoring) sees no
> downtime.
> At that time, I see in the vdsm.log of the nodes:
>
> 2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
>     delay = result.delay()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
> MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
> 2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)
>
> 2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)
>
> 2018-11-11 14:34:09,480+0100 INFO (event/37) [storage.StoragePool] Linking /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937 to /rhev/data-center/6efda7f8-b62f-11e8-9d16-00163e263d21/ffc53fd8-c5d1-4070-ae51-2e91835cd937 (sp:1230)
>
> OK: so DATA02 is marked as blocked for 20 s? Then I definitely have a
> problem with gluster? I'll inevitably find the reason in the gluster logs?
> Uh: not at all.
> Please see the gluster logs here:
> https://seafile.systea.fr/d/65df86cca9d34061a1e4/
>
> Unfortunately I discovered this morning that I don't have the sanlock.log
> for that date. I don't understand why; logrotate seems OK with "rotate 3",
> but I have no backup files :(.
> But, luck in bad luck, the same event occurred this morning!
Same VM patjoub1,
> 2018-11-13 08:01:37. So I have added today's sanlock.log; maybe it can
> help.
>
> IMPORTANT NOTE: don't forget that Gluster logs with a one-hour shift. For
> this event at 14:34, search at 13:34 in the gluster logs.
> I recall my configuration:
> Gluster 3.12.13
> oVirt 4.2.3
> 3 nodes where the third is arbiter (volumes in replica 2)
>
> The nodes are never overloaded (CPU average 5%, no peak detected at the
> time of the event; 128 GB of memory used at 15% (only 10 VMs on this
> cluster)). Network underused; gluster is on a separa
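The vdsm excerpt quoted above shows the mechanism behind the "not responding" blips: vdsm periodically reads the domain's dom_md/metadata file and, when the read takes too long, declares the checker blocked and the domain INVALID. A conceptual Python sketch of that idea (not vdsm's actual implementation; the timeout value is illustrative):

```python
import time

# Illustrative timeout; vdsm's real check interval/timeout differs
# and produced the "blocked for 20.00 seconds" warning above.
TIMEOUT = 10.0

def check_path(path, read_func, timeout=TIMEOUT):
    """Read a domain's metadata file and classify the result the way a
    storage monitor conceptually would: a failed read or a read slower
    than the timeout marks the domain INVALID."""
    start = time.monotonic()
    try:
        read_func(path)
    except OSError as e:
        return ("INVALID", "read failure: %s" % e)
    delay = time.monotonic() - start
    if delay > timeout:
        return ("INVALID", "Read timeout after %.2fs" % delay)
    return ("VALID", "delay %.3fs" % delay)

def fast_read(path):
    # Stand-in for reading dom_md/metadata from the gluster mount.
    return b"metadata"

state, detail = check_path(
    "/rhev/data-center/mnt/glusterSD/.../dom_md/metadata", fast_read)
```

This is why the final diagnosis (slow RAID cards without write cache) fits the symptoms: the storage itself stayed correct, it was merely slow enough under load for the metadata read to overshoot the checker's deadline, briefly flipping the domain and the VM state.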
[ovirt-users] Re: VM ramdomly unresponsive
Hi,
questioning the whole chain oVirt -> Gluster -> hardware, I continued to check all the components, finally testing the hardware. I found some latency on storage when it was busy, and some web searches convinced me that the RAID cards could be the problem: the Dell servers shipped with H310 cards, which do not support cache... Last week we ordered H710 cards, which provide cache, installed on Saturday. Since then, storage performance is better and I have noticed no more timeouts or errors. But as it happened randomly, I'll wait a few more days before saying this is solved!
Thank you for your time,
--
Regards,
Frank

On Tuesday, November 27, 2018 08:30 CET, Sahina Bose wrote:

On Tue, Nov 13, 2018 at 4:46 PM fsoyer wrote:
>
> Hi all,
> I am still trying to understand my problem between (I suppose) oVirt and
> Gluster.
[ovirt-users] VM ramdomly unresponsive
Hi all,
I am still trying to understand my problem between (I suppose) oVirt and Gluster. After my recent posts titled 'VMs unexpectidly restarted', which provided neither a solution nor a lead, I submit another (related?) problem.
In parallel with the problem of VMs going down (which has not reproduced since Oct 16), I randomly get events in the GUI saying "VM x is not responding." For example, VM "patjoub1" on 2018-11-11 14:34. Never at the same hour, not every day, often this VM patjoub1 but not always: I had it on two others. All VM disks are on a volume DATA02 (with leases on the same volume).

Searching in engine.log, I found:

2018-11-11 14:34:32,953+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'Up' --> 'NotResponding'
2018-11-11 14:34:33,116+01 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Invalid or unknown guest architecture type '' received from guest agent
2018-11-11 14:34:33,176+01 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-28) [] EVENT_ID: VM_NOT_RESPONDING(126), VM patjoub1 is not responding.
...
2018-11-11 14:34:48,278+01 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-48) [] VM '6116fb07-096b-4c7e-97fe-01ecc9a6bd9b'(patjoub1) moved from 'NotResponding' --> 'Up'

So it comes back up 15 s later, and the VM (and the monitoring) sees no downtime.
At that time, I see in the vdsm.log of the nodes:

2018-11-11 14:33:49,450+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata', 1, 'Read timeout')
2018-11-11 14:33:49,450+0100 INFO (check/loop) [storage.Monitor] Domain ffc53fd8-c5d1-4070-ae51-2e91835cd937 became INVALID (monitor:469)

2018-11-11 14:33:59,451+0100 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/metadata' is blocked for 20.00 seconds (check:282)

2018-11-11 14:34:09,480+0100 INFO (event/37) [storage.StoragePool] Linking /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937 to /rhev/data-center/6efda7f8-b62f-11e8-9d16-00163e263d21/ffc53fd8-c5d1-4070-ae51-2e91835cd937 (sp:1230)

OK: so DATA02 is marked as blocked for 20 s? Then I definitely have a problem with gluster? I'll inevitably find the reason in the gluster logs? Uh: not at all.
Please see the gluster logs here: https://seafile.systea.fr/d/65df86cca9d34061a1e4/

Unfortunately I discovered this morning that I don't have the sanlock.log for that date. I don't understand why; logrotate seems OK with "rotate 3", but I have no backup files :(.
But, luck in bad luck, the same event occurred this morning! Same VM patjoub1, 2018-11-13 08:01:37. So I have added today's sanlock.log; maybe it can help.

IMPORTANT NOTE: don't forget that Gluster logs with a one-hour shift. For this event at 14:34, search at 13:34 in the gluster logs.
I recall my configuration:
Gluster 3.12.13
oVirt 4.2.3
3 nodes where the third is arbiter (volumes in replica 2)

The nodes are never overloaded (CPU average 5%, no peak detected at the time of the event; 128 GB of memory used at 15% (only 10 VMs on this cluster)). Network underused; gluster is on a separate network on a bond (2 NICs) of 1+1 Gb in mode 4 = 2 Gb, used at 10% at peak.
Here is the configuration for the given volume:

# gluster volume status DATA02
Status of volume: DATA02
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick victorstorage.local.systea.fr:/home/d
ata02/data02/brick                          49158     0          Y       4990
Brick gingerstorage.local.systea.fr:/home/d
ata02/data02/brick                          49153     0          Y       8460
Brick eskarinastorage.local.systea.fr:/home
/data01/data02/brick                        49158     0          Y       2470
Self-heal Daemon on localhost               N/A       N/A        Y       8771
Self-heal Daemon on eskarinastorage.local.s
ystea.fr                                    N/A
[ovirt-users] Re: VM paused then killed with "device vda reported I/O error"
Hi,
unfortunately, at the time of this error there were no messages. In fact, this file (rhev-data-center-mnt-glusterSD-victor.local.systea.fr:_DATA01.log) contains only:

[2018-10-21 01:41:11.646681] I [glusterfsd-mgmt.c:1894:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2018-10-28 02:08:10.081010] I [MSGID: 100011] [glusterfsd.c:1446:reincarnate] 0-glusterfsd: Fetching the volume file from server...

As I said in my other post "VMs unexpectidly restarted", which is maybe related to this one (not exactly the same things, but close enough... please see those posts as they may give some additional information, especially the messages since Sunday 28 at 12:42, after I tried an export/import of a VM on Saturday 27), I see no relevant error in the gluster logs at the time the problems occur on the VMs. Though, looking at these files, I found some messages in the mount log of the second volume, DATA02, concerned by the second problem on Oct 27. I'll add these messages to the other post "VMs unexpectidly restarted" to avoid confusion. I probably should not have opened this new post after that one, but at the time I couldn't be sure this was a single problem.
Thank you for your time,
--
Regards,
Frank

On Wednesday, October 31, 2018 08:15 CET, Sahina Bose wrote:

2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') abnormal vm stop device ua-134c4848-6897-46fc-b346-dd4a180ac653 error eio (vm:5158)
2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') CPU stopped: onIOError (vm:6199)
2018-10-25 01:21:08,030+0200 INFO (libvirt/events) [virt.vm] (vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') CPU stopped: onSuspend (vm:6199)

This most likely indicates an I/O error from the storage layer.
Can you also provide the mount logs for the gluster volume that hosts these VMs' disks (under /var/log/glusterfs/rhev-data-center-mnt-glusterSD-.log)?

On Fri, Oct 26, 2018 at 12:38 AM fsoyer wrote:
>
> Oops, reading my message I found an error: the problem occurred at 1:21 AM,
> not 1:01 :/
>
> Frank
>
> On Thursday, October 25, 2018 17:55 CEST, "fsoyer" wrote:
>
> Hi,
> related (or maybe not) to my problem "VMs unexpectidly restarted", I have
> one VM (only one) which was paused then killed this morning (1:01 AM).
> This is the second time (the first was about 15 days ago), and only this
> one (it is on a domain with 5 other VMs, and it is not the most used of
> them; and it was at night, without any particular processing at that time).
> The other VMs on the same storage were not impacted at all. And it is not
> on the same storage domain as the other VM of "VMs unexpectidly
> restarted"...
> At the same time, gluster seems to have seen absolutely nothing. Is there
> really a storage issue??
>
> Here are some relevant logs:
> /var/log/messages of the node
> vdsm.log
> engine.log
> glusterd.log
> data01-brick.log
>
> For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13
> (2 + arbiter).
> Any idea where or what I should search for?
>
> Thanks
> --
>
> Regards,
>
> Frank
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BI45NBQTKNHLOOS3TO2TAT53LREC4EF/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/J4LWH5J7PUIFOO65YIBSC2BDFT2GD3VY/
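The "abnormal vm stop ... error eio" lines Sahina quotes are the key evidence: the guest CPUs are paused because a disk operation returned EIO, which points at the storage layer rather than the guest. When chasing which device and VM are affected across many such events, the log lines can be parsed mechanically. A small Python sketch over the exact line format quoted above:

```python
import re

# One of the vdsm lines quoted in the message above.
LINE = ("2018-10-25 01:21:07,944+0200 INFO (libvirt/events) [virt.vm] "
        "(vmId='14fb9d79-c603-4691-b19e-9133c6bd5e22') abnormal vm stop "
        "device ua-134c4848-6897-46fc-b346-dd4a180ac653 error eio (vm:5158)")

PATTERN = re.compile(
    r"vmId='(?P<vm>[0-9a-f-]+)'\) abnormal vm stop "
    r"device (?P<dev>\S+) error (?P<err>\w+)")

def parse_abnormal_stop(line):
    """Extract vm id, device alias and error reason from an
    'abnormal vm stop' event line; None if the line is unrelated."""
    m = PATTERN.search(line)
    return m.groupdict() if m else None

info = parse_abnormal_stop(LINE)
# info["err"] == "eio": the pause is the hypervisor's reaction to an
# I/O error on that device, consistent with Sahina's diagnosis that
# the storage layer (here, the gluster mount) returned the error.
```

Correlating the extracted device alias and timestamp with the gluster mount log for the same volume is exactly the cross-check Sahina asks Frank for.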
[ovirt-users] Re: ETL service and winter hour
Thank you very much for all this detailed information, Shirly.
The only point is that we (should I say "I"?) never really asked for DWH when installing oVirt with hosted engine; maybe it's a lack of documentation reading on my part, but I never heard about it in the installation procedure. From which I deduce that it is installed with the oVirt engine, and since it is a Postgres functionality, and the Postgres database is automatically created in the engine VM, it is installed in the engine VM (Q.E.D. :) ).
The warning you point me to is maybe not visible enough in the installation procedures for those who live in a country subject to summer/winter time... But even apart from that, I don't remember a moment during the hosted-engine installation where it asks us for a timezone, you see? So I wonder where or when I could have forced UTC time, in fact... Can you tell me (and maybe for others installing it in France or similar countries) where this DWH-and-timezone question can be handled in the hosted-engine installation process?
This said, if I understand your answers below correctly, can I summarize it as: don't touch anything now, as the error repairs itself after this "1 hour gap / overlap" (so I have no more messages after 3:00 AM). Right?
Many thanks again,
--
Regards,
Frank

On Sunday, October 28, 2018 13:26 CET, Shirly Radco wrote:

Please see answers below and let me know if you have any other questions.

Best,
--
SHIRLY RADCO
BI SENIOR SOFTWARE ENGINEER
Red Hat Israel

On Sun, Oct 28, 2018 at 12:55 PM fsoyer wrote:

> Well, I see that I'm late to give the information :) Thank you for pointing
> me to this, but I have some other questions now...
> How can I see the timezone of the DB?

"If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone."
https://serverfault.com/questions/554359/postgresql-timezone-does-not-match-system-timezone

> When it says "all machines", do you confirm that this means physical
> machines, not VMs? I mean the machine DWH is installed on.

It can be a VM. But I'm not saying we recommend all VMs to be set to UTC.

> May I apply the solution given on access.redhat.com or not, as there have
> been no more messages since 3 AM?

No need.

> And, last question but not least, can this timezone be changed on the
> machines (and DB?) without issue?

It is possible to update it, but it's not mandatory. The 1 hour gap / overlap is expected when moving from summer to winter time and back when not using UTC, and I'm not sure it's even worth updating at this point, at the risk of ending up with a real bug.

--
Regards,
Frank

On Sunday, October 28, 2018 11:40 CET, Shirly Radco wrote:

Hi,

Please see here:
https://www.ovirt.org/documentation/data-warehouse/Data_Collection_Setup_and_Reports_Installation_Overview/
"It is recommended that you set the system time zone for all machines in your Data Warehouse deployment to UTC. This ensures that data collection is not interrupted by variations in your local time zone: for example, a change from summer time to winter time."

What timezone is your DB configured to?

Best,
--
SHIRLY RADCO
BI SENIOR SOFTWARE ENGINEER
Red Hat Israel

On Sun, Oct 28, 2018 at 12:32 PM fsoyer wrote:

> Hi all,
> Maybe it has already been posted, but I think I've discovered a little bug.
> This night I had these messages:
>
> 28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
> 28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
>
> And, coincidence, here in France we changed to winter time at... 2 AM :)
> So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem
[ovirt-users] Re: VMs unexpectidly restarted
Self-heal Daemon on eskarinastorage.local.systea.fr N/A N/A Y 30725
Self-heal Daemon on victorstorage.local.systea.fr N/A N/A Y 2810
Task Status of Volume ISO ---

But a df on the nodes shows that all volumes except ENGINE are mounted over the ovirmgmt network (host names without "storage"):

gingerstorage.local.systea.fr:/ENGINE 5,0T 226G 4,7T 5% /rhev/data-center/mnt/glusterSD/gingerstorage.local.systea.fr:_ENGINE
victor.local.systea.fr:/DATA01 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA01
victor.local.systea.fr:/DATA02 5,0T 226G 4,7T 5% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02
victor.local.systea.fr:/ISO 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_ISO
victor.local.systea.fr:/EXPORT 1,3T 425G 862G 33% /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_EXPORT

I can't remember how it was declared at install time, and maybe I simply had not seen that, but if I try to add a domain now, Gluster-managed, it indeed proposes to me only the nodes by their ovirmgmt names, not the storage names. The names are only known in the /etc/hosts of all nodes + engine; there is no DNS for these local addresses. So: in your opinion, can this configuration be a (the) source of my problems? And do you have an idea how I could correct this now, without losing anything? Thanks for all suggestions. -- Regards, Frank

On Thursday, October 18, 2018 23:13 CEST, Nir Soffer wrote: On Thu, Oct 18, 2018 at 3:43 PM fsoyer wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem?
The problem started when sanlock could not renew storage leases held by some processes:

Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/ids
Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 renewal error -202 delta_length 25 last_success 2945539

After 80 seconds, the VMs are terminated by sanlock:

Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 check_our_lease failed 80
Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 kill 13823 sig 15 count 1

But process 13823 cannot be killed, since it is blocked on storage, so sanlock sends many more TERM signals:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: s1 kill 13823 sig 15 count 17

The VM finally dies after 17 retries:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: dead 13823 ci 10 count 17

We can see the same flow for other processes (HA VMs?). This allows the system to start the HA VM on another host, which is what we see in the events log in the first message.

Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.

If the VMs were not started successfully on the other hosts, maybe the storage domain used for the VM lease is not accessible? It is recommended to choose the same storage domain used by the other VM disks for the VM lease. Also check that all storage domains are accessible - if they are not, you will have warnings in /var/log/vdsm/vdsm.log. Nir

-- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to rest
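To make Nir's timeline concrete: in the sanlock lines quoted above, the bare integer after the clock time (e.g. 2945585) appears to be sanlock's internal seconds counter, and `last_success` is that same counter at the last successful lease renewal. A small illustrative snippet (not from the original thread) extracts those fields and computes how stale the lease already was at the first error:

```python
import re

# One of the sanlock lines quoted above (s3 = the DATA02 lockspace).
line = ("2018-10-16 11:01:46 2945585 [4167]: "
        "s3 renewal error -202 delta_length 25 last_success 2945539")

# Pull out the current counter, the lockspace name, and last_success.
m = re.search(r"(\d+) \[\d+\]: (s\d+) renewal error .* last_success (\d+)", line)
now, lockspace, last_success = int(m.group(1)), m.group(2), int(m.group(3))

age = now - last_success  # seconds since the last successful renewal
print(lockspace, age)     # s3 46
```

Once a lockspace goes about 80 seconds without a renewal (the `check_our_lease failed 80` line), sanlock starts sending SIGTERM to the processes holding leases there, which matches the kill/dead lines above.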
[ovirt-users] Re: ETL service and winter hour
Well, I see that I'm late to give the information :) Thank you for pointing me to this, but I have some other questions now... How can I see the timezone of the DB? When it says "all machines", do you confirm that this means physical machines, not VMs? May I apply the solution given on access.redhat.com or not, given that there are no more messages since 3 AM? And, last question but not least, can this timezone be changed on the machines (and DB?) without issue? -- Regards, Frank

On Sunday, October 28, 2018 11:40 CET, Shirly Radco wrote: Hi, Please see here https://www.ovirt.org/documentation/data-warehouse/Data_Collection_Setup_and_Reports_Installation_Overview/ "It is recommended that you set the system time zone for all machines in your Data Warehouse deployment to UTC. This ensures that data collection is not interrupted by variations in your local time zone: for example, a change from summer time to winter time." What timezone is your DB configured to? Best, -- SHIRLY RADCO, BI SENIOR SOFTWARE ENGINEER, Red Hat Israel. TRIED. TESTED. TRUSTED.

On Sun, Oct 28, 2018 at 12:32 PM fsoyer wrote: Hi all, Maybe it has already been posted, but I think I've discovered a little bug. Last night I got these messages:

28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

And, coincidence, here in France we changed to winter time at... 2 AM :) So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem, I supposed that this is related! No? access.redhat.com says that the cause was not yet determined, but maybe it can be interesting to propose this cause? But the bug is actually closed. Question: does this repair itself (as there are no more messages after 3 AM), or should I apply the solution with the postgres updates (I must say I'm not very enthusiastic about that...)? Regards, -- Frank

___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GNGAQKTA5PDJUYMTFJABDLZA27FHUY7O/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/D46VJJ5LHQP4JTLMWL6L2QBMPL2SCU3W/
[ovirt-users] ETL service and winter hour
Hi all, Maybe it has already been posted, but I think I've discovered a little bug. Last night I got these messages:

28 oct. 2018 03:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:40:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:33:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:27:42 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:22:27 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:16:37 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:11:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:05:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:06 ETL service sampling has encountered an error. Please consult the service log for more details.
28 oct. 2018 02:00:00 ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

And, coincidence, here in France we changed to winter time at... 2 AM :) So regarding this post: https://access.redhat.com/solutions/3338001 speaking about a time problem, I supposed that this is related! No? access.redhat.com says that the cause was not yet determined, but maybe it can be interesting to propose this cause? But the bug is actually closed. Question: does this repair itself (as there are no more messages after 3 AM), or should I apply the solution with the postgres updates (I must say I'm not very enthusiastic about that...)?
Regards, -- Frank List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GNGAQKTA5PDJUYMTFJABDLZA27FHUY7O/
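The clustering of the ETL errors between 02:00 and 03:00 matches the summer-to-winter change in France on 2018-10-28, when local clocks repeated the 02:00-03:00 hour. As an illustration (not part of the original thread), Python's zoneinfo shows that local times in that window are ambiguous, which is the "1 hour gap / overlap" discussed above:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs system tzdata

paris = ZoneInfo("Europe/Paris")

# 02:30 local time on 2018-10-28 happened twice in Europe/Paris:
# once in CEST (UTC+2, fold=0) and once in CET (UTC+1, fold=1).
ambiguous = datetime(2018, 10, 28, 2, 30, tzinfo=paris)
first_offset = ambiguous.replace(fold=0).utcoffset()   # CEST occurrence
second_offset = ambiguous.replace(fold=1).utcoffset()  # CET occurrence

print(first_offset, second_offset)  # 2:00:00 1:00:00
```

A database storing local timestamps therefore sees the 02:00-03:00 range twice that night; running the DWH machines and DB in UTC, as the documentation recommends, avoids the ambiguity entirely.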
[ovirt-users] Re: VM paused then killed with "device vda reported I/O error"
Oops, rereading my message I found an error: the problem occurred at 1:21 AM, not 1:01 :/ Frank

On Thursday, October 25, 2018 17:55 CEST, "fsoyer" wrote: Hi, related (or maybe not) to my problem "VMs unexpectedly restarted", I have one VM (only one) which was paused and then killed this morning (1:01 AM). This is the second time (the first was about 15 days ago), and only this one (it is on a domain with 5 other VMs, and it is not the most used of them; and it was at night, without any particular workload at this time). The other VMs on the same storage were not impacted at all. And it is not on the same storage domain as the other VMs of "VMs unexpectedly restarted"... At the same time, Gluster seems to have seen absolutely nothing. Is there really a storage issue?? Here are some relevant logs: /var/log/messages of the node, vdsm.log, engine.log, glusterd.log, data01-brick.log. For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13 (2 + arbiter). Any idea where or what I must search for? Thanks -- Regards, Frank

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BI45NBQTKNHLOOS3TO2TAT53LREC4EF/
[ovirt-users] VM paused then killed with "device vda reported I/O error"
Hi, related (or maybe not) to my problem "VMs unexpectedly restarted", I have one VM (only one) which was paused and then killed this morning (1:01 AM). This is the second time (the first was about 15 days ago), and only this one (it is on a domain with 5 other VMs, and it is not the most used of them; and it was at night, without any particular workload at this time). The other VMs on the same storage were not impacted at all. And it is not on the same storage domain as the other VMs of "VMs unexpectedly restarted"... At the same time, Gluster seems to have seen absolutely nothing. Is there really a storage issue?? Here are some relevant logs: /var/log/messages of the node, vdsm.log, engine.log, glusterd.log, data01-brick.log. For the record, this is a 3-node 4.2.3 cluster, on Gluster 3.12.13 (2 + arbiter). Any idea where or what I must search for? Thanks -- Regards, Frank

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/JU3OTWA7HU4F5POYFR7KMNTKKWQMYEZE/
[ovirt-users] Re: re-enabling networkmanager
At least to view some graphs. Here is a screenshot of the Cockpit tab (not sure a picture can be displayed via the list; tell me if not): -- Regards, Frank

On Monday, October 22, 2018 12:42 CEST, Donny Davis wrote: So you are trying to capture utilization stats from your network?

On Mon, Oct 22, 2018, 6:37 AM fsoyer wrote: Hi Donny, thank you for this precision, but I don't want to manage the network from Cockpit, just view the graphs of network usage in it, and possibly logs (that can be interesting!). That's why I ask about the risks of activating NM after configuring the networks in the engine UI: what are the opinions? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

On Sunday, October 21, 2018 14:05 CEST, Donny Davis wrote: Use the oVirt engine to manage your networks. VDSM takes over at boot time, and the only way for this to happen is if you use the engine.

On Fri, Oct 19, 2018 at 9:26 AM fsoyer wrote: Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/EC6NWQDYU6JJDPSZJFDQRAJXLBFLEMY7/
[ovirt-users] Re: re-enabling networkmanager
Hi Donny, thank you for this precision, but I don't want to manage the network from Cockpit, just view the graphs of network usage in it, and possibly logs (that can be interesting!). That's why I ask about the risks of activating NM after configuring the networks in the engine UI: what are the opinions? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

On Sunday, October 21, 2018 14:05 CEST, Donny Davis wrote: Use the oVirt engine to manage your networks. VDSM takes over at boot time, and the only way for this to happen is if you use the engine.

On Fri, Oct 19, 2018 at 9:26 AM fsoyer wrote: Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5TK7TSIN4UEG5F3KO3NLBYVFIW57LWL5/
[ovirt-users] re-enabling networkmanager
Hi, I have installed a 4.2 cluster on CentOS 7 nodes, but I followed an (old) procedure of mine written for 4.0: so I disabled NetworkManager before installing oVirt. The networks created and validated in the engine UI are: ovirmgmt on bond0 (2 slaves), failover mode; storagemanager on bond1 (2 slaves), jumbo frames, aggregation mode, serving Gluster. Today I installed Cockpit on the nodes to have the node consoles. But it says that it cannot manage the network without NM. So my question is: is there any risk in re-enabling NM on the nodes? Can it break anything done by the UI? -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 Systea IG Administration systèmes, réseaux et bases de données www.systea.net Membre du réseau Les Professionnels du Numérique KoGite Hébergement de proximité www.kogite.fr

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GOWGDMXIQ2VLHW2NAL2SSRQLXFKD7753/
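On the re-enabling question: as far as I know, the ifcfg files that VDSM writes for the networks it manages are marked with NM_CONTROLLED=no, so NetworkManager is supposed to leave those devices alone. A sketch of how one could check this, with made-up file names and contents (nothing here is copied from this cluster):

```python
# Sketch: report which ifcfg-style files opt out of NetworkManager.
# In initscripts semantics, NM_CONTROLLED defaults to "yes" when absent.
ifcfg_files = {
    "ifcfg-ovirmgmt": "DEVICE=ovirmgmt\nTYPE=Bridge\nNM_CONTROLLED=no\nBOOTPROTO=none\n",
    "ifcfg-eth0": "DEVICE=eth0\nBOOTPROTO=dhcp\n",
}

def nm_controlled(content: str) -> bool:
    """True unless the file explicitly sets NM_CONTROLLED=no (or =n)."""
    for raw in content.splitlines():
        line = raw.strip()
        if line.startswith("NM_CONTROLLED="):
            value = line.split("=", 1)[1].strip().strip('"').lower()
            return value not in ("no", "n")
    return True  # absent means NetworkManager may manage the device

for name, content in ifcfg_files.items():
    print(name, "NM-managed" if nm_controlled(content) else "ignored by NM")
```

If every VDSM-written file carries NM_CONTROLLED=no, re-enabling the NetworkManager service should in principle not touch those devices, but this is only a sketch of the check, not a guarantee; testing on one host first would be prudent.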
[ovirt-users] Re: VMs unexpectidly restarted
Hi Nir, thank you for this detailed analysis. As I can see, the first VM to shut down had its lease on the hosted-engine storage domain (probably not the best choice; maybe a test left over here) and its disk on DATA02. The 3 others (HA VMs) had a lease on the same domain as their disk (DATA02). So I suppose this looks like a Gluster latency on DATA02. But what I don't understand at this point is:
- if this was a lease problem on DATA02, the VM npi2 should not have been impacted... Or DATA02 was inaccessible, and the messages should have reported a storage error (something like "IO error", I suppose?)
- if this was a problem on the hosted-engine storage domain too, the engine did not restart (if the domain were off or blocked, it would have?) nor was it marked as not responding, even temporarily
- Gluster saw absolutely nothing at the same time, on the engine domain or DATA02: the logs of daemons and bricks show nothing relevant.
Unfortunately, I no longer have the vdsm log file from the time of the problem: it is rotated+compressed every 2 hours, and I discovered that if you uncompress "vdsm.log.1.xz" for example, at the next rotation the system overwrites it with the latest log :( I'm afraid I need to wait for another occurrence to re-scan all the logs and try to understand what happened... -- Regards, Frank

On Thursday, October 18, 2018 23:13 CEST, Nir Soffer wrote: On Thu, Oct 18, 2018 at 3:43 PM fsoyer wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem?
The problem started when sanlock could not renew storage leases held by some processes:

Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/victor.local.systea.fr:_DATA02/ffc53fd8-c5d1-4070-ae51-2e91835cd937/dom_md/ids
Oct 16 11:01:46 victor sanlock[904]: 2018-10-16 11:01:46 2945585 [4167]: s3 renewal error -202 delta_length 25 last_success 2945539

After 80 seconds, the VMs are terminated by sanlock:

Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 check_our_lease failed 80
Oct 16 11:02:19 victor sanlock[904]: 2018-10-16 11:02:18 2945617 [904]: s1 kill 13823 sig 15 count 1

But process 13823 cannot be killed, since it is blocked on storage, so sanlock sends many more TERM signals:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: s1 kill 13823 sig 15 count 17

The VM finally dies after 17 retries:

Oct 16 11:02:33 victor sanlock[904]: 2018-10-16 11:02:33 2945633 [904]: dead 13823 ci 10 count 17

We can see the same flow for other processes (HA VMs?). This allows the system to start the HA VM on another host, which is what we see in the events log in the first message.

Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.

If the VMs were not started successfully on the other hosts, maybe the storage domain used for the VM lease is not accessible? It is recommended to choose the same storage domain used by the other VM disks for the VM lease. Also check that all storage domains are accessible - if they are not, you will have warnings in /var/log/vdsm/vdsm.log. Nir

-- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique.
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
1
[ovirt-users] Re: VMs unexpectidly restarted
And I add the log of one of the restarted VMs: https://gist.github.com/fsoyer/b63daa0653d91a59ffc65f2b6ad263f6 -- Regards, Frank

On Thursday, October 18, 2018 14:41 CEST, "fsoyer" wrote: Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737 At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source problem? -- Regards, Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote: Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:

16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique.
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.

with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef Searching for "Failed to acquire lock" I saw a post about sanlock.log. Here it is at the time of the restart: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (hope you can display these gists). First question: every day there are those "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below). Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search for? Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a Gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storages are on Gluster volumes. Around 11 AM, I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirmgmt (2*1G bond mode 1 = failover).
-- Regards, Frank List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IYQGUX7GLK7KKXOWCYCLHRMHYTH5CRKY/
[ovirt-users] Re: VMs unexpectidly restarted
Hi, I forgot to look in the /var/log/messages file on the host! What a shame :/ Here is the messages file at the time of the error: https://gist.github.com/fsoyer/4d1247d4c3007a8727459efd23d89737
At the same time, the second host has no particular messages in its log. Does anyone have an idea of the source of the problem?
--
Regards,
Frank

On Tuesday, October 16, 2018 13:25 CEST, "fsoyer" wrote:

Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:
16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.
with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef
Searching for "Failed to acquire lock" I found a post about sanlock.log. Here it is at the time of the restarts: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (I hope you can display these gists).
First question: every day there are these "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below).
Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search?
Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storage domains are on gluster volumes. Around 11am I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirtmgmt (2*1G bond mode 1 = failover).
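As an aside, the "delta_renew long write time N sec" lines are sanlock reporting that a lease-renewal write to the storage took longer than it expected, so counting and sizing them is a quick way to gauge storage latency. A minimal sketch of how one might pull them out of sanlock.log — the sample lines and the 8-second threshold are illustrative, not taken from the gists above:

```shell
# Illustrative sanlock.log excerpt (format approximated from typical
# "delta_renew long write time N sec" warnings; not real data)
cat > /tmp/sanlock_sample.log <<'EOF'
2018-10-16 10:58:12+0200 711205 [4301]: s1 delta_renew long write time 11 sec
2018-10-16 11:01:40+0200 711413 [4301]: s1 delta_renew long write time 23 sec
2018-10-16 11:02:05+0200 711438 [4301]: s1 delta_renew long write time 3 sec
EOF

# Print only renewals slower than 8 s. sanlock has to keep renewing its
# lease well inside the expiry window (on the order of 80 s with default
# timeouts), so repeated long writes point at storage latency rather
# than at oVirt itself.
awk '/delta_renew long write time/ && $(NF-1) > 8' /tmp/sanlock_sample.log
```

If renewals ever take longer than the expiry window, sanlock gives up the lease and the VMs holding locks on that storage get killed, which would match the restarts above.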
--
Regards,
Frank
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XFFJT4NORIELIOAGPHU4CUPC67KY3MMP/
[ovirt-users] VMs unexpectedly restarted
Hi all, this morning some of my VMs were restarted unexpectedly. The events in the GUI say:
16 oct. 2018 11:03:50 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:03:26 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:03:23 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM op2drugs1 on Host victor.local.systea.fr
16 oct. 2018 11:02:54 Trying to restart VM patjoub1 on Host ginger.local.systea.fr
16 oct. 2018 11:02:53 Highly Available VM op2drugs1 failed. It will be restarted automatically.
16 oct. 2018 11:02:53 Failed to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:53 VM op2drugs1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:53 VM patjoub1 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:47 Trying to restart VM npi2 on Host ginger.local.systea.fr
16 oct. 2018 11:02:46 Failed to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:46 VM npi2 is down with error. Exit message: Failed to acquire lock: Aucun espace disponible sur le périphérique ("No space left on device").
16 oct. 2018 11:02:38 Trying to restart VM patjoub1 on Host victor.local.systea.fr
16 oct. 2018 11:02:37 Highly Available VM patjoub1 failed. It will be restarted automatically.
16 oct. 2018 11:02:37 VM patjoub1 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:36 VM patjoub1 is not responding.
16 oct. 2018 11:02:36 VM altern8 is not responding.
16 oct. 2018 11:02:36 VM Sogov3 is not responding.
16 oct. 2018 11:02:36 VM cerbere3 is not responding.
16 oct. 2018 11:02:36 VM Mint19 is not responding.
16 oct. 2018 11:02:35 VM cerbere4 is not responding.
16 oct. 2018 11:02:35 VM zabbix is not responding.
16 oct. 2018 11:02:34 Trying to restart VM npi2 on Host victor.local.systea.fr
16 oct. 2018 11:02:33 Highly Available VM npi2 failed. It will be restarted automatically.
16 oct. 2018 11:02:33 VM npi2 is down with error. Exit message: VM has been terminated on the host.
16 oct. 2018 11:02:20 VM cerbere3 is not responding.
16 oct. 2018 11:02:20 VM logcollector is not responding.
16 oct. 2018 11:02:20 VM HostedEngine is not responding.
with engine.log: https://gist.github.com/fsoyer/e3b74b4693006736b4f737b642aed0ef
Searching for "Failed to acquire lock" I found a post about sanlock.log. Here it is at the time of the restarts: https://gist.github.com/fsoyer/8d6952e85623a12f09317652aa4babd7 (I hope you can display these gists).
First question: every day there are these "delta_renew long write time" messages. What do they mean? Even if I suspect some storage problem, I don't see latency on it (configuration described below).
Second question: what happened that forced some VMs (not all, and not on the same host!) to restart? Where and what must I search?
Thanks

Configuration: 2 DELL R620 as oVirt hosts (4.2.8-2) with hosted-engine, also members of a gluster 3.12.13-1 cluster with an arbiter (1 DELL R310, non-oVirt). The DATA and ENGINE storage domains are on gluster volumes. Around 11am I do not see any specific messages in glusterd.log or glfsheal-*.log. Gluster is on a separate network (2*1G bond mode 4 = aggregation) from ovirtmgmt (2*1G bond mode 1 = failover).
--
Regards,
Frank
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/I3KRS6IESKNZNZ2UXYW356Y6QVSTUAA6/
[ovirt-users] Re: clone snapshot of running vm
Hi guys, I just hit this issue on a fresh 4.2.6 install: the snapshot of a VM cannot be cloned. It seems to be the same logs in ui.log (I paste them here to be sure), and I am unable to clone the snapshot. VM on or off doesn't change anything. This really seems to be a UI issue, because when it happens we can no longer create or clone a snapshot on any VM: the buttons just do nothing (and nothing is logged in ui.log when we hit them). We must reload the UI (F5 or Ctrl-R) to recover the functionality. Please help us? Thanks.

2018-09-15 11:37:50,589+02 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-17) [] Permutation name: 3F33631A4CFC71A7A5878CCA004CB97D
2018-09-15 11:37:50,589+02 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-17) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: (TypeError) : Cannot read property 'K' of null
 at java.lang.Throwable.Throwable(Throwable.java:70) [rt.jar:1.8.0_181]
 at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_181]
 at com.google.web.bindery.event.shared.UmbrellaException.UmbrellaException(UmbrellaException.java:64) [gwt-servlet.jar:]
 at Unknown.new C0(webadmin-0.js)
 at com.google.gwt.event.shared.HandlerManager.$fireEvent(HandlerManager.java:117) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.$fireEvent(Widget.java:127) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.fireEvent(Widget.java:127) [gwt-servlet.jar:]
 at com.google.gwt.event.dom.client.DomEvent.fireNativeEvent(DomEvent.java:110) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.$onBrowserEvent(Widget.java:163) [gwt-servlet.jar:]
 at com.google.gwt.user.client.ui.Widget.onBrowserEvent(Widget.java:163) [gwt-servlet.jar:]
 at com.google.gwt.user.client.DOM.dispatchEvent(DOM.java:1415) [gwt-servlet.jar:]
 at com.google.gwt.user.client.impl.DOMImplStandard.dispatchEvent(DOMImplStandard.java:312) [gwt-servlet.jar:]
 at com.google.gwt.core.client.impl.Impl.apply(Impl.java:236) [gwt-servlet.jar:]
 at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:275) [gwt-servlet.jar:]
 at Unknown.eval(webadmin-0.js)
Caused by: com.google.gwt.core.client.JavaScriptException: (TypeError) : Cannot read property 'K' of null
 at org.ovirt.engine.ui.uicommonweb.models.vms.ExistingVmModelBehavior.updateHaAvailability(ExistingVmModelBehavior.java:481)
 at org.ovirt.engine.ui.uicommonweb.models.vms.UnitVmModel.eventRaised(UnitVmModel.java:1933)
 at org.ovirt.engine.ui.uicompat.Event.$raise(Event.java:99)
 at org.ovirt.engine.ui.uicommonweb.models.ListModel.$setSelectedItem(ListModel.java:82)
 at org.ovirt.engine.ui.uicommonweb.models.ListModel.setSelectedItem(ListModel.java:78)
 at org.ovirt.engine.ui.common.editor.UiCommonEditorVisitor.$updateListEditor(UiCommonEditorVisitor.java:193)
 at org.ovirt.engine.ui.common.editor.UiCommonEditorVisitor.visit(UiCommonEditorVisitor.java:47)
 at com.google.gwt.editor.client.impl.AbstractEditorContext.$traverse(AbstractEditorContext.java:127) [gwt-servlet.jar:]
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget_UiCommonModelEditorDelegate.accept(AbstractVmPopupWidget_UiCommonModelEditorDelegate.java:502)
 at com.google.gwt.editor.client.impl.AbstractEditorContext.$traverse(AbstractEditorContext.java:127) [gwt-servlet.jar:]
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget_DriverImpl.accept(AbstractVmPopupWidget_DriverImpl.java:4)
 at org.ovirt.engine.ui.common.editor.AbstractUiCommonModelEditorDriver.$edit(AbstractUiCommonModelEditorDriver.java:32)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.$edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractVmPopupWidget.edit(AbstractVmPopupWidget.java:1518)
 at org.ovirt.engine.ui.common.widget.uicommon.popup.AbstractModeSwitchingPopupWidget.edit(AbstractModeSwitchingPopupWidget.java:80)
 at org.ovirt.engine.ui.common.view.popup.AbstractModelBoundWidgetPopupView.edit(AbstractModelBoundWidgetPopupView.java:37)
 at org.ovirt.engine.ui.common.presenter.AbstractModelBoundPopupPresenterWidget.$init(AbstractModelBoundPopupPresenterWidget.java:105)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.$init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.widget.popup.AbstractVmBasedPopupPresenterWidget.init(AbstractVmBasedPopupPresenterWidget.java:63)
 at org.ovirt.engine.ui.common.uicommon.model.ModelBoundPopupHandler.$han
[ovirt-users] Re: Re : [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
At this time the engine (and the cluster) is up. No problem after activating gluster and creating the volumes, then finishing the install in the screen session. So...

On Friday, June 29, 2018 12:32 CEST, "fsoyer" wrote:

Hi, I must say it: I'm -totally- lost. To try to find a reason for this error, I re-installed the first host from scratch: CentOS 7.5-1804, oVirt 4.2.3-1, gluster 3.12.9. The first attempt was made with only em1 declared. Result = SUCCESS: the install passed "Get local VM IP", then went through "Wait for the host to be up" without difficulty and waited at "Please specify the storage...". At this point I even noticed that I had forgotten to stop/disable NetworkManager; that had no impact! So: I re-installed the host from scratch (yes, sometimes I'm a fool) to be absolutely sure that no problem came from the preceding install. Now I declared em1 (10.0.0.230) and em2 (10.0.0.229, without gateway or DNS, for a future vmnetwork). NetworkManager off and disabled. Result = SUCCESS... Oo OK: re-install the host!! Now I declared, as I did some days ago, em1, em2 and bond0 (em3+em4 with IP 192.168.0.30). Result: SUCCESS!!! Oo So I'm unable to say what happened on Tuesday. Actually I see only two differences:
- gluster is not active (I didn't configure it, to go faster)
- the version of ovirt (ovirt-release, ovirt-host, appliance...) has slightly changed.
I have no more time for another attempt at re-installing the host(s) with gluster activated; I must now go on, as I need an operational system for other tasks with VMs this afternoon. So I leave the first host waiting for the end of the install in a screen session, re-install the 2 other hosts, and activate gluster and the volumes on the 3 nodes. Then I'll finish the install on the gluster volume. I'll tell you if this finally works, but I hope so! However, I'm in doubt about this problem. I have no explanation of what happened on Tuesday; this is really annoying...
Maybe you have the ability to test the same configuration (3 hosts with 2 NICs on the same network for ovirtmgmt and a future vmnetwork, and gluster on a separate network) to try to understand? Thank you for the time spent. Frank
PS: to answer your question: yes, on Tuesday I ran ovirt-hosted-engine-cleanup between each attempt.

On Thursday, June 28, 2018 16:26 CEST, Simone Tiraboschi wrote:

On Wed, Jun 27, 2018 at 5:48 PM fso...@systea.fr wrote:
Hi again, In fact, the time in the file is exactly 2 hours behind; I guess a timezone problem (in the install process?), as the file itself is correctly timestamped at 11:17am (the correct time here in France). So the messages are in sync.

Yes, sorry, my fault. From the logs I don't see anything strange. Can you please try again on your environment and connect to the bootstrap VM via virsh console or VNC to check what's happening there? Did you also run ovirt-hosted-engine-cleanup between one attempt and the next?

Original message
Subject: Re: [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
From: Simone Tiraboschi
To: fso...@systea.fr
Cc: users

Hi, HostedEngineLocal was started at 2018-06-26 09:17:26 but /var/log/messages starts only at Jun 26 11:02:32. Can you please reattach it for the relevant time frame?

On Wed, Jun 27, 2018 at 10:54 AM fsoyer wrote:
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group defa
[ovirt-users] Re: Re : [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi, I must say it: I'm -totally- lost. To try to find a reason for this error, I re-installed the first host from scratch: CentOS 7.5-1804, oVirt 4.2.3-1, gluster 3.12.9. The first attempt was made with only em1 declared. Result = SUCCESS: the install passed "Get local VM IP", then went through "Wait for the host to be up" without difficulty and waited at "Please specify the storage...". At this point I even noticed that I had forgotten to stop/disable NetworkManager; that had no impact! So: I re-installed the host from scratch (yes, sometimes I'm a fool) to be absolutely sure that no problem came from the preceding install. Now I declared em1 (10.0.0.230) and em2 (10.0.0.229, without gateway or DNS, for a future vmnetwork). NetworkManager off and disabled. Result = SUCCESS... Oo OK: re-install the host!! Now I declared, as I did some days ago, em1, em2 and bond0 (em3+em4 with IP 192.168.0.30). Result: SUCCESS!!! Oo So I'm unable to say what happened on Tuesday. Actually I see only two differences:
- gluster is not active (I didn't configure it, to go faster)
- the version of ovirt (ovirt-release, ovirt-host, appliance...) has slightly changed.
I have no more time for another attempt at re-installing the host(s) with gluster activated; I must now go on, as I need an operational system for other tasks with VMs this afternoon. So I leave the first host waiting for the end of the install in a screen session, re-install the 2 other hosts, and activate gluster and the volumes on the 3 nodes. Then I'll finish the install on the gluster volume. I'll tell you if this finally works, but I hope so! However, I'm in doubt about this problem. I have no explanation of what happened on Tuesday; this is really annoying...
Maybe you have the ability to test the same configuration (3 hosts with 2 NICs on the same network for ovirtmgmt and a future vmnetwork, and gluster on a separate network) to try to understand? Thank you for the time spent.
Frank
PS: to answer your question: yes, on Tuesday I ran ovirt-hosted-engine-cleanup between each attempt.

On Thursday, June 28, 2018 16:26 CEST, Simone Tiraboschi wrote:

On Wed, Jun 27, 2018 at 5:48 PM fso...@systea.fr wrote:
Hi again, In fact, the time in the file is exactly 2 hours behind; I guess a timezone problem (in the install process?), as the file itself is correctly timestamped at 11:17am (the correct time here in France). So the messages are in sync.

Yes, sorry, my fault. From the logs I don't see anything strange. Can you please try again on your environment and connect to the bootstrap VM via virsh console or VNC to check what's happening there? Did you also run ovirt-hosted-engine-cleanup between one attempt and the next?

Original message
Subject: Re: [ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
From: Simone Tiraboschi
To: fso...@systea.fr
Cc: users

Hi, HostedEngineLocal was started at 2018-06-26 09:17:26 but /var/log/messages starts only at Jun 26 11:02:32. Can you please reattach it for the relevant time frame?

On Wed, Jun 27, 2018 at 10:54 AM fsoyer wrote:
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi Simone, here are the relevant parts of messages and the engine install log (there was only this file in /var/log/libvirt/qemu). Thanks for your time. Frank

On Tuesday, June 26, 2018 11:43 CEST, Simone Tiraboschi wrote:

On Tue, Jun 26, 2018 at 11:39 AM fsoyer wrote:
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
... it does NOT work this morning:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:01:c6:32 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.083587", "end": "2018-06-26 11:26:07.581706", "rc": 0, "start": "2018-06-26 11:26:07.498119", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
I'm sure the network was the same yesterday when my attempt finally passed "Get local VM IP". Why not today? After the error, the network was:
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever
8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff
9: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:01:c6:32 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe01:c632/64 scope link valid_lft forever preferred_lft forever
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
So, finally, I have no idea why this happens :(((

Can you please attach /var/log/messages and /var/log/libvirt/qemu/*?

On Tuesday, June 26, 2018 09:21 CEST, Simone Tiraboschi wrote:

On Mon, Jun 25, 2018 at 6:32 PM fsoyer wrote:
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Well, unfortunately, it was a "false positive". This morning I tried again, with the idea that at the moment the deploy asks for the final destination of the engine, I would restart bond0 + gluster + the engine volume at that moment.
Re-launching the deploy on the second "fresh" host (the first one, with all the errors yesterday, was left in a doubtful state) with em2 and gluster+bond0 off:
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
... it does NOT work this morning:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:01:c6:32 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.083587", "end": "2018-06-26 11:26:07.581706", "rc": 0, "start": "2018-06-26 11:26:07.498119", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
I'm sure the network was the same yesterday when my attempt finally passed "Get local VM IP". Why not today? After the error, the network was:
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:f0:f0 brd ff:ff:ff:ff:ff:ff inet 10.0.0.227/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:f0f0/64 scope link valid_lft forever preferred_lft forever
3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f1 brd ff:ff:ff:ff:ff:ff
4: em3: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f2 brd ff:ff:ff:ff:ff:ff
5: em4: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:f0:f3 brd ff:ff:ff:ff:ff:ff
6: bond0: mtu 9000 qdisc noqueue state DOWN group default qlen 1000 link/ether 3a:ab:a2:f2:38:5c brd ff:ff:ff:ff:ff:ff
7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever
8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:ae:8d:93 brd ff:ff:ff:ff:ff:ff
9: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:01:c6:32 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe01:c632/64 scope link valid_lft forever preferred_lft forever
# ip r
default via 10.0.1.254 dev em1
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.227
169.254.0.0/16 dev em1 scope link metric 1002
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
So, finally, I have no idea why this happens :(((

On Tuesday, June 26, 2018 09:21 CEST, Simone Tiraboschi wrote:

On Mon, Jun 25, 2018 at 6:32 PM fsoyer wrote:
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to stop the gluster volumes, stop gluster on the host, and stop bond0. So the host now had just em1 with one IP. And... the winner is... yes: the install passed "[Get local VM IP]" and continued!! I hit Ctrl-C, restarted bond0, restarted the deploy: it crashed. So it seems that more than one network is the problem. But! How do I install en
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Well, answering myself with more information. Thinking that the network was part of the problem, I tried to stop the gluster volumes, stop gluster on the host, and stop bond0. So the host now had just em1 with one IP. And... the winner is... yes: the install passed "[Get local VM IP]" and continued!! I hit Ctrl-C, restarted bond0, restarted the deploy: it crashed. So it seems that more than one network is the problem. But! How do I install the engine on gluster on a separate, bonded, jumbo-frame network in this case??? Can you reproduce this on your side? Frank

On Monday, June 25, 2018 16:50 CEST, "fsoyer" wrote:

Hi staff, installing a fresh oVirt: CentOS 7.5.1804 up to date, oVirt version:
# rpm -qa | grep ovirt
ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch
ovirt-imageio-common-1.3.1.2-0.el7.centos.noarch
ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64
ovirt-vmconsole-1.0.5-4.el7.centos.noarch
ovirt-provider-ovn-driver-1.2.10-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch
ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64
ovirt-host-deploy-1.7.3-1.el7.centos.noarch
ovirt-release42-4.2.3.1-1.el7.noarch
ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch
cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch
ovirt-host-4.2.2-2.el7.centos.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ON PHYSICAL SERVERS (not on VMware, why should I be?? ;) I got exactly the same error:
[ INFO ] TASK [Get local VM IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.073313", "end": "2018-06-25 16:11:36.025277", "rc": 0, "start": "2018-06-25 16:11:35.951964", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
I have 4 NICs:
- em1 10.0.0.230/8 is for ovirtmgmt; it has the gateway
- em2 10.0.0.229/8 is for a vmnetwork
- em3+em4 in bond0 (192.168.0.30) are for gluster with jumbo frames; the volumes (ENGINE, ISO, EXPORT, DATA) are up and operational.
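For context, the failing task is only scraping libvirt's DHCP leases for the bootstrap VM's MAC: empty stdout means the local engine VM never obtained (or never reported) a lease on libvirt's default network. The exact pipeline from the error can be replayed against canned `virsh -r net-dhcp-leases default` output to see what a successful run would extract — the lease line below is invented for illustration; only the MAC and the pipeline come from the error above:

```shell
# Fake `virsh -r net-dhcp-leases default` output. The MAC matches the
# one the task grepped for; the IP and hostname are made up.
leases='Expiry Time          MAC address        Protocol  IP address          Hostname  Client ID
-------------------------------------------------------------------------------------------
2018-06-25 17:11:36  00:16:3e:69:3a:c6  ipv4      192.168.122.57/24   engine    -'

# Same extraction the "Get local VM IP" task performs: select the row
# for the VM MAC, take the IP column, drop the /24 prefix length.
echo "$leases" | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'
```

With no matching lease (the failure case) the pipeline prints nothing, and the task simply retries until its 50 attempts are exhausted, which is exactly what the `"stdout": ""` in the error shows.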
I tried to stop em2 (ONBOOT=No and restart network), so the network is actually : # ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:eb:70 brd ff:ff:ff:ff:ff:ff inet 10.0.0.230/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb70/64 scope link valid_lft forever preferred_lft forever 3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:eb:71 brd ff:ff:ff:ff:ff:ff 4: em3: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 5: em4: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 9000 qdisc noqueue state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff inet 192.168.0.30/24 brd 192.168.0.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb72/64 scope link valid_lft forever preferred_lft forever # ip r default via 10.0.1.254 dev em1 10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.230 169.254.0.0/16 dev em1 scope link metric 1002 169.254.0.0/16 dev bond0 scope link metric 1006 192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.30 but same issue, after "/usr/sbin/ovirt-hosted-engine-cleanup" and restarting the deployment. NetworkManager was stopped and disabled at the node install, and it is still stopped.After the error, the network shows this after device 6 (bond0) : 7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:38:e0:
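The failing "[Get local VM IP]" task above is just polling the libvirt default network's DHCP leases for the engine VM's MAC. A minimal sketch of the same pipeline, run here against a fabricated sample lease line (the MAC is the one from the failing task; on a live host you would feed it the real `virsh -r net-dhcp-leases default` output):

```shell
# Fabricated sample of a "virsh -r net-dhcp-leases default" lease line;
# a real host only shows it once the local engine VM has obtained a lease.
lease='2018-06-25 16:11:35  00:16:3e:69:3a:c6  ipv4  192.168.122.15/24  HostedEngineLocal'
# Same pipeline as the Ansible task: select the line by MAC, take the
# "IP/prefix" field (field 5, since the expiry time spans two fields),
# then strip the prefix length.
ip=$(printf '%s\n' "$lease" | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/')
echo "$ip"   # → 192.168.122.15
```

An empty stdout, as in the error above, therefore means the VM never obtained (or libvirt never recorded) a lease within the 50 attempts; the command itself succeeded (rc is 0).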
[ovirt-users] Re: Install hosted-engine - Task Get local VM IP failed
Hi staff, Installing a fresh ovirt - CentOS 7.5.1804 up to date, ovirt version : # rpm -qa | grep ovirt ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch ovirt-imageio-common-1.3.1.2-0.el7.centos.noarch ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64 ovirt-vmconsole-1.0.5-4.el7.centos.noarch ovirt-provider-ovn-driver-1.2.10-1.el7.centos.noarch ovirt-hosted-engine-setup-2.2.20-1.el7.centos.noarch ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64 ovirt-host-deploy-1.7.3-1.el7.centos.noarch ovirt-release42-4.2.3.1-1.el7.noarch ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch ovirt-setup-lib-1.1.4-1.el7.centos.noarch ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch ovirt-host-4.2.2-2.el7.centos.x86_64 ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch ON PHYSICAL SERVERS (not on VMware, why should I be ?? ;) I got exactly the same error : [ INFO ] TASK [Get local VM IP] [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:69:3a:c6 | awk '{ print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.073313", "end": "2018-06-25 16:11:36.025277", "rc": 0, "start": "2018-06-25 16:11:35.951964", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} [ INFO ] TASK [include_tasks] [ INFO ] ok: [localhost] [ INFO ] TASK [Remove local vm dir] [ INFO ] changed: [localhost] [ INFO ] TASK [Notify the user about a failure] [ ERROR ] fatal: [localhost]: FAILED! 
=> {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"} [ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook [ INFO ] Stage: Clean up I have 4 NIC : em1 10.0.0.230/8 is for ovirmgmt, it have the gateway em2 10.0.0.229/8 is for a vmnetwork em3+em4 in bond0 192.168.0.30 are for gluster with jumbo frames, volumes (ENGINE, ISO,EXPORT,DATA) are up and operationals. I tried to stop em2 (ONBOOT=No and restart network), so the network is actually : # ip a 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: em1: mtu 1500 qdisc mq state UP group default qlen 1000 link/ether e0:db:55:15:eb:70 brd ff:ff:ff:ff:ff:ff inet 10.0.0.230/8 brd 10.255.255.255 scope global em1 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb70/64 scope link valid_lft forever preferred_lft forever 3: em2: mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e0:db:55:15:eb:71 brd ff:ff:ff:ff:ff:ff 4: em3: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 5: em4: mtu 9000 qdisc mq master bond0 state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 9000 qdisc noqueue state UP group default qlen 1000 link/ether e0:db:55:15:eb:72 brd ff:ff:ff:ff:ff:ff inet 192.168.0.30/24 brd 192.168.0.255 scope global bond0 valid_lft forever preferred_lft forever inet6 fe80::e2db:55ff:fe15:eb72/64 scope link valid_lft forever preferred_lft forever # ip r default via 10.0.1.254 dev em1 10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.230 169.254.0.0/16 dev em1 scope link metric 1002 169.254.0.0/16 dev bond0 scope link 
metric 1006 192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.30 but same issue, after "/usr/sbin/ovirt-hosted-engine-cleanup" and restarting the deployment. NetworkManager was stopped and disabled at the node install, and it is still stopped. After the error, the network shows this after device 6 (bond0) : 7: virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0 valid_lft forever preferred_lft forever 8: virbr0-nic: mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000 link/ether 52:54:00:38:e0:5a brd ff:ff:ff:ff:ff:ff 11: vnet0: mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000 link/ether fe:16:3e:69:3a:c6 brd ff:ff:ff:ff:ff:ff inet6 fe80::fc16:3eff:fe69:3ac6/64 scope link valid_lft forever preferred_lft forever I do not see ovirtmgmt... And I don't know if I can access the engine VM as I don't have its IP :( I tried to ping addresses after 192.168.122.1, but none were accessible, so I stopped at 122.10. The VM seems up (kvm process), the qemu-kvm process taking 150% of CPU in "top"... I pasted the log here : https://pastebin.com/Ebzh1uEh PLEASE ! This issue seems to be recc
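Rather than ping-scanning 192.168.122.x, the guest's MAC can be recovered from the vnet0 entry above: for KVM guests, libvirt gives the host-side tap device the guest's MAC with the first octet rewritten to fe (the sketch below assumes the guest's first octet is 00, which matches the MAC the deploy task was grepping for):

```shell
# Tap MAC as shown for vnet0 in the "ip a" output above.
tap_mac='fe:16:3e:69:3a:c6'
# Recover the guest MAC by undoing libvirt's fe rewrite (assumption:
# the guest's real first octet is 00, as in the deploy task's MAC).
guest_mac=$(printf '%s\n' "$tap_mac" | sed 's/^fe:/00:/')
echo "$guest_mac"   # → 00:16:3e:69:3a:c6
# On the host, look that MAC up instead of pinging blindly:
#   ip neigh | grep -i "$guest_mac"
#   virsh -r net-dhcp-leases default | grep -i "$guest_mac"
```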
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi Arik, unfortunately I couldn't discuss last Friday with a colleague who needed to work on the cluster this week-end. Discovering the frozen tasks, he found a workaround by deleting the lines in the job table, and the tasks disappeared in the GUI. Then... he dropped the faulty VM :( But I have shared the engine.log of April 11 and 12. The export to OVA of the "CO7_VM1" VM was launched around 17:01, with other VMs (to the same directory on host "victor"). The other VM exports ended successfully but not this one. Another export of the same VM was launched on April 12 at 10:42, to see if it would say "already working"... But it didn't, and the second task ran alongside the first, indefinitely. These links are active for 2 days : https://seafile.systea.fr/f/12d90cc5c59b488c9fde/?dl=1 https://seafile.systea.fr/f/ce7cd61231924020bc62/?dl=1 I've not seen anything relevant in them, but have a look ? This morning I tried to reproduce the problem. I've created a VM from template (50G). Migrate it : OK. Export it to OVA : OK. I then extended the disk to 100G, then tested again. Migrate it : OK. Export it to OVA... OK :( So I wasn't able to reproduce the error. Thx Frank On Friday, April 13, 2018 21:53 CEST, Arik Hadas wrote: On Fri, Apr 13, 2018 at 6:54 PM, fsoyer wrote: Hi, This task is listed (since 2 days) in the GUI / upper right "tasks" icon. It is visibly frozen as no OVA file has been created, but no errors in the GUI, just... it runs. Or : it loops :) This (test) VM is one on which I have extended the disk (50 -> 100G). Before being stopped and trying to export it to OVA, it worked fine. All other VMs around can be exported without problem, but not this one. I've tried to restart the engine, change SPM, restart each node one by one, but the task is always here. I could even restart the VM today without error and it works fine ! But... the task runs... Today also, I tried to clone the VM : same thing, now I have 2 tasks running indefinitely :( Strange bug, where no timeout stopped the tasks in error.
I can't see anything relevant in engine.log or vdsm.log, but probably I've missed it among all the messages. No problem to remove this (test) VM and try on another (test) one (extend the disk to see if this is the reason of the problem). But before that I want to kill these tasks ! Please don't remove that VM yet. It would be appreciated if you could file a bug and share the engine log that covers the attempt to export this VM to OVA + the ansible log of that operation. Thanks. Frank On Friday, April 13, 2018 16:24 CEST, Arik Hadas wrote: On Fri, Apr 13, 2018 at 11:14 AM, fsoyer wrote: Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). I'm afraid you won't find any task in VDSM for 'export a VM to OVA'. Exporting a VM to OVA is comprised of three steps: 1. Creating temporary cloned disks - you'll find a task of copy-image-group for each disk. 2. Creating the OVA file - that's done by a python script executed by ansible, there is no task for that in VDSM. 3. Removing the temporary cloned disks - you'll find a task of remove-image for each disk. Can you please elaborate on the problem you're having - where do you see that task and how can you see that it's blocked? I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5.
The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } }
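The JSON-file invocation described above can be exercised end to end like this; the IDs are the ones from the example, and the final vdsm-client call is left commented since it needs a live vdsmd (python3 is used here only for the syntax check, although hosts of that era shipped python2):

```shell
# Write the parameter file exactly as in the example above.
cat > lease.json <<'EOF'
{
    "lease": {
        "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2",
        "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5"
    }
}
EOF
# Sanity-check the JSON before handing it to the client.
python3 -m json.tool lease.json >/dev/null && echo "lease.json OK"
# On a host running vdsmd you would then invoke:
#   vdsm-client Lease info -f lease.json
```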
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi, This task is listed (since 2 days) in the GUI / up right "tasks" icon. It is visibly freezed as no OVA file has been created, but no errors in GUI, just... it runs. Or : it loops :) This (test) vm is one on which I have extended the disk (50 -> 100G). Before being stopped and trying to export it to OVA, it works fine. All other vms around can be well exported but not this one. I've tried to restart engine, change SPM, restart one by one each node, but the task is always here. I could even restart the vm today without error and it works fine ! But... the task runs... Today also, I tried to clone the vm : same thing, now I have 2 tasks running indefinitly :( Strange bug, where no timeout stopped the tasks in error. I can't see any revelant things in engine.log or vdsm.log, but probably I've not seen them in all the messages. No problem to remove this (test) vm and try on another (test) one (extend disk to see if this is the reason of the problem). But before I want to kill this tasks ! Thanks. Frank Le Vendredi, Avril 13, 2018 16:24 CEST, Arik Hadas a écrit: On Fri, Apr 13, 2018 at 11:14 AM, fsoyer wrote:Hi all, I can't find any exhaustive doc for new vdsm-client. My problem actually is a task (export a vm to OVA) blocked. I'm afraid you won't find any task in VDSM for 'export a VM to OVA'.Expoting a VM to OVA is comprised of three steps:1. Creating temporary cloned disks - you'll find a task of copy-image-group for each disk.2. Creating the OVA file - that's done by a python script executed by ansible, there is no task for that in VDSM.3. Removing the temporary cloned disks - you'll find a task of remove-image for each disk. Can you please elaborate on the problem you're having - where do you see that task and how can you see that it's blocked? I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? Old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? 
Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat < [1] https://github.com/oVirt/vdsm/blob/master/lib/api/vdsm-api.yml -- Irit Goihman, Software Engineer, Red Hat Israel Ltd. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Ok I see, thank you. vdsm-client Host getAllTasksStatuses works on the SPM. But vdsm-client Task getInfo TaskID=55cbec7f-e7dc-4431-bce9-8ec1d61a7feb returns : vdsm-client: Command Task.getInfo with args {'TaskID': '55cbec7f-e7dc-4431-bce9-8ec1d61a7feb'} failed: (code=-32603, message=Internal JSON-RPC error: {'reason': '__init__() takes exactly 2 arguments (1 given)'}) There are no examples with tasks below, and the link to github ends in a 404... I'll try to find some docs about the API and task management, if you believe that's best. Any link to share ? Thanks, Frank On Friday, April 13, 2018 14:41 CEST, Michal Skrivanek wrote: On 13 Apr 2018, at 10:14, fsoyer wrote: Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). if you want to interact with that action it would always be best to start with engine’s REST API rather than the internal host-side API I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? that’s a Host class API, vdsm-client Host getAllTasksStatuses Does someone know if a complete doc exists for vdsm-client ? the man page mentioned below and the source code. This is not a public API Thanks, Michal Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help.
If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat < [1] https://github.com/oVirt/vdsm/blob/master/lib/api/vdsm-api.yml -- Irit Goihman, Software Engineer, Red Hat Israel Ltd. ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
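To get from Host getAllTasksStatuses to a UUID you can hand to Task getInfo or Task stop, the keys of the returned mapping are the task UUIDs. The JSON below is a fabricated sample shaped like the SPM output (the exact schema is an assumption; consult vdsm-api.yml [1]); note also that the schema spells the parameter taskID with a lowercase t, which may be why the TaskID= call above failed:

```shell
# Fabricated sample shaped like "vdsm-client Host getAllTasksStatuses"
# output on the SPM (assumed schema; see vdsm-api.yml).
cat > tasks.json <<'EOF'
{
    "55cbec7f-e7dc-4431-bce9-8ec1d61a7feb": {
        "taskState": "running",
        "code": 0,
        "message": "running job 1 of 1"
    }
}
EOF
# The top-level keys are the task UUIDs:
uuid=$(python3 -c 'import json; print(list(json.load(open("tasks.json")))[0])')
echo "$uuid"   # → 55cbec7f-e7dc-4431-bce9-8ec1d61a7feb
# With a UUID in hand, on the SPM you would try (note the lowercase t):
#   vdsm-client Task getInfo taskID="$uuid"
#   vdsm-client Task stop taskID="$uuid"
```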
Re: [ovirt-users] vdsClient is removed and replaced by vdsm-client
Hi all, I can't find any exhaustive doc for the new vdsm-client. My problem actually is a blocked task (export a VM to OVA). I found that I can interact with vdsm-client Task getInfo taskID=, and replace getInfo by "stop", BUT : how can I find this UUID ??? The old "vdsClient -s 0 getAllTasksStatuses" has no equivalent ?? Does someone know if a complete doc exists for vdsm-client ? Thanks Frank On Wednesday, January 25, 2017 12:30 CET, Irit Goihman wrote: Hi All, vdsClient will be removed from the master branch today. It is using the XMLRPC protocol, which has been deprecated and replaced by JSON-RPC. A new client for vdsm was introduced in 4.1: vdsm-client. This is a simple client that uses the JSON-RPC protocol which was introduced in ovirt 3.5. The client is not aware of the available methods and parameters, and you should consult the schema [1] in order to construct the desired command. Future versions should parse the schema and provide online help. If you're using vdsClient, we will be happy to assist you in migrating to the new vdsm client. vdsm-client usage: vdsm-client [-h] [-a ADDRESS] [-p PORT] [--unsecure] [--timeout TIMEOUT] [-f FILE] namespace method [name=value [name=value] ...] Invoking simple methods: # vdsm-client Host getVMList['b3f6fa00-b315-4ad4-8108-f73da817b5c5'] For invoking methods with many or complex parameters, you can read the parameters from a JSON format file: # vdsm-client Lease info -f lease.json where the lease.json file content is: { "lease": { "sd_id": "75ab40e3-06b1-4a54-a825-2df7a40b93b2", "lease_id": "b3f6fa00-b315-4ad4-8108-f73da817b5c5" } } It is also possible to read parameters from standard input, creating complex parameters interactively: # cat
Re: [ovirt-users] VMs with multiple vdisks don't migrate
Hi Milan, I tried to activate the debug mode, but the restart of libvirt crashed something on the host : it was no longer possible to start any VM on it, and migration to it just never started. So I decided to restart it, and to be sure, I restarted all the hosts. And... now the migration of all VMs, simple or multi-disk, works ?!? So, there was probably something hidden that was reset or repaired by the global restart ! In French, we call that "tomber en marche" ;) So : solved. Thank you, and sorry for the time spent ! -- Regards, Frank Soyer Mob. 06 72 28 38 53 - Fix. 05 49 50 52 34 On Monday, February 26, 2018 12:59 CET, Milan Zamazal wrote: "fsoyer" writes: > I don't believe that this is related to a host, tests have been done from > victor source to ginger dest and ginger to victor. I don't see problems on storage > (gluster 3.12 native managed by ovirt), when VMs with a single disk from 20 to > 250G migrate without error in some seconds and with no downtime. The host itself may be fine, but libvirt/QEMU running there may expose problems, perhaps just for some VMs. According to your logs something is not behaving as expected on the source host during the faulty migration. > How can I enable this libvirt debug mode ? Set the following options in /etc/libvirt/libvirtd.conf (look for examples in comments there) - log_level=1 - log_outputs="1:file:/var/log/libvirt/libvirtd.log" and restart libvirt. Then /var/log/libvirt/libvirtd.log should contain the log. It will be huge, so I suggest enabling it only for the time of reproducing the problem. > -- > > Regards, > > Frank Soyer > > > > On Friday, February 23, 2018 09:56 CET, Milan Zamazal > wrote: > Maor Lipchuk writes: > >> I encountered a bug (see [1]) which contains the same error mentioned in >> your VDSM logs (see [2]), but I doubt it is related. > > Indeed, it's not related.
> > The error in vdsm_victor.log just means that the info gathering call > tries to access libvirt domain before the incoming migration is > completed. It's ugly but harmless. > >> Milan, maybe you have any advice to troubleshoot the issue? Will the >> libvirt/qemu logs can help? > > It seems there is something wrong on (at least) the source host. There > are no migration progress messages in the vdsm_ginger.log and there are > warnings about stale stat samples. That looks like problems with > calling libvirt – slow and/or stuck calls, maybe due to storage > problems. The possibly faulty second disk could cause that. > > libvirt debug logs could tell us whether that is indeed the problem and > whether it is caused by storage or something else. > >> I would suggest to open a bug on that issue so we can track it more >> properly. >> >> Regards, >> Maor >> >> >> [1] >> https://bugzilla.redhat.com/show_bug.cgi?id=1486543 - Migration leads to >> VM running on 2 Hosts >> >> [2] >> 2018-02-16 09:43:35,236+0100 ERROR (jsonrpc/7) [jsonrpc.JsonRpcServer] >> Internal server error (__init__:577) >> Traceback (most recent call last): >> File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 572, >> in _handle_request >> res = method(**params) >> File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in >> _dynamicMethod >> result = fn(*methodArgs) >> File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies >> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() >> File "/usr/share/vdsm/clientIF.py", line 454, in getAllVmIoTunePolicies >> 'current_values': v.getIoTune()} >> File "/usr/share/vdsm/virt/vm.py", line 2859, in getIoTune >> result = self.getIoTuneResponse() >> File "/usr/share/vdsm/virt/vm.py", line 2878, in getIoTuneResponse >> res = self._dom.blockIoTune( >> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, >> in __getattr__ >> % self.vmid) >> NotConnectedError: VM 
u'755cf168-de65-42ed-b22f-efe9136f7594' was not >> started yet or was shut down >> >> On Thu, Feb 22, 2018 at 4:22 PM, fsoyer wrote: >> >>> Hi, >>> Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger >>> (192.168.0.6) migrated (or failed to migrate...) to victor (192.168.0.5), >>> while the engine.log in the first mail on 2018-02-12 was for VMs standing >>> on victor, migrated (or failed to migrate...) to ginger. Symptoms were >>> exactly the same, in both directions, and VMs works like a charm before, >>> and even after (migration "killed
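Milan's libvirtd.conf settings above can be applied as below; this sketch works on a scratch copy so it is safe to run anywhere, while on a real host you would edit /etc/libvirt/libvirtd.conf itself and then restart libvirtd:

```shell
# Scratch copy; on a real host this would be /etc/libvirt/libvirtd.conf.
conf=libvirtd.conf.test
: > "$conf"
# Append the two debug-logging options from the advice above.
cat >> "$conf" <<'EOF'
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
EOF
grep -q '^log_level = 1$' "$conf" && echo "debug logging configured"
# Then: systemctl restart libvirtd
# Level-1 logs grow very fast; revert once the migration has been reproduced.
```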
Re: [ovirt-users] VMs with multiple vdisks don't migrate
Hi, I don't believe that this is related to a host, tests have been done from victor source to ginger dest and ginger to victor. I don't see problems on storage (gluster 3.12 native managed by ovirt), when VMs with a single disk from 20 to 250G migrate without error in some seconds and with no downtime. How can I enable this libvirt debug mode ? -- Regards, Frank Soyer On Friday, February 23, 2018 09:56 CET, Milan Zamazal wrote: Maor Lipchuk writes: > I encountered a bug (see [1]) which contains the same error mentioned in > your VDSM logs (see [2]), but I doubt it is related. Indeed, it's not related. The error in vdsm_victor.log just means that the info gathering call tries to access the libvirt domain before the incoming migration is completed. It's ugly but harmless. > Milan, maybe you have any advice to troubleshoot the issue? Will the > libvirt/qemu logs help? It seems there is something wrong on (at least) the source host. There are no migration progress messages in the vdsm_ginger.log and there are warnings about stale stat samples. That looks like problems with calling libvirt – slow and/or stuck calls, maybe due to storage problems. The possibly faulty second disk could cause that. libvirt debug logs could tell us whether that is indeed the problem and whether it is caused by storage or something else. > I would suggest to open a bug on that issue so we can track it more > properly.
> > Regards, > Maor > > > [1] > https://bugzilla.redhat.com/show_bug.cgi?id=1486543 - Migration leads to > VM running on 2 Hosts > > [2] > 2018-02-16 09:43:35,236+0100 ERROR (jsonrpc/7) [jsonrpc.JsonRpcServer] > Internal server error (__init__:577) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 572, > in _handle_request > res = method(**params) > File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in > _dynamicMethod > result = fn(*methodArgs) > File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies > io_tune_policies_dict = self._cif.getAllVmIoTunePolicies() > File "/usr/share/vdsm/clientIF.py", line 454, in getAllVmIoTunePolicies > 'current_values': v.getIoTune()} > File "/usr/share/vdsm/virt/vm.py", line 2859, in getIoTune > result = self.getIoTuneResponse() > File "/usr/share/vdsm/virt/vm.py", line 2878, in getIoTuneResponse > res = self._dom.blockIoTune( > File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, > in __getattr__ > % self.vmid) > NotConnectedError: VM u'755cf168-de65-42ed-b22f-efe9136f7594' was not > started yet or was shut down > > On Thu, Feb 22, 2018 at 4:22 PM, fsoyer wrote: > >> Hi, >> Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger >> (192.168.0.6) migrated (or failed to migrate...) to victor (192.168.0.5), >> while the engine.log in the first mail on 2018-02-12 was for VMs standing >> on victor, migrated (or failed to migrate...) to ginger. Symptoms were >> exactly the same, in both directions, and VMs works like a charm before, >> and even after (migration "killed" by a poweroff of VMs). >> Am I the only one experimenting this problem ? >> >> >> Thanks >> -- >> >> Cordialement, >> >> *Frank Soyer * >> >> >> >> Le Jeudi, Février 22, 2018 00:45 CET, Maor Lipchuk >> a écrit: >> >> >> Hi Frank, >> >> Sorry about the delay repond. 
>> I've been going through the logs you attached, although I could not find >> any specific indication why the migration failed because of the disk you >> were mentionning. >> Does this VM run with both disks on the target host without migration? >> >> Regards, >> Maor >> >> >> On Fri, Feb 16, 2018 at 11:03 AM, fsoyer wrote: >>> >>> Hi Maor, >>> sorry for the double post, I've change the email adress of my account and >>> supposed that I'd need to re-post it. >>> And thank you for your time. Here are the logs. I added a vdisk to an >>> existing VM : it no more migrates, needing to poweroff it after minutes. >>> Then simply deleting the second disk makes migrate it in exactly 9s without >>> problem ! >>> https://gist.github.com/fgth/4707446331d201eef574ac31b6e89561 >>> https://gist.github.com/fgth/f8de9c22664aee53722af676bff8719d >>> >>> -- >>> >>> Cordialement, >>> >>> *Frank Soyer * >>> Le Mercredi, Février 14, 2018 11:04 CET, Maor Lipchuk < >>> mlipc...@redhat.com> a écrit: >>
Re: [ovirt-users] ?==?utf-8?q? VMs with multiple vdisks don't migrate
Hi, Yes, on 2018-02-16 (vdsm logs) I tried with a VM standing on ginger (192.168.0.6) migrated (or failing to migrate...) to victor (192.168.0.5), while the engine.log in the first mail on 2018-02-12 was for VMs standing on victor, migrated (or failing to migrate...) to ginger. Symptoms were exactly the same, in both directions, and the VMs work like a charm before, and even after (migration "killed" by a poweroff of the VMs). Am I the only one experiencing this problem ? Thanks -- Regards, Frank Soyer On Thursday, February 22, 2018 00:45 CET, Maor Lipchuk wrote: Hi Frank, Sorry about the delayed response. I've been going through the logs you attached, although I could not find any specific indication why the migration failed because of the disk you were mentioning. Does this VM run with both disks on the target host without migration? Regards, Maor On Fri, Feb 16, 2018 at 11:03 AM, fsoyer wrote: Hi Maor, sorry for the double post, I've changed the email address of my account and supposed that I'd need to re-post it. And thank you for your time. Here are the logs. I added a vdisk to an existing VM : it no longer migrates, needing a poweroff after minutes. Then simply deleting the second disk makes it migrate in exactly 9s without problem ! https://gist.github.com/fgth/4707446331d201eef574ac31b6e89561 https://gist.github.com/fgth/f8de9c22664aee53722af676bff8719d -- Regards, Frank Soyer On Wednesday, February 14, 2018 11:04 CET, Maor Lipchuk wrote: Hi Frank, I already replied to your last email. Can you provide the VDSM logs from the time of the migration failure for both hosts: ginger.local.systea.fr and victor.local.systea.fr Thanks, Maor On Wed, Feb 14, 2018 at 11:23 AM, fsoyer wrote: Hi all, I discovered yesterday a problem when migrating VMs with more than one vdisk. On our test servers (oVirt 4.1, shared storage with Gluster), I created 2 VMs needed for a test, from a template with a 20G vdisk.
On these VMs I added a 100G vdisk (for these tests I didn't want to waste time extending the existing vdisks... but I lost time in the end...). The VMs with the 2 vdisks worked well. Then I saw some updates waiting on the host and tried to put it into maintenance... but it stalled on the two VMs: they were marked "migrating" but no longer accessible. Other (small) VMs with only 1 vdisk were migrated without problem at the same time. I saw that a kvm process for the (big) VMs was launched on the source AND the destination host, but after tens of minutes the migration and the VMs were still frozen. I tried to cancel the migration for the VMs: it failed. The only way to stop it was to power off the VMs: the kvm process died on both hosts and the GUI reported a failed migration. Just to see, I deleted the second vdisk on one of these VMs: it then migrated without error, and with no access problem. I extended the first vdisk of the second VM, then deleted its second vdisk: it now migrates without problem! So after another test with a VM with 2 vdisks, I can say that this is what blocked the migration process :(

In engine.log, for a VM with 1 vdisk migrating well, we see:

2018-02-12 16:46:29,705+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-28) [2f712024-5982-46a8-82c8-fd8293da5725] Lock Acquired to object 'EngineLock:{exclusiveLocks='[3f57e669-5e4c-4d10-85cc-d573004a099d=VM]', sharedLocks=''}'
2018-02-12 16:46:29,955+01 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-32) [2f712024-5982-46a8-82c8-fd8293da5725] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 3f57e669-5e4c-4d10-85cc-d573004a099d Type: VM
Action group MIGRATE_VM with role type USER
2018-02-12 16:46:30,261+01 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-32) [2f712024-5982-46a8-82c8-fd8293da5725] START, MigrateVDSCommand( MigrateVDSCommandParameters:{runAsync='true', hostId='ce3938b1-b23f-4d22-840a-f17d7cd87bb1', vmId='3f57e669-5e4c-4d10-85cc-d573004a099d', srcHost='192.168.0.6', dstVdsId='d569c2dd-8f30-4878-8aea-858db285cf69', dstHost='192.168.0.5:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='500', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500
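The convergenceSchedule embedded in the MigrateVDSCommand parameters above describes how the engine reacts to a stalling migration: it starts with a low allowed downtime and raises it step by step before giving up. A minimal sketch transcribing that schedule as plain data (field names are copied from the log line, not from oVirt source code; the `downtime_steps` helper is hypothetical, for illustration only; the values are, as far as I know, milliseconds, matching libvirt's max-downtime setting):

```python
# Sketch: the migration convergence schedule from the engine.log line above,
# transcribed as plain Python data. Each "stalling" entry fires after the
# migration has stalled `limit` times; limit=-1 means "finally abort".
convergence_schedule = {
    "init": [{"name": "setDowntime", "params": [100]}],
    "stalling": [
        {"limit": 1, "action": {"name": "setDowntime", "params": [150]}},
        {"limit": 2, "action": {"name": "setDowntime", "params": [200]}},
        {"limit": 3, "action": {"name": "setDowntime", "params": [300]}},
        {"limit": 4, "action": {"name": "setDowntime", "params": [400]}},
        {"limit": 6, "action": {"name": "setDowntime", "params": [500]}},
        {"limit": -1, "action": {"name": "abort", "params": []}},
    ],
}

def downtime_steps(schedule):
    """Return the successive allowed-downtime values, in order."""
    steps = [p for action in schedule["init"]
             if action["name"] == "setDowntime"
             for p in action["params"]]
    for entry in schedule["stalling"]:
        action = entry["action"]
        if action["name"] == "setDowntime":
            steps.extend(action["params"])
    return steps

print(downtime_steps(convergence_schedule))  # → [100, 150, 200, 300, 400, 500]
```

Read this way, the log says the engine would let downtime ramp from 100 up to 500 before reaching the abort action, which may be why a migration that never makes progress (as with the two-vdisk VMs) stays "migrating" for a long time before anything visibly fails.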
Re: [ovirt-users] VMs with multiple vdisks don't migrate