Hi, I have been seeing random failures of tests in different projects caused by vdsm monitoring.
I need someone from vdsm to please help debug this issue. >From what I can see, the test suspend/resume vm failed because we could not query the status of the vm on the host. you can see full log from failed tests here: https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/10208/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-004_basic_sanity.py/ Here are the errors that I can see in the vdsm which seem to suggest that there is an issue getting stats on the vm which was suspended and caused the failure of the test: 2018-09-11 00:40:05,896-0400 INFO (monitor/c1fe6e6) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:734) 2018-09-11 00:40:07,957-0400 DEBUG (qgapoller/1) [vds] Not sending QEMU-GA command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not supported (qemuguestagent:192) 2018-09-11 00:40:08,068-0400 DEBUG (periodic/3) [virt.sampling.VMBulkstatsMonitor] sampled timestamp 4296118.49 elapsed 0.010 acquired True domains all (sampling:443) 2018-09-11 00:40:08,271-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Image.prepare' in bridge with {u'allowIllegal': True, u'storagepoolID': u'e80a56d9-74da-498a-b010-4a9df287f11d', u'imageID': u'd4c831e6-02d2-4d89-b516-0ec4597 5e024', u'volumeID': u'15b07af1-625b-42e3-b62a-8e7c7a120a56', u'storagedomainID': u'f1744940-41b6-4d35-b7bf-870c4e07d995'} (__init__:329) 2018-09-11 00:40:10,846-0400 DEBUG (vmchannels) [virt.vm] (vmId='8214433a-f233-4aaa-aeda-2ce1d31c78dc') Guest connection timed out (guestagent:556) 2018-09-11 00:40:11,637-0400 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Calling 'Host.getStats' in bridge with {} (__init__:329) 2018-09-11 00:40:11,637-0400 INFO (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11 00:40:11,643-0400 DEBUG (jsonrpc/5) [root] cannot read eth0 speed (nic:42) 2018-09-11 00:40:11,645-0400 DEBUG (jsonrpc/5) [root] cannot read eth1 speed (nic:42) 2018-09-11 00:40:11,647-0400 DEBUG (jsonrpc/5) [root] cannot read eth2 speed (nic:42) 2018-09-11 00:40:11,649-0400 DEBUG (jsonrpc/5) [root] cannot read eth3 speed (nic:42) 2018-09-11 00:40:11,667-0400 INFO (jsonrpc/5) [api.host] FINISH getStats error=[Errno 22] Invalid argument from=::ffff:192.168.201.4,49184 (api:51) 2018-09-11 00:40:11,667-0400 ERROR (jsonrpc/5) [jsonrpc.JsonRpcServer] Internal server error (__init__:350) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request res = method(**params) File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod result = fn(*methodArgs) File "<string>", line 2, in getStats File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1407, in getStats multipath=True)} File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 49, in get_stats decStats = stats.produce(first_sample, last_sample) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 71, in produce stats.update(get_interfaces_stats()) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 153, in get_interfaces_stats return net_api.network_stats() File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats return netstats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 31, in report stats = link_stats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 41, in report speed = vlan.speed(i.device) File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line 36, in speed dev_speed = nic.read_speed_using_sysfs(dev_name) File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line 48, in read_speed_using_sysfs s = int(f.read()) IOError: [Errno 22] Invalid argument 2018-09-11 00:40:11,669-0400 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error -32603) in 0.03 seconds (__init__:312) 2018-09-11 00:40:14,656-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge with {} (__init__:329) 2018-09-11 00:40:14,657-0400 INFO (jsonrpc/2) [api.host] START getAllVmStats() from=::1,46772 (api:47) 2018-09-11 00:40:14,659-0400 INFO (jsonrpc/2) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,46772 (api:53) 2018-09-11 00:40:14,660-0400 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356) 2018-09-11 00:40:14,661-0400 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312) 2018-09-11 00:40:14,674-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmIoTunePolicies' in bridge with {} (__init__:329) 2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] START getAllVmIoTunePolicies() from=::1,46772 (api:47) 2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f', 'name': 'vda'}]}}} from=::1,46772 (api:53) 2018-09-11 00:40:14,675-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmIoTunePolicies' in bridge with {'8214433a-f233-4aaa-aeda-2ce1d31c78dc': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/f1744940-41b6-4d35-b7bf-870c4e07d995/images/9a3659dc-339e-4b3f-a1d1-6ee3debb6168/e2570635-a798-4e9e-a0ea-88436d5b296f', 'name': 'vda'}]}} (__init__:356) 2018-09-11 00:40:14,675-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:312) 2018-09-11 00:40:16,118-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Calling 'Host.getAllVmStats' in bridge with {} (__init__:329) 2018-09-11 00:40:16,119-0400 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:192.168.201.4,49184 (api:47) 2018-09-11 00:40:16,121-0400 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:192.168.201.4,49184 (api:53) 2018-09-11 00:40:16,122-0400 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] Return 'Host.getAllVmStats' in bridge with (suppressed) (__init__:356) 2018-09-11 00:40:16,122-0400 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.01 seconds (__init__:312) 2018-09-11 00:40:17,956-0400 DEBUG (qgapoller/2) [vds] Not sending QEMU-GA command 'guest-get-users' to vm_id='8214433a-f233-4aaa-aeda-2ce1d31c78dc', command is not supported (qemuguestagent:192) 2018-09-11 00:40:18,025-0400 DEBUG (periodic/3) [virt.periodic] Looking for stale paused VMs (periodic:388) I can also see this in Super vdsm which may be related but I an not 100% sure its related to current issue: MainProcess|jsonrpc/2::DEBUG::2018-09-11 00:33:20,012::cmdutils::159::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 ifup/on039d2d88d8ee4::DEBUG::2018-09-11 00:33:20,238::cmdutils::159::root::(exec_cmd) FAILED: <err> = 'Running scope as unit 2f1f417d-85b9-46c0-b7d7-6082836daaae.scope.\n/etc/sysconfig/network-scripts/ifup-eth: line 321: 20715 Terminated /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}\nDevice "on039d2d88d8ee4" does not exist.\nCannot find device "on039d2d88d8ee4"\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d 8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d 8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d 8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevic e "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\nDevice "on039d2d88d8ee4" does not exist.\n'; <rc> = 1 ifup/on039d2d88d8ee4::ERROR::2018-09-11 00:33:20,238::concurrent::202::root::(run) FINISH thread <Thread(ifup/on039d2d88d8ee4, stopped daemon 140281550731008)> failed Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 948, in _exec_ifup _exec_ifup_by_name(iface.name, cgroup) File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 934, in _exec_ifup_by_name raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '') ConfigNetworkError: (29, '\n') thanks, Dafna
_______________________________________________ Infra mailing list -- infra@ovirt.org To unsubscribe send an email to infra-le...@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/TECK7DKETJZ4E56GVW7GPQLJGJ7OZ6VV/