On Wed, Aug 4, 2021 at 1:56 PM Michal Skrivanek <michal.skriva...@redhat.com> wrote:
> I don’t really know for sure, but AFAICT it should be real data from the
> start.
> Maybe for the first interval, but afterwards it’s always a libvirt reported
> value

Adding Nir. Not sure who else... sorry.

This now happened again:

https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/2129/

Console has:

06:25:25 2021-08-05 03:25:25+0000,873 INFO [root] Starting the engine VM... (test_008_restart_he_vm:96)

broker.log has (I think it only logs once a minute):

Thread-4::INFO::2021-08-05 05:25:31,995::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load) System load total=0.8164, engine=0.0000, non-engine=0.8164
Thread-4::INFO::2021-08-05 05:26:32,072::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load) System load total=0.8480, engine=0.0000, non-engine=0.8480
Thread-4::INFO::2021-08-05 05:27:32,175::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load) System load total=0.7572, engine=0.2656, non-engine=0.4916

vdsm.log [1] has:

2021-08-05 05:25:29,017+0200 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'VM.create' in bridge...
2021-08-05 05:25:31,991+0200 DEBUG (jsonrpc/7) [api] FINISH getStats response={'status': {'code': 0, 'message': 'Done'}, 'statsList': [{'statusTime': '2152587436', 'status': 'WaitForLaunch', 'vmId': '230ea8e8-e365-46cd-98fa-e9d6a653306f', 'vmName': 'HostedEngine', 'vmType': 'kvm', 'kvmEnable': 'true', 'acpiEnable': 'true', 'elapsedTime': '2', 'monitorResponse': '0', 'clientIp': '', 'timeOffset': '0', 'cpuUser': '0.00', 'cpuSys': '0.00',...

and 17 more such [2] lines. Line 11 is the first one with cpuUser != 0.00, at '2021-08-05 05:27:02', 92 seconds later. Incidentally (or not), this is also the first line with 'network' in it. There are other differences along the way - e.g. I see status moving from WaitForLaunch to 'Powering up' and to 'Up', but the first 'Up' line is number 7 - 40 seconds before cpuUser>0.

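For anyone who wants to repeat this check on another run, the search above (getStats lines in vdsm.log, first one with cpuUser != 0.00) could be scripted roughly as below. This is only a sketch: the function name first_nonzero_cpu_user and the two regexes are made up for illustration, and they assume the line layout seen in the excerpt above (timestamp at the start of the line, 'cpuUser': '<value>' inside the getStats response).

```python
import re
from datetime import datetime

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm+ZZZZ ..." at the start of the line.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")
# Assumed layout of the stat inside the getStats response dict.
_CPU_USER = re.compile(r"'cpuUser': '([0-9.]+)'")


def first_nonzero_cpu_user(lines):
    """Find the first getStats line that reports cpuUser > 0.

    Returns (first_ts, nonzero_ts, index), where first_ts is the timestamp
    of the first getStats line carrying cpuUser, nonzero_ts is the timestamp
    of the first such line with cpuUser > 0, and index is the 1-based
    position of that line among the matching lines.
    Returns None if no line with cpuUser > 0 is found.
    """
    first_ts = None
    count = 0
    for line in lines:
        if "FINISH getStats" not in line:
            continue
        ts_match = _TS.match(line)
        cpu_match = _CPU_USER.search(line)
        if not ts_match or not cpu_match:
            continue
        count += 1
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S")
        if first_ts is None:
            first_ts = ts
        if float(cpu_match.group(1)) > 0:
            return first_ts, ts, count
    return None
```

Feeding it the lines of a downloaded vdsm.log (for example `first_nonzero_cpu_user(open("vdsm.log"))`) gives the two timestamps and the line index; subtracting the timestamps gives the delay before real CPU data shows up. It does not restrict the time window the way the grep in [2] does, so the log should be trimmed to the restart of interest first.
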
I'd like to clarify that I do not see this mainly as an OST issue, but more as a general HE HA issue - if users start global maint, then restart the engine vm, then exit global maint too quickly, the reported high cpu load might make the machine go down. In OST, I can easily just add another 60 seconds or so delay after the engine is up. Of course we can do the same also in HA, and I'd be for doing this, if we do not get any more information (or find out that this is a recently-introduced bug and fix it).

[1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/2129/artifact/exported-artifacts/test_logs/ost-he-basic-suite-master-host-0/var/log/vdsm/vdsm.log

[2] grep -i " 05:2[5678].*api. finish getStats.*cpuUser':"

Thanks and best regards,
--
Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/BUPXKHLOEQN3E5PM6LNFSKAVUYGPYDCF/