On Sun, Aug 8, 2021 at 10:14 AM Yedidyah Bar David <d...@redhat.com> wrote:
>
> On Thu, Aug 5, 2021 at 9:31 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 1:56 PM Michal Skrivanek
> > <michal.skriva...@redhat.com> wrote:
> > > I don’t really know for sure, but AFAICT it should be real data from the
> > > start.
> > > Maybe for the first interval, but afterwards it’s always a libvirt
> > > reported value
> >
> > Adding Nir. Not sure who else... sorry.
> >
> > This now happened again:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/2129/
> >
> > Console has:
> >
> > 06:25:25 2021-08-05 03:25:25+0000,873 INFO [root] Starting the
> > engine VM... (test_008_restart_he_vm:96)
> >
> > broker.log has (I think it only logs once a minute):
> >
> > Thread-4::INFO::2021-08-05
> > 05:25:31,995::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.8164, engine=0.0000, non-engine=0.8164
> > Thread-4::INFO::2021-08-05
> > 05:26:32,072::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.8480, engine=0.0000, non-engine=0.8480
> > Thread-4::INFO::2021-08-05
> > 05:27:32,175::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.7572, engine=0.2656, non-engine=0.4916
> >
> > vdsm.log [1] has:
> >
> > 2021-08-05 05:25:29,017+0200 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> > Calling 'VM.create' in bridge...
> >
> > 2021-08-05 05:25:31,991+0200 DEBUG (jsonrpc/7) [api] FINISH getStats
> > response={'status': {'code': 0, 'message': 'Done'}, 'statsList':
> > [{'statusTime': '2152587436', 'status': 'WaitForLaunch', 'vmId':
> > '230ea8e8-e365-46cd-98fa-e9d6a653306f', 'vmName': 'HostedEngine',
> > 'vmType': 'kvm', 'kvmEnable': 'true', 'acpiEnable': 'true',
> > 'elapsedTime': '2', 'monitorResponse': '0', 'clientIp': '',
> > 'timeOffset': '0', 'cpuUser': '0.00', 'cpuSys': '0.00',...
> >
> > and 17 more such [2] lines. Line 11 is the first one with cpuUser !=
> > 0.00, at '2021-08-05 05:27:02', 92 seconds later. Incidentally (or
> > not), this is also the first line with 'network' in it. There are
> > other differences along the way - e.g. I see status moving from
> > WaitForLaunch to 'Powering up' and to 'Up', but the first 'Up' line is
> > number 7 - 40 seconds before cpuUser>0.

Milan should be able to help with this.

In storage monitoring we avoid this issue by reporting actual=False
until we get the first monitoring results, so the engine can wait for
the actual results.
https://github.com/oVirt/vdsm/blob/4309a39492040300e1b983eb583e8847f5cc7538/lib/vdsm/storage/monitor.py#L297

> > I'd like to clarify that I do not see this mainly as an OST issue, but
> > more as a general HE HA issue - if users start global maint, then
> > restart the engine vm, then exit global maint too quickly, the
> > reported high cpu load might make the machine go down. In OST, I can
> > easily just add another 60 seconds or so delay after the engine is up.
> > Of course we can do the same also in HA, and I'd be for doing this, if
> > we do not get any more information (or find out that this is a
> > recently-introduced bug and fix it).

If this is a real issue you should be able to reproduce this on a real
system.

Nir
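
For illustration only, here is a rough sketch of how the actual=False idea from storage monitoring might carry over to the CPU load calculation discussed in this thread. It is not the vdsm or ovirt-hosted-engine-ha code: CpuStatus, CpuMonitor, engine_load and non_engine_load are hypothetical names, and the loads are assumed to be fractions like the ones in broker.log. The point is only that a consumer can tell "no sample yet" apart from a real reading of 0.00.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuStatus:
    # Hypothetical status record, not a vdsm or hosted-engine-ha type.
    cpu_user: float = 0.0
    cpu_sys: float = 0.0
    actual: bool = False  # stays False until the first real sample arrives


class CpuMonitor:
    """Hypothetical monitor whose status is a placeholder until updated."""

    def __init__(self):
        self._status = CpuStatus()

    def update(self, cpu_user, cpu_sys):
        # A real sample replaces the placeholder and marks the status actual.
        self._status = CpuStatus(cpu_user, cpu_sys, actual=True)

    def status(self):
        return self._status


def engine_load(monitor):
    """Return the engine VM load, or None while only placeholder data exists."""
    status = monitor.status()
    if not status.actual:
        return None
    return status.cpu_user + status.cpu_sys


def non_engine_load(total_load, monitor):
    """Return total minus engine load, or None if the engine load is unknown."""
    engine = engine_load(monitor)
    if engine is None:
        # Caller should wait instead of treating the engine load as zero.
        return None
    return max(total_load - engine, 0.0)
```

With something like this, a caller that gets None can skip scoring for that interval rather than attributing the whole system load to non-engine processes.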