On Sun, Aug 8, 2021 at 10:14 AM Yedidyah Bar David <d...@redhat.com> wrote:
>
> On Thu, Aug 5, 2021 at 9:31 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 1:56 PM Michal Skrivanek
> > <michal.skriva...@redhat.com> wrote:
> > > I don’t really know for sure, but AFAICT it should be real data from
> > > the start.
> > > Maybe for the first interval, but afterwards it’s always a
> > > libvirt-reported value.
> >
> > Adding Nir. Not sure who else... sorry.
> >
> > This now happened again:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/2129/
> >
> > Console has:
> >
> > 06:25:25 2021-08-05 03:25:25+0000,873 INFO    [root] Starting the
> > engine VM... (test_008_restart_he_vm:96)
> >
> > broker.log has (I think it only logs once a minute):
> >
> > Thread-4::INFO::2021-08-05
> > 05:25:31,995::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.8164, engine=0.0000, non-engine=0.8164
> > Thread-4::INFO::2021-08-05
> > 05:26:32,072::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.8480, engine=0.0000, non-engine=0.8480
> > Thread-4::INFO::2021-08-05
> > 05:27:32,175::cpu_load_no_engine::126::cpu_load_no_engine.CpuLoadNoEngine::(calculate_load)
> > System load total=0.7572, engine=0.2656, non-engine=0.4916
> >
> > vdsm.log [1] has:
> >
> > 2021-08-05 05:25:29,017+0200 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> > Calling 'VM.create' in bridge...
> >
> > 2021-08-05 05:25:31,991+0200 DEBUG (jsonrpc/7) [api] FINISH getStats
> > response={'status': {'code': 0, 'message': 'Done'}, 'statsList':
> > [{'statusTime': '2152587436', 'status': 'WaitForLaunch', 'vmId':
> > '230ea8e8-e365-46cd-98fa-e9d6a653306f', 'vmName': 'HostedEngine',
> > 'vmType': 'kvm', 'kvmEnable': 'true', 'acpiEnable': 'true',
> > 'elapsedTime': '2', 'monitorResponse': '0', 'clientIp': '',
> > 'timeOffset': '0', 'cpuUser': '0.00', 'cpuSys': '0.00',...
> >
> > and 17 more such [2] lines. Line 11 is the first one with cpuUser !=
> > 0.00, at '2021-08-05 05:27:02', 92 seconds later. Incidentally (or
> > not), this is also the first line with 'network' in it. There are
> > other differences along the way - e.g. I see status moving from
> > WaitForLaunch to 'Powering up' and to 'Up', but the first 'Up' line is
> > number 7 - 40 seconds before cpuUser>0.

Milan should be able to help with this.

In storage monitoring we avoid this issue by reporting actual=False
until we get the first monitoring results, so the engine can wait for
the actual results:
https://github.com/oVirt/vdsm/blob/4309a39492040300e1b983eb583e8847f5cc7538/lib/vdsm/storage/monitor.py#L297
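
To illustrate the idea, here is a minimal sketch of the same pattern applied
to VM cpu stats. This is not vdsm code: the class VmCpuStats, the helper
engine_cpu_load and the 'actual' key in the VM stats are all hypothetical
names, used only to show how a consumer like the HA broker could tell
"no data yet" apart from a real 0.00 reading.

```python
import threading


class VmCpuStats:
    """Hypothetical holder for per-VM cpu stats (not vdsm's real class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sample = None  # no monitoring result collected yet

    def update(self, cpu_user, cpu_sys):
        # Called by the (assumed) sampling thread once real values exist.
        with self._lock:
            self._sample = (cpu_user, cpu_sys)

    def report(self):
        # Before the first sample, report placeholders and mark them as
        # not actual, so the caller knows they are not measurements.
        with self._lock:
            if self._sample is None:
                return {"actual": False, "cpuUser": "0.00", "cpuSys": "0.00"}
            cpu_user, cpu_sys = self._sample
            return {
                "actual": True,
                "cpuUser": "%.2f" % cpu_user,
                "cpuSys": "%.2f" % cpu_sys,
            }


def engine_cpu_load(stats):
    """Return the engine VM load, or None if it is not known yet."""
    if not stats["actual"]:
        return None
    return float(stats["cpuUser"]) + float(stats["cpuSys"])
```

With something like this, the caller can skip the load calculation while
engine_cpu_load() returns None, instead of treating the placeholder 0.00
as "the engine vm uses no cpu" and attributing the entire host load to
non-engine processes.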

> > I'd like to clarify that I do not see this mainly as an OST issue, but
> > more as a general HE HA issue - if users start global maint, then
> > restart the engine vm, then exit global maint too quickly, the
> > reported high cpu load might make the machine go down. In OST, I can
> > easily just add another 60 seconds or so delay after the engine is up.
> > Of course we can do the same also in HA, and I'd be for doing this, if
> > we do not get any more information (or find out that this is a
> > recently-introduced bug and fix it).

If this is a real issue, you should be able to reproduce it on a real system.

Nir