----- Original Message -----
> From: "Simone Tiraboschi" <stira...@redhat.com>
> To: "Nir Soffer" <nsof...@redhat.com>
> Cc: devel@ovirt.org, "Fabian Deutsch" <fdeut...@redhat.com>
> Sent: Friday, May 29, 2015 6:42:08 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
>
>
> ----- Original Message -----
> > From: "Nir Soffer" <nsof...@redhat.com>
> > To: "Simone Tiraboschi" <stira...@redhat.com>
> > Cc: devel@ovirt.org, "Fabian Deutsch" <fdeut...@redhat.com>
> > Sent: Friday, May 29, 2015 5:26:52 PM
> > Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> >
> > ----- Original Message -----
> > > From: "Simone Tiraboschi" <stira...@redhat.com>
> > > To: devel@ovirt.org
> > > Cc: "Fabian Deutsch" <fdeut...@redhat.com>
> > > Sent: Friday, May 29, 2015 1:44:02 PM
> > > Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> > >
> > > Hi,
> > > I tried to have hosted-engine deploy the engine appliance on oVirt node.
> > > I think it will be quite a common scenario.
> > > I tried with an oVirt node build from yesterday.
> > >
> > > Unfortunately I'm not able to complete the setup because oVirt node gets
> > > its CPU load indefinitely stuck at 100%, so it's almost unresponsive.
> > >
> > > The issue seems to be related to the vdsmd daemon, which can't really
> > > start and so retries indefinitely, using all the available CPU power
> > > (it also runs with niceness -20...).
> > >
> > > [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> > > 368
> > > It tried 368 times in a row in a few minutes.
> > >
> > > With journalctl I can read:
> > > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> > > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> > > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> > > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > (repeated many times)
> > >
> > > /var/log/vdsm/vdsm.log is empty,
> > >
> > > while
> > > [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > > 1
> >
> > Can you try to run vdsm manually from the shell?
> >
> > # /usr/share/vdsm/vdsm
> >
> > Typically you would see a Python traceback explaining the failure.
>
> I tried and it just fails.
> Exit code is 1

Can you show an strace of the failure?

# strace /usr/share/vdsm/vdsm

Nir
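
As a side note on the restart loop in the quoted report: below is a rough sketch of how the failures could be bucketed per minute, rather than only counted in total with grep and wc -l. The function name failures_per_minute is made up for illustration, and the sketch assumes the traditional syslog timestamp layout seen in the quoted log (month, day and HH:MM:SS as the first three fields).

```shell
# Hypothetical helper, not part of vdsm or systemd.
# Reads syslog-style lines on stdin and prints "<count> <month> <day> <HH:MM>"
# for every minute in which vdsmd.service entered the failed state.
failures_per_minute() {
    awk '/Unit vdsmd\.service entered failed state\./ {
             # Assumes fields 1-3 are month, day and HH:MM:SS.
             minute = $1 " " $2 " " substr($3, 1, 5)
             count[minute]++
         }
         END { for (m in count) print count[m], m }' |
        sort -k2,4   # order by timestamp fields (month names sort as text)
}
```

It reads from stdin, so on the node it could be fed with something like "failures_per_minute < /var/log/messages". A roughly constant count per minute would suggest the service is being restarted at a fixed interval.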