Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Simone Tiraboschi"
> To: devel@ovirt.org
> Cc: "Fabian Deutsch"
> Sent: Friday, May 29, 2015 1:44:02 PM
> Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> Hi,
> I tried to have hosted-engine deploy the engine appliance over oVirt node.
> I think it will be quite a common scenario.
> I tried with an oVirt node build from yesterday.
>
> Unfortunately I'm not able to complete the setup because oVirt node got the CPU
> load indefinitely stuck on 100%, so it's almost unresponsive.
>
> The issue seems to be related to the vdsmd daemon, which couldn't really start
> and so retries indefinitely, using all the available CPU power (it also runs
> with niceness -20...).
>
> [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> 368
> It tried 368 times in a row in a few minutes.
>
> With journalctl I can read:
> May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> (repeated many times)
>
> /var/log/vdsm/vdsm.log is empty,
> while
> [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> 1

Can you try to run vdsm manually from the shell?

# /usr/share/vdsm/vdsm

Typically you would see a Python traceback explaining the failure.

Nir
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
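The restart count in the report above comes from a one-off grep. A minimal sketch of the same check as a reusable helper follows; the function name `count_unit_failures` is hypothetical, and it assumes the log file contains exactly the systemd message quoted in the report.

```shell
# Count how often a systemd unit entered the failed state, according to a
# syslog-style file. Hypothetical helper, not part of vdsm or oVirt node.
count_unit_failures() {
    unit="$1"
    logfile="$2"
    # -F matches the message literally, -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the helper
    # usable under "set -e"; it still prints 0 in that case.
    grep -cF -- "Unit ${unit} entered failed state." "$logfile" || true
}

# Usage on the node (path taken from the report):
#   count_unit_failures vdsmd.service /var/log/messages
```

Running it twice a minute apart gives a rough restart rate, which is enough to tell a tight restart loop from an occasional failure.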
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Nir Soffer"
> To: "Simone Tiraboschi"
> Cc: devel@ovirt.org, "Fabian Deutsch"
> Sent: Friday, May 29, 2015 5:26:52 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> Can you try to run vdsm manually from the shell?
>
> # /usr/share/vdsm/vdsm
>
> Typically you would see a python traceback explaining the failure.

I tried and it just fails. The exit code is 1.
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Simone Tiraboschi"
> To: "Nir Soffer"
> Cc: devel@ovirt.org, "Fabian Deutsch"
> Sent: Friday, May 29, 2015 6:42:08 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> > Can you try to run vdsm manually from the shell?
> >
> > # /usr/share/vdsm/vdsm
> >
> > Typically you would see a python traceback explaining the failure.
>
> I tried and it just fails.
> Exit code is 1

Can you show an strace of the failure?

# strace /usr/share/vdsm/vdsm

Nir
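Raw strace output tends to be long, so it can help to narrow it down to the calls that were refused. The sketch below assumes the failure is permission related, which the thread has not established at this point; the helper name `denied_paths` and the trace file path in the usage comment are hypothetical.

```shell
# Print the unique paths for which a system call returned EACCES in a file
# written by "strace -o". Hypothetical helper for illustration only.
denied_paths() {
    tracefile="$1"
    # Keep only the refused calls, then extract the first quoted argument,
    # which is the path for open()/openat()/access() style calls.
    grep -F 'EACCES' "$tracefile" \
        | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
        | LC_ALL=C sort -u
}

# Usage on the node (-f follows child processes, -o writes the trace to a file):
#   strace -f -o /tmp/vdsm.strace /usr/share/vdsm/vdsm
#   denied_paths /tmp/vdsm.strace
```

An empty result would suggest looking for other errno values (for example ENOENT) in the same trace file instead.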
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Nir Soffer"
> To: "Simone Tiraboschi"
> Cc: devel@ovirt.org, "Fabian Deutsch"
> Sent: Friday, May 29, 2015 5:45:48 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> [...]
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
> [...]
> /var/log/vdsm/vdsm.log is empty.
>
> while
> [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> 1

Thanks for the report, Simone. From my tests, you are facing:

non-root user cannot `from ovirtnode import ovirtfunctions`: permission denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
https://bugzilla.redhat.com/show_bug.cgi?id=1224400

We should handle this bug very soon. The workaround is to run chmod o+rw on /var/log/ovirt.log and /var/log/ovirt-node.log.

--
Cheers
Douglas
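Before and after applying the workaround, the mode bits on the two log files can be checked with a small helper. This is a sketch under assumptions: `others_can_rw` is a hypothetical name, it relies on GNU `stat -c`, and it only inspects the permission bits for "others", so it says nothing about ACLs or SELinux.

```shell
# Succeed when "others" have both read and write permission on a file.
# Hypothetical helper; returns 2 when the file cannot be inspected.
others_can_rw() {
    perms=$(stat -c '%A' "$1") || return 2
    # %A prints a string such as "-rw-r--rw-": one type character, then three
    # characters each for user, group and others. Characters 8 and 9 are the
    # read and write bits for others.
    case "$perms" in
        ???????rw*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage on the node (file names taken from the bug description):
#   for f in /var/log/ovirt.log /var/log/ovirt-node.log; do
#       others_can_rw "$f" || echo "needs chmod o+rw: $f"
#   done
```

A more direct check is to attempt the import from the bug description as the unprivileged account itself, for example `sudo -u vdsm python -c 'from ovirtnode import ovirtfunctions'`; the account name `vdsm` is an assumption here, since the thread only says "non-root user".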
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
----- Original Message -----
> From: "Douglas Schilling Landgraf"
> To: "Simone Tiraboschi", devel@ovirt.org
> Cc: "Fabian Deutsch"
> Sent: Saturday, May 30, 2015 11:28:38 PM
> Subject: Re: oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> Thanks for the report Simone. From my tests you are facing:
>
> non-root user cannot `from ovirtnode import ovirtfunctions`: permission
> denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log
> https://bugzilla.redhat.com/show_bug.cgi?id=1224400
>
> We should handle this bug very soon. The workaround is chmod o+rw in
> /var/log/ovirt.log /var/log/ovirt-node.log

OK. I tried

[root@node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log

but now I'm getting:

[root@node36 admin]# systemctl status -l vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
  Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 4164 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─4164 /usr/bin/python /usr/share/vdsm/vdsm

Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_mo
Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
On 06/01/2015 03:56 AM, Simone Tiraboschi wrote:
> OK. I tried
>
> [root@node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log
>
> but now I'm getting:
>
> [root@node36 admin]# systemctl status -l vdsmd
> vdsmd.service - Virtual Desktop Server Manager
>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
>    Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
>   Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
>   Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
>  Main PID: 4164 (vdsm)
>    CGroup: /system.slice/vdsmd.service
>            └─4164 /usr/bin/python /usr/share/vdsm/vdsm
>
> Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
> Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
> Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_modules
> Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running tune_system
> Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_space
> Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_lo
> Jun 01 07:53:09 node36 systemd