Re: [Openstack] compute nodes down

2017-12-30 Thread Adam Lawson
Be careful: a compute node showing as down may not actually be down at all.
It may be that the agent is unable to report back, or that the conductor is
unable to update the DB. I was about to ask whether the VMs were offline or
not. Glad you got it figured out!

//adam

Re: [Openstack] compute nodes down

2017-12-29 Thread Jim Okken
I believe this issue turned out to be caused by the storage device we are
using to provide shared storage to each compute node.

It had an access issue, and access attempts to one instance's vHD files
hung forever and never timed out.
This makes sense for one node having nova issues. But could this cause the
nova services on all compute nodes to stop after some time? (In a shared
storage setup, does each node access/query each vHD on the storage
periodically?)

thanks!

-- Jim

Re: [Openstack] compute nodes down

2017-12-19 Thread Tobias Urdin
Enable debug in nova.conf and check the conductor and compute logs.

Check that your clocks are in sync with NTP; otherwise the age of the alive
checks (heartbeats) recorded in the database may exceed the
service_down_time config value.
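
To illustrate the clock point, here is a minimal Python sketch of a heartbeat
liveness check of the kind that service_down_time controls. The function
name, the 60-second threshold, and the timestamps are assumptions for
illustration, and this is a simplified model rather than nova's actual code.
It shows how a 90-second clock offset between the host that stamps the
heartbeat and the host that checks it can make a healthy service look down.

```python
from datetime import datetime, timedelta


def is_service_up(last_heartbeat, now, service_down_time=60):
    """Return True if the heartbeat is at most service_down_time seconds old.

    Simplified model (assumption): the check compares the stored heartbeat
    timestamp with the checker's own clock.
    """
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time


# Hypothetical moment at which a heartbeat is really written.
true_time = datetime(2017, 12, 18, 22, 42, 0)
# The checking host looks at the record 5 seconds later.
checker_now = true_time + timedelta(seconds=5)

# Clocks in sync: the heartbeat appears 5 seconds old.
in_sync = is_service_up(true_time, checker_now)

# Stamping clock 90 seconds behind: the same heartbeat appears 95 seconds old.
skewed = is_service_up(true_time - timedelta(seconds=90), checker_now)

print(in_sync, skewed)  # prints: True False
```

With the clocks in sync the service counts as up; with the offset it counts
as down even though it reported only 5 seconds ago.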

Re: [Openstack] compute nodes down

2017-12-18 Thread Paras pradhan
It might not be the compute nodes. I would check the rabbitmq, neutron and
nova logs on the controllers.


Paras

Re: [Openstack] compute nodes down

2017-12-18 Thread Volodymyr Litovka

Hi Jim,

Switch debug to true in nova.conf and *also* check the other logs:
nova-scheduler, nova-placement, nova-conductor.
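
When going through the logs it can help to look beyond ERROR lines. Below is
a rough Python sketch, assuming the log lines start with the
timestamp / PID / level / logger prefix seen in the excerpt in the original
post (with the journal prefix removed). The sample lines are copied from
that excerpt; the helper name summarize is made up for illustration.

```python
import re
from collections import Counter

# Prefix as seen in this thread, for example:
# "2017-12-18 22:31:47.633 32196 WARNING nova.scheduler.client.report [...] msg"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<rest>.*)$"
)


def summarize(lines, notable_levels=("WARNING", "ERROR", "CRITICAL")):
    """Count lines per log level and collect those at a notable level."""
    counts = Counter()
    notable = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip continuation lines, tracebacks, blank lines
        counts[match["level"]] += 1
        if match["level"] in notable_levels:
            notable.append((match["ts"], match["level"], match["logger"], match["rest"]))
    return counts, notable


SAMPLE = [
    "2017-12-18 22:31:47.605 32196 INFO nova.compute.resource_tracker "
    "[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] "
    "Total usable vcpus: 40, total allocated vcpus: 0",
    "2017-12-18 22:31:47.632 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] "
    "received reply msg_id: ad32abe833f4440d86c15b911aa35c43 __call__ "
    "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296",
    "2017-12-18 22:31:47.633 32196 WARNING nova.scheduler.client.report "
    "[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] "
    "Unable to refresh my resource provider record",
]

counts, notable = summarize(SAMPLE)
print(dict(counts))
for ts, level, logger, rest in notable:
    print(ts, level, logger, rest)
```

To run it against a real file, pass an open file object instead of SAMPLE,
for example the /var/log/nova/nova-compute.log path shown in the systemctl
output in this thread.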


[Openstack] compute nodes down

2017-12-18 Thread Jim Okken
Hi list,

Hoping someone could shed some light on this issue I just started seeing
today.

All my compute nodes started showing as "Down" in the Horizon ->
Hypervisors -> Compute Nodes tab.


root@node-1:~# nova service-list
+-----+--------------+-------------------+------+---------+-------+------------------------+-----------------+
| Id  | Binary       | Host              | Zone | Status  | State | Updated_at             | Disabled Reason |
+-----+--------------+-------------------+------+---------+-------+------------------------+-----------------+
| 325 | nova-compute | node-9.mydom.com  | nova | enabled | down  | 2017-12-18T21:59:38.00 | -               |
| 448 | nova-compute | node-14.mydom.com | nova | enabled | up    | 2017-12-18T22:41:42.00 | -               |
| 451 | nova-compute | node-17.mydom.com | nova | enabled | up    | 2017-12-18T22:42:04.00 | -               |
| 454 | nova-compute | node-11.mydom.com | nova | enabled | up    | 2017-12-18T22:42:02.00 | -               |
| 457 | nova-compute | node-12.mydom.com | nova | enabled | up    | 2017-12-18T22:42:12.00 | -               |
| 472 | nova-compute | node-16.mydom.com | nova | enabled | down  | 2017-12-18T00:16:01.00 | -               |
| 475 | nova-compute | node-10.mydom.com | nova | enabled | down  | 2017-12-18T00:26:09.00 | -               |
| 478 | nova-compute | node-13.mydom.com | nova | enabled | down  | 2017-12-17T23:54:06.00 | -               |
| 481 | nova-compute | node-15.mydom.com | nova | enabled | up    | 2017-12-18T22:41:34.00 | -               |
| 484 | nova-compute | node-8.mydom.com  | nova | enabled | down  | 2017-12-17T23:55:50.00 | -               |


If I stop and then start nova-compute on the down nodes, the stop takes
several minutes and then the start is quick and fine. But after about
2 hours the nova-compute service shows as down again.
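
One way to read the service list above is to compare each down host's
Updated_at with the most recent Updated_at in the listing. A small Python
sketch, assuming the rows are copied by hand from the output above (host,
state, and Updated_at exactly as printed); the helper names are hypothetical.

```python
from datetime import datetime

# (host, state, updated_at) copied from the `nova service-list` output above.
ROWS = [
    ("node-9.mydom.com", "down", "2017-12-18T21:59:38.00"),
    ("node-14.mydom.com", "up", "2017-12-18T22:41:42.00"),
    ("node-17.mydom.com", "up", "2017-12-18T22:42:04.00"),
    ("node-11.mydom.com", "up", "2017-12-18T22:42:02.00"),
    ("node-12.mydom.com", "up", "2017-12-18T22:42:12.00"),
    ("node-16.mydom.com", "down", "2017-12-18T00:16:01.00"),
    ("node-10.mydom.com", "down", "2017-12-18T00:26:09.00"),
    ("node-13.mydom.com", "down", "2017-12-17T23:54:06.00"),
    ("node-15.mydom.com", "up", "2017-12-18T22:41:34.00"),
    ("node-8.mydom.com", "down", "2017-12-17T23:55:50.00"),
]


def parse_ts(text):
    """Parse an Updated_at value as printed in the table."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def staleness(rows):
    """Map each down host to the seconds between its last update and the newest one."""
    newest = max(parse_ts(ts) for _, _, ts in rows)
    return {
        host: int((newest - parse_ts(ts)).total_seconds())
        for host, state, ts in rows
        if state == "down"
    }


for host, seconds in sorted(staleness(ROWS).items(), key=lambda item: item[1]):
    hours, minutes = seconds // 3600, seconds % 3600 // 60
    print(f"{host}: last update {hours}h {minutes}m before the newest one")
```

Hosts whose last update is many hours old stopped reporting long ago, while
a host that is only minutes behind may have been restarted recently.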

I am not seeing any ERRORS in the nova logs.

I get this for the status of a node that is showing as "UP":



root@node-14:~# systemctl status nova-compute.service
● nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-12-18 21:57:10 UTC; 35min ago
     Docs: man:nova-compute(1)
  Process: 32193 ExecStartPre=/bin/chown nova:adm /var/log/nova (code=exited, status=0/SUCCESS)
  Process: 32190 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/lib/nova (code=exited, status=0/SUCCESS)
  Process: 32187 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)
 Main PID: 32196 (nova-compute)
   CGroup: /system.slice/nova-compute.service
           └─32196 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-compute.log

Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.570 32196 DEBUG oslo_messaging._drivers.amqpdriver [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id: 2877b9707da144f3a91e7b80e2705fb3 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.604 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 2877b9707da144f3a91e7b80e2705fb3 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.605 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Total usable vcpus: 40, total allocated vcpus: 0
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.606 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Final resource view: name=node-14.mydom.com phys_ram=128812MB used_ram=512MB phys_disk=6691GB used_disk=0GB total_vcpus=40 used_vcpus=0 pci_stats=[]
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.610 32196 DEBUG oslo_messaging._drivers.amqpdriver [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id: ad32abe833f4440d86c15b911aa35c43 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.632 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: ad32abe833f4440d86c15b911aa35c43 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.633 32196 WARNING nova.scheduler.client.report [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Unable to refresh my resource provider record
Dec