Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-16 Thread Torin Woltjer
I feel pretty dumb about this, but it was fixed by adding a rule to my security 
groups. I'm still very confused about some of the other behavior that I saw, 
but at least the problem is fixed now.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: Brian Haley 
Sent: 7/16/18 4:39 PM
To: torin.wolt...@granddial.com, thangam.ar...@gmail.com, jpetr...@coredial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage
On 07/16/2018 08:41 AM, Torin Woltjer wrote:
> $ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl
> http://169.254.169.254
>
> 404 Not Found
>
> The resource could not be found.
>

Strange, don't know where the reply came from for that.

> $ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl
> http://169.254.169.254
> curl: (7) Couldn't connect to server

Based on your iptables output below, I would think the metadata proxy is
running in the qrouter namespace. However, a curl from there will not
work since it is restricted to only work for incoming packets from the
qr- device(s). You would have to try curl from a running instance.

Is there an haproxy process running? And is it listening on port 9697
in the qrouter namespace?

-Brian
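
The two questions above could be checked with something like the following sketch. It assumes `ss` (iproute2) and `pgrep` are available on the network node and that the commands are run as root. The router ID is the one from this thread; `port_is_listening` and `check_metadata_proxy` are hypothetical helper names, not part of any OpenStack tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `ss -lnt` output on stdin shows a
# socket in LISTEN state whose local address ends in ":<port>".
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Hypothetical wrapper for the network node; the namespace name is the
# qrouter namespace quoted in this thread.
check_metadata_proxy() {
    ns="qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e"
    pgrep -a haproxy || echo "no haproxy process found"
    if ip netns exec "$ns" ss -lnt | port_is_listening 9697; then
        echo "something is listening on 9697 in $ns"
    else
        echo "nothing is listening on 9697 in $ns"
    fi
}
```

Calling `check_metadata_proxy` prints any haproxy processes and then reports whether a TCP listener on 9697 exists inside the namespace.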



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-16 Thread Torin Woltjer
$ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
404 Not Found
The resource could not be found.

$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
curl: (7) Couldn't connect to server



Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-12 Thread Torin Woltjer
Checking iptables for the metadata-proxy inside of qrouter provides the 
following:
$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x
Packets:Bytes are both 0, so no traffic is touching this rule?
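
The leading `[packets:bytes]` pair in `iptables-save -c` output can be pulled out with a small filter, sketched below. `rule_counters` is a hypothetical helper name; it only parses text read from stdin. Zero counters mean no packet has matched the rule since the counters were last reset. Rule matching in PREROUTING happens before a packet is handed to a local socket, so a missing listener on 9697 would not by itself keep these counters at zero.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables-save -c` output on stdin and print
# "<packets> <bytes> <rule>" for every rule that mentions the given string.
rule_counters() {
    grep -F -- "$1" | sed -n 's/^\[\([0-9]*\):\([0-9]*\)\] \(.*\)$/\1 \2 \3/p'
}
```

For example, `ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | rule_counters 169.254.169.254` would list the metadata rules with their counters split out, which makes it easy to rerun after a curl attempt from an instance and see whether the numbers move.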

Interestingly the command:
$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697
returns nothing, so there isn't actually anything running on 9697 in the 
network namespace...

This is the output without grep:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address  Foreign Address  State  User  Inode  PID/Program name
raw   0      0      0.0.0.0:112    0.0.0.0:*        7      0     76154  8404/keepalived
raw   0      0      0.0.0.0:112    0.0.0.0:*        7      0     76153  8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags  Type   State  I-Node  PID/Program name  Path
unix  2      [ ]    DGRAM         64501   7567/python2
unix  2      [ ]    DGRAM         79953   8403/keepalived

Could the reason that no traffic is touching the rule be that nothing is
listening on that port, or is there a second issue down the chain?

Curl fails even after restarting the neutron-dhcp-agent and
neutron-metadata-agent.

Thank you for this, and any future help.




Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-12 Thread John Petrini
You might want to try giving the neutron-dhcp and metadata agents a restart.


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-12 Thread Torin Woltjer
I tested this on two instances. The first instance has existed since before I 
began having this issue. The second is created from a cirros test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100.
curl returns information, for example;
`curl http://169.254.169.254/latest/meta-data/public-keys`
0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails;
`curl http://169.254.169.254/latest/meta-data`
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why one instance is able to connect but not the other. Both
instances were running on the same compute node.
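
The comparison between the two instances can be made repeatable with a sketch like the one below, run inside each instance. It assumes `ip` and `curl` are present in the guest image. `describe_curl_exit` and `probe_metadata` are hypothetical helper names; exit statuses 7 and 28 are curl's documented "failed to connect" and "operation timed out" codes.

```shell
#!/bin/sh
# Map a curl exit status to a short diagnosis.
describe_curl_exit() {
    case "$1" in
        0)  echo "metadata service answered" ;;
        7)  echo "could not connect (refused, unreachable, or connect timed out)" ;;
        28) echo "request timed out" ;;
        *)  echo "curl failed with exit status $1" ;;
    esac
}

# Hypothetical probe: show the route the guest would use, then try the
# endpoint with a bounded wait so the command cannot hang indefinitely.
probe_metadata() {
    ip route get 169.254.169.254
    curl --silent --show-error --max-time 10 \
        http://169.254.169.254/latest/meta-data/ >/dev/null
    describe_curl_exit "$?"
}
```

Running `probe_metadata` on both instances gives the selected route and a one-line verdict for each, which is easier to compare than raw curl output.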



Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-12 Thread John Petrini
Are your instances receiving a route to the metadata service
(169.254.169.254) from DHCP? Can you curl the endpoint?
curl http://169.254.169.254/latest/meta-data


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-11 Thread Arun Kumar
Hi Torin,

> If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp
> netstat -lnp` on the controller, should I see anything listening on the
> metadata port (8775)? When I run these commands I don't see that listening,
> but I have no example of a working system to check against. Can anybody
> verify this?
>

You won't see port 8775 in either the qrouter or the qdhcp namespace.
Instead, check whether the metadata service is running on the neutron
controller node(s) and listening on port 8775. You can also verify the
metadata and neutron services using the following commands:

service neutron-metadata-agent status
neutron agent-list
netstat -ntplua | grep :8775


Thanks & Regards
Arun

ஃஃ
With affection,
Arun
Let us make technology flourish in our language
http://thangamaniarun.wordpress.com
ஃஃ


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-11 Thread Torin Woltjer
If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat 
-lnp` on the controller, should I see anything listening on the metadata port 
(8775)? When I run these commands I don't see that listening, but I have no 
example of a working system to check against. Can anybody verify this?

Thanks,

Torin Woltjer


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-10 Thread Torin Woltjer
DHCP is working again, so instances are getting their addresses. For some
reason cloud-init isn't working correctly: hostnames aren't getting set, and
SSH key pairs aren't getting set. Is the neutron-metadata service in control
of this?

neutron-metadata-agent.log:
2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0622332
2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0645461
2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0659041
2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0618532
2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0636070
2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0611560
2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404  len: 195 time: 0.0631371
2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404  len: 195 time: 0.0609179
2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404  len: 195 time: 0.0597739

No other log files show abnormal behavior.
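
A log excerpt like the one above can be condensed into a count of requests per client and status, as in the sketch below. It assumes each entry is a single line in the actual log file and carries the `[-]` context field seen in the excerpt. `summarize_metadata_log` is a hypothetical helper name, and the log path in the usage note is an assumption about a typical package install.

```shell
#!/bin/sh
# Hypothetical helper: read neutron-metadata-agent log lines on stdin and
# print "<count> <client> <status>", most frequent first.
summarize_metadata_log() {
    awk '
        $5 == "eventlet.wsgi.server" && $6 == "[-]" {
            for (i = 8; i < NF; i++) {
                if ($i == "status:") {
                    client = $7
                    sub(/,$/, "", client)      # drop the trailing comma
                    count[client " " $(i + 1)]++
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}
```

Usage would look like `summarize_metadata_log < /var/log/neutron/neutron-metadata-agent.log`, which shows at a glance which clients are reaching the agent and what status they receive.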


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-06 Thread Torin Woltjer
I explored creating a second "selfservice" vxlan to see if DHCP would work on 
it as it does on my external "provider" network. The new vxlan network shares 
the same problems as the old vxlan network. Am I having problems with VXLAN in 
particular?


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-06 Thread Torin Woltjer
Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but
pinging DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I
manually added the IP address to has a floating IP, and oddly enough I am able
to ping DHCP on the provider network, which suggests that DHCP may be working
on other networks but not on my selfservice network. I confirmed this by
creating a new virtual machine directly on the provider network: I was able to
ping it and SSH into it right off the bat, as it obtained the proper address
on its own.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. 
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18

I've done system restarts since the power outage and the agent hasn't corrected
itself. I've restarted all neutron services as I've gone along; I could also
try stopping and starting dnsmasq.
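
Entries in the `mac,hostname,ip` format shown above can be checked against a specific instance port with a small lookup, sketched below. `ip_for_mac` is a hypothetical helper name; it only parses lines read from stdin, and which file in the network's dhcp directory to feed it is left to the operator.

```shell
#!/bin/sh
# Hypothetical helper: read "mac,hostname,ip" lines on stdin and print the
# IP address recorded for the given MAC; fail if the MAC is not listed.
ip_for_mac() {
    awk -F, -v mac="$1" '
        tolower($1) == tolower(mac) { print $3; found = 1 }
        END { exit !found }
    '
}
```

With the listing above as input, `ip_for_mac fa:16:3e:3f:94:17` prints the address recorded for that port, and a non-zero exit status signals that the agent has no entry for the instance at all.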


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-06 Thread George Mihaiescu
Can you manually assign an IP address to a VM and once inside, ping the
address of the dhcp server?
That would confirm if there is connectivity at least.


Also, on the controller node where the dhcp server for that network is,
check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases"
and make sure there are entries corresponding to your instances.

In my experience, if neutron is broken after working fine (so excluding any
misconfiguration), then an agent is out of sync and a restart usually fixes
things.



On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer 
wrote:

> I have done tcpdumps on both the controllers and on a compute node.
> Controller:
> `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0
> -i ns-83d68c76-b8 port 67`
> `tcpdump -vnes0 -i any port 67`
> Compute:
> `tcpdump -vnes0 -i brqd85c2a00-a6 port 68`
>
> For the first command on the controller, there are no packets captured at
> all. The second command on the controller captures packets, but they don't
> appear to be relevant to openstack. The dump from the compute node shows
> constant requests are getting sent by openstack instances.
>
> In summary: DHCP requests are being sent, but are never received.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> * www.granddial.com *
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 4:50 PM
> *To*: torin.wolt...@granddial.com
> *Subject*: Re: [Openstack] Recovering from full outage
>
> cloud-init requires network connectivity by default in order to reach
> the metadata server for the hostname, ssh-key, etc.
>
> You can configure cloud-init to use the config-drive, but the lack of
> network connectivity will make the instance useless anyway, even though it
> will have your ssh-key and hostname...
>
> Did you check the things I told you?
>
> On Jul 5, 2018, at 16:06, Torin Woltjer 
> wrote:
>
> Are IP addresses set by cloud-init on boot? I noticed that cloud-init
> isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image
> to test with; the hostname was not set to the name of the instance, and I
> could not log in as the users I had specified in the configuration.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 12:57 PM
> *To*: torin.wolt...@granddial.com
> *Cc*: "openst...@lists.openstack.org", "openstack-operators@lists.openstack.org"
> *Subject*: Re: [Openstack] Recovering from full outage
> You should tcpdump inside the qdhcp namespace to see if the requests make
> it there, and also check iptables rules on the compute nodes for the return
> traffic.
>
>
> On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> Yes, I've done this. The VMs hang for a while waiting for DHCP and
>> eventually come up with no addresses. neutron-dhcp-agent has been restarted
>> on both controllers. The qdhcp netns's were all present; I stopped the
>> service, removed the qdhcp netns's, noted the dhcp agents show offline by
>> `neutron agent-list`, restarted all neutron services, noted the qdhcp
>> netns's were recreated, restarted a VM again and it still fails to pull an
>> IP address.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/5/18 10:38 AM
>> *To*: torin.wolt...@granddial.com
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Did you restart the neutron-dhcp-agent and reboot the VMs?
>>
>> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
>> torin.wolt...@granddial.com> wrote:
>>
>>> The qrouter netns appears once the lock_path is specified, and the neutron
>>> router is pingable as well. However, instances are not pingable. If I log
>>> in via console, the instances have not been given IP addresses; if I
>>> manually give them an address and route, they are pingable and seem to work.
>>> So the router is working correctly but DHCP is not.
>>>
>>> No errors in any of the neutron or nova logs on controllers or compute
>>> nodes.
>>>
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *www.granddia

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-06 Thread Torin Woltjer
I have done tcpdumps on both the controllers and on a compute node.
Controller:
`ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i 
ns-83d68c76-b8 port 67`
`tcpdump -vnes0 -i any port 67`
Compute:
`tcpdump -vnes0 -i brqd85c2a00-a6 port 68`

For the first command on the controller, there are no packets captured at all. 
The second command on the controller captures packets, but they don't appear to 
be relevant to openstack. The dump from the compute node shows constant 
requests are getting sent by openstack instances.

In summary: DHCP requests are being sent, but are never received.
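
For anyone repeating these captures on another network, here is a small sketch that builds the same commands from a network ID. It assumes the Linux bridge agent naming seen above (namespace `qdhcp-<network-id>`, bridge `brq` plus the first 11 characters of the network ID); the function names are hypothetical, and the filter is widened to both DHCP ports (67 and 68), which is my own choice rather than what was run above.

```python
import shlex
from typing import Dict, List

# Match both DHCP server (67) and client (68) ports.
DHCP_FILTER = ["port", "67", "or", "port", "68"]


def dhcp_namespace(network_id: str) -> str:
    """Name of the DHCP namespace for a network."""
    return f"qdhcp-{network_id}"


def bridge_name(network_id: str) -> str:
    """Linux bridge name: 'brq' plus the first 11 characters of the ID."""
    return f"brq{network_id[:11]}"


def capture_commands(network_id: str, ns_device: str) -> Dict[str, List[str]]:
    """Build tcpdump argument lists for the controller and compute side."""
    tcpdump = ["tcpdump", "-vnes0", "-i"]
    return {
        "controller_namespace": ["ip", "netns", "exec", dhcp_namespace(network_id)]
        + tcpdump + [ns_device] + DHCP_FILTER,
        "compute_bridge": tcpdump + [bridge_name(network_id)] + DHCP_FILTER,
    }


if __name__ == "__main__":
    cmds = capture_commands("d85c2a00-a637-4109-83f0-7c2949be4cad", "ns-83d68c76-b8")
    for where, argv in cmds.items():
        print(where + ":", shlex.join(argv))
```

The `ns-` device name comes from the DHCP port rather than the network, so it is passed in rather than derived.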

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 4:50 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage

cloud-init requires network connectivity by default in order to reach the 
metadata server for the hostname, ssh-key, etc.

You can configure cloud-init to use the config-drive, but the lack of network 
connectivity will make the instance useless anyway, even though it will have 
your ssh-key and hostname...

Did you check the things I told you?

On Jul 5, 2018, at 16:06, Torin Woltjer  wrote:

Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't 
working on my VMs. I created a new instance from an Ubuntu 18.04 image to test 
with; the hostname was not set to the name of the instance, and I could not log in 
as the users I had specified in the configuration.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 12:57 PM
To: torin.wolt...@granddial.com
Cc: "openst...@lists.openstack.org" , 
"openstack-operators@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
You should tcpdump inside the qdhcp namespace to see if the requests make it 
there, and also check iptables rules on the compute nodes for the return 
traffic.

On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer  
wrote:
Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually 
come up with no addresses. neutron-dhcp-agent has been restarted on both 
controllers. The qdhcp netns's were all present; I stopped the service, removed 
the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, 
restarted all neutron services, noted the qdhcp netns's were recreated, 
restarted a VM again and it still fails to pull an IP address.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 10:38 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Did you restart the neutron-dhcp-agent and reboot the VMs?

On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer  
wrote:
The qrouter netns appears once the lock_path is specified, and the neutron router 
is pingable as well. However, instances are not pingable. If I log in via 
console, the instances have not been given IP addresses; if I manually give 
them an address and route, they are pingable and seem to work. So the router is 
working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just caus

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Pedro Sousa
Hi,

that could be a problem with neutron metadata service, check the logs.

Have you considered that the outage might have corrupted your databases,
neutron, nova, etc?

BR

On Thu, Jul 5, 2018 at 9:07 PM Torin Woltjer 
wrote:

> Are IP addresses set by cloud-init on boot? I noticed that cloud-init
> isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image
> to test with; the hostname was not set to the name of the instance, and I
> could not log in as the users I had specified in the configuration.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 12:57 PM
> *To*: torin.wolt...@granddial.com
> *Cc*: "openst...@lists.openstack.org" , "
> openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> *Subject*: Re: [Openstack] Recovering from full outage
> You should tcpdump inside the qdhcp namespace to see if the requests make
> it there, and also check iptables rules on the compute nodes for the return
> traffic.
>
>
> On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> Yes, I've done this. The VMs hang for a while waiting for DHCP and
>> eventually come up with no addresses. neutron-dhcp-agent has been restarted
>> on both controllers. The qdhcp netns's were all present; I stopped the
>> service, removed the qdhcp netns's, noted the dhcp agents show offline by
>> `neutron agent-list`, restarted all neutron services, noted the qdhcp
>> netns's were recreated, restarted a VM again and it still fails to pull an
>> IP address.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/5/18 10:38 AM
>> *To*: torin.wolt...@granddial.com
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Did you restart the neutron-dhcp-agent and reboot the VMs?
>>
>> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
>> torin.wolt...@granddial.com> wrote:
>>
>>> The qrouter netns appears once the lock_path is specified, and the neutron
>>> router is pingable as well. However, instances are not pingable. If I log
>>> in via console, the instances have not been given IP addresses; if I
>>> manually give them an address and route, they are pingable and seem to work.
>>> So the router is working correctly but DHCP is not.
>>>
>>> No errors in any of the neutron or nova logs on controllers or compute
>>> nodes.
>>>
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *www.granddial.com*
>>>
>>> --
>>> *From*: "Torin Woltjer" 
>>> *Sent*: 7/5/18 8:53 AM
>>> *To*: 
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>> There is no lock path set in my neutron configuration. Does it
>>> ultimately matter what it is set to as long as it is consistent? Does it
>>> need to be set on compute nodes as well as controllers?
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *www.granddial.com*
>>>
>>> --
>>> *From*: George Mihaiescu 
>>> *Sent*: 7/3/18 7:47 PM
>>> *To*: torin.wolt...@granddial.com
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>>
>>> Did you set a lock_path in neutron’s config?
>>>
>>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>>> wrote:
>>>
>>> The following errors appear in the neutron-linuxbridge-agent.log on both
>>> controllers: http://paste.openstack.org/show/724930/
>>>
>>> No such errors are on the compute nodes themselves.
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> *  

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't 
working on my VMs. I created a new instance from an Ubuntu 18.04 image to test 
with; the hostname was not set to the name of the instance, and I could not log in 
as the users I had specified in the configuration.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 12:57 PM
To: torin.wolt...@granddial.com
Cc: "openst...@lists.openstack.org" , 
"openstack-operators@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
You should tcpdump inside the qdhcp namespace to see if the requests make it 
there, and also check iptables rules on the compute nodes for the return 
traffic.

On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer  
wrote:
Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually 
come up with no addresses. neutron-dhcp-agent has been restarted on both 
controllers. The qdhcp netns's were all present; I stopped the service, removed 
the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, 
restarted all neutron services, noted the qdhcp netns's were recreated, 
restarted a VM again and it still fails to pull an IP address.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 10:38 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Did you restart the neutron-dhcp-agent and reboot the VMs?

On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer  
wrote:
The qrouter netns appears once the lock_path is specified, and the neutron router 
is pingable as well. However, instances are not pingable. If I log in via 
console, the instances have not been given IP addresses; if I manually give 
them an address and route, they are pingable and seem to work. So the router is 
working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
All of the instances can be started properly; however, I cannot ping any of 
the instances' floating IPs or the neutron router. When I log into an 
instance through the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble 
recovering the OpenStack cluster. All of the nodes are back online, and every 
instance shows active, but `virsh list --all` on the compute nodes shows that all 
of the VMs are actually shut down.

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread George Mihaiescu
You should tcpdump inside the qdhcp namespace to see if the requests make
it there, and also check iptables rules on the compute nodes for the return
traffic.


On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer 
wrote:

> Yes, I've done this. The VMs hang for a while waiting for DHCP and
> eventually come up with no addresses. neutron-dhcp-agent has been restarted
> on both controllers. The qdhcp netns's were all present; I stopped the
> service, removed the qdhcp netns's, noted the dhcp agents show offline by
> `neutron agent-list`, restarted all neutron services, noted the qdhcp
> netns's were recreated, restarted a VM again and it still fails to pull an
> IP address.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 10:38 AM
> *To*: torin.wolt...@granddial.com
> *Subject*: Re: [Openstack] Recovering from full outage
> Did you restart the neutron-dhcp-agent and reboot the VMs?
>
> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> The qrouter netns appears once the lock_path is specified, and the neutron
>> router is pingable as well. However, instances are not pingable. If I log
>> in via console, the instances have not been given IP addresses; if I
>> manually give them an address and route, they are pingable and seem to work.
>> So the router is working correctly but DHCP is not.
>>
>> No errors in any of the neutron or nova logs on controllers or compute
>> nodes.
>>
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/5/18 8:53 AM
>> *To*: 
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>> There is no lock path set in my neutron configuration. Does it ultimately
>> matter what it is set to as long as it is consistent? Does it need to be
>> set on compute nodes as well as controllers?
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/3/18 7:47 PM
>> *To*: torin.wolt...@granddial.com
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>>
>> Did you set a lock_path in neutron’s config?
>>
>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>> wrote:
>>
>> The following errors appear in the neutron-linuxbridge-agent.log on both
>> controllers: http://paste.openstack.org/show/724930/
>>
>> No such errors are on the compute nodes themselves.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> *www.granddial.com*
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/3/18 5:14 PM
>> *To*: 
>> *Cc*: "openstack-operators@lists.openstack.org" <
>> openstack-operators@lists.openstack.org>, "openst...@lists.openstack.org"
>> 
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Running `openstack server reboot` on an instance just causes the instance
>> to be stuck in a rebooting status. Most notable of the logs is
>> neutron-server.log which shows the following:
>> http://paste.openstack.org/show/724917/
>>
>> I realized that rabbitmq was in a failed state, so I bootstrapped it,
>> rebooted controllers, and all of the agents show online.

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually 
come up with no addresses. neutron-dhcp-agent has been restarted on both 
controllers. The qdhcp netns's were all present; I stopped the service, removed 
the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, 
restarted all neutron services, noted the qdhcp netns's were recreated, 
restarted a VM again and it still fails to pull an IP address.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 10:38 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Did you restart the neutron-dhcp-agent and reboot the VMs?

On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer  
wrote:
The qrouter netns appears once the lock_path is specified, and the neutron router 
is pingable as well. However, instances are not pingable. If I log in via 
console, the instances have not been given IP addresses; if I manually give 
them an address and route, they are pingable and seem to work. So the router is 
working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
All of the instances can be started properly; however, I cannot ping any of 
the instances' floating IPs or the neutron router. When I log into an 
instance through the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble 
recovering the OpenStack cluster. All of the nodes are back online, and every 
instance shows active, but `virsh list --all` on the compute nodes shows that all 
of the VMs are actually shut down. Running `ip addr` on any of the nodes shows 
that none of the bridges are present, and `ip netns` shows that all of the 
network namespaces are missing as well. So despite all of the neutron services 
running, none of the networking appears to be active, which is concerning. How 
do I solve this without recreating all of the networks?
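
To make the "namespaces are missing" check repeatable, here is a minimal sketch that takes the text printed by `ip netns` and lists which expected `qdhcp-` and `qrouter-` namespaces are absent. The function names are hypothetical, and the IDs passed in are assumed to come from `openstack network list` and `openstack router list`; a namespace is only expected on nodes that host the corresponding DHCP or L3 agent.

```python
from typing import Iterable, List, Set


def parse_netns(output: str) -> Set[str]:
    """Collect namespace names from `ip netns` output.

    Each line is a name optionally followed by "(id: N)", so only the
    first whitespace-separated token is kept.
    """
    return {line.split()[0] for line in output.splitlines() if line.strip()}


def missing_namespaces(output: str,
                       network_ids: Iterable[str],
                       router_ids: Iterable[str]) -> List[str]:
    """Return expected qdhcp-/qrouter- namespaces not present in the output."""
    present = parse_netns(output)
    expected = [f"qdhcp-{n}" for n in network_ids]
    expected += [f"qrouter-{r}" for r in router_ids]
    return [ns for ns in expected if ns not in present]


if __name__ == "__main__":
    # Hypothetical output in which only the DHCP namespace exists.
    sample = "qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad (id: 0)\n"
    print(missing_namespaces(
        sample,
        ["d85c2a00-a637-4109-83f0-7c2949be4cad"],
        ["80c3bc40-b49c-446a-926f-99811adc0c5e"],
    ))
```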

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
The qrouter netns appears once the lock_path is specified, and the neutron router 
is pingable as well. However, instances are not pingable. If I log in via 
console, the instances have not been given IP addresses; if I manually give 
them an address and route, they are pingable and seem to work. So the router is 
working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
All of the instances can be started properly; however, I cannot ping any of 
the instances' floating IPs or the neutron router. When I log into an 
instance through the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble 
recovering the OpenStack cluster. All of the nodes are back online, and every 
instance shows active, but `virsh list --all` on the compute nodes shows that all 
of the VMs are actually shut down. Running `ip addr` on any of the nodes shows 
that none of the bridges are present, and `ip netns` shows that all of the 
network namespaces are missing as well. So despite all of the neutron services 
running, none of the networking appears to be active, which is concerning. How 
do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?
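
One way to see whether a given node has the option set is to read its config file directly. The sketch below assumes `lock_path` lives in the `[oslo_concurrency]` section, which is where oslo.concurrency defines it; the function name and the sample path value are only illustrative. Note that Python's configparser treats `[DEFAULT]` specially, so a `lock_path` placed there is only reported if an `[oslo_concurrency]` section also exists.

```python
import configparser
from typing import Optional


def find_lock_path(conf_text: str) -> Optional[str]:
    """Return lock_path from [oslo_concurrency], or None if it is not set."""
    # interpolation=None: values may contain '%'; strict=False: tolerate
    # repeated options, which oslo-style config files can contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get("oslo_concurrency", "lock_path", fallback=None)


if __name__ == "__main__":
    # Illustrative config text; on a real node, read /etc/neutron/neutron.conf.
    sample = """
[DEFAULT]
debug = false

[oslo_concurrency]
lock_path = /var/lib/neutron/tmp
"""
    print(find_lock_path(sample))
```

Running the same check on the controllers and the compute nodes would at least show whether the setting is consistent across them.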

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
All of the instances can be started properly; however, I cannot ping any of 
the instances' floating IPs or the neutron router. When I log into an 
instance through the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble 
recovering the OpenStack cluster. All of the nodes are back online, and every 
instance shows active, but `virsh list --all` on the compute nodes shows that all 
of the VMs are actually shut down. Running `ip addr` on any of the nodes shows 
that none of the bridges are present, and `ip netns` shows that all of the 
network namespaces are missing as well. So despite all of the neutron services 
running, none of the networking appears to be active, which is concerning. How 
do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-03 Thread George Mihaiescu
Did you set a lock_path in neutron's config?
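
lock_path is an oslo.concurrency option. A minimal sketch of the setting, assuming neutron reads /etc/neutron/neutron.conf and that /var/lib/neutron/tmp (the value used in the OpenStack install guides, not something confirmed in this thread) exists and is writable by the neutron user:

```ini
[oslo_concurrency]
# Directory where neutron keeps its lock files.
# The path is an assumption; adjust it to your deployment.
lock_path = /var/lib/neutron/tmp
```

The neutron services would likely need a restart before a change like this takes effect.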

> On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:
> 
> The following errors appear in the neutron-linuxbridge-agent.log on both 
> controllers: http://paste.openstack.org/show/724930/
> 
> No such errors are on the compute nodes themselves.
> 
> Torin Woltjer
>  
> Grand Dial Communications - A ZK Tech Inc. Company
>  
> 616.776.1066 ext. 2006
> www.granddial.com
> 
> From: "Torin Woltjer" 
> Sent: 7/3/18 5:14 PM
> To: 
> Cc: "openstack-operators@lists.openstack.org", "openst...@lists.openstack.org"
> Subject: Re: [Openstack] Recovering from full outage
> Running `openstack server reboot` on an instance just causes the instance to 
> be stuck in a rebooting status. Most notable of the logs is 
> neutron-server.log which shows the following:
> http://paste.openstack.org/show/724917/
> 
> I realized that rabbitmq was in a failed state, so I bootstrapped it, 
> rebooted controllers, and all of the agents show online.
> http://paste.openstack.org/show/724921/
> All of the instances can be started properly; however, I cannot ping any 
> of the instances' floating IPs or the neutron router. When logging into an 
> instance through the console, there is no IP address on any interface.
> 
> Torin Woltjer
>  
> Grand Dial Communications - A ZK Tech Inc. Company
>  
> 616.776.1066 ext. 2006
> www.granddial.com
> 
> From: George Mihaiescu 
> Sent: 7/3/18 11:50 AM
> To: torin.wolt...@granddial.com
> Subject: Re: [Openstack] Recovering from full outage
> Try restarting them using "openstack server reboot" and also check the 
> nova-compute.log and neutron agents logs on the compute nodes.
> 
>> On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote:
>> We just suffered a power outage in our data center and I'm having trouble 
>> recovering the OpenStack cluster. All of the nodes are back online and every 
>> instance shows active, but `virsh list --all` on the compute nodes shows 
>> that all of the VMs are actually shut down. Running `ip addr` on any of the 
>> nodes shows that none of the bridges are present, and `ip netns` shows that 
>> all of the network namespaces are missing as well. So despite all of the 
>> neutron services running, none of the networking appears to be active, which 
>> is concerning. How do I solve this without recreating all of the networks?
>> 
>> Torin Woltjer
>>  
>> Grand Dial Communications - A ZK Tech Inc. Company
>>  
>> 616.776.1066 ext. 2006
>> www.granddial.com
>> 
> 


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-03 Thread Torin Woltjer
The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org", "openst...@lists.openstack.org"
Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
All of the instances can be started properly; however, I cannot ping any of 
the instances' floating IPs or the neutron router. When logging into an 
instance through the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote:
We just suffered a power outage in our data center and I'm having trouble 
recovering the OpenStack cluster. All of the nodes are back online and every 
instance shows active, but `virsh list --all` on the compute nodes shows that 
all of the VMs are actually shut down. Running `ip addr` on any of the nodes 
shows that none of the bridges are present, and `ip netns` shows that all of 
the network namespaces are missing as well. So despite all of the neutron 
services running, none of the networking appears to be active, which is 
concerning. How do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com



Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-03 Thread Jimmy McArthur
I'm adding this to the OpenStack Operators list as it's a bit better for 
these types of questions.


Torin Woltjer wrote:
We just suffered a power outage in our data center and I'm having 
trouble recovering the OpenStack cluster. All of the nodes are back 
online and every instance shows active, but `virsh list --all` on the 
compute nodes shows that all of the VMs are actually shut down. Running 
`ip addr` on any of the nodes shows that none of the bridges are 
present, and `ip netns` shows that all of the network namespaces are 
missing as well. So despite all of the neutron services running, none 
of the networking appears to be active, which is concerning. How do I 
solve this without recreating all of the networks?
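
The checks described above can be gathered into one small script. This is only a sketch: the `check` helper is a hypothetical name, the script assumes it is run as root on a compute or controller node, and `openstack network agent list` additionally assumes admin credentials are loaded in the environment. Tools that are not installed are skipped rather than treated as errors.

```shell
# Run one diagnostic command under a labelled heading.
# Skips the command if its binary is not installed on this node.
check() {
  name=$1
  shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $name"
    "$@" 2>&1 || echo "($name failed)"
  else
    echo "== $name: $1 not found, skipping"
  fi
}

# Are the VMs really running, or only marked active in nova?
check "libvirt domains" virsh list --all
# Are the bridges that the linuxbridge agent should create present?
check "bridges" ip -brief link show type bridge
# Are the qdhcp-/qrouter- namespaces present?
check "network namespaces" ip netns list
# Do the neutron agents report as alive?
check "neutron agents" openstack network agent list
```

Empty output under the "bridges" and "network namespaces" headings would match the symptoms reported here.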


Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com