Re: [Openstack] DHCP not accessible on new compute node.
I've just done this and the problem is still there.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Marcio Prado
Sent: 11/2/18 5:08 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

Clone the HD of a server that works and restore it to the one that is not working, then change only the required settings: IP, hostname, etc.

Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br

On 02/11/2018 16:27, Torin Woltjer wrote:
> I've completely wiped the node and reinstalled it, and the problem still
> persists. I can't ping instances on other compute nodes, or ping the DHCP
> ports. Instances don't get addresses or metadata when started on this node.

From: Marcio Prado
Sent: 11/1/18 9:51 AM
To: torin.wolt...@granddial.com
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I believe you have not forgotten anything. This is probably a bug. As my cloud is not production, but rather master's research, I live-migrate the VM to a node that is working, restart it, then migrate it back to the original node that was not working, and it keeps running.

On 30-10-2018 17:50, Torin Woltjer wrote:
> Interestingly, I created a brand new selfservice network and DHCP doesn't
> work on that either. I've followed the instructions in the minimal setup
> (excluding the controllers as they're already set up) but the new node has
> no access to the DHCP agent in neutron, it seems. Is there a likely
> component that I've overlooked?
>
> FROM: "Torin Woltjer"
> SENT: 10/30/18 10:48 AM
> TO: , "openstack@lists.openstack.org"
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> I deleted both DHCP ports and they were recreated as you said. However,
> instances are still unable to get network addresses automatically.
>
> FROM: Marcio Prado
> SENT: 10/29/18 6:23 PM
> TO: torin.wolt...@granddial.com
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> The port is recreated automatically. The problem, like I said, is not in
> DHCP, but for some reason deleting the port and waiting for OpenStack to
> re-create it often solves the problem.
>
> Please, if you can find out the actual problem, let me know. I'm very
> interested to know.
>
> You can delete the port without fear. OpenStack will recreate it in a
> short time.
>
> Links:
> [1] http://www.granddial.com

--
Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [Openstack] DHCP not accessible on new compute node.
So I did further ping tests and explored the differences between my working compute nodes and my non-working compute node.

Firstly, it seems that the VXLAN is working between the non-working compute node and the controller nodes. After manually setting IP addresses, I can ping from an instance on the non-working node to 172.16.1.1 (neutron gateway); when running tcpdump I can see ICMP on:
- compute's bridge interface
- compute's vxlan interface
- controller's vxlan interface
- controller's bridge interface
- controller's qrouter namespace

This behavior is expected and is the same for instances on the working compute nodes. However, if I try to ping 172.16.1.2 (neutron DHCP) from an instance on the non-working compute node, pings do not flow. If I use tcpdump to listen for pings, I cannot hear any, even listening on the compute node itself; this includes listening on the vxlan, the bridge, and the tap device directly. Once I try to ping in reverse, from the dhcp netns on the controller to the instance on the non-working compute node, pings begin to flow. The same is true for pings between the instance on the non-working compute node and an instance on a working compute node: pings do not flow until the working instance pings first.

Once pings are flowing between the non-working instance and neutron DHCP, I run dhclient on the instance and start listening for DHCP requests with tcpdump, and I hear them on:
- compute's bridge interface
- compute's vxlan interface

They don't make it to the controller node. I've re-enabled l2_population on the controllers and rebooted them just in case, but the problem persists. A diff of /etc/ on all compute nodes shows that all OpenStack and networking related configuration is effectively identical. The last difference between the non-working compute node and the working compute nodes, as far as I can tell, is that the new node has a different network card.
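The hop-by-hop listening described above can be sketched as a small dry-run script. The node names and interface names (brq-selfservice, vxlan-55, tap-instance) are placeholders invented for illustration; the helper prints each tcpdump command instead of executing it, since the real interfaces only exist on the cloud nodes.

```shell
# Dry-run sketch of the hop-by-hop trace described above. All node and
# interface names are placeholders; substitute the real bridge/vxlan/tap
# names shown by `ip link` on each node.

trace() {
    node=$1; iface=$2; filter=$3
    # Print the command instead of executing it (tcpdump needs root and
    # the interfaces only exist on the actual nodes).
    echo "ssh $node tcpdump -ni $iface $filter"
}

# ICMP toward the neutron gateway (172.16.1.1)
trace compute-new brq-selfservice icmp
trace compute-new vxlan-55        icmp
trace controller  vxlan-55        icmp

# DHCP traffic from the instance (UDP ports 67/68)
trace compute-new tap-instance "udp port 67 or udp port 68"
trace controller  vxlan-55     "udp port 67 or udp port 68"
```

Working outward from the tap device tells you on which hop the traffic disappears, which is how the missing DHCP requests were localized to the compute node above.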
The working nodes use "Broadcom Limited NetXtreme II BCM57712 10 Gigabit Ethernet" and the non-working node uses a "NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter". Are there any known issues with neutron and this brand of network adapter? I looked at the capabilities on both adapters and here are the differences:

                              Broadcom    NetXen
tx-tcp-ecn-segmentation:      on          off [fixed]
rx-vlan-offload:              on [fixed]  off [fixed]
receive-hashing:              on          off [fixed]
rx-vlan-filter:               on          off [fixed]
tx-gre-segmentation:          on          off [fixed]
tx-gre-csum-segmentation:     on          off [fixed]
tx-ipxip4-segmentation:       on          off [fixed]
tx-udp_tnl-segmentation:      on          off [fixed]
tx-udp_tnl-csum-segmentation: on          off [fixed]
tx-gso-partial:               on          off [fixed]
loopback:                     off         off [fixed]
rx-udp_tunnel-port-offload:   on          off [fixed]
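A comparison like the table above can be produced mechanically. In practice each feature list comes from `ethtool -k <iface>` on the node; in this self-contained sketch two abbreviated sample lists are embedded as variables, so the only assumption is the file locations used for the diff.

```shell
# Sketch: diff the offload feature lists of two NICs. In practice each
# list comes from `ethtool -k <iface>` on the node (e.g. via ssh);
# here abbreviated sample lists are embedded so the snippet runs
# stand-alone.

working_list='tx-gre-segmentation: on
rx-vlan-filter: on
loopback: off'

broken_list='tx-gre-segmentation: off [fixed]
rx-vlan-filter: off [fixed]
loopback: off'

printf '%s\n' "$working_list" > /tmp/working.features
printf '%s\n' "$broken_list"  > /tmp/broken.features

# Show only the features whose values differ between the adapters;
# identical lines (like loopback here) are suppressed.
diff /tmp/working.features /tmp/broken.features | grep '^[<>]'
```

Lines marked `<` are the working adapter's values, `>` the non-working adapter's; `[fixed]` means the driver cannot toggle that feature.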
Re: [Openstack] DHCP not accessible on new compute node.
I've completely wiped the node and reinstalled it, and the problem still persists. I can't ping instances on other compute nodes, or ping the DHCP ports. Instances don't get addresses or metadata when started on this node.

From: Marcio Prado
Sent: 11/1/18 9:51 AM
To: torin.wolt...@granddial.com
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I believe you have not forgotten anything. This is probably a bug. As my cloud is not production, but rather master's research, I live-migrate the VM to a node that is working, restart it, then migrate it back to the original node that was not working, and it keeps running.

On 30-10-2018 17:50, Torin Woltjer wrote:
> Interestingly, I created a brand new selfservice network and DHCP doesn't
> work on that either. I've followed the instructions in the minimal setup
> (excluding the controllers as they're already set up) but the new node has
> no access to the DHCP agent in neutron, it seems. Is there a likely
> component that I've overlooked?
>
> FROM: "Torin Woltjer"
> SENT: 10/30/18 10:48 AM
> TO: , "openstack@lists.openstack.org"
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> I deleted both DHCP ports and they were recreated as you said. However,
> instances are still unable to get network addresses automatically.
>
> FROM: Marcio Prado
> SENT: 10/29/18 6:23 PM
> TO: torin.wolt...@granddial.com
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> The port is recreated automatically. The problem, like I said, is not in
> DHCP, but for some reason deleting the port and waiting for OpenStack to
> re-create it often solves the problem.
>
> Please, if you can find out the actual problem, let me know. I'm very
> interested to know.
>
> You can delete the port without fear. OpenStack will recreate it in a
> short time.
>
> Links:
> [1] http://www.granddial.com

--
Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br
Re: [Openstack] DHCP not accessible on new compute node.
Interestingly, I created a brand new selfservice network and DHCP doesn't work on that either. I've followed the instructions in the minimal setup (excluding the controllers as they're already set up) but the new node has no access to the DHCP agent in neutron, it seems. Is there a likely component that I've overlooked?

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: "Torin Woltjer"
Sent: 10/30/18 10:48 AM
To: , "openstack@lists.openstack.org"
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I deleted both DHCP ports and they were recreated as you said. However, instances are still unable to get network addresses automatically.

From: Marcio Prado
Sent: 10/29/18 6:23 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

The port is recreated automatically. The problem, like I said, is not in DHCP, but for some reason deleting the port and waiting for OpenStack to re-create it often solves the problem.

Please, if you can find out the actual problem, let me know. I'm very interested to know.

You can delete the port without fear. OpenStack will recreate it in a short time.
Re: [Openstack] DHCP not accessible on new compute node.
I deleted both DHCP ports and they were recreated as you said. However, instances are still unable to get network addresses automatically.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Marcio Prado
Sent: 10/29/18 6:23 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

The port is recreated automatically. The problem, like I said, is not in DHCP, but for some reason deleting the port and waiting for OpenStack to re-create it often solves the problem.

Please, if you can find out the actual problem, let me know. I'm very interested to know.

You can delete the port without fear. OpenStack will recreate it in a short time.
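For reference, finding and deleting the DHCP ports can be done with the OpenStack client. A dry-run sketch, assuming the network is named selfservice as elsewhere in this thread; the helper echoes each command so nothing is executed by accident, and `<port-id>` is a placeholder left to fill in:

```shell
# Dry-run sketch: list the DHCP ports on a network, delete them, and
# check the DHCP agents afterwards. "selfservice" and <port-id> are
# placeholders; swap `echo "+ $*"` for "$@" to actually execute.
NETWORK=selfservice

run() { echo "+ $*"; }

run openstack port list --network "$NETWORK" --device-owner network:dhcp
run openstack port delete "<port-id>"        # repeat per port ID listed above
run openstack network agent list --agent-type dhcp
```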
[Openstack] DHCP not accessible on new compute node.
I recently installed a new compute node, but noticed that none of the instances I put on it successfully receive network addresses from DHCP. This works on all other compute nodes, however. When listening for DHCP requests on the vxlan interface, I can see the requests on the new compute node, but I do not see them anywhere else. If I manually assign an address to the interface on the instance, I am able to ping in and out. Running dhclient -v on an instance on a working compute node successfully gets a DHCP response; on the new compute node there is no response. I also discovered that the instance on the new compute node cannot ping the DHCP ports at 172.16.1.2 & 172.16.1.3, yet it can ping the gateway at 172.16.1.1. The setup is neutron-linuxbridge on OpenStack Queens.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
Re: [Openstack] Rename Cinder Volume
That was easy, thanks!

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
[Openstack] Rename Cinder Volume
Is it possible to change the name and description of a Cinder volume?

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
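The command that resolved this isn't quoted in the archive; the rename can be done with the OpenStack client's `volume set` command. A dry-run sketch with placeholder names:

```shell
# Dry-run sketch: rename a Cinder volume and update its description.
# All names are placeholders; swap `echo "+ $*"` for "$@" to execute.
run() { echo "+ $*"; }

run openstack volume set --name new-name --description "new description" old-name-or-id
```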
Re: [Openstack] [Openstack-operators] Recovering from full outage
I feel pretty dumb about this, but it was fixed by adding a rule to my security groups. I'm still very confused about some of the other behavior that I saw, but at least the problem is fixed now.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Brian Haley
Sent: 7/16/18 4:39 PM
To: torin.wolt...@granddial.com, thangam.ar...@gmail.com, jpetr...@coredial.com
Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage

On 07/16/2018 08:41 AM, Torin Woltjer wrote:
> $ ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
> 404 Not Found
> The resource could not be found.

Strange, don't know where the reply came from for that.

> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
> curl: (7) Couldn't connect to server

Based on your iptables output below, I would think the metadata proxy is running in the qrouter namespace. However, a curl from there will not work since it is restricted to only work for incoming packets from the qr- device(s). You would have to try curl from a running instance. Is there an haproxy process running? And is it listening on port 9697 in the qrouter namespace?

-Brian

> From: "Torin Woltjer"
> Sent: 7/12/18 11:16 AM
> To: , , "jpetr...@coredial.com"
> Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
> Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage
>
> Checking iptables for the metadata-proxy inside of qrouter provides the following:
> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
> [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
> [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x
> Packets:Bytes are both 0, so no traffic is touching this rule?
>
> Interestingly, the command:
> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697
> returns nothing, so there isn't actually anything running on 9697 in the network namespace...
>
> This is the output without grep:
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
> raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
> raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
> Active UNIX domain sockets (servers and established)
> Proto RefCnt Flags Type  State I-Node PID/Program name Path
> unix  2      [ ]   DGRAM       64501  7567/python2
> unix  2      [ ]   DGRAM       79953  8403/keepalived
>
> Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?
>
> Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.
>
> Thank you for this, and any future help.
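The thread doesn't say which security group rule fixed the problem, so the rules below are only illustrative (ingress ICMP and SSH on the default group), shown as a dry run that echoes the commands rather than executing them:

```shell
# Dry-run sketch: add security group rules with the OpenStack client.
# The exact rule that fixed this outage isn't stated in the thread;
# these two (ingress ICMP and SSH on "default") are common examples.
# Swap `echo "+ $*"` for "$@" to actually execute.
run() { echo "+ $*"; }

run openstack security group rule create --ingress --protocol icmp default
run openstack security group rule create --ingress --protocol tcp --dst-port 22 default
```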
Re: [Openstack] [Openstack-operators] Recovering from full outage
$ ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
404 Not Found
The resource could not be found.

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
curl: (7) Couldn't connect to server

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: "Torin Woltjer"
Sent: 7/12/18 11:16 AM
To: , , "jpetr...@coredial.com"
Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage

Checking iptables for the metadata-proxy inside of qrouter provides the following:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x

Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly, the command:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697

returns nothing, so there isn't actually anything running on 9697 in the network namespace...

This is the output without grep:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type  State I-Node PID/Program name Path
unix  2      [ ]   DGRAM       64501  7567/python2
unix  2      [ ]   DGRAM       79953  8403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?

Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.

Thank you for this, and any future help.
Re: [Openstack] [Openstack-operators] Recovering from full outage
Checking iptables for the metadata-proxy inside of qrouter provides the following:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x

Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly, the command:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697

returns nothing, so there isn't actually anything running on 9697 in the network namespace...

This is the output without grep:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type  State I-Node PID/Program name Path
unix  2      [ ]   DGRAM       64501  7567/python2
unix  2      [ ]   DGRAM       79953  8403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?

Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.

Thank you for this, and any future help.
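The check described above (nothing on 9697 in the qrouter namespace) can be packaged as a short dry-run script. The namespace ID is the one from this message; the helper echoes the commands instead of running them, since they need root on the network node, and the pattern passed to pgrep is just an illustrative guess at the proxy's process name:

```shell
# Dry-run sketch: is anything listening on 9697 inside the qrouter
# namespace, and is a metadata proxy process alive on the host? The
# helper echoes each command; swap `echo "+ $*"` for "$@" to run them
# as root on the network node.
NS=qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e

run() { echo "+ $*"; }

run ip netns exec "$NS" netstat -lnpt          # look for :9697
run pgrep -af haproxy                          # proxy process, if any
# If nothing listens, restarting the agents should respawn the proxy:
run systemctl restart neutron-l3-agent neutron-metadata-agent
```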
Re: [Openstack] [Openstack-operators] Recovering from full outage
I tested this on two instances. The first instance has existed since before I began having this issue; the second was created from a cirros test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100
curl returns information, for example:
`curl http://169.254.169.254/latest/meta-data/public-keys`
0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails:
`curl http://169.254.169.254/latest/meta-data`
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why one is able to connect but not the other. Both instances were running on the same compute node.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: John Petrini
Sent: 7/12/18 9:16 AM
To: torin.wolt...@granddial.com
Cc: thangam.ar...@gmail.com, OpenStack Operators , OpenStack Mailing List
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage

Are your instances receiving a route to the metadata service (169.254.169.254) from DHCP? Can you curl the endpoint?

curl http://169.254.169.254/latest/meta-data
Re: [Openstack] Recovering from full outage
If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat -lnp` on the controller, should I see anything listening on the metadata port (8775)? When I run these commands I don't see anything listening, but I have no working system to compare against. Can anybody verify this? Thanks,

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/10/18 2:58 PM To: Cc: , Subject: Re: [Openstack] Recovering from full outage

DHCP is working again, so instances are getting their addresses. For some reason cloud-init isn't working correctly: hostnames aren't getting set, and SSH key pairs aren't getting installed. Is the neutron-metadata service responsible for this? neutron-metadata-agent.log:

2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332
2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461
2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041
2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532
2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070
2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560
2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371
2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179
2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739

No other log files show abnormal behavior.
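One thing worth noting about the neutron-metadata-agent log quoted above: every entry is a GET for "/" from a public address returning 404, which plausibly is Internet scanner noise rather than instances requesting /latest/meta-data. A sketch for separating the two; the regex is my assumption, derived only from the sample lines above, and `summarize` is a hypothetical helper name.

```python
import re

# Pattern inferred from the eventlet.wsgi.server lines pasted above.
LOG_RE = re.compile(
    r'INFO eventlet\.wsgi\.server \[-\] (?P<client>[\d.]+), '
    r'"(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" status: (?P<status>\d+)'
)

def summarize(lines):
    """Count requests per (path, status) so genuine metadata requests
    (e.g. /latest/meta-data/...) would stand out from scanner noise."""
    counts = {}
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            key = (m.group("path"), int(m.group("status")))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

If the working instances' metadata requests never appear here at all, the requests are being lost before reaching the agent.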
Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaie...@gmail.com" Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old one. Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiesc
Re: [Openstack] Recovering from full outage
DHCP is working again, so instances are getting their addresses. For some reason cloud-init isn't working correctly: hostnames aren't getting set, and SSH key pairs aren't getting installed. Is the neutron-metadata service responsible for this? neutron-metadata-agent.log:

2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332
2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461
2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041
2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532
2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070
2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560
2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371
2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179
2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739

No other log files show abnormal behavior.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaie...@gmail.com" Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old vxlan network.
Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own.

"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18

I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq.
Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openstack@lists.openstack.org" , "openstack-operat...@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage

Can you manually assign an IP address to a VM and, once inside, ping the address of the dhcp server? That would confirm whether there is connectivity, at least. Also, on the controller node where the dhcp server for that network is, check "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances.
Re: [Openstack] Recovering from full outage
I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old one. Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

----

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openstack@lists.openstack.org" , "openstack-operat...@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any miss-configuration), then an agent is out-of-sync and restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. 
Controller:
`ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67`
`tcpdump -vnes0 -i any port 67`
Compute:
`tcpdump -vnes0 -i brqd85c2a00-a6 port 68`

For the first command on the controller, no packets are captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests being sent by openstack instances. In summary: DHCP requests are being sent, but are never received.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage

cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh key and hostname... Did you check the things I told you?

On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com --
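The compute-versus-controller tcpdump comparison above can be quantified by counting client DHCP requests in `tcpdump` line output on each side. A sketch; `count_dhcp_requests` is a hypothetical helper name, and the pattern assumes tcpdump's usual "BOOTP/DHCP, Request" line format.

```python
import re

# tcpdump prints client-originated DHCP packets as "... BOOTP/DHCP, Request ...".
DHCP_REQUEST_RE = re.compile(r"BOOTP/DHCP, Request")

def count_dhcp_requests(tcpdump_lines):
    """Count lines of tcpdump text output that look like client DHCP requests."""
    return sum(1 for line in tcpdump_lines if DHCP_REQUEST_RE.search(line))
```

If the count on the compute bridge keeps growing while the count inside the qdhcp namespace stays at zero, the requests are being lost on the path between the nodes, consistent with the summary above.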
Re: [Openstack-operators] [Openstack] Recovering from full outage
Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own.

"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18

I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openst...@lists.openstack.org" , "openstack-operators@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage

Can you manually assign an IP address to a VM and, once inside, ping the address of the dhcp server? That would confirm whether there is connectivity, at least. Also, on the controller node where the dhcp server for that network is, check "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), an agent is out of sync and a restart usually fixes things.

On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, no packets are captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests being sent by openstack instances. In summary: DHCP requests are being sent, but are never received.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage

cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh key and hostname...

Did you check the things I told you?

On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.wolt...@granddial.com Cc: "openst...@lists.openstack.org" , "openstack-operators@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage

You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic.

On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been
Re: [Openstack] Recovering from full outage
Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network: I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. On one controller, "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. On the other controller, the same file contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18
I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
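A quick way to cross-check entries like those against a particular instance is to parse the comma-separated lines (MAC,hostname,IP) and look up the instance's port MAC. A minimal sketch, using two of the entries quoted above as sample data:

```python
# Parse dnsmasq's comma-separated entries (MAC,hostname,IP) as quoted
# above, and look up the lease entry for a given instance MAC.
def parse_leases(text):
    leases = {}
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) == 3:                      # skip blank or odd lines
            mac, hostname, ip = parts
            leases[mac.lower()] = (hostname, ip)
    return leases

sample = """\
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1"""

leases = parse_leases(sample)
print(leases["fa:16:3e:3f:94:17"])  # → ('host-172-16-1-8.openstacklocal', '172.16.1.8')
```

The MAC to look up is the one shown by `openstack port list` for the instance; an instance whose MAC is missing here never completed a DHCP exchange with this agent.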
On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted that the dhcp agents showed offline in `neutron agent-list`, restarted all neutron services, noted that the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and reboot the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, and the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses; if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.wolt...@granddial.com Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in neutron's config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com -------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operat...@lists.openstack.org" , "openstack@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage
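For anyone hitting the same lock_path question: the option lives in the [oslo_concurrency] section of neutron.conf and should be set on every node that runs neutron agents, controllers and computes alike. The exact directory doesn't matter as long as it exists and the neutron user can write to it; the value below is the one commonly shipped by distro packaging (an assumption — check your own packages):

```ini
[oslo_concurrency]
# Directory for oslo.concurrency lock files; must exist and be
# writable by the user the neutron services run as.
lock_path = /var/lib/neutron/tmp
```

Without a lock_path, agents that need interprocess locks (the linuxbridge and DHCP agents among them) can fail in exactly the way the pasted log shows, which is why the qrouter namespace only appeared once it was set.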
Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online: http://paste.openstack.org/show/724921/ All of the instances can be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. 
And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check nova-compute.log and the neutron agent logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster.
All of the nodes are back online; every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
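The recurring advice in this thread boils down to one point: the bridges and namespaces are runtime state that the agents rebuild on their own, so the recovery path after an outage is restarting the agents, not recreating the networks. A rough checklist sketch — the service names assume an Ubuntu linuxbridge deployment (an assumption; adjust per distro), and the privileged commands are shown commented with an echo stand-in for the restart loop:

```shell
# Agents whose restart rebuilds bridges and namespaces after an outage
# (service names assume Ubuntu packaging of a linuxbridge deployment).
AGENTS="neutron-linuxbridge-agent neutron-dhcp-agent neutron-l3-agent neutron-metadata-agent"

for a in $AGENTS; do
    echo "would restart: $a"   # stand-in for: systemctl restart "$a"
done

# Afterwards, verify on the controllers (needs admin credentials/root):
#   openstack network agent list   # every agent Alive?
#   ip netns                       # qdhcp-/qrouter- namespaces back?
# and on the computes:
#   virsh list --all               # are the VMs actually running?
```

If the agents come back Alive but the namespaces still don't reappear, that points back at the lock_path and rabbitmq issues diagnosed earlier in the thread rather than at the network definitions themselves.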
Re: [Openstack-operators] [Openstack] Recovering from full outage
Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.wolt...@granddial.com Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators@lists.openstack.org" , "openst...@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. 
All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openst...@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org h
Re: [Openstack-operators] [Openstack] Recovering from full outage
The qrouter netns appears once the lock_path is specified, and the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses; if I manually give them an address and route, they are pingable and seem to work. So the router is working correctly but DHCP is not. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
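The fix described above amounts to giving oslo.concurrency a `lock_path` in neutron's configuration. A minimal sketch of the change follows; the config location (`/etc/neutron/neutron.conf`), the lock directory, and the restart command are assumptions about a typical deployment, and the snippet operates on a scratch copy so it can run anywhere:

```shell
# Sketch: add an [oslo_concurrency] lock_path to neutron.conf if missing.
# CONF points at a scratch copy here; on a real node it would be
# /etc/neutron/neutron.conf (an assumption about your layout).
CONF=$(mktemp)
printf '[DEFAULT]\ncore_plugin = ml2\n' > "$CONF"

if ! grep -q '^lock_path' "$CONF"; then
  printf '\n[oslo_concurrency]\nlock_path = /var/lib/neutron/tmp\n' >> "$CONF"
fi

grep '^lock_path' "$CONF"
# After editing the real file, restart the agents, e.g.:
#   systemctl restart neutron-l3-agent neutron-dhcp-agent
```

The thread's follow-up question (controllers vs. compute nodes) matches how oslo.concurrency file locks work: they are local file locks, so every host running a neutron agent needs its own writable lock directory.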
Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Recovering from full outage
The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ All of the instances can be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
[Openstack] Recovering from full outage
We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster. All of the nodes are back online; every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
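Since nova reported ACTIVE while libvirt showed everything shut off, the first recovery step is finding which domains actually need starting. A small illustration of that filtering (the domain names and the `virsh list --all` output are mocked here so the snippet runs without libvirt; on a real compute node you would pipe the real command):

```shell
# Mock of `virsh list --all` output on a compute node after the outage.
virsh_output=' Id   Name               State
----------------------------------
 -    instance-00000a1   shut off
 3    instance-00000a2   running
 -    instance-00000a3   shut off'

# Pick out domains in "shut off" state. On a real node the equivalent is:
#   virsh list --all | awk '/shut off/ {print $2}' | xargs -r -n1 virsh start
stopped=$(printf '%s\n' "$virsh_output" | awk '/shut off/ {print $2}')
printf '%s\n' "$stopped"
```

Starting the domains through `openstack server reboot` (as suggested later in the thread) is generally preferable to raw `virsh start`, since it keeps nova's state in sync.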
Re: [Openstack] masakari client (cannot list/add segment)
Installing it with tox instead of pip seems to have precisely the same effect. Is there a config file for the masakari client that I am not aware of? Nothing seems to be provided with it, and documentation is nonexistent. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
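The `auth_url` AttributeError usually means the client assembled its keystone session from nothing. Before suspecting the install, it is worth ruling out missing credentials: the masakari client goes through openstacksdk, which picks up the standard OS_* environment variables (an assumption about this client version; the values below are placeholders, not from the thread):

```shell
# Placeholder credentials -- substitute your own deployment's values.
# An unset OS_AUTH_URL is the kind of gap that surfaces as:
#   AttributeError: 'NoneType' object has no attribute 'auth_url'
export OS_AUTH_URL="http://controller:5000/v3"
export OS_USERNAME="admin"
export OS_PASSWORD="secret"
export OS_PROJECT_NAME="admin"
export OS_USER_DOMAIN_NAME="Default"
export OS_PROJECT_DOMAIN_NAME="Default"

# Sanity-check that nothing the session needs is empty.
for v in OS_AUTH_URL OS_USERNAME OS_PASSWORD OS_PROJECT_NAME; do
  eval "val=\$$v"
  [ -n "$val" ] || { echo "$v is unset"; exit 1; }
done
echo "session env complete"
```

The thread notes that `--os-auth-url` alone did not help, so an incomplete credential set (not just the URL) is the thing to check.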
Re: [Openstack] masakari client (cannot list/add segment)
Running the command with the -d debug option provides this Python traceback:

Traceback (most recent call last):
  File "/usr/local/bin/masakari", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 189, in main
    MasakariShell().main(args)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 160, in main
    sc = self._setup_masakari_client(api_ver, args)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 116, in _setup_masakari_client
    return masakari_client.Client(api_ver, user_agent=USER_AGENT, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/client.py", line 28, in Client
    return cls(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/v1/client.py", line 22, in __init__
    prof=prof, user_agent=user_agent, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/sdk/ha/connection.py", line 48, in create_connection
    raise e
AttributeError: 'NoneType' object has no attribute 'auth_url'

Specifying --os-auth-url http://controller:5000 doesn't change this. Is python-masakariclient incorrectly installed? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] flavor metadata
I would recommend using availability zones for this. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: Satish Patel Sent: 7/1/18 9:56 AM To: openstack Subject: [Openstack] flavor metadata

Folks, we recently built OpenStack for production and I have a question about flavor metadata. I have 3 kinds of servers (8-core / 32-core / 40-core), and I want to tell OpenStack that one specific application should always go to a 32-core machine. How do I express that in flavor metadata? Or should I use the availability zone option and create two groups?
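Availability zones work, but the more targeted mechanism for "this flavor only lands on 32-core hosts" is a host aggregate whose metadata is matched by flavor extra_specs. A sketch against a live cloud (the aggregate, host, and flavor names are made up, and it assumes `AggregateInstanceExtraSpecsFilter` is enabled in nova's scheduler filters):

```shell
# Group the 32-core hypervisors into an aggregate with a custom property.
openstack aggregate create --property cores=32 agg-32core
openstack aggregate add host agg-32core compute-32core-1

# Create the flavor the application uses, then tag it so the scheduler
# only places it on hosts whose aggregate carries cores=32.
openstack flavor create --vcpus 16 --ram 32768 --disk 40 app.32core
openstack flavor set app.32core \
  --property aggregate_instance_extra_specs:cores=32
```

With this in place, booting from `app.32core` can only schedule onto hosts in `agg-32core`, while other flavors remain unrestricted.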
Re: [Openstack] DNS integration
Have a look at Designate: https://wiki.openstack.org/wiki/Designate It has support for PowerDNS, and sounds like what you're looking for. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: Satish Patel Sent: 7/1/18 10:27 AM To: openstack Subject: [Openstack] DNS integration

Folks, is there a way to tell OpenStack, when you launch an instance, to add it to external DNS using some kind of API call? We are using external PowerDNS and want our VMs to register themselves as soon as we launch them. Is this possible through neutron, or should we use cloud-init?
[Openstack] masakari client (cannot list/add segment)
Installed masakari 4.0.0 on Queens. Hostmonitor, instancemonitor, and processmonitor are all running on the compute nodes; the API and engine are running on the controller nodes. I've tried using the masakari client to list/add segments; any of those commands does nothing and returns: ("'NoneType' object has no attribute 'auth_url'", <open file '<stderr>', mode 'w' at 0x7f26bb4b71e0>) I cannot find any log file for the masakari client to troubleshoot this further. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Masakari on queens
The wrong address was specified in the corosync configuration. Corrected that and now it runs without error. The important part here was the -c 1 switch of tcpdump. Timeout was being reached before a single packet was captured on tcpdump ( because the configuration of corosync was incorrect ). Once timeout was reached it was producing an exit code 124, which triggered the exception in the host_handler. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Torin Woltjer" Sent: 6/22/18 2:17 PM To: "tushar.pa...@nttdata.com" Subject: Re: Masakari on queens Oddly enough, I never made changes to the original code to get that output. It is just masakari-monitor 4.0.0 as installed by pip. Here are the changes and output to that code snippit you sent: http://paste.openstack.org/show/723924/ I'd like to increase the logging, but I'm not familiar with the codebase and lack more than a rudimentary knowledge of python. I've found where it seems pip installed the files for masakari-hostmonitor, but I don't know which one contains the corosync bit. From: "Patil, Tushar" Sent: 6/20/18 12:51 AM To: "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Option -i is correct. It seems that you have modified code to log error message: "ProcessExecutionError: Unexpected error while running command." Could you please log 'stderr' and 'exit_code' as well in order to know the exact error you are getting? I suspect you must be getting 124 exit code. This is a small program which I have created to simulate the error you are getting. http://paste.openstack.org/show/723882/ Please specify interface and port as per your configuration and run the program. Regards, Tushar Patil ________ From: Torin Woltjer Sent: Tuesday, June 19, 2018 9:58:32 PM To: Patil, Tushar Subject: Re: Masakari on queens Thank for the reply. Tushar Patil. 
The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. 
/etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this? Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
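The diagnosis in this thread — `timeout` killing tcpdump before `-c 1` captures a single packet, yielding exit status 124 — can be reproduced without tcpdump at all. In this sketch, `sleep` stands in for a capture that never completes:

```shell
# Minimal reproduction of the failure mode: when `timeout` has to kill
# the command at the deadline, it exits with status 124, which is what
# masakari-hostmonitor's host_handler treats as a corosync failure.
timeout 1 sleep 5   # stands in for: timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405
status=$?
echo "timeout exit code: $status"   # prints 124
```

If corosync is actually multicasting on the watched interface, tcpdump captures a packet almost immediately, exits 0 on its own, and the check passes.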
Re: [Openstack] Masakari on queens
Oddly enough, I never made changes to the original code to get that output. It is just masakari-monitor 4.0.0 as installed by pip. Here are the changes and output to that code snippet you sent: http://paste.openstack.org/show/723924/ I'd like to increase the logging, but I'm not familiar with the codebase and lack more than a rudimentary knowledge of python. I've found where it seems pip installed the files for masakari-hostmonitor, but I don't know which one contains the corosync bit. From: "Patil, Tushar" Sent: 6/20/18 12:51 AM To: "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Option -i is correct. It seems that you have modified code to log error message: "ProcessExecutionError: Unexpected error while running command." Could you please log 'stderr' and 'exit_code' as well in order to know the exact error you are getting? I suspect you must be getting 124 exit code. This is a small program which I have created to simulate the error you are getting. http://paste.openstack.org/show/723882/ Please specify interface and port as per your configuration and run the program. Regards, Tushar Patil ________ From: Torin Woltjer Sent: Tuesday, June 19, 2018 9:58:32 PM To: Patil, Tushar Subject: Re: Masakari on queens Thanks for the reply, Tushar Patil. The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. 
From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. /etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. 
Do you have any knowledge on this?
Re: [Openstack] Masakari on queens
Thanks for the reply, Tushar Patil. The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. 
/etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this?
[Openstack] Masakari on queens
Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. /etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this?
[Openstack] Masakari Setup
Currently trying to run masakari 4.0.0 on openstack queens. I have corosync + pacemaker running on compute nodes; crm status shows both running. When I run masakari-hostmonitor, I see 2 errors that are repeated while running. 2018-06-14 09:48:58.475 11062 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-14 09:48:58.476 11062 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. This is my /etc/masakarimonitors/masakarimonitors.conf:

[DEFAULT]

[api]
auth_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller1:11211,controller2:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = masakari
password = **

[host]
corosync_multicast_interfaces = vlan60
corosync_multicast_ports = 5405
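As the eventual resolution of this thread showed, the hostmonitor check only passes when corosync is actually sending traffic on the interface being watched. A sketch of the corresponding corosync.conf totem section — all addresses are placeholders and must match the network behind vlan60, and mcastport must match corosync_multicast_ports above:

```
# Illustrative corosync.conf fragment (multicast transport).
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.60.0      # placeholder: the vlan60 subnet
        mcastaddr: 239.255.1.1      # placeholder multicast group
        mcastport: 5405             # must match masakarimonitors.conf
    }
}
```

If bindnetaddr points at the wrong network, corosync heartbeats never appear on vlan60 and tcpdump times out exactly as described above.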
[Openstack] Openstack Dashboard console iframe
When using haproxy for the openstack dashboard the iframe for the instance console stops working. The console does work without the iframe, if I open it in its own window. nova.conf on controller:

my_ip = 192.168.116.21
[vnc]
enabled = true
server_listen = $my_ip
server_proxyclient_address = $my_ip
novncproxy_host = $my_ip

nova.conf on compute:

my_ip = 192.168.116.23
[vnc]
enabled = True
server_listen = 0.0.0.0
server_proxyclient_address = $my_ip
novncproxy_base_url = http://controller:6080/vnc_auto.html

controller in the host file resolves to 192.168.116.16, the VIP on HAProxy. Is there something wrong with this configuration? Does anybody else have this problem with the console iframe when using HAProxy?
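For comparison, a hypothetical haproxy.cfg stanza for the noVNC proxy — the names and addresses are placeholders modeled on the IPs in this message, not a confirmed fix. The `timeout tunnel` setting matters because the console runs over a long-lived WebSocket, which short HTTP timeouts will cut off:

```
# Illustrative haproxy.cfg fragment for nova-novncproxy on the VIP.
listen novncproxy
    bind 192.168.116.16:6080
    mode http
    balance source                  # keep a console session on one backend
    timeout tunnel 1h               # WebSockets are long-lived
    server controller1 192.168.116.21:6080 check
```

Pinning a session to one backend with `balance source` avoids the token being validated by a different nova-novncproxy than the one that issued the connection.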
[Openstack] Cinder Queens installdoc wrong
I've upgraded from pike to queens, and the keystone admin port 35357 has been deprecated in favor of 5000 it seems. However, the documentation for the installation of cinder still uses that port in [keystone_authtoken]. What is the correct entry for this line? auth_url = http://controller:5000 I imagine. https://docs.openstack.org/cinder/queens/install/cinder-controller-install-ubuntu.html
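A sketch of how the cinder.conf [keystone_authtoken] section could look with the deprecated 35357 port replaced, following the pattern of the other Queens install guides — the values are placeholders, not a verified correction of the documentation:

```ini
# Illustrative [keystone_authtoken] for cinder.conf on Queens.
[keystone_authtoken]
auth_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = CINDER_PASS
```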
Re: [Openstack] Masakari client error
Hi Tushar, Thanks for linking to that document, I hadn't seen it before and it's very useful. As far as milestones are concerned, I was planning on sticking with Pike. Up until this point I've been using the packages from http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/pike main; when I installed the latest packages from pip, I was ignorant of what I would be doing and what would happen. I may switch to queens or rocky, but I would like to upgrade to the latest Ubuntu LTS if I am to do that (bionic only has a repo for rocky but not queens, I believe). From: "Patil, Tushar" <tushar.pa...@nttdata.com> Sent: 5/16/18 9:03 PM To: "fu...@yuggoth.org" <fu...@yuggoth.org>, "openstack@lists.openstack.org" <openstack@lists.openstack.org>, "torin.wolt...@granddial.com" <torin.wolt...@granddial.com> Subject: Re: [Openstack] Masakari client error Hi Torin, If you are using stable/pike, then it is recommended to use python-masakariclient version 3.0.1 [1] which requires openstacksdk version 0.9.17. Are you trying to upgrade your stable/pike environment to the latest rocky-milestone1 (all services including Masakari)? [1] : https://github.com/openstack/requirements/blob/stable/pike/upper-constraints.txt Regards, Tushar Patil From: Torin Woltjer Sent: Wednesday, May 16, 2018 10:32:10 PM To: fu...@yuggoth.org; openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error It looks like pip install actually upgraded my openstacksdk to 0.13 when I installed masakari from pip. Meanwhile the sdk in the 16.04 repository is 0.9.17. I'm wondering now if this might explain why my block storage is also having problems. What is the process for setting up a local environment for separate versions of the SDK (with different services using each)? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: Jeremy Stanley Sent: 5/16/18 8:59 AM To: openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error On 2018-05-16 12:30:47 +0000 (+0000), Torin Woltjer wrote: [...] > I am using Pike and not Queens so the openstacksdk version 13 is > not available in the repository. Should openstacksdk version 0.13 > still work with Pike [...] OpenStackSDK strives for backwards-compatibility with even fairly ancient OpenStack releases, and is not tied to any particular version of OpenStack services. It should always be safe to run the latest releases of OpenStackSDK no matter the age of the deployment with which you intend to communicate. Note however that the dependencies of OpenStackSDK may conflict with dependencies of some OpenStack service, so you can't necessarily expect to be able to co-install them on the same machine without some means of context separation (virtualenvs, containers, pip install --local, et cetera). -- Jeremy Stanley
Re: [Openstack] Masakari client error
It looks like pip install actually upgraded my openstacksdk to 0.13 when I installed masakari from pip. Meanwhile the sdk in the 16.04 repository is 0.9.17. I'm wondering now if this might explain why my block storage is also having problems. What is the process for setting up a local environment for separate versions of the SDK (with different services using each)? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: Jeremy Stanley <fu...@yuggoth.org> Sent: 5/16/18 8:59 AM To: openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error On 2018-05-16 12:30:47 +0000 (+0000), Torin Woltjer wrote: [...] > I am using Pike and not Queens so the openstacksdk version 13 is > not available in the repository. Should openstacksdk version 0.13 > still work with Pike [...] OpenStackSDK strives for backwards-compatibility with even fairly ancient OpenStack releases, and is not tied to any particular version of OpenStack services. It should always be safe to run the latest releases of OpenStackSDK no matter the age of the deployment with which you intend to communicate. Note however that the dependencies of OpenStackSDK may conflict with dependencies of some OpenStack service, so you can't necessarily expect to be able to co-install them on the same machine without some means of context separation (virtualenvs, containers, pip install --local, et cetera). -- Jeremy Stanley
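One concrete way to get the context separation Jeremy mentions, sketched with illustrative paths and version pins (the venv location and the pins are assumptions, not from the thread):

```shell
# Keep the newer openstacksdk that masakariclient needs in its own
# virtualenv, so the distro's openstacksdk 0.9.17 used by the other
# services is untouched. --without-pip keeps this sketch offline-
# friendly; a real setup would create the venv normally.
python3 -m venv --without-pip /tmp/masakari-venv
/tmp/masakari-venv/bin/python --version
# In a normally created venv, install inside it only (needs network):
#   /tmp/masakari-venv/bin/pip install 'openstacksdk==0.13.0' python-masakariclient
```

The client is then invoked via its venv path (e.g. `/tmp/masakari-venv/bin/masakari`), leaving the system site-packages alone.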
Re: [Openstack] Masakari client error
Hello again, I am not using the git version of masakari anymore, I am using the version installed from python pip. I am using Pike and not Queens so the openstacksdk version 13 is not available in the repository. Should openstacksdk version 0.13 still work with Pike, and should this version of masakari still work with Pike? Thanks, Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Patil, Tushar" <tushar.pa...@nttdata.com> Sent: 5/16/18 12:29 AM To: "openstack@lists.openstack.org" <openstack@lists.openstack.org>, "torin.wolt...@granddial.com" <torin.wolt...@granddial.com> Subject: Re: [Openstack] Masakari client error Hi Torin, Few days back, this patch [1] got merged in which the service type is changed from "ha" to "instance_ha". We have tried reproducing the issue you are facing but we are not getting the exact same error. With different versions of openstacksdk, we got different errors. Masakariclient/masakari-monitors requires openstacksdk version 0.13.0. Today we have fixed LP bug [2] in patch [3] which should also fix the issue you are facing. We will release another version of python-masakariclient soon. Are you installing masakari using devstack? If yes, please install masakari from scratch. After installing latest masakari, you should be able to run segment-list and host-list using openstack commands. If you want to run same commands using masakariclient, then you will need to wait until new version of masakariclient is released or you can apply patch [3] in your environment. If you need any help in applying patches, please ask for help on #openstack-masakari IRC. Simple way to install latest masakariclient from code: 1. git clone https://github.com/openstack/python-masakariclient.git 2. Go to folder python-masakariclient 3. sudo python setup.py install If you find any issues in Masakari, you can also report bugs in launchpad against below respective projects. 
http://launchpad.net/python-masakariclient https://launchpad.net/masakari-monitors https://launchpad.net/masakari Hope this helps!!! [1] : https://review.openstack.org/#/c/536653/ [2] : https://bugs.launchpad.net/python-masakariclient/+bug/1756047 [3] : https://review.openstack.org/#/c/557634/ Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, May 15, 2018 11:36:11 PM To: openstack@lists.openstack.org Subject: [Openstack] Masakari client error I am using the masakari client version 5.0.0 installed from python pip. I keep getting the following error: ("'Connection' object has no attribute 'ha'", <open file '<stderr>', mode 'w' at 0x7f6ee88791e0>) when I try to run any commands with it: segment-list, host-list, etc. It's entirely possible that I'm missing some piece of configuration, or have something improperly configured, but there isn't sufficient documentation for me to figure out if or what. Anybody have a working example that I can see, or know if this is an issue?
[Openstack] Masakari client error
I am using the masakari client version 5.0.0 installed from python pip. I keep getting the following error: ("'Connection' object has no attribute 'ha'", <open file '<stderr>', mode 'w' at 0x7f6ee88791e0>) when I try to run any commands with it: segment-list, host-list, etc. It's entirely possible that I'm missing some piece of configuration, or have something improperly configured, but there isn't sufficient documentation for me to figure out if or what. Anybody have a working example that I can see, or know if this is an issue?
Re: [Openstack] Database (Got timeout reading communication packets)
> While I was working on something else I remembered the error messages you described, I have them, too. It's a lab environment on hardware nodes with a sufficient network connection, and since we had to debug network issues before, we can rule out network problems in our case. I found a website [1] to track down galera issues, I tried to apply those steps and it seems that the openstack code doesn't close the connections properly, hence the aborted connections. I'm not sure if this is the correct interpretation, but since I didn't face any problems related to the openstack databases I decided to ignore these messages as long as the openstack environment works properly.

I did think something similar to this initially when I noticed a high number of sleeping connections, but because I was unsure I thought to ask. Because this affects all Openstack services as a whole, what project would I file a bug report on?
Re: [Openstack] Database (Got timeout reading communication packets)
> Are these interruptions occasional or do they occur all the time? Is this a new issue or has this happened before?

This is a 3 node Galera cluster on 3 KVM virtual machines. The errors are constantly printing in the logs, and no node is excluded from receiving the errors. I don't know whether they had always been there or not, but I noticed them after an update.

> Does the openstack environment work as expected despite these messages or do you experience interruptions in the services?

The openstack services operate normally; the dashboard is fairly slow, but it always has been.

> I would check the network setup first (I have read about loose cables in different threads...), maybe run some ping tests between the machines to see if there's anything weird. Since you mention different services reporting these interruptions this seems like a network issue to me.

The hosts are all networked with bonded 10G SFP+ cables networked via a switch. Pings between the VMs seem fine. If I were to guess, any networking problem would be between the guest and host due to libvirt. Anything that I should be looking for there?
Re: [Openstack] HA Compute & Instance Evacuation
On Friday, May 11, 2018 12:40:58 AM EDT Patil, Tushar wrote: > I think this is what is needed to make it work. > Install openstacksdk version 0.13.0. > > Apply patch: https://review.openstack.org/#/c/546492/ > > In this patch, we need to bump openstacksdk version from 0.11.2 to 0.13.0. > We will merge above patch soon. Do you have a timetable on when the patch will be merged? If it is a relatively small window of time, I would rather wait to use the patched mainline code. Otherwise, I am willing to try to work with the patch. Additionally, patching python is something that I am not familiar with. Is there a good resource on doing this? You have been a great help so far, thanks again.
[Openstack] Database (Got timeout reading communication packets)
Just the other day I noticed a bunch of errors spewing from the mysql service. I've spent quite a bit of time trying to track this down, and I haven't had any luck figuring out why this is happening. The following line is repeatedly spewed in the service's journal. May 08 11:13:47 UBNTU-DBMQ2 mysqld[20788]: 2018-05-08 11:13:47 140127545740032 [Warning] Aborted connection 211 to db: 'nova_api' user: 'nova' host: '192.168.116.21' (Got timeout reading communication packets) It isn't always nova_api; it's happening with all of the openstack projects, and with either of the controller nodes' ip addresses. The database is a mariadb galera cluster. Removing haproxy has no effect. The output only occurs on the node receiving the connections; with haproxy it is multiple nodes, otherwise it is whatever node I specify as the database in my controllers' hosts files.
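Not from this thread, but a commonly cited cause of "Got timeout reading communication packets" with OpenStack services is pooled SQLAlchemy connections idling past the server's wait_timeout, so the server aborts them. A hedged sketch of the two knobs involved — the oslo.db option name assumes a Queens-era release (older releases call it idle_timeout), and the values are illustrative:

```ini
# Illustrative only. In each OpenStack service's conf, recycle pooled
# connections before the database server can time them out:
[database]
connection_recycle_time = 280

# In MariaDB's my.cnf, keep wait_timeout comfortably above that:
[mysqld]
wait_timeout = 3600
```

With the recycle interval below wait_timeout, the services drop and reopen idle connections themselves instead of the server aborting them.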
Re: [Openstack] HA Compute & Instance Evacuation
Thank you very much for the information. Just for clarification, when you say reserved hosts, do you mean that I must keep unloaded virtualization hosts in reserve? Or can Masakari move instances from a downed host to an already loaded host that has open capacity?
Re: [Openstack] HA Compute & Instance Evacuation
I'm vaguely familiar with Pacemaker/Corosync, as I'm using it with HAProxy on my controller nodes. I'm assuming in this instance that you use Pacemaker on your compute hosts so masakari can detect host outages? If possible could you go into more detail about the configuration? I would like to use Masakari and I'm having trouble finding a step by step or other documentation to get started with.
Re: [Openstack] HA Compute & Instance Evacuation
> There is no HA behaviour for compute nodes. > > You are referring to HA of workloads running on compute nodes, not HA of > compute nodes themselves. It was a mistake for me to say HA when referring to compute and instances. Really I want to avoid a situation where one of my compute hosts gives up the ghost, and all of the instances are offline until someone reboots them on a different host. I would like them to automatically reboot on a healthy compute node. > Check out Masakari: > > https://wiki.openstack.org/wiki/Masakari This looks like the kind of thing I'm searching for. I'm seeing 3 components here, I'm assuming one goes on compute hosts and one or both of the others go on the control nodes? Is there any documentation outlining the procedure for deploying this? Will there be any problem running the Masakari API service on 2 machines simultaneously, sitting behind HAProxy?
[Openstack] HA Compute & Instance Evacuation
I am working on setting up Openstack for HA and one of the last orders of business is getting HA behavior out of the compute nodes. Is there a project that will automatically evacuate instances from a downed or failed compute host, and automatically reboot them on their new host? I'm curious what suggestions people have about this, or whatever advice you might have. Is there a best way of getting this functionality, or anything else I should be aware of? Thanks,
[Openstack] Multiple floating IPs one instance
Is it possible to run an instance with more than one floating IP? It is not immediately evident how to do this, or whether it is even possible. I have an instance that I would like to have an address on two separate networks, and I would like to use floating IPs so that I can have addresses capable of outliving the instance itself. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
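For what it's worth, the usual approach is one floating IP per port: give the instance a second port on the second internal network, then associate a separate floating IP with each port. A hedged sketch with the standard client (the network, server, and port names here are placeholders):

```
# Add a second port on the second internal network and attach it to the server
openstack port create --network private2 myvm-port2
openstack server add port myvm myvm-port2

# Allocate a floating IP from the external network and bind it to that port
openstack floating ip create public
openstack floating ip set --port myvm-port2 <floating-ip>
```

The guest OS usually still needs to bring up the second NIC itself (e.g. via DHCP on that subnet), and each floating IP has to come from an external network reachable through that port's router.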
[Openstack] Nova VNC console broken
After setting up HA for my OpenStack cluster, the Nova console no longer works. Nothing of note appears in any of the logs under /var/log/nova on the controller or on the compute node running the instance. I get a single line that looks relevant in /var/log/apache2/errors.log on the controller node: [Fri Apr 20 15:14:07.666495 2018] [wsgi:error] [pid 25807:tid 139801204832000] WARNING horizon.exceptions Recoverable error: No available console found. Running "openstack console url show" with a verbosity of 2 outputs the following: http://paste.openstack.org/show/719660/ Does anybody know the solution to this, or of any way I can further troubleshoot the issue? Thanks,
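In case it helps anyone hitting "No available console found" after moving behind a VIP: the console URL handed to clients has to point at the load-balanced address, and with more than one nova-consoleauth instance the token store must be shared, or tokens minted on one controller are unknown to the other. A hedged sketch of the relevant nova.conf pieces (IPs and the VIP are placeholders; option names are from the Pike/Queens era, and newer releases use the [cache] section for the shared store instead):

```
# nova.conf on each compute node
[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = 10.0.0.31        # this compute's management IP
# Must be the HAProxy VIP, not an individual controller's address
novncproxy_base_url = http://10.0.0.100:6080/vnc_auto.html

# nova.conf on each controller: share console tokens between
# nova-consoleauth instances via memcached
[DEFAULT]
memcached_servers = 10.0.0.11:11211,10.0.0.12:11211
```

If the paste shows the request failing before a URL is even returned, checking that HAProxy is actually balancing port 6080 to a running nova-novncproxy would be my first step.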
Re: [Openstack] nova live migration setup
This cluster is still pre-deployment, so some information isn't locked in place, but currently: we are using libvirt/KVM for the hypervisor. The network is 2 primary VLANs on a 10Gb bond (one VLAN for provider, one for management & storage). Network load is currently very low, but I can expect it to come under reasonable load. I expect the majority of load to be from storage on the VMs and VoIP traffic. The real intention of using live migration for us is only for compute node maintenance (without downtime), and not much else.

From: David Medberry <openst...@medberry.net> Sent: 3/21/18 5:08 PM To: torin.wolt...@granddial.com Cc: OpenStack General <openstack@lists.openstack.org> Subject: Re: [Openstack] nova live migration setup

Best practice is to use shared storage, and then the "copy" is really only the active memory. A few changes came about in roughly the Newton timeframe that allow for some memory convergence. Take a look at the nova release notes from that time forward and you should see references to the change(s). You likely won't get much more detail without providing a lot more detail about your environment (and maybe not even then). This functionality is very dependent on your specific configuration regarding:

- storage design
- hypervisor choice
- network load
- network bandwidth
- VM size
- VM busy-ness
- network design
- nova structure (regions, AZs, etc.)

-dave

On Wed, Mar 21, 2018 at 1:35 PM, Torin Woltjer <torin.wolt...@granddial.com> wrote: I can't find any up-to-date official documentation on the topic, and only find documentation referring to the commands used. What is the best practice for setting up live migration for nova? I have used live migration over SSH in the past, but the documentation for how to do so is lost to me. There is also live migration over TCP; is this preferable to SSH, and how would you set it up? What are the general best practices for doing this, and what recommendations do you have?
Thanks,
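On the SSH-vs-TCP question raised in this thread: nova's libvirt driver takes the migration transport from a URI template, so the choice mostly comes down to a nova.conf setting plus libvirtd configuration. A hedged sketch (option names from the Pike-era libvirt driver; the auto-converge knob is the Newton-era memory-convergence change David mentioned):

```
[libvirt]
# SSH transport: the nova user on each compute needs passwordless SSH
# to its peers (no libvirtd network listener required)
live_migration_uri = qemu+ssh://nova@%s/system

# TCP transport instead: points at libvirtd's TCP listener, which must be
# enabled in /etc/libvirt/libvirtd.conf (listen_tcp = 1); prefer TLS
# over bare TCP in anything production-facing
#live_migration_uri = qemu+tcp://%s/system

# Let QEMU throttle busy guests so migrations actually converge (Newton+)
live_migration_permit_auto_converge = true
```

SSH is the simpler and safer default for a small cluster; bare TCP avoids the SSH overhead but leaves libvirtd exposed on the management network unless TLS is layered on.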
[Openstack] nova live migration setup
I can't find any up-to-date official documentation on the topic, and only find documentation referring to the commands used. What is the best practice for setting up live migration for nova? I have used live migration over SSH in the past, but the documentation for how to do so is lost to me. There is also live migration over TCP; is this preferable to SSH, and how would you set it up? What are the general best practices for doing this, and what recommendations do you have? Thanks,
[Openstack] Cinder and HA
I've set up HAProxy, Pacemaker, and the like on some controller nodes and should have a highly available OpenStack cluster. One thing I noticed almost immediately is that volumes show the host as whatever controller owned the VIP at the time of creation. Could this be an issue? Is there a way to consolidate them to show only one host?
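On the host-name question: cinder has a knob for exactly this. Setting the same `backend_host` in the backend section on every controller makes all of them report one logical host, so volumes are no longer pinned to whichever node held the VIP at creation time. A hedged sketch (the backend section and host names here are placeholders):

```
# cinder.conf on every controller, in the backend's section
[rbd-volumes]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
# All controllers report this one logical host for the backend
backend_host = rbd-cluster
```

Existing volumes can then be repointed with `cinder-manage volume update_host --currenthost controller1@rbd-volumes --newhost rbd-cluster@rbd-volumes` (names illustrative; run it per old host, with the volume services stopped).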
Re: [Openstack] Internal Server Error (HTTP 500) in PIKE
Hi Vamsi, It looks to me like an issue with the neutron service. I would suggest watching what happens in the various neutron logs when you run the failing commands. If you see errors in the logs, paste them at http://paste.openstack.org for us to view. Thanks, From: A Vamsikrishna Sent: 3/16/18 2:31 PM To: "openstack@lists.openstack.org" Cc: "Yamahata, Isaku" , "Bhatia, Manjeet S" , Isaku Yamahata Subject: [Openstack] Internal Server Error (HTTP 500) in PIKE Hi All, I am using OpenStack Pike, and I am seeing HTTP 500 errors during the operations below: stack@pike-ctrl:~/devstack$ openstack port set --qos-policy BothRules af63928b-4061-443d-bd9e-622a8b120f90 HttpException: Internal Server Error (HTTP 500) (Request-ID: req-832be17f-e516-4840-a707-6e163c5454a0), Request Failed: internal server error while processing your request. stack@pike-ctrl:~/devstack$ openstack network set --qos-policy BothRules 8ee4a086-0c88-47bf-b0ed-0fb177b38f17 HttpException: Internal Server Error (HTTP 500) (Request-ID: req-e4badfe0-2245-454f-bc4b-d37733c9b506), Request Failed: internal server error while processing your request. A lot of googling didn't help much. Can you please help me with some pointers to the reason behind this error and a fix for it? I am using the below guide: https://docs.openstack.org/neutron/pike/admin/config-qos.html Thanks, Vamsi
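A 500 on `--qos-policy` operations is often the QoS extension not being enabled end to end. The guide linked above calls for roughly the following on the controller and agents (a sketch; exact file paths depend on the deployment, and the linuxbridge equivalent applies if OVS isn't in use):

```
# /etc/neutron/neutron.conf on the controller
[DEFAULT]
service_plugins = router,qos

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
extension_drivers = port_security,qos

# L2 agent config on each node, e.g. openvswitch_agent.ini
[agent]
extensions = qos
```

If any one of the three is missing, neutron-server tends to throw an internal error rather than a clean "extension not found", which matches the symptom; the neutron-server log should show the underlying exception.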
[Openstack] HA Guide, no Ubuntu instructions for HA Identity
I'm currently going through the HA guide, setting up OpenStack HA on Ubuntu Server. I've gotten to this page, https://docs.openstack.org/ha-guide/controller-ha-identity.html , and there are no instructions for Ubuntu. Would I be fine following the instructions for SUSE, or is there a different process for setting up HA keystone on Ubuntu?
Re: [Openstack] Nova + LXD + Ceph?
Thank you for the response, James. I now have a couple of further questions regarding boot volume support in nova-lxd. Is this feature on the radar? For nova-kvm, the documentation states that you need shared storage for live migration; is this also the case with nova-lxd, or can you live migrate between compute hosts when using a dir storage pool for root? Putting the host's LXD storage under a folder that a Ceph pool is mounted on, while an obvious sleight of hand, what would the repercussions be? I don't know if anyone has answers to these, but any are welcome. I'm assuming the feature I'm looking for relies on work from the nova project rather than the LXD project; I will try to track down a nova feature timeline or submit a request myself. James, any documentation you can put together would be great, and I look forward to seeing it. Thanks. From: James Page <james.p...@ubuntu.com> Sent: 3/13/18 5:33 AM To: torin.wolt...@granddial.com Cc: openstack@lists.openstack.org Subject: Re: [Openstack] Nova + LXD + Ceph? Hi Torin On Mon, 12 Mar 2018 at 21:52, Torin Woltjer <torin.wolt...@granddial.com> wrote: Hello, I am looking to deploy an openstack cluster using LXD for compute and Ceph for storage, and I was running into some doubt as to whether this was possible; and doubt that nova-lxd was mature enough for production. If anyone is running nova-lxd in production, or knows anything about it, please let me know. I've had a hard time finding good informational resources on the topic, specifically relating to LXD + Ceph; which is supposedly possible, but I haven't heard if it's possible in Openstack. If you otherwise know of a resource that could be helpful to me, I would appreciate hearing it. Short answer is that nova-lxd does support use with Ceph, but only for additional block devices (i.e. no boot from volume or ephemeral device support right now). 
You have highlighted a documentation gap - there are a few non-obvious things to do, like ensuring that cinder creates RBD devices with a minimal feature set to support use of the kRBD driver used by nova-lxd. Will look to put a howto for Ceph in place shortly! Cheers James
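For anyone following along, the "minimal feature set" point is that the in-kernel RBD client in most distro kernels only understands the layering feature; images created with newer features (exclusive-lock, object-map, fast-diff, deep-flatten) refuse to map. A hedged way to enforce this for newly created images, via ceph.conf on the clients that create them:

```
[global]
# Feature bit 1 = layering only, the set that older kRBD clients can map
rbd default features = 1
```

This only affects new images; features on existing images can be stripped afterwards with `rbd feature disable`.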
[Openstack] Nova + LXD + Ceph?
Hello, I am looking to deploy an openstack cluster using LXD for compute and Ceph for storage, and I was running into some doubt as to whether this was possible; and doubt that nova-lxd was mature enough for production. If anyone is running nova-lxd in production, or knows anything about it, please let me know. I've had a hard time finding good informational resources on the topic, specifically relating to LXD + Ceph; which is supposedly possible, but I haven't heard if it's possible in Openstack. If you otherwise know of a resource that could be helpful to me, I would appreciate hearing it. Thanks.
[Openstack] [Pike][Neutron] ERROR neutron.plugins.ml2.drivers.agent._common_agent - AgentNotFoundByTypeHost
My virtual machines do not get their IP addresses. The dashboard shows the address they should have, but accessing the virtual machine through the console shows that no address is assigned to its interface. What kind of misconfiguration could have occurred?

The following two lines repeat in /var/log/nova/nova-compute.log on the compute node:

2018-03-06 13:34:15.051 32084 WARNING nova.compute.manager [req-cc5ee519-111f-4b70-b77f-b6607c5e611e ffe5adfe1f7c40a5b5d0a8f89e65a452 358008d2e1a6428ab2abcf51b10d0a50 - default default] [instance: 7249d430-743e-4463-8d28-d13cdb8cfddc] Received unexpected event network-vif-plugged-87e7138e-9e29-4e67-a181-077b3f6ea09b for instance
2018-03-06 13:34:17.563 32084 WARNING nova.compute.manager [req-512ef7b6-0936-4dd6-a7e0-0044cee7e9cf ffe5adfe1f7c40a5b5d0a8f89e65a452 358008d2e1a6428ab2abcf51b10d0a50 - default default] [instance: 7249d430-743e-4463-8d28-d13cdb8cfddc] Received unexpected event network-vif-unplugged-87e7138e-9e29-4e67-a181-077b3f6ea09b for instance

These errors repeat in /var/log/neutron/neutron-linuxbridge-agent.log:

2018-03-06 13:38:49.403 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Preparing filters for devices set(['tap87e7138e-9e'])
2018-03-06 13:38:52.286 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32']
2018-03-06 13:38:52.289 1978 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Port tap87e7138e-9e updated. 
Details: {u'profile': {}, u'network_qos_policy_id': None, u'qos_policy_id': None, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'a06ac367-fe14-4bcd-96f3-8c8081a874ad', u'segmentation_id': None, u'mtu': 1500, u'device_owner': u'compute:nova', u'physical_network': u'provider', u'mac_address': u'fa:16:3e:23:49:97', u'device': u'tap87e7138e-9e', u'port_security_enabled': True, u'port_id': u'87e7138e-9e29-4e67-a181-077b3f6ea09b', u'fixed_ips': [{u'subnet_id': u'4dc26826-49f3-4cb9-8490-e4cc5e82853d', u'ip_address': u'216.109.195.245'}], u'network_type': u'flat'} 2018-03-06 13:38:55.392 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32'] 2018-03-06 13:38:55.810 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Remove device filter for set(['tap87e7138e-9e']) 2018-03-06 13:38:57.468 1978 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Attachment tap87e7138e-9e removed 2018-03-06 13:38:57.909 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32'] 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Error occurred while removing port tap87e7138e-9e: RemoteError: Remote error: AgentNotFoundByTypeHost Agent with agent_type=L3 agent and host=UBNTU-OSTACK-COMPUTE1 could not be found [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\nres = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\nreturn self._do_dispatch(endpoint, method, ctxt, args)\n', u' File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\nresult = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\nn_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\nadmin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\ncontext, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\nhost=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=UBNTU-OSTACK-COMPUTE1 could not be found\n']. 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call last): 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent File
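The traceback boils down to the l2population mechanism driver asking for an L3 agent registered on the compute host (UBNTU-OSTACK-COMPUTE1), where none exists. If you are not relying on l2pop (e.g. a flat/VLAN provider setup, or VXLAN without it), one hedged workaround is to take it out of the driver chain; file paths below follow the standard install guide:

```
# /etc/neutron/plugins/ml2/ml2_conf.ini on the controller
[ml2]
mechanism_drivers = linuxbridge

# /etc/neutron/plugins/ml2/linuxbridge_agent.ini on every node
[vxlan]
l2_population = false
```

Restart neutron-server and the linuxbridge agents afterwards. If l2pop is actually wanted, the alternative is making sure an L3 agent (or the expected agent topology) is deployed and registered on that host, which `openstack network agent list` should confirm.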
[Openstack] Migration of attached cinder volumes fails.
The backend used for all storage is Ceph, with different pools for nova, glance, and cinder; cinder has separate pools for SSD and HDD. The goal is to be able to migrate VMs from HDD-backed storage to SSD-backed storage without downtime. Migrating volumes that are not attached works as expected; however, when migrating a volume attached to an instance, the migration appears to fail. I can see the new volume created, and then deleted, while the old volume remains. This is the log file for nova-compute during the migration: http://paste.openstack.org/raw/691729/
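One alternative path worth noting for the HDD-to-SSD goal: if the two Ceph pools are exposed as separate volume types, a retype with an on-demand migration policy moves the volume to the other backend, and for an in-use volume cinder hands the copy off to nova's volume swap rather than doing a plain `cinder migrate`. A hedged example (the type name "ssd" is a placeholder for whatever type maps to the SSD pool):

```
# Retype a volume onto the backend behind the "ssd" volume type;
# attached volumes are copied via nova swap_volume, so the instance stays up
cinder retype --migration-policy on-demand <volume-id> ssd
```

The nova-compute paste would still be worth chasing either way, since swap_volume failures for RBD-backed volumes surface in exactly that log.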