Re: [Openstack] DHCP not accessible on new compute node.
I've just done this and the problem is still there.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Marcio Prado
Sent: 11/2/18 5:08 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

Clone the HD of a server that works and restore it to the one that is not working, then change only the required settings: IP, hostname, etc.

Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br

On 02/11/2018 16:27, Torin Woltjer wrote:
> I've completely wiped the node and reinstalled it, and the problem still
> persists. I can't ping instances on other compute nodes, or ping the DHCP
> ports. Instances don't get addresses or metadata when started on this node.

From: Marcio Prado
Sent: 11/1/18 9:51 AM
To: torin.wolt...@granddial.com
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I believe you have not forgotten anything. This is probably a bug. As my cloud is not production, but rather master's research, I live-migrate the VM to a node that is working, restart it, then migrate it back to the original node that was not working, and it keeps running.

On 30-10-2018 17:50, Torin Woltjer wrote:
> Interestingly, I created a brand new selfservice network and DHCP doesn't
> work on that either. I've followed the instructions in the minimal setup
> (excluding the controllers as they're already set up) but the new node has
> no access to the DHCP agent in neutron, it seems. Is there a likely
> component that I've overlooked?
>
> FROM: "Torin Woltjer"
> SENT: 10/30/18 10:48 AM
> TO: , "openstack@lists.openstack.org"
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> I deleted both DHCP ports and they were recreated as you said. However,
> instances are still unable to get network addresses automatically.
>
> FROM: Marcio Prado
> SENT: 10/29/18 6:23 PM
> TO: torin.wolt...@granddial.com
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> The port is recreated automatically. The problem, like I said, is not in
> DHCP, but for some reason deleting the port and waiting for OpenStack to
> re-create it often solves the problem.
>
> Please, if you can find out the actual problem, let me know. I'm very
> interested to know.
>
> You can delete the port without fear. OpenStack will recreate it in a
> short time.
>
> Links:
> [1] http://www.granddial.com

--
Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [Openstack] DHCP not accessible on new compute node.
So I did further ping tests and explored the differences between my working compute nodes and my non-working compute node.

Firstly, it seems that the VXLAN is working between the non-working compute node and the controller nodes. After manually setting IP addresses, I can ping from an instance on the non-working node to 172.16.1.1 (neutron gateway); when running tcpdump I can see ICMP on:
- compute's bridge interface
- compute's vxlan interface
- controller's vxlan interface
- controller's bridge interface
- controller's qrouter namespace

This behavior is expected and is the same for instances on the working compute nodes. However, if I try to ping 172.16.1.2 (neutron DHCP) from an instance on the non-working compute node, pings do not flow. If I use tcpdump to listen for pings, I cannot hear any, even listening on the compute node itself; this includes listening on the vxlan, the bridge, and the tap device directly. Once I try to ping in reverse, from the dhcp netns on the controller to the instance on the non-working compute node, pings begin to flow. The same is true for pings between the instance on the non-working compute node and an instance on a working compute node: pings do not flow until the working instance pings first.

Once pings are flowing between the non-working instance and neutron DHCP, I run dhclient on the instance and start listening for DHCP requests with tcpdump, and I hear them on:
- compute's bridge interface
- compute's vxlan interface

They don't make it to the controller node. I've re-enabled l2_population on the controllers and rebooted them just in case, but the problem persists. A diff of /etc/ on all compute nodes shows that all OpenStack and networking related configuration is effectively identical. The last difference between the non-working compute node and the working compute nodes, as far as I can tell, is that the new node has a different network card.
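The hop-by-hop listening described above can be sketched as a small dry-run script. The node names and interface names (brq-selfservice, vxlan-55, tap-instance) are placeholders invented for illustration; the helper prints each tcpdump command instead of executing it, since the real interfaces only exist on the cloud nodes.

```shell
# Dry-run sketch of the hop-by-hop trace described above. All node and
# interface names are placeholders; substitute the real bridge/vxlan/tap
# names shown by `ip link` on each node.

trace() {
    node=$1; iface=$2; filter=$3
    # Print the command instead of executing it (tcpdump needs root and
    # the interfaces only exist on the actual nodes).
    echo "ssh $node tcpdump -ni $iface $filter"
}

# ICMP toward the neutron gateway (172.16.1.1)
trace compute-new brq-selfservice icmp
trace compute-new vxlan-55        icmp
trace controller  vxlan-55        icmp

# DHCP traffic from the instance (UDP ports 67/68)
trace compute-new tap-instance "udp port 67 or udp port 68"
trace controller  vxlan-55     "udp port 67 or udp port 68"
```

Working outward from the tap device tells you on which hop the traffic disappears, which is how the missing DHCP requests were localized to the compute node above.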
The working nodes use "Broadcom Limited NetXtreme II BCM57712 10 Gigabit Ethernet" and the non-working node uses a "NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter". Are there any known issues with neutron and this brand of network adapter? I looked at the capabilities on both adapters and here are the differences:

                              Broadcom    NetXen
tx-tcp-ecn-segmentation:      on          off [fixed]
rx-vlan-offload:              on [fixed]  off [fixed]
receive-hashing:              on          off [fixed]
rx-vlan-filter:               on          off [fixed]
tx-gre-segmentation:          on          off [fixed]
tx-gre-csum-segmentation:     on          off [fixed]
tx-ipxip4-segmentation:       on          off [fixed]
tx-udp_tnl-segmentation:      on          off [fixed]
tx-udp_tnl-csum-segmentation: on          off [fixed]
tx-gso-partial:               on          off [fixed]
loopback:                     off         off [fixed]
rx-udp_tunnel-port-offload:   on          off [fixed]
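A comparison like the table above can be produced mechanically. In practice each feature list comes from `ethtool -k <iface>` on the node; in this self-contained sketch two abbreviated sample lists are embedded as variables, so the only assumption is the file locations used for the diff.

```shell
# Sketch: diff the offload feature lists of two NICs. In practice each
# list comes from `ethtool -k <iface>` on the node (e.g. via ssh);
# here abbreviated sample lists are embedded so the snippet runs
# stand-alone.

working_list='tx-gre-segmentation: on
rx-vlan-filter: on
loopback: off'

broken_list='tx-gre-segmentation: off [fixed]
rx-vlan-filter: off [fixed]
loopback: off'

printf '%s\n' "$working_list" > /tmp/working.features
printf '%s\n' "$broken_list"  > /tmp/broken.features

# Show only the features whose values differ between the adapters;
# identical lines (like loopback here) are suppressed.
diff /tmp/working.features /tmp/broken.features | grep '^[<>]'
```

Lines marked `<` are the working adapter's values, `>` the non-working adapter's; `[fixed]` means the driver cannot toggle that feature.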
Re: [Openstack] DHCP not accessible on new compute node.
I've completely wiped the node and reinstalled it, and the problem still persists. I can't ping instances on other compute nodes, or ping the DHCP ports. Instances don't get addresses or metadata when started on this node.

From: Marcio Prado
Sent: 11/1/18 9:51 AM
To: torin.wolt...@granddial.com
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I believe you have not forgotten anything. This is probably a bug. As my cloud is not production, but rather master's research, I live-migrate the VM to a node that is working, restart it, then migrate it back to the original node that was not working, and it keeps running.

On 30-10-2018 17:50, Torin Woltjer wrote:
> Interestingly, I created a brand new selfservice network and DHCP doesn't
> work on that either. I've followed the instructions in the minimal setup
> (excluding the controllers as they're already set up) but the new node has
> no access to the DHCP agent in neutron, it seems. Is there a likely
> component that I've overlooked?
>
> FROM: "Torin Woltjer"
> SENT: 10/30/18 10:48 AM
> TO: , "openstack@lists.openstack.org"
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> I deleted both DHCP ports and they were recreated as you said. However,
> instances are still unable to get network addresses automatically.
>
> FROM: Marcio Prado
> SENT: 10/29/18 6:23 PM
> TO: torin.wolt...@granddial.com
> SUBJECT: Re: [Openstack] DHCP not accessible on new compute node.
>
> The port is recreated automatically. The problem, like I said, is not in
> DHCP, but for some reason deleting the port and waiting for OpenStack to
> re-create it often solves the problem.
>
> Please, if you can find out the actual problem, let me know. I'm very
> interested to know.
>
> You can delete the port without fear. OpenStack will recreate it in a
> short time.
>
> Links:
> [1] http://www.granddial.com

--
Marcio Prado
Analista de TI - Infraestrutura e Redes
Fone: (35) 9.9821-3561
www.marcioprado.eti.br
Re: [Openstack] DHCP not accessible on new compute node.
Interestingly, I created a brand new selfservice network and DHCP doesn't work on that either. I've followed the instructions in the minimal setup (excluding the controllers as they're already set up) but the new node has no access to the DHCP agent in neutron, it seems. Is there a likely component that I've overlooked?

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: "Torin Woltjer"
Sent: 10/30/18 10:48 AM
To: , "openstack@lists.openstack.org"
Subject: Re: [Openstack] DHCP not accessible on new compute node.

I deleted both DHCP ports and they were recreated as you said. However, instances are still unable to get network addresses automatically.

From: Marcio Prado
Sent: 10/29/18 6:23 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

The port is recreated automatically. The problem, like I said, is not in DHCP, but for some reason deleting the port and waiting for OpenStack to re-create it often solves the problem.

Please, if you can find out the actual problem, let me know. I'm very interested to know.

You can delete the port without fear. OpenStack will recreate it in a short time.
Re: [Openstack] DHCP not accessible on new compute node.
I deleted both DHCP ports and they were recreated as you said. However, instances are still unable to get network addresses automatically.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Marcio Prado
Sent: 10/29/18 6:23 PM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] DHCP not accessible on new compute node.

The port is recreated automatically. The problem, like I said, is not in DHCP, but for some reason deleting the port and waiting for OpenStack to re-create it often solves the problem.

Please, if you can find out the actual problem, let me know. I'm very interested to know.

You can delete the port without fear. OpenStack will recreate it in a short time.
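For reference, finding and deleting the DHCP ports can be done with the OpenStack client. A dry-run sketch, assuming the network is named selfservice as elsewhere in this thread; the helper echoes each command so nothing is executed by accident, and `<port-id>` is a placeholder left to fill in:

```shell
# Dry-run sketch: list the DHCP ports on a network, delete them, and
# check the DHCP agents afterwards. "selfservice" and <port-id> are
# placeholders; swap `echo "+ $*"` for "$@" to actually execute.
NETWORK=selfservice

run() { echo "+ $*"; }

run openstack port list --network "$NETWORK" --device-owner network:dhcp
run openstack port delete "<port-id>"        # repeat per port ID listed above
run openstack network agent list --agent-type dhcp
```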
[Openstack] DHCP not accessible on new compute node.
I recently installed a new compute node, but noticed that none of the instances I put on it successfully receive network addresses from DHCP. This works on all other compute nodes, however. When listening for DHCP requests on the vxlan interface, I can see the requests on the new compute node, but I do not see them anywhere else. If I manually assign an address to the interface on the instance, I am able to ping in and out. Running dhclient -v on an instance on a working compute node successfully gets a DHCP response; on the new compute node there is no response. I also discovered that the instance on the new compute node cannot ping the DHCP ports at 172.16.1.2 & 172.16.1.3, yet it can ping the gateway at 172.16.1.1. The setup is neutron-linuxbridge on OpenStack Queens.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
Re: [Openstack] Rename Cinder Volume
That was easy, thanks!

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
[Openstack] Rename Cinder Volume
Is it possible to change the name and description of a Cinder volume?

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
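The command that resolved this isn't quoted in the archive; the rename can be done with the OpenStack client's `volume set` command. A dry-run sketch with placeholder names:

```shell
# Dry-run sketch: rename a Cinder volume and update its description.
# All names are placeholders; swap `echo "+ $*"` for "$@" to execute.
run() { echo "+ $*"; }

run openstack volume set --name new-name --description "new description" old-name-or-id
```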
Re: [Openstack] [Openstack-operators] Recovering from full outage
I feel pretty dumb about this, but it was fixed by adding a rule to my security groups. I'm still very confused about some of the other behavior that I saw, but at least the problem is fixed now.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: Brian Haley
Sent: 7/16/18 4:39 PM
To: torin.wolt...@granddial.com, thangam.ar...@gmail.com, jpetr...@coredial.com
Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage

On 07/16/2018 08:41 AM, Torin Woltjer wrote:
> $ ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
> 404 Not Found
> The resource could not be found.

Strange, don't know where the reply came from for that.

> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
> curl: (7) Couldn't connect to server

Based on your iptables output below, I would think the metadata proxy is running in the qrouter namespace. However, a curl from there will not work since it is restricted to only work for incoming packets from the qr- device(s). You would have to try curl from a running instance. Is there an haproxy process running? And is it listening on port 9697 in the qrouter namespace?

-Brian

> From: "Torin Woltjer"
> Sent: 7/12/18 11:16 AM
> To: , , "jpetr...@coredial.com"
> Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
> Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage
>
> Checking iptables for the metadata-proxy inside of qrouter provides the following:
> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
> [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
> [0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x
> Packets:Bytes are both 0, so no traffic is touching this rule?
>
> Interestingly, the command:
> $ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697
> returns nothing, so there isn't actually anything running on 9697 in the network namespace...
>
> This is the output without grep:
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
> raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
> raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
> Active UNIX domain sockets (servers and established)
> Proto RefCnt Flags Type  State I-Node PID/Program name Path
> unix  2      [ ]   DGRAM       64501  7567/python2
> unix  2      [ ]   DGRAM       79953  8403/keepalived
>
> Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?
>
> Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.
>
> Thank you for this, and any future help.
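The thread doesn't say which security group rule fixed the problem, so the rules below are only illustrative (ingress ICMP and SSH on the default group), shown as a dry run that echoes the commands rather than executing them:

```shell
# Dry-run sketch: add security group rules with the OpenStack client.
# The exact rule that fixed this outage isn't stated in the thread;
# these two (ingress ICMP and SSH on "default") are common examples.
# Swap `echo "+ $*"` for "$@" to actually execute.
run() { echo "+ $*"; }

run openstack security group rule create --ingress --protocol icmp default
run openstack security group rule create --ingress --protocol tcp --dst-port 22 default
```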
Re: [Openstack] [Openstack-operators] Recovering from full outage
$ ip netns exec qdhcp-87a5200d-057f-475d-953d-17e873a47454 curl http://169.254.169.254
404 Not Found
The resource could not be found.

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e curl http://169.254.169.254
curl: (7) Couldn't connect to server

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: "Torin Woltjer"
Sent: 7/12/18 11:16 AM
To: , , "jpetr...@coredial.com"
Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org
Subject: Re: [Openstack] [Openstack-operators] Recovering from full outage

Checking iptables for the metadata-proxy inside of qrouter provides the following:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x

Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly, the command:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697

returns nothing, so there isn't actually anything running on 9697 in the network namespace...

This is the output without grep:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type  State I-Node PID/Program name Path
unix  2      [ ]   DGRAM       64501  7567/python2
unix  2      [ ]   DGRAM       79953  8403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?

Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.

Thank you for this, and any future help.
Re: [Openstack] [Openstack-operators] Recovering from full outage
Checking iptables for the metadata-proxy inside of qrouter provides the following:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x

Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly, the command:

$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | grep 9697

returns nothing, so there isn't actually anything running on 9697 in the network namespace...

This is the output without grep:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76154 8404/keepalived
raw        0      0 0.0.0.0:112   0.0.0.0:*       7     0    76153 8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type  State I-Node PID/Program name Path
unix  2      [ ]   DGRAM       64501  7567/python2
unix  2      [ ]   DGRAM       79953  8403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on that port, or is there a second issue down the chain?

Curl fails even after restarting the neutron-dhcp-agent & neutron-metadata-agent.

Thank you for this, and any future help.
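The check described above (nothing on 9697 in the qrouter namespace) can be packaged as a short dry-run script. The namespace ID is the one from this message; the helper echoes the commands instead of running them, since they need root on the network node, and the pattern passed to pgrep is just an illustrative guess at the proxy's process name:

```shell
# Dry-run sketch: is anything listening on 9697 inside the qrouter
# namespace, and is a metadata proxy process alive on the host? The
# helper echoes each command; swap `echo "+ $*"` for "$@" to run them
# as root on the network node.
NS=qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e

run() { echo "+ $*"; }

run ip netns exec "$NS" netstat -lnpt          # look for :9697
run pgrep -af haproxy                          # proxy process, if any
# If nothing listens, restarting the agents should respawn the proxy:
run systemctl restart neutron-l3-agent neutron-metadata-agent
```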
Re: [Openstack] [Openstack-operators] Recovering from full outage
I tested this on two instances. The first instance has existed since before I began having this issue; the second was created from a cirros test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100
curl returns information, for example:
`curl http://169.254.169.254/latest/meta-data/public-keys`
0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails:
`curl http://169.254.169.254/latest/meta-data`
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why one is able to connect but not the other. Both instances were running on the same compute node.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com

From: John Petrini
Sent: 7/12/18 9:16 AM
To: torin.wolt...@granddial.com
Cc: thangam.ar...@gmail.com, OpenStack Operators , OpenStack Mailing List
Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage

Are your instances receiving a route to the metadata service (169.254.169.254) from DHCP? Can you curl the endpoint?

curl http://169.254.169.254/latest/meta-data
Re: [Openstack] Recovering from full outage
If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec qdhcp netstat -lnp` on the controller, should I see anything listening on the metadata port (8775)? When I run these commands I don't see anything listening, but I have no working system to compare against. Can anybody verify this? Thanks,

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/10/18 2:58 PM To: Cc: , Subject: Re: [Openstack] Recovering from full outage

DHCP is working again, so instances are getting their addresses. For some reason cloud-init isn't working correctly: hostnames aren't getting set, and SSH key pairs aren't getting installed. Is the neutron-metadata service responsible for this? neutron-metadata-agent.log:

2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332
2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461
2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041
2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532
2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070
2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560
2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371
2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179
2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739

No other log files show abnormal behavior.
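One thing worth noting about the neutron-metadata-agent log quoted above: every entry is a GET for "/" from a public address returning 404, which plausibly is Internet scanner noise rather than instances requesting /latest/meta-data. A sketch for separating the two; the regex is my assumption, derived only from the sample lines above, and `summarize` is a hypothetical helper name.

```python
import re

# Pattern inferred from the eventlet.wsgi.server lines pasted above.
LOG_RE = re.compile(
    r'INFO eventlet\.wsgi\.server \[-\] (?P<client>[\d.]+), '
    r'"(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" status: (?P<status>\d+)'
)

def summarize(lines):
    """Count requests per (path, status) so genuine metadata requests
    (e.g. /latest/meta-data/...) would stand out from scanner noise."""
    counts = {}
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            key = (m.group("path"), int(m.group("status")))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

If the working instances' metadata requests never appear here at all, the requests are being lost before reaching the agent.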
Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaie...@gmail.com" Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old one. Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiesc
Re: [Openstack] Recovering from full outage
DHCP is working again, so instances are getting their addresses. For some reason cloud-init isn't working correctly: hostnames aren't getting set, and SSH key pairs aren't getting installed. Is the neutron-metadata service responsible for this? neutron-metadata-agent.log:

2018-07-10 08:01:42.046 5518 INFO eventlet.wsgi.server [-] 109.73.185.195, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0622332
2018-07-10 09:49:42.604 5518 INFO eventlet.wsgi.server [-] 197.149.85.150, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0645461
2018-07-10 10:52:50.845 5517 INFO eventlet.wsgi.server [-] 88.249.225.204, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0659041
2018-07-10 11:43:20.471 5518 INFO eventlet.wsgi.server [-] 143.208.186.168, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0618532
2018-07-10 11:53:15.574 5511 INFO eventlet.wsgi.server [-] 194.40.240.254, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0636070
2018-07-10 13:26:46.795 5518 INFO eventlet.wsgi.server [-] 109.73.177.149, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0611560
2018-07-10 13:27:38.795 5513 INFO eventlet.wsgi.server [-] 125.167.69.238, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0631371
2018-07-10 13:30:49.551 5514 INFO eventlet.wsgi.server [-] 155.93.152.111, "GET / HTTP/1.0" status: 404 len: 195 time: 0.0609179
2018-07-10 14:12:42.008 5521 INFO eventlet.wsgi.server [-] 190.85.38.173, "GET / HTTP/1.1" status: 404 len: 195 time: 0.0597739

No other log files show abnormal behavior.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 2:33 PM To: "lmihaie...@gmail.com" Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old vxlan network.
Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own.

"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18

I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq.
Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openstack@lists.openstack.org" , "openstack-operat...@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage

Can you manually assign an IP address to a VM and, once inside, ping the address of the dhcp server? That would confirm whether there is connectivity, at least. Also, on the controller node where the dhcp server for that network is, check "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances.
Re: [Openstack] Recovering from full outage
I explored creating a second "selfservice" vxlan to see if DHCP would work on it as it does on my external "provider" network. The new vxlan network shares the same problems as the old one. Am I having problems with VXLAN in particular?

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

----

From: "Torin Woltjer" Sent: 7/6/18 12:05 PM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage

Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty.
"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains: fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8 fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7 fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12 fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10 fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3 fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14 fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1 fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2 fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4 fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100 fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13 fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18 I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've done things, I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openstack@lists.openstack.org" , "openstack-operat...@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage Can you manually assign an IP address to a VM and once inside, ping the address of the dhcp server? That would confirm if there is connectivity at least. Also, on the controller node where the dhcp server for that network is, check the "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any miss-configuration), then an agent is out-of-sync and restart usually fixes things. On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. 
Controller:
`ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67`
`tcpdump -vnes0 -i any port 67`
Compute:
`tcpdump -vnes0 -i brqd85c2a00-a6 port 68`

For the first command on the controller, no packets are captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests being sent by openstack instances. In summary: DHCP requests are being sent, but are never received.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage

cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh key and hostname... Did you check the things I told you?

On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com --
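The compute-versus-controller tcpdump comparison above can be quantified by counting client DHCP requests in `tcpdump` line output on each side. A sketch; `count_dhcp_requests` is a hypothetical helper name, and the pattern assumes tcpdump's usual "BOOTP/DHCP, Request" line format.

```python
import re

# tcpdump prints client-originated DHCP packets as "... BOOTP/DHCP, Request ...".
DHCP_REQUEST_RE = re.compile(r"BOOTP/DHCP, Request")

def count_dhcp_requests(tcpdump_lines):
    """Count lines of tcpdump text output that look like client DHCP requests."""
    return sum(1 for line in tcpdump_lines if DHCP_REQUEST_RE.search(line))
```

If the count on the compute bridge keeps growing while the count inside the qdhcp namespace stays at zero, the requests are being lost on the path between the nodes, consistent with the summary above.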
Re: [Openstack-operators] [Openstack] Recovering from full outage
Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I confirmed this by creating a new virtual machine directly on the provider network; I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own.

"/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18

I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/6/18 11:15 AM To: torin.wolt...@granddial.com Cc: "openst...@lists.openstack.org" , "openstack-operators@lists.openstack.org" , pgso...@gmail.com Subject: Re: [Openstack] Recovering from full outage

Can you manually assign an IP address to a VM and, once inside, ping the address of the dhcp server? That would confirm whether there is connectivity, at least. Also, on the controller node where the dhcp server for that network is, check "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" and make sure there are entries corresponding to your instances. In my experience, if neutron is broken after working fine (so excluding any misconfiguration), an agent is out of sync and a restart usually fixes things.

On Fri, Jul 6, 2018 at 9:38 AM, Torin Woltjer wrote: I have done tcpdumps on both the controllers and on a compute node. Controller: `ip netns exec qdhcp-d85c2a00-a637-4109-83f0-7c2949be4cad tcpdump -vnes0 -i ns-83d68c76-b8 port 67` `tcpdump -vnes0 -i any port 67` Compute: `tcpdump -vnes0 -i brqd85c2a00-a6 port 68` For the first command on the controller, no packets are captured at all. The second command on the controller captures packets, but they don't appear to be relevant to openstack. The dump from the compute node shows constant requests being sent by openstack instances. In summary: DHCP requests are being sent, but are never received.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 4:50 PM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage

cloud-init requires network connectivity by default in order to reach the metadata server for the hostname, ssh key, etc. You can configure cloud-init to use the config-drive, but the lack of network connectivity will make the instance useless anyway, even though it will have your ssh key and hostname...

Did you check the things I told you?

On Jul 5, 2018, at 16:06, Torin Woltjer wrote: Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't working on my VMs. I created a new instance from an ubuntu 18.04 image to test with; the hostname was not set to the name of the instance and I could not log in as the users I had specified in the configuration.

Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: George Mihaiescu Sent: 7/5/18 12:57 PM To: torin.wolt...@granddial.com Cc: "openst...@lists.openstack.org" , "openstack-operators@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage

You should tcpdump inside the qdhcp namespace to see if the requests make it there, and also check iptables rules on the compute nodes for the return traffic.

On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been
Re: [Openstack] Recovering from full outage
Interestingly, I can ping the neutron router at 172.16.1.1 just fine, but DHCP (located at 172.16.1.2 and 172.16.1.3) fails. The instance that I manually added the IP address to has a floating IP, and oddly enough I am able to ping DHCP on the provider network, which suggests that DHCP may be working on other networks but not on my selfservice network. I was able to confirm this by creating a new virtual machine directly on the provider network: I was able to ping it and SSH into it right off the bat, as it obtained the proper address on its own. On one controller, "/var/lib/neutron/dhcp/d85c2a00-a637-4109-83f0-7c2949be4cad/leases" is empty. On the other controller, the same file contains:
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:e0:57:e7,host-172-16-1-7.openstacklocal,172.16.1.7
fa:16:3e:db:a7:cb,host-172-16-1-12.openstacklocal,172.16.1.12
fa:16:3e:f8:10:99,host-172-16-1-10.openstacklocal,172.16.1.10
fa:16:3e:a7:82:4c,host-172-16-1-3.openstacklocal,172.16.1.3
fa:16:3e:f8:23:1d,host-172-16-1-14.openstacklocal,172.16.1.14
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1
fa:16:3e:b7:41:a8,host-172-16-1-2.openstacklocal,172.16.1.2
fa:16:3e:5e:25:5f,host-172-16-1-4.openstacklocal,172.16.1.4
fa:16:3e:3a:a2:53,host-172-16-1-100.openstacklocal,172.16.1.100
fa:16:3e:46:39:e2,host-172-16-1-13.openstacklocal,172.16.1.13
fa:16:3e:06:de:e0,host-172-16-1-18.openstacklocal,172.16.1.18
I've done system restarts since the power outage and the agent hasn't corrected itself. I've restarted all neutron services as I've gone along; I could also try stopping and starting dnsmasq. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
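A quick way to cross-check entries like those against a particular instance is to parse the comma-separated lines (MAC,hostname,IP) and look up the instance's port MAC. A minimal sketch, using two of the entries quoted above as sample data:

```python
# Parse dnsmasq's comma-separated entries (MAC,hostname,IP) as quoted
# above, and look up the lease entry for a given instance MAC.
def parse_leases(text):
    leases = {}
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) == 3:                      # skip blank or odd lines
            mac, hostname, ip = parts
            leases[mac.lower()] = (hostname, ip)
    return leases

sample = """\
fa:16:3e:3f:94:17,host-172-16-1-8.openstacklocal,172.16.1.8
fa:16:3e:63:53:a4,host-172-16-1-1.openstacklocal,172.16.1.1"""

leases = parse_leases(sample)
print(leases["fa:16:3e:3f:94:17"])  # → ('host-172-16-1-8.openstacklocal', '172.16.1.8')
```

The MAC to look up is the one shown by `openstack port list` for the instance; an instance whose MAC is missing here never completed a DHCP exchange with this agent.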
On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer wrote: Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted that the dhcp agents showed offline in `neutron agent-list`, restarted all neutron services, noted that the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and reboot the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, and the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses; if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---- From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.wolt...@granddial.com Cc: openstack-operat...@lists.openstack.org, openstack@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in neutron's config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com -------- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operat...@lists.openstack.org" , "openstack@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage
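For anyone hitting the same lock_path question: the option lives in the [oslo_concurrency] section of neutron.conf and should be set on every node that runs neutron agents, controllers and computes alike. The exact directory doesn't matter as long as it exists and the neutron user can write to it; the value below is the one commonly shipped by distro packaging (an assumption — check your own packages):

```ini
[oslo_concurrency]
# Directory for oslo.concurrency lock files; must exist and be
# writable by the user the neutron services run as.
lock_path = /var/lib/neutron/tmp
```

Without a lock_path, agents that need interprocess locks (the linuxbridge and DHCP agents among them) can fail in exactly the way the pasted log shows, which is why the qrouter namespace only appeared once it was set.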
Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online: http://paste.openstack.org/show/724921/ All of the instances can be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. 
And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check nova-compute.log and the neutron agent logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster.
All of the nodes are back online; every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openstack@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
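The recurring advice in this thread boils down to one point: the bridges and namespaces are runtime state that the agents rebuild on their own, so the recovery path after an outage is restarting the agents, not recreating the networks. A rough checklist sketch — the service names assume an Ubuntu linuxbridge deployment (an assumption; adjust per distro), and the privileged commands are shown commented with an echo stand-in for the restart loop:

```shell
# Agents whose restart rebuilds bridges and namespaces after an outage
# (service names assume Ubuntu packaging of a linuxbridge deployment).
AGENTS="neutron-linuxbridge-agent neutron-dhcp-agent neutron-l3-agent neutron-metadata-agent"

for a in $AGENTS; do
    echo "would restart: $a"   # stand-in for: systemctl restart "$a"
done

# Afterwards, verify on the controllers (needs admin credentials/root):
#   openstack network agent list   # every agent Alive?
#   ip netns                       # qdhcp-/qrouter- namespaces back?
# and on the computes:
#   virsh list --all               # are the VMs actually running?
```

If the agents come back Alive but the namespaces still don't reappear, that points back at the lock_path and rabbitmq issues diagnosed earlier in the thread rather than at the network definitions themselves.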
Re: [Openstack-operators] [Openstack] Recovering from full outage
Yes, I've done this. The VMs hang for awhile waiting for DHCP and eventually come up with no addresses. neutron-dhcp-agent has been restarted on both controllers. The qdhcp netns's were all present; I stopped the service, removed the qdhcp netns's, noted the dhcp agents show offline by `neutron agent-list`, restarted all neutron services, noted the qdhcp netns's were recreated, restarted a VM again and it still fails to pull an IP address. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/5/18 10:38 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Did you restart the neutron-dhcp-agent and rebooted the VMs? On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer wrote: The qrouter netns appears once the lock_path is specified, the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses, if I manually give them an address and route they are pingable and seem to work. So the router is working correctly but dhcp is not working. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Torin Woltjer" Sent: 7/5/18 8:53 AM To: Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 7:47 PM To: torin.wolt...@granddial.com Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org Subject: Re: [Openstack] Recovering from full outage Did you set a lock_path in the neutron’s config? On Jul 3, 2018, at 17:34, Torin Woltjer wrote: The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ---- From: "Torin Woltjer" Sent: 7/3/18 5:14 PM To: Cc: "openstack-operators@lists.openstack.org" , "openst...@lists.openstack.org" Subject: Re: [Openstack] Recovering from full outage Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ And all of the instances can be properly started, however I cannot ping any of the instances floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: George Mihaiescu Sent: 7/3/18 11:50 AM To: torin.wolt...@granddial.com Subject: Re: [Openstack] Recovering from full outage Try restarting them using "openstack server reboot" and also check the nova-compute.log and neutron agents logs on the compute nodes. On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer wrote: We just suffered a power outage in out data center and I'm having trouble recovering the Openstack cluster. 
All of the nodes are back online, every instance shows active but `virsh list --all` on the compute nodes show that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron service running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openst...@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack ___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org h
Re: [Openstack-operators] [Openstack] Recovering from full outage
The qrouter netns appears once the lock_path is specified, and the neutron router is pingable as well. However, instances are not pingable. If I log in via console, the instances have not been given IP addresses; if I manually give them an address and route, they are pingable and seem to work. So the router is working correctly but DHCP is not. No errors in any of the neutron or nova logs on controllers or compute nodes. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
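The fix described above amounts to giving oslo.concurrency a `lock_path` in neutron's configuration. A minimal sketch of the change follows; the config location (`/etc/neutron/neutron.conf`), the lock directory, and the restart command are assumptions about a typical deployment, and the snippet operates on a scratch copy so it can run anywhere:

```shell
# Sketch: add an [oslo_concurrency] lock_path to neutron.conf if missing.
# CONF points at a scratch copy here; on a real node it would be
# /etc/neutron/neutron.conf (an assumption about your layout).
CONF=$(mktemp)
printf '[DEFAULT]\ncore_plugin = ml2\n' > "$CONF"

if ! grep -q '^lock_path' "$CONF"; then
  printf '\n[oslo_concurrency]\nlock_path = /var/lib/neutron/tmp\n' >> "$CONF"
fi

grep '^lock_path' "$CONF"
# After editing the real file, restart the agents, e.g.:
#   systemctl restart neutron-l3-agent neutron-dhcp-agent
```

The thread's follow-up question (controllers vs. compute nodes) matches how oslo.concurrency file locks work: they are local file locks, so every host running a neutron agent needs its own writable lock directory.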
Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately matter what it is set to as long as it is consistent? Does it need to be set on compute nodes as well as controllers? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Recovering from full outage
The following errors appear in the neutron-linuxbridge-agent.log on both controllers: http://paste.openstack.org/show/724930/ No such errors are on the compute nodes themselves. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be stuck in a rebooting status. Most notable of the logs is neutron-server.log, which shows the following: http://paste.openstack.org/show/724917/ I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted the controllers, and all of the agents show online. http://paste.openstack.org/show/724921/ All of the instances can be properly started; however, I cannot ping any of the instances' floating IPs or the neutron router. And when logging into an instance with the console, there is no IP address on any interface. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
[Openstack] Recovering from full outage
We just suffered a power outage in our data center and I'm having trouble recovering the OpenStack cluster. All of the nodes are back online; every instance shows active, but `virsh list --all` on the compute nodes shows that all of the VMs are actually shut down. Running `ip addr` on any of the nodes shows that none of the bridges are present, and `ip netns` shows that all of the network namespaces are missing as well. So despite all of the neutron services running, none of the networking appears to be active, which is concerning. How do I solve this without recreating all of the networks? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
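Since nova reported ACTIVE while libvirt showed everything shut off, the first recovery step is finding which domains actually need starting. A small illustration of that filtering (the domain names and the `virsh list --all` output are mocked here so the snippet runs without libvirt; on a real compute node you would pipe the real command):

```shell
# Mock of `virsh list --all` output on a compute node after the outage.
virsh_output=' Id   Name               State
----------------------------------
 -    instance-00000a1   shut off
 3    instance-00000a2   running
 -    instance-00000a3   shut off'

# Pick out domains in "shut off" state. On a real node the equivalent is:
#   virsh list --all | awk '/shut off/ {print $2}' | xargs -r -n1 virsh start
stopped=$(printf '%s\n' "$virsh_output" | awk '/shut off/ {print $2}')
printf '%s\n' "$stopped"
```

Starting the domains through `openstack server reboot` (as suggested later in the thread) is generally preferable to raw `virsh start`, since it keeps nova's state in sync.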
Re: [Openstack] masakari client (cannot list/add segment)
Installing it with tox instead of pip seems to have precisely the same effect. Is there a config file for the masakari client that I am not aware of? Nothing seems to be provided with it, and documentation is nonexistent. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
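The `auth_url` AttributeError usually means the client assembled its keystone session from nothing. Before suspecting the install, it is worth ruling out missing credentials: the masakari client goes through openstacksdk, which picks up the standard OS_* environment variables (an assumption about this client version; the values below are placeholders, not from the thread):

```shell
# Placeholder credentials -- substitute your own deployment's values.
# An unset OS_AUTH_URL is the kind of gap that surfaces as:
#   AttributeError: 'NoneType' object has no attribute 'auth_url'
export OS_AUTH_URL="http://controller:5000/v3"
export OS_USERNAME="admin"
export OS_PASSWORD="secret"
export OS_PROJECT_NAME="admin"
export OS_USER_DOMAIN_NAME="Default"
export OS_PROJECT_DOMAIN_NAME="Default"

# Sanity-check that nothing the session needs is empty.
for v in OS_AUTH_URL OS_USERNAME OS_PASSWORD OS_PROJECT_NAME; do
  eval "val=\$$v"
  [ -n "$val" ] || { echo "$v is unset"; exit 1; }
done
echo "session env complete"
```

The thread notes that `--os-auth-url` alone did not help, so an incomplete credential set (not just the URL) is the thing to check.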
Re: [Openstack] masakari client (cannot list/add segment)
Running the command with the -d debug option provides this Python traceback:

Traceback (most recent call last):
  File "/usr/local/bin/masakari", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 189, in main
    MasakariShell().main(args)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 160, in main
    sc = self._setup_masakari_client(api_ver, args)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/shell.py", line 116, in _setup_masakari_client
    return masakari_client.Client(api_ver, user_agent=USER_AGENT, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/client.py", line 28, in Client
    return cls(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/v1/client.py", line 22, in __init__
    prof=prof, user_agent=user_agent, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/masakariclient/sdk/ha/connection.py", line 48, in create_connection
    raise e
AttributeError: 'NoneType' object has no attribute 'auth_url'

Specifying --os-auth-url http://controller:5000 doesn't change this. Is python-masakariclient incorrectly installed? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] flavor metadata
I would recommend using availability zones for this. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: Satish Patel Sent: 7/1/18 9:56 AM To: openstack Subject: [Openstack] flavor metadata

Folks, we recently built OpenStack for production and I have a question about flavor metadata. I have 3 kinds of servers (8-core / 32-core / 40-core), and I want to tell OpenStack that one specific application should always go to a 32-core machine. How do I express that in flavor metadata? Or should I use the availability zone option and create two groups?
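Availability zones work, but the more targeted mechanism for "this flavor only lands on 32-core hosts" is a host aggregate whose metadata is matched by flavor extra_specs. A sketch against a live cloud (the aggregate, host, and flavor names are made up, and it assumes `AggregateInstanceExtraSpecsFilter` is enabled in nova's scheduler filters):

```shell
# Group the 32-core hypervisors into an aggregate with a custom property.
openstack aggregate create --property cores=32 agg-32core
openstack aggregate add host agg-32core compute-32core-1

# Create the flavor the application uses, then tag it so the scheduler
# only places it on hosts whose aggregate carries cores=32.
openstack flavor create --vcpus 16 --ram 32768 --disk 40 app.32core
openstack flavor set app.32core \
  --property aggregate_instance_extra_specs:cores=32
```

With this in place, booting from `app.32core` can only schedule onto hosts in `agg-32core`, while other flavors remain unrestricted.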
Re: [Openstack] DNS integration
Have a look at Designate: https://wiki.openstack.org/wiki/Designate It has support for PowerDNS, and sounds like what you're looking for. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com

From: Satish Patel Sent: 7/1/18 10:27 AM To: openstack Subject: [Openstack] DNS integration

Folks, is there a way to tell OpenStack, when you launch an instance, to add it to external DNS using some kind of API call? We are using external PowerDNS and want our VMs to register themselves as soon as we launch them. Is this possible through neutron, or should we use cloud-init?
[Openstack] masakari client (cannot list/add segment)
Installed masakari 4.0.0 on Queens. Hostmonitor, instancemonitor, and processmonitor are all running on the compute nodes; the API and engine are running on the controller nodes. I've tried using the masakari client to list/add segments; any of those commands does nothing and returns: ("'NoneType' object has no attribute 'auth_url'", <open file '<stderr>', mode 'w' at 0x7f26bb4b71e0>) I cannot find any log file for the masakari client to troubleshoot this further. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
Re: [Openstack] Masakari on queens
The wrong address was specified in the corosync configuration. Corrected that and now it runs without error. The important part here was the -c 1 switch of tcpdump. Timeout was being reached before a single packet was captured on tcpdump ( because the configuration of corosync was incorrect ). Once timeout was reached it was producing an exit code 124, which triggered the exception in the host_handler. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Torin Woltjer" Sent: 6/22/18 2:17 PM To: "tushar.pa...@nttdata.com" Subject: Re: Masakari on queens Oddly enough, I never made changes to the original code to get that output. It is just masakari-monitor 4.0.0 as installed by pip. Here are the changes and output to that code snippit you sent: http://paste.openstack.org/show/723924/ I'd like to increase the logging, but I'm not familiar with the codebase and lack more than a rudimentary knowledge of python. I've found where it seems pip installed the files for masakari-hostmonitor, but I don't know which one contains the corosync bit. From: "Patil, Tushar" Sent: 6/20/18 12:51 AM To: "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Option -i is correct. It seems that you have modified code to log error message: "ProcessExecutionError: Unexpected error while running command." Could you please log 'stderr' and 'exit_code' as well in order to know the exact error you are getting? I suspect you must be getting 124 exit code. This is a small program which I have created to simulate the error you are getting. http://paste.openstack.org/show/723882/ Please specify interface and port as per your configuration and run the program. Regards, Tushar Patil ________ From: Torin Woltjer Sent: Tuesday, June 19, 2018 9:58:32 PM To: Patil, Tushar Subject: Re: Masakari on queens Thank for the reply. Tushar Patil. 
The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. 
/etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this? Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
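The diagnosis in this thread — `timeout` killing tcpdump before `-c 1` captures a single packet, yielding exit status 124 — can be reproduced without tcpdump at all. In this sketch, `sleep` stands in for a capture that never completes:

```shell
# Minimal reproduction of the failure mode: when `timeout` has to kill
# the command at the deadline, it exits with status 124, which is what
# masakari-hostmonitor's host_handler treats as a corosync failure.
timeout 1 sleep 5   # stands in for: timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405
status=$?
echo "timeout exit code: $status"   # prints 124
```

If corosync is actually multicasting on the watched interface, tcpdump captures a packet almost immediately, exits 0 on its own, and the check passes.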
Re: [Openstack] Masakari on queens
Oddly enough, I never made changes to the original code to get that output. It is just masakari-monitor 4.0.0 as installed by pip. Here are the changes and output to that code snippet you sent: http://paste.openstack.org/show/723924/ I'd like to increase the logging, but I'm not familiar with the codebase and lack more than a rudimentary knowledge of python. I've found where it seems pip installed the files for masakari-hostmonitor, but I don't know which one contains the corosync bit. From: "Patil, Tushar" Sent: 6/20/18 12:51 AM To: "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Option -i is correct. It seems that you have modified code to log error message: "ProcessExecutionError: Unexpected error while running command." Could you please log 'stderr' and 'exit_code' as well in order to know the exact error you are getting? I suspect you must be getting 124 exit code. This is a small program which I have created to simulate the error you are getting. http://paste.openstack.org/show/723882/ Please specify interface and port as per your configuration and run the program. Regards, Tushar Patil ________ From: Torin Woltjer Sent: Tuesday, June 19, 2018 9:58:32 PM To: Patil, Tushar Subject: Re: Masakari on queens Thanks for the reply, Tushar Patil. The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. 
From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. /etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. 
Do you have any knowledge on this?
Re: [Openstack] Masakari on queens
Thanks for the reply, Tushar Patil. The command: $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 returns: "tcpdump: enp2s0f0: That device doesn't support monitor mode" The command: (lowercase i) $ timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405 runs fine with no errors: "tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes" The in-use interfaces on all of my nodes are as follows: enp2s0f0=192.168.114.x enp3s0f0=bond0=vlan60,vlan101 enp3s0f1=bond0=vlan60,vlan101 vlan60=management vlan101=provider From this part of handle_host.py I can't tell what is causing the command to raise an exception. From: "Patil, Tushar" Sent: 6/18/18 9:10 PM To: "openstack@lists.openstack.org" , "torin.wolt...@granddial.com" Subject: Re: Masakari on queens Hi Torin, Looking at the code, it seems it is trying to run the below command as the root user: timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports> where, tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds multicast_interface -> corosync_multicast_interface -> vlan60 multicast_ports -> corosync_multicast_ports -> 5405 Unfortunately, the error message is suppressed [1] hence it's difficult to know the exact reason. Can you please run the below command on the host where you are running the masakari-hostmonitor service? The error you would get after running this command would give you some hint to troubleshoot this issue further. $ timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405 [1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121 Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, June 19, 2018 4:01:29 AM To: Patil, Tushar; openstack@lists.openstack.org Subject: Masakari on queens Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. 
/etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this?
[Openstack] Masakari on queens
Hello Tushar Patil, I have upgraded to Openstack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly. /etc/masakarimonitors/masakarimonitors.conf http://paste.openstack.org/show/723726/ masakari-hostmonitor is giving me errors like: 2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'. 2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. Do you have any knowledge on this?
[Openstack] Masakari Setup
Currently trying to run masakari 4.0.0 on openstack queens. I have corosync + pacemaker running on compute nodes; crm status shows both running. When I run masakari-hostmonitor, I see 2 errors that are repeated while running. 2018-06-14 09:48:58.475 11062 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command. 2018-06-14 09:48:58.476 11062 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command. This is my /etc/masakarimonitors/masakarimonitors.conf:

[DEFAULT]

[api]
auth_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller1:11211,controller2:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = masakari
password = **

[host]
corosync_multicast_interfaces = vlan60
corosync_multicast_ports = 5405
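As the eventual resolution of this thread showed, the hostmonitor check only passes when corosync is actually sending traffic on the interface being watched. A sketch of the corresponding corosync.conf totem section — all addresses are placeholders and must match the network behind vlan60, and mcastport must match corosync_multicast_ports above:

```
# Illustrative corosync.conf fragment (multicast transport).
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.60.0      # placeholder: the vlan60 subnet
        mcastaddr: 239.255.1.1      # placeholder multicast group
        mcastport: 5405             # must match masakarimonitors.conf
    }
}
```

If bindnetaddr points at the wrong network, corosync heartbeats never appear on vlan60 and tcpdump times out exactly as described above.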
[Openstack] Openstack Dashboard console iframe
When using haproxy for the openstack dashboard the iframe for the instance console stops working. The console does work without the iframe, if I open it in its own window. nova.conf on controller:

my_ip = 192.168.116.21
[vnc]
enabled = true
server_listen = $my_ip
server_proxyclient_address = $my_ip
novncproxy_host = $my_ip

nova.conf on compute:

my_ip = 192.168.116.23
[vnc]
enabled = True
server_listen = 0.0.0.0
server_proxyclient_address = $my_ip
novncproxy_base_url = http://controller:6080/vnc_auto.html

controller in the host file resolves to 192.168.116.16, the VIP on HAProxy. Is there something wrong with this configuration? Does anybody else have this problem with the console iframe when using HAProxy?
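For comparison, a hypothetical haproxy.cfg stanza for the noVNC proxy — the names and addresses are placeholders modeled on the IPs in this message, not a confirmed fix. The `timeout tunnel` setting matters because the console runs over a long-lived WebSocket, which short HTTP timeouts will cut off:

```
# Illustrative haproxy.cfg fragment for nova-novncproxy on the VIP.
listen novncproxy
    bind 192.168.116.16:6080
    mode http
    balance source                  # keep a console session on one backend
    timeout tunnel 1h               # WebSockets are long-lived
    server controller1 192.168.116.21:6080 check
```

Pinning a session to one backend with `balance source` avoids the token being validated by a different nova-novncproxy than the one that issued the connection.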
[Openstack] Cinder Queens installdoc wrong
I've upgraded from pike to queens, and the keystone admin port 35357 has been deprecated in favor of 5000 it seems. However, the documentation for the installation of cinder still uses that port in [keystone_authtoken]. What is the correct entry for this line? auth_url = http://controller:5000 I imagine. https://docs.openstack.org/cinder/queens/install/cinder-controller-install-ubuntu.html
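A sketch of how the cinder.conf [keystone_authtoken] section could look with the deprecated 35357 port replaced, following the pattern of the other Queens install guides — the values are placeholders, not a verified correction of the documentation:

```ini
# Illustrative [keystone_authtoken] for cinder.conf on Queens.
[keystone_authtoken]
auth_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = CINDER_PASS
```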
Re: [Openstack] Masakari client error
Hi Tushar, Thanks for linking to that document, I hadn't seen it before and it's very useful. As far as milestones are concerned, I was planning on sticking with Pike. Up until this point I've been using the packages from http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/pike main; when I installed the latest packages from pip, I was ignorant of what I would be doing and what would happen. I may switch to queens or rocky, but I would like to upgrade to the latest Ubuntu LTS if I am to do that (bionic only has a repo for rocky but not queens, I believe). From: "Patil, Tushar" <tushar.pa...@nttdata.com> Sent: 5/16/18 9:03 PM To: "fu...@yuggoth.org" <fu...@yuggoth.org>, "openstack@lists.openstack.org" <openstack@lists.openstack.org>, "torin.wolt...@granddial.com" <torin.wolt...@granddial.com> Subject: Re: [Openstack] Masakari client error Hi Torin, If you are using stable/pike, then it is recommended to use python-masakariclient version 3.0.1 [1] which requires openstacksdk version 0.9.17. Are you trying to upgrade your stable/pike environment to the latest rocky-milestone1 (all services including Masakari)? [1] : https://github.com/openstack/requirements/blob/stable/pike/upper-constraints.txt Regards, Tushar Patil From: Torin Woltjer Sent: Wednesday, May 16, 2018 10:32:10 PM To: fu...@yuggoth.org; openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error It looks like pip install actually upgraded my openstacksdk to 0.13 when I installed masakari from pip. Meanwhile the sdk in the 16.04 repository is 0.9.17. I'm wondering now if this might explain why my block storage is also having problems. What is the process for setting up a local environment for separate versions of the SDK (with different services using each)? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 
2006 www.granddial.com From: Jeremy Stanley Sent: 5/16/18 8:59 AM To: openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error On 2018-05-16 12:30:47 +0000 (+0000), Torin Woltjer wrote: [...] > I am using Pike and not Queens so the openstacksdk version 13 is > not available in the repository. Should openstacksdk version 0.13 > still work with Pike [...] OpenStackSDK strives for backwards-compatibility with even fairly ancient OpenStack releases, and is not tied to any particular version of OpenStack services. It should always be safe to run the latest releases of OpenStackSDK no matter the age of the deployment with which you intend to communicate. Note however that the dependencies of OpenStackSDK may conflict with dependencies of some OpenStack service, so you can't necessarily expect to be able to co-install them on the same machine without some means of context separation (virtualenvs, containers, pip install --local, et cetera). -- Jeremy Stanley
Re: [Openstack] Masakari client error
It looks like pip install actually upgraded my openstacksdk to 0.13 when I installed masakari from pip. Meanwhile the sdk in the 16.04 repository is 0.9.17. I'm wondering now if this might explain why my block storage is also having problems. What is the process for setting up a local environment for separate versions of the SDK (with different services using each)? Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: Jeremy Stanley <fu...@yuggoth.org> Sent: 5/16/18 8:59 AM To: openstack@lists.openstack.org Subject: Re: [Openstack] Masakari client error On 2018-05-16 12:30:47 +0000 (+0000), Torin Woltjer wrote: [...] > I am using Pike and not Queens so the openstacksdk version 13 is > not available in the repository. Should openstacksdk version 0.13 > still work with Pike [...] OpenStackSDK strives for backwards-compatibility with even fairly ancient OpenStack releases, and is not tied to any particular version of OpenStack services. It should always be safe to run the latest releases of OpenStackSDK no matter the age of the deployment with which you intend to communicate. Note however that the dependencies of OpenStackSDK may conflict with dependencies of some OpenStack service, so you can't necessarily expect to be able to co-install them on the same machine without some means of context separation (virtualenvs, containers, pip install --local, et cetera). -- Jeremy Stanley
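One concrete way to get the context separation Jeremy mentions, sketched with illustrative paths and version pins (the venv location and the pins are assumptions, not from the thread):

```shell
# Keep the newer openstacksdk that masakariclient needs in its own
# virtualenv, so the distro's openstacksdk 0.9.17 used by the other
# services is untouched. --without-pip keeps this sketch offline-
# friendly; a real setup would create the venv normally.
python3 -m venv --without-pip /tmp/masakari-venv
/tmp/masakari-venv/bin/python --version
# In a normally created venv, install inside it only (needs network):
#   /tmp/masakari-venv/bin/pip install 'openstacksdk==0.13.0' python-masakariclient
```

The client is then invoked via its venv path (e.g. `/tmp/masakari-venv/bin/masakari`), leaving the system site-packages alone.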
Re: [Openstack] Masakari client error
Hello again, I am not using the git version of masakari anymore, I am using the version installed from python pip. I am using Pike and not Queens so the openstacksdk version 13 is not available in the repository. Should openstacksdk version 0.13 still work with Pike, and should this version of masakari still work with Pike? Thanks, Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com From: "Patil, Tushar" <tushar.pa...@nttdata.com> Sent: 5/16/18 12:29 AM To: "openstack@lists.openstack.org" <openstack@lists.openstack.org>, "torin.wolt...@granddial.com" <torin.wolt...@granddial.com> Subject: Re: [Openstack] Masakari client error Hi Torin, Few days back, this patch [1] got merged in which the service type is changed from "ha" to "instance_ha". We have tried reproducing the issue you are facing but we are not getting the exact same error. With different versions of openstacksdk, we got different errors. Masakariclient/masakari-monitors requires openstacksdk version 0.13.0. Today we have fixed LP bug [2] in patch [3] which should also fix the issue you are facing. We will release another version of python-masakariclient soon. Are you installing masakari using devstack? If yes, please install masakari from scratch. After installing latest masakari, you should be able to run segment-list and host-list using openstack commands. If you want to run same commands using masakariclient, then you will need to wait until new version of masakariclient is released or you can apply patch [3] in your environment. If you need any help in applying patches, please ask for help on #openstack-masakari IRC. Simple way to install latest masakariclient from code: 1. git clone https://github.com/openstack/python-masakariclient.git 2. Go to folder python-masakariclient 3. sudo python setup.py install If you find any issues in Masakari, you can also report bugs in launchpad against below respective projects. 
http://launchpad.net/python-masakariclient https://launchpad.net/masakari-monitors https://launchpad.net/masakari Hope this helps!!! [1] : https://review.openstack.org/#/c/536653/ [2] : https://bugs.launchpad.net/python-masakariclient/+bug/1756047 [3] : https://review.openstack.org/#/c/557634/ Regards, Tushar Patil From: Torin Woltjer Sent: Tuesday, May 15, 2018 11:36:11 PM To: openstack@lists.openstack.org Subject: [Openstack] Masakari client error I am using the masakari client version 5.0.0 installed from python pip. I keep getting the following error: ("'Connection' object has no attribute 'ha'", <open file '<stderr>', mode 'w' at 0x7f6ee88791e0>) when I try to run any commands with it: segment-list, host-list, etc. It's entirely possible that I'm missing some piece of configuration, or have something improperly configured, but there isn't sufficient documentation for me to figure out if or what. Anybody have a working example that I can see, or know if this is an issue?
[Openstack] Masakari client error
I am using the masakari client version 5.0.0 installed from python pip. I keep getting the following error: ("'Connection' object has no attribute 'ha'", <open file '<stderr>', mode 'w' at 0x7f6ee88791e0>) when I try to run any commands with it: segment-list, host-list, etc. It's entirely possible that I'm missing some piece of configuration, or have something improperly configured, but there isn't sufficient documentation for me to figure out if or what. Anybody have a working example that I can see, or know if this is an issue?
Re: [Openstack] Database (Got timeout reading communication packets)
> While I was working on something else I remembered the error messages you described, I have them, too. It's a lab environment on hardware nodes with a sufficient network connection, and since we had to debug network issues before, we can rule out network problems in our case. I found a website [1] to track down galera issues, I tried to apply those steps and it seems that the openstack code doesn't close the connections properly, hence the aborted connections. I'm not sure if this is the correct interpretation, but since I didn't face any problems related to the openstack databases I decided to ignore these messages as long as the openstack environment works properly.

I did think something similar to this initially when I noticed a high number of sleeping connections, but because I was unsure I thought to ask. Because this affects all Openstack services as a whole, what project would I file a bug report on?
Re: [Openstack] Database (Got timeout reading communication packets)
> Are these interruptions occasional or do they occur all the time? Is this a new issue or has this happened before?

This is a 3 node Galera cluster on 3 KVM virtual machines. The errors are constantly printing in the logs, and no node is excluded from receiving the errors. I don't know whether they had always been there or not, but I noticed them after an update.

> Does the openstack environment work as expected despite these messages or do you experience interruptions in the services?

The openstack services operate normally; the dashboard is fairly slow, but it always has been.

> I would check the network setup first (I have read about loose cables in different threads...), maybe run some ping tests between the machines to see if there's anything weird. Since you mention different services reporting these interruptions this seems like a network issue to me.

The hosts are all networked with bonded 10G SFP+ cables networked via a switch. Pings between the VMs seem fine. If I were to guess, any networking problem would be between the guest and host due to libvirt. Anything that I should be looking for there?
Re: [Openstack] HA Compute & Instance Evacuation
On Friday, May 11, 2018 12:40:58 AM EDT Patil, Tushar wrote: > I think this is what is needed to make it work. > Install openstacksdk version 0.13.0. > > Apply patch: https://review.openstack.org/#/c/546492/ > > In this patch, we need to bump openstacksdk version from 0.11.2 to 0.13.0. > We will merge above patch soon. Do you have a timetable on when the patch will be merged? If it is a relatively small window of time, I would rather wait to use the patched mainline code. Otherwise, I am willing to try to work with the patch. Additionally, patching python is something that I am not familiar with. Is there a good resource on doing this? You have been a great help so far, thanks again.
[Openstack] Database (Got timeout reading communication packets)
Just the other day I noticed a bunch of errors spewing from the mysql service. I've spent quite a bit of time trying to track this down, and I haven't had any luck figuring out why this is happening. The following line is repeatedly spewed in the service's journal. May 08 11:13:47 UBNTU-DBMQ2 mysqld[20788]: 2018-05-08 11:13:47 140127545740032 [Warning] Aborted connection 211 to db: 'nova_api' user: 'nova' host: '192.168.116.21' (Got timeout reading communication packets) It isn't always nova_api; it's happening with all of the openstack projects, and with either of the controller nodes' ip addresses. The database is a mariadb galera cluster. Removing haproxy has no effect. The output only occurs on the node receiving the connections; with haproxy it is multiple nodes, otherwise it is whatever node I specify as the database in my controllers' hosts files.
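Not from this thread, but a commonly cited cause of "Got timeout reading communication packets" with OpenStack services is pooled SQLAlchemy connections idling past the server's wait_timeout, so the server aborts them. A hedged sketch of the two knobs involved — the oslo.db option name assumes a Queens-era release (older releases call it idle_timeout), and the values are illustrative:

```ini
# Illustrative only. In each OpenStack service's conf, recycle pooled
# connections before the database server can time them out:
[database]
connection_recycle_time = 280

# In MariaDB's my.cnf, keep wait_timeout comfortably above that:
[mysqld]
wait_timeout = 3600
```

With the recycle interval below wait_timeout, the services drop and reopen idle connections themselves instead of the server aborting them.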
Re: [Openstack] HA Compute & Instance Evacuation
Thank you very much for the information. Just for clarification, when you say reserved hosts, do you mean that I must keep unloaded virtualization hosts in reserve? Or can Masakari move instances from a downed host to an already loaded host that has open capacity?
Re: [Openstack] HA Compute & Instance Evacuation
I'm vaguely familiar with Pacemaker/Corosync, as I'm using it with HAProxy on my controller nodes. I'm assuming in this instance that you use Pacemaker on your compute hosts so masakari can detect host outages? If possible could you go into more detail about the configuration? I would like to use Masakari and I'm having trouble finding a step by step or other documentation to get started with.
Re: [Openstack] HA Compute & Instance Evacuation
> There is no HA behaviour for compute nodes. > > You are referring to HA of workloads running on compute nodes, not HA of > compute nodes themselves. It was a mistake for me to say HA when referring to compute and instances. Really I want to avoid a situation where one of my compute hosts gives up the ghost, and all of the instances are offline until someone reboots them on a different host. I would like them to automatically reboot on a healthy compute node. > Check out Masakari: > > https://wiki.openstack.org/wiki/Masakari This looks like the kind of thing I'm searching for. I'm seeing 3 components here, I'm assuming one goes on compute hosts and one or both of the others go on the control nodes? Is there any documentation outlining the procedure for deploying this? Will there be any problem running the Masakari API service on 2 machines simultaneously, sitting behind HAProxy?
[Openstack] HA Compute & Instance Evacuation
I am working on setting up Openstack for HA and one of the last orders of business is getting HA behavior out of the compute nodes. Is there a project that will automatically evacuate instances from a downed or failed compute host, and automatically reboot them on their new host? I'm curious what suggestions people have about this, or whatever advice you might have. Is there a best way of getting this functionality, or anything else I should be aware of? Thanks,
[Openstack] Multiple floating IPs one instance
Is it possible to run an instance with more than one floating IP? It is not immediately evident how to do this, or whether it is even possible. I have an instance that I would like to have an address on two separate networks, and I would like to use floating IPs so that I can have addresses capable of outliving the instance itself. Torin Woltjer Grand Dial Communications - A ZK Tech Inc. Company 616.776.1066 ext. 2006 www.granddial.com
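For what it's worth, the usual approach is one floating IP per port: give the instance a second port on the second internal network, then associate a separate floating IP with each port. A hedged sketch with the standard client (the network, server, and port names here are placeholders):

```
# Add a second port on the second internal network and attach it to the server
openstack port create --network private2 myvm-port2
openstack server add port myvm myvm-port2

# Allocate a floating IP from the external network and bind it to that port
openstack floating ip create public
openstack floating ip set --port myvm-port2 <floating-ip>
```

The guest OS usually still needs to bring up the second NIC itself (e.g. via DHCP on that subnet), and each floating IP has to come from an external network reachable through that port's router.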
[Openstack] Nova VNC console broken
After setting up HA for my OpenStack cluster, the Nova console no longer works. Nothing of note appears in any of the logs under /var/log/nova on the controller or on the compute node running the instance. I get a single line that looks relevant in /var/log/apache2/errors.log on the controller node: [Fri Apr 20 15:14:07.666495 2018] [wsgi:error] [pid 25807:tid 139801204832000] WARNING horizon.exceptions Recoverable error: No available console found. Running "openstack console url show" with a verbosity of 2 outputs the following: http://paste.openstack.org/show/719660/ Does anybody know the solution to this, or of any way I can further troubleshoot the issue? Thanks,
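In case it helps anyone hitting "No available console found" after moving behind a VIP: the console URL handed to clients has to point at the load-balanced address, and with more than one nova-consoleauth instance the token store must be shared, or tokens minted on one controller are unknown to the other. A hedged sketch of the relevant nova.conf pieces (IPs and the VIP are placeholders; option names are from the Pike/Queens era, and newer releases use the [cache] section for the shared store instead):

```
# nova.conf on each compute node
[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = 10.0.0.31        # this compute's management IP
# Must be the HAProxy VIP, not an individual controller's address
novncproxy_base_url = http://10.0.0.100:6080/vnc_auto.html

# nova.conf on each controller: share console tokens between
# nova-consoleauth instances via memcached
[DEFAULT]
memcached_servers = 10.0.0.11:11211,10.0.0.12:11211
```

If the paste shows the request failing before a URL is even returned, checking that HAProxy is actually balancing port 6080 to a running nova-novncproxy would be my first step.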
Re: [Openstack] nova live migration setup
This cluster is still pre-deployment, so some information isn't locked in place, but currently: we are using libvirt/KVM for the hypervisor. The network is 2 primary VLANs on a 10Gb bond (one VLAN for provider, one for management & storage). Network load is currently very low, but I can expect it to come under reasonable load. I expect the majority of load to be from storage on the VMs and VoIP traffic. The real intention of using live migration for us is only for compute node maintenance (without downtime), and not much else.

From: David Medberry <openst...@medberry.net> Sent: 3/21/18 5:08 PM To: torin.wolt...@granddial.com Cc: OpenStack General <openstack@lists.openstack.org> Subject: Re: [Openstack] nova live migration setup

Best practice is to use shared storage, and then the "copy" is really only the active memory. A few changes came about in roughly the Newton timeframe that allow for some memory convergence. Take a look at the nova release notes from that time forward and you should see references to the change(s). You likely won't get much more detail without providing a lot more detail about your environment (and maybe not even then). This functionality is very dependent on your specific configuration regarding:

- storage design
- hypervisor choice
- network load
- network bandwidth
- VM size
- VM busy-ness
- network design
- nova structure (regions, AZs, etc.)

-dave

On Wed, Mar 21, 2018 at 1:35 PM, Torin Woltjer <torin.wolt...@granddial.com> wrote: I can't find any up-to-date official documentation on the topic, and only find documentation referring to the commands used. What is the best practice for setting up live migration for nova? I have used live migration over SSH in the past, but the documentation for how to do so is lost to me. There is also live migration over TCP; is this preferable to SSH, and how would you set it up? What are the general best practices for doing this, and what recommendations do you have?
Thanks,
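On the SSH-vs-TCP question raised in this thread: nova's libvirt driver takes the migration transport from a URI template, so the choice mostly comes down to a nova.conf setting plus libvirtd configuration. A hedged sketch (option names from the Pike-era libvirt driver; the auto-converge knob is the Newton-era memory-convergence change David mentioned):

```
[libvirt]
# SSH transport: the nova user on each compute needs passwordless SSH
# to its peers (no libvirtd network listener required)
live_migration_uri = qemu+ssh://nova@%s/system

# TCP transport instead: points at libvirtd's TCP listener, which must be
# enabled in /etc/libvirt/libvirtd.conf (listen_tcp = 1); prefer TLS
# over bare TCP in anything production-facing
#live_migration_uri = qemu+tcp://%s/system

# Let QEMU throttle busy guests so migrations actually converge (Newton+)
live_migration_permit_auto_converge = true
```

SSH is the simpler and safer default for a small cluster; bare TCP avoids the SSH overhead but leaves libvirtd exposed on the management network unless TLS is layered on.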
[Openstack] nova live migration setup
I can't find any up-to-date official documentation on the topic, and only find documentation referring to the commands used. What is the best practice for setting up live migration for nova? I have used live migration over SSH in the past, but the documentation for how to do so is lost to me. There is also live migration over TCP; is this preferable to SSH, and how would you set it up? What are the general best practices for doing this, and what recommendations do you have? Thanks,
[Openstack] Cinder and HA
I've set up HAProxy, Pacemaker, and the like on some controller nodes and should have a highly available OpenStack cluster. One thing I noticed almost immediately is that volumes show the host as whatever controller owned the VIP at the time of creation. Could this be an issue? Is there a way to consolidate them to show only one host?
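On the host-name question: cinder has a knob for exactly this. Setting the same `backend_host` in the backend section on every controller makes all of them report one logical host, so volumes are no longer pinned to whichever node held the VIP at creation time. A hedged sketch (the backend section and host names here are placeholders):

```
# cinder.conf on every controller, in the backend's section
[rbd-volumes]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
# All controllers report this one logical host for the backend
backend_host = rbd-cluster
```

Existing volumes can then be repointed with `cinder-manage volume update_host --currenthost controller1@rbd-volumes --newhost rbd-cluster@rbd-volumes` (names illustrative; run it per old host, with the volume services stopped).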
Re: [Openstack] Internal Server Error (HTTP 500) in PIKE
Hi Vamsi, It looks to me like an issue with the neutron service. I would suggest watching what happens in the various neutron logs when you run the failing commands. If you see errors in the logs, paste them at http://paste.openstack.org for us to view. Thanks, From: A Vamsikrishna Sent: 3/16/18 2:31 PM To: "openstack@lists.openstack.org" Cc: "Yamahata, Isaku" , "Bhatia, Manjeet S" , Isaku Yamahata Subject: [Openstack] Internal Server Error (HTTP 500) in PIKE Hi All, I am using OpenStack Pike, and I am seeing HTTP 500 errors during the operations below: stack@pike-ctrl:~/devstack$ openstack port set --qos-policy BothRules af63928b-4061-443d-bd9e-622a8b120f90 HttpException: Internal Server Error (HTTP 500) (Request-ID: req-832be17f-e516-4840-a707-6e163c5454a0), Request Failed: internal server error while processing your request. stack@pike-ctrl:~/devstack$ openstack network set --qos-policy BothRules 8ee4a086-0c88-47bf-b0ed-0fb177b38f17 HttpException: Internal Server Error (HTTP 500) (Request-ID: req-e4badfe0-2245-454f-bc4b-d37733c9b506), Request Failed: internal server error while processing your request. A lot of googling didn't help much. Can you please help me with some pointers to the reason behind this error and a fix for it? I am using the below guide: https://docs.openstack.org/neutron/pike/admin/config-qos.html Thanks, Vamsi
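A 500 on `--qos-policy` operations is often the QoS extension not being enabled end to end. The guide linked above calls for roughly the following on the controller and agents (a sketch; exact file paths depend on the deployment, and the linuxbridge equivalent applies if OVS isn't in use):

```
# /etc/neutron/neutron.conf on the controller
[DEFAULT]
service_plugins = router,qos

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
extension_drivers = port_security,qos

# L2 agent config on each node, e.g. openvswitch_agent.ini
[agent]
extensions = qos
```

If any one of the three is missing, neutron-server tends to throw an internal error rather than a clean "extension not found", which matches the symptom; the neutron-server log should show the underlying exception.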
[Openstack] HA Guide, no Ubuntu instructions for HA Identity
I'm currently going through the HA guide, setting up OpenStack HA on Ubuntu Server. I've gotten to this page, https://docs.openstack.org/ha-guide/controller-ha-identity.html , and there are no instructions for Ubuntu. Would I be fine following the instructions for SUSE, or is there a different process for setting up HA keystone on Ubuntu?
Re: [Openstack] Nova + LXD + Ceph?
Thank you for the response, James. I now have a couple of further questions regarding boot volume support in nova-lxd. Is this feature on the radar? For nova-kvm, the documentation states that you need shared storage for live migration; is this also the case with nova-lxd, or can you live migrate between compute hosts when using a dir storage pool for root? Putting the host's LXD storage under a folder that a Ceph pool is mounted on, while an obvious sleight of hand, what would the repercussions be? I don't know if anyone has answers to these, but any are welcome. I'm assuming the feature I'm looking for relies on work from the nova project rather than the LXD project; I will try to track down a nova feature timeline or submit a request myself. James, any documentation you can put together would be great, and I look forward to seeing it. Thanks. From: James Page <james.p...@ubuntu.com> Sent: 3/13/18 5:33 AM To: torin.wolt...@granddial.com Cc: openstack@lists.openstack.org Subject: Re: [Openstack] Nova + LXD + Ceph? Hi Torin On Mon, 12 Mar 2018 at 21:52, Torin Woltjer <torin.wolt...@granddial.com> wrote: Hello, I am looking to deploy an openstack cluster using LXD for compute and Ceph for storage, and I was running into some doubt as to whether this was possible; and doubt that nova-lxd was mature enough for production. If anyone is running nova-lxd in production, or knows anything about it, please let me know. I've had a hard time finding good informational resources on the topic, specifically relating to LXD + Ceph; which is supposedly possible, but I haven't heard if it's possible in Openstack. If you otherwise know of a resource that could be helpful to me, I would appreciate hearing it. Short answer is that nova-lxd does support use with Ceph, but only for additional block devices (i.e. no boot from volume or ephemeral device support right now). 
You have highlighted a documentation gap - there are a few non-obvious things to do, like ensuring that cinder creates RBD devices with a minimal feature set to support use of the kRBD driver used by nova-lxd. Will look to put a howto for Ceph in place shortly! Cheers James
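For anyone following along, the "minimal feature set" point is that the in-kernel RBD client in most distro kernels only understands the layering feature; images created with newer features (exclusive-lock, object-map, fast-diff, deep-flatten) refuse to map. A hedged way to enforce this for newly created images, via ceph.conf on the clients that create them:

```
[global]
# Feature bit 1 = layering only, the set that older kRBD clients can map
rbd default features = 1
```

This only affects new images; features on existing images can be stripped afterwards with `rbd feature disable`.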
[Openstack] Nova + LXD + Ceph?
Hello, I am looking to deploy an openstack cluster using LXD for compute and Ceph for storage, and I was running into some doubt as to whether this was possible; and doubt that nova-lxd was mature enough for production. If anyone is running nova-lxd in production, or knows anything about it, please let me know. I've had a hard time finding good informational resources on the topic, specifically relating to LXD + Ceph; which is supposedly possible, but I haven't heard if it's possible in Openstack. If you otherwise know of a resource that could be helpful to me, I would appreciate hearing it. Thanks.
[Openstack] [Pike][Neutron] ERROR neutron.plugins.ml2.drivers.agent._common_agent - AgentNotFoundByTypeHost
My virtual machines do not get their IP addresses. The dashboard shows the address they should have, but accessing the virtual machine through the console shows that no address is assigned to its interface. What kind of misconfiguration could have occurred?

The following two lines repeat in /var/log/nova/nova-compute.log on the compute node:

2018-03-06 13:34:15.051 32084 WARNING nova.compute.manager [req-cc5ee519-111f-4b70-b77f-b6607c5e611e ffe5adfe1f7c40a5b5d0a8f89e65a452 358008d2e1a6428ab2abcf51b10d0a50 - default default] [instance: 7249d430-743e-4463-8d28-d13cdb8cfddc] Received unexpected event network-vif-plugged-87e7138e-9e29-4e67-a181-077b3f6ea09b for instance
2018-03-06 13:34:17.563 32084 WARNING nova.compute.manager [req-512ef7b6-0936-4dd6-a7e0-0044cee7e9cf ffe5adfe1f7c40a5b5d0a8f89e65a452 358008d2e1a6428ab2abcf51b10d0a50 - default default] [instance: 7249d430-743e-4463-8d28-d13cdb8cfddc] Received unexpected event network-vif-unplugged-87e7138e-9e29-4e67-a181-077b3f6ea09b for instance

These errors repeat in /var/log/neutron/neutron-linuxbridge-agent.log:

2018-03-06 13:38:49.403 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Preparing filters for devices set(['tap87e7138e-9e'])
2018-03-06 13:38:52.286 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32']
2018-03-06 13:38:52.289 1978 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Port tap87e7138e-9e updated. 
Details: {u'profile': {}, u'network_qos_policy_id': None, u'qos_policy_id': None, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'a06ac367-fe14-4bcd-96f3-8c8081a874ad', u'segmentation_id': None, u'mtu': 1500, u'device_owner': u'compute:nova', u'physical_network': u'provider', u'mac_address': u'fa:16:3e:23:49:97', u'device': u'tap87e7138e-9e', u'port_security_enabled': True, u'port_id': u'87e7138e-9e29-4e67-a181-077b3f6ea09b', u'fixed_ips': [{u'subnet_id': u'4dc26826-49f3-4cb9-8490-e4cc5e82853d', u'ip_address': u'216.109.195.245'}], u'network_type': u'flat'} 2018-03-06 13:38:55.392 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32'] 2018-03-06 13:38:55.810 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Remove device filter for set(['tap87e7138e-9e']) 2018-03-06 13:38:57.468 1978 INFO neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Attachment tap87e7138e-9e removed 2018-03-06 13:38:57.909 1978 INFO neutron.agent.securitygroups_rpc [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Security group member updated [u'04a877fe-f6bc-445c-9e03-204a0cae9d32'] 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-262cb010-9068-4ad9-b93d-bd0875fc66e1 - - - - -] Error occurred while removing port tap87e7138e-9e: RemoteError: Remote error: AgentNotFoundByTypeHost Agent with agent_type=L3 agent and host=UBNTU-OSTACK-COMPUTE1 could not be found [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\nres = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\nreturn self._do_dispatch(endpoint, method, ctxt, args)\n', u' File 
"/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\nresult = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\nn_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\nadmin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\ncontext, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\nhost=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=UBNTU-OSTACK-COMPUTE1 could not be found\n']. 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call last): 2018-03-06 13:38:58.199 1978 ERROR neutron.plugins.ml2.drivers.agent._common_agent File
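The traceback boils down to the l2population mechanism driver asking for an L3 agent registered on the compute host (UBNTU-OSTACK-COMPUTE1), where none exists. If you are not relying on l2pop (e.g. a flat/VLAN provider setup, or VXLAN without it), one hedged workaround is to take it out of the driver chain; file paths below follow the standard install guide:

```
# /etc/neutron/plugins/ml2/ml2_conf.ini on the controller
[ml2]
mechanism_drivers = linuxbridge

# /etc/neutron/plugins/ml2/linuxbridge_agent.ini on every node
[vxlan]
l2_population = false
```

Restart neutron-server and the linuxbridge agents afterwards. If l2pop is actually wanted, the alternative is making sure an L3 agent (or the expected agent topology) is deployed and registered on that host, which `openstack network agent list` should confirm.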
[Openstack] Migration of attached cinder volumes fails.
The backend used for all storage is Ceph, with different pools for nova, glance, and cinder; cinder has separate pools for SSD and HDD. The goal is to be able to migrate VMs from HDD-backed storage to SSD-backed storage without downtime. Migrating volumes that are not attached works as expected; however, when migrating a volume attached to an instance, the migration appears to fail. I can see the new volume created, and then deleted, while the old volume remains. This is the log file for nova-compute during the migration: http://paste.openstack.org/raw/691729/
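One alternative path worth noting for the HDD-to-SSD goal: if the two Ceph pools are exposed as separate volume types, a retype with an on-demand migration policy moves the volume to the other backend, and for an in-use volume cinder hands the copy off to nova's volume swap rather than doing a plain `cinder migrate`. A hedged example (the type name "ssd" is a placeholder for whatever type maps to the SSD pool):

```
# Retype a volume onto the backend behind the "ssd" volume type;
# attached volumes are copied via nova swap_volume, so the instance stays up
cinder retype --migration-policy on-demand <volume-id> ssd
```

The nova-compute paste would still be worth chasing either way, since swap_volume failures for RBD-backed volumes surface in exactly that log.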